[antlr-interest] DMQL Grammar - ANTLR Eats Characters

Mihai Danila viridium at gmail.com
Fri Mar 20 05:34:44 PDT 2009

Thanks Indhu,
In the link you sent, you troubleshoot a slightly different problem, but the
post did help.

In my scenario, the lexer chooses a rule based on a prefix and then fails to
fall back to a sequence of shorter tokens. It doesn't look as far as TOR
before deciding, simply because by the time TO has been read no other lexer
rule starts with TO (a run of single-letter tokens would fit, but only if the
lexer weren't greedy, as per my note below). Your pointer to the longest
possible token policy has cleared it up for me. The only alternative to TODAY
once TO has been read is to build an alphanumeric out of alphanumericTokens,
and that is a parser rule and is therefore outside the lexer's horizon. This
must be the problem.
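
To make the failure mode concrete, here is a toy Java model of a lexer that
commits to TODAY as soon as it sees the prefix TO. It is only a sketch and not
ANTLR's generated code: the class name CommittingLexerSketch is made up, the
two-character prediction is a simplification, and the recovery step (skip the
offending character) is an assumption chosen because it reproduces the output
reported in the original message.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a lexer that commits to a rule from a short prefix. Hypothetical, not ANTLR code. */
class CommittingLexerSketch {
    static final String KEYWORD = "TODAY";

    /** Returns token texts; characters lost during error recovery appear in no token. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            // Prediction: seeing "TO" is enough to pick the TODAY rule.
            if (input.startsWith("TO", pos)) {
                int matched = 0;
                while (matched < KEYWORD.length()
                        && pos + matched < input.length()
                        && input.charAt(pos + matched) == KEYWORD.charAt(matched)) {
                    matched++;
                }
                if (matched == KEYWORD.length()) {
                    tokens.add(KEYWORD);
                    pos += matched;
                } else {
                    // Mismatch: the matched prefix is already consumed, and the
                    // assumed recovery also skips the offending character.
                    pos = Math.min(input.length(), pos + matched + 1);
                }
            } else {
                // Everything else is a single-character token (A, D, '?', ...).
                tokens.add(String.valueOf(input.charAt(pos)));
                pos++;
            }
        }
        return tokens;
    }
}
```

With this model, tokenize("I?TORO") returns I, ?, O: the characters T, O and R
are dropped, which matches the (f=I?O) seen in the debugger.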

A question still remains. If the lexer cannot create a valid token without
dropping characters, shouldn't it fall back and try to produce smaller
tokens (which my grammar allows for, the smaller tokens being D and A) to
give the parser a chance? Apparently, the lexer is prematurely moving
into an error state without noticing that a different token arrangement
would keep it in the green.
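
For comparison, the fallback behaviour asked about here could look like the
following hand-written Java sketch. It is not what ANTLR does: the class name
FallbackLexerSketch is hypothetical, and the approach is a plain
longest-match scan that only accepts a keyword when the whole keyword is
present, and otherwise emits a single-character token. The keyword list is
taken from the lexer rules in the grammar quoted below.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written longest-match tokenizer with single-character fallback. Hypothetical sketch. */
class FallbackLexerSketch {
    // Keyword texts of the BoolNot, BoolAnd, BoolOr, Now and Today rules.
    static final String[] KEYWORDS = {"TODAY", "NOW", "NOT", "AND", "OR"};

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            // Pick the longest keyword that matches completely at this position.
            String best = null;
            for (String keyword : KEYWORDS) {
                if (input.startsWith(keyword, pos)
                        && (best == null || keyword.length() > best.length())) {
                    best = keyword;
                }
            }
            // No complete keyword here: fall back to a one-character token.
            if (best == null) {
                best = String.valueOf(input.charAt(pos));
            }
            tokens.add(best);
            pos += best.length();
        }
        return tokens;
    }
}
```

Here tokenize("I?TORO") returns I, ?, T, OR, O, so no character is lost. The
OR in the middle comes out as a keyword token, which the alphanumericToken
parser rule already accepts through BoolOr.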


On Tue, Mar 10, 2009 at 3:48 AM, Indhu Bharathi <indhu.b at s7software.com> wrote:

> Try this:
> Today: ( (Today_) => 'Today' ) ;
> fragment Today_
>     :    'Today'
>     ;
> However, I'm not sure if this is the most elegant way to fix it.
> Read the following thread to understand more on why exactly this happens:
> http://www.antlr.org/pipermail/antlr-interest/2009-February/032959.html
> - Indhu
> ----- Original Message -----
> From: Mihai Danila <viridium at gmail.com>
> To: antlr-interest at antlr.org
> Sent: Tuesday, March 10, 2009 6:30:43 AM GMT+0530 Asia/Calcutta
> Subject: [antlr-interest] DMQL Grammar - ANTLR Eats Characters
> Hi,
> I thought I had my DMQL grammar nailed after several months of no issues,
> until recently a query failed. I've already massaged the grammar in a few
> ways so I'm a bit at a loss as to what the problem is this time. Do I have
> to enumerate all the possible token prefixes (including TO, TOD, TODA, N,
> NO, A, AN, O) in the alphanumericToken rule to fix this one? Am I missing
> something?
> Here's the query:
> (f=I?TORO)
> If I debug this, here's what ANTLR parses:
> (f=I?O)
> Here's the grammar:
> grammar Dmql;
> options {
> output=AST;
> }
> tokens {
>  Or; And; Not;
>  FieldCriteria;
>  LookupAnd; LookupNot; LookupOr; LookupAny;
>  StringList; StringEquals; StringStartsWith;
>  StringContains; StringChar; EmptyString;
>  RangeList; RangeBetween; RangeGreater; RangeLower;
>  ConstantValue;
> }
> @header { package com.stratusdata.dmql.parser.antlr; }
> @lexer::header { package com.stratusdata.dmql.parser.antlr; }
> @rulecatch {
>   catch (RecognitionException re) {
>     throw re;
>   }
> }
> dmql: searchCondition;
> searchCondition: queryClause (('|' | BoolOr) queryClause)* -> ^(Or
> queryClause+);
> queryClause: booleanElement ((',' | BoolAnd) booleanElement)* -> ^(And
> booleanElement+);
> booleanElement: queryElement | ('~' | BoolNot) queryElement -> ^(Not
> queryElement);
> queryElement: '('! (fieldCriteria | searchCondition) ')'!;
> fieldCriteria: field '=' fieldValue -> ^(FieldCriteria field fieldValue);
> field: ('_' | alphanumericToken)+ -> ConstantValue[$field.text];
> fieldValue: lookupList | stringList | rangeList | nonInteger | period |
> stringLiteral | empty;
> stringLiteral: StringLiteral;
> empty: '.EMPTY.' -> EmptyString;
> lookupList: lookupOr | lookupAnd | lookupNot | lookupAny;
> lookupOr: '|' lookup (',' lookup)* -> ^(LookupOr lookup+);
> lookupAnd: '+' lookup (',' lookup)* -> ^(LookupAnd lookup+);
> lookupNot: '~' lookup (',' lookup)* -> ^(LookupNot lookup+);
> lookupAny: '.ANY.' -> LookupAny;
> lookup: alphanumeric | stringLiteral;
> stringList: string (',' string)* -> ^(StringList string+);
> string: stringEq | stringStart | stringContains | stringChar;
> stringEq: alphanumeric -> ^(StringEquals alphanumeric);
> stringStart: alphanumeric '*'  -> ^(StringStartsWith alphanumeric);
> stringContains: '*' alphanumeric '*' -> ^(StringContains alphanumeric);
> stringChar: alphanumeric? ('?' alphanumeric?)+ -> ^(StringChar
> ConstantValue[$stringChar.text]);
> rangeList: dateTimeRangeList | dateRangeList | timeRangeList |
> numericRangeList;
> dateTimeRangeList: dateTimeRange (',' dateTimeRange)* -> ^(RangeList
> dateTimeRange+);
> dateRangeList: dateRange (',' dateRange)* -> ^(RangeList dateRange+);
> timeRangeList: timeRange (',' timeRange)* -> ^(RangeList timeRange+);
> numericRangeList: numericRange (',' numericRange)* -> ^(RangeList
> numericRange+);
> dateTimeRange: x=dateTime '-' y=dateTime -> ^(RangeBetween $x $y)
>  | x=dateTime '-' -> ^(RangeLower $x)
>  | x=dateTime '+' -> ^(RangeGreater $x);
> dateRange: x=date '-' y=date -> ^(RangeBetween $x $y)
>  | x=date '-' -> ^(RangeLower $x)
>  | x=date '+' -> ^(RangeGreater $x);
> timeRange: x=time '-' y=time -> ^(RangeBetween $x $y)
>  | x=time '-' -> ^(RangeLower $x)
>  | x=time '+' -> ^(RangeGreater $x);
> numericRange: x=number '-' y=number -> ^(RangeBetween $x $y)
>  | x=number '-' -> ^(RangeLower $x)
>  | x=number '+' -> ^(RangeGreater $x);
> period: (isoDateTime | isoDate | isoTime) -> ConstantValue[$period.text];
> dateTime: (isoDateTime | Now) -> ConstantValue[$dateTime.text];
> date: (isoDate | Today) -> ConstantValue[$date.text];
> time: isoTime -> ConstantValue[$time.text];
> number: integer | nonInteger;
> integer: D+ -> ConstantValue[$integer.text];
> nonInteger: (negativeNumber | positiveDecimal) ->
> ConstantValue[$nonInteger.text];
> negativeNumber: '-' D+ ('.' D+)?;
> positiveDecimal: D+ '.' D+;
> timeZoneOffset: ('+' | '-') D D ':' D D;
> isoDate: D D D D '-' D D '-' D D;
> isoTime: D D ':' D D ':' D D ('.' D (D D?)?)?;
> isoDateTime: isoDate 'T' isoTime ('Z' | timeZoneOffset)?;
> alphanumeric: alphanumericToken+ -> ConstantValue[$alphanumeric.text];
> alphanumericToken: (D | A | BoolNot | BoolAnd | BoolOr | Now | Today | 'T'
> | 'Z');
> BoolNot: 'NOT';
> BoolAnd: 'AND';
> BoolOr: 'OR';
> Now: 'NOW';
> Today: 'TODAY';
> StringLiteral: ('"' (~('\u0000'..'\u001F' | '\u007F' | '"') | ('""'))*
> '"');
> A: (('A'..'Z') | ('a'..'z'));
> D: ('0'..'9');
> Whitespace: (' ' | '\t' | '\n') { $channel = HIDDEN; };