[antlr-interest] DMQL Grammar - ANTLR Eats Characters

Mihai Danila viridium at gmail.com
Mon Mar 9 18:00:43 PDT 2009


Hi,

I thought I had my DMQL grammar nailed after several months without issues,
but a query recently failed. I've already massaged the grammar in a few
ways, so I'm at a loss as to what the problem is this time. Do I have
to enumerate all the possible keyword prefixes (including TO, TOD, TODA, N,
NO, A, AN, O) in the alphanumericToken rule to fix this one? Or am I missing
something?
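
To make the question concrete, the prefix set I have in mind can be
enumerated mechanically from the keyword literals in the lexer rules
(NOT, AND, OR, NOW, TODAY). The Java sketch below is only an illustration:
the class and method names are made up for this post, and it does not
show that adding these prefixes to alphanumericToken would actually fix
the problem.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the grammar or of ANTLR.
public class KeywordPrefixes {
    // Keyword literals copied from the lexer rules
    // BoolNot, BoolAnd, BoolOr, Now and Today.
    static final List<String> KEYWORDS =
            List.of("NOT", "AND", "OR", "NOW", "TODAY");

    // Returns every proper, non-empty prefix of every keyword,
    // without duplicates, in first-seen order.
    static List<String> properPrefixes(List<String> keywords) {
        Set<String> prefixes = new LinkedHashSet<>();
        for (String keyword : keywords) {
            for (int len = 1; len < keyword.length(); len++) {
                prefixes.add(keyword.substring(0, len));
            }
        }
        return new ArrayList<>(prefixes);
    }

    public static void main(String[] args) {
        // prints [N, NO, A, AN, O, T, TO, TOD, TODA]
        System.out.println(properPrefixes(KEYWORDS));
    }
}
```

The single-letter entries are already matched by the A rule or the 'T'
literal, so only the multi-letter prefixes (NO, AN, TO, TOD, TODA) would be
new alternatives.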

Here's the query:
(f=I?TORO)

If I debug this, here's what ANTLR parses:
(f=I?O)

Here's the grammar:
grammar Dmql;

options {
output=AST;
}

tokens {
Or; And; Not;
FieldCriteria;
LookupAnd; LookupNot; LookupOr; LookupAny;
StringList; StringEquals; StringStartsWith;
StringContains; StringChar; EmptyString;
RangeList; RangeBetween; RangeGreater; RangeLower;
ConstantValue;
}

@header { package com.stratusdata.dmql.parser.antlr; }
@lexer::header { package com.stratusdata.dmql.parser.antlr; }

@rulecatch {
  catch (RecognitionException re) {
    throw re;
  }
}

dmql: searchCondition;
searchCondition: queryClause (('|' | BoolOr) queryClause)* -> ^(Or
queryClause+);
queryClause: booleanElement ((',' | BoolAnd) booleanElement)* -> ^(And
booleanElement+);
booleanElement: queryElement | ('~' | BoolNot) queryElement -> ^(Not
queryElement);
queryElement: '('! (fieldCriteria | searchCondition) ')'!;

fieldCriteria: field '=' fieldValue -> ^(FieldCriteria field fieldValue);
field: ('_' | alphanumericToken)+ -> ConstantValue[$field.text];
fieldValue: lookupList | stringList | rangeList | nonInteger | period |
stringLiteral | empty;
stringLiteral: StringLiteral;
empty: '.EMPTY.' -> EmptyString;

lookupList: lookupOr | lookupAnd | lookupNot | lookupAny;
lookupOr: '|' lookup (',' lookup)* -> ^(LookupOr lookup+);
lookupAnd: '+' lookup (',' lookup)* -> ^(LookupAnd lookup+);
lookupNot: '~' lookup (',' lookup)* -> ^(LookupNot lookup+);
lookupAny: '.ANY.' -> LookupAny;
lookup: alphanumeric | stringLiteral;

stringList: string (',' string)* -> ^(StringList string+);
string: stringEq | stringStart | stringContains | stringChar;
stringEq: alphanumeric -> ^(StringEquals alphanumeric);
stringStart: alphanumeric '*'  -> ^(StringStartsWith alphanumeric);
stringContains: '*' alphanumeric '*' -> ^(StringContains alphanumeric);
stringChar: alphanumeric? ('?' alphanumeric?)+ -> ^(StringChar
ConstantValue[$stringChar.text]);

rangeList: dateTimeRangeList | dateRangeList | timeRangeList |
numericRangeList;
dateTimeRangeList: dateTimeRange (',' dateTimeRange)* -> ^(RangeList
dateTimeRange+);
dateRangeList: dateRange (',' dateRange)* -> ^(RangeList dateRange+);
timeRangeList: timeRange (',' timeRange)* -> ^(RangeList timeRange+);
numericRangeList: numericRange (',' numericRange)* -> ^(RangeList
numericRange+);
dateTimeRange: x=dateTime '-' y=dateTime -> ^(RangeBetween $x $y)
| x=dateTime '-' -> ^(RangeLower $x)
| x=dateTime '+' -> ^(RangeGreater $x);
dateRange: x=date '-' y=date -> ^(RangeBetween $x $y)
| x=date '-' -> ^(RangeLower $x)
| x=date '+' -> ^(RangeGreater $x);
timeRange: x=time '-' y=time -> ^(RangeBetween $x $y)
| x=time '-' -> ^(RangeLower $x)
| x=time '+' -> ^(RangeGreater $x);
numericRange: x=number '-' y=number -> ^(RangeBetween $x $y)
| x=number '-' -> ^(RangeLower $x)
| x=number '+' -> ^(RangeGreater $x);
period: (isoDateTime | isoDate | isoTime) -> ConstantValue[$period.text];
dateTime: (isoDateTime | Now) -> ConstantValue[$dateTime.text];
date: (isoDate | Today) -> ConstantValue[$date.text];
time: isoTime -> ConstantValue[$time.text];
number: integer | nonInteger;
integer: D+ -> ConstantValue[$integer.text];
nonInteger: (negativeNumber | positiveDecimal) ->
ConstantValue[$nonInteger.text];
negativeNumber: '-' D+ ('.' D+)?;
positiveDecimal: D+ '.' D+;

timeZoneOffset: ('+' | '-') D D ':' D D;
isoDate: D D D D '-' D D '-' D D;
isoTime: D D ':' D D ':' D D ('.' D (D D?)?)?;
isoDateTime: isoDate 'T' isoTime ('Z' | timeZoneOffset)?;

alphanumeric: alphanumericToken+ -> ConstantValue[$alphanumeric.text];
alphanumericToken: (D | A | BoolNot | BoolAnd | BoolOr | Now | Today | 'T' |
'Z');

BoolNot: 'NOT';
BoolAnd: 'AND';
BoolOr: 'OR';
Now: 'NOW';
Today: 'TODAY';
StringLiteral: ('"' (~('\u0000'..'\u001F' | '\u007F' | '"') | ('""'))* '"');
A: (('A'..'Z') | ('a'..'z'));
D: ('0'..'9');
Whitespace: (' ' | '\t' | '\n') { $channel = HIDDEN; };