[antlr-interest] Parsing question

Vinay Pandit vpandit at quantivo.com
Thu Aug 2 09:48:57 PDT 2012


The date parsing made sense to me. I was just wondering about the signed and unsigned integer comment. I thought that deciding the sign in the parser would clutter the grammar, which is why I moved it into the lexer.

Regards
Vinay



-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
Sent: Thursday, August 02, 2012 9:41 AM
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Parsing question

That's what I thought. You are applying way too much context into the mix.

Take out all the special attempts to handle the date in either the lexer or the parser and just accept SQUOTE (as in the simple string). Also stop trying to have both signed and unsigned integers in the lexer; the parser will have to handle the sign.
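
As a rough illustration of leaving the sign to the parser, here is a minimal hand-written sketch in plain Java rather than ANTLR. The class and method names (UnaryMinusSketch, tokenize, evaluate) are made up for this example and are not part of the grammar discussed in this thread. The lexer only ever produces unsigned integers and a separate minus token, and a single "unary" rule in the parser deals with negative numbers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the lexer knows nothing about signs, the parser does.
class UnaryMinusSketch {
    // Lexer: emits unsigned integers and "-" as separate tokens.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '-') {
                tokens.add("-");
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    private final List<String> tokens;
    private int pos = 0;

    UnaryMinusSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // expr : unary (MINUS unary)* ;
    long expr() {
        long value = unary();
        while (pos < tokens.size() && tokens.get(pos).equals("-")) {
            pos++;
            value -= unary();
        }
        return value;
    }

    // unary : MINUS unary | UNSIGNED_INTEGER ;
    long unary() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String t = tokens.get(pos++);
        if (t.equals("-")) {
            return -unary();
        }
        return Long.parseLong(t);
    }

    // Convenience entry point: tokenize, parse, and reject trailing tokens.
    static long evaluate(String input) {
        UnaryMinusSketch parser = new UnaryMinusSketch(tokenize(input));
        long value = parser.expr();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.pos));
        }
        return value;
    }
}
```

With this split, the input 2012-01-01 lexes as five tokens (2012, -, 01, -, 01), and evaluate("2012-01-01") simply treats it as a subtraction. That is the same conflict an unquoted date literal would run into.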

Then when you verify your AST (or as you parse if no AST), call a function that validates the date (you can just use standard Java Date stuff). Then you issue a semantic error if it is invalid.
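
A validation function of that kind might look like the sketch below. It is only an assumption about how one could do it: it uses the java.time API (which is newer than this thread), and the class name DateLiteralCheck and method parseDateLiteral are hypothetical. It takes the raw text of the quoted string token, strips the quotes, and checks that the body is a real calendar date in yyyy-MM-dd form.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Optional;

// Hypothetical helper, called after parsing (for example while walking the AST).
class DateLiteralCheck {
    // STRICT rejects impossible dates such as 2001-02-30.
    // "uuuu" is used instead of "yyyy" because strict resolution of "yyyy" also needs an era.
    private static final DateTimeFormatter ISO_STRICT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    // Returns the parsed date, or empty if the token text is not a valid quoted date.
    static Optional<LocalDate> parseDateLiteral(String tokenText) {
        int len = tokenText.length();
        if (len < 2 || tokenText.charAt(0) != '\'' || tokenText.charAt(len - 1) != '\'') {
            return Optional.empty();
        }
        String body = tokenText.substring(1, len - 1);
        try {
            return Optional.of(LocalDate.parse(body, ISO_STRICT));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

The caller would report a semantic error whenever the result is empty, so the grammar itself never needs to know what a date looks like.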

In SQL you may not be able to tell this until execution time unless you have access to the table metadata so that you can see that a field is a date type:

... WHERE T.myDate < '1964-07-14'


Jim


> -----Original Message-----
> From: Vinay Pandit [mailto:vpandit at quantivo.com]
> Sent: Thursday, August 02, 2012 9:35 AM
> To: Jim Idle; antlr-interest at antlr.org
> Subject: RE: [antlr-interest] Parsing question
>
> Yes, I think I was not clear enough. Here is what I wanted to do. In 
> SQL we have a date string of the form date '2001-01-01'. I wanted to 
> try and parse this date literal. I was just trying to figure out the 
> dateValue subrule in my earlier mail.
>
> Here is the grammar I came up with (which does not seem to work). I am 
> excluding timeLiteral and timestampLiteral for brevity. I was just 
> not sure that I could get rid of the ambiguity by moving things into 
> the lexer. For example, the '2001-01-01' fragment of the input would 
> ultimately match a STRING token, but because I have the 'date' in 
> front of it the parser should use that rule. I am used to JavaCC, 
> where you can provide lookaheads in order to tackle ambiguities.
>
> Hope this email clarifies my problem. Please let me know if you need 
> any more input
>
> Thanks for your help
> Vinay
>
> -------------------------------------------
> datetimeLiteral
>     	: dateLiteral | timeLiteral | timestampLiteral;
>
> dateLiteral : DATE dateString;
>
> dateString : QUOTE dateValue QUOTE;
>
> dateValue : UNSIGNED_INTEGER MINUS UNSIGNED_INTEGER MINUS UNSIGNED_INTEGER;
>
> The Lexer rules are
>
> fragment
> DIGIT : ('0'..'9');
> DATE          : ('D'|'d')('A'|'a')('T'|'t')('E'|'e');
> UNSIGNED_INTEGER : (DIGIT) +;
> MINUS         : '-' ;
> QUOTE         : '\'';
>
>
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
> Sent: Thursday, August 02, 2012 9:22 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Parsing question
>
> OK - your example was not clear enough. You do need a fragment there.
>
> However it sounds like you are trying to get the lexer to handle 
> negative numbers and that is usually the wrong way - you want to 
> handle that in the parser's expression tree. However, I might be 
> tempted to handle the date literal in the lexer rather than the parser 
> as you will otherwise create a lot of conflicts.
>
>
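> MINUS : '-';
> // Empty fragment rule: only declares the DATE token type.
> fragment DATE :;
> // An integer followed by -digits-digits is retyped as DATE.
> INTEGER : '0'..'9'+
>           (('-' '0'..'9'+ '-' '0'..'9')=>('-' '0'..'9'+ '-' '0'..'9'+)
>             { $type = DATE; })?
> ;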
>
> Are you sure that your language allows date strings that are not quote 
> delimited? There is an obvious conflict with the subtract operator 
> unless there are separate expression trees based on context.
>
> Jim
>
> > -----Original Message-----
> > From: Vinay Pandit [mailto:vpandit at quantivo.com]
> > Sent: Wednesday, August 01, 2012 11:14 PM
> > To: Jim Idle; antlr-interest at antlr.org
> > Subject: RE: [antlr-interest] Parsing question
> >
> > Thanks for the reply. That did not work either.
> >
> > Regards
> > Vinay
> >
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
> > Sent: Wednesday, August 01, 2012 10:48 PM
> > To: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Parsing question
> >
> > That should be:
> >
> > fragment
> > DIGIT ....
> >
> > And you don't need separate parser rules for yearValue and the other 
> > two - they are the same thing, just use UNSIGNED_INTEGER directly.
> >
> > Jim
> >
> > > -----Original Message-----
> > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Vinay Pandit
> > > Sent: Wednesday, August 01, 2012 9:44 PM
> > > To: antlr-interest at antlr.org
> > > Subject: [antlr-interest] Parsing question
> > >
> > > I am trying to parse a date time literal in ANTLR and I am having 
> > > issues with the grammar.
> > >
> > > Here are the rules defined in the parser
> > >
> > > dateValue : ( yearValue MINUS monthValue MINUS dayValue);
> > >
> > > yearValue : datetimeValue ;
> > >
> > > monthValue : datetimeValue;
> > >
> > > dayValue : datetimeValue;
> > >
> > > datetimeValue : UNSIGNED_INTEGER;
> > >
> > > The Lexer has
> > >
> > > MINUS         : '-' ;
> > > DIGIT : ('0'..'9');
> > > UNSIGNED_INTEGER : (DIGIT) +;
> > >
> > >
> > > > When I parse a date like 2012-01-01 for the dateValue rule, the
> > > > parser throws an exception.
> > >
> > > > com. qexpr.ParseException: line 1:4 - mismatched input '-01' expecting MINUS
> > > >                at com.quantivo.qexpr.AbstractQParser.reportError(AbstractQParser.java:77)
> > > >                at com.quantivo.qexpr.SQLGrammar.dateValue(SQLGrammar.java:4730)
> > > >                at com.quantivo.qexpr.model.SQLGrammarTest.testDateValue(SQLGrammarTest.java:25)
> > > > ...
> > >
> > > > Looking at the error message, it is obvious that I am not getting
> > > > the MINUS token. Instead, the internal token that I get is an
> > > > INTEGER (signed). I tried the greedy=false option, but that did
> > > > not seem to help either. I am running out of ideas as to why the
> > > > input does not match. Obviously I am doing something wrong, but I
> > > > am not sure what!
> > >
> > > Regards
> > > Vinay
> > >
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

