[antlr-interest] how can a rule recognize more than one literal?

Jim Idle jimi at temporal-wave.com
Thu Oct 18 09:21:55 PDT 2007


Without knowing what else you are trying to parse/lex, I think you want to
do this either more precisely in the lexer or less precisely in the lexer
(said the walrus ;-)). I think we should have a mode where you can't put
literals in the parser and have lexer tokens auto-produced from them, as it
is too confusing for newcomers. I think it is dangerous for anyone really,
but that's just my opinion. So, first thing: until you know what you are
doing, don't use text strings for keywords in the parser. Make tokens in the
LEXER, as this will tell you about lexing clashes better than the parser
will, and you will 'see' what you are doing more clearly (this goes for
everyone ;-).

What I mean here is that the lexer always recognizes 'y' as the same token,
regardless of the context in which you (the human) see it in the parser
rules. So, if you are sure that there is no other text string that would
clash with this date format, then you can code it in the lexer as:

DATEFORM: 'yy' 'yy'? '-'? 'mm' '-'? 'dd' ;

And then the parser rules (they start with lower case letters) can just be:

datestuff: DATEFORM ;
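
To see which strings that single-token rule would accept, here is a rough
Python sketch that mirrors DATEFORM as a regular expression. It is only an
illustration, not ANTLR output; the names DATEFORM_RE and is_dateform are
made up for this example.

```python
import re

# Regex counterpart of the lexer rule
#   DATEFORM: 'yy' 'yy'? '-'? 'mm' '-'? 'dd' ;
# (DATEFORM_RE and is_dateform are hypothetical names for this sketch.)
DATEFORM_RE = re.compile(r"yy(?:yy)?-?mm-?dd")


def is_dateform(text):
    """Return True if the whole of text would match the DATEFORM rule."""
    return DATEFORM_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("yyyy-mm-dd", "yymmdd", "yyy-mm-dd", "yyyy/mm/dd"):
        print(sample, is_dateform(sample))
```

Anything outside that exact pattern, such as a mistyped 'yyy-mm-dd', simply
fails to match, which is the situation described next.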

The problem here though is that if someone mistypes a date format then it is
the lexer that will barf on it, and really you don't want the lexer to do
anything but recognize ALL strings as some kind of token and then let the
parser spot syntax errors.

So, what I think you have here is a semantic check, and your lexer just
needs to recognize, say, WORDs. Once the parser has verified that
syntactically you have something that ought to be a valid date format string
(a WORD), you implement some code to verify that the string is in fact a
correct one.
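
As a minimal sketch of that kind of semantic check, assuming the lexer hands
over the whole format string as one WORD, something like the following
Python could run in the parser action or a later pass. The accepted field
names (yy, yyyy, mm, dd), the separators, and the function name
check_date_format are assumptions for illustration, and it only checks the
field names, not their order.

```python
import re

# Deliberately loose "lexer" view: any run of letters, '-', '.' or '/'
# counts as a WORD. (Hypothetical definition for this sketch.)
WORD_RE = re.compile(r"[A-Za-z./-]+")

# Field runs the semantic check accepts; this set is an assumption.
VALID_FIELDS = {"yy", "yyyy", "mm", "dd"}
SEPARATORS = "-./"


def check_date_format(word):
    """Return a list of error messages; an empty list means the word is OK."""
    if WORD_RE.fullmatch(word) is None:
        return ["not a WORD: %r" % word]
    errors = []
    # Walk the word as runs of one repeated letter, or single separators.
    for match in re.finditer(r"([A-Za-z])\1*|[-./]", word):
        run = match.group(0)
        if run[0] in SEPARATORS:
            continue
        if run not in VALID_FIELDS:
            errors.append("unknown field %r at offset %d" % (run, match.start()))
    return errors


if __name__ == "__main__":
    for sample in ("yyyymmdd", "yyyy-mm-dd", "yyy-mm-dd"):
        print(sample, check_date_format(sample))
```

The point of doing it this way is that a mistyped format still lexes and
parses, so the error you report can say exactly which part of the format
string is wrong.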

In other words, you want to do the higher level checks higher up and, as you
get closer to the lexer, do simpler and simpler things. I hope this makes
sense, as you are obviously new to ANTLR. If you are trying to do anything
serious and can afford to, I would buy Terence's ANTLR 3 book (see elsewhere
on the web site).

Hope this helps,

Jim

PS: Another thing I am seeing a lot from people lately is the use of
backtracking. This is really a prototyping tool: while it seems to make
ambiguities go away, it does so at the expense of lots of overhead and
probably masks the fact that you could structure your grammar much more
precisely/neatly. Personally, I would recommend NOT using this option, other
than at the rule level if you can't think of a structure or suitable
predicate for an ambiguity.

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of OJAY78 at gmx.de
> Sent: Thursday, October 18, 2007 7:36 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] how can a rule recognize more than one
> literal?
> 
> Hi,
> 
> 
> I am trying to create a rule which recognizes date format strings like
> this: yyyy-mm-dd or yymmdd....
> 
> My rule for this:
> 
> dateFormatLiterals
> 	:( 'y' | 'M' | 'd' | 'h' | 'H' | 'm' | 's' | 'S' | '-' | '.' |
> '/' )*
> 	;
> 
> I am using the interpreter of ANTLRWorks for that, but it does not work
> for more than one literal. When I type y, the interpreter builds the
> tree as expected, but when I try yyyymmdd I get a
> MismatchedTokenException.
> 
> Why is that so? I do not want to make special tokens for that case,
> because I only need them in that form for this rule.
> 
> 
> Thanks for your help
