[antlr-interest] Looking for reference to how ANTLR performs ... special example will not work???
John B. Brodie
jbb at acm.org
Fri Sep 11 12:39:42 PDT 2009
Greetings!
On Fri, 2009-09-11 at 10:20 -0400, Sylvain, Gregory [USA] wrote:
> Great replies, thank you. I had assumed the longest-match-wins rule
> applied, but I wasn't sure - thanks.
>
> Here is an example of the sort of problems I am trying to figure out.
>
>
> r : 'BEGIN/' f1=(number 'T') f2=field EOT EOL ;
> number : INT | FLOAT ;
> field : ALPHANUM_CHAR+;
>
> ALPHANUM_CHAR : ( ALPHA_CHAR | DIGIT | SPECIAL_CHAR | ' ')+;
> INT : DIGIT+ ;
> FLOAT : DIGIT+ '.' DIGIT+;
> fragment DIGIT : '0' .. '9' ;
> fragment ALPHA_CHAR : 'A' .. 'Z' ;
> SPECIAL_CHAR: ( ',' | '(' | ')' | '\\' ); // more special chars can be added here....
> EOT : '//' ;
> EOL : '\n';
>
>
> The following will work:
>
> BEGIN/1231T/P//
>
> However, the following will NOT :
>
> BEGIN/1231T/T//
>
> some lexer tests lead me to believe this should work:
>
> 'T' is an ALPHA_CHAR
> 'T' is also an ALPHANUM_CHAR
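No! 'T' is the Keyword 'T',
a completely separate Token unto itself. Because 'T' appears as a literal
in a Parser rule, it gets its own Token type, and the Lexer hands back that
type for a lone 'T' rather than ALPHANUM_CHAR.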
>
> HOWEVER,
> 'T' is NOT a field???
Correct! 'T' is NOT a field, it is a Keyword (e.g. a Reserved Word).
>
> Again, I can fix this by making f1 match a field instead of a number
> followed by a T. But I would like to understand what is going on
> here...
>
> Basically, if a field is supposed to have a certain suffix, I would
> like to put that in the grammar. Of course, I still want to be able
> to accept those suffixes as a part of a more general field as well.
>
This is the same as the oft discussed "Keywords as Identifiers"
question...
As Jim suggests, moving all of your literals out of the Parser and into
the Lexer by making them explicit Lexer rules (at least for now, until
you get used to ANTLR) might make this clearer.
In any case, the 'T' is still gonna be a separate Token type of its own.
And if there is any parsing context in which 'T' (or any other suffix)
should be permitted then you must mention it in that parsing rule.
So instead of:
field : ALPHANUM_CHAR+;
try
field : ALPHANUM_CHAR+ | 'T';
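Spelled out with the literal moved into the Lexer as a named rule, that
could look roughly like the sketch below. The rule name T_SUFFIX is made
up for illustration, and the comment about rule order reflects my
understanding of ANTLR 3 (when two Lexer rules match the same text, the
one listed first wins), so treat it as a starting point rather than a
tested grammar.

```antlr
// Parser: the suffix Token is allowed wherever a plain field is allowed.
field : ALPHANUM_CHAR+ | T_SUFFIX ;

// Lexer: the suffix as a named Token instead of the inline literal 'T'.
// T_SUFFIX is a hypothetical name. It is listed ahead of ALPHANUM_CHAR on
// the assumption that a lone 'T' then comes out as T_SUFFIX.
T_SUFFIX      : 'T' ;
ALPHANUM_CHAR : ( ALPHA_CHAR | DIGIT | SPECIAL_CHAR | ' ' )+ ;
```

Rule r would then reference T_SUFFIX where it currently has the literal
'T'.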
There is also a way to keep the 'T' as an ALPHANUM_CHAR but use a
Predicate in the Parser to test an ALPHANUM_CHAR for the value 'T' and
thereby recognize the 'T' in its special context.
Search the mail archives for "Keywords as Identifiers" or similar
phrases to find the frequently recurring discussion of this topic. It
might even have a Wiki entry; I haven't looked in a while...
(I would try to post an actual example here but cannot remember the
ANTLR syntax at the moment... sorry)
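For what it's worth, the shape I have in mind for the Predicate version
is roughly the following. It is from memory and untested, it assumes the
Java target, and suffixT is a made-up rule name. It also assumes the
literal 'T' no longer appears anywhere in the Parser, so that the Lexer
hands back a lone T as an ALPHANUM_CHAR.

```antlr
// Semantic predicate: succeed only when the next Token's text is exactly "T".
// suffixT is a hypothetical rule name; the action code assumes the Java target.
suffixT : {input.LT(1).getText().equals("T")}? ALPHANUM_CHAR ;

// A plain field accepts any ALPHANUM_CHAR, including one whose text is "T".
field : ALPHANUM_CHAR+ ;
```

Rule r would then use suffixT where it currently has the literal 'T'.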
Hope this helps.
-jbb