[antlr-interest] Looking for reference to how ANTLR performs ... special example will not work???

John B. Brodie jbb at acm.org
Fri Sep 11 12:39:42 PDT 2009


Greetings!

On Fri, 2009-09-11 at 10:20 -0400, Sylvain, Gregory [USA] wrote:
> Great replies, thank you. I had assumed the longest-match-wins rule
> applied, but I wasn't sure - thanks.
>  
> Here is an example of the sort of problems I am trying to figure out.
>  
>  
> r            : 'BEGIN/' f1=(number 'T') f2=field EOT EOL ;
> number : INT | FLOAT ;
> field      : ALPHANUM_CHAR+;
>  
> ALPHANUM_CHAR : ( ALPHA_CHAR  | DIGIT | SPECIAL_CHAR | ' ')+;
> INT : DIGIT+ ;
> FLOAT : DIGIT+ '.' DIGIT+;
> fragment DIGIT : '0' .. '9' ;
> fragment ALPHA_CHAR : 'A' .. 'Z' ;
> SPECIAL_CHAR: ( ',' | '(' | ')' | '\\' );  // more special chars can be added here....
> EOT : '//' ;
> EOL : '\n';
>  
>  
> The following will work:
>  
> BEGIN/1231T/P//
>  
> However, the following will NOT :
>  
> BEGIN/1231T/T//
>  
> some lexer tests lead me to believe this should work:
>  
> 'T' is an ALPHA_CHAR
> 'T' is also an ALPHANUM_CHAR

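No! 'T' is the keyword 'T', a completely separate Token type unto
itself. Because the literal 'T' appears in a Parser rule, ANTLR creates
an implicit Token for it, so a lone T reaches the Parser as that Token
and not as an ALPHANUM_CHAR.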

>  
> HOWEVER, 
> 'T' is NOT a field???

Correct! 'T' is NOT a field, it is a Keyword (e.g. a Reserved Word).

>  
> Again, I can fix this by making f1 match a field instead of a number
> followed by a T.  But I would like to understand what is going on
> here...  
>  
> Basically, if a field is supposed to have a certain suffix, I would
> like to put that in the grammar.  Of course, I still want to be able
> to accept those suffixes as a part of a more general field as well.
>  

This is the same as the oft-discussed "Keywords as Identifiers"
question...




As Jim suggests, moving all of your literals out of the Parser and into
the Lexer, by making them explicit Lexer rules (at least for now, until
you get used to ANTLR), might make this clearer.
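
A minimal sketch of what that could look like for the grammar quoted
above, assuming ANTLR v3 syntax. The token names BEGIN and T_SUFFIX are
hypothetical (they do not appear in the original post), and only the two
literals are covered; the remaining rules would stay as posted.

```antlr
// Explicit Lexer rules replace the literals that were embedded in the Parser rule.
// BEGIN and T_SUFFIX are made-up names for illustration only.
BEGIN    : 'BEGIN/' ;
T_SUFFIX : 'T' ;

// The Parser rule now refers to the tokens by name instead of by literal.
r : BEGIN f1=number T_SUFFIX f2=field EOT EOL ;
```

Written this way, it is easier to see that T_SUFFIX and ALPHANUM_CHAR can
both match the same single character, and that the Lexer has to pick one
Token type for it before the Parser ever sees it.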

In any case, the 'T' is still gonna be a separate Token type of its own.
And if there is any parsing context in which 'T' (or any other suffix)
should be permitted then you must mention it in that parsing rule.

So instead of:

field      : ALPHANUM_CHAR+;

try

field      : ALPHANUM_CHAR+ | 'T';

There is also a way to keep the 'T' as an ALPHANUM_CHAR: use a semantic
Predicate in the Parser to test whether an ALPHANUM_CHAR has the text
'T', and thereby recognize the 'T' only in its special context.
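
A rough sketch of the predicate idea, assuming ANTLR v3 with the Java
target. The rule name tSuffix is hypothetical. The sketch also assumes
that the literal 'T' no longer appears anywhere in the grammar and that
the Lexer really does deliver a lone T as its own ALPHANUM_CHAR token,
which the Lexer rules as posted may not do without further changes.

```antlr
// No 'T' literal in the grammar, so no separate 'T' Token type is created.
r : 'BEGIN/' f1=number tSuffix f2=field EOT EOL ;

// Validating semantic predicate: accept an ALPHANUM_CHAR here only if
// the text of the next token is exactly "T".
tSuffix
    : {input.LT(1).getText().equals("T")}? ALPHANUM_CHAR
    ;

// field is unchanged, so a T elsewhere is still an ordinary ALPHANUM_CHAR.
field : ALPHANUM_CHAR+ ;
```

The trade-off is that the check moves from the Token type to the Token
text, so the suffix is only "special" in the rules that test for it.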

Search the mail archives for "Keywords as Identifiers" or similar
phrases to find the frequently recurring discussion of this topic. It
might even have a Wiki entry; I haven't looked in a while...

(I would try to post an actual example here but cannot remember the
ANTLR syntax at the moment... sorry)

Hope this helps.
   -jbb






