[antlr-interest] tokens - when to use?

Fri Oct 10 01:58:22 PDT 2008

At 19:17 10/10/2008, Juergen Weber wrote:
 >Yes, I saw in the debugger that the token IDs did not match.
 >But why would 'NONE' get a token ID and why would ANTLR try
 >to match the ID?

This:

r  : r1 | r2;
r1 : 'DELUSER' ('ALL' | 'NONE');
r2 : 'REMOVE' id;
id : ID | QUOTEDSTRING;

ID : ... ;
QUOTEDSTRING : ...;

is essentially identical to this:

r  : r1 | r2;
r1 : T16 (T17 | T18);
r2 : T19 id;
id : ID | QUOTEDSTRING;

T16 : 'DELUSER';
T17 : 'ALL';
T18 : 'NONE';
T19 : 'REMOVE';
ID : ... ;
QUOTEDSTRING : ...;
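
The rewrite from the first form to the second is mechanical. A rough
Python sketch of it follows, not ANTLR's actual code: the function name
hoist_literals is made up for illustration, and starting the numbering
at 16 only mirrors the example above (the real numbers depend on the
grammar and the ANTLR version).

```python
import re

def hoist_literals(parser_rules, first_id=16):
    """Replace each quoted literal in the parser rules with a generated token name.

    first_id=16 is an assumption that mirrors the numbering in the example;
    it is not a value ANTLR guarantees.
    """
    names = {}  # literal text (with quotes) -> generated name, in first-seen order

    def name_for(match):
        literal = match.group(0)
        if literal not in names:
            names[literal] = "T%d" % (first_id + len(names))
        return names[literal]

    # Rewrite every parser rule, collecting literals as they are met.
    rewritten = [re.sub(r"'[^']*'", name_for, rule) for rule in parser_rules]
    # One generated lexer rule per distinct literal.
    lexer_rules = ["%s : %s;" % (name, lit) for lit, name in names.items()]
    return rewritten, lexer_rules

parser_rules = [
    "r  : r1 | r2;",
    "r1 : 'DELUSER' ('ALL' | 'NONE');",
    "r2 : 'REMOVE' id;",
    "id : ID | QUOTEDSTRING;",
]
rewritten, lexer_rules = hoist_literals(parser_rules)
print("\n".join(rewritten + [""] + lexer_rules))
```

The point to notice is that every literal ends up as an ordinary lexer
rule sitting alongside ID and QUOTEDSTRING.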

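Once you realise that, the next step to understanding how ANTLR 
works is to look at the lexer rules alone (forget that the 
parser rules are even there!) to see how ANTLR tokenises the 
input.  The lexer knows nothing about which parser rule is 
active, so the word NONE comes out as T18 rather than ID wherever 
it appears, and the parser then compares token types only.  Only 
*after* the lexer has completely finished and tokenised 
everything does the parser start processing its first rule.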
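
To make the "lexer first, parser second" order concrete, here is a toy
model in Python. It is a simplification, not ANTLR's real prediction
algorithm: it takes the longest match and prefers the earlier rule on a
tie. The patterns for ID and QUOTEDSTRING are stand-ins I made up, since
those rules are elided above, and the names tokenise and matches_r2 are
hypothetical.

```python
import re

# Lexer rules in grammar order. The ID and QUOTEDSTRING patterns are
# assumptions for illustration; the real definitions are not shown above.
LEXER_RULES = [
    ("T16", r"DELUSER"),
    ("T17", r"ALL"),
    ("T18", r"NONE"),
    ("T19", r"REMOVE"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("QUOTEDSTRING", r'"[^"]*"'),
]

def tokenise(text):
    """Turn the whole input into (type, text) pairs before any parsing happens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, pattern in LEXER_RULES:
            m = re.compile(pattern).match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError("no lexer rule matches at offset %d" % pos)
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens

def matches_r2(tokens):
    """r2 : T19 id;  with  id : ID | QUOTEDSTRING;  checked by token type only."""
    types = [t for t, _ in tokens]
    return len(types) == 2 and types[0] == "T19" and types[1] in ("ID", "QUOTEDSTRING")

print(tokenise("REMOVE fred"))               # second token has type ID
print(tokenise("REMOVE NONE"))               # second token has type T18
print(matches_r2(tokenise("REMOVE fred")))   # True
print(matches_r2(tokenise("REMOVE NONE")))   # False
```

Note that tokenise never looks at the parser rules, which is why NONE
is typed as T18 even when the parser would have liked an ID there.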

In general I think people (especially newcomers to ANTLR) should 
avoid using quoted literals in parser rules and write explicit 
named lexer rules instead.  Quoted literals get you into bad 
habits and make it too easy to forget (or not realise) how things 
work under the covers :)