[antlr-interest] Can lexer take hints

Wed Jan 18 14:34:19 PST 2006

Hello Antlr experts.

I'm an antlr newbie struggling with all these pesky nondeterminism 
warnings. I'm trying to implement a parser for an ABNF grammar that has 
overlapping tokens and matching rules. For example, it may have a token 
"media" as well as matching rules a="a..z" and b="a..z0..9". Essentially 
the token "media" will match both rule a and rule b, just as an ordinary 
string like "blah" will. To make it even worse, tokens have a long and a 
short notation (e.g. "media" and "m" mean the same thing).

My question is whether it's possible for the parser to instruct the lexer 
to use only a subset of tokens. For example, let's say I have the 
following tokens defined in the lexer:

ID1: (ALPHA)+;
ID2: (DIGIT)+;
ID3: (ALPHA | DIGIT)+;
TOKEN: "MY_TOKEN";

Now I know in the parser that at a particular point I only expect ID2 
or TOKEN, and I want to ask the lexer not to match ID1 and ID3. For example:

messageStart:
   (ID2 | TOKEN)
   { System.out.println("Detected message start"); }
   ;

When I compile code similar to the above, the lexer still matches all 4 
(ID1, ID2, ID3, TOKEN), giving me unexpected results. So I don't think it 
works.

Essentially what I'm trying to do is create a list of all possible lexer 
tokens and then specify in parser which ones to expect at any particular 
time. Is it possible to do this with some sort of custom lexer/parser? If 
not, what would be the best approach to implementing this? I suspect that 
states are the only way - but they look very messy and I'm afraid they will 
cause the grammar to depart even further from original ABNF syntax and 
make it difficult to read.
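
To make the idea concrete, here is a rough sketch in plain Java of the 
behaviour I have in mind. It is hand-written, not ANTLR-generated code, and 
every name in it (HintedLexer, next, the extra EOF type) is made up for 
illustration. The rule that the longest match wins, with ties going to the 
type declared first, is also just an assumption for the sketch.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical hand-written lexer that accepts a hint from its caller.
class HintedLexer {
    // Token types mirroring the lexer rules above; EOF is added for the sketch.
    enum Type { ID1, ID2, ID3, TOKEN, EOF }

    private final String input;
    private int pos = 0;

    HintedLexer(String input) { this.input = input; }

    // Character classes for ID1 (ALPHA), ID2 (DIGIT) and ID3 (ALPHA | DIGIT).
    private static boolean accepts(Type type, char c) {
        boolean alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        boolean digit = c >= '0' && c <= '9';
        switch (type) {
            case ID1: return alpha;
            case ID2: return digit;
            case ID3: return alpha || digit;
            default:  return false;
        }
    }

    // Length of the match for one token type at the current position, 0 if none.
    private int matchLength(Type type) {
        if (type == Type.TOKEN) {
            return input.startsWith("MY_TOKEN", pos) ? "MY_TOKEN".length() : 0;
        }
        int end = pos;
        while (end < input.length() && accepts(type, input.charAt(end))) end++;
        return end - pos;
    }

    // The caller (the parser) passes the token types it can accept right now.
    // Only those are tried; the longest match wins, and ties go to the type
    // declared first in the enum.
    Type next(Set<Type> expected) {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++; // skip blanks
        if (pos >= input.length()) return Type.EOF;
        Type best = null;
        int bestLen = 0;
        for (Type t : Type.values()) {
            if (!expected.contains(t)) continue; // not allowed in this context
            int len = matchLength(t);
            if (len > bestLen) { best = t; bestLen = len; }
        }
        if (best == null) {
            throw new IllegalStateException(
                "none of " + expected + " matches at offset " + pos);
        }
        pos += bestLen;
        return best;
    }
}
```

With something like this, a messageStart rule would call 
next(EnumSet.of(Type.ID2, Type.TOKEN)), so the input "123" could only come 
back as ID2 there, while another rule calling next(EnumSet.of(Type.ID3)) 
would get ID3 for the very same text.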

Thank you in advance for any help/pointers/examples on this topic.

Similar questions must have been posted a million times on this forum, I 
apologize if mine is not much different (although it appears so to me!).

Art.