[antlr-interest] Can lexer take hints
Artem Dmytrenko
admytren at engin.umich.edu
Wed Jan 18 14:34:19 PST 2006
Hello Antlr experts.
I'm an ANTLR newbie struggling with all these pesky nondeterminism
warnings. I'm trying to implement a parser for an ABNF grammar that has
overlapping tokens and matching rules. For example, it may have a token
"media" as well as matching rules a="a..z" and b="a..z0..9". The token
"media" matches both rule a and rule b, and so does an arbitrary string
like "blah". To make it even worse, tokens have a long and a short
notation (e.g. "media" and "m" mean the same thing).
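To illustrate the overlap, here is a small plain-Java sketch (java.util.regex only, nothing ANTLR-specific). It assumes that rules a and b mean "one or more characters from the range", which is my loose reading of the notation above; the class and method names are made up for this example.

```java
import java.util.regex.Pattern;

// Illustration only: lists every definition that could claim a given string.
class OverlapDemo {
    // Assumption: rule a = one or more of a..z, rule b = one or more of a..z or 0..9.
    static final Pattern RULE_A = Pattern.compile("[a-z]+");
    static final Pattern RULE_B = Pattern.compile("[a-z0-9]+");

    // Returns a space-separated list of the definitions that match the whole text.
    static String candidates(String text) {
        StringBuilder sb = new StringBuilder();
        // "media" and its short form "m" are the same keyword.
        if (text.equals("media") || text.equals("m")) sb.append("keyword ");
        if (RULE_A.matcher(text).matches()) sb.append("a ");
        if (RULE_B.matcher(text).matches()) sb.append("b ");
        return sb.toString().trim();
    }
}
```

Calling OverlapDemo.candidates("media") reports the keyword plus both rules, while OverlapDemo.candidates("blah") reports rules a and b. Nothing in the text alone says which one is meant, which is why the lexer cannot decide by itself.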
My question is whether the parser can instruct the lexer to use only a
subset of tokens. For example, let's say I have the following tokens
defined in the lexer:
ID1: (ALPHA)+;
ID2: (DIGIT)+;
ID3: (ALPHA | DIGIT)+;
TOKEN: "MY_TOKEN";
Now, in the parser I know that at a particular point I expect only ID2
or TOKEN, so I would like to ask the lexer not to match ID1 and ID3. For
example:
messageStart:
(ID2 | TOKEN)
{ System.out.println("Detected message start"); }
;
When I compile code similar to the above, the lexer still matches all 4
(ID1, ID2, ID3, TOKEN), giving me unexpected results. So I don't think it
works.
Essentially, what I'm trying to do is create a list of all possible lexer
tokens and then specify in the parser which ones to expect at any
particular time. Is it possible to do this with some sort of custom
lexer/parser? If not, what would be the best approach to implementing it?
I suspect that lexer states are the only way, but they look very messy
and I'm afraid they will cause the grammar to depart even further from
the original ABNF syntax and make it difficult to read.
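To make the idea concrete, here is a rough hand-written sketch in plain Java of the behaviour I have in mind. It is not ANTLR-generated code and uses no ANTLR API; HintedLexer, Type, Token and next(...) are all hypothetical names. I also assume that ALPHA is A..Z / a..z and DIGIT is 0..9.

```java
import java.util.Set;

// Hypothetical sketch: a lexer that only tries the token types the caller expects.
class HintedLexer {
    enum Type { ID1, ID2, ID3, TOKEN }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    private final String input;
    private int pos = 0;

    HintedLexer(String input) { this.input = input; }

    private static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Number of characters the given type would consume at offset 'from' (0 if none).
    private int matchLength(Type type, int from) {
        int i = from;
        switch (type) {
            case ID1:
                while (i < input.length() && isAlpha(input.charAt(i))) i++;
                break;
            case ID2:
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                break;
            case ID3:
                while (i < input.length()
                        && (isAlpha(input.charAt(i)) || isDigit(input.charAt(i)))) i++;
                break;
            case TOKEN:
                if (input.startsWith("MY_TOKEN", from)) i = from + "MY_TOKEN".length();
                break;
        }
        return i - from;
    }

    // The "hint": only types in 'expected' are tried. The longest match wins;
    // on a tie, the type declared first in the enum wins.
    Token next(Set<Type> expected) {
        Type best = null;
        int bestLen = 0;
        for (Type t : Type.values()) {
            if (!expected.contains(t)) continue;
            int len = matchLength(t, pos);
            if (len > bestLen) { best = t; bestLen = len; }
        }
        if (best == null) {
            throw new IllegalStateException("No expected token at offset " + pos);
        }
        Token token = new Token(best, input.substring(pos, pos + bestLen));
        pos += bestLen;
        return token;
    }
}
```

For the input "123abc", calling next(EnumSet.of(Type.ID2, Type.TOKEN)) would return ID2 with the text "123", whereas calling it with all four types would return ID3 for the whole string. That difference is exactly the kind of hint I would like the parser to be able to give.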
Thank you in advance for any help/pointers/examples on this topic.
Similar questions must have been posted a million times on this forum, I
apologize if mine is not much different (although it appears so to me!).
Art.