[antlr-interest] ambiguous lexer tokens
Wincent Colaiuta
win at wincent.com
Thu Jun 28 02:13:01 PDT 2007
On 27/6/2007, at 22:44, Torsten Curdt wrote:
> I would like to write a grammar for the following output:
>
> drwxr-xr-x 23 tcurdt tcurdt 782 Jun 24 22:54 ..
> -rw-r--r-- 1 tcurdt tcurdt 18545 Nov 1 2006 ASMContentHandler.Rule.html
>
> Of course that means that the tokens (TYPE/MODS/INT/NAME/HOUR/YEAR)
> for the lexer are ambiguous.
> What should such a grammar look like? Pointers?
I think you have a number of options:
1. Given that many of the tokens look the same, don't try to
differentiate between them in the lexer. Instead handle everything in
the parser.
2. Use predicates in the lexer to turn alternatives on and off
depending on which "column" you're in (i.e. make a context-sensitive
lexer).
3. Don't use ANTLR for this task. The input is so limited and regular
that it may be quicker to just write something by hand.
I personally would go with "3" in this case because I think you are
much more likely to come up with a correct parser by hand; ANTLR is a
very complex tool and it can deviate from your expectations in
incredibly subtle and hard-to-understand ways.
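To make option 3 concrete, here is a minimal hand-written sketch in Python
(my choice of language; nothing in this thread prescribes it). It assumes
every line has exactly eight whitespace-separated columns before the file
name, as in the two sample lines quoted above. The Entry record, its field
names and parse_line are hypothetical names made up for illustration. The
only real ambiguity, HOUR versus YEAR, is settled by looking for a colon in
the eighth column.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One listing line; field names are illustrative, not from any library."""
    file_type: str        # first character of the permissions column
    mods: str             # remaining nine permission characters
    links: int
    owner: str
    group: str
    size: int
    month: str
    day: int
    hour: Optional[str]   # set when the eighth column looks like HH:MM
    year: Optional[int]   # set when the eighth column is a year
    name: str


def parse_line(line: str) -> Entry:
    # Split on runs of whitespace at most 8 times, so the ninth field keeps
    # the rest of the line (a file name may itself contain spaces).
    fields = line.rstrip("\n").split(None, 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    perms, links, owner, group, size, month, day, when, name = fields
    if len(perms) != 10:
        raise ValueError(f"unexpected permissions column: {perms!r}")
    # The column position tells us what each token means; only the eighth
    # column needs a content check to tell HOUR from YEAR.
    if ":" in when:
        hour, year = when, None
    else:
        hour, year = None, int(when)
    return Entry(perms[0], perms[1:], int(links), owner, group,
                 int(size), month, int(day), hour, year, name)


if __name__ == "__main__":
    for sample in (
        "drwxr-xr-x 23 tcurdt tcurdt 782 Jun 24 22:54 ..",
        "-rw-r--r-- 1 tcurdt tcurdt 18545 Nov 1 2006 ASMContentHandler.Rule.html",
    ):
        print(parse_line(sample))
```

Because the meaning of each token comes from its column rather than from
its spelling, the TYPE/MODS/INT/NAME overlap never comes up. Lines that do
not fit the assumed layout (a "total" header, for instance) raise a
ValueError instead of being guessed at.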
Cheers,
Wincent