[antlr-interest] dfa-based lexers versus top-down antlr lexers
Terence Parr
parrt at jguru.com
Fri Apr 25 13:20:47 PDT 2003
Does anyone have an opinion concerning ANTLR's construction of top-down
lexers versus more traditional dfa-based state machine solutions?
I just got really irritated this week building the PostScript
interpreter due to lexer lookahead issues.
PROS
Very readable lexers. ('0'..'9')+ turns into a while look you can
debug/read.
Very powerful...can literally parse (because you can call other rules)
in the lexer.
Consistent with all other antlr-generated recognizers. Lexers,
parsers, tree parsers all use the same analyzer and code generator.
Can execute an action anywhere during the recognition of a token...with
DFA based systems you have to wait until you know the complete token
has been match.
CONS
Some lexer rules are a huge pain to specify in ANTLR because of the
limitations of lookahead.
Lexer rules really confuse people since there are lots of special cases
in the lookahead and stuff.
I wonder if a compromise is possible. Use a DFA-based decision to
determine which rule will match; can do arbitrary lookahead and all of
that automatically to make a correct decision. The problem is speed:
you'd parse twice each token. Perhaps I can simply combine all rules
with common left edges automatically to avoid weirdness. For example:
INT : ('0'..'9')+ ;
FLOAT : ('0'..'9')+ ('.' ('0'..'9')*)? | '.' ('0'..'9')+ ;
could be turned into a combine rule by the system:
INT_FLOAT
: ('0'..'9')+ ('.' ('0'..'9')*)?
| '.' ('0'..'9')+
;
or whatever.
Still, my lookahead computations are really twisted for lexical rules
when you can see past end of a rule--literally any character can follow
a token (if you consider erroneous input).
ANYWAY, who has thoughts on this? I'd like thoughts also from people
with *no* experience using DFA-based tools like lex/flex. Do ANTLR
lexers seem "natural"?
Thanks!
Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
Co-founder, http://www.peerscope.com link sharing, pure-n-simple
Lecturer in Comp. Sci., University of San Francisco
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list