[antlr-interest] dfa-based lexers versus top-down antlr lexers
Greg Lindholm
glindholm at yahoo.com
Wed Apr 30 07:05:03 PDT 2003
I've done a couple projects using ANTLR and have not yet
been able to use the lexer. I've had to hand-craft my own
lexer's because of column positional token's issues.
I think it's a strength of ANTLR that you have multiple
choices on how to build a lexer.
Adding a dfa-based lexer option is a great idea.
(I wouldn't change the current lexer, I would add second
style.)
--- Terence Parr <parrt at jguru.com> wrote:
> Does anyone have an opinion concerning ANTLR's construction of
> top-down
> lexers versus more traditional dfa-based state machine solutions?
>
> I just got really irritated this week building the PostScript
> interpreter due to lexer lookahead issues.
>
> PROS
>
> Very readable lexers. ('0'..'9')+ turns into a while look you can
> debug/read.
>
> Very powerful...can literally parse (because you can call other
> rules)
> in the lexer.
>
> Consistent with all other antlr-generated recognizers. Lexers,
> parsers, tree parsers all use the same analyzer and code generator.
>
> Can execute an action anywhere during the recognition of a
> token...with
> DFA based systems you have to wait until you know the complete token
> has been match.
>
> CONS
>
> Some lexer rules are a huge pain to specify in ANTLR because of the
> limitations of lookahead.
>
> Lexer rules really confuse people since there are lots of special
> cases
> in the lookahead and stuff.
>
> I wonder if a compromise is possible. Use a DFA-based decision to
> determine which rule will match; can do arbitrary lookahead and all
> of
> that automatically to make a correct decision. The problem is speed:
>
> you'd parse twice each token. Perhaps I can simply combine all rules
>
> with common left edges automatically to avoid weirdness. For
> example:
>
> INT : ('0'..'9')+ ;
>
> FLOAT : ('0'..'9')+ ('.' ('0'..'9')*)? | '.' ('0'..'9')+ ;
>
> could be turned into a combine rule by the system:
>
> INT_FLOAT
> : ('0'..'9')+ ('.' ('0'..'9')*)?
> | '.' ('0'..'9')+
> ;
>
> or whatever.
>
> Still, my lookahead computations are really twisted for lexical rules
>
> when you can see past end of a rule--literally any character can
> follow
> a token (if you consider erroneous input).
>
> ANYWAY, who has thoughts on this? I'd like thoughts also from people
>
> with *no* experience using DFA-based tools like lex/flex. Do ANTLR
> lexers seem "natural"?
>
> Thanks!
> Ter
> --
> Co-founder, http://www.jguru.com
> Creator, ANTLR Parser Generator: http://www.antlr.org
> Co-founder, http://www.peerscope.com link sharing, pure-n-simple
> Lecturer in Comp. Sci., University of San Francisco
>
>
>
>
> Your use of Yahoo! Groups is subject to
> http://docs.yahoo.com/info/terms/
>
>
__________________________________
Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo.
http://search.yahoo.com
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list