[antlr-interest] dfa-based lexers versus top-down antlr lexers

Fri Apr 25 15:45:59 PDT 2003

I have no experience with DFA lexer construction.

Yes, the lookahead issues can be extremely tricky - they've caused me
confusion on several occasions, and it often took me a long time to figure
out what was going on.

The lexer that I had to build is a very miserable one, because it has macro
processing mixed right into it, and the macro processing is downright
psychotic. For example, I may have to do macro expansion or include file
expansion literally inside the middle of a token. It was very difficult to
build using Antlr's lexer construction, but for all I know, it may have
been much worse or even impossible to try to build with a DFA-based tool.

This may be completely off topic, but I have one other issue with lexer
construction, and that is a performance issue. For example, one of my
parser test runs, with profiling, showed that of the entire lex+parse, 18%
of the run was spent within a lexer comment rule! It spent so much time
there because I was unable to convince Antlr to generate code to first
process the most likely possibility within comments (any character other
than '*')... Antlr insisted on checking for "*/" first, and then processing
any other character. I don't know if a DFA-based tool would have been
better for this. Perhaps there's a way in Antlr that I don't know of to
force one check to be done prior to another. I worked around the issue by
coding the comment rule by hand, and there are other rules that I have my
eye on next for rewrites by hand.

Beyond that... I like the lexer construction. It's really nice to be able
to use the same Antlr syntax for lexer construction as for parser
construction.

My $.02.
John

--On 4/25/2003 1:20 PM -0700 Terence Parr wrote:

> Does anyone have an opinion concerning ANTLR's construction of top-down 
> lexers versus more traditional dfa-based state machine solutions?
> 
> I just got really irritated this week building the PostScript 
> interpreter due to lexer lookahead issues.
> 
> PROS
> 
> Very readable lexers.  ('0'..'9')+ turns into a while look you can 
> debug/read.
> 
> Very powerful...can literally parse (because you can call other rules) 
> in the lexer.
> 
> Consistent with all other antlr-generated recognizers.  Lexers, 
> parsers, tree parsers all use the same analyzer and code generator.
> 
> Can execute an action anywhere during the recognition of a token...with 
> DFA based systems you have to wait until you know the complete token 
> has been match.
> 
> CONS
> 
> Some lexer rules are a huge pain to specify in ANTLR because of the 
> limitations of lookahead.
> 
> Lexer rules really confuse people since there are lots of special cases 
> in the lookahead and stuff.
> 
> I wonder if a compromise is possible.  Use a DFA-based decision to 
> determine which rule will match; can do arbitrary lookahead and all of 
> that automatically to make a correct decision.  The problem is speed: 
> you'd parse twice each token.  Perhaps I can simply combine all rules 
> with common left edges automatically to avoid weirdness.  For example:
> 
> INT : ('0'..'9')+ ;
> 
> FLOAT : ('0'..'9')+ ('.' ('0'..'9')*)? | '.' ('0'..'9')+ ;
> 
> could be turned into a combine rule by the system:
> 
> INT_FLOAT
> 	:	('0'..'9')+ ('.' ('0'..'9')*)?
> 	|	'.' ('0'..'9')+
> 	;
> 
> or whatever.
> 
> Still, my lookahead computations are really twisted for lexical rules 
> when you can see past end of a rule--literally any character can follow 
> a token (if you consider erroneous input).
> 
> ANYWAY, who has thoughts on this?  I'd like thoughts also from people 
> with *no* experience using DFA-based tools like lex/flex.  Do ANTLR 
> lexers seem "natural"?
> 
> Thanks!
> Ter
> --
> Co-founder, http://www.jguru.com
> Creator, ANTLR Parser Generator: http://www.antlr.org
> Co-founder, http://www.peerscope.com link sharing, pure-n-simple
> Lecturer in Comp. Sci., University of San Francisco
> 
> 
>  
> 
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 
> 
> 
>  

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/