[antlr-interest] Does lexer EVER use longest match?

J. Stephen Riley Silber jsrs701 at yahoo.com
Tue Oct 13 09:00:58 PDT 2009


Ter,

This is an issue that enough of us run into and eventually figure out how to hack around.  I honestly can't remember whether it's covered in the First Book, but might it be a candidate for inclusion in a later edition, as a kind of anti-pattern to watch out for?




________________________________
From: Graham Wideman <gwlist at grahamwideman.com>
To: antlr-interest at antlr.org
Sent: Tue, October 13, 2009 2:19:08 AM
Subject: Re: [antlr-interest] Does lexer EVER use longest match?

(Kirby sent me a lengthy email on observations about ANTLR's lexer behavior, prompting me to nose around some more...)

Thanks, Kirby, for the detailed comments.

Actually, I guess I was aware that mToken would indeed choose the longest *fixed-length* alternative -- what I was really puzzling over was patterns where the length of the match isn't fixed -- so some pattern with repeats or alternate combinations in the middle.

I now see that using such variable-length patterns can prompt ANTLR to generate a DFA for mToken to use (instead of ifs and switches), and this DFA does seem able to find the longest match among alternative variable-length patterns.  Example:

-------------------------------
grammar Test02;

file
  : (X1 | X2 | X3 | X4 | X5 | X6)+ EOF
  ;

X1: 'ab';
X2: 'cd';
X3: 'ef';
X4: 'xy'; 
X5: 'abc' .* 'pq';
X6: X1 .* X4;
-------------------------------
(Note the special non-greedy behavior of .* here: it stops at the first place where the rest of the rule can match.)

This correctly eats all of 'abcdefpqxy' as a single X6 token, despite the possibility of getting sidetracked onto X5, which would match only the shorter prefix 'abcdefpq'.

OK, I think I'm satisfied on this particular point. Within the scope of the lexer, it does at least make an effort to choose the longest match. I'm not sure what limitations there may be on this. (And obviously since lexing is independent of parsing, ANTLR's code does not make alternative lexer decisions based on ability to *parse* a larger chunk.)
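
To make the longest-match idea concrete, here is a rough Python sketch of the behavior observed above. It is only an idealized model, not ANTLR's implementation: ANTLR generates a DFA, whereas this just tries every rule at the current position and keeps the longest hit. The names RULES, next_token and tokenize are made up for illustration, the non-greedy .* is modeled with the regex .*?, and breaking ties in favor of the earlier rule is an assumption. ANTLR's real behavior on other inputs may well differ from this model.

```python
import re

# Rules mirror the Test02 grammar above.
# ".*?" stands in for ANTLR's non-greedy ".*" (an approximation).
RULES = [
    ("X1", re.compile(r"ab")),
    ("X2", re.compile(r"cd")),
    ("X3", re.compile(r"ef")),
    ("X4", re.compile(r"xy")),
    ("X5", re.compile(r"abc.*?pq", re.DOTALL)),
    ("X6", re.compile(r"ab.*?xy", re.DOTALL)),
]


def next_token(text, pos):
    """Return (rule_name, end_offset) of the longest match starting at pos."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strict ">" means the earlier rule wins a tie (assumed policy).
        if m and (best is None or m.end() > best[1]):
            best = (name, m.end())
    if best is None:
        raise ValueError("no rule matches at offset %d" % pos)
    return best


def tokenize(text):
    """Split text into (rule_name, lexeme) pairs using longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        name, end = next_token(text, pos)
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens


if __name__ == "__main__":
    print(tokenize("abcdefpqxy"))
```

In this model, X1 matches 2 characters of 'abcdefpqxy', X5 matches 8 and X6 matches all 10, so X6 is chosen, which agrees with what the generated lexer did in the test above.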

-- Graham



