[antlr-interest] Understanding Lexer rules

Darien Hager darien.hager at etelos-inc.com
Wed Feb 20 15:00:46 PST 2008


[Drat. Yet again I click reply and don't fix the cc/to fields.]

Loring Craymer wrote:
>  In cases of ambiguity, the ordering of alternatives matters; for the lexer, that
> means the order in which they appear in the mTokens rule.  At present, I
> believe that that order reflects the order in which the rules are defined (IIRC, the
> rule names are kept in an OrderedHashSet).


Yeah, I'm not so good with the DFA diagrams and theory. Okay, so
to revise my mental-model-of-thumb:

"The lexer keeps a set of 'still possible' tokens and inches through
the stream character by character. At each character, it iterates through
the remaining token definitions (in the order they appear in the grammar)
and removes the ones that do not match. It does some more checks so
it knows whether it should report an error before starting anew, but if a
token is emitted, it will have been the last one standing."

This would explain why order only matters when two token definitions
are ambiguous (both can match the same input) and their matches are
the same length.

Ergo, if you have:

A: 'test' '0'..'9' ;
B: 'test2';
Z: 'a'..'z';

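Under this model, the lexer will take the string "test2a" and generate
[A,Z]: A and B both match "test2" at the same length, so A wins by being
defined first, and Z picks up the trailing "a".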

If tokens were defined in the order BAZ inside the grammar, the lexer
would generate [B,Z]. (Not yet tested.)
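
To make the mental model concrete, here is a rough Python sketch of it.
This is not ANTLR and not a claim about how ANTLR's generated DFA really
decides; it only simulates "longest match wins, ties go to the rule
defined first". The names tokenize, RULES_ABZ and RULES_BAZ are made up
for illustration, and the regexes are my own translation of the three
rules above.

```python
import re

def tokenize(rules, text):
    """Simulate the mental model: at each position the longest match wins,
    and rules of equal match length are resolved by definition order.

    rules is a list of (name, regex) pairs in grammar order (hypothetical
    representation, not an ANTLR data structure).
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the earlier-defined rule is kept.
            if match and len(match.group(0)) > best_len:
                best_name, best_len = name, len(match.group(0))
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Assumed regex equivalents of the rules A, B and Z above.
RULES_ABZ = [("A", r"test[0-9]"), ("B", r"test2"), ("Z", r"[a-z]")]
RULES_BAZ = [("B", r"test2"), ("A", r"test[0-9]"), ("Z", r"[a-z]")]

print(tokenize(RULES_ABZ, "test2a"))  # [('A', 'test2'), ('Z', 'a')]
print(tokenize(RULES_BAZ, "test2a"))  # [('B', 'test2'), ('Z', 'a')]
```

Swapping only the order of A and B flips the first token, which is the
behaviour the model predicts. Whether the real generated lexer agrees
still needs to be checked against ANTLR itself.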

-- 
Darien Hager
Developer
Etelos, Inc.
darien at etelos.com

http://www.etelos.com
"Revolutionizing the way applications are developed, distributed and consumed."

This e-mail message, including attachments, may contain confidential
information for the sole use of the intended recipient(s). If you are
not the intended recipient, then this is notice that any use,
disclosure, dissemination, distribution or copying is strictly
prohibited. If you have received this message in error please contact
the sender by reply mail and destroy all copies of the original
message.