[antlr-interest] Lexer strangeness

Austin Hastings Austin_Hastings at Yahoo.com
Sat Oct 6 02:40:32 PDT 2007


Clifford,

Before you get too involved in this problem, consider your statement,

"IDENTIFIER may not end with a digit, but VARIABLE might."

That right there tells me you are going to have grammar problems later, 
because you can never completely trust the lexer to distinguish 
IDENTIFIERs from VARIABLEs. After all, what is "foo"? It matches both rules.
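
To make the overlap concrete, here is a rough sketch in Python rather than 
ANTLR. It assumes the two rules from your full grammar can be transcribed 
as regular expressions; the names LETTER, DIGIT, ID, RULES and 
matching_rules are only illustrative, and this is not a model of how 
ANTLR's lexer actually makes its prediction.

```python
import re

# Regex transcription of the quoted rules (an assumption for illustration):
#   fragment ID: LETTER ( ( LETTER|DIGIT )* LETTER)?;
#   IDENTIFIER: ID;
#   VARIABLE: ID DIGIT?;
LETTER = r"[A-Za-z_]"
DIGIT = r"[0-9]"
ID = rf"{LETTER}(?:(?:{LETTER}|{DIGIT})*{LETTER})?"

RULES = {
    "IDENTIFIER": re.compile(ID),
    "VARIABLE": re.compile(ID + DIGIT + "?"),
}

def matching_rules(text):
    """Return the names of every rule that matches the whole input."""
    return [name for name, pattern in RULES.items() if pattern.fullmatch(text)]

for sample in ("foo", "f3"):
    print(sample, matching_rules(sample))
```

Under these assumptions "foo" is accepted by both patterns while "f3" is 
accepted only by VARIABLE, so any input without a trailing digit is 
ambiguous at the lexer level.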

You should probably establish a firm rule - variables always end in a 
digit - to simplify things. Alternatively, handle disambiguation in the 
parser rather than in the lexer (my favorite).
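
A minimal sketch of the firm-rule option, written in Python rather than 
ANTLR. It assumes a single name-shaped pattern is matched first and the 
kind is decided afterwards from the last character; NAME and classify are 
hypothetical names, not part of ANTLR or of your grammar.

```python
import re

# One name-shaped pattern: the quoted ID fragment plus an optional final digit.
NAME = re.compile(r"[A-Za-z_](?:[A-Za-z_0-9]*[A-Za-z_])?[0-9]?")

def classify(text):
    """Apply the firm rule: a trailing digit means VARIABLE, otherwise IDENTIFIER."""
    if not NAME.fullmatch(text):
        raise ValueError(f"not a name: {text!r}")
    return "VARIABLE" if text[-1].isdigit() else "IDENTIFIER"

print(classify("foo"))   # IDENTIFIER
print(classify("foo3"))  # VARIABLE
```

With the rule stated this way every name has exactly one classification, 
which is the property the two overlapping lexer rules lack.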

=Austin



Clifford Heath wrote:
> New to Antlr, not to parser-generators. Hi all!
> I'm using ANTLRWorks, downloaded today, on a Mac under OS X.
>
> I have the following snippet which works when I do it this way:
>
> IDENTIFIER: ID;
> fragment ID: LETTER ( ( LETTER|DIGIT )* LETTER)? DIGIT?;
> // with:
> fragment LETTER: ('_'|'A'..'Z'|'a'..'z');
> fragment DIGIT: '0'..'9';
>
> but not when I do it this way:
>
> IDENTIFIER: ID DIGIT?;
> fragment ID: LETTER ( ( LETTER|DIGIT )* LETTER)?;
>
> In the latter case it fails to match the IDENTIFIER 'f3', for example.
>
> I have it structured this way because in the full grammar,
> I have two similar things:
> IDENTIFIER: ID;
> VARIABLE: ID DIGIT?;
>
> where IDENTIFIER may not end with a digit, but VARIABLE might.
> Use of a VARIABLE always makes the lexer/parser fail. I thought
> I might have a greedy/non-greedy problem here, but even without
> the similar definitions, the above fails.
>
> What's happening here? Can someone point me in the right direction 
> please?
>
> Clifford Heath.
>
>
>


