[antlr-interest] Lexer question
Johannes Luber
jaluber at gmx.de
Mon Apr 9 03:15:24 PDT 2007
John Howard wrote:
> With Gavin Lambert's input (Many thanks Gavin) I have moved my grammar
> forward, but still have an issue with one aspect. I'm trying to parse
> tokens such as '53xx' '6xxx' '3334' and the following simple grammar
> works if I have token SHAPE defined, but if I use shapeDist I get a
> mismatch against ID for the first 'x'. 333x parses OK, but 33xx
> doesn't. I can't use SHAPE, because that causes other problems with the
> grammar. Is there any way I can get shapeDist to work?
>
> Thanks,
>
> John
>
> // This works
> dist : '^' SHAPE
> ;
>
> ID : ('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'_'|'0'..'9')+;
> SHAPE : (DIGIT (DIGIT|'*'|'x'|'X') (DIGIT|'*'|'x'|'X') (DIGIT|'*'|'x'|'X')) ;
> DIGIT : ('0'..'9') ;
> WS : (' '|'\r'|'\t'|'\n')+{$channel=HIDDEN;} ;
>
>
>
> // This fails
> dist : '^' shapeDist
> ;
>
> shapeDist
> : (DIGIT (DIGIT|'*'|'x'|'X') (DIGIT|'*'|'x'|'X') (DIGIT|'*'|'x'|'X'))
> ;
>
> ID : ('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'_'|'0'..'9')+;
> DIGIT : ('0'..'9') ;
> WS : (' '|'\r'|'\t'|'\n')+{$channel=HIDDEN;} ;

I haven't tested my suggestions (without the whole grammar that may be
useless anyway), but the problem may be a non-determinism or an
ambiguity as described on page 287 of the Beta Book. The difference
between SHAPE and shapeDist is that SHAPE is a lexer rule and shapeDist
is a parser rule. When using SHAPE, DIGIT may have to be a fragment rule.
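
As a rough illustration of that distinction, the lexer-rule variant might
look like the sketch below. It is untested against the full grammar and
assumes ANTLR 3 syntax. Marking DIGIT as a fragment means it can only be
used inside other lexer rules and never reaches the parser as a token of
its own, so a lone digit no longer competes with SHAPE.

    // Sketch only: lexer-rule variant with DIGIT as a fragment.
    dist  : '^' SHAPE ;

    SHAPE : DIGIT (DIGIT|'*'|'x'|'X') (DIGIT|'*'|'x'|'X') (DIGIT|'*'|'x'|'X') ;

    // A fragment is a helper for other lexer rules; it is never
    // emitted to the parser as a token.
    fragment
    DIGIT : '0'..'9' ;

In the parser-rule variant, by contrast, the literals 'x', 'X' and '*'
become tokens of their own, and the lexer chooses tokens without knowing
which parser rule is active. For input such as 33xx the lexer can
therefore match "xx" as a single ID, which would be consistent with the
reported mismatch against ID at the first 'x'.
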
Three other things I've noticed in your grammar. First, ID doesn't allow
single-character identifiers, because you use + and not *. This looks
like an oversight to me. Second, you should factor (DIGIT|'*'|'x'|'X')
out into another rule (possibly making it a fragment as well). Lastly,
you shouldn't use parentheses to group rule elements unless necessary;
they are distracting in long rules like SHAPE/shapeDist.
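
Put together, the three suggestions might look like the following sketch.
It is untested, assumes ANTLR 3 syntax, and the rule name SHAPE_CHAR is
made up here for illustration; it does not appear in the original grammar.

    // ID: '*' instead of '+' so that a single letter is a valid identifier.
    // The parentheses around each alternation are still required.
    ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')* ;

    // The repeated alternation is factored out; no outer parentheses.
    SHAPE : DIGIT SHAPE_CHAR SHAPE_CHAR SHAPE_CHAR ;

    // Hypothetical helper rule; as a fragment it never becomes a token.
    fragment
    SHAPE_CHAR : DIGIT | '*' | 'x' | 'X' ;

    fragment
    DIGIT : '0'..'9' ;

    WS : (' '|'\r'|'\t'|'\n')+ {$channel=HIDDEN;} ;

The same factoring applies to shapeDist if it has to stay a parser rule,
although the helper would then be a parser rule rather than a fragment.
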
Best regards,
Johannes Luber