[antlr-interest] Lexer consumes input but doesn't emit all tokens
Kevin Cummings
cummings at kjchome.homeip.net
Tue Aug 7 23:38:50 PDT 2012
On Aug 6, 2012, at 16:22, Glenn McGregor <glenn at fenris.net> wrote:
> NAME_LITERAL
> : '\\'? ALPHA_NUM ( ( ':' | '_' | '-' | ALPHA_NUM )* ALPHA_NUM )? ;
>
> ANY : . ;
>
>
>
> I would like the input
>
> test:ack:
>
> to arrive as two tokens, a NAME_LITERAL of 'test:ack', and a COLON.
>
> Instead, this input disappears entirely, but parses successfully.
>
> Any suggestions?
Your problem is the way you specified your NAME_LITERAL. After it lexes test:ack, it sees the next : character and continues with the ( ... )* loop, only to find that no ALPHA_NUM follows. What you need to do is recognize the : inside the loop only if an ALPHA_NUM follows it. That way, the NAME_LITERAL stops after test:ack and you are left with a : character.
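To make the lookahead idea concrete, here is a minimal hand-written sketch in Python. It is not ANTLR output or ANTLR API, just an illustration of the rule "accept a separator only if an ALPHA_NUM follows it". The helper names (is_alpha_num, lex_name_literal, tokenize) are hypothetical. Two assumptions: ALPHA_NUM is taken to mean ASCII letters and digits, since its definition is not shown in the quoted grammar, and a separator must be directly followed by an ALPHA_NUM, so a run such as a::b ends the name after a, which is stricter than the original rule.

```python
SEPARATORS = ":_-"


def is_alpha_num(ch):
    # Assumption: ALPHA_NUM means ASCII letters and digits.
    return ch.isascii() and ch.isalnum()


def lex_name_literal(text, pos=0):
    """Return the index just past a NAME_LITERAL starting at pos, or None."""
    i = pos
    if i < len(text) and text[i] == "\\":
        i += 1  # optional leading backslash
    if i >= len(text) or not is_alpha_num(text[i]):
        return None
    i += 1
    while i < len(text):
        if is_alpha_num(text[i]):
            i += 1
        elif (text[i] in SEPARATORS
              and i + 1 < len(text)
              and is_alpha_num(text[i + 1])):
            i += 1  # separator accepted only because an ALPHA_NUM follows
        else:
            break  # trailing separator is left for the next token
    return i


def tokenize(text):
    """Split text into (kind, lexeme) pairs using the rule above."""
    tokens = []
    pos = 0
    while pos < len(text):
        end = lex_name_literal(text, pos)
        if end is None:
            kind = "COLON" if text[pos] == ":" else "ANY"
            end = pos + 1
        else:
            kind = "NAME_LITERAL"
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

With this rule, tokenize("test:ack:") yields a NAME_LITERAL for test:ack followed by a COLON, because the final : has nothing after it and so is not pulled into the name.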
The reason adding the white space works is that your lexer handles it as a token delimiter and stops lexing the NAME_LITERAL when it is encountered.
Look at some of the older examples for multi-line comments: they handle the closing */ combination, and still allow * characters inside the comment, by using a predicate that checks the following lookahead character (LA). You can do the same thing with your internal : characters.
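For comparison, the comment-handling trick can be sketched the same way. This is a hand-written Python illustration, not code from any ANTLR example, and lex_block_comment is a hypothetical name. A * is treated as ordinary comment text unless the character after it is /.

```python
def lex_block_comment(text, pos=0):
    """Return the index just past the closing */ of a comment starting at
    pos, or None if no comment starts there or it is never closed."""
    if not text.startswith("/*", pos):
        return None
    i = pos + 2
    while i < len(text):
        # One character of lookahead: '*' only closes the comment
        # when it is immediately followed by '/'.
        if text[i] == "*" and i + 1 < len(text) and text[i + 1] == "/":
            return i + 2
        i += 1  # anything else, including a lone '*', is comment text
    return None
```

The structure mirrors the NAME_LITERAL case: the decision about the current character depends on the one that follows it.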
--
Kevin J. Cummings
kjchome at verizon.net
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232
(http://www.linuxcounter.net/)