[antlr-interest] Lexer consumes input but doesn't emit all tokens

Kevin Cummings cummings at kjchome.homeip.net
Tue Aug 7 23:38:50 PDT 2012


On Aug 6, 2012, at 16:22, Glenn McGregor <glenn at fenris.net> wrote:
> NAME_LITERAL
>     :    '\\'? ALPHA_NUM ( ( ':' | '_' | '-' | ALPHA_NUM )* ALPHA_NUM )? ;
> 
> ANY    :    . ;
> 
> 
> 
> I would like the input
> 
> test:ack:
> 
> to arrive as two tokens, a NAME_LITERAL of 'test:ack', and a COLON.
> 
> Instead, this input disappears entirely, but parses successfully.
> 
> Any suggestions?

Your problem is the way you specified your NAME_LITERAL.  After it lexes test:ack it sees the next : character and continues with the ()* loop, but no ALPHA_NUM follows to finish the rule.  What you need to do is recognize the : inside the loop only if an ALPHA_NUM follows it.  That way the NAME_LITERAL stops after test:ack and you are left with a : character.

The reason why adding the white space works is that your lexer handles it as a token delimiter and stops lexing the NAME_LITERAL when it is encountered.

Look at some of the older examples for multi-line comments: they handle the closing */ combination and still allow * characters inside the comment by using a predicate that checks the following character with LA.  You can do the same thing with your internal : characters.
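
To make the lookahead idea concrete, here is a rough hand-written sketch in plain Java.  It is not ANTLR-generated code and not a drop-in grammar fix; it only mimics what the predicate would decide.  The class and method names are made up for illustration, ALPHA_NUM is assumed to mean an ASCII letter or digit (its definition is not shown in the thread), and the same "must be followed by ALPHA_NUM" check is applied to _ and - as well as :, since the rule has to end on an ALPHA_NUM.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration of the lookahead check; all names here are hypothetical.
class NameLexerSketch {

    // Assumption: ALPHA_NUM is an ASCII letter or digit.
    static boolean isAlphaNum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    static boolean isSeparator(char c) {
        return c == ':' || c == '_' || c == '-';
    }

    // Returns the index just past a NAME_LITERAL starting at 'start',
    // or 'start' itself if no NAME_LITERAL begins there.
    static int matchName(String in, int start) {
        int i = start;
        if (i < in.length() && in.charAt(i) == '\\') {
            i++; // optional leading backslash
        }
        if (i >= in.length() || !isAlphaNum(in.charAt(i))) {
            return start; // the rule requires an ALPHA_NUM here
        }
        i++;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (isAlphaNum(c)) {
                i++;
            } else if (isSeparator(c) && i + 1 < in.length() && isAlphaNum(in.charAt(i + 1))) {
                // The "predicate": take the separator only when an ALPHA_NUM follows it.
                i += 2;
            } else {
                break; // leave the character for the next token
            }
        }
        return i;
    }

    // Splits the input into NAME_LITERAL tokens and single-character ANY tokens.
    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            int end = matchName(in, pos);
            if (end > pos) {
                out.add("NAME_LITERAL(" + in.substring(pos, end) + ")");
                pos = end;
            } else {
                out.add("ANY(" + in.charAt(pos) + ")");
                pos++;
            }
        }
        return out;
    }
}
```

With this check in place, NameLexerSketch.tokenize("test:ack:") stops the name after test:ack and hands the trailing : to the catch-all rule, which is the split you were asking for.  In the grammar itself the equivalent would be a predicate on the separator alternatives inside the loop.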

--
Kevin J. Cummings
kjchome at verizon.net
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232
(http://www.linuxcounter.net/)

