[antlr-interest] Lexer too quick to grab a token?
Todd O'Bryan
toddobryan at gmail.com
Sun May 1 16:19:02 PDT 2011
I have created an obnoxious grammar and need help lexing it.
Basically, a left-bracket plus a string represents an open tag, and
there's a matching close tag with a right bracket. If you really want
a bracket, you type the bracket twice.
To be concrete,
[/ this is text in a tag /]
should lex as
L_TAG(text="[/") ... tokens representing "this is text in a tag" ...
R_TAG(text="/]")
The problem comes when I want to explain this grammar using the grammar.
To put stuff in a tag, type [[/ stuff /]]
should lex as
... lots of tokens ... L_BRACKET(text="[[") ... tokens representing "/
stuff /" ... R_BRACKET(text="]]")
Unfortunately, I can't figure out how to keep the lexer from matching
"/]" as an R_TAG and then having the extra "]" left over.
Conceptually, what I'd like to do is say that R_TAG matches a
character of the appropriate type followed by ']', as long as there's
no ']' immediately after. If there are two right brackets after the
character, the lexer should make those an R_BRACKET token and make the
first character a simple text token.
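The rule described above amounts to a negative lookahead on R_TAG. Here is
a minimal sketch of that behaviour in Python (not ANTLR), just to pin down
the tokenization being asked for. It assumes '/' is the only tag character,
and the token names TEXT and STRAY are hypothetical, chosen only for
illustration.

```python
import re

# Order matters: doubled brackets are tried before tags.
TOKEN_SPEC = [
    ("L_BRACKET", r"\[\["),         # "[[" is a literal left bracket
    ("R_BRACKET", r"\]\]"),         # "]]" is a literal right bracket
    ("L_TAG", r"\[/"),              # open tag (assumes '/' is the tag character)
    ("R_TAG", r"/\](?!\])"),        # close tag, only if no ']' follows immediately
    ("TEXT", r"[^\[\]/]+|/"),       # plain text, or a lone '/' that is not part of a tag
    ("STRAY", r"[\[\]]"),           # hypothetical catch-all for an unpaired bracket
]

MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def lex(source):
    """Return a list of (token_name, text) pairs for the given input."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(source)]


if __name__ == "__main__":
    for token in lex("[[/ stuff /]]"):
        print(token)
```

With this sketch, the '/' before "]]" falls through to TEXT because the
lookahead rejects R_TAG, and the two brackets are then taken together as
R_BRACKET. Inputs such as "/]]]" remain ambiguous and are not addressed.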
Does this make any sense? Is there some way to deal with it?
Thanks,
Todd