[antlr-interest] Bug with number of Tokens in lexers? (was: XML QName Character Validation)
Martin Probst
mail at martin-probst.com
Mon Apr 7 01:44:24 PDT 2008
Hi,
> For NCName, I suggest you look only at the first character, then
> accept anything which is not a delimiter (e.g. ":", space, angle
> bracket, etc.). After the match, call a routine to check that the
> match is a valid name. This has two advantages:
The trouble is that only looking at the first character doesn't really
help - I'm already in trouble with the first decision.
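For reference, the post-match check suggested in the quoted text could be sketched roughly as below, assuming the Java target. The class and method names are hypothetical, and Character.isLetter is only a stand-in for the XML Letter production, so this approximates NCName validation rather than implementing the spec exactly.

```java
public final class NCNameCheck {
    private NCNameCheck() {}

    // Approximation: '_' or anything Java considers a letter.
    // The real XML Letter production is a specific list of ranges.
    static boolean isStartChar(int cp) {
        return cp == '_' || Character.isLetter(cp);
    }

    // Approximation: start characters plus digits, '-' and '.'.
    // Combining characters and extenders are not handled here.
    static boolean isNameChar(int cp) {
        return isStartChar(cp) || Character.isDigit(cp) || cp == '-' || cp == '.';
    }

    // Returns true if the matched token text looks like a valid NCName.
    // A ':' is rejected because it is neither a start nor a name character.
    public static boolean isValidNCName(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        int first = text.codePointAt(0);
        if (!isStartChar(first)) {
            return false;
        }
        int i = Character.charCount(first);
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isNameChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

A lexer action could call NCNameCheck.isValidNCName on the matched text and report an error if it returns false. That keeps the character tables out of the generated decision logic, though it does not by itself explain the ambiguity warnings described below.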
I think I'm running into either an undocumented (= unknown to me ;-))
limitation of ANTLR's lexer generation, or a bug.
The attached lexer grammar is the lexical part of my XQuery grammar.
I've commented out most of the tokens section, see the block comment
starting at line 37.
The weird thing is that with only the tokens up to line 36, everything
works as expected. If I uncomment one more token (e.g. include the
EVERY token), ANTLR suddenly starts complaining about ambiguities in
the file.
If I replace the complex letter rule with a simpler 'a'..'z' |
'A'..'Z', everything works fine again and I can have (apparently) as
many tokens as I want.
How does this happen? Is there a limit to the number of decisions in a
lexer?
Thanks for your help,
Martin