[antlr-interest] Question
Gavin Lambert
antlr at mirality.co.nz
Wed Oct 31 01:50:55 PDT 2007
At 21:41 31/10/2007, Mikael Sandberg wrote:
>The language is basically stripped of all spaces before being
>passed to the parser. It becomes difficult to parse and
>differentiate between, for instance, an ID and a literal
>followed by an int, like in this short example:
[...]
>The input "bit 1" works fine but without the space "bit1" the
>parser or rather the lexer creates a token for "bit1" that is
>not part of the language. Is there a fast fix for this problem?
>You write in the book that this was a common situation and that
>ANTLR takes care of it but it seems that in this case it is
>not so.
The normal case is to have an additional lexer rule that
recognises and skips (or assigns to the hidden channel) any
whitespace.
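To illustrate the idea outside ANTLR, here is a minimal Python sketch of a lexer that recognises whitespace as its own token and then drops it. The token set (ID, INT, WS) and the name tokenize are assumptions made for the illustration, not taken from the original grammar, and "bit" is treated as a plain identifier here rather than a literal.

```python
import re

# Hypothetical token set for illustration: identifiers, integers, whitespace.
TOKEN_RE = re.compile(
    r"(?P<ID>[A-Za-z_][A-Za-z_0-9]*)|(?P<INT>[0-9]+)|(?P<WS>[ \t\r\n]+)"
)


def tokenize(text):
    """Return (type, text) pairs, longest match first, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":  # whitespace is recognised, then skipped
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("bit 1"))  # the space separates the two tokens
print(tokenize("bit1"))   # without it, the identifier rule consumes everything
```

With the space present the lexer sees an identifier followed by an integer. Without it, the identifier rule matches the whole of "bit1", which is the behaviour described in the question.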
Are you manually stripping the whitespace from the input yourself
before passing it to the lexer? If so, don't.
If the input is actually coming in like that, then the language
itself is ambiguous and no grammar can fix it. If an identifier is
allowed to have trailing digits and there is no requirement for
separation between such an identifier and a following number, then
there is no way to disambiguate it. For example, how could you tell
whether "foo123" was supposed to be itself or "foo 123" or "foo1 23"
or even "foo1 2 3"?
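The size of the problem can be seen by enumerating the candidates. The Python sketch below assumes, purely for illustration, that the input is some letters followed by digits, that an identifier is the letters plus any number of the trailing digits, and that whatever digits remain may be cut into one or more integers. The names split_digits and readings are hypothetical.

```python
def split_digits(digits):
    """Yield every way to cut a digit string into non-empty integers."""
    if not digits:
        yield []
        return
    for cut in range(1, len(digits) + 1):
        for rest in split_digits(digits[cut:]):
            yield [digits[:cut]] + rest


def readings(text):
    """List every reading of text as an identifier followed by integers.

    Assumes text is letters followed by digits (an illustrative restriction).
    """
    # Index of the first digit; the identifier must contain all the letters.
    first_digit = next(
        (i for i, ch in enumerate(text) if ch.isdigit()), len(text)
    )
    results = []
    for id_end in range(first_digit, len(text) + 1):
        for numbers in split_digits(text[id_end:]):
            results.append([text[:id_end]] + numbers)
    return results


for reading in readings("foo123"):
    print(" ".join(reading))
```

Under these assumptions "foo123" has eight distinct readings, including all four listed above, and nothing in the input says which one was intended.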