[antlr-interest] Question

Wed Oct 31 01:50:55 PDT 2007

At 21:41 31/10/2007, Mikael Sandberg wrote:
 >The language is basically stripped of all spaces before being
 >passed to the parser. It becomes difficult to parse and
 >differentiate between, for instance, an ID and a literal
 >followed by an int, like in this short example:
[...]
 >The input "bit 1" works fine but without the space "bit1" the
 >parser or rather the lexer creates a token for "bit1" that is
 >not part of the language. Is there a fast fix for this problem?
 >You write in the book that this was a common situation and that
 >ANTLR takes care of it but it seems that in this case it is
 >not so.

The normal case is to have an additional lexer rule that 
recognises and skips (or assigns to the hidden channel) any 
whitespace.
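
As a rough illustration of why that whitespace rule matters, here is a 
small sketch in Python rather than ANTLR. The token names ID, INT and WS 
and the tokenize() helper are made up for this example and are not taken 
from the grammar in question. It matches greedily, much as a generated 
lexer would, and throws away whitespace tokens.

```python
import re

# Hypothetical token set for illustration only; not the poster's grammar.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_][A-Za-z_0-9]*"),   # identifiers may contain digits
    ("INT", r"[0-9]+"),
    ("WS", r"[ \t\r\n]+"),               # recognised, then skipped
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return (token type, text) pairs, dropping whitespace tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "WS":      # the "skip" part of the WS rule
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()                # greedy: ID consumes trailing digits
    return tokens


print(tokenize("bit 1"))
print(tokenize("bit1"))
```

With the space present the sketch yields an ID followed by an INT. 
Without it, the greedy ID pattern consumes the digit as well and a 
single ID token comes out, which is the behaviour described in the 
question.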

Are you manually stripping the whitespace from the input yourself 
before passing it to the lexer?  If so, don't.

If the input is actually coming in like that, then your grammar is 
infeasible as it stands.  If an identifier is allowed to end in 
digits and nothing is required to separate such an identifier from 
a following number, then there is no way to disambiguate the two.  
For example, how could you tell whether "foo123" was meant as the 
single identifier "foo123", or as "foo 123", "foo1 23" or even 
"foo1 2 3"?
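
To make the ambiguity concrete, the following Python sketch enumerates 
every way a space-free string could be read as one identifier followed 
by zero or more integers. The readings() function and the two patterns 
are assumptions made for this example (identifiers start with a letter 
or underscore and may contain digits; integers are runs of digits), not 
anything defined by ANTLR.

```python
import re
from itertools import combinations

# Assumed lexical shapes, for illustration only.
IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
NUMBER = re.compile(r"[0-9]+")


def readings(text):
    """List every split of text into an identifier followed by integers."""
    n = len(text)
    results = []
    for k in range(n):                           # number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            parts = [text[a:b] for a, b in zip(bounds, bounds[1:])]
            head, rest = parts[0], parts[1:]
            if IDENT.fullmatch(head) and all(NUMBER.fullmatch(p) for p in rest):
                results.append(parts)
    return results


for parts in readings("foo123"):
    print(" ".join(parts))
```

Under those assumptions "foo123" has eight valid readings, including all 
four mentioned above, and nothing in the input itself says which one was 
intended.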