[antlr-interest] natural-language parsing problem: How to distinguish between special words and regular words

Wed Jan 28 11:05:02 PST 2009

Sven Prevrhal wrote:
>
> If I do
>
> WORD: LETTER+;
>
> UNIT: “cups”;
>
> the lexer will emit WORD for “cups” as well at least that’s what I see
> happening.
>

The reason you always get a WORD token is that, when more than one rule 
matches the same number of characters, the lexer returns the token whose 
rule was defined first. Try swapping the order in which these rules are 
defined, so that UNIT comes before WORD.
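
To make the tie-breaking concrete, here is a toy lexer in Python. It is 
only a sketch of the behaviour described above (longest match wins, ties 
go to the rule listed first), not the code ANTLR generates; the tokenize 
function and the rule tuples are made up for illustration.

```python
import re

def tokenize(text, rules):
    """Toy lexer: longest match wins; ties go to the rule listed first."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip whitespace between words
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical rules mirroring WORD and UNIT from the question.
WORD = ("WORD", r"[A-Za-z]+")
UNIT = ("UNIT", r"cups")

word_first = tokenize("two cups cupsize", [WORD, UNIT])
unit_first = tokenize("two cups cupsize", [UNIT, WORD])
```

With WORD listed first, "cups" comes out as WORD; with UNIT listed 
first, "cups" comes out as UNIT, while "cupsize" is still a WORD because 
WORD matches more characters there.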

> If I place the burden on the parser say as
>
> unit:
>   w=WORD
>   {
>     if ($w == “cups”) return $w;
>   }
>   ;
>
> and the WORD token is actually not a unit I have lost the token to the
> parser. Should I / How can I place that nonmatch token back into the
> token stream? Or what’s the solution to that??
>
I don't know much about your grammar, but if you want to do this then 
gated semantic predicates might help:

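unit: {IsUnit(input.LT(1).Text)}?=> WORD;

Here input.LT(1) returns the next token without taking it off the token 
stream (input.LA(1) would return only its integer type), and IsUnit is a 
helper method you define yourself, for example in the grammar's @members 
section, to decide whether the token's text is a unit. If IsUnit returns 
false, the unit rule is invisible to the parser, which will then try 
another alternative that matches.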
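
The idea of deciding on a token without consuming it can be sketched 
outside ANTLR as well. The Python below is a toy model under stated 
assumptions, not ANTLR's API: the TokenStream class, the is_unit helper, 
and the UNITS set are all hypothetical names chosen for this example.

```python
UNITS = {"cups", "tablespoons", "grams"}  # hypothetical list of unit words

def is_unit(text):
    """Stand-in for the IsUnit helper: true if the text names a unit."""
    return text.lower() in UNITS

class TokenStream:
    """Minimal token stream with lookahead that does not consume."""
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def lt(self, k):
        """Return the k-th token ahead (1-based), or None at end of input."""
        i = self.pos + k - 1
        return self.tokens[i] if i < len(self.tokens) else None

    def consume(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

def parse_item(stream):
    """Take the 'unit' alternative only if the lookahead passes the check."""
    nxt = stream.lt(1)
    if nxt is None:
        raise ValueError("unexpected end of input")
    if is_unit(nxt):
        return ("unit", stream.consume())
    # The token was only inspected, so it is still available here.
    return ("word", stream.consume())

def parse(words):
    stream = TokenStream(words)
    items = []
    while stream.lt(1) is not None:
        items.append(parse_item(stream))
    return items

result = parse(["two", "cups", "flour"])
```

Because the check happens on the lookahead token, nothing has to be 
pushed back onto the stream when the word turns out not to be a unit.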

Markus