[antlr-interest] natural-language parsing problem: How to distinguish between special words and regular words
Markus Stoeger
spamhole at gmx.at
Wed Jan 28 11:05:02 PST 2009
Sven Prevrhal wrote:
>
> If I do
>
>     WORD : LETTER+ ;
>     UNIT : 'cups' ;
>
> the lexer will emit WORD for "cups" as well; at least, that's what I see
> happening.
>
The reason you always get a WORD token is that, when more than one lexer
rule matches the same number of characters, the lexer returns the token
whose rule was defined first. You might want to try swapping the order
in which these two rules are defined, so that UNIT comes before WORD.

> If I place the burden on the parser, say as
>
>     unit returns [Token t]
>         : w=WORD { if ($w.text.equals("cups")) $t = $w; }
>         ;
>
> and the WORD token is actually not a unit, then I have lost the token to
> the parser. Should I put that non-matching token back into the token
> stream, and if so, how? Or what is the right solution here?
>
I don't know much about your grammar, but if you want to do this, then
gated semantic predicates might help:

    unit : {IsUnit(input.LT(1).Text)}?=> WORD ;

This uses the IsUnit method (one you define yourself, for example in the
grammar's @members section) to decide whether the next token is a unit
without taking the token off the token stream. If IsUnit returns false,
the unit rule becomes invisible to the parser, and the parser tries to
find another rule that matches.
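A rough hand-written Java analogue of that decision is sketched below. It is not the code ANTLR generates. The names `isUnit`, `item` and the unit list are hypothetical. The point is that the predicate inspects the next token before anything is consumed, so when it fails, the same WORD is still available to the other alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hand-written analogue of a gated predicate (not ANTLR-generated code).
class GatedPredicateDemo {
    // Minimal token: a type and its text, standing in for ANTLR's token object.
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Hypothetical unit list; in the post, IsUnit is supplied by the grammar author.
    static final Set<String> UNITS = Set.of("cups", "tbsp", "grams");

    static boolean isUnit(String text) {
        return UNITS.contains(text);
    }

    // Builds a token stream in which every token is a WORD.
    static List<Token> words(String... texts) {
        List<Token> tokens = new ArrayList<>();
        for (String t : texts) {
            tokens.add(new Token("WORD", t));
        }
        return tokens;
    }

    private final List<Token> tokens;
    private int pos = 0;

    GatedPredicateDemo(List<Token> tokens) {
        this.tokens = tokens;
    }

    // Like LT(1): peek at the next token without consuming it.
    private Token lt1() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private Token consume() {
        return tokens.get(pos++);
    }

    // item : {isUnit(LT(1).text)}?=> WORD   (unit alternative)
    //      | WORD                           (plain word)
    //      ;
    private String item() {
        Token next = lt1();
        if (next == null || !next.type.equals("WORD")) {
            throw new IllegalStateException("expected WORD at position " + pos);
        }
        // The gate is tested before anything is consumed, so when it fails
        // the token is still in the stream for the next alternative.
        if (isUnit(next.text)) {
            return "unit(" + consume().text + ")";
        }
        return "word(" + consume().text + ")";
    }

    List<String> parseAll() {
        List<String> result = new ArrayList<>();
        while (lt1() != null) {
            result.add(item());
        }
        return result;
    }

    public static void main(String[] args) {
        GatedPredicateDemo parser = new GatedPredicateDemo(words("two", "cups", "sugar"));
        System.out.println(parser.parseAll());
    }
}
```

Running `main` prints `[word(two), unit(cups), word(sugar)]`: "cups" passes the gate, and the other words fall through to the plain-word alternative without anything having to be pushed back.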
Markus