[antlr-interest] natural-language parsing problem: How to distinguish between special words and regular words

Sven Prevrhal sven.prevrhal at ucsf.edu
Tue Jan 27 17:11:29 PST 2009


I want to parse recipes. How can I distinguish (for instance) between a
measuring unit such as "cups" and other general words?

 

If I do

WORD: LETTER+;

UNIT: "cups";

 

the lexer emits WORD for "cups" as well; at least that's what I see
happening. I tried

 

WORD:
        u=UNIT {
                $u.setType(UNIT);
                emit($u);
        }
        | LETTER+;

 

but that causes an error saying that UNIT can never be matched.
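
To illustrate the behaviour I am after, here is a rough sketch in plain
Java rather than ANTLR: match a generic word first, then look it up in a
table of unit words and change the token type. Everything in it
(UnitLexerSketch, Token, Type, UNITS, tokenize) is a made-up name for
illustration, not ANTLR API, and the unit list is just an assumed sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class UnitLexerSketch {
    // Hypothetical token types mirroring WORD and UNIT from the grammar.
    enum Type { WORD, UNIT }

    // Hypothetical minimal token: a type plus the matched text.
    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // Assumed sample vocabulary; a real recipe parser would need many more.
    static final Set<String> UNITS = new HashSet<>(Arrays.asList(
            "cup", "cups", "teaspoon", "teaspoons", "tablespoon", "tablespoons"));

    // Match LETTER+ as one generic word, then reclassify it by table lookup.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (!Character.isLetter(input.charAt(i))) {
                i++; // this sketch silently skips anything that is not a letter
                continue;
            }
            int start = i;
            while (i < input.length() && Character.isLetter(input.charAt(i))) {
                i++;
            }
            String text = input.substring(start, i);
            // The word is matched once; only its type depends on the lookup.
            Type type = UNITS.contains(text.toLowerCase(Locale.ROOT))
                    ? Type.UNIT : Type.WORD;
            tokens.add(new Token(type, text));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("add two cups flour")) {
            System.out.println(t.type + " " + t.text);
        }
    }
}
```

With this arrangement there is only one rule that matches letters, so
the two token types never compete for the same input.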

 

If I place the burden on the parser instead, say as

 

unit:
        w=WORD {
                if ($w == "cups") return $w;
        }
        ;

 

and the WORD token turns out not to be a unit, the parser has already
consumed it and I have lost it. Should I place the non-matching token back
into the token stream, and if so, how? Or is there a better solution?
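
To make the question concrete, here is a small plain-Java sketch of the
alternative to pushing a token back: look at the next token without
consuming it, and consume it only if it is a unit. The names PeekSketch,
TokenCursor, peek, consume and matchUnit are hypothetical and are not
ANTLR API; the unit list is an assumed sample.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PeekSketch {
    // Assumed sample vocabulary, for illustration only.
    static final Set<String> UNITS = new HashSet<>(Arrays.asList("cup", "cups"));

    // Hypothetical token stream over already-split words.
    static final class TokenCursor {
        private final String[] words;
        private int pos = 0;

        TokenCursor(String... words) { this.words = words; }

        // Return the next word without advancing, or null at end of input.
        String peek() { return pos < words.length ? words[pos] : null; }

        // Return the next word and advance past it.
        String consume() {
            if (pos >= words.length) {
                throw new IllegalStateException("no more tokens");
            }
            return words[pos++];
        }
    }

    // Consume and return the next word only if it is a unit; otherwise
    // leave the cursor untouched so another rule can still match the word.
    static String matchUnit(TokenCursor cursor) {
        String next = cursor.peek();
        if (next != null && UNITS.contains(next)) {
            return cursor.consume();
        }
        return null;
    }

    public static void main(String[] args) {
        TokenCursor cursor = new TokenCursor("flour", "cups");
        System.out.println(matchUnit(cursor)); // "flour" is not a unit
        System.out.println(cursor.peek());     // the word is still available
    }
}
```

Because the check happens before the word is consumed, nothing has to be
put back when the word is not a unit.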

 

Thanks a lot - Sven
