[antlr-interest] A "lexicon" for ANTLR

Wed Jan 29 10:51:50 PST 2003

On Wednesday, January 29, 2003, at 01:47 AM, ttsoris 
<ttsoris at yahoo.com> wrote:

> Hi,
> I was wondering if it is possible to create a parser with ANTLR
> to parse simple english sentences.

Sure... I've done it for a simple adventure-game-like interface.  Simple.

>  The problem that I am thinking of
> is the amount of recognized words. If I create rules like:
> NOUN_SIN : "ant" | "cat" | "mouse" .... ;
> NOUN_PLU : "ants" | "cats" | "mice" .... ;
> with thousands of alternatives for nouns and some more for verbs and

except this part...

> other parts of speech, it will produce enormous source files
> and an enormous executable (and probably completely inefficient).
> Is there a way of specifying an external "lexicon" for the lexemes
> (words), that is more efficient? Or is there another way I haven't
> thought of?

You have to create a database that holds the verbs, the nouns, etc.
Load it at lexer startup, probably turn off global literals checking,
and then say:

ID : ('a'..'z')+ {$setType(lookupWord($getText));} ;

or something similar, where the lookupWord() method just looks in the
various tables to see what "part of speech" that word is.
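
A minimal sketch of what such a lookupWord() could look like, in Java.
Everything here except the name lookupWord is an assumption made for
illustration: the Lexicon class, the add() helper, and the token type
numbers are hypothetical (in a real grammar the numbers would come from
the token types that ANTLR generates).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lexicon helper; not part of ANTLR.
class Lexicon {
    // Made-up token type numbers, for illustration only.
    static final int UNKNOWN = 0;
    static final int NOUN_SIN = 1;
    static final int NOUN_PLU = 2;
    static final int VERB = 3;

    // One table mapping each known word to its token type.
    private final Map<String, Integer> table = new HashMap<>();

    // Register a group of words under one token type.
    void add(int tokenType, String... words) {
        for (String w : words) {
            table.put(w.toLowerCase(Locale.ROOT), tokenType);
        }
    }

    // Return the token type for a word, or UNKNOWN if it is not in the table.
    int lookupWord(String text) {
        Integer type = table.get(text.toLowerCase(Locale.ROOT));
        return type == null ? UNKNOWN : type;
    }
}
```

The lexer action then only pays for one hash lookup per word, no matter
how many words the tables hold.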

Don't do the vocabulary thing literally in ANTLR rules; use a few text 
files with the right words in them :)
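
For the text files, a rough sketch of a loader in Java, assuming one
word per line. The WordListLoader class, the '#' comment convention,
and the file layout are all assumptions for illustration, not anything
ANTLR prescribes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper that reads a word list, one word per line.
class WordListLoader {
    // Blank lines and lines starting with '#' are skipped (assumed convention).
    static Set<String> load(Reader in) {
        Set<String> words = new HashSet<>();
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (!word.isEmpty() && !word.startsWith("#")) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```

You would call it once per file at startup, e.g. with a
java.io.FileReader on a hypothetical nouns.txt, and keep the resulting
sets around for the lexer to consult.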

Your problem is going to be ambiguities before you get too far.  Let's 
see... what's an example... "bed" is both a noun and a verb ;)
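
To make the ambiguity concrete, here is a small Java sketch in which a
word maps to a set of parts of speech rather than a single one. The
PartOfSpeech enum and AmbiguousLexicon class are hypothetical names,
and this only detects the ambiguity; deciding which reading applies in
a given sentence is left to whatever consumes the result.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical categories, for illustration only.
enum PartOfSpeech { NOUN, VERB }

// Hypothetical table that allows several parts of speech per word.
class AmbiguousLexicon {
    private final Map<String, EnumSet<PartOfSpeech>> table = new HashMap<>();

    // Record that a word can act as the given part of speech.
    void add(PartOfSpeech pos, String word) {
        table.computeIfAbsent(word.toLowerCase(Locale.ROOT),
                k -> EnumSet.noneOf(PartOfSpeech.class)).add(pos);
    }

    // All parts of speech recorded for a word; empty if the word is unknown.
    Set<PartOfSpeech> lookup(String word) {
        EnumSet<PartOfSpeech> found = table.get(word.toLowerCase(Locale.ROOT));
        return found == null
                ? Collections.<PartOfSpeech>emptySet()
                : Collections.unmodifiableSet(found);
    }

    // True when a single token type cannot describe the word.
    boolean isAmbiguous(String word) {
        return lookup(word).size() > 1;
    }
}
```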

Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
Lecturer in Comp. Sci., University of San Francisco
