[antlr-interest] How can I ignore reserved words in certain cases ?

Randall R Schulz rschulz at sonic.net
Thu Nov 30 08:20:24 PST 2006


JeanChristophe,

On Thursday 30 November 2006 00:36, JeanChristophe Gautier wrote:
> Hi,
>
> I am writing a command line editor that has reserved words, such as
> "print", that should accept, at times, any string value. For example
> the following should be allowed:
>
> print hello
> print print
>
> The grammar is defined as follows:
>
>...

I have a similar issue with a grammar for which I used JavaCC. I believe 
the technique I used there would apply in ANTLR, as well.

Basically, I define each of my keywords as its own lexical token. I then 
have a non-terminal (parser rule) that matches either the generic word 
token (which never matches a keyword, because the lexer has already 
classified those as keyword tokens) or any of the keywords, listed as 
explicit alternatives. Any place where the keywords are in effect uses 
the regular, generic "word" token, and any grammatical context in which 
the keywords are not special and must be treated like ordinary words 
uses the rule that explicitly allows them. That way you only need to 
enumerate the keywords twice: once in the lexical specification and 
once in the rule that allows them.

To sketch it out as a minimal example:

KEYWORD1 : "keyword1" ;
KEYWORD2 : "keyword2" ;
KEYWORD3 : "keyword3" ;

WORD : 'a'..'z' ( 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' )* ;

// Parser rules (lower case): the lexer only ever emits KEYWORDn or WORD.
keyword : KEYWORD1 | KEYWORD2 | KEYWORD3 ;

anyWord : KEYWORD1 | KEYWORD2 | KEYWORD3 | WORD ;

keywordPhrase : keyword <<<more stuff>>> ;

plainPhrase : <<<left stuff>>> anyWord <<<right stuff>>> ;
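
To make the idea concrete without depending on any particular parser 
generator, here is a rough hand-written sketch in Python. It is not ANTLR 
or JavaCC output, only an illustration of the same split: the lexer 
always classifies "print" as a keyword token, and a separate parser rule 
(any_word below) accepts either a plain word or a keyword. The keyword 
table, the "list" keyword, and all function names are assumptions made up 
for the example.

```python
import re

# Hypothetical keyword table; the original question only mentions "print".
KEYWORDS = {"print": "PRINT", "list": "LIST"}

# One identifier-like lexeme, optionally preceded by whitespace.
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)")


def tokenize(text):
    """Lexer: a keyword always gets its own token type, anything else is WORD."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        lexeme = match.group(1)
        tokens.append((KEYWORDS.get(lexeme, "WORD"), lexeme))
        pos = match.end()
    return tokens


def any_word(tokens, i):
    """Parser rule: anyWord : WORD | PRINT | LIST ; returns (text, next index)."""
    if i >= len(tokens):
        raise SyntaxError("expected a word")
    kind, lexeme = tokens[i]
    if kind == "WORD" or kind in KEYWORDS.values():
        return lexeme, i + 1
    raise SyntaxError(f"unexpected token {kind}")


def print_command(tokens):
    """Parser rule: printCommand : PRINT anyWord ;"""
    if not tokens or tokens[0][0] != "PRINT":
        raise SyntaxError("expected 'print'")
    value, i = any_word(tokens, 1)
    if i != len(tokens):
        raise SyntaxError("trailing input after argument")
    return ("print", value)
```

With this, print_command(tokenize("print print")) treats the first 
"print" as the command and the second as an ordinary argument, even 
though the lexer tagged both as PRINT tokens. The decision is made by 
the parser rule, not by the lexer.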


I think the only thing you have to look out for is ambiguity, but that's 
always true...

Would this approach work for you?

Randall Schulz

