[antlr-interest] identifier vs keyword issue

Jim Idle jimi at temporal-wave.com
Thu Apr 15 14:16:21 PDT 2010


Don't define the keywords as inline literals in your parser rules. Make them 'real' lexer tokens:

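// Keywords go before IDENTIFIER: ANTLR 3 resolves a tie in favour of the earlier rule.
KEY1 : 'key1' ;
KEY2 : 'key2' ;
IDENTIFIER : LETTER (LETTER|'0'..'9')* ;
fragment LETTER : 'A'..'Z' | 'a'..'z' | '_' ;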
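To see why the rule order matters, here is a minimal Python sketch (hand-written, not ANTLR-generated) of a lexer that behaves like the rules above for this input: it matches the longest identifier-shaped run and only then checks whether that whole run is a keyword. The KEYWORDS table and the tokenize function are hypothetical stand-ins for KEY1, KEY2 and IDENTIFIER.

```python
import re

# Hypothetical keyword table standing in for the KEY1/KEY2 lexer rules.
KEYWORDS = {"key1": "KEY1", "key2": "KEY2"}

# Mirrors IDENTIFIER : LETTER (LETTER|'0'..'9')* with LETTER = A-Z, a-z, '_'.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def tokenize(text):
    """Return a list of (type, text) pairs.

    The identifier pattern consumes the longest run of characters; only
    when the whole run equals a keyword is the keyword type used, so
    'key1' lexes as KEY1 while 'key1x' stays an IDENTIFIER.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = IDENT_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        word = m.group()
        tokens.append((KEYWORDS.get(word, "IDENTIFIER"), word))
        pos = m.end()
    return tokens


print(tokenize("key1 key1x key2 data"))
# [('KEY1', 'key1'), ('IDENTIFIER', 'key1x'), ('KEY2', 'key2'), ('IDENTIFIER', 'data')]
```

A word that merely starts with a keyword, such as key1x, therefore remains an ordinary IDENTIFIER.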

Then:

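ident
 : IDENTIFIER
 ;

// Needs options { output=AST; }. Each keyword is rewritten to an
// IDENTIFIER node that keeps the keyword's text.
keywId
 : IDENTIFIER -> IDENTIFIER
 | KEY1 -> IDENTIFIER[$KEY1]
 | KEY2 -> IDENTIFIER[$KEY2]
 ;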
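The effect of the two parser rules can be sketched in Python, assuming tokens are plain (type, text) pairs; the Parser class and its method names are illustrative only. ident refuses keyword tokens, while keyw_id accepts them and hands back an IDENTIFIER carrying the keyword's text, much as the IDENTIFIER[$KEY1] rewrite does.

```python
# Hypothetical keyword token types standing in for KEY1/KEY2.
KEYWORD_TYPES = {"KEY1", "KEY2"}


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (type, text) pairs
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def consume(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def ident(self):
        """ident : IDENTIFIER ;  keyword tokens are a syntax error here."""
        ttype, text = self.peek()
        if ttype != "IDENTIFIER":
            raise SyntaxError(f"expected IDENTIFIER, got {ttype} {text!r}")
        return self.consume()

    def keyw_id(self):
        """keywId : IDENTIFIER | KEY1 | KEY2, always returned as IDENTIFIER.

        A keyword token is retyped as IDENTIFIER but keeps its original text.
        """
        ttype, text = self.peek()
        if ttype == "IDENTIFIER":
            return self.consume()
        if ttype in KEYWORD_TYPES:
            self.consume()
            return ("IDENTIFIER", text)
        raise SyntaxError(f"expected identifier or keyword, got {ttype} {text!r}")


p = Parser([("KEY1", "key1"), ("IDENTIFIER", "x")])
print(p.keyw_id())  # ('IDENTIFIER', 'key1')
print(p.ident())    # ('IDENTIFIER', 'x')
```

Because keyw_id normalises the token type, later stages only ever see IDENTIFIER and never need to know that the name happened to be a keyword.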


Use keywId wherever a keyword may also serve as an identifier (such as the variable name in 'var byte data'), match the keyword tokens directly in your pragma rule, and use ident anywhere that keywords are not valid. In fact, as a more general practice, you could accept keywords as identifiers everywhere and then reject them with a semantic error rather than syntactically in the parser. However, you need to be careful to add some single-token syntactic predicates where this creates an ambiguity (or put up with lots of warnings).
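A rough Python sketch of the "accept everything, reject later" approach, under an assumption made up purely for illustration (not taken from JAL): var and pragma are fully reserved, while data may name a variable. The parser only checks the shape of the declaration; a separate semantic pass decides whether the chosen name is allowed, which also lets the error message say exactly why.

```python
import re

# Hypothetical reserved-word set for illustration only: 'var' and 'pragma'
# may never name a variable, while 'data' and 'target' (which only act as
# keywords after 'pragma') remain usable as ordinary names.
RESERVED = {"var", "pragma"}

# Same shape as IDENTIFIER : LETTER (LETTER|'0'..'9')* ;
WORD = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def parse_var_decl(line):
    """Syntax only: accept 'var <type> <name>' with ANY word as the name."""
    words = line.split()
    if len(words) != 3 or words[0] != "var" or not all(WORD.fullmatch(w) for w in words):
        raise SyntaxError(f"malformed declaration: {line!r}")
    return {"type": words[1], "name": words[2]}


def check_decl(decl):
    """Semantic pass: return an error message, or None if the name is allowed."""
    if decl["name"] in RESERVED:
        return f"'{decl['name']}' is reserved and cannot name a variable"
    return None


print(check_decl(parse_var_decl("var byte data")))    # None
print(check_decl(parse_var_decl("var byte pragma")))  # 'pragma' is reserved and cannot name a variable
```

Keeping the grammar permissive this way tends to give clearer diagnostics than a generic "unexpected token" from the parser.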

There are other ways to do this, such as not defining keyword tokens at all and instead checking the text of an IDENTIFIER with a semantic predicate, but I find that approach awkward myself.
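For comparison, the predicate-style alternative might look like this in Python: the lexer has no keyword tokens at all, and the pragma parser tests the text of each IDENTIFIER, which plays the part of a semantic predicate. The names tokenize, is_word and parse_pragma_data are hypothetical, and only the 'pragma data' form from Joep's example is handled.

```python
import re

# No keyword tokens: every word is an IDENTIFIER; numbers and '-' ',' are separate.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<IDENTIFIER>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<NUMBER>0x[0-9A-Fa-f]+|[0-9]+)"
    r"|(?P<PUNCT>[-,]))"
)


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad input at {pos}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


def is_word(tok, word):
    """The 'predicate': an IDENTIFIER whose text is one particular word."""
    return tok[0] == "IDENTIFIER" and tok[1] == word


def parse_pragma_data(tokens):
    """pragma data constant '-' constant (',' constant '-' constant)*"""
    if len(tokens) < 2 or not is_word(tokens[0], "pragma") or not is_word(tokens[1], "data"):
        raise SyntaxError("expected 'pragma data'")
    ranges, i = [], 2
    while True:
        part = tokens[i:i + 3]
        if [t[0] for t in part] != ["NUMBER", "PUNCT", "NUMBER"] or part[1][1] != "-":
            raise SyntaxError(f"expected constant '-' constant at token {i}")
        ranges.append((int(part[0][1], 0), int(part[2][1], 0)))
        i += 3
        if i == len(tokens):
            return ranges
        if tokens[i] != ("PUNCT", ","):
            raise SyntaxError(f"expected ',' at token {i}")
        i += 1


print(parse_pragma_data(tokenize("pragma  data    0x20-0x6F,0xA0-0xEF")))
# [(32, 111), (160, 239)]
```

Since the lexer never produces a keyword token, 'var byte data' still lexes 'data' as an IDENTIFIER; the cost is that every keyword test is scattered through the parser as a string comparison.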

Jim



> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Joep Suijs
> Sent: Thursday, April 15, 2010 1:02 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] identifier vs keyword issue
> 
> Hi guys,
> 
> 
> I am working on a parser for JAL, 'Just Another Language' for Microchip's
> PIC microcontrollers.
> 
> Within JAL you can define a variable like:
> 
>    var byte data
> 
> where 'data' is a valid identifier, matched by
> 
>    IDENTIFIER : LETTER (LETTER|'0'..'9')* ;
> 
>    fragment LETTER : 'A'..'Z' | 'a'..'z' | '_' ;
> 
> 
> Now I want to add support for the pragma statement, used to set up
> the microcontroller. An example (one of quite a few) of this is:
> 
>    pragma  data    0x20-0x6F,0xA0-0xEF
> 
> With:
> 
> pragma
>     : 'pragma'^ (
>         ( 'target' pragma_target )
>       | ( 'data' constant '-' constant (',' constant '-' constant)* )
>       )
>     ;
> 
> This works okay, but does break the parsing of identifiers like 'data'.
> 
> How can antlr (in general) handle keywords within a specific context,
> while retaining the possibility to use these keywords as identifiers
> in the general context?
> 
> Any advice is appreciated!
> 
> Joep
> 
> PS Amongst others, the full grammar (current state, not completed) is
> at http://code.google.com/p/jallib/source/browse/#svn/trunk/grammar
> 




