[antlr-interest] XPath - identifiers are keywords

mzukowski at yci.com mzukowski at yci.com
Wed Jul 3 06:40:58 PDT 2002


This is actually a difficult problem.  See
http://www.jguru.com/faq/view.jsp?EID=140 for background.  If only a few
keywords can also be identifiers, you can handle it in the parser: get rid
of your COMMENT lexer rule and test for the keyword in the parser.  Instead
of referencing NCNAME directly, you would have a parser rule ncname, e.g.

ncname: NCNAME | "comment";

Then where you have ambiguities you disambiguate with a syntactic predicate:

whatever:
	("comment" LPAREN RPAREN)=> commentRule
	| ncname
	;
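
To illustrate what the syntactic predicate buys you, here is a rough sketch
in Python rather than ANTLR.  It assumes tokens are (type, text) pairs and
that the lexer has already turned the word "comment" into a keyword token.
The names KW_COMMENT and parse_whatever are made up for this example and are
not part of ANTLR.

```python
def parse_whatever(tokens, pos=0):
    """Parse one 'whatever' starting at tokens[pos]; return (node, next_pos)."""
    # Predicate: peek at the next three token types without consuming them.
    upcoming = [kind for kind, _ in tokens[pos:pos + 3]]
    if upcoming == ["KW_COMMENT", "LPAREN", "RPAREN"]:
        # The keyword use: comment()
        return ("comment_test",), pos + 3

    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")

    # Otherwise fall back to a name, where the word "comment" is also allowed.
    kind, text = tokens[pos]
    if kind in ("NCNAME", "KW_COMMENT"):
        return ("name", text), pos + 1
    raise SyntaxError("unexpected token %r" % (tokens[pos],))
```

The lookahead is only paid for where the two readings can collide, which is
roughly what the predicate does in the grammar.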

You could also set up a middle stage between the lexer and the parser (a
token stream filter) to handle your approach of making "comment()" into
COMMENT LPAREN RPAREN while leaving a bare "comment" as NCNAME.  See
http://www.codetransform.com/filterexample.html.  But then you have to be
sure that "comment()" can never appear in a place where an NCNAME is
expected.  In other words, will "comment()" ALWAYS be COMMENT LPAREN
RPAREN?
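
As a rough sketch of that filter idea, written in Python rather than against
the ANTLR token stream classes: the lexer emits only NCNAME, LPAREN and
RPAREN, and a pass over the token list retypes "comment" when the next two
tokens are the parentheses.  The (type, text) tuple representation and the
function name retype_comment are assumptions made for this example.

```python
def retype_comment(tokens):
    """Return a new token list with NCNAME 'comment' retyped as COMMENT
    whenever it is immediately followed by LPAREN RPAREN."""
    buf = list(tokens)
    out = []
    for i, (kind, text) in enumerate(buf):
        # Types of the next two tokens (fewer near the end of input).
        following = [k for k, _ in buf[i + 1:i + 3]]
        if (kind == "NCNAME" and text == "comment"
                and following == ["LPAREN", "RPAREN"]):
            kind = "COMMENT"
        out.append((kind, text))
    return out
```

Note that this retypes every "comment" followed by "()" regardless of
context, which is exactly why the question above matters.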

Also, bob at werken.com has written an XPath parser.  I'm not sure whether
it is in ANTLR, but he knows his ANTLR well, so if it isn't then there's a
reason.
http://jaxen.org/faq.html

Monty


> -----Original Message-----
> From: John Merrells [mailto:merrells at sleepycat.com]
> Sent: Tuesday, July 02, 2002 3:29 PM
> To: antlr-interest
> Subject: [antlr-interest] XPath - identifiers are keywords
> 
> 
> 
> This is probably simple, but I've failed to find an explicit answer
> in the documentation or the FAQ. I have a language (XPath) and
> it has some keywords that can also be identifiers. How do I set
> up my lexer rules for this? The identifier is called NCNAME and
> is defined thus:
> 
> k=2;
> 
> NCNAME
>  : (LETTER|'_') (NCNAMECHAR)*
>  ;
> 
> protected
> NCNAMECHAR: LETTER|'0'..'9'|'.'|'-'|'_';
> 
> protected
> LETTER: ('a'..'z'|'A'..'Z');
> 
> The keyword is 'comment', and when it's a keyword it is always
> followed by '(' ')'. So I tried just adding...
> 
> COMMENT: "comment()";
> 
> But that's ambiguous, so I tried adding a lexer rule to disambiguate
> between them, but I either got that wrong, or it didn't work. Should
> I increase k to be length("comment()")? That seems like overkill.
> I'm sure there's something obvious that I'm missing here...
> 
> John
> 
> 
>  
> 
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/


 



