[antlr-interest] Matching a token from only one rule?

Piper, Martin Martin.Piper at qg.com
Fri Oct 1 11:45:28 PDT 2010


Tokens are decided by the lexer, without regard to how they are eventually used in parser rules.
You really can't have tokens defined by what other tokens are around them; that is the parser's job. So you can't have the lexer recognize a given string of characters as TOKEN1 in one portion of the input and TOKEN2 in another.
What are the rules for ID?
If ID is allowed the same characters as DECL, or a subset of them, ID will never be matched, because DECL will match that input first.
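
To make the shadowing concrete, here is a toy first-match lexer in Python. It is only a sketch of the idea, not ANTLR's actual lexing algorithm. The DECL pattern follows DECL_CHAR from the original question; the ID pattern is an assumption, since the real ID rule was not posted, and the name toy_lex is made up for this example.

```python
import re

# Toy model: rules are tried in order at each position, with no knowledge
# of what the parser expects next. Not ANTLR's real algorithm.
RULES = [
    ("DECL", re.compile(r"[^;=]+")),       # DECL_CHAR+ from the question
    ("ID", re.compile(r"[A-Za-z_]\w*")),   # assumed ID rule (hypothetical)
    ("SEMI", re.compile(r";")),
    ("EQUAL", re.compile(r"=")),
]


def toy_lex(text):
    """Return (token_name, text) pairs using first-match-wins rule order."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            # Defensive only: the rules above cover every character.
            raise ValueError(f"no rule matches {text[pos]!r} at {pos}")
    return tokens


print(toy_lex("count=total;"))
```

Every identifier comes back as DECL, on both sides of the '='. The ID rule is never reached, because everything it could match is already accepted by DECL.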

If they both allow the same characters, use a single token definition and let the parser rules decide how that token is used. If in the end you want different token names, you can use rewrite rules to make that happen.

elem
	: declaration
	| assignment
	;

// The rewrite needs output=AST and DECL declared as an imaginary token in tokens { ... }.
declaration
	: ID ';' -> DECL[$ID]
	;

assignment
	: ID '=' expr ';'
	;

Also, I'd recommend defining ';' and '=' as their own tokens.

SEMI: ';' ;
EQUAL: '=' ;
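
As a rough illustration of "one token, and let the rules decide", here is a small Python sketch. It is not ANTLR output. The ID pattern, the restriction of expr to a single ID, and the names lex, parse_elem and ASSIGN are all assumptions made for the example.

```python
import re

# One ID token plus SEMI and EQUAL; the lexer knows nothing about context.
TOKEN_RE = re.compile(r"\s*(?:(?P<ID>[A-Za-z_]\w*)|(?P<SEMI>;)|(?P<EQUAL>=))")


def lex(text):
    """Split text into (token_name, text) pairs, skipping whitespace."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_elem(tokens):
    """elem : ID SEMI | ID EQUAL expr SEMI, with expr assumed to be one ID."""
    kinds = [kind for kind, _ in tokens]
    if kinds == ["ID", "SEMI"]:
        # Same idea as the rewrite rule: the parser relabels the ID as DECL.
        return ("DECL", tokens[0][1])
    if kinds == ["ID", "EQUAL", "ID", "SEMI"]:
        return ("ASSIGN", tokens[0][1], tokens[2][1])
    raise ValueError(f"not a valid elem: {kinds}")


print(parse_elem(lex("x;")))
print(parse_elem(lex("x = y;")))
```

The same ID token ends up as a declaration or as the target of an assignment depending only on what follows it, which is a decision the parser can make and the lexer cannot.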



-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Ryan Twitchell
Sent: Monday, September 27, 2010 7:40 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Matching a token from only one rule?

 Hi all,

At the start of one parser rule I would like, as one alternative, to
match nearly any input ending before a certain character value.  I would
like this to match as a single token if possible.  I am not sure how to
achieve this, and have tried a number of things so far.  Here is my best
shot so far:

elem
    :    DECL ';'
    |    ID '=' expr ';'
    ;

DECL: (DECL_CHAR+ ';') => DECL_CHAR+
    ;

fragment
DECL_CHAR
    :    ~(';'|'=')
    ;

Working with the above, ANTLR reports that tokens such as ID can never
be matched, since DECL matches them already.  I had not thought this
would be the case with a syntactic predicate in front of the alternative.


So far, I have only had success by incorporating the end character into
the token, as follows.  But I believe this will lead to the token
matching in other, unexpected places.

DECL:  DECL_CHAR+ ';'
    ;

The important problem is that I don't want DECL to match at other parts
of the grammar. 

TIA for any advice,

Ryan Twitchell

