[antlr-interest] token-matching problem

DM donalmurtagh at yahoo.co.uk
Thu Jan 5 03:37:08 PST 2006


Hi,

I'm using ANTLR to process a file which consists of a series of nested blocks. Most of the blocks
have names that look like Java identifiers and are matched by the lexer rule:

ID :	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'/')*
;

However, one of the blocks looks like this:

SubscriptionManager:2
{  
}

Currently, the (simplified) parser rule I'm using to match this is:

subMgr : "SubscriptionManager"! ":"! "2"!
LBRACE!
RBRACE!
;

This rule doesn't add any nodes to the AST, but I need to change it in order to add
"SubscriptionManager:2" to the tree as a single token.
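
To make the goal concrete, the desired tokenization can be sketched in plain Java. This is a
hand-written illustration, not ANTLR-generated code: the class name SubMgrLexSketch, the Token
class and the token type names are all made up for the example, and the character classes are
copied from the ID rule above. It assumes that any identifier immediately followed by ":2"
should become a single SUB_MGR token.

```java
import java.util.ArrayList;
import java.util.List;

public class SubMgrLexSketch {
    // Hypothetical token: a type name plus the matched text.
    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    // First character of ID: 'a'..'z' | 'A'..'Z'
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Remaining characters of ID: letters, digits or '/'
    static boolean isIdPart(char c) {
        return isLetter(c) || (c >= '0' && c <= '9') || c == '/';
    }

    static List<Token> tokenize(String in) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++; // skip whitespace
            } else if (c == '{') {
                out.add(new Token("LBRACE", "{"));
                i++;
            } else if (c == '}') {
                out.add(new Token("RBRACE", "}"));
                i++;
            } else if (isLetter(c)) {
                int start = i;
                while (i < in.length() && isIdPart(in.charAt(i))) {
                    i++;
                }
                // Only the two characters after the name are inspected,
                // however long the name itself is.
                if (in.startsWith(":2", i)) {
                    i += 2;
                    out.add(new Token("SUB_MGR", in.substring(start, i)));
                } else {
                    out.add(new Token("ID", in.substring(start, i)));
                }
            } else {
                throw new IllegalArgumentException("unexpected char '" + c + "' at " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [SUB_MGR(SubscriptionManager:2), LBRACE({), RBRACE(})]
        System.out.println(tokenize("SubscriptionManager:2\r\n{  \r\n}"));
        // prints [ID(Other/Block1), LBRACE({), RBRACE(})]
        System.out.println(tokenize("Other/Block1 { }"));
    }
}
```

The point of the sketch is only the shape of the result: an ordinary block name stays an ID, while
"SubscriptionManager:2" comes out as one token that the parser rule could make the root of the
subtree.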

I tried modifying the ID lexer rule to be:

ID :	('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'/')* (":2")?
;

And changing the parser rule to:
	
subMgr : "SubscriptionManager:2"^
LBRACE!
RBRACE!
;

But this produced a NullPointerException in a subrule of subMgr (not shown in the simplified
form above).

I also tried defining a new lexer token type (after undoing the above changes):

SUB_MGR	: ID ':' '2'
;

And changing the parser rule to match a token of this type:
	
subMgr : SUB_MGR^
LBRACE!
RBRACE!
;

This produces the error message:

Exception in thread "main" line 515:15: expecting ':', found '\r'

Line 515 contains the first significant (i.e. neither whitespace nor comment) token in the file,
so it seems as though my new SUB_MGR rule is being used as some kind of default. I don't
understand why.

I guess I could just define a new lexer rule like

SUB_MGR : "SubscriptionManager:2"
;

And then match a token of this type in the parser, but I'd also have to increase the lookahead,
which is currently 3, to about 15, and I really don't want to do that.

Thanks in advance,
DM


		

