[antlr-interest] token-matching problem
DM
donalmurtagh at yahoo.co.uk
Thu Jan 5 03:37:08 PST 2006
Hi,
I'm using ANTLR to process a file which consists of a series of nested blocks. Most of the blocks
have names which look similar to Java identifiers and are matched by the lexer rule:
ID : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'/')*
;
However, one of the blocks looks like this:
SubscriptionManager:2
{
}
Currently, the (simplified) parser rule I'm using to match this is:
subMgr : "SubscriptionManager"! ":"! "2"!
         LBRACE!
         RBRACE!
       ;
This rule doesn't add any nodes to the AST, but I need to change it so that
"SubscriptionManager:2" is added to the tree as a single token.
I tried modifying the ID lexer rule to be:
ID : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'/')* (":2")?
;
And changing the parser rule to:
subMgr : "SubscriptionManager:2"^
         LBRACE!
         RBRACE!
       ;
But this produced a NullPointerException in a subrule of subMgr (not shown in the simplified form
above).
I also tried defining a new lexer token type (after undoing the above changes):
SUB_MGR : ID ':' '2'
;
And changing the parser rule to match a token of this type:
subMgr : SUB_MGR^
         LBRACE!
         RBRACE!
       ;
This produces the error message: "Exception in thread "main" line 515:15: expecting ':', found
'\r'"
Line 515 contains the first significant (i.e. neither whitespace nor comment) token in the file.
So it seems as though my new SUB_MGR rule is being used as some kind of default, and I don't
really understand why.
I guess I could just define a new lexer rule like:
SUB_MGR : "SubscriptionManager:2"
;
And then match a token of this type in the parser, but I'd also have to increase the lookahead,
which is currently 3, to about 15 - and I really don't want to do this.
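One more idea, which I have not tested: keep a single ID rule and let it change its own token
type when it sees the ":2" suffix, so the lexer never has to choose between two rules that start
the same way. This is only a sketch. It assumes ANTLR 2's $setType lexer action and a SUB_MGR
entry in a tokens section; MyLexer is a placeholder name, and where the tokens section belongs
depends on how the lexer and parser share their vocabulary:

```
class MyLexer extends Lexer;   // placeholder name, not my real lexer
options { k = 3; }             // same lookahead as I use now

tokens { SUB_MGR; }            // token type with no lexer rule of its own

ID  : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'/')*
      ( ":2" { $setType(SUB_MGR); } )?   // retag the token when the suffix is present
    ;
```

The parser rule would then match SUB_MGR^ as in my second attempt. Note that this would tag any
name followed by ":2" as SUB_MGR, not only "SubscriptionManager", so the token text would still
need to be checked somewhere if that distinction matters.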
Thanks in advance,
DM