[antlr-interest] Simple parsing question

John B. Brodie jbb at acm.org
Fri Sep 5 20:57:09 PDT 2008


Greetings!

On Friday 05 September 2008 10:58:35 pm George J. Shannon wrote:
> Attached is a snippet of the grammar in question, where tagCommentNbr is
> the integer value enclosed in brackets that I referred to in my email post.
> George
>
> tagCommentElement returns [ParserTagCommentElement pTagCommentElement]
> @init 	{
> 	pTagCommentElement = new ParserTagCommentElement(); //db not req'd
> 	}
> 	:
> 	tagCommentNbr (elementName)?
> 	{
> 	pTagCommentElement.tagCommentNbr = $tagCommentNbr.text;
> 	pTagCommentElement.elementName = $elementName.text;
> 	}
> 	;
>
> tagCommentNbr
> 	:
> 	'[' IntValue  ']'
> 	;
>
> elementName
> 	:
> 	'.' alphaN
> 	;
>
> IntValue
> 	:
> 	('0'..'9')+
> 	;
>

The above snippets from your Grammar are only semi-useful. It would be best 
to post the smallest, simplest, yet *COMPLETE* Grammar that exhibits the 
problem at hand. That way others can simply try the grammar in order to work 
out where the problem lies.

However I see a reference to a Parser Rule - alphaN - in your snippet above 
which leads me to speculate that you have utilized a '0' in that rule (or 
perhaps elsewhere). 

If you have used '0' in a Parser Rule, then a single 0 is a KEYWORD in your 
language, i.e. a separate Token that will be emitted by your Lexer.

Recall that ANTLR Lexers are greedy and will match the longest sequence 
possible. But when a given sequence matches more than one lexer rule, the 
rule that appears first wins.

So a sequence of "00" is greedily identified as an IntValue token. But a 
single "0" might match both IntValue and the '0' token from a Parser Rule 
(this is speculation on my part, based on your grammar snippet above, that 
alphaN might refer to a '0' inside). So now I postulate that you have two 
lexer rules that can match the single "0": the explicit IntValue and the 
implicit '0' parser reference. As it happens, the implicit tokens introduced 
by using quoted strings in the parser (e.g. the postulated '0') are 
considered to come first when breaking such a tie. So your tagCommentNbr, 
when given the string "[0]", sees the three tokens '[', <implicit '0'>, 
and ']' rather than '[', IntValue, and ']'.
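
To make the tie-break concrete, here is a minimal Java sketch of the rule as 
I have described it: longest match wins, and on equal length the earlier 
rule wins. It is a toy model only, not ANTLR's actual lexer, and it does not 
use the ANTLR runtime. The class LexerTieBreak, its Rule helper, and the two 
rule lists are hypothetical names made up for illustration, and the '0' 
literal is still only my guess about what alphaN contains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of "longest match, earlier rule wins ties". NOT ANTLR's real lexer. */
class LexerTieBreak {

    /** A token name paired with the regular expression it matches. */
    static final class Rule {
        final String name;
        final Pattern pattern;

        Rule(String name, String regex) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
        }
    }

    /** Assumed rule order: the postulated implicit '0' literal sits ahead of IntValue. */
    static List<Rule> withImplicitZero() {
        return Arrays.asList(
            new Rule("'['", "\\["),
            new Rule("']'", "\\]"),
            new Rule("'0'", "0"),
            new Rule("IntValue", "[0-9]+"));
    }

    /** The same rules when no parser rule mentions a '0' literal. */
    static List<Rule> withoutImplicitZero() {
        return Arrays.asList(
            new Rule("'['", "\\["),
            new Rule("']'", "\\]"),
            new Rule("IntValue", "[0-9]+"));
    }

    /** Returns the token names produced for the input under the given rule order. */
    static List<String> tokenize(String input, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLen = 0;
            for (Rule rule : rules) {
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                // lookingAt() anchors the match at pos; the strict '>' keeps the
                // earlier rule when two rules match the same number of characters.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestName = rule.name;
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            tokens.add(bestName);
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("[0]", withImplicitZero()));    // ['[', '0', ']']
        System.out.println(tokenize("[00]", withImplicitZero()));   // ['[', IntValue, ']']
        System.out.println(tokenize("[0]", withoutImplicitZero())); // ['[', IntValue, ']']
    }
}
```

In this model "[00]" still yields an IntValue because the longer match wins 
outright; only the single "0" falls into the tie that the literal wins.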

The mismatched token error message you are getting should be something 
like: "expecting IntValue, got '0'" or something similar.

Try making alphaN (and any other Parser Rule that involves single characters) 
into a Lexer Rule(s).

Hope this helps
   -jbb