[antlr-interest] charVocabulary problem solved. :)

ooobles oobles at hotmail.com
Tue May 28 21:48:53 PDT 2002

Hi Again..

I took Bogden's idea of using the = as part of the token.  Thanks, a 
great example of looking outside the square.  I ended up with a 
couple of rules, one of them like this:

EMAIL	:	'='! '<' ( options {greedy=false;} : . )* '>' ;

Running a few tests through the parser, I came across the following 
exception:

antlr.TokenStreamRecognitionException: expecting '>', found 'u'

on the following line:

smtp[1622]: 121 Statistics: duration=0.74 user=<bugtraq-return-4852-
stewart=websecure.com.au at securityfocus.com> id=H8zI sent=4571 
rcvd=347 srcif=eth1 src= cldst= 
svsrc= dstif=eth0 dst= op="To 1 
recips" arg=<F112nx5uGlshK8YeQVY0001437e at hotmail.com> result="250 OK" 
proto=smtp rule=12

I've tracked this down to the @ symbol in the 'user' value.  Can 
anyone explain why it should fail?  I have k=2.  The code antlr 
produced for the above rule follows.  The only spot I can see that 
this function could return is on the _tokenSet_1.member(LA(2)) call.

............................................ ding!

I just found my answer.  Amazing what thinking through a problem 
does.  I'll post it anyway for anyone else who comes across the 
same problem.

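This has to do with the charVocabulary option.  By default an ANTLR 
lexer's character vocabulary is just the set of characters referenced 
in the lexer grammar, and the wildcard (.) only matches characters in 
that vocabulary.  That is what _tokenSet_1 is: a set of valid 
characters.  Since @ doesn't appear in any of my rules, it wasn't in 
that set.  For the lexer to recognise the general eight-bit character 
set I need to add charVocabulary = '\3'..'\377'; to the lexer's 
options section.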
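
A minimal sketch of where the option sits in an ANTLR 2 lexer grammar. 
The class name LogLexer is a placeholder and is not taken from the 
original grammar; k = 2 and the EMAIL rule are as described in this 
post.

```
// Hypothetical lexer header; only the options block matters here.
class LogLexer extends Lexer;
options {
	k = 2;                           // lookahead depth mentioned above
	charVocabulary = '\3'..'\377';   // widen the vocabulary to 8-bit chars
}

// With the wider vocabulary the wildcard (.) can also consume
// characters such as @ that appear nowhere else in the grammar.
EMAIL	:	'='! '<' ( options {greedy=false;} : . )* '>' ;
```

Without the charVocabulary line the wildcard is limited to characters 
that the lexer rules mention explicitly, which would explain why a 
value containing @ was rejected.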

Just tested and it works.


