[antlr-interest] charVocabulary having no effect

Colm McHugh colmmagoo at yahoo.com
Mon Dec 13 11:43:40 PST 2004



Hi Andre,

My understanding (and experience) is that you will
get a lexer exception (the "unexpected char" error)
for any character that is not explicitly used to
define a token in your lexer. For example, try
defining the lower-case letter range of ID as
'a'..'y'; you should then get an exception if you
enter a 'z'.

The charVocabulary comes into play when you define a
token as _not_ being a certain character or set of
characters (the ~ operator); it then determines the
set of characters the token can match.

The classic case is a STRING token, the text of which
is often defined as "anything except the quote
character". What this really means is 'any
charVocabulary character except a quote'. If you
didn't specify a charVocabulary set, then your
charVocabulary would be the set of characters
explicitly used to define the tokens in your lexer.
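
As a minimal, untested sketch of the STRING case: the
lexer name StringLexer below is made up for
illustration, and the vocabulary range is simply the
one from your options. The ~ operator is the part
that consults the charVocabulary:

```antlr
class StringLexer extends Lexer;   // hypothetical lexer name
options {
    // Every character the lexer may see; ~ is computed against this set.
    charVocabulary = '\3'..'\377';
}

// "Anything except the quote character" really means:
// any character in '\3'..'\377' other than '"'.
STRING
    :   '"' ( ~'"' )* '"'
    ;
```

With this vocabulary, a '=' between the quotes should
be matched by ~'"' even though no rule mentions '='
explicitly. Without the charVocabulary line, the
complement would, as I understand it, be taken over
only the characters that appear in the grammar.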

Hope this helps,
Colm.

> I'm struggling a bit with charVocabulary. After getting lots of
> strange "unexpected character" errors I figured that this was a
> rather important option. I therefore added
>
>     charVocabulary = '\3'..'\377';
>
> to my lexer options.
> 
> But I'm still getting unexpected char errors. I have a fairly simple
> grammar with a non-greedy rule to match the contents of a specific
> portion. When the lexer encounters the char '=' in this portion it
> stops, saying "unexpected character". If I then add this:
> 
> POINTLESS : '=' ;
> 
> The error goes away, but then it stops on some other char. This
> continues until I've added all the chars not listed in some rule in
> the lexer. So to be sure, it seems I will have to explicitly list
> *all* the ASCII characters.
> 
> Grepping through the generated code I could not find a single
> reference to "charVocabulary" or "vocabulary". Is this option broken?
>
> I'm using ANTLR 2.7.4 on Linux (Mandrake 10.0) with Java 1.4.2.
> 
> The lexer definition from the grammar file:
> 
> class QuerySchemaLexer extends Lexer;
> options {
>     charVocabulary = '\3'..'\377';
>     caseSensitiveLiterals = false;
> }
> 
> RPAREN : ')';
> LPAREN : '(';
> COLON  : ':';
> SEMI   : ';';
> COMMA  : ',';
> 
> ID
> options {
>   testLiterals = true;
>   paraphrase = "an identifier";
> }
>     :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
>     ;
> 
> 
> WS  :   (   ' '
>         |   '\t'
>         |   '\r' '\n' { newline(); }
>         |   '\n'      { newline(); }
>         )
>         {$setType(Token.SKIP);} //ignore this token
>     ;