[antlr-interest] A couple of questions regarding literals and unicode

Terence Parr parrt at jguru.com
Fri Dec 6 13:51:32 PST 2002


On Friday, December 6, 2002, at 12:47  PM, davidjpenton2002 wrote:

> Greetings.  I am struggling a little with getting literals recognized.
> I seem to have problems getting non-alphabetic characters to be
> recognized in literals. For example:
>
> class P extends Parser;
>
> startRule
>   :  "<?xml" SOMETHING
>   ;
>
> class L extends Lexer;
> options
> {
>   charVocabulary='\003'..'\377';
> }
>
> SOMETHING : "abcd";
>
> The inclusion of the non-alphabetic characters "<?" in the literal
> seems to cause problems.

The literals in the parser are tested in the lexer, but you have to 
have a lexer rule that matches those characters.  No rule matches <?, 
so the lexer can never return that token.
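
As a rough sketch of one way around this (not the only one), the 
literal can be given its own lexer rule and referenced by name in the 
parser.  The rule names XMLDECL_OPEN and WS are made up for 
illustration, and the whitespace rule is an assumption about what the 
input looks like:

```antlr
class P extends Parser;

startRule
  :  XMLDECL_OPEN SOMETHING
  ;

class L extends Lexer;
options
{
  charVocabulary='\003'..'\377';
}

// Hypothetical rule name: a lexer rule that actually matches the
// characters "<?xml", so the lexer has a token to return for them.
XMLDECL_OPEN : "<?xml" ;

SOMETHING : "abcd" ;

// Assumed: whitespace may separate the tokens and should be skipped.
WS : ( ' ' | '\t' | '\r' | '\n' ) { $setType(Token.SKIP); } ;
```

With a rule like this in place, the lexer should be able to tokenize 
input such as "<?xml abcd" for startRule.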

>
> As you might guess, I am trying to parse some xml.  So this leads to a
> more general question. Does antlr handle unicode?  The info on the
> website does not seem to make it clear whether it does or not.

It does, and I'm thinking of making some quick enhancements before 
2.7.2 comes out.
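
A minimal sketch of what widening the lexer vocabulary might look 
like, assuming the '\uXXXX' character escape is accepted by the 
version in use.  The exact range is an illustrative choice, not a 
requirement:

```antlr
class L extends Lexer;
options
{
  // Assumed range: every 16-bit character except \uFFFF.
  charVocabulary='\u0000'..'\uFFFE';
}
```

The lexer also has to be fed characters that were decoded with the 
right encoding, or the wider vocabulary will not help.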

Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
Lecturer in Comp. Sci., University of San Francisco


 
