[antlr-interest] problem with unicode characters in comments within ANTLR .g files ...

Gavin Lambert antlr at mirality.co.nz
Sat May 24 05:13:41 PDT 2008


At 02:19 24/05/2008, Raymer David-fdr017 wrote:
>This fragment generates an exception ...
>
>O_SQUOTE     : '\u2018'; // '
>C_SQUOTE     : '\u2019'; // '
>DQUOTE       : '\"';
>O_DQUOTE     : '\u201C'; // "
>C_DQUOTE     : '\u201D'; // "
[...]
>The problem appears to be the non-\ encoded unicode
>characters. Is this behavior expected?

ANTLR grammar files can't contain any raw (non-ASCII) Unicode 
characters, comments included, since the grammar itself is still 
parsed with ANTLR v2, and v2 can't cope with Unicode.  Lexers 
generated by ANTLR v3 can recognise Unicode just fine; you just 
need to write those characters as \uXXXX escapes within the grammar.
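
If you want to clean up an existing .g file, something along these 
lines might help.  This is only a rough sketch in Python, not part of 
ANTLR: the function names are made up for illustration.  It rewrites 
every non-ASCII character as a \uXXXX escape, comments included.  The 
surrogate-pair handling for characters above U+FFFF is an assumption 
on my part, so check it against your target if you need those.

```python
def find_non_ascii(text):
    """Return (line, column, char) for each non-ASCII character (1-based)."""
    return [
        (lineno, col, ch)
        for lineno, line in enumerate(text.splitlines(), 1)
        for col, ch in enumerate(line, 1)
        if ord(ch) > 0x7F
    ]


def escape_non_ascii(text):
    """Replace each non-ASCII character with a \\uXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            # Assumption: characters beyond the BMP are written as a
            # UTF-16 surrogate pair, since \uXXXX holds only 4 hex digits.
            cp -= 0x10000
            out.append("\\u%04X\\u%04X" % (0xD800 + (cp >> 10),
                                           0xDC00 + (cp & 0x3FF)))
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical grammar line with a raw U+2018 in the comment.
    sample = "O_SQUOTE : '\\u2018'; // \u2018\n"
    for lineno, col, ch in find_non_ascii(sample):
        print("line %d, column %d: U+%04X" % (lineno, col, ord(ch)))
    print(escape_non_ascii(sample), end="")
```

Existing escapes in the grammar are already plain ASCII, so they are 
left alone; only the raw characters get rewritten.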