[antlr-interest] problem with unicode characters in comments within ANTLR .g files ...

Tue May 27 05:52:26 PDT 2008

Tried all the various combinations of \', etc, same errors. 

I've worked around it by dropping the comments for now.
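
A possible alternative that keeps the documentation is to name the code
points in plain ASCII instead. This is only a sketch, not tried against
this build, and it assumes the lexer objects only to the literal quote
characters inside the comments:

O_SQUOTE     : '\u2018'; // U+2018 LEFT SINGLE QUOTATION MARK
C_SQUOTE     : '\u2019'; // U+2019 RIGHT SINGLE QUOTATION MARK
DQUOTE       : '\"';
O_DQUOTE     : '\u201C'; // U+201C LEFT DOUBLE QUOTATION MARK
C_DQUOTE     : '\u201D'; // U+201D RIGHT DOUBLE QUOTATION MARK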

Dave Raymer
D: (817)-245-6834 	M:(817)501-2665 	ICBM: 32.9° N 97.2° W 
All that is necessary for the triumph of evil is for good men to do nothing
-- attributed to Edmund Burke 

-----Original Message-----
From: Terence Parr [mailto:parrt at cs.usfca.edu] 
Sent: Friday, May 23, 2008 1:47 PM
To: Raymer David-fdr017
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] problem with unicode characters in comments within ANTLR .g files ...

Hi. v3.1 gives you a better error message, though it still does not like
those characters. Basically, the comment scanner should look to the
end of line, but apparently it reuses a rule in the grammar lexer that
skips over strings and so on.  Try \' etc...
Ter
On May 23, 2008, at 7:19 AM, Raymer David-fdr017 wrote:

> Found what may be a problem ...
>
> The following ANTLR v3 fragment generates just fine ...
>
> O_SQUOTE     : '\u2018'; //
> C_SQUOTE     : '\u2019'; //
> O_DQUOTE     : '\u201C'; //
> C_DQUOTE     : '\u201D'; //
>
> This fragment generates an exception ...
>
> O_SQUOTE     : '\u2018'; // '
> C_SQUOTE     : '\u2019'; // '
> DQUOTE       : '\"';
> O_DQUOTE     : '\u201C'; // "
> C_DQUOTE     : '\u201D'; // "
>
> [09:04:26] BPL.g:456:24: unexpected char: ' '
>  at org.antlr.tool.ANTLRLexer.nextToken(ANTLRLexer.java:321)
>  at antlr.TokenStreamRewriteEngine.nextToken(TokenStreamRewriteEngine.java:161)
>  at antlr.TokenBuffer.fill(TokenBuffer.java:69)
>  at antlr.TokenBuffer.LA(TokenBuffer.java:80)
>  at antlr.LLkParser.LA(LLkParser.java:52)
>  at org.antlr.tool.ANTLRParser.altList(ANTLRParser.java:1453)
>  at org.antlr.tool.ANTLRParser.rule(ANTLRParser.java:1236)
>  at org.antlr.tool.ANTLRParser.rules(ANTLRParser.java:655)
>  at org.antlr.tool.ANTLRParser.grammar(ANTLRParser.java:389)
>  at org.antlr.tool.Grammar.setGrammarContent(Grammar.java:521)
>  at org.antlr.tool.Grammar.setGrammarContent(Grammar.java:497)
>  at org.antlr.works.grammar.EngineGrammar.createNewGrammar(Unknown Source)
>  at org.antlr.works.grammar.EngineGrammar.createParserGrammar(Unknown Source)
>  at org.antlr.works.grammar.EngineGrammar.createCombinedGrammar(Unknown Source)
>  at org.antlr.works.grammar.EngineGrammar.createGrammars(Unknown Source)
>  at org.antlr.works.grammar.EngineGrammar.analyze(Unknown Source)
>  at org.antlr.works.grammar.CheckGrammar.run(Unknown Source)
>  at java.lang.Thread.run(Unknown Source)
> The problem appears to be the non-\ encoded unicode characters.
> Is this behavior expected?
>
> Dave Raymer
> Motorola Labs, Network Research CoE, NIRL Autonomics and Policy  
> Research Group
> Software and System Architect for Autonomics and Policy Research
> Distinguished Member of the Technical Staff
> Distinguished Fellow of the TeleManagement Forum
> D: (817)-245-6834       M:(817)501-2665         ICBM: 32.9° N 97.2° W
> All that is necessary for the triumph of evil is for good men to do  
> nothing
> -- attributed to Edmund Burke 
>
>