[antlr-interest] problem with unicode characters in comments within ANTLR .g files ...
Terence Parr
parrt at cs.usfca.edu
Fri May 23 11:46:52 PDT 2008
Hi. v3.1 gives you a better error message, though it still does not like
those characters. Basically, the comment scanner should read to the
end of the line, but apparently it reuses a rule in the grammar lexer that
skips over strings and so on. Try \' etc...
Ter
On May 23, 2008, at 7:19 AM, Raymer David-fdr017 wrote:
> Found what may be a problem ...
>
> The following ANTLR v3 fragment generates just fine ...
>
> O_SQUOTE : '\u2018'; //
> C_SQUOTE : '\u2019'; //
> O_DQUOTE : '\u201C'; //
> C_DQUOTE : '\u201D'; //
>
> This fragment generates an exception ...
>
> O_SQUOTE : '\u2018'; // ‘
> C_SQUOTE : '\u2019'; // ’
> DQUOTE : '\"';
> O_DQUOTE : '\u201C'; // “
> C_DQUOTE : '\u201D'; // ”
>
> [09:04:26] BPL.g:456:24: unexpected char: ' '
> at org.antlr.tool.ANTLRLexer.nextToken(ANTLRLexer.java:321)
> at antlr.TokenStreamRewriteEngine.nextToken(TokenStreamRewriteEngine.java:161)
> at antlr.TokenBuffer.fill(TokenBuffer.java:69)
> at antlr.TokenBuffer.LA(TokenBuffer.java:80)
> at antlr.LLkParser.LA(LLkParser.java:52)
> at org.antlr.tool.ANTLRParser.altList(ANTLRParser.java:1453)
> at org.antlr.tool.ANTLRParser.rule(ANTLRParser.java:1236)
> at org.antlr.tool.ANTLRParser.rules(ANTLRParser.java:655)
> at org.antlr.tool.ANTLRParser.grammar(ANTLRParser.java:389)
> at org.antlr.tool.Grammar.setGrammarContent(Grammar.java:521)
> at org.antlr.tool.Grammar.setGrammarContent(Grammar.java:497)
> at org.antlr.works.grammar.EngineGrammar.createNewGrammar(Unknown Source)
> at org.antlr.works.grammar.EngineGrammar.createParserGrammar(Unknown Source)
> at org.antlr.works.grammar.EngineGrammar.createCombinedGrammar(Unknown Source)
> at org.antlr.works.grammar.EngineGrammar.createGrammars(Unknown Source)
> at org.antlr.works.grammar.EngineGrammar.analyze(Unknown Source)
> at org.antlr.works.grammar.CheckGrammar.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> The problem appears to be the Unicode characters in the comments that
> are not \u-escaped. Is this behavior expected?
>
> Dave Raymer
> Motorola Labs, Network Research CoE, NIRL Autonomics and Policy
> Research Group
> Software and System Architect for Autonomics and Policy Research
> Distinguished Member of the Technical Staff
> Distinguished Fellow of the TeleManagement Forum
> D: (817)-245-6834 M:(817)501-2665 ICBM: 32.9° N 97.2° W
> All that is necessary for the triumph of evil is for good men to do nothing
> -- attributed to Edmund Burke
>
>
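A minimal sketch of one workaround, assuming (as the two fragments in the report suggest) that the error is triggered only by the literal curly-quote characters inside the // comments and not by the rules themselves: keep the \u escapes in the rules and describe each character in plain ASCII in the comment. The comment wording below is illustrative (it uses the standard Unicode character names) and is not something ANTLR requires.

```antlr
// Same token rules as the failing fragment; only the comments changed.
// Each comment names its character in ASCII instead of embedding it.
O_SQUOTE : '\u2018'; // U+2018 LEFT SINGLE QUOTATION MARK
C_SQUOTE : '\u2019'; // U+2019 RIGHT SINGLE QUOTATION MARK
DQUOTE   : '\"';
O_DQUOTE : '\u201C'; // U+201C LEFT DOUBLE QUOTATION MARK
C_DQUOTE : '\u201D'; // U+201D RIGHT DOUBLE QUOTATION MARK
```

Per the reply above, v3.1 only improves the error message for this case, so keeping comments ASCII-only would presumably still be needed there.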