[antlr-interest] Re: C++ Parsers - charVocabulary option

therealtootalltimmy therealtootalltimmy at yahoo.com
Tue Jan 8 10:38:40 PST 2002


--- In antlr-interest at y..., Ric Klaren <klaren at c...> wrote:
> Hi,
> 
> On Mon, Jan 07, 2002 at 09:11:37PM -0000, therealtootalltimmy wrote:
> > I have a simple grammar that just handles comments.
> >
> > When I generate a Java parser and feed it a comment with a copyright symbol
> > in it, it works (does not complain about unexpected tokens).
> >
> > When I generate a C++ parser and feed it a comment with a copyright symbol
> > in it, it complains about an unexpected token.
> 
> Is your input file unicode? If so then you're unlucky.

Ric,
   Thanks a lot for replying to my question.  I failed to mention 
that 1) I am parsing ASCII input only and 2) I am running on 
Windows 2000.

Here is the grammar that I'm having problems with:

/*
header "post_include_hpp"
{
#include <iostream>
using namespace std;
}

options
{
   language="Cpp";             // Generate C++ Code
   namespaceAntlr="antlr";
}
*/

class MyParser extends Parser;

foo
   : (COMMENT)+
   ;

class MyLexer extends Lexer;
options {
   charVocabulary='\003'..'\377';
}

WS
   : ( ' ' | '\t' )+
   ;

COMMENT
   : '\'' (~('\n'|'\r'))* (NEWLINE)?
   ;

NEWLINE
   : ( '\n' | '\r' '\n' )
   ;

Uncommenting the C++-specific settings above lets me build a C++ parser.

Here is my input:

' © lll

When I run my C++ parser on this file, I get:

unexpected char: <a character that looks like an upper left corner of 
an ASCII box>

When I run my Java parser on this file, I get no output.

In the C++ parser's MyLexer::mCOMMENT method, when LA(1) returns the 
copyright symbol, the else branch of:

if ((_tokenSet_0.member(LA(1))))

is executed and the loop is exited.
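
One possible cause, which nothing in this message confirms, is that the copyright symbol (byte 0xA9 in Latin-1 / Windows-1252) becomes a negative value when it is widened from a signed char to an int. The sketch below only illustrates that effect. The function names and the inVocabulary range check are hypothetical stand-ins for the '\003'..'\377' vocabulary; they are not ANTLR's BitSet or the generated _tokenSet_0.

```cpp
// Hypothetical stand-in for a membership test over '\003'..'\377'.
// This is NOT ANTLR's BitSet::member, only a plain range check.
bool inVocabulary(int c) {
    return c >= 3 && c <= 255;
}

// Widen a byte the way a signed char is promoted: the sign bit is extended.
int widenSigned(signed char c) {
    return c;
}

// Widen a byte after reinterpreting it as unsigned: the result is 0..255.
int widenUnsigned(signed char c) {
    return static_cast<unsigned char>(c);
}

// The byte 0xA9 stored in a signed 8-bit char has the value -87.
const signed char kCopyrightByte = -87;
```

Under these assumptions, widenSigned(kCopyrightByte) yields -87, which fails the range check, while widenUnsigned(kCopyrightByte) yields 169, which passes. Whether the generated lexer actually takes the signed path depends on the compiler's char signedness and on how the ANTLR C++ runtime reads its input, neither of which is established here.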

Thanks again for your help.

Tim


 
