[antlr-interest] Parser accessing lexer rules info for error recovery

Thiago Silva thiago.silva at kdemail.net
Wed Dec 7 09:25:52 PST 2005


Uh-oh, I guess I caused a little confusion. My mistake!

Turns out that I can't naturally write a lexer rule T_DATATYPE, because it would
conflict with T_IDENTIFIER (which matches standard letters/numbers and so
on), producing all those nondeterminism warnings.

T_IDENTIFIER : T_LETTER ( T_LETTER | T_DIGIT )*;
T_DATATYPE : "foo" | "bar"; //show warnings 

So I moved the "foo" and "bar" datatypes to the tokens{} section. Now there is
no way to group those tokens in a lexer rule, since the tokens in the tokens{}
section (sadly) are not visible to the lexer rules:

tokens { T_FOO="foo"; T_BAR="bar"; }
T_DATATYPE : T_FOO | T_BAR; //error, T_FOO and T_BAR not declared


Later, I created a parser rule for datatype:

datatype : T_FOO | T_BAR; //T_FOO and T_BAR declared in the lexer's tokens{} section

But, in the error handling functions, there is no way to check whether a given
token is a datatype with a single call/comparison (i.e. call the generated
function datatype() and pass it a token to analyze).
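
A minimal sketch of the kind of single-call check meant here, written as a
plain C++ helper outside the grammar. The enum values are hypothetical
placeholders (in a real build the token types come from the header ANTLR
generates), and isDatatype() is a name made up for illustration, not something
ANTLR provides:

```cpp
#include <set>

// Hypothetical token type values, for illustration only. In a real project
// these constants would come from the generated token types header.
enum TokenType { T_IDENTIFIER = 4, T_FOO = 5, T_BAR = 6 };

// The only place where the members of the "datatype" group are listed.
// Catch blocks would call this instead of repeating the comparisons.
inline bool isDatatype(int tokenType) {
    static const std::set<int> datatypes = { T_FOO, T_BAR };
    return datatypes.count(tokenType) != 0;
}
```

A catch block could then test isDatatype(LA(1)). The list still duplicates the
datatype parser rule, but only in one place.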

So, my last alternative (the one that motivated me to write to the list) was 
to call testLiteralsTable() on my catch[] blocks. This wouldn't solve the 
problem in a clean way, but would help. But then, that doesn't seem to be 
possible (AFAIK there is no way to access this method from the parser).
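
For reference, the kind of lookup I had in mind can be sketched as a
standalone table that maps keyword text to a token type. This is only a
stand-in that lives outside the lexer, not ANTLR's own literals table; the
enum values and the lookupLiteral() name are assumptions made up for the
example:

```cpp
#include <map>
#include <string>

// Hypothetical token type values, for illustration only.
enum TokenType { T_IDENTIFIER = 4, T_FOO = 5, T_BAR = 6 };

// Returns the keyword token type for `text`, or `fallback` when the text
// is not one of the known keywords.
inline int lookupLiteral(const std::string& text, int fallback) {
    static const std::map<std::string, int> literals = {
        { "foo", T_FOO },
        { "bar", T_BAR }
    };
    std::map<std::string, int>::const_iterator it = literals.find(text);
    return it == literals.end() ? fallback : it->second;
}
```

In a catch block, comparing lookupLiteral(LT(1)->getText(), T_IDENTIFIER)
against T_IDENTIFIER would tell whether the offending token text is a keyword,
assuming the table is kept in sync with the tokens{} section by hand.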

I wonder now what kind of new features v3 will have.
Are there any docs describing that?

On Wednesday 07 December 2005 18:13, Peggy Fieland wrote:
> assuming you meant
>
>   class somelexer extends lexer ...
>
> in your parser you can say:
>   if (LA(1) == T_DATATYPE) ...
> or
>   if (LT(1).getText() == "FOO" ...
>
> If you actually have to check something against the keyword tokens you'll
> need to be a bit trickier.
>
> --- Thiago Silva <thiago.silva at kdemail.net> wrote:
> > Yeah, that's something I've already done. But, still, I have to keep the
> > function up to date. Not that this is hard (they are all simple rules, at
> > most), but it's not good having to replicate something I've already defined
> > elsewhere.
> >
> > I was wondering if it wouldn't be possible to access the lexer and ask it
> > if a given token matches a given rule. For me, it seems natural to be able
> > to do such a thing. But, so far, looking at the sources, it doesn't seem
> > possible.
> >
> > thanks for your reply,
> > Thiago
> >
> > On Wednesday 07 December 2005 16:50, you wrote:
> > > If you need to make the same test in multiple catch blocks, you can
> > > write a function to do it and just call it in all the catch blocks.
> > >
> > > --- Thiago Silva <thiago.silva at kdemail.net> wrote:
> > > > Hello,
> > > > I'm having a problem with parser error recovery. Sometimes I need lexer
> > > > rules info to proceed with the recovery. But I'm not so sure how to
> > > > proceed, after reading the manual, the generated sources and the antlr
> > > > sources.
> > > >
> > > > As a simple example:
> > > >
> > > > ----------------------
> > > > class SomeParser extends Parser
> > > >
> > > > somerule : (....);
> > > >
> > > > exception
> > > > catch[...] {
> > > >    //here I need to check, for instance, if LA(1) is a T_DATATYPE
> > > >    //or if LA(1) belongs to the token section (testLiteralsTable()?)
> > > > }
> > > >
> > > >
> > > > class SomeParser extends Lexer;
> > > >
> > > > T_DATATYPE : T_FOO | T_BAR;
> > > >
> > > > T_FOO: "foo";
> > > > T_BAR: "bar";
> > > > ----------------------
> > > >
> > > > Now, what I'm doing is checking the LA in the catch block, manually
> > > > writing (again) the members of T_DATATYPE:
> > > >
> > > > catch[..]
> > > >  if(la_token == T_FOO || la_token == T_BAR) {
> > > >    //print a warning message
> > > >  } else {
> > > >    //print a different warning message
> > > >  }
> > > >
> > > >
> > > > So, if T_DATATYPE changes, I would have to update all the catch[] blocks
> > > > that check for T_DATATYPE.
> > > >
> > > > Did I miss something in the docs? Or is this the
> > > > only way possible to do it?
> > > > By the way, I'm using C++.
> > > >
> > > > Thanks in advance,
> > > > Thiago

