[antlr-interest] Development of an XQuery parser with full-text extensions, project report

Johannes Luber jaluber at gmx.de
Tue Dec 25 10:23:29 PST 2007


Andreas Ravnestad wrote:
> Hi all,
> 
> This autumn a fellow student and myself have been working on an XQFT
> parser front end, putting ANTLR to the test as the designated parser
> generator. Here is the complete project report:
> http://folk.ntnu.no/andrerav/report.pdf
> 
> The report details a fair amount of ANTLR features (all the way from
> lexing to AST generation), as well as some limitations. I figured you
> guys might be interested in checking it out :)
> 
> -Andreas

I'd say that everyone doing more complex things with ANTLR should read
it, as it covers some interesting ground. I noticed that your
UnbufferedCommonTokenStream could be improved somewhat: fillBuffer(int k)
has some code duplication, which can be removed by applying the
loop-and-a-half pattern:

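protected void fillBuffer(int k)
{
   int no = 0; // number of on-channel tokens buffered so far
   while (true)
   {
      Token t = tokenSource.nextToken();
      // Exit in the middle of the loop once the input is exhausted.
      if (t == null || t.getType() == CharStream.EOF)
      {
         break;
      }
      t.setTokenIndex(tokenIndex);
      tokens.add(t);
      tokenIndex++;
      // Stop once k tokens on the requested channel have been buffered.
      if (t.getChannel() == channel && ++no == k)
      {
         p = skipOffTokenChannels(p);
         break;
      }
   }
}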

In case you or others aren't convinced that breaking out of the middle
of a loop is acceptable, I'd like to refer you to
<http://david.tribble.com/text/goto.html> and
<http://www.cis.temple.edu/~ingargio/cis71/software/roberts/documents/loopexit.txt>.
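To make the difference concrete outside of ANTLR, here is a minimal,
self-contained sketch. Everything in it is hypothetical and only for
illustration: the class LoopAndAHalf, the EOF marker string and the
next() helper merely stand in for a token source, and none of it is
taken from your report. It assumes k >= 1. The first method needs the
fetch step twice, the second needs it only once because the exit test
sits in the middle of the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LoopAndAHalf {
    // Hypothetical end-of-input marker, standing in for an EOF token.
    static final String EOF = "<EOF>";

    // Hypothetical stand-in for tokenSource.nextToken().
    static String next(Iterator<String> source) {
        return source.hasNext() ? source.next() : EOF;
    }

    // Priming-read version: the fetch step is written twice.
    static List<String> readDuplicated(Iterator<String> source, int k) {
        List<String> out = new ArrayList<>();
        String t = next(source);              // first copy of the fetch
        while (!EOF.equals(t)) {
            out.add(t);
            if (out.size() == k) {
                break;                        // collected enough
            }
            t = next(source);                 // second copy of the fetch
        }
        return out;
    }

    // Loop-and-a-half version: one fetch, exit tests inside the body.
    static List<String> readLoopAndAHalf(Iterator<String> source, int k) {
        List<String> out = new ArrayList<>();
        while (true) {
            String t = next(source);          // the only fetch
            if (EOF.equals(t)) {
                break;                        // input exhausted
            }
            out.add(t);
            if (out.size() == k) {
                break;                        // collected enough
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        System.out.println(readDuplicated(input.iterator(), 2));
        System.out.println(readLoopAndAHalf(input.iterator(), 2));
    }
}
```

Both methods return the same list and consume the same number of items
from the source, so the second form is a drop-in replacement that simply
avoids repeating the fetch.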

Regarding the lexer: how did you determine exactly which situations
have to be distinguished with a state? Would it be possible to change
the lexer so that it re-lexes the input after the state changes? That
seems easier than reducing the token lookahead (which may nonetheless
cross borders via syntactic predicates). FYI, in ANTLR 3.1 all automatic
recovery is supposed to be removed; at least that is where it should
happen. A glance into Lexer.java, however, tells me that nextToken()
still has the same unfortunate behaviour, with no added throws clause.
Maybe Ter hasn't gotten to it yet.

Johannes



