[antlr-interest] Dumping out lexer token stream?

Sat Jun 23 09:35:05 PDT 2007

On 6/24/07, Randall R Schulz <rschulz at sonic.net> wrote:
> On Saturday 23 June 2007 01:28, Wincent Colaiuta wrote:
> > El 23/6/2007, a las 3:41, Cameron Esfahani escribió:
> > > To help with my debugging, I would like to see the tokenized output
> > > from the lexer.  Before the parser gets a chance at, well, parsing
> > > it.
> > >
> > > I can't seem to find anything in ANTLRWorks which will do this.
> > > Does anyone have any suggestions?
> > >
> > > Cameron Esfahani
> > > dirty at apple.com
> >
> > Normally the lexer is invoked automatically by the parser, which
> > repeatedly calls the "next token" method/function. So you can do the
> > same and watch the token stream that way. For example, in the C
> > target, something like the following (assuming your lexer is in the
> > variable "lexer"):
>
> Oddly enough, I wanted to do exactly the same right now when I've only
> written the lexical portion of my grammar.
>
> I wrote this test code (use a fixed-width font, of course):
>
>   CLIFLexer           lexer       = null;
>   PrintStream         out         = System.out;
>
>   try {
>     lexer = new CLIFLexer(new ANTLRFileStream(fileName));
>   }
>
>   catch (IOException exIO) {
>     System.err.printf("CLIF: Cannot open file \"%s\"%n", fileName);
>     return;
>   }
>
>
>   out.format("%nParsing \"%s\"%n", fileName);
>
>   TokenStream         tokens      = new CommonTokenStream(lexer);
>   int                 nTokens     = tokens.size();
>
>   for (int iToken = 0; iToken < nTokens; iToken++) {
>     Token             token       = tokens.get(iToken);
>
>     out.format("%6d: %4d.%3d: T%3d-C%3d; \"%s\"%n",
>                iToken,
>                token.getLine(), token.getCharPositionInLine(),
>                token.getType(), token.getChannel(),
>                token.getText());
>   }
>
>
> When I apply this to a file with lots of source code that matches the
> lexical grammar I've defined, I always get an nTokens value of 0.
>
> The JavaDoc comment on CommonTokenStream implies that it will scan the
> entire input and build a sequence of tokens in advance, yet that does
> not seem to be happening.
>
> And no exception is thrown (unless the file name is not valid).
>
>
> What am I missing / doing wrong?
>
The size() method doesn't trigger the filling of the token buffer, so it
returns 0 until something else has filled it. Calling LT() (or one of the
other methods that fill the buffer) before checking the size should fill
the entire buffer and make your code work correctly. Alternatively, call
getTokens() to fill the buffer and then operate on the returned reference
to it.

Perhaps size() could be altered either to fill the buffer if it hasn't
been filled yet or to return -1 in that case. Calling size() first is an
obvious pattern to use, and without looking at the source it isn't clear
what's going on.
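
To make the ordering issue concrete, here is a minimal sketch of a lazily
filled buffer. LazyTokenBuffer and TokenDumpDemo are hypothetical stand-ins
written for this mail, not the ANTLR classes; they only model the behaviour
described above (size() never fills, getTokens() and LT() fill on demand),
and the sample lexemes are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Small driver: check the size before and after forcing the buffer to fill.
class TokenDumpDemo {
    // Made-up lexemes standing in for real lexer output.
    static LazyTokenBuffer sample() {
        return new LazyTokenBuffer(Arrays.asList("(", "forall", "x", ")"));
    }

    public static void main(String[] args) {
        LazyTokenBuffer stream = sample();
        System.out.println("size before fill: " + stream.size());

        List<String> all = stream.getTokens();   // forces the fill
        System.out.println("size after fill:  " + stream.size());

        for (int i = 0; i < all.size(); i++) {
            System.out.printf("%6d: \"%s\"%n", i, all.get(i));
        }
    }
}

// Hypothetical model of a lazily filled token stream; NOT the ANTLR class.
class LazyTokenBuffer {
    private final Iterator<String> source;               // plays the lexer
    private final List<String> tokens = new ArrayList<>();
    private boolean filled = false;

    LazyTokenBuffer(List<String> lexemes) {
        this.source = lexemes.iterator();
    }

    // Drains the source into the buffer the first time it is called.
    private void fillBuffer() {
        if (filled) return;
        while (source.hasNext()) tokens.add(source.next());
        filled = true;
    }

    // Reports the buffer as it currently is, without filling it.
    int size() { return tokens.size(); }

    // Fills the buffer on demand, then returns a reference to it.
    List<String> getTokens() { fillBuffer(); return tokens; }

    // 1-based lookahead that fills on demand; null when k is out of range.
    String LT(int k) {
        fillBuffer();
        return (k >= 1 && k <= tokens.size()) ? tokens.get(k - 1) : null;
    }
}
```

In the test code quoted above, the corresponding change would be to declare
the variable as a CommonTokenStream and call getTokens() (or LT(1)) before
asking for the size; I haven't run that against your lexer, so treat it as
a suggestion rather than a verified fix.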

>
> > ...
> >
> > Cheers,
> > Wincent
>
>
> Randall Schulz
>

Tom.