[antlr-interest] Accessing input stream object with ANTLR and C++?

Ric Klaren klaren at cs.utwente.nl
Thu Aug 29 05:26:10 PDT 2002


Hi,

On Tue, Aug 27, 2002 at 02:47:22PM -0600, Reid Rivenburgh wrote:
> So there's no good way to do my own reading of the input a la flex?

Well, you have to go about it differently than in flex.

> The multiplexing idea you suggested looks interesting and sort of
> helps solve my problem, but in a sense it just pushes the problem down
> the line.

I don't quite see this.

> With flex, I was able to do my own reading of binary data using yyin after
> finding a token in the grammar.  Your suggestion seems to still imply that
> all of the input must match tokens in the grammar, whichever tokenstream
> they may be coming in on, but this extra data is outside the grammar in my
> (possibly wrong!) way of thinking about it.

I seem to have given the wrong impression. You can safely ignore parts of the
input (have a look at the preserveWhiteSpace example). Even a multiplexed
lexer need not return a token when it is switched to or from.

> Perhaps some trick of finding a token, marking the location, finding the
> next token, and processing the data between the two as a string...?

Well, this is what you can do with tokenstream multiplexing. You see the
token, you switch to another lexer until the end marker, then you switch
back. Inside the special lexer for the part between the markers you can do
whatever you want with the input, e.g. accumulate it in a string, just
ignore it, or feed it to something else.
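
The switching described above can be sketched in plain C++ without the ANTLR
runtime. Everything below (SharedInput, Stream, Selector, MainLexer, RawLexer,
tokenize, and the "<<" / ">>" markers) is a hypothetical stand-in chosen for
illustration, not the ANTLR API; it only shows the control flow of seeing a
marker, handing over to a second lexer, and switching back at the end marker.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are the real ANTLR classes.
struct SharedInput {                  // the single input both lexers read from
    std::string text;
    std::size_t pos = 0;
    bool atEnd() const { return pos >= text.size(); }
    bool lookingAt(const std::string& m) const { return text.compare(pos, m.size(), m) == 0; }
};

struct Token { std::string type; std::string text; };

struct Stream {                       // plays the role of a token stream
    virtual ~Stream() = default;
    virtual Token next() = 0;
};

struct Selector : Stream {            // forwards to whichever lexer is active
    Stream* current = nullptr;
    Token next() override { assert(current != nullptr); return current->next(); }
};

// Lexer for the part between the markers: accumulates the raw data in a string.
struct RawLexer : Stream {
    std::shared_ptr<SharedInput> in;
    Selector& sel;
    Stream* back = nullptr;           // lexer to return to after the end marker
    RawLexer(std::shared_ptr<SharedInput> i, Selector& s) : in(std::move(i)), sel(s) {}
    Token next() override {
        std::string data;
        while (!in->atEnd() && !in->lookingAt(">>")) data += in->text[in->pos++];
        if (!in->atEnd()) in->pos += 2;   // skip the end marker
        sel.current = back;               // switch back to the main lexer
        return {"RAW", data};
    }
};

// Ordinary lexer: whitespace-separated words, plus the "<<" start marker.
struct MainLexer : Stream {
    std::shared_ptr<SharedInput> in;
    Selector& sel;
    Stream* raw = nullptr;
    MainLexer(std::shared_ptr<SharedInput> i, Selector& s) : in(std::move(i)), sel(s) {}
    static bool space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    Token next() override {
        while (!in->atEnd() && space(in->text[in->pos])) ++in->pos;
        if (in->atEnd()) return {"EOF", ""};
        if (in->lookingAt("<<")) {
            in->pos += 2;
            sel.current = raw;            // switch lexers; the marker yields no token
            return sel.next();
        }
        std::string word;
        while (!in->atEnd() && !space(in->text[in->pos])) word += in->text[in->pos++];
        return {"WORD", word};
    }
};

// Drives the selector until end of input and collects the tokens.
std::vector<Token> tokenize(const std::string& text) {
    auto in = std::make_shared<SharedInput>();
    in->text = text;
    Selector sel;
    MainLexer mainLexer(in, sel);
    RawLexer rawLexer(in, sel);
    mainLexer.raw = &rawLexer;
    rawLexer.back = &mainLexer;
    sel.current = &mainLexer;
    std::vector<Token> out;
    for (Token t = sel.next(); t.type != "EOF"; t = sel.next()) out.push_back(t);
    return out;
}
```

The consumer only ever talks to the selector, so it does not need to know
which lexer produced a given token. In this sketch the raw lexer returns the
accumulated data as one token, but it could just as well discard it.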

You can also make a custom lexer: just subclass the TokenStream class
and multiplex it like any other lexer. That way you can hand-code something
for performance.

The one thing you have to keep in mind is that all the lexers operating on
the input should share the same LexerInputState (or LexerSharedInputState,
to be precise). This structure keeps track of the InputBuffer that is
attached to your input data, of whether the lexer is currently backtracking
to resolve an ambiguity, and of the line/column information.
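
Why the sharing matters can be shown with a small self-contained sketch. The
InputState and Reader types, peek(), and makeState() below are hypothetical
simplifications, not the real LexerSharedInputState or CharScanner; the sketch
only assumes that the state bundles the buffer, the position, the line/column
counters and a backtracking flag, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the role the shared input state plays; not the real class.
struct InputState {
    std::string buffer;      // stands in for the InputBuffer
    std::size_t pos = 0;     // next character to be read
    int line = 1;
    int column = 1;
    int guessing = 0;        // > 0 while a lexer is speculating (backtracking)
};

// Minimal reader; every lexer would wrap one of these around the SAME state.
class Reader {
public:
    explicit Reader(std::shared_ptr<InputState> s) : state_(std::move(s)) {}

    // Look at the current character without consuming it; '\0' at end of input.
    char peek() const {
        return state_->pos < state_->buffer.size() ? state_->buffer[state_->pos] : '\0';
    }

    // Consume one character and keep line/column up to date.
    void consume() {
        assert(state_->pos < state_->buffer.size());
        if (state_->buffer[state_->pos] == '\n') {
            ++state_->line;
            state_->column = 1;
        } else {
            ++state_->column;
        }
        ++state_->pos;
    }

    const InputState& state() const { return *state_; }

private:
    std::shared_ptr<InputState> state_;
};

// Helper that wraps a string in a fresh state object.
std::shared_ptr<InputState> makeState(const std::string& text) {
    auto s = std::make_shared<InputState>();
    s->buffer = text;
    return s;
}
```

Two Reader objects built from the same makeState() result see each other's
progress: after one consumes "ab\n", the other peeks at the next line and
reports line 2. A Reader with its own private state would start again at the
first character, which is the kind of mismatch to expect when a lexer does not
share the state. In the ANTLR 2 C++ runtime the corresponding step should be
constructing the second lexer from the first lexer's getInputState(), but check
that against the runtime headers.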

Going around the lexer input state will, of course, yield funny and probably
very 'interesting' behaviour.

Maybe have a look at the doxygen info of the C++ support library. You can
find a preliminary version on my antlr hacking page:

http://wwwhome.cs.utwente.nl/~klaren/antlr/

Or read through the code to see how the lexers work; keywords: InputBuffer,
LexerSharedInputState, CharScanner (consume/LA). Read the code generated
for a few lexers (preferably a few that use backtracking). That way you'll
quickly get a feel for how it works.

HTH,

Ric
--
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- klaren at cs.utwente.nl ----- +31 53 4893722  ----
-----+++++*****************************************************+++++++++-------
     "Never argue with an idiot, for they will bring you down to their
              level and beat you with experience." --- Unknown


 
