[antlr-interest] passing stuff from lexer to parser

Tue Jan 1 21:34:11 PST 2008

On Jan 2, 2008 3:48 PM,  <siemsen at ucar.edu> wrote:
> Gavin,
>
> Thanks, that makes perfect sense.  It's certainly better than what I
> was trying to do with a HashMap.  I think I'm thinking about this
> more clearly now.
>
> I understand the idea, but I can't seem to implement it.  I have a
> "PragmaInclude" lexer rule that reads each include statement and
> switches the input stream to the new file.  It works.  I'd like to do
> what you suggest, and access the PragmaInclude token in the parser,
> so the parser can see the file name.  The odd thing is that the lexer
> doesn't seem to generate a PragmaInclude token.
>
> Attached is the grammar.  In it, the "compilerDirective" parser rule
> uses the PragmaInclude token.  I couldn't get compilerDirective to
> "fire" while parsing.  I discovered that I could comment the
> compilerDirective rule completely and the translator would still
> behave the same.  It seems to me that the lexer never creates a
> PragmaInclude token, even though the PragmaInclude definitely executes.
>
> What am I missing?
The call to Lexer.reset() clears the token information that the
PragmaInclude rule has set up. In fact setCharStream() itself calls
reset(), so your explicit call is redundant and removing it won't solve
the issue (the extra call additionally seeks the new input stream to 0,
but that shouldn't be needed). Rather than calling setCharStream() you
could assign to input directly and not call reset(), though this is not
really advisable, as future versions of ANTLR could easily break it
(I think 3.1 will).
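
To make the reset problem concrete, here is a toy model in Java. It is
not ANTLR's code: ToyLexer, its fields and pragmaIncludeRule() are made
up for illustration, and the only behaviour it assumes is the one
described above, namely that a reset wipes whatever the current rule
has recorded about its token. The callReset flag stands for any path
that ends in a reset, including one reached through a stream switch.

```java
// Toy model only. None of these names belong to ANTLR.
class ToyLexer {
    static final int INVALID = 0;
    static final int PRAGMA_INCLUDE = 1;

    int type = INVALID;   // type assigned by the rule that is currently matching
    String text = null;   // text recorded for the token being built
    String input;         // stands in for the character stream

    ToyLexer(String input) {
        this.input = input;
    }

    /** Wipes the per-token state. */
    void reset() {
        type = INVALID;
        text = null;
    }

    /** Mimics a rule that matches an include and then switches input. */
    void pragmaIncludeRule(String newInput, boolean callReset) {
        type = PRAGMA_INCLUDE;   // the rule records what it matched
        text = input;
        input = newInput;        // switch to the included file
        if (callReset) {
            reset();             // the information recorded above is gone
        }
    }

    /** Type the token would carry once the rule has finished. */
    int emittedType() {
        return type;
    }

    public static void main(String[] args) {
        ToyLexer withReset = new ToyLexer("include text");
        withReset.pragmaIncludeRule("included file text", true);
        ToyLexer withoutReset = new ToyLexer("include text");
        withoutReset.pragmaIncludeRule("included file text", false);
        System.out.println("with reset:    " + withReset.emittedType());
        System.out.println("without reset: " + withoutReset.emittedType());
    }
}
```

In the model the rule body runs either way, which matches what you
observed: the rule executes, but the parser never sees a token of the
expected type.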
Your design seems somewhat strange. Can the top-level file also
contain normal statements, or only includes? Where does the output for
normal statements go? Can the included files themselves contain
includes, and if so, what happens with the output for them?
It looks like you're processing a list of input files that are each
handled separately, not a single file with includes. In that case I
think Gavin's suggestion of separately processing each file is better.
Then your top-level grammar would just handle the include syntax, and
you would end up with a list linking include file names to ASTs or
templates or whatever the result of processing each include is.
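
A rough sketch of that arrangement in plain Java, with no ANTLR
involved. Everything here is an assumption made for illustration: the
IncludeIndex class, the #pragma include ("...") syntax, the .mof file
names, and the processFile callback, which stands in for running a
separate lexer and parser over one included file and returning its
AST, template or other result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IncludeIndex {
    // Assumed include syntax: #pragma include ("path"). Adjust to the real grammar.
    private static final Pattern INCLUDE =
        Pattern.compile("#pragma\\s+include\\s*\\(\\s*\"([^\"]+)\"\\s*\\)");

    /** Returns the included file names in the order they appear. */
    static List<String> findIncludes(String topLevelText) {
        List<String> names = new ArrayList<>();
        Matcher m = INCLUDE.matcher(topLevelText);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** Processes each included file on its own and links its name to the result. */
    static <R> Map<String, R> processEach(String topLevelText,
                                          Function<String, R> processFile) {
        Map<String, R> results = new LinkedHashMap<>();
        for (String name : findIncludes(topLevelText)) {
            results.put(name, processFile.apply(name));
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical top-level file contents.
        String top = "#pragma include (\"Database/CIM_DatabaseResourceStatistics.mof\")\n"
                   + "#pragma include (\"Core/CIM_ManagedElement.mof\")\n";
        // Stand-in for real per-file processing: keep only the directory part.
        Map<String, String> dirs = processEach(top, name -> {
            int slash = name.lastIndexOf('/');
            return slash < 0 ? "" : name.substring(0, slash);
        });
        System.out.println(dirs);
    }
}
```

Because each file is handled by its own call, the file name is known
at the point where the output is produced, so nothing has to be passed
from the lexer to the parser.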

Tom.
>
> -- Pete
>
>
>
>
> On Jan 1, 2008, at 3:13 PM, Gavin Lambert wrote:
>
> > At 10:02 2/01/2008, siemsen at ucar.edu wrote:
> >> To handle the include statements, I use the mechanism described in
> >> the ANTLR Wiki page titled "How do I implement include files?".
> >> It works great.  It does its magic during the lexer phase.  So all
> >> the source files are lexed first into one big token stream, then
> >> the parser starts.
> >>
> >> Currently, my translator just emits output to standard out, as one
> >> text stream.  Now I'm ready to make it put the output into
> >> directories and files.  The source text is a set of things with
> >> names like CIM_DatabaseResourceStatistics, so I know what to name
> >> each output file.  I just need to know what directory to put each
> >> output file in.
> >
> >> During the lexer phase, I store the name-to-directory information
> >> in a HashMap.  So for example, the HashMap tells me that the
> >> output file named CIM_DatabaseResourceStatistics.java belongs in
> >> the output subdirectory named "Database".
> >>
> >> I need to pass the HashMap from the lexer to the parser.  Is there
> >> a good way to do it?  Am I thinking about the problem correctly?
> >
> > Probably the easiest way to do this is to pass an INCLUDE token up
> > to the parser that contains the full filename, and let the parser
> > reconstruct the HashMap itself.  Or you could use it in a scope
> > instead, since presumably everything else is logically contained
> > within one or more INCLUDEs.
> >
>
>
>