[antlr-interest] Q: how to incorporate a preprocessor in the flow?

Wed Apr 6 08:12:03 PDT 2011

The usual way is to write a pre-processor that just sends the processed
source out to the parser with file and line number stamps that you store
in a table and cross-reference with the tokens. Less complex
pre-processors, such as C#'s, are handled within the lexer. Look at the way
the C pre-processor works for an example. When the pre-processor gets
complicated, it is probably better as a separate phase in the tool
chain unless there is a severe performance penalty.

Jim
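Below is a minimal Python sketch of the line-table idea Jim describes. It is not ANTLR code. It assumes a preprocessor that handles only a hypothetical `include "file" directive written on a line of its own, and it holds the source files in a dict for the example. Each emitted line gets one (file, line) entry, so a line number reported against the preprocessed text (in ANTLR, Token.getLine()) can be mapped back to the original file. The helper names are illustrative, and recursive includes are not detected.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical directive syntax for this sketch: `include "name" alone on a line.
INCLUDE = re.compile(r'^\s*`include\s+"([^"]+)"\s*$')


def preprocess(files: Dict[str, str], name: str,
               out: List[str], line_map: List[Tuple[str, int]]) -> None:
    """Append the expanded lines of `name` to `out` and record, for every
    emitted line, the (file, line) it came from in `line_map`."""
    for lineno, text in enumerate(files[name].splitlines(), start=1):
        m = INCLUDE.match(text)
        if m:
            # Splice the included file in place; its lines carry their own stamps.
            preprocess(files, m.group(1), out, line_map)
        else:
            out.append(text)
            line_map.append((name, lineno))


def original_position(line_map: List[Tuple[str, int]], out_line: int) -> Tuple[str, int]:
    """Map a 1-based line number in the preprocessed text back to its source."""
    return line_map[out_line - 1]


files = {
    "top.sv": 'module top;\n`include "ports.sv"\nendmodule\n',
    "ports.sv": "input clk;\ninput rst;\n",
}
out: List[str] = []
line_map: List[Tuple[str, int]] = []
preprocess(files, "top.sv", out, line_map)
processed = "\n".join(out)                      # this text is what the lexer sees
print(original_position(line_map, 3))           # ('ports.sv', 2)
```

The table is built while the preprocessor writes its output, so the lexer and parser never need to know about it. Only error reporting consults it.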

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Phil Ratzloff
> Sent: Wednesday, April 06, 2011 6:33 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Q: how to incorporate a preprocessor in the flow?
>
> This seems like a useful feature to have. Is it reasonable to consider
> making this easier in antlr4?
>
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of A Z
> Sent: Tuesday, April 05, 2011 2:16 AM
> To: Martin d'Anjou
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Q: how to incorporate a preprocessor in the flow?
>
> I tried that approach when I first started with ANTLR but had
> difficulty handling arbitrary token rearrangement. Early on I couldn't
> figure out how to backtrack in the token stream in order to detect
> identifier construction using macros. Something like the following
> requires that 'prefix' be lexed again after macro substitution in order
> to detect whether the text produced by suffix and 'prefix' will be
> merged into one identifier.
>
> define suffix(name) name
> prefix`suffix
>
> We use this often in RTL for bus port lists. Even though the spec seems
> to explicitly disallow this, Modelsim and DC will accept it. Lexing
> twice solves this case easily, but then the tokens point to a
> nonexistent source.
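Here is a small Python sketch of the "lex twice" approach and its drawback, under several assumptions. Macros are expanded at the text level before lexing. Only single-argument calls written as `name(arg) are handled. The argument _bus is a hypothetical value added to A Z's example. After expansion, re-lexing yields the single identifier prefix_bus, but its offset refers to the expanded text, which exists in no source file.

```python
import re

# Hypothetical macro table for this sketch: name -> (single parameter, body).
MACROS = {"suffix": ("name", "name")}          # `define suffix(name) name

CALL = re.compile(r"`([A-Za-z_]\w*)\(([^()]*)\)")   # `name(arg), no nesting
TOKEN = re.compile(r"\s*(?:(?P<MACRO>`[A-Za-z_]\w*)"
                   r"|(?P<ID>[A-Za-z_][\w$]*)|(?P<PUNCT>\S))")


def lex(text):
    """Return (kind, text, offset) tuples; offsets index into `text`."""
    return [(m.lastgroup, m.group(m.lastgroup), m.start(m.lastgroup))
            for m in TOKEN.finditer(text)]


def expand_text(text):
    """Naive text-level expansion of `name(arg) calls, done before lexing."""
    def replace(m):
        param, body = MACROS[m.group(1)]
        arg = m.group(2).strip()
        # A function replacement keeps backslashes in `arg` literal.
        return re.sub(rf"\b{re.escape(param)}\b", lambda _: arg, body)
    return CALL.sub(replace, text)


source = "prefix`suffix(_bus)"
print(lex(source))               # prefix and `suffix are separate tokens
expanded = expand_text(source)   # 'prefix_bus'
print(lex(expanded))             # [('ID', 'prefix_bus', 0)]: one identifier,
                                 # but offset 0 is in `expanded`, not `source`
```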
>
>
> On Mon, Apr 4, 2011 at 8:59 PM, Martin d'Anjou <point14 at magma.ca> wrote:
>
> > Hi,
> >
> > Thanks to both of you for sharing your approaches. Right now I am
> > pondering how to alter the sequence of tokens before they hit the
> > parser. Intuitively I want to have three processing units (lexer,
> > preprocessor, parser) connected together through I/O pipes of tokens
> > (e.g. token FIFOs), but this is not how ANTLR was architected (it's
> > how I would have done it in hardware, though!).
> >
> > Martin
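Martin's three-unit pipeline can be approximated in software with pull-based stages. The Python sketch below makes these assumptions: only object-like `define macros (no arguments, no `ifdef), a toy lexer, and a stand-in "parser" that just collects token text. Each stage is a generator that pulls tokens from the one before it, so the chain behaves like token FIFOs drained on demand. In ANTLR 3 the analogous hook is the TokenSource interface, whose nextToken() the token stream calls. That is where Sam plugs in his preprocessor.

```python
import re
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List, Optional, Tuple

Token = Tuple[str, str]  # (kind, text)

# Newlines are kept as tokens because they terminate a `define body.
TOKEN = re.compile(r"[ \t]*(?:(?P<DEFINE>`define\b)|(?P<MACRO>`\w+)"
                   r"|(?P<WORD>\w+)|(?P<NL>\n)|(?P<PUNCT>\S))")


def lexer(text: str) -> Iterator[Token]:
    """Stage 1: characters in, tokens out."""
    for m in TOKEN.finditer(text):
        yield (m.lastgroup, m.group(m.lastgroup))


def preprocessor(tokens: Iterable[Token]) -> Iterator[Token]:
    """Stage 2: tokens in, macro-expanded tokens out (object-like macros only)."""
    source = iter(tokens)
    macros: Dict[str, List[Token]] = {}
    pending: Deque[Token] = deque()   # expansion output waiting to be rescanned

    def next_token() -> Optional[Token]:
        return pending.popleft() if pending else next(source, None)

    while (tok := next_token()) is not None:
        kind, text = tok
        if kind == "DEFINE":
            _, name = next_token()    # the macro name
            body = []
            while (t := next_token()) is not None and t[0] != "NL":
                body.append(t)
            macros[name] = body
        elif kind == "MACRO":
            # Push the body back so macros inside it are expanded too.
            pending.extendleft(reversed(macros[text[1:]]))
        elif kind != "NL":
            yield tok


def parser(tokens: Iterable[Token]) -> List[str]:
    """Stage 3: stand-in for a real parser; it just collects token text."""
    return [text for _, text in tokens]


src = "`define x 3\n`define y `x\nassign a = `y;\n"
print(parser(preprocessor(lexer(src))))   # ['assign', 'a', '=', '3', ';']
```

Unlike a hardware FIFO, nothing runs until the last stage asks for a token. An undefined macro raises KeyError here, and a self-referencing macro would loop forever. A real preprocessor has to handle both.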
> >
> >
> >
> > On 11-04-04 09:25 AM, Sam Harwell wrote:
> >
> >> I used a hand-crafted implementation of TokenSource between the lexer
> >> and parser. In the preprocessor, whenever I manipulated a token I
> >> used a new token class derived from CommonToken (call it
> >> SubstitutedToken) which contained a linked list leading from the
> >> effective position in the stream (stored in CommonToken) all the way
> >> back to the original location (file and position) of the token
> >> definition. When a CommonToken substitution occurs, the linked list
> >> has one node containing the original source position where defined.
> >> Whenever a SubstitutedToken substitution occurs, a new node for the
> >> token's previous effective position is added to the linked list and
> >> that new head pointer is stored in the new token.
> >>
> >> `define x 3
> >> `define y `x
> >> `y
> >>
> >> In this case, token `y is eventually replaced with a SubstitutedToken
> >> which appears at (line 3, column 1, length 1, text "3") containing
> >> the following linked list:
> >>
> >> Line 3, column 1, length 2 (list head, the location where `y was
> >>   substituted with `x)
> >> Line 2, column 11, length 2 (the location where `x was substituted
> >>   with '3')
> >> Line 1, column 11, length 1 (the actual source location where the
> >>   token '3' is defined)
> >>
> >> This list allows true relative ordering of all tokens in the
> >> processed source: when two tokens appear to be at the same location
> >> in the preprocessed stream, you simply compare the positions of the
> >> first node in the position list.
> >>
> >> Sam
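Sam's bookkeeping can be sketched in Python as follows. This is an approximation, not his code. It uses an immutable tuple of positions (head first) instead of a shared linked list. It tracks line, column and length but not the file name. It also assumes every macro-body token comes straight from a `define line. Replaying his `x/`y example reproduces the three-node list above. The sort key compares the chains node by node, so tokens that share an effective position are ordered by the first node where they differ. The second macro (`define z a b) is a hypothetical addition.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Pos:
    line: int
    column: int
    length: int


@dataclass(frozen=True)
class Tok:
    text: str
    trail: Tuple[Pos, ...]     # trail[0] is the effective position in the stream

    @property
    def effective(self) -> Pos:
        return self.trail[0]


def source_token(text: str, line: int, column: int) -> Tok:
    """A token read directly from the source: a one-node trail."""
    return Tok(text, (Pos(line, column, len(text)),))


def substitute(ref: Tok, body_tok: Tok) -> Tok:
    """Replace macro reference `ref` with one token of the macro body.
    The result sits at ref's effective position and keeps every step:
    ref's own trail followed by where the body token was defined
    (body tokens are assumed to be unsubstituted tokens from a `define line)."""
    return Tok(body_tok.text, ref.trail + body_tok.trail)


def stream_key(tok: Tok) -> Tuple[Tuple[int, int], ...]:
    """Order tokens by comparing (line, column) node by node, head first."""
    return tuple((p.line, p.column) for p in tok.trail)


# Sam's example:  line 1: `define x 3   line 2: `define y `x   line 3: `y
y_ref = source_token("`y", 3, 1)
step = substitute(y_ref, source_token("`x", 2, 11))     # `y -> `x
final = substitute(step, source_token("3", 1, 11))      # `x -> 3
for p in final.trail:
    print(p)

# Two tokens from one expansion share an effective position; the next
# node breaks the tie (hypothetical macro: `define z a b on line 2).
z_ref = source_token("`z", 3, 1)
b = substitute(z_ref, source_token("b", 2, 13))
a = substitute(z_ref, source_token("a", 2, 11))
print([t.text for t in sorted([b, a], key=stream_key)])  # ['a', 'b']
```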
> >>
> >> -----Original Message-----
> >> From: antlr-interest-bounces at antlr.org
> >> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of A Z
> >> Sent: Monday, April 04, 2011 12:13 AM
> >> To: Martin d'Anjou
> >> Cc: antlr-interest at antlr.org
> >> Subject: Re: [antlr-interest] Q: how to incorporate a preprocessor in the flow?
> >>
> >> Hi Martin,
> >>
> >> I just completed an SV preprocessor which can parse UVM 1.0
> >> successfully. After 2 revisions I settled on a completely separate
> >> preprocessor (lexer and parser). As you saw, you need to tokenize the
> >> macro_text in order to easily support macros with arguments and
> >> detect the three escaped tokens `", `\`" and ``. I'm not sure how
> >> well a lexer-only approach could handle cases where a macro
> >> substitution can merge text with a previously lexed token. The
> >> separate approach still has flaws, such as poor error reporting.
> >> Of course, I could be missing an obvious, easy solution.
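To show why tokenizing the macro text helps, here is a small Python sketch that expands one macro call. It assumes the escape semantics as I understand them from the `define section of IEEE 1800 (SystemVerilog):

- `` joins its neighbours with no whitespace.
- `" produces a " that still allows argument substitution.
- `\`" produces \".
- Arguments are not substituted inside ordinary string literals.

Nested macro calls, default arguments and line continuations are left out, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Alternatives are tried left to right, so the longest escape comes first.
MACRO_TEXT = re.compile(
    r'(?P<ESC_QUOTE>`\\`")'              # `\`"  -> \"
    r'|(?P<QUOTE>`")'                    # `"    -> "  (arguments still substituted)
    r'|(?P<PASTE>``)'                    # ``    -> nothing (token pasting)
    r'|(?P<STRING>"(?:[^"\\\n]|\\.)*")'  # ordinary string literal, copied as-is
    r'|(?P<ID>[A-Za-z_]\w*)'             # identifier, possibly a formal parameter
    r'|(?P<OTHER>[^`"A-Za-z_]+|[`"])'    # anything else, copied as-is
)


def tokenize(body: str) -> List[Tuple[str, str]]:
    """Split macro text into (kind, text) pieces."""
    return [(m.lastgroup, m.group()) for m in MACRO_TEXT.finditer(body)]


def expand(body: str, params: List[str], args: List[str]) -> str:
    """Expand one call: substitute arguments, then apply the three escapes."""
    actual: Dict[str, str] = dict(zip(params, args))
    out = []
    for kind, text in tokenize(body):
        if kind == "ID":
            out.append(actual.get(text, text))
        elif kind == "QUOTE":
            out.append('"')
        elif kind == "ESC_QUOTE":
            out.append('\\"')
        elif kind == "PASTE":
            continue                     # neighbours end up glued together
        else:
            out.append(text)
    return "".join(out)


print(expand("x``_suffix", ["x"], ["foo"]))                  # foo_suffix
print(expand('`"x`"', ["x"], ["foo"]))                       # "foo"
print(expand('"x"', ["x"], ["foo"]))                         # "x"
print(expand(r'`"x says `\`"hi`\`"`"', ["x"], ["bob"]))     # "bob says \"hi\""
```

The paste case relies on tokenizing. Because x and _suffix are separate tokens in the macro text, the parameter is found and replaced before the pieces are glued together. A plain character-level search-and-replace, by contrast, would also hit an x inside an identifier such as max.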
> >>
> >>
> >>
> >> On Sun, Apr 3, 2011 at 9:51 PM, Martin d'Anjou <point14 at magma.ca> wrote:
> >>
> >>> Hello,
> >>>
> >>> I am trying to find a way to incorporate a preprocessor in the ANTLR
> >>> flow. I thought of doing this before the lexer, but I need to
> >>> tokenize the incoming char stream for macro substitution to be easy.
> >>> I thought of doing it between the lexer and the parser, replacing
> >>> the preprocessor tokens with their expansion before feeding the
> >>> token stream to the parser, so I guess I would end up using
> >>> something like the TokenRewriteStream? Can someone steer me in the
> >>> right direction, please? Or should I be using lexer rule actions? If
> >>> so, is there an example of how to access the token stream, or the
> >>> replacement token list of an identifier? Sorry for all the questions.
> >>>
> >>> The language I am hoping to tokenize is SystemVerilog and has C-like
> >>> preprocessor macros (`include, `ifdef, `define NAME(params,...),
> >>> token concatenation, etc.).
> >>>
> >>> Regards,
> >>> Martin
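For the conditional-compilation part of the question, one option is a token filter between the lexer and the parser. The Python sketch below makes these assumptions: the lexer delivers directives such as `ifdef as single tokens, the macro name is the next token, and the set of defined names is known up front. `define and `elsif are not handled. A stack records whether each open conditional's current branch is taken, and a token passes through only when every enclosing branch is taken. As far as I understand ANTLR 3's TokenRewriteStream, it records its edits and applies them only when the stream is rendered back to text. A filter like this one, wrapped as a TokenSource, is therefore closer to what a parser needs.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Token = Tuple[str, str]  # (kind, text), a hypothetical token shape


def conditional_filter(tokens: Iterable[Token], defined: Set[str]) -> Iterator[Token]:
    """Yield only tokens from taken `ifdef/`ifndef/`else branches."""
    it = iter(tokens)
    taken: List[bool] = []               # one entry per open conditional
    for kind, text in it:
        if text in ("`ifdef", "`ifndef"):
            _, name = next(it)           # the macro name after the directive
            is_defined = name in defined
            taken.append(is_defined if text == "`ifdef" else not is_defined)
        elif text == "`else":
            taken[-1] = not taken[-1]
        elif text == "`endif":
            taken.pop()
        elif all(taken):                 # every enclosing branch is taken
            yield (kind, text)
    if taken:
        raise ValueError("missing `endif")


toks = [("ID", "a"), ("DIR", "`ifdef"), ("ID", "FAST"), ("ID", "b"),
        ("DIR", "`else"), ("ID", "c"), ("DIR", "`endif"), ("ID", "d")]
print([t for _, t in conditional_filter(toks, {"FAST"})])  # ['a', 'b', 'd']
print([t for _, t in conditional_filter(toks, set())])     # ['a', 'c', 'd']
```

Checking all(taken) rather than only the innermost entry keeps a nested `else from re-enabling tokens inside an outer branch that is switched off.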
> >>>
> >>>
> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>> Unsubscribe:
> >>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address