[antlr-interest] Q: how to incorporate a preprocessor in the flow?

Mon Apr 4 06:25:03 PDT 2011

I used a hand-written TokenSource implementation that sits between the lexer
and the parser. Whenever the preprocessor manipulated a token, it emitted an
instance of a new token class derived from CommonToken (call it
SubstitutedToken). Besides the effective position in the stream (stored in
the CommonToken fields), this token carries a linked list leading all the
way back to the original location (file and position) where the token was
defined. When a plain CommonToken is substituted, the list has a single node
holding the original source position where the token was defined. When a
SubstitutedToken is itself substituted, a new node for the token's previous
effective position is added at the head of the list, and that new head
pointer is stored in the new token.

`define x 3
`define y `x
`y

In this case, the token `y is eventually replaced with a SubstitutedToken
that appears at (line 3, column 1, length 1, text "3") and contains the
following linked list:

Line 3, column 1, length 2
    (list head: the location where `y was substituted with `x)
Line 2, column 11, length 2
    (the location where `x was substituted with '3')
Line 1, column 11, length 1
    (the actual source location where the token '3' is defined)
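
For anyone who wants to try the idea, here is a rough Java sketch of the
position chain. It is not my actual code: Pos, Tok and substitutedAt are
made-up names, and Tok is a plain-Java stand-in for a class derived from
CommonToken so that the sketch compiles without the ANTLR runtime. It also
assumes a single file, so the file part of each position is left out. The
effective position lives on the token and the list holds the earlier
positions, newest first.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: plain-Java stand-ins, not ANTLR's CommonToken API. */
final class PositionChainSketch {

    /** One node of the position list; next points toward the definition. */
    static final class Pos {
        final int line, column, length;
        final Pos next;

        Pos(int line, int column, int length, Pos next) {
            this.line = line;
            this.column = column;
            this.length = length;
            this.next = next;
        }
    }

    /** Hypothetical stand-in for a token class derived from CommonToken. */
    static final class Tok {
        final String text;
        final Pos at;       // effective position in the preprocessed stream
        final Pos history;  // earlier positions, newest first; null if plain

        Tok(String text, Pos at, Pos history) {
            this.text = text;
            this.at = at;
            this.history = history;
        }

        /** Re-emit this token at a macro use site, keeping its old position. */
        Tok substitutedAt(int line, int column, int length) {
            Pos previous = new Pos(at.line, at.column, at.length, history);
            return new Tok(text, new Pos(line, column, length, null), previous);
        }

        /** Effective position first, then back to the original definition. */
        List<String> trace() {
            List<String> out = new ArrayList<>();
            out.add(describe(at));
            for (Pos p = history; p != null; p = p.next) {
                out.add(describe(p));
            }
            return out;
        }

        private static String describe(Pos p) {
            return "Line " + p.line + ", column " + p.column
                    + ", length " + p.length;
        }
    }

    public static void main(String[] args) {
        // The '3' in "`define x 3", then the uses of `x and `y.
        Tok three = new Tok("3", new Pos(1, 11, 1, null), null);
        Tok viaX = three.substitutedAt(2, 11, 2);
        Tok viaY = viaX.substitutedAt(3, 1, 2);
        viaY.trace().forEach(System.out::println);
    }
}
```

Running main prints the three positions from the example, newest first.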

This list allows true relative ordering of all tokens in the processed
source: when two tokens appear to be at the same location in the
preprocessed stream, you simply compare the positions held in the first node
of each token's position list.
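
A rough sketch of that comparison in Java, under a few assumptions: Pos,
Tok and STREAM_ORDER are made-up names, Tok is a plain-Java stand-in rather
than a CommonToken subclass, and everything is in one file so positions are
just line and column. The demo uses a hypothetical macro "`define pair a b"
on line 1 that is used at line 2, column 1. Walking further down the list
when the first nodes tie is my guess at how nested macros would be handled,
not something the description above spells out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch only: ordering tokens that share an effective position. */
final class StreamOrderSketch {

    /** One position node; next points toward the original definition. */
    static final class Pos {
        final int line, column;
        final Pos next;

        Pos(int line, int column, Pos next) {
            this.line = line;
            this.column = column;
            this.next = next;
        }
    }

    /** Hypothetical stand-in for a substituted token. */
    static final class Tok {
        final String text;
        final Pos at;       // effective position in the preprocessed stream
        final Pos history;  // earlier positions, newest first

        Tok(String text, Pos at, Pos history) {
            this.text = text;
            this.at = at;
            this.history = history;
        }
    }

    static int comparePos(Pos a, Pos b) {
        return a.line != b.line
                ? Integer.compare(a.line, b.line)
                : Integer.compare(a.column, b.column);
    }

    /** Effective position first, then the position lists node by node. */
    static final Comparator<Tok> STREAM_ORDER = (a, b) -> {
        int c = comparePos(a.at, b.at);
        Pos pa = a.history;
        Pos pb = b.history;
        while (c == 0 && pa != null && pb != null) {
            c = comparePos(pa, pb);
            pa = pa.next;
            pb = pb.next;
        }
        return c;
    };

    public static void main(String[] args) {
        // Both tokens land at line 2, column 1 after expansion.
        Tok b = new Tok("b", new Pos(2, 1, null), new Pos(1, 16, null));
        Tok a = new Tok("a", new Pos(2, 1, null), new Pos(1, 14, null));
        List<Tok> toks = new ArrayList<>(Arrays.asList(b, a));
        toks.sort(STREAM_ORDER);
        for (Tok t : toks) {
            System.out.println(t.text);  // prints a, then b
        }
    }
}
```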

Sam

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of A Z
Sent: Monday, April 04, 2011 12:13 AM
To: Martin d'Anjou
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Q: how to incorporate a preprocessor in the
flow?

Hi Martin,

  I just completed an SV preprocessor which can parse UVM 1.0 successfully.
After 2 revisions I settled on a completely separate preprocessor (lexer and
parser). As you saw, you need to tokenize the macro_text in order to easily
support macros with arguments and detect the three escaped tokens `", `\`"
and ``. I'm not sure how well a lexer-only approach could handle cases where
a macro substitution can merge text with a previously lexed token. The
separate approach still has flaws, such as making good error reporting
difficult. Of course, I could be missing an obvious easy solution.
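
To illustrate only the escape detection, here is a minimal Java sketch that
splits macro text into plain runs and the three escaped tokens. It is not
the preprocessor described above: MacroTextScanSketch and split are made-up
names, and a real tokenizer would also have to recognize identifiers,
arguments, strings and comments.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: separates `", `\`" and `` from the rest of macro text. */
final class MacroTextScanSketch {

    // `\`" has to be recognized at its first backquote; otherwise its
    // trailing `" would be picked up as a separate escape.
    private static final String[] ESCAPES = { "`\\`\"", "``", "`\"" };

    static List<String> split(String macroText) {
        List<String> parts = new ArrayList<>();
        StringBuilder plain = new StringBuilder();
        int i = 0;
        while (i < macroText.length()) {
            String hit = null;
            for (String e : ESCAPES) {
                if (macroText.startsWith(e, i)) {
                    hit = e;
                    break;
                }
            }
            if (hit == null) {
                // Ordinary character, including a lone backquote.
                plain.append(macroText.charAt(i++));
                continue;
            }
            if (plain.length() > 0) {
                parts.add(plain.toString());
                plain.setLength(0);
            }
            parts.add(hit);
            i += hit.length();
        }
        if (plain.length() > 0) {
            parts.add(plain.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Macro text: `"x`\`"y`"
        for (String part : split("`\"x`\\`\"y`\"")) {
            System.out.println(part);
        }
    }
}
```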

On Sun, Apr 3, 2011 at 9:51 PM, Martin d'Anjou <point14 at magma.ca> wrote:

> Hello,
>
> I am trying to find a way to incorporate a preprocessor in the ANTLR 
> flow. I thought of doing this before the lexer, but I need to tokenize 
> the incoming char stream for macro substitution to be easy. I thought 
> of doing it between the lexer and the parser, and replace the 
> preprocessor tokens with their expansion before feeding the token 
> stream to the parser, so I guess I would end up using something like 
> the TokenRewriteStream??? Can someone steer me in the right direction 
> please? Or should I be using lexer rule actions? In which case, any 
> example on how to access the token stream of the replacement token 
> list of an identifier? Too many questions sorry.
>
> The language I am hoping to tokenize is SystemVerilog and has C-like 
> preprocessor macros (`include, `ifdef, `define NAME(params,...), token 
> concatenation, etc.).
>
> Regards,
> Martin
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
