[antlr-interest] Q: how to incorporate a preprocessor in the flow?

Wed Apr 6 06:32:56 PDT 2011

This seems like a useful feature to have. Is it reasonable to consider making this easier in ANTLR 4?

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of A Z
Sent: Tuesday, April 05, 2011 2:16 AM
To: Martin d'Anjou
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Q: how to incorporate a preprocessor in the flow?

I tried that approach when I first started with ANTLR but had difficulty
handling arbitrary token rearrangement. Early on I couldn't figure out how
to backtrack in the token stream in order to detect identifier construction
using macros. Something like the following requires that 'prefix' be lexed
again after macro substitution in order to detect if the string from suffix
and 'prefix' will be merged into one identifier.

`define suffix(name) name
prefix`suffix

We use this often in RTL for bus port lists. Even though the spec seems to
explicitly disallow this, ModelSim and DC will accept it. Lexing twice
solves this case easily, but the tokens then point into expanded text that
does not exist in any source file.
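
To make the difference concrete, here is a minimal sketch in Python rather
than ANTLR. Everything in it is an assumption made for illustration: the
regex lexer, the MACROS table, and a simplified argument-less macro `suffix
that expands to _data. It compares substituting token by token with expanding
the text first and lexing it again.

```python
import re

# Hypothetical macro table: name -> replacement text (no arguments, for brevity).
MACROS = {"suffix": "_data"}

TOKEN_RE = re.compile(
    r"`(?P<macro>[A-Za-z_]\w*)|(?P<ident>[A-Za-z_]\w*)|(?P<other>\S)"
)

def lex(text):
    """Yield (kind, text, offset) for each token; whitespace is skipped."""
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        yield kind, m.group(kind), m.start()

def substitute_tokens(text):
    """Single pass: replace each macro token with its body, token by token."""
    out = []
    for kind, value, _ in lex(text):
        out.append(MACROS[value] if kind == "macro" else value)
    return out

def expand_then_relex(text):
    """Two passes: expand macros in the text, then lex the expanded text again."""
    expanded = re.sub(r"`([A-Za-z_]\w*)", lambda m: MACROS[m.group(1)], text)
    return expanded, [(value, offset) for _, value, offset in lex(expanded)]

source = "prefix`suffix"
print(substitute_tokens(source))   # 'prefix' and '_data' stay separate tokens
print(expand_then_relex(source))   # a single identifier 'prefix_data'
```

The offsets returned by the second function index into the expanded string,
not the original file, which is exactly the source-location problem described
above. An undefined macro raises KeyError in this sketch.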

On Mon, Apr 4, 2011 at 8:59 PM, Martin d'Anjou <point14 at magma.ca> wrote:

> Hi,
>
> Thanks to both of you for sharing your approaches. Right now I am pondering
> how to alter the sequence of tokens before they hit the parser. Intuitively
> I want to have three processing units (lexer, pre-processor, parser)
> connected together through io pipes of tokens (e.g. token fifos), but this
> is not how ANTLR was architected (it's how I would have done it in hardware
> though!).
>
> Martin
>
>
>
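The three-unit arrangement Martin describes can be sketched with Python
generators, where each stage pulls tokens from the one before it. This is
only an illustration of the idea, not ANTLR's API: the stage functions, the
regex, and the macro table are all hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List

def lexer(text: str) -> Iterator[str]:
    """Stage 1: split text into macro references, words, and punctuation."""
    yield from re.findall(r"`\w+|\w+|\S", text)

def preprocessor(tokens: Iterable[str],
                 macros: Dict[str, List[str]]) -> Iterator[str]:
    """Stage 2: replace each known macro reference with its replacement tokens."""
    for tok in tokens:
        if tok.startswith("`") and tok[1:] in macros:
            # Feed the replacement back through this stage so that
            # macros used inside other macros are expanded too.
            yield from preprocessor(macros[tok[1:]], macros)
        else:
            yield tok

def parser(tokens: Iterable[str]) -> List[str]:
    """Stage 3: stand-in for a real parser; it only collects what it receives."""
    return list(tokens)

# Hypothetical macro table: y expands to `x, which expands to 3.
macros = {"x": ["3"], "y": ["`x"]}
result = parser(preprocessor(lexer("assign a = `y ;"), macros))
print(result)
```

Because generators are lazy, the parser stage drives the whole chain one
token at a time, much like a FIFO between units. A macro that refers to
itself would recurse without end here; a real preprocessor has to reject or
guard against that.
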
> On 11-04-04 09:25 AM, Sam Harwell wrote:
>
>> I used a hand-crafted implementation of TokenSource between the lexer and
>> parser. In the preprocessor, whenever I manipulated a token I used a new
>> token class derived from CommonToken (call it SubstitutedToken) which
>> contained a linked list leading from the effective position in the stream
>> (stored in CommonToken) all the way back to the original location (file
>> and position) of the token definition. When a CommonToken substitution
>> occurs, the linked list has one node containing the original source
>> position where defined. Whenever a SubstitutedToken substitution occurs,
>> a new node for the token's previous effective position is added to the
>> linked list and that new head pointer is stored in the new token.
>>
>> `define x 3
>> `define y `x
>> `y
>>
>> In this case, token `y is eventually replaced with a SubstitutedToken which
>> appears at (line 2, column 1, length 1, text "3") containing the following
>> linked list:
>>
>> Line 3, column 1, length 2 (list head, the location where `y was
>>   substituted with `x)
>> Line 2, column 11, length 2 (the location where `x was substituted with '3')
>> Line 1, column 11, length 1 (the actual source location where the token '3'
>>   is defined)
>>
>> This list allows true relative ordering of all tokens in the processed
>> source: when two tokens appear to be at the same location in the
>> preprocessed stream, you simply compare the positions of the first node in
>> the position list.
>>
>> Sam
>>
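One possible reading of Sam's scheme, sketched in Python instead of the ANTLR
Java runtime. The Token class, substitute(), and stream_order() are
hypothetical names, and the chain layout is an assumption chosen to reproduce
the three-entry list in the example above; the original SubstitutedToken may
well be organized differently.

```python
from dataclasses import dataclass
from typing import Tuple

Position = Tuple[int, int, int]  # (line, column, length)

@dataclass(frozen=True)
class Token:
    text: str
    # Positions from the effective location back to the defining source location.
    chain: Tuple[Position, ...]

    @property
    def effective(self) -> Position:
        return self.chain[0]

def substitute(reference: Position, replacement: Token) -> Token:
    """Return the token that stands in for a macro reference at `reference`."""
    return Token(replacement.text, (reference,) + replacement.chain)

def stream_order(token: Token):
    """Sort key: compare (line, column) along the chain, effective position first."""
    return [(line, column) for line, column, _ in token.chain]

# `define x 3   -> the token '3' is defined at line 1, column 11
three = Token("3", ((1, 11, 1),))
# `define y `x  -> `x at line 2, column 11 is replaced with '3'
via_x = substitute((2, 11, 2), three)
# `y            -> `y at line 3, column 1 is replaced with the expansion of `x
via_y = substitute((3, 1, 2), via_x)

for position in via_y.chain:
    print("Line %d, column %d, length %d" % position)

# Two tokens produced by one macro reference share an effective position,
# so the next chain entry decides their relative order.
a = substitute((5, 1, 2), Token("a", ((4, 11, 1),)))
b = substitute((5, 1, 2), Token("b", ((4, 13, 1),)))
print([t.text for t in sorted([b, a], key=stream_order)])
```

Storing the chain as an immutable tuple means a substitution never modifies
the token kept in the macro definition, so the same definition can be
expanded at many sites.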
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of A Z
>> Sent: Monday, April 04, 2011 12:13 AM
>> To: Martin d'Anjou
>> Cc: antlr-interest at antlr.org
>> Subject: Re: [antlr-interest] Q: how to incorporate a preprocessor in the flow?
>>
>> Hi Martin,
>>
>>   I just completed an SV preprocessor which can parse UVM 1.0 successfully.
>> After 2 revisions I settled on a completely separate preprocessor (lexer and
>> parser). As you saw, you need to tokenize the macro_text in order to easily
>> support macros with arguments and detect the three escaped tokens `", `\`"
>> and ``. I'm not sure how well a lexer-only approach could handle cases where
>> a macro substitution can merge text with a previously lexed token. The
>> separate approach still has flaws, such as the difficulty of producing good
>> error reports. Of course I could be missing an obvious easy solution.
>>
>>
>>
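As a rough illustration of tokenizing macro_text, the sketch below uses a
Python regex to pick out the three escaped tokens before anything else can
match. The token names and the rest of the token set are assumptions made for
the example; a real SystemVerilog preprocessor needs many more token kinds
(strings, comments, numbers, line continuations).

```python
import re

# Order matters: the longest escaped token must be tried first.
TOKEN_SPEC = [
    ("ESCAPED_QUOTE", r'`\\`"'),        # `\`"
    ("QUOTE",         r'`"'),           # `"
    ("PASTE",         r"``"),           # ``
    ("IDENT",         r"[A-Za-z_]\w*"),
    ("SPACE",         r"\s+"),
    ("OTHER",         r"."),
]
MACRO_TEXT_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize_macro_text(text):
    """Return (kind, text) pairs for a macro body, dropping whitespace."""
    return [
        (m.lastgroup, m.group())
        for m in MACRO_TEXT_RE.finditer(text)
        if m.lastgroup != "SPACE"
    ]

# Hypothetical macro body that uses all three escaped tokens.
body = r'`"a``b`\`"`"'
for kind, value in tokenize_macro_text(body):
    print(kind, value)
```

Once the body is a token list, argument substitution becomes a matter of
replacing IDENT tokens that match a formal parameter name, and PASTE tokens
mark where neighbouring tokens have to be joined and lexed again.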
>> On Sun, Apr 3, 2011 at 9:51 PM, Martin d'Anjou <point14 at magma.ca> wrote:
>>
>>> Hello,
>>>
>>> I am trying to find a way to incorporate a preprocessor in the ANTLR
>>> flow. I thought of doing this before the lexer, but I need to tokenize
>>> the incoming char stream for macro substitution to be easy. I thought
>>> of doing it between the lexer and the parser, and replace the
>>> preprocessor tokens with their expansion before feeding the token
>>> stream to the parser, so I guess I would end up using something like
>>> the TokenRewriteStream??? Can someone steer me in the right direction
>>> please? Or should I be using lexer rule actions? In which case, any
>>> example on how to access the token stream of the replacement token
>>> list of an identifier? Too many questions sorry.
>>>
>>> The language I am hoping to tokenize is SystemVerilog and has C-like
>>> preprocessor macros (`include, `ifdef, `define NAME(params,...), token
>>> concatenation, etc.).
>>>
>>> Regards,
>>> Martin
>>>
>>>
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe:
>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>
>
