[antlr-interest] Trying to keep whitespace in an AST

Jamie Penney jpen054 at ec.auckland.ac.nz
Fri Feb 8 16:48:33 PST 2008


Thanks for all the replies everyone. I will give the off channel idea a
shot and see if it is feasible. I am working on a simple proof of
concept grammar over the weekend so I will report back with how
difficult it is to associate the whitespace/comments with the correct node.

Thanks
Jamie

Jim Idle wrote:
> Well, remember that the AST is, err abstract ;-). It is just a construct 
> made from the token stream that you parsed. The parser skips tokens that 
> you create "off-channel", such as comments:
>
> COMMENT: '//' ~NL*  { $channel = 2; } ;
>
> Now, when you walk your AST and find a method, you just need the token 
> index of the start sequence of your method declaration (this of course 
> depends on the language). Then you can traverse backwards in the token 
> stream (the stream you passed to the parser, mostly CommonTokenStream) 
> for that index, and pick up any off-channel tokens that were ignored by 
> the parser. If your common token stream is called tstream, then:
>
> tstream.get(index) will return the token at that index, whether it is on 
> the parsing channel or not. There is also tstream.getRange(.., which will 
> return a List of the tokens in a range, whether on channel or off 
> channel.
>
>
> So, you hit the 'method' keyword/node/token and find out its index (or 
> the index of a real token rather than an imaginary one perhaps). Then 
> you traverse back through the stream until some trigger point such as 
> the first on-channel token before the comments or something. Only you 
> can know exactly where you start and stop, and the problem of 
> associating comments with the correct syntactical element is a thorny 
> one!
>
> Jim
>
>   
>> -----Original Message-----
>> From: Jamie Penney [mailto:jpen054 at ec.auckland.ac.nz]
>> Sent: Thursday, February 07, 2008 7:51 PM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] Trying to keep whitespace in an AST
>>
>> Hi all,
>> I am trying to work out how to create a grammar that will build an AST
>> that keeps both comments and some whitespace. Basically the output will
>> be formatted code, but we need the semantic information provided by the
>> AST for other parts of the system. Any comments and blank lines need to
>> be kept in the output code. Is it possible to have rewriting and AST
>> generation turned on at the same time, or do I have to write two
>> separate grammars? I am new to ANTLR so sorry if I have the wrong idea
>> about anything.
>> To give a concrete example, say I have a language that represents basic
>> C style statements like so:
>>
>> int a    = 0;
>> int b    = 1;
>> int c    = 2;
>>
>> // reassign a
>> a = b + c;
>>
>> What I need is the semantic information provided by an AST (whether a
>> statement is a declaration, assignment, etc.), but I need to transform
>> the language partially too. I need to format the individual elements
>> consistently, so each would be of the form a = b + c; but I also need to
>> retain the newlines and comments between elements.
>>
>> If anyone could point me in the right direction I would be very
>> grateful.
>>
>> Thanks,
>> Jamie Penney

-- 
Jamie Penney

http://www.jamiepenney.co.nz
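
Jim's suggestion above (start from a node's token index, walk backwards
through the token stream, and collect off-channel tokens until the first
on-channel token) might be sketched roughly as follows. This is only an
illustration under stated assumptions: Tok, LeadingTrivia and
leadingOffChannel are hypothetical names, and a plain List stands in for
the real token stream so the example runs without ANTLR on the classpath.
Channel 2 for comments and whitespace simply mirrors the COMMENT rule
quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LeadingTrivia {
    /** Hypothetical stand-in for an ANTLR token: its text plus a channel number. */
    static final class Tok {
        final String text;
        final int channel; // 0 = seen by the parser; anything else is off-channel
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    static final int DEFAULT_CHANNEL = 0;

    /**
     * Walks backwards from the token just before startIndex and collects the
     * unbroken run of off-channel tokens, stopping at the first on-channel
     * token. The result is returned in source order.
     */
    static List<Tok> leadingOffChannel(List<Tok> tokens, int startIndex) {
        List<Tok> found = new ArrayList<>();
        for (int i = startIndex - 1; i >= 0; i--) {
            Tok t = tokens.get(i);
            if (t.channel == DEFAULT_CHANNEL) {
                break; // trigger point: first on-channel token before the comments
            }
            found.add(t);
        }
        Collections.reverse(found); // collected back-to-front, so restore source order
        return found;
    }

    public static void main(String[] args) {
        // Roughly the tail of the example input: "...;\n\n// reassign a\na = b + c;"
        List<Tok> tokens = Arrays.asList(
            new Tok(";", 0),
            new Tok("\n\n", 2),
            new Tok("// reassign a", 2),
            new Tok("\n", 2),
            new Tok("a", 0),
            new Tok("=", 0));

        // Index 4 is the 'a' that starts the assignment statement.
        for (Tok t : leadingOffChannel(tokens, 4)) {
            System.out.println("[" + t.text.replace("\n", "\\n") + "]");
        }
    }
}
```

With the ANTLR 3 Java runtime, the same loop would presumably read tokens
with CommonTokenStream.get(i) and test Token.getChannel(), taking the
starting index from the tree node (for example via getTokenStartIndex()).
Where to stop walking remains a per-language decision, as Jim points out.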




