[antlr-interest] ANTLR / Generic parser / AST

Thu Oct 17 11:05:43 PDT 2002

Hi Xavier,
	I would recommend not putting whitespace and comments into your AST,
because then whenever you manipulate the AST you have to figure out what to
do with them.  Instead I recommend keeping the whitespace around separately,
so that AST nodes can be mapped to their original positions in the input
file and the whitespace and comments (anything that is ignored) can then be
found and copied into the output.
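
	To make the position mapping concrete, here is a minimal sketch in
Python rather than ANTLR.  The Node class, its start/stop fields and the
sample input are all hypothetical; the only point is that once a node knows
the character span of the token it came from, the ignored text between two
nodes is just a slice of the original input.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical AST node that remembers where its token sat in the input."""
    text: str
    start: int  # offset of the first character in the source
    stop: int   # offset one past the last character


def ignored_between(source: str, left: Node, right: Node) -> str:
    """Return whatever the lexer/parser skipped between two nodes."""
    return source[left.stop:right.start]


# Toy input: a comment and some spaces sit between "a" and "=".
source = "a  /* note */ = 1;"
a = Node("a", 0, 1)
eq_at = source.index("=")
assign = Node("=", eq_at, eq_at + 1)

gap = ignored_between(source, a, assign)
print(repr(gap))
```

Here gap comes out as the two spaces, the comment and the space after it,
none of which ever had to be stored in the tree.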
	So visualize your input text file.  Now mark every token that
becomes an AST node yellow.  Everything else is what is ignored--either by
the lexer or parser.  Now to do an identity transformation just make sure
that between emitting any two AST nodes you also emit what is in between.
When you start modifying ASTs then you have to figure out what makes sense
to preserve.  For instance if you delete a whole statement then you will
want to be smart enough not to emit the leading whitespace or the semicolon
and newline following it.  If a whole subtree got moved to another place in
the file then you have to handle the boundary conditions similarly.  Adding
new AST nodes requires the emitter to be smart enough to put the appropriate
whitespace around it--i.e. knowing to end a statement with a semicolon and
newline.
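
	A rough sketch of such an emitter, again in Python and under loud
assumptions: the toy token pattern, the "## ... ##" comment syntax (borrowed
from the example in the quoted message) and the helper names tokens, emit
and replace_span are all made up for illustration.  It shows the identity
transformation and one simple modification, replacing a run of tokens with
new text.  It does not attempt the harder boundary cases of deleting or
moving whole statements.

```python
import re

# Toy lexer: the "skip" branch matches what would be ignored (## comments ##
# and whitespace), the "tok" branch matches what would become AST nodes.
PATTERN = re.compile(r"(?P<skip>##.*?##|\s+)|(?P<tok>[A-Za-z_]\w*|\d+|[=+*;])")


def tokens(source):
    """Return (text, start, stop) for every token, in input order."""
    return [(m.group(), m.start(), m.end())
            for m in PATTERN.finditer(source)
            if m.lastgroup == "tok"]


def emit(source, toks):
    """Identity transformation: emit each token plus whatever lies in between."""
    out = []
    pos = 0
    for text, start, stop in toks:
        out.append(source[pos:start])  # ignored text before this token
        out.append(text)               # the token itself
        pos = stop
    out.append(source[pos:])           # ignored text after the last token
    return "".join(out)


def replace_span(source, toks, first, last, new_text):
    """Replace tokens first..last (inclusive) and the gaps between them.

    Everything outside that span, including comments, is copied untouched.
    """
    return source[:toks[first][1]] + new_text + source[toks[last][2]:]


source = "## This is a comment ##\na = 1 + 2 ;\n"
toks = tokens(source)

identity = emit(source, toks)
# Tokens 2..4 are "1", "+", "2"; replace the whole expression with its value.
folded = replace_span(source, toks, 2, 4, "3")
print(folded, end="")
```

The identity output is byte-for-byte the input, and the folded output keeps
the comment and the space before the semicolon while the expression becomes
"3".  Whether that leftover space should survive is exactly the kind of
decision the emitter has to make once the tree is modified.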
	If you get into C-style preprocessing then preserving the #line
directives becomes important and difficult.  Well, it's difficult to do it
right with the GCC extensions because they keep a whole stack of #line
directives, one for each #include.
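
	To illustrate the stack, here is a small Python sketch that reads
GCC-style linemarkers from preprocessed output and attributes every
remaining line to its original file and line number.  It assumes the
linemarker form '# linenum "filename" flags', where flag 1 means a new file
is being entered and flag 2 means a return to the including file; the
function name attribute_lines and the sample input are hypothetical, and
other flags and malformed markers are not handled.

```python
import re

# Matches both '# 12 "file.c" 1 3' (GCC linemarker) and '#line 12 "file.c"'.
LINEMARKER = re.compile(r'^#\s*(?:line\s+)?(\d+)\s+"([^"]*)"((?:\s+\d)*)\s*$')


def attribute_lines(preprocessed):
    """Return (filename, line_number, text) for every non-marker line."""
    stack = []   # one [filename, next_line_number] entry per open #include
    result = []
    for raw in preprocessed.splitlines():
        m = LINEMARKER.match(raw)
        if m:
            line, name = int(m.group(1)), m.group(2)
            flags = m.group(3).split()
            if "1" in flags or not stack:
                stack.append([name, line])   # entering a file
            elif "2" in flags and len(stack) > 1:
                stack.pop()                  # leaving an included file
                stack[-1] = [name, line]
            else:
                stack[-1] = [name, line]     # plain renumbering
            continue
        if not stack:
            stack.append(["<unknown>", 1])   # text before any marker
        name, line = stack[-1]
        result.append((name, line, raw))
        stack[-1][1] = line + 1
    return result


sample = '''# 1 "main.c"
int a;
# 1 "defs.h" 1
int b;
# 3 "main.c" 2
int c;
'''
located = attribute_lines(sample)
for entry in located:
    print(entry)
```

A real tool would also have to write equivalent markers back out after
transforming the code, which is where most of the difficulty lies.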
	I'd be happy to work on this as a consultant too.  See my webpage
for details.

Monty

www.codetransform.com

> -----Original Message-----
> From: xavier.huet at infineon.com [mailto:xavier.huet at infineon.com]
> Sent: Thursday, October 17, 2002 10:31 AM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] ANTLR / Generic parser / AST
> 
> Hi there,
> 
> 
> We are trying to build a kind of generic parser, i.e. for example a
> Verilog (or VHDL, or anything you can imagine) parser engine that every
> client can configure for their own needs.
> 
> We came to the conclusion that we should write our own AST structure
> and have a factory in our parser to create it (the factory can then be
> defined by our clients to build their own kind of objects, though
> deriving from our base AST classes). So it looks like yours.
> 
> Calc.g
> 
> assignment returns [pNode asgn]
> {
>  pNode e1 = NULL;
>  pNode e2 = NULL;
>  pNode id = NULL;
> }
>  : e1 = expr { asgn = e1; }
>  | id = identifier ASSIGN e2 = expr
>  { asgn = factory.build_binary( n_assign, id, e2 ); }
>  ; 
>   
> This method is fine for a client that needs to do some "compile"
> stuff. But what if one client only wants to filter the input file by
> prefixing the module name? The output should be exactly the same
> (including whitespace, comments, parentheses, brackets, semicolons)
> except for the module name. With the custom tree, we lose a lot of
> information. Do you have any solutions or ideas?
> 
> Also, we may have some clients that would like a mix of the two, i.e.
> building an AST, evaluating some of the nodes, and replacing those
> nodes in the original file with the result of the evaluation.
> 
> For example, if the input looks like this:
> 
> ## This is a comment ##
> a = 1 + 2 ;
> ## This is another comment ##
> b = 2*a + 3 ;
> 
> the client would like to evaluate the expressions and replace them in
> the input with the results, but all the other stuff should not be
> skipped:
> ## This is a comment ##
> a = 3;
> ## This is another comment ##
> b = 9;
> 
> 
> Thank you in advance for your help,
> 
> ~XAvier
> 
>  
> 
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/