[antlr-interest] Re: ANTLR 3 codegen (was: Enhance ANTLR support for comments?)

Sat Jul 19 07:19:10 PDT 2003

> Terence Parr <parrt at c...> wrote:
> 
> On Friday, July 18, 2003, at 01:27 PM, micheal_jor wrote:
> > One complicating issue is that depending on the constraints of the
> > execution environment, I might want a compact or even more compact
> > representation of these. For instance, I might want to shoehorn both
> > the line and column numbers into a single 32-bit integer (16:16 or
> > 24:8 split) or leave them as two separate integers.
> >
> > Not sure how ANTLR 3 can support such scenarios easily.
> 
> One of the thoughts we had in the cabal was that you would specify the
> token attributes in an ANTLR formalism and the code generator would be
> able to decide how to encode them in the target language.  For example,
> you might do
> 
> token {
>    // text and type predefined perhaps
>    int start;
>    int stop;
>    String filename;
> }
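To make the `token { ... }` idea concrete, here is a minimal Java sketch of what a code generator might emit for that declaration if it picked the plain-object encoding (one field per attribute). The class name `GeneratedToken`, its constructor and its getters are hypothetical. They are not an actual ANTLR 3 API, and the meaning of `start`/`stop` is left to the grammar.

```java
// Hypothetical generator output for the token { ... } block above,
// using the "object" encoding: one field per declared attribute.
final class GeneratedToken {
    // Predefined attributes ("text and type predefined perhaps").
    private final int type;
    private final String text;
    // User-declared attributes copied from the token block.
    private final int start;
    private final int stop;
    private final String filename;

    GeneratedToken(int type, String text, int start, int stop, String filename) {
        this.type = type;
        this.text = text;
        this.start = start;
        this.stop = stop;
        this.filename = filename;
    }

    int getType()        { return type; }
    String getText()     { return text; }
    int getStart()       { return start; }
    int getStop()        { return stop; }
    String getFilename() { return filename; }

    @Override
    public String toString() {
        return filename + "[" + start + ".." + stop + "] type=" + type + " '" + text + "'";
    }
}

public class Main {
    public static void main(String[] args) {
        GeneratedToken t = new GeneratedToken(4, "foo", 120, 122, "Expr.g");
        System.out.println(t); // Expr.g[120..122] type=4 'foo'
    }
}
```

A different back end could read the same declaration and emit a packed integer instead. The grammar author would not have to change anything.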

In my mental model, there are perhaps four issues involved here:

1) Should the attributes listed above be supported as standard for 
*all* ANTLR tokens and AST nodes?

==> "Yes" would be my answer on this. I can't think of a project 
where I haven't needed this. "filename" might 
become "resourceLocation" or similar if support for sources other 
than files (e.g. zip archives, URLs, etc.) is added.

2) How can ANTLR [grammars] be extended to support declarative custom 
AST-node-attributes specification in a language-neutral manner?

==> Have to think about this a bit. Support for both homogeneous and 
heterogeneous trees makes this a little tricky.

3) Should ANTLR support custom token-attributes and how?

==> I haven't needed to do this except to add filename/line/col info. 
What do people think?

4) How can we ensure that implementation decisions like "should I 
store line/col info in two 32-bit ints or a single int?" are properly 
left to ANTLR codegens?

==> This really needs some heads banged together to thrash things out. 
Would we end up with a set of interfaces, one for each of the OO, IMP, 
FP, etc. language families?
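For issue (4), the two single-int encodings mentioned at the top of the thread (a 16:16 or 24:8 split of line and column) can be sketched in Java as below. The helper class `LineCol` and its method names are made up for illustration. The range checks follow from the field widths. A 16:16 split allows lines and columns up to 65535. A 24:8 split allows lines up to 16777215 but columns only up to 255.

```java
// Hypothetical helpers for packing a line/column pair into one 32-bit int.
final class LineCol {
    private LineCol() {}

    // 16:16 split: line in the high 16 bits, column in the low 16 bits.
    static int pack16(int line, int col) {
        if (line < 0 || line > 0xFFFF || col < 0 || col > 0xFFFF)
            throw new IllegalArgumentException("16:16 split: line and col must be in 0..65535");
        return (line << 16) | col;
    }
    static int line16(int packed) { return packed >>> 16; } // unsigned shift: no sign extension
    static int col16(int packed)  { return packed & 0xFFFF; }

    // 24:8 split: line in the high 24 bits, column in the low 8 bits.
    static int pack24(int line, int col) {
        if (line < 0 || line > 0xFFFFFF || col < 0 || col > 0xFF)
            throw new IllegalArgumentException("24:8 split: line must be in 0..16777215, col in 0..255");
        return (line << 8) | col;
    }
    static int line24(int packed) { return packed >>> 8; }
    static int col24(int packed)  { return packed & 0xFF; }
}

public class Main {
    public static void main(String[] args) {
        int a = LineCol.pack16(1200, 37);
        System.out.println(LineCol.line16(a) + ":" + LineCol.col16(a)); // 1200:37
        int b = LineCol.pack24(70000, 200);
        System.out.println(LineCol.line24(b) + ":" + LineCol.col24(b)); // 70000:200
    }
}
```

The unsigned shift `>>>` matters here. With the largest line numbers, the packed value has its top bit set and is negative as a Java `int`. A signed `>>` would then decode the wrong line.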

> At NeXT I had to encode token type and line number into a 32-bit int,
> but in other cases it had to be an object.  The code generator could
> generate either depending on options and what attributes you had in
> there.

Cool. Seems you already have a jump on (4) above.
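The NeXT approach suggests a simple rule a code generator could apply. If the declared attribute widths fit in 32 bits, pack them into one int and emit shift/mask accessors; otherwise, generate an object. The Java sketch below assumes the grammar author declares a bit width per attribute, which the thread does not specify. `TokenLayoutPlanner` and `packedAccessors` are hypothetical names, the 12/20 split is illustrative (not the one used at NeXT), and widths are not validated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: decide between a packed 32-bit token word and an object,
// based on bit widths declared per attribute (an assumed grammar feature).
final class TokenLayoutPlanner {
    static final int WORD_BITS = 32;

    /** Returns Java accessor source for a packed layout, or null if the attributes do not fit. */
    static String packedAccessors(Map<String, Integer> widths) {
        int total = 0;
        for (int w : widths.values()) total += w;
        if (total > WORD_BITS) return null; // caller falls back to generating an object

        StringBuilder src = new StringBuilder();
        int shift = total; // first declared attribute takes the highest bits
        for (Map.Entry<String, Integer> e : widths.entrySet()) {
            int width = e.getValue();
            shift -= width;
            int mask = (width == WORD_BITS) ? -1 : (1 << width) - 1;
            src.append("static int ").append(e.getKey())
               .append("(int w) { return (w >>> ").append(shift)
               .append(") & 0x").append(Integer.toHexString(mask)).append("; }\n");
        }
        return src.toString();
    }
}

public class Main {
    public static void main(String[] args) {
        Map<String, Integer> packed = new LinkedHashMap<>(); // keeps declaration order
        packed.put("type", 12); // assumed: up to 4096 token types
        packed.put("line", 20); // assumed: up to ~1M lines
        System.out.print(TokenLayoutPlanner.packedAccessors(packed));

        Map<String, Integer> wide = new LinkedHashMap<>();
        wide.put("type", 16);
        wide.put("line", 24); // 40 bits in total: too wide for one int
        System.out.println(TokenLayoutPlanner.packedAccessors(wide) == null
                ? "does not fit: generate a token object instead"
                : "packed");
    }
}
```

A `LinkedHashMap` keeps declaration order, so the first attribute in the grammar always lands in the high bits.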

> Just some thoughts we had.  We're thinking about language independence
> pretty heavily since I expect to make building a code generator for
> ANTLR pretty easy.

Hopefully not so easy that the codegens can't make the often drastic 
implementation decisions described above. Actually, we could have a 
two-tier system:
TIER-1: The set of codegen interfaces that result from (4) above 
would support the development of fully integrated ANTLR codegens that 
require more work to build but in return produce the 
fastest/smallest/tightest[/prettiest?] code.
TIER-2: The intermediate form that you describe below, on the other 
hand, could allow anyone to build a very decent codegen in record time.

> Lots of back ends will appear I hope.  I'm going to
> go so far as to have a text-based intermediate form (if wanted) so that
> you don't even have to build the code generator in ANTLR.  You could
> build the python code generator in python for example as it's just
> reading a text file with all the "hard parts" filled in :)
> 
> Terence

Cheers,

Micheal
