[antlr-interest] Merging token vocabularies

Thu Jun 24 12:52:51 PDT 2004

I think I need one huge token vocabulary in order to avoid overloading the
token-type integers (the same number meaning different tokens in different
languages) when embedding ASTs, as I mentioned.

Suppose (for illustration only; this is nothing like what I'm actually
doing) I want to process a file that looks like

===
This is a document with C code and Java code:
<c>
class Bar {};
</c>
Java is here.
<java>
public class Foo {};
</java>
===

Presumably, I want to build an AST that looks like
(DOCUMENT (TEXT "This is a document with C code and Java code:\n")
  (C (CLASS (NAME "Bar") ...))
  (TEXT "\nJava is here.\n")
  (JAVA (CLASS (NAME "Foo") ...)))

and then use a tree walker to do my work.
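To make the integer-overloading problem concrete: two independently generated
parsers each number their own tokens from the same starting value. As a
result, one type number can name different tokens depending on which parser
built the node. The Java sketch below reports every type number that two
NAME -> type maps assign to different names. Each entry it returns is a
number that a tree walker could not interpret unambiguously in a mixed C/Java
tree. The class name VocabClash is hypothetical, and the two tiny vocabularies
are made-up examples, not real CParser/JavaParser output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class VocabClash {
    // For each type number used by both vocabularies (NAME -> type maps),
    // report it if the two vocabularies give it different names.
    static Map<Integer, String> clashes(Map<String, Integer> a, Map<String, Integer> b) {
        Map<Integer, String> nameInA = new TreeMap<>();
        for (Map.Entry<String, Integer> e : a.entrySet()) nameInA.put(e.getValue(), e.getKey());
        Map<Integer, String> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : b.entrySet()) {
            String other = nameInA.get(e.getValue());
            if (other != null && !other.equals(e.getKey())) {
                result.put(e.getValue(), other + " vs " + e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical vocabularies: both start user tokens at 4.
        Map<String, Integer> cVocab = new LinkedHashMap<>();
        cVocab.put("CLASS", 4);
        cVocab.put("NAME", 5);
        Map<String, Integer> javaVocab = new LinkedHashMap<>();
        javaVocab.put("PUBLIC", 4);
        javaVocab.put("CLASS", 5);
        javaVocab.put("NAME", 6);
        System.out.println(clashes(cVocab, javaVocab)); // {4=CLASS vs PUBLIC, 5=NAME vs CLASS}
    }
}
```

An empty result means the two vocabularies already agree on every shared
number. Any non-empty result is a reason to merge or translate them.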

Assume that I have independently created C and Java parsers that I want to
use as-is.  I can't just write, in my own parser,

options {
  importVocab = CParser;
  importVocab = JavaParser;
}

so I have to do something else.  As I mentioned, what I am doing is
generating a CommonTokenTypes.txt by parsing the CParserTokenTypes.txt
and JavaParserTokenTypes.txt and merging the vocabularies, and then
importing Common back into CParser and JavaParser and regenerating.
This seems to be exactly what you are proposing:  ensuring that a
Common vocabulary be imported!  Or am I misunderstanding you?
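Here is a minimal Java sketch of that merge step. It assumes a simplified
vocabulary-file layout: optional comment lines starting with //, then a line
holding the vocabulary name, then one NAME=number line per user-defined token.
Real generated ...TokenTypes.txt files can also contain string-literal and
paraphrase entries, which this sketch does not parse. The class name
VocabMerge is hypothetical. The starting value 4 is assumed to be the first
user-defined token type in ANTLR 2. The merge takes the union of all names and
renumbers them, so each name gets exactly one type in the Common vocabulary.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class VocabMerge {
    // Assumed first user-defined token type; lower numbers are reserved.
    static final int FIRST_USER_TYPE = 4;

    // Parse the simplified layout into an ordered NAME -> type map.
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        boolean sawVocabName = false;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("//")) continue;
            if (!sawVocabName) {          // first real line names the vocabulary
                sawVocabName = true;
                continue;
            }
            int eq = line.lastIndexOf('=');
            if (eq < 0) continue;         // not a NAME=number line; ignored
            vocab.put(line.substring(0, eq).trim(), Integer.parseInt(line.substring(eq + 1).trim()));
        }
        return vocab;
    }

    // Union of all names, renumbered so each name gets exactly one type.
    // A name defined in several inputs (e.g. CLASS) is shared, not duplicated.
    static Map<String, Integer> merge(List<Map<String, Integer>> vocabs) {
        Map<String, Integer> common = new LinkedHashMap<>();
        int next = FIRST_USER_TYPE;
        for (Map<String, Integer> vocab : vocabs) {
            for (String name : vocab.keySet()) {
                if (!common.containsKey(name)) common.put(name, next++);
            }
        }
        return common;
    }

    // Write the merged vocabulary back out in the same simplified layout.
    static String format(String vocabName, Map<String, Integer> vocab) {
        StringBuilder sb = new StringBuilder(vocabName).append('\n');
        for (Map.Entry<String, Integer> e : vocab.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> cVocab = parse("CParser\nCLASS=4\nNAME=5\n");
        Map<String, Integer> javaVocab = parse("JavaParser\nPUBLIC=4\nCLASS=5\nNAME=6\n");
        System.out.print(format("Common", merge(List.of(cVocab, javaVocab))));
        // Common
        // CLASS=4
        // NAME=5
        // PUBLIC=6
    }
}
```

Each grammar would then declare importVocab = Common; and be regenerated. Both
parsers then emit the same number for a shared name such as CLASS, and no
number means two different things.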

"Don Bradshaw" <don.bradshaw at quipoz.com> writes:
> Here goes. I'm fairly new to ANTLR, so I would appreciate feedback from
> the regulars if my advice is flawed.
> 
> Even though it works, I agree that one huge token vocabulary with all
> possible tokens across all languages is a bit of a nightmare. The main
> hassle would be managing the tokens to ensure that the same concept
> always has the same name and that nothing gets duplicated.
> 
> However, depending on what you are trying to achieve, I still believe
> that it may be worthwhile importing a file that defines common concepts,
> e.g. IDENTIFIER, LOOP, STATEMENT, etc.: things that are common across all
> the languages involved.
> 
> Anyway, it is possible to translate a tree from one vocabulary to
> another. The int token types that ANTLR uses are really just internal
> identifiers. You can recurse through a tree and change the types to
> those of a different vocabulary.
> 
> This can be done in one of two ways. 
> 
> 1) Use the TokenNames array accessible from parsers and tree walkers.
> Index the array with the int type you want to change to get its name,
> then search the target's array for that name to get its new int type,
> and update the AST node.
> 
> 2) Do the same as 1, except use reflection against the generated
> xxxxTokenTypes.java interfaces instead of the TokenNames arrays.
> 
> Of course, both token vocabularies must at least overlap on the types
> found in the tree.
> 
> Regards,
> Don.
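Both of Don's approaches can be sketched in Java. To keep the sketch
self-contained, it uses a hypothetical Node class in place of ANTLR's
antlr.collections.AST, whose getType, setType, getFirstChild and
getNextSibling play the same roles. It also uses a hypothetical
CDemoTokenTypes interface standing in for a generated token-types interface.

- byIndex treats a token-name array as "index = token type" (approach 1).
- fromInterface collects public static int constants by reflection
  (approach 2).
- join takes either result and maps old types to new ones by name.
- retype walks the tree and throws on any type missing from the target, per
  the caveat that both vocabularies must overlap on every type in the tree.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an ANTLR AST node: a type plus child/sibling links.
class Node {
    int type;
    Node firstChild;
    Node nextSibling;

    Node(int type) { this.type = type; }
}

// Hypothetical stand-in for a generated xxxxTokenTypes.java interface.
interface CDemoTokenTypes {
    int CLASS = 4;
    int NAME = 5;
}

class Retype {
    // Approach 1: a token-name array where names[i] is the name of type i.
    static Map<String, Integer> byIndex(String[] names) {
        Map<String, Integer> byName = new HashMap<>();
        for (int i = 0; i < names.length; i++) byName.put(names[i], i);
        return byName;
    }

    // Approach 2: read the int constants of a token-types interface by reflection.
    static Map<String, Integer> fromInterface(Class<?> tokenTypes) throws IllegalAccessException {
        Map<String, Integer> byName = new HashMap<>();
        for (Field f : tokenTypes.getFields()) {
            if (Modifier.isStatic(f.getModifiers()) && f.getType() == int.class) {
                byName.put(f.getName(), f.getInt(null));
            }
        }
        return byName;
    }

    // Old type -> new type for every name present in both vocabularies.
    static Map<Integer, Integer> join(Map<String, Integer> from, Map<String, Integer> to) {
        Map<Integer, Integer> map = new HashMap<>();
        for (Map.Entry<String, Integer> e : from.entrySet()) {
            Integer target = to.get(e.getKey());
            if (target != null) map.put(e.getValue(), target);
        }
        return map;
    }

    // Rewrite the types of a node, its following siblings, and all descendants.
    static void retype(Node n, Map<Integer, Integer> map) {
        for (; n != null; n = n.nextSibling) {
            Integer t = map.get(n.type);
            if (t == null) {
                throw new IllegalArgumentException("no target type for " + n.type);
            }
            n.type = t;
            retype(n.firstChild, map);
        }
    }

    public static void main(String[] args) throws IllegalAccessException {
        // Hypothetical name arrays; the index of each entry is its token type.
        String[] cNames = {"<0>", "EOF", "<2>", "NULL_TREE_LOOKAHEAD", "CLASS", "NAME"};
        String[] commonNames = {"<0>", "EOF", "<2>", "NULL_TREE_LOOKAHEAD", "PUBLIC", "CLASS", "NAME"};
        Map<Integer, Integer> viaArrays = join(byIndex(cNames), byIndex(commonNames));
        Map<Integer, Integer> viaReflection = join(fromInterface(CDemoTokenTypes.class), byIndex(commonNames));

        Node cls = new Node(CDemoTokenTypes.CLASS);      // (CLASS (NAME ...)) in C types
        cls.firstChild = new Node(CDemoTokenTypes.NAME);
        retype(cls, viaArrays);
        System.out.println(cls.type + " " + cls.firstChild.type);       // 5 6
        System.out.println(viaReflection.equals(Map.of(4, 5, 5, 6)));   // true
    }
}
```

Failing loudly on an unmapped type is a design choice. A silently kept old
number would reintroduce exactly the overloading that the translation is
meant to remove.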

-- 
Franklin
