[antlr-interest] Merging token vocabularies

Fri Jun 25 12:45:01 PDT 2004

On Jun 25, 2004, at 12:15 AM, Anthony Youngman wrote:

> Hmmm...
>
> Suggestion for Ter. I don't know whether he uses a 16- or 32-bit integer
> for his tokens, but might it not be possible to add a "token-space"
> directive?

Possibly.

> The idea basically being that either Antlr uses a 32-bit integer whose
> high 16 bits are initialised to the "token space" value, or you can tell
> Antlr where to start counting for user-space tokens (don't forget that
> types 0-3 are reserved).
>
> If you need to merge token lists, that would probably be a good way to
> do it ...
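A minimal sketch of the packing half of this suggestion, in Java (the language ANTLR 2 generates). It assumes token types are held in a 32-bit Java int; the class name, the space ids and the CLASS number are hypothetical and not part of ANTLR.

```java
/**
 * Hypothetical sketch of the "token-space" idea: the high 16 bits of a
 * 32-bit int carry a token-space id and the low 16 bits carry the type the
 * grammar itself assigned. Names and space ids are illustrative only.
 */
public final class TokenSpace {
    private TokenSpace() {}

    /** Packs a space id and a grammar-local type, both in 0..65535. */
    public static int pack(int space, int localType) {
        if (space < 0 || space > 0xFFFF || localType < 0 || localType > 0xFFFF) {
            throw new IllegalArgumentException("space and type must fit in 16 bits");
        }
        return (space << 16) | localType;
    }

    /** Recovers the space id (unsigned shift, so ids >= 0x8000 survive). */
    public static int space(int type) {
        return type >>> 16;
    }

    /** Recovers the grammar-local type. */
    public static int local(int type) {
        return type & 0xFFFF;
    }

    public static void main(String[] args) {
        final int C_SPACE = 1;     // hypothetical space id for the C grammar
        final int JAVA_SPACE = 2;  // hypothetical space id for the Java grammar
        final int CLASS = 5;       // same local number in both grammars

        int cClass = pack(C_SPACE, CLASS);
        int javaClass = pack(JAVA_SPACE, CLASS);
        System.out.println(cClass + " " + javaClass);                  // 65541 131077
        System.out.println(space(javaClass) + " " + local(javaClass)); // 2 5
    }
}
```

Space 0 leaves ordinary numbers unchanged, including the reserved low types. One likely cost: generated recognizers index arrays and bitsets by token type, so packed values in the hundreds of thousands would probably inflate those tables.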

Might be able to simply do it with token spaces at the ID level like  
Java.FOR and C.FOR or something.
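Ter's alternative can be sketched as a vocabulary keyed by qualified names, so Java.FOR and C.FOR receive different numbers while each grammar keeps its own spelling. This is a hypothetical helper, not an ANTLR feature; it assumes, as in ANTLR 2, that user token types start at 4.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: token names are qualified by language ("Java.FOR",
 * "C.FOR") and each qualified name gets its own type number.
 */
public final class QualifiedVocab {
    private final Map<String, Integer> types = new LinkedHashMap<>();
    private int next;

    public QualifiedVocab(int firstUserType) {
        this.next = firstUserType;
    }

    /** Returns the type for space.name, allocating a fresh one on first use. */
    public int define(String space, String name) {
        return types.computeIfAbsent(space + "." + name, k -> next++);
    }

    /** Looks up a qualified name such as "Java.FOR"; null if undefined. */
    public Integer lookup(String qualifiedName) {
        return types.get(qualifiedName);
    }

    public static void main(String[] args) {
        QualifiedVocab v = new QualifiedVocab(4); // assumed first user type
        int javaFor = v.define("Java", "FOR");
        int cFor = v.define("C", "FOR");
        System.out.println(javaFor + " " + cFor);  // 4 5
        System.out.println(v.lookup("Java.FOR"));  // 4
    }
}
```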

Ter

>
> Cheers,
> Wol
>
> -----Original Message-----
> From: FranklinChen at cmu.edu [mailto:FranklinChen at cmu.edu]
> Sent: 24 June 2004 20:53
> To: antlr-interest at yahoogroups.com
> Subject: RE: [antlr-interest] Merging token vocabularies
>
> I think I need one huge token vocabulary in order to avoid overloading
> integers when embedding ASTs, as I mentioned.
>
> Suppose (for illustration only; this is nothing like what I'm actually
> doing) I want to process a file that looks like
>
> ===
> This is a document with C code and Java code:
> <c>
> class Bar {};
> </c>
> Java is here.
> <java>
> public class Foo {};
> </java>
> ===
>
> Presumably, I want to build an AST that looks like
> (DOCUMENT (TEXT "This is a document with C code and Java code:\n")
>   (C (CLASS (NAME "Bar") ...))
>   (TEXT "\nJava is here.\n")
>   (JAVA (CLASS (NAME "Foo") ...)))
>
> and then use a tree walker to do my work.
>
> Assume that I have an independently created C parser and Java parser
> that I want to reuse as-is. I can't simply put this in my own parser:
>
> options {
>   importVocab = CParser;
>   importVocab = JavaParser;
> }
>
> so I have to do something else.  As I mentioned, what I am doing is
> generating a CommonTokenTypes.txt by parsing the CParserTokenTypes.txt
> and JavaParserTokenTypes.txt and merging the vocabularies, and then
> importing Common back into CParser and JavaParser and regenerating.
> This seems to be exactly what you are proposing:  ensuring that a
> Common vocabulary be imported!  Or am I misunderstanding you?
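The merge step Franklin describes might look roughly like the following. It is a sketch that assumes the ANTLR 2 vocabulary file layout (comment lines starting with //, one line naming the vocabulary, then NAME=number entries, where NAME can carry a literal as in LITERAL_class="class"); the class and method names are made up, and real files should be checked against this assumption. Names that appear in several inputs get one shared number, and everything is renumbered from 4, which is acceptable here because both parsers are regenerated against the merged file afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: merge token vocabulary files into one (e.g. "Common"). */
public final class VocabMerge {

    /** Returns entry keys (text left of the last '=') in file order. */
    public static List<String> parseKeys(List<String> lines) {
        List<String> keys = new ArrayList<>();
        boolean sawVocabName = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("//")) {
                continue; // blank or comment line
            }
            if (!sawVocabName) { // first real line names the vocabulary
                sawVocabName = true;
                continue;
            }
            int eq = line.lastIndexOf('='); // the number never contains '='
            if (eq > 0) {
                keys.add(line.substring(0, eq));
            }
        }
        return keys;
    }

    /** Numbers every distinct key from firstType; shared keys get one number. */
    public static Map<String, Integer> merge(int firstType, List<List<String>> files) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        int next = firstType;
        for (List<String> file : files) {
            for (String key : parseKeys(file)) {
                if (!merged.containsKey(key)) {
                    merged.put(key, next++);
                }
            }
        }
        return merged;
    }

    /** Renders the merged map back into the same NAME=number layout. */
    public static List<String> render(String vocabName, Map<String, Integer> merged) {
        List<String> out = new ArrayList<>();
        out.add(vocabName + "    // output token vocab name");
        merged.forEach((key, type) -> out.add(key + "=" + type));
        return out;
    }

    public static void main(String[] args) {
        List<String> c = List.of("// C vocabulary", "CParser",
                "ID=4", "LITERAL_class=\"class\"=5", "SEMI=6");
        List<String> java = List.of("// Java vocabulary", "JavaParser",
                "ID=4", "SEMI=5", "LITERAL_public=\"public\"=6");
        render("Common", merge(4, List.of(c, java))).forEach(System.out::println);
        // Common    // output token vocab name
        // ID=4
        // LITERAL_class="class"=5
        // SEMI=6
        // LITERAL_public="public"=7
    }
}
```

Writing the result to CommonTokenTypes.txt is left out (java.nio.file.Files.write can do it). Keys are compared verbatim, so the same token spelled differently in two grammars would be counted twice.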
>
>
> "Don Bradshaw" <don.bradshaw at quipoz.com> writes:
>> Here goes. I'm fairly new to ANTLR, so I would appreciate feedback from
>> the regulars if my advice is flawed.
>>
>> Even though it works, I agree that one huge Token vocabulary with all
>> possible tokens across all languages is a bit of a nightmare. The main
>> hassle would actually be managing the tokens to ensure that the same
>> concepts had the same name and that you don't start duplicating.
>>
>> However, depending on what you are trying to achieve, I still believe
>> that it may be worthwhile importing a file that has common concepts
>> defined, e.g. IDENTIFIER, LOOP, STATEMENT etc.: things that are common
>> across all languages involved.
>>
>> Anyway, it is possible to translate a tree from one vocabulary to
>> another. The "int" types that ANTLR uses are really for internal
>> purposes. You can recurse through a tree and change the types to those
>> of a different vocabulary.
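The recursion Don mentions is short to write. The Node class below is a hypothetical stand-in for an ANTLR 2 child-sibling tree (ANTLR 2's antlr.collections.AST interface offers comparable getType, setType, getFirstChild and getNextSibling methods); the old-to-new type map is assumed to be built beforehand, for instance by matching token names between the two vocabularies.

```java
import java.util.Map;

/** Sketch: re-type a child-sibling tree in place using an old -> new map. */
public final class Retype {

    /** Hypothetical stand-in for an ANTLR 2 AST node. */
    public static final class Node {
        public int type;
        public Node firstChild;
        public Node nextSibling;

        public Node(int type) {
            this.type = type;
        }
    }

    /** Re-types t, its following siblings and all their descendants. */
    public static void retype(Node t, Map<Integer, Integer> oldToNew) {
        for (Node n = t; n != null; n = n.nextSibling) { // walk the sibling list
            Integer mapped = oldToNew.get(n.type);
            if (mapped == null) { // vocabularies do not overlap on this type
                throw new IllegalStateException("no mapping for type " + n.type);
            }
            n.type = mapped;
            retype(n.firstChild, oldToNew);              // recurse into children
        }
    }

    public static void main(String[] args) {
        Node cClass = new Node(10);       // hypothetical C-vocabulary CLASS node
        cClass.firstChild = new Node(11); // hypothetical NAME child
        retype(cClass, Map.of(10, 20, 11, 21));
        System.out.println(cClass.type + " " + cClass.firstChild.type); // 20 21
    }
}
```

Siblings are handled by a loop and children by recursion, so the recursion depth follows the depth of the tree rather than the length of sibling lists.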
>>
>> This can be done in one of two ways.
>>
>> 1) Use the tokenNames array accessible from parsers and tree walkers.
>> Look up the array using the int type you want to change to get its
>> name, then search the target's array by that name to get its int type,
>> and update the AST node.
>>
>> 2) Do the same as 1, except use reflection against the generated
>> xxxxTokenTypes.java interfaces instead of the tokenNames arrays.
>>
>> Of course, both token vocabularies must at least overlap for the types
>> found in the tree.
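Both lookups can be sketched in Java as below. It is illustrative only: the tokenNames array is assumed to be a plain String[] indexed by token type, TargetTokenTypes is a hypothetical stand-in for a generated xxxxTokenTypes interface, and translate fails loudly when a name is missing, which is the overlap requirement above.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

/** Sketch of name-based type translation between two token vocabularies. */
public final class VocabTranslate {

    /** Way 1: tokenNames-style array (index = type) -> name -> type map. */
    public static Map<String, Integer> fromTokenNames(String[] tokenNames) {
        Map<String, Integer> byName = new HashMap<>();
        for (int type = 0; type < tokenNames.length; type++) {
            if (tokenNames[type] != null) {
                byName.put(tokenNames[type], type);
            }
        }
        return byName;
    }

    /** Way 2: read the static int constants of a token-types interface by reflection. */
    public static Map<String, Integer> fromInterface(Class<?> tokenTypes) {
        Map<String, Integer> byName = new HashMap<>();
        for (Field f : tokenTypes.getFields()) {
            if (Modifier.isStatic(f.getModifiers()) && f.getType() == int.class) {
                try {
                    byName.put(f.getName(), f.getInt(null));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return byName;
    }

    /** Maps oldType to the target type carrying the same name. */
    public static int translate(int oldType, String[] sourceNames, Map<String, Integer> targetByName) {
        String name = sourceNames[oldType];
        Integer newType = targetByName.get(name);
        if (newType == null) { // the vocabularies do not overlap on this name
            throw new IllegalArgumentException("target vocabulary has no token " + name);
        }
        return newType;
    }

    /** Hypothetical stand-in for a generated token-types interface. */
    public interface TargetTokenTypes {
        int EOF = 1;
        int ID = 7;
        int SEMI = 9;
    }

    public static void main(String[] args) {
        // Hypothetical tokenNames arrays; slots 0-3 mimic the reserved types.
        String[] source = { "<0>", "EOF", "<2>", "NULL_TREE_LOOKAHEAD", "SEMI", "ID" };
        String[] target = { "<0>", "EOF", "<2>", "NULL_TREE_LOOKAHEAD", "ID", "DOT", "SEMI" };

        System.out.println(translate(4, source, fromTokenNames(target)));                // 6
        System.out.println(translate(5, source, fromInterface(TargetTokenTypes.class))); // 7
    }
}
```

The two lookups may not spell every token the same way (literal tokens in particular), so it is safest to use the same method for both vocabularies.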
>>
>> Regards,
>> Don.
>
> --  
> Franklin
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!
Cofounder, http://www.peerscope.com pure link sharing

Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/