[antlr-interest] How can I modify the text of certain tokens in a CommonTokenStream?

Wed Feb 10 12:57:38 PST 2010

You can embed code in the lexer rules directly. However, I think what is happening here is that toString() probably just returns a substring of the original input text rather than a concatenation of the token texts. Instead of calling toString(), you could print each token's text as you go and, when the token is of type 4, print "V" instead.
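
A minimal sketch of the print-as-you-go idea, written without the ANTLR runtime so it stands alone. SimpleToken and TokenPrinter are hypothetical stand-ins, and 4 is assumed to be the Identifier type as in the question. With real ANTLR 3 tokens, the same loop would read Token.getType() and Token.getText() instead of the fields used here.

```java
import java.util.List;

// Hypothetical stand-in for an ANTLR token: a type, the matched text,
// and the start/stop character offsets in the original input.
final class SimpleToken {
    final int type;
    final String text;
    final int start;
    final int stop;

    SimpleToken(int type, String text, int start, int stop) {
        this.type = type;
        this.text = text;
        this.start = start;
        this.stop = stop;
    }
}

final class TokenPrinter {
    // Assumption: matches the Identifier token type from the question.
    static final int IDENTIFIER_TYPE = 4;

    // Walks the tokens and builds the output text. Identifier tokens are
    // rendered as "V"; the token objects themselves are never modified,
    // so their start/stop offsets still refer to the original source.
    static String render(List<SimpleToken> tokens) {
        StringBuilder out = new StringBuilder();
        for (SimpleToken t : tokens) {
            out.append(t.type == IDENTIFIER_TYPE ? "V" : t.text);
        }
        return out.toString();
    }
}
```

Because the substitution happens only while printing, each token keeps its original text and character positions, which is the property asked for in the question.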

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Matthew McDole
> Sent: Wednesday, February 10, 2010 12:37 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] How can I modify the text of certain tokens
> in a CommonTokenStream?
> 
> Hello Everyone!
> 
> I'm new to ANTLR, but I'm trying to learn some of the basics and use it
> in a project for school.  I've downloaded the java.g grammar and
> generated the lexer, and now I'm using the lexer in a program.
> 
> This is all working fine, however, I'm trying to figure out a way I
> can modify the text of certain tokens in my CommonTokenStream.
> 
> For example, I tried:
> 
> import org.antlr.runtime.*;
> import java.util.*;
> 
> public class LexerTest
> {
>     public static final int IDENTIFIER_TYPE = 4;
> 
>     public static void main(String[] args)
>     {
>         String input = "public static void main(String[] args) { int myVar = 0; }";
>         CharStream cs = new ANTLRStringStream(input);
> 
>         JavaLexer lexer = new JavaLexer(cs);
>         CommonTokenStream tokens = new CommonTokenStream();
>         tokens.setTokenSource(lexer);
> 
>         int size = tokens.size();
>         for(int i = 0; i < size; i++)
>         {
>             Token token = (Token) tokens.get(i);
>             if(token.getType() == IDENTIFIER_TYPE)
>             {
>                 token.setText("V");
>             }
>         }
>         System.out.println(tokens.toString());
>     }
> }
> 
> 
> My goal is to set the text of each token which is of type "4"
> (Identifier), to the string literal "V".  However, my changes to the
> token stream via setText() are not preserved.  When I call
> tokens.toString(), the text remains what it previously was.  I assume
> this is because I'm calling setText() on a copy of the tokens, and not
> the tokens themselves.
> 
> One thing that is important to me is that I want the tokens' start and end
> character positions to reflect the original source text, not
> what they would be after I changed every identifier
> token's text to "V".  This is why I thought doing it at run time made
> sense, rather than trying to do it via the grammar.
> 
> What is the proper way to do something like this?
> 