[antlr-interest] Island Parsing - a different way, seems to work

Mon Jun 18 00:27:35 PDT 2007

Oops! I hit Enter before I was ready to send this. Let me clarify
a bit more.

protected Tree parseCFScript(Token start, ParserRuleReturnScope stop)
{
    //-- this is just to get the tokens I need to build the String; once I
    //-- have my String, it doesn't really matter.
    org.antlr.runtime.BitSet bit = new org.antlr.runtime.BitSet();
    bit.add(OTHER);
    List<?> otherTokens =
        ((CommonTokenStream)input).getTokens(start.getTokenIndex(),
                                             stop.stop.getTokenIndex(), bit);

    StringBuilder buffer = new StringBuilder();

    //-- getTokens() can return null rather than an empty list when no
    //-- token in the range matches, so guard the loop.
    if (otherTokens != null)
    {
        for (Object t : otherTokens)
        {
            buffer.append(((Token)t).getText());
        }
    }

    //-- now I have my string, I simply pass it off to my StringStream; here
    //-- I have a custom one that doesn't check case. It gets its own name so
    //-- it doesn't shadow the parser's "input" field used above.
    CharStream scriptInput = new ANTLRNoCaseStringStream(buffer.toString());
    CFScriptLexer lexer = new CFScriptLexer(scriptInput);

    CommonTokenStream tokens = new CommonTokenStream(lexer);
    CFScriptParser parser = new CFScriptParser(tokens);

    try
    {
        CFScriptParser.script_return root = parser.script();
        Tree ast = (Tree)root.getTree();
        return ast;
    }
    catch(RecognitionException exc)
    {
        //this is just my custom error reporting.
        ErrorEvent event = new ErrorEvent(exc, "CFScript Error");
        getObservable().notifyObservers(event);
    }

    //and failing all else, return null, which does nothing
    return null;
}
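For anyone wanting to try the gathering step on its own: it just keeps the tokens between two indices whose type is in a set, and glues their text together in order. Below is a minimal stdlib-only Java sketch of that step. SimpleToken and IslandText.collect are hypothetical stand-ins (not ANTLR classes), and java.util.BitSet is used in place of org.antlr.runtime.BitSet, so membership is set() rather than add().

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for an ANTLR token: only a type and its text.
final class SimpleToken {
    final int type;
    final String text;

    SimpleToken(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Hypothetical helper, not part of the ANTLR runtime.
final class IslandText {
    private IslandText() {}

    // Keep the tokens in [start, stop] (inclusive) whose type is set in
    // `types`, and concatenate their text in stream order.
    static String collect(List<SimpleToken> tokens, int start, int stop, BitSet types) {
        StringBuilder buffer = new StringBuilder();
        for (int i = start; i <= stop; i++) {
            SimpleToken t = tokens.get(i);
            if (types.get(t.type)) {
                buffer.append(t.text);
            }
        }
        return buffer.toString();
    }
}
```

Returning an empty string rather than null when nothing matches keeps the caller free of null checks.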

Honestly, I expected this to create the right AST, but with tokens on
the wrong lines. However, the island tokens seem to automagically know
which line they are on in the wider context of the grammar.
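If the island tokens ever do report lines relative to the extracted string rather than the whole file (a freshly created ANTLR 3 string stream starts counting at line 1), the correction is a simple offset from the line the island starts on. The sketch below only shows that arithmetic with the standard library; IslandLines, localLine and hostLine are hypothetical names. In ANTLR 3 itself, calling setLine() on the CharStream with the line of the first island token, before handing it to the lexer, should have the same effect (an untested assumption).

```java
// Hypothetical helper showing the line arithmetic only; not ANTLR code.
final class IslandLines {
    private IslandLines() {}

    // 1-based line of character `offset` within `text`, counting from
    // line 1 the way a freshly created character stream would.
    static int localLine(String text, int offset) {
        int line = 1;
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    // Map a line inside the island string back to the host file, given
    // the host-file line on which the island text begins.
    static int hostLine(int islandStartLine, int localLine) {
        return islandStartLine + localLine - 1;
    }
}
```

For example, an island starting on line 10 of the file whose third line holds an error would map to line 12.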

This seems like a *much* simpler way to do island parsing, but have I
missed something crucial somewhere?

Thanks,

Mark

On 6/18/07, Mark Mandel <mark.mandel at gmail.com> wrote:
> Hey all,
>
> I've been playing around with Island Parsing, and I think I've come up
> with a much simpler way of doing it other than the one that is in the
> wiki
> (http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control)
>
> I wanted to run it past you, in case there is something that I have missed.
>
> I'll chop out a lot of the extraneous code around what I'm doing, so
> hopefully I don't break it in the process.
>
> I needed to be able to do some island parsing where I simply had
> a string to parse, and I wanted to be able to insert the island grammar's
> tree into my current AST.
>
> The code ended up looking pretty much like this:
>
> startTag
>         :   sto=START_TAG_OPEN stc=START_TAG_CLOSE tc=tagContent
>             -> ^(CFTAG[$sto] START_TAG_CLOSE
>                     { parseCFScript(stc, tc) }
>                     tagContent)
>         ;
>
> The only issue I had here was that I couldn't use $tc, because for
> some reason ANTLR didn't recognise it inside the rewrite action, so I
> just referenced the label tc directly, and everything seemed happy.
>
> From there, I was able to write my own parseCFScript function that
> returns a CommonTree:
>
> protected Tree parseCFScript(Token start, ParserRuleReturnScope stop)
> {
>     org.antlr.runtime.BitSet bit = new org.antlr.runtime.BitSet();
>     bit.add(OTHER);
>     List<?> otherTokens =
>         ((CommonTokenStream)input).getTokens(start.getTokenIndex(),
>                                              stop.stop.getTokenIndex(), bit);
>
>     StringBuilder buffer = new StringBuilder();
>
>     // getTokens() can return null when nothing in the range matches
>     if (otherTokens != null)
>     {
>         for (Object t : otherTokens)
>         {
>             buffer.append(((Token)t).getText());
>         }
>     }
>
>     CharStream scriptInput = new ANTLRNoCaseStringStream(buffer.toString());
>     CFScriptLexer lexer = new CFScriptLexer(scriptInput);
>
>     CommonTokenStream tokens = new CommonTokenStream(lexer);
>     CFScriptParser parser = new CFScriptParser(tokens);
>
>     try
>     {
>         CFScriptParser.script_return root = parser.script();
>         Tree ast = (Tree)root.getTree();
>         return ast;
>     }
>     catch(RecognitionException exc)
>     {
>         ErrorEvent event = new ErrorEvent(exc, "CFScript Error");
>         getObservable().notifyObservers(event);
>     }
>
>     return null;
> }
>
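On the "return null, which does nothing" point: that relies on tree construction skipping null children, which ANTLR 3's BaseTree.addChild does (it returns early when handed null). Here is a stdlib-only Java sketch of that splice, with a hypothetical Node class standing in for CommonTree and made-up node texts, showing the island tree landing under CFTAG on success and simply vanishing on failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node standing in for an ANTLR CommonTree.
final class Node {
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String text) {
        this.text = text;
    }

    // Like ANTLR 3's BaseTree.addChild, silently ignore a null child, so a
    // failed island parse (null) leaves nothing behind in the host tree.
    Node addChild(Node child) {
        if (child != null) {
            children.add(child);
        }
        return this;
    }

    // LISP-style rendering, e.g. (CFTAG START_TAG_CLOSE tagContent).
    String toStringTree() {
        if (children.isEmpty()) {
            return text;
        }
        StringBuilder sb = new StringBuilder("(").append(text);
        for (Node c : children) {
            sb.append(' ').append(c.toStringTree());
        }
        return sb.append(')').toString();
    }
}
```

So a failed island parse costs only the script subtree; the surrounding tag still ends up in the AST.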

-- 
E: mark.mandel at gmail.com
W: www.compoundtheory.com