[antlr-interest] parsing BSDL strings and "nested" parsers

Fri Apr 12 14:33:57 PDT 2002

I am working on a public domain BSDL parser and I would like some 
advice on how to handle strings. Basically, there is a string type 
that is lexically much like a C string (double quotes, can be 
concatenated with a '&').

The thing is that the contents of the strings have semantic meaning 
that can become part of the AST (the contents follow their own EBNF 
grammars and must be parsed in order to generate a complete BSDL 
tree).

I can't figure out how to handle this. If I am to have a single 
tree-generating parser, then the lexer stage or stages must somehow 
concatenate the string (step 1) and then lexically analyze that 
concatenation (step 2). I'm pretty sure that this can't be done with 
one lexer, as I can't see how the lexer could concatenate the string 
and then use the concatenated result as the source of characters for 
forming tokens. So it would involve multiple lexing steps. I realize 
that ANTLR has nice interfaces to support its "token stream pattern", 
but I can't figure out how that is going to help me here.
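
To make the two steps concrete, here is a rough sketch in plain C++ 
(no ANTLR classes involved). The function names joinFragments and 
tokenize are made up for illustration, the token rules are a 
placeholder rather than real BSDL, and escape sequences inside the 
quotes are assumed not to occur.

```cpp
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Step 1 (hypothetical helper): join "..." fragments separated by '&'
// into a single string. Assumes no escape sequences inside the quotes.
std::string joinFragments(const std::string& src) {
    std::string out;
    std::size_t i = 0;
    bool expectFragment = true;  // true at the start and after each '&'
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
        } else if (c == '"' && expectFragment) {
            std::size_t close = src.find('"', i + 1);
            if (close == std::string::npos)
                throw std::runtime_error("unterminated string");
            out.append(src, i + 1, close - i - 1);  // text between quotes
            i = close + 1;
            expectFragment = false;
        } else if (c == '&' && !expectFragment) {
            ++i;
            expectFragment = true;
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    if (expectFragment)
        throw std::runtime_error("missing string fragment");
    return out;
}

// Step 2 (hypothetical helper): lex the joined text through an
// istringstream. Words (letters, digits, '_') become one token each;
// every other non-space character is a one-character token.
std::vector<std::string> tokenize(const std::string& text) {
    const int kEof = std::char_traits<char>::eof();
    std::istringstream in(text);
    std::vector<std::string> tokens;
    int ch;
    while ((ch = in.peek()) != kEof) {
        if (std::isspace(ch)) {
            in.get();
            continue;
        }
        std::string tok;
        if (std::isalnum(ch) || ch == '_') {
            while ((ch = in.peek()) != kEof &&
                   (std::isalnum(ch) || ch == '_'))
                tok += static_cast<char>(in.get());
        } else {
            tok += static_cast<char>(in.get());
        }
        tokens.push_back(tok);
    }
    return tokens;
}
```

With the input "abc de" & "f (1,2)" the joined text is abc def (1,2), 
so the word def only exists after step 1. That is why I don't see how 
a single lexer pass over the original characters could produce the 
right tokens.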

The only workable way that I have thought out is to use nested 
lexer/parsers to process the strings. The main lexer would pass the 
string fragments along, and the main parser would concatenate them. 
A parser rule in the main parser could then simply reference the 
string rule. That rule's action would create a new lexer that reads 
the concatenated string through an istringstream, and a new 
miniparser that is only good for the specific grammar in that string. 
With BSDL I would need about 8 or 9 of these minigrammars. The tree 
generated by the miniparser could be grafted directly onto the main 
AST.
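
Roughly, this is the shape I have in mind, written as a plain C++ 
stand-in rather than with the real ANTLR classes. Node, 
ListMiniParser, and attributeAction are hypothetical names, and the 
minigrammar (a parenthesized, comma-separated list of words) is only 
a placeholder for one of the real BSDL string grammars.

```cpp
#include <cctype>
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node; a real ANTLR AST type would take its place.
struct Node {
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
    explicit Node(std::string t) : text(std::move(t)) {}
};

// Miniparser for the placeholder grammar:
//   list : '(' item (',' item)* ')' ;
class ListMiniParser {
public:
    explicit ListMiniParser(std::istream& in) : in_(in) {}

    std::unique_ptr<Node> parse() {
        auto root = std::make_unique<Node>("LIST");
        expect('(');
        root->children.push_back(item());
        while (peekNonSpace() == ',') {
            in_.get();
            root->children.push_back(item());
        }
        expect(')');
        if (peekNonSpace() != kEof)
            throw std::runtime_error("trailing input");
        return root;
    }

private:
    static constexpr int kEof = std::char_traits<char>::eof();

    // Skip whitespace and return the next character without consuming it.
    int peekNonSpace() {
        int ch;
        while ((ch = in_.peek()) != kEof && std::isspace(ch)) in_.get();
        return ch;
    }

    void expect(char c) {
        if (peekNonSpace() != c)
            throw std::runtime_error(std::string("expected '") + c + "'");
        in_.get();
    }

    // item : one or more letters, digits, or '_'
    std::unique_ptr<Node> item() {
        std::string text;
        int ch = peekNonSpace();
        while (ch != kEof && (std::isalnum(ch) || ch == '_')) {
            text += static_cast<char>(in_.get());
            ch = in_.peek();
        }
        if (text.empty()) throw std::runtime_error("expected item");
        return std::make_unique<Node>(text);
    }

    std::istream& in_;
};

// Stand-in for the main parser rule's action: given the already
// concatenated string, spawn the miniparser over an istringstream and
// graft the resulting subtree under the main AST node.
std::unique_ptr<Node> attributeAction(const std::string& name,
                                      const std::string& joined) {
    auto attr = std::make_unique<Node>(name);
    std::istringstream in(joined);
    ListMiniParser mini(in);
    attr->children.push_back(mini.parse());
    return attr;
}
```

Each of the 8 or 9 minigrammars would get its own class like 
ListMiniParser, and the main parser's action would pick the one that 
matches the attribute being parsed.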

So basically my question is whether this idea of having the main 
parser's actions spawn new lexer/parser pairs to process its "parsed 
out" rules is a good pattern for my problem, or whether there is a 
nicer way.

All ideas are appreciated.

Thanks,
JKL
