[antlr-interest] nested parsing (BSDL)

Harald M. Müller harald_m_mueller at gmx.de
Tue Jan 1 05:57:40 PST 2008


A few more comments: 

[...]
> 
> > (but I would definitely keep the multiple grammars apart).
> 
> I really do not want to keep the grammars separate.

Well, this seems to be at odds with what the language actually looks like. A
parser grammar, IMHO, is not the place to retrofit your wishes onto the
language: if the language is as crooked as yours (if I understand you
correctly), your grammar should faithfully mirror this crookedness.
Only at the level of ASTs can you go for your "abstract syntax", which can
(and should) be as beautiful as possible.

The other possibility is that you have (or there is) a sensible single-level
grammar for that language which you can use directly (not some parser with
re-feed/stream-switching concepts; those are NOT part of standard grammar
formalisms [context-free, EBNF, whatever]!!). I doubt this, because whoever
wrote such a grammar would have jumped out of the window and prayed for a
redesign of the language ...

Still, I do have an idea for using the Emit() functionality of lexers to do
a sort of "shove those inner symbols up to the single parser" ... maybe I'll
try that over my holidays as well :-)
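A minimal sketch of what such an Emit()-style lexer might do, written in Python rather than against any ANTLR target, so every name here (the token kinds, the rule tables, SplicingLexer, next_token) is hypothetical. It assumes double-quoted string literals and a made-up inner syntax. When the outer lexer matches a string literal it does not return one STRING token; instead it queues STR_OPEN, the inner tokens (prefixed IN_) and STR_CLOSE, roughly the way an overridden emit()/nextToken() pair with a token queue lets one lexer rule hand several tokens to a single parser.

```python
import re
from collections import deque
from typing import Iterator, List, NamedTuple, Tuple

class Token(NamedTuple):
    kind: str
    text: str

# Hypothetical outer-language rules (illustrative, not real BSDL lexing).
OUTER_RULES: List[Tuple[str, str]] = [
    ("WS", r"\s+"),
    ("COMMENT", r"--[^\n]*"),
    ("STRING", r'"[^"]*"'),
    ("ID", r"[A-Za-z_]\w*"),
    ("AMP", r"&"),
    ("PUNCT", r"[;:(),]"),
]
# Hypothetical rules for the text inside a string literal.
INNER_RULES: List[Tuple[str, str]] = [
    ("WS", r"\s+"),
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("STAR", r"\*"),
    ("PUNCT", r"[(),]"),
]

def scan(text: str, rules: List[Tuple[str, str]]) -> Iterator[Token]:
    """Tokenize text; at each position the first listed rule that matches wins."""
    pattern = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in rules))
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
        yield Token(m.lastgroup, m.group())
        pos = m.end()

class SplicingLexer:
    """Outer lexer that expands one STRING match into several queued tokens,
    similar to an overridden emit()/nextToken() pair backed by a queue."""

    def __init__(self, text: str) -> None:
        self._outer = scan(text, OUTER_RULES)
        self._pending: deque = deque()

    def next_token(self) -> Token:
        while not self._pending:
            tok = next(self._outer, None)
            if tok is None:
                return Token("EOF", "")
            if tok.kind in ("WS", "COMMENT"):
                continue  # treated as hidden: never reaches the parser
            if tok.kind == "STRING":
                # Splice the literal's inner symbols into the single token stream.
                self._pending.append(Token("STR_OPEN", '"'))
                for inner in scan(tok.text[1:-1], INNER_RULES):
                    if inner.kind != "WS":
                        self._pending.append(Token("IN_" + inner.kind, inner.text))
                self._pending.append(Token("STR_CLOSE", '"'))
            else:
                self._pending.append(tok)
        return self._pending.popleft()

    def __iter__(self) -> Iterator[Token]:
        while (tok := self.next_token()).kind != "EOF":
            yield tok

if __name__ == "__main__":
    for tok in SplicingLexer('is "0 (BC_1, *)" & "1";'):
        print(tok.kind, repr(tok.text))
```

The single parser then needs rules for the IN_ tokens between STR_OPEN and STR_CLOSE. Note that this splices each literal separately, so a logical string split by concatenation (and possibly comments) still reaches the parser in pieces.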

> BSDL was designed so you could do either. I want to logically keep the
> grammar in something resembling the form it really should have been in
> the first place.

As I said above, I think this is not a goal to be pursued: The
(lexer+parser) grammar should *describe* what is out there, not abstract
away from it into some wishful thinking direction.

> I understand what you are getting at; however, this is prohibited
> in the language itself, fortunately. Concatenation of literal
> strings is the only form of expression allowed.

... including whitespace and comments (as your examples show)! So it's not
that trivial; and maybe there are preprocessor directives, which could also
crop up inside such a string, etc.??? And what else do you have outside the
"conceptual grammar"?
At some point, my 3-pass suggestion will be easier than trying to write
some lexer-level machine that handles all those things at once.
But of course, this is hopefully only a horror scenario ...
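One way a 3-pass approach could look, as a sketch only: the passes are not spelled out in this message, so this assumes (1) outer lexing, (2) folding each run of literals joined by & (assumed VHDL-style concatenation) into one string while dropping whitespace and comments between the pieces, and (3) lexing the folded text with the inner rules. The rule tables and function names are hypothetical, and a real third pass would hand the inner tokens to an inner parser rather than just returning them.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)

# Hypothetical rule sets; '&' as the concatenation operator is an assumption.
OUTER = [("WS", r"\s+"), ("COMMENT", r"--[^\n]*"), ("STRING", r'"[^"]*"'),
         ("AMP", r"&"), ("ID", r"[A-Za-z_]\w*"), ("PUNCT", r"[;:(),]")]
INNER = [("WS", r"\s+"), ("NUM", r"\d+"), ("ID", r"[A-Za-z_]\w*"),
         ("STAR", r"\*"), ("PUNCT", r"[(),]")]

def scan(text: str, rules) -> List[Token]:
    """Tokenize text; at each position the first listed rule that matches wins."""
    pattern = re.compile("|".join(f"(?P<{n}>{rx})" for n, rx in rules))
    out, pos = [], 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
        out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out

def fold_strings(tokens: List[Token]) -> List[Token]:
    """Pass 2: merge STRING (& STRING)* runs into one STRING holding the joined
    bodies; whitespace and comments are discarded first."""
    sig = [t for t in tokens if t[0] not in ("WS", "COMMENT")]
    out: List[Token] = []
    i = 0
    while i < len(sig):
        kind, text = sig[i]
        if kind == "STRING":
            parts = [text[1:-1]]  # strip the quotes
            while i + 2 < len(sig) and sig[i + 1][0] == "AMP" and sig[i + 2][0] == "STRING":
                parts.append(sig[i + 2][1][1:-1])
                i += 2
            out.append(("STRING", "".join(parts)))
        else:
            out.append((kind, text))
        i += 1
    return out

def three_pass(src: str) -> Tuple[List[Token], List[List[Token]]]:
    """Pass 1: outer lexing; pass 2: folding; pass 3: inner lexing per folded string."""
    folded = fold_strings(scan(src, OUTER))
    inner = [[t for t in scan(body, INNER) if t[0] != "WS"]
             for kind, body in folded if kind == "STRING"]
    return folded, inner
```

For example, on `is "0 (BC_1, *)," -- first cell` followed by a newline and `& "1 (BC_1, *)";`, pass 2 yields the single string `0 (BC_1, *),1 (BC_1, *)`, so the inner lexer never sees the comment or the `&`.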

> However, I think that parsing inside of strings has a lot of
> applications besides BSDL and needs to be supported and documented.

... therefore I tried to come up with some example code.
I hope I showed how a two-grammar (or N-grammar) machine can be done; I'll
think about a one-grammar version a little ...

> Needing, for example, 20 different grammars because you have 20 different
> string types, though, would leave a lot to be desired.

In all cases I know (printf or other formatting grammars inside strings;
regex grammars inside strings; Javadoc inside /** comments; C#'s XML
documentation inside /// comments), it is actually *necessary* to have
different grammars: even the tokenization is different (think of printf vs. C).
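To make the tokenization point concrete, a small Python sketch with two hypothetical, simplified tokenizers: a C-like one (identifiers, integers, single-character operators; not a conforming C lexer) and one for a subset of the printf conversion-specification syntax (flags, width, precision, length, conversion). The same characters %-5d come out as four tokens in the first and as a single CONV token in the second.

```python
import re
from typing import List, Tuple

# Simplified C-like tokenizer (illustrative only; NOT a conforming C lexer).
C_LIKE = re.compile(r"(?P<WS>\s+)|(?P<ID>[A-Za-z_]\w*)|(?P<NUM>\d+)|(?P<OP>[^\w\s])")

# Subset of printf syntax: %[flags][width][.precision][length]conversion, plus plain text.
PRINTF = re.compile(
    r"(?P<CONV>%[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*))?(?:hh|h|ll|l|L|z|j|t)?[diouxXeEfFgGaAcspn%])"
    r"|(?P<TEXT>[^%]+)"
)

def scan(pattern: re.Pattern, text: str) -> List[Tuple[str, str]]:
    """Tokenize text with one combined pattern, dropping WS tokens."""
    pos, out = 0, []
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize {text[pos:]!r}")
        if m.lastgroup != "WS":
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out

if __name__ == "__main__":
    print(scan(C_LIKE, "%-5d"))   # four separate tokens
    print(scan(PRINTF, "%-5d"))   # one conversion specification
```

With N such inner languages, a table mapping each string type to its inner lexer (and parser) keeps the dispatch in one place, but the grammars themselves still have to differ.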

Your case of a language with nested strings, where much of the core
tokenization inside and outside is the same, is, I would venture to say, a
very odd example with almost no parallels anywhere else.

Regards
Harald


