[antlr-interest] Multipass parsing

Sun Mar 28 16:11:52 PST 2004

> >>>>> "FranklinChen" == FranklinChen  <FranklinChen at cmu.edu> writes:
> [...]
> 
> > What's currently the best way to do multipass parsing (while retaining
> > all location information so that error messages are informative)?
> 
> > Basically, I would like to be able to evolve a parser by starting from
> > recognizing and processing a coarse-grained structure and then refining
> > it, e.g., suppose I have a language that can be thought of at a first cut
> > as consisting of space-delimited tokens, and then I wish also to parse
> > within those tokens eventually.
> 
> For your example, with Antlr, you can write a separate grammar for each of
> the languages and then invoke the appropriate grammar at the appropriate
> spot.

Can you sketch out sample grammars and code that would handle what I
posted?  Seriously.  I've read the entire ANTLR documentation site and the
jGuru FAQ site and the source code for all the examples (for Java
generation), and I cannot offhand come up with what I am looking for.

> However, given your example, I don't understand *why* you'd want to do
> "nested parsing".  The main reason to do nested parsing is because you have
> dynamically changing/configurable sub-languages.  If you don't need that
> kind of pluggability, it's much easier to write a lexer and a parser that do the
> minimum necessary to reliably build up a tree and then do your multiple
> "refinement" passes over the tree (rather than trying to do all of the
> stuff in the parser).
> 
> Or do you have some other need/constraint on the sub-languages?

Just a few observations:

- As my toy example tries to indicate, "-" can mean three
  different things, depending on the context 
- The whitespace first pass is important, because I want to parse
      grape[fruit]
  completely differently from
      grape [fruit]
- I want to be able to deal with unexpected input at fine granularity
  and recovery, to minimize parse failures, e.g., if I see
      1+2* is 7
  I want to not go below the "word" level in the first "token" "1+2*"
  because I don't want to try to parse an expression and fail.
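
To make the observations above concrete, here is a rough sketch in plain
Java (not ANTLR) of the two-pass behaviour I have in mind. All the names
(MultipassSketch, Word, splitWords, tryEvaluate, describe) are made up for
illustration, and the "expression" sub-language is assumed to be just
integers joined by + and *. Pass 1 splits on whitespace only and records
the line and column of each word; pass 2 tries to refine one word at a
time and falls back to the word level if that fails.

```java
import java.util.ArrayList;
import java.util.List;

final class MultipassSketch {

    /** A whitespace-delimited word plus where it started in the input (1-based). */
    static final class Word {
        final String text;
        final int line;
        final int column;

        Word(String text, int line, int column) {
            this.text = text;
            this.line = line;
            this.column = column;
        }
    }

    /** Pass 1: split on whitespace only, recording each word's position. */
    static List<Word> splitWords(String input) {
        List<Word> words = new ArrayList<>();
        int line = 1;
        int column = 1;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\n') {
                line++;
                column = 1;
                i++;
            } else if (Character.isWhitespace(c)) {
                column++; // assumption: a tab counts as one column
                i++;
            } else {
                int start = i;
                int startColumn = column;
                while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                    i++;
                    column++;
                }
                words.add(new Word(input.substring(start, i), line, startColumn));
            }
        }
        return words;
    }

    /** Pass 2: the value if the WHOLE word is a sum of products of integers, else null. */
    static Integer tryEvaluate(String text) {
        int[] pos = {0};
        Integer value = sum(text, pos);
        if (value == null || pos[0] != text.length()) {
            return null; // refinement failed; caller keeps the coarse word
        }
        return value;
    }

    private static Integer sum(String s, int[] pos) {
        Integer total = product(s, pos);
        while (total != null && pos[0] < s.length() && s.charAt(pos[0]) == '+') {
            pos[0]++;
            Integer right = product(s, pos);
            total = (right == null) ? null : Integer.valueOf(total + right);
        }
        return total;
    }

    private static Integer product(String s, int[] pos) {
        Integer total = number(s, pos);
        while (total != null && pos[0] < s.length() && s.charAt(pos[0]) == '*') {
            pos[0]++;
            Integer right = number(s, pos);
            total = (right == null) ? null : Integer.valueOf(total * right);
        }
        return total;
    }

    private static Integer number(String s, int[] pos) {
        int start = pos[0];
        while (pos[0] < s.length() && s.charAt(pos[0]) >= '0' && s.charAt(pos[0]) <= '9') {
            pos[0]++;
        }
        if (pos[0] == start) {
            return null; // no digits here
        }
        try {
            return Integer.valueOf(s.substring(start, pos[0]));
        } catch (NumberFormatException e) {
            return null; // too many digits for an int
        }
    }

    /** Reports either the refined value or the untouched word, with its location. */
    static String describe(Word w) {
        Integer value = tryEvaluate(w.text);
        String kind = (value == null) ? "word '" + w.text + "'" : "expr " + value;
        return w.line + ":" + w.column + " " + kind;
    }
}
```

With this, "grape[fruit]" comes out of splitWords as one word and
"grape [fruit]" as two, and a malformed word such as "1+2*" simply stays
a word instead of causing the whole parse to fail. What I am asking is
how to get the same effect with ANTLR grammars.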

Basically, imagine parsing English from random Web pages, with the
idea of getting as much information as possible.  You wouldn't want to
come up with a big monolithic grammar for English, because it will be
horribly ambiguous and/or not cover everything one might want down to
the lowest levels of detail or be designed to enable local error
recovery.

-- 
Franklin
