[antlr-interest] Can one identify the type of parser needed for a given BNF grammar

Douglas Godfrey douglasgodfrey at gmail.com
Mon Jul 11 16:33:01 PDT 2011


If the SymbolTable is not built by the Parser then you cannot generate the
correct AST
for a class definition vs a function call. Also you cannot generate prompt
errors close to
the point where the error occurs in the parse. 90+% of all semantic
ambiguities can be
resolved by a SymbolTable in the Parser. Most C++ programs will not have any
statements
that cannot be resolved inline in the parser and would need to be resolved
in a later AST
TreeParser phase. The 90% case should not pay the compile time cost of a
separate
TreeParser just to resolve ambiguities that can be handled by the Parser.

No feedback is required for the Lexer. The Lexer can resolve all
pre-processor statements
before the Parser is invoked. The Lexer uses it's own primitive SymbolTable
to handle
text substitutions and macros. Again, 90+% of all C++ programs do not use
any complex
macros where a second lexer pass would be needed. For simple includes and
text
substitutions the Lexer can handle them inline. In 10% of the cases The
Lexer needs 2
passes where the first pass processes all includes, macros and substitutions
and the
second pass generates the token stream for the Parser.

On Mon, Jul 11, 2011 at 5:16 PM, Loring Craymer <lgcraymer at yahoo.com> wrote:

> On 3.) :  The parser just recognizes syntax and ignores semantic
> ambiguities;
> then in a first tree walker pass, the symbol tables are constructed; a
> second
> tree walker pass uses the symbol table information to resolve ambiguities
> (which
> of several interpretations of valid syntax is correct) and rewrites the
> tree
> into an unambiguous form.  Then you have finished recognizing C++ and can
> continue with further processing.  Never, never, never would I suggest
> adding a
> feedback mechanism that couples parser to lexer.
>
> --Loring
>
>
> ----- Original Message ----
> > From: The Researcher <researcher0x00 at gmail.com>
> > To: antlr-interest at antlr.org
> > Sent: Mon, July 11, 2011 12:55:05 PM
> > Subject: Re: [antlr-interest] Can one identify the type of parser needed
> for a
> >given BNF grammar
> >
> > >
> > > Loring,
> >
> >
> > Thanks for the feedback.
> >
> > While I now can  understand your answer, I still don't have enough
> > experience  to  implement any approach correctly.
> >
> > As a help to others who are working on  this in private or interested in
> > creating a C++  parser I will gloss  what Loring said because last week
> it
> > would have mostly been Greek to me. If  I say something wrong, please
> > correct. I am not afraid to make mistakes in  public so that others may
> learn
> > from them.
> >
> > The big thing that one has  to learn about parsing C++ is how to handle
> the
> > ambiguities. That is the  dragon to be slayed here.
> >
> > 1. And then you have to figure out how to prune  the GLR-generated
> "forests".
> >
> > GLR http://en.wikipedia.org/wiki/GLR_parser
> >
> > The reason GLR is considered  for C++ is that do to the ambiguities of
> C++
> > most parsers in a typical  fashion have to couple the feedback from the
> > parser to the lexer and  reference a symbol table to correctly parse C++.
> > This is contrary to what is  taught in school and most of the early
> examples
> > of parsing you will come  across. This allows them to produce an AST
> without
> > ambiguities after the  parser. This is pushing the work to the front. GLR
> is
> > one of the most  powerful parsing strategies and can parse C++ without
> > coupling the parser to  the lexer, but does so by generating multiple
> > subtrees for each ambiguity,  for which you must later prune out the
> > ambiguities from the forests. This is  pushing the work to the back.
> >
> > 2. C++ is nasty; it can be parsed with  ANTLR (as shown by NEXT and David
> > Wigg's
> > adaptions of that  grammar).
> >
> > The ANTLR C++ grammar does so by creating a symbol table and  using
> > predicates during the parse. In my initial conversions of the ANTLR  C
> > grammar to C# I was able to convert it, but did not understand why it
>  did
> > certain things at first. After researching it was clear that a lot of
>  the
> > ambiguities of C++ arise when an name is encountered and then it's use (
>  a
> > type, an identifier, etc)  has to be know in order to take the
>  appropriate
> > branch. Thus the creation of the symbol tables and other tables  and the
> use
> > of predicates. This pushes the work to the front.
> >
> > 3. I  believe that the right strategy with ANTLR is actually to use
> > multi-pass  recognition to sort out the ambiguities.  That has not been
>  done
> > yet.
> > The problem is that C++ cannot be adequately described with a
>  context-free
> > grammar; you have to do some context-sensitive processing to  resolve the
> > syntax
> > that is semantically ambiguous.
> >
> > It could/should  be possible to parse C++ with ANTLR without a symbol
> table
> > and associated  predicates, and then in the AST analysis verify that the
> > input C++ syntax is  valid C++ semantics.
> > Remember that the BNF for a grammar does not guarantee  valid
> code/semantics
> > only correct syntax. So the AST is analyzed to find  invalid semantics. I
> am
> > not sure if this will require coupling the parser to  the lexer or the
> AST to
> > the parser, but if I go this route I will soon  know.
> >
> > I know this went off topic but was worth it.
> >
> > Any answer to  the original question is still  appreciated.
> >
> > Thanks
> >
> > Eric
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> >http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>


More information about the antlr-interest mailing list