[antlr-interest] Syntatic predicates...

Mark Lentczner markl at glyphic.com
Tue May 18 09:08:51 PDT 2004


> To put things in context: I am trying to convert the parts of the bnf
> description of the VHDL language into ANTLR. The BNF is full of
> ambiguities.

Ah, this does put things in context.  I looked a little at a VHDL BNF  
(the nicely done
	http://tech-www.informatik.uni-hamburg.de/vhdl/tools/grammar/vhdl93- 
bnf.html
) and I think I understand what you are up against a bit better.

I think there are three general sources of ambiguity in grammars:

1) The language is inherently ambiguous.  This is the kind of error my  
prior comments were directed at.

2) The language is ambiguous given only fixed, finite k tokens of look  
ahead.  This is kind of error that Terrence was talking about with C++  
declarations.

3) The language is fine, but the form of the grammar is ambiguous for  
fixed, finite k tokens of look ahead.  This is the problem the VHDL BNF  
has.  (Actually, it may suffer from the second type as well, but on  
casual inspection, I didn't see any of those issues.)

This third type of problem is something that has come up more than once  
in this forum.  BNFs in specifications are generally not designed for  
parsing.  They are designed for specification, and serve both pedagogic  
and  definitive roles.  Suitability for parsing isn't usually a goal.

For example, in the VHDL BNF we see (rewritten into Antlr syntax):

numeric_literal  : abstract_literal | physical_literal ;
abstract_literal : decimal_literal | based_literal ;
physical_literal : (abstract_literal)? unit_name ;

Unless you set the look ahead longer than the maximum number of tokens  
that can make up abstract_literal, this is ambiguous.  If  
abstract_literal can be an arbitrary number tokens long (which it is,  
if you follow the BNF faithfully, as you arrive at: "integer : digit (  
(underline)? digit )* ;", which is any number of digits), then no value  
of k works.  (For illustrative reasons I'm assuming that we can't  
recognize integer in the lexer, just making it one token in length.   
See below...)

But this form of expressing the grammar of the language isn't what a  
VHDL parser would want anyway.  It would be easier to use:

numeric_literal :
           abstract_literal (unit_name)?
         | unit_name
         ;
abstract_literal :
         decimal_literal | based_literal
         ;

In practice, once semantic actions are added, this looks like:

numeric_literal [returns Number n] :
     {Unit u;}
           n=abstract_literal ( u=unit_name {n.setUnit(u);} )?
         | u=unit_name                      {n.setUnit(u);  
n.setValue(1);}
         ;
abstract_literal [returns Number n] :
	    decimal_literal | based_literal
         ;

When converting a BNF such as VHDL to a working parser, there are two  
things that remove the ambiguities in the form of the BNF:

First is picking the right division between lexing and parsing: The  
proper division can make most parsing takes unambiguous with small  
fixed k look ahead.  For example, making integer above be a production  
of the lexer returning a single token as another way to resolve the  
issue.  (See the recent thread titled "Antlr noobie, nondeterminism  
abounds" in this list.)  But this isn't always possible for all  
productions.

Second is rewriting rules so that they can be parsed.  Often this means  
delaying the determining the distinction between two things (i.e.  
abstract_literal and physical_literal) until later in the token stream,  
rather than early on as the BNF does.

The VHDL BNF is big.  I can't say off hand if finding the proper  
lexer/parser division will solve most problems, or if you'll have alot  
of rule re-writing ahead of you.  Please report back on your progress:  
I'd love to know what worked in the end.

	- Mark

Mark Lentczner
markl at wheatfarm.org
http://www.wheatfarm.org/



 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/
 



More information about the antlr-interest mailing list