[antlr-interest] Ada Grammar Question

Mark Wright markwright at internode.on.net
Mon Jul 21 06:49:22 PDT 2008


Hello Joseph,

One idea is to write disambiguating semantic predicates in the
target language that work out the answer, and call them from the
lexer rules, something like:

TIC: {isTIC(input)}? '\'';
CHARACTER_LITERAL: {isCharacterLiteral(input)}? '\'' . '\'';

I have never tried using disambiguating semantic predicates in
a lexer. I believe they work there, but I am not sure.
It is certainly different in the lexer, because the predicate then
looks at individual characters rather than at tokens (as a
disambiguating semantic predicate in the parser does).

The disambiguating semantic predicates can then do things like
the following to work out the answer. I don't know exactly
what you need to do, as I am not familiar enough with the Ada grammar.

- Scan ahead, matching parentheses and ' characters, looking for
disambiguating characters, perhaps the ; (I am not sure which).

- If necessary, call other functions, which may call themselves recursively.
Disambiguating semantic predicates can be implemented as small hand-coded
recursive-descent parsers; hopefully you don't need to do that in
a lexer.
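The suggestions above scan ahead. A different heuristic, swapped in here and not tested against the v3 grammar, is to look back instead: in practice an Ada apostrophe that follows a name, a closing parenthesis, or the reserved word `all` is a TIC, and anything else starts a character literal. The Java sketch below assumes the predicates can see the raw source text and the index of the quote; `TickClassifier`, `isTic` and `isCharacterLiteral` are hypothetical names. The reserved-word list is there so that `when 'a'` or `return 'a'` are not mistaken for attributes.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helpers for the isTIC/isCharacterLiteral predicates sketched above.
// Heuristic: decide by looking BACK at the text before the quote, not ahead.
final class TickClassifier {
    // Ada 2005 reserved words (Ada 95's 69 plus interface, overriding, synchronized).
    private static final Set<String> RESERVED = Set.of(
        "abort", "abs", "abstract", "accept", "access", "aliased", "all", "and",
        "array", "at", "begin", "body", "case", "constant", "declare", "delay",
        "delta", "digits", "do", "else", "elsif", "end", "entry", "exception",
        "exit", "for", "function", "generic", "goto", "if", "in", "interface",
        "is", "limited", "loop", "mod", "new", "not", "null", "of", "or",
        "others", "out", "overriding", "package", "pragma", "private",
        "procedure", "protected", "raise", "range", "record", "rem", "renames",
        "requeue", "return", "reverse", "select", "separate", "subtype",
        "synchronized", "tagged", "task", "terminate", "then", "type", "until",
        "use", "when", "while", "with", "xor");

    private static boolean isIdentChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** True if the quote at src.charAt(pos) should be lexed as TIC. */
    static boolean isTic(CharSequence src, int pos) {
        int i = pos - 1;
        while (i >= 0 && Character.isWhitespace(src.charAt(i))) i--; // skip blanks
        if (i < 0) return false;              // nothing before it: must be a literal
        char prev = src.charAt(i);
        if (prev == ')') return true;         // e.g. F (X)'Length
        if (!isIdentChar(prev)) return false; // after ( , := => & etc.: a literal
        int end = i + 1;
        while (i >= 0 && isIdentChar(src.charAt(i))) i--;
        String word = src.subSequence(i + 1, end).toString().toLowerCase(Locale.ROOT);
        // A name (or "all", as in P.all'Address) is followed by a TIC;
        // other reserved words (when, is, return, ...) are followed by a literal.
        return word.equals("all") || !RESERVED.contains(word);
    }

    /** True if the quote at pos starts a three-character literal such as 'a'. */
    static boolean isCharacterLiteral(CharSequence src, int pos) {
        return !isTic(src, pos) && pos + 2 < src.length() && src.charAt(pos + 2) == '\'';
    }

    public static void main(String[] args) {
        String src = "VAR_1 := ArrayType'('a','b','c' => X, others => Y);";
        int tick = src.indexOf('\'');
        System.out.println(isTic(src, tick));                  // true: follows ArrayType
        System.out.println(isCharacterLiteral(src, tick + 2)); // true: 'a' follows (
    }
}
```

In a real lexer it would probably be more robust to remember the type of the previous token and test that, because this raw-text version can be fooled by, for example, a comment ending just before the quote.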

On Sun, 20 Jul 2008 20:18:33 -0400
"Joseph Klumpp" <jklumpp0 at vt.edu> wrote:

> I have recently been updating the Ada grammar from
> http://antlr.org/grammar/ada/ada.g to ANTLR v3. In testing this
> grammar against the Ada compiler test suite, I found that it
> fails for very specific constructs, all related to the Ada TIC mark
> being confused with CHARACTER_LITERAL (or vice versa). The
> rules are reproduced here:
> 
> TIC    : { LA(3)!='\'' }?  '\''    ;
>         // condition needed to disambiguate from CHARACTER_LITERAL
>
> CHARACTER_LITERAL    : { LA(3)=='\'' }? // condition needed to disambiguate from TIC
>        "'" . "'"
>        ;
> 
> 
> I rewrote these as:
> TIC: {LA(3) != '\''}?=> '\'';
> CHARACTER_LITERAL: {LA(3) == '\''}?=> '\'' . '\'';
> 
> This works fine except in constructs such as:
> VAR_1 := ArrayType'('a','b','c' => X, others => Y);
> 
> In these situations the tick before the open parenthesis is lexed
> as the start of the character literal '(' rather than as the TIC
> mark it should be. Any help in distinguishing this mark from
> character literals would be greatly appreciated.

