[antlr-interest] should sempred questions be asked on trailing optional tokens?
Mark Wright
markwright at internode.on.net
Thu May 8 09:39:19 PDT 2008
> >simple_type_specifier
> > : {sa.isUnsigned((TokenStream)input)}?
> > 'unsigned'
> > | {sa.isUnsignedInt((TokenStream)input)}?
> > 'unsigned' 'int'
> > | {sa.isSigned((TokenStream)input)}?
> > 'signed'
> > | {sa.isSignedInt((TokenStream)input)}?
> > 'signed' 'int'
> > | {sa.isInt((TokenStream)input)}?
> > 'int'
> > ;
>
> On Fri, 09 May 2008 01:43:14 +1200
> Gavin Lambert <antlr at mirality.co.nz> wrote:
>
> Just out of curiosity, what are these sempred functions actually
> examining?
Hello Gavin,
They just simply look at the tokens with a hand coded
finite state machine. Which is like a hand coded version
of the code that ANTLR normally produces with simpler
grammars.
If the rest of the grammar was simpler, then ANTLR would
have no problems generating a DFA for this entire rule, which
is larger than the fragment above.
> Is it just the one-or-two token lookahead sufficient
> to disambiguate these alts,
Yes, but it is the rest of the grammar surrounding it
with ambiguities that require lots of dis-ambiguating
semantic predicates to resolve, that causes ANTLR to go
back to k=1 lookahead on this rule, requiring dis-ambiguating
semantic predicates on this rule.
I think ANTLR is awesome that it can handle my grammar.
Its no bother for me to write the simple dis-ambiguating
semantic predicates that are required on this rule.
> or are you looking even further ahead
> or doing something more esoteric?
No, not on this rule.
There are other rules that scan further ahead, including
scanning past stuff in this rule.
> Because if it's just those, then I'd use a synpred instead -- I
> think it's cleaner:
>
> simple_type_specifier
> : ('unsigned' 'int') => 'unsigned' 'int'
> | ('unsigned') => 'unsigned'
> | ('signed' 'int') => 'signed' 'int'
> | ('signed') => 'signed'
> | 'int'
> ;
>
> Or even:
>
> simple_type_specifier
> : (('unsigned') => 'unsigned') 'int'?
> | (('signed') => 'signed') 'int'?
> | 'int'
> ;
Sure if synpreds are cleaner in your grammar, and they probably are
cleaner in lots of grammars, then it sounds like a great idea to use
synpreds. Using synpreds is probably more appealing in lots of
grammars than hand coding thousands of lines of dis-ambiguating
semantic predicates using little recursive descent compilers,
however for my grammar the latter seemed more appealing to me.
There are reasons I am not using synpreds:
* I need to look up stuff in symbol tables do resolve ambiguities,
hence dis-ambiguating semantic predicates are necessary.
* I would have needed to use nested backtracking (nested synpreds).
At the time I was concerned that nested backtracking may be difficult
to understand and debug.
* I think for my grammar that using dis-ambiguating sempreds results
in it looking cleaner than it would with a combination of
dis-ambiguating sempreds and synpreds.
* I was concerned that I would find it difficult to understand and
debug the ANTLR generated parser code using synpreds. When I pass
-debug to ANTLR, it encounters an error. The netbeans debugger can
not open my 2.7MB generated parser Java file. I thought it would be
easier to debug code that I wrote than to debug ANTLR generated
parser code.
* I think it is easier for me to understand, develop and debug it
where I just use dis-ambiguating semantic predicates for everything.
* The dis-ambiguating semantic predicates are in a functional decomposition
that matches the grammar rule, and they call each other. So while I
was developing this, my logic was: why introduce a synpred when I can
just continue writing more dis-ambiguating semantic predicates, including
little recursive descent compilers. After writing thousands of lines of
dis-ambiguating sempreds, I did not need any synpreds.
Earlier when I writing lots of sempreds, and avoiding synpreds,
I was concerned that nested synpreds may be slow. However I
do not know if they would be any slower than what I am doing.
There is no way I am going to write it again using synpreds
to find out.
> Although normally you shouldn't even need to use any predicates
> here at all; just order the alts correctly.
Thanks, I tried commenting the sempreds and re-ordering the alts
starting with the longest ones first, however this gave the
same number of unmatched alts as the original ad hoc order with
the sempreds commented.
> I know you said they
> were required for some reason in your larger grammar, but the only
> explanation for that I can imagine at the moment is if 'int' were
> a valid variable name (which seems like a silly thing to
> allow). Maybe there's another case I haven't thought of?
If I just comment the one isInt() sempred, ANTLR
still compiles it OK, so it does not need that one.
If I comment the others though, then ANTLR tells me it is ambiguous, with
errors like:
java -Xmx512m -classpath /h/goanna/2/eng/dev/tntdbo/java_src:/h/goanna/2/eng/dev/tntdbo:/h/goanna/2/ts/antlr/antlr-2008-05-07.18/lib/antlr-2008-05-07.18.jar:/h/goanna/2/ts/antlr/antlr-2008-05-07.18/lib/runtime-2008-05-07.18.jar:/h/goanna/2/ts/antlr/antlr-2008-05-07.18/lib/stringtemplate-3.1b1.jar:/h/goanna/2/ts/antlr/antlr-2008-05-07.18/lib/antlr-2.7.7.jar org.antlr.Tool -Xconversiontimeout 1200000 -report Tntdbo.g
ANTLR Parser Generator Version 3.1b1 (??) 1989-2007
warning(200): Tntdbo.g:866:9: Decision can match input such as "UNSIGNED" using multiple alternatives: 4, 9, 10, 14, 15, 16, 17, 18, 19
As a result, alternative(s) 9,10,14,15,16,17,18,19 were disabled for that input
Semantic predicates were present but were hidden by actions.
error(201): Tntdbo.g:866:9: The following alternatives can never be matched: 9,10,14,15,16,17,18,19
So of course I put the sempreds back in and its fine.
Thanks, Mark
More information about the antlr-interest
mailing list