[antlr-interest] Error I can't get my head around at all...

Tue Mar 3 08:57:16 PST 2009

Sam Barnett-Cormack wrote:
> Hi all,
>
> I've wracked my brains on this one, and really can't see what's going on.
>
> I've got one error in my grammar, and I just can't see how it's come 
> about. I've been listening to everything people have said, I've been 
> trying to apply what I've learned, most notably drilling into rules 
> and expanding out to see why it says a choice is unreachable, but it 
> doesn't make sense.
>
> The rule with the error is on line 300:
>
> globalModuleReference : modulereference (objectIdentifierValue | 
> definedValue)? ;
>
> With the error:
>
> (201): The following alternatives can never be matched: 3
>
> Accompanying warnings:
>
> (200): Decision can match input such as "CAPID" using multiple 
> alternatives: 2, 3
> As a result, alternative(s) 3 were disabled for that input
>
> (200): Decision can match input such as "LCID" using multiple 
> alternatives: 2, 3
> As a result, alternative(s) 3 were disabled for that input
>
> The problem is, neither objectIdentifierValue nor definedValue is 
> nullable: objectIdentifierValue must start with '{', and definedValue 
> can start with CAPID or LCID. So where's the ambiguity? It's got a 
> choice of matching '{', CAPID, LCID, or stopping for this rule.
>
> Hoping someone can shed some light on this before I get too much of a 
> headache...
I tried to look at your grammar for you, but the one you attached is 
missing some token definitions:

...
[08:08:06] warning(105): ASN_1.g:456:7: no lexer rule corresponding to 
token: BSTRING
[08:08:06] warning(105): ASN_1.g:273:24: no lexer rule corresponding to 
token: NUMBER
[08:08:06] warning(105): ASN_1.g:420:16: no lexer rule corresponding to 
token: NUMBER
[08:08:06] warning(105): ASN_1.g:621:9: no lexer rule corresponding to 
token: NUMBER
[08:08:06] warning(105): ASN_1.g:465:30: no lexer rule corresponding to 
token: HSTRING

...lots of these. You won't get anywhere until you have defined all the 
tokens you are trying to use in the parser.

If you can post one that has all the tokens defined correctly, I will 
try to help. Here are some more tips:

Take your tokens out of the tokens section and make them real rules, as in:

 BY='BY';

Becomes a keyword token in the lexer def:

// Tokens...
//
 BY:  'BY';

etc..

// Existing lexer

Then resolve your token confusion - I think that you need:

fragment NUMBER:;
fragment HSTRING:;

to define those tokens; alternatively, you could rename NUMBERf to NUMBER.

You are trying to use backtrack in a lexer rule, and that does not work; 
backtrack is for the parser. You are also calling skip() in CSTRING 
(which drops the whole token) and then setting $channel=DEFAULT. I 
can't tell what you are trying to do there, but the rule is incorrect. 
Are you trying to remove the opening and closing " marks? skip() means 
don't produce a token at all, not skip the current character.

Format your rules so that the () : | and ; align. This will make sure 
you don't merge alts by accident, and it means you can follow the alts 
with your eyes much more easily.

Once I add the missing fragments, I can reproduce your error. However, 
I think that your CSTRING lexer rule neither matches what you think it 
does nor returns the text you want.

With formatting, it looks like this:

fragment
CSTRINGNL : WSNONL* NL WSNONL*;

CSTRING
    : '"'
      (
                   (WS)=> (      (CSTRINGNL)=>CSTRINGNL 
                           | WSNONL+
                         )
               | '"' '"'
               | ~'"'
          )*
        '"'
       {$channel=DEFAULT;}
       ;

This means that your second predicate, (CSTRINGNL)=>, does not 
disambiguate anything, and the first one, (WS)=>, is redundant. I think 
I see what you are trying to do:

CSTRING
    :    '"'
            (
                  (CSTRINGNL)=>CSTRINGNL
                | ('""')=>'""'
                | ~'"'
            )*
        '"'
    ;
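As a rough illustration of what the rule above accepts, and of the text you probably want back from it, here is a minimal Python sketch. It is not ANTLR-generated code; the function name scan_cstring is hypothetical, and it assumes (as the rule does) that a doubled "" inside the string stands for one literal quote. Returning the inner text without the outer quotes is a change to the token's text; skip() would instead throw the whole token away.

```python
def scan_cstring(text: str, pos: int = 0):
    """Scan a CSTRING starting at text[pos], mirroring
         '"' ( CSTRINGNL | '""' | ~'"' )* '"'
    Returns (end_index, value): value has the outer quotes removed and
    each doubled "" collapsed to a single ". Returns None if there is
    no complete string at pos (hypothetical helper, for illustration)."""
    if pos >= len(text) or text[pos] != '"':
        return None
    i = pos + 1
    chars = []
    while i < len(text):
        if text[i] == '"':
            if text[i + 1:i + 2] == '"':      # '""' alternative: escaped quote
                chars.append('"')
                i += 2
                continue
            return i + 1, "".join(chars)      # lone quote closes the string
        # ~'"' alternative; newlines are kept verbatim here, so the
        # CSTRINGNL alternative only matters if you treat them specially.
        chars.append(text[i])
        i += 1
    return None                               # end of input: unterminated
```

For example, scan_cstring('"say ""hi"""') consumes the whole input and yields the value say "hi".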

Next, your type rule. As mentioned in prior responses, this is happening 
because the normative grammar does not specify how to write a parser, 
so you have a dangling ambiguity with the optional constraint sequence. 
Because constraint starts with '(' and yet is optional, a '(' that 
follows a nested type could belong to either the inner or the enclosing 
type. ANTLRWorks shows you the path sequenceOfType builtinType 
constraint, but there are other possibilities.

I don't know how you want this to bind, of course, but assuming that 
the nearest match wins, and that you really want between zero and 
infinitely many constraints rather than just one, you can make the 
constraints bind to the type using:

type : (typeWithConstraint | builtinType | referencedType) 
(('(')=>constraint)* ;

Given that the sequence is zero or more, if this is not what you want, 
then your grammar is incorrect and the constraint is too low down in the 
rules ;-) However, given that this is probably here because you typed it 
in from some spec, that doubles the chances that this is what you want.
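To see what "nearest match wins" means for a greedy (('(')=>constraint)* loop, here is a tiny hand-written recursive-descent sketch in Python. The two-rule grammar and the token list are simplified, hypothetical stand-ins for your ASN.1 rules (a constraint is modelled as one token beginning with '('). Because the innermost call to parse_type reaches the '(' first and its loop is greedy, the constraint attaches to INTEGER rather than to the enclosing SEQUENCE OF.

```python
def parse_type(tokens, i=0):
    """Parse a type at tokens[i]; return (tree, next_index).

    Hypothetical, simplified grammar:
        type : base constraint*      -- like (('(')=>constraint)*
        base : 'SEQUENCE' 'OF' type
             | NAME
    A tree is (base, [constraints]); a constraint is a token starting with '('.
    """
    if tokens[i:i + 2] == ["SEQUENCE", "OF"]:
        inner, i = parse_type(tokens, i + 2)   # the inner type is parsed first...
        base = ("SEQUENCE OF", inner)
    else:
        base = tokens[i]
        i += 1
    constraints = []
    # ...so this greedy loop, in the innermost call, sees any '(' first.
    while i < len(tokens) and tokens[i].startswith("("):
        constraints.append(tokens[i])
        i += 1
    return (base, constraints), i
```

If you wanted the constraint on the outer type instead, the loop would have to stop before consuming it, which is the "constraint is too low down in the rules" case.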

Next, your globalModuleReference rule, which is again causing problems 
because (objectIdentifierValue | definedValue) is optional, so its start 
set overlaps with whatever can follow the rule. definedValue can be 
externalValueReference | valueReference, and valueReference is LCID. 
However, if that option is not taken, you can still reach LCID via 
valueAssignment, which starts with valueReference. This appears to be 
because there are no statement terminators in this grammar. That may 
be because you have hidden NL while the language is actually NL 
sensitive, which means that no rule with trailing optional elements can 
be disambiguated when those elements happen to match the start of a new 
statement. Are you sure that you should not be passing the NL through?
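The conflict can be checked by hand: a trailing optional element is only decidable with one token of lookahead if the tokens that can start it do not overlap with the tokens that can follow the rule. A minimal Python sketch of that check follows. The token sets are assumptions read off the (200) warnings and the explanation above, not computed from your grammar, and the NL-terminated follow set is a hypothetical showing why a statement terminator would remove the overlap.

```python
def lookahead_conflicts(first_of_optional, follow_of_rule):
    """Tokens for which a trailing (...)? is ambiguous with one token of
    lookahead: the parser can't tell 'enter the optional part' from
    'leave the rule'."""
    return set(first_of_optional) & set(follow_of_rule)

# FIRST of (objectIdentifierValue | definedValue), as described in the question.
FIRST_SUFFIX = {"'{'", "CAPID", "LCID"}

# Assumed FOLLOW of globalModuleReference with NLs hidden: the next statement
# (e.g. a valueAssignment starting with LCID) can come straight after it.
FOLLOW_HIDDEN_NL = {"CAPID", "LCID"}

# Hypothetical FOLLOW if NL were passed through as a statement terminator.
FOLLOW_WITH_NL = {"NL"}

print(sorted(lookahead_conflicts(FIRST_SUFFIX, FOLLOW_HIDDEN_NL)))  # ['CAPID', 'LCID']
print(sorted(lookahead_conflicts(FIRST_SUFFIX, FOLLOW_WITH_NL)))    # []
```

The first overlap is exactly the pair of inputs ANTLR complains about; the second is empty, which is why passing NL through (if the spec really makes it a terminator) would make the predicate unnecessary.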

Again, I cannot tell how this should associate, but I strongly suspect 
that it should associate immediately. The problem, though, is that 
without a statement terminator, adding a predicate here will rid you of 
the error but may consume something that merely looks like a value 
reference. As I type this, I think that the main problem is that you 
cannot hide NLs for this 'language', as they are significant as 
statement terminators. I tried to look up the specs, but every link on 
http://www.itu.int is broken, which figures, as it matches the 
language ;-)

So, you can solve the globalModuleReference ambiguities by creating 
predicates that include the start tokens, but this may not be correct. 
I suspect that this is not ambiguous in the language because of the NL. 
Read the spec and see if it says something like "A statement is 
terminated with a newline...." If it does, then you cannot hide the NLs, 
at least not without lots of tricks that you do not know about yet.

Two final things:

Once again, I urge you to start again and go a little more slowly; the 
reason that you cannot solve these things is that they are now nested 
too deeply in the grammar for you to see what is going on. Even trying 
to retrofit the newline handling (assuming this is the issue) is going 
to be difficult for you at this point.

Finally, I see that your email address is from Lancaster Poly. This 
isn't your homework/graduate project/thesis, is it? I have been caught 
by this before with someone from Sheffield Poly who offered to pay me, 
then disappeared after receiving the grammars. I await my Comp. Sci. 
degree from there with some impatience ;-)

Jim