[antlr-interest] Error I can't get my head around at all...
Jim Idle
jimi at temporal-wave.com
Tue Mar 3 08:57:16 PST 2009
Sam Barnett-Cormack wrote:
> Hi all,
>
> I've wracked my brains on this one, and really can't see what's going on.
>
> I've got one error in my grammar, and I just can't see how it's come
> about. I've been listening to everything people have said, I've been
> trying to apply what I've learned, most notably drilling into rules
> and expanding out to see why it says a choice is unreachable, but it
> doesn't make sense.
>
> The rule with the error is on line 300:
>
> globalModuleReference : modulereference (objectIdentifierValue |
> definedValue)? ;
>
> With the error:
>
> (201): The following alternatives can never be matched: 3
>
> Accompanying warnings:
>
> (200): Decision can match input such as "CAPID" using multiple
> alternatives: 2, 3
> As a result, alternative(s) 3 were disabled for that input
>
> (200): Decision can match input such as "LCID" using multiple
> alternatives: 2, 3
> As a result, alternative(s) 3 were disabled for that input
>
> The problem is, neither objectIdentifierValue is nillable,
> objectIdentifierValue must start with '{', and defined value can start
> with CAPID or LCID. So where's the ambiguity? It's got a choice of
> matching '{', CAPID, LCID, or stopping for this rule.
>
> Hoping someone can shed some light on this before I get too much of a
> headache...
I was trying to look at your grammar for you but the one you have
attached is missing some token defs:
...
[08:08:06] warning(105): ASN_1.g:456:7: no lexer rule corresponding to
token: BSTRING
[08:08:06] warning(105): ASN_1.g:273:24: no lexer rule corresponding to
token: NUMBER
[08:08:06] warning(105): ASN_1.g:420:16: no lexer rule corresponding to
token: NUMBER
[08:08:06] warning(105): ASN_1.g:621:9: no lexer rule corresponding to
token: NUMBER
[08:08:06] warning(105): ASN_1.g:465:30: no lexer rule corresponding to
token: HSTRING
...lots of these. You won't get anywhere until you have defined all the
tokens you are trying to use in the parser.
If you can post one that has all the tokens defined correctly, I will
try to help. Here's some more tips:
Take your tokens out of the tokens section and make them real rules, as in:
BY='BY';
Becomes a keyword token in the lexer def:
// TOkens...
//
BY: 'BY';
etc..
// Existing lexer
Then resolve your token confusion - I think that you need:
fragment NUMBER:;
fragment HSTRING:;
To define those token, but you could rename NUMBERf to be NUMBER;
You are trying to use backtrack in a lexer rule, and that does not work,
backtrack is for parser. However you are also using skip() in CSTRING
(which will drop the whole token, then setting $channel = DEFAULT; I
can't tell what you are trying to do there, but the rule is incorrect.
Are you trying to remove the opening and closing " marks? skip() means
don't produce a token, not skip the current character.
Format your rules so that the () : | and ; align. This will make sure
you don't merge alts by accident and means you can follow the alts with
your eyes much easier.
Once I add the missing fragments, then I reproduce your error. However,
I think that your CSTRING lexer rule is neither matching what you think
it is, nor will return the text you want.
With formatting, it looks like this:
fragment
CSTRINGNL : WSNONL* NL WSNONL*;
CSTRING
: '"'
(
(WS)=> ( (CSTRINGNL)=>CSTRINGNL
| WSNONL+
)
| '"' '"'
| ~'"'
)*
'"'
{$channel=DEFAULT;}
;
Which means that you second predicate (CSTRINGNL)=> does not
disambiguate anything and the first WS is redundant. I think I see what
you are trying to do:
CSTRING
: '"'
(
(CSTRINGNL)=>CSTRINGNL
| ('""')=>'""'
| ~'"'
)*
'"'
;
Next, your type rule. As mentioned in prior responses, this is happening
because the normative grammar does not specify how to write a parser and
so you have dangling ambiguity with the optional constraint sequence.
Because constraint starts with '(' but yet is optional, it is ambiguous
after sequenceOfType builtinType constraint (which AW shows you, but
there other possibilities.
I don't know what how you want this to bind of course, but assuming that
nearest match wins, and that you really wnat between 0 and infinite
constraints and not just one, then you can have the constraints bind to
the type using:
type : (typeWithConstraint | builtinType | referencedType)
(('(')=>constraint)* ;
Given that the sequence is 0 or more, if this is not what you want, then
your grammar is incorrect and the constraint is too low down in the
rules ;-) However, given that this is probably here because you typed it
in from some spec, then it doubles the chances that this is what you want.
Next your globalModuleReference rule, which is causing problems again
because (objectIdentifierValue | definedValue) is optional so the start
set for these rules is ambiguous. definedValue can be
externalValueReference | valueReference and the valueReference is LCID.
However if that option is not taken then you can still get to LCID via
valueAssignment, which starts with valueReference. The reason for this
appears to be the fact that there are no statement terminators in this
grammar. This may be because you have hidden NL but the language is
actually NL sensitive, which means all rules with trailing optional
elements cannot be disambiguated if they happen to match the start of a
new statement. Are you sure that you should not be passing the NL through?
Again, I cannot tell where this should be associating but I strongly
suspect that it should associate immediately. The problem is though,
that without the statement terminator, then adding a predicate here will
rid you of the error, but may consume something that looks like a value
reference, and as I get to typing in this line, I think that the main
problem is that you cannot hide NLs for this 'language' as they are
significant as statement terminators. I tried to look up the specs but
every link on http://www.itu.int is broken, which figures as it matches
the language ;-)
So, you can solve the globalModuleReference ambiguities by creating
predicates, that include the start tokens, but this may not be correct.
I suspect that this is not ambiguous in the language because of the NL.
read the spec and see if it says something like "A statement is
terminated with a newline...." If it does then you cannot hide the NLs,
at least not without lots of tricks that you do not know about yet.
Two final things:
Once again, I urge you to start again and go a little more slowly - the
reason that you cannot solve these things is that they are nested too
deep in the makeup now for you to see what is going on. Even trying to
retro fit the newline stuff (assuming this is the issue) is going to be
difficult for you at this point.
Finally, I see that your email address is from Lancaster Poly. This
isn't your homework/graduate project/thesis is it? I have been caught by
this before with someone from Sheffield Poly who offered to pay me then
disappeared after receiving the grammars. I await my Comp. Sci. degree
from there with some impatience ;-)
Jim
More information about the antlr-interest
mailing list