[antlr-interest] Confusing, hopefully-final, problems

Gavin Lambert antlr at mirality.co.nz
Sun Mar 1 23:33:24 PST 2009


At 11:21 2/03/2009, Sam Barnett-Cormack wrote:
 >NUMBER : ;

Defining non-fragment rules that can successfully match nothing 
(i.e. the empty string) is Bad(tm).
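
To see why, here is a rough Python model of a tokenizer loop. It is 
not ANTLR code: the rule table, the first-match strategy and the 
max_tokens guard are all assumptions made for illustration.  The point 
is only that a rule which succeeds on the empty string produces a 
token without consuming input, so the loop never advances.

```python
import re

# Hypothetical rules for illustration; NUMBER mirrors the quoted
# "NUMBER : ;" in that it can succeed on an empty string.
RULES = [("NUMBER", re.compile(r"[0-9]*")), ("ID", re.compile(r"[a-z]+"))]

def tokenize(text, max_tokens=5):
    """First-match tokenizer; max_tokens guards against an endless loop."""
    pos, out = 0, []
    while pos < len(text) and len(out) < max_tokens:
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m:  # an empty match still counts as a success
                out.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError("no rule matches at position %d" % pos)
    return out

# On "abc" the NUMBER rule keeps matching "" at position 0, so only the
# guard stops the loop and ID is never reached.
stuck = tokenize("abc")
```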

 >BSTRING : '\'' BSTRINGCONT '\'B';
[...]
 >HSTRING : '\'' HSTRINGCONT '\'H';

These are going to give you grief.  The lexer cannot backtrack, 
and since HSTRINGCONT/BSTRINGCONT can be arbitrarily long, it 
cannot determine sufficient static lookahead to disambiguate 
these automatically.

What you should do is to define a generic 
single-quoted-string-with-optional-trailing-B/H lexer rule, and 
then either put some trailing code in the lexer rule to check the 
content and change the type (or report an error), or defer that 
until parse time.
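
As a sketch of the "match generically, then check the content" idea, 
here it is in Python rather than as an ANTLR action.  The regular 
expression, the classify function and the QUOTED type name are 
hypothetical, and the allowed characters (0/1 for B, upper-case hex 
for H, whitespace ignored) are my assumption about what your 
BSTRINGCONT/HSTRINGCONT permit.

```python
import re

# Assumed token shape: '...' optionally followed by B or H.
QUOTED = re.compile(r"'([^']*)'([BH]?)")

def classify(text):
    """Return (token_type, content) for one quoted literal, or raise."""
    m = QUOTED.fullmatch(text)
    if m is None:
        raise ValueError("not a quoted string: %r" % text)
    content, suffix = m.group(1), m.group(2)
    if suffix == "B":
        allowed, kind = "01", "BSTRING"
    elif suffix == "H":
        allowed, kind = "0123456789ABCDEF", "HSTRING"
    else:
        # No suffix: leave the type generic and let the parser decide.
        return ("QUOTED", content)
    bad = [c for c in content if c not in allowed and not c.isspace()]
    if bad:
        raise ValueError("invalid character %r in %s" % (bad[0], kind))
    return (kind, content)
```

The suffix is only looked at after the closing quote has been seen, so 
no lookahead across the string body is needed.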

 >fragment
 >CSTRINGNL : WSNONL* NL WSNONL* {setText("");};

setText has no effect in fragment rules.

 >fragment
 >XMLATTVAL : XMLDATTVAL | XMLSATTVAL ;
 >
 >fragment
 >XMLATTRIB : XMLNAME '=' XMLATTVAL ;
 >
 >fragment
 >WSBLOCK : WS+;
 >
 >fragment
 >XMLATTRIBS
 >      : XMLATTRIB
 >      | (XMLATTRIB WS)=>XMLATTRIB WSBLOCK XMLATTRIBS;
 >
 >fragment
 >XMLTAGATTS
 >      : WSBLOCK XMLATTRIBS ;
 >
 >fragment
 >XMLOPENTAG : '<' XMLNAME XMLTAGATTS? WS* '>';
 >
 >fragment
 >XMLCLOSETAG : '</' XMLNAME '>';
 >
 >fragment
 >XMLSCLOSETAG : '<' XMLNAME XMLTAGATTS? WS* '/>';
 >
 >fragment
 >XMLNONEMPTYELEMENT : XMLOPENTAG XMLCONTENT XMLCLOSETAG;
 >
 >fragment
 >XMLEMPTYELEMENT : XMLSCLOSETAG;
 >
 >fragment
 >XMLELEMENT options {
 >  backtrack=true;
 >}
 >      : XMLEMPTYELEMENT | XMLNONEMPTYELEMENT ;
 >
 >fragment
 >XMLCONTENT : (XMLELEMENT | XMLENTREF | ~INVALIDINXML) *;
 >
 >XMLFRAG : XMLELEMENT;

These really seem like they shouldn't be lexer rules.  (Or 
possibly that you should go look at the island grammar example.)

 >extensionAdditionGroup : '[[' versionNumber componentTypeList ']]'
 >;
[...]
 >tag : '[' encodingReference class classNumber ']' ;

You should be very careful when using quoted literals in parser 
rules (in fact if you're not used to their quirks you should 
probably avoid using them).

The above will define four new lexer rules, similar to these:
T62: '[[';
T63: ']]';
T64: '[';
T65: ']';

In particular, note that the '[[' and ']]' produce unique tokens, 
not two occurrences of the '[' or ']' token.  This in turn means 
that if you happen to have [[ or ]] in your input where the 
grammar is expecting [ [ or ] ], then it will fail.  (This, 
incidentally, is the same issue behind C++'s need to be careful 
with >s when nesting templates, where >> lexes as a single shift 
operator.)
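
A small Python model of the effect, assuming a greedy longest-match 
scan over just these four literals (an approximation of what the 
generated lexer does, not its actual code; the lex function is 
hypothetical and the token names are the ones listed above):

```python
# Implicit literal tokens as named in the list above.
LITERALS = {"[[": "T62", "]]": "T63", "[": "T64", "]": "T65"}

def lex(text):
    """Longest-match scan over bracket literals; whitespace is skipped."""
    out, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        # Try the two-character literals before the one-character ones.
        for lit in sorted(LITERALS, key=len, reverse=True):
            if text.startswith(lit, pos):
                out.append(LITERALS[lit])
                pos += len(lit)
                break
        else:
            raise ValueError("unexpected character %r" % text[pos])
    return out

together = lex("[[ ]]")    # adjacent brackets fuse into T62 / T63
apart = lex("[ [ ] ]")     # separated brackets stay as T64 / T65
```

A parser rule written in terms of '[' and ']' only ever accepts the 
second token sequence, whatever the characters looked like.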

 >[18:09:42] warning(200): ASN_1.g:518:15: Decision can match input
 >such as "'...'" using multiple alternatives: 1, 2
 >As a result, alternative(s) 2 were disabled for that input
 >[18:09:42] error(201): ASN_1.g:518:15: The following alternatives
 >can never be matched: 2

This one is fairly self-explanatory.  The rule in question is 
this:
setType : '{' (componentTypeLists | extensionAndException 
optionalExtensionMarker)? '}' ;

Now, let's look at the alternatives.  The optional group must 
match either 'componentTypeLists' or 'extensionAndException 
optionalExtensionMarker'.  extensionAndException must begin with 
a '...' token.  Now let's drill into componentTypeLists.  One of 
its alternatives is ctlExtensionStuff, and that also begins with 
an extensionAndException.  (It can also be followed by an 
optionalExtensionMarker.)  So now that's two alts with a common 
left prefix -- one must therefore die.  The second message (the 
error) is basically saying that they don't just have a common 
prefix -- one is actually a subset of the other.
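
If it helps to see the overlap mechanically, here is a Python sketch 
that computes FIRST sets for a cut-down version of these rules.  The 
GRAMMAR table is an assumption: I've guessed minimal bodies for 
componentTypeList, ctlExtensionStuff and optionalExtensionMarker 
purely so the example runs, and a plain FIRST-set check is simpler 
than the analysis ANTLR really performs.

```python
# Simplified, assumed rule bodies; only the rule names come from the
# grammar under discussion.
GRAMMAR = {
    "componentTypeLists": [["componentTypeList"], ["ctlExtensionStuff"]],
    "componentTypeList": [["ID"]],
    "ctlExtensionStuff": [["extensionAndException", "optionalExtensionMarker"]],
    "extensionAndException": [["..."]],
    "optionalExtensionMarker": [[",", "..."], []],
}

def first(symbols):
    """FIRST set of a symbol sequence; None in the set means 'can be empty'."""
    result = set()
    for sym in symbols:
        if sym not in GRAMMAR:          # terminal: it starts the sequence
            result.add(sym)
            return result
        sym_first = set()
        for alt in GRAMMAR[sym]:
            sym_first |= first(alt)
        result |= sym_first - {None}
        if None not in sym_first:       # sym cannot vanish, so stop here
            return result
    result.add(None)                    # every symbol could be empty
    return result

alt1 = first(["componentTypeLists"])
alt2 = first(["extensionAndException", "optionalExtensionMarker"])
conflict = (alt1 & alt2) - {None}       # tokens that start both alternatives
```

Here conflict contains only '...', which is the input named in the 
warning.  Because alternative 2 is the same sequence that alternative 
1 reaches via ctlExtensionStuff, more lookahead cannot separate them.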

I'm sure you'll find that the other errors occur for similar 
reasons.  (It's also interesting to note that most of them occur 
in places where you've turned backtracking on.  You should usually 
try to avoid doing that, in favour of rewriting your grammar to 
remove the ambiguities.)


