[antlr-interest] Trouble with nondeterminism

Mon Aug 28 18:46:28 PDT 2006

On Tue, Aug 29, 2006 at 03:17:46AM +0200, Spálený Ivo wrote:
>Hi,
>
>Sub-rule
>
>  ('0'..'9')+ (Exponent)?
>
>can be split into:
>
>  ('0' | '1'..'9' ('0'..'9')*) // duplicates the INT token
>| ('0'..'9')+ Exponent         // duplicates an alternative of the DOUBLE branch
>| '0' ('0'..'9')+              // the unique piece of information in this sub-rule
>
>From ANTLR's point of view, nondeterministic input probably still results in deterministic output, because ANTLR disables alternatives. But isn't it better to know for sure whether "1" ends up as a DOUBLE or an INT token?
>
>Best regards,
>
>Ivo Spaleny

Hi,

Okay, that certainly makes sense.  I guess my question is what is the best way
to resolve these problems with common left prefixes, as one gets with
different numerical types.

After re-reading the section in the reference manual about predicates, I have
the following, which generates no warnings from ANTLR:

Constant
    :   ( ('0'..'9')+ '.') => ('0'..'9')+ '.' ('0'..'9')* (Exponent)?
            { $setType(DOUBLE); }
    |   '.' ('0'..'9')+ (Exponent)?
            { $setType(DOUBLE); }
    |   ( ('0'..'9')+ ('e' | 'E')) => ('0'..'9')+ Exponent
            { $setType(DOUBLE); }
    |   ('0' | ( '1'..'9' ('0'..'9')* ))
            { $setType(INT); }
    ;

Is that a "good" method for dealing with this problem?  I must also say that
even after reading that section and hacking together the above rules, I still
don't really understand how these predicate rules help ANTLR do its job.  It
still must grab the characters one at a time.  How do the predicates make the
task easier or, at the very least, unambiguous?
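
To make the question concrete, here is a rough Python sketch of how I picture
the rule above being applied: each predicate is tried as a speculative match
from the current position, and the first alternative that passes is taken.
This is only my reading of the manual, not ANTLR's generated code.  The name
classify_constant, the regexes, and the shape of Exponent are all made up for
illustration.

```python
import re

# Assumed shape of the Exponent rule; the real rule is not shown in this post.
EXPONENT = r"[eE][+-]?[0-9]+"

# (predicate, body, token type) for each alternative of Constant, in order.
# A predicate of None means the alternative has no syntactic predicate.
ALTERNATIVES = [
    (r"[0-9]+\.", r"[0-9]+\.[0-9]*(?:" + EXPONENT + r")?", "DOUBLE"),
    (None, r"\.[0-9]+(?:" + EXPONENT + r")?", "DOUBLE"),
    (r"[0-9]+[eE]", r"[0-9]+" + EXPONENT, "DOUBLE"),
    (None, r"0|[1-9][0-9]*", "INT"),
]


def classify_constant(text, pos=0):
    """Return (token_type, lexeme) for the constant at pos, or None."""
    for predicate, body, token_type in ALTERNATIVES:
        # Speculative step: scan ahead without consuming any input.
        if predicate is not None and not re.compile(predicate).match(text, pos):
            continue
        # Real match of the chosen alternative.
        match = re.compile(body).match(text, pos)
        if match:
            return token_type, match.group(0)
        # A real lexer would probably report an error at this point;
        # this sketch simply falls through to the next alternative.
    return None


if __name__ == "__main__":
    for sample in ["1", "1.5e3", ".5", "12E4"]:
        print(sample, classify_constant(sample))
```

If that picture is right, the predicate does not change how characters are
read; it only decides which alternative to commit to before anything is
consumed, so "1" can only ever come out as INT.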

-- 
--John Gruenenfelder    Research Assistant, UMass Amherst student
                        Systems Manager, MKS Imaging Technology, LLC.
Try Weasel Reader for PalmOS  --  http://gutenpalm.sf.net
"This is the most fun I've had without being drenched in the blood
of my enemies!"
        --Sam of Sam & Max