[antlr-interest] Error in predicate logic

Gerald B. Rosenberg gbr at newtechlaw.com
Tue Mar 13 15:26:40 PDT 2007

At 01:46 PM 3/13/2007, Terence Parr wrote:

>On Feb 15, 2007, at 9:42 AM, Gerald B. Rosenberg wrote:
>Trying to parse HTML to recognize the ampersand encoded special
>>| ( AMP ) => AMP { $type=PCCHAR; }
>>fragment PCCHAR : LETTER | DIGIT | PUNCTUATION | '>' | '/' ;
>>fragment LETTERS : (LETTER)+ ;
>>fragment LETTER  : 'a'..'z'| 'A'..'Z';
>>fragment AMP: '&' ;
>>fragment GRIDLET: '#' ;
>>fragment SEMI: ';' ;
>>The SPCHAR rule works as expected for input that:
>>- fully matches either of the first two subrules
>>- matches "&X" -- where "X" is anything *other* than a GRIDLET or a
>>For input "&hello there" , I get line 1:6 mismatched character ' '
>>expecting ';'
>There is no space allowed in LETTERS; it therefore fails to match alt 2.

But... LETTERS also does not include a GRIDLET.   That is, when alt 1 
fails, alt 2 is tried.  When alt 2 fails, as it must at the space, 
Antlr generates the error rather than moving on to alt 3 which should 
succeed with an adequate match just to the AMP.

Understanding the nuances of syntactic predicates is 
non-trivial.  Could not find a 'good' definition out there in the 
wild.  So, here is the best succinct formulation of the logic of a 
syntactic predicate I have come up with so far (rather doubt that I 
have gotten it entirely right):

Syntactic Predicate:
         Acts as an alternative subrule selector in the run-time 
evaluation of a rule.  Within a rule, all of the alternative 
predicate predictions are evaluated, in parallel and without 
backtracking (also without the invocation of any subrule actions), 
over an input data sequence with the result that:
         - the prediction  that actually matches the input data 
sequence, wins for its subrule
         - if no predicate prediction is matched by the input data 
sequence, there will be a run-time non-determinism
         - the path of each predicate prediction must be unique, 
otherwise there is a compile-time ambiguity (the path of one 
predicate prediction must not be a perfect left aligned subset of 
another predicate prediction )

Note: a syntactic predicate is not an if-then selector!  In the 
sequential evaluation of the input data, once a decision point 
between any two predicate predictions has been passed, the 'path not 
taken' alternative is eliminated from the set of viable subrules.

Note: when a predicate prediction  is matched, the input data 
sequence is re-wound and evaluated using just the corresponding 
subrule with actions applied;  the subrule need not actually match 
the predicate prediction !

Note: if a rule also contains alternate subrules not qualified by a 
predicate, evaluation of those subrules are deferred until a 
predicate winner is found, and then evaluated in combination with the 
predicate winning subrule (???).

If something like this (when corrected) would be a helpful addition 
to the Wiki, let me know and I will post it with examples.

Gerald B. Rosenberg, Esq.
260 Sheridan Ave., Suite 208
Palo Alto, CA  94306-2009

650.325.2100  (office)  /  650.703.1724  (cell)
650.325.2107  (facsimile)


CONFIDENTIALITY NOTICE:  This email message (including any 
attachments) is being sent by an attorney, is for the sole use of the 
intended recipient, and may contain confidential and privileged 
information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, 
please contact the sender immediately by reply email and delete all 
copies of this message and any attachments without retaining a copy.  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20070313/6ad29eae/attachment-0001.html 

More information about the antlr-interest mailing list