[antlr-interest] Error in predicate logic
Gerald B. Rosenberg
gbr at newtechlaw.com
Tue Mar 13 15:26:40 PDT 2007
At 01:46 PM 3/13/2007, Terence Parr wrote:
>On Feb 15, 2007, at 9:42 AM, Gerald B. Rosenberg wrote:
>Trying to parse HTML to recognize the ampersand encoded special
>>characters.
>>
>>SPCHAR
>>: ( AMP GRIDLET INT SEMI ) => AMP GRIDLET INT SEMI
>>| ( AMP LETTERS SEMI ) => AMP LETTERS SEMI
>>| ( AMP ) => AMP { $type=PCCHAR; }
>>;
>>
>>fragment PCCHAR : LETTER | DIGIT | PUNCTUATION | '>' | '/' ;
>>fragment LETTERS : (LETTER)+ ;
>>fragment LETTER : 'a'..'z'| 'A'..'Z';
>>fragment AMP: '&' ;
>>fragment GRIDLET: '#' ;
>>fragment SEMI: ';' ;
>>
>>The SPCHAR rule works as expected for input that:
>>- fully matches either of the first two subrules
>>- matches "&X" -- where "X" is anything *other* than a GRIDLET or a
>>LETTER
>>
>>For input "&hello there" , I get line 1:6 mismatched character ' '
>>expecting ';'
>
>There is no space allowed in LETTERS; it therefore fails to match alt 2.
>Ter
<rhetorical>
But... LETTERS also does not include a GRIDLET. That is, when alt 1
fails, alt 2 is tried. When alt 2 fails, as it must at the space,
Antlr generates the error rather than moving on to alt 3 which should
succeed with an adequate match just to the AMP.
</rhetorical>
Understanding the nuances of syntactic predicates is
non-trivial. Could not find a 'good' definition out there in the
wild. So, here is the best succinct formulation of the logic of a
syntactic predicate I have come up with so far (rather doubt that I
have gotten it entirely right):
Syntactic Predicate:
Acts as an alternative subrule selector in the run-time
evaluation of a rule. Within a rule, all of the alternative
predicate predictions are evaluated, in parallel and without
backtracking (also without the invocation of any subrule actions),
over an input data sequence with the result that:
- the prediction that actually matches the input data
sequence, wins for its subrule
- if no predicate prediction is matched by the input data
sequence, there will be a run-time non-determinism
- the path of each predicate prediction must be unique,
otherwise there is a compile-time ambiguity (the path of one
predicate prediction must not be a perfect left aligned subset of
another predicate prediction )
Note: a syntactic predicate is not an if-then selector! In the
sequential evaluation of the input data, once a decision point
between any two predicate predictions has been passed, the 'path not
taken' alternative is eliminated from the set of viable subrules.
Note: when a predicate prediction is matched, the input data
sequence is re-wound and evaluated using just the corresponding
subrule with actions applied; the subrule need not actually match
the predicate prediction !
Note: if a rule also contains alternate subrules not qualified by a
predicate, evaluation of those subrules are deferred until a
predicate winner is found, and then evaluated in combination with the
predicate winning subrule (???).
------------
If something like this (when corrected) would be a helpful addition
to the Wiki, let me know and I will post it with examples.
Thanks,
Gerald
----
Gerald B. Rosenberg, Esq.
NewTechLaw
260 Sheridan Ave., Suite 208
Palo Alto, CA 94306-2009
650.325.2100 (office) / 650.703.1724 (cell)
650.325.2107 (facsimile)
www.newtechlaw.com
CONFIDENTIALITY NOTICE: This email message (including any
attachments) is being sent by an attorney, is for the sole use of the
intended recipient, and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or
distribution is prohibited. If you are not the intended recipient,
please contact the sender immediately by reply email and delete all
copies of this message and any attachments without retaining a copy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20070313/6ad29eae/attachment-0001.html
More information about the antlr-interest
mailing list