[antlr-interest] Error in predicate logic

Gerald B. Rosenberg gbr at newtechlaw.com
Thu Feb 15 09:42:55 PST 2007


Did I just run into the same bug, or is this a different problem? Either
way, how do I fix it?

I am trying to parse HTML and recognize the ampersand-encoded special
characters (e.g. "&amp;" or "&#169;").

SPCHAR
: ( AMP GRIDLET INT SEMI ) => AMP GRIDLET INT SEMI
| ( AMP LETTERS SEMI ) => AMP LETTERS SEMI
| ( AMP ) => AMP { $type=PCCHAR; }
;

fragment PCCHAR : LETTER | DIGIT | PUNCTUATION | '>' | '/' ;
fragment LETTERS : (LETTER)+ ;
fragment LETTER  : 'a'..'z'| 'A'..'Z';
fragment AMP: '&' ;
fragment GRIDLET: '#' ;
fragment SEMI: ';' ;

The SPCHAR rule works as expected for input that:
- fully matches either of the first two subrules
- matches "&X" -- where "X" is anything *other* than a GRIDLET or a LETTER

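For the input "&hello there", I get:

    line 1:6 mismatched character ' ' expecting ';'

I expected the third subrule to match here and emit the '&' as a PCCHAR.
The error suggests the lexer instead committed to the second subrule and
then failed on the missing ';'.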
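
To make the intended behavior concrete, here is a minimal Python sketch of
the ordering I expect from the three subrules. It is not ANTLR output and
not a fix for the grammar: the function name match_spchar and the tuple
return shape are hypothetical, and INT is assumed to mean one or more
decimal digits, since its definition is not shown above.

```python
import re

# Assumption: INT is one or more decimal digits (its rule is not shown in the post).
NUMERIC_ENTITY = re.compile(r"&#[0-9]+;")   # AMP GRIDLET INT SEMI
NAMED_ENTITY = re.compile(r"&[A-Za-z]+;")   # AMP LETTERS SEMI


def match_spchar(text, pos=0):
    """Hypothetical helper: classify the '&' at text[pos].

    Tries the alternatives in the same order as the SPCHAR rule and only
    commits to one when the whole alternative matches, which is what the
    syntactic predicates are meant to guarantee.
    """
    if not text.startswith("&", pos):
        return None  # SPCHAR does not apply at this position
    for pattern in (NUMERIC_ENTITY, NAMED_ENTITY):
        m = pattern.match(text, pos)
        if m:
            return ("SPCHAR", m.group())
    # Neither full entity form matched: consume only '&' and retype it.
    return ("PCCHAR", "&")


if __name__ == "__main__":
    for sample in ("&#169;", "&amp;", "&hello there", "&<"):
        print(repr(sample), "->", match_spchar(sample))
```

With this ordering, "&hello there" falls through to the last branch and
yields a lone '&' typed as PCCHAR, which is the result I was expecting
from the lexer.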

Thanks,
Gerald




At 05:53 PM 2/14/2007, you wrote:
>Hi Harmut,
>
>Hmm...that is suspicious:
>
>ID   :
>     (AA DIGIT) => AA
>   | (AAB DIGIT) => AAB
>   | ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')+
>   ;
>
>AAB collides with 3rd alt as does AA so it should test synpred1 then
>synpred2 then 3rd alt (no pred).  Odd.
>
>Adding to bug db.
>
>Ter

----
Gerald B. Rosenberg, Esq.
NewTechLaw
285 Hamilton Avenue, Suite 520
Palo Alto, CA  94301-2576

650.325.2100  (office)  /  650.703.1724  (cell)
650.325.2107  (fax)

www.newtechlaw.com