[antlr-interest] Is there a way to separately categorize known HTML and unknown XML tags in the lexer?

Gerald B. Rosenberg gbr at newtechlaw.com
Sun Feb 11 22:49:46 PST 2007


ANTLR 3.0b6

I am trying to process a defined set of HTML tags while treating any
other XML tags encountered as "just text".  Categorizing the known HTML
tag types in the lexer seems cleaner, but it requires some kind of
catchall rule for the unknown XML tags.

Is there any way to do this in the lexer, or do I have to push it all up
into the parser?

Thanks,
Gerald

XML_TYPE
:  /* catchall for all tag types that are not a known BLOCK_TYPE or SIMPLE_TYPE... */
;

BLOCK_TYPE
: ( 'html'
| 'head'
| 'body'
| 'p'
| 'a'
...
| 'th'
| 'td' ){ if (debug) System.out.print("Block  "); }
;

SIMPLE_TYPE
: ( 'br'
| 'hr'
| 'col'){ if (debug) System.out.print("Simple "); }
;
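
One possible shape for the catchall, sketched under assumptions rather than taken from ANTLR itself: let a single generic lexer rule match any tag name, then pick the category by looking the matched text up in a table, with "unknown XML" as the default. The Java helper below shows only that lookup. `TagClassifier`, `TagType`, and `classify` are hypothetical names, the block set contains only the names actually listed above (the elided ones are not guessed), and how the result gets fed back into the lexer (for instance from a rule action) is left out because it depends on the ANTLR version.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of ANTLR: categorizes a tag name that a
// generic "any tag name" lexer rule is assumed to have already matched.
final class TagClassifier {

    // Mirrors the three categories in the grammar: BLOCK_TYPE, SIMPLE_TYPE, XML_TYPE.
    enum TagType { BLOCK, SIMPLE, XML }

    // Only the block tags spelled out in the grammar; the "..." entries are omitted.
    private static final Set<String> BLOCK_TAGS = new HashSet<>(Arrays.asList(
            "html", "head", "body", "p", "a", "th", "td"));

    private static final Set<String> SIMPLE_TAGS = new HashSet<>(Arrays.asList(
            "br", "hr", "col"));

    private TagClassifier() { }

    // Exact, case-sensitive match, like the string literals in the grammar.
    // Anything not in a known set falls through to the XML catchall.
    static TagType classify(String name) {
        if (BLOCK_TAGS.contains(name)) {
            return TagType.BLOCK;
        }
        if (SIMPLE_TAGS.contains(name)) {
            return TagType.SIMPLE;
        }
        return TagType.XML;
    }
}
```

With this sketch, `TagClassifier.classify("td")` gives `BLOCK`, `classify("br")` gives `SIMPLE`, and an unlisted name such as `"svg"` falls through to `XML`. Keeping the known names in a table instead of in separate lexer rules means the catchall is simply the default case, so no rule has to express "everything except these literals".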

----
Gerald B. Rosenberg, Esq.
NewTechLaw
285 Hamilton Avenue, Suite 520
Palo Alto, CA  94301-2576

650.325.2100  (office)  /  650.703.1724  (cell)
650.325.2107  (fax)

www.newtechlaw.com