[antlr-interest] Is there a way to separately categorize known HTML and unknown XML tags in the lexer?
Gerald B. Rosenberg
gbr at newtechlaw.com
Sun Feb 11 22:49:46 PST 2007
Antlr 3.0b6
I'm trying to process a defined set of HTML tags while treating any other
XML tags encountered as "just text". Categorizing the known HTML tag
types in the lexer seems cleaner, but it requires some kind of catchall
rule for the unknown XML tags.
Is there any way to do this, or do I have to push it all up into the parser?
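To make the intended behavior concrete, here is a minimal sketch. It is not ANTLR and not the author's code. It swaps in a plain `java.util.regex` scan for the lexer. Tags whose names appear in a hypothetical three-name `KNOWN` set become `TAG:` pieces, and any other tag is folded into the surrounding `TEXT:` piece unchanged. The regex is an assumption for illustration only. It does not handle comments, CDATA, or attribute values that contain `>`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagOrText {
    // Hypothetical set of "known" HTML tag names, chosen only for this sketch.
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList("p", "br", "td"));

    // Matches an opening, closing or self-closing tag and captures its name.
    static final Pattern TAG = Pattern.compile("</?([A-Za-z][\\w:.-]*)[^>]*>");

    // Splits input into "TAG:<raw>" and "TEXT:<raw>" pieces. Unknown tags are
    // appended to the surrounding text instead of becoming TAG pieces.
    static List<String> split(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        Matcher m = TAG.matcher(input);
        int last = 0;
        while (m.find()) {
            text.append(input, last, m.start());          // text before this tag
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (KNOWN.contains(name)) {
                if (text.length() > 0) {                  // flush pending text
                    out.add("TEXT:" + text);
                    text.setLength(0);
                }
                out.add("TAG:" + m.group());
            } else {
                text.append(m.group());                   // unknown tag: keep as literal text
            }
            last = m.end();
        }
        text.append(input, last, input.length());         // trailing text
        if (text.length() > 0) {
            out.add("TEXT:" + text);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [TAG:<p>, TEXT:Hi <x:note>there</x:note>, TAG:<br/>, TAG:</p>]
        System.out.println(split("<p>Hi <x:note>there</x:note><br/></p>"));
    }
}
```

The unknown `<x:note>` element and its content end up inside a single text piece, which is the "just text" treatment the question describes.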
Thanks,
Gerald
XML_TYPE
: /* catchall for all tag types that are not a known BLOCK_TYPE or SIMPLE_TYPE... */
;
BLOCK_TYPE
: ( 'html'
| 'head'
| 'body'
| 'p'
| 'a'
...
| 'th'
| 'td' ) { if (debug) System.out.print("Block "); }
;
SIMPLE_TYPE
: ( 'br'
| 'hr'
| 'col' ) { if (debug) System.out.print("Simple "); }
;
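One commonly used lexer pattern gives a catchall without enumerating the unknown names. The thread above does not confirm it. A single rule matches any tag name, and its action looks the matched text up in a table of known names and changes the token type. In ANTLR 3 this is usually written as an assignment to `$type` inside the action, with BLOCK_TYPE and SIMPLE_TYPE declared in a `tokens { ... }` section. Check the exact syntax against 3.0b6. The approach also sidesteps the overlap between fixed-literal rules like BLOCK_TYPE and a general name rule. The sketch below shows only the plain-Java lookup such an action would perform. `TagClassifier`, `TagKind` and `classify` are hypothetical names. The sets contain only the tag names visible in the post, because the original BLOCK_TYPE list is elided with "...". Lowercasing the name is also an assumption, made because HTML tag names are case-insensitive.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TagClassifier {
    // Hypothetical categories mirroring the grammar's rule names.
    enum TagKind { BLOCK_TYPE, SIMPLE_TYPE, XML_TYPE }

    // Subset of the BLOCK_TYPE names from the post (the full list is elided there).
    private static final Set<String> BLOCK_TAGS = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList("html", "head", "body", "p", "a", "th", "td")));

    // The SIMPLE_TYPE names from the post.
    private static final Set<String> SIMPLE_TAGS = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList("br", "hr", "col")));

    // Known names map to their category; everything else falls into XML_TYPE.
    static TagKind classify(String tagName) {
        String name = tagName.toLowerCase(Locale.ROOT);
        if (BLOCK_TAGS.contains(name)) {
            return TagKind.BLOCK_TYPE;
        }
        if (SIMPLE_TAGS.contains(name)) {
            return TagKind.SIMPLE_TYPE;
        }
        return TagKind.XML_TYPE;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"body", "BR", "td", "x:widget"}) {
            // prints: body -> BLOCK_TYPE, BR -> SIMPLE_TYPE, td -> BLOCK_TYPE, x:widget -> XML_TYPE
            System.out.println(t + " -> " + classify(t));
        }
    }
}
```

Because unknown names simply fall through to the default, adding a new known tag means adding one string to a set rather than another literal alternative to a lexer rule.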
----
Gerald B. Rosenberg, Esq.
NewTechLaw
285 Hamilton Avenue, Suite 520
Palo Alto, CA 94301-2576
650.325.2100 (office) / 650.703.1724 (cell)
650.325.2107 (fax)
www.newtechlaw.com