[antlr-interest] Filtering out html tags and other questions.
BLade X
blade_x123 at yahoo.com
Fri Feb 13 21:50:44 PST 2004
Hi,
I have an input file which is quite like HTML. I don't want
any of the HTML tags, only the text in the page. I am using
something like
options { filter=HTML_TAG; }
...
...
protected
HTML_TAG
    :   '<' (~'>')* '>'
        (
            (   // the usual newline hassle: \r\n can be matched in alt 1
                // or by matching alt 2 followed by alt 3 in another iteration.
                options {
                    generateAmbigWarnings=false;
                }
            :   "\r\n" | '\r' | '\n'
            )   { newline(); }
        )*
    |   ( "\r\n" | '\r' | '\n' ) { newline(); }
    |   .
    ;
which I took from one of the examples that ship with
ANTLR. But whenever there is a closing tag like </html>,
the above does not work: it fails at the "</".
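To make the goal concrete, the behaviour I am after can be sketched in plain Java, outside ANTLR. This is only an illustration of the desired output, not a fix for the grammar; the class and method names (TagStripSketch, stripTags) are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper: shows the desired result of the filter, not how ANTLR does it.
class TagStripSketch {
    // A tag is '<', then any run of characters other than '>', then '>'.
    // This mirrors the '<' (~'>')* '>' alternative in HTML_TAG.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns the input with every tag removed, closing tags such as </html> included.
    static String stripTags(String input) {
        return TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // prints: Hello world
        System.out.println(stripTags("<html><b>Hello</b> world</html>"));
    }
}
```

Opening and closing tags are treated identically here, which is what I expected the lexer rule to do as well.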
My second problem is that I have two constructs in my grammar:
a "##" and a line beginning with a '#', as in
#indtag=file.ext
## some text ##
The first one is a tag that I need, much like a #define
statement. The other, "##", is a marker to indicate the
start of my input data. How do I differentiate between
them? I looked through most of the syntactic predicate
docs and previous mails and came up with this, but it
doesn't work.
file
    :   (   data
        |   (   ((HASH HASH)) => (HASH HASH) { marker(); }
            |   IND_TAG
            )
        )+
        EOF
    ;
where IND_TAG is,
IND_TAG
    :   '#' (~('\n'|'\r'))* '\n'
    ;
I get these error messages:
line 7:2: expecting ''#'', found ''i''
line 9:1: unexpected token: ##
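For clarity, the distinction I want between the two kinds of line can be sketched in plain Java. This is a sketch of the intended classification only, not ANTLR code; HashLineSketch, Kind and classify are hypothetical names, and it assumes the decision can be made from the first two characters of a line.

```java
// Hypothetical helper: classifies one input line the way I want the grammar to.
class HashLineSketch {
    enum Kind { MARKER, IND_TAG, DATA }

    static Kind classify(String line) {
        // Check "##" first: every "##" line also starts with a single '#',
        // so the longer prefix has to win.
        if (line.startsWith("##")) {
            return Kind.MARKER;
        }
        if (line.startsWith("#")) {
            return Kind.IND_TAG;
        }
        return Kind.DATA;
    }

    public static void main(String[] args) {
        System.out.println(classify("#indtag=file.ext")); // prints: IND_TAG
        System.out.println(classify("## some text ##"));  // prints: MARKER
    }
}
```

In other words, two characters of lookahead should be enough to tell the marker from the tag.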
Any suggestions?
Thanks in advance,
Manju