[antlr-interest] Filtering out html tags and other questions.

BLade X blade_x123 at yahoo.com
Fri Feb 13 21:50:44 PST 2004


Hi,
I have an input file that is quite like HTML. I don't want
any of the HTML tags, only the text in the page. I am using
something like

options { filter=HTML_TAG; }
...
...
protected
HTML_TAG
	:	'<' (~'>')* '>'
		(
			(	// the usual newline hassle: \r\n can be matched in alt 1
				// or by matching alt 2 followed by alt 3 in another iteration.
				//
				 options {
					generateAmbigWarnings=false;
				}
			:	"\r\n" | '\r' | '\n'
			) 	{ newline();}
		)*
	|	( "\r\n" | '\r' | '\n' ) {newline();}
	|	.

	;

which I picked up from one of the examples shipped with
ANTLR. But whenever there is a closing tag like </html>, the
above does not work: it borks at the "</".
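
To make the desired result concrete, here is a minimal sketch in
Python (not ANTLR) of the output I am after. The name strip_tags is
hypothetical, and the regular expression only mirrors the shape of
'<' (~'>')* '>' from the rule above; it is an illustration of the
intended behaviour, not a fix for the lexer.

```python
import re

# Hypothetical illustration of the desired output only; this is not ANTLR.
# The pattern has the same shape as the lexer alternative '<' (~'>')* '>'.
TAG = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    """Drop anything that looks like <...>, closing tags included."""
    return TAG.sub("", text)
```

With this, opening and closing tags such as <html> and </html> are
both dropped and only the page text is left.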

My second problem: I have two constructs in my grammar,
a "##" and a line beginning with a single '#', as in

#indtag=file.ext

## some text ##

The first one is a tag that I need, much like a #define
statement. The second, "##", is a marker that indicates the
start of my input data. How do I differentiate between
them? I looked through most of the syntactic predicate
docs and previous mails and came up with this, but it
doesn't work.

file
    :   (
            data
        |   (
                ((HASH HASH)) => (HASH HASH) { marker(); }
            |   IND_TAG
            )
        )+
        EOF
    ;

where IND_TAG is,

IND_TAG
    :   '#' (~('\n'|'\r'))* '\n'
    ;

I get these error messages:
line 7:2: expecting ''#'', found ''i''
line 9:1: unexpected token: ##
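
For reference, the distinction I want can be sketched outside ANTLR
like this. It is a rough Python illustration under my own
assumptions: the function name classify and the labels it returns
are made up for the example, and it works on whole lines rather
than on a token stream.

```python
def classify(line: str) -> str:
    """Label one input line (hypothetical labels, for illustration only)."""
    # Check the two-character marker first, otherwise every "##" line
    # would also satisfy the single-'#' test below.
    if line.startswith("##"):
        return "MARKER"
    if line.startswith("#"):
        return "IND_TAG"
    return "DATA"
```

The order of the two checks is the whole point: the decision
depends on looking at the second character before committing to
the single-'#' case.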

Any suggestions?

Thanks in advance,
Manju



