[antlr-interest] Filtering out html tags and other questions.

Terence Parr parrt at cs.usfca.edu
Sat Feb 14 11:14:25 PST 2004


Hi.

You need the charVocabulary option in your lexer...  See any of the examples, like the java one.
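
A minimal sketch of that option in an ANTLR 2 lexer, assuming 8-bit input. The class name FilterLexer and the exact range are placeholders, not taken from your grammar:

```
class FilterLexer extends Lexer;   // hypothetical lexer name
options {
    filter = HTML_TAG;
    // Assumed range: adjust it to cover every character your input can contain.
    charVocabulary = '\3'..'\377';
}
```

The complement (~'>') and the wildcard (.) are computed against this vocabulary, so a character outside it will not be matched by either one.
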
Ter
On Feb 13, 2004, at 9:50 PM, BLade X wrote:

> Hi,
>  I have an input file which is quite like HTML,
> I don't want any of the HTML tags but only the text in
> the page. I am using something like
>
> options { filter=HTML_TAG; }
> ...
> ...
> protected
> HTML_TAG
> 	:	'<' (~'>')* '>'
> 		(
> 			(	// the usual newline hassle: \r\n can be matched in alt 1
> 				// or by matching alt 2 followed by alt 3 in another iteration.
> 				//
> 				 options {
> 					generateAmbigWarnings=false;
> 				}
> 			:	"\r\n" | '\r' | '\n'
> 			) 	{ newline();}
> 		)*
> 	|	( "\r\n" | '\r' | '\n' ) {newline();}
> 	|	.
>
> 	;
>
> which I picked from one of the examples given with
> antlr. But whenever there is a tag like </html> the
> above does not work. It borks at the "</".
>
> My second problem is, I have two things in my grammar.
> A "##" and a line beginning with a '#'. As in
>
> #indtag=file.ext
>
> ## some text ##
>
> The first one is a tag that I need much like a #define
> statement. The other "##" is a marker to indicate the
> start of my input data. How do I differentiate between
> them ? I looked up most of the syntactic predicate
> docs and previous mails and came up with this, but it
> doesn't work.
>
> file
>     :   (
>             data
>         |   (
>                 ((HASH HASH)) => (HASH HASH) { marker(); }
>             |   IND_TAG
>             )
>         )+
>         EOF
>
> where IND_TAG is,
>
> IND_TAG
>     :   '#' (~('\n'|'\r'))* '\n'
>     ;
>
> I get this error message,
> line 7:2: expecting ''#'', found ''i''
> line 9:1: unexpected token: ##
>
> Any suggestions ?
>
> Thanks in advance,
> Manju
--
Professor Comp. Sci., University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Co-founder, http://www.jguru.com
Co-founder, http://www.knowspam.net enjoy email again!
Co-founder, http://www.peerscope.com pure link sharing





 