[antlr-interest] Filtering out html tags and other questions.
Terence Parr
parrt at cs.usfca.edu
Sat Feb 14 11:14:25 PST 2004
Hi.
You need the charVocabulary option in your lexer. See any of the example grammars, such as the Java one.
Ter
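
For reference, here is a minimal sketch of the lexer option this reply presumably refers to, in ANTLR 2 syntax. The lexer name TextLexer and the 8-bit character range are assumptions and do not come from the original grammar. Without charVocabulary, ANTLR 2 takes the vocabulary to be only the characters that actually appear in the lexer rules, so ~'>' may not cover a character such as '/'.

```
class TextLexer extends Lexer;      // hypothetical lexer name
options {
    // Assumed 8-bit input; widen the range if the input is Unicode.
    charVocabulary = '\3'..'\377';
    // Filter rule taken from the quoted grammar below.
    filter = HTML_TAG;
}
```

With an explicit vocabulary, ~'>' is computed against the full range rather than against the handful of characters the rules happen to mention.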
On Feb 13, 2004, at 9:50 PM, BLade X wrote:
> Hi,
> I have an input file which is quite like HTML.
> I don't want any of the HTML tags, only the text in
> the page. I am using something like
>
> options { filter=HTML_TAG; }
> ...
> ...
> protected
> HTML_TAG
> : '<' (~'>')* '>'
> (
> ( // the usual newline hassle: \r\n can be matched in alt 1
> // or by matching alt 2 followed by alt 3 in another iteration.
> //
> options {
> generateAmbigWarnings=false;
> }
> : "\r\n" | '\r' | '\n'
> ) { newline();}
> )*
> | ( "\r\n" | '\r' | '\n' ) {newline();}
> | .
>
> ;
>
> which I picked from one of the examples given with
> antlr. But whenever there is a tag like </html> the
> above does not work. It borks at the "</".
>
> My second problem is, I have two things in my grammar.
> A "##" and a line beginning with a '#'. As in
>
> #indtag=file.ext
>
> ## some text ##
>
> The first one is a tag that I need, much like a #define
> statement. The other, "##", is a marker to indicate the
> start of my input data. How do I differentiate between
> them? I looked up most of the syntactic predicate
> docs and previous mails and came up with this, but it
> doesn't work.
>
> file
> : (
> data
> | (
> ((HASH HASH)) => (HASH HASH) { marker(); }
> | IND_TAG
> )
> )+
> EOF
>
> where IND_TAG is,
>
> IND_TAG
> : '#' (~('\n'|'\r'))* '\n'
> ;
>
> I get this error message,
> line 7:2: expecting ''#'', found ''i''
> line 9:1: unexpected token: ##
>
> Any suggestions?
>
> Thanks in advance,
> Manju
--
Professor Comp. Sci., University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Co-founder, http://www.jguru.com
Co-founder, http://www.knowspam.net enjoy email again!
Co-founder, http://www.peerscope.com pure link sharing