[antlr-interest] Parsing HTML pages

Alexander Nicolaysen Sørnes alex at thehandofagony.com
Thu Jul 3 02:44:53 PDT 2008


On Thursday 03 July 2008, 11:32:43, Ana Nelson wrote:
> Hi, Alexander,
>
> It might be simpler to use an existing HTML parsing tool rather than
> writing your own.
>
> Here's one tool I really like, it's written in Ruby:
> http://code.whytheluckystiff.net/hpricot/
>

Thanks, I'll check it out.

> If you are trying to parse web pages, remember that lots of people
> write very bad HTML that doesn't conform to the standard, so it can
> be very difficult to parse.
>

Yes, that's why I thought something like ANTLR would be a good idea.



Alexander

> 2008/7/3 Alexander Nicolaysen Sørnes <alex at thehandofagony.com>:
> > Hello,
> >
> > Since I'm quite new to parsing, I'd just like to ask if ANTLR is a good
> > choice for extracting info out of HTML pages.  I greatly appreciate any
> > feedback.
> >
> >
> >
> > Alexander N. Sørnes



