[antlr-interest] Legal Document Parsing. Can ANTLR help?

Wed Jul 15 02:26:13 PDT 2009

Hi,

I need to parse various legal documents syntactically in order to
identify and extract each article and sub-paragraph.

The documents are plain text, like this:

Act. 123 Bla Bla Bla

Art. 1
(Article title)

Article body with sub-paragraphs (at most three levels of
sub-paragraphs, identified by numbers (1, 2, 3...), letters (a, b,
c...) and Roman numerals (i, ii, iii, iv, etc.))
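
To make the intended hierarchy concrete, here is a minimal sketch of how the three levels could be told apart from the labels alone. It is in Python purely for illustration, and the names (marker_level, levels) and the heuristic for the letter "i" versus the numeral "i" are assumptions, not part of any existing prototype:

```python
import re

# Lowercase labels made only of i, v, x are treated as possible Roman numerals.
ROMAN = re.compile(r"^[ivx]+$")

def marker_level(label, prev_level, prev_label):
    """Return the nesting level (1, 2 or 3) for a sub-paragraph label."""
    if label.isdigit():
        return 1  # numbers: 1, 2, 3...
    if ROMAN.match(label):
        if prev_level == 3:
            return 3  # already inside a Roman-numeral list
        # An "i" that does not follow the letter "h" is assumed to open level 3.
        if prev_level == 2 and label == "i" and prev_label != "h":
            return 3
    return 2  # letters: a, b, c...

def levels(labels):
    """Assign a level to each label in document order."""
    result, prev_level, prev_label = [], 0, None
    for label in labels:
        prev_level = marker_level(label, prev_level, prev_label)
        prev_label = label
        result.append(prev_level)
    return result
```

The heuristic is deliberately naive: a label such as "i" is ambiguous on its own, so the sketch only looks at the previous label, and real documents would likely need more context than that.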

Unfortunately, real life is a bit tougher than this: some documents
use the string "Art.", others "Article"; sometimes the article title
is present, sometimes not; and so on.
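
As an illustration of the kind of tolerance needed, the following is a minimal sketch that accepts both header spellings and an optional title. It is in Python only for illustration; the patterns and the name split_articles are hypothetical, and it assumes the title, when present, is the first non-blank line after the header and is wrapped in parentheses:

```python
import re

# Accepts "Art. 1", "Art 1" and "Article 1" (case-insensitive); assumed shapes.
ARTICLE_HEADER = re.compile(r"^\s*(?:Art\.?|Article)\s+(?P<number>\d+)\s*$",
                            re.IGNORECASE)
# Optional title, assumed to be a single parenthesised line.
ARTICLE_TITLE = re.compile(r"^\s*\((?P<title>[^)]*)\)\s*$")

def split_articles(text):
    """Split a document into dicts with number, title (or None) and body."""
    articles = []
    current = None
    expect_title = False
    for line in text.splitlines():
        header = ARTICLE_HEADER.match(line)
        if header:
            current = {"number": int(header.group("number")),
                       "title": None, "body": []}
            articles.append(current)
            expect_title = True
            continue
        if current is None or not line.strip():
            continue  # skip the preamble (e.g. the Act heading) and blank lines
        if expect_title:
            expect_title = False
            title = ARTICLE_TITLE.match(line)
            if title:
                current["title"] = title.group("title")
                continue
        current["body"].append(line.strip())
    for article in articles:
        article["body"] = " ".join(article["body"])
    return articles
```

Every further variation means another pattern or another flag in a loop like this, which is exactly what makes the hand-coded approach hard to maintain.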

Do you think ANTLR can help generate a parser that identifies and
extracts the parts of the legal documents, labelling each part with
its proper place in the hierarchical structure?

So far I am building a prototype in Perl, but given all the possible
variations found in the plethora of documents I have to "ingest",
coding all the exceptions by hand seems quite cumbersome.

Thanks for your support.

Regards

Marco Bagni