[antlr-interest] Legal Document Parsing. Can ANTLR help?

Wed Jul 15 06:15:42 PDT 2009

On Jul 15, 2009, at 11:26 AM, Marco Bagni wrote:

>
> Hi,
>
> I need to parse various legal documents syntactically in order to
> identify and extract each article and sub-paragraph.
>
> The documents are text like:
>
> Act. 123 Bla Bla Bla
>
> Art. 1
> (Article title)
>
> Article body with sub-paragraphs (at most three levels of
> sub-paragraphs, identified by numbers (1, 2, 3...), letters (a, b,
> c...) and Roman numerals (i, ii, iii, vi, etc.))
>
> Unfortunately, real life is a bit tougher than this: some documents
> use the string "Art.", others "Article"; sometimes the article title
> is present and sometimes not, and so on.
>
> Do you think ANTLR can help generate a parser that identifies and
> extracts the parts of a legal document, labelling each part with its
> proper place in the hierarchical structure?
>
> So far I am building a prototype in Perl, but given all the possible
> variations in the plethora of documents I have to "ingest", coding
> all the exceptions is quite cumbersome.
>
> Thanks for your support.
>
> Regards
>
> Marco Bagni
>
>

	You probably cannot use ANTLR, or indeed any other parser generator,
for this purpose. Parser generators are designed for computer
languages, not natural languages.
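One alternative to a grammar is a line-oriented classifier built on regular expressions. The Python sketch below is not ANTLR and is not taken from either message. The heading forms ("Art. 1", "Article 1"), the parenthesised title line and the "1)" / "a)" / "ii)" marker styles are assumptions, and `Node`, `parse`, `marker_level`, `dump` and the sample text are all hypothetical. Note how the marker "i" (ninth letter or first Roman numeral?) has to be settled by a guess. That is exactly the kind of exception handling that piles up.

```python
import re
from dataclasses import dataclass, field

# Assumed heading forms: "Art. 1", "Art 1", "Article 1" (case-insensitive).
ARTICLE_RE = re.compile(r"^\s*(?:Art\.?|Article)\s+(\d+)\b", re.IGNORECASE)
# Assumed optional title: a line wholly in parentheses right after the heading.
TITLE_RE = re.compile(r"^\s*\((.+)\)\s*$")
# Assumed sub-paragraph markers such as "1)", "(1)", "1.", "a)", "(b)", "ii)".
MARKER_RE = re.compile(r"^\s*\(?([0-9]+|[a-z]|[ivx]+)[.)]\s+(.*)$")
# Lowercase Roman numerals from i to xxxix.
ROMAN_RE = re.compile(r"^x{0,3}(?:ix|iv|v?i{0,3})$")


@dataclass
class Node:
    label: str
    level: int  # 0 = article, 1 = number, 2 = letter, 3 = Roman numeral
    title: str = ""
    text: list = field(default_factory=list)
    children: list = field(default_factory=list)


def marker_level(token, current_level, last_letter):
    """Heuristically map a marker token to a nesting level (None = not a marker)."""
    if token.isdigit():
        return 1
    if len(token) > 1:
        # Multi-character tokens can only be Roman numerals under MARKER_RE.
        return 3 if ROMAN_RE.match(token) else None
    if token in "ivx" and current_level == 3:
        return 3  # continuing a Roman list, e.g. "v)" after "iv)"
    if token == "i" and current_level == 2 and last_letter != "h":
        return 3  # guess: "i" under a letter other than "h" starts a Roman list
    return 2


def parse(document):
    """Split a document into articles with up to three levels of sub-paragraphs."""
    articles, stack, expect_title = [], [], False
    for line in document.splitlines():
        if not line.strip():
            continue
        heading = ARTICLE_RE.match(line)
        if heading:
            node = Node(f"Art. {heading.group(1)}", 0)
            articles.append(node)
            stack, expect_title = [node], True
            continue
        if not stack:
            continue  # preamble before the first article, e.g. "Act. 123 ..."
        marker = MARKER_RE.match(line)
        level = None
        if marker:
            last_letter = next((n.label for n in reversed(stack) if n.level == 2), None)
            level = marker_level(marker.group(1), stack[-1].level, last_letter)
        if level is not None:
            while stack[-1].level >= level:  # close deeper or sibling items
                stack.pop()
            node = Node(marker.group(1), level, text=[marker.group(2).strip()])
            stack[-1].children.append(node)
            stack.append(node)
        elif expect_title and (title := TITLE_RE.match(line)):
            stack[0].title = title.group(1).strip()
        else:
            stack[-1].text.append(line.strip())  # continuation of current item
        expect_title = False
    return articles


def dump(nodes, indent=0):
    """Print the tree, one node per line, indented by nesting depth."""
    for n in nodes:
        head = n.label + (f" ({n.title})" if n.title else "")
        body = ": " + " ".join(n.text) if n.text else ""
        print("  " * indent + head + body)
        dump(n.children, indent + 1)


# Hypothetical sample input, loosely modelled on the layout described above.
SAMPLE = """\
Act. 123 Bla Bla Bla

Art. 1
(Scope)
1) This Act applies to:
a) citizens;
b) residents, including:
i) students;
ii) workers.
2) Exceptions apply.

Article 2
Body without a title.
"""

if __name__ == "__main__":
    dump(parse(SAMPLE))
```

Every new document variant (a different heading word, markers like "1 -", titles without parentheses) means another pattern or another heuristic in `marker_level`, which is the maintenance cost described in the original question.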

Tommy Nordgren
tommy.nordgren at comhem.se