[antlr-interest] Legal Document Parsing. Can ANTLR help?

"Paul Bouché (NSN)" paul.bouche at nsn.com
Wed Jul 15 07:30:18 PDT 2009


Tommy Nordgren wrote:
> On Jul 15, 2009, at 11:26 AM, Marco Bagni wrote:
>
>   
>> Hi,
>>
>> I need to perform a syntactic parsing of various legal documents,
>> with the goal of identifying and extracting each article and
>> sub-paragraph.
>>
>> The documents are text like:
>>
>> Act. 123 Bla Bla Bla
>>
>> Art. 1
>> (Article title)
>>
>> Article body with sub-paragraphs (at most three levels of
>> sub-paragraphs, identified by numbers (1, 2, 3...), letters (a, b,
>> c...) and roman numerals (i, ii, iii, vi, etc.))
>>
>> Unfortunately, real life is a bit tougher than this: some documents
>> use the string "Art.", others "Article"; sometimes the article title
>> is present, sometimes not; and so on.
>>
>> Do you think that ANTLR can help in generating a parser that
>> identifies and extracts the parts of the legal documents, labelling
>> each part with the proper hierarchical structure?
>>
>> So far I am doing a prototype in Perl, but given all the possible
>> variations found in the plethora of documents I have to "ingest",
>> coding all the exceptions seems quite cumbersome.
>>
>> Thanks for your support.
>>
>> Regards
>>
>> Marco Bagni
>>
>>
>>     
>
> You probably cannot use ANTLR, or indeed any other parser generator,
> for this purpose.
> Parser generators are for computer languages, not natural languages.
> -----------------------------------
> See the amazing new SF reel: Invasion of the man eating cucumbers from  
> outer space.
> On congratulations for a fantastic parody, the producer replies :  
> "What parody?"
>   
Well, I once worked for a hotel booking company, and they (ab)used a spam
filter to automatically categorize all kinds of input that hotels entered
into their systems.
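
For the regular part of the format Marco describes, a line classifier is
easy to sketch. The following is only a minimal sketch in Python, assuming
headings look like "Art. 1" or "Article 1", titles sit in parentheses on
their own line, and sub-paragraph markers look like "1.", "a)" and "ii)".
None of that punctuation is specified in the original post, and the names
(classify, ARTICLE, ...) are made up for illustration. It is a plain regex
approach, not an ANTLR grammar and not the spam filter approach.

```python
import re

# Assumed patterns: the original post does not say which punctuation
# the real documents use, so these are illustrative guesses.
ARTICLE = re.compile(r"^(?:Art\.|Article)\s+(\d+)\s*$")
NUMBER = re.compile(r"^(\d+)[.)]\s+")
ROMAN = re.compile(r"^\(?([ivx]+)\)\s+")
LETTER = re.compile(r"^\(?([a-z])\)\s+")
TITLE = re.compile(r"^\((.+)\)$")


def classify(line):
    """Return a (kind, value) pair for one line of a legal document."""
    text = line.strip()
    # Order matters: sub-paragraph markers are tried before TITLE so that
    # "(a) foo (bar)" is not mistaken for a parenthesised title.
    for kind, pattern in (
        ("article", ARTICLE),
        ("number", NUMBER),
        ("roman", ROMAN),   # tried before LETTER, so "i)" counts as roman
        ("letter", LETTER),
        ("title", TITLE),
    ):
        match = pattern.match(text)
        if match:
            return (kind, match.group(1))
    # Anything else is treated as running body text.
    return ("body", text)


if __name__ == "__main__":
    sample = ["Art. 1", "(Article title)", "1. First paragraph",
              "a) first item", "ii) second sub-item", "Plain body text"]
    for line in sample:
        print(classify(line))
```

Note that a marker such as "i)" or "v)" is ambiguous between a letter and
a roman numeral. The sketch always treats it as roman; a real tool would
need the surrounding context (the nesting level, the previous marker) to
decide, and that is where the exceptions start to pile up.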
> Tommy Nordgren
> tommy.nordgren at comhem.se


-- 
Paul Bouché
