[antlr-interest] Real simple grammar - newbie help?!

Sun Feb 7 12:34:27 PST 2010

The more general approach is to just broadly characterize key characters 
(DOT) and character strings (UPPER_WORD, LOWER_WORD, WORD) in the lexer 
and use the parser to create a well-structured AST.  Do little, if any, 
analysis in the parser.  You then use multiple tree pattern matchers 
to identify key tokens and token sequences in context, with each 
tree-pattern matcher implementing a discrete analysis rule or a closely 
related set of rules.  This makes the system easy to adapt to changes in 
the keyword set and the recognition contexts.
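
To make the idea concrete, here is a rough sketch in plain Python rather 
than ANTLR.  It is only an illustration under stated assumptions: a regex 
tokenizer stands in for the lexer, a flat token list stands in for the 
AST, and the two matcher functions stand in for tree pattern matchers.  
The token names come from the description above; WS, OTHER, and every 
function name are hypothetical.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Broad lexical categories only; no keyword list is baked into the lexer.
# Alternatives are tried in order, so the specific word shapes come before
# the catch-all WORD. WS and OTHER are assumed helper categories.
TOKEN_SPEC = [
    ("UPPER_WORD", r"[A-Z]+(?![A-Za-z])"),
    ("LOWER_WORD", r"[a-z]+(?![A-Za-z])"),
    ("WORD", r"[A-Za-z]+"),
    ("DOT", r"\."),
    ("WS", r"\s+"),
    ("OTHER", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(text: str) -> List[Token]:
    """Split text into coarse tokens, keeping whitespace so that the
    matchers can tell 'FOO.BAR' apart from 'FOO. BAR'."""
    return [Token(m.lastgroup, m.group()) for m in MASTER.finditer(text)]


def match_dotted_upper(tokens: List[Token]) -> List[str]:
    """Rule 1: UPPER_WORD DOT UPPER_WORD with nothing in between."""
    hits = []
    for i in range(len(tokens) - 2):
        a, dot, b = tokens[i:i + 3]
        if (a.kind, dot.kind, b.kind) == ("UPPER_WORD", "DOT", "UPPER_WORD"):
            hits.append(a.text + dot.text + b.text)
    return hits


def match_sentence_end(tokens: List[Token]) -> List[int]:
    """Rule 2: a DOT followed by whitespace or end of input.
    Returns the indexes of those DOT tokens."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.kind != "DOT":
            continue
        at_end = i + 1 == len(tokens)
        if at_end or tokens[i + 1].kind == "WS":
            hits.append(i)
    return hits


tokens = lex("Send ACME.CORP the report. Then call bob.")
print(match_dotted_upper(tokens))  # ['ACME.CORP']
print(match_sentence_end(tokens))  # [9, 16]
```

Each rule lives in its own small matcher, so adding a new recognition 
context means adding another matcher and leaving the lexer untouched.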

On 2/7/2010 12:00 PM, James Crowley wrote:
> Hi Gerald,
>
> Thanks so much for that. What about the scenario where we don't know 
> what the keywords are specifically - just the format they appear in 
> (i.e. to group just something upper case with a period in the 
> middle)... whilst still retaining other behaviours around periods if 
> they appear elsewhere? Is this then getting too difficult within the 
> constraints of what context-free grammars can do?
>
> Many thanks for your help
>
> James
>
> On 6 February 2010 06:22, Gerald Rosenberg <gerald at certiv.net 
> <mailto:gerald at certiv.net>> wrote:
>
>     While it may be heresy in the world of context-free grammars,
>     Antlr actually performs quite nicely for many NLP problems.
>
>     The illustrated approach works well for explicitly identifying a
>     few key words in context.  Just have to watch for the lexer
>     functionally being k=1 and remember that the lexer rules apply
>     top-down.
>
>     There is a filter option if all you want to do is just find keywords.
>
>
>     On 2/5/2010 4:45 PM, James Crowley wrote:
>
>         Hi Michael,
>
>         Thanks for the response. Sadly not - the language is English ;-)
>         But just hoping to do some basic tokenization of paragraphs of
>         text (essentially just extracting keywords) - thought it would be
>         faster/easier to use a tool like ANTLR than using regex or
>         attempting to roll my own. Am I being foolish for even attempting
>         this?
>
>         James
>
>         On 5 February 2010 21:29, Michael Matera
>         <mike.matera at xilinx.com <mailto:mike.matera at xilinx.com>> wrote:
>
>
>
>