[antlr-interest] Real simple grammar - newbie help?!
Gerald Rosenberg
gerald at certiv.net
Sun Feb 7 12:34:27 PST 2010
The more general approach is to broadly characterize key characters
(DOT) and character strings (UPPER_WORD, LOWER_WORD, WORD) in the lexer
and use the parser to create a well-structured AST. Don't do much, if
any, analysis in the parser. You then use multiple tree-pattern matchers
to identify key tokens and token sequences in context, with each
tree-pattern matcher implementing a discrete analysis rule or a closely
related set of rules. This makes the system easily adaptable to changes in
the keyword set and the recognition contexts.
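
To make the idea concrete, here is a rough sketch in plain Python rather
than ANTLR. It is only an illustration of the division of labour
described above, under several assumptions: the token names come from
this post, but the regular expressions, the helper names (tokenize,
find_dotted_upper) and the flat token list standing in for an AST are
all invented for the example. Whitespace is kept as a token so that
"FOO.BAR" can be told apart from a sentence-ending period followed by an
upper-case word.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Broad lexer categories (assumed definitions). Alternatives are tried in
# order, so the more specific word classes come before the generic WORD.
TOKEN_SPEC = [
    ("UPPER_WORD", r"[A-Z]+(?![A-Za-z])"),
    ("LOWER_WORD", r"[a-z]+(?![A-Za-z])"),
    ("WORD", r"[A-Za-z]+"),
    ("DOT", r"\."),
    ("WS", r"\s+"),
    ("OTHER", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))


def tokenize(text: str) -> List[Token]:
    """Lexer stage: classify characters broadly, with no keyword knowledge."""
    return [Token(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def find_dotted_upper(tokens: List[Token]) -> List[str]:
    """One analysis rule: UPPER_WORD DOT UPPER_WORD with nothing in between."""
    hits = []
    i = 0
    while i + 2 < len(tokens):
        a, dot, b = tokens[i:i + 3]
        if (a.kind, dot.kind, b.kind) == ("UPPER_WORD", "DOT", "UPPER_WORD"):
            hits.append(a.text + dot.text + b.text)
            i += 3  # skip past the matched sequence
        else:
            i += 1
    return hits
```

With these assumed definitions,
find_dotted_upper(tokenize("See ACME.CORP for details. THEN stop."))
returns ['ACME.CORP'], while the period after "details" is left alone for
some other rule to handle. Each additional recognition context would be
another small matcher function over the same token stream.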
On 2/7/2010 12:00 PM, James Crowley wrote:
> Hi Gerald,
>
> Thanks so much for that. What about the scenario where we don't know
> what the keywords were specifically - just the format they appear in
> (i.e. to group just that something upper case with a period in the
> middle)... whilst still retaining other behaviours around periods if
> they appear elsewhere? Is this then getting too difficult within the
> constraints of what context-free grammars can do?
>
> Many thanks for your help
>
> James
>
> On 6 February 2010 06:22, Gerald Rosenberg <gerald at certiv.net> wrote:
>
> While it may be heresy in the world of context-free grammars,
> Antlr actually performs quite nicely for many NLP problems.
>
> The illustrated approach works well for explicitly identifying a
> few key words in context. Just have to watch for the lexer
> functionally being k=1 and remember that the lexer rules apply
> top-down.
>
> There is a filter option if all you want to do is just find keywords.
>
>
> On 2/5/2010 4:45 PM, James Crowley wrote:
>
> Hi Michael,
>
> Thanks for the response. Sadly not - the language is English ;-) But just
> hoping to do some basic tokenization of paragraphs of text (essentially just
> extracting keywords) - thought it would be faster/easier to use a tool like
> ANTLR than using regex or attempting to roll my own. Am I being foolish for
> even attempting this?
>
> James
>
> On 5 February 2010 21:29, Michael Matera <mike.matera at xilinx.com> wrote:
>
>
>
>