[antlr-interest] Rematching AST Nodes
Courtney Falk
court at infiauto.com
Mon May 2 09:38:28 PDT 2011
On 5/2/2011 9:47 AM, Jim Idle wrote:
> I suspect that you are approaching this problem incorrectly in some way.
> Why do you feel you need to specify a new token at the AST stage? Why
> don't you restate your goal, ignoring what you have done so far - I
> suspect that we may be trying to solve a problem that you should not have.
Certainly. I was trying to keep things simple/short, but I can expand.
My project is a NLP tokenizer/parser. The first stage of functionality
is implemented in the FuzzyLexer and FuzzyParser grammars. They strip out
all punctuation and white space, preserving them as tokens and grouping
all the text between the punctuation/white space as "unspecified" tokens.
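
To make the stage 1 behaviour concrete, here is a minimal plain-Java sketch of the same idea. It is not the attached FuzzyLexer.g: the class name, the token kinds, and the rule that anything other than a letter, digit, or white space counts as punctuation are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for stage 1; the real grammar rules are in the attachments.
final class FuzzyTokenizerSketch {
    enum Kind { UNSPECIFIED, PUNCTUATION, WHITESPACE }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // Assumption: letters and digits are "text", everything else that is
    // not white space is punctuation.
    static Kind classify(char c) {
        if (Character.isWhitespace(c)) return Kind.WHITESPACE;
        if (Character.isLetterOrDigit(c)) return Kind.UNSPECIFIED;
        return Kind.PUNCTUATION;
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<Token>();
        int i = 0;
        while (i < input.length()) {
            Kind kind = classify(input.charAt(i));
            int start = i++;
            // Punctuation is emitted one character at a time; runs of
            // white space or text are grouped into a single token.
            if (kind != Kind.PUNCTUATION) {
                while (i < input.length() && classify(input.charAt(i)) == kind) {
                    i++;
                }
            }
            tokens.add(new Token(kind, input.substring(start, i)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("I have one dog, okay?")) {
            System.out.println(t);
        }
    }
}
```

Nothing is discarded in this sketch: punctuation and white space come out as tokens of their own, and everything between them is left as an "unspecified" token for a later stage to classify.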
Stage 1.5 is the language-specific composite grammar (Sentential.g),
which imports the Fuzzy* grammars. Here, I implement regular
expressions used in semantic predicates that attempt to categorize
"unspecified" tokens into relevant categories (see also,
LongNumber.java). For instance, the string "one" would be cast as a
long-form number token. Any "unspecified" tokens that don't match any
semantic predicates stay "unspecified" tokens.
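
The categorization step can be sketched as a regex check of the kind a semantic predicate might call. This is only an illustration under stated assumptions: the class name, the category names, and the word list are hypothetical and are not taken from the attached LongNumber.java or Sentential.g.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the real patterns live in the attached LongNumber.java.
final class CategorizerSketch {
    enum Category { UNSPECIFIED, LONG_NUMBER }

    // Assumed word list, deliberately short. Matching is case-insensitive.
    private static final Pattern LONG_NUMBER = Pattern.compile(
            "zero|one|two|three|four|five|six|seven|eight|nine|ten",
            Pattern.CASE_INSENSITIVE);

    // matches() requires the whole token text to match, so "someone"
    // is not mistaken for "one".
    static boolean isLongNumber(String text) {
        return LONG_NUMBER.matcher(text).matches();
    }

    // Tokens that match no pattern keep the UNSPECIFIED category.
    static Category categorize(String text) {
        return isLongNumber(text) ? Category.LONG_NUMBER : Category.UNSPECIFIED;
    }

    public static void main(String[] args) {
        System.out.println(categorize("one"));
        System.out.println(categorize("dog"));
    }
}
```

Further categories would presumably follow the same shape: one pattern per category, tried in order, with "unspecified" as the fallback.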
Stage 2, which is yet to be written, walks the AST output by stage 1.5
and wraps the tokens up into an application-specific data structure.
This tree grammar will also perform tasks such as clustering several
number tokens into a single number.
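
Since stage 2 is not written yet, the following is only a guess at what the number clustering could look like. It works on a flat token list rather than on an ANTLR tree, and the class name, token types, and merging rule (number tokens separated only by white space are merged) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of number clustering on a flat token list.
final class NumberClusterSketch {
    enum Type { UNSPECIFIED, WHITESPACE, LONG_NUMBER, NUMBER }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    // Replace each run "LONG_NUMBER (WHITESPACE LONG_NUMBER)*" with one
    // NUMBER token; all other tokens pass through unchanged.
    static List<Token> cluster(List<Token> in) {
        List<Token> out = new ArrayList<Token>();
        int i = 0;
        while (i < in.size()) {
            Token t = in.get(i);
            if (t.type != Type.LONG_NUMBER) {
                out.add(t);
                i++;
                continue;
            }
            StringBuilder text = new StringBuilder(t.text);
            int j = i + 1;
            // Extend the run only while white space is followed by another number.
            while (j + 1 < in.size()
                    && in.get(j).type == Type.WHITESPACE
                    && in.get(j + 1).type == Type.LONG_NUMBER) {
                text.append(' ').append(in.get(j + 1).text);
                j += 2;
            }
            out.add(new Token(Type.NUMBER, text.toString()));
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> in = new ArrayList<Token>();
        in.add(new Token(Type.LONG_NUMBER, "twenty"));
        in.add(new Token(Type.WHITESPACE, " "));
        in.add(new Token(Type.LONG_NUMBER, "one"));
        in.add(new Token(Type.WHITESPACE, " "));
        in.add(new Token(Type.UNSPECIFIED, "dogs"));
        System.out.println(cluster(in));
    }
}
```

In a tree grammar the same grouping would more likely be a rewrite rule over sibling nodes; the list version is used here only because it is easy to run in isolation.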
Courtney Falk
court at infiauto.com
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: FuzzyLexer.g
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20110502/8b09543e/attachment.pl
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: FuzzyParser.g
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20110502/8b09543e/attachment-0001.pl
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: LongNumber.java
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20110502/8b09543e/attachment-0002.pl
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Sentential.g
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20110502/8b09543e/attachment-0003.pl