[antlr-interest] Rematching AST Nodes

Courtney Falk court at infiauto.com
Mon May 2 09:38:28 PDT 2011


  On 5/2/2011 9:47 AM, Jim Idle wrote:
> I suspect that you are approaching this problem incorrectly in some way.
> Why do you feel you need to specify a new token at the AST stage? Why
> don't you restate your goal, ignoring what you have done so far - I
> suspect that we may be trying to solve a problem that you should not have.

Certainly.  I was trying to keep things simple/short, but I can expand.

My project is an NLP tokenizer/parser.  The first stage of functionality 
is implemented in the FuzzyLexer and FuzzyParser grammars.  They separate 
out all punctuation and white space, preserving them as tokens, and group 
all the text between the punctuation/white space as "unspecified" tokens.
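
To make the stage 1 behavior concrete, here is a rough plain-Java sketch of 
the same idea.  It is not the attached FuzzyLexer.g/FuzzyParser.g; the class 
name, the Kind enum, and the rule "anything that is not a letter, digit, or 
white space counts as punctuation" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the stage 1 split outside of ANTLR.
public class FuzzySplitSketch {
    enum Kind { WHITESPACE, PUNCTUATION, UNSPECIFIED }

    static final class Tok {
        final Kind kind;
        final String text;
        Tok(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // Assumed classification: white space, letters/digits, everything else.
    static Kind kindOf(char c) {
        if (Character.isWhitespace(c)) return Kind.WHITESPACE;
        if (Character.isLetterOrDigit(c)) return Kind.UNSPECIFIED;
        return Kind.PUNCTUATION;
    }

    static List<Tok> split(String input) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            Kind k = kindOf(input.charAt(i));
            int start = i++;
            // Punctuation is emitted one character at a time; white space
            // and unspecified text are grouped into maximal runs.
            if (k != Kind.PUNCTUATION) {
                while (i < input.length() && kindOf(input.charAt(i)) == k) {
                    i++;
                }
            }
            out.add(new Tok(k, input.substring(start, i)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("I have one cat."));
    }
}
```

Nothing is discarded here: the punctuation and white space stay in the 
token list, and only the text between them is left uncategorized.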

Stage 1.5 is the language-specific composite grammar (Sentential.g), 
which imports the Fuzzy* grammars.  Here, I use regular expressions 
inside semantic predicates to sort "unspecified" tokens into more 
specific categories (see also LongNumber.java).  For instance, the 
string "one" would be cast as a long form number token.  Any 
"unspecified" tokens that don't match any semantic predicate stay 
"unspecified" tokens.
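
As a rough illustration of the kind of check such a predicate calls, here is 
a minimal Java sketch.  It is not the attached LongNumber.java; the class 
name, the Category enum, and the word list are assumptions, and the real 
pattern presumably covers far more than zero through ten.

```java
import java.util.regex.Pattern;

// Illustrative only: a regex-based check a semantic predicate could call.
public class LongNumberSketch {
    enum Category { LONG_NUMBER, UNSPECIFIED }

    // Assumed word list; matches() below requires the whole token to match.
    private static final Pattern LONG_NUMBER = Pattern.compile(
            "zero|one|two|three|four|five|six|seven|eight|nine|ten",
            Pattern.CASE_INSENSITIVE);

    static boolean isLongNumber(String text) {
        return LONG_NUMBER.matcher(text).matches();
    }

    // Tokens that match no pattern keep the "unspecified" category.
    static Category categorize(String text) {
        return isLongNumber(text) ? Category.LONG_NUMBER : Category.UNSPECIFIED;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"one", "One", "cat", "ones"}) {
            System.out.println(s + " -> " + categorize(s));
        }
    }
}
```

In an ANTLR 3 parser rule, a check like this could be used in a predicate 
along the lines of {LongNumberSketch.isLongNumber(input.LT(1).getText())}? 
placed before the token reference; that wiring is also an assumption, not a 
copy of what Sentential.g does.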

Stage 2, which is yet to be written, walks the AST output by stage 1.5 
and wraps the tokens up into an application-specific data structure.  
This tree grammar will also perform tasks such as clustering together 
numbers into one single number, etc.


Courtney Falk
court at infiauto.com
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: FuzzyLexer.g
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20110502/8b09543e/attachment.pl 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: FuzzyParser.g
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20110502/8b09543e/attachment-0001.pl 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: LongNumber.java
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20110502/8b09543e/attachment-0002.pl 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Sentential.g
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20110502/8b09543e/attachment-0003.pl 

