[antlr-interest] Building a regular expression AST
Matt Palmer
mattpalms at gmail.com
Wed Jul 7 14:24:02 PDT 2010
Hi,
I want to build a standard parse-tree for a regular expression, but
I'm having one or two difficulties. For example, the expression:
ABC|123
should yield a tree of:
        |
      /   \
     .     .
    / \   / \
   .   C .   3
  / \   / \
 A   B 1   2
Alternatives (|) work, but I can't get concatenation of sequential
symbols (.) to nest at all. The symbols end up as a flat list under a
single node, rather than as nested pairs:
    |
   / \
  .   .
  |   |
 ABC 123
A simple grammar that shows the concatenation issue is here:
grammar test;
options { output=AST; }
tokens { CONCAT; }
start : regex EOF;
regex : chars ( ALT^ chars )* ;
chars : CHAR+ -> ^(CONCAT CHAR+) ;
ALT : '|' ;
CHAR : (~ALT) ;
My real regular expression grammar is somewhat longer, and also
contains groups and quantifiers. It parses regular expressions very
well, with a somewhat deep parse tree, but I'm having problems
transforming the parse tree into an AST. Should I even be trying to
produce an AST? I can of course simply write code that transforms the
parse tree into the structures I need, but I imagined that the AST
mechanism would be innately capable of this. Any ideas are welcome.
Regards,
Matt.
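
For reference, the hand-written transformation mentioned above could
look something like the following Java sketch. It is only an
illustration: LeftNest, Node and fold are made-up names, not part of
ANTLR, and the sketch assumes the symbols have already been collected
into a flat list, as under the CONCAT node the grammar currently
produces. It folds that list into left-nested binary nodes, which is
the shape I am hoping to get from the AST mechanism itself.

```java
import java.util.List;

public class LeftNest {

    // Hypothetical minimal tree node; this is not an ANTLR class.
    static final class Node {
        final String label;
        final Node left;
        final Node right;

        // Leaf node holding a single symbol.
        Node(String label) {
            this(label, null, null);
        }

        // Binary operator node with two children.
        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        // Render in LISP-like form, e.g. (. (. A B) C).
        @Override
        public String toString() {
            return left == null
                    ? label
                    : "(" + label + " " + left + " " + right + ")";
        }
    }

    // Fold a flat list of symbols into a left-nested chain of binary
    // nodes: [A, B, C] becomes op(op(A, B), C).
    static Node fold(String op, List<String> symbols) {
        if (symbols.isEmpty()) {
            throw new IllegalArgumentException("need at least one symbol");
        }
        Node tree = new Node(symbols.get(0));
        for (String symbol : symbols.subList(1, symbols.size())) {
            // The tree built so far becomes the left child of the new node.
            tree = new Node(op, tree, new Node(symbol));
        }
        return tree;
    }

    public static void main(String[] args) {
        Node letters = fold(".", List.of("A", "B", "C"));
        Node digits = fold(".", List.of("1", "2", "3"));
        System.out.println(new Node("|", letters, digits));
    }
}
```

Run as-is (Java 9 or later, for List.of), this prints
(| (. (. A B) C) (. (. 1 2) 3)), which is the nested tree drawn at the
top of this message.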