[antlr-interest] Building a regular expression AST

Matt Palmer mattpalms at gmail.com
Wed Jul 7 14:24:02 PDT 2010


Hi,

I want to build a standard AST for a regular expression, but
I'm having one or two difficulties.  For example, the expression:

    ABC|123

should yield a tree of:

            |
          /   \
        .       .
       / \     / \
      .   C   .   3
     / \     / \
    A   B   1   2

Alternatives (|) work, but I can't make concatenation of sequential
symbols (.) work at all - they end up as flat lists, rather than
nested:

         |
       /   \
     .       .
     |       |
    ABC     123
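
To make the target shape concrete, here is a minimal plain-Java sketch of
the nesting described above: a flat list of symbols is folded, left to
right, into binary concatenation nodes.  It does not use ANTLR at all; the
Node class and the foldLeft helper are hypothetical names made up for this
illustration, and the sketch assumes a non-empty list of symbols.

```java
import java.util.Arrays;
import java.util.List;

class ConcatFold {
    // Minimal binary tree node (hypothetical, not an ANTLR class).
    // A leaf has null children.
    static final class Node {
        final String label;
        final Node left, right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        static Node leaf(String label) {
            return new Node(label, null, null);
        }

        // Prints leaves as-is and inner nodes as (label left right).
        @Override
        public String toString() {
            return left == null ? label
                                : "(" + label + " " + left + " " + right + ")";
        }
    }

    // Folds a flat, non-empty list of symbols into left-nested "." nodes,
    // so [A, B, C] becomes (. (. A B) C) rather than a flat (. A B C).
    static Node foldLeft(List<String> symbols) {
        Node tree = Node.leaf(symbols.get(0));
        for (String s : symbols.subList(1, symbols.size())) {
            tree = new Node(".", tree, Node.leaf(s));
        }
        return tree;
    }

    public static void main(String[] args) {
        Node left = foldLeft(Arrays.asList("A", "B", "C"));
        Node right = foldLeft(Arrays.asList("1", "2", "3"));
        // prints: (| (. (. A B) C) (. (. 1 2) 3))
        System.out.println(new Node("|", left, right));
    }
}
```

The printed form (| (. (. A B) C) (. (. 1 2) 3)) is the nested tree for
ABC|123; the flat result would instead read (| (. A B C) (. 1 2 3)).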

A simple grammar that shows the concatenation issue is here:

grammar test;

options { output=AST; }

tokens { CONCAT; }

start	:	regex EOF;

regex	:	chars ( ALT^ chars )* ;

chars	:	CHAR+ -> ^(CONCAT CHAR+) ;

ALT	:	'|' ;

CHAR	:	(~ALT) ;

My real regular expression grammar is somewhat longer, and also
contains groups and quantifiers.  It parses regular expressions very
well - with a somewhat deep parse tree - but I'm having problems
transforming the parse-tree into an AST.  Should I even be trying to
produce an AST?  I can of course simply write code that transforms the
parse tree into the structures I need - but I imagined that the AST
mechanism would be innately capable of this.  Any ideas are welcome.

Regards,

Matt.

