[antlr-interest] Building a regular expression AST

Matt Palmer mattpalms at gmail.com
Thu Jul 8 00:32:42 PDT 2010


In case anyone is interested, another strategy which does give a
left-associated tree (but is much less elegant and involves some
fiddly coding), is to make the lexer automatically emit virtual CONCAT
tokens between chars in the expression, following the advice on this
page:

http://www.antlr.org/wiki/pages/viewpage.action?pageId=3604497

Given an expression of the form ABC|123, the lexer emits the tokens:
CHAR(A) CONCAT CHAR(B) CONCAT CHAR(C) ALT CHAR(1) CONCAT CHAR(2) CONCAT CHAR(3)

Then using the rule:

chars:  CHAR (CONCAT^ CHAR)*;

will give a left-associative tree for the concatenation of  characters
in the expression, in the same way that the ALT rule does.

Matt.

On 7 July 2010 23:30, Matt Palmer <mattpalms at gmail.com> wrote:
> Wow.  That's some impressive ASTing ;)  It may take me some time to
> pull apart what is going on there.
>
> I don't think being right-associative will matter - but I'm not 100%
> sure right now.  I've got to check whether all those comp-sci
> algorithms to build NFAs and DFAs for regular expressions will care,
> or whether the right-associativity will cause other parsing problems
> in the more advanced parts of the regular expression.  But it gets me
> to the next level.
>
> thanks,
>
> Matt.
>
> On 7 July 2010 23:17, John B. Brodie <jbb at acm.org> wrote:
>> chars : (CHAR->CHAR) ( chars -> ^(CONCAT CHAR chars) )? ;
>


More information about the antlr-interest mailing list