[antlr-interest] Template rewriting: processing lists of things (possible bug?)

Conrad Hughes antlr at xrad.org
Fri Mar 4 03:28:18 PST 2011


Dear all,

I'm using ANTLR 3.3.  In a template rewriting grammar, structures like

  rule: l+=otherRule+

create a list of nulls in $l if otherRule has no template rewrite.
This seems undesirable: I would have expected rules without a rewrite
to appear in $l as their original text.  Am I doing something wrong, is
this the intended behaviour (in which case, is there an explanation of
why, please?), or is it a bug?
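
To make the question concrete, here is a minimal Java model of the
behaviour described above.  It is only a sketch under an assumption: that
in template output mode the list label stores each rule's template
result, which is null when the rule has no rewrite.  The names
ListLabelModel, RuleResult, collectTemplates and collectWithTextFallback
are hypothetical and are not part of ANTLR or of its generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; this is NOT ANTLR's generated code.
final class ListLabelModel {

    // One rule invocation: the text it matched, plus its template result.
    static final class RuleResult {
        final String text;      // input text matched by the rule
        final String template;  // rewrite result, or null if the rule has no rewrite

        RuleResult(String text, String template) {
            this.text = text;
            this.template = template;
        }
    }

    // Assumed current behaviour: the list receives each rule's template,
    // so rules without a rewrite contribute null.
    static List<String> collectTemplates(List<RuleResult> results) {
        List<String> out = new ArrayList<String>();
        for (RuleResult r : results) {
            out.add(r.template);
        }
        return out;
    }

    // Behaviour the post expected: fall back to the matched text
    // whenever a rule produced no template.
    static List<String> collectWithTextFallback(List<RuleResult> results) {
        List<String> out = new ArrayList<String>();
        for (RuleResult r : results) {
            out.add(r.template != null ? r.template : r.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<RuleResult> numbers = new ArrayList<RuleResult>();
        numbers.add(new RuleResult("123", null));
        numbers.add(new RuleResult("456", null));
        System.out.println(collectTemplates(numbers));        // [null, null]
        System.out.println(collectWithTextFallback(numbers)); // [123, 456]
    }
}
```

The second method is the fallback being asked about: use the template
when there is one, otherwise use the original text.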

Below is a grammar playing around with this; to clearly separate
behaviour, I treat lower case words as tokens, rewrite upper case words
and don't rewrite numbers.  I then play with collections of all three.
My sample input is:

  hi there
  123 456
  THIRD LIST
  ; mixture 07 THINGS
  ; MY 2 mixture
  ; version 3 MIXTURE

After the three "pure" collections, I try mixing them up.  I've come up
with two ways of rewriting a list containing a mixture of rules,
rewrites and tokens (I know ANTLR refuses to mix tokens and rules in its
own lists):

  - one (mix) is to insert a new rule (mixItem) which turns everything
    in the mixture into a template (which seems like a lot of work ---
    is there a templating shorthand for rewriting a rule to its
    identical self?);
  - another (hackMix) is to manually build the list using rule.text to
    work around the rules-turn-into-nulls problem.
  - I also sketch an "idealMix" rule which doesn't work.
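
As a rough illustration of the first approach (mix), the Java sketch
below wraps every item in a labelled string regardless of its kind, then
joins the results.  It is a stand-alone model, not ANTLR output: the
class MixSketch and its methods wrap and render are hypothetical, and the
regular expressions simply mirror the TOKEN, NUMBER and CAPS lexer rules
in the grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the "mix" approach; not generated by ANTLR.
final class MixSketch {

    // Mirrors mixItem: every alternative is turned into a labelled string,
    // so the collecting list never sees an unwrapped (null) entry.
    static String wrap(String word) {
        if (word.matches("[a-z]+")) return "Token:\"" + word + "\"";
        if (word.matches("[0-9]+")) return "Rule:\"" + word + "\"";
        if (word.matches("[A-Z]+")) return "Rewrite:\"" + word + "\"";
        throw new IllegalArgumentException("unrecognised word: " + word);
    }

    // Mirrors the mix rule: collect the wrapped items, then render them
    // with a count and a ", " separator.
    static String render(String input) {
        List<String> items = new ArrayList<String>();
        for (String word : input.trim().split("\\s+")) {
            items.add(wrap(word));
        }
        return "Mixture (" + items.size() + "): [" + String.join(", ", items) + "]";
    }

    public static void main(String[] args) {
        // Prints: Mixture (3): [Token:"mixture", Rule:"07", Rewrite:"THINGS"]
        System.out.println(render("mixture 07 THINGS"));
    }
}
```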

Is there a version of idealMix which does work and is less messy than
mix or hackMix please?

Any suggestions gratefully received...
Conrad

grammar PlusEquals;

options {
  output = template;
  rewrite = true;
}

file    : tokeNs rules rewrites ';' mix ';' hackMix ';' idealMix ;

// To collect some tokens, you need <it.text>.  input -> output:
//    hi there -> Tokens (2): ["hi", "there"]
tokeNs  : (w+=TOKEN)+
          -> template(w={$w})
             "Tokens (<length(w)>): [<w:{\"<it.text>\"};separator={, }>]"
        ;

TOKEN   : ('a'..'z')+ ;

// You can't collect rules: $w contains a list of nulls.  input -> output:
//   123 456 -> Rules (2): []
rules   : (w+=rule)+
          -> template(w={$w})
             "Rules (<length(w)>): [<w:{\"<it>\"};separator={, }>]"
        ;

rule    : NUMBER ;
NUMBER  : ('0'..'9')+ ;

// To collect rewrites, you just use <it> directly.  input -> output:
//   THIRD LIST -> Rewrites (2): ["THIRD", "LIST"]
rewrites: (w+=rewrite)+
          -> template(w={$w})
             "Rewrites (<length(w)>): [<w:{<it>};separator={, }>]"
        ;

rewrite : CAPS -> template(w={$text}) "\"<w>\"" ;
CAPS    : ('A'..'Z')+ ;

// Because rules don't seem to accumulate right, turn everything into rewrites:
//   mixture 07 THINGS ->
//     Mixture (3): [Token:"mixture", Rule:"07", Rewrite:"THINGS"]
mix     : (w+=mixItem)+
          -> template(w={$w})
             "Mixture (<length(w)>): [<w:{<it>};separator={, }>]"
        ;

mixItem : rule -> template(w={$text}) "Rule:\"<w>\""
        | rewrite -> template(w={$text}) "Rewrite:<w>"
        | TOKEN -> template(w={$text}) "Token:\"<w>\""
        ;

// Build list manually; possible to avoid creating hackMixItem?
//   MY 2 mixture -> Hacky mixture (3): ["MY", 2, mixture]
hackMix returns [List w = new ArrayList()]
        : (hackMixItem { $w.add($hackMixItem.text); })+
          -> template(w={$w})
             "Hacky mixture (<length(w)>): [<w:{<it>};separator={, }>]"
        ;

hackMixItem : rule | rewrite | TOKEN ;

// Doesn't work; no idea whether anything like it could.
//   version 3 MIXTURE -> Ideal mix (0): []
// I know ANTLR stops you from mixing tokens and rules in one list anyway.
idealMix : (w+=(rule | rewrite | TOKEN))+
          -> template(w={$w})
             "Ideal mix (<length(w)>): [<w:{<it>};separator={, }>]"
        ;

WS      : (' '|'\t'|'\n')+ {$channel=HIDDEN;} ;

