[antlr-interest] Template rewriting: processing lists of things (possible bug?)
Conrad Hughes
antlr at xrad.org
Fri Mar 4 03:28:18 PST 2011
Dear all,
I'm using ANTLR 3.3. In a template-rewriting grammar, a structure like
    rule: l+=otherRule+
fills $l with a list of nulls if otherRule has no template rewrite.
This seems undesirable: I would have expected an unrewritten rule to
appear in $l as its original text. Am I doing something wrong, is this
the correct behaviour (in which case, could someone please explain why
it's correct?), or is it a bug?
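To make the symptom concrete, here is a toy Java model of it. This is not ANTLR or StringTemplate code, and the class and method names are made up for illustration. It assumes that the list holds one entry per matched sub-rule, that an entry is null when the sub-rule has no template rewrite, and that the list template counts nulls but skips them when iterating, which would account for output such as "Rules (2): []".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT the ANTLR or StringTemplate API.
final class NullListDemo {
    // Renders "<label> (<count>): [a, b]". The count includes nulls,
    // but null elements are skipped when the items are written out.
    static String render(String label, List<String> items) {
        List<String> shown = new ArrayList<>();
        for (String item : items) {
            if (item != null) {
                shown.add("\"" + item + "\"");
            }
        }
        return label + " (" + items.size() + "): [" + String.join(", ", shown) + "]";
    }

    public static void main(String[] args) {
        // Two matched sub-rules, neither of which produced a template.
        System.out.println(render("Rules", Arrays.asList((String) null, (String) null)));
        // Two matched sub-rules that each produced a template.
        System.out.println(render("Rewrites", Arrays.asList("THIRD", "LIST")));
    }
}
```

Under these assumptions the first call reports two elements but prints none of them, while the second prints both.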
Below is a grammar playing around with this; to clearly separate
behaviour, I treat lower case words as tokens, rewrite upper case words
and don't rewrite numbers. I then play with collections of all three.
My sample input is:
hi there
123 456
THIRD LIST
; mixture 07 THINGS
; MY 2 mixture
; version 3 MIXTURE
After the three "pure" collections, I try mixing them up. I've come up
with two ways of rewriting a list containing a mixture of rules,
rewrites and tokens (I know ANTLR refuses to mix tokens and rules in its
own lists):
- one (mix) is to insert a new rule (mixItem) which turns everything
in the mixture into a template (which seems like a lot of work ---
is there a templating shorthand for rewriting a rule to its
identical self?);
- another (hackMix) is to build the list manually using rule.text, to
work around the rules-turn-into-nulls problem.
I also sketch an "idealMix" rule, which doesn't work.
Is there a version of idealMix which does work and is less messy than
mix or hackMix, please?
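The behaviour that hackMix approximates, and that a working idealMix would give directly, can be modelled in plain Java as follows. This is a sketch under assumptions, not ANTLR's API: FallbackListDemo, Item and collect are hypothetical names, and each matched element is assumed to carry both its original text and an optional rewritten form.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: Item is a hypothetical stand-in for one matched element.
final class FallbackListDemo {
    static final class Item {
        final String text;      // original input text of the element
        final String template;  // rewritten form, or null if not rewritten

        Item(String text, String template) {
            this.text = text;
            this.template = template;
        }
    }

    // Collects the rewritten form when there is one, else the original text.
    static List<String> collect(List<Item> items) {
        List<String> out = new ArrayList<>();
        for (Item item : items) {
            out.add(item.template != null ? item.template : item.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        items.add(new Item("version", null));            // token, not rewritten
        items.add(new Item("3", null));                  // rule, not rewritten
        items.add(new Item("MIXTURE", "\"MIXTURE\""));   // rewritten to a quoted form
        System.out.println("Ideal mix (" + items.size() + "): " + collect(items));
    }
}
```

The point of the model is that nothing is lost: an element without a rewrite falls back to its own text instead of turning into a null.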
Any suggestions gratefully received...
Conrad
grammar PlusEquals;
options {
output = template;
rewrite = true;
}
file : tokeNs rules rewrites ';' mix ';' hackMix ';' idealMix ;
// To collect some tokens, you need <it.text>. input -> output:
// hi there -> Tokens (2): ["hi", "there"]
tokeNs : (w+=TOKEN)+
-> template(w={$w})
"Tokens (<length(w)>): [<w:{\"<it.text>\"};separator={, }>]"
;
TOKEN : ('a'..'z')+ ;
// You can't collect rules: $w contains a list of nulls. input -> output:
// 123 456 -> Rules (2): []
rules : (w+=rule)+
-> template(w={$w})
"Rules (<length(w)>): [<w:{\"<it>\"};separator={, }>]"
;
rule : NUMBER ;
NUMBER : ('0'..'9')+ ;
// To collect rewrites, you just use <it> directly. input -> output:
// THIRD LIST -> Rewrites (2): ["THIRD", "LIST"]
rewrites: (w+=rewrite)+
-> template(w={$w})
"Rewrites (<length(w)>): [<w:{<it>};separator={, }>]"
;
rewrite : CAPS -> template(w={$text}) "\"<w>\"" ;
CAPS : ('A'..'Z')+ ;
// Because rules don't seem to accumulate right, turn everything into rewrites:
// mixture 07 THINGS ->
// Mixture (3): [Token:"mixture", Rule:"07", Rewrite:"THINGS"]
mix : (w+=mixItem)+
-> template(w={$w})
"Mixture (<length(w)>): [<w:{<it>};separator={, }>]"
;
mixItem : rule -> template(w={$text}) "Rule:\"<w>\""
| rewrite -> template(w={$text}) "Rewrite:<w>"
| TOKEN -> template(w={$text}) "Token:\"<w>\""
;
// Build list manually; possible to avoid creating hackMixItem?
// MY 2 mixture -> Hacky mixture (3): ["MY", 2, mixture]
hackMix returns [List w = new ArrayList()]: (hackMixItem { $w.add($hackMixItem.text); })+
-> template(w={$w})
"Hacky mixture (<length(w)>): [<w:{<it>};separator={, }>]"
;
hackMixItem : rule | rewrite | TOKEN ;
// Doesn't work; no idea whether anything like it could.
// version 3 MIXTURE -> Ideal mix (0): []
// I know ANTLR stops you from mixing tokens and rules in one list anyway.
idealMix : (w+=(rule | rewrite | TOKEN))+
-> template(w={$w})
"Ideal mix (<length(w)>): [<w:{<it>};separator={, }>]"
;
WS : (' '|'\t'|'\n')+ {$channel=HIDDEN;} ;