[antlr-interest] which is better?

Thu Sep 23 19:18:03 PDT 2010

Is it better to have the fragment rule match the token string, or to have a non-fragment rule match the token string and use the fragment only as the token type? By "better" I mean in terms of generated code size and performance.

For example, say I have fragments FOO and BAR that match 'foo' and 'bar' (regardless of case), and then I use these in a rule FOO_OR_BAR.

fragment A : 'a' | 'A';
fragment B : 'b' | 'B';
... you get the picture ...
fragment Z : 'z' | 'Z';

fragment FOO : F O O;
fragment BAR : B A R;

FOO_OR_BAR
	: FOO { $type = FOO; }
	| BAR { $type = BAR; }
	;

Or this?

fragment A : 'a' | 'A';
fragment B : 'b' | 'B';
... you get the picture ...
fragment Z : 'z' | 'Z';

fragment FOO : ;
fragment BAR : ;

FOO_OR_BAR
	: F O O { $type = FOO; }
	| B A R { $type = BAR; }
	;
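To make the structural difference between the two layouts concrete, here is a hand-written Java sketch (Java because it is ANTLR 3's default target). It is NOT the code ANTLR generates: the class and method names are hypothetical, the letter() helper stands in for the A..Z fragments, and it retries alternatives by resetting the position, whereas a real ANTLR lexer predicts the alternative from lookahead. The only thing it is meant to show is that the first layout routes the letter matching through one extra method per keyword, while the second matches the letters inline and uses FOO and BAR purely as type constants.

```java
// Hypothetical sketch of the two grammar layouts; not ANTLR-generated code.
class KeywordLexerSketch {
    static final int FOO = 1;
    static final int BAR = 2;
    static final int INVALID = -1;

    private final String input;
    private int pos;

    KeywordLexerSketch(String input) {
        this.input = input;
    }

    // Plays the role of the A..Z fragments: match one letter regardless of case.
    private boolean letter(char lower) {
        if (pos < input.length() && Character.toLowerCase(input.charAt(pos)) == lower) {
            pos++;
            return true;
        }
        return false;
    }

    // Layout 1: each keyword fragment is its own method that matches the letters.
    private boolean fooFragment() { return letter('f') && letter('o') && letter('o'); }
    private boolean barFragment() { return letter('b') && letter('a') && letter('r'); }

    int fooOrBarViaFragments() {
        pos = 0;
        if (fooFragment()) return FOO;   // one extra call level per keyword
        pos = 0;                         // simplification: reset and try the next alternative
        if (barFragment()) return BAR;
        return INVALID;
    }

    // Layout 2: the letters are matched inline; FOO/BAR are only type constants.
    int fooOrBarInline() {
        pos = 0;
        if (letter('f') && letter('o') && letter('o')) return FOO;
        pos = 0;                         // simplification: reset and try the next alternative
        if (letter('b') && letter('a') && letter('r')) return BAR;
        return INVALID;
    }
}
```

For example, new KeywordLexerSketch("FoO").fooOrBarViaFragments() and new KeywordLexerSketch("FoO").fooOrBarInline() both yield FOO. In this sketch the two versions perform the same character comparisons; the first adds a call per keyword and a method body per fragment, while the second keeps FOO and BAR as bare type constants. Whether that difference is measurable in a real generated lexer would have to be checked against the actual ANTLR output.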

I have to use this 'fragmented' approach: because of conflicts in the grammar, FOO and BAR cannot themselves be tokens. I suppose I could resolve the conflicts with syntactic predicates, but I have a large number of these keywords, and doing so left me with a 'code segment too large' problem.

I'm thinking the first way is better.

--
David Grieve
603-312-1013
david.grieve at oracle.com