[antlr-interest] Mismatched Character, expecting set null

Fri Aug 6 10:10:27 PDT 2010

On Fri, 2010-08-06 at 12:07 -0400, Kevin J. Cummings wrote:
> On 08/06/2010 07:33 AM, John B. Brodie wrote:
....stuff snipped....
> >> WS: (' ' | '\t' | '\n' | '\r' | '\f') {$channel = HIDDEN;};
> > 
> > this rule recognizes (and then ignores) just a single white-space
> > character. would be more efficient as
> > 
> > WS : ( ' ' | '\t' | '\n' | '\r' | '\f' )+ {$channel=HIDDEN;} ;
> 
> Maybe, but doesn't it ignore *every* single WS character?
> No need to use the + in that case (unless the performance benefit is
> significant).
> 

I have not measured it but I would expect a significant performance
difference.

I believe the WS rule is processed as follows: gather up the input text
matching the rule's right hand side; create a token containing that
text; and place the new token on the hidden channel.

So, for long runs of white-space, such as indentation, under the former
rule we would create a token for EVERY character, while under the latter
we create just one per run. That should be significantly less overhead
(I suspect; I have not measured).
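
To make the token-count argument concrete, here is a rough Python model of
the two rules. It is only a sketch and not ANTLR-generated code: the regular
expressions stand in for the two WS rules, and the Token class, the HIDDEN
value, and lex_whitespace are hypothetical names made up for this
illustration. It counts tokens only and says nothing about actual run time.

```python
import re
from dataclasses import dataclass

HIDDEN = 99  # arbitrary stand-in for the hidden channel in this sketch


@dataclass
class Token:
    text: str
    channel: int


def lex_whitespace(pattern, text):
    """Create one hidden-channel token per match of the WS pattern."""
    return [Token(m.group(), HIDDEN) for m in re.finditer(pattern, text)]


WS_SINGLE = r"[ \t\n\r\f]"   # models WS : ( ' ' | '\t' | ... )
WS_RUN = r"[ \t\n\r\f]+"     # models WS : ( ' ' | '\t' | ... )+

# Two lines of input, each indented by eight spaces.
sample = " " * 8 + "x = 1;\n" + " " * 8 + "y = 2;\n"

single_tokens = lex_whitespace(WS_SINGLE, sample)
run_tokens = lex_whitespace(WS_RUN, sample)

print(len(single_tokens), len(run_tokens))  # prints: 22 7
```

In this model the single-character rule builds 22 token objects for the
sample and the + rule builds 7, and the gap grows with deeper indentation.
Whether that shows up as a measurable speed difference in a real ANTLR
lexer would still need to be measured.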

Just my 2 cents, for what it is worth...
   -jbb