[antlr-interest] Token parsing speed

Thu Feb 24 10:12:14 PST 2011

On Thu, Feb 24, 2011 at 3:25 AM, Richard Druce <contactdick at gmail.com> wrote:
> I have a question about general best practice, and about parsing speed,
> when using tokens vs. rules to construct parts of the grammar. Our
> language has many phrases that share words; a simplified example is
> 'first' and 'first second'. From a parsing-speed perspective, would I be
> better off defining them as tokens or as rules? Some of my tokens also
> contain whitespace.
>
> e.g.
> rule1: FIRST;
>
> rule2: FIRST WS SECOND;
>
> FIRST: 'first';
> SECOND: 'second';
>
> WS: ' ';
>
> or
>
> rule1: RULE1;
>
> rule2: RULE2;
>
> RULE1: 'first';
>
> RULE2: 'first second';
>
> Thanks,
>
> Richard

I'm no expert on language design, but this sounds like way premature
optimization to me. Why not focus on what the language is for, make
sure the grammar captures it effectively and efficiently, and only then
revisit the speed of tokenization? In general, ANTLR is pretty darn
speedy, and if you are doing anything complex with the language,
tokenization is unlikely to be the slow part.
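For reference, the first (one-token-per-word) alternative from the question could be written out as a complete ANTLR 3 grammar roughly as follows. This is a sketch under assumptions: the grammar name `Phrases` is hypothetical, and routing whitespace to the hidden channel (rather than matching an explicit WS token in parser rules, as in the question) is a common ANTLR 3 idiom, not something the original poster described.

```
grammar Phrases;  // hypothetical grammar name

// Parser rules: multi-word phrases are built from single-word tokens,
// so 'first' and 'first second' share the FIRST token.
rule1 : FIRST ;          // matches 'first'
rule2 : FIRST SECOND ;   // matches 'first second'

// Lexer rules: one token per word.
FIRST  : 'first' ;
SECOND : 'second' ;

// Whitespace goes to the hidden channel (ANTLR 3 action syntax),
// so parser rules do not need to mention it between words.
WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

With this layout the lexer never has to look past a space to decide between 'first' and 'first second'; that choice is left to the parser, which is usually the easier place to keep it as the set of phrases grows.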