[antlr-interest] Feedback request from veterans: thin vs thick lexer in grammars

Thu Jan 12 04:47:27 PST 2012

Greetings,
Having spent a few weeks on ANTLR and still trying to establish the basics,
I've realized two things: I am a bit thick, and there is no established
best practice for dividing work between the parser and the lexer.
You can't help with the first one, but your input regarding the second
would be much appreciated. If there is a wiki page that discusses the
issue, I'd like to know about that.

I've looked at the C grammar (from Terence), for example, which uses a very
thin lexer section, with all keywords etc. expressed as parser rules. The
Oracle SQL grammar does the same: most keywords are parser rules.
The MySQL grammar and the SQL 2003 grammar take the other route: they both
make keywords tokens and handle the obvious outcome of this choice, namely
specific tokens being recognized in the middle of other literals, as in
'select' becoming a token in selected_vars.
I can see that Jim Idle has answered relevant questions in the past,
suggesting that lexer rules be used for tokens. He has also given the
solution to handling the 'select' in selected_vars problem.
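
To make the 'select' in selected_vars problem concrete, here is a minimal
sketch in plain Python. It is not ANTLR, and I am not claiming it is what
any of the grammars above or Jim's answer actually do; the keyword set and
the function names are made up for illustration. The first function tries
keyword patterns before identifiers with no longest-match rule; the second
matches a whole word first and only then looks it up in a keyword table.

```python
import re

# Hypothetical keyword set, for illustration only.
KEYWORDS = {"select", "from", "where"}

# Keyword alternatives are listed before the identifier alternative.
NAIVE = re.compile(r"(?P<KW>select|from|where)|(?P<ID>[A-Za-z_]\w*)")
# A whole word: letter or underscore followed by word characters.
WORD = re.compile(r"[A-Za-z_]\w*")


def naive_tokens(text):
    """Keywords win as soon as they match, even inside a longer word."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = NAIVE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def longest_match_tokens(text):
    """Match the whole word first, then decide keyword vs identifier."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = WORD.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        word = m.group()
        kind = "KW" if word.lower() in KEYWORDS else "ID"
        tokens.append((kind, word))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(naive_tokens("selected_vars"))
    print(longest_match_tokens("select selected_vars from t"))
```

With the first function, selected_vars is split into a keyword followed by
the identifier ed_vars; with the second it stays a single identifier.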

I do not mean to start a flame war (if that is possible at all), but with
different grammars following different methods, I'd like to hear from the
community regarding their experience. I am in the process of porting a
grammar that originated from an LR parser framework, and I have a few
more grammars to develop for my PhD work. Making the right choices now is
critical.

Kind regards
Seref