[antlr-interest] Feedback request from veterans: thin vs thick lexer in grammars

Ivan Brezina ibre5041 at ibrezina.net
Thu Jan 12 07:10:53 PST 2012


Hi,
it depends on what goals you have when you develop a grammar.
Of course, first of all it must be valid.
On the other hand, you can have other criteria, for example:
the grammar compiles in reasonable time, especially when you have tens
of failed tests and you intend to fix them one by one.

You may also require that ANTLR has reasonable memory requirements,
and you may run into problems with static initialization of the generated
Java classes (the JVM's 64KB bytecode limit per method).

So if you want to reduce the size of the parser, you turn some lexer rules
into parser rules and then switch to backtracking mode (especially when
your grammar allows keywords to be used as identifiers).
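
To make the idea concrete, here is a rough sketch of the "thin lexer"
approach. It is plain Python, not ANTLR, and not taken from any real grammar:
the toy SQL-like syntax and all names (lex, Parser, parse) are made up for
illustration. The lexer knows no keywords at all; the parser decides from
context whether a word is a keyword, and rewinds (backtracks) when its first
guess about an optional alias turns out to be wrong.

```python
import re


class ParseError(Exception):
    pass


def lex(text):
    # Thin lexer: only words and commas, no keyword token types.
    # Anything else (whitespace, punctuation) is silently skipped in this toy.
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|,", text)


class Parser:
    # Toy grammar (an assumption for this sketch):
    #   query : SELECT id (',' id)* FROM id alias? (WHERE id)? EOF
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def keyword(self, word):
        # Keywords are matched by text in the parser, not by token type.
        tok = self.peek()
        if tok is None or tok.lower() != word:
            raise ParseError("expected %r at token %d" % (word, self.pos))
        self.pos += 1

    def identifier(self):
        # Any word is acceptable here, including "select" or "from".
        tok = self.peek()
        if tok is None or tok == ",":
            raise ParseError("expected identifier at token %d" % self.pos)
        self.pos += 1
        return tok

    def query(self):
        self.keyword("select")
        columns = [self.identifier()]
        while self.peek() == ",":
            self.pos += 1
            columns.append(self.identifier())
        self.keyword("from")
        table = self.identifier()

        start = self.pos
        # Backtracking: first try "with alias"; on failure rewind and retry.
        for with_alias in (True, False):
            self.pos = start
            try:
                alias = self.identifier() if with_alias else None
                where = None
                if self.peek() is not None:
                    self.keyword("where")
                    where = self.identifier()
                if self.peek() is not None:
                    raise ParseError("trailing input at token %d" % self.pos)
                return {"columns": columns, "table": table,
                        "alias": alias, "where": where}
            except ParseError:
                continue
        raise ParseError("no alternative matched")


def parse(text):
    return Parser(lex(text)).query()
```

For "select a from t where x" the parser first takes "where" as a table
alias, fails on "x", rewinds, and then reads "where" as a keyword. The price
is the repeated work on every failed guess, which is the kind of trade-off you
accept when moving keywords out of the lexer.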

That's the reason why I used parser rules. I had to implement some "small"
bugfixes, which handled some obscure language constructs. Each of
these changes doubled ANTLR's memory requirements.
Ivan


Quoting Seref Arikan <serefarikan at kurumsalteknoloji.com>:

> Greetings,
> Having spent a few weeks on Antlr and still trying to establish the basics,
> I've realized two things: I am a bit thick, and there is not an established
> best practice in assigning key tasks to parser and lexer.
> You can't help with the first one, but your input regarding the second
> would be much appreciated. If there is a wiki page that discusses the
> issue, I'd like to know about that.
>
> I've looked at the C grammar (from Terence) for example, which uses a very
> thin lexer section, with all keywords etc expressed as parser rules. Oracle
> sql grammar does the same, most keywords are parser rules.
> MySql Grammar and Sql 2003 grammar follow the other route; they both make
> keywords tokens, and handle the obvious outcome of this choice: specific
> tokens being recognized in the middle of other literals, as in 'select'
> becoming a token in selected_vars.
> I can see that Jim Idle has answered relevant questions in the past,
> suggesting that lexer rules are used for tokens. He has also given the
> solution to handling the 'select' in selected_vars problem.
>
> I do not mean to start a flame war (if that is possible at all), but with
> different grammars following different methods, I'd like to hear from the
> community regarding their experience. I am in the process of porting a
> grammar that has originated from a LR parser framework, and I have a few
> more grammars to develop for my PhD work. Making right choices now is
> critical.
>
> Kind regards
> Seref


