[antlr-interest] Rob Pike on writing a lexer in Go for a template language

Tue Aug 30 19:40:25 PDT 2011

Hey All,

Slightly off-topic post, but I thought there might be some interest.

Last night I went to a talk by Rob Pike of Google; you can watch the talk at
   https://www.youtube.com/watch?v=HxaD_trXwRE&feature=player_embedded

Before I went, my thinking was that this could probably be knocked up
in ANTLR in a few minutes, but then ...
All the discomfort I have with ANTLR lexing came back to me.
So I thought I'd go to the source and have a look at the lexer for ST,
and lo and behold, ST's lexer is written by hand.
Now I'm feeling quite uncomfortable about ANTLR's lexing.

I think it basically comes down to the stateless nature of ANTLR lexing.
This is not the first time context-sensitive scanning has been mentioned on
the list (*).
Yes, I know that it can be made stateful (*) and/or I can push more
onto the parser, but both of these have issues.
Stateful ANTLR lexing code I generally find more confusing and harder
to write than functionally equivalent code in a target language.
Pushing more into the parser in this particular case is inefficient, as
there are large chunks of text that don't need to be tokenized, and
there is the issue that whitespace tokens might need to behave
differently in different places (hidden versus not).

Started off as an off-topic post, ended as a rant about lexing.

Regards,
Gary
P.S. I've started on the ANTLR target for Go, still very immature.
https://github.com/millergarym/antlr/tree/,

* Scott Stanchfield's context-sensitive scanning
http://javadude.com/articles/antlr-context-sensitive-scanner.html

* a good example of this is Jim's numerical lexing for JavaFX
http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs