[antlr-interest] ANTLR version 2.X to ANTLR version 3.X (the horror, the horror)

Fri Aug 8 04:42:19 PDT 2008

At 14:41 8/08/2008, Ian Kaplan wrote:
>   My 2.X code has syntax like this:
>
>       t:TOKEN   (for example t:LPAREN)
>
>   I then reference t.getFile(), t.getLine() and t.getColumn() in 
> my Java code.  I have not figured out yet how to do this in 
> 3.X.  I'd be grateful for any pointers.

Token labels are now written with = instead of : (so t:LPAREN 
becomes t=LPAREN).  IIRC you could use either in v2, but : was 
more common for some reason.

As for the values themselves:
   - t.getFile() : no equivalent AFAIK.  You'll have to track this 
yourself.
   - t.getLine() => $t.line
   - t.getColumn() => $t.pos
(see: 
http://www.antlr.org/wiki/display/ANTLR3/Attribute+and+Dynamic+Scopes)
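
Putting the label and the attributes together, a v3 rule might look 
roughly like this.  It's only a sketch: the grammar name, the 
"group" rule and the ID/RPAREN/WS tokens are made up for 
illustration, and the action assumes the Java target.

```antlr
grammar LabelDemo;   // hypothetical grammar name

// v2 wrote t:LPAREN here; v3 writes t=LPAREN
group
    :   t=LPAREN ID RPAREN
        {
            // $t.line and $t.pos replace t.getLine() and t.getColumn()
            System.out.println("LPAREN at line " + $t.line + ", pos " + $t.pos);
        }
    ;

LPAREN : '(' ;
RPAREN : ')' ;
ID     : ('a'..'z' | 'A'..'Z')+ ;
WS     : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

One thing to watch: as far as I know $t.pos counts from 0, whereas 
v2's getColumn() counted from 1, so you may need to add 1 if your 
messages have to match the old output.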

You might find it useful to look through the examples in the wiki:
   http://www.antlr.org/wiki/display/ANTLR3/Examples
(As well as the downloadable ones, of course.)

>   My 2.X code also had grammar like
>
>   tokens {
>      ADJACENCY = "adjacency";
>     PATTERN = "pattern";
>   }
[...]
>    I actually found what seemed to be 3.X examples using the 
> above tokens syntax.  However, it doesn't seem to work.  The 
> proper form seems to be:
>
>   tokens {
>       ADJACENCY : 'adjacency';
>      PATTERN : 'pattern';
>   }

Actually you were right the first time, except that v3 needs 
single quotes: ADJACENCY = 'adjacency';.  However, ANTLR is a bit 
sensitive about where you put the tokens block -- it has to appear 
after the options block (if present) but before any rules and 
before any @something blocks.  Also, for some weird reason you get 
errors if you use it in a lexer-only grammar; to work around that 
you'll need to define regular lexer rules instead (which is 
exactly like your second example, minus the surrounding 
'tokens { }' bit).
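
To make the ordering concrete, here is a rough skeleton of a 
combined grammar.  The grammar name, the package, the "query" rule 
and the ID/WS tokens are invented for the example; only the two 
keyword tokens come from your post.

```antlr
grammar Query;                  // hypothetical grammar name

options {                       // optional; if present, it goes first
    language = Java;
}

tokens {                        // after options, before @blocks and rules
    ADJACENCY = 'adjacency';
    PATTERN   = 'pattern';
}

@header {                       // @something blocks come after tokens
    package example.query;      // hypothetical package
}

query
    :   ADJACENCY ID
    |   PATTERN ID
    ;

ID  :   ('a'..'z' | 'A'..'Z')+ ;
WS  :   (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

And the lexer-only workaround, again with a made-up grammar name:

```antlr
lexer grammar QueryLexer;       // hypothetical grammar name

// ordinary lexer rules instead of a tokens block
ADJACENCY : 'adjacency' ;
PATTERN   : 'pattern' ;
```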

>   These are reserved words in the query language.  I really 
> don't like the habit in the example code of using quoted strings 
> like 'adjacency' in the grammar rules.

Good.  You hang on to that dislike -- it will serve you well.

>   As noted in the 2.X to 3.X documentation, there's no built in 
> way to create case insensitivity without overriding the scanner 
> input stream.

The wiki has a page on this:
http://www.antlr.org/wiki/pages/viewpage.action?pageId=1782
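
If you'd rather not touch the input stream at all, a commonly used 
alternative is to spell each keyword out of per-letter fragment 
rules in the lexer.  A rough sketch, assuming the two keywords from 
your tokens block (the grammar name is made up, and only the 
letters those two keywords need are defined):

```antlr
lexer grammar QueryKeywords;    // hypothetical grammar name

// Each keyword is a sequence of case-insensitive letter fragments.
ADJACENCY : A D J A C E N C Y ;
PATTERN   : P A T T E R N ;

// Fragments never become tokens themselves.
fragment A : 'a' | 'A' ;
fragment C : 'c' | 'C' ;
fragment D : 'd' | 'D' ;
fragment E : 'e' | 'E' ;
fragment J : 'j' | 'J' ;
fragment N : 'n' | 'N' ;
fragment P : 'p' | 'P' ;
fragment R : 'r' | 'R' ;
fragment T : 't' | 'T' ;
fragment Y : 'y' | 'Y' ;
```

It's verbose, and the keywords have to be real lexer rules rather 
than tokens-block literals, but it keeps everything in the grammar 
and works the same way for every target.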

>   The good news is that there's documentation, but for some 
> reason with ANTLR there never seems to be enough documentation 
> to make the initial learning curve anything but painful.

Yep.  But there are all these helpful people around :)  (And hey, I 
manage to get by without even reading the ANTLR book, so it can't 
be all that bad.)

>   I noticed that the person who maintains the 2.X C++ grammar is 
> looking for someone to take it over since they don't want to 
> deal with the conversion to ANTLR 3.X.  I can't say I blame 
> them.   My grammar is a lot smaller and it's going to be at 
> least a two day slog with a fair amount of frustration.

Jim Idle has mentioned that he plans to build a C++ target at some 
point soon, although it's hard to say whether it's going to be a 
separate library or whether it will simply wrap the existing C 
runtime.  (He's announced both as possibilities at various points, 
IIRC.)

>   In addition to the fact that the 2.X grammar is obsolete, I'm 
> doing the conversion because I am hoping that the LL(*) will 
> avoid left factoring my grammar into a less clear form.  I hope 
> that I am not disappointed.

You might be.  Left factoring is critically important -- perhaps 
even more so in v3 than it was in v2.  The v3 lexer is much weaker 
(in my opinion) and needs more hand-holding than the v2 one did, 
despite its newfound Unicode support.
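
In case it helps to see the transformation spelled out, left 
factoring just means pulling a shared prefix out of the 
alternatives.  A generic sketch (every rule and token name here is 
invented, nothing is taken from your grammar):

```antlr
grammar FactorDemo;             // hypothetical grammar name

// Unfactored: both alternatives start with ID, so the decision
// has to look past it to tell them apart.
stmtUnfactored
    :   ID '=' expr ';'
    |   ID '(' expr ')' ';'
    ;

// Left factored: ID is matched once, then the choice is made
// on the very next token.
stmtFactored
    :   ID ( '=' expr | '(' expr ')' ) ';'
    ;

expr : ID | INT ;

ID  :   ('a'..'z' | 'A'..'Z')+ ;
INT :   '0'..'9'+ ;
WS  :   (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

LL(*) can often cope with the first form in a small parser rule 
like this one; the sketch is only there to show the shape of the 
rewrite, not to claim that this particular case fails.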

Having said that, for the most part the v3 grammars that I've seen 
just seem a bit "tidier" than their v2 counterparts.  But that's 
just a subjective thing :)