[antlr-interest] ANTLR version 2.X to ANTLR version 3.X (the horror, the horror)
Gavin Lambert
antlr at mirality.co.nz
Fri Aug 8 04:42:19 PDT 2008
At 14:41 8/08/2008, Ian Kaplan wrote:
> My 2.X code has syntax like this:
>
> t:TOKEN (for example t:LPAREN)
>
> I then reference t.getFile(), t.getLine() and t.getColumn() in
> my Java code. I have not figured out yet how to do this in
> 3.X. I'd be grateful for any pointers.
Token labels are now all done using = instead of : (IIRC you
could use either in v2, but : was more common for some reason),
so t:LPAREN becomes t=LPAREN.
As for the values themselves:
- t.getFile() : no equivalent AFAIK. You'll have to track this
yourself.
- t.getLine() => $t.line
- t.getColumn() => $t.pos
(see:
http://www.antlr.org/wiki/display/ANTLR3/Attribute+and+Dynamic+Scopes)
You might find it useful to look through the examples in the wiki:
http://www.antlr.org/wiki/display/ANTLR3/Examples
(As well as the downloadable ones, of course.)
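A minimal sketch of the label and attribute syntax described above, assuming the Java target. The rule name call, the expr rule, the LPAREN/RPAREN tokens and the fileName field are all hypothetical placeholders rather than anything from the original grammar:

```antlr
// Hypothetical field for tracking the source file yourself, since v3
// tokens carry no file name. Set it from the code that creates the parser.
@members {
    public String fileName = "<unknown>";
}

// v2 wrote this as t:LPAREN; v3 uses = for the label.
call
    : t=LPAREN expr RPAREN
      {
        // $t.line is the line number (counted from 1);
        // $t.pos is the character position within that line (counted from 0).
        System.out.println(fileName + ":" + $t.line + ":" + $t.pos);
      }
    ;
```

In the Java target these attribute references translate to t.getLine() and t.getCharPositionInLine() on the Token object, so the same values are also reachable from ordinary Java code that holds the token.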
> My 2.X code also had grammar like
>
> tokens {
> ADJACENCY = "adjacency";
> PATTERN = "pattern";
> }
[...]
> I actually found what seemed to be 3.X examples using the
> above tokens syntax. However, it doesn't seem to work. The
> proper form seems to be:
>
> tokens {
> ADJACENCY : 'adjacency';
> PATTERN : 'pattern';
> }
Actually you were right the first time: the tokens block still uses
= (except you do need single quotes instead of double quotes).
However, ANTLR is a bit sensitive as to where you put the block --
it has to appear after the options block (if present) but before
any rules and before any @something blocks (@header, @members and
so on). Also, for some weird reason you get errors if you use it
in a lexer-only grammar; to work around that you'll need to define
regular lexer rules instead (which is exactly like your second
example except removing the surrounding 'tokens { }' bit).
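The ordering described above can be sketched like this. The grammar names, the options content, the @header package and the query rule are assumptions made up for illustration; only the two token definitions come from the original question:

```antlr
// --- File 1: a combined grammar (hypothetical name) ---
grammar Query;

options { output = AST; }      // options block first, if you have one

tokens {                       // then tokens: '=' and single quotes
    ADJACENCY = 'adjacency';
    PATTERN   = 'pattern';
}

@header { package query; }     // @something blocks only after tokens

query : ADJACENCY | PATTERN ;  // placeholder rule; rules come last

// --- File 2: a lexer-only grammar (hypothetical name) ---
// No tokens block here; the keywords become ordinary lexer rules.
lexer grammar QueryLexer;

ADJACENCY : 'adjacency' ;
PATTERN   : 'pattern' ;
```

Either way the parser rules can refer to ADJACENCY and PATTERN by name, which avoids scattering quoted literals through the grammar.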
> These are reserved words in the query language. I really
> don't like the habit in the example code of using quoted strings
> like 'adjacency' in the grammar rules.
Good. You hang on to that dislike -- it will serve you well.
> As noted in the 2.X to 3.X documentation, there's no built in
> way to create case insensitivity without overriding the scanner
> input stream.
http://www.antlr.org/wiki/pages/viewpage.action?pageId=1782
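For a small number of keywords there is also a grammar-only alternative to overriding the input stream: spell each keyword out of per-letter fragment rules. This is a different technique from the stream override mentioned above, shown here only as a sketch. The fragment names are arbitrary, and only the letters needed for the two keywords in this thread are defined:

```antlr
// One fragment per letter, each matching either case.
fragment A : 'a' | 'A' ;
fragment C : 'c' | 'C' ;
fragment D : 'd' | 'D' ;
fragment E : 'e' | 'E' ;
fragment J : 'j' | 'J' ;
fragment N : 'n' | 'N' ;
fragment P : 'p' | 'P' ;
fragment R : 'r' | 'R' ;
fragment T : 't' | 'T' ;
fragment Y : 'y' | 'Y' ;

// Keywords written as sequences of those fragments.
ADJACENCY : A D J A C E N C Y ;
PATTERN   : P A T T E R N ;
```

This keeps the original spelling in the token text, at the cost of a more verbose lexer. With many keywords the stream override is likely to be less work.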
> The good news is that there's documentation, but for some
> reason with ANTLR there never seems to be enough documentation
> to make the initial learning curve anything but painful.
Yep. But there's all these helpful people around :) (And hey, I
manage to get by without even reading the ANTLR book, so it can't
be all that bad.)
> I noticed that the person who maintains the 2.X C++ grammar is
> looking for someone to take it over since they don't want to
> deal with the conversion to ANTLR 3.X. I can't say I blame
> them. My grammar is a lot smaller and it's going to be at
> least a two day slog with a fair amount of frustration.
Jim Idle has mentioned that he plans to build a C++ target at some
point soon, although it's hard to say whether it's going to be a
separate library or whether it will simply wrap the existing C
runtime. (He's announced both as possibilities at various points,
IIRC.)
> In addition to the fact that the 2.X grammar is obsolete, I'm
> doing the conversion because I am hoping that the LL(*) will
> avoid left factoring my grammar into a less clear form. I hope
> that I am not disappointed.
You might be. Left factoring is critically important -- perhaps
even more so in v3 than it was in v2. The v3 lexer is much weaker
(in my opinion) and needs more hand-holding than the v2 one did,
despite its newfound Unicode support.
Having said that, for the most part the v3 grammars that I've seen
just seem a bit "tidier" than their v2 counterparts. But that's
just a subjective thing :)
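To make the left-factoring point concrete, here is a sketch of a commonly seen v3 lexer idiom for two tokens that share a prefix. It is an illustration only: the rule names are assumptions, and it has not been checked against any particular 3.x release:

```antlr
// Unfactored form: both rules begin with DIGIT+, so the lexer has to
// pick one of them by lookahead before it starts matching.
//
//   INT   : DIGIT+ ;
//   FLOAT : DIGIT+ '.' DIGIT+ ;

// Left-factored form: match the shared prefix once, then decide.
fragment FLOAT : ;             // declares the token type only; never matched itself
fragment DIGIT : '0'..'9' ;

INT : DIGIT+
      (   // the syntactic predicate enters this branch only if a digit follows the dot
          ('.' DIGIT)=> '.' DIGIT+ { $type = FLOAT; }
      )?
    ;
```

The idea is that input such as 12 yields INT and 12.5 yields FLOAT, while a dot that is not followed by a digit is left for some other rule to match.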