[antlr-interest] Help needed upgrading java.g to support Generics

Matt Quail matt at cortexebusiness.com.au
Wed Mar 12 19:20:06 PST 2003


Hi all,

I'm trying to update the java.g grammar with support for Generics (as defined 
by JSR14; grab the PDF spec at 
http://www.jcp.org/aboutJava/communityprocess/review/jsr014/index.html ). My 
intent is to upgrade the grammar and submit a patch back to the "official" 
java.g, so any help will hopefully benefit us all.

The MAJOR problem is that JDK1.5 will allow this:

List<List<String>> x = ...;
                 ^^
The problem is that the lexer will match ">>" as a single shift-right (SR) 
token, but in this context we really want to parse it as two GT tokens. The JSR 
PDF has a BNF grammar that solves this problem, and it is that pattern that I am 
trying to implement in ANTLR. (A recap of this trick is given at the end of the 
email.)

(Note that there is also a problem lexing ">>>", but let's just confine 
ourselves to ">>" for the moment.)
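To make the lexing problem concrete, here is a small Java sketch of a longest-match ("maximal munch") tokenizer covering only the characters that appear in these type expressions. It is not the generated JavaLexer; the class name MaximalMunchDemo is hypothetical, tokens are just their source text, and trying ">>>" before ">>" before ">" is an assumption that mirrors how a lexer keeps the longest operator it can match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo tokenizer; not the ANTLR-generated JavaLexer.
public class MaximalMunchDemo {
    // Operators tried longest-first, so ">>>" wins over ">>", and ">>" over ">".
    private static final String[] OPERATORS = {">>>", ">>", ">", "<", ",", ".", "[", "]", ";"};

    public static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                // Identifiers: consume the longest run of identifier characters.
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));
            } else {
                // Operators: take the first (i.e. longest) one that matches here.
                String match = null;
                for (String op : OPERATORS) {
                    if (src.startsWith(op, i)) {
                        match = op;
                        break;
                    }
                }
                if (match == null) {
                    throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
                }
                tokens.add(match);
                i += match.length();
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [List, <, List, <, String, >>, x, ;]  (one ">>" token, not two ">")
        System.out.println(tokenize("List<List<String>> x;"));
        // Prints [A, <, B, <, C, <, D, >>>, y, ;]
        System.out.println(tokenize("A<B<C<D>>> y;"));
    }
}
```

The parser therefore never sees two GT tokens for the end of List<List<String>>; it sees one SR token that has to close two argument lists at once.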

Okay, after a few false starts, I've come up with the following grammar (it is 
not the full JavaRecogniser parser, just enough to parse a semicolon-separated 
list of types; it uses the standard JavaLexer):

--------
compilationUnit
	:
         ( type SEMI ) *
		EOF!
	;

type
	:	referenceType
	|	builtInType (arrayDecl)?
	;

referenceType:
         identifier
         (  arrayDecl
         |  LT referenceTypeList1
         )?
     ;

referenceTypeList1:
         (referenceType1)=> referenceType1
     |
         (options{greedy=false;}: referenceType COMMA)+
         referenceType1
     ;

referenceType1:
         (referenceType GT)=> referenceType GT
     |
         identifier LT referenceTypeList2
     ;

referenceTypeList2 :
         (referenceType2)=> referenceType2
     |
         (options{greedy=false;}: referenceType COMMA)+
         referenceType2
     ;

referenceType2:
         referenceType SR
     ;

arrayDecl:
         (LBRACK RBRACK)+
     ;
// The primitive types.
builtInType
	:	"void"
	|	"boolean"
	|	"byte"
	|	"char"
	|	"short"
	|	"int"
	|	"float"
	|	"long"
	|	"double"
	;

identifier
	:	IDENT ( DOT^ IDENT)*
	;
--------

This grammar will successfully parse these constructs:
--------
String;
java.lang.String;
int;
float;
int[];
String[];
float[][][];
List<String>;
List<String[]>;
List<List<String[]> >;
List<List<String[]>>;

Map<String,Integer>;
Map<String,List<Integer> >;
Map<String,List<Integer>>;
Map<List<Integer>,String>;
Map<List<Integer>,List<String>>;

Map3<String,Integer,Float>;

Map<Map<String,String>,Map3<String,Integer,Float>>;
Map<List<String>,List<Integer>>;
--------

But it will not parse these:
Map3<List<String>,List<Integer>,List<Float>>;
Map3<String,List<Integer>,Float>;

The errors are:
G1.java:20:18: unexpected token: Integer
and
G1.java:24:24: unexpected token: Integer

Now, I can see why this is happening: it is caused by the non-greedy loops in 
referenceTypeList1 and referenceTypeList2. But I need them to be non-greedy (in 
some fashion), because I don't want them to match the last "referenceType" that 
precedes the next GT or SR token.

(Making them both greedy means that it matches too many times...)

I'm reaching the limits of my understanding of ANTLR. At first I thought it was 
a lookahead problem, but it really requires "lots" of lookahead, which is why I 
have those syntactic predicates everywhere.

Any help will be greatly appreciated! Have I gone down the wrong track?

=Matt

PS: The 'trick' JSR14 uses to parse ">>" and ">>>":
The 'naive' grammar for parameterized type declarations (in the notation of the 
JLS) is:

ReferenceType ::= ClassOrInterfaceType
                 | ArrayType
                 | TypeVariable

TypeVariable ::= Identifier

ClassOrInterfaceType ::= ClassOrInterface TypeArgumentsOpt

ClassOrInterface ::= Identifier
                    | ClassOrInterfaceType . Identifier

TypeArguments ::= < ReferenceTypeList >

ReferenceTypeList ::= ReferenceType
                     | ReferenceTypeList , ReferenceType
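For comparison, the naive grammar maps directly onto a recursive-descent recognizer, provided every ">" reaches the parser as its own token. Below is a hedged Java sketch of that; the class NaiveTypeParser and its methods are hypothetical, it works on a pre-split list of string tokens rather than real JavaLexer tokens, and it folds ArrayType and TypeVariable into "a class-or-interface type with optional [] suffixes" for brevity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recognizer for the naive JLS-style grammar; tokens are plain strings.
public class NaiveTypeParser {
    private final List<String> toks;
    private int pos = 0;

    private NaiveTypeParser(List<String> toks) { this.toks = toks; }

    private String peek() { return pos < toks.size() ? toks.get(pos) : "<EOF>"; }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException("expected " + t + " but found " + peek());
        }
        pos++;
    }

    private void identifier() {
        String t = peek();
        if (t.isEmpty() || !Character.isJavaIdentifierStart(t.charAt(0))) {
            throw new IllegalStateException("expected identifier but found " + t);
        }
        pos++;
    }

    // ReferenceType ::= ClassOrInterfaceType ( "[" "]" )*
    private void referenceType() {
        classOrInterfaceType();
        while (peek().equals("[")) {
            pos++;
            expect("]");
        }
    }

    // ClassOrInterfaceType ::= Identifier TypeArgumentsOpt ( "." Identifier TypeArgumentsOpt )*
    private void classOrInterfaceType() {
        identifier();
        if (peek().equals("<")) typeArguments();
        while (peek().equals(".")) {
            pos++;
            identifier();
            if (peek().equals("<")) typeArguments();
        }
    }

    // TypeArguments ::= "<" ReferenceType ( "," ReferenceType )* ">"
    // The closing expect(">") is exactly where a single ">>" token breaks things.
    private void typeArguments() {
        expect("<");
        referenceType();
        while (peek().equals(",")) {
            pos++;
            referenceType();
        }
        expect(">");
    }

    public static boolean accepts(List<String> toks) {
        NaiveTypeParser p = new NaiveTypeParser(toks);
        try {
            p.referenceType();
            return p.pos == toks.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Two separate ">" tokens: accepted (prints true).
        System.out.println(accepts(Arrays.asList("List", "<", "List", "<", "String", ">", ">")));
        // One ">>" token from a longest-match lexer: rejected (prints false).
        System.out.println(accepts(Arrays.asList("List", "<", "List", "<", "String", ">>")));
    }
}
```

Every decision here needs only one token of lookahead; the naive grammar only becomes hard once the lexer glues ">>" together.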


The "trick" is as follows (copied verbatim from the JSR14 spec):

ReferenceType ::= ClassOrInterfaceType
                 | ArrayType
                 | TypeVariable

ClassOrInterfaceType ::= Name
                        | Name < ReferenceTypeList1

ReferenceTypeList1 ::= ReferenceType1
                      | ReferenceTypeList , ReferenceType1

ReferenceType1 ::= ReferenceType >
                  | Name < ReferenceTypeList2

ReferenceTypeList2 ::= ReferenceType2
                      | ReferenceTypeList , ReferenceType2

ReferenceType2 ::= ReferenceType >>
                  | Name < ReferenceTypeList3

ReferenceTypeList3 ::= ReferenceType3
                      | ReferenceTypeList , ReferenceType3

ReferenceType3 ::= ReferenceType >>>
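The same layering can be sketched as a hand-written recursive-descent recognizer. In this hedged Java sketch (the class StratifiedTypeParser and all its methods are hypothetical, and it works on pre-split string tokens), the separate ReferenceType1/2/3 rules become a single function that returns k, the number of ">" characters of a trailing GT, SR or BSR token that the type swallowed on behalf of enclosing lists; k = 0 is a plain ReferenceType. It handles only dotted names, type arguments and [] suffixes, not primitive types.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recognizer modelled on the JSR14 ReferenceType1/2/3 layering.
public class StratifiedTypeParser {
    private final List<String> toks;
    private int pos = 0;

    private StratifiedTypeParser(List<String> toks) { this.toks = toks; }

    private String peek() { return pos < toks.size() ? toks.get(pos) : "<EOF>"; }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException("expected " + t + " but found " + peek());
        }
        pos++;
    }

    private void identifier() {
        String t = peek();
        if (t.isEmpty() || !Character.isJavaIdentifierStart(t.charAt(0))) {
            throw new IllegalStateException("expected identifier but found " + t);
        }
        pos++;
    }

    // Number of ">" characters in a closing token (GT = 1, SR = 2, BSR = 3), else 0.
    private static int closerWidth(String t) {
        switch (t) {
            case ">":   return 1;
            case ">>":  return 2;
            case ">>>": return 3;
            default:    return 0;
        }
    }

    // Parses one reference type and returns k (0 = ReferenceType, 1..3 = ReferenceType1..3).
    private int referenceType() {
        identifier();
        while (peek().equals(".")) {
            pos++;
            identifier();
        }
        if (peek().equals("<")) {
            pos++;
            int leftOver = typeArgumentList();
            if (leftOver > 0) {
                return leftOver; // e.g. the inner List in List<List<String>> returns 1
            }
        }
        while (peek().equals("[")) {
            pos++;
            expect("]");
        }
        int w = closerWidth(peek());
        if (w > 0) {
            pos++;
        }
        return w;
    }

    // ReferenceTypeList_k: plain ReferenceTypes separated by ",", ending with an element
    // that swallowed a closer. One ">" closes this list; the rest is returned upward.
    private int typeArgumentList() {
        while (true) {
            int k = referenceType();
            if (k > 0) {
                return k - 1;
            }
            expect(",");
        }
    }

    public static boolean accepts(List<String> toks) {
        StratifiedTypeParser p = new StratifiedTypeParser(toks);
        try {
            // At top level there is no enclosing list, so no ">" may be left over.
            return p.referenceType() == 0 && p.pos == toks.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Map3<String,List<Integer>,Float>  (prints true)
        System.out.println(accepts(Arrays.asList(
            "Map3", "<", "String", ",", "List", "<", "Integer", ">", ",", "Float", ">")));
        // Map3<List<String>,List<Integer>,List<Float>>  (prints true)
        System.out.println(accepts(Arrays.asList("Map3", "<", "List", "<", "String", ">", ",",
            "List", "<", "Integer", ">", ",", "List", "<", "Float", ">>")));
        // List<String>> has one ">" too many  (prints false)
        System.out.println(accepts(Arrays.asList("List", "<", "String", ">>")));
    }
}
```

In this layout the choice between "another element follows" and "this list is closed" is made after a whole element has been parsed, by a one-token check (a comma versus a closer), so no syntactic predicates or long lookahead are involved. This is only an illustration of the idea; it is not taken from the JSR14 spec or from java.g.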



 
