[antlr-interest] Help needed upgrading java.g to support Generics
Matt Quail
matt at cortexebusiness.com.au
Thu Mar 13 13:45:10 PST 2003
Monty,
Thanks Monty! That has definitely given me something to think about. I will try
what you suggest: remove the ">>", etc. tokens and parse them as GT GT
instead.
So we may have a parser rule:
sr: GT GT;
The one issue with this is that it will allow WS between the two ">" characters
of the ">>" operator (which Java does not allow). I might have a play with this
approach in any case. I may be able to solve this problem by changing WS from
"skip" tokens to an options { ignore=WS; } setting. Will need to think some more
on that one; any ideas?
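
To make the whitespace concern concrete, here is a minimal Java sketch of an adjacency check: two GT tokens only count as ">>" when the second one starts in the column right after the first. The Tok class and isShiftRight method are hypothetical names for illustration only; in a real grammar the same comparison would have to be made on the lexer's own token positions, and that only works if the lexer tracks columns reliably.

```java
final class AdjacentGt {
    /** Hypothetical stand-in for a lexer token: type name plus 1-based line and column. */
    static final class Tok {
        final String type;
        final int line;
        final int col;

        Tok(String type, int line, int col) {
            this.type = type;
            this.line = line;
            this.col = col;
        }
    }

    /** True when two GT tokens touch, i.e. they spell ">>" with nothing in between. */
    static boolean isShiftRight(Tok first, Tok second) {
        return first.type.equals("GT")
            && second.type.equals("GT")
            && first.line == second.line
            && second.col == first.col + 1;
    }

    public static void main(String[] args) {
        // "a >> b": the two '>' sit in columns 3 and 4.
        System.out.println(isShiftRight(new Tok("GT", 1, 3), new Tok("GT", 1, 4))); // true
        // "a > > b": whitespace separates them (columns 3 and 5).
        System.out.println(isShiftRight(new Tok("GT", 1, 3), new Tok("GT", 1, 5))); // false
    }
}
```

A check like this would apply only to the operator rule; inside type arguments the two GT tokens may or may not be adjacent, so no check is needed there.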
The other idea I was tinkering with last night was to leave SR as is, and have
some rule like this for matching the end of a "double-nested" template:
.... (GT GT | SR)
Then for "triple-nested" we might have something like
.... (GT GT GT | SR GT | GT SR | BSR)
But I'm not sure what the "...." would be :) Maybe I need to use some semantic
predicates and actually count the number of ">" I need to match. Something like
this:
typeArgs: typeArgsBody typeArgsEnd;
typeArgsBody:
    LT {ltCount++;}
    ReferenceType
    (typeArgsBody)?
    ;
typeArgsEnd:
    ( // match 0, 1, 2 or 3 '>'
      {ltCount == 0}?
    | {ltCount == 1}? GT {ltCount-=1;}
    | {ltCount == 2}? (GT GT | SR) {ltCount-=2;}
    | {ltCount == 3}?
      (GT GT GT | SR GT | GT SR | BSR) {ltCount-=3;}
    )
    // if there are more, match some more
    {ltCount > 0}? typeArgsEnd
    ;
(Hmmm... it is ugly to have to use a semantic predicate... but this may be a
"quick win".)
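
For comparison, the counting idea can be sketched outside ANTLR as a small hand-written recursive-descent recognizer in Java. This is only an illustration under stated assumptions, not the grammar above: the class TypeArgsCounter and its methods are hypothetical, tokens are plain strings (IDENT, LT, GT, SR, BSR, COMMA), and array types are left out. Instead of counting every "<" up front, it keeps a "surplus" of ">" characters that a merged SR or BSR token has already supplied to the enclosing levels.

```java
import java.util.Arrays;
import java.util.List;

final class TypeArgsCounter {
    private final List<String> toks;
    private int pos = 0;
    // Number of '>' already consumed as part of a merged SR/BSR token
    // but not yet credited to an enclosing '<'.
    private int surplus = 0;

    private TypeArgsCounter(List<String> toks) {
        this.toks = toks;
    }

    /** Returns true if the whole token list is one well-formed (possibly generic) type. */
    static boolean accepts(List<String> toks) {
        TypeArgsCounter p = new TypeArgsCounter(toks);
        return p.type() && p.surplus == 0 && p.pos == toks.size();
    }

    private String peek() {
        return pos < toks.size() ? toks.get(pos) : "EOF";
    }

    // type : IDENT ( LT type ( COMMA type )* close )?
    private boolean type() {
        if (!peek().equals("IDENT")) return false;
        pos++;
        if (!peek().equals("LT")) return true;
        pos++;
        if (!type()) return false;
        // A comma may only follow if the previous argument did not
        // already swallow our closing '>' inside a merged token.
        while (surplus == 0 && peek().equals("COMMA")) {
            pos++;
            if (!type()) return false;
        }
        return close();
    }

    // Close one '<': use a banked '>' if there is one, else read GT, SR or BSR.
    private boolean close() {
        if (surplus > 0) {
            surplus--;
            return true;
        }
        int width;
        switch (peek()) {
            case "GT":  width = 1; break;
            case "SR":  width = 2; break;
            case "BSR": width = 3; break;
            default:    return false;
        }
        pos++;
        surplus = width - 1;
        return true;
    }

    public static void main(String[] args) {
        // List<List<String>>  (lexed with ">>" as one SR token)
        System.out.println(accepts(Arrays.asList(
            "IDENT", "LT", "IDENT", "LT", "IDENT", "SR"))); // true
        // List<String>>  (one '>' too many)
        System.out.println(accepts(Arrays.asList(
            "IDENT", "LT", "IDENT", "SR"))); // false
    }
}
```

The same bookkeeping accepts a merged closer anywhere in the argument list, so a case like Map3<String,List<Integer>,Float> goes through without any non-greedy loops.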
I will try your suggestion and my idea above and report back to this list.
=Matt
mzukowski at yci.com wrote:
> I'm not sure that's the best approach. I haven't thought it through but it
> seems like it would work in the LR world but not in the LL world. I would
> suggest trying this instead:
>
> 1. Eliminate ">>", ">>=", ">>>", and ">>>=" as tokens, make them all ">".
> Then make parser rules sr: ">" ">" and zr:">" ">" ">". Modify grammar to
> use grammar rules instead of the tokens for those operators.
>
> 2. Compile, inspect and test. Syntactic predicates may be necessary and may
> need to be manually hoisted.
>
> 3. If that works then add in your generic stuff and test it out. Only use
> ">" for your generics, don't use sr or zr.
>
> 4. There might be a better approach than this. Can generics be initialized?
> Then you have to worry about ">>=" as well.
>
> Email me privately if you would like to discuss this over the phone.
>
> Monty
>
> -----Original Message-----
> From: Matt Quail [mailto:matt at cortexebusiness.com.au]
> Sent: Wednesday, March 12, 2003 7:20 PM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Help needed upgrading java.g to support
> Generics
>
>
> Hi all,
>
> I'm trying to update the java.g grammar with support for Generics (as defined
> by JSR14, grab the pdf spec at
> http://www.jcp.org/aboutJava/communityprocess/review/jsr014/index.html ). My
> intent is to upgrade the grammar and submit a patch back to the "official"
> java.g; so any help will hopefully help us all.
>
> The MAJOR problem is that JDK1.5 will allow this:
>
>     List<List<String>> x = ...;
>                     ^^
> The problem is that the lexer will match ">>" as a shift-right token, but we
> really want to parse it as two GT tokens in this context. The JSR pdf has a BNF
> grammar that solves this problem, and it is that pattern that I am trying to
> implement in ANTLR. (A re-cap of this trick is given at the end of the email.)
>
> (Note that there is also a problem lexing ">>>", but let's just confine
> ourselves to ">>" for the moment.)
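
The lexing problem described above comes from longest-match ("maximal munch") tokenizing. The following minimal Java sketch shows the effect on a toy lexer; the MaximalMunch class and its lex method are hypothetical and only handle identifiers, whitespace, "<" and runs of ">". The token names follow the ones used in this thread (LT, GT, SR, BSR).

```java
import java.util.ArrayList;
import java.util.List;

final class MaximalMunch {
    private static final String[] GT_TOKENS = {"GT", "SR", "BSR"};

    /** Tokenizes identifiers, '<', and '>', '>>', '>>>' using longest match. */
    static List<String> lex(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // skip whitespace, as the WS rule does
            } else if (Character.isJavaIdentifierStart(c)) {
                int j = i + 1;
                while (j < src.length() && Character.isJavaIdentifierPart(src.charAt(j))) j++;
                out.add("IDENT");
                i = j;
            } else if (c == '<') {
                out.add("LT");
                i++;
            } else if (c == '>') {
                // Take as many adjacent '>' as possible, up to three.
                int j = i;
                while (j < src.length() && j - i < 3 && src.charAt(j) == '>') j++;
                out.add(GT_TOKENS[j - i - 1]);
                i = j;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("List<List<String>>"));  // [IDENT, LT, IDENT, LT, IDENT, SR]
        System.out.println(lex("List<List<String> >")); // [IDENT, LT, IDENT, LT, IDENT, GT, GT]
    }
}
```

The lexer has no notion of being inside type arguments, so the only difference between the two inputs is the space, and the parser sees SR in one case and GT GT in the other.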
>
> Okay, after a few false starts, I've come up with the following grammar (note
> that it is not the full JavaRecognizer parser, just enough to parse a
> SEMICOLON-separated list of types; it uses the standard JavaLexer):
>
> --------
> compilationUnit
>     :
>         ( type SEMI ) *
>         EOF!
>     ;
>
> type
>     :   referenceType
>     |   builtInType (arrayDecl)?
>     ;
>
> referenceType:
>     identifier
>     (   arrayDecl
>     |   LT referenceTypeList1
>     )?
>     ;
>
> referenceTypeList1:
>     (referenceType1)=> referenceType1
>     |
>     (options{greedy=false;}: referenceType COMMA)+
>     referenceType1
>     ;
>
> referenceType1:
>     (referenceType GT)=> referenceType GT
>     |
>     identifier LT referenceTypeList2
>     ;
>
> referenceTypeList2 :
>     (referenceType2)=> referenceType2
>     |
>     (options{greedy=false;}: referenceType COMMA)+
>     referenceType2
>     ;
>
> referenceType2:
>     referenceType SR
>     ;
>
> arrayDecl:
>     (LBRACK RBRACK)+
>     ;
>
> // The primitive types.
> builtInType
>     :   "void"
>     |   "boolean"
>     |   "byte"
>     |   "char"
>     |   "short"
>     |   "int"
>     |   "float"
>     |   "long"
>     |   "double"
>     ;
>
> identifier
>     :   IDENT ( DOT^ IDENT)*
>     ;
> --------
>
> This grammar will successfully parse these constructs:
> --------
> String;
> java.lang.String;
> int;
> float;
> int[];
> String[];
> float[][][];
> List<String>;
> List<String[]>;
> List<List<String[]> >;
> List<List<String[]>>;
>
> Map<String,Integer>;
> Map<String,List<Integer> >;
> Map<String,List<Integer>>;
> Map<List<Integer>,String>;
> Map<List<Integer>,List<String>>;
>
> Map3<String,Integer,Float>;
>
> Map<Map<String,String>,Map3<String,Integer,Float>>;
> Map<List<String>,List<Integer>>;
> --------
>
> But it will not parse these:
> Map3<List<String>,List<Integer>,List<Float>>;
> Map3<String,List<Integer>,Float>;
>
> The errors are:
> G1.java:20:18: unexpected token: Integer
> and
> G1.java:24:24: unexpected token: Integer
>
> Now, I can see why this is happening: it is caused by my non-greedy rules in
> referenceTypeList1 and referenceTypeList2. But I need them to be non-greedy (in
> some fashion), because I don't want them to match the last "referenceType" that
> precedes the next GT or SR token.
>
> (Making them both greedy means that it matches too many times...)
>
> I'm starting to get to the limits of my understanding of ANTLR... I started
> thinking it was a look-ahead problem... but it really requires "lots" of
> lookahead (that's why I have those syntactic predicates everywhere).
>
> Any help will be greatly appreciated! Have I gone down the wrong track?
>
> =Matt
>
> PS: The 'trick' JSR14 uses to parse ">>" and ">>>":
> The 'naive' grammar for parameterized type declarations (using the notation
> used in the JLS) is:
>
> ReferenceType ::= ClassOrInterfaceType
>     | ArrayType
>     | TypeVariable
>
> TypeVariable ::= Identifier
>
> ClassOrInterfaceType ::= ClassOrInterface TypeArgumentsOpt
>
> ClassOrInterface ::= Identifier
>     | ClassOrInterfaceType . Identifier
>
> TypeArguments ::= < ReferenceTypeList >
>
> ReferenceTypeList ::= ReferenceType
>     | ReferenceTypeList , ReferenceType
>
>
> The "trick" is as follows (copied verbatim from the JSR14 spec):
>
> ReferenceType ::= ClassOrInterfaceType
>     | ArrayType
>     | TypeVariable
>
> ClassOrInterfaceType ::= Name
>     | Name < ReferenceTypeList1
>
> ReferenceTypeList1 ::= ReferenceType1
>     | ReferenceTypeList , ReferenceType1
>
> ReferenceType1 ::= ReferenceType >
>     | Name < ReferenceTypeList2
>
> ReferenceTypeList2 ::= ReferenceType2
>     | ReferenceTypeList , ReferenceType2
>
> ReferenceType2 ::= ReferenceType >>
>     | Name < ReferenceTypeList3
>
> ReferenceTypeList3 ::= ReferenceType3
>     | ReferenceTypeList , ReferenceType3
>
> ReferenceType3 ::= ReferenceType >>>
>
>
>
>
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/