[antlr-interest] Help needed upgrading java.g to support Generics
mzukowski at yci.com
Thu Mar 13 13:58:56 PST 2003
If you are counting columns, then you can use a semantic predicate to enforce
that there are no spaces between the two > characters of the >> operator.
Otherwise, you could emit a different token depending on what preceded it, by
maintaining some state in your lexer: CGT if the > was immediately preceded by
another >, plain GT otherwise. I haven't thought that one through. The >>
operator would then be GT CGT, and >>> would be GT CGT CGT. A generic end
token could be either GT or CGT.
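A minimal Java sketch of the CGT idea, assuming a hand-written lexer rather than ANTLR-generated code (the names CgtLexer and tokenize are hypothetical). The only state kept is whether the previous character was a '>'; every other character is skipped, so the output shows just the GT/CGT distinction.

```java
import java.util.ArrayList;
import java.util.List;

public class CgtLexer {
    // Emits CGT for a '>' that directly follows another '>', GT otherwise.
    // Characters other than '>' are not tokenized in this sketch, but any
    // of them (including whitespace) breaks a run of '>' characters.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean prevWasGt = false;
        for (char c : input.toCharArray()) {
            if (c == '>') {
                tokens.add(prevWasGt ? "CGT" : "GT");
                prevWasGt = true;
            } else {
                prevWasGt = false;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(">>"));   // [GT, CGT]
        System.out.println(tokenize("> >"));  // [GT, GT]
        System.out.println(tokenize(">>>"));  // [GT, CGT, CGT]
    }
}
```

Under this scheme a shift rule would be GT CGT, while the end of a type-argument list would accept either GT or CGT.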
The semantic predicate is a good possible approach. You might need a way to
propagate the end matches up the parse stack. It depends on how deeply nested
the rules for declarations are; I haven't inspected them, so I'm not sure. I'm
just thinking aloud here.
Try this for typeArgsEnd:
typeArgsEnd:
    ( // matching zero doesn't make sense
      GT  {ltCount-=1;}
    | SR  {ltCount-=2;}
    | BSR {ltCount-=3;}
    )
    // if there are more, match some more
    ( {ltCount > 0}? typeArgsEnd )?
    ;
You know, I don't think that will work as sketched below. It'll choke on
Map<List<Integer>,String>; because you aren't nesting your calls.
Play with it and report back.
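One way to nest the calls, sketched as a hand-written recursive-descent parser in Java rather than as ANTLR grammar (TypeArgsParser, lex, closeOne and parses are hypothetical names). Each typeArguments call claims exactly one '>' character. When the maximal-munch lexer has merged two or three '>' into SR or BSR, the surplus is kept in a pendingGt counter and claimed by the enclosing calls as they return, so the count follows the call stack instead of being global. Compound operators such as ">>=" are left out.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeArgsParser {
    private final List<String> toks;
    private int pos = 0;
    private int pendingGt = 0; // '>' chars already lexed but not yet claimed

    private TypeArgsParser(List<String> toks) { this.toks = toks; }

    // Maximal-munch lexer: ">>>" -> BSR, ">>" -> SR, ">" -> GT.
    static List<String> lex(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                while (i < s.length() && Character.isJavaIdentifierPart(s.charAt(i))) i++;
                out.add("IDENT");
            } else if (c == '>') {
                int n = 0;
                while (i < s.length() && s.charAt(i) == '>' && n < 3) { i++; n++; }
                out.add(n == 1 ? "GT" : n == 2 ? "SR" : "BSR");
            } else {
                String t = c == '<' ? "LT" : c == ',' ? "COMMA" : c == '.' ? "DOT"
                         : c == '[' ? "LBRACK" : c == ']' ? "RBRACK" : c == ';' ? "SEMI" : null;
                if (t == null) throw new IllegalArgumentException("unexpected char: " + c);
                out.add(t);
                i++;
            }
        }
        return out;
    }

    // While a merged token still owes '>' characters, the parser "sees" a GT.
    private String current() {
        if (pendingGt > 0) return "GT";
        return pos < toks.size() ? toks.get(pos) : "EOF";
    }

    private boolean peek(String t) { return current().equals(t); }

    private void expect(String t) {
        if (!peek(t)) throw new IllegalStateException("expected " + t + ", found " + current());
        pos++;
    }

    // referenceType : IDENT (DOT IDENT)* (typeArguments)? (LBRACK RBRACK)*
    private void referenceType() {
        expect("IDENT");
        while (peek("DOT")) { pos++; expect("IDENT"); }
        if (peek("LT")) typeArguments();
        while (peek("LBRACK")) { pos++; expect("RBRACK"); }
    }

    // typeArguments : LT referenceType (COMMA referenceType)* '>'
    private void typeArguments() {
        expect("LT");
        referenceType();
        while (peek("COMMA")) { pos++; referenceType(); }
        closeOne();
    }

    // Claim a single '>' character, splitting SR / BSR when needed.
    private void closeOne() {
        if (pendingGt > 0) { pendingGt--; return; }
        String t = current();
        int width = t.equals("GT") ? 1 : t.equals("SR") ? 2 : t.equals("BSR") ? 3 : 0;
        if (width == 0) throw new IllegalStateException("expected '>', found " + t);
        pos++;
        pendingGt = width - 1; // the rest belongs to enclosing typeArguments calls
    }

    // True if src is exactly one "type ;" in this toy grammar.
    public static boolean parses(String src) {
        try {
            TypeArgsParser p = new TypeArgsParser(lex(src));
            p.referenceType();
            p.expect("SEMI");
            return p.pos == p.toks.size();
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {
                "Map<List<Integer>,String>;",
                "Map3<String,List<Integer>,Float>;",
                "Map3<List<String>,List<Integer>,List<Float>>;",
                "List<List<String>;" }) {
            System.out.println(s + " -> " + parses(s));
        }
    }
}
```

Because the comma after an inner closer is only checked once the inner call has returned, the cases that trip up a single global count parse here, while unbalanced input such as "List<List<String>;" is rejected.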
Monty
-----Original Message-----
From: Matt Quail [mailto:matt at cortexebusiness.com.au]
Sent: Thursday, March 13, 2003 1:45 PM
To: antlr-interest at yahoogroups.com
Subject: Re: [antlr-interest] Help needed upgrading java.g to support Generics
Monty,
Thanks Monty! That has definitely given me something to think about. I will
try what you suggest: remove the ">>", etc. tokens and parse them as GT GT
instead.
So we may have a parser rule:
sr: GT GT;
The one issue with this is that it will allow WS between the two ">"
characters in the ">>" operator (which Java does not allow). I might have a
play with this approach in any case. I may be able to solve this problem by
changing WS from "skip" tokens to an options{ignore=WS;} setting. I will need
to think some more on that one; any ideas?
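A small Java sketch of the column check, assuming the lexer emits one GT token per '>' and records each token's line and column (the Tok record and the lex, adjacentGts, isShiftRight and isUnsignedShiftRight names are hypothetical, and lex only handles a single line). A parser would consult a check like this from the semantic predicate on the sr or zr rule in expression context, so "a > > b" is not accepted as a shift.

```java
import java.util.ArrayList;
import java.util.List;

public class ShiftCheck {
    // Hypothetical token: kind plus 1-based line and column.
    record Tok(String type, int line, int col) {}

    // Toy single-line lexer: one GT per '>', OTHER for any other non-blank char.
    static List<Tok> lex(String s) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '>') out.add(new Tok("GT", 1, i + 1));
            else if (!Character.isWhitespace(ch)) out.add(new Tok("OTHER", 1, i + 1));
        }
        return out;
    }

    // Length of the run of GT tokens starting at index i in which each GT
    // sits in the column right after the previous one, on the same line.
    static int adjacentGts(List<Tok> toks, int i) {
        int n = 0;
        while (i + n < toks.size() && toks.get(i + n).type().equals("GT")) {
            if (n > 0) {
                Tok prev = toks.get(i + n - 1);
                Tok cur = toks.get(i + n);
                if (cur.line() != prev.line() || cur.col() != prev.col() + 1) break;
            }
            n++;
        }
        return n;
    }

    // Predicate for "sr : GT GT": exactly two adjacent '>' characters.
    static boolean isShiftRight(List<Tok> toks, int i) {
        return adjacentGts(toks, i) == 2;
    }

    // Predicate for "zr : GT GT GT": exactly three adjacent '>' characters.
    static boolean isUnsignedShiftRight(List<Tok> toks, int i) {
        return adjacentGts(toks, i) == 3;
    }

    public static void main(String[] args) {
        System.out.println(isShiftRight(lex("a >> b"), 1));          // true
        System.out.println(isShiftRight(lex("a > > b"), 1));         // false
        System.out.println(isUnsignedShiftRight(lex("a >>> b"), 1)); // true
    }
}
```

Keeping the whitespace check in a predicate on token positions, rather than in the WS rule, leaves the generic closers unaffected: a type-argument rule simply never asks whether its GT tokens are adjacent.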
The other idea I was tinkering with last night was to leave SR as is, and have
some rule like this for matching the end of a "double-nested" template:
.... (GT GT | SR)
Then for "triple-nested" we might have something like
.... (GT GT GT | SR GT | GT SR | BSR)
But I'm not sure what the "...." would be :) Maybe I need to use some semantic
predicates and actually count the number of ">" I need to match. Something
like this:
typeArgs: typeArgsBody typeArgsEnd;
typeArgsBody:
    LT {ltCount++;}
    referenceType
    (typeArgsBody)?
    ;
typeArgsEnd:
    ( // match 0, 1, 2 or 3 '>'
      {ltCount == 0}?
    | {ltCount == 1}? GT {ltCount-=1;}
    | {ltCount == 2}? (GT GT | SR) {ltCount-=2;}
    | {ltCount == 3}?
      (GT GT GT | SR GT | GT SR | BSR) {ltCount-=3;}
    )
    // if there are more, match some more
    ( {ltCount > 0}? typeArgsEnd )?
    ;
(Hmmm... it is ugly to have to use a semantic predicate... but this may be a
"quick win".)
I will try your suggestion and my idea above and report back to this list.
=Matt
mzukowski at yci.com wrote:
> I'm not sure that's the best approach. I haven't thought it through, but it
> seems like it would work in the LR world but not in the LL world. I would
> suggest trying this instead:
>
> 1. Eliminate ">>", ">>=", ">>>", and ">>>=" as tokens and make them all ">".
> Then make parser rules sr: ">" ">" and zr: ">" ">" ">". Modify the grammar to
> use these rules instead of the tokens for those operators.
>
> 2. Compile, inspect and test. Syntactic predicates may be necessary and may
> need to be manually hoisted.
>
> 3. If that works then add in your generic stuff and test it out. Only use
> ">" for your generics, don't use sr or zr.
>
> 4. There might be a better approach than this. Can generics be initialized?
> Then you have to worry about ">>=" as well.
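To illustrate step 3, here is a hedged Java sketch of what the type grammar reduces to once the lexer emits only single '>' tokens: each type-argument list simply ends with its own GT, with no ReferenceType1/2/3 layering and no predicates. SingleGtParser, lex and parses are hypothetical names, and arrays, qualified names and wildcards are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleGtParser {
    private final List<String> toks;
    private int pos = 0;

    private SingleGtParser(List<String> toks) { this.toks = toks; }

    // Step 1: every '>' is its own GT token; SR and BSR are never produced.
    static List<String> lex(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isJavaIdentifierStart(c)) {
                while (i < s.length() && Character.isJavaIdentifierPart(s.charAt(i))) i++;
                out.add("IDENT");
                continue;
            }
            switch (c) {
                case '<': out.add("LT"); break;
                case '>': out.add("GT"); break;
                case ',': out.add("COMMA"); break;
                case ';': out.add("SEMI"); break;
                default: throw new IllegalArgumentException("unexpected char: " + c);
            }
            i++;
        }
        return out;
    }

    private boolean peek(String t) { return pos < toks.size() && toks.get(pos).equals(t); }

    private void expect(String t) {
        if (!peek(t)) throw new IllegalStateException("expected " + t + " at token " + pos);
        pos++;
    }

    // Step 3: referenceType : IDENT (LT referenceType (COMMA referenceType)* GT)?
    private void referenceType() {
        expect("IDENT");
        if (peek("LT")) {
            pos++;
            referenceType();
            while (peek("COMMA")) { pos++; referenceType(); }
            expect("GT");
        }
    }

    public static boolean parses(String src) {
        try {
            SingleGtParser p = new SingleGtParser(lex(src));
            p.referenceType();
            p.expect("SEMI");
            return p.pos == p.toks.size();
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("Map3<String,List<Integer>,Float>;")); // true
        System.out.println(parses("List<List<String>>;"));               // true
        System.out.println(parses("List<List<String>;"));                // false
    }
}
```

The cost of this simplicity moves to the expression grammar, where the sr and zr rules have to reassemble the shift operators from GT tokens.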
>
> Email me privately if you would like to discuss this over the phone.
>
> Monty
>
> -----Original Message-----
> From: Matt Quail [mailto:matt at cortexebusiness.com.au]
> Sent: Wednesday, March 12, 2003 7:20 PM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Help needed upgrading java.g to support Generics
>
>
> Hi all,
>
> I'm trying to update the java.g grammar with support for Generics (as defined
> by JSR14; grab the PDF spec at
> http://www.jcp.org/aboutJava/communityprocess/review/jsr014/index.html ). My
> intent is to upgrade the grammar and submit a patch back to the "official"
> java.g, so any help will hopefully help us all.
>
> The MAJOR problem is that JDK 1.5 will allow this:
>
> List<List<String>> x = ...;
>                 ^^
> The problem is that the lexer will match ">>" as a shift-right token, but we
> really want to parse it as two GT tokens in this context. The JSR PDF has a
> BNF grammar that solves this problem, and it is that pattern that I am trying
> to implement in ANTLR. (A recap of this trick is given at the end of the
> email.)
>
> (Note that there is also a problem lexing ">>>", but let's just confine
> ourselves to ">>" for the moment.)
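A hedged Java illustration of why this happens: a lexer that always takes the longest matching operator (maximal munch) turns the two closing brackets into one SR token. The token names follow the style of the standard java.g lexer (GT, GE, SR, SR_ASSIGN, BSR, BSR_ASSIGN), but treat them as an assumption here; MaximalMunch and gtTokens are hypothetical names, and every character that does not start a '>' operator is skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class MaximalMunch {
    // Longest operators first, so the first match is the longest match.
    private static final String[][] OPS = {
        {">>>=", "BSR_ASSIGN"}, {">>>", "BSR"}, {">>=", "SR_ASSIGN"},
        {">>", "SR"}, {">=", "GE"}, {">", "GT"}
    };

    static List<String> gtTokens(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        outer:
        while (i < s.length()) {
            for (String[] op : OPS) {
                if (s.startsWith(op[0], i)) {
                    out.add(op[1]);
                    i += op[0].length();
                    continue outer;
                }
            }
            i++; // not a '>' operator: skipped in this sketch
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(gtTokens("List<List<String>> x = y;"));  // [SR]
        System.out.println(gtTokens("List<List<String> > x = y;")); // [GT, GT]
    }
}
```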
>
> Okay, after a few false starts, I've come up with the following grammar (note
> that it is not the full JavaRecognizer parser, just enough to parse a
> semicolon-separated list of types; it uses the standard JavaLexer):
>
> --------
> compilationUnit
> :
> ( type SEMI ) *
> EOF!
> ;
>
> type
> : referenceType
> | builtInType (arrayDecl)?
> ;
>
> referenceType:
> identifier
> ( arrayDecl
> | LT referenceTypeList1
> )?
> ;
>
> referenceTypeList1:
> (referenceType1)=> referenceType1
> |
> (options{greedy=false;}: referenceType COMMA)+
> referenceType1
> ;
>
> referenceType1:
> (referenceType GT)=> referenceType GT
> |
> identifier LT referenceTypeList2
> ;
>
> referenceTypeList2 :
> (referenceType2)=> referenceType2
> |
> (options{greedy=false;}: referenceType COMMA)+
> referenceType2
> ;
>
> referenceType2:
> referenceType SR
> ;
>
> arrayDecl:
> (LBRACK RBRACK)+
> ;
> // The primitive types.
> builtInType
> : "void"
> | "boolean"
> | "byte"
> | "char"
> | "short"
> | "int"
> | "float"
> | "long"
> | "double"
> ;
>
> identifier
> : IDENT ( DOT^ IDENT)*
> ;
> --------
>
> This grammar will successfully parse these constructs:
> --------
> String;
> java.lang.String;
> int;
> float;
> int[];
> String[];
> float[][][];
> List<String>;
> List<String[]>;
> List<List<String[]> >;
> List<List<String[]>>;
>
> Map<String,Integer>;
> Map<String,List<Integer> >;
> Map<String,List<Integer>>;
> Map<List<Integer>,String>;
> Map<List<Integer>,List<String>>;
>
> Map3<String,Integer,Float>;
>
> Map<Map<String,String>,Map3<String,Integer,Float>>;
> Map<List<String>,List<Integer>>;
> --------
>
> But it will not parse these:
> Map3<List<String>,List<Integer>,List<Float>>;
> Map3<String,List<Integer>,Float>;
>
> The errors are:
> G1.java:20:18: unexpected token: Integer
> and
> G1.java:24:24: unexpected token: Integer
>
> Now, I can see why this is happening: it is caused by my non-greedy rules in
> referenceTypeList1 and referenceTypeList2. But I need them to be non-greedy
> (in some fashion), because I don't want them to match the last
> "referenceType" that precedes the next GT or SR token.
>
> (Making them both greedy means that it matches too many times...)
>
> I'm starting to get to the limits of my understanding of ANTLR... I started
> thinking it was a look-ahead problem... but it really requires "lots" of
> lookahead (that's why I have those syntactic predicates everywhere).
>
> Any help will be greatly appreciated! Have I gone down the wrong track?
>
> =Matt
>
> PS: The 'trick' JSR14 uses to parse ">>" and ">>>":
> The 'naive' grammar for parameterized type declarations (using the notation
> used in the JLS) is:
>
> ReferenceType ::= ClassOrInterfaceType
> | ArrayType
> | TypeVariable
>
> TypeVariable ::= Identifier
>
> ClassOrInterfaceType ::= ClassOrInterface TypeArgumentsOpt
>
> ClassOrInterface ::= Identifier
> | ClassOrInterfaceType . Identifier
>
> TypeArguments ::= < ReferenceTypeList >
>
> ReferenceTypeList ::= ReferenceType
> | ReferenceTypeList , ReferenceType
>
>
> The "trick" is as follows (copied verbatim from the JSR14 spec):
>
> ReferenceType ::= ClassOrInterfaceType
> | ArrayType
> | TypeVariable
>
> ClassOrInterfaceType ::= Name
> | Name < ReferenceTypeList1
>
> ReferenceTypeList1 ::= ReferenceType1
> | ReferenceTypeList , ReferenceType1
>
> ReferenceType1 ::= ReferenceType >
> | Name < ReferenceTypeList2
>
> ReferenceTypeList2 ::= ReferenceType2
> | ReferenceTypeList , ReferenceType2
>
> ReferenceType2 ::= ReferenceType >>
> | Name < ReferenceTypeList3
>
> ReferenceTypeList3 ::= ReferenceType3
> | ReferenceTypeList , ReferenceType3
>
> ReferenceType3 ::= ReferenceType >>>
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/