[antlr-interest] Java Grammar

Simon cocoa at gmx.ch
Sun Nov 23 11:11:09 PST 2008


Hi

I'm aware that it is not possible to do this without context. That's
why I keep symbol tables, all imports, ... at hand to handle those
situations. Using semantic predicates, it should be possible to build a
meaningful AST (that is, an AST with subtree nodes representing
operations such as invoke method, invoke static method, field
access, ...).

But I've just realized that without access to the referenced classes  
(source or binary form) it is still not possible to do that.

  a.b.c.D.X.z()

Where does the class name end and the field access start? There is
simply no way to answer that question without access to the classes.
Let's say there is a class named a.b.c.D; then I know that X must be a
static field of that class, and its type must have a method named z.
With that knowledge it should be possible to write semantic predicates
that build a meaningful AST properly (I want to use ANTLR's AST-building
functionality; I'm free to build whatever AST form I like, since
there is no existing AST for what I want to do).
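
A minimal sketch of that lookup in Java, under stated assumptions: the
class QualifiedNameSplitter, its method typePrefixLength and the
knownClasses set are hypothetical names, with the set standing in for
whatever real class lookup (source or binary) is available. It treats the
shortest dotted prefix that names a known class as the type and leaves the
rest to the selectors. It ignores imports and local variables, so the
symbol table would still have to be consulted first.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of ANTLR or of the Java.g grammar.
final class QualifiedNameSplitter {

    // Assumption: fully qualified names of all classes we can see.
    private final Set<String> knownClasses;

    QualifiedNameSplitter(Set<String> knownClasses) {
        this.knownClasses = knownClasses;
    }

    /**
     * Returns how many leading identifiers form the type name,
     * or 0 if no prefix of the dotted name is a known class.
     */
    int typePrefixLength(List<String> identifiers) {
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < identifiers.size(); i++) {
            if (i > 0) {
                prefix.append('.');
            }
            prefix.append(identifiers.get(i));
            if (knownClasses.contains(prefix.toString())) {
                // Shortest prefix that names a class: everything after
                // it is a member access (field, method, nested type).
                return i + 1;
            }
        }
        return 0;
    }
}
```

With knownClasses containing only "a.b.c.D", the identifiers a, b, c, D, X, z
give a prefix length of 4, so a.b.c.D would become the TYPE_REFERENCE and
.X and .z() would be left to the selector rule.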

As I'm pretty new to ANTLR, I'm just wondering whether this is the  
"correct" way to approach the problem.

Simon

PS: the language I'm trying to build an AST for is not Java, but it is
close enough to use the Java grammar as my starting point.

On Nov 23, 2008, at 19:25 , Yang Jiang wrote:

> Hi Simon,
>
> I guess that would not be possible. Consider the following two cases,
> class A {
>   static b b;
> }
>
> class b {
>   static String bField;
>   public static void main(String[] args) {
>      A.b.bField = "s";   // "A" is the type, "b.bField" field access
>   }
> }
> ----------------------------------------------------
>
> package A;
>
> class b {
>   static int bField;
>   public static void main(String[] args) {
>       A.b.bField = 0;    // "A.b" is the type, "bField" field access
>   }
> }
>
> So it's not possible to identify the type here without knowing the  
> context. (Although it's possible to do so in other constructs like  
> import declaration).

> The way javac parses these is that each '.' is parsed as a select,  
> then if you see a '(', it's a method call, or a '[' for array access  
> etc...
> In fact, the grammar you are using (the one on the ANTLR website) is
> derived from JLS chapter 18
> (http://java.sun.com/docs/books/jls/third_edition/html/syntax.html),
> which actually says:
>   The grammar presented piecemeal in the preceding chapters is much
> better for exposition, but it is not well suited as a basis for a
> parser. The grammar presented in this chapter is the basis for the
> reference implementation.
>
> So this grammar is actually a representation of how the Sun developers
> built their parser, and it might not be suitable for your job.
> Take a rule like this:
>
> classOrInterfaceDeclaration
>   :   classOrInterfaceModifiers (classDeclaration | interfaceDeclaration)
>   ;
>
> It doesn't take advantage of ANTLR's lookahead features. It is written
> this way because the hand-written javac parser can't see past the
> modifiers.
>
> You can try to use the Compiler Grammar copy of Java.g, which is a  
> little more tuned for ANTLR and better tested.
>
> Yang
>
> Johannes Luber wrote:
>> Simon wrote:
>>
>>> Let me try to rephrase my question. I don't know how to handle some of
>>> the primary constructs of the Java grammar.
>>>
>>>  Integer.parseInt("123")
>>>  x.y(a, b)
>>>  x[12][34]
>>>  String.class
>>>  java.util.Arrays.class
>>>
>>> are all pretty simple to detect with a symbol table and the   
>>> information from the imports. But how do I handle qualified type   
>>> names, such as the one in
>>>
>>>  java.util.Arrays.asList("1", "2")
>>>
>>> Conceptually, I need something like the following:
>>>
>>>  primary
>>>      :    { isType(Identifier ('.' Identifier)*) } Identifier ('.' Identifier)* ...
>>>      ;
>>>
>>> That is, I have to stop as soon as I have a type name
>>> (the .asList("1", "2") part should be parsed as a selector).
>>>
>>> This combination of semantic and syntactic predicate does not  
>>> exist  out of the box. I could write a semantic predicate. But, is  
>>> there an  easier way?
>>>
>>> How would you write your parser to detect qualified type name   
>>> constructs? Any help is appreciated!
>>>
>>> Simon
>>>
>>
>> On the download page there are example grammars. One of them is a Java
>> grammar without the heavy interaction with javac. A language similar to
>> Java is C#. The official grammar for it is available at
>> <http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-334.pdf>.
>>
>> Johannes
>>
>>> On Nov 21, 2008, at 21:56 , Simon wrote:
>>>
>>>
>>>> hi
>>>>
>>>> I'm trying to build an AST for a Java-like language. The hardest part
>>>> (if you want to build a meaningful AST) is the section on
>>>> unaryExpressionNotPlusMinus (see the grammar fragments at the end or
>>>> the Java.g grammar on antlr.org).
>>>>
>>>> I have successfully built ASTs for the following constructs (using
>>>> semantic predicates based on a symbol table)
>>>>
>>>>  ^(FIELD_ACCESS target Identifier)
>>>>  ^(INVOKE target Identifier arguments)
>>>>  ^(ARRAY_ACCESS target expr)
>>>>
>>>> However, I'm struggling with fully qualified type names, such as the
>>>> one in
>>>>
>>>>  java.lang.Integer.parseInt("123")
>>>>
>>>> Of course, I want something like
>>>>
>>>>  ^(INVOKE ^(TYPE_REFERENCE ...) arguments)
>>>>
>>>> The problem is that I somehow have to look ahead to detect whether it
>>>> is a qualified type name (I don't know what the precedence is if there
>>>> is a variable named java with a field named lang that has a field named
>>>> Integer that has a method named parseInt, but that's another problem).
>>>> I could write my own semantic predicate method that looks ahead in the
>>>> input to detect a qualified type name. Is there an easier way to do
>>>> that? Or am I approaching the problem from the wrong side?
>>>>
>>>> I've tried to look at the Java grammar from langtools recently posted
>>>> to this list, but it didn't make me any wiser (it relies heavily on the
>>>> existing javac classes).
>>>>
>>>> Thanks
>>>> Simon
>>>>
>>>>
>>>>
>>>> unaryExpressionNotPlusMinus
>>>>    :   ...
>>>>    |   primary selector* ('++'|'--')?
>>>>    ;
>>>>
>>>> primary
>>>>    :   parExpression
>>>>    |   literal
>>>>    |   'new' creator
>>>>    |   Identifier ('.' Identifier)* identifierSuffix?   // this is the hard / interesting part
>>>>    |   primitiveType ('[' ']')* '.' 'class'
>>>>    |   'void' '.' 'class'
>>>>    ;
>>>>
>>>> identifierSuffix
>>>>    :   ('[' ']')+ '.' 'class'
>>>>    |   ('[' expression ']')+ // can also be matched by selector, but do here
>>>>    |   arguments
>>>>    |   '.' 'class'
>>>>    ;
>>>>
>>>> selector
>>>>    :   '.' Identifier arguments?
>>>>    |   '[' expression ']'
>>>>    ;
>>>>
>>>>
>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address


