[antlr-interest] Java Grammar

Yang Jiang yang.jiang.z at gmail.com
Sun Nov 23 10:25:56 PST 2008


Hi Simon,

I don't think that is possible. Consider the following two cases:
class A {
    static b b;
}

class b {
    static String bField;
    public static void main(String[] args) {
        A.b.bField = "s";   // "A" is the type, "b.bField" is field access
    }
}
----------------------------------------------------

package A;

class b {
    static int bField;
    public static void main(String[] args) {
        A.b.bField = 0;     // "A.b" is the type, "bField" is field access
    }
}

So it's not possible to identify the type here from the syntax alone; 
you need to know the context. (It is possible in other constructs, such 
as import declarations.)
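
To make the two cases above concrete, here is a minimal Java sketch of 
how a symbol table could decide where the type name ends. It loosely 
follows the name reclassification rules in JLS 6.5.2 (a variable wins 
over a type, a type wins over a package). The class NameClassifier, the 
method typePrefixLength, and the flat sets standing in for a symbol 
table are all made up for illustration; a real implementation would 
also need scopes, imports, inherited members and so on.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NameClassifier {
    /**
     * Returns how many leading identifiers of a dotted name form the type name.
     * 0 means the name starts with a variable; -1 means no type was found.
     * All three lookup structures are hypothetical stand-ins for a symbol table.
     */
    static int typePrefixLength(List<String> parts,
                                Set<String> variables,
                                Set<String> types,
                                Map<String, Set<String>> fieldsOfType) {
        String current = parts.get(0);
        if (variables.contains(current)) {
            return 0; // a variable obscures a type or package of the same name
        }
        boolean isType = types.contains(current);
        int typeLength = isType ? 1 : -1;
        for (int i = 1; i < parts.size(); i++) {
            String id = parts.get(i);
            if (isType && fieldsOfType.getOrDefault(current, Set.of()).contains(id)) {
                return typeLength; // everything from here on is field access
            }
            current = current + "." + id;
            if (types.contains(current)) {
                isType = true;      // package.Type or Outer.Member
                typeLength = i + 1;
            } else if (isType) {
                return typeLength;  // neither a field nor a member type: stop here
            }
            // otherwise we are still walking through a package name
        }
        return typeLength;
    }

    public static void main(String[] args) {
        List<String> name = List.of("A", "b", "bField");
        // Case 1: default package, class A has a static field b.
        System.out.println(typePrefixLength(name, Set.of(), Set.of("A", "b"),
                Map.of("A", Set.of("b"), "b", Set.of("bField")))); // prints 1
        // Case 2: package A contains class b.
        System.out.println(typePrefixLength(name, Set.of(), Set.of("A.b", "b"),
                Map.of("A.b", Set.of("bField")))); // prints 2
    }
}
```

The same token sequence gives a different answer depending only on what 
the symbol table contains, which is why a grammar rule alone cannot 
make the decision.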

javac parses these by treating each '.' as a select; a following '(' 
then turns it into a method call, a '[' into an array access, and so on.
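
As a rough illustration of that approach, here is a small Java sketch 
that parses a token list purely syntactically: every '.' becomes a 
select node, '(' a call, and '[' an index. It is not javac's code; the 
class SelectParser, its node names (select, call, index) and the 
pre-split token input are assumptions made for this example. Deciding 
which selects are really package or type names is left to a later pass 
that has a symbol table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SelectParser {
    private final List<String> tokens;
    private int pos = 0;

    SelectParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }
    private String next() { return tokens.get(pos++); }
    private void expect(String t) {
        if (!next().equals(t)) throw new IllegalStateException("expected " + t);
    }

    /** Parses: atom ( '.' Identifier | '(' args ')' | '[' expr ']' )* */
    String parseExpression() {
        String tree = next(); // leading identifier or literal
        while (true) {
            switch (peek()) {
                case ".": {
                    next();
                    tree = "(select " + tree + " " + next() + ")";
                    break;
                }
                case "(": {
                    next();
                    List<String> args = new ArrayList<>();
                    while (!peek().equals(")")) {
                        args.add(parseExpression());
                        if (peek().equals(",")) next();
                    }
                    expect(")");
                    tree = "(call " + tree
                            + (args.isEmpty() ? "" : " " + String.join(" ", args)) + ")";
                    break;
                }
                case "[": {
                    next();
                    String index = parseExpression();
                    expect("]");
                    tree = "(index " + tree + " " + index + ")";
                    break;
                }
                default:
                    return tree; // no further suffix
            }
        }
    }

    static String parse(String... tokens) {
        return new SelectParser(Arrays.asList(tokens)).parseExpression();
    }

    public static void main(String[] args) {
        System.out.println(parse("A", ".", "b", ".", "bField"));
        // prints (select (select A b) bField)
        System.out.println(parse("x", "[", "12", "]", "[", "34", "]"));
        // prints (index (index x 12) 34)
    }
}
```

Both cases from the start of this mail produce the same tree here, 
(select (select A b) bField); only the later resolution step tells 
them apart.
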
In fact, the grammar you are using (the one on the ANTLR website) is 
derived from JLS chapter 18 
(http://java.sun.com/docs/books/jls/third_edition/html/syntax.html),
which actually says:
    The grammar presented piecemeal in the preceding chapters is much 
    better for exposition, but it is not well suited as a basis for a 
    parser. The grammar presented in this chapter is the basis for the 
    reference implementation.

So this grammar is actually a representation of how the Sun guys built 
their parser, and it might not be suitable for your job.
Take a rule like this:

classOrInterfaceDeclaration
    :   classOrInterfaceModifiers (classDeclaration | interfaceDeclaration)
    ;


It doesn't take advantage of ANTLR's lookahead features; it is written 
this way because the hand-written javac parser can't see past the 
modifiers.
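
To show why a hand-written parser ends up with that shape, here is a 
hedged Java sketch of a recursive-descent routine with a single token 
of lookahead. It has to consume the modifiers first and only then can 
it branch on 'class' or 'interface'. The class DeclarationParser, the 
returned strings and the reduced modifier list (no annotations) are 
simplifications invented for this example, not javac's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class DeclarationParser {
    private static final Set<String> MODIFIERS = Set.of(
            "public", "protected", "private", "abstract", "static", "final", "strictfp");

    /** Mirrors: classOrInterfaceModifiers (classDeclaration | interfaceDeclaration) */
    static String parseClassOrInterface(List<String> tokens) {
        int pos = 0;
        List<String> modifiers = new ArrayList<>();
        // With one token of lookahead the parser cannot tell yet which
        // alternative it is in, so the modifiers are consumed up front.
        while (pos < tokens.size() && MODIFIERS.contains(tokens.get(pos))) {
            modifiers.add(tokens.get(pos++));
        }
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        // Only now does the next token decide between the alternatives.
        String keyword = tokens.get(pos);
        switch (keyword) {
            case "class":
                return "classDeclaration " + modifiers;
            case "interface":
                return "interfaceDeclaration " + modifiers;
            default:
                throw new IllegalArgumentException(
                        "expected 'class' or 'interface', got " + keyword);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseClassOrInterface(
                List.of("public", "abstract", "interface", "I")));
        // prints interfaceDeclaration [public, abstract]
    }
}
```

With ANTLR's arbitrary lookahead you could instead let classDeclaration 
and interfaceDeclaration each start with their own modifiers and let 
the generated parser scan ahead to choose.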

You can try to use the Compiler Grammar copy of Java.g, which is a 
little more tuned for ANTLR and better tested.

Yang








Johannes Luber wrote:
> Simon wrote:
>   
>> Let me try to rephrase my question. I don't know how to handle some of
>> the primary constructs of the Java grammar.
>>
>>   Integer.parseInt("123")
>>   x.y(a, b)
>>   x[12][34]
>>   String.class
>>   java.util.Arrays.class
>>
>> are all pretty simple to detect with a symbol table and the  
>> information from the imports. But how do I handle qualified type  
>> names, such as the one in
>>
>>   java.util.Arrays.asList("1", "2")
>>
>> Conceptually, I need something like the following:
>>
>>   primary
>>       :    { isType(Identifier ('.' Identifier)*) } Identifier ('.' Identifier)* ...
>>       ;
>>
>> That is, I have to stop as soon as I have a type name  
>> (the .asList("1", "2") part should be parsed as selector).
>>
>> This combination of semantic and syntactic predicate does not exist  
>> out of the box. I could write a semantic predicate. But, is there an  
>> easier way?
>>
>> How would you write your parser to detect qualified type name  
>> constructs? Any help is appreciated!
>>
>> Simon
>>     
>
> On the download page there are example grammars. One of them is a Java
> grammar without the heavy interaction with javac. A similar language to
> Java is C#. The official grammar for it is available on
> <http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-334.pdf>.
>
> Johannes
>   
>> On Nov 21, 2008, at 21:56 , Simon wrote:
>>
>>     
>>> hi
>>>
>>> I'm trying to build an AST for a Java-like language. The hardest part
>>> (if you want to build a meaningful AST) is the section of
>>> unaryExpressionNotPlusMinus (see grammar fragments at end or the
>>> Java.g grammar on antlr.org).
>>>
>>> I have successfully built ASTs for the following constructs (using
>>> semantic predicates based on a symbol table)
>>>
>>>   ^(FIELD_ACCESS target Identifier)
>>>   ^(INVOKE target Identifier arguments)
>>>   ^(ARRAY_ACCESS target expr)
>>>
>>> However, I'm struggling with fully qualified type names, such as those
>>> in
>>>
>>>   java.lang.Integer.parseInt("123")
>>>
>>> Of course, I want something like
>>>
>>>   ^(INVOKE ^(TYPE_REFERENCE ...) arguments)
>>>
>>> The problem is that I somehow have to look ahead to detect whether it
>>> is a qualified type name (I don't know what the precedence is if there
>>> is a variable named java with a field named lang that has a field named
>>> Integer that has a method named parseInt, but that's another problem). I
>>> could write my own semantic predicate method that looks ahead in the
>>> input to detect a qualified type name. Is there an easier way to do
>>> that? Or am I approaching the problem from the wrong side?
>>>
>>> I've tried to look at the Java grammar from langtools recently posted
>>> in this list, but didn't get any smarter (they rely heavily on the
>>> existing javac classes).
>>>
>>> Thanks
>>> Simon
>>>
>>>
>>>
>>> unaryExpressionNotPlusMinus
>>>     :   ...
>>>     |   primary selector* ('++'|'--')?
>>>     ;
>>>
>>> primary
>>>     :   parExpression
>>>     |   literal
>>>     |   'new' creator
>>>     |   Identifier ('.' Identifier)* identifierSuffix?   // this is the hard / interesting part
>>>     |   primitiveType ('[' ']')* '.' 'class'
>>>     |   'void' '.' 'class'
>>>     ;
>>>
>>> identifierSuffix
>>>     :   ('[' ']')+ '.' 'class'
>>>     |   ('[' expression ']')+ // can also be matched by selector, but do here
>>>     |   arguments
>>>     |   '.' 'class'
>>>     ;
>>>
>>> selector
>>>     :   '.' Identifier arguments?
>>>     |   '[' expression ']'
>>>     ;
>>>
>>>
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>       


