[antlr-interest] Re: yet another java 1.5 grammar

lgcraymer lgc at mail1.jpl.nasa.gov
Tue Aug 24 16:46:21 PDT 2004


Michael--

Please send Ter a copy so that he can put it in the "sharing" section
of antlr.org.

--Loring


--- In antlr-interest at yahoogroups.com, Michael Stahl
<gcpa-antlr-interest at m...> wrote:
> 
> [posting via gmane, hope that works and does not mess up attachments]
> 
> 
> hello!
> 
> this is the java 1.5 grammar which i thought i would post 3 weeks ago :)
> it took so long mostly because the proposed final draft version of JSR14
> came out on july 27 and that had some changes relative to the public
> review that came out in 2001 (!). i finally found the time for looking
> through the draft 3rd ed. of JLS that comes with JSR14-pfd last week.
> 
> oh, and thanks to michael studman, without your java15.g i would not
> even have noticed that! a superficial look at that yielded some weird
> WILDCARD thing which my grammar was missing...
> 
> since i need my grammar anyway (because i have a whitespace preserving
> grammar that is based on it and a serializing tree parser to match)
> i thought i might as well post it here despite java15.g being available.
> 
> also, i have some issues with java15.g, which are fixed in my grammar:
> 
> - the gtToReconcile stuff is way overkill. i replaced this with
>   a simple semantic predicate in typeArguments:
>         {inputState.guessing !=0 || ltCounter == currentLtLevel + 1}?
>   (oh, and thanks anyway, because i did not actually notice that
>   tree construction was broken in the first place)
> 
> - the typeArguments for method calls was missing in rule identPrimary:
>   i found adding this rather difficult because unfortunately there is a
>   conflict between the (DOT typeArgs IDENT)* loop in identPrimary
>   and (DOT typeArgs "super")* in rule postfixExpression which was not
>   there before (k=2). 
>   my first idea was to put a syntactic predicate in the loop, but
>   then antlr (2.7.4) gave me a warning that this does not make sense.
>   so what i came up with is this, and it is rather ugly:
>     (   (DOT typeArguments IDENT) =>
>                 DOT^ ta2:typeArguments! IDENT
>     |   {false}?        // FIXME: this is very ugly but it seems to
work...
>     )*
>   This generates two warnings, one of which i could disable, the other
>   says that empty alternative does not make any sense in a loop.
>   Obvious feature request for the antlr authors:
>   Would it be possible to handle the case of a syntactic predicate in a
>   loop with only one alternative differently in antlr, i.e. such that
>   not matching the syntax predicate means the loop is broken out of?
> 
> - newExpression is also missing typeArguments, but that is easy to fix
> 
> - the handling of the various different field types has a lot of
>   duplication. this is factored a bit better in my grammar.
> 
> a cool feature i have added is switchable keyword support. the
> "assert" and "enum" keywords can be dis/enabled at runtime. this is
> done via a string comparison in the lexer IDENT rule. i think this
> should not harm performance too much, but i have not benchmarked
> anything and do not particularly care anyway.
> 
> oh, and i have also cleaned up the formatting to always use tabs instead
> of the hideous tabs/spaces mix it was before.
> 
> my grammar is actually not much tested, but the whitespace preserving
> grammar based on it parses and serializes the entire aspectix cvs
> source tree (almost 3000 java 1.4 files) and also the sources of
> findbugs (which uses generics), as well as some custom written test
> cases.
> 
> btw, does anyone know of a sizable code base that uses java 1.5
> features and is licensed under some kind of free software license?
> i.e. not the sun jdk 1.5 source...
> 
> so, if anybody wants to comment on this or finds a bug, i'd be happy
> to hear from you.
> 
> michael stahl
> 
> PS: i hereby donate the code in the attached files, which was written
>     by myself except for the parts that are from the antlr distribution
>     and the parts taken from javaG.g by Matt Quail, to the public
>     domain.
> 
> /** Java 1.5/JSR14/JSR201/JSR175 Recognizer
>  *
>  * Run 'java Main [-showtree] directory-full-of-java-files'
>  *
>  * [The -showtree option pops up a Swing frame that shows
>  *  the AST constructed from the parser.]
>  *
>  * Run 'java Main <directory full of java files>'
>  *
>  * Contributing authors:
>  *		John Mitchell		johnm at n...
>  *		Terence Parr		parrt at m...
>  *		John Lilley			jlilley at e...
>  *		Scott Stanchfield	thetick at m...
>  *		Markus Mohnen       mohnen at i...
>  *      Peter Williams      pete.williams at s...
>  *      Allan Jacobs        Allan.Jacobs at e...
>  *      Steve Messick       messick at r...
>  *      John Pybus			john at p...
>  *
>  * Version 1.00 December 9, 1997 -- initial release
>  * Version 1.01 December 10, 1997
>  *		fixed bug in octal def (0..7 not 0..8)
>  * Version 1.10 August 1998 (parrt)
>  *		added tree construction
>  *		fixed definition of WS,comments for mac,pc,unix newlines
>  *		added unary plus
>  * Version 1.11 (Nov 20, 1998)
>  *		Added "shutup" option to turn off last ambig warning.
>  *		Fixed inner class def to allow named class defs as statements
>  *		synchronized requires compound not simple statement
>  *		add [] after builtInType DOT class in primaryExpression
>  *		"const" is reserved but not valid..removed from modifiers
>  * Version 1.12 (Feb 2, 1999)
>  *		Changed LITERAL_xxx to xxx in tree grammar.
>  *		Updated java.g to use tokens {...} now for 2.6.0 (new feature).
>  *
>  * Version 1.13 (Apr 23, 1999)
>  *		Didn't have (stat)? for else clause in tree parser.
>  *		Didn't gen ASTs for interface extends.  Updated tree parser too.
>  *		Updated to 2.6.0.
>  * Version 1.14 (Jun 20, 1999)
>  *		Allowed final/abstract on local classes.
>  *		Removed local interfaces from methods
>  *		Put instanceof precedence where it belongs...in relationalExpr
>  *			It also had expr not type as arg; fixed it.
>  *		Missing ! on SEMI in classBlock
>  *		fixed: (expr) + "string" was parsed incorrectly (+ as unary plus).
>  *		fixed: didn't like Object[].class in parser or tree parser
>  * Version 1.15 (Jun 26, 1999)
>  *		Screwed up rule with instanceof in it. :(  Fixed.
>  *		Tree parser didn't like (expr).something; fixed.
>  *		Allowed multiple inheritance in tree grammar. oops.
>  * Version 1.16 (August 22, 1999)
>  *		Extending an interface built a wacky tree: had extra EXTENDS.
>  *		Tree grammar didn't allow multiple superinterfaces.
>  *		Tree grammar didn't allow empty var initializer: {}
>  * Version 1.17 (October 12, 1999)
>  *		ESC lexer rule allowed 399 max not 377 max.
>  *		java.tree.g didn't handle the expression of synchronized
>  *		statements.
>  * Version 1.18 (August 12, 2001)
>  *      	Terence updated to Java 2 Version 1.3 by
>  *		observing/combining work of Allan Jacobs and Steve
>  *		Messick.  Handles 1.3 src.  Summary:
>  *		o  primary didn't include boolean.class kind of thing
>  *      	o  constructor calls parsed explicitly now:
>  * 		   see explicitConstructorInvocation
>  *		o  add strictfp modifier
>  *      	o  missing objBlock after new expression in tree grammar
>  *		o  merged local class definition alternatives, moved after
declaration
>  *		o  fixed problem with ClassName.super.field
>  *      	o  reordered some alternatives to make things more efficient
>  *		o  long and double constants were not differentiated from int/float
>  *		o  whitespace rule was inefficient: matched only one char
>  *		o  add an examples directory with some nasty 1.3 cases
>  *		o  made Main.java use buffered IO and a Reader for Unicode support
>  *		o  supports UNICODE?
>  *		   Using Unicode charVocabulay makes code file big, but only
>  *		   in the bitsets at the end. I need to make ANTLR generate
>  *		   unicode bitsets more efficiently.
>  * Version 1.19 (April 25, 2002)
>  *		Terence added in nice fixes by John Pybus concerning floating
>  *		constants and problems with super() calls.  John did a nice
>  *		reorg of the primary/postfix expression stuff to read better
>  *		and makes f.g.super() parse properly (it was METHOD_CALL not
>  *		a SUPER_CTOR_CALL).  Also:
>  *
>  *		o  "finally" clause was a root...made it a child of "try"
>  *		o  Added stuff for asserts too for Java 1.4, but *commented out*
>  *		   as it is not backward compatible.
>  *
>  * Version 1.20 (October 27, 2002)
>  *
>  *      Terence ended up reorging John Pybus' stuff to
>  *      remove some nondeterminisms and some syntactic predicates.
>  *      Note that the grammar is stricter now; e.g., this(...) must
>  *	be the first statement.
>  *
>  *      Trinary ?: operator wasn't working as array name:
>  *          (isBig ? bigDigits : digits)[i];
>  *
>  *      Checked parser/tree parser on source for
>  *          Resin-2.0.5, jive-2.1.1, jdk 1.3.1, Lucene, antlr 2.7.2a4,
>  *	    and the 110k-line jGuru server source.
>  *
>  * Version 1.21 (October 17, 2003)
>  *	Fixed lots of problems including:
>  *	Ray Waldin: add typeDefinition to interfaceBlock in java.tree.g
>  *  He found a problem/fix with floating point that start with 0
>  *  Ray also fixed problem that (int.class) was not recognized.
>  *  Thorsten van Ellen noticed that \n are allowed incorrectly in
strings.
>  *  TJP fixed CHAR_LITERAL analogously.
>  *
>  * Version 1.22 (April 14, 2004)
>  *  Changed vocab to be ..\uFFFE to avoid -1 char. removed dummy
VOCAB rule.
>  *
>  * Version 1.21.2 (March, 2003)
>  *      Changes by Matt Quail to support generics (as per JDK1.5/JSR14)
>  *      Notes:
>  *      o We only allow the "extends" keyword and not the "implements"
>  *        keyword, since thats what JSR14 seems to imply.
>  *      o Thanks to Monty Zukowski for his help on the antlr-interest
>  *        mail list.
>  *      o Thanks to Alan Eliasen for testing the grammar over his
>  *        Fink source base
>  *
>  * Version 1.22+assert+JSR14 (2004-06-10)
>  *      Merged ANTLR version 1.22 with javaG.g version 1.21.2 and added
>  *      the ability to enable the "assert" keyword at runtime via
the lexer.
>  *      Also made changes to generics rules for a saner AST creation.
>  *
>  * Version 1.22+assert+JSR14+JSR201 (2004-06-12)
>  *      Added support for enums, varargs, enhanced for loop, and
import static
>  *
>  * Version 1.22+assert+JSR14+JSR201+JSR175 (2004-06-14)
>  *      Added support for metadata (JSR 175). Refactored the field
rule into
>  *      classField and interfaceField.
>  *
>  * Version 1.22+assert+JSR14+JSR201+JSR175+AST (2004-07-02)
>  *      Various changes to improve AST generation; also made the
tree parser
>  *      recognize all the fancy new stuff. Added the ability to
enable the
>  *      "enum" keyword at runtime (just like "assert").
>  *
>  * Version 1.22+assert+JSR14+JSR201+JSR175+AST+fixes (2004-08-17)
>  *      Bug fixes and support for wildcard type arguments and
constructor
>  *      type parameters (new in final draft of JSR 14). Formatting
cleanup.
>  *
>  * This grammar is in the PUBLIC DOMAIN
>  */
> class JavaRecognizer extends Parser;
> options {
> 	k = 2;                           // two token lookahead
> 	exportVocab=Java;                // Call its vocabulary "Java"
> 	codeGenMakeSwitchThreshold = 2;  // Some optimizations
> 	codeGenBitsetTestThreshold = 3;
> 	defaultErrorHandler = false;     // Don't generate parser error
handlers
> 	buildAST = true;
> }
> 
> tokens {
> 	BLOCK; MODIFIERS; OBJBLOCK; SLIST; CTOR_DEF; METHOD_DEF; VARIABLE_DEF;
> 	INSTANCE_INIT; STATIC_INIT; TYPE; CLASS_DEF; INTERFACE_DEF;
> 	PACKAGE_DEF; ARRAY_DECLARATOR; EXTENDS_CLAUSE; IMPLEMENTS_CLAUSE;
> 	PARAMETERS; PARAMETER_DEF; LABELED_STAT; TYPECAST; INDEX_OP;
> 	POST_INC; POST_DEC; METHOD_CALL; EXPR; ARRAY_INIT;
> 	IMPORT; UNARY_MINUS; UNARY_PLUS; CASE_GROUP; ELIST; FOR_INIT;
FOR_CONDITION;
> 	FOR_ITERATOR; EMPTY_STAT; FINAL="final"; ABSTRACT="abstract";
> 	STRICTFP="strictfp"; SUPER_CTOR_CALL; CTOR_CALL;
> 	ASSERT; TYPE_ARGS; TYPE_ARGS_END; TYPE_PARAMS; ENUM; ENUM_DEF;
ENUM_CONST;
> 	ANNOTATION_DEF; ANNOTATION_MEMBER_DEF; ANNOTATION; ANNOTATIONS;
> 	ANNOTATION_INIT_EMPTY; ANNOTATION_INIT_VALUE; ANNOTATION_INIT_LIST;
> 	ANNOTATION_INIT_MEMBER; WILDCARD;
> }
> 
> {
>     /**
>      * Counts the number of LT seen in the typeArguments production.
>      * It is used in semantic predicates to ensure we have seen
>      * enough closing '>' characters; which actually may have been
>      * either GT, SR or BSR tokens.
>      */
>     private int ltCounter = 0;
> }
> 
> // Compilation Unit: In Java, this is a single file.  This is the start
> //   rule for this parser
> compilationUnit
> 	:	// A compilation unit starts with an optional package definition
> 		// Metadata makes a mess of things: even package definitions can be
> 		// annotated, although _only_ in one file (not enforced here ;))
> 		(	( annotations "package" ) => packageDefinition
> 		|	/* nothing */
> 		)
> 
> 		// Next we have a series of zero or more import statements
> 		( importDefinition )*
> 
> 		// Wrapping things up with any number of class or interface
> 		//    definitions
> 		( typeDefinition )*
> 
> 		EOF!
> 	;
> 
> 
> // Package statement: "package" followed by an identifier.
> packageDefinition
> 	options {defaultErrorHandler = true;} // let ANTLR handle errors
> 	:	annotations p:"package"^ {#p.setType(PACKAGE_DEF);} identifier SEMI!
> 	;
> 
> 
> // Import statement: import followed by a package or class name
> // JSR 201 allows the optional "static" keyword
> importDefinition
> 	options {defaultErrorHandler = true;}
> 	:	i:"import"^ {#i.setType(IMPORT);} ("static")? identifierStar SEMI!
> 	;
> 
> // A type definition in a file is either a class, interface,
enumeration or
> // annotation type definition.
> typeDefinition
> 	options {defaultErrorHandler = true;}
> 	:	m:modifiers!
> 		( classDefinition[#m]
> 		| enumDefinition[#m]
> 		| interfaceDefinition[#m]
> 		| annotationTypeDefinition[#m]
> 		)
> 	|	SEMI!
> 	;
> 
> /** A declaration is the creation of a reference or primitive-type
variable
>  *  Create a separate Type/Var tree for each var in the var list.
>  */
> declaration!
> 	:	m:modifiers t:typeSpec[false] v:variableDefinitions[#m,#t]
> 		{#declaration = #v;}
> 	;
> 
> // A type specification is a type name with possible brackets afterwards
> //   (which would make it an array type).
> typeSpec[boolean addImagNode]
> 	: classTypeSpec[addImagNode]
> 	| builtInTypeSpec[addImagNode]
> 	;
> 
> // built in types are not reference types, everything else is
> referenceTypeSpec[boolean addImagNode]
> 	: classTypeSpec[addImagNode]
> 	| arrayTypeSpec[addImagNode]
> 	;
> 
> // A class type specification is a class type with either:
> // - possible brackets afterwards
> //   (which would make it an array type).
> // - generic type arguments after
> classTypeSpec[boolean addImagNode]
> 	:	classOrInterfaceType[false]
> 		(options{greedy=true;}: // match as many as possible
> 			lb:LBRACK^ {#lb.setType(ARRAY_DECLARATOR);} RBRACK!
> 		)*
> 		{
> 			if ( addImagNode ) {
> 				#classTypeSpec = #(#[TYPE,"TYPE"], #classTypeSpec);
> 			}
> 		}
> 	;
> 
> classOrInterfaceType[boolean addImagNode]
> 	:	IDENT typeArguments
> 		(options{greedy=true;}: // match as many as possible
> 			DOT^
> 			IDENT typeArguments
> 		)*
> 		{
> 			if ( addImagNode ) {
> 				#classOrInterfaceType = #(#[TYPE,"TYPE"],
> 					#classOrInterfaceType);
> 			}
> 		}
> 	;
> 
> typeArguments
> {int currentLtLevel = 0;}
> 	:
> 		{currentLtLevel = ltCounter;}
> 		(
> 			lt:LT^	{ ltCounter++; #lt.setType(TYPE_ARGS); }
> 			typeArgument
> 			(options{greedy=true;}: // match as many as possible
> 				// The second test is needed to construct trees properly
> 				// in the case when we have ">>" or ">>>" tokens
> 				// (test case: "var<O1<I1<M1>>, O2<I2>> a;"
> 				// The first test is needed because otherwise
> 				// stuff breaks when guessing (e.g. declaration)
> 				// because semantic actions are not executed and the
> 				// second test would always fail (trees are not constructed
> 				// while guessing, so no problem there).
> 				{inputState.guessing !=0 || ltCounter == currentLtLevel + 1}?
> 					COMMA! typeArgument
> 			)*
> 
> 			(   // turn warning off since Antlr generates the right code,
> 				// plus we have our semantic predicate below
> 				options{generateAmbigWarnings=false;}:
> 				typeArgumentsEnd!
> 			)?
> 		)
> 		// make sure we have gobbled up enough '>' characters
> 		// if we are at the "top level" of nested typeArgument productions
> 		{(currentLtLevel != 0) || ltCounter == currentLtLevel}?
> 	|	{#typeArguments = #(#[TYPE_ARGS,"TYPE_ARGS"], #typeArguments);}
> 	;
> 
> // either reference type or wildcard type with optional lower or
upper bound
> typeArgument
> 	:	(	q:QUESTION^ {#q.setType(WILDCARD);}
> 			( // faux conflict on "extends" because typeArgsEnd may be empty
> 				options{greedy=true;}:
> 				"extends" referenceTypeSpec[true]
> 			|	"super" referenceTypeSpec[true]
> 			)?
> 		)
> 	|	referenceTypeSpec[true]
> 	;
> 
> // this gobbles up *some* amount of '>' characters, and counts how many
> // it gobbled.
> protected
> typeArgumentsEnd
> 	:	(	GT {ltCounter-=1;}
> 		|	SR {ltCounter-=2;}
> 		|	BSR {ltCounter-=3;}
> 		)
> 		{ #typeArgumentsEnd.setType(TYPE_ARGS_END); }
> 	;
> 
> // A builtin type specification is a builtin type with possible brackets
> // afterwards (which would make it an array type).
> builtInTypeSpec[boolean addImagNode]
> 	:	builtInType
> 		(options{greedy=true;}: // match as many as possible
> 			lb:LBRACK^ {#lb.setType(ARRAY_DECLARATOR);} RBRACK!
> 		)*
> 		{
> 			if ( addImagNode ) {
> 				#builtInTypeSpec = #(#[TYPE,"TYPE"], #builtInTypeSpec);
> 			}
> 		}
> 	;
> 
> // An array type specification is a builtin type with brackets
afterwards
> arrayTypeSpec[boolean addImagNode]
> 	:	builtInType
> 		(options{greedy=true;}: // match as many as possible
> 			lb:LBRACK^ {#lb.setType(ARRAY_DECLARATOR);} RBRACK!
> 		)+
> 		{
> 			if ( addImagNode ) {
> 				#arrayTypeSpec = #(#[TYPE,"TYPE"], #arrayTypeSpec);
> 			}
> 		}
> 	;
> 
> // A type name. which is either a (possibly qualified and parameterized)
> // class name or a primitive (builtin) type
> type
> 	:	classOrInterfaceType[false]
> 	|	builtInType
> 	;
> 
> // The primitive types.
> builtInType
> 	:	"void"
> 	|	"boolean"
> 	|	"byte"
> 	|	"char"
> 	|	"short"
> 	|	"int"
> 	|	"float"
> 	|	"long"
> 	|	"double"
> 	;
> 
> // A (possibly-qualified) java identifier.  We start with the first
IDENT
> //   and expand its name by adding dots and following IDENTS
> identifier
> 	:	IDENT  ( DOT^ IDENT )*
> 	;
> 
> identifierStar
> 	:	IDENT
> 		( DOT^ IDENT )*
> 		( DOT^ STAR  )?
> 	;
> 
> // A list of zero or more modifiers.  We could have used (modifier)* in
> //   place of a call to modifiers, but I thought it was a good idea
to keep
> //   this rule separate so they can easily be collected in a Vector if
> //   someone so desires
> // JSR 175 says that annotations are allowed everywhere modifiers are.
> // A nondeterminism warning is masked by the greedy option.
> modifiers
> 	:	( options{greedy=true;} : modifier | annotation )*
> 		{#modifiers = #([MODIFIERS, "MODIFIERS"], #modifiers);}
> 	;
> 
> // modifiers for Java classes, interfaces, class/instance vars and
methods
> modifier
> 	:	"private"
> 	|	"public"
> 	|	"protected"
> 	|	"static"
> 	|	"transient"
> 	|	"final"
> 	|	"abstract"
> 	|	"native"
> 	|	"threadsafe"
> 	|	"synchronized"
> //	|	"const"			// reserved word, but not valid
> 	|	"volatile"
> 	|	"strictfp"
> 	;
> 
> // Definition of an enumeration (JSR 201)
> enumDefinition![AST modifiers]
> 	:	ENUM IDENT
> 		// it might implement some interfaces...
> 		ic:implementsClause
> 		// now parse the body of the enum
> 		eb:enumBlock
> 		{#enumDefinition = #(#[ENUM_DEF,"ENUM_DEF"], modifiers,IDENT,ic,eb);}
> 	;
> 
> // This is the body of an enumeration.  It can contain a list of comma
> // separated identifiers (the enum values), and optionally,
seperated by a
> // semicolon, some declarations like in a class at the end.
> // The values of the enumeration may be annotated.
> enumBlock
> 	:	LCURLY! // next line has a nondeterminism warning without option
greedy
> 			( enumConst ( options {greedy=true;} : COMMA! enumConst )* )?
> 			( COMMA! )?	// optional comma at end of value list
> 			( SEMI! ( classField | SEMI! )* )?
> 		RCURLY!
> 		{#enumBlock = #([OBJBLOCK, "OBJBLOCK"], #enumBlock);}
> 	;
> 
> // Each enum value is in fact a class instance, and can be followed
by the
> // usual class declarations.
> enumConst
> 	:	annotations IDENT enumConstInit ( classBlock )?
> 		{#enumConst = #([ENUM_CONST, "ENUM_CONST"], #enumConst );}
> 	;
> 
> // This is really a constructor invocation.
> enumConstInit
> 	: ( lp:LPAREN^ argList RPAREN! { #lp.setType(CTOR_CALL); } )?
> 	;
> 
> // Definition of an annotation type (JSR 175)
> annotationTypeDefinition![AST modifiers]
> 	:	AT "interface" IDENT
> 		// now parse the body of the annotation type
> 		ab:annotationBlock
> 		{#annotationTypeDefinition = #(#[ANNOTATION_DEF,"ANNOTATION_DEF"],
> 									modifiers,IDENT,ab);}
> 	;
> 
> // This is the body of an annotation type. Only inner type
definitions and
> // members (which use a notation similar to methods) are allowed.
> annotationBlock
> 	:	LCURLY!
> 			( annotationField | SEMI! )*
> 		RCURLY!
> 		{#annotationBlock = #([OBJBLOCK, "OBJBLOCK"], #annotationBlock);}
> 	;
> 
> annotationField!
> 	:
> 		mods:modifiers
> 		(	it:innerTypeDef[#mods]		// inner type definition
> 			{#annotationField = #it;}
> 		|	ts:typeSpec[false]
> 			(	i:IDENT LPAREN RPAREN dv:defaultValue SEMI
> 				{#annotationField =
> 						#(#[ANNOTATION_MEMBER_DEF,"ANNOTATION_MEMBER_DEF"],
> 						mods, #(#[TYPE,"TYPE"],ts), i, dv); }
> 			|	v:variableDefinitions[#mods,#ts] SEMI
> 				{#annotationField = #v;}
> 			)
> 		)
> 	;
> 
> // Annotation members may have optional default values.
> defaultValue
> 	:	( "default"^ annotationMemberValue )?
> 	;
> 
> annotations
> 	:	( annotation )*
> 		{#annotations = #([ANNOTATIONS, "ANNOTATIONS"], #annotations);}
> 	;
> 
> annotation
> 	:	AT^ identifier annotationInit
> 		{#AT.setType(ANNOTATION);}
> 	;
> 
> // The initialization (list of assignments, single value, or nothing).
> annotationInit
> 	:	(	lp:LPAREN^
> 			(	annotationMemberInit
> 				( COMMA! annotationMemberInit )*
> 				{#lp.setType(ANNOTATION_INIT_LIST);}
> 			|	annotationMemberValue	{#lp.setType(ANNOTATION_INIT_VALUE);}
> 			)
> 			RPAREN!
> 		)
> 	|	{#annotationInit = #([ANNOTATION_INIT_EMPTY, "AN_INIT_EMPTY"]);}
> 	;
> 
> annotationMemberInit
> 	:	IDENT ASSIGN! annotationMemberValue
> 		{#annotationMemberInit =
> 		 #([ANNOTATION_INIT_MEMBER, "AN_INIT_MEMBER"],
#annotationMemberInit);}
> 	;
> 
> annotationMemberValue
> 	:	annotation
> 	|	conditionalExpression
> 		{#annotationMemberValue = #(#[EXPR,"EXPR"],#annotationMemberValue);}
> 	|	arrayInitializer
> 	;
> 
> // Definition of a Java class
> classDefinition![AST modifiers]
> 	:	"class" IDENT
> 		// it _might_ have type paramaters
> 		tp:typeParameters
> 		// it _might_ have a superclass...
> 		sc:superClassClause
> 		// it might implement some interfaces...
> 		ic:implementsClause
> 		// now parse the body of the class
> 		cb:classBlock
> 		{#classDefinition = #(#[CLASS_DEF,"CLASS_DEF"],
> 							   modifiers,IDENT,tp,sc,ic,cb);}
> 	;
> 
> superClassClause!
> 	:	( "extends" classOrInterfaceType[false] )?
> 		{#superClassClause = #(#[EXTENDS_CLAUSE,"EXTENDS_CLAUSE"],
> 			#superClassClause);}
> 	;
> 
> // Definition of a Java Interface
> interfaceDefinition![AST modifiers]
> 	:	"interface" IDENT
> 		// it _might_ have type paramaters
> 		tp:typeParameters
> 		// it might extend some other interfaces
> 		ie:interfaceExtends
> 		// now parse the body of the interface
> 		ib:interfaceBlock
> 		{#interfaceDefinition = #(#[INTERFACE_DEF,"INTERFACE_DEF"],
> 									modifiers,IDENT,tp,ie,ib);}
> 	;
> 
> typeParameters
> {int currentLtLevel = 0;}
> 	:
> 		{currentLtLevel = ltCounter;}
> 		(
> 			lt:LT^	{ ltCounter++; #lt.setType(TYPE_PARAMS); }
> 			typeParameter (COMMA! typeParameter)*
> 			(typeArgumentsEnd!)?
> 		)
> 		// make sure we have gobbled up enough '>' characters
> 		// if we are at the "top level" of nested typeArgument productions
> 		{(currentLtLevel != 0) || ltCounter == currentLtLevel}?
> 	|	{#typeParameters = #(#[TYPE_PARAMS,"TYPE_PARAMS"], #typeParameters);}
> 	;
> 
> typeParameter
> 	:	IDENT
> 		(   // I'm pretty sure Antlr generates the right thing here:
> 			options{generateAmbigWarnings=false;}:
> 			"extends" classOrInterfaceType[true]
> 			(BAND! classOrInterfaceType[true])*
> 		)?
> 	;
> 
> // This is the body of an interface.
> interfaceBlock
> 	:	LCURLY!
> 			( interfaceField | SEMI! )*
> 		RCURLY!
> 		{#interfaceBlock = #([OBJBLOCK, "OBJBLOCK"], #interfaceBlock);}
> 	;
> 
> // This is the body of a class.  You can have fields and extra
semicolons,
> // That's about it (until you see what a field is...)
> classBlock
> 	:	LCURLY!
> 			( classField | SEMI! )*
> 		RCURLY!
> 		{#classBlock = #([OBJBLOCK, "OBJBLOCK"], #classBlock);}
> 	;
> 
> // An interface can extend several other interfaces...
> interfaceExtends
> 	:	(
> 		e:"extends"!
> 		classOrInterfaceType[false] ( COMMA! classOrInterfaceType[false] )*
> 		)?
> 		{#interfaceExtends = #(#[EXTENDS_CLAUSE,"EXTENDS_CLAUSE"],
> 							#interfaceExtends);}
> 	;
> 
> // A class can implement several interfaces...
> implementsClause
> 	:	(
> 			i:"implements"! classOrInterfaceType[false]
> 							( COMMA! classOrInterfaceType[false] )*
> 		)?
> 		{#implementsClause = #(#[IMPLEMENTS_CLAUSE,"IMPLEMENTS_CLAUSE"],
> 								 #implementsClause);}
> 	;
> 
> // Fields that are type definitions.
> protected
> innerTypeDef![AST modifiers]
> 	: 	(	ed:enumDefinition[modifiers]        // inner enum
> 			{#innerTypeDef = #ed;}
> 
> 		|	cd:classDefinition[modifiers]       // inner class
> 			{#innerTypeDef = #cd;}
> 
> 		|	id:interfaceDefinition[modifiers]   // inner interface
> 			{#innerTypeDef = #id;}
> 
> 		|	ad:annotationTypeDefinition[modifiers]   // inner annotation type
> 			{#innerTypeDef = #ad;}
> 		)
> 		;
> 
> protected
> memberDef![AST modifiers, AST typeParams, boolean allowMethodBody]
> 	:
> 		// A generic method has the typeParameters before the return type.
> 		// This is not allowed for variable definitions, but this production
> 		// allows it; a semantic check could be used if you wanted.
> 		t:typeSpec[false]  // method or variable declaration(s)
> 		(	IDENT  // the name of the method
> 
> 			// parse the formal parameter declarations.
> 			LPAREN! param:parameterDeclarationList RPAREN!
> 
> 			rt:declaratorBrackets[#t]
> 
> 			// get the list of exceptions that this method is
> 			// declared to throw
> 			(tc:throwsClause)?
> 
> 			( SEMI | {allowMethodBody}? s2:compoundStatement )
> 			{#memberDef = #(#[METHOD_DEF,"METHOD_DEF"],
> 							 modifiers,
> 							 typeParams,
> 							 #(#[TYPE,"TYPE"],rt),
> 							 IDENT,
> 							 param,
> 							 tc,
> 							 s2);}
> 		|	v:variableDefinitions[modifiers,#t] SEMI
> //			{#field = #(#[VARIABLE_DEF,"VARIABLE_DEF"], v);}
> 			{#memberDef = #v;} // omit tp here, as it is not legal anyway
> 		)
> 	;
> 
> // An interface can contain inner type definitions, methods and constant
> // definitions. Generalizing the latter, memberDef allows member
variables.
> // To detect and prevent that, use a semantic check.
> interfaceField!
> 	:
> 		mods:modifiers
> 		(	it:innerTypeDef[#mods]		// inner type definition
> 			{#interfaceField = #it;}
> 		|	tp:typeParameters md:memberDef[#mods, #tp, false]
> 			// method or variable definition
> 			{#interfaceField = #md;}
> 		)
> 	;
> 
> // Now the various things that can be defined inside a class...
> classField!
> 	:	// method, constructor, or variable declaration
> 		mods:modifiers
> 		(	it:innerTypeDef[#mods]	// inner type definition
> 			{#classField = #it;}
> 
> 		|	tp:typeParameters
> 			(	h:ctorHead s:constructorBody // constructor
> 				{#classField = #(#[CTOR_DEF,"CTOR_DEF"], mods, tp, h, s);}
> 
> 			|	md:memberDef[#mods, #tp, true] // method or variable definition
> 				{#classField = #md;}
> 			)
> 		)
> 
> 	// "static { ... }" class initializer
> 	|	"static" s3:compoundStatement
> 		{#classField = #(#[STATIC_INIT,"STATIC_INIT"], s3);}
> 
> 	// "{ ... }" instance initializer
> 	|	s4:compoundStatement
> 		{#classField = #(#[INSTANCE_INIT,"INSTANCE_INIT"], s4);}
> 	;
> 
> constructorBody
> 	:	lc:LCURLY^ {#lc.setType(SLIST);}
> 			( options { greedy=true; } : explicitConstructorInvocation)?
> 			(statement)*
> 		RCURLY!
> 	;
> 
> /** Catch obvious constructor calls, but not the expr.super(...)
calls */
> explicitConstructorInvocation
> 	:	typeArguments
> 		(	"this"! lp1:LPAREN^ argList RPAREN! SEMI!
> 			{#lp1.setType(CTOR_CALL);}
> 		|	"super"! lp2:LPAREN^ argList RPAREN! SEMI!
> 			{#lp2.setType(SUPER_CTOR_CALL);}
> 		)
> 	;
> 
> variableDefinitions[AST mods, AST t]
> 	:	variableDeclarator[getASTFactory().dupTree(mods),
> 						   getASTFactory().dupList(t)]
> 		(	COMMA!
> 			variableDeclarator[getASTFactory().dupTree(mods),
> 							   getASTFactory().dupList(t)]
> 		)*
> 	;
> 
> /** Declaration of a variable.  This can be a class/instance variable,
>  *   or a local variable in a method
>  * It can also include possible initialization.
>  */
> variableDeclarator![AST mods, AST t]
> 	:	id:IDENT d:declaratorBrackets[t] v:varInitializer
> 		{#variableDeclarator = #(#[VARIABLE_DEF,"VARIABLE_DEF"],
> 								mods, #(#[TYPE,"TYPE"],d), id, v);}
> 	;
> 
> declaratorBrackets[AST typ]
> 	:	{#declaratorBrackets=typ;}
> 		(lb:LBRACK^ {#lb.setType(ARRAY_DECLARATOR);} RBRACK!)*
> 	;
> 
> varInitializer
> 	:	( ASSIGN^ initializer )?
> 	;
> 
> // This is an initializer used to set up an array.
> arrayInitializer
> 	:	lc:LCURLY^ {#lc.setType(ARRAY_INIT);}
> 			(	initializer
> 				(
> 					// CONFLICT: does a COMMA after an initializer start a new
> 					//           initializer or start the option ',' at end?
> 					//           ANTLR generates proper code by matching
> 					//           the comma as soon as possible.
> 					options {
> 						warnWhenFollowAmbig = false;
> 					}
> 				:
> 					COMMA! initializer
> 				)*
> 			)?
> 			(COMMA!)?
> 		RCURLY!
> 	;
> 
> 
> // The two "things" that can initialize an array element are an
expression
> //   and another (nested) array initializer.
> initializer
> 	:	expression
> 	|	arrayInitializer
> 	;
> 
> // This is the header of a method.  It includes the name and parameters
> //   for the method.
> //   This also watches for a list of exception classes in a "throws"
clause.
> ctorHead
> 	:	IDENT  // the name of the method
> 
> 		// parse the formal parameter declarations.
> 		LPAREN! parameterDeclarationList RPAREN!
> 
> 		// get the list of exceptions that this method is declared to throw
> 		(throwsClause)?
> 	;
> 
> // This is a list of exception classes that the method is declared
to throw
> throwsClause
> 	:	"throws"^ identifier ( COMMA! identifier )*
> 	;
> 
> 
> // A list of formal parameters
> parameterDeclarationList
> 	:	( parameterDeclaration ( COMMA! parameterDeclaration )* )?
> 		{#parameterDeclarationList = #(#[PARAMETERS,"PARAMETERS"],
> 									#parameterDeclarationList);}
> 	;
> 
> // A formal parameter.
> // The ellipsis is the support for varargs (JSR 201)
> // This rule allows ellipsis on any parameter, not just the last (as
specified
> // by JSR 201), so a semantic check is needed for that.
> parameterDeclaration!
> 	:	pm:parameterModifier t:typeSpec[false] ( el:ELLIPSIS )? id:IDENT
> 		pd:declaratorBrackets[#t]
> 		{#parameterDeclaration = #(#[PARAMETER_DEF,"PARAMETER_DEF"],
> 									pm, #([TYPE,"TYPE"],pd,el), id);}
> 	;
> 
> // Parameters can be final. And annotated. Or even both.
> parameterModifier
> 	:	( "final" | annotation )*
> 		{#parameterModifier = #(#[MODIFIERS,"MODIFIERS"],
#parameterModifier);}
> 	;
> 
> // Compound statement.  This is used in many contexts:
> //   Inside a class definition prefixed with "static":
> //      it is a class initializer
> //   Inside a class definition without "static":
> //      it is an instance initializer
> //   As the body of a method
> //   As a completely indepdent braced block of code inside a method
> //      it starts a new scope for variable definitions
> 
> compoundStatement
> 	:	lc:LCURLY^ {#lc.setType(SLIST);}
> 			// include the (possibly-empty) list of statements
> 			(statement)*
> 		RCURLY!
> 	;
> 
> 
> statement
> 	// A list of statements in curly braces -- start a new scope!
> 	:	compoundStatement
> 
> 	// declarations are ambiguous with "ID DOT" relative to expression
> 	// statements.  Must backtrack to be sure.  Could use a semantic
> 	// predicate to test symbol table to see what the type was coming
> 	// up, but that's pretty hard without a symbol table ;)
> 	|	(declaration)=> declaration SEMI!
> 
> 	// An expression statement.  This could be a method call,
> 	// assignment statement, or any other expression evaluated for
> 	// side-effects.
> 	|	expression SEMI!
> 
> 	// class or enum definition
> 	|	m:modifiers! ( enumDefinition[#m] |	classDefinition[#m] )
> 
> 	// Attach a label to the front of a statement
> 	|	IDENT c:COLON^ {#c.setType(LABELED_STAT);} statement
> 
> 	// If-else statement
> 	|	"if"^ LPAREN! expression RPAREN! statement
> 		(
> 			// CONFLICT: the old "dangling-else" problem...
> 			//           ANTLR generates proper code matching
> 			//           as soon as possible.  Hush warning.
> 			options {
> 				warnWhenFollowAmbig = false;
> 			}
> 		:
> 			"else"! statement
> 		)?
> 
> 	// For statement, with support for the enhanced variant (JSR 201)
> 	|	"for"^
> 			LPAREN!
> 			(
> 				( parameterDeclaration COLON ) =>
> 				parameterDeclaration COLON! expression
> 			|
> 				forInit SEMI!   // initializer
> 				forCond	SEMI!   // condition test
> 				forIter         // updater
> 			)
> 			RPAREN!
> 			statement                     // statement to loop over
> 
> 	// While statement
> 	|	"while"^ LPAREN! expression RPAREN! statement
> 
> 	// do-while statement
> 	|	"do"^ statement "while"! LPAREN! expression RPAREN! SEMI!
> 
> 	// get out of a loop (or switch)
> 	|	"break"^ (IDENT)? SEMI!
> 
> 	// do next iteration of a loop
> 	|	"continue"^ (IDENT)? SEMI!
> 
> 	// Return an expression
> 	|	"return"^ (expression)? SEMI!
> 
> 	// switch/case statement
> 	|	"switch"^ LPAREN! expression RPAREN! LCURLY!
> 			( casesGroup )*
> 		RCURLY!
> 
> 	// exception try-catch block
> 	|	tryBlock
> 
> 	// throw an exception
> 	|	"throw"^ expression SEMI!
> 
> 	// synchronize a statement
> 	|	"synchronized"^ LPAREN! expression RPAREN! compoundStatement
> 
> 	// asserts (this can be enabled/disabled via the lexer)
> 	|	ASSERT^ expression ( COLON! expression )? SEMI!
> 
> 	// empty statement
> 	|	s:SEMI {#s.setType(EMPTY_STAT);}
> 	;
> 
> casesGroup
> 	:	(	// CONFLICT: to which case group do the statements bind?
> 			//           ANTLR generates proper code: it groups the
> 			//           many "case"/"default" labels together then
> 			//           follows them with the statements
> 			options {
> 				greedy = true;
> 			}
> 			:
> 			aCase
> 		)+
> 		caseSList
> 		{#casesGroup = #([CASE_GROUP, "CASE_GROUP"], #casesGroup);}
> 	;
> 
> aCase
> 	:	("case"^ expression | "default") COLON!
> 	;
> 
> caseSList
> 	:	(statement)*
> 		{#caseSList = #(#[SLIST,"SLIST"],#caseSList);}
> 	;
> 
> // The initializer for a for loop
> forInit
> 		// if it looks like a declaration, it is
> 	:	(	(declaration)=> declaration
> 		// otherwise it could be an expression list...
> 		|	expressionList
> 		)?
> 		{#forInit = #(#[FOR_INIT,"FOR_INIT"],#forInit);}
> 	;
> 
> forCond
> 	:	(expression)?
> 		{#forCond = #(#[FOR_CONDITION,"FOR_CONDITION"],#forCond);}
> 	;
> 
> forIter
> 	:	(expressionList)?
> 		{#forIter = #(#[FOR_ITERATOR,"FOR_ITERATOR"],#forIter);}
> 	;
> 
> // an exception handler try/catch block
> tryBlock
> 	:	"try"^ compoundStatement
> 		(handler)*
> 		( finallyClause )?
> 	;
> 
> finallyClause
> 	:	"finally"^ compoundStatement
> 	;
> 
> // an exception handler
> handler
> 	:	"catch"^ LPAREN! parameterDeclaration RPAREN! compoundStatement
> 	;
> 
> 
> // expressions
> // Note that most of these expressions follow the pattern
> //   thisLevelExpression :
> //       nextHigherPrecedenceExpression
> //           (OPERATOR nextHigherPrecedenceExpression)*
> // which is a standard recursive definition for a parsing an expression.
> // The operators in java have the following precedences:
> //    lowest  (13)  = *= /= %= += -= <<= >>= >>>= &= ^= |=
> //            (12)  ?:
> //            (11)  ||
> //            (10)  &&
> //            ( 9)  |
> //            ( 8)  ^
> //            ( 7)  &
> //            ( 6)  == !=
> //            ( 5)  < <= > >=
> //            ( 4)  << >>
> //            ( 3)  +(binary) -(binary)
> //            ( 2)  * / %
> //            ( 1)  ++ -- +(unary) -(unary)  ~  !  (type)
> //                  []   () (method call)  . (dot -- identifier
qualification)
> //                  new   ()  (explicit parenthesis)
> //
> // the last two are not usually on a precedence chart; I put them in
> // to point out that new has a higher precedence than '.', so you
> // can validy use
> //     new Frame().show()
> //
> // Note that the above precedence levels map to the rules below...
> // Once you have a precedence chart, writing the appropriate rules
as below
> //   is usually very straightfoward
> 
> 
> 
> // the mother of all expressions
> expression
> 	:	assignmentExpression
> 		{#expression = #(#[EXPR,"EXPR"],#expression);}
> 	;
> 
> 
> // This is a list of expressions.
> expressionList
> 	:	expression (COMMA! expression)*
> 		{#expressionList = #(#[ELIST,"ELIST"], expressionList);}
> 	;
> 
> 
> // assignment expression (level 13)
> assignmentExpression
> 	:	conditionalExpression
> 		(	(	ASSIGN^
> 			|	PLUS_ASSIGN^
> 			|	MINUS_ASSIGN^
> 			|	STAR_ASSIGN^
> 			|	DIV_ASSIGN^
> 			|	MOD_ASSIGN^
> 			|	SR_ASSIGN^
> 			|	BSR_ASSIGN^
> 			|	SL_ASSIGN^
> 			|	BAND_ASSIGN^
> 			|	BXOR_ASSIGN^
> 			|	BOR_ASSIGN^
> 			)
> 			assignmentExpression
> 		)?
> 	;
> 
> 
> // conditional test (level 12)
> conditionalExpression
> 	:	logicalOrExpression
> 		( QUESTION^ assignmentExpression COLON! conditionalExpression )?
> 	;
> 
> 
> // logical or (||)  (level 11)
> logicalOrExpression
> 	:	logicalAndExpression (LOR^ logicalAndExpression)*
> 	;
> 
> 
> // logical and (&&)  (level 10)
> logicalAndExpression
> 	:	inclusiveOrExpression (LAND^ inclusiveOrExpression)*
> 	;
> 
> 
> // bitwise or non-short-circuiting or (|)  (level 9)
> inclusiveOrExpression
> 	:	exclusiveOrExpression (BOR^ exclusiveOrExpression)*
> 	;
> 
> 
> // exclusive or (^)  (level 8)
> exclusiveOrExpression
> 	:	andExpression (BXOR^ andExpression)*
> 	;
> 
> 
> // bitwise or non-short-circuiting and (&)  (level 7)
> andExpression
> 	:	equalityExpression (BAND^ equalityExpression)*
> 	;
> 
> 
> // equality/inequality (==/!=) (level 6)
> equalityExpression
> 	:	relationalExpression ((NOT_EQUAL^ | EQUAL^) relationalExpression)*
> 	;
> 
> 
> // boolean relational expressions (level 5)
> relationalExpression
> 	:	shiftExpression
> 		(	(	(	LT^
> 				|	GT^
> 				|	LE^
> 				|	GE^
> 				)
> 				shiftExpression
> 			)*
> 		|	"instanceof"^ typeSpec[true]
> 		)
> 	;
> 
> 
> // bit shift expressions (level 4)
> shiftExpression
> 	:	additiveExpression ((SL^ | SR^ | BSR^) additiveExpression)*
> 	;
> 
> 
> // binary addition/subtraction (level 3)
> additiveExpression
> 	:	multiplicativeExpression ((PLUS^ | MINUS^) multiplicativeExpression)*
> 	;
> 
> 
> // multiplication/division/modulo (level 2)
> multiplicativeExpression
> 	:	unaryExpression ((STAR^ | DIV^ | MOD^ ) unaryExpression)*
> 	;
> 
> unaryExpression
> 	:	INC^ unaryExpression
> 	|	DEC^ unaryExpression
> 	|	MINUS^ {#MINUS.setType(UNARY_MINUS);} unaryExpression
> 	|	PLUS^  {#PLUS.setType(UNARY_PLUS);} unaryExpression
> 	|	unaryExpressionNotPlusMinus
> 	;
> 
> unaryExpressionNotPlusMinus
> 	:	BNOT^ unaryExpression
> 	|	LNOT^ unaryExpression
> 
> 		// use predicate to skip cases like: (int.class)
> 	|	(LPAREN builtInTypeSpec[true] RPAREN) =>
> 		lpb:LPAREN^ {#lpb.setType(TYPECAST);} builtInTypeSpec[true] RPAREN!
> 		unaryExpression
> 
> 		// Have to backtrack to see if operator follows.  If no operator
> 		// follows, it's a typecast.  No semantic checking needed to parse.
> 		// if it _looks_ like a cast, it _is_ a cast; else it's a "(expr)"
> 	|	(LPAREN classTypeSpec[true] RPAREN unaryExpressionNotPlusMinus)=>
> 		lp:LPAREN^ {#lp.setType(TYPECAST);} classTypeSpec[true] RPAREN!
> 		unaryExpressionNotPlusMinus
> 
> 	|	postfixExpression
> 	;
> 
> // qualified names, array expressions, method invocation, post inc/dec
> postfixExpression
> 	:
> 	/*
> 	"this"! lp1:LPAREN^ argList RPAREN!
> 		{#lp1.setType(CTOR_CALL);}
> 
> 	|	"super"! lp2:LPAREN^ argList RPAREN!
> 		{#lp2.setType(SUPER_CTOR_CALL);}
> 	|
> 	*/
> 		primaryExpression
> 
> 		(
> 			/*
> 			options {
> 				// the use of postfixExpression in SUPER_CTOR_CALL adds DOT
> 				// to the lookahead set, and gives loads of false non-det
> 				// warnings.
> 				// shut them off.
> 				generateAmbigWarnings=false;
> 			}
> 		:	*/
> 			DOT^ "this"
> 
> 		|	DOT^ ta1:typeArguments!
> 			(	IDENT
> 				(	lp:LPAREN^ {#lp.setType(METHOD_CALL);}
> 					{astFactory.addASTChild(currentAST, #ta1);}
> 					argList
> 					RPAREN!
> 				)?
> 			|	"super"
> 				(	// (new Outer()).super()  (create enclosing instance)
> 					lp3:LPAREN^ {#lp3.setType(SUPER_CTOR_CALL);}
> 					{astFactory.addASTChild(currentAST, #ta1);}
> 					argList
> 					RPAREN!
> 				|	DOT^ ta2:typeArguments! IDENT
> 					(	lps:LPAREN^ {#lps.setType(METHOD_CALL);}
> 						{astFactory.addASTChild(currentAST, #ta2);}
> 						argList
> 						RPAREN!
> 					)?
> 				)
> 			)
> 		|	DOT^ newExpression
> 		|	lb:LBRACK^ {#lb.setType(INDEX_OP);} expression RBRACK!
> 		)*
> 
> 		(	// possibly add on a post-increment or post-decrement.
> 			// allows INC/DEC on too much, but semantics can check
> 			in:INC^ {#in.setType(POST_INC);}
> 	 	|	de:DEC^ {#de.setType(POST_DEC);}
> 		)?
>  	;
> 
> // the basic element of an expression
> primaryExpression
> 	:	identPrimary ( options {greedy=true;} : DOT^ "class" )?
> 	|	constant
> 	|	"true"
> 	|	"false"
> 	|	"null"
> 	|	newExpression
> 	|	"this"
> 	|	"super"
> 	|	LPAREN! assignmentExpression RPAREN!
> 		// look for int.class and int[].class
> 	|	builtInType
> 		( lbt:LBRACK^ {#lbt.setType(ARRAY_DECLARATOR);} RBRACK! )*
> 		DOT^ "class"
> 	;
> 
> /** Match a, a.b.c refs, a.b.c(...) refs, a.b.c[], a.b.c[].class,
>  *  and a.b.c.class refs.  Also this(...) and super(...).  Match
>  *  this or super.
>  */
> identPrimary
> 	:	ta1:typeArguments!
> 		IDENT
> 		// Further proof that java designers should lay off the drugs:
> 		// Syntax for method invocation with type arguments is
> 		//   <String> foo ("bla")  instead of  foo <String> ("bla")
> 		(
> 			options {
> 				// .ident could match here or in postfixExpression.
> 				// We do want to match here.  Turn off warning.
> 				greedy=true;
> 				// This turns the ambiguity warning of the second alternative
> 				// off. See below. (The "false" predicate makes it non-issue)
> 				warnWhenFollowAmbig=false;
> 			}
> 			// bah, great, we have a new nondeterminism because of those
> 			// stupid typeArguments... only a syntactic predicate will help...
> 			// The problem is that this loop here conflicts with
> 			//  DOT typeArguments "super"  in postfixExpression (k=2)
> 			// A proper solution would require a lot of refactoring...
> 		:	(DOT typeArguments IDENT) =>
> 				DOT^ ta2:typeArguments! IDENT
> 		|	{false}?	// FIXME: this is very ugly but it seems to work...
> 						// this will also produce an ANTLR warning!
> 				// Unfortunately a syntactic predicate can only select one of
> 				// multiple alternatives on the same level, not break out of
> 				// an enclosing loop, which is why this ugly hack (a fake
> 				// empty alternative with always-false semantic predicate)
> 				// is necessary.
> 		)*
> 		(
> 			options {
> 				// ARRAY_DECLARATOR here conflicts with INDEX_OP in
> 				// postfixExpression on LBRACK RBRACK.
> 				// We want to match [] here, so greedy.  This overcomes
> 				// limitation of linear approximate lookahead.
> 				greedy=true;
> 			}
> 		:	(	lp:LPAREN^ {#lp.setType(METHOD_CALL);}
> 				// if the input is valid, only the last IDENT may
> 				// have preceding typeArguments... rather hacky, this is...
> 				{if (#ta2 != null) astFactory.addASTChild(currentAST, #ta2);}
> 				{if (#ta2 == null) astFactory.addASTChild(currentAST, #ta1);}
> 				argList RPAREN!
> 			)
> 		|	( options {greedy=true;} :
> 				lbc:LBRACK^ {#lbc.setType(ARRAY_DECLARATOR);} RBRACK!
> 			)+
> 		)?
> 	;
> 
> /** object instantiation.
>  *  Trees are built as illustrated by the following input/tree pairs:
>  *
>  *  new T()
>  *
>  *  new
>  *   |
>  *   T --  ELIST
>  *           |
>  *          arg1 -- arg2 -- .. -- argn
>  *
>  *  new int[]
>  *
>  *  new
>  *   |
>  *  int -- ARRAY_DECLARATOR
>  *
>  *  new int[] {1,2}
>  *
>  *  new
>  *   |
>  *  int -- ARRAY_DECLARATOR -- ARRAY_INIT
>  *                                  |
>  *                                EXPR -- EXPR
>  *                                  |      |
>  *                                  1      2
>  *
>  *  new int[3]
>  *  new
>  *   |
>  *  int -- ARRAY_DECLARATOR
>  *                |
>  *              EXPR
>  *                |
>  *                3
>  *
>  *  new int[1][2]
>  *
>  *  new
>  *   |
>  *  int -- ARRAY_DECLARATOR
>  *               |
>  *         ARRAY_DECLARATOR -- EXPR
>  *               |              |
>  *             EXPR             1
>  *               |
>  *               2
>  *
>  * Note that the typeArguments are no error here, you can write
things like:
>  * 		Foo f = new <Bar> Foo <Baz> ();
>  * The first type arguments are for the constructor, the second for
the class.
>  */
> newExpression
> 	:	"new"^ typeArguments type
> 		(	LPAREN! argList RPAREN! (classBlock)?
> 
> 			//java 1.1
> 			// Note: This will allow bad constructs like
> 			//    new int[4][][3] {exp,exp}.
> 			//    There needs to be a semantic check here...
> 			// to make sure:
> 			//   a) [ expr ] and [ ] are not mixed
> 			//   b) [ expr ] and an init are not used together
> 
> 		|	newArrayDeclarator (arrayInitializer)?
> 		)
> 	;
> 
> argList
> 	:	(	expressionList
> 		|	/*nothing*/
> 			{#argList = #[ELIST,"ELIST"];}
> 		)
> 	;
> 
> newArrayDeclarator
> 	:	(
> 			// CONFLICT:
> 			// newExpression is a primaryExpression which can be
> 			// followed by an array index reference.  This is ok,
> 			// as the generated code will stay in this loop as
> 			// long as it sees an LBRACK (proper behavior)
> 			options {
> 				warnWhenFollowAmbig = false;
> 			}
> 		:
> 			lb:LBRACK^ {#lb.setType(ARRAY_DECLARATOR);}
> 				(expression)?
> 			RBRACK!
> 		)+
> 	;
> 
> constant
> 	:	NUM_INT
> 	|	CHAR_LITERAL
> 	|	STRING_LITERAL
> 	|	NUM_FLOAT
> 	|	NUM_LONG
> 	|	NUM_DOUBLE
> 	;
> 
> 
>
//----------------------------------------------------------------------------
> // The Java scanner
>
//----------------------------------------------------------------------------
> class JavaLexer extends Lexer;
> 
> options {
> 	exportVocab=Java;      // call the vocabulary "Java"
> 	testLiterals=false;    // don't automatically test for literals
> 	k=4;                   // four characters of lookahead
> 	charVocabulary='\u0003'..'\u7FFE';
> 	// without inlining some bitset tests, couldn't do unicode;
> 	// I need to make ANTLR generate smaller bitsets; see
> 	// bottom of JavaLexer.java
> 	codeGenBitsetTestThreshold=20;
> }
> 
> {
> 	/** flag for enabling the "assert" keyword */
> 	private boolean assertEnabled = false;
> 	/** flag for enabling the "enum" keyword */
> 	private boolean enumEnabled = false;
> 
> 	/** Enable the "assert" keyword */
> 	public void enableAssert() { assertEnabled = true; }
> 	/** Disable the "assert" keyword */
> 	public void disableAssert() { assertEnabled = false; }
> 	/** Query the "assert" keyword state */
> 	public boolean isAssertEnabled() { return assertEnabled; }
> 	/** Enable the "enum" keyword */
> 	public void enableEnum() { enumEnabled = true; }
> 	/** Disable the "enum" keyword */
> 	public void disableEnum() { enumEnabled = false; }
> 	/** Query the "enum" keyword state */
> 	public boolean isEnumEnabled() { return enumEnabled; }
> }
> 
> 
> // OPERATORS
> QUESTION		:	'?'		;
> LPAREN			:	'('		;
> RPAREN			:	')'		;
> LBRACK			:	'['		;
> RBRACK			:	']'		;
> LCURLY			:	'{'		;
> RCURLY			:	'}'		;
> COLON			:	':'		;
> COMMA			:	','		;
> //DOT			:	'.'		;
> //ELLIPSIS		:	"..."	;
> ASSIGN			:	'='		;
> EQUAL			:	"=="	;
> LNOT			:	'!'		;
> BNOT			:	'~'		;
> NOT_EQUAL		:	"!="	;
> DIV				:	'/'		;
> DIV_ASSIGN		:	"/="	;
> PLUS			:	'+'		;
> PLUS_ASSIGN		:	"+="	;
> INC				:	"++"	;
> MINUS			:	'-'		;
> MINUS_ASSIGN	:	"-="	;
> DEC				:	"--"	;
> STAR			:	'*'		;
> STAR_ASSIGN		:	"*="	;
> MOD				:	'%'		;
> MOD_ASSIGN		:	"%="	;
> SR				:	">>"	;
> SR_ASSIGN		:	">>="	;
> BSR				:	">>>"	;
> BSR_ASSIGN		:	">>>="	;
> GE				:	">="	;
> GT				:	">"		;
> SL				:	"<<"	;
> SL_ASSIGN		:	"<<="	;
> LE				:	"<="	;
> LT				:	'<'		;
> BXOR			:	'^'		;
> BXOR_ASSIGN		:	"^="	;
> BOR				:	'|'		;
> BOR_ASSIGN		:	"|="	;
> LOR				:	"||"	;
> BAND			:	'&'		;
> BAND_ASSIGN		:	"&="	;
> LAND			:	"&&"	;
> SEMI			:	';'		;
> AT				:	'@'		;
> 
> 
> // Whitespace -- ignored
> WS	:	(	' '
> 		|	'\t'
> 		|	'\f'
> 			// handle newlines
> 		|	(	options {generateAmbigWarnings=false;}
> 			:	"\r\n"  // Evil DOS
> 			|	'\r'    // Macintosh
> 			|	'\n'    // Unix (the right way)
> 			)
> 			{ newline(); }
> 		)+
> 		{ _ttype = Token.SKIP; }
> 	;
> 
> // Single-line comments
> SL_COMMENT
> 	:	"//"
> 		(~('\n'|'\r'))* ('\n'|'\r'('\n')?)?
> 		{$setType(Token.SKIP); newline();}
> 	;
> 
> // multiple-line comments
> ML_COMMENT
> 	:	"/*"
> 		(	/*	'\r' '\n' can be matched in one alternative or by matching
> 				'\r' in one iteration and '\n' in another.  I am trying to
> 				handle any flavor of newline that comes in, but the language
> 				that allows both "\r\n" and "\r" and "\n" to all be valid
> 				newline is ambiguous.  Consequently, the resulting grammar
> 				must be ambiguous.  I'm shutting this warning off.
> 			 */
> 			options {
> 				generateAmbigWarnings=false;
> 			}
> 		:
> 			{ LA(2)!='/' }? '*'
> 		|	'\r' '\n'		{newline();}
> 		|	'\r'			{newline();}
> 		|	'\n'			{newline();}
> 		|	~('*'|'\n'|'\r')
> 		)*
> 		"*/"
> 		{$setType(Token.SKIP);}
> 	;
> 
> 
> // character literals
> CHAR_LITERAL
> 	:	'\'' ( ESC | ~('\''|'\n'|'\r'|'\\') ) '\''
> 	;
> 
> // string literals
> STRING_LITERAL
> 	:	'"' (ESC|~('"'|'\\'|'\n'|'\r'))* '"'
> 	;
> 
> 
> // escape sequence -- note that this is protected; it can only be called
> //   from another lexer rule -- it will not ever directly return a
token to
> //   the parser
> // There are various ambiguities hushed in this rule.  The optional
> // '0'...'9' digit matches should be matched here rather than letting
> // them go back to STRING_LITERAL to be matched.  ANTLR does the
> // right thing by matching immediately; hence, it's ok to shut off
> // the FOLLOW ambig warnings.
> protected
> ESC
> 	:	'\\'
> 		(	'n'
> 		|	'r'
> 		|	't'
> 		|	'b'
> 		|	'f'
> 		|	'"'
> 		|	'\''
> 		|	'\\'
> 		|	('u')+ HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
> 		|	'0'..'3'
> 			(
> 				options {
> 					warnWhenFollowAmbig = false;
> 				}
> 			:	'0'..'7'
> 				(
> 					options {
> 						warnWhenFollowAmbig = false;
> 					}
> 				:	'0'..'7'
> 				)?
> 			)?
> 		|	'4'..'7'
> 			(
> 				options {
> 					warnWhenFollowAmbig = false;
> 				}
> 			:	'0'..'7'
> 			)?
> 		)
> 	;
> 
> 
> // hexadecimal digit (again, note it's protected!)
> protected
> HEX_DIGIT
> 	:	('0'..'9'|'A'..'F'|'a'..'f')
> 	;
> 
> 
> // an identifier.  Note that testLiterals is set to true!  This means
> // that after we match the rule, we look in the literals table to see
> // if it's a literal or really an identifer
> IDENT
> 	options {testLiterals=true;}
> 	:	('a'..'z'|'A'..'Z'|'_'|'$') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
> 		{
> 			// check if "assert" keyword is enabled
> 			if (assertEnabled && "assert".equals($getText)) {
> 				$setType(ASSERT); // set token type for the rule in the parser
> 			}
> 			// check if "enum" keyword is enabled
> 			if (enumEnabled && "enum".equals($getText)) {
> 				$setType(ENUM); // set token type for the rule in the parser
> 			}
> 		}
> 	;
> 
> 
> // a numeric literal
> NUM_INT
> 	{boolean isDecimal=false; Token t=null;}
> 	:	'.' {_ttype = DOT;}
> 		(	'.' '.' {_ttype = ELLIPSIS;}
> 		|	(	('0'..'9')+ (EXPONENT)? (f1:FLOAT_SUFFIX {t=f1;})?
> 				{
> 				if (t != null && t.getText().toUpperCase().indexOf('F')>=0) {
> 					_ttype = NUM_FLOAT;
> 				}
> 				else {
> 					_ttype = NUM_DOUBLE; // assume double
> 				}
> 				}
> 			)?
> 		)
> 
> 	|	(	'0' {isDecimal = true;} // special case for just '0'
> 			(	('x'|'X')
> 				(											// hex
> 					// the 'e'|'E' and float suffix stuff look
> 					// like hex digits, hence the (...)+ doesn't
> 					// know when to stop: ambig.  ANTLR resolves
> 					// it correctly by matching immediately.  It
> 					// is therefor ok to hush warning.
> 					options {
> 						warnWhenFollowAmbig=false;
> 					}
> 				:	HEX_DIGIT
> 				)+
> 
> 			|	//float or double with leading zero
> 				(('0'..'9')+ ('.'|EXPONENT|FLOAT_SUFFIX)) => ('0'..'9')+
> 
> 			|	('0'..'7')+									// octal
> 			)?
> 		|	('1'..'9') ('0'..'9')*  {isDecimal=true;}		// non-zero decimal
> 		)
> 		(	('l'|'L') { _ttype = NUM_LONG; }
> 
> 		// only check to see if it's a float if looks like decimal so far
> 		|	{isDecimal}?
> 			(	'.' ('0'..'9')* (EXPONENT)? (f2:FLOAT_SUFFIX {t=f2;})?
> 			|	EXPONENT (f3:FLOAT_SUFFIX {t=f3;})?
> 			|	f4:FLOAT_SUFFIX {t=f4;}
> 			)
> 			{
> 			if (t != null && t.getText().toUpperCase() .indexOf('F') >= 0) {
> 				_ttype = NUM_FLOAT;
> 			}
> 			else {
> 				_ttype = NUM_DOUBLE; // assume double
> 			}
> 			}
> 		)?
> 	;
> 
> 
> // a couple protected methods to assist in matching floating point
numbers
> protected
> EXPONENT
> 	:	('e'|'E') ('+'|'-')? ('0'..'9')+
> 	;
> 
> 
> protected
> FLOAT_SUFFIX
> 	:	'f'|'F'|'d'|'D'
> 	;



 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
 



More information about the antlr-interest mailing list