[antlr-interest] Parsing Parts of Java Code

Terence Parr parrt at cs.usfca.edu
Mon Jan 23 11:53:02 PST 2006


On Jan 23, 2006, at 10:29 AM, Matthias Gutheil wrote:

> Hi,
>
> am I right that the filter=true option for the lexer doesn't
> help me with this grammar?
>
> http://www.antlr.org/grammar/1090713067533/index.html

That is a full grammar, so filter=true won't help you there. You need to
build some complex lexer rules of your own with filter=true to find the
patterns you want.
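
As a rough analogy only (plain java.util.regex, not ANTLR-generated code), here is the idea behind filter=true: at each position the rules are tried in the order listed and the first match wins; if no rule matches, the lexer skips one character and tries again. The rule names echo the FuzzyJava grammar, but the regular expressions are hypothetical stand-ins, the class name FilterLexerSketch is made up, and first-match-wins ordering is an assumption suggested by the grammar listing STAT before CALL and RETURN before FIELD.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterLexerSketch {
    // Hypothetical stand-ins for a few FuzzyJava rules; insertion order = priority.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        // Reluctant ".*?" stops at the first "*/", like greedy=false.
        RULES.put("COMMENT", Pattern.compile("/\\*(?s:.*?)\\*/"));
        // A string literal with backslash escapes.
        RULES.put("STRING", Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\""));
        // Tried before CALL so that "if (" is not reported as a call.
        RULES.put("STAT", Pattern.compile("(?:if|while|switch|for)\\s*\\("));
        // A (possibly qualified) name followed by '('.
        RULES.put("CALL", Pattern.compile("[A-Za-z_]\\w*(?:\\.[A-Za-z_]\\w*)*\\s*\\("));
    }

    /** Returns "RULE:text" for each match; characters no rule matches are skipped. */
    static List<String> scan(String input) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt()) { // anchored at pos, like a lexer rule
                    found.add(rule.getKey() + ":" + m.group());
                    pos = m.end();
                    matched = true;
                    break; // first rule in order wins
                }
            }
            if (!matched) {
                pos++; // filter mode: nothing matched here, skip one char
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String src = "/* a */ if (x) foo.bar(1); s = \"baz(\";";
        scan(src).forEach(System.out::println);
    }
}
```

On that input the sketch reports the comment, `if (` as STAT, `foo.bar(` as CALL, and the string literal. Because STRING swallows the whole literal once the scan reaches its opening quote, `baz(` inside it is never looked at as a call; and because STAT is tried before CALL, `if (` is not reported as a call.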

Here is my fuzzy Java parser for ANTLR v3. It finds function calls,
function definitions, and class definitions; I think you can reverse
engineer it for v2.

Ter

lexer grammar FuzzyJava;
options {filter=true;}

IMPORT
	:	'import' WS name=QIDStar WS? ';'
	;
	
/** Avoids having "return foo;" match as a field */
RETURN
	:	'return' (options {greedy=false;}:.)* ';'
	;

CLASS
	:	'class' WS name=ID WS? ('extends' WS QID WS?)?
		('implements' WS QID WS? (',' WS? QID WS?)*)? '{'
         {System.out.println("found class "+$name.text);}
	;
	
METHOD
     :   TYPE WS name=ID WS? '(' ( ARG WS? (',' WS? ARG WS?)* )? ')' WS?
        ('throws' WS QID WS? (',' WS? QID WS?)*)? '{'
         {System.out.println("found method "+$name.text);}
     ;

FIELD
     :   TYPE WS name=ID '[]'? WS? (';'|'=')
         {System.out.println("found var "+$name.text);}
     ;

STAT:	('if'|'while'|'switch'|'for') WS? '(' ;
	
CALL
     :   name=QID WS? '('
         {/*ignore if this/super */ System.out.println("found call "+$name.text);}
     ;

COMMENT
     :   '/*' (options {greedy=false;} : . )* '*/'
         {System.out.println("found comment "+getText());}
     ;

SL_COMMENT
     :   '//' (options {greedy=false;} : . )* '\n'
         {System.out.println("found // comment "+getText());}
     ;
	
STRING
	:	'"' (options {greedy=false;}: ESC | .)* '"'
	;

CHAR
	:	'\'' (options {greedy=false;}: ESC | .)* '\''
	;

WS  :   (' '|'\t'|'\r'|'\n')+
     ;

fragment
QID :	ID ('.' ID)*
	;
	
/** QID cannot see beyond the end of its token, so using QID '.*'? somewhere
 *  won't ever match: with k=1 lookahead, the '.' of ".*" makes the QID loop
 *  iterate again (expecting an ID) instead of exiting.
 *  I made this rule to compensate.
 */
fragment
QIDStar
	:	ID ('.' ID)* '.*'?
	;

fragment
TYPE:   QID '[]'?
     ;

fragment
ARG :   TYPE WS ID
     ;

fragment
ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
     ;

fragment
ESC	:	'\\' ('"'|'\''|'\\')
	;
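
The comment on QIDStar is easier to see with two hand-written Java matchers. This is an analogy, not what ANTLR generates, and the class and method names are hypothetical: qidThenStarK1 models QID followed by '.*'?, where the loop inside QID decides from the '.' alone (k=1) and so commits to another '.' ID; qidStar models the compensating rule, where the loop also checks the character after the '.' before iterating.

```java
public class QidLookaheadSketch {
    // ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
    static boolean idStart(char c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean idPart(char c) {
        return idStart(c) || (c >= '0' && c <= '9');
    }

    /** Matches ID at p; returns the end index, or -1 on failure. */
    static int id(String s, int p) {
        if (p >= s.length() || !idStart(s.charAt(p))) return -1;
        p++;
        while (p < s.length() && idPart(s.charAt(p))) p++;
        return p;
    }

    /** Models QID '.*'? where the QID loop decides on '.' alone (k=1). */
    static int qidThenStarK1(String s, int p) {
        p = id(s, p);
        if (p < 0) return -1;
        while (p < s.length() && s.charAt(p) == '.') { // '.' alone commits to the loop
            p = id(s, p + 1);
            if (p < 0) return -1; // ".*" ends up here: '*' is not an ID
        }
        if (s.startsWith(".*", p)) p += 2; // dead: the loop already took every '.'
        return p;
    }

    /** Models QIDStar: the loop peeks past '.' so ".*" can exit it. */
    static int qidStar(String s, int p) {
        p = id(s, p);
        if (p < 0) return -1;
        while (p + 1 < s.length() && s.charAt(p) == '.' && idStart(s.charAt(p + 1))) {
            p = id(s, p + 1);
        }
        if (s.startsWith(".*", p)) p += 2; // optional trailing ".*"
        return p;
    }

    public static void main(String[] args) {
        System.out.println(qidThenStarK1("java.util.*", 0)); // -1
        System.out.println(qidStar("java.util.*", 0));       // 11
    }
}
```

On java.util.* the first matcher fails outright at the '*', while the second leaves its loop at ".*" and then consumes it. Plain names such as java.util.List match the same way under both.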
