[antlr-interest] Parsing Parts of Java Code
Terence Parr
parrt at cs.usfca.edu
Mon Jan 23 11:53:02 PST 2006
On Jan 23, 2006, at 10:29 AM, Matthias Gutheil wrote:
> Hi,
>
> am I right that the filter=true option for the lexer doesn't
> help me with this grammar?
>
> http://www.antlr.org/grammar/1090713067533/index.html
That is a full grammar. To pick out only parts of the code, you need to
write some fairly complex lexer rules with filter=true that match just
the patterns you want.
Here is my fuzzy Java parser, which finds function calls, function
definitions, and class definitions. It is written for ANTLR v3, but I
think you can reverse engineer it to v2.
Ter
lexer grammar FuzzyJava;
options {filter=true;}
IMPORT
: 'import' WS name=QIDStar WS? ';'
;
/** Avoids having "return foo;" match as a field */
RETURN
: 'return' (options {greedy=false;}:.)* ';'
;
CLASS
: 'class' WS name=ID WS? ('extends' WS QID WS?)?
('implements' WS QID WS? (',' WS? QID WS?)*)? '{'
{System.out.println("found class "+$name.text);}
;
METHOD
: TYPE WS name=ID WS? '(' ( ARG WS? (',' WS? ARG WS?)* )? ')' WS?
('throws' WS QID WS? (',' WS? QID WS?)*)? '{'
{System.out.println("found method "+$name.text);}
;
FIELD
: TYPE WS name=ID '[]'? WS? (';'|'=')
{System.out.println("found var "+$name.text);}
;
STAT: ('if'|'while'|'switch'|'for') WS? '(' ;
CALL
: name=QID WS? '('
{/* ignore if this/super */ System.out.println("found call "+$name.text);}
;
COMMENT
: '/*' (options {greedy=false;} : . )* '*/'
{System.out.println("found comment "+getText());}
;
SL_COMMENT
: '//' (options {greedy=false;} : . )* '\n'
{System.out.println("found // comment "+getText());}
;
STRING
: '"' (options {greedy=false;}: ESC | .)* '"'
;
CHAR
: '\'' (options {greedy=false;}: ESC | .)* '\''
;
WS : (' '|'\t'|'\n')+
;
fragment
QID : ID ('.' ID)*
;
/** QID cannot see beyond end of token so using QID '.*'? somewhere won't
 * ever match since k=1 lookahead in the QID loop of '.' will make it loop.
 * I made this rule to compensate.
 */
fragment
QIDStar
: ID ('.' ID)* '.*'?
;
fragment
TYPE: QID '[]'?
;
fragment
ARG : TYPE WS ID
;
fragment
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
fragment
ESC : '\\' ('"'|'\''|'\\')
;
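For readers without ANTLR at hand, the scanning behavior that filter=true gives the grammar above can be imitated roughly in plain Java. This is only a sketch using java.util.regex, not ANTLR and not how ANTLR implements it. It assumes that filter mode tries the rules in the order listed and skips one character when none matches. The class name FuzzyScan, the method scan, and the three simplified patterns are made up for illustration; with only these rules a method definition would be reported as a call, which the METHOD rule in the real grammar prevents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex imitation of a filtering lexer; not generated by ANTLR. */
public class FuzzyScan {
    // Rough equivalent of the QID fragment: ID ('.' ID)*
    private static final String QID = "[A-Za-z_]\\w*(?:\\.[A-Za-z_]\\w*)*";

    // Insertion order is the rule priority (assumed to mirror filter mode).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("class", Pattern.compile("class\\s+([A-Za-z_]\\w*)"));
        // Like STAT: matched only so that "if (" is not reported as a call.
        RULES.put("stat", Pattern.compile("(?:if|while|switch|for)\\s*\\("));
        RULES.put("call", Pattern.compile("(" + QID + ")\\s*\\("));
    }

    /** Returns "rule name" entries for every rule that captured a name. */
    public static List<String> scan(String src) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos < src.length()) {
            boolean matched = false;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(src);
                m.region(pos, src.length());
                if (m.lookingAt()) {          // match must start exactly at pos
                    if (m.groupCount() > 0) { // rules without a group are silent
                        found.add(rule.getKey() + " " + m.group(1));
                    }
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                pos++;                        // no rule fits here: skip one char
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String src = "class A extends B { int n = g.h(1); if (n > 0) run(); }";
        System.out.println(scan(src));
    }
}
```

The order of the rules matters in the same way it does in the grammar: "stat" sits ahead of "call" so that control statements are swallowed before the call pattern gets a chance at them.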