[antlr-interest] Extract C Function Definitions Using Parser

Eric researcher0x00 at gmail.com
Mon Mar 19 05:02:50 PDT 2012


Hi Josh,

Here is what I would try.

The grammar should be creating an AST, and it has a function_definition
rule. I would use the function_definition rule to find the start and end
tokens that delimit each function. If those tokens have their line and
character position set, I would use those positions as a quick test to see
whether I am correctly getting the functions as needed. If so, I would then
do a more advanced version that prunes the parts of the AST that aren't
needed for the functions and reconstructs the functions from the tokens in
the AST.
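
As a rough illustration of the quick test described above, here is a
minimal Python sketch that does not use the ANTLR runtime at all. The Tok
class and slice_source function are hypothetical stand-ins. I am assuming
each token carries its text, a 1-based line number and a 0-based character
position within the line; check what your target's tokens actually report
before relying on that.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a parser token (not an ANTLR class)."""
    text: str
    line: int  # assumed 1-based
    col: int   # assumed 0-based offset within the line


def slice_source(source: str, start: Tok, stop: Tok) -> str:
    """Return the source text from the start token through the stop token."""
    lines = source.splitlines(keepends=True)

    def offset(line: int, col: int) -> int:
        # Absolute offset = length of all earlier lines + column in this line.
        return sum(len(l) for l in lines[:line - 1]) + col

    begin = offset(start.line, start.col)
    end = offset(stop.line, stop.col) + len(stop.text)
    return source[begin:end]


SOURCE = (
    "extern int x;\n"
    "int add(int a, int b)\n"
    "{\n"
    "    return a + b;\n"
    "}\n"
    "int y;\n"
)

# Pretend the function_definition rule reported these two tokens.
FUNC_TEXT = slice_source(SOURCE, Tok("int", 2, 0), Tok("}", 5, 0))
```

If the slices come out looking like whole functions, the rule boundaries
are probably right and it is worth moving on to the AST-based version.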

Some good general advice is:

"A common problem with novices attempting to implement language analysis is
to believe that their task is simplified by moving sophisticated tasks to
conceptually simple tasks. They will try to simplify semantic analysis by
creating a more detailed syntactic analysis and syntactic analysis by
creating a more detailed lexical analysis. Almost invariably they discover
that this attempt is fruitless and has to be undone, because it results in
poor error reporting, runs into conflicts as the implementation becomes
more complete, duplicates functionality in the later portions of the
analysis, and is hard to maintain." By William Clodius

In this case, let the parser do what it is best at: making sure the input
is valid and creating an AST. Don't create a pruned AST with the parser;
let the full AST pass on to another phase for AST analysis and
transformations. Let the AST transformations do the work rather than
putting the additional burden of filtering out the functions on the parser.
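
To make the separate-phase idea concrete, here is a small Python sketch of
a pass that walks a complete tree and picks out the function definitions
afterwards. The Node class, the node kind names and both helper functions
are made up for illustration; a real ANTLR tree has its own node type and
API, so treat this only as the shape of the idea.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: a kind, optional token text, and children."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def collect_functions(root: Node, kind: str = "function_definition") -> List[Node]:
    """Return every subtree of the given kind, in source order."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.kind == kind:
            found.append(node)
            continue  # standard C has no nested functions, so stop here
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return found


def leaf_tokens(node: Node) -> List[str]:
    """Rebuild the token sequence of a subtree from its leaves."""
    if not node.children:
        return [node.text] if node.text else []
    tokens = []
    for child in node.children:
        tokens.extend(leaf_tokens(child))
    return tokens


def leaves(kind: str, *texts: str) -> Node:
    """Build a node whose children are plain token leaves."""
    return Node(kind, children=[Node("token", t) for t in texts])


# The parser hands over the full tree; nothing was filtered while parsing.
TREE = Node("translation_unit", children=[
    leaves("declaration", "extern", "int", "x", ";"),
    leaves("function_definition", "int", "f", "(", ")", "{", "}"),
    leaves("declaration", "int", "y", ";"),
    leaves("function_definition", "void", "g", "(", ")", "{", "}"),
])

FUNCTIONS = [" ".join(leaf_tokens(n)) for n in collect_functions(TREE)]
```

The parser side stays untouched here. All of the filtering lives in
collect_functions, so changing what gets extracted means changing only
that pass.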

Hope that helps, Eric

On Sun, Mar 18, 2012 at 11:55 PM, Joshua Garcia <joshuaga at usc.edu> wrote:

> Hi Everyone,
>
> I've been working on modifying an ANTLR C grammar so that it produces a
> parser that simply outputs function definitions it recognizes to different
> files. I need to do this in order to apply some information retrieval
> techniques to C source code.
>
> Is there a way to get the generated parser to recognize only the function
> definitions (including the function body) and comments while ignoring
> everything else? I've found it too troublesome to deal with comments so
> I've been ignoring them for now.
>
> If not, is there a way to get the generated parser to recognize only the
> function definitions (including the function body) and ignore everything
> else? I've been able to modify the grammar so that it can recognize a large
> majority of the functions in pre-processed files of a version of bash.
> However, the pre-processed files tend to transform some function definition
> text to extern declarations. Therefore, I lose function definition text
> that I need. Furthermore, the parser does not ignore everything else that's
> not part of a function definition, but instead, I've added rules to the
> grammar in order to recognize as much of the bash version I'm parsing as
> possible.
>
> In particular, I've been trying to use this grammar:
>
> http://www.antlr.org/grammar/1153358328744/C.g
>
> Thanks,
> Josh

