[antlr-interest] C++ parser usage ideas

Micheal J open.zone at virgin.net
Thu Oct 13 10:37:04 PDT 2005


> My first problem (which I have resolved) was that the parser 
> couldn't handle macro code at all, so something like:
> 
> class MY_API FooBar {
> //code here
> };
> 
> would cause exceptions to be thrown. I found a preprocessor 
> library which seems to work quite well, that can replace 
> these macros with their real values.
> 
> I can create my custom code graph from this modified text. 
> Each node in this graph contains the line it's on, and its 
> column, or start position.
> 
> So now I have another problem, which I am hoping folks here 
> may have some ideas about how best to tackle:
> 
> All the line/col positions are based on the *modified*, 
> pre-processed code. Ideally I want this information so that I 
> can use it, say, to position the cursor at a given position 
> in the editor, or to replace/modify a chunk of text that 
> corresponds to that node. But the "real" positions need to be 
> based on the original code, so I need some sort of 
> translation back from one (parsed code) to the other (original code).
> 
> Has anyone done anything like this? Is there a better way to 
> tackle this?

Depends on whether you need to tackle arbitrary codebases or just code that
you have some control over. For sane code without extreme macro-isms (e.g.
Akilesh's CONCAT sample), you can parse the unpreprocessed code into a graph
that maintains both the macro definitions and their uses (with links between
them). Thus for any given set of compile-time symbols, you can generate the
preprocessed text. This implies that your graph must be able to support
multiple definitions for any given symbol.
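
To make the shape of such a graph concrete, here is a minimal sketch. Every
name in it (SourcePos, MacroDef, MacroUse, MacroGraph, resolve) is
hypothetical and not part of ANTLR or any existing library, and it assumes
each definition is guarded by at most one compile-time symbol, which real
code will often violate. The point is only that a symbol maps to several
definitions, and each use keeps its position in the original source.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Position in the ORIGINAL (unpreprocessed) source.
struct SourcePos { int line; int column; };

// One #define of a symbol. Assumption: it is active only when `guard` is
// defined; an empty guard means the definition is unconditional.
struct MacroDef {
    std::string guard;
    std::string body;
    SourcePos pos;
};

// One use of a macro name in the unpreprocessed source.
struct MacroUse {
    std::string name;
    SourcePos pos;
};

struct MacroGraph {
    // A symbol may have several definitions, one per build configuration.
    std::map<std::string, std::vector<MacroDef>> defs;
    std::vector<MacroUse> uses;

    // Link a use to the definition that applies for the given set of
    // compile-time symbols. A guarded match wins over an unconditional one.
    // Returns nullptr if the name has no applicable definition.
    const MacroDef* resolve(const MacroUse& use,
                            const std::set<std::string>& symbols) const {
        auto it = defs.find(use.name);
        if (it == defs.end()) return nullptr;
        const MacroDef* fallback = nullptr;
        for (const MacroDef& d : it->second) {
            if (d.guard.empty()) {
                if (!fallback) fallback = &d;
            } else if (symbols.count(d.guard)) {
                return &d;
            }
        }
        return fallback;
    }
};
```

Generating the preprocessed text for one configuration then amounts to
walking the uses in order and substituting resolve(use, symbols)->body,
while the editor keeps working with use.pos from the original file.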

I define sane code as code that would parse correctly if the preprocessor
#xxx lines were deleted. For some classes of insane code, it may be possible
to write a filter that converts the insane code to sane code by sanitizing
the use of macros.
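
A quick way to check whether a file is sane in this sense is to blank out the
directive lines and hand the result to the parser. The sketch below is one
way to do that; blankDirectives is a hypothetical helper, not something ANTLR
provides, and it assumes a directive is any line whose first non-blank
character is '#', plus its backslash-continued lines. It does not understand
comments or string literals. Lines are blanked rather than removed so that
line and column numbers in the remaining text still match the original file.

```cpp
#include <sstream>
#include <string>

// Replace every preprocessor directive line with an empty line, keeping the
// line count (and therefore all line/column positions) unchanged.
std::string blankDirectives(const std::string& source) {
    std::istringstream in(source);
    std::string out, line;
    bool continued = false;  // previous directive line ended with a backslash
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t");
        const bool directive =
            continued || (first != std::string::npos && line[first] == '#');
        if (directive) {
            // A trailing backslash carries the directive onto the next line.
            continued = !line.empty() && line.back() == '\\';
        } else {
            out += line;
        }
        out += '\n';  // always emit the line break, even for blanked lines
    }
    return out;
}
```

If the output parses, the file is sane by the definition above; macro uses
such as MY_API still remain in the text and need to be handled separately.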

> Is there a way to modify the original C++ grammar 
> to just skip over the macros entirely (this would be great, 
> as it would get around the whole issue).

Perhaps, but (as per my response above) the source code that remains after
excising/ignoring macros may not be legal C++.

Cheers,

Micheal




