[antlr-interest] C++ parser usage ideas
Micheal J
open.zone at virgin.net
Thu Oct 13 10:37:04 PDT 2005
> My first problem (which I have resolved) was that the parser
> couldn't handle macro code at all, so something like:
>
> class MY_API FooBar {
> //code here
> };
>
> would cause exceptions to be thrown. I found a preprocessor
> library which seems to work quite well, that can replace
> these macros with their real values.
>
> I can create my custom code graph from this modified text.
> Each node in this graph contains the line it's on, and its
> column, or start position.
>
> So now I have another problem, which I am hoping folks here
> may have some ideas about how best to tackle:
>
> All the line/col positions are based on the *modified*,
> pre-processed code. Ideally I want this information so that I
> can use it, say, to position the cursor at a given position
> in the editor, or to replace/modify a chunk of text that
> corresponds to that node. But the "real" positions need to be
> based on the original code, so I need some sort of
> translation back from one (parsed code) to the other (original code).
>
> Has anyone done anything like this? Is there a better way to
> tackle this?
Depends on whether you need to tackle arbitrary codebases or just code that
you have some control over. For sane code without extreme macro-isms (e.g.
Akilesh's CONCAT sample), you can parse the unpreprocessed code into a graph
that maintains both the macro definitions and their uses (with links between
them). Thus for any given set of compile-time symbols, you can generate the
preprocessed text. This implies that your graph must be able to support
multiple definitions for any given symbol.
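As a rough illustration of such a graph, here is a minimal sketch. All type and member names (SourcePos, MacroDef, MacroUse, MacroGraph) are hypothetical and not part of ANTLR or any C++ grammar. It assumes each definition is guarded by at most one compile-time symbol, and it links uses to definitions by macro name rather than by pointer.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
struct SourcePos {
    int line;
    int column;
};

// One #define of a macro, plus the compile-time symbol that guards it.
struct MacroDef {
    std::string guard;  // symbol that selects this definition; empty = unconditional
    std::string body;   // replacement text
    SourcePos pos;      // position of the #define in the ORIGINAL source
};

// One use of a macro, recorded at its position in the ORIGINAL source.
struct MacroUse {
    std::string name;
    SourcePos pos;
};

class MacroGraph {
public:
    // A symbol may be defined several times, so definitions accumulate per name.
    void addDefinition(const std::string& name, MacroDef def) {
        defs_[name].push_back(std::move(def));
    }

    // Uses are linked to their definitions through the macro name.
    void addUse(MacroUse use) {
        assert(defs_.count(use.name) > 0 && "use of a macro with no recorded definition");
        uses_.push_back(std::move(use));
    }

    // Returns the first definition selected by the given compile-time symbols,
    // or nullptr if none applies.
    const MacroDef* resolve(const std::string& name,
                            const std::set<std::string>& symbols) const {
        auto it = defs_.find(name);
        if (it == defs_.end()) return nullptr;
        for (const MacroDef& d : it->second) {
            if (d.guard.empty() || symbols.count(d.guard) > 0) return &d;
        }
        return nullptr;
    }

    const std::vector<MacroUse>& uses() const { return uses_; }

private:
    std::map<std::string, std::vector<MacroDef>> defs_;
    std::vector<MacroUse> uses_;
};
```

To generate the preprocessed text for one configuration, you would walk uses() and substitute the body that resolve() returns for each one. Since every use keeps its original position, nodes built from the unpreprocessed text never need translating back.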
I define sane code as code that would parse correctly if the preprocessor
#xxx lines were deleted. For some classes of insane code, it may be possible
to write a filter that converts the insane code to sane code by sanitizing
the use of macros.
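To make the "delete the #xxx lines" test concrete, here is a minimal sketch of such a filter. The function names are hypothetical. It blanks directive lines instead of removing them, so line and column numbers in the output still match the original file. It assumes Unix line endings and does not recognise a '#' that starts a line inside a multi-line comment or raw string literal.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: true if the first non-blank character of the line is '#'.
static bool isDirectiveLine(const std::string& line) {
    std::size_t i = line.find_first_not_of(" \t");
    return i != std::string::npos && line[i] == '#';
}

// Replaces every preprocessor line (and its backslash continuations) with an
// empty line. All other lines are copied through unchanged.
std::string blankDirectives(const std::string& source) {
    std::istringstream in(source);
    std::string out;
    std::string line;
    bool continued = false;  // true while inside a directive continued with '\'
    while (std::getline(in, line)) {
        if (continued || isDirectiveLine(line)) {
            // Drop the text, but remember whether the directive continues.
            continued = !line.empty() && line.back() == '\\';
        } else {
            out += line;
        }
        out += '\n';  // keep the line itself so positions do not shift
    }
    // Sanity check: no line was dropped. The output may gain one trailing
    // newline if the input did not end with one.
    assert(std::count(out.begin(), out.end(), '\n') >=
           std::count(source.begin(), source.end(), '\n'));
    return out;
}
```

If the filtered text parses, the code is sane in the sense above. Note that macro uses such as MY_API are left in place by this filter; only the directive lines themselves are blanked.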
> Is there a way to modify the original C++ grammar
> to just skip over the macros entirely (this would be great,
> as it would get around the whole issue).
Perhaps, but (as per my response above) the source code that remains after
excising or ignoring macros may not be legal C++.
Cheers,
Micheal