C++ file parser and function extractor
-
I need to develop a Qt/C++ tool that reads C++ source files, finds their dependencies, and copies only the code actually used from each dependent file into a new file of the same name.
For example, main.cpp calls foo(). foo() is declared in foofuncs.h and defined in foofuncs.cpp. foofuncs contains many functions besides foo(). I need to copy the foo() declaration from foofuncs.h into a new file (e.g. foofuncs.h.copy) that contains just the foo() declaration (and the relevant includes). Similarly, foofuncs.cpp.copy would contain just the foo() definition (as well as the #include "foofuncs.h", etc.). This will likely need some kind of recursive process to go through all the includes.
So my question is: how can I do that? Can I use QRegularExpression to find the code blocks? If so, I need help with that. Also, are there any related open-source tools that I could integrate into my Qt app to help with this? Thanks
-
@alizadeh91
I don't have a solution for you, only a warning. If you intend to parse arbitrary C++ files from anywhere, rather than some terribly cut-down subset whose layout you know intimately, you're going to find this very hard to do. Very, very hard! Heck, you can't even write a grammar for parsing C++ source files because of the pre-processor, e.g. http://trevorjim.com/c-and-cplusplus-are-not-context-free/
- To parse C and C++, you start by using a very powerful preprocessor. These preprocessors are inevitably written by hand (they are not based on a theoretic foundation like regular expressions or context-free grammars).
As far as I am aware, there is no open-source tool that will produce a standalone parse tree for you. The consensus (search the web) is that you really need to leverage an existing C++ compiler to do such a job. You could look at gcc, or perhaps more likely clang. I still don't think you'll get what you want out of them.
If all you want is some hacky approximation which just extracts #include statements so that you can follow them, then you could do something with QRegularExpressions. Correctly locating and extracting the definitions of functions in .cpp/.h files will be harder; you'll probably end up making all sorts of assumptions about where definitions start and end, working on a line-by-line basis rather than via regular expressions or proper parsing. It may do for your purposes, provided you are happy with a vague approximation which is not robust.
Sorry, but good luck!
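To give an idea of what the hacky #include approximation might look like, here is a minimal sketch. It uses std::regex from the standard library so that it compiles without Qt. The names IncludeDirective and extractIncludes are made up for illustration, and the pattern is my own assumption about what "good enough" means here. It does not evaluate #if/#ifdef, does not expand macros, and will happily match a directive that sits inside a block comment.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One #include directive found in a source file (hypothetical helper type).
struct IncludeDirective {
    std::string target;  // e.g. "foofuncs.h" or "vector"
    bool isSystem;       // true for <...>, false for "..."
};

// Scan the text line by line and collect #include directives.
// Heuristic only: conditional compilation, macros and comments are ignored.
std::vector<IncludeDirective> extractIncludes(const std::string& source) {
    // Optional whitespace, '#', optional whitespace, "include",
    // then the target wrapped in <...> or "...".
    static const std::regex pattern(R"re(^\s*#\s*include\s*([<"])([^>"]+)[>"])re");

    std::vector<IncludeDirective> result;
    std::istringstream stream(source);
    std::string line;
    while (std::getline(stream, line)) {
        std::smatch match;
        if (std::regex_search(line, match, pattern)) {
            // Group 1 is the opening delimiter, group 2 is the file name.
            result.push_back({match[2].str(), match[1].str() == "<"});
        }
    }
    return result;
}
```

You would call extractIncludes on the contents of main.cpp, then repeat on each quoted (non-system) target it returns while keeping a set of files already visited, so that circular includes do not loop forever. The same pattern should carry over to QRegularExpression in a Qt app, but treat that as an assumption to check rather than something shown here.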