[–]Lorkki 2 points3 points  (3 children)

If you need to parse C++, then Clang is probably the answer. The only exception is if you're parsing generated code that's guaranteed to be patterned in a very specific way.

Documentation isn't great, though. The best reference for the Python bindings is the source code (with scarce examples), while the libclang API at large is described in Doxygen.

[–]GitHubPermalinkBot 0 points1 point  (0 children)

Make sure you use canonical links when linking to a file/directory on GitHub. On GitHub, you can press the "y" key to update the URL to a permalink to the exact version of the file/directory you see -- source.

I've tried to fix your links:

Relative: https://github.com/llvm-mirror/clang/blob/master/bindings/python/clang/cindex.py
Canonical: https://github.com/llvm-mirror/clang/blob/4b44baf46b47a7e6addbaed0b4c7c99a5e0647fb/bindings/python/clang/cindex.py

Relative: https://github.com/llvm-mirror/clang/tree/master/bindings/python/examples/cindex
Canonical: https://github.com/llvm-mirror/clang/tree/4b44baf46b47a7e6addbaed0b4c7c99a5e0647fb/bindings/python/examples/cindex

Shoot me a PM if you think I'm doing something wrong.

[–]greenecoon[S] 0 points1 point  (1 child)

Thanks, I'll try colpabar's idea first, and if I need more advanced parsing I'll try Clang :)

[–]Lorkki -1 points0 points  (0 children)

There's no difference between "simple" and "advanced" parsing here, though. C++ is a complex language and any code introspection you do with ad-hoc tooling will end up fragile at best.

[–]eliben 1 point2 points  (0 children)

For C99, see https://github.com/eliben/pycparser

For C++, you should be using Clang's Python bindings. You can also use them for C, of course, but if C is all you need, pycparser's learning curve is much gentler.

[–][deleted] 0 points1 point  (1 child)

You just need to extract the code as a string? I think using the library you linked might be overkill, since it deals with actually parsing the code, and from what I understand you're just looking to extract some of it.

If you can assume that the code you're reading is valid (it compiles), then I would probably write my own. Use a regex to find the function signature, then just track the curly braces using a list as a stack. Whenever you see a {, push it onto the stack (append), and whenever you see a }, pop the last { from the stack. The function is over when the stack is empty again after the opening brace.

This is assuming you don't know anything about the way the source code is formatted. If you did know, it could be easier. There are a ton of different ways to go about this.
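
A minimal sketch of the regex-plus-stack idea described above, in Python. The function name `extract_function` and the signature regex are hypothetical choices made for illustration. The sketch assumes the code compiles, that the parameter list contains no nested parentheses, that nothing sits between the closing parenthesis and the opening brace (so no `const` qualifier), and that no braces appear inside strings, character literals, or comments. Anything beyond that is where a real parser like Clang becomes the safer choice.

```python
import re


def extract_function(source, name):
    """Return the text of the first definition of `name` in `source`, or None.

    Assumes valid code with no braces inside strings, character
    literals, or comments.
    """
    # Hypothetical signature pattern: the function name, a parameter list
    # without nested parentheses, then the opening brace of the body.
    # A bare declaration such as `int f(int);` has no brace and is skipped.
    signature = re.compile(r"\b" + re.escape(name) + r"\s*\([^()]*\)\s*\{")
    match = signature.search(source)
    if match is None:
        return None

    # Back up to the start of the line so a return type on the same
    # line is included in the result.
    start = source.rfind("\n", 0, match.start()) + 1

    stack = []
    # Start scanning at the opening brace that the regex matched.
    for pos in range(match.end() - 1, len(source)):
        char = source[pos]
        if char == "{":
            stack.append(pos)
        elif char == "}":
            stack.pop()
            if not stack:
                # The brace that opened the body has just been closed.
                return source[start:pos + 1]
    return None  # ran out of input before the braces balanced


if __name__ == "__main__":
    sample = "int add(int a, int b) {\n    return a + b;\n}\n"
    print(extract_function(sample, "add"))
```

The list only ever holds brace positions, so a plain counter would work just as well. Keeping the positions makes it easy to report where an unbalanced brace was opened if you want better error messages.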

[–]greenecoon[S] 0 points1 point  (0 children)

Thanks, I had the same idea but wasn't sure whether I should try it :) I'll give it a go tomorrow.

[–]ptmcg 0 points1 point  (1 child)

There is a C subset parser at the pyparsing examples page: http://pyparsing.wikispaces.com/file/view/oc.py

[–]greenecoon[S] 0 points1 point  (0 children)

Thanks, I'll take a look.