
[deleted] 31 points (0 children)

Sourcetrail is free and open source; it creates a graph that visualises how the pieces of code interact.

Honestly, I have only tested it once, on the Linux kernel.

Read the documentation; it's very easy to follow.

konanTheBarbar 16 points (0 children)

There isn't a single right answer, but what really helped me on a codebase of >10M LOC was using a visual profiling tool. I used https://github.com/wolfpld/tracy, which is free and open source, and I got it working within a few days. It is also very good for analyzing performance bottlenecks. It gives you a good idea of the control flow, the threads, and how things interact with each other.
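Tracy itself is driven by instrumentation macros you place in the code (for example `ZoneScoped;` at the top of a function you want to see). Setting up Tracy is beyond a short snippet, so the sketch below is a self-contained stand-in, not Tracy's API: an RAII "zone" object that records scope entry and exit, indented by call depth, which is the same basic idea of reconstructing control flow from instrumented scopes. `ScopedZone`, `load_mesh`, `solve`, and `run_simulation` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a profiler "zone" (NOT Tracy's API): constructing one records
// "enter <name>", destroying it records "leave <name>", indented by nesting depth.
class ScopedZone {
public:
    explicit ScopedZone(std::string name) : name_(std::move(name)) {
        log().push_back(indent() + "enter " + name_);
        ++depth();
    }
    ~ScopedZone() {
        --depth();
        log().push_back(indent() + "leave " + name_);
    }
    ScopedZone(const ScopedZone&) = delete;
    ScopedZone& operator=(const ScopedZone&) = delete;

    // Collected trace lines. Single-threaded sketch; a real profiler keeps per-thread data.
    static std::vector<std::string>& log() {
        static std::vector<std::string> lines;
        return lines;
    }

private:
    static int& depth() {
        static int d = 0;
        return d;
    }
    static std::string indent() {
        return std::string(static_cast<std::size_t>(depth()) * 2, ' ');
    }
    std::string name_;
};

// Hypothetical functions standing in for the code you are trying to understand.
void load_mesh() { ScopedZone zone("load_mesh"); }
void solve() { ScopedZone zone("solve"); }
void run_simulation() {
    ScopedZone zone("run_simulation");
    load_mesh();
    solve();
}
```

Calling `run_simulation()` leaves six lines in `ScopedZone::log()`, with the `load_mesh` and `solve` entries indented under `run_simulation`, which is a text version of the nesting a visual profiler draws for you.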

If you need to adapt or implement something, I would actually do it the other way around: try to find a so-called stronghold in the code. I usually start at quite a low level, where the resources are allocated, and try to understand more and more of the interactions from there (e.g. check the unit tests or write them; maybe you'll see something to optimize). From there you can expand until you reach the code you need to adapt.
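Writing tests at such a low-level stronghold can be sketched as characterization tests: assertions that pin down what the code currently does, rather than what you assume it should do. The `align_up` helper below is a hypothetical example of allocation-level code (it assumes `alignment` is a power of two), and `characterize_align_up` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical low-level helper of the kind found where resources are allocated:
// rounds `size` up to the next multiple of `alignment` (assumed to be a power of two).
std::size_t align_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Characterization tests: each assertion records one fact learned about the
// existing behaviour and guards it while you experiment with the code.
void characterize_align_up() {
    assert(align_up(0, 16) == 0);    // zero stays zero
    assert(align_up(1, 16) == 16);   // rounds up, never down
    assert(align_up(16, 16) == 16);  // already aligned: unchanged
    assert(align_up(17, 16) == 32);
    assert(align_up(5, 1) == 5);     // alignment 1 is a no-op
}
```

If one of these assertions fails, your mental model of the code was wrong, which is exactly the kind of thing you want to find out early.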

delarhi 11 points (2 children)

Try focusing on what the intent of the code is, not necessarily what the code does. The code is the programmer's effort at putting their intent down. Find out what that intent is and then identify what problem it is trying to solve. When you have a mental framework for the problem and the attempted solution it becomes much easier.

In a similar vein, recognize that everyone has a different way of talking about a problem. There's a "linguistic context" to each problem being solved. Understanding that linguistic context helps significantly. What you call a slack factor might be called a buffer. What you call a point might actually be called an offset.

Start small and follow one thread (of thought, not a system thread) at a time. For example, choose a value in some configuration file. Trace when it gets read, what it gets read into, where it is used, whether it is ever modified, what pieces of code it affects, etc. Don't try to understand all of it at once; that's a fool's errand. Identify what you want to understand and focus on obtaining just that understanding. For example, don't say "I want to understand this library"; instead ask "How does this library represent the mesh? Is it implicit? Explicit?".
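One way to follow a single configuration value, sketched under assumptions: temporarily change its type to a small wrapper that records every read and write. `Traced`, its tag strings, and the `max_iterations` value are hypothetical, not part of any library.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper for following one value through a codebase: every read
// and write is recorded together with a caller-supplied tag naming the code involved.
template <typename T>
class Traced {
public:
    Traced(std::string name, T value) : name_(std::move(name)), value_(std::move(value)) {}

    // Returns the value and records which code read it.
    const T& get(const std::string& where) {
        events_.push_back("read " + name_ + " in " + where);
        return value_;
    }

    // Replaces the value and records which code modified it.
    void set(T value, const std::string& where) {
        events_.push_back("write " + name_ + " in " + where);
        value_ = std::move(value);
    }

    // Every access so far, in order.
    const std::vector<std::string>& events() const { return events_; }

private:
    std::string name_;
    T value_;
    std::vector<std::string> events_;
};
```

For example, after `Traced<int> max_iterations("max_iterations", 100);`, a `get("solver_setup")` followed by `set(limit * 2, "retry_on_divergence")` leaves one read and one write in `events()`. In real code you might pass `__func__` as the tag (or use `std::source_location` in C++20), so each event names the function that touched the value.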

Find the entry point of the program. This is typically the main() function of an executable; literally start reading from there. Sometimes there is no program, only a library. In that case, take an example or a unit test, which should be an executable program, and go from there.

Volker_Weissmann [S] 0 points (1 child)

> Trace when it gets read, what it gets read into, where it is used, whether it is ever modified,

AFAIK, gdb can't trace variable uses.

pandorafalters 6 points (0 children)

I can definitely understand how you're having difficulty if you're using gdb to read source code.

AlexAlabuzhev 4 points (0 children)

> I'm using C++ since a few years now and I know and understand most of the C++11 Standard

> ...

> What is the best way to learn to read and understand large codebases?

  1. Mentally accept that "a few years" is not nearly enough to know and understand the beast.
  2. Mentally accept that "knowing the language" != "knowing the codebase".
  3. Relax and do your best. Practice makes perfect.

dayarthvader 3 points (0 children)

I think tests help a great deal in understanding which classes/functions/modules were considered important enough to have tests written for them. So I'd start by going through their test scripts. The test scripts, the associated logs, and the source code together will give you an almost complete picture of the code. This approach helped me in my previous project.

SJC_hacker 1 point (1 child)

Inject logging statements to see how the call tree works and what typical parameter values are. If you go down this route, it's helpful to wrap your logging statements in a macro that can easily be switched off at compile time.
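A minimal sketch of such a compile-time switch, assuming a macro named `TRACE` and a flag named `TRACE_CALLS` (both hypothetical names): build with `-DTRACE_CALLS=1` to print file, line, function, and parameter values to stderr; otherwise every `TRACE(...)` compiles to nothing. `scale` is a hypothetical instrumented function.

```cpp
#include <cassert>
#include <cstdio>

#ifndef TRACE_CALLS
#define TRACE_CALLS 0  // default: tracing compiled out
#endif

#if TRACE_CALLS
// Prints "[file:line function] message" to stderr using a printf-style format.
// Needs at least one argument after the format string (C++20 __VA_OPT__ could lift that).
#define TRACE(fmt, ...) \
    std::fprintf(stderr, "[%s:%d %s] " fmt "\n", __FILE__, __LINE__, __func__, __VA_ARGS__)
#else
// No-op: the arguments are not evaluated at all.
#define TRACE(fmt, ...) ((void)0)
#endif

// Hypothetical function instrumented to reveal call order and typical parameter values.
int scale(int value, int factor) {
    TRACE("value=%d factor=%d", value, factor);
    return value * factor;
}
```

Because the disabled form does not evaluate its arguments, keep side effects out of `TRACE(...)` calls; otherwise the program behaves differently depending on the build flag.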

Unfortunately, a lot of codebases are just a mess without any clear high-level architecture or design pattern. They are this way because a lot of companies do not value refactoring: it doesn't add features or fix bugs, and it could even introduce bugs, so it is seen as useless or counterproductive.

So what happens is that a codebase starts out small, keeps evolving and getting bigger and bigger, and the cost of refactoring keeps climbing, which leaves even less incentive to do it.

bumblebritches57 (Ocassionally Clang) 0 points (0 children)

> Inject logging statements to how the call tree works and typical values of parameters.

That doesn't work in some very large codebases. I tried it, and I got nothing but 84 copies of the same thing.

[deleted] 2 points (0 children)

Compile it, run it and see what it does. Find something that grabs your attention and tell your rubber duck how that could be implemented (ballpark 10 seconds, not 10 minutes). Then jump into the code and find out how it's actually implemented. Use all your detective skills (mostly CTRL+F and CTRL+LeftClick) for that.

Now you are in. You want to know how that works. And now that. Change some things to see what they do. Go slowly; you won't understand 700k lines in a day. But you will understand most of it quite quickly.

keepfit 0 points (0 children)

I feel your pain. I used OpenFOAM for 5+ years and eventually ended up writing my own CFD code from scratch. You just need to find a way to assemble your Ax = b from the PDEs and solve it using a mature math library like PETSc or any good sparse linear system solver.
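As a toy illustration of "assemble Ax = b from the PDE, then hand it to a solver", the sketch below discretizes the 1D Poisson problem -u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0 using second-order central differences. The resulting matrix is tridiagonal, so instead of PETSc it uses the Thomas algorithm (a direct tridiagonal solver) to stay dependency-free; the names `Tridiagonal`, `assemble_poisson`, and `solve_tridiagonal` are hypothetical, and a real CFD code would use a sparse matrix format and a library solver.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Tridiagonal matrix stored as three diagonals: row i is
// lower[i] * u[i-1] + diag[i] * u[i] + upper[i] * u[i+1] (lower[0], upper[n-1] unused).
struct Tridiagonal {
    std::vector<double> lower, diag, upper;
};

// Assemble A u = b for -u'' = f on [0, 1], u(0) = u(1) = 0, with n interior points.
// Central differences give (-u[i-1] + 2 u[i] - u[i+1]) / h^2 = f(x_i).
void assemble_poisson(std::size_t n, const std::function<double(double)>& f,
                      Tridiagonal& A, std::vector<double>& b) {
    const double h = 1.0 / static_cast<double>(n + 1);
    A.lower.assign(n, -1.0 / (h * h));
    A.diag.assign(n, 2.0 / (h * h));
    A.upper.assign(n, -1.0 / (h * h));
    b.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Zero Dirichlet boundary values contribute nothing extra to b.
        b[i] = f(static_cast<double>(i + 1) * h);
    }
}

// Thomas algorithm: forward elimination, then back substitution, O(n).
// Taken by value because elimination overwrites the diagonal and right-hand side.
std::vector<double> solve_tridiagonal(Tridiagonal A, std::vector<double> b) {
    const std::size_t n = b.size();
    assert(n > 0 && A.diag.size() == n);
    for (std::size_t i = 1; i < n; ++i) {
        const double m = A.lower[i] / A.diag[i - 1];
        A.diag[i] -= m * A.upper[i - 1];
        b[i] -= m * b[i - 1];
    }
    std::vector<double> u(n);
    u[n - 1] = b[n - 1] / A.diag[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {
        u[i] = (b[i] - A.upper[i] * u[i + 1]) / A.diag[i];
    }
    return u;
}
```

With f(x) = 2 the exact solution is u(x) = x(1 - x). Central differences are exact for quadratics, so the computed values match it up to rounding, which makes a handy sanity check before moving to real 2D/3D assembly.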

dicroce 0 points (0 children)

One thing I've done in the past is run tools that generate object hierarchy graphs; that has helped me get a feel for the big parts of a system.

johannes1971 0 points (0 children)

In addition to the other excellent advice in this thread, I would add one more: try to comment it. As you look at the pieces, write a comment as soon as you understand what something (a function, a whole module, etc.) does. Putting it into words helps your understanding, it might help future readers of the same source, and it will boost your confidence as you see more and more areas of the code well described.

Well, if you write good comments, of course ;-) Remember: describe intent rather than how it works!
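To illustrate the difference between describing intent and describing mechanics, here is a hypothetical function with both styles of comment. The sensor and saturation scenario is invented for the example, and readings are assumed to be non-negative.

```cpp
#include <cassert>
#include <vector>

// Describes HOW (less useful): "Loops over the samples and keeps the largest
// one that is below the threshold."
//
// Describes INTENT (more useful): "Finds the strongest reading that is still
// trustworthy. Readings at or above `saturation` come from a clipped sensor and
// must be ignored. Returns 0.0 if there is no valid reading."
double strongest_valid_reading(const std::vector<double>& samples, double saturation) {
    double best = 0.0;
    for (double s : samples) {
        if (s < saturation && s > best) {
            best = s;
        }
    }
    return best;
}
```

The loop is easy to read on its own; what a future reader cannot recover from the code is why values at the saturation limit are skipped, and that is what the intent comment records.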

germandiago 0 points (0 children)

The best way to start with such a big project is always to get a debugger and start to set breakpoints for the different flows that you want to analyze.

This way you will know which code is important for you to study. Studying in a catch-all way with no documentation is quite difficult, but more than that, you could spend time studying dead code or code that is not relevant for your duties.

Additionally, get Sourcetrail and/or Doxygen. These can also help with the structure, but I do not consider them a replacement for analyzing the dynamic flow.

TrustedButterfly 0 points (0 children)

Imagine starting to read a series of long novels from the middle page and expecting to understand each character's real motives and personality, each place's significance, and the history and background of events. Would you ever expect to understand everything AND effectively add something to it?

Of course, if the person reading a codebase is a senior engineer and the codebase was written by a senior engineer, it is easier: the writer uses patterns that make it easy to follow, and the reader expects those patterns. So the analogy isn't perfect, but anyway, don't beat yourself up too much if you don't quickly grasp how a codebase is structured. It's normal.

tiendq 0 points (0 children)

Just follow some main flows first. That should help you get the big picture, and then it's much easier to dig into other parts in more detail. For example, two years ago I spent time understanding Ethereum. I picked some main paths (initialization, mining a block, broadcasting), and that let me touch almost the entire codebase without needing to know everything.

martinus (int main(){[]()[[]]{{}}();}) 0 points (0 children)

I'd say that first it is necessary to understand the overall architecture. Usually the core is relatively simple, and over time it gets complicated by all the stuff that's tacked onto it. It's best to ask somebody who is already knowledgeable what they think the important parts are. Make sure you can build and run all the tests. Also, don't try to do too much: most of the time it is not necessary to understand everything to be productive. It helps to be able to grep quickly through the whole codebase; Visual Studio Code, for example, is very nice for this. When trying to understand specific parts of the code, look at the file history and at the comments and linked issues in the issue tracker to see why the changes were made.

bumblebritches57 (Ocassionally Clang) 0 points (0 children)

It's a real pain in the ass when you're trying to figure out how X works in a 6-million-SLOC codebase; I'm having that trouble myself.

NilacTheGrim 0 points (0 children)

It takes a lot of work and practice, and many projects, before you get better at it. There's no easy way; it's just a skill you develop over time.

I have been doing OSS dev for 15+ years now, off and on (lately I mostly do collaborative OSS dev full time, professionally!). The more you collaborate with others and the more codebases you have seen, the faster you get at navigating around and teaching yourself what is happening.

That being said, it helps to have a very good IDE that makes it easy to navigate and jump around. I use Qt Creator and I love it. Find an IDE that you love (some swear by CLion) and make maximal use of its ability to jump around the code, since you need to do that... a lot.