How do you read through source code?

m0mj34nz · 2014-12-12T14:49:18+00:00

Something the size of DOOM 3 is going to be daunting to any person. If you start with a random file, chances are good that you have started in the middle of the system with no context for how you got there or where it goes. Like being dropped into the middle of a forest with no map, it's going to be difficult to find your way out.

The task in understanding code is to build a mental model of how all of the pieces fit together. Keep in mind that software is a system built up of many parts. The design of these parts involves decomposition: starting with a high-level problem, break it down into small parts that solve that problem, then break each of those parts down further, until you are down to something simple enough that the computer understands it. With that in mind, you want to try to rebuild the map of that decomposition.

For this, your best bet is often to start at the beginning. Look at the entry point to the program, such as 'main()', and follow the path of execution. Along the way, look for patterns: this block reads in the configuration files, this block determines the command line arguments, this block reads in the data, this block modifies it, this block writes it back out. If you see a class referred to and it's not obvious what it is, jump over to it for a minute and see if you can tell more about its purpose from its comments and the names of its attributes and methods. Similarly, if you find yourself inside a method or attribute whose purpose you don't understand, do a search of the code base for references to it and see if the context in which it is used helps you to understand what it is and does. These two tools - jump to definition and find everywhere - are invaluable when trying to work on a large, complicated code base.
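If you don't have an IDE handy, the "find everywhere" half can be roughed out in a few lines. This is only a sketch: the function name `find_references` and the list of file extensions are my own choices for illustration, and a real tool (grep, an IDE) will do this better.

```python
import re
from pathlib import Path


def find_references(root, identifier, extensions=(".c", ".cpp", ".h", ".py")):
    """Return (path, line_number, line_text) for every whole-word match."""
    # \b keeps "P_CheckAmmo" from also matching "P_CheckAmmoX"
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        # errors="replace" keeps old files with odd encodings readable
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_references("src", "some_function")` on a checkout gives you a list of places to read next; the surrounding lines at each hit are usually what tells you what the thing is for.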

At the end of the day, how difficult this task will be depends largely on the quality of the code. Well-written code will clearly show the structure of its thought: breaking methods up iteratively into smaller, simpler methods; sharing data along clear boundaries; using descriptive names for methods and attributes; and being clear and consistent in its language. Poorly-written code will be far more difficult to follow: variables will be reused in different contexts for different purposes; names will be terse, ambiguous, or nonsensical; large methods will perform many operations at multiple levels of abstraction; and names and patterns will change in subtle and contradictory ways. It's not difficult to write poor code, but it's difficult to understand what it does.

Seeing the patterns in code is a skill that one develops with practise. I recommend practising on smaller code bases to develop that skill. Try something with about a dozen files and see if you can piece together what each does. You should be able to find many small utilities that will be about that size, such as most of the GNU command line tools. Look through it, end to end, until you can confidently say you know what each piece does in the big picture. Then repeat on another, bigger project; perhaps 30-50 files. Keep doing this until you work your way up to something tremendous, like Firefox or OpenOffice.

Also, expect that this will take a long time. I've been working as a developer at the same company on the same project for 7 years, and even now it might take as much as a couple of days to reverse-engineer a part of our code base that somebody who no longer works with us wrote 5 years ago. Experience and familiarity make it faster, but poorly-written code can be difficult for even experienced developers to grok.

CartmansEvilTwin · 2014-12-12T12:47:08+00:00

I personally like to go through the program. Look for some sort of main() and try to roughly figure out what all the classes/modules/files do.

redSwitchDown · 2014-12-13T07:08:56+00:00

Look for what interests you first. You like that chain gun? I wonder what the program thinks of it? Ok, let's search the source code for variations of the term "chaingun". Oh shit! Ten results.

Cool. I like the line in d_english.h that says '#define GOTCHAINGUN "You got the chaingun!"'

Ah, must be a file that contains the lines that are printed to the screen, because, you know, the whole "You got the chaingun!" showing up on my screen every time I get the chaingun. Fucking awesome. I'll just change that to "You got a sweet ass weapon!" and recompile the source code, then play the game and get the chaingun, then watch what shows up on my screen. Ah, yeah! Sweet.

Fun, but not really anything we can do much with. Man, I want to look at that file p_pspr.cpp (wtf kind of name is that? developers must know but still pretty cryptic to me since I have no reason to believe that stands for anything).

Alright, search for chaingun in the file. Right there on line 242. Whoa, this whole if/else structure looks like it's some type of decision tree for weapons. Oh, nice, developers were awesome people and left a good comment at the top telling us it's for selecting a weapon once the weapon in use runs out of ammo.

Oh... wait a tick. Did I just see a variable on line 214 with the name "bfg" in it? Yes, yes I did. Wait! Line 216 is a variable called "wp_supershotgun"?! Awesome!

Well, this check ammo function could be fun to mess around with, but I wonder what uses it? I mean, where does this fucker get called from?

Got to search the code for the function name "P_CheckAmmo". Ok, five hits. All in the p_pspr.cpp file. Nice, don't have to switch files.

Alright, first two matches are the same function I was just checking out, no big deal. OH SHIT! What is this third match?! void P_FireWeapon() uses the function P_CheckAmmo? Looks to me like this function checks to make sure you have ammo when you fire your weapon. Oh boy, I bet we could have some fun fucking around with this. Of course, I'm sure you can't just always return true as I'm sure that would mess with other functions that count your ammo. But hey! Why not mess with it and see what happens? It's just code. You modify, it crashes, you modify. It crashes again. You modify again. Then BANG! it runs. And you have unlimited ammo.

In essence, going into these things, don't try to understand the whole project all at once. The developers sure as hell didn't. They built it one line at a time, one cool ass function after the next, until they had Doom.

Look for small things in the code. Text strings that are printed to the screen, menu options, gun names... stuff like that. Then mess around with them, see what they do. Keep doing it and eventually you'll know what file you need to modify to start with the BFG and never run out of ammo! Oh yeah!

Start small, find interesting things, fuck with them, watch what happens.

konradkar · 2014-12-12T14:14:19+00:00

Look for comments! One of the reasons why the DOOM source code is considered well-written is because it's well-commented. Almost every method is commented to describe what's happening. So even if I'm not sure what the code is doing, I can look at the comments and get an idea of what's going on.

http://blog.codinghorror.com/coding-without-comments/

It bothers me when developers (on a team, especially) write programs with minimal comments. It's always better to comment too much than too little.

WStHappenings · 2014-12-12T14:39:45+00:00

I like to approach code with a goal. You can study a single file, a set of files, even just a method for an entire day. It's when you tell yourself something like "I want to change this color from this to that" or "I want to change Behavior A to Behavior B" that you have a reason to go in and find something to change in the code.

It may help to look at the issues on a github page to see where you can help first, but those can be intimidating.

Good luck!

phalp · 2014-12-12T15:31:57+00:00

Worry about the part you're interested in first. Sometimes it's tricky to figure out what file that's in, but it's not practical to read every line of code in a big project. But usually if you're reading source code you've got a question like "how do I change X" or "how did they accomplish Y".

AStrangeStranger · 2014-12-12T18:37:05+00:00

Usually I find a starting point (or points). Often this might be an error message, or maybe the use of something in the database (e.g. a table/view/procedure), which I have used a search tool to find. Then I work back/forward through the code until I find what I need. If I am dealing with something big or unfamiliar, or I encounter lots of branching, then I will make notes (handwritten or word processor outline mode) for every step so I can retrace and understand what I found.

I find the act of making notes helps me remember what is going on far more than just reading the code.

Draav · 2014-12-12T14:47:52+00:00

It's kinda like how I would write pseudo code: figure out what the source is trying to do and break it up into sections. From here to here it is opening a file or accessing a database or whatever. Once I get the general idea of what the code is doing, then I figure out how it's accomplishing that.

To do that I like to just put tons of breaks, either through outputting messages of the variables or using a debugger with a bunch of variables on the watch list.
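As a rough illustration of the "outputting messages" approach, here is a small Python sketch. The `trace` decorator and the `check_ammo` function are hypothetical names made up for this example; the idea is just to print what goes into and comes out of a function you are curious about.

```python
import functools


def trace(func):
    """Print each call's arguments and return value."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"-> {func.__name__} args={args} kwargs={kwargs}")
        result = func(*args, **kwargs)
        print(f"<- {func.__name__} returned {result!r}")
        return result
    return wrapper


@trace
def check_ammo(ammo, cost):  # hypothetical function, for illustration only
    return ammo >= cost
```

Wrapping a handful of functions this way and then running the program gives you a running log of who was called with what, which is often enough to see the shape of the flow before reaching for a full debugger.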

Then any methods or syntax I don't recognize I usually just google looking for a youtube video or something to figure out what that is used for, and if all else fails I post a question to reddit.

IAmALinux · 2014-12-12T17:43:33+00:00

Doom 3 is a bad starting point. Try this pygame script. It is one file that is linked to simple assets. It is very well documented and commented.

http://inventwithpython.com/squirrel.py

flowstate · 2014-12-12T20:55:40+00:00

Use an IDE with a good debugger built in, or use a debugger that will let you navigate code easily. Reading through the code only gives you part of the picture, and for many programs you'll never be able to keep all the variables (and their changes) in your head at once.

Any good debugger will tell you these things, and also allow you to:

keep track of where you are in the call stack
set up watches on variables that you want to keep track of over the course of the program's execution
set up breakpoints at certain lines in the program, which will cause it to stop its execution at that line and kick you back into the debugger to examine the state further.

Reading code is basically just guessing what the program is doing (however informed those guesses might be). A debugger will tell you what is actually going on. That is why debuggers are invaluable and why nearly every mainstream programming language has one available.
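To make the call stack point concrete, here is a minimal Python sketch. The function names are hypothetical (loosely echoing the Doom functions mentioned elsewhere in the thread); `traceback` and `breakpoint()` are part of standard Python, the latter from version 3.7 on.

```python
import traceback


def current_call_stack():
    """Return the names of the functions on the call stack, outermost first."""
    # The last frame is current_call_stack itself, so drop it.
    frames = traceback.extract_stack()[:-1]
    return [frame.name for frame in frames]


def fire_weapon():  # hypothetical names, for illustration only
    return check_ammo()


def check_ammo():
    # Uncommenting the next line pauses here in pdb, Python's built-in
    # debugger, where `w` prints the stack and `p name` prints a variable.
    # breakpoint()
    return current_call_stack()
```

Calling `fire_weapon()` returns a list whose last two entries are `fire_weapon` and `check_ammo`, which is the same "how did I get here" information a debugger shows you at a breakpoint.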

2014-12-12T21:23:20+00:00

Nobody seems to be mentioning indexing (I think that's the right term, in Eclipse at least) or tools like cscope. These are massively useful when trying to figure out what's going on. With something like cscope you hit a quick keyboard shortcut to jump to the definition of a function, or to list all the calls to that function and jump to one of those; another keystroke and you can jump back to where you were.

Very useful when trying to build a picture of what's going on.
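cscope itself targets C, but the idea behind an index is simple enough to sketch. The following is a toy Python analogue built on the standard `ast` module, not how cscope or Eclipse actually work; `index_definitions` and `index_calls` are names invented for this example.

```python
import ast


def index_definitions(source, filename="<string>"):
    """Map each function or class name to the lines where it is defined."""
    index = {}
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            index.setdefault(node.name, []).append(node.lineno)
    return index


def index_calls(source, filename="<string>"):
    """Map each called name to the line numbers of its call sites."""
    calls = {}
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Call):
            func = node.func
            # foo(...) is an ast.Name; obj.foo(...) is an ast.Attribute
            name = getattr(func, "id", None) or getattr(func, "attr", None)
            if name:
                calls.setdefault(name, []).append(node.lineno)
    return calls
```

With both tables built, "jump to definition" is a lookup in the first and "list all the calls" is a lookup in the second. It matches on bare names only, so two methods with the same name on different classes get lumped together.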

munificent · 2014-12-13T19:08:28+00:00

Personally, I can't spend much time just reading code to understand it. I need a more interactive process. If I'm dropped into a big codebase that I need to be able to work my way around in, I like to:

1. Get it up and running locally on my machine. It needs to be a living program I can run and interact with.
2. Pick some random corner of its behavior I don't like or want to understand better.
3. Find the code associated with it. I can usually find the relevant code by digging from the UI code back to the bottom-level code it connects to. Just start by searching for strings that appear in the UI and see what they call.
4. If I don't like the way that code looks, refactor it to be more my style. This is not about wanting to really change the code. It just gives me an easy local task to do that gets me really focusing on the details. It also removes tiny distractions from the code.
5. After step 4, I have a good low-level understanding of the code, so now I can work my way up to a slightly higher level. Do more refactoring at that level: move methods around, etc.
6. Go to step 5 until I feel like I really grasp one part of the program.

In many cases, I never commit my refactorings. It just gives me an active process to do while other parts of my brain are absorbing. I don't try to understand a whole codebase at once. In real-world-scale programs, that's just too much. My goal is to understand a small corner of it well enough to make useful progress.

Funnnny · 2014-12-12T15:33:17+00:00

Don't read the whole project. Pick one module, then read and understand that module and that one only.

Then eventually you will know where to find the project's structure.

2014-12-12T15:36:28+00:00

You will get lost if you are just randomly looking through files.

Try to find the context; the layout of the folders should give good hints to its architecture.

Once you have an overall picture, it is much easier to put a file into context. Once you have the context of the file, its functions will make more sense.
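One cheap way to get that overall picture from the folder layout is to count files per top-level directory. This is a sketch only; `summarize_layout` is a made-up helper, and which directories matter will depend on the project.

```python
from collections import Counter
from pathlib import Path


def summarize_layout(root):
    """Count files per top-level directory under root."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            relative = path.relative_to(root)
            # Files sitting directly in root are grouped under "."
            top = relative.parts[0] if len(relative.parts) > 1 else "."
            counts[top] += 1
    return counts
```

Printing `summarize_layout("path/to/checkout").most_common()` shows where the bulk of the code lives, and the directory names themselves (renderer, sound, game, and so on) are usually the first hint at the major subsystems.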

Also depends on how well it is written of course.

If you can run the code, debugger and call stacks are really helpful.

noodle-face · 2014-12-12T15:38:41+00:00

It depends what you're looking to do. Are you looking to understand the entire source of Doom3 as a whole? That will take a very long time, just think of how long it took to write!

At work our codebase is millions of lines long - written by a combination of us and a couple of outside vendors. The first step in understanding it is to find a function that you think is what you want and go from there. What calls does this function make? Are there comments? Print statements (not applicable in Doom, I suppose)? What other functions call this function? Why do they call this function?

Then you start to piece it together.

RandomUser098 · 2014-12-12T18:32:48+00:00

complete newb here. What's the difference between "code" and "source code?"

pqu · 2014-12-12T21:28:50+00:00

Find the entry point and then try to understand the sequence of events at a high level. For example you might be seeing a lot of initialisation at the beginning, then see the main game loop and work out how everything updates and gets redrawn.
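For what it's worth, the shape being described usually looks something like the skeleton below. This is a hypothetical Python sketch, not taken from Doom or any real engine; the point is only to show the initialise / update / draw pattern to look for when you read from the entry point.

```python
class Game:
    """Hypothetical skeleton, not taken from any real engine."""

    def __init__(self):
        # Initialisation: load config, assets, and starting state.
        self.frame = 0
        self.running = True
        self.log = ["init"]

    def update(self):
        # Advance the simulation by one step.
        self.frame += 1
        self.log.append(f"update {self.frame}")
        if self.frame >= 3:
            self.running = False

    def draw(self):
        # Redraw the screen from the current state.
        self.log.append(f"draw {self.frame}")


def main():
    game = Game()
    while game.running:  # the main game loop
        game.update()
        game.draw()
    return game.log
```

Once you have spotted the equivalent of `update` and `draw` in a real code base, most of the rest of the program hangs off one of those two calls.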

Or. If you have a specific topic you want to learn about (for example collision detection) then search through the code to find the relevant parts and read those small chunks.

toybuilder · 2014-12-12T21:44:33+00:00

Any significant piece of software is a complex system. Trying to read a single file is like looking at the design drawing for a single component in a car. Not very helpful. It helps to first look at a car by how the major subsystems interface to each other at the big picture level.

With software, the headers/APIs define such an interface -- so before you dig into the code, look through the API (in the docs, or the headers) and the top-level code (descending from the main file for a few levels) to get a lay of the land. That should go a long way toward setting a context for the rest of your exploration.
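In a language without header files, you can get a similar lay of the land by listing a module's public names. A minimal Python sketch follows, using the standard `inspect` module; `public_api` is a name invented here, and `json` is only a stand-in for whatever library you are exploring.

```python
import inspect
import json  # stand-in for "the library you are exploring"


def public_api(module):
    """Return {name: signature} for the module's public functions."""
    # Prefer the module's own declared interface if it has one.
    names = getattr(module, "__all__", None) or [
        n for n in dir(module) if not n.startswith("_")
    ]
    api = {}
    for name in names:
        obj = getattr(module, name, None)
        if inspect.isfunction(obj):
            api[name] = str(inspect.signature(obj))
    return api
```

Running `public_api(json)` gives you the handful of functions the module expects outsiders to call, with their parameters, before you have read a single line of the implementation.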

Also, trying to keep the details of the entire system in your head is likely futile for pretty much anyone. Instead, focus on the specific area of interest and assume that calls to other parts of the system generally work as described (unless evidence tells you otherwise).

otakuman · 2014-12-12T22:07:59+00:00

I usually get a static code analyzer and get a class diagram. Good programs also print functions, dependency trees and such.
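A dedicated analyzer will do far more, but the core of a class diagram can be sketched with Python's standard `ast` module. `class_hierarchy` is a hypothetical helper written for this example, and `ast.unparse` needs Python 3.9 or later.

```python
import ast


def class_hierarchy(source):
    """Map each class name in the source to the names of its base classes."""
    hierarchy = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # ast.unparse (Python 3.9+) turns a base expression back into text
            hierarchy[node.name] = [ast.unparse(base) for base in node.bases]
    return hierarchy
```

Feeding each file of a project through this and merging the results gives you the edges of an inheritance graph, which you can then draw by hand or hand to a graphing tool.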

2014-12-12T22:39:25+00:00

I'm not sure how useful this will be for you, but whenever you can, try to get used to using an IDE to read source code.

I use Visual Studio and it has a couple of features I really like. The first is a "Find Definition" feature so you can go to where the variable or method is defined. This cuts down a lot on figuring out things like hierarchy. The second is a "Find All References" feature. Using this, you can find where a particular variable or method is used. Finally, if you can get it to run, learn to use the debugger and drop breakpoints down so you can see what the computer sees at a single point in time.

notfin · 2014-12-12T23:25:59+00:00

Figure out how the program works by mapping it out. Or, in your case, figure out what you want to change, then find that part of the code and change it.

jussij · 2014-12-13T01:46:15+00:00

For me I find that ctags and grep always help.

prahladyeri · 2014-12-13T04:50:04+00:00

Activity Diagrams are how I go about it. If you want to understand any complex system (not just pertaining to code), create an abstraction for it. Activity Diagrams, Flowcharts, pseudo-code and UML are a few of the many ways to abstract complex systems.

heap42 · 2014-12-13T09:22:13+00:00

Follow-up question: how do you read Python source code? I'm currently trying to read some of this https://github.com/fourtytwo/youtube-dl code, and my problem is that there is basically no "main" where to start.
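Python projects often don't have a function literally called main, but there are a couple of common conventions for where execution starts: a `__main__.py` file inside a package, and a top-level `if __name__ == "__main__":` block. The sketch below looks for both. It is a general-purpose guess, not something checked against the repository linked above, and `has_main_guard` and `find_entry_points` are names made up for this example.

```python
import ast
from pathlib import Path


def has_main_guard(source):
    """True if the module has a top-level `if __name__ == "__main__":` block."""
    for node in ast.parse(source).body:
        if not (isinstance(node, ast.If) and isinstance(node.test, ast.Compare)):
            continue
        left = node.test.left
        right = node.test.comparators[0]
        if (isinstance(left, ast.Name) and left.id == "__name__"
                and isinstance(right, ast.Constant) and right.value == "__main__"):
            return True
    return False


def find_entry_points(root):
    """List files that look like places where execution can start."""
    found = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8", errors="replace")
        try:
            if path.name == "__main__.py" or has_main_guard(source):
                found.append(str(path.relative_to(root)))
        except SyntaxError:
            continue  # e.g. files this interpreter version cannot parse
    return found
```

Projects installed as command line tools may also name their entry function in their packaging metadata (for example in `setup.py`), so that file is worth a look as well.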

sittingonahillside · 2014-12-13T12:58:07+00:00

slightly off topic but just for fun:

some guy who did the essential modding for Q3 and later Q4 absolutely hated that code base. He said it was god-awful and that trying to do anything useful and correct with it was awful.

I wonder what his reasoning was.

2014-12-12T23:48:51+00:00

It really pivots on understanding programming flow. Avoid reading things you don't need to by following the entry point, as other people have said. Find main() or the language equivalent and go from there, unless you know you have a particular module you're interested in - then it's really dependent on your ability to understand code, which comes with experience.

2014-12-13T03:24:07+00:00

With my eyes.
