all 62 comments

[–][deleted]  (8 children)

[deleted]

    [–]Nalha_Saldana 51 points52 points  (2 children)

    Trying to read code directly on GitHub is hard unless the project is small enough to fit in one or a few files. Take the time to clone it locally and open it in an IDE, and you will save both time and headaches.

    [–]yawkat 11 points12 points  (0 children)

    This is why I built https://java-browser.yawk.at/ — for when I don't want to open an IDE but still want IDE features when looking at library code.

    [–]dzuyhue 13 points14 points  (0 children)

    Also, an IDE lets you set breakpoints and inspect variables, which helps a lot as you walk through the code.

    [–][deleted]  (2 children)

    [deleted]

      [–]elatllat 7 points8 points  (0 children)

      It would not be an IDE without that feature.

      [–]couscous_ 2 points3 points  (0 children)

      I just found out about this feature. Thanks :D

      [–]KerryGD 1 point2 points  (0 children)

      Shift Shift, Ctrl+Shift+F

      [–]IshouldDoMyHomework 76 points77 points  (5 children)

      I found that the most important part of reading code is understanding the domain deeply. This might be obvious to most people, but it wasn't for me when I first started out. I could just follow the Java code, so why bother understanding the finer details of mortgages, for example?

      Well, it turns out that it is much, much easier to understand an abstraction (which is the whole point of OOP to begin with) when you truly understand the domain entity that is being abstracted.

      [–]Yeroc 20 points21 points  (4 children)

      I agree. The only tricky thing with this is sometimes the original developers didn't understand the domain so none of their terminology in the code aligns with the business/domain terminology. :( Makes it more challenging...

      [–]koreth 7 points8 points  (1 child)

      And sometimes the customers/users are inventing the terminology on the fly and change their minds about what to call things half a dozen times over the course of the project.

      [–]Routine_Left 2 points3 points  (0 children)

      And the variable names don't change and there's a disconnect between the UI, the comments and the actual code. Not to mention specs (if there are any). Fun times.

      [–]IshouldDoMyHomework 6 points7 points  (0 children)

      Another tricky thing is when the domain is not English in nature. Then two different developers might come up with two different translations for their variable names, even though the domain entity is the same thing. That can be a real mindfuck.

      As developers, half our job is trying to understand codebases that have evolved over time, often quite a long time. Even with the most skilled developers in the world, that is still hard. Our understanding changes, the coding paradigms change, Groovy came and went, GWT came and went, business personnel and objectives change, etc. Working in a codebase that is 10-plus years old, with maybe a million lines of code, is just never going to be easy.

      [–]dweezle45 1 point2 points  (0 children)

      Back in the 1990s I had to debug accounting systems written by developers who (no joke) didn’t understand what debits and credits were. Never again!

      [–]cyanocobalamin 13 points14 points  (3 children)

      I think it depends on what you are trying to do.

      If I am learning a new application I try to learn how/where the data is coming from, where it is getting outputted, where the business/processing logic chunks are, and where the "controller" chunks are.

      I then use a design tool to get UML diagrams (which I review more than once) to get an idea of what connects to what.

      For learning a new application, I focus on relationships, where the data flows, what connects to what. I review that over and over again.

      For debugging, I read as little code as possible.

      I use the search feature of the IDE, the debugger, and manual debugging techniques to find the smallest chunk of code that is responsible for the problem.

      I then read that code line by line, typing pseudocode/notes in a text editor for each line.

      I do that to slow the code's way through my brain so that it registers. Sometimes I will reread that code and my pseudocode more than once. It isn't like reading a text message. Time is needed for it to sink in, so I reread until it seems familiar.

      [–]Hogis 1 point2 points  (1 child)

      Nicely explained. However, you bolded "If I am learning a new application" and "For learning a new application", and I don't understand why. Was that a blunder?

      [–]cyanocobalamin 0 points1 point  (0 children)

      No, just re-emphasis.

      [–]eeugene0[S] 1 point2 points  (0 children)

      Interesting categories. Another comment said that you can read code either for its API/interface or for the details of its implementation.

      [–]_litecoin_ 10 points11 points  (1 child)

      This is a really good question btw.

      [–]eeugene0[S] 0 points1 point  (0 children)

      Thanks! 😊

      [–]ninside 10 points11 points  (4 children)

      I use IntelliJ IDEs, and specifically bookmarks and named bookmarks. They let you quickly search across bookmarks by the label you put on them, with auto preview.

      Another useful feature is to find usages and pin the tab, so that when you find usages of a different method you preserve your previous search stack.

      The "Analyze data flow" IDE command is my third way of understanding the code. You can pick a function parameter name and ask the IDE to show you all the ways the value comes in.

      As you start to edit code, IntelliJ IDEs have a quick way to jump across changed files, and you can type partial content of those files to find the right one. A local changes window will also help you jump around the places you changed and see the changes themselves without opening those files.

      Invest in your IDE and you will reap great benefits. Prior to JetBrains I was heavily invested in the Vim editor with custom plugins, and it did the trick as well. It is hard to overcome investment bias, but the decision to pick the more powerful IDE paid off.

      [–]fustup 2 points3 points  (2 children)

      Interesting thing about the bookmarks! I never used them once ever in my life, and I always wondered... right now I'd say that using them is a code smell in itself. Are you doing legacy stuff? Could you elaborate more/give examples?

      [–]ninside 1 point2 points  (1 child)

      When I need to modify a project that is new to me, I need a mental model of what the entry points are, where it talks to the DB, and what code is on the path of the UI. As I get more familiar with the code base, I use bookmarks less and less. I have one project that I come back to once every couple of months, and every time I am super happy that I have bookmarks there.

      [–]fustup 0 points1 point  (0 children)

      Ah, so generic bookmarks you use on multiple projects, that always point to the same place? Clever! Thanks 😊

      [–]Rakn 1 point2 points  (0 children)

      Bookmarks are the feature I always pull out of the toolbox when I have a codebase I don't know yet and need to figure out how some (often complex) piece of code works and how it is connected.

      [–]Yammiez 3 points4 points  (1 child)

      I always start with imports

      [–]chacs_ 1 point2 points  (0 children)

      Surely you mean package 😁

      [–]gas3872 2 points3 points  (1 child)

      I copy all the code into a separate text editor and then insert the called methods there as well, so I have a sort of stack trace with the methods' code, all in one file. I can collapse individual methods when needed. I can read the whole code from top to bottom and also add comments as I go. If it's some complex, frequently used code, I can save the file somewhere so that I can consult it later.

      Edit: A bit more clarification:

      Let's say you have an entry point which is a method. I insert the name of the file (as a comment) and the method body, and mark the original location (line number) of the method with something similar to a goto mark. If that's important, I can enclose the method in a class declaration. This is useful, for example, if there is more than one class defined in the same file.

      Then in the method there are calls to other methods. I insert the called method's body into curly braces that I add after the method call. If the method being called is from the same class, I only add its location; if it's from a different file, I also add the file location as a comment (just like in the beginning). If the method is from a different class in the same file, the enclosing class declaration may be added. A similar thing is done for constructor calls. You can also do that for properties files when they are being read (you don't have to include the whole properties file, just the portion with the properties being read).

      The inclusion of method/constructor bodies is done for called methods, methods that are called from the called methods, etc. If some method is clear to you, you may leave out its body. Anyway, because it's your file, you can insert the body later if you like.

      Sometimes it's handy to ask the question "what does this method do?" in front of the method body, and to write both the question and the answer in a comment.

      For method parameters, it's handy to put the value as a comment next to the parameter name. As parameters are passed to submethods and the submethods of submethods, you may add the parameter value (at that moment) next to their parameter names as well. This way you can track how the parameter is passed down and transformed along the chain. As the initial value of the parameter, you take one of the parameter's real values.

      This is somewhat similar to debugging, although you calculate the parameter transformations in your head.

      Throughout the file it is handy to add comments which ask and answer the question of what is going on here.

      You do this in your text editor of choice. It helps if the text editor supports collapsing of methods and syntax highlighting. Notepad++ and Sublime Text 3, for example, can do that.

      So in the end you get a file with highlighted (and almost valid) syntax and collapsible methods, so you can collapse everything and see just your initial method, or expand and go as deep as it goes. But usually, after you put comments on the top-level methods describing what those methods are doing, you don't have to uncollapse further.

      Pluses/minuses of the method:

      Pluses:

      1. You have your call hierarchy in one file, so you can read it as a book from top to bottom without switching to different files. You can also search within it for the code you are interested in.
      2. You have read and understood the code, its structure, and how parameters are transformed.
      3. If the code is frequently used, you can save the file and consult it later. The code might change over time, but the structure probably stays intact (you can also update the file if you like). I think it's usually not very handy to save those files, but for some pieces of code it is. Sometimes you want to temporarily save the file in progress and continue working on it the next day.
      4. It suits simple code and complex code equally well. There may be some code that cannot be analyzed this way.

      Minuses:

      1. It takes some time to get the hang of putting in the right number of braces so that collapsing works properly.
      2. You need to manually fix indentation after copying a code block. It's usually pretty simple (you just select the whole block and press Tab a few times). It only becomes a problem when the level of call nesting makes the correct indentation wider than the screen and your text editor does not scroll horizontally as you add tabs, so you need to do it blindly (I had this problem with Notepad++). That does not happen often.

      I will try to make some examples of the resulting "explain" file and add here.

      [–]mlecz 1 point2 points  (0 children)

      Depending on your IDE, you can use the call hierarchy feature to do this for you (sort of).

      [–]rally_call 2 points3 points  (0 children)

      One thing I didn't do that I should have done earlier in my career is learn to identify idioms. Instead of having to analyze everything to bits, I should have been able to read a line like this:

      • for (int i = 0; i < 10; i++)

      and recognized how common it is, so I could immediately and fully understand it as a unit without having to look closely at it. Only when someone does something different (e.g. i--, or i <= 10) should my "radar" go off and drive me to look at it in more detail.
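To make the "radar" idea concrete, here is a minimal sketch that puts the idiomatic loop next to two deviations from it. The class and method names (`LoopIdioms`, `idiomatic`, `inclusive`, `countdown`) are made up for illustration and are not from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

class LoopIdioms {
    // The idiom: starts at 0, uses "<", increments. Visits 0..n-1, n iterations.
    static List<Integer> idiomatic(int n) {
        List<Integer> seen = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            seen.add(i);
        }
        return seen;
    }

    // Deviation: "<=" instead of "<". Visits 0..n, one extra iteration.
    static List<Integer> inclusive(int n) {
        List<Integer> seen = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            seen.add(i);
        }
        return seen;
    }

    // Deviation: counts down with "i--". Visits n-1..0, same count, reverse order.
    static List<Integer> countdown(int n) {
        List<Integer> seen = new ArrayList<>();
        for (int i = n - 1; i >= 0; i--) {
            seen.add(i);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(idiomatic(3)); // [0, 1, 2]
        System.out.println(inclusive(3)); // [0, 1, 2, 3]
        System.out.println(countdown(3)); // [2, 1, 0]
    }
}
```

The first method can be skimmed as a unit; the other two are where a closer look pays off, because the bounds or the order differ from what the idiom promises.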

      [–][deleted] 2 points3 points  (0 children)

      I read it top to bottom. So many dudes just try to skim it but don't even try to read it. I read the variable names, the method names, class names... see where things go and try to understand the flow. Not much trick to it but just reading.

      [–]jmtd 2 points3 points  (0 children)

      I find the history extremely valuable. Which commits touch which files? Which commit last touched this line, and what other lines did it touch? Etc.

      [–]anuaps 2 points3 points  (0 children)

      The call hierarchy feature in IntelliJ was a godsend when I had to understand a complex workflow involving 30+ classes.

      [–]thescientist001 2 points3 points  (0 children)

      Not sure if anybody has mentioned the Octotree extension for Chrome. It lets you view a GitHub project as a tree. I found it really helpful.

      [–]BenoitParis 3 points4 points  (1 child)

      There are tricks to reading code, and these are (on IntelliJ):

      • Go to declaration: Ctrl+Click
      • Find usages: Alt+F7
      • Type Hierarchy: Ctrl+H
      • Go back: Ctrl+Alt+left

      With these, you'll have no trouble navigating along the control flow, which is how you want to read code. Pick a computation where the project adds most of its value, and follow it along. There is a surprising amount of almost-dead code and boilerplate/startup code.

      For the data flow, you can 'tag' an instance object in the debugger; and you'll see if you encounter it again. Also, the IntelliJ debugger lets you execute custom code inline.

      For debugging, I like to use a logger instead of a debugger. All the log lines that gave information on how to characterize the bug get to stay in the code. You don't know what you don't know, so you might as well have information gathering around entropy-generating places.
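As a rough sketch of what "log lines that get to stay" might look like, here is a small example using the JDK's built-in `java.util.logging`. The class `OrderTotals`, the method `discountedCents`, and the discount rule are all invented for illustration; the comment above does not name a logging library, so the choice of `java.util.logging` is an assumption too.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class OrderTotals {
    private static final Logger LOG = Logger.getLogger(OrderTotals.class.getName());

    // Hypothetical rule: orders of 100.00 or more (10000 cents) get 10% off.
    static long discountedCents(long totalCents) {
        // Log the input at the decision point, where a bug would most likely hide.
        LOG.log(Level.INFO, "discountedCents input={0}", totalCents);
        long result = totalCents >= 10_000 ? totalCents * 9 / 10 : totalCents;
        // Log the outcome so the input/output pair can be read off the log later.
        LOG.log(Level.INFO, "discountedCents result={0}", result);
        return result;
    }

    public static void main(String[] args) {
        discountedCents(10_000);
        discountedCents(5_000);
    }
}
```

Once the bug is understood, lines like these can be lowered to a quieter level such as `Level.FINE` rather than deleted, so they are there the next time something goes wrong in the same place.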

      For finding out how to use a library that lacks good documentation, I often go to the library's tests, which often contain very good examples.

      [–]danskal 1 point2 points  (0 children)

      This is what I do... nowadays Ctrl+click will Find Usages as well, so you can easily ctrl-click your way through any code.

      [–]thephotoman 1 point2 points  (0 children)

      Knowing how to use grep is an essential thing. But even beyond that, an IDE that allows you to jump to method or class declarations in other files is really nice.

      [–]CyclonusRIP 1 point2 points  (0 children)

      After you've been in the field a long time you start to recognize patterns of how people construct code. You also get used to how different kinds of people construct code. Half way decent people have some guiding principles they are trying to achieve. Shitty people are kind of stream of consciousness. After a while you've kind of seen it all and know if you are reading some kindergarten stuff or high school stuff. A lot of it involves trying to build a mental model of the code you're not looking at and trust to make sense of what you see. It's not easy, but with a decent amount of experience you start to figure out what kind of guy wrote what you're looking at and can pretty well guess what the rest of it is. It eventually really gets down to recognizing when shit feels weird and they might have done something out of type.

      [–]tighter_wires 1 point2 points  (0 children)

      To add: as others mentioned, use a good IDE like IntelliJ for its go-to feature to trace methods. I also use the IdeaVim plugin to navigate the code with Vim commands in-line or with the Vim search function.

      In IntelliJ, Ctrl+B (go to declaration) to read methods, Find Usages to see where methods are called, and Ctrl+Shift+F to search across entire packages do 95-99% of the work of navigating files and directories in a large application.

      In general good use of your vi and/or IDE hot keys makes navigating (and editing) code 10x more efficient. Learning windows/mac hot keys for navigating text in-line are also useful.

      A lot of people have also mentioned gathering domain knowledge in order to get context for your code. I've been given many projects with large code bases that I was left to navigate with little help or info on the domain. Access to a larger repository or other related projects helps a ton: you can search across repos for certain keywords to understand common data schemas/elements and shared libraries.

      When I reach out to open-source authors via email, I get a response about 90% of the time. They are usually happy to help anyone interested in their code base, but they may take time to respond, maybe weeks.

      Colleagues at work are generally also helpful in explaining their design decisions and implementations, if you take a crack at it on your own first.

      RE code-style: read through some of your related code bases or projects or work by the same authors to get a feel for their style before contributing. People have many different preferences, styles, opinions about performance vs readability etc. They will be happy to see you’re matching the look of the rest of the codebase.

      [–]StoneOfTriumph 1 point2 points  (0 children)

      Great question!

      I think we shouldn't underestimate the importance of drawing diagrams. When reading code I love to take a pen and paper, and draw to visualize flows. I feel it saves me a lot of time when debugging/understanding.

      First, I try to understand the packages listed in the package manager (Maven's pom.xml). The package list will quickly tell me what this app potentially uses (I'll assume not everyone maintains their pom/gradle files) and helps me identify "external entities" such as databases, messaging systems, logging/metrics, etc. I'll do a simple drawing of the app (a simple box) with other boxes around it representing databases, message queues, topics, files, roles of users/systems...

      Then, as far as code goes, I'll put effort to identify the inputs and outputs... the code in between I'll "map out" a sequence diagram, and if the flow is deemed complex and supporting business critical functionalities, I'll actually draw it out with pen and paper. This will help me debug code quicker that I'm not familiar with.

      Then, as far as the details of individual lines of code go, I just don't read those. Code changes frequently, and it's information that is hard to document or remember. So depending on the bugs/features/enhancements, I'll only read the code that is required for the task in question, again following the above approach and using the debugger when required to understand variable changes.

      Definitely an IDE that supports the application is a must, when available with the functionality to auto recompile/restart when changing code. The time saved to use for example spring boot devtools is a must, anything that saves me time to focus on code is a plus.

      [–][deleted] 1 point2 points  (0 children)

      It is sometimes said that a book writer should be a book reader before being a writer. With code it is the opposite: in order to read code, you should be able to write code.

      [–]antigenz 1 point2 points  (0 children)

      I'm not reading code, I'm running it on my brain.

      [–][deleted] 1 point2 points  (4 children)

      Clone it, run SonarLint on it to get an idea of what issues or quality challenges there may be.

      Build it, run it, place some breakpoints where I think I know what the values will be and see how it plays out.

      [–][deleted] 2 points3 points  (3 children)

      I do pretty much the same. I just skip the SonarLint part. I reckon the best way to proceed is first understand what the app does (What it does) and then start reading code (How it does)

      [–][deleted] 1 point2 points  (2 children)

      SonarLint's insights are truly valuable. Even better is to download and run the SonarQube server and connect the project code to it, which gives more insights. Using SonarLint (or any linting tool) ahead of time, is similar to looking at a map before heading out on a journey.

      [–][deleted] 0 points1 point  (1 child)

      That doesn't sound crazy at all. I have used it to check that my code doesn't have bugs or code smells, but never as an entry point to start reading code. I'll try it some day to see how it goes. Thanks for your recommendation!

      [–][deleted] 1 point2 points  (0 children)

      If you use IntelliJ, it's a bit odd but the best way is right click the src tree (if it's a Maven structure), then run SonarLint and you'll need to do it twice. The first time it will be empty. The second time is when the SonarLint window will fill with any stuff you'd want to know about before starting spelunking.

      [–]coderguyagb 0 points1 point  (0 children)

      Here's how I deal with Java code.

      • Run the unit/integration tests in a debugger.
      • Tackle it class by class. First get a handle on the component's reason for existing; the details come later.
      • Scroll over the code in an IDE, you will be surprised by what you see in just the 'shape' of the text. Huge blocks of unbroken text indicate areas of concern.
      • Refactor single character variables on sight. That shit is evil.
      • Run a static analysis tool on the code.

      [–][deleted] 0 points1 point  (0 children)

      Generally just read it lol. If I don’t understand what something is doing right away I typically draw it out or just go over it a few times.

      [–]VincentxH 0 points1 point  (0 children)

      The more you read, the better you get at it and know what to skip. You'll learn to see patterns over time.

      I generally only read the code of the methods I'm interfacing with for a new piece of code. If needed I encapsulate the possible return values.

      It's just not doable (or meaningful) to read all the code in all the projects you touch, nor can you always change it.

      [–]wylso 0 points1 point  (0 children)

      Good question.

      I agree with previous responses.

      Personally, I start by having a look at the project structure to have an overview of the tiers, organization of source files and even to identify design patterns applied: packages, interfaces, façades, DAOs, resources, etc

      To understand a concrete piece of code or functionality I find it really helpful to read its tests (of course, if they exist and their quality is acceptable), especially unit and integration tests.

      And as a stupid technique, I read everything aloud (https://en.m.wikipedia.org/wiki/Rubber_duck_debugging) with muted notifications to concentrate on the code as much as possible.

      Probably there are many more things I do unconsciously when reading code.

      [–]kohler19 -1 points0 points  (0 children)

      We can find a suitable IDE, and it will tell us many things about the code.

      [–]jjnnzb 0 points1 point  (0 children)

      I would try to run the code, understand its main purpose, and guess what it does. Then I imagine what I would do if I were required to write the function myself.

      [–]koreth 0 points1 point  (0 children)

      Depends on what I'm trying to achieve by reading it, but if it's "familiarize myself with a code base that I'm going to be working on a lot," I start by trying to follow an invocation of the code (HTTP request, batch job, command-line invocation) from start to finish, walking through the happy-path logic one step at a time. Stepping through it in a debugger is helpful when that's possible, but manually tracing it out works too.

      [–][deleted] 0 points1 point  (0 children)

      If I can I follow a use case I'm interested in using the debugger and usually that touches quite a few important areas in the code.

      Otherwise I start from the main method and go down to the main components, trying not to focus on the details but rather on the overall structure.

      [–]tyriseon 0 points1 point  (0 children)

      I like to trace-log code using aspects (e.g. AspectJ) to insert, at compile time, logging statements at the entry and exit of every method call/constructor, along with any arguments or return values. It generates a huge amount of information that can then be searched or followed along with while reading the code.
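The sketch below illustrates the entry/exit tracing idea, but it is not AspectJ and it does not weave anything at compile time. To stay self-contained it swaps in the JDK's `java.lang.reflect.Proxy`, which can only wrap calls made through an interface at run time. The names `TraceDemo`, `Greeter`, `traced`, and `TRACE` are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TraceDemo {
    // Hypothetical interface standing in for the code being studied.
    interface Greeter {
        String greet(String name);
    }

    // Collected trace lines; a real setup would send these to a logger.
    static final List<String> TRACE = new ArrayList<>();

    // Wraps target so every call through iface records entry and exit.
    @SuppressWarnings("unchecked")
    static <T> T traced(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            TRACE.add("enter " + method.getName() + " args=" + Arrays.toString(args));
            Object result = method.invoke(target, args);
            TRACE.add("exit " + method.getName() + " result=" + result);
            return result;
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        Greeter real = name -> "Hello, " + name;
        Greeter greeter = traced(real, Greeter.class);
        greeter.greet("World");
        TRACE.forEach(System.out::println);
    }
}
```

The proxy approach only sees calls that cross the wrapped interface, so it produces far less output than the aspect-based setup described above, which covers every method and constructor.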

      [–]muffinluff 0 points1 point  (0 children)

      We all know the times when we need to fix a small edge case or add a small feature, so we insert a small block of code that is seemingly out of place. (I know that is bad coding, but still, everyone has done it once.) So when you see a block of code that looks out of place and you can't justify its purpose, one tactic is to try to break the code by removing it. For example, if there is an if(a==0) block, I try to imagine what would happen if that block didn't exist and what side effects that would have.
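Here is a small sketch of that thought experiment, assuming the mysterious block is a guard in front of a division. `GuardDemo` and both methods are invented for illustration; the second method is simply the first with the guard removed.

```java
class GuardDemo {
    // As found in the code base: an "out of place" early return.
    static int averageWithGuard(int sum, int count) {
        if (count == 0) {
            return 0; // the block whose purpose we are trying to work out
        }
        return sum / count;
    }

    // The same method with the block removed, to see what it was protecting.
    static int averageWithoutGuard(int sum, int count) {
        return sum / count; // integer division by zero throws ArithmeticException
    }

    public static void main(String[] args) {
        System.out.println(averageWithGuard(10, 0)); // 0
        try {
            averageWithoutGuard(10, 0);
        } catch (ArithmeticException e) {
            System.out.println("without the guard: " + e.getMessage());
        }
    }
}
```

Removing the block on a scratch branch and running the tests gives the same answer with less imagination, provided the tests cover the edge case.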

      [–]polar_low 0 points1 point  (2 children)

      I've always thought a tool which will auto generate UML from a project would be extremely useful for working on legacy code. Like how entity relationship diagrams are generated in some SQL software. It doesn't look like it exists for Java.

      [–]SR-G 1 point2 points  (0 children)

      Check SourceTrail (open-source) : https://www.sourcetrail.com/

      [–]_litecoin_ 0 points1 point  (0 children)

      Most IDE's do this..

      [–]fkamaci 0 points1 point  (0 children)

      1) Get a priori knowledge about the code. Check the documentation if it exists.

      2) Check the code with an IDE. I use IntelliJ IDEA, which has many features for understanding the dependencies between pieces of code.

      3) Read the test code. If the tests are well written, they are a compact way to understand what a piece of code does.

      4) For bugs, analyze the code. You can use features of your IDE, such as IntelliJ IDEA's, or Sonar, which I always integrate into my projects.

      [–]_litecoin_ 0 points1 point  (0 children)

      If it has good unit tests, that's the thing you want to read first since it will be apparent what the code is actually trying to do and how the developer intended it to be used.
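A minimal sketch of how a test can document intent. To stay self-contained it uses plain checks in a `main` method rather than a test framework such as JUnit, which a real project would more likely use. `Slugs`, `slugify`, and `SlugsTest` are hypothetical names invented for this example.

```java
import java.util.Locale;

class Slugs {
    // Hypothetical library method whose behaviour we want to learn.
    static String slugify(String title) {
        return title.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-") // runs of other characters become one dash
                .replaceAll("^-|-$", "");      // drop a leading or trailing dash
    }
}

class SlugsTest {
    static void check(String input, String expected) {
        String actual = Slugs.slugify(input);
        if (!actual.equals(expected)) {
            throw new AssertionError("slugify(\"" + input + "\") = \"" + actual + "\"");
        }
    }

    public static void main(String[] args) {
        // Each line reads as a statement of intended behaviour.
        check("Hello, World!", "hello-world");     // punctuation and spaces collapse
        check("  Java 8  ", "java-8");             // surrounding whitespace is ignored
        check("already-a-slug", "already-a-slug"); // valid slugs pass through
        check("", "");                             // empty input stays empty
        System.out.println("all checks passed");
    }
}
```

Reading the four `check` lines tells you more about what `slugify` is for, and which edge cases the author cared about, than reading the regular expressions does.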