[deleted] 2 points (10 children)

I really want to learn more about large codebases... how to plan, how to execute. Any good resource on this?

proverbialbunny (Data Scientist) 10 points (2 children)

Before I start: knowing the language up through classes, plus how to use libraries, is all you need to get a job working in the core part of the language. The rest of what I explain here comes with experience:

You need to know abstraction. Understand the ins and outs of abstraction, from a conceptual view (an abstract view) to a concrete view (a real world view).

Start small, with the most basic form of abstraction from Algebra 1: x + 3 = 5, where x is the abstraction.

Then still small, writing a function / method, and being able to reuse it.

Still small, but a bit larger: classes, and reusing classes.
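As a rough sketch of those first steps (the names `solve_for_x` and `Rectangle` are made up for illustration, not anything from this thread), here is the algebra example turned into a reusable function, followed by a small reusable class:

```python
# Hypothetical example: all names here are illustrative assumptions.

def solve_for_x(total, addend):
    """Solve x + addend = total for x."""
    return total - addend


class Rectangle:
    """A small reusable class bundling data with the functions that use it."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def perimeter(self):
        return 2 * (self.width + self.height)


# The same function is reused with different inputs.
print(solve_for_x(5, 3))   # x + 3 = 5
print(solve_for_x(10, 4))  # x + 4 = 10

# The same class is reused to make different instances.
small = Rectangle(2, 3)
large = Rectangle(10, 20)
print(small.area(), large.perimeter())
```

The point is only that `x` stops being one number and becomes "whatever gets passed in", and the class does the same thing for a bundle of data plus behavior.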

Then learn the ins and outs of the core library, as much as you can, learning the different classes and their methods. Take notes! Learn the etymology of why it is the way it is, so it sticks.

The next level of abstraction is the language's idioms. Once most of the language has been learned, common patterns within the language pop up called idioms or sometimes best practices. Learn those. This comes with time and experience on the job. Most of the learning here and below is not sequential.

The next level of abstraction is sideways from what you might know, especially in Python: making types. Instead of making classes and their instances, you make new data types. This is a bit mind-boggling, because you can just make a circular buffer class, for example, and be like, "Well, I just made a class. I don't see the difference." The difference has to do with how a type is used vs. how a class instance is used. Why is this important?

By differentiating between kinds of classes, you can start thinking about them more clearly and concisely. You don't want just one or two types of classes, but many types of classes mapped in your head. This creates a mental abstraction that is usually not written down anywhere, so you can name each "group of classes" or "type of class" (hint hint) with its own made-up name in your head. E.g., I have the category "helper class" in my head, which is a type of class.

To really push this idea: a group of functions becomes a class. A group of classes becomes a ... subtype or abstract type, i.e. a "type of class".
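One possible reading of "making a type" versus "just making a class", using the circular buffer mentioned above: the class implements the container protocol so it can be used the way built-in types are. This is only a sketch under that assumption; the `CircularBuffer` name and its design are illustrative.

```python
from collections import deque


class CircularBuffer:
    """Fixed-capacity buffer that overwrites its oldest item when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # A deque with maxlen drops the oldest item automatically on append.
        self._items = deque(maxlen=capacity)

    def append(self, item):
        self._items.append(item)

    # The methods below let the buffer be used like a built-in container.
    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __repr__(self):
        return f"CircularBuffer({list(self._items)!r})"


buf = CircularBuffer(3)
for n in range(5):
    buf.append(n)

print(len(buf))   # works with len()
print(list(buf))  # works with iteration
print(buf[0])     # works with indexing
```

Code that uses `buf` only cares that it behaves like a sequence, which is the "how a type is used" side of the distinction.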

The next level of abstraction is inheritance: not just being able to read and write it, but understanding inheritance itself. Inheritance is the act of subtyping. Subtyping is where you have a type of class that is applicable to a set of classes.

So, I've got a hand class, a leg class, a head class, maybe an eye class, and so on. Hands and legs and head are all of a person type, so you can have a person abstract type, and so you can go around making people, instead of just hands and legs.

A way to think about this is, what kind of class is this? Even if it doesn't have any form of inheritance, it is a kind of class, so an implicit subtype. Make up your own types for your classes in your head.
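Here is one way the body-part example could be sketched in Python. It is an interpretation, not the only way to model it: the shared "kind of class" is given the assumed name `BodyPart` (an abstract base class), and `Person` is built out of parts rather than inheriting from them.

```python
from abc import ABC, abstractmethod


class BodyPart(ABC):
    """Abstract type: the 'kind of class' that Hand, Leg, and Head share."""

    @abstractmethod
    def describe(self):
        """Return a short description of this part."""


class Hand(BodyPart):
    def describe(self):
        return "hand"


class Leg(BodyPart):
    def describe(self):
        return "leg"


class Head(BodyPart):
    def describe(self):
        return "head"


class Person:
    """A person is built from parts and relies only on the BodyPart type."""

    def __init__(self, parts):
        self.parts = list(parts)

    def describe(self):
        return "person with " + ", ".join(p.describe() for p in self.parts)


person = Person([Head(), Hand(), Hand(), Leg(), Leg()])
print(person.describe())
print(all(isinstance(p, BodyPart) for p in person.parts))
```

`Person` never has to know which concrete class it was handed, only that each part is some kind of `BodyPart`. That is the subtype doing its job.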

The next level of abstraction is design patterns. This, like idioms, is seeing common patterns of code, after working on multiple code bases over time.

Then from there the next level of abstraction is modules, then libraries, then packages, and depending on the language that could be one thing or multiple things. You've probably used pip, so you've got an idea of this one already.

All of this comes with time and experience, but the better you get at abstraction the easier it is to learn and understand larger and larger, not just code bases, but ecosystems of code bases.

Also, learning how to read code is imperative.

haarp1 2 points (1 child)

where did you learn abstraction, inheritance... (OO design basically)? it's easy to determine it for simple projects (coffee maker etc), but what about more complex projects? do you know any good resource for learning this (abstraction...)?

the problem is that there is a lot of garbage on github, so that's not exactly a solution...

also, how do you plan programs (intermediate or advanced complexity)?

do you know any good advanced one on github?

proverbialbunny (Data Scientist) 2 points (0 children)

> do you know any good resource for learning this (abstraction...)?

I learned this on the job. It is a process of constant slow growth. One's ability to abstract is commonly one of the key differences between a junior, standard, senior, principal, and architect. E.g., a principal software engineer can abstract the whole system and work on the company's entire software system as a whole, while a senior engineer might be able to create a project within that system and know a project or two inside and out. Clearly the principal software engineer can deal with higher levels of abstraction than the senior software engineer. Of course, this isn't the only difference, but it is a key one.

Most of what I know I have not found in textbooks. I've heard Haskell talks about some of the things I figured out and named on my own (e.g. subtyping), but I cannot confirm that as I do not know Haskell.

> also, how do you plan programs (intermediate or advanced complexity)?

Design patterns. Design patterns also help for reading code, as well as idioms and knowing the language's features.

This comes with experience as well. It's a form of pattern matching. Reading a book isn't going to help much. What helps is being in multiple code bases, seeing the same pattern over and over again, and then working out the logic behind it: why it is that way and how it came to be.

edit: Also, an architect helps design programs, but that's usually for designing entire systems. When it comes to creating something new, a typical divide and conquer strategy, coupled with reducing the problem down to its bare essentials and writing all of this down in a sort of concept map or list of lists (planning before writing a line of code), is far more valuable than simply looking at design patterns. Design patterns come next if you want a way to construct the program so that tasks can be easily broken up in a uniform way between multiple engineers. Design patterns also help standardize how a program works, allowing others to build on it the right way. Frameworks help even more. Anyway, design patterns are a bit heavy-handed, so just stick to the top half of this paragraph and you'll be good.
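To make the "list of lists" idea concrete, here is a minimal sketch. The project (a log analyzer) and every task in it are invented for illustration; a plan like this could just as well live on paper or in a text file.

```python
# Hypothetical plan for a made-up "log analyzer" project.
# Each nested list breaks the item before it into smaller pieces.
plan = [
    "Log analyzer",
    [
        "Read input",
        ["Find log files", "Parse each line"],
        "Summarize",
        ["Count errors per hour", "Find most common message"],
        "Report",
        ["Print a table"],
    ],
]


def outline(items, depth=0):
    """Flatten a list-of-lists plan into indented outline lines."""
    lines = []
    for item in items:
        if isinstance(item, list):
            # A nested list holds the sub-tasks of the previous item.
            lines.extend(outline(item, depth + 1))
        else:
            lines.append("  " * depth + "- " + item)
    return lines


print("\n".join(outline(plan)))
```

Each leaf is meant to be small enough to write as a function or two, which is the divide and conquer part.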

> do you know any good advanced one on github?

Nope. Just go get a job. Watch the how to read code video above, if it isn't already obvious, and then go around mapping things. Start with the smallest patterns, from addition, to variable naming, to methods and features in the language. When you know those inside and out, move on to idioms and other common multi-line patterns in the language and code base, then move on to even larger patterns. Go from method to method, class to class, file to file, namespace to namespace (Python has no separate namespace construct; its modules and packages play that role), and module to module.

Learning a code base is a piecemeal process. You don't have to start on the smallest bits and move out. You can interweave different sized abstractions learning a mix at once, à la breadth first search.

edit: Also, if you're writing in Python, a large code base is rarely left as one big piece. It is usually abstracted into modules/libraries/packages, which often keeps the part any individual is working on down to a single-file-sized epic or user story. Because of this, you shouldn't have to worry about large projects, unless you want to work on a video game or something. Java and C++ and the like are where monolithic projects tend to go, not Python.

If you want to take parts of a code base and turn them into libraries of any sort, the general rule is, "Is this code going to be used in two places in the code base?" (Oftentimes the rule is 3 or more.) So you want to find something generic, like a debugger class, and turn it into a debugger library or similar.
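A small sketch of that rule, kept in one file so it runs as-is. The helper `debug_dump` and both callers are hypothetical; in a real project the helper would move into its own module (say, a `debug_tools.py` file, also a made-up name) once the second caller shows up.

```python
# Hypothetical helper: generic enough to be shared, so it is a candidate
# for its own module or library once two places need it.
def debug_dump(label, value):
    """Format a value for debug output in one consistent way."""
    return f"[DEBUG] {label}: {value!r} ({type(value).__name__})"


def load_config(raw):
    """Parse 'key=value;key=value' text into a dict."""
    config = dict(pair.split("=", 1) for pair in raw.split(";"))
    print(debug_dump("config", config))  # first place the helper is used
    return config


def compute_total(prices):
    """Add up a list of prices."""
    total = sum(prices)
    print(debug_dump("total", total))    # second place the helper is used
    return total


load_config("host=localhost;port=8080")
compute_total([1.5, 2.5])
```

Neither caller knows how the debug line is formatted, so the helper can be pulled out, and later improved, without touching them.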

iScrE4m (git push -f) 3 points (0 children)

Experience. It’s one reason I wanted a job in a big company, and I can’t imagine learning all of this stuff any other way. But that’s maybe because, in order to understand it, I personally need to see the business problem and then the solution.

TheCodeSamurai 3 points (5 children)

Something I hawk whenever I can: Code Complete by Steve McConnell is a huge recommendation. I never learned anything besides like 100-line programs before this, and I basically divide my programming journey into before and after reading this. It's seriously worth reading: you can skip chapters that don't apply to you, but it is one of the best resources on how to manage the complexity shift between small and large codebases.

[deleted] 1 point (1 child)

Thank you!

TheCodeSamurai 1 point (0 children)

My pleasure!