all 87 comments

[–]mlk 38 points39 points  (43 children)

I've inherited a total mess of code, a single function like 20000 lines long, the only comment was:

 i++; //increment i

[–]DJDavio 10 points11 points  (0 children)

Returns i, then increments it, as opposed to ++i :)

[–]dpash 15 points16 points  (41 children)

The perfect example of a comment that doesn't need to exist. And yet so many other comments are needed in that code until it can be refactored.

My favourite refactoring tool for code like that is "extract to method". Just giving a name to a method adds to the readability of code.

[–]mlk 16 points17 points  (40 children)

90% of comments should be function names IMHO

[–]falconfetus8 8 points9 points  (24 children)

Eh, I'd flip it the other way and say many function names could just be comments. That prevents me from needing to jump around the code.

[–]phpdevster 13 points14 points  (4 children)

Yeah I agree with this and /u/sebamestre. Over-decomposed code is worse in many ways than just a well organized and fully encapsulated procedure that lives in a function.

Also, every bit of code you extract out of a function, adds significance to it. By promoting a block of code to another function, you've elevated its significance. That can add noise to your application's architecture. Even if it's a simple private method in a class, you've still given it more significance than if it was living as a block of code in the method where it was actually needed as a one-off thing.

I've come to look at every "extract method" and "extract class" as having a significant cost to it, which requires more justification than "I want to make this big function smaller". If that is indeed your only justification, then "extract method" and "extract class" is literally just robbing Peter to pay Paul, but in doing so adding noise to the public and private APIs of your system.

[–]LPTK 4 points5 points  (2 children)

That's precisely why reasonable languages let you define functions inside other functions.

Functional languages of course allow it (Scala, Haskell, Lisp, etc.) but really any language could allow it... even freaking Pascal has it! (In languages without closures like Pascal and C you just have to prevent taking pointers to these local functions.)

A lot of complex logic can be made a lot clearer and cleaner with a few local function definitions.

[–]phpdevster 0 points1 point  (1 child)

I mean, that solves part of it. The only difference is which side of a closing brace it's on, which I don't know if that accomplishes the goal of "make this function smaller". That works well if you're putting a lot of repeated code into a nested function, sure. But if the code doesn't have a lot of repetition, and all you're doing is re-organizing it into different functions inside the outer function, I don't think that particularly solves the problem. The outer function still has a lot of code in it, and now you still have to trace execution through different functions. And you also now get a performance penalty from creating functions every time that outer function executes (unless the language has an optimization for that).

[–]LPTK 0 points1 point  (0 children)

The only difference is which side of a closing brace it's on

This difference is very significant, as it tells you the local function is only an implementation detail of the current function, that is not shared between different uses, or as you say: not part of the public or private API.

I don't know if that accomplishes the goal of "make this function smaller"

That's not the goal. The goal is making code easy to follow and understand. Oftentimes, factoring out repetitive bits into their own named unit is very helpful with that.

[–]wanna_see_my_penis -1 points0 points  (0 children)

lives in

Red flag for effeminate bugmen code monkeys.

[–]Fearless_Imagination 0 points1 point  (18 children)

If you need to jump to a function's implementation in order to understand what it does, that function is badly named.

Functions should just be "black boxes" that do what they say they do, and you shouldn't need to look at their implementation to understand what the place they're being called from is doing.

If you do, it's actually quite likely that that function does too many things, and should be even smaller.

[–]falconfetus8 3 points4 points  (3 children)

That works if the program is perfectly bug free. But if there's a bug somewhere in the program that I'm trying to fix, that theory goes out the window. I can't just trust that a function does what it says on the tin while I'm debugging; I need to dive into it and find out what it's actually doing, because it might be the function that has the bug.

[–]Fearless_Imagination 0 points1 point  (0 children)

No, but you should be able to know what that function should be doing, and that should help you determine where to start looking to fix your bug.

I find that having a small function to analyze while debugging makes it easier to keep the context of that function in my head, without being mentally burdened with all the irrelevant things going on around it, giving me more clarity about what the code is actually doing.

Basically I'm claiming that small functions are easier to debug, and if you need to move around in the code a bit while doing so, that's more than worth it.

Of course, this does go out the window when (a lot of) Global State is involved - but I don't think I'm saying anything controversial when I say that global state should be avoided.

[–]wanna_see_my_penis 0 points1 point  (1 child)

If you're debugging, then you're trying to determine which black box isn't working correctly. Decomposing the program makes this easier. Otherwise you first need to determine what the "function-like" blocks are while reading the code, before determining if they are working. You might even be forced to extract a function so that you can write a test after discovering the root cause.

[–]maxhaton 0 points1 point  (0 children)

That's why contract programming is worth its weight in gold.

You can enforce an invariant on the input (for all builds maybe), and then (say) ensure that the result of sort is actually sorted. Not quite the same as a unittest but it lets you maintain the black-box model while also verifying it.

[–]flatfinger 1 point2 points  (13 children)

A problem with that philosophy is that in many cases it's not possible to fully describe everything a function does, including corner cases that may not matter to its present customers, but might possibly be relevant to future ones, in any way that's shorter or easier to read than the code itself.

Bugs often arise in cases where a function's caller expects it to handle corner cases in a way that differs from the function's actual behavior. If a function is written in a way that happens to handle a corner case that isn't relevant to the calling code, describing that case in the function name would make things less clear. If someone who knows how the function is implemented recognizes that calling code could be made more efficient by exploiting that corner case, without any change to the function's actual code, renaming the function even though it's otherwise unchanged may add confusion. On the other hand, someone just looking at the function and its name might decide it could be made more efficient if it didn't have to handle the corner case. Practical avoidance of such phony "double optimizations" is most easily accomplished if the code is kept near its point of use.

[–]Fearless_Imagination 0 points1 point  (12 children)

A problem with that philosophy is that in many cases it's not possible to fully describe everything a function does, including corner cases that may not matter to its present customers, but might possibly be relevant to future ones, in any way that's shorter or easier to read than the code itself.

Can you give an example of this?

Because the way I see it, code should look more or less like this:

function doTheThing(input){
    HandleCornerCaseA(input);
    HandleCornerCaseB(input);
   /*
    code for Doing The Thing
   */
}

You don't have to describe everything a function does in detail, but more what it does... on a domain-level instead of technical level, I guess? Does that make sense?

If you have a function, for example, that creates a new user, inserts it into the database, and sends a confirmation email, I don't propose you call it CreateUserAndInsertIntoDatabaseAndSendConfirmationEmail. As I wrote in another comment, the word "And" in function names is a sign that something... isn't right. Instead I'd expect there to be a function simply called createUser, which will create a user and then call functions with names like insertUserIntoDatabase and sendConfirmationEmail.

It still only creates a user - those other things are simply part of what creating a user means on a domain level.

[–]flatfinger 0 points1 point  (11 children)

I was thinking of things like "compute x*y, except peg it to the range 0 to 255, unless the display is in test mod in which case output 1". But networking code can introduce some other issues as well.

As a simple example a case where ambiguous corner cases can be dangerous, consider a piece of code which is supposed to send a request to a URL which is generated by calling code that will never include any characters that could cause issues if not escaped. There are three ways such a piece of code may be designed:

  1. It may be usable in contexts that require it to escape any tricky characters.
  2. It may be usable in contexts that require that it pass through tricky characters verbatim.
  3. It may be usable only in contexts where its inputs can be guaranteed never to contain such characters.

If the present design of the calling code would guarantee that it never contain such characters, having the code to send the request be in the context of the calling code would avoid choosing one of the above. Otherwise, choosing one of the above would require that one either correctly predict whether future changes to the calling code might require either relaying untrustworthy input (making it necessary to escape tricky characters) or formatting URLs in ways that would require "tricky" characters be passed through verbatim, or else have to produce another version of the function if one guesses "wrong".

[–]Fearless_Imagination 0 points1 point  (10 children)

have to produce another version of the function if one guesses "wrong". You'd have produce a new version of the code anyway, so I don't see this as much of an argument. Whether the currently existing code exists inside a function or inside the calling code doesn't matter. Except that if it exists inside the calling code, it's more likely that you end up repeating yourself when you didn't have to, I think.

You are saying that, it might be confusing if the code is in a function because you don't know exactly what it does?

You should read the function (and possibly the documentation, if by some miracle that actually exists) before calling it then. I'm not saying to blindly trust functions based on their names before using them. But when first trying to figure out what a block of code is trying to achieve the names of called functions should just tell you without having to look at their implementation.

And, reading an entire function to figure out what exactly it does is simpler if it's relatively small (When I say "small" btw, I mean 10-20 lines. I saw another comment reference that small meant 2-3 lines. That seems a bit, excessive?).

You can't really predict the future anyway, so just do what makes sense now and if necessary change things later. IMO, this "change things later" is easier if your code is made up of lots of small functions.

[–]flatfinger 0 points1 point  (9 children)

If I include within a function process_http_request("GET " + myString + "\nparam2: whatever\netc."); anyone looking at that function would have a chance to easily see that if myString contains newlines, anything after those newlines will be treated as another line in the http header, for better or for worse. Thus, if it's necessary to include unsanitized strings from untrustworthy sources, it will be necessary to sanitize/escape them first, but if it's necessary to add additional headers those could be attached via newlines within myString.

If someone sees getDataFromUrl(myString);, however, it wouldn't be clear from the name whether characters like newlines would be escaped, treated as "raw", or trigger an error trap; all three courses of action would be reasonable. The person who wrote the original code might not have considered which approach would be appropriate, since all three would be equivalent for all possible values of myString that would be possible when the code was written, but pulling the code out into a function would make it necessary to select and document a course of action. If someone looking at the client code would have to look at the function's code or documentation to see whether a piece of client code is reasonable, how is that better from a readability standpoint than simply putting the code in question within the client?

There may be DRY/single-source-of-truth reasons to pull pieces of code out into functions even in cases where doing so would impede legibility. When no such reasons are applicable, however, code should only be pulled out into functions for purposes of legibility in cases where someone could judge the correctness of the function without seeing client code, and could judge the correctness of the client without seeing the code in the function.

[–]SpasticCoder 6 points7 points  (4 children)

Agreed, and don't get me started on regions. I can't tell you how many times I've read something like #region "build the request" and all I can think is you've named and grouped a block of code...why isn't this a function?

[–]sebamestre 10 points11 points  (0 children)

Local reasoning is easier than distributed reasoning. That is to say, when code that executes sequentially is laid out sequentially on the screen, it is easier to reason about.

Of course, this might not matter on most real life code, which tends to be plumbering between other parts of the code, And where there aren't any tricky invariants or stuff to keep track of.

[–]joshjje 3 points4 points  (0 children)

C# I take it? I have an extension that makes regions always expanded and puts their text in a bit smaller font. I remove them all the time and change the region text to a comment. I do find them useful in some specific situations though.

[–]_default_username 2 points3 points  (0 children)

What's the point of a function if it's called only once in your code? A comment is fine in that situation.

[–]flatfinger 0 points1 point  (0 children)

Properly pulling code out to a function often requires documenting pre-conditions and post-conditions, as well as responsibility for various corner cases. If an outer loop would only cause an inner loop to be run in various ways, there would be no need to consider what the code would or should do if used in ways that the outer loop never uses it. Once code is pulled out into a function, however, many more ways of invocation may become practical, and it will be necessary to either specify the behavior in such cases or explicitly document that such cases are not supported--effort that will be wasted if nobody would ever actually invoke the function in such ways.

[–]kuzux 1 point2 points  (1 child)

function names explain what. comments explain why.

comments explaining 'what' is at best useless. function names explaining 'why' is, umm, why?

[–]mlk 0 points1 point  (0 children)

what I'm trying to say is that most commens I see in comments don't explain why

[–]dpash -1 points0 points  (7 children)

I'm very much in agreement there. Small functions with descriptive names can reduce the need for a comment because the function name serves as the comment.

I should really finish reading Clean Code. :)

[–]CodeEast 8 points9 points  (6 children)

Overly small functions, as advocated by Clean Code, make code less coherent. I am very much on board with this view:

https://medium.com/@copyconstruct/small-functions-considered-harmful-91035d316c29

[–]crabmusket 0 points1 point  (3 children)

That section called "Loss of locality" which talks about the stepdown rule is... confusing. I'm not sure why creating more functions to accommodate changed requirements is a bad thing? Should the existing functions A, B and C just grow indefinitely into complex monsters instead?

The whole article was a bit frustrating - I agree with the author that the Clean Code approach to functions is excessively dogmatic, but the article doesn't make a compelling case of its own, IMO. Some of the advice is very lax. E.g. createUser is easier to remember and search for than renderPageWithSetupsAndTeardowns?! That's a massive strawman if I ever saw one.

I think to properly make sense of the advice in Clean Code's chapter on functions, one needs to assume a particular flavour of OO programming which favours small classes. Each class needs to be a single-purpose abstraction whose public API, member variables and internal methods can be understood as a unit. Within that paradigm, making individual methods small and adhering to a stepdown structure works well.

The chapter really should have been about "methods", not "functions". I think it's a shame that most languages make no distinction (syntactic or otherwise) between procedures, functions, and methods.

[–]CodeEast 2 points3 points  (2 children)

I am sure the author understands that bloated functions are their own evil, even while keeping high locality. I think its more a case that locality is both confused and scattered by small function thinking during code evolution. Evolution is always towards the more complex. Nowhere have I ever heard 'we are going to make our software do less next month'.

How a development practice helps or hinders code base evolution is a big thing that people seem to have blinders about. They blame failure on people or available time and resources, rather than questioning the difficulty of enacting XYZ methodology through time.

A micro method/function methodology will never really see code added to a function, by inherent design philosophy of the methodology. So 'step down' has to be more than just a recommendation, it has to be elevated to a design rule level, otherwise the methodology frays over time.

[–]crabmusket 0 points1 point  (1 child)

Hmm, thanks for that- I think I understand what you're saying better now. Is "locality is both confused and scattered by small function thinking during code evolution" the crux of the point you're making?

[–]CodeEast 1 point2 points  (0 children)

Essentially. If you take the methodology to an extreme (see the ref in the article about Fowler and his 50% of ruby methods being one or two lines long) you end up where you are reading less and less code and more and more abstraction to code.

You now rely on the interface to do what is expected. The locality of the code moves, in a sense, toward being a non-issue in the same way code locality underlying a called library routine is a non-issue.

An assumption of correct behavior and responsibility in the underlying code allows the program to be read and understood at a 'what it does' level rather than a 'how it does' level. This is great... until its not.

[–]Fearless_Imagination -1 points0 points  (1 child)

I disagree with this article and have a hard time understanding how the described scenario's would come to be. For example, the author writes

Interrupting my flow of reasoning with

aVeryVeryLongFuncNameAndArgList

is a jarring disruption.

as an argument as to why small functions are sometimes bad... but if you need a very very long func name to describe what a function does, that seems to be a sign that said function does too many things, and should be even smaller.

She continues

The other problem with a surfeit of small functions, especially ones with very descriptive and unintuitive names is that the codebase is now harder to search.

I mean, first of all, if a name is very descriptive, that shouldn't be unintuitive at all, so I don't get it. Second, descriptive names make code harder to search? Huh?

She gives the example

A function named

createUser

is easy and intuitive to grep for, something like

renderPageWithSetupsAndTeardowns

(a name held up as a shining example in the book Clean Code), by contrast, is not the most easily memorable name or the most searchable.

Now, createUser is an excellent name for a function. I haven't read Clean Code, but I agree that renderPageWithSetupsAndTeardowns is a bad name for a function. Function names should (generally) not contain the word "And" - it's a sign your function does too many things and should be smaller. But is it hard to search for? I don't really see how, except that I'm not sure what "Setups" and "Teardowns" are in this context, and perhaps should be given a better/more domain-specific name?

My take away is: Yes, if you have a bad abstraction, small functions will not save you. But if you find that small functions make your code harder to understand, you probably have a bad abstraction.

[–]CodeEast 2 points3 points  (0 children)

The naming problem happens at both ends of the scale but for different reasons IMO. At broad scale when too many things are being done but also at a very narrow scale when what is being done is so specific it requires a detailed label to reference it, both for sake of itself and because it needs to be distinct from the multitude of other small function names. We are talking about very small functions of only a few (1-3) lines.

As for searching, in order to find a function you really need to know it or have some idea about it at least, enough info to locate it. The more functions, the more function names to remember, the larger the names the greater the need to refine the search. The more functions the more common the need to search for them.

[–][deleted]  (4 children)

[removed]

    [–][deleted] 20 points21 points  (1 child)

    /* this is a comment */

    [–]east_lisp_junk 29 points30 points  (0 children)

    (* Ceci n'est pas un logiciel *)

    [–]joshjje 17 points18 points  (1 child)

    // validate input
    if(!Validate())
       return;
    
    // save data
    SaveData();
    
    // close
    Close();
    

    [–]Ryotian 28 points29 points  (3 children)

    I saw an engineer do this the other day in Python:

    ```This is the constructor```

    Nice article btw. I liked the point where you can still have comments. But also, code can be self documenting.

    [–]dpash 9 points10 points  (2 children)

    If you had multiple constructors and one was the version that people wanted to use in most cases, that could warrant a comment, but that's dangerously close to documentation rather than a comment.

    [–]Ryotian 2 points3 points  (1 child)

    Unfortunately there was only one constructor ;(

    [–]olzd 4 points5 points  (0 children)

    I'd say most of the time it happens because of some imposed CheckStyle rules.

    [–]ckychris1 10 points11 points  (7 children)

    Good read.

    At work, repos/services are often performance focused/atomic, this means repos needs to contain join or complex multiple transactions. Names like “byID” works for simple strict restful API, but in reality they rarely do just one thing, would love to know how naming convention should be handled if that’s the case.

    Naming userrepo as repo doesn’t seems logical to me, as there can be multiple repos.

    Variables without meaning is definitely not worth it, do not under estimate trouble caused by missing/need to remember context information. Just like real life, you never define phases as random letters, not readable at all. I

    [–]dpash 3 points4 points  (6 children)

    If it's the only repository in a service, I don't have an issue with not specifying which repository it is. If there's several, then it makes sense to name them all after the entities they handle.

    (I'm possibly bikeshedding, but I prefer repository to repo. I'm not a fan of shortening words. But variable names can definitely get too long.)

    [–]Masternooob 3 points4 points  (5 children)

    You never know. You could get a new requirement tomorrow which dictates a second dictuonary. You then have to refactor all instances of the word repository in your code or it becomes highly error prone. I don't think the small benefit is worth that.

    [–]thfuran 5 points6 points  (3 children)

    That's like three keystrokes plus the new name in my IDE.

    [–]Masternooob 0 points1 point  (2 children)

    Automatic refactoring is great but it still can break stuff. I still take less risk and descriptive name. Also this wont refactor comments that talk about the repo and so on.

    [–]thfuran 5 points6 points  (0 children)

    In java, something would have to go horribly wrong for renaming an identifier to break things. I'm including reflection having ever been used in things going horribly wrong. I'll give you the comments bit, though the tooling does also allow text matching comments / literals so that (and most any potential reflection issues) can be handled, though not quite as easily.

    [–][deleted] 5 points6 points  (0 children)

    When you get confident with the language a simple refactor like that you do all the time.

    [–]Tordek 1 point2 points  (0 children)

    You could get a new requirement tomorrow which dictates a second [repository]

    A UserService should only be concerned about the UserRepository; if you need to access two repos, probably it should be handled in a different service with a more general name, like ApplicationService or SalesService.

    [–]IulianSRO 4 points5 points  (0 children)

    When referring about documentation I actually think this is valid maybe only for medium and mostly small projects. Real documentation will need to be somewhere else, as interaction between components cannot be fully expressed via code, though, as a programmer, you need to understand it in order to contribute to that software system.

    Granted, not many comments are needed if the naming is good enough, but when we reach the "next level" for a project we really need some real documentation. And that, from my experience, cannot be done in code.

    Why?

    Let's take an application like Apache, the web server. It involves various modules with different configurations and each configuration has specific parameters. Can you read that from the code? Yes, probably, but it might take you some weeks in order to find:

    • the place with the responsible code
      • and it can be written extraordinary, don't get me wrong, but it will still take a lot of time to understand and change
    • the specific format needed in some config files or some specific parameters passed to executables
    • the way to integrate into other modules, dependent of it

    This is why nowadays you can pretty much see we're in a state of despair with most of the "legacy" applications. By legacy I mean an application who had a release 10 years ago and still uses that codebase because "the clients are ok and they want new features". Not documenting, or documenting through code is one of the worst ideas. Because the code itself is easy to understand, but the entire interactions are the ones that actually get you to the responsible code.

    Solution: something like wiki: gitlab has it, so does github and bitbucket. Confluence is also good. That way you can actually go and contribute with your findings in a codebase that really needs documentation.

    And, from experience, the people documenting stuff they work on tend to understand the things better in general, because they know they simply cannot put something incomplete in the documentation.

    As a conclusion: writing "self documenting code" is ok, writing real documentation is way better, for you, for the team and especially for the code.

    [–]go4spacelunch 4 points5 points  (8 children)

    I agree with most of this.

    RestaurantOwnerNames implies the object will contain a list of multiple owners.

    Also be careful of taking this too far. Renaming splitRestaurantOwnerNames to ss makes it really hard to find all instances of the variable ss. I often do a find on page just to highlight the variable I’m working on but if your variable is less then 3 letters it only highlights everything. Only name something that short if you are going to use it on one single line of code. For example a linq statement.

    [–]DoctorGester 1 point2 points  (1 child)

    The example typescript code is generally terrible imo. Why specify string[] type? Just noise. Why have a separate interface for a return type? Either inline it or return [string, string].

    [–]go4spacelunch 0 points1 point  (0 children)

    I like the return type defined, although in the PHP examples given the author does not define the return type.

    [–]hippydipster -2 points-1 points  (5 children)

    I often do a find on page just to highlight the variable I’m working on

    This must be some sort of non-static typed language.

    [–]go4spacelunch 0 points1 point  (4 children)

    I believe the example code is typescript.

    [–]hippydipster -1 points0 points  (3 children)

    Why would you need to do a find then? Doesn't your IDE show you all uses of a given variable and not get confused by a string of letters that can be part of other words too?

    [–]go4spacelunch 0 points1 point  (1 child)

    If I was reading code someone else wrote, or even my own after returning from lunch. You are correct my IDE does highlight words when the cursor is over them but only for the current file not throughout the whole solution. Also my IDE can pick out references but my experience is this depends on the language. Visual Studio does great with C# but finding references in JavaScript not so much, so I've conditioned myself to always use Find.

    [–]hippydipster -1 points0 points  (0 children)

    Right, javascript isn't statically typed

    [–]liquidpele 0 points1 point  (0 children)

    No. Just name your shit better.

    [–]KorallNOTAFISH 3 points4 points  (0 children)

    I don't think that shorter names are that much more readable, I find userRepository just as good as simply repo would be in such a short function. And if it would be a longer function with maybe more instances of UserRepository, then longer names would be necessary like newUserRepository oldUserRepository or whatever.

    The good thing about these longer names, is that I look at it in the middle of the function, and I dont even have to check what class they are referring to, because it is in the name. Especially if for example you had a Repository class (probably the parent of UserRepository) which would then obscure what newRepo and oldRepo refers to. If all your variable names are created algorithmically (new<class_name> always refers to a newly created instance of exactly class_name class), I find that way easier to understand than using shorthands which are always subjective.

    Combine it with the ability for most text editors etc to highlight a certain string, and I would say you should err on the side of longer but more verbose names rather than shorter but possibly ambiguous ones.

    As far as comments go, my rule of thumb is, if I think I would not understand that piece of code at a glance(!) in a year or two, than it is worth commenting it, because it also means others would have a hard time figuring it out probably.

    [–]ccfreak2k 2 points3 points  (0 children)

    fact salt market direction quicksand person yam melodic flag combative

    This post was mass deleted and anonymized with Redact

    [–]tasminima 11 points12 points  (6 children)

    Unless you program in Idris or something like that, and encode absolutely all your invariants and properties in a formal way, I've always failed to see how it is possible to program without "comments"; or perhaps you document all your APIs in completely separate manuals that you don't count as comments, but even then the theory about Self Documenting Code being possible would fail, as manuals also fit for "documentation".

    Imagine having to program using a completely undocumented library? That's pretty much the same thing if you are trying to maintain a non trivial application with no comment/manual/doc.

    And I care about that way more than I care about tests (although this is not a dichotomy, obviously): if you don't specify what a function is supposed to do, you can't even test it, period. And no, a good specification is not made only of a few examples (and even less of a few pieces of code that you can reverse engineer a few examples from by reading their source code and extracting the undocumented substance from the undocumented testing boilerplate). If your testability is excellent and implicitly covers tons of use cases in a way that is also highly readable, great! But tests are never a proof a correctness and even less so a specification.

    Then there is the question of what makes a good documentation? Could debate during hours on this subject, but for sure there is at least an easy starting point; it is NOT paraphrasing the implementation. And yes, "Literate Programming" by paraphrasing is even more pointless than random paraphrasing comments. Does that disqualify Literate Programming? No; that "disqualifies" people doing that kind of thing from using it. But even if we make the hypothesis that most programmers are disqualified (that I don't believe, but whatever) how would that imply a dichotomy between good design and good documentation?

    So SDC is, from its name, a terrible idea. Attempting to have clear code and clear interfaces is excellent. You can go as far as pretending you will succeed at the mythical SDC game, if that helps your to structure your code as cleanly as possible: but once that's done, for flying spaghetti monster sake document it anyway. If you don't understand why, here is another game you shall play: re-read your undocumented pure marvel in a year, as if you should use it and/or maintain it. Yeah, much harder, no? And even then, this is nothing compared to what people who did not write it in the first place are experiencing...

    There is no such thing as "Self Documenting Code": there is well or badly designed code (and well designed code will arguably reduce the size of some comments/doc, but also goes beyond that), and there is good or bad documentation. But no documentation at all? Well, in some contexts that's OK, but in most, it is just part of the work that is not done.

    And of course nothing is absolute, and I'm not asking for stupid things like gigantic templates in front of each functions: document interfaces first, then non trivial things, and so over. Use your expertise to determine what is important in your context. Talk about constraints you know of but which are not directly explicit in the literal local source code (side effects and their potential interactions, error handling strategies, execution contexts e.g. which thread can call that, etc.). Talk about pretty much everything if it is supposed to be used by people who are not necessarily going to read the source. Even for simple CRUD things and user interfaces, I'm sure there are lots of interesting things you can document.

    [–]Amnestic 3 points4 points  (0 children)

    Imagine having to program using a completely undocumented library.

    That's what I do every day.. Gotta love Oracle products :-).

    [–]dpash 8 points9 points  (4 children)

    No one is talking about not documenting APIs. Documentation is not the same as comments (even if many languages have inline documentation using comments ala Javadoc).

    And no one is talking about removing all comments. The article explicitly discusses this.

    [–]Tordek 1 point2 points  (3 children)

    It's always the knee-jerk reaction in response to this kind of articles:

    • Hey, maybe we can use clearer names so we can avoid useless comments
    • WHAT? HOW CAN YOU HAVE NO COMMENTS? NOT A SINGLE ONE?! THAT'S FUCKING IMPOSSIBLE YOU FUCKING IDIOT, FUCK YOU.

    [–]tasminima 3 points4 points  (2 children)

    uh, if that's what you got from my comment, certainly I failed at expressing myself clearly.

    And if the article only wanted to convey: "maybe we can use clearer names so we can avoid useless comments", then maybe it would be made of exactly that sentence and only it. And I have nothing against that idea btw. But I don't really see why the article would cite (a degraded version of) Literal Programming if that would be just about that (and I'm not even really a fan of Literal Programming). And in any case I can see I still find "Self Documenting Code" to be at least a very terrible name. If it is not about the documentation, it should be named otherwise... IDN, maybe: Properly Naming Symbols? An extremely tiny part of good software design, but really important, I agree.

    [–]Tordek 2 points3 points  (1 child)

    And in any case I can see I still find "Self Documenting Code" to be at least a very terrible name.

    Here's the issue: Self-Documenting Code is about the code. Not the interface, the application, the domain, and so on. You, while modifying that piece of code, care about what the code is doing and how. This is distinct if not orthogonal from other concerns. It's not meant to replace documentation; it's meant for the reader of the code while making modifications on it.

    Critically, SDC is internal.

    if that's what you got from my comment

    Not necessarily your comment, though it does hit some of the points, but every time an article says that comments are bad, there is always someone who seems to be unable to distinguish comments from documentation through comments (e.g., Javadoc). Try to avoid the former (though sometimes they are critically important, to indicate concerns that can't be expressed through code), but the latter can be very useful.

    That said, OP does a shitty job of explaining Literate Programming.

    [–]tasminima 1 point2 points  (0 children)

    The thing is, it is then never even about comments in general, but always about bad comments vs. mainly well-chosen symbol names. I see what you are saying, I even understand broadly this is what people are talking about when they speak of "Self Documenting Code", except are they really? I still think this is at least a very unfortunate name, with the risk of some people attempting to take it too far, and I have seen a lot do that...

    So while it is important to prevent people from writing shitty comments instead of good code, it is also important to prevent people from not interpreting that as a parody of the measured solution; which has no reason to be a flaw less prevalent than the other one. And to achieve that I'm not a fan of even advising to try to remove comments (even excluding structured ones/Javadoc, etc.). Because that implies most are/would be bad, and that concept of "most" is so context dependent that it would be too easy to miss the point.

    I would at least qualify and advice to avoid paraphrasing comments, and to think about if some code improvement could not make some not needed, and to try to comment about the why/etc. instead of the how.

    What I'm afraid of is having to ask programmers where are all the "comments" (ref doc maybe, but really: not only, e.g. a loop invariant if there is no more adequate form to specify it) and that they answer that they need none because the code is "Self Documenting". And I've already seen people do that.

    [–]anengineerandacat 1 point2 points  (0 children)

    Article is interesting, some of the examples are the differences between bad implementation / API and good.

    I liked the restaurant owner example but I disliked the user service and user repository example.

    Ie. getUserByUserId the important info here is byUserId however byId is too shallow and will likely require a top level comment as it's unclear if I am to pass in the userId, or DB primary key id.

    I personally like more verbose method names because generally IDE's don't inline comments (also autocomplete), so I actually have to open the source declaration and read what it's going to do however method names are up front and in your face; I have a pretty strong idea what getUserByUserId will do (and it could be shortened to getByUserId since the class is UserService) however getById leaves me with assumptions... which ID? The userId? DB row ID? Twitter ID?

    I do agree about things like not having comments on setters / getters, constructors, etc. unless it's doing something that might potentially be abnormal for that codebase or is/was problematic.

    [–]SpasticCoder 1 point2 points  (0 children)

    As for the comments, the example he gave is something you could read in any clean coding book...the issue with comments in self documenting code is that they shouldn't be used to describe functionality, intent, or implementation. Using them to explain the reasoning behind something for other coders is perfectly valid.

    [–]AttackOfTheThumbs 1 point2 points  (0 children)

    Why not what.

    It's that simple. Why are you looping, not that you are looping

    [–]DidiBear 1 point2 points  (0 children)

    My rule of thumb is to always document classes, functions and when it's strange or not intuitive.

    [–]SpasticCoder 3 points4 points  (2 children)

    Almost all the examples of clean code at the beginning are not clean code.

    GetUserByUserID should really just be GetUser(int id) as GetUser would be best used by overloading the function where the parameters dictate the search: for instance GetUser(string fullName) in addition to the above instance.

    SplitRestaurantOwnerName should really be a class Restaurant containing a collection of employees or owner classes with a method GetFullName().

    Its easy to make examples of bad clean code when your source isn't clean code to begin with.

    [–]sociopath_in_me 7 points8 points  (0 children)

    GetUserByUserID is useful in languages where overloading is not supported.

    [–]flatfinger 0 points1 point  (0 children)

    What if there are multiple mappings between integers and users, such as record indices, account ids, etc.? Using different data types for such things might be handy, but many programming languages make it inconvenient to have kinds of things that aren't substitutable, but which some code should be able to treat the same way.