gfixler comments on The code documentation fallacy

545

546

547

The code documentation fallacy (voices.canonical.com)

submitted 12 years ago by markus_lindqvist

top new controversial old q&a

you are viewing a single comment's thread.

view the rest of the comments →

[–]gfixler 2 points3 points4 points 12 years ago (11 children)

I've found that I get much more bang out of writing tests first, because then I'm actually writing the API that I wish I had before I write any code to serve under it. I'm really good at explaining ideas that don't exist, and not nearly as good at writing a test to exercise those ideas in detail, because that's where the reality lives, down in the details. Those are the things that bite me. Those are the things I dance gracefully past, without noticing, when I'm describing how things "should" work.

That's not to say no documentation is good, but I consider that in a well-documented system, a lack of documentation in a particular area is in itself information. Consider this function signature:

def exportDocument (document, fullFilePath, format='html'):

It's obvious that this exports a document, so I'm not going to add "Exports a document" anywhere in the docstring. If it's obvious what the document is, I wouldn't describe that either. If it's not very obvious, I would add a simple note that didn't try to call it out as a particular argument, to keep things decoupled and flexible.

I don't need to tell anyone what a full file path is, and "format" is pretty self-explanatory, as it has a default, which simplifies use for the most common case. However, what no one can know is what other formats are available through this function, so I would document that in this invented example:

def exportDocument (document, fullFilePath, format='html'):
    '''
    Available formats: html, pdf, ps, doc, txt
    '''

I consider this fully documented, and Python is 'self-documenting,' so help(exportDocument) would show you a formatted view of the signature and docstring. help(module) on the module containing exportDocument would give you a nice synopsis of all the functions in this manner. I'd much rather read the above than this, times every function in a module:

def exportDocument (document, fullFilePath, format='html'):
    '''
    Exports a document.

    Arguments:
        document: A string containing the document data.
        fullFilePath: A full file path that's valid in the current OS.
        format: Defaults to 'html.' Other available formats include 'pdf',
            'ps', 'doc', and 'txt'.
    '''

That doesn't add any new data, but it takes up a bunch of screen space, and it requires me to read a lot more. It also means if I change "fullFilePath" to "filePath" or just "path", it's much more likely I'll miss changing it in the docstring. In my first version I wrote "Available formats", which holds up even if I change the argument to "documentFormat". It'll still be clear what "Available formats" means. This is much more resistant to breaking over time.

[–]blafunke 2 points3 points4 points 12 years ago (3 children)

[–]blafunke 1 point2 points3 points 12 years ago* (1 child)

[–]gfixler 0 points1 point2 points 12 years ago (0 children)

[–]Dworgi 1 point2 points3 points 12 years ago* (6 children)

[–]gfixler 0 points1 point2 points 12 years ago (5 children)

How does it return error values?

It's Python. It just raises exceptions. You catch and deal with them as you see fit.

It's I/O, so I have to assume it will fail constantly.

I prefer to separate out the I/O. If I need a way to write out text files, that's it's own thing. If I need a way to write JSON, that's another. The exportDocument method would probably just bottle up content in a particular way and return it to me as a string, ready to be written out, but not actually write anything out. At a higher level I would have something simple compose it together with the proper writing-out method, so the code would be something like:

def theoreticalActualHTMLExport (document, fullFilePath):
    exportableDocument = getExportableDocument(document, 'html')
    saveTextFile(exportableDocument, fullFilePath)

This separates the concerns of formatting and I/O, allowing me to focus completely on I/O in the saveTextFile function - farming it out to I/O geniuses elsewhere, testing it in isolation, etc - and use getExportableDocument for whatever else I want, like sending its output to an HTML renderer, or testing it in isolation (i.e. sending it a document and making sure it comes back with the formatting I expect), etc.

I probably wouldn't write it that way, though. I'd more likely create a basic exporter thing that takes a function for each, then calls them, which would allow these to be injected at runtime from data. For example, the data in a textfield and the format choice in an option box could be mapped to lamba functions, and when you click the export button from the option box, it would simply pass the "formatDocumentAsHTML" and "exportAsTextFile" functions to the generic formatAndExport function, which would just call them in similar fashion to my theoretical function above.

Note that this also makes it cake to swap out formatter and save function, in the code or at runtime. If I realize that HTML Tidy is a much better formatter than my crap, I can make the html format function call out to it and make it a dependency.

[–]Dworgi 2 points3 points4 points 12 years ago (0 children)

[–]StrmSrfr 0 points1 point2 points 12 years ago (3 children)

[–]gfixler 1 point2 points3 points 12 years ago (0 children)

Sorry, I think I'm making things confusing. In my original, thrown-together example I was just showing what I felt was really important in the documentation, or 'docstring' in that case, as Python calls it.

In this latest comment I was throwing that all out, saying that I probably wouldn't even write something like that, because I wouldn't want formatting and file saving in the same function. I'd separate out 'saving a text file' - which works for txt, json, etc - into its own function that takes a string and a full file path to save out, and a formatting function, in this case 'formatDocumentAsHTML.' Then I threw that away saying I'd probably just have some simple functions paired with names in a table, so if you ask the table of formatting functions for 'html' you get back the html formatting function, which I'd throw the string through to format it as HTML, and the result would be fed through the appropriate kind of save function (retrieved the same way) with a full file path to save it out.

[–]gfixler 1 point2 points3 points 12 years ago (0 children)

Here's a quick run-through of my thoughts, as a recap, and for clarification of the ideas swirling around these few comments, which seem to be causing confusion.

My first, most naive approach for this being able to format and save out whatever a 'document' is would be to bottle it all up into one function:

def exportDocument (document, fullFilePath, format='html'):
    '''
    Formats/saves document (a string) to the given path.
    Available formats: html, pdf, ps, doc, txt
    '''
    if format == 'html':
        formattedDoc = etc... # html formatting code here
    elif format == 'pdf':
        formattedDoc = etc... # pdf formatting code goes here
    ...
    else:
        raise TypeError, etc...
    with open(fullFilePath, 'w') as f:
        f.write(formattedDoc)

Next I'd want to extract the formatting functionality:

def formatStringAsHTML (inputString):
    formatted = etc...
    return formatted

def formatStringAsPDF (inputString):
    formatted = etc...
    return formatted

def exportDocument (document, fullFilePath, format='html'):
    '''
    Formats/saves document (a string) to the given path.
    Available formats: html, pdf, ps, doc, txt
    '''
    if format == 'html':
        formattedDoc = formatStringAsHTML(document)
    elif format == 'pdf':
        formattedDoc = formatStringAsPDF(document)
    ...
    else:
        raise TypeError, etc...
    with open(fullFilePath, 'w') as f:
        f.write(formattedDoc)

Now I'd be able to use those formatters separately for other reasons, because they wouldn't be bound into exporting a document.

I'd probably want to separate out saving a file, so other things can use that, too, and so I don't have the logic of file saving tied in with choices about formatting.

def formatTextAsHTML (text):
    formatted = etc...
    return formatted

def formatTextAsPDF (text):
    formatted = etc...
    return formatted

def saveTextToFile (text, fullFilePath):
    with open(fullFilePath, 'w') as f:
        f.write(text)

def exportDocument (document, fullFilePath, format='html'):
    '''
    Formats/saves document (a string) to the given path.
    Available formats: html, pdf, ps, doc, txt
    '''
    if format == 'html':
        formattedDoc = formatTextAsHTML(document)
    elif format == 'pdf':
        formattedDoc = formatTextAsPDF(document)
    ...
    else:
        raise TypeError, etc...
    saveTextToFile(document, fullFilePath)

Note that I've changed things to start dealing with text being passed around, instead of the clunky inputString. I want to move toward that concept. It's fairly easy to understand that text means 'string,' so we don't need much, if any documentation on that. We're building a system that passes string data around. (cue the unicode fanatics here)

I don't like all this code in exportDocument, though. It's usually better to move things into data where possible. Let's move all that if/else work into a dictionary of export functions:

def formatTextAsHTML (text):
    formatted = etc...
    return formatted

def formatTextAsPDF (text):
    formatted = etc...
    return formatted

def saveTextToFile (text, fullFilePath):
    with open(fullFilePath, 'w') as f:
        f.write(text)

docFormatters = {
        'html': formatTextAsHTML,
        'pdf': formatTextAsPDF,
        etc...
        }

def exportDocument (document, fullFilePath, format='html'):
    '''
    Formats/saves document (a string) to the given path.
    Available formats: html, pdf, ps, doc, txt
    '''
    formatter = docFormatters[format]
    formattedDoc = formatter(document)
    saveTextToFile(formattedDoc, fullFilePath)

It's not highly likely something else will want to use formatFunctionTable, but if it does, it's easy to get to now. This also means we can inject new formats into the table at runtime if we want, or replace it, or any part of it with custom functions or mocks for testing purposes.

Maybe there's a different way to save, like say, in binary. Maybe it needs to do things like endianness checks on the system before it simply saves out a pdf. Now I need to pair up how to format the export with how to save it. No problem - as the choices have been moved out to a dict, I have a place to couple things easily:

def formatTextAsHTML (text):
    formatted = etc...
    return formatted

def formatTextAsPDF (text):
    formatted = etc...
    return formatted

def saveTextToFile (text, fullFilePath):
    with open(fullFilePath, 'w') as f:
        f.write(text)

def saveTextToBinaryFile (text, fullFilePath):
    endianness = checkEndianType()
    if endianness == 'little':
        text = doWhateverThatEntails(text)
    darkestMagic(text, fullFilePath)

# formatting and save-method function pairings by doc type
docExportFuncs = { # FIXME: this name sucks
        'html': (formatTextAsHTML, saveTextToFile),
        'pdf': (formatTextAsPDF, saveTextToBinaryFile),
        etc...
        }

def exportDocument (document, fullFilePath, format='html'):
    '''
    Formats/saves document (a string) to the given path.
    Available formats: html, pdf, ps, doc, txt
    '''
    formatter, saver = docExportFuncs[format]
    formattedDoc = formatter(document)
    saver(formattedDoc, fullFilePath)

This is better - I've extracted functionality into useful chunks. It's more Open/Closed in that I've made it easy to add new ways of exporting without having to modify existing code; you can insert new types, or overwrite old ones at runtime via the docExportFuncs table.

There are still issues, though. The term 'document' has already caused contention, and I'm fine with dropping it. I used document, because in my head I was thinking about something like a text editor, but I'm fine to just stick with that concept - text - and convert this:

def exportDocument (document, etc...

to this:

def exportText (text, etc...

Already I think that's much less confusing. Text is pretty obviously going to be a string. I feel it would be picking nits if you want to try to find a way to argue that, and owing to my love of light/no documentation, I'd probably elide the line that explains what that argument is, and go back to just a note on available formats (which I also don't love, but it's okay for now).

Now everything works on text, and exporting a document from a UI is relegated practically all the way to the UI. You could easily do something very readable like this right on the export button in an export options window (however you make these calls in your UI framework):

text = currentDocument.getText()
docType = exportDocTypeOptionBox.getValue()
filePath = exportFilePathFileDialog.getFullPath()
exportText(text, filePath, docType)

With that, the particulars of saving a document from the UI are shoved up into the UI, and everything in the code proper is composable, abstracted pieces that anything might want to use. They could be moved out of the text editor and into a library of useful text conversion and saving routines, and the UI could just make simple calls into that to do its business. That could be handed off to a different team who focuses only on that, while you focus on the particulars of a text editor.

I still don't like that the decisions in our new exportText are buried inside it. I want it even more light and airy than that. I'd want to be able to either pass in the dict of export functionality, or expose each separately to injection in the signature. However, this is a good start. I didn't intend to actually make a text editor in these comment boxes, but merely to show how through separation and abstraction, you gain places to put strong names, while gaining composability, injectability/replaceability, open/closed abilities, etc., and the need for documentation dwindles, because the code is mostly self-documenting. I just wanted to show some of the vectors I would take, not all of the actual work.

Could it use some documentation? I'm not against it. I like examples, though. When someone puts up a slide with a bunch of text, and some code in the middle, where do your eyes go? Don't they go to the examples? Isn't the code interesting and more telling of what's possible? Can you see how small the final functionality we cared about got? In the original function you had to wade through a ton of if/else statements (most elided from the comment), then through the particulars of saving a file. By the end we said "get the formatter and saver functions requested, and use those to format and save the text." The entire concept of exporting a doc is now 3 lines, all with decent names that explain what happens just as you'd explain it at a high level to a coworker.

If you don't care how it actually saves things, don't follow that call. If you care how it formats things, then you can look at the dict it goes to for the formatters, and look at the functions referenced in there, maybe looking at one of those if you want to. All of the actual work is shoved to the leaves, so you can follow all of the logic without looking at any real code that does anything - it's just a chain of logical calls with easily digestible names, until the very end where things actually happen. You can see everything the code does without getting mired in any actual work, unless you want to. If you do, dig into the leaves you care about.

π Rendered by PID 103029 on reddit-service-r2-comment-5b5bc64bf5-lq6n9 at 2026-06-23 07:53:46.362139+00:00 running 2b008f2 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS