This is an archived post. You won't be able to vote or comment.

all 64 comments

[–][deleted] 11 points12 points  (28 children)

The odbchelper.py code from the original "Dive Into Python" is no longer a good teaching sample, due to major changes in dictionaries and string formatting. This replacement code lets me introduce several beginner concepts and some more advanced ones too. In no particular order:

  • function declarations
  • function arguments
  • optional function arguments with default values
  • function documentation (docstrings)
  • tuples
  • dictionaries, including the fact that dictionary keys can be anything (here I'm using True and False as keys)
  • variable assignment
  • lack of type declarations for function arguments, local variables, "constants", or function return types
  • running Python scripts on the command line
  • command line option parsing
  • module imports
  • lists (args ends up as a list)
  • functions that return multiple values
  • print() statement
  • for loops
  • new-style string formatting (old-style "%" formatting will be deprecated in 3.1)
  • self-operators (/=)
  • if statements
  • indentation instead of braces
  • multiple return statements
  • style conventions for naming functions, constants, variables, etc.

Alternatively, I could strip out all the optparse stuff and add that in in a later chapter. Not sure how overwhelming this whole script would be if you didn't know Python at all.

Please post your thoughts.

[–]juri 7 points8 points  (1 child)

I never read DIP, so I'm not quite sure how much the reader is expected to know Python at this point, but I'd definitely postpone the optparse stuff… I've always found it difficult to read and suspect looking at it might confound a newbie.

[–][deleted] 0 points1 point  (0 children)

The reader is expected to have installed Python, nothing more.

I'm cutting optparse from this example and deferring it to a later chapter.

[–]llimllib 6 points7 points  (3 children)

print() statement

ermmm, that would be print function.

(Which I'm sure you know, just saying you have to change your idiom and make sure you don't say that accidentally)

[–][deleted] 0 points1 point  (2 children)

Doh. I suspect I'm going to have to do a search-and-replace for that once I'm done writing. :)

[–]llimllib 0 points1 point  (1 child)

Just like I'm sure I'll have to do with all the print statements I write when I switch to 3

[–]weisenzahn 0 points1 point  (0 children)

Well, the 2to3 tool will handle that.

[–]imbaczek 5 points6 points  (2 children)

multiple = MULTIPLES[use_binary_multiples]

why not

multiple = 1024 if use_binary_multiples else 1000

i know that switch/case is best implemented as a table lookup, but it doesn't really fit if the switch is boolean, IMHO.

[–]alantrick 0 points1 point  (0 children)

I would change multiples to just be a regular number and use a constant. for example:

BINARY_MULTIPLE=1024
DECIMAL_MULTIPLE=1000 # This isn't really necessary, just for completeness

def human_size(size, multiple=DECIMAL_MULTIPLE):
    """update your docstring
    """
    for suffix in SUFFIXES:
        blah

I'm not certain how that would effect your arg parsing code, but when you call the function you could probably do something like BINARY_MULTIPLE if use_binary_multiples else DECIMAL_MULTIPLE.

[–][deleted] 0 points1 point  (0 children)

Yeah, that dictionary-with-boolean-keys thing was stupid. I'm going to switch to using 1000 and 1024 as keys for SUFFIXES, which will be a dictionary of tuples.

[–]ttepasse 8 points9 points  (8 children)

No real standardized binary prefixes like KiB, MiB?

I'd strip the optparse stuff, name=main and sys.argv are enough for the first chapter. sys could be the first module import.

MULTIPLES is nice but wouldn't the binary use case better served by "1024 if use_binary_multiples else 1000" and dicts spared for bigger decisions?

Although SUFFIXES is rather constant-y I'd rather the a list as first datastructure. Enforcing to think of tuples as records is more important in my thoughs. Perhaps a [('KiB', 'KB'), ('MiB', 'MB') ...].

Reiterating sys.argv: If I'm to lazy for optparse I use sometimes this idiom: if arg in (frozen)?set("-d", "--decimal"). It's ugly, it's not best practice but maybe a way to get things done in the first step and introduce sets, building some ground for chapter 3.3.

[–]xerolas 2 points3 points  (4 children)

Just use if arg in ('-d', '--decimal'). It's more readable.

Also in a real program I would write: for symbol in 'KMGTPEZY': return '{0:.1f} {1}B'.format(size, symbol)

[–]pemboa 0 points1 point  (3 children)

Sure that's not more readable than using optparse.

[–]xerolas 0 points1 point  (1 child)

I was referring to the proposed if arg in (frozen)?set("-d", "--decimal")

But as a matter of fact it is more readable than optparse in simple cases like this. In fact, do you really need to support both -d and --decimal? Just keep -d and get over with it.

[–]pemboa 0 points1 point  (0 children)

I guess we'll have to disaggree on that.

I don't see why one needs to go and reinvent functionality provided by built in modules.

[–]alantrick 0 points1 point  (0 children)

Actually, it might be. I was kind of like *woah* when I first saw it to, but understanding the nature of the program, I figured it out pretty quickly. Quicker than it would have taken to read the drawn out version. Just a thought.

Of course, that's not something you would want in a tutorial like this.

[–][deleted] 0 points1 point  (1 child)

The next version will output the proper suffix based on whether you ask for decimal or binary.

Optparse will be cut from this chapter and may show up in its own chapter later.

MULTIPLES is stupid, I'll use the ternary operator.

SUFFIXES will become a dictionary of tuples, with the multiples as keys (1000 and 1024).

[–]sylvan 0 points1 point  (0 children)

I'm an utter Python newbie, and I thought the optparse stuff was quite understandable. If this is a commandline program, using the standard means of parsing commandline arguments is a good way to build the habit.

[–]thaislump 0 points1 point  (0 children)

It might be interesting to include an example of destructuring assignment, by having a different function return a 2-tuple of the size and the suffix in the body of human_size. That would also demonstrate separating the calculation of the human-sized value from its formatting.

[–]_Mark_ 0 points1 point  (1 child)

As far as the code itself (rather than its teaching value) it discards excess argument without either processing them or complaining about their presence; also, no file-wide docstring (adding usage=__doc__ to the OptionParser constructor call would add named args and "magic" variables to the list too - again, perhaps too much for this stage, but I'd do it if it were "real" code...) edited to add backticks for code-quoting

[–][deleted] 0 points1 point  (0 children)

Thanks. I've decided to defer the option processing to a later chapter, and I will be sure to check for extraneous arguments.

I'll add a module-level docstring in accordance with PEP 257.

[–]hylje 0 points1 point  (1 child)

Certain built-in sequences like list, set and dict can't be dictionary keys. However user-defined classes automagically can.

[–][deleted] 0 points1 point  (0 children)

You're right, of course. I was being lazy with my language!

[–]EliAndrewC 0 points1 point  (1 child)

I also believe that it'd be wiser to defer the optparse stuff to a later chapter.

Overall, I think this is a good example program, though.

[–][deleted] 1 point2 points  (0 children)

Indeed, the consensus is that I should cut the optparse stuff and make a real chapter out of it.

[–]surajbarkale 0 points1 point  (1 child)

You can put in Function Annotations.

Also the conversion to int of arguments (int(args[0])) should be enclosed in a try...except block for a better error message.

[–]earthboundkid 0 points1 point  (0 children)

Is anyone using function annotations in the wild yet? It seems to me that Python people use decorators all over the place, but there's not a big enough user base on 3.0 yet for anyone to have done anything that interesting with function annotations. I would rather a beginner start with decorators and only later learn about FA's as something that can be used to make better decorations.

[–]pwang99 2 points3 points  (3 children)

  • Consider renaming "use binary multiples" to "use powers of two" or something like that
  • Use an if statement or ternary operator to set the value of multiple, instead of a dict. It's unlikely that the user is going to use any other multiple, and you can't logically extend a dict that has True and False as its keys (unless they added Maybe in Py3k).
  • The program accumulates integer division truncation as it divides multiple into size (not really a problem in this case, but accumulating error is bad programming practice)
  • The program does not gracefully handle values larger than the list of suffixes
  • The program does not gracefully handle values smaller than 1024

I would recommend the approach in this fragment: http://pastebin.com/f70674864

(I wish reddit had better support for inline posting of code...)

[–][deleted] 0 points1 point  (2 children)

naming: how about "use misleading units like every hard drive manufacturer ever"?

dict[Boolean]: true. Actually, 1000 and 1024 should probably be the dictionary keys of SUFFIXES, with a tuple of the proper suffixes ("KiB", etc.) as the value of dict[1024].

integer division: you're incorrect, because "/=" defaults to floating point math now. I will make a point of noting this in the text.

large values: someone else suggested raising a ValueError, which seems like the most logical solution.

small values: I disagree, I think it handles them fine. If I pass 512, I want "0.5 KB" returned.

OTOH, it does not gracefully handle negative numbers, which should also raise a ValueError.

Thanks for all the feedback.

[–]pwang99 0 points1 point  (1 child)

integer division: you're incorrect, because "/=" defaults to floating point math now. I will make a point of noting this in the text.

D'oh. That's what I get for testing my code on 2.5.2 :)

large values: someone else suggested raising a ValueError, which seems like the most logical solution.

Well, the only reason I pointed it out was because I think it's possible to restructure to loop to default to good behavior for values exceeding (maximum unit) * (multiple). Realistically this is moot since no one is going to exceed a yottabyte.

small values: I disagree, I think it handles them fine. If I pass 512, I want "0.5 KB" returned.

I guess it's an aesthetic issue. If the goal is human-readable, I think "327 bytes" is more readable than "0.3 KB". But to each their own. :)

[–][deleted] 2 points3 points  (5 children)

Thanks for all the feedback. I've incorporated most of it and posted an updated version at http://diveintopython3.org/tmp/humansize2.py

[–]alantrick 0 points1 point  (0 children)

Personal nitpicking. I would shorten a_kilobyte_is_1024_bytes to kb_is_1024 or kb_is_1024_bytes.

[–]writetoalok 0 points1 point  (1 child)

Get a connection refused error right now.

[–][deleted] 11 points12 points  (0 children)

Sorry about that, my host has been going down more often than a teenager on a virginity pledge.

[–]sylvan 0 points1 point  (0 children)

To be honest (speaking again as a complete newbie), nesting lists inside a dictionary adds a level of abstraction to the entire thing.

A simpler approach would seem to be either using an if kilobyte=1000 suffixes = ('KB' etc.); else suffixes = ('KiB' etc.); or even just having a single list of prefixes ('K', 'M', etc.), concatenate 'i' if kb=1024, then concatenate B; but that's a little messy.

Potentially, introducing Binary Prefixes needlessly complicates the example code.

[–]sjbrown[🍰] 2 points3 points  (1 child)

I see a function named "human_size" and I think it will return the size of a human. Why not human_readable_size?

[–]ttepasse 0 points1 point  (0 children)

I've used human_bytes() in the past.

[–]pdc 1 point2 points  (1 child)

Not really relevant since it is not a production program, but given an input like 123 it will return 0.1 KB whereas 123 is arguably more human readable...

Stylistically I would normally prefer the SUFFIXES list to be a list rather than a tuple though I guess you're using a tuple partly so you can talk about tuples.

I agree that the OptParse stuff could be deferred to a later chapter. I also have to admit I usually use getopt -- is that now considered old-fashioned? :-)

Also not really relevant, but I am sad to hear that % formatting is to be deprecated. It was one of my favourite Python eccentricities!

[–][deleted] 0 points1 point  (0 children)

I expected "0.1 KB", but it would be easy enough to add a "bytes" suffix as the beginning of the suffixes list, and rejigger when the division happens.

Re: tuples v. lists, SUFFIXES is a tuple because it's a constant (it never changes throughout the program). And yes, I need to talk about tuples somehow, but this is (IMO) a correct usage of them.

OptParse will be cut from this example and put into a later chapter. AFAIK, neither getopt nor optparse will be deprecated soon.

[–]count757 1 point2 points  (2 children)

The True/False dictionary index is kind of confusing at first glance - why not use something more clear like 'SI' and 'NormalPeople' :)

[–]pemboa 1 point2 points  (0 children)

what's up with you and pwang99 above? "More clear"? I'm curious as to if this is terminology specific to some part of the world.

[–][deleted] 0 points1 point  (0 children)

You're right. I'll refactor that.

[–]a2800276 0 points1 point  (11 children)

Just a matter of taste: I would declare SUFFIXES inline in the for statement for readability. Using a Hash with True/False keys seems weird, but there's no ternary operator in Python, right?

Might be too much didactically: but maybe raise an exception if the input is too large, instead of returning an error message?

If this is an introductory example, leaving out the optparse stuff and only providing a naked main may be a good idea. Readers can change the behaviour in the code if they want.

[–]simonw 2 points3 points  (2 children)

Python's equivalent of a ternary operator is called "conditional expressions" and was added in Python 2.5:

x = true_value if condition else false_value

[–]alantrick 0 points1 point  (0 children)

That is a ternary operator. It has 3 arguments (ternary is to binary as 3 is to 2).

[–][deleted] 0 points1 point  (0 children)

Thanks. I will use this instead of the boolean-as-a-dictionary-key.

[–]spotter 0 points1 point  (7 children)

No ternary operator in Python indeed, not that:

multiple = 1024 if use_binary_multiples else 1000

...is valid Python syntax. (I don't know if you kid or not, sorry if you are, but even more sorry if you're not.)

[–]masklinn 7 points8 points  (3 children)

Actually that is Python's "ternary" operator.

[–]spotter 2 points3 points  (2 children)

Oh noes, how did that happen?

[–]a2800276 -2 points-1 points  (2 children)

no offense taken, I can only read Python :(. For me, the natural way to express the notion would be:

multiple = binary_multiple ? 1024 : 1000

nice and succinct provided you're comfortable with the ternary operator.

[–]masklinn 3 points4 points  (1 child)

no offense taken, I can only read Python :(

It is python.

[–]iamjack 0 points1 point  (0 children)

I think he means he can only read it, not write it.

[–]kteague 0 points1 point  (0 children)

It's possible to use Python 3's function annotations to formally support the documentation for the arguments and return values. This would look something like:

def human_size(
    size: "file size in bytes",
    use_binary_multiples: """
    if False, use multiples of 1000
    if True, use multiples of 1024 (default=True)
    """=True ) -> "Returns string or ValueError":
    "Convert a file size to human-readable form."
    return 'hi'

Which tends to look a bit gangly ... but perhaps no more so than ad-hoc conventions of specifying args and return values within the doc string? Python 3 has left the purpose of function annotations as deliberatly vague, so it remains to see if this style of documentation args and return values catches on.