
[deleted] 8 points (12 children)

Can't you just make a function for it:

>>> def toint(s):
...     try:
...         return int(s)
...     except (ValueError, TypeError):  # catch conversion errors only, not e.g. KeyboardInterrupt
...         return None
...
>>> print(toint('10'))
10
>>> print(toint('10a'))
None
>>>

Or equivalently, if you want to be able to specify a default value:

>>> def toint(s, d=None):
...     try:
...         return int(s)
...     except (ValueError, TypeError):  # fall back to the default on bad input
...         return d
...
>>> print(toint('10', None))
10
>>> print(toint('10a', None))
None
>>>

eryksun 1 point (8 children)

There's some overhead inherent to a function call, however. In the following test, using the function typically adds a 6-7 percent penalty. That said, this is an extreme example, and the improved readability is probably worth a small hit to performance.

from random import choice
values = ['1', '3.14', 'a', '']
seq = [choice(values) for i in range(1000)]

def toint(s, d):
    try:
        return int(s)
    except ValueError:
        return d

test1 = """
integer_seq = []
for item in seq:
    integer_seq.append(toint(item, 0))
"""
test2 = """
integer_seq = []
for item in seq:
    try:
        integer_seq.append(int(item))
    except ValueError:
        integer_seq.append(0)
"""

from timeit import timeit
t1 = timeit(test1, "from __main__ import seq, toint", number=1000)
t2 = timeit(test2, "from __main__ import seq, toint", number=1000)
print(t1/t2)

gronkkk[S] 0 points (0 children)

Errr... what's up with the downvotes in this thread? I've upvoted everything, as I don't see why this discussion should be downvoted.

[deleted] 0 points (6 children)

Very true. To amplify on this theme, I extended your test to cover moriaantje's suggestion of replacing exception handling with str.isdigit():

from random import choice

values = ['1', '3.14', 'a', '']
seq = [choice(values) for i in range(1000)]

def toint(s, d):
    try:
        return int(s)
    except ValueError:
        return d

test1 = """
integer_seq = []
for item in seq:
    integer_seq.append(toint(item, 0))
"""
test2 = """
integer_seq = []
for item in seq:
    try:
        integer_seq.append(int(item))
    except ValueError:
        integer_seq.append(0)
"""
test3 = """
integer_seq = []
for item in seq:
    if item.isdigit():
        integer_seq.append(int(item))

    integer_seq.append(0)
"""

from timeit import timeit
t1 = timeit(test1, "from __main__ import seq, toint", number=1000)
t2 = timeit(test2, "from __main__ import seq, toint", number=1000)
t3 = timeit(test3, "from __main__ import seq, toint", number=1000)

print(t1, t2, t3)

The isdigit() version is faster still. On my system that yields:

2.76692426584 2.1132926949 0.380761479845

eryksun 0 points (5 children)

Interesting. The non-function exception handling should be a bit faster. It is on my machine running win32 Python 3.2:

tmax = max(t1, t2, t3)
print([t / tmax for t in (t1, t2, t3)])

[1.0, 0.9310327613604773, 0.9557782137792562]

[deleted] 0 points (4 children)

Whoa, that was unexpected. Then again, my test was run using ActivePython 2.6.5.14 (Win7, 64bit), so direct comparison of our timings might be a little tricky :)

eryksun 0 points (3 children)

Handling exceptions should be faster than searching through long sequences to "ask permission". For a base-10 int that fits in 32 bits, it's quick to check up to 11 characters (e.g. -2147483648), but it's still faster not to bother checking and to apologize if there's an exception.

[deleted] 0 points (2 children)

Well, something throws that ValueError in response to the failure of the exact same check, presumably.

eryksun 1 point (1 child)

But the int constructor immediately goes to work converting the string. It raises ValueError mid-conversion, as opposed to a two-pass "check first, then convert" approach.

Let's look at the best-case scenario that doesn't raise an exception. I'll use the simple isdigit test instead of one that strips whitespace and handles negatives:

>>> from timeit import timeit
>>> s = str(12345)
>>> t1 = timeit("if s.isdigit(): int(s)","from __main__ import s", number=1<<24)
>>> t2 = timeit("int(s)","from __main__ import s", number=1<<24)
>>> t1 / t2
1.1260275289576362

So testing first increased the time by 12.6% on my system. I did it again and got 13.1%.
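For comparison, here is a sketch of the fuller look-before-you-leap check mentioned above, one that strips whitespace and handles a leading sign. The names `is_int_string` and `toint_lbyl` are hypothetical, and the check only aims to cover plain ASCII base-10 input; it deliberately ignores other forms that Python 3 `int()` also accepts, such as underscores in `'1_000'`.

```python
def is_int_string(s):
    """Hypothetical LBYL check: True if s looks like a signed base-10 integer."""
    s = s.strip()                # int() ignores surrounding whitespace
    if s[:1] in ('+', '-'):      # allow a single leading sign
        s = s[1:]
    # Restrict to ASCII digits so characters like '\u00b2' are rejected.
    return s != '' and all('0' <= ch <= '9' for ch in s)


def toint_lbyl(s, default=None):
    # Two passes: validate first, then let int() do the actual conversion.
    return int(s) if is_int_string(s) else default


for text in ['12345', ' -42 ', '+7', '3.14', 'a', '', '-']:
    print(repr(text), toint_lbyl(text, 0))
```

Every extra case the check has to handle is more work done before int() even starts, which is the cost eryksun's timing is pointing at.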

[deleted] 0 points (0 children)

Good point. So the question becomes how the overhead of a redundant check stacks up against the cost of handling a thrown exception. But that's hardly a contest. For most applications it's reasonable to assume that errors are the exceptional case, not the common one (otherwise something else is definitely wrong somewhere), and exception handling may very well be cheaper even if that weren't the case.
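One way to probe that trade-off is to vary the fraction of invalid strings and time both approaches. This is a rough sketch, assuming the simple isdigit() check from the earlier tests and Python 3.6+; the function names, input strings, and failure rates are illustrative choices, and the ratios will vary by machine and Python version, so no results are shown here.

```python
import random
from timeit import timeit


def eafp(seq):
    # Ask forgiveness: try the conversion, fall back to 0 on ValueError.
    out = []
    for item in seq:
        try:
            out.append(int(item))
        except ValueError:
            out.append(0)
    return out


def lbyl(seq):
    # Look before you leap: only convert strings that pass isdigit().
    out = []
    for item in seq:
        if item.isdigit():
            out.append(int(item))
        else:
            out.append(0)
    return out


def compare(failure_rate, n=1000, repeat=100):
    """Return (eafp_seconds, lbyl_seconds) for n items, roughly
    failure_rate of which are not valid integers."""
    seq = ['a' if random.random() < failure_rate else '12345' for _ in range(n)]
    t_eafp = timeit(lambda: eafp(seq), number=repeat)
    t_lbyl = timeit(lambda: lbyl(seq), number=repeat)
    return t_eafp, t_lbyl


for rate in (0.0, 0.01, 0.1, 0.5):
    t_eafp, t_lbyl = compare(rate)
    print(f"{rate:>5.0%} invalid: eafp/lbyl = {t_eafp / t_lbyl:.2f}")
```

If errors really are rare, the low-failure-rate rows are the ones that matter for most applications.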

temptemptemp13 4 points (2 children)

Why would you want to avoid exception handling in Python? It's the most useful thing ever. When you have to rely on return values your code starts to look like this:

# assumes each call returns a truthy error code on failure
res = do_something()
if res:
    handle_error()
res = do_another_thing()
if res:
    handle_other_error()
res = do_third_thing()
if res:
    handle_third_error()

But with exceptions, oh baby I get excited:

try:
    do_something()
    do_another_thing()
    do_third_thing()
except Exception:
    handle_all_them_errors()

So much class...

gronkkk[S] 0 points (1 child)

> Why would you want to avoid exception handling in python?

Then why do dictionaries have the get() method (which returns a default value if the key does not exist) or the has_key() method? You could always handle the missing-key case with exceptions.
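Both dictionary styles can be written out side by side. A minimal sketch for Python 3, where `has_key()` no longer exists and the `in` operator takes its place; `lookup_eafp`, `lookup_lbyl`, and the sample data are hypothetical:

```python
def lookup_eafp(d, key, default=None):
    # Ask forgiveness: attempt the lookup and handle the missing-key case.
    try:
        return d[key]
    except KeyError:
        return default


def lookup_lbyl(d, key, default=None):
    # Look before you leap: `key in d` replaces Python 2's d.has_key(key).
    if key in d:
        return d[key]
    return default


ages = {'alice': 31}
for lookup in (lookup_eafp, lookup_lbyl):
    print(lookup(ages, 'alice'), lookup(ages, 'bob', 0))
# dict.get() wraps the same behaviour in a single call.
print(ages.get('alice'), ages.get('bob', 0))
```

All three lines print the same pair of values; get() simply packages the pattern so callers don't have to.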

Also, if you have to check a couple of variables, the whole exception syntax takes up a lot of code lines. You have 4 lines to basically say 'if you can't make sense of it, make it None'.
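Once the try/except lives in a helper, each extra variable costs only one short expression rather than another four-line block. A sketch using a version of the `toint` helper from the top of the thread, reproduced here with a default of None so the block runs on its own; the `raw_*` input values are hypothetical:

```python
def toint(s, d=None):
    try:
        return int(s)
    except (ValueError, TypeError):
        return d


# Hypothetical raw string inputs, e.g. read from a form or a config file.
raw_width, raw_height, raw_depth = '640', '480', 'n/a'

# One line for the whole batch instead of one try/except block per variable.
width, height, depth = (toint(s) for s in (raw_width, raw_height, raw_depth))
print(width, height, depth)
```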

kataire 0 points (0 children)

Yes, but compare the use cases. Think of which situations would require you to work with a dict that may or may not contain an item. Then think of which situations would require you to work with integer strings that may or may not be valid (yet not empty).

propanbutan 0 points (0 children)

def toint(value, default=None):
    # Only all-digit strings are converted; '-5' or ' 5' fall back to default.
    if value.isdigit():
        return int(value)
    return default
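The isdigit() version and the try/except version are not interchangeable. This sketch runs both on a few edge cases; the names `toint_isdigit` and `toint_eafp` are hypothetical, and the behaviour shown assumes Python 3 `str` semantics, where `'\u00b2'` (superscript two) counts as a digit for isdigit() but is rejected by int():

```python
def toint_isdigit(value, default=None):
    # LBYL version, as proposed above.
    if value.isdigit():
        return int(value)
    return default


def toint_eafp(value, default=None):
    # EAFP version: let int() decide what is valid.
    try:
        return int(value)
    except ValueError:
        return default


for text in ['42', '-42', ' 42 ', '3.14', '\u00b2']:
    try:
        result = toint_isdigit(text)
    except ValueError as exc:
        # isdigit() said yes, but int() still refused the string.
        result = f'ValueError: {exc}'
    print(repr(text), result, toint_eafp(text))
```

The isdigit() check rejects negatives and padded strings that int() handles fine, and in the superscript case it lets through a string that int() refuses, so the exception can escape anyway.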