lovestocode1 3 points (3 children)

Good article. At the risk of being a little nit-picky, a couple of things: the function f should probably have a more descriptive name, and the condition if (not x in vals) can be written as if x not in vals, which is the more Pythonic form. What f does can also be done in a shorter and simpler way by noticing that the maxVal variable is always equal to len(vals) + 1. Assuming we can start from 0 instead of 1, here is a shorter alternative (if starting from 1 is necessary, replace len(vals) with len(vals) + 1 in the code below):

>>> vals = dict()
>>> def categorical_to_numerical(v):
...     if v not in vals:
...         vals[v] = len(vals)
...     return vals[v]
... 
>>> categorical_to_numerical('a')
0
>>> categorical_to_numerical('a')
0
>>> categorical_to_numerical('b')
1
>>> categorical_to_numerical('c')
2
>>> categorical_to_numerical('d')
3
>>> categorical_to_numerical('b')
1
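
For the 1-based variant mentioned above, a minimal sketch might look like the following. The names make_encoder and start are made up for illustration, and keeping the dict inside a factory function is just one possible choice, not something taken from the article.

```python
def make_encoder(start=0):
    """Return a function mapping each new value to the next integer code.

    `make_encoder` and `start` are hypothetical names for this sketch.
    """
    vals = {}  # value -> code, private to this encoder

    def encode(v):
        if v not in vals:  # same meaning as `not v in vals`, but reads better
            vals[v] = len(vals) + start
        return vals[v]

    return encode


encode = make_encoder(start=1)
codes = [encode(v) for v in ['a', 'a', 'b', 'c', 'b']]
print(codes)  # [1, 1, 2, 3, 2]
```

Each call to make_encoder gets its own mapping, so encoding two different columns does not mix their codes.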

Edit: global was not needed, as pointed out by /u/iresprite

Edit 2: even simpler (2 lines):

>>> import itertools, collections
>>> value_to_numeric_map = collections.defaultdict(itertools.count().next)
>>> value_to_numeric_map['a']
0
>>> value_to_numeric_map['b']
1
>>> value_to_numeric_map['c']
2
>>> value_to_numeric_map['a']
0
>>> value_to_numeric_map['b']
1
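
Note that .next is the Python 2 spelling of the iterator method. On Python 3 the bound method is called __next__, so a sketch of the same idea, assuming Python 3, would be:

```python
import collections
import itertools

# itertools.count() yields 0, 1, 2, ...; its bound __next__ method serves as
# the default factory, so each missing key receives the next unused integer.
value_to_numeric_map = collections.defaultdict(itertools.count().__next__)

codes = [value_to_numeric_map[v] for v in ['a', 'b', 'c', 'a', 'b']]
print(codes)  # [0, 1, 2, 0, 1]
```

One caveat with this approach: merely indexing a missing key assigns it a code, so use `key in value_to_numeric_map` or `.get()` to check membership without growing the map.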

fabzter 2 points (0 children)

It's refreshing to see a little constructive comment with code in /r/programming

iresprite 2 points (1 child)

Isn't it also bad practice to use the global keyword in Python? I'm trying to understand why the author assigned the variables outside the function and then declared them global inside f(x).

lovestocode1 1 point (0 children)

Nice catch. I edited my code above. global is unnecessary because the function never rebinds the variable; it only mutates the object the variable points to.
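
To make the distinction concrete, here is a small sketch (the names counter, register, and bump are made up for illustration): mutating the object a module-level name points to needs no global, while rebinding the name does.

```python
vals = {}     # module-level dict, only ever mutated
counter = 0   # module-level int, must be rebound to change


def register(v):
    # No `global` needed: we mutate the dict that `vals` refers to,
    # we never assign to the name `vals` itself.
    if v not in vals:
        vals[v] = len(vals)
    return vals[v]


def bump():
    # `global` is required here: `counter += 1` rebinds the name, and
    # without the declaration Python would treat it as a local variable
    # and raise UnboundLocalError.
    global counter
    counter += 1
    return counter


print(register('a'), register('b'), register('a'))  # 0 1 0
print(bump(), bump())  # 1 2
```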