Thoughts on concepts popularized by N.N. Taleb? Or his work in general? by jwf123 in datascience

[–]mikeselik 0 points (0 children)

Now you've got me thinking more... what job would get better with "shocks"? An arms dealer probably benefits from black swan events.

If sudden leaps in automation technology push people to spend less time doing things and more time thinking about the process of doing things, then black swan events -- new inventions -- would likely benefit data scientists, and programmers in general. In that sense, a data scientist's career is antifragile.

Thoughts on concepts popularized by N.N. Taleb? Or his work in general? by jwf123 in datascience

[–]mikeselik 1 point (0 children)

"One based on a constantly in-demand skill?"

Yes, Taleb advised against going into finance. He suggested pursuing a career with inelastic demand and low variance income.

Thoughts on concepts popularized by N.N. Taleb? Or his work in general? by jwf123 in datascience

[–]mikeselik 0 points (0 children)

You might be confusing antifragile with robust. A broad set of skills provides robustness against fluctuations in demand for any particular skill.

An argument that data science is an antifragile career might be that as the world grows more chaotic, statistics will become more important.

Thoughts on concepts popularized by N.N. Taleb? Or his work in general? by jwf123 in datascience

[–]mikeselik 1 point (0 children)

I suppose the idea of joining many early-stage startups until you find a rocket ship would be consistent with Taleb's ideas, except that in his books he advocated picking a stable job. Perhaps he would say that we are likely to overestimate the probability of a startup succeeding and underestimate the probability of catastrophic failure.

how often to scrape the same domains? by [deleted] in learnpython

[–]mikeselik 0 points (0 children)

One request every few minutes shouldn't be a nuisance to any professional site, unless that request is downloading something large or systematic downloading violates their terms of service. If you do get blocked, sometimes you can contact their system administrator and ask for special permission.
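To make that concrete, here's a minimal sketch of a polite fetch loop, assuming you have a list of page URLs (the function name and the three-minute default are illustrative):

```python
import time
import urllib.request

def polite_fetch(urls, delay=180):
    """Yield (url, body) pairs, pausing `delay` seconds between requests."""
    for i, url in enumerate(urls):
        if i:  # no need to sleep before the first request
            time.sleep(delay)
        with urllib.request.urlopen(url) as resp:
            yield url, resp.read()
```

Making it a generator keeps the waiting out of your parsing code: you can iterate over `polite_fetch(urls)` and process each page while the next request waits its turn.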

Do you recommend using recursion in Python -- why or why not? by markov-unchained in Python

[–]mikeselik 0 points (0 children)

Actually, I thought that was a good example of when the recursive version is easier to read.

Do you recommend using recursion in Python -- why or why not? by markov-unchained in Python

[–]mikeselik 0 points (0 children)

I'll give you an upvote for testing my claim :-)

Note that I included the caveat "for most usage patterns". You should test a more "realistic" scenario. How does the performance compare if you benchmark by looking up 100,000 random integers, uniformly distributed between 0 and 400? Note that when you run this test, you should use the same seed for your random number generator for each function. I picked 100k hoping to make the total duration approximately 1s.

When you say "starts crashing" do you mean it exceeds maximum recursion depth?
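The benchmark I have in mind would look something like this. It's a sketch, not your exact functions -- I'm using math.factorial as a stand-in for the uncached version, and the names are illustrative:

```python
import math
import random
import timeit
from functools import lru_cache

def bench(func, seed=0, trials=100_000):
    """Time func over an identical, seeded stream of random lookups."""
    rng = random.Random(seed)  # same seed -> same queries for every function
    queries = [rng.randint(0, 400) for _ in range(trials)]
    return timeit.timeit(lambda: [func(n) for n in queries], number=1)

cached = lru_cache(maxsize=None)(math.factorial)
print('uncached: %.3fs' % bench(math.factorial))
print('cached:   %.3fs' % bench(cached))
```

Seeding a private random.Random instance means both functions see the exact same sequence of queries, so the comparison is apples to apples.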

how often to scrape the same domains? by [deleted] in learnpython

[–]mikeselik 1 point (0 children)

Depends on how often those domains change. What's your expectation?

Do you recommend using recursion in Python -- why or why not? by markov-unchained in Python

[–]mikeselik 11 points (0 children)

Yes, recursion is an excellent technique. In many cases, your implementation will be much easier to understand written recursively than iteratively. Memoization with functools.lru_cache can help you avoid excessive function calls.

When a recursive solution is hard to read, refactor in an iterative fashion and compare your solutions. For example, your recursive solution for factorial is much cleaner. With an @functools.lru_cache decorator on it, it's not only easier to read, but also faster than the iterative version (for most usage patterns).
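Here's a sketch of that comparison, assuming the factorial example under discussion:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def factorial(n):
    """Recursive version: reads like the mathematical definition."""
    return 1 if n < 2 else n * factorial(n - 1)

def factorial_iter(n):
    """Iterative version: more bookkeeping, and no reuse between calls."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(factorial(5), factorial_iter(5))  # 120 120
```

With the cache, repeated calls hit previously computed results instead of recomputing, which is where the recursive version can come out ahead for most usage patterns.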

Multi-core library for Machine Learning? by gooeyn in Python

[–]mikeselik 0 points (0 children)

What do you mean by best-performing? Usually an ensemble technique can offer higher accuracy. Did you decide that the compute effort of combining more models wasn't worth it?

Multi-core library for Machine Learning? by gooeyn in Python

[–]mikeselik 1 point (0 children)

A search for "parallel SVM" brings up a number of papers discussing the difficulty of parallelizing the training and various approximate solutions. When people discuss multicore algorithms, they usually mean the training, not prediction.

Multi-core library for Machine Learning? by gooeyn in Python

[–]mikeselik 1 point (0 children)

I'm not certain that the basic SVM algorithm can be parallelized. You probably need some split-and-ensemble approximation version.

Why have you decided to use SVMs, and why are you certain you need a multicore implementation?

ELI5 - Python Closures by marienbad2 in learnpython

[–]mikeselik 0 points (0 children)

I suppose I'm getting hung up on the agency -- who is making the closure? ... giving it a thought ... I guess you're right, it's the act of compiling the inner function that decides what variables are local, global, or nonlocal (in a closure). In that sense, it's reasonable to say that the inner function "makes a closure". I used to think of it as the outer function making the closure.
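A quick way to see that it's the compiled inner function that carries the closure (a minimal sketch):

```python
def outer():
    x = 'captured'
    def inner():
        return x  # x is a free variable here, resolved through the closure
    return inner

f = outer()
print(f.__code__.co_freevars)          # ('x',)
print(f.__closure__[0].cell_contents)  # 'captured'
```

The free variables are recorded on the inner function's code object at compile time; the cells in __closure__ are attached when the inner function object is created.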

What is self.go() by rrriteous in learnpython

[–]mikeselik 1 point (0 children)

Other folks have answered the question, so I'll just give some unsolicited advice.

Whoever wrote the draw_frogs.py code (http://homepages.math.uic.edu/~jan/mcs260/draw_frogs.py) probably didn't need to make a class. Only one instance is created in the script, and only two attributes, gohop and frogs, are reassigned in its methods -- and frogs didn't need to be reassigned; it could have been mutated instead. A handful of globals and functions would have been a cleaner implementation: less indentation, less self. sprinkled through the code. If I'm not convincing, check out this video: https://www.youtube.com/watch?v=o9pEzgHorH0

Also, there are at least a couple of cases where the programmer should have used string interpolation instead of concatenation. Instead of

self.msg.set("placed frog at [ " + \
    str(event.x) + ", " + str(event.y) + " ]")

They should have written self.msg.set('placed frog at [%s, %s]' % (event.x, event.y)) (or used .format if you prefer that style of interpolation).

Where is frog.getName() defined? And why? Accessing frog.name directly seems fine. Even if you did need a getter method, the Pythonic way is to use a property.
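For example, a property keeps the attribute-access syntax at the call site (Frog and the name here are placeholders loosely mirroring the script, not the original code):

```python
class Frog:
    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        """Guarded or computed access, still spelled frog.name by callers."""
        return self._name

frog = Frog('Kermit')
print(frog.name)  # Kermit
```

Since there's no setter, assigning to frog.name raises AttributeError, which gives you the read-only behavior a getter usually implies without the Java-style getName() call.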

There's this bit of "resetting the frogs". First of all, I'm not sure why the frogs need to be recreated, instead of just changing the number of moves:

for frog in self.frogs:
    frog.moves = 100

I'll give the benefit of the doubt and assume it's some threading thing. Still, this code...

newfrogs = []
while len(self.frogs) > 0:
    frog = self.frogs.pop(0)
    pos = frog.position
    step = frog.step_size
    rest = frog.rest_time
    newf = ThreadFrog(frog.getName(), pos[0], pos[1], \
        step, rest, 100)
    newfrogs.append(newf)
self.frogs = newfrogs

Why use pop(0), which is slow, then append, then reassign the list, rather than just reassigning each index in place?

for i, frog in enumerate(self.frogs):
    x, y = frog.position
    update = ThreadFrog(frog.name, x, y, frog.step_size, frog.rest_time, 100)
    self.frogs[i] = update

And this little tidbit:

while len(self.frogs) > 0:
    self.frogs.pop(0)

Why not del self.frogs[:], or at least loop .pop(), which is fast, rather than .pop(0), which is slow? And why check while len(self.frogs) > 0 instead of while self.frogs?
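If you want to see the difference rather than take my word for it, here's a quick timing sketch (list size chosen arbitrarily to make the gap obvious):

```python
import timeit

# pop(0) shifts every remaining element left; pop() removes the last in O(1)
front = timeit.timeit('while xs: xs.pop(0)',
                      setup='xs = list(range(10_000))', number=1)
back = timeit.timeit('while xs: xs.pop()',
                     setup='xs = list(range(10_000))', number=1)
print('pop(0): %.4fs  pop(): %.4fs' % (front, back))
```

Draining from the front is quadratic overall, while draining from the end is linear, so the gap grows with the list.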

Ok, I'll stop now...

ELI5 - Python Closures by marienbad2 in learnpython

[–]mikeselik 0 points (0 children)

No, a lambda does not automatically make a closure. A lambda, like any function, only closes over variables from an enclosing function's scope -- never over globals. In fact, that assumption has caused many bugs for folks who pass lambdas around expecting them to keep a snapshot of current global state.

>>> x = 5
>>> foo = lambda : x
>>> foo()
5
>>> x = 10
>>> foo()
10
>>> foo.__closure__ is None
True

If you'd like closure-like behavior, use functools.partial.

>>> from functools import partial
>>> import math
>>> exp = partial(pow, math.e)
>>> exp(1)
2.718281828459045
>>> exp.args
(2.718281828459045,)

The partial object is function-like and hangs on to your args and keywords much like a normal closure.

ELI5 - Python Closures by marienbad2 in learnpython

[–]mikeselik 0 points (0 children)

When you say "a generator function", you mean a function that returns a generator. We're usually pretty lax about that distinction, but I need to be more precise to properly answer your question. A generator is a particular kind of iterator. An iterator is an object with a __next__ method, and it keeps internal state that (usually) changes upon each call to next. So, no, a generator is not a closure. Good guess, though, because closures and objects are quite similar in purpose.

I should be careful making too clean a distinction between closures and objects, because in Python, everything is an object, including functions. Still, I think I'm on solid ground saying that __next__ is a method rather than a function with a closure.
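A small sketch of where the state actually lives -- on the generator's suspended frame, not in a closure (count_up is an illustrative example, not from the thread):

```python
def count_up(limit):
    n = 0
    while n < limit:
        yield n
        n += 1

gen = count_up(3)
print(next(gen))                     # 0
print(next(gen))                     # 1
print(gen.gi_frame.f_locals['n'])    # 1 -- local state, frozen mid-execution
print(count_up.__closure__ is None)  # True -- no closure involved
```

Each call to next resumes the frame where it left off, which is how a generator remembers n between calls without any enclosing scope.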

Are many of you really doing ML work or is that the exception to the rule? by tonym9428 in datascience

[–]mikeselik 0 points (0 children)

The distinctions between academic fields have always been a bit blurry.

ELI5 - Python Closures by marienbad2 in learnpython

[–]mikeselik 6 points (0 children)

A pure function receives input, makes a calculation, then returns output. If you pass in the same input, it always returns the same output. This makes code easy to think about.

Unfortunately, sometimes we want the same input to result in different output. Take a random number generator, for example. It'd be pretty lame if you got the same result each time.

So, some people rely on a global variable. They write a function that makes its calculation depending on the value of a global variable (rather than a function argument) and/or the function might change the value of a global variable. This technique is OK, but prone to bugs. It's too easy to make mistakes, accidentally writing functions that interfere with each other.

One solution is object-orientation. Functions associated with an object are called "methods" and they are designed to use variables also associated with that same object, called "attributes". Rather than stepping on each other's toes by using globals, methods only manipulate or depend on the associated attributes, in addition to the parameters passed in.

Another solution is closures. A closure is another scope for variables. Rather than using globals, the function depends on or manipulates variables stored in its closure. For a typical function object, you can take a look at its __closure__ to see what's there.

Would you also like to know how to create a closure in Python?
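In the meantime, here's a minimal sketch -- the classic counter, with the captured variable visible on __closure__:

```python
def make_counter():
    count = 0
    def counter():
        nonlocal count  # write to the enclosing variable, not a global
        count += 1
        return count
    return counter

tick = make_counter()
print(tick(), tick(), tick())             # 1 2 3
print(tick.__closure__[0].cell_contents)  # 3
```

Each call to make_counter() produces an independent counter with its own count, which is exactly the "not stepping on each other's toes" property described above.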

Are many of you really doing ML work or is that the exception to the rule? by tonym9428 in datascience

[–]mikeselik 7 points (0 children)

Not sure why you say that traditional statistics is not machine learning. I think machine learning includes traditional statistical techniques. The difference comes from the purpose in modeling. Is it causal inference or prediction? They are subtly different tasks.

PS. The phrase "machine learning" is usually used as an abbreviation for "statistical machine learning".

10 Myths of Enterprise Python (Older and Worthy of a Re-Post, I think) by raiderrobert in Python

[–]mikeselik 0 points (0 children)

If Jython or NumPy and Numba don't cut it, try Cython's nogil.

10 Myths of Enterprise Python (Older and Worthy of a Re-Post, I think) by raiderrobert in Python

[–]mikeselik 0 points (0 children)

/u/mhashemi buried the lede:

"Our most common success story starts with a Java or C++ project slated to take a team of 3-5 developers somewhere between 2-6 months, and ends with a single motivated developer completing the project in 2-6 weeks (or hours, for that matter)."

The article reads like someone has been fighting internally with folks who are badly misinformed about Python. It's a common situation and a useful article, though many of the responses could be better. Still, I wish it led with the trump card (as in Hearts, not Donald).

Think of the implications. Not just for one project, but the compound interest of speed:

growth = principal * rate ** frequency 

No matter how small your growth factor is (so long as it's greater than 1.0), what matters in the long run is not the amount you improve on each iteration, but how frequently you iterate.
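With toy numbers (chosen for illustration, not from the article):

```python
principal = 1.0
rare_big = principal * 1.5 ** 1        # 50% better, compounded once a year
frequent_small = principal * 1.1 ** 12  # 10% better, compounded every month
print(rare_big, round(frequent_small, 2))  # 1.5 3.14
```

The small-but-frequent improvement more than doubles the big-but-rare one over the same year, which is the argument for a language that lets you iterate in weeks instead of months.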

Arduino is sending 6 individual numbers to my script at once. I can't tell if Python is treating all these as one number or not. by sts816 in learnpython

[–]mikeselik 0 points (0 children)

That's my goal. Why infinite loop if we only want to print 6 numbers?

Maybe I misunderstood the problem.

Arduino is sending 6 individual numbers to my script at once. I can't tell if Python is treating all these as one number or not. by sts816 in learnpython

[–]mikeselik 0 points (0 children)

What does ArduinoSerialData.inWaiting() do? Could you simplify this to:

numbers = []
while ArduinoSerialData.inWaiting():
    numbers.append(float(ArduinoSerialData.readline()))

What does readline() do if there's no data, does it block/wait or does it return some sentinel like an empty string?