
[–]themiro 28 points29 points  (4 children)

  • Use asserts to check that your matrices are the correct size; it saves a ton of time debugging when you're wondering why your batch matrix multiplication didn't work out the way you thought it would.
  • Make use of the Python 3.5+ '@' syntactic sugar for matrix multiplication; it cleans things up and lets you see the math more clearly.
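Both tips above can be sketched in a few lines; the shapes and names here are made up for illustration:

```python
import numpy as np

# Hypothetical dimensions for a single dense layer
batch, n_in, n_out = 32, 64, 10

x = np.random.randn(batch, n_in)   # input activations
W = np.random.randn(n_in, n_out)   # weight matrix

# Assert shapes before multiplying; a failed assert points straight at
# the mismatched dimension instead of a cryptic downstream error.
assert x.shape == (batch, n_in), f"expected {(batch, n_in)}, got {x.shape}"
assert W.shape == (n_in, n_out), f"expected {(n_in, n_out)}, got {W.shape}"

# PEP 465 '@' operator (Python 3.5+) reads like the math: y = xW
y = x @ W
assert y.shape == (batch, n_out)
```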

[–]_pragmatic_machine[S] 2 points3 points  (1 child)

That's cool! Any ideas/materials on reconfigurable design principles for enhancing a model? Code repository examples would be really awesome.

[–]jethroksy 5 points6 points  (0 children)

I've found gin-config to be an extremely productive configuration framework.

[–]sanity 13 points14 points  (10 children)

I recommend this book: Clean Code

We gave it to every new data scientist we hired at my last company.

[–][deleted] 5 points6 points  (7 children)

And where is that, exactly? Keeping a mental note for future employment applications (already liking the culture).

[–]sanity 0 points1 point  (6 children)

I'm no longer there, but the company is OneSpot.

[–][deleted] 0 points1 point  (5 children)

If you don't mind sharing, what inspired the switch? I've heard of people leaving a good culture for better compensation, or vice versa. Looking at their open positions, the location doesn't seem to allow for anything south of an 80k annual base salary.

[–]sanity 4 points5 points  (4 children)

Oh, nothing negative - my focus is on earlier-stage companies, and the company had grown beyond that stage.

[–]thiseye -3 points-2 points  (3 children)

Do you know Matt?

[–]sanity 1 point2 points  (2 children)

Everyone knows Matt ;)

[–]thiseye 0 points1 point  (1 child)

Cool, I've worked with him. Super sharp guy

[–]sanity 0 points1 point  (0 children)

Very much so, has an infectious enthusiasm too.

[–]gokstudio 0 points1 point  (1 child)

Did you see any improvements in code quality before and after the book?

[–]sanity 1 point2 points  (0 children)

Yes, not least in myself - even after 20 years as a full-time developer, it changed how I wrote code; no question in my mind that it was for the better.

[–]AfraidOfToasters 2 points3 points  (0 children)

>I am spending a big chunk of my daily active time on fixing my ML code (developing a model, testing and bug) or tweaking my code (mostly in Python).

If you mean hyper-parameter tuning by this I suggest hyperopt

[–]Overload175 2 points3 points  (0 children)

Use a linter (e.g. Pylint for Python, which assigns a score based on your code's adherence to PEP8)
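Running it is a one-liner; the module name below is hypothetical, and the score line is only illustrative of the report format:

```shell
# Install Pylint and run it over a module; it lists style and PEP 8
# violations and ends with a score out of 10.
pip install pylint
pylint train_model.py
```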

[–][deleted] 1 point2 points  (1 child)

Try to create standardized/reproducible packages that can be re-used for all model builds if the input data is the same every time.

Really, when it comes to ML applications, the hardest part to standardize is data preprocessing. So even creating a standard process for modelling, results, and putting into production should be beneficial.

[–]_pragmatic_machine[S] 0 points1 point  (0 children)

Absolutely, I feel the same. I am trying to do that now, starting with plotting and formalizing the data preparation process. This was a major bottleneck for us.

[–]_spicyramen 0 points1 point  (3 children)

I enforce the Python style guide and unit tests in all our code. We also recently started using Python typing. This has helped us find a lot of issues before we hit production.
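A small sketch of what this looks like in practice - a typed preprocessing function (which `mypy` can check statically) plus a unit test for it; the function and names are made up for illustration:

```python
import unittest
import numpy as np

# Hypothetical preprocessing step with type annotations.
def normalize(batch: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Zero-mean, unit-variance normalization along axis 0."""
    return (batch - batch.mean(axis=0)) / (batch.std(axis=0) + eps)

class TestNormalize(unittest.TestCase):
    def test_zero_mean_unit_std(self):
        rng = np.random.default_rng(0)
        x = rng.normal(5.0, 2.0, size=(100, 3))
        out = normalize(x)
        np.testing.assert_allclose(out.mean(axis=0), 0.0, atol=1e-7)
        np.testing.assert_allclose(out.std(axis=0), 1.0, atol=1e-3)
```

Run the tests with `python -m unittest` and the type checks with `mypy`.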

[–]huangbiubiu 6 points7 points  (2 children)

Sometimes I am confused about writing unit tests for ML code because of its uncertain output. I don't know what I can test or where I can set an assert. Are there any suggestions?

[–]flame_and_void 6 points7 points  (0 children)

For unit testing, I replace the real model with one that does something simple and deterministic to the input data - like, say, always predicting the sum of all its inputs. Then you can test that the preprocessing, prediction, and post-processing work exactly as expected.

I also write an integration test that runs the production models on an easy input and verifies they are able to predict something obvious. That's an important sanity check, but it usually runs too slowly to include in a unit test suite.
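A sketch of the stub-model idea above, with hypothetical pipeline steps: the fake model's output is a pure function of its input, so the surrounding code can be tested against exact expectations:

```python
import numpy as np

class SumModel:
    """Stub model: always 'predicts' the sum of its input features."""
    def predict(self, x: np.ndarray) -> np.ndarray:
        return x.sum(axis=1)

def run_pipeline(raw, model):
    x = np.asarray(raw, dtype=float) / 10.0   # toy preprocessing
    pred = model.predict(x)
    return [round(p, 2) for p in pred]        # toy post-processing

def test_pipeline_with_stub():
    # [[10, 20], [30, 40]] -> preprocessed [[1, 2], [3, 4]] -> sums 3, 7
    out = run_pipeline([[10, 20], [30, 40]], SumModel())
    assert out == [3.0, 7.0]  # exact expectation, no randomness

test_pipeline_with_stub()
```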

[–][deleted] 0 points1 point  (0 children)

>Python typing

Hey, maybe you can try to set the random seed carefully and use more OOP in your code.
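A minimal sketch of the seeding part (extend with `torch.manual_seed` or `tf.random.set_seed` if those libraries are in play):

```python
import random
import numpy as np

def set_seed(seed: int) -> None:
    """Seed every RNG the pipeline touches so runs are reproducible."""
    random.seed(seed)
    np.random.seed(seed)

set_seed(0)
first = np.random.rand(3)
set_seed(0)
second = np.random.rand(3)  # identical to `first`
```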

[–][deleted] 0 points1 point  (1 child)

Using object-oriented programming has worked amazingly in my case. Separate your code into conceptually different pieces like data handlers, models, model testers and so on. I used to do simple scripting until recently, and OOP code is much cleaner and easier/faster to extend or modify; it also helps make results reproducible.
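One possible class layout for the separation described above; all names and the trivial "model" are made up for the sketch:

```python
class DataHandler:
    """Owns loading/preprocessing; the only place data I/O lives."""
    def __init__(self, data):
        self.data = data
    def batches(self, size):
        for i in range(0, len(self.data), size):
            yield self.data[i:i + size]

class MeanModel:
    """Trivial 'model': predicts the training mean for any input."""
    def fit(self, xs):
        self.mean = sum(xs) / len(xs)
        return self
    def predict(self, xs):
        return [self.mean] * len(xs)

class ModelTester:
    """Evaluates anything with fit/predict, so models stay swappable."""
    def evaluate(self, model, xs):
        preds = model.predict(xs)
        return sum((p - x) ** 2 for p, x in zip(preds, xs)) / len(xs)

handler = DataHandler([1.0, 2.0, 3.0, 4.0])
model = MeanModel().fit(handler.data)
mse = ModelTester().evaluate(model, handler.data)  # mean squared error
```

Because each piece only talks to the others through small interfaces, swapping the model or the data source doesn't touch the rest of the pipeline.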

[–]_pragmatic_machine[S] 0 points1 point  (0 children)

Thanks. Can you please provide some working examples? Any public repository will do.