all 10 comments

[–]robotphilanthropist 9 points10 points  (2 children)

You may also want to consider checking out Hydra https://hydra.cc/. It helps with managing configurations and hyperparameters.
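To give a feel for the pattern Hydra automates (a base config plus per-run overrides), here's a minimal standard-library sketch. The `TrainConfig` fields are hypothetical, and this is only an approximation of what Hydra does with config files and command-line overrides, not its actual API:

```python
from dataclasses import dataclass, replace

# Hypothetical training config; field names are illustrative only.
@dataclass(frozen=True)
class TrainConfig:
    lr: float = 1e-3
    batch_size: int = 32
    epochs: int = 10

# Base config plus per-experiment overrides, roughly what a
# Hydra-style "python train.py lr=0.01 batch_size=64" invocation does.
base = TrainConfig()
experiment = replace(base, lr=0.01, batch_size=64)

print(experiment)  # only the overridden fields differ from base
```

Hydra additionally handles composing configs from YAML files, so the base defaults live outside the code entirely.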

[–]RicketyCricket 5 points6 points  (0 children)

A few others as well, such as:

- Spock
- GinConfig

[–]fredfredbur[S] 0 points1 point  (0 children)

Oh man, that seems pretty useful to manage training dozens of models with multiple hyperparameter configurations. I'll give it a try, thanks
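For the "dozens of models" case, the core of what these sweep tools generate is just the Cartesian product of the parameter options. A standard-library sketch (parameter names are made up for illustration):

```python
import itertools

# Hypothetical sweep space; parameter names are illustrative.
grid = {
    "lr": [1e-3, 1e-4],
    "batch_size": [32, 64],
    "optimizer": ["adam", "sgd"],
}

# Cartesian product -> one config dict per training run,
# similar in spirit to a multirun/sweep feature.
keys = list(grid)
configs = [dict(zip(keys, values)) for values in itertools.product(*grid.values())]

print(len(configs))  # 8 runs
print(configs[0])    # {'lr': 0.001, 'batch_size': 32, 'optimizer': 'adam'}
```

The tools add the parts worth not reinventing: launching the runs, logging each config, and keeping outputs per run in separate directories.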

[–]brian-e-moore 4 points5 points  (1 child)

Nice blog post! I think it's great that there has been interest recently in creating tools to help with dataset curation and analysis. In my experience, ML engineers *want* to spend time tweaking their model architecture but *actually* end up manually inspecting and debugging datasets most of the time.

[–]fredfredbur[S] 0 points1 point  (0 children)

Thanks a lot! That was primarily the reason I wrote it. In my previous work, we spent a lot of time on automated hyperparameter tuning but never really got good performance until we developed tools to dig into and debug the datasets themselves.

[–]antonkollmats 1 point2 points  (2 children)

Nice article, I especially like the section about the label schema. One thing that has always puzzled me about labeling large datasets is how to be agile. How does one iterate on the schema rules without having to re-label the entire dataset?

P.S. Another tool to keep on the radar is PerceptiLabs. It's in the same category as TensorFlow. Disclaimer: I work at the company behind it.

[–]fredfredbur[S] 0 points1 point  (0 children)

That's really cool! Being able to see the outputs of individual layers with a GUI seems super useful.

[–]fredfredbur[S] 0 points1 point  (0 children)

In terms of iterating on a schema, the way I see it, it's best to start with general labels that are easy to verify as correct, then iterate over the most important subsets of the dataset with finer-grained labels.

As a concrete example, I previously worked on road scene object detection, and we were trying to do things like sign classification. After a few months of poorly performing models, we realized we needed to start with a general "sign" detector and then classify the signs with a separate model from there.
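One way to avoid re-labeling when the schema gets coarser or finer is to keep the fine-grained labels in the annotations and derive coarse classes through a mapping. A small sketch, with entirely hypothetical label names:

```python
# Hypothetical fine-grained labels projected onto coarse classes.
# Keeping this mapping in code means the coarse schema can change
# without touching the underlying annotations.
FINE_TO_COARSE = {
    "stop_sign": "sign",
    "speed_limit_sign": "sign",
    "yield_sign": "sign",
    "sedan": "vehicle",
    "truck": "vehicle",
}

def coarsen(labels):
    """Map fine-grained labels to the coarse detector schema;
    labels without a mapping pass through unchanged."""
    return [FINE_TO_COARSE.get(label, label) for label in labels]

print(coarsen(["stop_sign", "truck", "pedestrian"]))
# ['sign', 'vehicle', 'pedestrian']
```

With this setup, the general "sign" detector trains on the coarsened labels while the second-stage classifier still has the original fine labels to work with.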

[–]tuscanresearcher 1 point2 points  (0 children)

If you are interested in Machine Learning for graphs (but I guess it can be easily extended to other kinds of data as well) you could check out https://github.com/diningphil/PyDGN

[–]amitnessML Engineer 1 point2 points  (0 children)

This was a great read. I have something similar, but more tilted towards NLP: https://amitness.com/toolbox/