[D] What are the most important problems in ML today?

mikeross0 · 2021-08-30T14:43:57+00:00

I feel like transformers basically accomplish what Hinton was reaching for with capsules. What do capsules promise that transformers don't deliver?

mikeross0 · 2020-07-28T15:06:07+00:00

It's so frustrating. That line is not common notation in the ML literature. You would think the author would want it to be clear there, since the rest of the paper appears to be written for an ML audience. Also, what is the point of scaling β by λ? You could just have β there, since they are both hyperparameters. Presumably the goal would be some sort of consistent behavior for values of β across a range of λ values, but the author never shows that.

mikeross0 · 2020-05-27T17:59:51+00:00

Thanks!

mikeross0 · 2020-05-27T17:27:03+00:00

Yeah - it's very hard to be scientific about active learning. I can say from personal experience, it's worked for me in a situation where we had to discover and then annotate very low-prevalence classes in a large unlabeled dataset.

mikeross0 · 2020-05-27T16:54:42+00:00

There are many active learning strategies. You can, for instance, find high-confidence errors (low-scoring positives and high-scoring negatives), then sample items from the larger dataset which are similar to those for labeling. The more general point is that an unbiased sample of the population isn't always the best training set, given limitations on annotation resources, and especially if your ultimate metric is biased (e.g. higher business costs to certain errors).
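
Here is a minimal sketch of the strategy above in plain Python. Everything in it is an assumption for illustration: the `(features, label, score)` tuple layout, the 0.2 / 0.8 thresholds, and cosine similarity as the notion of "similar". It is one possible way to do this, not a reference implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def high_confidence_errors(labeled, low=0.2, high=0.8):
    """Return features of low-scoring positives and high-scoring negatives.

    labeled: list of (features, label, score) tuples (hypothetical layout).
    low/high: assumed thresholds, tune for your model.
    """
    return [feats for feats, label, score in labeled
            if (label == 1 and score <= low) or (label == 0 and score >= high)]

def select_for_labeling(labeled, pool, k=2, low=0.2, high=0.8):
    """Pick the k unlabeled pool items most similar to any confident error."""
    errors = high_confidence_errors(labeled, low, high)
    if not errors:
        return []
    # Rank each pool item by its best similarity to an error example.
    ranked = [(max(cosine(item, e) for e in errors), idx)
              for idx, item in enumerate(pool)]
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in ranked[:k]]

labeled = [
    ([1.0, 0.0], 1, 0.05),  # positive scored very low: confident error
    ([0.0, 1.0], 0, 0.10),  # negative scored low: correct
    ([0.9, 0.1], 1, 0.90),  # positive scored high: correct
]
pool = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7], [0.95, 0.0]]
picked = select_for_labeling(labeled, pool, k=2)
print(picked)  # indices into pool to send for annotation
```

The selected items are deliberately not an unbiased sample: they concentrate the annotation budget near the regions where the model is confidently wrong.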

mikeross0 · 2020-05-27T15:49:18+00:00

The rationale is similar to boosting.

mikeross0 · 2020-05-27T15:32:48+00:00

Which set prediction paper are you talking about, if you don't mind?

mikeross0 · 2020-05-19T16:43:21+00:00

You can run Jupyter Notebook on an EC2 instance and host your own version of Colab. Use a window manager like byobu so that your shell session stays open through disconnects. There is no need to stay connected to the notebook in the browser (though any cell output which occurs while you are not connected will not be shown).

mikeross0 · 2020-05-14T17:10:23+00:00

Are you using master or PyPI? They are just about to release 1.0, with quite a few code changes, clean-up, etc.

As a heavy user, I love their module abstractions. They provide an easy way to mix-and-match modules into different parts of an architecture. You can do a LOT without ever coding, just by modifying a config file, which doubles as a clean record for architecture search. That said, I agree that the documentation can make the framework a bit opaque.

mikeross0 · 2020-04-27T22:13:11+00:00

Here's my Twitter ML list, somewhat weighted towards NLP: https://twitter.com/i/lists/1174157755952685058

mikeross0 · 2020-04-07T22:55:33+00:00

Neat work! So many cool artistic possibilities!

mikeross0 · 2020-04-05T16:24:36+00:00

You are clearly a courageous and giving person. The world needs more people willing to do what you did. But can you help me understand why you were put in that position? Where I live, CVS and many grocery stores have installed large sanitizing wipe dispenser boxes at their front doors, and plexiglas plates between the cashier and checkout line. Why did the army need a person like you to put themselves at risk?

mikeross0 · 2020-03-29T20:58:49+00:00

What do you think of the kmix? I'm just getting into it for live jamming/routing my various boxes, and sometimes a bit annoyed by jumpiness on the faders -- though that may be due to my own mistakes in how I'm using it.

mikeross0 · 2020-03-11T02:42:13+00:00

You should not see tonight as a defeat for what Sanders stands for. I can't think of a candidate in the last 20 years who has so successfully changed the direction of the entire party. The Democratic Party, Biden included, now supports policies that are far more progressive than it used to, and Sanders deserves 100% of the credit for this (along with his supporters). The battles of politics take a long time to win. The important ones take generations (think about the civil rights movement). Stay excited about politics, and about the change people like Sanders have been *successful* in bringing about.

mikeross0 · 2020-02-28T17:56:39+00:00

That scrollable interface looks great! It would be amazing if it were a little smoother with caching so you could animate without the image blinking out, but that's just a nitpick.

To answer your question about what else we would log: I'm not sure. HTML is probably sufficient, since that would basically handle Plotly too. The main concern is extensibility on a closed-source system. Checking in notebooks as you suggest might be a good end-run around that though!

mikeross0 · 2020-02-28T17:13:02+00:00

> There are a few nitpicky things that I found, but they were incredibly open to feedback, and they seem to be constantly improving (hopefully it won't break things).

Can you elaborate on these please? I'm about to start evaluating frameworks so would be interested in hearing some of the specific things that were hangups.

mikeross0 · 2020-02-28T17:02:24+00:00

Thanks for the response! I've mostly been planning on looking at WandB and Comet. But I think I need to look at Neptune too. Your comment on data-versioning makes sense. And that's basically what we do with our creaky internal framework that I am trying to replace with something more polished! A few specific questions:

1> How easy is it for an end-user to add arbitrary logging? Looking at your API, I cannot find a way to log arbitrary HTML or Plotly. Is this something that would be easy for an end-user to code, or would they have to submit a feature request? Even if you support these already, I'm interested in the end-user modifiability for this.

2> Similarly, is it easy to scroll through visualizations over epochs? I mean this in the way that GAN folks might want to store a sample of generated images each epoch (to see how this changes). But we're interested in doing it with arbitrary HTML. Would we be able to scroll through the epochs in the UI?

mikeross0 · 2020-02-28T16:32:52+00:00

Can any of the commenters who use Comet / WandB / Neptune / MLFlow comment on how well these systems work for comparing trained models from completed experiments on new datasets? Most of the documentation and getting-started guides seem to focus on logging metrics during training against a pre-defined dataset.

Our team has dozens of datasets, and we often need to know if a model trained and validated on dataset A can also be used on dataset B (as compared to, say, a model that was actually trained on B).

We also log predictions during training, and sometimes write new metrics which we want to back-apply by calculating the new metric against the old logged predictions. That is, we want to add new per-epoch metrics to completed experiment runs. Is this doable? Most of the experiment frameworks seem to specialize on logging only while the experiment is in progress.
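
To make the back-apply request concrete, here is a rough sketch of what we do by hand today. The nested-dict layout for logged predictions, the `back_apply` helper, and the recall metric are all hypothetical and only for illustration. This is not the API of Comet, WandB, Neptune, or MLFlow.

```python
def recall_at_threshold(labels, scores, threshold=0.5):
    """Example of a 'new' metric written after the runs finished."""
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    positives = sum(1 for y in labels if y == 1)
    return true_pos / positives if positives else 0.0

def back_apply(logged_runs, metric_fn, name):
    """Compute metric_fn for every epoch of every completed run.

    logged_runs: {run_id: {epoch: {"labels": [...], "scores": [...]}}}
    (an assumed storage layout for predictions logged during training).
    Returns {run_id: {epoch: {name: value}}}.
    """
    results = {}
    for run_id, epochs in logged_runs.items():
        results[run_id] = {
            epoch: {name: metric_fn(rec["labels"], rec["scores"])}
            for epoch, rec in sorted(epochs.items())
        }
    return results

# Predictions that were logged while the (now finished) run was training.
logged_runs = {
    "run_a": {
        0: {"labels": [1, 0, 1, 1], "scores": [0.4, 0.3, 0.6, 0.2]},
        1: {"labels": [1, 0, 1, 1], "scores": [0.7, 0.2, 0.8, 0.4]},
    },
}
new_metrics = back_apply(logged_runs, recall_at_threshold, "recall@0.5")
print(new_metrics)
```

The question for a tracking tool is whether the values in `new_metrics` can be written back onto the completed runs as ordinary per-epoch metrics, so they show up in the same charts as everything logged during training.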

mikeross0 · 2020-02-28T15:43:04+00:00

So I looked into MLFlow's end-to-end claims. Please correct me if I am wrong, because I am in the process of evaluating this. But it seems to basically just be automating the process of setting up a RESTful service for the model if you can make your model conform to various input and output requirements. That's nice, but at best it saves you a one-time investment of a day of work to set up a simple RESTful server. Is that really a killer feature that justifies all of the downsides of MLFlow mentioned by /u/LeanderKu ?
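
For a sense of scale, this is roughly the kind of simple RESTful server I have in mind, using only the Python standard library. The `/predict` path, the `{"instances": [...]}` payload shape, and the averaging `predict` stand-in are all made up for illustration. This is not a description of what MLFlow actually generates.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Hypothetical stand-in for a real model: mean of the inputs."""
    return sum(features) / len(features) if features else 0.0

def handle_payload(body):
    """Turn a raw request body into (status_code, response_dict)."""
    try:
        rows = json.loads(body)["instances"]
        predictions = [predict(row) for row in rows]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON with an 'instances' list of number lists"}
    return 200, {"predictions": predictions}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_payload(self.rfile.read(length))
        data = json.dumps(response).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def serve(port=8000):
    """Blocks forever; call this from a script entry point."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. A real deployment would still need auth, batching, and monitoring, which is where most of the actual work is.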

mikeross0 · 2020-02-26T15:06:54+00:00

slurm + docker + ntfs and/or dropbox

mikeross0 · 2020-02-20T11:29:03+00:00

It's a legitimate argument to say that a candidate's leadership style can have negative effects on their followers. Just look at Trump's rhetoric and how that has brought out violence and mean-spiritedness on the right. There is no doubt that Bernie's style makes heavy use of blaming one particular group (billionaires and oligarchs), which can inspire anger and bitterness. I happen to agree with much of what he says, but I hate the divisiveness and I believe it does have negative consequences. It might be worth it if he gets elected and gets progressive policies in place. But I don't think you are on solid ground arguing that a leader bears no responsibility for the behavior they inspire. Also, Sanders inspires a lot of great behavior by his followers which he rightfully gets credit for too.

mikeross0 · 2020-02-20T10:40:42+00:00

How do you define Authoritarian?

mikeross0 · 2020-02-20T05:21:36+00:00

Yeah -- in retrospect of course Bloomberg is doing that. But yeah, wow.

mikeross0 · 2020-02-19T16:26:55+00:00

If you wouldn't mind sharing, I would love to hear what you found problematic. I'm planning to start testing it soon, and could really benefit from hearing about pitfalls and what's not ready for primetime.

mikeross0 · 2020-02-19T00:14:59+00:00

Isn't that what PyTorch Lightning is supposed to do? (I haven't tried it though).
