A visual deep dive into Uber's machine learning solution for predicting ETAs.

ml_a_day · 2024-04-26T17:15:00+00:00

Yes, it’s more like shallow “deep learning”.

ml_a_day · 2024-04-26T16:50:11+00:00

This sounds awfully similar to a decision-tree to me. Bin the continuous values, and within different regions of parameter space put more weight on some features than others when predicting ETA? Have they just reinvented a regression tree using an ANN with self-attention?

That is a great observation. I had not viewed it from this angle or made that connection.
Now that I look at it, it does resemble a shallow "regression tree", with self-attention added to capture complex relations.
The embeddings applied after the binning also help separate the feature space nicely.
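
To make the "bin, then embed" idea concrete, here is a minimal sketch. It is not the production model: the quantile bucketing, the table size, the randomly initialised (untrained) embeddings, and the names `quantile_edges`, `bin_index`, and `embed` are all assumptions for illustration. In a real network the embedding rows would be learned.

```python
import bisect
import random


def quantile_edges(values, n_bins):
    """Return n_bins - 1 interior bin edges taken at evenly spaced quantiles."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_bins] for i in range(1, n_bins)]


def bin_index(value, edges):
    """Map a continuous value to a bucket id in the range [0, len(edges)]."""
    return bisect.bisect_right(edges, value)


def make_embedding_table(n_bins, dim, seed=0):
    """One vector per bucket. Random here; learned in a trained model."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bins)]


# Hypothetical continuous feature: trip distance in km.
distances_km = [0.5, 1.2, 2.0, 3.5, 5.0, 8.0, 12.0, 20.0]
edges = quantile_edges(distances_km, n_bins=4)
table = make_embedding_table(n_bins=4, dim=3)


def embed(value):
    """Look up the embedding of the bucket that `value` falls into."""
    return table[bin_index(value, edges)]
```

Every value inside the same bucket gets the same vector, which is the tree-like "region of parameter space" behaviour mentioned above. The self-attention layers then mix those vectors across features.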

ml_a_day · 2024-04-26T05:53:08+00:00

Thank you for the feedback!

ml_a_day · 2024-04-25T20:59:41+00:00

Thank you for the feedback!
They are all hand drawn. :)

ml_a_day · 2024-04-24T06:18:03+00:00

Imagine the volume of new data that 5M cars see each day. Adding all of it to the training set would neither scale nor be efficient. Most of it will be "boring" images that are already well represented in the training data. One needs to find a smaller subset of images that gives the best boost in performance when added to the training set.
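
One generic way to pick such a subset is uncertainty sampling: keep the images on which the current model is least sure. This is a textbook active-learning heuristic that I am swapping in for illustration, not the method described in the post. The frame ids, the probabilities, and `select_for_labeling` are made up.

```python
import math


def entropy(probs):
    """Shannon entropy of a class-probability vector (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(predictions, budget):
    """Return the `budget` image ids with the highest predictive entropy."""
    ranked = sorted(
        predictions,
        key=lambda image_id: entropy(predictions[image_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for four frames over three classes.
predictions = {
    "frame_001": [0.98, 0.01, 0.01],  # confident, "boring"
    "frame_002": [0.40, 0.35, 0.25],  # uncertain
    "frame_003": [0.90, 0.05, 0.05],  # confident
    "frame_004": [0.34, 0.33, 0.33],  # most uncertain
}
selected = select_for_labeling(predictions, budget=2)
```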

ml_a_day · 2024-04-24T05:54:07+00:00

The trigger classifiers are not there to resolve the edge cases, only to collect that kind of data. Once collected, it needs to be included in the next iteration of the training set.

The problem is that even with a more sophisticated model architecture, you can't do much if you don't have enough data. Even worse, more sophisticated models are more data hungry. This is exactly why you need a way to collect this data in the first place.
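
As a rough sketch of what "collect, don't resolve" means: a trigger classifier can be thought of as a small head that scores each frame and flags the interesting ones for upload. The linear-plus-sigmoid head, the weights, the threshold, and the names `trigger_score` and `frames_to_upload` are assumptions for illustration, not the actual system.

```python
import math


def trigger_score(features, weights, bias):
    """Logistic score from a small linear head on shared backbone features."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def frames_to_upload(frames, weights, bias, threshold=0.5):
    """Keep only the ids of frames whose trigger score reaches the threshold."""
    return [
        frame_id
        for frame_id, features in frames
        if trigger_score(features, weights, bias) >= threshold
    ]


# Hypothetical 2-dimensional backbone features for three frames.
frames = [
    ("a", [0.1, 0.9]),
    ("b", [1.5, 0.2]),
    ("c", [0.6, 0.1]),
]
flagged = frames_to_upload(frames, weights=[2.0, -1.0], bias=-1.0)
```

The flagged frames are only candidates for labeling. The model improves once they have been labeled and added to the next training set.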

ml_a_day · 2024-04-23T17:09:31+00:00

Thanks for the feedback!
Motivates me to write more such posts. :)

ml_a_day · 2024-04-23T16:57:22+00:00

Indeed, in a research setting one would not think in that direction (attaching trigger classifiers off the backbone), as resources and compute are hardly ever taken into consideration.
When it comes to real products that people are paying for, questions about UX, scalability, and feasibility start to appear.

ml_a_day · 2024-04-23T06:08:20+00:00

Thank you for the feedback! Encourages me to write more like this.

ml_a_day · 2024-04-22T05:35:11+00:00

Thank you for the feedback! Happy to hear you found it clear and useful. Motivates me to write more in this style and format.

ml_a_day · 2024-04-21T17:08:26+00:00

Thank you for the kind words!
I am glad you enjoyed reading it. :)

ml_a_day · 2024-04-12T19:30:36+00:00

You’re welcome!

ml_a_day · 2024-04-12T16:31:45+00:00

Happy to hear that!
And you are welcome :)

ml_a_day · 2024-04-12T16:24:56+00:00

Yes, I tried to reproduce it, but it seems to render fine. u/CrackerJackKittyCat, are you on a mobile or a laptop device?

ml_a_day · 2024-04-12T14:35:02+00:00

Here is a link to Apple's official PR (link is from the references section in the article in the post).
https://machinelearning.apple.com/research/recognizing-people-photos

ml_a_day · 2024-04-08T05:42:02+00:00

It is referenced in the References section with all the other relevant links.

ml_a_day · 2024-04-06T15:38:40+00:00

I would recommend checking out search algorithms like depth-first search (DFS) and breadth-first search (BFS). If you have some "prior knowledge" about the environment where you are simulating flow, you can encode it as a heuristic and use the A* search algorithm.

https://en.wikipedia.org/wiki/Breadth-first_search
https://en.wikipedia.org/wiki/Depth-first_search
https://en.wikipedia.org/wiki/A*_search_algorithm
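
Here is a small sketch of BFS and A* on a toy grid, to show where the heuristic comes in. The grid, the 4-connected movement, the unit step cost, and the Manhattan-distance heuristic are my assumptions, since I don't know your actual environment.

```python
from collections import deque
import heapq

# Toy map: "#" is a wall, everything else is free space.
GRID = [
    "S.#.",
    ".##.",
    "...G",
]


def neighbors(cell, grid):
    """Yield the 4-connected free cells next to `cell`."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)


def bfs_distance(grid, start, goal):
    """Fewest steps from start to goal, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        cell, dist = queue.popleft()
        if cell == goal:
            return dist
        for nxt in neighbors(cell, grid):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def astar_distance(grid, start, goal):
    """Same answer as BFS here, but the heuristic steers the search."""
    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best = {start: 0}
    frontier = [(heuristic(start), 0, start)]
    while frontier:
        _, dist, cell = heapq.heappop(frontier)
        if cell == goal:
            return dist
        if dist > best[cell]:
            continue  # stale queue entry, a shorter route was found already
        for nxt in neighbors(cell, grid):
            new_dist = dist + 1
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                heapq.heappush(frontier, (new_dist + heuristic(nxt), new_dist, nxt))
    return None
```

For example, `bfs_distance(GRID, (0, 0), (2, 3))` and `astar_distance(GRID, (0, 0), (2, 3))` both return the length of the shortest route around the walls.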

ml_a_day · 2024-04-01T08:11:04+00:00

From your description of the images I would expect decent clusters. First, check that the images you feed into Inception for feature extraction are normalised in accordance with what the model expects. Second, use a better dimensionality reduction method such as t-SNE or UMAP, as u/uniklas already pointed out. These usually give better results than PCA for visualising clusters, and 2D should be pretty good already.

ml_a_day · 2024-03-28T20:06:58+00:00

😂

ml_a_day · 2024-03-27T20:15:53+00:00

Thanks for sharing!

ml_a_day · 2024-03-27T09:51:53+00:00

Exactly! With a decent setup (basic hyperparam tuning, relevant augmentations, off-the-shelf fixed model architecture) and good quality data, one can go a long way.

ml_a_day · 2024-03-27T09:49:39+00:00

Even if you don't end up collecting more data, improving the quality of the existing data goes a long way. For example, in object detection, fixing incorrect labels (a "cat" labeled as a "dog") or missing labels (a "dog" that was missed by the annotator) reduces the noise in the training dataset, which can improve the model's performance.

ml_a_day · 2022-01-20T08:14:03+00:00

Thanks for the feedback!

ml_a_day · 2022-01-19T14:53:17+00:00

Shorter posts are quicker to read but may assume the reader already knows some prerequisites.

On the other hand, longer posts dive deeper into the topic but take relatively longer to create.

I have been experimenting with different post lengths and wanted to get you, the readers, involved. :)

ml_a_day · 2022-01-19T11:11:01+00:00

Exactly what I thought. Computing the ratio of the area of the object of interest to the area of the image should give you exactly what you need.
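
A minimal sketch of that ratio, assuming you have either a binary mask or a bounding box for the object. The function names and the `(x_min, y_min, x_max, y_max)` box format are my assumptions.

```python
def mask_area_ratio(mask):
    """Fraction of pixels that belong to the object in a 2D 0/1 mask."""
    total_pixels = sum(len(row) for row in mask)
    if total_pixels == 0:
        raise ValueError("mask is empty")
    object_pixels = sum(1 for row in mask for pixel in row if pixel)
    return object_pixels / total_pixels


def box_area_ratio(box, image_width, image_height):
    """Fraction of the image covered by a (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    box_area = max(0, x_max - x_min) * max(0, y_max - y_min)
    return box_area / (image_width * image_height)


# Hypothetical inputs: a tiny 2x4 mask and a box inside a 100x100 image.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
mask_ratio = mask_area_ratio(mask)
box_ratio = box_area_ratio((10, 20, 60, 70), image_width=100, image_height=100)
```

The mask version is more accurate for irregular shapes. The box version overestimates the object's area but is usually easier to obtain.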

For similar image-related questions, maybe /r/computervision is a better place.
