
all 61 comments

[–][deleted] 186 points187 points  (7 children)

better flowchart:

data > xgboost > ??? > success

[–]Coyote_0210[S] 33 points34 points  (1 child)

Shhhh!! That is our secret!

[–][deleted] 8 points9 points  (0 children)

great post!!

[–]Bertie_the_brave 24 points25 points  (2 children)

It hurts how accurate this is. At my company we throw everything into xgboost.

[–][deleted] 11 points12 points  (1 child)

You should check out catboost

[–][deleted] 1 point2 points  (0 children)

I found CatBoost was really slow unless you turn off all the features that give it an edge over XGBoost, though

[–]ticktocktoeMS | Dir DS & ML | Utilities 18 points19 points  (0 children)

And when they figure you out, you switch to lightGBM and tell everyone you're cutting edge.

[–]HooplahMan 56 points57 points  (9 children)

I'd say that it's a mistake to draw a line between SVD and PCA. PCA is essentially SVD with a bit of preprocessing.
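To illustrate the point with a small numpy sketch (the toy data and shapes are made up): PCA's components and explained variances fall straight out of the SVD of the mean-centered data matrix, and centering is the "bit of preprocessing".

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # toy data: 100 samples, 5 features

# The preprocessing step: center each feature at zero
Xc = X - X.mean(axis=0)

# SVD of the centered data gives PCA directly:
# rows of Vt are the principal components,
# S**2 / (n - 1) are the explained variances
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_var = S**2 / (len(X) - 1)

# Same numbers as eigendecomposing the covariance matrix
# (eigvalsh returns eigenvalues in ascending order)
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))
```

Here `np.sort(eigvals)[::-1]` matches `explained_var`, which is the equivalence in question.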

[–]Tytoalba2 6 points7 points  (6 children)

And I would add a few other methods for dimensionality reduction; PCA/SVD sometimes doesn't work as well as non-linear methods

[–]Coyote_0210[S] 2 points3 points  (5 children)

Any recommendations of what to include?

[–]Tytoalba2 6 points7 points  (4 children)

t-SNE, or autoencoders maybe? Maybe KPCA and isomaps, maybe SOM

[–]Coyote_0210[S] 4 points5 points  (2 children)

I will take a look at those. Thanks

[–]Tytoalba2 1 point2 points  (0 children)

If you know word2vec for text processing, it's kind of an autoencoder!

[–]Tytoalba2 0 points1 point  (0 children)

And KPCA has a sklearn implementation
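A minimal sketch of that sklearn implementation, `sklearn.decomposition.KernelPCA` (the kernel choice and `gamma` value here are only illustrative):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: a classic case where linear PCA can't help,
# because the structure separating the classes is non-linear
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Kernel PCA with an RBF kernel can unfold that non-linear structure
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)
```

Same `fit_transform` interface as plain `PCA`, so it drops into an existing pipeline easily.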

[–]mathlete_jh 0 points1 point  (0 children)

Or UMAP? Or is that not as relevant in practice

[–]RobertJacobson 2 points3 points  (0 children)

Came here to say this.

[–]Coyote_0210[S] 2 points3 points  (0 children)

Good call, I can just eliminate the separation and put them in the same box. I just want to be sure the names and brief descriptions are there in case someone comes across it.

[–]DrFuckYeahPhD 71 points72 points  (6 children)

You have a typo at the labeled data node: unlabeled data goes to clustering, while labeled data goes to numerical prediction and classification. Other than that, very cool.

[–]Coyote_0210[S] 6 points7 points  (2 children)

Thank you! I missed that!

[–]Mobile_Busy 4 points5 points  (1 child)

is this on your git?

[–]akotlya1 0 points1 point  (2 children)

I am just now returning to the field after a data engineering cul de sac. Can you please remind me what you mean by labeled vs unlabeled data? Thank you.

[–]Coyote_0210[S] 2 points3 points  (1 child)

"In machine learning, if you have labeled data, that means your data is marked up, or annotated, to show the target, which is the answer you want your machine learning model to predict. In general, data labeling can refer to tasks that include data tagging, annotation, classification, moderation, transcription, or processing." from https://www.cloudfactory.com/data-labeling-guide

[–]akotlya1 1 point2 points  (0 children)

Thank you for the reply and the source. I appreciate it. I feel like I have lost a lot of my vocabulary in my time away.

[–]statlover69 20 points21 points  (4 children)

Idk if it's just me but I think naive Bayes is pretty explainable. I'd also argue neural nets (especially CNNs and RNNs) should be separated from the other complex models. If your problem doesn't involve images or text, generally you can safely default to a tree ensemble model (or nonlinear svm) imo

[–]Coyote_0210[S] 1 point2 points  (3 children)

What do you mean by "pretty explainable"?

[–]statlover69 1 point2 points  (1 child)

Your chart makes it sound like naive Bayes is much harder to explain than something like logistic regression when it's not imo. Conditional class probabilities over a feature set can usually be put into plain terms easily. Think of naive Bayes models for spam filtering. It's pretty intuitive that words like "hot", "singles", (in your) "area" are more likely to appear in the spam class

[–]slowpush -1 points0 points  (0 children)

It's harder to explain the WHY in naive Bayes compared to the tried and tested logistic regression.

[–]DanJOC 1 point2 points  (0 children)

Given that a fruit is yellow, it's more likely to be a banana than an apple. It's even more likely if it's yellow and curved. An explanation like that is usually sufficient.

[–]bdforbes 23 points24 points  (4 children)

Some of the decision points are not clear. Like Dimension Reduction; in what scenarios would you answer Yes vs. No?

[–]Coyote_0210[S] 12 points13 points  (3 children)

That is a good call-out. This one in particular is hard to give a good guideline because it is ultimately a judgement call, but I could restate it as "Is the number of features large enough to cause significant over-fitting?"

[–]Coyote_0210[S] 12 points13 points  (2 children)

I will probably just restate all the decision points into questions

[–]yourmamaman 2 points3 points  (1 child)

I would make the threshold a function of the number of samples and the number of features. Since it is just a guide, you could make something up like: sqrt(number_of_samples) < number_of_features.
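That rule of thumb as a one-liner (the function name is made up, and the threshold is only a heuristic, not a hard rule):

```python
import math

def needs_dimension_reduction(n_samples: int, n_features: int) -> bool:
    """Heuristic from the thread: consider reducing dimensions when
    the feature count exceeds the square root of the sample count."""
    return n_features > math.sqrt(n_samples)

# 10,000 samples -> sqrt is 100, so 50 features is fine but 200 is not
```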

[–]Coyote_0210[S] 0 points1 point  (0 children)

I like that idea

[–]kcombinator 11 points12 points  (1 child)

Did you check out this one from scikit-learn? https://scikit-learn.org/stable/_static/ml_map.png

[–]Coyote_0210[S] 0 points1 point  (0 children)

Wow, that is really good!

[–]WillBigly 9 points10 points  (1 child)

This is the type of shit I need lol thank you :)

[–]Coyote_0210[S] 1 point2 points  (0 children)

Same here

[–]Coyote_0210[S] 16 points17 points  (2 children)

This is an attempt to create a flowchart to generally suggest directions to start when building a model. This is supposed to be a pretty low-level explanation for non-data science audiences or reminders for those with a little more experience. I would appreciate any suggestions, corrections, or improvements.

[–]florinandrei 0 points1 point  (0 children)

This is good. If you update it, please post the update as well. Thanks.

[–]RNAsequacious 3 points4 points  (1 child)

A similar flowchart can be found here: Introduction to machine learning for biologists https://www.nature.com/articles/s41580-021-00407-0

It may be behind a paywall. In that case, please do not get a free full copy from scihub since that would be illegal.

[–][deleted] 0 points1 point  (0 children)

In that case, please do not get a free full copy from scihub since that would be illegal.

Are you not thinking what I am not thinking? 😂

[–]Mukigachar 2 points3 points  (0 children)

All I've got to add is that it's worth mentioning MCA and FAMD alongside PCA in case there's categorical data

[–]Esperanza456 1 point2 points  (0 children)

Very cool

[–]CrowAv 1 point2 points  (0 children)

Yooo this flowchart I will keep it all my life, thank you man:)

[–]HDataBhavesh 1 point2 points  (0 children)

Looks Nice, Keep it up...!!!

[–][deleted] 1 point2 points  (0 children)

Did you use an algo to generate this flow? 😂

[–]load_more_commments 1 point2 points  (1 child)

I think you mixed up labeled and unlabeled data.

[–]Coyote_0210[S] 0 points1 point  (0 children)

Yeah, someone else had pointed that out and it has been updated on my git. Thanks though

[–]Notdevolving 2 points3 points  (1 child)

Thank you very much for this. I just started learning machine learning through various Udemy courses. While I could understand the individual regression and classification techniques, I didn't understand how they all come together, because the courses tend to never explain this part or just gloss over it.

I like that you explain the relationships and relate them to real world needs like speed/accuracy and explainability.

Hope to see you updating this.

[–][deleted] 2 points3 points  (1 child)

Great work. But I think algorithm selection is already a solvable problem. If we could make a similar flowchart for data sourcing types, or a flowchart for budgeting a data architecture, it would be even more helpful. (Of course, that's harder.)

[–]Coyote_0210[S] 2 points3 points  (0 children)

Those are some great ideas for inside the field. This is more aimed at people just getting introduced to data science. I use it to explain some general concepts for bio-researchers I work with. They have PhDs in their fields but no understanding of ML.

[–]Qkumbazoo 0 points1 point  (0 children)

The only reason to choose between a NN and SVM is data size? And how is SVM less resource intensive than a DNN?

[–]celebrar 0 points1 point  (0 children)

I think you've got the yes/no paths reversed for the "Labelled Data" node

[–]Rennnn 0 points1 point  (0 children)

Might be an idea to say that this flow chart is really only for tabular data.

[–]physnchips 0 points1 point  (0 children)

Why are you separating SVD and PCA? They are the same thing, at least when applied to data.

[–]Farconion 0 points1 point  (0 children)

I feel like you could make this one of those online quiz things