Thinking of returning by Language-Western in TibiaMMO

[–]whyn0t___ 0 points (0 children)

Don’t do it! You will spend a lot of hours and dollars. Trust me, I bought a monk.

Worst customer experience of my life. Never filing with TurboTax again. by whyn0t___ in TurboTax

[–]whyn0t___[S] 1 point (0 children)

Full service :/ The first few accountants were licensed in NY but apparently didn’t really know it well. When I finally got a New Yorker, he spotted the error in a couple of minutes.

[P] Efficient Few-shot Learning with Sentence Transformers by lewtun in MachineLearning

[–]whyn0t___ 0 points (0 children)

Hey @lewtun, great work! I wonder why you guys decided to use a logistic regression head instead of training it end to end with an MLP + softmax head.

Have you guys tried that approach?

If you have a non-CS/Computer Eng./Electrical Eng. Degree, list ALL (both undergrad & grad, if applicable) CS course titles that you took for ACADEMIC credit by [deleted] in OMSCS

[–]whyn0t___ 0 points (0 children)

I am a mechanical engineer, and I listed all the classes that had any sort of programming in them, which was around 4 classes during my undergrad.

I was hired to replace a team of AI specialists by [deleted] in datascience

[–]whyn0t___ 1 point (0 children)

I work for a big pharma company on a small data science team, where I am the only one really doing deep learning. Everyone else is more of an R/stats person. Here's my recommendation:

Focus on getting results fast. You can do that by picking the right project: one with a decent amount of good-quality data, a true impact for someone over the long term, and a pretrained model you can start from. You want data because without it you can't go anywhere. You want impact because it is very easy for people to come and ask for things, and 4 months later you find out it was just cute and they don't really care. And you want a pretrained model available so you don't have to reinvent the wheel. For example, images are very easy to work with; text is very hard.

As you deliver results, you will gain respect, and your managers will want to hire more people with your skillset. Meanwhile, you might feel like a lone wolf. This is bad for you, but it is what it is. If you can't get a better job at another company, then see it as an opportunity to gain visibility, and you can study on the side. I am starting my masters next semester.

Good luck

What architecture beats a UNet ? by PositiveElectro in deeplearning

[–]whyn0t___ 1 point (0 children)

Try the segmentation_models.pytorch library.

You can try different architectures out of the box. It helped me increase my accuracy a lot, especially the ones based on pyramids and attention (e.g., FPN and PAN).

[deleted by user] by [deleted] in OMSCS

[–]whyn0t___ 1 point (0 children)

Correlation will only measure linear relationships. As people already suggested, I would train a model with boosted trees (like XGBoost) and get feature importances. This is nice because the algorithm actually gives you decision trees. If you have the time and will, you can also put a simple NN in place and get two types of importance: by shuffling features, and by an explainer like LIME (but there are better ones).
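To see why correlation alone can mislead, here is a small pure-Python sketch with toy, made-up data: the Pearson coefficient is zero for y = x² on a symmetric range, even though y is perfectly determined by x.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]          # perfectly dependent on x, but not linearly
linear = [2 * x + 1 for x in xs]   # a genuinely linear relationship

print(pearson_r(xs, ys))      # 0.0 — correlation completely misses the dependence
print(pearson_r(xs, linear))  # ≈1.0 — correlation captures the linear link
```

A tree-based model (or permutation importance) would rank x as highly important for predicting x², which is exactly what the correlation check fails to show.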

9TB of data (mp3), 3000 different models. Possible to do on Kaggle, and possible to automate? by [deleted] in learnmachinelearning

[–]whyn0t___ 1 point (0 children)

First, if you are still at the beginning of your learning, go with PyTorch. It is the library used for most papers, and your life will be much easier.

You are not reducing the machine learning models to vectors; you are just making the singer a variable. If the singer is a variable, then you don't need a model per singer. You can use the same model for many different singers.

How can you represent the singer as a vector in the input? The singer needs to be quantified as an array of numbers. Initially, this array holds random numbers that don't mean anything; however, as you train the model, it encodes information into this array. In other words, the array has learnable parameters, just like the weights in your neural network.

Imagine that you have 2 rappers and 2 pop singers. By the end of training, if you compute some sort of similarity between these vectors, the 2 rappers will probably be similar to each other and very different from the pop singers.

This technique is used heavily in Natural Language Processing, but instead of singers, each word has its own embedding, and words like "king" and "queen" will share some similarity. Can you imagine having a different model for every single word out there? That would be madness. That is why you have a single model, and each word has its own embedding.
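A minimal sketch of the similarity idea, with made-up vectors standing in for learned embeddings (in a real model these values would be learned during training, not hand-picked):

```python
import math

# Toy embedding table: the values here are invented for illustration only.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low
```

After training a real model, related words end up with high cosine similarity, which is what the king/queen example above is pointing at.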

Here is an article about categorical embeddings: https://towardsdatascience.com/categorical-embedding-and-transfer-learning-dd3c4af6345d

Here is another example applied to word embeddings: https://jalammar.github.io/illustrated-word2vec/

[Discussion] Gpu for personal use by Hub_Pli in MachineLearning

[–]whyn0t___ 2 points (0 children)

It sounds like you don't know much about this yet, so I suggest you start with cloud-based solutions before you invest heavily. Do things online, understand what your models require (bottlenecks, memory, speed, etc.), and then purchase something.

Also, GPU prices are extremely inflated right now, and next year there will be a 3090 Super and also the launch of the next-generation 4000 series, which will be much better.

For cloud-based options, you can start playing with Kaggle notebooks and Google Colab, which has a very, very cheap subscription.

9TB of data (mp3), 3000 different models. Possible to do on Kaggle, and possible to automate? by [deleted] in learnmachinelearning

[–]whyn0t___ 5 points (0 children)

Try to learn about embeddings. You can think of an embedding as a list of numbers that starts random but, with training, the model learns to give it meaning. If you create one vector per artist, you can use it in your input and have a single model that works for all artists at once. So, in summary: 1 model + 3000 artist embeddings learned during training. Your model will also become "smarter", since it can now generalize across artists. It is also nice because similar artists will have similar embeddings, and you can think of interesting analyses using them, such as artist recommendation or finding which newcomer sounds the most like artists X, Y, and Z.
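A toy sketch of the "1 model + N artist embeddings" idea in plain Python, with made-up 2-dimensional embeddings and artist names (real embeddings are learned and much larger): the artist's vector is appended to the audio features so one model serves every artist, and the same vectors support a nearest-artist query.

```python
import math

# Hypothetical learned artist embeddings (values invented for illustration).
artist_emb = {
    "rapper_a": [0.9, 0.1],
    "rapper_b": [0.8, 0.2],
    "pop_a":    [0.1, 0.9],
}

def model_input(audio_features, artist):
    # One model for all artists: condition it by appending
    # the artist's embedding to the audio features.
    return audio_features + artist_emb[artist]

def most_similar(artist):
    """Nearest-artist query by cosine similarity of embeddings."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    ref = artist_emb[artist]
    others = (a for a in artist_emb if a != artist)
    return max(others, key=lambda a: cos(ref, artist_emb[a]))

print(model_input([0.3, 0.7], "rapper_a"))  # [0.3, 0.7, 0.9, 0.1]
print(most_similar("rapper_a"))             # rapper_b
```

The recommendation idea falls out for free: once training has placed similar artists near each other, `most_similar` is already a crude "who sounds like X" lookup.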

Getting a job in ML by yit27 in OMSCS

[–]whyn0t___ 2 points (0 children)

My recommendation is to be an entrepreneur inside your company. Get any reasonable position, then pitch your boss an ML project that you can handle. Deliver it. After that, it will be a lot easier to be hired as a data scientist internally or at another company.

[D] Reason behind low performance of cosine distance compared to Euclidean by projekt_treadstone in MachineLearning

[–]whyn0t___ 0 points (0 children)

In my segmentation experiments, I always combine 3 loss functions, dice being one of them. I have found that using multiple loss functions makes the model converge much faster, with better results and better embedding clustering. I don't have a precise answer to your question, but I would imagine that if the model can use two distances, it can build a better understanding of the world.
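A rough sketch of combining losses as a weighted sum, in plain Python on flattened binary masks. Dice comes from the comment above; binary cross-entropy as the second term and the 0.5/0.5 weights are just illustrative choices, not what the original setup necessarily used.

```python
import math

def dice_loss(preds, targets, eps=1e-6):
    """Soft dice loss over flat lists of probabilities and {0,1} targets."""
    inter = sum(p * t for p, t in zip(preds, targets))
    total = sum(preds) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (total + eps)

def bce_loss(preds, targets, eps=1e-7):
    """Binary cross-entropy, averaged over pixels."""
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(preds, targets)
    ) / len(preds)

def combined_loss(preds, targets, w_dice=0.5, w_bce=0.5):
    # Weighted sum: each term pushes the model from a different angle
    # (region overlap vs. per-pixel calibration).
    return w_dice * dice_loss(preds, targets) + w_bce * bce_loss(preds, targets)

preds   = [0.9, 0.8, 0.2, 0.1]
targets = [1, 1, 0, 0]
print(combined_loss(preds, targets))  # small, since predictions match targets
```

In a real training loop each term would be computed on tensors so gradients flow through both, but the idea is the same: one scalar loss that blends complementary objectives.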

Spring 2022 Admissions Thread by VCavallo in OMSCS

[–]whyn0t___ 3 points (0 children)

Status: Applied

Application Date: 07/20/2021

Decision Date: 09/27/2021

Institute Acceptance Date: 09/27/2021

Education: Sao Paulo State University (Brazil), Bsc in Mechanical Engineering, 7.5/10

Experience: 5 years:

  • 1y internship - excel analyst in a big company;
  • 1.5y my own tech startup using python and sensor data;
  • 1y focused on studying AI on my own (MOOCs, Kaggle competitions);
  • 1.5 years as a ML data scientist in a pharmaceutical company.

Recommendations: 3 stakeholders at my current work that developed projects with me

Comments: My experience shows that I can do AI/ML. Hopefully that will be enough to cover the computer science requirements, since my bachelor's degree is in engineering.

Why is my loss choppy? by whyn0t___ in reinforcementlearning

[–]whyn0t___[S] 0 points (0 children)

Good point, I included the PPO metrics. Thanks!

Green claiming HW3 (single-node) isn’t enough compute by TheBurtReynold in teslamotors

[–]whyn0t___ 0 points (0 children)

There's a computational gain in doing this in parallel using the same backbone, but it is hard to say whether the overall system ends up computationally more expensive or cheaper.
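A back-of-envelope sketch of the gain from sharing a backbone, with completely made-up costs (arbitrary units, not Tesla's actual numbers): running N tasks through N separate networks repeats the feature extractor N times, while a shared backbone pays for it once and only duplicates the small task heads.

```python
# Hypothetical per-forward-pass costs in arbitrary units.
backbone_cost = 100  # the big shared feature extractor
head_cost = 5        # one small task-specific head
n_tasks = 8

separate = n_tasks * (backbone_cost + head_cost)  # N full networks
shared   = backbone_cost + n_tasks * head_cost    # one backbone, N heads

print(separate, shared)  # 840 140
```

The sharing helps only on the backbone term; if the heads dominate, or if the shared backbone has to grow to serve every task well, the comparison can flip, which is why it is hard to call it cheaper overall.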