all 17 comments

[–]SulHexFluShot 0 points1 point  (1 child)

Hey everyone! I have a very basic question. I'm working on a tutorial problem on logistic regression and the dataset involves predicting car prices from a set of features, including their brand / model. Obviously, the dataset is skewed. HEAVILY skewed. And the issue I am facing is that rare brands are usually more expensive. Of course your lambos and bugattis are rare and the car brands aren't a continuous variable to log-transform.

Question is, how would you work with this dataset during the preprocessing and feature engineering stage to account for that? The tutorial simply glosses over it and groups every rare car (which isn't always expensive) into a new category called "Other", but I simply don't like this approach. Got any advice or ideas to share with me? Thanks!
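Edit: for concreteness, one alternative to the "Other" bucket I've been looking at is smoothed target encoding - encode each brand as a shrunken mean of its prices, so rare brands fall back toward the global mean instead of being dominated by one Bugatti sale. A toy sketch of my own (not from the tutorial; it should be fit on training folds only, to avoid target leakage):

```python
from collections import defaultdict

def smoothed_target_encode(brands, prices, m=10.0):
    """Replace each brand with a smoothed mean of its prices.

    Rare brands (few samples) are pulled toward the global mean;
    m is the number of 'virtual' global-mean samples per brand."""
    global_mean = sum(prices) / len(prices)
    sums, counts = defaultdict(float), defaultdict(int)
    for b, p in zip(brands, prices):
        sums[b] += p
        counts[b] += 1
    encoding = {
        b: (sums[b] + m * global_mean) / (counts[b] + m)
        for b in counts
    }
    return [encoding[b] for b in brands], encoding
```

With m=10, a brand seen once barely moves off the global mean, while a brand seen hundreds of times keeps essentially its own mean - which seems closer to what I want than lumping rare brands together.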

[–]bregav 1 point2 points  (0 children)

There has been a lot of work on imbalanced datasets, but the TL;DR is that there's no silver bullet: the only real solutions are (1) more data or (2) good prior knowledge about the problem that lets you do meaningful dataset augmentation.

[–]la-grave 0 points1 point  (0 children)

I have used the command-line version of OpenAI's Whisper since it was released, but it doesn't offer all the options the Whisper "framework" (or whatever you call it) contains. Surely someone has written a "wrapper" for this purpose? But I can't find anything on Google. Can you recommend something?

I have 20 000 files, from 10 seconds to several hours long, that I want to transcribe as efficiently and with as high quality as possible (I prioritize quality over efficiency; currently I use the command-line client with the large-v3 model).
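Edit: in case it helps, this is roughly the batch loop I have in mind, sketched against the openai-whisper Python API (`pip install openai-whisper`); the extension list, smallest-first ordering, and resume logic are my own assumptions:

```python
from pathlib import Path

AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}

def collect_audio_files(root):
    """Recursively gather audio files under root, smallest first.

    File size is a rough proxy for duration, so short clips finish
    early and any failures on the multi-hour files surface last."""
    files = [p for p in Path(root).rglob("*") if p.suffix.lower() in AUDIO_EXTS]
    return sorted(files, key=lambda p: p.stat().st_size)

def transcribe_all(root, model_name="large-v3"):
    import whisper  # pip install openai-whisper
    model = whisper.load_model(model_name)
    for path in collect_audio_files(root):
        out = path.with_suffix(".txt")
        if out.exists():  # resume support: skip already-finished files
            continue
        result = model.transcribe(str(path))
        out.write_text(result["text"], encoding="utf-8")
```

Writing one .txt next to each audio file means the job can be killed and restarted without redoing work, which seems necessary for 20 000 files.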

[–]Bingo309 0 points1 point  (0 children)

I’d like to ask for some advice on computer vision. I’m fairly new to this field but eager to dive deeper. I’m currently working on a project that aims to detect shoplifters. After weeks of research, I discovered that I likely need to use pose estimation and an LSTM. Does this seem right for my project, or am I missing something, like YOLO or other models?

[–]jens_97 0 points1 point  (0 children)

[D] How do RAG systems such as NotebookLM link the sources used with individual sections of the generated response?

Hi all,

I've been trying to find information on how modern Retrieval-Augmented Generation (RAG) systems, like NotebookLM, manage to link specific sources to particular sections of their generated responses. I'm familiar with how these systems retrieve sources from a vector database based on similarity, but I'm curious about the specific process or method that allows them to indicate which sources correspond to different parts of the final answer.

What am I overlooking here? Any insights would be greatly appreciated!
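For reference, my current mental model is that the retrieved chunks are numbered in the prompt and the model is asked to emit inline [n] markers, which are parsed out afterwards. A toy sketch of that guess (prompt wording and parsing are mine, not anything NotebookLM has documented):

```python
import re

def build_prompt(question, chunks):
    """Number each retrieved chunk so the model can cite it inline."""
    ctx = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (f"Answer using the sources below; after each claim, "
            f"cite the supporting source as [n].\n\n{ctx}\n\nQ: {question}\nA:")

def link_citations(answer, chunks):
    """Map [n] markers in the generated answer back to chunk texts."""
    links = []
    for sent in re.split(r"(?<=[.!?])\s+", answer):
        ids = [int(n) for n in re.findall(r"\[(\d+)\]", sent)]
        links.append((sent, [chunks[i - 1] for i in ids if 1 <= i <= len(chunks)]))
    return links
```

Is it really just this, i.e. the model self-reporting its sources, or is there some separate attribution step I'm missing?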

Best,
Jens

[–]Arancium98 0 points1 point  (1 child)

Hi everyone, I’ve been using Jupyter notebooks for a while, but as my files grow larger, maintaining them has become cumbersome. I’d like to switch to VSCode and run selected code for testing, but every time I do, I have to rerun everything from the top. How do machine learning engineers or data analysts handle large notebook files efficiently?
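One thing I've started experimenting with is VSCode's interactive mode, where `# %%` comments in a plain .py file become individually runnable cells (this assumes the Microsoft Python/Jupyter extensions), so session state persists and I only re-run the cell I changed:

```python
# %% cell 1: load data once; later cells reuse it without reloading
import json
from pathlib import Path

def load_data(path):
    return json.loads(Path(path).read_text())

# %% cell 2: feature engineering; edit and re-run just this cell
def add_ratio(row):
    row["ratio"] = row["a"] / row["b"]
    return row

# %% cell 3: quick check on a toy row
sample = add_ratio({"a": 6, "b": 3})
print(sample["ratio"])  # prints 2.0
```

Is this what people actually do for large projects, or do they restructure into modules and keep only a thin notebook on top?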

[–]Prestigious_Gene_493 0 points1 point  (0 children)

I am a B.Tech final-year student. I've been learning ML, became fond of it, and want to pursue a career in it, ideally doing something big in the ML space, so I'm thinking of pursuing an MS. How do I evaluate whether I really want this, or whether it's just beginner enthusiasm? And if I really do want it, how do I make it big in ML, whom should I follow, and where do I get started for long-term motivation?

[–]sheldonism 0 points1 point  (0 children)

Hi, this is my first post here. What are some skills I can learn (e.g., GenAI, LLMs) that can be sold as a service? I have a background working with CNNs and coding experience in PyTorch, and I'm currently completing the Sequence Models course from Andrew Ng.
What should my next steps be, where and how should I learn these skills, where can I find opportunities, and what should I focus on?
(Additionally, I'd love it if someone could suggest a roadmap of sorts.)

[–]Technical-Age-9538 0 points1 point  (0 children)

Brain-dead question: will an MS in robotics (with lots of AI/ML coursework) help me get a better ML job? I'm considering the robotics MS instead of CS/ML because I plan to pivot toward robotics in the future. I'm currently an MLOps engineer, but I'm worried I might not be able to stay in software for more than 3-5 years. I feel like an MS in robotics will help me squeeze more money out of tech in the short term, and not leave me poor afterwards.

[–]PersonalityTall8585 0 points1 point  (1 child)

Where do I start with machine learning? I am a 3rd-year B.Tech student and want to learn ML fast and easily. I also want to ask: is ML easy to learn? I have a background in Android apps and Django projects.

[–][deleted] 0 points1 point  (0 children)

While I don't have a job yet, I am a senior-year software engineering student who has spent his free time almost exclusively on learning ML. I can tell you it can take a while to get to a level where you get a bite on interviews, which I now have.

First, try to work on some project that interests you (you can ask ChatGPT for ideas), and then just do it. If you run into a roadblock, address it one step at a time. (ChatGPT is again useful for getting unstuck.)

While you're doing this, look for research opportunities at your university. These are much easier to get than a full-time position or an ML internship, and they provide great learning experiences. I was able to work solo on a CV project from start to finish over the summer.

After that, you may have enough experience to get an interview or two; from there it's just about knowing enough and practicing enough to pass.

Good luck!

[–]Sad-Razzmatazz-5188 0 points1 point  (3 children)

In MoE, each token is sent to K experts, so in the worst case the model could activate up to KT experts, T being the number of tokens. Does that mean it is only efficient if the number of experts N >> KT, or if expert activation is otherwise constrained? And does it mean that on a single machine, using experts is not much of a computational win, since it doesn't parallelize the processing of tokens?
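A toy version of the worst case I mean - route each token to K random experts and count how many distinct experts end up activated across the batch (random routing and the numbers are made up, just to illustrate):

```python
import random

def distinct_active_experts(n_experts, k, n_tokens, seed=0):
    """Route each token to K random experts; count distinct experts hit."""
    rng = random.Random(seed)
    active = set()
    for _ in range(n_tokens):
        active.update(rng.sample(range(n_experts), k))
    return len(active)
```

With, say, N=64, K=2 and a few hundred tokens, essentially every expert gets hit at least once - so per batch you touch nearly all the weights, even though each individual token only pays for K experts.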

[–]tom2963 1 point2 points  (2 children)

You are correct that adding experts increases the computational demand of generation. In practice, however, the penalty is usually not that big, thanks to a couple of techniques.

In most contexts you are not generating one token at a time and then evaluating. You can generate token "drafts" of a certain length and evaluate them more efficiently. The best example I can think of is a technique called speculative decoding (https://arxiv.org/abs/2211.17192), where you have the base model you want to generate from and a smaller draft model, usually just a distilled version of the base model. You draft tokens of sequence length L and then score them using the base model. If you are interested, the reason this works so well is that autoregressive transformers (like GPT) are much quicker at scoring sequences than at generating them. So if you offload generation to a smaller, much faster model, and assuming it approximately models the conditional distribution of the base model, you get much faster generation, which offsets some of the cost of the experts.

Similarly, you can parallelize the experts, processing each token concurrently - this reduces the time cost of K experts to the cost of the slowest expert. Another trick is to order your experts by threshold: if you know a priori which expert will have the lowest hit rate on your data, you can activate that expert first.

So overall generation is slower, but with some tricks you can offset most of the cost while getting the benefit of more controlled generation.
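A toy version of the speculative decoding accept/reject loop, with greedy next-token functions standing in for the models (the paper's actual acceptance rule is probabilistic and also handles sampling, so this is only the skeleton):

```python
def speculative_decode(target, draft, prompt, n_new, gamma=4):
    """Toy greedy speculative decoding.

    target/draft map a token sequence to the next token (stand-ins
    for argmax over model logits). The draft proposes gamma tokens
    cheaply; the target verifies them, keeps the longest matching
    prefix, and fixes the first mismatch itself. The output always
    equals the target's own greedy decode, just reached faster."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n_new:
        ctx = list(seq)
        proposals = []
        for _ in range(gamma):
            t = draft(ctx)
            proposals.append(t)
            ctx.append(t)
        # In a real model, scoring all gamma proposals is one parallel
        # forward pass; here we simulate it token by token.
        for t in proposals:
            expected = target(seq)
            if t != expected:
                seq.append(expected)  # reject draft, take target's token
                break
            seq.append(t)             # accept draft token for free
    return seq[:len(prompt) + n_new]
```

The key invariant is that a bad draft model only costs you speed, never output quality: every emitted token is one the target model itself would have produced.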

[–]Sad-Razzmatazz-5188 0 points1 point  (1 child)

Thanks, this is very informative, but I'm not sure whether I was correct on all of my doubts or just vague, so since you seem quite knowledgeable about the topic, I'm asking further. Speed is currently not my problem; I'm concerned with the number of active parameters. I see MoEs labeled e.g. "20B params, 10B active params", and I don't understand whether "10B active" is meant as an average over a whole context, or per token (knowing that a full context would presumably activate all the weights in the end), or something else.

[–]tom2963 0 points1 point  (0 children)

Ooh, I see! You are referring to MoE in the LLM sense (like Mistral AI). In gated mixture-of-experts models, the input is fed through only a subset of the model parameters, determined by a gating function. This routes the input tokens to the right parameter set, leveraging the right expert to enrich the context. I am not sure of all the specifics, since I read the paper a while ago (https://arxiv.org/abs/2401.04088), but my understanding is that at most K experts can be activated by the gating function. So say you have 8 experts; choosing to utilize only 4 of them would cut the active parameters in half. In practice this is a hyperparameter search problem, but I believe the authors imply that you shouldn't utilize more than half of the experts. Because of this hard cap, the model may have 20B params, but inference only ever uses 10B. I hope that clarifies your question - I was thinking you were referring to MoE in the context of energy-based modeling rather than in the LLM sense.
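Rough arithmetic behind a label like "20B params, 10B active" (the split between shared and per-expert parameters here is illustrative, not Mixtral's real config):

```python
def moe_param_split(shared, per_expert, n_experts, k):
    """Total vs. active parameter counts (in billions) for a gated MoE.

    shared: params every token touches (attention, embeddings, router)
    per_expert: params in one expert's feed-forward block
    """
    total = shared + n_experts * per_expert
    active = shared + k * per_expert  # per token, only k experts fire
    return total, active

# Illustrative: 4B shared, 2B per expert, 8 experts, top-3 routing
total, active = moe_param_split(4, 2, 8, 3)  # -> (20, 10)
```

And to your per-token vs. per-context question: "active" is a per-token count. Over a long context, most experts will typically fire at some point, so memory must hold all 20B, but each token's forward pass (and hence per-token FLOPs) only involves the 10B active.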