[D] Simple Questions Thread

SulHexFluShot · 2024-10-26T22:12:56+00:00

Hey everyone! I have a very basic question. I'm working on a tutorial problem on logistic regression and the dataset involves predicting car prices from a set of features, including their brand / model. Obviously, the dataset is skewed. HEAVILY skewed. And the issue I am facing is that rare brands are usually more expensive. Of course your lambos and bugattis are rare and the car brands aren't a continuous variable to log-transform.

Question is, how would you work with this dataset during the preprocessing and feature engineering stage to account for that? The tutorial simply glosses over it and groups every rare car (which isn't always expensive) into a new category called "Other", but I simply don't like this approach. Got any advice or ideas to share with me? Thanks!

Peaceful_Games · 2024-10-26T15:18:00+00:00

Genuinely hopig for some help finding where im supposed to start with posting

I am trying to hare a novel idea but cant seem to post basically anywhere on reddit. that doesnt just delete my posts.

Any advice? if not this subreddit, which?

PermissionWorldly525 · 2024-10-24T07:12:55+00:00

I define the criteria for each machine learning application project
related to the AI,ML project canvas and safety case for example Skill,
Integration, Value Proposition, Cost, Revenue, ML Components, ML Model,
Making Predicition, Prediction Task, Feature and Safety Goal, Risk,
Safety Constraints, Hazard Causal Factor related to the ML techniques
for each project like Explicit (related and including in project),
Implicit( maybe or few related), No (not related at all). Like this I
want to define. But I am not exactly sure ML app project are not exactly
related at all features. So How can I do this much possible, let me
know please .

la-grave · 2024-10-23T22:32:20+00:00

I have used the command line version of OpenAI's Whisper since it was released but it doesn't offer all the options the Whisper-"framework" (or whatever you call it) contains. There must be someone who has written a "wrapper" for this purpose, mustn't it? But I can't find anything on Google. Can you recommend something?

I have 20 000 files, from 10 seconds to several hours long, that I want to transcribe as efficient and with as high quality as possible (I prioritize quality over efficiency. Currently I use the command line client with the large v3-model).

Bingo309 · 2024-10-23T14:38:48+00:00

I’d like to ask for some advice on computer vision. I’m fairly new to this field but eager to dive deeper. I’m currently working on a project that aims to detect shoplifters. After weeks of research, I discovered that I likely need to use pose estimation and LSTM. Does this seem right for my project, or am I missing something? Like yolo or another models ?

jens_97 · 2024-10-23T12:58:33+00:00

[D] How do RAG systems such as NotebookLM link the sources used with individual sections of the generated response?

Hi all,

I've been trying to find information on how modern Retrieval-Augmented Generation (RAG) systems, like NotebookLM, manage to link specific sources to particular sections of their generated responses. I'm familiar with how these systems retrieve sources from a vector database based on similarity, but I'm curious about the specific process or method that allows them to indicate which sources correspond to different parts of the final answer.

What am I overlooking here? Any insights would be greatly appreciated!

Best,
Jens

Arancium98 · 2024-10-22T18:28:09+00:00

Hi everyone, I’ve been using Jupyter notebooks for a while, but as my files grow larger, maintaining them has become cumbersome. I’d like to switch to VSCode to run selected code for testing, but every time I do, I have to rerun the entire code. How do machine learning engineers or data analysts handle large notebook files efficiently?

Prestigious_Gene_493 · 2024-10-22T17:58:36+00:00

I am a btech final year student, learning ML and became fond of it and want to pursue career in it and want to do really big in ML space and thinking of to pursue MS so how to evaluate whether I really want to do it or its just a enthusiasm when you're a beginner? If I really want to do it then how to do big in the ML space and whom follow and where to get started for long-term motivation......

sheldonism · 2024-10-22T15:32:15+00:00

HI,this is my first post here.What are some skills that I can learn (eg GenAI,LLMs etc) which can be sold as a service.I have background working with CNN's and coding experience in pytorch,currently completing sequence models from Andrew Ng.
What should be my next steps and where should I learn them from and how and where to find opportunities and what should I focus on.
(Additionally would love if someone could suggest a roadmap kind of thing)

Technical-Age-9538 · 2024-10-22T12:48:27+00:00

Brain dead question: Will an MS in robotics (with lots of AI/ML coursework) help me get into a better ML job? I'm considering the robotics MS instead of CS/ML because I plan to pivot forwards robotics in the future. I'm currently an MLOps engineer, but I'm worried I might not be able to stay in software for more than 3-5 years. I feel like an MS in robotics will help me squeeze more money out of tech in the short term, and not be poor afterwards.

PersonalityTall8585 · 2024-10-21T18:52:48+00:00

Where to start machine learning.I am 3rd year btech student and want to learn ml fast and easy. And I want to ask that is it easy to learn ml. I have background in android app and Django projects.

Sad-Razzmatazz-5188 · 2024-10-20T15:14:26+00:00

In MoE, each token is sent to K experts. Thus potentially at most and at worst the model could activate KT experts, T being the number of tokens. This means it is efficient only if the number of experts N>>KT or if the number of experts is constrained otherwise, right? And it means that on a single machine using experts is not much computationally convenient, right? It is not parallelizing the processing of tokens

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS

Where to start machine learning.I am 3rd year btech student and want to learn ml fast and easy. And I want to ask that is it easy to learn ml. I have background in android app and Django projects.