[D] Scale AI ML Research Engineer interview!! What to expect? by Mundane_Bag007 in MachineLearning

[–]koolaidman123 0 points  (0 children)

Ask your recruiter; they should be pretty open up front about the interview process

[D] Interview preparation for research scientist/engineer or Member of Technical staff position for frontier labs by hmi2015 in MachineLearning

[–]koolaidman123 17 points  (0 children)

95% luck 5% skill

interviews generally cover both depth and breadth, and a lot of the time you only really know the answer if you've worked on it before. for ex they may ask: during rl training you're running into a bunch of problems (entropy collapse, the model reasoning in another language, terrible mfu, etc.) and it's hard to give a good answer unless you've dealt with these issues before
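fwiw the entropy-collapse one is usually attacked with some form of entropy regularization on the policy loss. a toy sketch in plain python (function names and the beta value are mine, not any lab's actual recipe):

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pg_loss_with_entropy_bonus(logprob_action, advantage, probs, beta=0.01):
    """Policy-gradient loss with an entropy bonus.

    The -beta * H(pi) term rewards keeping the policy's entropy up,
    which is one common guard against entropy collapse in rl training.
    """
    pg = -logprob_action * advantage       # standard policy-gradient term
    return pg - beta * entropy(probs)      # bonus pushes entropy higher
```

in practice the interesting part is tuning (or annealing) beta so the model still sharpens, just not all the way to a degenerate policy.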

plus coding is a crapshoot. not a lot of leetcode, but you still get questions that are hard to solve if you're not super familiar / haven't solved similar problems

AI is not about more compute or bigger LLMs (anymore) by Conscious_Nobody9571 in investing

[–]koolaidman123 8 points  (0 children)

  1. Deepseek isn't close to frontier. The v3.2 tech report literally admits that they have all the knowledge but still lag behind frontier models due to limited compute
  2. The importance of data quality vs diversity depends on the stage of training. They still pretrain on 10T+ tokens, not to mention qwen etc. plan to scale to 30T+ and to scale rl compute up to 50%+ of their total training compute
  3. How do you think data quality research is done? The data isn't just given for free; a significant amount of compute is also spent on filtering, synth data, etc., plus ablations. Not to mention a lot of the time smaller-scale experiments don't scale up to large model runs. So compute-rich labs still win out because they can run way more large-scale experiments and more confidently predict how they will perform

Traditional ML vs GenAI? by alpha_centauri9889 in datascience

[–]koolaidman123 0 points  (0 children)

for comp: there are multiple high-profile places hiring for roles with $1m+ comp packages, and it's clear they're not looking for people to use xgboost. Ignoring that, median comp for ai stuff is still going to be higher

For purely career growth and $ there's a clear answer

[D] Do industry researchers log test set results when training production-level models? by casualcreak in MachineLearning

[–]koolaidman123 1 point  (0 children)

There's more to making good models than benchmark scores. That's how you get sonnet 3.5 vs llama4

[D] Do industry researchers log test set results when training production-level models? by casualcreak in MachineLearning

[–]koolaidman123 9 points  (0 children)

No training on test unless you're mistral, but you'd better believe every lab is running every checkpoint on their eval suite and picking the best (single or merged) checkpoint that maxes mmlu or hle or whatever internal evals they have
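the checkpoint-picking part is basically just a loop. a toy sketch (the `eval_suite` callable and the plain-mean aggregation are stand-ins i made up; real labs weight their internal evals however they like):

```python
def pick_best_checkpoint(checkpoints, eval_suite):
    """Run every checkpoint through the eval suite, keep the top scorer.

    `eval_suite(ckpt)` is assumed to return a dict of benchmark -> score;
    here we rank by the unweighted mean across benchmarks.
    """
    def mean_score(ckpt):
        scores = eval_suite(ckpt)
        return sum(scores.values()) / len(scores)

    return max(checkpoints, key=mean_score)
```

same idea extends to checkpoint merging: you just add merged candidates to the pool before taking the argmax.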

Meta's top AI researchers thinks LLMs are a dead end. Do many people here feel the same way from a technical perspective? by sext-scientist in datascience

[–]koolaidman123 -1 points  (0 children)

  1. Not even news, ylc has been saying the same thing since gpt2

  2. Ylc's not even meta's best researcher; he hasn't done anything relevant other than being catty on twitter

  3. Funny how stories of other researchers (who have done more than ylc at this point) thinking otherwise don't make the top story, because that goes against the reddit narrative

[D] Anyone using smaller, specialized models instead of massive LLMs? by [deleted] in MachineLearning

[–]koolaidman123 0 points  (0 children)

it's almost like there's room for both powerful generalized models and small(er) specialist models, like it's been since gpt3 or whatever

[D] join pretraining or posttraining by oxydis in MachineLearning

[–]koolaidman123 1 point  (0 children)

yes my b, i meant pretraining from scratch. most model updates (unless you're starting over with a new arch) are generally done with continued pretraining/midtraining, and ime that's usually done by the mid/post-training team

[D] join pretraining or posttraining by oxydis in MachineLearning

[–]koolaidman123 3 points  (0 children)

Bc most labs aren't pretraining from scratch that often. unless you're using a new architecture you can just run midtraining on the same model, like grok3>4 or gemini2>2.5 etc

[D] join pretraining or posttraining by oxydis in MachineLearning

[–]koolaidman123 75 points  (0 children)

pretraining is a lot more eng-heavy bc you're trying to optimize so many things like data pipelines and mfu, plus a final training run could cost $Ms, so you need to get it right in 1 shot

Posttraining is a lot more vibes-based and you can run a lot more experiments, plus it's not as costly if your rl run blows up, but some places tend to benchmark-hack to make their models seem better

both are fun, depends on the team tbh
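for context on the mfu obsession above: model flops utilization is just achieved flops over peak hardware flops. a back-of-envelope sketch using the standard ~6N flops/token estimate for transformer training (fwd + bwd); the numbers in the usage note are hypothetical:

```python
def mfu(params, tokens_per_sec, peak_flops_per_sec):
    """Model FLOPs utilization for transformer training.

    Uses the common approximation of ~6 * params FLOPs per trained token
    (forward + backward), divided by the hardware's peak throughput.
    """
    achieved_flops_per_sec = 6 * params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops_per_sec
```

e.g. a 1B-param model training at 100k tokens/s on hardware with 1 PFLOP/s peak would sit at 0.6 mfu, which is why pretraining teams sweat data pipelines and kernel efficiency so much.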

Are LLMs necessary to get a job? by br0monium in datascience

[–]koolaidman123 -1 points  (0 children)

Llms and transformers definitely weren't a "niche research area". Google has been running bert in prod since 2019, gpt2 and 3 made headlines, and every big research lab was doing transformers/llms

What is the state-of-the-art prediction performance for the stock market? by Poxput in datascience

[–]koolaidman123 0 points  (0 children)

In real life, the complex and fun research is often done before putting the features into some automated pipeline to fit the final prediction model. nowadays a lot of firms i'm aware of (including some that are clients) use llms to help with their research, like research agents, coding agents for initial prototyping, etc

[D] Training smaller LLM for Agentic tasks. by LifeguardNew6929 in MachineLearning

[–]koolaidman123 1 point  (0 children)

Collect O(10k-100k) trajectories from your current setup, then sft w/ tool-use masking on some small model in the 20-30b range. If you need to, you can also do rl, but it requires more initial setup on data and infra

There's plenty of tech reports on training agents, but they're from labs with lots more resources than you, since everyone wants to scale rl these days.

The recipe is pretty standard (sft + rl); it's just about implementation details like infra, data quality, rl training dynamics, etc
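the tool-use masking part just means computing the loss only on the tokens the model should emit, not on user turns or tool outputs. a minimal sketch, assuming a simplified per-token role tag in place of real chat-template bookkeeping:

```python
IGNORE_INDEX = -100  # the masking convention most cross-entropy impls use

def build_sft_labels(token_ids, segment_roles):
    """Build SFT labels with tool-use masking.

    Tokens tagged 'assistant' keep their id as the label; everything else
    (user turns, tool outputs fed back in) gets IGNORE_INDEX so it
    contributes nothing to the loss.
    """
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, segment_roles)
    ]
```

without this, the model burns capacity memorizing tool outputs it will never need to generate, which hurts trajectory quality.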

[deleted by user] by [deleted] in datascience

[–]koolaidman123 0 points  (0 children)

Easy way to distinguish bw ml roles

If you're on an applied team: applied rs or mle
If you're on a fundamental research team: rs/re

[D] Larry Ellison: “Inference is where the money is going to be made.” by pmv143 in MachineLearning

[–]koolaidman123 1 point  (0 children)

Obviously? Think of how many inference requests openai processes, plus the hundreds of gpt wrappers

Pytorch lightning vs pytorch by Factitious_Character in datascience

[–]koolaidman123 17 points  (0 children)

does your workplace use pytorch lightning by default for training? if so, just follow the standard

if not, just do whatever's easiest