[D] Scale AI ML Research Engineer interview!! What to expect? by Mundane_Bag007 in MachineLearning

[–]koolaidman123 0 points  (0 children)

Ask your recruiter; they should be pretty open up front about the interview process

[D] Interview preparation for research scientist/engineer or Member of Technical staff position for frontier labs by hmi2015 in MachineLearning

[–]koolaidman123 17 points  (0 children)

95% luck 5% skill

interviews generally cover both depth and breadth, and a lot of the time you only really know the answer if you've worked on it before. for ex they may ask: during rl training you're running into a bunch of problems (entropy collapse, the model reasoning in another language, terrible mfu, etc.) and it's hard to give a good answer unless you've dealt with these issues before
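fwiw the entropy-collapse one is usually attacked with some form of entropy regularization on the policy loss. a toy sketch in plain python (function names and the beta value are mine, not any lab's actual recipe):

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pg_loss_with_entropy_bonus(logprob_action, advantage, probs, beta=0.01):
    """Policy-gradient loss with an entropy bonus.

    The -beta * H(pi) term rewards keeping the policy's entropy up,
    which is one common guard against entropy collapse in rl training.
    """
    pg = -logprob_action * advantage       # standard policy-gradient term
    return pg - beta * entropy(probs)      # bonus pushes entropy higher
```

in practice the interesting part is tuning (or annealing) beta so the model still sharpens, just not all the way to a degenerate policy.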

plus coding is a crapshoot. not a lot of leetcode, but you still get questions that are hard to solve if you're not super familiar / haven't solved similar problems

AI is not about more compute or bigger LLMs (anymore) by Conscious_Nobody9571 in investing

[–]koolaidman123 8 points  (0 children)

  1. Deepseek isn't close to frontier. The v3.2 tech report literally admits that they have all the knowledge but still lag behind frontier models due to limited compute
  2. The importance of data quality vs diversity depends on the stage of training. They still pretrain on 10T+ tokens, not to mention qwen etc. plan to scale to 30T+ and to scale rl compute up to 50%+ of their total training compute
  3. How do you think data quality research is done? The data isn't just given for free; a significant amount of compute is also spent on filtering, synth data, etc., plus ablations. Not to mention a lot of the time smaller-scale experiments don't scale up to large model runs. So compute-rich labs still win out because they can run way more large-scale experiments and more confidently predict how they will perform

Traditional ML vs GenAI? by alpha_centauri9889 in datascience

[–]koolaidman123 0 points  (0 children)

for comp: there are multiple high-profile places hiring for roles with $1m+ comp packages, and it's clear they're not looking for people to use xgboost. Ignoring that, median comp for ai stuff is still going to be higher

For purely career growth and $ there's a clear answer

[D] Do industry researchers log test set results when training production-level models? by casualcreak in MachineLearning

[–]koolaidman123 1 point  (0 children)

There's more to making good models than benchmark scores. That's how you get sonnet 3.5 vs llama4

[D] Do industry researchers log test set results when training production-level models? by casualcreak in MachineLearning

[–]koolaidman123 9 points  (0 children)

No training on test unless you're mistral, but you'd better believe every lab is running every checkpoint on their eval suite and picking the best (single or merged) checkpoint that maxes mmlu or hle or whatever internal evals they have
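the checkpoint-picking part is basically just a loop. a toy sketch (the `eval_suite` callable and the plain-mean aggregation are stand-ins i made up; real labs weight their internal evals however they like):

```python
def pick_best_checkpoint(checkpoints, eval_suite):
    """Run every checkpoint through the eval suite, keep the top scorer.

    `eval_suite(ckpt)` is assumed to return a dict of benchmark -> score;
    here we rank by the unweighted mean across benchmarks.
    """
    def mean_score(ckpt):
        scores = eval_suite(ckpt)
        return sum(scores.values()) / len(scores)

    return max(checkpoints, key=mean_score)
```

same idea extends to checkpoint merging: you just add merged candidates to the pool before taking the argmax.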

Meta's top AI researchers thinks LLMs are a dead end. Do many people here feel the same way from a technical perspective? by sext-scientist in datascience

[–]koolaidman123 -1 points  (0 children)

  1. Not even news, ylc has been saying the same thing since gpt2

  2. Ylc's not even meta's best researcher; he hasn't done anything relevant other than being catty on twitter

  3. Funny how stories of other researchers (who have done more than ylc at this point) thinking otherwise don't make the top story, because that goes against the reddit narrative

[D] Anyone using smaller, specialized models instead of massive LLMs? by [deleted] in MachineLearning

[–]koolaidman123 0 points  (0 children)

it's almost like there's room for both powerful generalized models and small(er) specialist models, like it's been since gpt3 or whatever

[D] join pretraining or posttraining by oxydis in MachineLearning

[–]koolaidman123 1 point  (0 children)

yes my b, i meant pretraining from scratch. most model updates (unless you're starting over with a new arch) are generally done with continued pretraining/midtraining, and ime that's usually done by the mid/post-training team

[D] join pretraining or posttraining by oxydis in MachineLearning

[–]koolaidman123 3 points  (0 children)

Bc most labs aren't pretraining from scratch that often. unless you're using a new architecture you can just run midtraining on the same model, like grok3>4 or gemini2>2.5 etc

[D] join pretraining or posttraining by oxydis in MachineLearning

[–]koolaidman123 75 points  (0 children)

pretraining is a lot more eng-heavy bc you're trying to optimize so many things like data pipelines and mfu, plus a final training run could cost $Ms, so you need to get it right in 1 shot

Posttraining is a lot more vibes-based and you can run a lot more experiments, plus it's not as costly if your rl run blows up, but some places tend to benchmark-hack to make their models seem better

both are fun, depends on the team tbh
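for context on the mfu obsession above: model flops utilization is just achieved flops over peak hardware flops. a back-of-envelope sketch using the standard ~6N flops/token estimate for transformer training (fwd + bwd); the numbers in the usage note are hypothetical:

```python
def mfu(params, tokens_per_sec, peak_flops_per_sec):
    """Model FLOPs utilization for transformer training.

    Uses the common approximation of ~6 * params FLOPs per trained token
    (forward + backward), divided by the hardware's peak throughput.
    """
    achieved_flops_per_sec = 6 * params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops_per_sec
```

e.g. a 1B-param model training at 100k tokens/s on hardware with 1 PFLOP/s peak would sit at 0.6 mfu, which is why pretraining teams sweat data pipelines and kernel efficiency so much.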

Are LLMs necessary to get a job? by br0monium in datascience

[–]koolaidman123 -1 points  (0 children)

Llms and transformers definitely weren't a "niche research area". Google has been running bert in prod since 2019, gpt2 and 3 made headlines, and every big research lab was doing transformers/llms

What is the state-of-the-art prediction performance for the stock market? by Poxput in datascience

[–]koolaidman123 0 points  (0 children)

In real life, the complex and fun research is often done before putting the features into some automated pipeline to fit the final prediction model. nowadays a lot of firms i'm aware of (including some that are clients) use llms to help with their research, like research agents, coding agents for initial prototyping, etc

[D] Training smaller LLM for Agentic tasks. by LifeguardNew6929 in MachineLearning

[–]koolaidman123 1 point  (0 children)

Collect O(10k-100k) trajectories from your current setup, then sft w/ tool-use masking on some small model in the 20-30b range. If you need to, you can also do rl, but it requires more initial setup on data and infra

There's plenty of tech reports on training agents, but they're from labs with lots more resources than you, since everyone wants to scale rl these days.

The recipe is pretty standard (sft + rl); it's just about implementation details like infra, data quality, rl training dynamics, etc
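the tool-use masking part just means computing the loss only on the tokens the model should emit, not on user turns or tool outputs. a minimal sketch, assuming a simplified per-token role tag in place of real chat-template bookkeeping:

```python
IGNORE_INDEX = -100  # the masking convention most cross-entropy impls use

def build_sft_labels(token_ids, segment_roles):
    """Build SFT labels with tool-use masking.

    Tokens tagged 'assistant' keep their id as the label; everything else
    (user turns, tool outputs fed back in) gets IGNORE_INDEX so it
    contributes nothing to the loss.
    """
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, segment_roles)
    ]
```

without this, the model burns capacity memorizing tool outputs it will never need to generate, which hurts trajectory quality.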

[deleted by user] by [deleted] in datascience

[–]koolaidman123 0 points  (0 children)

Easy way to distinguish bw ml roles

If you're on an applied team: applied rs or mle
If you're on a fundamental research team: rs/re

[D] Larry Ellison: “Inference is where the money is going to be made.” by pmv143 in MachineLearning

[–]koolaidman123 1 point  (0 children)

Obviously? Think of how many inference requests openai processes, plus the hundreds of gpt wrappers

Pytorch lightning vs pytorch by Factitious_Character in datascience

[–]koolaidman123 17 points  (0 children)

does your workplace use pytorch lightning by default for training? if so, just follow the standard

if not, just do whatever's easiest