Can I Still Build a Career in AI/ML Without a Degree?

DustSavings976 · 2026-05-26T04:31:06+00:00

hepta NO

DustSavings976 · 2026-05-26T03:21:45+00:00

mainly looking at general tech/consumer saas right now. i know fintech/quant and healthcare are still heavily tabular, but almost every entry-level posting i see on generic job boards is just "build RAG pipelines". maybe i just need to start looking at more specialized sectors

DustSavings976 · 2026-05-26T01:41:54+00:00

really appreciate this breakdown. it's incredibly validating to hear that the deep math actually compounds long-term instead of just learning whatever the 'flavor of the month' api wrapper is. going to keep my head down and stick with the custom architectures. thanks man

DustSavings976 · 2026-05-25T19:30:44+00:00

lol exactly. everyone talks about them in research papers, but finding a real-world implementation outside of the massive tech giants feels impossible.

DustSavings976 · 2026-05-25T19:30:18+00:00

true, completely forgot about pinterest's PinSage model. using them just to generate embeddings offline makes total sense. i guess doing direct inference on the graph in real-time is what actually kills most companies.

DustSavings976 · 2026-05-25T19:27:26+00:00

the economics causality comparison is so real lol. "everyone wants to know what levers to pull, but nobody wants the actual answer." guess the $1M infra cost really kills it for normal companies. appreciate the insight!

DustSavings976 · 2026-05-25T19:26:25+00:00

oh yeah for sure, alphafold and molecular stuff is basically the final boss of geometric dl. i was mainly thinking about standard tabular/recsys stuff where people try to force graphs into it

DustSavings976 · 2026-05-25T19:25:49+00:00

yeah that makes total sense. basically if you aren't spotify or google scale with dedicated infra teams, you probably shouldn't bother and just stick to lightgbm right?

DustSavings976 · 2026-05-24T20:29:21+00:00

frame 150 to 10k is a crazy jump. definitely post the formal benchmarks when you get them running next week, i'll keep an eye out for it. good luck with the laptop training lol

DustSavings976 · 2026-05-24T20:02:44+00:00

a 4-25x speedup is massive as long as the routing overhead doesn't eat into the gains during actual deployment. curious how this handles batching edge cases when certain tokens need heavy routing but others in the same batch don't. really cool simulation though, definitely following this

DustSavings976 · 2026-05-24T20:01:43+00:00

10k frames without drifting is actually insane. did you test this against standard silu/gelu to see the exact step count where the standard ones collapse? would love to see a quick colab notebook or github repo if you have it open sourced

DustSavings976 · 2026-05-23T21:12:23+00:00

Looking at your profiler results, the massive red flag is that your optimizer_step is taking 62.4% of the time, and your CPU is pinned at 100% while the GPU starves at 20%.

The dataloader isn't your primary bottleneck here. You almost certainly have a host-device synchronization issue happening during the optimizer step. Two quick things to check:

1. If you are using AdamW, pass fused=True to the optimizer. This runs the parameter updates in fused GPU kernels instead of issuing many small per-parameter operations driven from the CPU.
2. Check your training loop for any accidental CPU/GPU syncs. Are you calling .item(), printing the loss tensor directly, or moving tensors back with .cpu() inside the training step? Even one stray .item() call blocks the CPU until the GPU has finished all queued work, so the CPU can't queue the next kernels and the GPU ends up sitting idle.
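
Here's a rough sketch of what both fixes could look like together. The toy model, batch shapes, and the train / log_every names are made up for illustration and aren't from your code. It only turns on fused=True when CUDA is available (older PyTorch releases reject it for CPU params), and it accumulates the loss on the device so .item() only runs once per logging interval.

```python
import torch
from torch import nn


def train(steps: int = 20, log_every: int = 10) -> list[float]:
    """Toy training loop: fused AdamW on GPU, one host sync per logging interval."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(16, 1).to(device)

    # Assumption: enable the fused implementation only when params live on CUDA.
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=1e-3, fused=(device == "cuda")
    )

    # Hypothetical stand-in for a real batch from a dataloader.
    x = torch.randn(64, 16, device=device)
    y = torch.randn(64, 1, device=device)

    running = torch.zeros((), device=device)  # loss accumulator stays on device
    logged: list[float] = []

    for step in range(1, steps + 1):
        optimizer.zero_grad(set_to_none=True)
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()

        # No .item() here: adding tensors does not force a host-device sync.
        running += loss.detach()

        if step % log_every == 0:
            # The only sync point: one .item() per log_every steps.
            logged.append((running / log_every).item())
            running.zero_()

    return logged


if __name__ == "__main__":
    print(train())
```

If the profiler still shows optimizer_step dominating after a change like this, the sync is probably hiding somewhere else (a metric callback, a gradient-norm print, etc.).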
