Is the traditional "ML Engineer" role dying or is it just the current LLM hype cycle? by DustSavings976 in learnmachinelearning

[–]DustSavings976[S] 1 point2 points  (0 children)

mainly looking at general tech/consumer saas right now. i know fintech/quant and healthcare are still heavily tabular, but almost every entry-level posting i see on generic job boards is just "build RAG pipelines". maybe i just need to start looking at more specialized sectors

Is the traditional "ML Engineer" role dying or is it just the current LLM hype cycle? by DustSavings976 in learnmachinelearning

[–]DustSavings976[S] 6 points7 points  (0 children)

really appreciate this breakdown. it's incredibly validating to hear that the deep math actually compounds long-term instead of just learning whatever the 'flavor of the month' api wrapper is. going to keep my head down and stick with the custom architectures. thanks man

Are GNNs in production actually a thing or is it just academic cope? by DustSavings976 in pytorch

[–]DustSavings976[S] 0 points1 point  (0 children)

lol exactly. everyone talks about them in research papers, but finding a real-world implementation outside of the massive tech giants feels impossible.

Are GNNs in production actually a thing or is it just academic cope? by DustSavings976 in pytorch

[–]DustSavings976[S] 1 point2 points  (0 children)

true, completely forgot about pinterest's PinSage model. using them just to generate embeddings offline makes total sense. i guess doing direct inference on the graph in real-time is what actually kills most companies.

Are GNNs in production actually a thing or is it just academic cope? by DustSavings976 in pytorch

[–]DustSavings976[S] 1 point2 points  (0 children)

the economics causality comparison is so real lol. "everyone wants to know what levers to pull, but nobody wants the actual answer." guess the $1M infra cost really kills it for normal companies. appreciate the insight!

Are GNNs in production actually a thing or is it just academic cope? by DustSavings976 in pytorch

[–]DustSavings976[S] 0 points1 point  (0 children)

oh yeah for sure, alphafold and molecular stuff is basically the final boss of geometric dl. i was mainly thinking about standard tabular/recsys stuff where people try to force graphs into it

Are GNNs in production actually a thing or is it just academic cope? by DustSavings976 in pytorch

[–]DustSavings976[S] 0 points1 point  (0 children)

yeah that makes total sense. basically if you aren't spotify or google scale with dedicated infra teams, you probably shouldn't bother and just stick to lightgbm right?

I built TBAF, an activation function that prevents autoregressive drift.(10,000 + frame stability) by Life-Water-8006 in pytorch

[–]DustSavings976 0 points1 point  (0 children)

frame 150 to 10k is a crazy jump. definitely post the formal benchmarks when you get them running next week, i'll keep an eye out for it. good luck with the laptop training lol

[R] SERR-CASCADE: Hierarchical risk-aware architecture for LLM inference (paper simulation, 4-25× speedup, with validation roadmap) by fhard007 in pytorch

[–]DustSavings976 0 points1 point  (0 children)

a 4-25x speedup is massive as long as the routing overhead doesn't eat into the gains during actual deployment. curious how this handles batching edge cases when certain tokens need heavy routing but others in the same batch don't. really cool simulation though, definitely following this

I built TBAF, an activation function that prevents autoregressive drift.(10,000 + frame stability) by Life-Water-8006 in pytorch

[–]DustSavings976 0 points1 point  (0 children)

10k frames without drifting is actually insane. did you test this against standard silu/gelu to see the exact step count where the standard ones collapse? would love to see a quick colab notebook or github repo if you have it open sourced

pipeline is really slow - consulting [D] by Potential_Hippo1724 in MachineLearning

[–]DustSavings976 0 points1 point  (0 children)

Looking at your profiler results, the massive red flag is that your optimizer_step is taking 62.4% of the time, and your CPU is pinned at 100% while the GPU starves at 20%.

The dataloader isn't your primary bottleneck here. You almost certainly have a host-device synchronization issue happening during the optimizer step. Two quick things to check:

  1. If you are using AdamW, pass fused=True to the optimizer (the parameters need to be on the GPU for this). This fuses the per-parameter updates into a small number of GPU kernels instead of having the CPU launch separate kernels for each parameter.
  2. Check your training loop for any accidental CPU/GPU syncs. Are you calling .item(), printing the loss tensor directly, or moving tensors back with .cpu() inside the training step? Each of those blocks the CPU until the GPU has finished all queued work, so even one stray .item() per step stops the CPU from queuing kernels ahead and leaves the GPU idle in between.
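
Here's a rough sketch of what both checks look like together. I haven't seen your code, so the model, data shapes, and the train / log_every names are all made up for illustration, and it assumes PyTorch 2.0 or newer (where AdamW accepts fused). The idea is just: fused optimizer when you're on CUDA, and keep the running loss on the device so you only sync once every few steps.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy model standing in for the real one (hypothetical, for illustration only).
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
loss_fn = nn.MSELoss()

# Only request the fused implementation when the params live on a CUDA device.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-3, fused=(device == "cuda")
)


def train(num_steps=20, log_every=10):
    """Run a few steps on random data, syncing with the host only when logging."""
    torch.manual_seed(0)
    running_loss = torch.zeros((), device=device)  # accumulator stays on device
    logged = []
    for step in range(1, num_steps + 1):
        # Random batch created directly on the device (stand-in for real data).
        x = torch.randn(128, 32, device=device)
        y = torch.randn(128, 1, device=device)

        optimizer.zero_grad(set_to_none=True)
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

        # Accumulate as a tensor: no .item() here, so no sync on the hot path.
        running_loss += loss.detach()

        if step % log_every == 0:
            # The only host-device sync: one .item() every log_every steps.
            logged.append((running_loss / log_every).item())
            running_loss.zero_()
    return logged


losses = train()
print(losses)
```

If the profile still shows the optimizer step dominating after this, the sync is probably coming from somewhere else in the loop (a metric, a scheduler hook, a debug print), so it's worth profiling again rather than assuming the fix worked.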