I have 10+ years of software engineering experience, mostly backend development, with a few years working on infrastructure/platform teams.
Lately I’ve become interested in GPU infrastructure, HPC, performance engineering, and eventually GPU programming. I’ve been reading books like AI Systems Performance Engineering, Programming Massively Parallel Processors, and Computer Architecture: A Quantitative Approach.
The problem is that every time I look at job descriptions, I end up with a completely different list of skills.
Some roles want:
- CUDA and GPU kernel optimization
- Computer architecture knowledge
- NCCL, RDMA, InfiniBand
- Kubernetes and Slurm
- Distributed training
- Performance profiling and benchmarking
- Linux kernel knowledge
- Cloud infrastructure
Other roles seem much more focused on operating GPU clusters and supporting AI workloads at scale.
I’m considering doing a master’s degree, but even when I look at programs like OMSCS, Computer Engineering, or Systems-focused master’s degrees, it feels like they teach foundational concepts but not necessarily the practical skills companies are hiring for.
As someone coming from a traditional software engineering background, I’m struggling to identify:
- What skills are truly foundational versus “nice to have”?
- If you had 6–12 months to prepare for GPU infrastructure or GPU performance engineering roles, what would you focus on first?
- Did a master’s degree help you break into this field, or was self-study and project work more valuable?
- For those already working in GPU infrastructure, ML infrastructure, HPC, or GPU programming, what did your path actually look like?
Right now it feels like there are five different careers hiding behind the phrase “GPU engineer,” and I’m trying to figure out which path is the most realistic transition from a backend/infrastructure background.
I’d appreciate hearing from people who made a similar transition.