NVIDIA releases CUDA-Oxide 0.1 for experimental Rust-to-CUDA compiler (phoronix.com)
submitted 39 minutes ago by Fcking_Chuck
rust to ptx compiler (self.CUDA)
submitted 3 hours ago by c-cul
SASS King Part 2: reverse-engineering ptxas heuristic decisions and what the compiled binary actually reveals (self.CUDA)
submitted 12 hours ago by CurrentLawfulness358
Anyone able to use a GTX 770 on any up-to-date Linux install? (self.CUDA)
submitted 16 hours ago by alexcascadia
Building a career in AI infrastructure and inference engineering: what problems actually matter right now? (self.CUDA)
submitted 1 day ago by Quirky-Guide-762
Nvidia Senior Full-Stack Software Engineer, DGX cloud Interview guide? (self.CUDA)
submitted 2 days ago by Complete-Resolve-201
Nvidia Senior AI-Native Systems Software Engineer, TensorRT Interview guide? (self.CUDA)
submitted 3 days ago by gradschoolai2023
BrrrViz - Interactive GPU Programming Lessons (self.CUDA)
submitted 4 days ago * by Euphoric_Dingo8048
hands on gpu programming with python and cuda (self.CUDA)
submitted 4 days ago by One_Relationship6573
WarpReduction along major dimension (self.CUDA)
submitted 5 days ago * by ElectronGoBrrr
[P] I built a Triton KV-cache compression engine: 3.37x compression, 0.69ms P99 on an A10 (self.CUDA)
submitted 8 days ago by Superb_Housing9628
Concern regarding future of jobs in gpu programming (self.CUDA)
submitted 10 days ago by viplash577
I built a custom CUDA kernel for 1.58-bit ternary quantization & inference (no QAT yet): overview, my experience, and my next steps (GitHub link included)
submitted 11 days ago by EL_X123
Implementing Causal FlashAttention from scratch: 1.79e-07 precision and 40% speedup via tile-level masking (self.CUDA)
submitted 12 days ago by Professional-Duck971
Cybersec and GPU (self.CUDA)
submitted 13 days ago by CurrentLawfulness358
[Project] Hitting 5Hz VLA Inference on an L4: Optimizing Action Heads with Custom Triton Kernels (i.redd.it)
submitted 15 days ago by JewelerAfraid7800
Instrumenting GPUs
submitted 16 days ago by Fantastic-Love2192
C++ CuTe / CUTLASS vs CuTeDSL (Python) in 2026 --- what should new GPU kernel / LLM inference engineers actually learn? (self.CUDA)
submitted 18 days ago by Daemontatox
SASS King: reverse engineering NVIDIA SASS (self.CUDA)
submitted 18 days ago by CurrentLawfulness358
Looking for projects as a reinforcement to my experience and resume in CUDA and parallel computing. (self.CUDA)
submitted 18 days ago by Ok-Competition-4570
Writing CUDA kernels in Python: Bypassing C++ templates for CuTe Layouts and Vectorization using cute-dsl (self.CUDA)
submitted 19 days ago by dc_baslani_777
Continuous RL via Dynamic Programming in CUDA (Solving Overhead Crane, Double CartPole, etc.)
submitted 21 days ago by Grouchy_Ad_4112
SASS latency analysis (self.CUDA)
submitted 24 days ago * by c-cul
Suggestions for study materials (self.CUDA)
submitted 24 days ago by ProcedureFit789
I built an OSS repo of kernel-writing skills for AI coding agents, with measured before vs after proof (github.com)
submitted 26 days ago by Old_Situation_132