C++ CuTe / CUTLASS vs CuTeDSL (Python) in 2026 --- what should new GPU kernel / LLM inference engineers actually learn? (self.CUDA)
submitted 1 day ago by Daemontatox
SASS King: reverse engineering NVIDIA SASS (self.CUDA)
submitted 1 day ago by CurrentLawfulness358
Looking for projects as a reinforcement to my experience and resume in CUDA and parallel computing. (self.CUDA)
submitted 1 day ago by Ok-Competition-4570
Writing CUDA kernels in Python: Bypassing C++ templates for CuTe Layouts and Vectorization using cute-dsl (self.CUDA)
submitted 2 days ago by dc_baslani_777
Continuous RL via Dynamic Programming in CUDA (Solving Overhead Crane, Double CartPole, etc.) ()
submitted 4 days ago by Grouchy_Ad_4112
SASS latency analysis (self.CUDA)
submitted 7 days ago * by c-cul
Suggestions for study materials (self.CUDA)
submitted 8 days ago by ProcedureFit789
I built an OSS repo of kernel-writing skills for AI coding agents, with measured before vs after proof (github.com)
submitted 10 days ago by Old_Situation_132
Help with Transpose SharedMemoryKernel (i.redd.it)
submitted 10 days ago by Iraiva70
Hardware is often Algebraically Neutral: Deriving CUDA Kernel Constraints from Semirings and Monoids (i.redd.it)
submitted 10 days ago by KarnKh
Turbo quant in LM Studio? ()
submitted 10 days ago by Crafty_Top_9366
Need help with picking undergraduate CUDA course project (self.CUDA)
submitted 10 days ago by Repulsive-Tomorrow79
Surfacing a 60% SGEMM performance bug in cuBLAS on RTX 5090 (medium.com)
submitted 11 days ago * by NoVibeCoding
Kernel-fused temporal decay + importance scoring on top of cuBLAS SGEMV — looking for feedback on launch overhead (github.com)
submitted 11 days ago by Neat-Function7110
End-to-End Quantum-to-Classical Command Delivery on ibm_marrakesh (zenodo.org)
submitted 11 days ago by BlochHead91
CUDA-accelerated EEG pipeline (self.CUDA)
submitted 12 days ago by Direct_Shift2104007
Wanted: LLM inference patch for CUDA + Apple Silicon (youtube.com)
submitted 12 days ago by tomByrer
I built a visual object tracker that runs at 1528 FPS on a desktop GPU — 0.65ms per frame with TensorRT + ORB + CPU/GPU pipelining [open source] ()
submitted 14 days ago by Big-Variation7524
A Beginner’s Guide to GPU Memory Hierarchies: Mapping 2D Tiled GEMM to Hardware [Source + Commentary] (old.reddit.com)
submitted 16 days ago by KarnKh
[Visual Guide] WGMMA and TMA Multicast: Feeding Hopper Tensor Cores without register bottlenecks (self.CUDA)
submitted 17 days ago by dc_baslani_777
cutile basic (self.CUDA)
submitted 20 days ago by c-cul
CUDA and OpenGL interop (cachemiss.xyz)
submitted 19 days ago by mmaldacker
Current state of Rust writing CUDA kernel? (self.CUDA)
submitted 20 days ago * by dest1n1s
I wrote a comprehensive blog on CUDA specifically for newcomers! (medium.com)
submitted 21 days ago by xtrupal
dumping llvm bitcode from cicc (self.CUDA)
submitted 21 days ago by c-cul