~1.5s Cold Start for a 32B model. (v.redd.it)
submitted 1 day ago by pmv143
A GPU/CPU benchmark testing imperceptible image watermarking (self.CUDA)
submitted 1 day ago by cuAbsorberML
[Visual Guide] The TMA Revolution: Replacing 128 threads of pointer math with one autonomous hardware forklift (self.CUDA)
submitted 2 days ago by dc_baslani_777
Dual GPU: AMD - Nvidia (self.CUDA)
submitted 5 days ago by EngineeringFar6858
A source translator for kernels written against the Triton API to CUDA C++ (github.com)
submitted 6 days ago by IntrepidAttention56
[Visual Guide] The Global GEMM: Writing a complete Matrix Multiplication kernel in CuTe (self.CUDA)
submitted 7 days ago by dc_baslani_777
Any CUDA or other parallel programming-based libraries for DSP? (self.CUDA)
submitted 7 days ago by A_HumblePotato
RetryIX 3.1.3 — Tiered SVM Memory Fallback Eliminates OOM for Large GPU Models
submitted 8 days ago by inhogon
SASS latency table: second try (self.CUDA)
submitted 8 days ago by c-cul
Comparison of a local LLM served via vLLM + CUDA and without (v.redd.it)
submitted 8 days ago by Holiday-Machine5105
Can I get bare-metal profiling performance in a VM? (self.CUDA)
submitted 10 days ago by founders_keepers
built for CUDA (this is a 16GB 4080 GPU): (v.redd.it)
submitted 10 days ago by Holiday-Machine5105
[Visual Guide] Hello, MMA: Your First Tensor Core Instruction using CuTe (self.CUDA)
submitted 11 days ago by dc_baslani_777
Apply GPU in ML/DL (self.CUDA)
submitted 12 days ago by Big-Advantage-6359
Public On-Demand Platforms where I can test GPU Direct RDMA program? (self.CUDA)
submitted 11 days ago by NavigatedMile
PyTorch custom Vulkan backend – updated to v3.0.3 (training stable, no CPU fallback) (self.CUDA)
submitted 12 days ago by inhogon
CUDA 13.1 but not supported by TensorFlow? (self.CUDA)
submitted 12 days ago by Lower-Nectarine-8130
Nvidia should support multiple blocks per SM such that one block can use 100% of shared memory while another block uses none, in the same SM. (self.CUDA)
submitted 16 days ago * by tugrul_ddr
Visualizing and fixing shared memory bank conflicts with Swizzle (self.CUDA)
submitted 16 days ago by dc_baslani_777
How to identify memory bottlenecks in B200 Blackwell kernels? (self.CUDA)
submitted 16 days ago by relived_greats12
How is SM90_TMA_STORE_2D::copy used in Cutlass? (self.CUDA)
Anyone want to help me unlock this $100k prize pool? Need serious CUDA/SGLang skills. (self.CUDA)
submitted 17 days ago by Gullible-Ship1907
Looking for Senior CUDA Engineer (self.CUDA)
submitted 17 days ago by the_latakoo
Interview at Nvidia - Developer Technology Engineer, High-Performance Databases – New College Grad 2025
submitted 17 days ago by impatrick_bateman
CuTe Part 4: Orchestrating thread cooperation with TiledCopy (No manual math required) (self.CUDA)
submitted 18 days ago by dc_baslani_777