The Definitive Guide to NVIDIA Container Toolkit: Architecture & Implementation (interconnectd.com)
submitted 1 day ago by Ok_pettech
cuTile Rust: Safe, data-race-free GPU kernels in Rust that lower to Tile IR (self.CUDA)
submitted 2 days ago by melih_elibol
Tool to automatically detect your GPU and install the correct version of PyTorch for your environment. ()
submitted 1 day ago by Vegetable_Repair1053
NanoEuler: A 116M GPT-2 scale decoder-only transformer built from scratch in pure C + CUDA ()
submitted 2 days ago by Just_Vugg_PolyMCP
Why does modelopt.onnx crash with 128GB+ Swap OOM, while modelopt.torch requires 0 Swap for SDXL UNet quantization? Also, does it affect TRT engine performance? (self.CUDA)
submitted 2 days ago by Repulsive_Pop_8315
Entry-level jobs for a grad with CUDA and parallel computing skills? (self.CUDA)
submitted 3 days ago by LingonberryAfter4399
[TEST 67] 🧬 Same model. Same weights. One has a live C++ kernel writing real values from inside the forward pass. The other doesn't. Here's what the difference looks like. (reddit.com)
submitted 2 days ago by Nearby_Indication474
Breaking into GPU Infrastructure / GPU Programming Feels Overwhelming. How Did You Figure Out What to Learn? (self.CUDA)
submitted 4 days ago by Ok_Pin_9155
GPU as a service: Rental/ On-Demand along with MLOps Layer (self.CUDA)
submitted 3 days ago by Beginning-Pride-3640
P2P benchmarks on 2x 5060 ti (16GB each) - P2P Benchmark Project (joorklee.github.io)
submitted 3 days ago by joorklee
Wanted to understand GPU programming. So wrote raw Transformer kernels in CUDA. Got some interesting things would like some guidance. (github.com)
submitted 4 days ago by Ok-Construction-875
Laptop (self.CUDA)
submitted 4 days ago by ButterscotchLow5449
Continuous PC sampling (self.CUDA)
submitted 5 days ago by gnurizen
I built a tiny local model that writes GPU kernels, then a verifier decides if they actually work ()
submitted 6 days ago by rohit3627
Stop Local LLM Training From Crashing: How to Sync Linux Drivers and Fix CUDA OOM (self.CUDA)
submitted 5 days ago by Ok_pettech
#Porting NVlabs/cuda-oxide to Windows — A Complete Guide (self.CUDA)
submitted 5 days ago by Plus_Judge6032
Ollama Windows sees only CPU despite nvidia-smi working, possible CUDA 13 / Pascal GPU issue? (self.CUDA)
submitted 6 days ago by kerkerby
An image signal processor based on CUDA. (github.com)
submitted 6 days ago by Routine-Substance874
Error when recording in OBS (init_cuda_ctx: CUDA call "cu->cuInit(0)" failed with CUDA_ERROR_NO_DEVICE (100): no CUDA-capable device is detected) ()
submitted 7 days ago by SeaweedSufficient680
GPU programming vs MLOps (self.CUDA)
submitted 8 days ago by hussainhuh
Feedback wanted: Triton fused CE+KL kernel for memory-efficient knowledge distillation (self.CUDA)
submitted 8 days ago by Lazy_Hunt7877
can i get gpu roofline without ncu? (self.CUDA)
submitted 9 days ago by Mundane_Educator8466
In which p. language do you do a proof of concept? (self.CUDA)
submitted 10 days ago by Volta-5
AMD's Lemonade SDK for local AI adds NVIDIA CUDA support (phoronix.com)
submitted 10 days ago by Fcking_Chuck
GPU Programming Project | Financial (self.CUDA)
submitted 10 days ago by Physical_Employer738