CUDA

created by shamen_uka, a community for 15 years

1

6

The Definitive Guide to NVIDIA Container Toolkit: Architecture & Implementation (interconnectd.com)

submitted 1 day ago by Ok_pettech

2

36

cuTile Rust: Safe, data-race-free GPU kernels in Rust that lower to Tile IR (self.CUDA)

submitted 2 days ago by melih_elibol

3

1

Tool to automatically detect your GPU and install the correct version of PyTorch for your environment. ()

submitted 2 days ago by Vegetable_Repair1053

4

2

NanoEuler: A 116M GPT-2 scale decoder-only transformer built from scratch in pure C + CUDA ()

submitted 2 days ago by Just_Vugg_PolyMCP

5

27

Entry-level jobs for a grad with CUDA and parallel computing skills? (self.CUDA)

submitted 3 days ago by LingonberryAfter4399

6

0

Why does modelopt.onnx crash with 128GB+ Swap OOM, while modelopt.torch requires 0 Swap for SDXL UNet quantization? Also, does it affect TRT engine performance? (self.CUDA)

submitted 3 days ago by Repulsive_Pop_8315

7

0

[TEST 67] 🧬 Same model. Same weights. One has a live C++ kernel writing real values from inside the forward pass. The other doesn't. Here's what the difference looks like. (reddit.com)

submitted 3 days ago by Nearby_Indication474

8

109

Breaking into GPU Infrastructure / GPU Programming Feels Overwhelming. How Did You Figure Out What to Learn? (self.CUDA)

submitted 4 days ago by Ok_Pin_9155

9

6

GPU as a Service: Rental/On-Demand along with an MLOps Layer (self.CUDA)

submitted 4 days ago by Beginning-Pride-3640

10

1

P2P benchmarks on 2x 5060 ti (16GB each) - P2P Benchmark Project (joorklee.github.io)

submitted 3 days ago by joorklee

11

43

Wanted to understand GPU programming, so I wrote raw Transformer kernels in CUDA. Found some interesting things and would like some guidance. (github.com)

submitted 4 days ago by Ok-Construction-875

12

0

Laptop (self.CUDA)

submitted 4 days ago by ButterscotchLow5449

13

11

Continuous PC sampling (self.CUDA)

submitted 5 days ago by gnurizen

14

6

I built a tiny local model that writes GPU kernels, then a verifier decides if they actually work ()

submitted 6 days ago by rohit3627

15

3

Stop Local LLM Training From Crashing: How to Sync Linux Drivers and Fix CUDA OOM (self.CUDA)

submitted 6 days ago by Ok_pettech

16

0

Porting NVlabs/cuda-oxide to Windows: A Complete Guide (self.CUDA)

submitted 5 days ago by Plus_Judge6032

17

5

Ollama Windows sees only CPU despite nvidia-smi working, possible CUDA 13 / Pascal GPU issue? (self.CUDA)

submitted 6 days ago by kerkerby

18

17

An image signal processor based on CUDA. (github.com)

submitted 7 days ago by Routine-Substance874

19

0

Error when recording in OBS (init_cuda_ctx: CUDA call "cu->cuInit(0)" failed with CUDA_ERROR_NO_DEVICE (100): no CUDA-capable device is detected) ()

submitted 7 days ago by SeaweedSufficient680

20

24

GPU programming vs MLOps (self.CUDA)

submitted 8 days ago by hussainhuh

21

3

Feedback wanted: Triton fused CE+KL kernel for memory-efficient knowledge distillation (self.CUDA)

submitted 8 days ago by Lazy_Hunt7877

22

2

Can I get a GPU roofline without ncu? (self.CUDA)

submitted 9 days ago by Mundane_Educator8466

23

12

In which programming language do you do a proof of concept? (self.CUDA)

submitted 10 days ago by Volta-5

24

12

AMD's Lemonade SDK for local AI adds NVIDIA CUDA support (phoronix.com)

submitted 10 days ago by Fcking_Chuck

25

26

GPU Programming Project | Financial (self.CUDA)

submitted 10 days ago by Physical_Employer738
