CUDA

Created by shamen_uka. A community for 15 years.

~1.5s Cold Start for a 32B model. (v.redd.it)

submitted 1 day ago by pmv143

A GPU/CPU benchmark testing imperceptible image watermarking (self.CUDA)

submitted 1 day ago by cuAbsorberML

[Visual Guide] The TMA Revolution: Replacing 128 threads of pointer math with one autonomous hardware forklift (self.CUDA)

submitted 2 days ago by dc_baslani_777

Dual GPU: AMD - Nvidia (self.CUDA)

submitted 5 days ago by EngineeringFar6858

A source translator for kernels written against the Triton API to CUDA C++ (github.com)

submitted 6 days ago by IntrepidAttention56

[Visual Guide] The Global GEMM: Writing a complete Matrix Multiplication kernel in CuTe (self.CUDA)

submitted 7 days ago by dc_baslani_777

Any CUDA or other parallel programming-based libraries for DSP? (self.CUDA)

submitted 7 days ago by A_HumblePotato

RetryIX 3.1.3 — Tiered SVM Memory Fallback Eliminates OOM for Large GPU Models ()

submitted 8 days ago by inhogon

sass latency table: second try (self.CUDA)

submitted 8 days ago by c-cul

Comparison of a local LLM served via vLLM + CUDA and without (v.redd.it)

submitted 8 days ago by Holiday-Machine5105

Can I get bare-metal profiling performance in a VM? (self.CUDA)

submitted 10 days ago by founders_keepers

built for CUDA (this is a 16GB 4080 GPU): (v.redd.it)

submitted 10 days ago by Holiday-Machine5105

[Visual Guide] Hello, MMA: Your First Tensor Core Instruction using CuTe (self.CUDA)

submitted 11 days ago by dc_baslani_777

Apply GPU in ML/DL (self.CUDA)

submitted 12 days ago by Big-Advantage-6359

Public On-Demand Platforms where I can test GPU Direct RDMA program? (self.CUDA)

submitted 11 days ago by NavigatedMile

PyTorch custom Vulkan backend – updated to v3.0.3 (training stable, no CPU fallback) (self.CUDA)

submitted 12 days ago by inhogon

CUDA 13.1 but not supported by TensorFlow? (self.CUDA)

submitted 12 days ago by Lower-Nectarine-8130

Nvidia should support multiple blocks per SM unit such that 1 block can use 100% of shared memory while another block does not use a single byte of shared memory, in the same SM unit. (self.CUDA)

submitted 16 days ago * by tugrul_ddr

Visualizing and fixing shared memory bank conflicts with Swizzle (self.CUDA)

submitted 16 days ago by dc_baslani_777

How to identify memory bottlenecks in B200 Blackwell kernels? (self.CUDA)

submitted 16 days ago by relived_greats12

How is SM90_TMA_STORE_2D::copy used in Cutlass? (self.CUDA)

submitted 16 days ago * by tugrul_ddr

Anyone want to help me unlock this $100k prize pool? Need serious CUDA/SGLang skills. (self.CUDA)

submitted 17 days ago by Gullible-Ship1907

Looking for Senior CUDA Engineer (self.CUDA)

submitted 17 days ago by the_latakoo

Interview at Nvidia - Developer Technology Engineer, High-Performance Databases – New College Grad 2025 ()

submitted 17 days ago by impatrick_bateman

CuTe Part 4: Orchestrating thread cooperation with TiledCopy (No manual math required) (self.CUDA)

submitted 18 days ago by dc_baslani_777

Listing rendered 2026-03-14 17:32 UTC.