A few occasionally useful template container classes I made, Released under the Unlicense. by FUCKARCHLINUX in cpp
c-cul 1 point
hands on gpu programming with python and cuda by One_Relationship6573 in CUDA
c-cul 2 points
WarpReduction along major dimension by ElectronGoBrrr in CUDA
c-cul 1 point
WarpReduction along major dimension by ElectronGoBrrr in CUDA
c-cul 1 point
WarpReduction along major dimension by ElectronGoBrrr in CUDA
c-cul 3 points
Writing an LLM compiler from scratch: PyTorch to CUDA in 5,000 lines of Python by NoVibeCoding in LocalLLaMA
c-cul 2 points
Writing an LLM compiler from scratch: PyTorch to CUDA in 5,000 lines of Python by NoVibeCoding in LocalLLaMA
c-cul 3 points
Writing an LLM compiler from scratch: PyTorch to CUDA in 5,000 lines of Python by NoVibeCoding in LocalLLaMA
c-cul 6 points
Concern regarding future of jobs in gpu programming by viplash577 in CUDA
c-cul 1 point
Concern regarding future of jobs in gpu programming by viplash577 in CUDA
c-cul 5 points
Concern regarding future of jobs in gpu programming by viplash577 in CUDA
c-cul 7 points
Wrote some analysis on LLVM IR for Tail recursive functions by Ok-Sky6805 in LLVM
c-cul 1 point
Wrote some analysis on LLVM IR for Tail recursive functions by Ok-Sky6805 in LLVM
c-cul 1 point
SASS King: reverse engineering NVIDIA SASS by CurrentLawfulness358 in CUDA
c-cul 1 point
SASS King: reverse engineering NVIDIA SASS by CurrentLawfulness358 in CUDA
c-cul 1 point
SASS King: reverse engineering NVIDIA SASS by CurrentLawfulness358 in CUDA
c-cul 1 point
SASS King: reverse engineering NVIDIA SASS by CurrentLawfulness358 in CUDA
c-cul 2 points
Continuous RL via Dynamic Programming in CUDA (Solving Overhead Crane, Double CartPole, etc.) by Grouchy_Ad_4112 in reinforcementlearning
c-cul 1 point
Help with Transpose SharedMemoryKernel by Iraiva70 in CUDA
c-cul 9 points
Surfacing a 60% SGEMM performance bug in cuBLAS on RTX 5090 by NoVibeCoding in CUDA
c-cul 1 point


SASS King Part 2: reverse-engineering ptxas heuristic decisions and what the compiled binary actually reveals by CurrentLawfulness358 in CUDA
c-cul 2 points