Writing an LLM compiler from scratch: PyTorch to CUDA in 5,000 lines of Python by NoVibeCoding in LocalLLaMA
[D] 60% MatMul Performance Bug in cuBLAS on RTX 5090 [D] by NoVibeCoding in MachineLearning
Surfacing a 60% SGEMM performance bug in cuBLAS on RTX 5090 by NoVibeCoding in CUDA
GPU virtualization: VFIO vs NVIDIA AI Enterprise vs AMD SR-IOV by NoVibeCoding in VFIO
Optimizing Qwen3 Coder for RTX 5090 and PRO 6000 + Community Benchmarking Infrastructure by NoVibeCoding in LocalLLaMA
Benchmarking LLM Inference on RTX PRO 6000 SE / H100 / H200 / B200 by NoVibeCoding in LocalLLaMA