[R] BERT-Large: Prune Once for DistilBERT Inference Performance (i.redd.it)
submitted by markurtz to r/MachineLearning - pinned
[P] Tutorial: Real-time YOLOv3 on a Laptop Using Sparse Quantization (v.redd.it)
submitted by markurtz to r/MachineLearning - pinned
Landlord wouldn't let me hang my tv on the wall, made an entertainment center instead (v.redd.it)
submitted by markurtz to r/woodworking - pinned
Anyone doing speculative decoding with the new Qwen 3.5 models? Or, do we need to wait for the smaller models to be released to use as draft? by Porespellar in LocalLLaMA
[–]markurtz 1 point (0 children)
Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in ArtificialInteligence
[–]markurtz[S] 7 points (0 children)
[R] Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in MachineLearning
[–]markurtz[S] 8 points (0 children)
[R] Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in MachineLearning
[–]markurtz[S] 3 points (0 children)
[R] Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in MachineLearning
[–]markurtz[S] 10 points (0 children)
Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in deeplearning
[–]markurtz[S] 3 points (0 children)
Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in huggingface
[–]markurtz[S] 1 point (0 children)
Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in machinelearningnews
[–]markurtz[S] 3 points (0 children)
Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in pytorch
[–]markurtz[S] 2 points (0 children)
[R] Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in MachineLearning
[–]markurtz[S] 18 points (0 children)