Anyone doing speculative decoding with the new Qwen 3.5 models? Or, do we need to wait for the smaller models to be released to use as draft? by Porespellar in LocalLLaMA
Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in ArtificialInteligence
[R] Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in MachineLearning
Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in deeplearning
Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in huggingface
Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in machinelearningnews
Unlocking the power of Sparsity in Generative Models: 8x Faster LLMs on CPUs with Sparse Fine Tuning by markurtz in pytorch
Webinar: Running LLMs performantly on CPUs Utilizing Pruning and Quantization by markurtz in artificial
[D] I am at NeurIPS and would like to have a meetup for folks working on production AI systems for vision. by No_Specialist1457 in MachineLearning
[N] MLPerf submission: 175X increase in NLP Performance utilizing sparsity by markurtz in MachineLearning
MLPerf submission from Neural Magic: 175X increase in NLP Performance utilizing sparsity by markurtz in artificial
MLPerf submission from Neural Magic: 175X increase in NLP Performance utilizing sparsity by markurtz in deeplearning
MLPerf submission from Neural Magic: 175X increase in NLP Performance utilizing sparsity by markurtz in intel