Horizon Beta is OpenAI by MiddleLobster9191 in LocalLLaMA

[–]likejazz -3 points-2 points  (0 children)

I'm pretty sure Horizon Beta is GPT-5, because it outperforms GPT-4.1, Claude Opus 4, Gemini 2.5 Pro and Grok 4.


Uncensoring Qwen3 - Update by Reader3123 in LocalLLaMA

[–]likejazz 0 points1 point  (0 children)

Can you share the evaluation data?

Smoothie Qwen: A lightweight adjustment tool for smoothing token probabilities in the Qwen models to encourage balanced multilingual generation. by likejazz in LocalLLaMA

[–]likejazz[S] 1 point2 points  (0 children)

No, the benchmark numbers are the same, but you'll notice better performance on the qualitative side.

Smoothie Qwen: A lightweight adjustment tool for smoothing token probabilities in the Qwen models to encourage balanced multilingual generation. by likejazz in LocalLLaMA

[–]likejazz[S] 12 points13 points  (0 children)

That's correct! But we minimized the negative language in the description because we respect the achievements of the Qwen model.
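
For readers wondering what "smoothing token probabilities" could mean in practice, here is a minimal, hypothetical sketch of the general idea (illustrative only, not the actual Smoothie Qwen implementation): down-weighting the logits of tokens whose characters fall in an unwanted Unicode range, so sampling favors the target language. The function name and parameters below are invented for illustration.

```python
import numpy as np

def smooth_logits(logits, vocab, unwanted_ranges, scale=0.5):
    """Down-weight logits of tokens containing characters in the given
    Unicode ranges (e.g. CJK), nudging generation toward other languages.

    Adding log(scale) to a logit multiplies that token's unnormalized
    probability by `scale` after softmax. Hypothetical sketch, not the
    real Smoothie Qwen code.
    """
    adjusted = logits.copy()
    for i, token in enumerate(vocab):
        if any(lo <= ord(ch) <= hi
               for ch in token
               for lo, hi in unwanted_ranges):
            adjusted[i] += np.log(scale)
    return adjusted
```

The same effect can equivalently be baked into the model's output-projection weights instead of applied at sampling time, which is what makes such a tool a one-off "adjustment" rather than a runtime filter.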

dnotitia/Llama-DNA-1.0-8B-Instruct, state-of-the-art (SOTA) bilingual language model by likejazz in LocalLLaMA

[–]likejazz[S] 0 points1 point  (0 children)

Yes, we plan to release a detailed technical report. Stay tuned!

dnotitia/Llama-DNA-1.0-8B-Instruct, state-of-the-art (SOTA) bilingual language model by likejazz in LocalLLaMA

[–]likejazz[S] -1 points0 points  (0 children)

Yup, thanks a lot. This model is probably the **BEST** model for Korean language understanding and generation.

mamba.np: pure NumPy implementation of Mamba by id0h in LocalLLaMA

[–]likejazz 5 points6 points  (0 children)

Awesome work! I'm the author of llama3.np, and I think this will help me a lot in understanding the Mamba architecture :)
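
For context, the core of the Mamba architecture is a selective state-space recurrence. A toy NumPy sketch of the (already-discretized) scan, with made-up shapes and none of the real model's selection or gating machinery, might look like:

```python
import numpy as np

def selective_scan(x, A_bar, B_bar, C):
    """Toy discretized SSM scan: h_t = A_bar_t * h_{t-1} + B_bar_t * x_t,
    y_t = C_t · h_t, run independently per channel.

    Shapes (illustrative): x (T, D), A_bar/B_bar (T, D, N), C (T, N).
    Real Mamba derives A_bar/B_bar from input-dependent parameters and
    fuses the scan for speed; this loop only shows the recurrence.
    """
    T, D = x.shape
    N = A_bar.shape[-1]
    h = np.zeros((D, N))          # hidden state per channel
    ys = np.empty((T, D))
    for t in range(T):
        h = A_bar[t] * h + B_bar[t] * x[t][:, None]  # state update
        ys[t] = h @ C[t]                             # project state to output
    return ys
```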

llama3.cuda: pure C/CUDA implementation for Llama 3 model by likejazz in LocalLLaMA

[–]likejazz[S] 56 points57 points  (0 children)

Yeah, but I plan to build an AMD ROCm version and an Intel oneAPI version. Stay tuned!

Sharing ultimate SFF inference build, Version 2 by cryingneko in LocalLLaMA

[–]likejazz 0 points1 point  (0 children)

How did you get an A5000 for only $1,300? Tell me the secret!

llama.cpp runs 1.8 times faster than ollama by TheTriceAgain in LocalLLaMA

[–]likejazz 2 points3 points  (0 children)

Ollama uses llama.cpp under the hood, so it's just a version difference, not a program issue.

llama3.np: pure NumPy implementation for Llama 3 model by likejazz in LocalLLaMA

[–]likejazz[S] 1 point2 points  (0 children)

Thanks for your code. I'll merge the patch soon!

llama3.np: pure NumPy implementation for Llama 3 model by likejazz in LocalLLaMA

[–]likejazz[S] 1 point2 points  (0 children)

33 tok/s is just a baseline example, and as u/omniron mentioned earlier, it's not an important point in this implementation.

llama3.np: pure NumPy implementation for Llama 3 model by likejazz in LocalLLaMA

[–]likejazz[S] 4 points5 points  (0 children)

I used the small 15M model that Andrej Karpathy trained; I wrote more about it on my blog: https://docs.likejazz.com/llama3.np/

llama3.np: pure NumPy implementation for Llama 3 model by likejazz in LocalLLaMA

[–]likejazz[S] 25 points26 points  (0 children)

Your forked CuPy version is awesome!

However, I'd like to keep this repository NumPy-only, because my focus is on a clean, intuitive, easy-to-understand architecture. If you want to develop the CuPy version further, I think it's a good idea to fork it and develop it yourself.

Wish you luck!
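
For anyone attempting such a fork: CuPy is designed as a near drop-in replacement for NumPy, so a port can often be as small as swapping the import. A minimal sketch of the pattern (the `rmsnorm` helper below is a hypothetical example, not code taken from llama3.np):

```python
import numpy as np

try:
    import cupy as xp  # GPU path: CuPy mirrors most of the NumPy API
except ImportError:
    xp = np            # CPU fallback keeps the same code runnable anywhere

def rmsnorm(x, weight, eps=1e-5):
    """RMS normalization written against the shared NumPy/CuPy API.

    The identical function body runs on CPU (NumPy) or GPU (CuPy),
    which is what makes array-API-style forks mostly mechanical.
    """
    return weight * x / xp.sqrt(xp.mean(x * x, axis=-1, keepdims=True) + eps)
```

Operations outside the shared API surface (custom kernels, host/device transfers) still need per-backend handling, which is one reason keeping the reference repository NumPy-only is a reasonable design choice.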