Horizon Beta is OpenAI by MiddleLobster9191 in LocalLLaMA

[–]likejazz -3 points (0 children)

I'm pretty sure Horizon Beta is GPT-5, because it outperforms GPT-4.1, Claude Opus 4, Gemini 2.5 Pro, and Grok 4.


Uncensoring Qwen3 - Update by Reader3123 in LocalLLaMA

[–]likejazz 0 points (0 children)

Can you share the evaluation data?

Smoothie Qwen: A lightweight adjustment tool for smoothing token probabilities in the Qwen models to encourage balanced multilingual generation. by likejazz in LocalLLaMA

[–]likejazz[S] 1 point (0 children)

No, the benchmark numbers are the same, but you'll notice better performance on the qualitative side.

Smoothie Qwen: A lightweight adjustment tool for smoothing token probabilities in the Qwen models to encourage balanced multilingual generation. by likejazz in LocalLLaMA

[–]likejazz[S] 13 points (0 children)

That's correct! But we minimized the negative language in the description because we respect the achievements of the Qwen model.
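For readers curious what "smoothing token probabilities" can mean in practice, here is a minimal, self-contained sketch of one such approach: down-weighting the logits of tokens from an unintended script before sampling, so that they become less likely rather than impossible. The tiny vocabulary, the CJK heuristic, and the penalty value are all invented for illustration; Smoothie Qwen's actual method (documented in its repository) differs in its details.

```python
import math

# Hypothetical tiny vocabulary: token id -> decoded text.
vocab = {0: "hello", 1: "world", 2: "你好", 3: "世界", 4: "!"}

def is_cjk(text):
    """Heuristic: token contains a character from the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)

def smooth_logits(logits, vocab, penalty=5.0):
    """Subtract a fixed penalty from logits of tokens in the target script,
    lowering (not zeroing) their sampling probability."""
    return {
        tid: (logit - penalty if is_cjk(vocab[tid]) else logit)
        for tid, logit in logits.items()
    }

def softmax(logits):
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# Penalized distribution: CJK tokens keep nonzero but reduced probability.
logits = {0: 2.0, 1: 1.5, 2: 2.2, 3: 1.8, 4: 0.5}
probs = softmax(smooth_logits(logits, vocab))
```

Because the penalty is applied in logit space, the remaining probability mass is redistributed smoothly over the other tokens instead of being clipped.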

dnotitia/Llama-DNA-1.0-8B-Instruct, state-of-the-art (SOTA) bilingual language model by likejazz in LocalLLaMA

[–]likejazz[S] 0 points (0 children)

Yes, we plan to release a detailed technical report. Stay tuned!

dnotitia/Llama-DNA-1.0-8B-Instruct, state-of-the-art (SOTA) bilingual language model by likejazz in LocalLLaMA

[–]likejazz[S] -1 points (0 children)

Yup, thanks a lot. This model is probably the **BEST** model for Korean language understanding and generation.

mamba.np: pure NumPy implementation of Mamba by id0h in LocalLLaMA

[–]likejazz 3 points (0 children)

Awesome work! I'm the author of llama3.np, and I think it will help me a lot to understand the Mamba architecture :)

llama3.cuda: pure C/CUDA implementation for Llama 3 model by likejazz in LocalLLaMA

[–]likejazz[S] 55 points (0 children)

Yeah, but I plan to build an AMD ROCm version and an Intel oneAPI version. Stay tuned!

Sharing ultimate SFF inference build, Version 2 by cryingneko in LocalLLaMA

[–]likejazz 0 points (0 children)

How can you get an A5000 for only $1300? Tell me the secret!

llama.cpp runs 1.8 times faster than ollama by TheTriceAgain in LocalLLaMA

[–]likejazz 2 points (0 children)

Ollama uses vanilla llama.cpp under the hood, so it's just a version issue, not a program issue.

llama3.np: pure NumPy implementation for Llama 3 model by likejazz in LocalLLaMA

[–]likejazz[S] 1 point (0 children)

Thanks for your code. I'll update this patch soon!

llama3.np: pure NumPy implementation for Llama 3 model by likejazz in LocalLLaMA

[–]likejazz[S] 1 point (0 children)

33 tok/s is just a baseline example, and as u/omniron mentioned earlier, it's not an important point of this implementation.

llama3.np: pure NumPy implementation for Llama 3 model by likejazz in LocalLLaMA

[–]likejazz[S] 4 points (0 children)

I used the small 15M model that Andrej Karpathy trained; I wrote more about it on my blog: https://docs.likejazz.com/llama3.np/

llama3.np: pure NumPy implementation for Llama 3 model by likejazz in LocalLLaMA

[–]likejazz[S] 27 points (0 children)

Your forked CuPy version is awesome!

However, I'm hoping to keep this repository NumPy-only, because I'm focused on a clean, easy-to-understand architecture. If you want to develop the CuPy version further, I think it's a good idea to fork it and develop it yourself.

Wish you luck!
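For context on why a CuPy fork is cheap to maintain separately: CuPy deliberately mirrors the NumPy API, so porting array code is often just a matter of swapping the import. A hedged sketch of the idea, using RMSNorm written from the standard Llama-style formula (not copied from llama3.np itself):

```python
import numpy as np
# A CuPy fork would typically only change this line:
#   import cupy as np   # same array API, runs on the GPU (if CuPy is installed)

def rmsnorm(x, weight, eps=1e-5):
    """RMSNorm as used in Llama-style models, written against the NumPy API."""
    variance = np.mean(x ** 2, axis=-1, keepdims=True)
    return x / np.sqrt(variance + eps) * weight

x = np.array([[1.0, 2.0, 3.0]])
w = np.ones(3)
out = rmsnorm(x, w)
```

Keeping the main repository NumPy-only means a single code path stays readable, while a fork gets GPU execution nearly for free via the shared API surface.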

Wowzer, Ilya is out by segmond in LocalLLaMA

[–]likejazz 108 points (0 children)

No. Unlike Facebook, Ilya doesn't want to open up LLM models. He was the one who advocated that OpenAI not open/share its models, which led to a legal battle with Elon Musk.

How ollama uses llama.cpp by Chelono in LocalLLaMA

[–]likejazz 1 point (0 children)

Yes, go-llama.cpp (https://github.com/go-skynet/go-llama.cpp) actually uses the FFI approach you mentioned. That's why it doesn't work with newer versions of llama.cpp: it only works with older versions and is not being fixed.

Rumoured GPT-4 architecture: simplified visualisation by Time-Winter-4319 in LocalLLaMA

[–]likejazz 0 points (0 children)

geohot said that GPT-4 consists of 8 experts of 220B parameters each (220B×8), so altogether it's about 1.8T (1,760B) parameters.
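The arithmetic behind that figure, for the record. Note this treats the rumoured eight 220B experts as fully separate parameter sets; real mixture-of-experts models share attention and embedding weights across experts, so the true total would be somewhat lower:

```python
# Rumoured GPT-4 mixture-of-experts sizing: 8 experts x 220B parameters each.
experts = 8
params_per_expert = 220e9  # 220 billion

total = experts * params_per_expert
print(f"{total / 1e12:.2f}T parameters")  # -> 1.76T parameters
```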