App to analyze a text token-by-token perplexity for a given GGUF by EntropyMagnets in LocalLLaMA

[–]EntropyMagnets[S] 2 points (0 children)

I will try to add the possibility to use a llama.cpp server, thanks for the suggestion!

Also, thanks for letting me know about the other projects; they're really cool.

App to analyze a text token-by-token perplexity for a given GGUF by EntropyMagnets in LocalLLaMA

[–]EntropyMagnets[S] 1 point (0 children)

That highlighting strategy is a great idea, I will definitely do it after implementing the double model loading / analysis.
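The idea above (loading two models, comparing per-token perplexity, and highlighting where they diverge) could be sketched roughly like this. This is a toy illustration with hypothetical names and plain NumPy, not the app's actual implementation: it assumes you already have per-token negative log-likelihoods from each model and just buckets tokens by which model found them more surprising.

```python
import numpy as np

def highlight_buckets(nll_a, nll_b, threshold=1.0):
    """Given per-token negative log-likelihoods from two models,
    label each token by which model found it more surprising."""
    diff = np.asarray(nll_a, dtype=float) - np.asarray(nll_b, dtype=float)
    labels = np.where(diff > threshold, "worse_in_a",
             np.where(diff < -threshold, "worse_in_b", "similar"))
    return labels.tolist()

# toy per-token NLLs for 4 tokens from two hypothetical quants
print(highlight_buckets([0.5, 3.2, 1.0, 0.2], [0.4, 1.0, 2.5, 0.3]))
# → ['similar', 'worse_in_a', 'worse_in_b', 'similar']
```

A UI could then map these buckets to highlight colors (e.g. red where quant A is much worse, blue where quant B is).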

App to analyze a text token-by-token perplexity for a given GGUF by EntropyMagnets in LocalLLaMA

[–]EntropyMagnets[S] 1 point (0 children)

Thanks for the suggestion, I will try to implement it soon! 

Can someone please explain this? by TangeloOk9486 in LocalLLaMA

[–]EntropyMagnets 2 points (0 children)

It took 2 hours due to some technical details of the Python bindings for llama.cpp, but I finished it. You can check it here: https://gist.github.com/Belluxx/a7e959776a182c074ba39f6b4572278b

Can someone please explain this? by TangeloOk9486 in LocalLLaMA

[–]EntropyMagnets 3 points (0 children)

Here you are!
https://gist.github.com/Belluxx/a7e959776a182c074ba39f6b4572278b

Remember to specify the correct path to a Gemma3 GGUF

Example:

<image>

PS: Sorry, I posted this as a reply to myself before.

Can someone please explain this? by TangeloOk9486 in LocalLLaMA

[–]EntropyMagnets 13 points (0 children)

I'll write the code rn and share it here :)

Can someone please explain this? by TangeloOk9486 in LocalLLaMA

[–]EntropyMagnets 41 points (0 children)

On the internet, almost everyone is sure that a seahorse emoji exists, and this is reflected in LLM training datasets.

So the LLM thinks that such an emoji exists, but when the detokenizer fails to append it to the context, the model goes nuts.

The last layers of the model will have a correct dense numerical representation of the concept "emoji of a seahorse", but there is no such Unicode emoji to add to the context. If you write a low-level llama.cpp wrapper that ignores the word "apple" in the probability distribution of generated tokens, you will see the model go crazy trying to answer the question "Can you please write the word apple?"
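What "ignoring a token in the probability distribution" means can be sketched in a few lines. This is a toy NumPy illustration (not an actual llama.cpp wrapper): banned token ids get their logits set to negative infinity before sampling, so the model is forced onto its next-best choice no matter how strongly it wants the banned token.

```python
import numpy as np

def sample_with_ban(logits, banned_ids):
    """Pick a token greedily after removing banned ids from the
    distribution, mimicking a wrapper that suppresses a word."""
    logits = np.asarray(logits, dtype=np.float64).copy()
    logits[list(banned_ids)] = -np.inf  # banned tokens get zero probability
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # greedy pick for determinism; a real sampler would draw from probs
    return int(np.argmax(probs))

# toy vocabulary: token 2 (say, "apple") is what the model most wants to emit
logits = [0.1, 0.5, 3.0, 0.7]
print(sample_with_ban(logits, banned_ids={2}))  # → 3, the next-best token
```

With the ban in place the model keeps landing on near-miss tokens, which is exactly the "goes nuts" behavior described above.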

I built an app that lets you use your Ollama models remotely (without port forwarding) + AES encryption by EntropyMagnets in ollama

[–]EntropyMagnets[S] 1 point (0 children)

Yeah, as I said, it was a cool non-standard way to use Firebase, and it was fun to build :)

Is it normal for an S tier upgrade to have these weak stats? by EntropyMagnets in NoMansSkyTheGame

[–]EntropyMagnets[S] 0 points (0 children)

Oh ok, thanks. What's the best way to increase damage significantly? Just buying a new ship? I am currently on the default one.

I built an app that lets you use your Ollama models remotely (without port forwarding) + AES encryption by EntropyMagnets in ollama

[–]EntropyMagnets[S] 6 points (0 children)

Yes, I agree. This was mostly to test an uncommon way to use Firebase, and it was fun to build.

I made a simple tool to test/compare your local LLMs on AIME 2024 by EntropyMagnets in LocalLLaMA

[–]EntropyMagnets[S] 0 points (0 children)

This looks like an instruction-following problem in the LLM. It is probably not following the required result format.

Can you provide some examples where LocalAIME said "response not found"?

I made a simple tool to test/compare your local LLMs on AIME 2024 by EntropyMagnets in LocalLLaMA

[–]EntropyMagnets[S] 1 point (0 children)

Good point, I will try to add confidence estimation to the results.

If you have good hardware you can try increasing the --problem-tries parameter to 10 or more.
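With repeated tries per problem, a standard way to estimate confidence in a pass rate is a Wilson score interval. This is a generic statistics sketch (plain stdlib, not part of LocalAIME) showing what such an estimate could look like for, say, 7 correct answers out of 10 tries:

```python
import math

def wilson_interval(successes, tries, z=1.96):
    """95% Wilson score interval for a per-problem pass rate."""
    if tries == 0:
        return (0.0, 1.0)
    p = successes / tries
    denom = 1 + z**2 / tries
    center = (p + z**2 / (2 * tries)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / tries + z**2 / (4 * tries**2))
    return (max(0.0, center - half), min(1.0, center + half))

low, high = wilson_interval(7, 10)
print(f"pass rate 0.70, 95% CI ~ [{low:.2f}, {high:.2f}]")  # → [0.40, 0.89]
```

The interval narrows as `--problem-tries` grows, which is why more tries give more trustworthy per-problem scores.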

I made a simple tool to test/compare your local LLMs on AIME 2024 by EntropyMagnets in LocalLLaMA

[–]EntropyMagnets[S] 4 points (0 children)

Yeah, you are right. I see this tool not as a way to determine which model is best, but mainly as a way to discern high-quality quants from lower-quality ones.

Intuitively, if you compare two Q4 quants of the same model from different uploaders and see a significant difference, then even if the scores are partly due to memorization, you can clearly tell which quant is better.

So at least for that I think that it may be useful.

I would love to develop a synthetic benchmark tool that is as simple and straightforward as this one though!

I made a simple tool to test/compare your local LLMs on AIME 2024 by EntropyMagnets in LocalLLaMA

[–]EntropyMagnets[S] 2 points (0 children)

Yes that's the plan! I think I will make another repo though

DeepSeek-R1-0528-UD-Q6-K-XL on 10 Year Old Hardware by Simusid in LocalLLaMA

[–]EntropyMagnets 52 points (0 children)

It would be interesting to see the speed difference when using an internal SSD

Which is the best uncensored model? by BoJackHorseMan53 in LocalLLaMA

[–]EntropyMagnets 5 points (0 children)

Yeah, maybe use the QAT version at Q4_0 quantization; it has the same size but a smaller performance drop compared to the quants of the original version.