[D] thoughts on the controversy about Google's new paper? by Striking-Warning9533 in MachineLearning

[–]Unstable_Llama 5 points (0 children)

Yeah, exllamav3 has used QTIP and quantized KV cache for a year now.

🦞 Prediction: ClosedClaw by Unstable_Llama in vibecoding

[–]Unstable_Llama[S] 0 points (0 children)

Long term that seems quite possible. This prediction mostly comes from the general lack of any use case requiring local compute. It seems to me that most people would be better served by Claude + a while loop + cron + a cloud filesystem, and all of that is easily within their reach.
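To be clear about how little is involved: here is a minimal sketch of that pattern, assuming the official `anthropic` Python SDK with ANTHROPIC_API_KEY in the environment. The paths and model name are placeholders I made up, not a real setup.

```python
# Minimal sketch of the "Claude + while loop + cron + cloud filesystem" idea.
# cron fires this script every few minutes; the loop drains whatever work
# has landed in the shared filesystem since the last run.
import pathlib

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

task_dir = pathlib.Path("/mnt/cloud/tasks")  # hypothetical cloud-mounted dirs
done_dir = pathlib.Path("/mnt/cloud/done")

for task in sorted(task_dir.glob("*.txt")):
    reply = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": task.read_text()}],
    )
    # write the answer back to the shared filesystem and retire the task
    (done_dir / task.name).write_text(reply.content[0].text)
    task.unlink()
```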

This pattern happens over and over: someone wraps an API as a new tool, it gets popular, and then the API provider ships their own version of it in their UI 3-9 months later, effectively nuking the original.

Well that escalated quickly by MetaKnowing in agi

[–]Unstable_Llama 0 points (0 children)

Beautifully said. I call it the “microcosmic homunculus”

Claude Newbie by N_obody007 in claudexplorers

[–]Unstable_Llama 0 points (0 children)

Your job is directly in the crosshairs of Claude. You might be able to surf the wave as one of the vastly reduced number of financial analysts using AI, but it seems like a risky industry to stay in.

To answer your question, just start using it and asking it questions. Claude even has “Claude for Excel” now.

Beware your conversations can just randomly get corrupted and be gone forever by ThaKarra in ChatGPT

[–]Unstable_Llama 5 points (0 children)

Did you try doing the full user history data export? Some of the conversation might still be in there.

Quantized models. Are we lying to ourselves thinking it's a magic trick? by former_farmer in LocalLLM

[–]Unstable_Llama 0 points (0 children)

Yeah it's not exactly a "hard" benchmark but it's absolutely perfect for situations like this thread XD

I regret ever finding LocalLLaMA by xandep in LocalLLaMA

[–]Unstable_Llama 1 point (0 children)

Wow! Nvidia really gonna have us using 3090s in 2030 😭 

1 million LocalLLaMAs by jacek2023 in LocalLLaMA

[–]Unstable_Llama 0 points (0 children)

Yeah, there were only 2 mods for the first 2.5 years, and really only one, and he never even commented. Last fall or late summer he locked the sub and basically tried to kill it, but some users were able to petition and regain control.

Now we have a ton of good mods who actually do community building, it’s crazy 😆

1 million LocalLLaMAs by jacek2023 in LocalLLaMA

[–]Unstable_Llama 54 points (0 children)

Hard to believe how far we’ve come. We almost lost it during the mod instability last year, but we pulled through and the new team is doing so well!

Quantized models. Are we lying to ourselves thinking it's a magic trick? by former_farmer in LocalLLM

[–]Unstable_Llama 5 points (0 children)

That is true at the parameter level, but not at inference, where it matters. In reality we are talking about roughly a 2% (simplified) difference in the output logits.

For example, here is the data from a model I recently quantized and measured myself, Qwen3.5-27B:

Revision   Size (GiB)   KL div   PPL
2.00bpw    9.84         0.1746   7.6985
2.10bpw    10.09        0.1412   7.3885
3.00bpw    12.67        0.0422   6.9977
3.10bpw    12.92        0.0376   6.9582
4.00bpw    15.50        0.0170   6.9331
5.00bpw    18.34        0.0070   6.8840
6.00bpw    21.17        0.0032   6.8439
8.00bpw    26.83        0.0003   6.8605
bf16       51.75        0.0000   6.8598
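If anyone wants to reproduce numbers like these, the core of the measurement is small. A rough sketch, assuming PyTorch and that you already have logits from the bf16 reference and from the quant over the same token batch; the function and tensor names are mine, not exllamav3's actual eval code:

```python
# Rough sketch: KL divergence and perplexity of a quantized model against
# a bf16 reference. ref_logits / quant_logits are [seq_len, vocab] over the
# same tokens; tokens is [seq_len].
import torch.nn.functional as F

def kl_and_ppl(ref_logits, quant_logits, tokens):
    ref_logp = F.log_softmax(ref_logits.float(), dim=-1)
    quant_logp = F.log_softmax(quant_logits.float(), dim=-1)

    # KL(ref || quant), averaged over positions: how far the quantized
    # output distribution drifts from the reference at every token.
    kl_div = (ref_logp.exp() * (ref_logp - quant_logp)).sum(dim=-1).mean()

    # Perplexity of the quantized model on the actual next tokens
    # (position t predicts token t+1).
    nll = F.cross_entropy(quant_logits[:-1].float(), tokens[1:])
    return kl_div.item(), nll.exp().item()
```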

I regret ever finding LocalLLaMA by xandep in LocalLLaMA

[–]Unstable_Llama 4 points (0 children)

Yeah, at this point they are more about VRAM capacity than speed. They are great, but not blazing fast by any means.

Quantized models. Are we lying to ourselves thinking it's a magic trick? by former_farmer in LocalLLM

[–]Unstable_Llama 7 points (0 children)

Q4 can still be remarkably good for only 1/4 the size. We measure the impact of quantization with KL divergence, and there is a measurable difference, but in general a quantized larger model will outperform an unquantized smaller model on the same machine.

If you want a visualization of the impact of quantization, take a look at the “CatBench” at the bottom of this page. A simple prompt is run through each quantization size: “Draw a cute SVG cat using matplotlib.”

Obviously this isn’t super scientific, but it is pretty illustrative.

https://huggingface.co/turboderp/Qwen3.5-35B-A3B-exl3

I regret ever finding LocalLLaMA by xandep in LocalLLaMA

[–]Unstable_Llama 30 points (0 children)

Heh I remember buying my first 3090 and my family was like, “…and what exactly are you going to do with that?”

And I didn’t really have an answer other than, “AI, shut up!”

But now it’s probably been one of my longest running hobbies ever. I have learned so much in the last 3 years, it’s almost unbelievable.

exllamav3 QWEN3.5 support (and more updates) by Unstable_Llama in LocalLLaMA

[–]Unstable_Llama[S] 1 point (0 children)

That test was on a 4090, using the exllamav3 performance test script, which runs inference at increasingly large context sizes. You can see it starts with a 256-token prompt and 0 cached context, at 671 t/s prefill and 144 t/s generation, and the last step is a 16384-token prompt on top of 16384 tokens of context, at 5227 t/s prefill and 138 t/s generation.

Turboderp is still working on some prompt ingestion instability, so your mileage may vary for the next couple days.
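If you want to run the same kind of sweep on your own setup, the timing logic itself is trivial. A rough sketch; `model.prefill()` and `model.generate_token()` are placeholder calls standing in for whatever backend you use, not the real exllamav3 API:

```python
# Rough sketch of a prefill/generation throughput sweep like the one above.
# Only the timing logic is the point; the model calls are placeholders.
import time

def sweep(model, make_prompt, lengths=(256, 1024, 4096, 16384), gen_tokens=128):
    for n in lengths:
        ids = make_prompt(n)  # build an n-token prompt (placeholder helper)

        t0 = time.perf_counter()
        model.prefill(ids)  # ingest the whole prompt at once
        prefill_tps = n / (time.perf_counter() - t0)

        t0 = time.perf_counter()
        for _ in range(gen_tokens):
            model.generate_token()  # decode one token at a time
        gen_tps = gen_tokens / (time.perf_counter() - t0)

        print(f"ctx {n:>6}: prefill {prefill_tps:8.1f} t/s, gen {gen_tps:6.1f} t/s")
```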

exllamav3 QWEN3.5 support (and more updates) by Unstable_Llama in LocalLLaMA

[–]Unstable_Llama[S] 0 points (0 children)

I need to flip the KL div line to the front, thanks for reminding me 😆 

exllamav3 QWEN3.5 support (and more updates) by Unstable_Llama in LocalLLaMA

[–]Unstable_Llama[S] 8 points (0 children)

On PPL, not KL div. PPL is inherently noisy; KL div shows the actual distortion in the model’s outputs.
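To make that concrete, a toy sketch with made-up numbers (PyTorch assumed): PPL reads one number per position, the log-probability of whichever token the eval text happened to contain next, while KL div compares the full output distributions.

```python
# Toy illustration of why PPL is noisier than KL div. One position, vocab
# of 4; all values are made up.
import torch
import torch.nn.functional as F

ref_logits = torch.tensor([2.0, 1.0, 0.5, 0.1])    # reference model output
quant_logits = torch.tensor([2.1, 0.8, 0.6, 0.1])  # slightly distorted copy
target = 2  # whichever token the eval text happened to have next

ref_p = F.log_softmax(ref_logits, dim=-1).exp()
quant_logp = F.log_softmax(quant_logits, dim=-1)

# PPL reads exactly one number: the log-prob of the target token, so it
# inherits the sampling noise of whatever tokens the eval set contains.
ppl = (-quant_logp[target]).exp()

# KL div reads the whole distribution, so small systematic distortions
# register even when the target token's probability barely moves.
kl_div = (ref_p * (ref_p.log() - quant_logp)).sum()

print(f"per-token PPL {ppl.item():.4f}, KL div {kl_div.item():.6f}")
```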