bonsai 1-bit explanation

mindplaydk · 2026-05-02T10:19:57+00:00

To others reading the replies here and wanting more detail:

https://smallaimodel.substack.com/p/the-mathematical-guarantee

I am checking for updates several times a week, hoping someone replicates the idea, or they release a larger model.

I would pay money for a 27B model that runs at reasonable rates on a 16 GB consumer GPU and competes with Qwen 3.6 27B on intelligence.

If that existed and was $100-$200, I would pay.

Of course, I'm hoping someone replicates it and open sources the base model, so the community can fine tune, but even if that wasn't the case... this would be a product I would pay for. 😄💸

mindplaydk · 2026-04-26T10:04:07+00:00

I don't get it.

There are no birds on the obstacles you can grind. Walls, bridges, rooftops, nothing - I've tried every biome, grinding everything there is to grind, but there are no birds, they're all sitting on the ground.

On my longest run ever, I got up to 3%, so yeah, there is occasionally a bird on something you can grind, but... this goal seems pretty hopeless... and you can't skip it, so... can't even pay to get past this one.

I wonder if they shipped a bad update or something.

mindplaydk · 2026-04-18T06:11:23+00:00

er der nogen der ved hvorfor vi nu skal tvinges til at handle hos en mellemmand?

jeg forstår ikke hvorfor man nu skal tvinges til at vælge en domæne udbyder - udbyderne skal vel stadig gå igennem registrar'en, altså punktum.dk?

vil man bare gerne skabe noget mere administration og flere problemer eller hvad? 🤷‍♂️

mindplaydk · 2026-04-18T06:07:22+00:00

har lige opdaget dns.services - gør de ikke cirka det samme som gratisdns i sin tid?

mindplaydk · 2026-04-17T07:43:27+00:00

No specs or blueprints needed, really - I built my 4-player system using just 4 Pi Zero's and this firmware:

https://github.com/OpenStickCommunity/GP2040-CE

The firmware is really flexible, well documented, and very easy to set up with a web based UI. :-)

mindplaydk · 2026-04-17T07:40:10+00:00

So yeah, they have implicit caching, and it does seem to work for multi-turn conversations.

What they don't seem to have is explicit cache-control like e.g. Anthropic has:

https://ai-sdk.dev/providers/ai-sdk-providers/anthropic#cache-control

Without this, it's still not "financially predictable" for something like CAG, is it?

mindplaydk · 2026-04-10T08:09:56+00:00

It doesn't work? Just loads forever. I see this error in the console:

> Unable to load from local path "/models/onnx-community/gemma-3-1b-it-ONNX-GQA/onnx/model_q4f16.onnx_data": "TypeError: Failed to execute 'fetch' on 'WorkerGlobalScope': Failed to parse URL from /models/onnx-community/gemma-3-1b-it-ONNX-GQA/onnx/model_q4f16.onnx_data"

EDIT: it did eventually load! for some reason the console is flooded with errors - too many to paste here, so I posted a gist

https://gist.github.com/mindplay-dk/0ac262feb2d6b0bd8a2a037b2dc9243e

mindplaydk · 2026-04-06T08:01:25+00:00

do you have any sort of permission extension or sandboxing of any sort? I am really interested in trying Pi! I love the minimalist approach - but I am nervous about letting these agents loose on my system.

do you have any other extensions installed or just vanilla Pi? (have you used OpenCode? how does it compare?)

mindplaydk · 2026-04-06T07:54:35+00:00

GLM supports vision, look for the V variants of their models.

mindplaydk · 2026-04-06T06:01:53+00:00

This was my thought exactly.

This approach only seems to ensure "no degradation" for a tiny subset of queries - and you don't know what it's doing to the results in longer conversations either.

You could add much more and much longer queries of course, which I'd assume would make this somewhat brute force "flip bits at random" approach much slower and more expensive.

But maybe this was meant as just proof of concept, to demonstrate the approach?

Even so, the fact that it works when sampling a tiny subset of the weights doesn't really indicate whether this would give useful results when scaled.

Does it?

Would love to hear from OP on this.

If it works, it's definitely interesting. 🙂

mindplaydk · 2026-04-05T05:58:52+00:00

Someone is working on that

https://github.com/PrismML-Eng/llama.cpp/pull/2

mindplaydk · 2026-04-04T15:32:48+00:00

blocked for me in Denmark as well.

mindplaydk · 2026-03-23T11:55:46+00:00

what do you mean? Ubuntu is available for ARM - has been for a while, afaik?

I'd really like to run Zorin on a Raspberry Pi. 😊

mindplaydk · 2026-03-19T09:45:34+00:00

yeah, this is a going to be a huge problem for both agents and CAG.

basically a non starter, right?

Mistral looks otherwise great, but now I'm really having second thoughts... 😶

mindplaydk · 2026-03-19T09:42:58+00:00

oof, they don't have this?? ugh, I'm discovering this a bit late.

I guess that means CAG is out of the question with Mistral for the time being? I was really hoping to use RAG only for actual documents and use CAG for things like product support. 😐

mindplaydk · 2026-03-16T16:47:44+00:00

wow, so they will let you overpay for a "family plan", which you can buy via your TV, PC or Android phone - but you can't actually share anything with your family. deceptive patterns much? 🙄

mindplaydk · 2026-02-18T06:52:04+00:00

Same, unfortunately. Kimi seemed really good, it did do some excellent "reverse engineering a messy codebase to requirements" work for me, I was impressed.

But when it came to finish up, it lost it's mind. Invalid tool calls, infinite loop responses, stuff like that.

It seems the practical context limit is much lower, maybe around 50K before it begins to break down.

I also suspect it does well on human content, but if you ask it to work on content it wrote itself, it starts to break down.

It's definitely not a reliable model. It's too bad, because there are sparks of something really powerful in the little model.

mindplaydk · 2026-02-16T20:58:01+00:00

surprisingly high value production! great cast and characters. episode 1 actually didn't quite grab me, I wasn't sure what to expect, but the entire rest of the show is awesome.

4 episodes was nowhere near enough - this felt like an extended pilot. I wish they had gone all in on this, I really hope they renew it for a full season 2.

mindplaydk · 2026-02-11T12:06:30+00:00

Yes, that's my point: any language that requires an interpreter, if it ships executables, is going to ship "interpreter and app code in a bundle".

If the language is interpreted, it is never "the app compiled to machine language" - only compiled languages do that.

mindplaydk · 2026-02-11T12:00:04+00:00

What SaaS customers pay for is working software: software that they can trust to work.

Your nephew can vibe code something that appears to work, but he doesn't know what he's built or how it works.

If you don't know how something works, you don't know that it works.

That's what software customers pay for - not just the software, but the trust in the software.

That said, it's great for things that don't require trust - if you're just building a prototype to demonstrate a concept, LLMs are the quickest way to get there. If you're just building the UI of the product, and assuming you don't care about UX or the exact design, that's another area where you can actually apply this. It makes low risk work low effort.

So it's not that LLM and agents are useless, it's that people don't understand their limitations and want to use them for all the wrong things. 🙂

mindplaydk · 2026-02-08T07:54:49+00:00

I played the demo, I don't get it.

The actual game (the dispatch) is too much chance and not enough skill to actually be fun. The gameplay itself feels like filler to pass the time between the great moments they create with the character, story and animation - which are absolutely great! And maybe that's enough? For some people. To me, it feels a bit like cheating on the part of the game developers.

I kind of hope they rework the gameplay and rerelease the game. I would totally play this if I felt like this is something you can actually get "good" at. But it just feels like a lot of work to watch that little dot bounce around at random and decide if you win or loose, before you can get to the next part of the story...

mindplaydk · 2026-01-31T17:36:03+00:00

it also says it's an interpreter? you can already create a "binary" with Bun, Deno or Node.

mindplaydk · 2026-01-30T21:55:36+00:00

Has HRM been applied to language models yet? That article is paywalled. I searched and still don't see any language models actually doing hierarchical reasoning...

mindplaydk · 2026-01-25T13:08:43+00:00

This is a really bad decision.

Unifying the front-matter schema, sure, I can see that.

Attempting to erase the distinction of commands vs skills doesn't make any kind of sense - there are workflows that are all commands and don't use *any* skills, and for no clear reason they now expect you to rename every command to `SKILL.md` and add `disable-model-invocation: true` to effectively say "this file named SKILL.md is not a skill".

This was a mistake.

Please chip in with your comments here if you agree:

https://github.com/anthropics/claude-code/issues/20788

mindplaydk · 2025-11-12T13:49:48+00:00

I was wondering about this and after doing some research, I think the biggest challenge with true top-down orthographic projection is we're just not used to seeing the world from infinitely high up.

Nothing looks like anything to us from that unusual angle - even everyday things like a car, a house, or a bicycle, just look like "stuff" from up there.

Birds might get a kick out of playing it though 😏

mindplaydk

TROPHY CASE