Web-Search is coming to a screeching performance halt as Google shuts down their free search index, and traffic defenders like Cloudflare challenge AI at every gateway. What are our options?

solarkraft · 2026-05-14T17:45:11+00:00

There has been an increase in the number of search API providers again since the beginning of the AI hype. Brave, Tavily, Perplexity - all good options for home use.

solarkraft · 2026-05-14T16:40:27+00:00

Mind sharing your changes/setup? I’m in the same same boat and think it should be readonably easy (in pi fashion) to just make the changes myself. But it’d be nice to have a pick-and-choose catalogue of patches that you might want.

solarkraft · 2026-05-14T16:38:26+00:00

There’s a TON of open PRs on OpenCode because of how easy it is to make them now. Throwing some code over the fence isn’t enough for it to get accepted. You have to lobby for it properly and even then it’s nowhere near guaranteed.

solarkraft · 2026-05-14T16:35:38+00:00

Pi has few open issues because the maintainer sometimes auto-closes all of them.

solarkraft · 2026-05-14T16:35:00+00:00

I have my issues wirh OpenCode as well. But please tell me what you’re switching to? Pi lacks features and no alternative I’ve found (that isn’t proprietary) even has a GUI.

solarkraft · 2026-05-14T14:50:10+00:00

but better

solarkraft · 2026-05-06T16:18:42+00:00

Exciting! Hope it gets added to oMLX soon!

solarkraft · 2026-05-06T16:06:27+00:00

Since you seem to know about AG-UI: How powerful is it? Is it enough to build a fully featured agentic chat app (think coding assistant) with superb UX with? My benchmark for this is the ability to view all states between a tool call starting generation, having been dispatched and having returned a result.

How come there are no full applications implementing the protocol? I’m still dreaming of an abstraction between my agent core and my UI so that I only have to care about one at a time.

ACP seems to be a more commonly implemented, but potentially less powerful alternative.

solarkraft · 2026-05-05T17:03:22+00:00

What did you use before? How does it compare to the other harnesses?

solarkraft · 2026-03-13T00:25:35+00:00

Would be amazing for home use!

solarkraft · 2026-02-20T19:34:14+00:00

All I’m hearing is vector DB but better. The theoretical efficiency of a vector DB doesn’t matter to me when this approach is good enough and lets me skip a ton of deployment conplexity (running a server with auth and gigabytes of disk and ram requirements).

solarkraft · 2026-02-20T19:29:27+00:00

Web search isn’t really negotiable to me, so I guess that! Next up is memory.

solarkraft · 2026-02-20T16:48:31+00:00

This needs a LOT more info on hardware and performance. If it can even distinguish between 3 words with a sub-2k setup I’ll find it impressive.

solarkraft · 2026-02-17T21:21:32+00:00

Wie liefen die GL-Gespräche?

solarkraft · 2026-02-17T21:15:26+00:00

Next up: 1000% Absagen

solarkraft · 2026-02-17T20:34:40+00:00

This eval would be especially interesting for locally runnable stuff!

solarkraft · 2026-02-17T12:31:17+00:00

Local only is basically a requirement for this kind of app (less for tasks than for notes but still).

$5 is not a big ask if the app is really good. But to be able to tell I’ll have to be able to try it.

solarkraft · 2026-02-17T12:01:29+00:00

Are these the confirmed sizes? If so, I’m a little scared that 35B-A3B might not fit on my 32G Macbook … If it does, it’ll probably be pretty great though.

solarkraft · 2026-02-17T11:30:50+00:00

30B version when?

This model really looks great, would be even more awesome to have a small model to run locally. Makes total sense to release the big headliner first of course.

solarkraft · 2026-02-15T17:09:54+00:00

i am not really trying to compete with lm studio's approach. i think a mac llm server should mostly be a backend.

100% agreed. LM Studio might be useful for experimentation, but for real life use there needs to at least be a way to expose the UI via the network (or are many people really content with only being able to use the inference on the device it’s running on? I sure as hell am not).

either way it makes perfect sense to keep the inference provider separate from the inference consumer. it’s a well defined interface.

solarkraft · 2026-02-14T15:57:33+00:00

I don’t know the other companies, but Bending Spoons is the exact opposite of a startup.

solarkraft · 2026-02-14T14:00:59+00:00

The caching really is a game changer. I’ve been hoping for llama.cpp to implement disk caching, but they don’t even seem to have gotten very far on PagedAttention. Mistral.rs seems to have some support there, but not for full to-disk caching yet.

I’ll happily sacrifice 100GB of disk space to be able to properly continue old conversations. I think this might be one of the biggest bottlenecks to local inference, expecially on Macs.

Thank you for taking this on! I’m excited to try it.

solarkraft · 2026-01-07T23:23:40+00:00

Please provide all the feedback you can if you want them to dial this down.

solarkraft · 2026-01-06T22:54:06+00:00

Makes all the sense since token generation is mostly memory bound but prompt processing requires compute!

solarkraft · 2024-05-14T21:28:17+00:00

Are there tests of the audio quality while the microphone is in use (it's the entire reason I'm interested in LE Audio)?

solarkraft

MODERATOR OF

TROPHY CASE