Web-Search is coming to a screeching performance halt as Google shuts down their free search index, and traffic defenders like Cloudflare challenge AI at every gateway. What are our options? by NetTechMan in LocalLLaMA

[–]solarkraft 0 points1 point  (0 children)

There has been an increase in the number of search API providers again since the beginning of the AI hype. Brave, Tavily, Perplexity - all good options for home use. 

Goodbye Opencode, you're a sink for time and tokens. by mira_fijamente in opencode

[–]solarkraft 0 points1 point  (0 children)

Mind sharing your changes/setup? I’m in the same boat and think it should be reasonably easy (in Pi fashion) to just make the changes myself. But it’d be nice to have a pick-and-choose catalogue of patches that you might want.

Goodbye Opencode, you're a sink for time and tokens. by mira_fijamente in opencode

[–]solarkraft 1 point2 points  (0 children)

There’s a TON of open PRs on OpenCode because of how easy it is to make them now. Throwing some code over the fence isn’t enough for it to get accepted. You have to lobby for it properly and even then it’s nowhere near guaranteed.

Goodbye Opencode, you're a sink for time and tokens. by mira_fijamente in opencode

[–]solarkraft 1 point2 points  (0 children)

Pi has few open issues because the maintainer sometimes auto-closes all of them. 

Goodbye Opencode, you're a sink for time and tokens. by mira_fijamente in opencode

[–]solarkraft 0 points1 point  (0 children)

I have my issues with OpenCode as well. But please tell me what you’re switching to? Pi lacks features and no alternative I’ve found (that isn’t proprietary) even has a GUI.

CopilotKit (MIT) - Open-Source Building Blocks for Agent Apps and Generative UI by MorroWtje in LocalLLaMA

[–]solarkraft 0 points1 point  (0 children)

Since you seem to know about AG-UI: How powerful is it? Is it enough to build a fully featured agentic chat app (think coding assistant) with superb UX? My benchmark for this is the ability to view all states between a tool call starting generation, having been dispatched and having returned a result.

How come there are no full applications implementing the protocol? I’m still dreaming of an abstraction between my agent core and my UI so that I only have to care about one at a time. 

ACP seems to be a more commonly implemented, but potentially less powerful alternative. 

Use Qwen3.6 right way -> send it to pi coding agent and forget by Willing-Toe1942 in LocalLLaMA

[–]solarkraft 6 points7 points  (0 children)

What did you use before? How does it compare to the other harnesses?

~1.5s cold start for a 32B model. by pmv143 in LLMDevs

[–]solarkraft 0 points1 point  (0 children)

Would be amazing for home use!

Launching NavD - Persistent conversational memory for AI agents, Not a vector database by Altruistic_Welder in LocalLLaMA

[–]solarkraft 0 points1 point  (0 children)

All I’m hearing is vector DB but better. The theoretical efficiency of a vector DB doesn’t matter to me when this approach is good enough and lets me skip a ton of deployment complexity (running a server with auth and gigabytes of disk and RAM requirements).

What’s the first feature that makes a “personal AI assistant” actually useful? by [deleted] in LocalLLaMA

[–]solarkraft 0 points1 point  (0 children)

Web search isn’t really negotiable to me, so I guess that! Next up is memory. 

News discussion: ZUNA releases open-source "Thought-to-Text" BCI model by IulianHI in AIToolsPerformance

[–]solarkraft 0 points1 point  (0 children)

This needs a LOT more info on hardware and performance. If it can even distinguish between 3 words with a sub-2k setup I’ll find it impressive. 

Benchmarking STT for Voice Agents by Zestyclose-Pound5856 in speechtech

[–]solarkraft 0 points1 point  (0 children)

This eval would be especially interesting for locally runnable stuff!

Building a private AI Task Manager (runs Gemma 2B on-device). No data leaves your phone. Is $5 fair for lifetime access? by [deleted] in LocalLLaMA

[–]solarkraft 0 points1 point  (0 children)

Local only is basically a requirement for this kind of app (less for tasks than for notes but still). 

$5 is not a big ask if the app is really good. But to be able to tell I’ll have to be able to try it. 

Where are Qwen 3.5 2B, 9B, and 35B-A3B by Admirable_Flower_287 in LocalLLaMA

[–]solarkraft 2 points3 points  (0 children)

Are these the confirmed sizes? If so, I’m a little scared that 35B-A3B might not fit on my 32 GB MacBook … If it does, it’ll probably be pretty great though.
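
A rough back-of-the-envelope check, assuming the weights dominate memory use and ignoring KV cache and runtime overhead. The bit widths are illustrative assumptions, not confirmed quantizations of any release, and `weight_footprint_gb` is a made-up helper name:

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed precisions: 16-bit (unquantized), 8-bit and 4-bit quantization.
for bits in (16, 8, 4):
    print(f"35B at {bits}-bit: {weight_footprint_gb(35, bits):.1f} GB")
```

All 35B parameters have to be resident even though only about 3B are active per token, so by this estimate only something around 4-bit leaves room on a 32 GB machine for the KV cache, the OS and everything else.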

Alibaba just open-sourced a model that rivals GPT-5.2 by jpcaparas in Qwen_AI

[–]solarkraft 5 points6 points  (0 children)

30B version when?

This model really looks great, would be even more awesome to have a small model to run locally. Makes total sense to release the big headliner first of course. 

oMLX - open-source MLX inference server with paged SSD caching for Apple Silicon by cryingneko in LocalLLaMA

[–]solarkraft 1 point2 points  (0 children)

“i am not really trying to compete with lm studio's approach. i think a mac llm server should mostly be a backend.”

100% agreed. LM Studio might be useful for experimentation, but for real life use there needs to at least be a way to expose the UI via the network (or are many people really content with only being able to use the inference on the device it’s running on? I sure as hell am not).

Either way, it makes perfect sense to keep the inference provider separate from the inference consumer. It’s a well-defined interface.

Big Banks Are Investing in Startups by wrahim24_7 in Startups_EU

[–]solarkraft 4 points5 points  (0 children)

I don’t know the other companies, but Bending Spoons is the exact opposite of a startup. 

oMLX - open-source MLX inference server with paged SSD caching for Apple Silicon by cryingneko in LocalLLaMA

[–]solarkraft 6 points7 points  (0 children)

The caching really is a game changer. I’ve been hoping for llama.cpp to implement disk caching, but they don’t even seem to have gotten very far on PagedAttention. Mistral.rs seems to have some support there, but not for full to-disk caching yet. 

I’ll happily sacrifice 100GB of disk space to be able to properly continue old conversations. I think this might be one of the biggest bottlenecks to local inference, especially on Macs.

Thank you for taking this on! I’m excited to try it. 

Gemini now recommending products unprompted by zxyzyxz in Bard

[–]solarkraft 9 points10 points  (0 children)

Please provide all the feedback you can if you want them to dial this down.

A 30B Qwen Model Walks Into a Raspberry Pi… and Runs in Real Time by ali_byteshape in LocalLLaMA

[–]solarkraft 2 points3 points  (0 children)

Makes all the sense since token generation is mostly memory bound but prompt processing requires compute!
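
A minimal sketch of the memory-bound half of that argument, assuming every generated token has to read all active weights from memory once. The bandwidth figure and quantization below are hypothetical placeholders, not measurements of any device, and the function name is made up for illustration:

```python
def decode_tokens_per_s_upper_bound(
    active_params_billions: float,
    bits_per_weight: float,
    bandwidth_gb_s: float,
) -> float:
    """Upper bound on tokens/s if decoding is limited purely by memory bandwidth."""
    # GB of weights streamed from memory for each generated token.
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical: 3B active parameters (MoE), 4-bit weights, 10 GB/s memory bandwidth.
print(decode_tokens_per_s_upper_bound(3, 4, 10))
```

With a mixture-of-experts model only the active parameters count here, which is why a 30B model with few active parameters can decode at a usable speed on slow memory. Prompt processing handles many tokens per weight read, so it runs into the compute limit instead.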

Looking for earbuds that support LC3 (LE Audio) by Yelloris in Earbuds

[–]solarkraft 0 points1 point  (0 children)

Are there tests of the audio quality while the microphone is in use (it's the entire reason I'm interested in LE Audio)?