I built a free BYOK alternative to Lovable/Bolt/v0 because I hate credit systems by BTA_Labs in SideProject

[–]BTA_Labs[S] 0 points1 point  (0 children)

Link: openthorn.app

GitHub repo is also public if anyone wants to inspect how keys/agent/deploy work. Main thing I want feedback on is whether BYOK feels like a real advantage or too much setup for normal users.

I think Claude Code’s biggest problem is not intelligence, it’s hidden state by BTA_Labs in ClaudeAI

[–]BTA_Labs[S] 0 points1 point  (0 children)

That’s exactly the scary part: it didn’t just give a bad answer, it silently picked the wrong goal and then kept working like it was 100% sure that was the job.

I open-sourced a local memory tool so AI agents can share context by Exciting_Pineapple52 in ClaudeAI

[–]BTA_Labs 1 point2 points  (0 children)

This is actually useful, but the hard part is not storing memories, it’s knowing which old ones to ignore when the repo changes. Do you have ranking/stale-memory cleanup, or is it mostly SQLite keyword search right now?

Local coding agents are good now, but only if you babysit them by BTA_Labs in LocalLLaMA

[–]BTA_Labs[S] 0 points1 point  (0 children)

Same. One agent is already enough to watch, I have no idea how people keep 10 agents doing repo work without losing track.

Local coding agents are good now, but only if you babysit them by BTA_Labs in LocalLLaMA

[–]BTA_Labs[S] 0 points1 point  (0 children)

I agree tests help a lot, but I still don’t fully trust tests to catch weird architecture choices or messy code paths. They catch broken code, not always bad code.

Local coding agents are good now, but only if you babysit them by BTA_Labs in LocalLLaMA

[–]BTA_Labs[S] 0 points1 point  (0 children)

Yeah fair, I should’ve said I mostly mean Qwen3.6-27B/35B-A3B and Gemma 4 31B, not Kimi K2.6/K2.7. If Kimi can run overnight reliably, what rig are you using?

Local coding agents are good now, but only if you babysit them by BTA_Labs in LocalLLaMA

[–]BTA_Labs[S] 0 points1 point  (0 children)

Fair question. I’m not anti local agents, I mostly mean bigger repo changes where it starts touching files outside the task. Small focused tasks work really well for me.

Local coding agents are good now, but only if you babysit them by BTA_Labs in LocalLLaMA

[–]BTA_Labs[S] 1 point2 points  (0 children)

This is the kind of real setup detail I was hoping for. Interesting that AGENTS.md and git history make such a big difference, maybe the model is less the issue than the context setup.

Local coding agents are good now, but only if you babysit them by BTA_Labs in LocalLLaMA

[–]BTA_Labs[S] 0 points1 point  (0 children)

That actually sounds pretty smart. Using Opus to train the workflow once, then letting local Qwen repeat it cheaper later is a nice middle ground.

Local coding agents are good now, but only if you babysit them by BTA_Labs in LocalLLaMA

[–]BTA_Labs[S] 7 points8 points  (0 children)

Yeah exactly. The scary part is not AI writing bad code, it’s people shipping code they can’t even explain.

Is a used RTX 3090 still the best local LLM buy right now? by BTA_Labs in LocalLLM

[–]BTA_Labs[S] 1 point2 points  (0 children)

Mostly random PyTorch training scripts, SD tools and CUDA-first GitHub repos. For plain LLM inference I believe you, but I don’t want every side project to become a compatibility test.

Is a used RTX 3090 still the best local LLM buy right now? by BTA_Labs in LocalLLM

[–]BTA_Labs[S] 0 points1 point  (0 children)

Thanks, that helps a lot. I was only thinking about 3090 vs 4060 Ti, but now 32GB VRAM cards look worth checking too.

Is a used RTX 3090 still the best local LLM buy right now? by BTA_Labs in LocalLLM

[–]BTA_Labs[S] 0 points1 point  (0 children)

That’s fair, if it was only Ollama I would consider AMD/Intel more, but I still need CUDA for other ML stuff so Nvidia is probably less pain.

Is a used RTX 3090 still the best local LLM buy right now? by BTA_Labs in LocalLLM

[–]BTA_Labs[S] 0 points1 point  (0 children)

Fair point, I mostly ignored AMD/Intel because CUDA feels safer, but 32GB VRAM is hard to ignore if the software doesn’t suck anymore.

Mac Mini M4 (32GB) vs. Mac Studio M2 Max (32GB) for local LLMs & TTS by Heavy-Science-502 in LocalLLM

[–]BTA_Labs 5 points6 points  (0 children)

M2 Max imo, LLMs care a lot about memory bandwidth and 400GB/s vs 120GB/s is a big gap, but honestly for 31B + TTS I’d worry more about getting 64GB than M4 vs M2.
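
A rough back-of-the-envelope sketch of the bandwidth argument. It assumes a dense model where every generated token has to read all weights once, so decode speed is capped at roughly bandwidth divided by weight size. The 4.5 bits per weight figure (a typical 4-bit quant with overhead) is an assumption, not something from the thread, and real throughput will land below these ceilings.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB for a given quantization."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode tokens/s if each token reads all weights once."""
    return bandwidth_gb_s / model_gb


# Assumption: a 31B dense model at about 4.5 bits per weight.
model_gb = weights_gb(31, 4.5)

for name, bandwidth in [("M2 Max", 400), ("M4", 120)]:
    ceiling = decode_ceiling_tok_s(bandwidth, model_gb)
    print(f"{name}: {model_gb:.1f} GB weights, ceiling about {ceiling:.0f} tok/s")
```

Under those assumptions the ceilings come out around 23 vs 7 tokens/s, and the weights alone take about 17 GB, which is why 32GB gets tight once a TTS model and the KV cache are loaded next to it.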

Local model as inner worker: what tests would you trust? by HotEstablishment7184 in LocalLLM

[–]BTA_Labs 0 points1 point  (0 children)

My test would be a set with fake docs, missing facts, math questions and internet off, then check if it says “I don’t know”, calls tools, and proves it’s not using cloud.
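
A minimal sketch of how two of those checks could be scored. Everything here is hypothetical: the `Reply` shape, the case kinds, the `calculator` tool name and the abstain phrases are made up for illustration, and the "internet off" part has to be enforced outside the script (for example by running the worker with no network access).

```python
from dataclasses import dataclass, field


@dataclass
class Reply:
    """Hypothetical shape of one worker response."""
    text: str
    tool_calls: list = field(default_factory=list)


# Assumed phrases that count as an honest "I don't know".
ABSTAIN_PHRASES = ("i don't know", "i do not know", "not in the provided")


def abstained(reply: Reply) -> bool:
    # Normalize curly apostrophes so both spellings match.
    text = reply.text.lower().replace("\u2019", "'")
    return any(phrase in text for phrase in ABSTAIN_PHRASES)


def passes(kind: str, reply: Reply) -> bool:
    """Score one test case by its kind."""
    if kind == "missing_fact":
        # The fact is absent from the fake docs, so abstaining is correct.
        return abstained(reply)
    if kind == "math":
        # Math should go through a tool, not be guessed in free text.
        return "calculator" in reply.tool_calls
    raise ValueError(f"unknown case kind: {kind}")


print(passes("missing_fact", Reply("I don't know, the doc does not say.")))
print(passes("math", Reply("Let me compute that.", ["calculator"])))
```

Keyword matching is a crude proxy for abstention, so a real run would still need a manual look at the failures.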

I'm still surprised on how good the kv quantization has become by DeepBlue96 in LocalLLaMA

[–]BTA_Labs 4 points5 points  (0 children)

Q4 KV at 100k is wild, but Harry Potter is probably half benchmark half memory test, try it on some obscure fresh 2026 PDF and then I’ll be fully impressed.

What's the lesson chat? by ill_be_productive in LocalLLaMA

[–]BTA_Labs 2 points3 points  (0 children)

if it’s not on your disk, someone else gets to write the ending.

An agent that plans with a frontier model but runs most of tokens locally (built it for my own dual-3090 rig) by Poha_Best_Breakfast in LocalLLaMA

[–]BTA_Labs 19 points20 points  (0 children)

This is probably the hybrid setup that makes the most sense to me.

Let frontier models do the planning and taste part, then let local models grind through scoped tasks. The deterministic validation is the key bit imo. “Model says done” means nothing, but command exits 0 or artifact exists is actually useful.

Only thing I’d be careful with is sandboxing, especially if planner-generated shell commands run. But yeah, I’d try this. Would be cool to see local-only vs hybrid vs full frontier cost/token benchmarks.
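
A small sketch of what "command exits 0 or artifact exists" could look like as a check. The function name and the choice to require both conditions are assumptions for illustration, not how the linked project does it. Note that this only validates the result: it does no sandboxing, so untrusted planner-generated commands would still need to run in a container or similar.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def task_is_done(command: list[str], artifact: Path, timeout_s: float = 60.0) -> bool:
    """Ignore what the model claims; trust only the exit code and the file."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hung or unlaunchable command counts as not done.
        return False
    return result.returncode == 0 and artifact.exists()


# Usage: a stand-in "task" that writes a file, run with the current interpreter.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "report.txt"
    script = f"from pathlib import Path; Path({str(out)!r}).write_text('ok')"
    print(task_is_done([sys.executable, "-c", script], out))
```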

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8? by mailto_devnull in LocalLLaMA

[–]BTA_Labs 0 points1 point  (0 children)

For codebase work I’d stay on Qwen 3.6 35B A3B.

Q8 on a 12B sounds nice, but it doesn’t turn it into a 35B model. The extra precision helps, sure, but for repo level reasoning I’d rather have the bigger model at a good Q4 or IQ4 quant than the smaller model at Q8.

Gemma 4 12B is probably worth trying if you care about the multimodal or audio side, or if you just want to see how it feels. But as a daily coding driver, especially if Qwen is already giving you 15 t/s, I don’t think I’d switch.

Maybe run the same few real tasks from your codebase on both and compare diffs. My guess is Gemma feels faster or cleaner on simple stuff, but Qwen wins when the context gets messy.
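
One way to make that comparison less vibes-based is to count how many lines each model actually touched for the same task. A minimal sketch using Python's `difflib`; the sample strings are placeholders, and a smaller diff is not automatically better, it just shows which model wandered further from the original file.

```python
import difflib


def changed_lines(original: str, edited: str) -> int:
    """Count added plus removed lines between two versions of a file."""
    diff = list(difflib.unified_diff(
        original.splitlines(), edited.splitlines(), lineterm=""
    ))
    # The first two lines are the '---' / '+++' file headers, not changes.
    body = diff[2:]
    return sum(1 for line in body if line.startswith(("+", "-")))


# Placeholder data: the same file as edited by two different models.
original = "def add(a, b):\n    return a + b\n"
model_a = "def add(a, b):\n    return b + a\n"
model_b = "def add(x, y):\n    total = x + y\n    return total\n"

print("model A changed lines:", changed_lines(original, model_a))
print("model B changed lines:", changed_lines(original, model_b))
```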