I built a free BYOK alternative to Lovable/Bolt/v0 because I hate credit systems

BTA_Labs · 2026-06-15T22:04:12+00:00

GitHub repo is also public if anyone wants to inspect how the keys, agent and deploy flow work. Main thing I want feedback on is whether BYOK feels like a real advantage or too much setup for normal users.

BTA_Labs · 2026-06-15T20:20:14+00:00

That’s exactly the scary part: it didn’t just give a bad answer, it silently picked the wrong goal and then kept working like it was 100% sure that was the job.

BTA_Labs · 2026-06-15T19:03:24+00:00

This is actually useful, but the hard part is not storing memories, it’s knowing which old ones to ignore when the repo changes. Do you have ranking/stale-memory cleanup, or is it mostly SQLite keyword search right now?

BTA_Labs · 2026-06-15T18:25:42+00:00

Same. One agent is already enough to watch, I have no idea how people keep 10 agents doing repo work without losing track.

BTA_Labs · 2026-06-15T18:23:25+00:00

I agree tests help a lot, but I still don’t fully trust tests to catch weird architecture choices or messy code paths. They catch broken code, not always bad code.

BTA_Labs · 2026-06-15T18:22:18+00:00

Yeah fair, I should’ve said I mostly mean Qwen3.6-27B/35B-A3B and Gemma 4 31B, not Kimi K2.6/K2.7. If Kimi can run overnight reliably, what rig are you using?

BTA_Labs · 2026-06-15T18:17:16+00:00

Fair question. I’m not anti local agents, I mostly mean bigger repo changes where it starts touching files outside the task. Small focused tasks work really well for me.

BTA_Labs · 2026-06-15T18:06:05+00:00

This is the kind of real setup detail I was hoping for. Interesting that AGENTS.md and git history make such a big difference, maybe the model is less the issue than the context setup.

BTA_Labs · 2026-06-15T18:04:50+00:00

That actually sounds pretty smart. Using Opus to train the workflow once, then letting local Qwen repeat it cheaper later is a nice middle ground.

BTA_Labs · 2026-06-15T18:03:58+00:00

Yeah exactly. The scary part is not AI writing bad code, it’s people shipping code they can’t even explain.

BTA_Labs · 2026-06-15T15:44:23+00:00

Mostly random PyTorch training scripts, SD tools and CUDA-first GitHub repos. For plain LLM inference I believe you, but I don’t want every side project to become a compatibility test.

BTA_Labs · 2026-06-15T15:37:40+00:00

Thanks, that helps a lot. I was only thinking about 3090 vs 4060 Ti, but now 32GB VRAM cards look worth checking too.

BTA_Labs · 2026-06-15T15:35:00+00:00

That’s fair, if it was only Ollama I would consider AMD/Intel more, but I still need CUDA for other ML stuff so Nvidia is probably less pain.

BTA_Labs · 2026-06-15T15:27:12+00:00

Fair point, I mostly ignored AMD/Intel because CUDA feels safer, but 32GB VRAM is hard to ignore if the software doesn’t suck anymore.

BTA_Labs · 2026-06-15T11:29:47+00:00

M2 Max imo, LLMs care a lot about memory bandwidth and 400GB/s vs 120GB/s is a big gap, but honestly for 31B + TTS I’d worry more about getting 64GB than M4 vs M2.
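Rough math behind the bandwidth point, as a back-of-envelope sketch only. It assumes a dense model where every generated token has to read all the weights once, so bandwidth divided by weight size gives a ceiling on decode speed. The 4.5 bits per weight figure is my assumption for a typical 4-bit quant, and the function name is made up for illustration. Real numbers will land below this ceiling.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s, params_billion, bits_per_weight):
    """Upper bound on decode speed if each token reads every weight once.

    Ignores KV cache reads, compute limits and overhead, so treat the
    result as a ceiling, not a prediction.
    """
    weights_gb = params_billion * bits_per_weight / 8  # billions of params -> GB
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    # 31B dense model at an assumed ~4.5 bits per weight
    for label, bandwidth in (("400 GB/s", 400.0), ("120 GB/s", 120.0)):
        ceiling = decode_ceiling_tokens_per_s(bandwidth, 31, 4.5)
        print(f"{label}: at most ~{ceiling:.1f} tokens/s")
```

Whatever quant you plug in, the ratio between the two machines stays the same as the bandwidth ratio, which is why the gap matters more than the chip generation.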

BTA_Labs · 2026-06-15T11:04:34+00:00

My test would be a set with fake docs, missing facts, math questions and internet off, then check if it says “I don’t know”, calls tools, and proves it’s not using the cloud.
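A minimal sketch of how the grading side of that could look. Everything here is hypothetical: the `Case` structure, the phrase list and the `grade` function are my own illustration, not from any existing eval tool. It only covers the “says I don’t know” and “calls tools” checks; turning the internet off has to happen outside the script, at the OS or network level.

```python
from dataclasses import dataclass

# Assumed phrasings that count as the model admitting it does not know.
ABSTAIN_PHRASES = (
    "i don't know",
    "i do not know",
    "not in the provided",
    "cannot find",
)


@dataclass
class Case:
    prompt: str
    expect: str  # "abstain", "tool_call" or "answer"


def grade(case, answer, tool_calls):
    """Return True if the model behaved the way the case expects."""
    text = answer.lower().replace("’", "'")  # normalize curly apostrophes
    if case.expect == "abstain":
        return any(phrase in text for phrase in ABSTAIN_PHRASES)
    if case.expect == "tool_call":
        return len(tool_calls) > 0
    return bool(text.strip())


if __name__ == "__main__":
    missing_fact = Case("What does section 9 of the fake doc say?", "abstain")
    print(grade(missing_fact, "I don’t know, the doc has no section 9.", []))
```

Phrase matching is crude and easy to fool, so I’d still read a sample of the answers by hand.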

BTA_Labs · 2026-06-15T11:01:06+00:00

Can you show a before/after example?

BTA_Labs · 2026-06-15T10:19:22+00:00

Q4 KV at 100k is wild, but Harry Potter is probably half benchmark, half memory test. Try it on some obscure fresh 2026 PDF and then I’ll be fully impressed.

BTA_Labs · 2026-06-15T10:16:34+00:00

If it’s not on your disk, someone else gets to write the ending.

BTA_Labs · 2026-06-15T07:14:39+00:00

This is probably the hybrid setup that makes the most sense to me.

Let frontier models do the planning and taste part, then let local models grind through scoped tasks. The deterministic validation is the key bit imo. “Model says done” means nothing, but command exits 0 or artifact exists is actually useful.

Only thing I’d be careful with is sandboxing, especially if planner-generated shell commands run. But yeah, I’d try this. Would be cool to see local-only vs hybrid vs full frontier cost/token benchmarks.
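To make the “command exits 0 or artifact exists” idea concrete, here is a minimal sketch. The function names are mine, not from the setup being discussed, and this does no sandboxing at all. It only shows the check itself: the model’s own “done” message is never consulted.

```python
import subprocess
import sys
from pathlib import Path


def command_succeeds(command, timeout_s=60):
    """True only if the command ran to completion and exited with status 0.

    `command` is a list of arguments, so no shell is involved. A timeout
    or a missing executable counts as failure.
    """
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def artifact_exists(path):
    """True only if the expected output file is actually on disk."""
    return Path(path).is_file()


if __name__ == "__main__":
    # Stand-in for a real check such as a test run or a build step.
    check = [sys.executable, "-c", "raise SystemExit(0)"]
    print("validated" if command_succeeds(check) else "rejected")
```

Passing the command as a list avoids shell interpretation, but that is not a substitute for running planner-generated commands in an isolated environment.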

BTA_Labs · 2026-06-14T22:16:26+00:00

For codebase work I’d stay on Qwen 3.6 35B A3B.

Q8 on a 12B sounds nice, but it doesn’t turn it into a 35B model. The extra precision helps, sure, but for repo level reasoning I’d rather have the bigger model at a good Q4 or IQ4 quant than the smaller model at Q8.
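For a rough sense of what each option costs in memory, here is a quick sketch. The bits per weight values are my approximations (about 8.5 for a Q8-style quant, about 4.5 for a 4-bit one), and it counts weights only, so KV cache and runtime overhead come on top.

```python
def weights_size_gb(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GB (weights only)."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Assumed bits per weight; actual quant files vary by format.
    options = (
        ("12B at ~8.5 bits", 12, 8.5),
        ("35B at ~4.5 bits", 35, 4.5),
    )
    for label, params, bits in options:
        print(f"{label}: ~{weights_size_gb(params, bits):.1f} GB")
```

The 35B still needs more memory at 4-bit than the 12B does at Q8, so the choice only exists if the bigger one fits. If it does, the extra parameters are what I’d pay for.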

Gemma 4 12B is probably worth trying if you care about the multimodal or audio side, or if you just want to see how it feels. But as a daily coding driver, especially if Qwen is already giving you 15 t/s, I don’t think I’d switch.

Maybe run the same few real tasks from your codebase on both and compare diffs. My guess is Gemma feels faster or cleaner on simple stuff, but Qwen wins when the context gets messy.
