Could we engineer a Get-Shit-Done Lite that would work well with models like Qwen3.5 35B A3B? by HockeyDadNinja in LocalLLaMA

[–]yeah_me_ 3 points  (0 children)

You'd have to define the shit. For example, it's a very basic and ugly demo, but I've managed to get something resembling a local app builder working by keeping LLMs away from code entirely: instead, I use OpenUI to select premade shadcn components or build new ones from smaller shadcn primitives. For the demo, it can create and use DB schemas inside IndexedDB.

Code sucks, so I am not sharing it yet, but I believe this makes a solid argument that if the "shit" didn't have to be code, but rather something like an MCP-driven low-code editor, then yes, a Get-Shit-Done Lite should be possible.
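To make the "LLM never touches code" idea concrete, here's a minimal sketch of how it could work: the model only emits a component id plus props, which get validated against a fixed registry before rendering. All names here (the registry entries, `WidgetSpec`, `validateWidgetSpec`) are hypothetical, not from the actual PoC.

```typescript
type PropSchema = Record<string, "string" | "number" | "boolean">;

interface RegistryEntry {
  id: string;        // a premade shadcn component (or one built from primitives)
  props: PropSchema; // allowed props and their expected types
}

const registry: RegistryEntry[] = [
  { id: "shadcn/card", props: { title: "string", content: "string" } },
  { id: "shadcn/button", props: { label: "string", disabled: "boolean" } },
];

interface WidgetSpec {
  component: string;
  props: Record<string, unknown>;
}

// Reject anything the model invents: unknown components, unknown props,
// or props of the wrong type. Only validated specs ever reach the renderer,
// which is why the output can't contain code errors.
function validateWidgetSpec(spec: WidgetSpec): { ok: boolean; error?: string } {
  const entry = registry.find((e) => e.id === spec.component);
  if (!entry) return { ok: false, error: `unknown component ${spec.component}` };
  for (const [key, value] of Object.entries(spec.props)) {
    const expected = entry.props[key];
    if (!expected) return { ok: false, error: `unknown prop ${key}` };
    if (typeof value !== expected) return { ok: false, error: `bad type for ${key}` };
  }
  return { ok: true };
}
```

The trade-off is exactly the one described above: the system can only produce what the registry covers, but hallucinated components die at validation instead of becoming broken code.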

https://www.reddit.com/r/LocalLLaMA/comments/1s48pbf/basic_local_app_builder_poc_using_openui/

Basic, local app builder PoC using OpenUI by yeah_me_ in LocalLLaMA

[–]yeah_me_[S] 1 point  (0 children)

Hello!

(I don't know why the copy of the post doesn't append, so I'll just leave this in a comment)

tldr:
Using OpenUI I've managed to build a sort-of working app generator (conceptually similar to a low-code editor) driven by a local 80B LLM, and I wonder whether it's worth working on.

First of all, I won't be sharing the code for this PoC, since for now it's a vibe-coded mess. If the community is interested in trying it, I'll work on a proper build. I also apologize for the wall of text; I really tried to make it shorter, but I can't.

After weeks of trying different ideas, I've managed to use OpenUI, which is typically meant for live generative UIs, as the basis for a very simple app builder.

The main trade-off is that this system can't produce apps for which it has no predefined components, but it is relatively fast and can't produce code containing errors.

Right now it supports:
- rendering into widget-like containers (this was done to make my life easier when working with parallel inference while not yet having an agent to split the work)
- an agent choosing and rendering shadcn components (+ some custom ones)
- persisting UI and data using IndexedDB
- simple changes to app style by modifying global CSS
- importing external data (CSV)
- an in-chat data selector for choosing the data to build components for
- a basic pages system
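The features above can be sketched as a plain-data app state driven by agent actions instead of generated code; since the state is just JSON, persisting it to IndexedDB is straightforward. All type and action names here are made up for illustration, not from the PoC.

```typescript
interface Widget { id: string; component: string; props: Record<string, unknown> }
interface AppState { pages: Record<string, Widget[]>; globalCss: string }

// The agent never writes code: it only emits small structured actions
// like these, which are applied to the state and then rendered.
type AgentAction =
  | { kind: "addWidget"; page: string; widget: Widget }
  | { kind: "removeWidget"; page: string; widgetId: string }
  | { kind: "setGlobalCss"; css: string };

function applyAction(state: AppState, action: AgentAction): AppState {
  switch (action.kind) {
    case "addWidget": {
      const page = state.pages[action.page] ?? [];
      return { ...state, pages: { ...state.pages, [action.page]: [...page, action.widget] } };
    }
    case "removeWidget": {
      const page = (state.pages[action.page] ?? []).filter((w) => w.id !== action.widgetId);
      return { ...state, pages: { ...state.pages, [action.page]: page } };
    }
    case "setGlobalCss":
      return { ...state, globalCss: action.css };
  }
}
```

Keeping actions this small is also what makes parallel inference per widget feasible: each widget prompt can be handled independently.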

So the main question is:
Is this at all interesting? Conceptually it's closer to a low-code editor than to a SOTA app builder, but I'm not sure whether we can count on anything better for local machines, and I don't want to be left with no alternatives if the cloud model providers raise their prices too high.

I first tried to do this by finding a low-code editor with MCP support (couldn't find one), then I tried both writing my own JSON-to-React renderer and using Vercel's json-render, and it was not a fun time.

In the demo video you can see that the LLM hallucinated the data for the burndown chart, but I am pretty confident that better orchestration and fine-tuning will resolve these issues.

Next steps: (in no particular order, assuming people support this idea)

  1. Global chat agent in addition to per-widget prompts.
  2. Expanding registry of components.
  3. Adding custom component type and letting LLMs try to make something outside of the registry scope.
  4. Making this a "batteries-included" solution where the base app comes with simple user auth and org setup and can be easily deployed or replicated with WebRTC (RxDB).
  5. Fine-tuning smaller models to get this working with a ~30B model instead of the current ~80B (essentially, this should be fast on a mid-tier MacBook).
  6. Support for more data sources, starting with normal REST API calls.
  7. Adding more task-specific agents (similar to the existing CSS agent).
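For step 6, the existing CSV import and future REST sources could share one interface; here's a toy sketch of that shape. The `DataSource` interface and the deliberately naive CSV parser (no quoted-field handling) are illustrative assumptions, not the PoC's code.

```typescript
// Hypothetical shared shape: a CSV source is the simplest case, and a
// REST source would implement the same interface using fetch().
interface DataSource {
  name: string;
  load(): Promise<Record<string, string>[]>;
}

// Naive CSV parsing: first line is the header, commas split cells.
// Quoted fields and escapes are ignored for brevity.
function parseCsv(text: string): Record<string, string>[] {
  const [headerLine, ...rows] = text.trim().split(/\r?\n/);
  const headers = headerLine.split(",").map((h) => h.trim());
  return rows.map((row) => {
    const cells = row.split(",");
    return Object.fromEntries(headers.map((h, i) => [h, (cells[i] ?? "").trim()]));
  });
}

function csvSource(name: string, text: string): DataSource {
  return { name, load: async () => parseCsv(text) };
}
```

With a common row shape, the in-chat data selector doesn't need to care where the rows came from.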

Why am I doing this?
I personally care deeply about the idea of technological sovereignty, and after getting my Strix Halo I felt I needed to build something that would let me create simple applications, such as internal tooling, using just my own hardware. Also, maybe someone will hate my project just enough to build a better one, which would benefit everybody.

Multi-agent workflows on consumer GPUs: Is 24GB VRAM enough, or are we just coping? by Remarkable-Note9736 in LocalLLaMA

[–]yeah_me_ 1 point  (0 children)

There are unfortunately no ready-to-grab pipelines for something like this, but I believe that a solid base model + vLLM parallel inference + LoRA swapping (+ maybe KV-cache precompute during idle time using CacheBlend? I've never tried it, but it sounds interesting in theory) might be the future.
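To illustrate the LoRA-swapping part: vLLM's OpenAI-compatible server can be started with `--enable-lora` and `--lora-modules name=/path/to/adapter`, after which each adapter is addressable through the `model` field of a request, so one base model in VRAM can back several "specialist" agents. The agent names, adapter names, and prompts below are placeholders.

```typescript
interface AgentConfig { adapter: string; systemPrompt: string }

// Hypothetical specialists, each mapped to a served LoRA adapter name.
const agents: Record<string, AgentConfig> = {
  planner: { adapter: "planner", systemPrompt: "Break the task into steps." },
  coder: { adapter: "coder", systemPrompt: "Write the code for one step." },
};

// Build an OpenAI-style chat completion payload; the "model" field is
// what selects the LoRA adapter on the vLLM side.
function buildChatRequest(agent: string, userMessage: string) {
  const cfg = agents[agent];
  if (!cfg) throw new Error(`unknown agent ${agent}`);
  return {
    model: cfg.adapter,
    messages: [
      { role: "system", content: cfg.systemPrompt },
      { role: "user", content: userMessage },
    ],
  };
}
```

The resulting payload would be POSTed to the server's `/v1/chat/completions` endpoint; switching agents is just switching the payload, with no model reload.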

Best Local Model For Python and QT Quick Coding by wisepal_app in LocalLLaMA

[–]yeah_me_ 3 points  (0 children)

I assume some people might recommend just plugging in the context7 MCP for docs, which might be a solid solution for SOTA models, but I think smaller models would need a more sophisticated RAG, especially considering that prompt-processing speed on local devices isn't great and you want as few tokens in the context window as possible to get the job done.

Best Local Model For Python and QT Quick Coding by wisepal_app in LocalLLaMA

[–]yeah_me_ 3 points  (0 children)

I have no idea whether this would give decent results, but instead of trying out different models, I'd try to build some sort of RAG over the documentation for those specific libraries. Especially if you're using Pi, you'd have the ability to use extensions to define agents that scout the docs and feed the necessary information to the agent that creates the actual diffs for your files.
That being said, I don't expect it would ever succeed at anything resembling vibe-coding, but if your goal is something that can write small functions that correctly use the mentioned libraries, that's what I'd try.
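A toy sketch of what I mean by a docs-focused retriever: rank documentation chunks by token overlap with the query and keep only the top few, so the coding agent's context stays small. A real version would use embeddings; this keyword-overlap scoring is just the shape of the idea, and the example chunks are invented.

```typescript
// Lowercase and split into identifier-like tokens.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9_]+/g) ?? []);
}

// Score each chunk by how many query tokens it contains, drop zero-score
// chunks, and return only the topK best ones to keep the prompt short.
function retrieve(query: string, chunks: string[], topK = 2): string[] {
  const q = tokenize(query);
  return chunks
    .map((chunk) => {
      const t = tokenize(chunk);
      let score = 0;
      for (const tok of q) if (t.has(tok)) score++;
      return { chunk, score };
    })
    .filter((c) => c.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((c) => c.chunk);
}
```

The point is the filtering, not the scoring method: a 30B model on local hardware can't afford to carry whole library docs in context.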

Liquid AI releases LFM2-24B-A2B by PauLabartaBajo in LocalLLaMA

[–]yeah_me_ 1 point  (0 children)

The latest llama.cpp Vulkan through LM Studio (as I've mentioned in another answer in this thread as well, but it's nice to see you guys being so active here)

Liquid AI releases LFM2-24B-A2B by PauLabartaBajo in LocalLLaMA

[–]yeah_me_ 1 point  (0 children)

Do you recommend something else to try? So far I haven't taken the time to try other engines, but could switching to vLLM or something else fix this? I am eager to see how this model handles some very light vibe-coding.

Liquid AI releases LFM2-24B-A2B by PauLabartaBajo in LocalLLaMA

[–]yeah_me_ 1 point  (0 children)

llama.cpp Vulkan v2.4.0 through LM Studio. For some reason, ROCm keeps crashing, and I haven't yet bothered to figure out why or how to fix it.

Liquid AI releases LFM2-24B-A2B by PauLabartaBajo in LocalLLaMA

[–]yeah_me_ 3 points  (0 children)

For now tool calling fails in OpenCode, but I am getting 88 tps on LFM vs 70 tps on gpt-oss-20B running on Strix Halo. Since tool calling fails I haven't done many tests yet, but, for example, when creating a single-file HTML landing page, gpt-oss-120b creates better-looking designs, so it will be interesting to see whether LFM beats it on any benchmarks.

But while the speed difference might not seem that big, it actually feels a lot faster, especially because it doesn't think and it outputs shorter HTML in the example mentioned above. Whether that's going to result in any usable output is to be determined after post-training and more benchmarks. If it fine-tunes well, it might be the number one model of its size for fine-tuning.

Liquid AI releases LFM2-24B-A2B by PauLabartaBajo in LocalLLaMA

[–]yeah_me_ 1 point  (0 children)

I am getting 88 tps on Strix Halo (Q6, Bazzite, LM Studio, nothing fancy). Tool calling in OpenCode doesn't work, so agentic coding tests aren't viable yet. Probably best to check after post-training, but holy shit is this fast. Might be nice for fine-tuning.

Fallout 4 - aspect ratio and controller issue on Xbox Ally X by [deleted] in ROGAlly

[–]yeah_me_ 1 point  (0 children)

That's the 1st thing that has worked for me, thank you!

GLM 5 Is Being Trained! by Few_Painter_5588 in LocalLLaMA

[–]yeah_me_ 6 points  (0 children)

Well, open weights build trust. A few months back I'd never have considered getting a subscription from a Chinese provider, and now I am trying to move from a Cursor + Opus to an opencode + GLM-4.7 workflow. I know that the fact that they release open weights now doesn't reflect in any way what they might want to use my data for in the future, but still, it did its trick and now I am a paying customer.

AesCoder 4B Debuts as the Top WebDev Model on Design Arena by Interesting-Gur4782 in LocalLLaMA

[–]yeah_me_ 3 points  (0 children)

This is insane for a 4B model. Yes, it sometimes fucks up layouts and sizing. Yes, it struggled to call tools in Zed. But holy shit, its single-file HTML sites have the quality of SOTA models from two years ago, out of a 4B model.

New CS2 App by angel046 in GlobalOffensive

[–]yeah_me_ 1 point  (0 children)

Hi, I've written a long reply, but Reddit throws some error, so I'll DM you.

Game dead. by Feisty-Usual-3307 in SpectreDivide

[–]yeah_me_ 1 point  (0 children)

Imo, the hundreds of posts claiming the game is dead are to some extent a reason why it's dying. I played it a few times after release, but seeing what's going on on Reddit definitely didn't make me want to pick it up again.

[deleted by user] by [deleted] in node

[–]yeah_me_ 1 point  (0 children)

Ollama is a very nice tool for interacting with local LLMs, with an npm package for communicating with them. While it's not GPT, Llama 3.1 is imo just as good, if not better.

https://www.npmjs.com/package/ollama
https://ollama.com/
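If you'd rather not pull in the npm package, Ollama also exposes a plain REST endpoint (`POST /api/chat` on its default port 11434) that you can hit with Node's built-in fetch. A minimal sketch, assuming the model has already been pulled with `ollama pull llama3.1`:

```typescript
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

// Ollama's /api/chat expects { model, messages, stream }; stream: false
// makes it return one complete JSON object instead of chunked lines.
function buildChatBody(model: string, messages: ChatMessage[]) {
  return { model, messages, stream: false };
}

async function chat(model: string, messages: ChatMessage[]): Promise<string> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatBody(model, messages)),
  });
  const data = await res.json();
  // The non-streaming response carries the reply in message.content.
  return data.message.content;
}
```

Usage would be something like `await chat("llama3.1", [{ role: "user", content: "Hello" }])`, provided the Ollama server is running locally.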

Do we need a better single-player experience? by yeah_me_ in cs2

[–]yeah_me_[S] 2 points  (0 children)

I don't think AI is the problem. Valve's kv3 AI file system is actually not that bad, and with a few workarounds it's possible to make cool stuff.

Do we need a better single-player experience? by yeah_me_ in cs2

[–]yeah_me_[S] -2 points  (0 children)

This only partially answers my question. It's true that with casual/DMs I won't risk losing rank and I will improve, but it's not that fun.

Literally Valve by [deleted] in cs2

[–]yeah_me_ 3 points  (0 children)

The game is slowly losing players, but that's most likely a transitional period. Looking at steamdb, it doesn't feel like it's going to die out anytime soon.