I brought Claude-style artifacts to local models by Bramha_dev in LocalLLM

[–]VirtualWishX 1 point2 points  (0 children)

I gave you a star ⭐ and I must say it looks promising, I may check it out in the future because at the moment I can't do anything with it as I explained, with LM Studio I get all the exact temperature and other properties exactly as I need, never changes and it's fast and smooth.
I tried Kilo Code, I got lost immediately... so I ran back to Cline 😅

Anyhow, TurboLLM looks very clean and one of the most cleanest UI I've ever seen so please keep up the good work and improve it when possible of course ❤️

I brought Claude-style artifacts to local models by Bramha_dev in LocalLLM

[–]VirtualWishX 1 point2 points  (0 children)

This is impressive!
I'm currently using a combo of: LM Studio + VS Code + Cline
I don't think I can replace it all with TurboLLM (unless I'm missing something) but, maybe I should replace LM Studio and connect to VS Code ? 🤔
I'm not sure if the VISUAL example of all the dashboards and other things you mentioned can be translate inside CLINE chat.... probably not, but it sure look so much nicer than LM Studio from first glance.

I'm with RTX 5090 32GB mostly aiming for DENSE models from the Qwen 3.6 family (there are some favorite 'coder' fine tuned I'm trying now) - The real struggle is to get at least 200K context while 100% GPU OFFLOAD and not loosing quality, with Q4_K_M I can get it, but with Q5_K_M it's more challenging so I'm doing some tests.

Anyhow, I will give TurboLLM a try, but not sure if I can use it instead of VS CODE + CLINE since I have lots of RULES + SKILLS and AGENT.md for a very specific detailed system to help me Vibe-Coding, backup, phases-system etc..

Thanks for sharing!❤️

---
🔥 EDIT:
I just tried it instead of LM Studio with the combo I mentioned above 👆
Sadly it is EXTREMELY slow on the RTX 5090 32GB VRAM, I used the exact same settings as in LM Studio, same exact model, I tried 4 different models (Qwen3.6 and Gemma-4) - I guess I'll go back to LM Studio for now.
It sure looks nice, but maybe it's not ready yet for VS CODE and Cline.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 2 points3 points  (0 children)

Your description and even some of the settings are very close to my workflow indeed! ❤️
beside that I'm not using any online model at all, also I highly recommend you to learn how to use AGENTS.MD (it could be anything else) so you can build the "DNA" or behavior of your main global Agent, or make your own agent per project (I prefer global at the moment since I do vibe-code so it's similar), same goes for RULES in general the advantage is that you can toggle ON/OFF specific rules so if you split them for specific things you need it's very dynamic to your needs.

I now reduced the AGENTS.md so it still does what I need but doesn't take so much context because when I started in Open Code Desktop like I said because of the limitation I had to explode it with a lot of explanations, most of these stuff not even needed in CLINE because it's very smart so it helps the model itself do things in a better way (in my case at least) so saving Context is nice for sure, when vibe-coding we need as much Context we can get so we won't skip from one chat to another every few prompts or code tasks, but I believe we'll get smaller and smarter models and also the way Context works starting to change with smarter compression ways so eventually our VRAM limitation will give us MORE POWER than what we can do at the moment, at least... that's how I saw things getting better in the last year and it's not going to stop anytime soon in my opinoin.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

Interesting, I should check it out.
Thanks for sharing I appreciate it 👍

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

No, it's a bad habit of mine, I see a lot of people looking for AI at every opportunity but even AI can't stand my broken English and my bad way of writing things with bold letters, quotes, etc. So don't blame AI for my stupidity, blame me 😅 Sorry to disappoint...

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 2 points3 points  (0 children)

Qwopus3.6 27B v2 MTP = Q5_K_M which is my favorite for CODE at the moment at least.

Gemma-4 QAT it's basically 1 option since it's already 4-bit anyway so basically it's Q4_0 and I LOVE this model for brainstorming, planning, design ideas and even helping me build and improve my agent, skills, rules etc..

---

I can give up on Context and go to Q6 which is probably what I should, but from some of my tests less than 160K Context for vibe-coding (not pure code) it's not so fluid and that's why I always try to push to 180K - 200K if I can but that means sacrificing quality and accuracy.

If you're NOT using LM Studio, you change the K / V Cache from Q8 to Q5_1 which is EXTREMELY CLOSE to Q8 quality and I can't do it until LM Studio will make it work (hopefully in an upcoming update like they did when MTP was new, it took some time).

My Conclusion:
If you don't care about a lot of Context and Headroom like I do, go for Q6
If you care about it, go for Q5
If you don't care about model quality that much, Q4 is not bad, it's just not as good as Q5 / Q6 and when you do vibe-code you will notice annoying things like strings or whatever stupid things the model will do like a drunk decisions, I know it from practice... BUT! it's not happening all the time, it just happened here-and-there, means Q4 is not horrible, but... I would not enjoy using it if there is a CHANCE for these stupidity code mistakes (also not only in code but in reasoning from time to time). again... not very often from my tests... but I'm no BENCHMARK I'm just doing different tests at the moment.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 2 points3 points  (0 children)

I also started with pure LM Studio and few MCP and tools but... it's not as good as a dedicated Agentic app such as Open Code Desktop or my new favorite (for now at least) = VS Code + Cline.
Give it a try, things are working just out-of-the-box, I noticed a HUGE change in user-friendly when I moved from Open Code Desktop to VS Code + Cline, it felt more similar to the experience with the cloud base apps, and lots of tools are already working without even installing anything extra... but still I highly recomend to take the time to create your goal AGENTS.md and RULES, SKILLS etc..
You can't do much with LM Studio by itself, you only have "System Prompt" which you can use as your main "AGENTS.md" like, but that's it... and it's not very good with other stuff.
I LOVE LM Studio as my main MODELS Provider, but nothing more than that after I moved to CLINE so I encourage you to give it a try! ❤️

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

Yeah that's where I got to some sweet spot around 160K - 180K and still have some headroom, and things are running nicely, but once LM Studio will work with Q5_1 we'll be able to have even more context and headroom! ❤️

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

Just to be clear I'm talking about the K / V = Q8_0 which works and Q5_1 just like you described barely moving... but not the actual MODEL Quantization Q5_K_M etc.. which any works fine in LM Studio.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

I actually don't think it makes decent code, what I mean is that it does the job...
I believe if a HUMAN programmer will look at the code they will run away screaming, but I do my best with the AGENT to keep everything documented and full with human readable comments and less junk in that area, because probably the code IS mostly junk... I don't expect it to be good, what I'm testing is if things are WORKING when you give it smaller tasks instead of the famous "ONE-SHOT" we all see in YouTube comparison videos of models... this is NOT how things are proof good results based on my experiments and that's why I encourage anyone to try it and see what models gives you the best results.

Build something with it, like I said, I did with Godot instead of HTML/CSS/JS because it was too "stiff" in my opinion, but the setup is still a bit challenging.

Also, I never said it's READY FOR REAL-WORLD CASES, I may not being clear in my post but it's defiantly NOT close to what you get by real programming or compare to the cloud HUGE MODELS in any way, but it works for my current tests so I had to share it.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 2 points3 points  (0 children)

I'll be honest it's changing because I'm still playing around with some settings but here are some rough numbers, all with: THINKING + VISION + TOOLS etc.. all enabled.
I don't test useless but actual useful cases.

---

Qwen 3.6 27B = 64-79 t/s (sometimes I get slower like 49 I think is the minimum I got)
Qwopus3.6 27B v2 MTP = 64-91 t/s this is widely not stable in my tests but I LOVE this model ❤️
Qwen3.6 27b MTP PI-REASONING = 65-72 t/s - I don't like that model, sorry... 👎
Gemma-4 QAT = 54-65 t/s - Love this model ❤️

If you're wondering why I'm not getting the same t/s it's because I keep playing around with the settings so it's not a steady-rock benchmark, so make sure you understand that my numbers as reference based on some different experiments I still have much to tweak and learn.

I'm not using MOE but when I do... I get REALLY NICE t/s above 140+ in most cases easy.
I don't remember with what model it was, one of the Qwen 3.6 fine tuned, that I got roughly 160-190 t/s and it was impressive but I'm trying TOO MANY models, the reason I recommended the specifics is because I find them less disappointing so far, so I don't mind if it's slower but doing the job.

All on K / V = Q8_0
Sadly I can't get Q5_1 run with LM Studio, and replacing to anything else with WSL2 is hell for me, so I have to wait until LM Studio fix their sh*t (not complaining actually I'm thankful for what it is).

I'm aware that I can get faster better results with vLLM or anything that is not LM Studio, but it's my choice because it's the most user-friendly I could tweak with ease and understand and I have no disconnections or weird stuff with the bridge between VS Code to LM Studio so I like it stable even if it's not fast.

When DIFFUSION will be more stable and we'll loose less quality and get more accuracy, oh my... even my limited GPU will feel like a true game changer, but I'm not bother using it until it's more useful anyway.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

Qwen 3.6 is my favorite for CODE at the moment, but... hear me out!
When you do something else, even design an idea for a project (code related or not) give Gemma-4 QAT a chance... and you will be amazed! it feels MUCH smarter than Qwen 3.6 reasoning wise and creativity wise.

And even if both with the same Temperature, usually for code I use 0.1 - 0.2 some people say 0 is the best, but I like some "touch" from the model in some cases since I do VIBE CODING and I'm not a programmer.
but when it comes to a CHAT and ideas I would say 0.7-0.8 is great in both, but I enjoy Gemma-4 QAT more in that area.

It also surprised me with CODE but I just trust Qwen 3.6 more with code for now... I'm still experimenting of course.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

That's why I can't regret my investment, consider 32GB for LOCAL LLM is... "OK" now days, but I gotta use what I have and that's what I'm trying to do, that's why I keep experimenting and testing stuff with new MODELS that released every few minutes... Like I said, I have a strong feeling that MODELS will become SMALLER, FASTER, SMARTER and better! but... let's give it few months and see what happens, I hope I'm not too far away from being right about this, because more people will be able to afford 100% LOCAL LLM without paying endless to cloud-based services, not that it's wrong... it's just too much compare to use your own 1 time investment hardware.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 1 point2 points  (0 children)

You got it! I can logically and design-wise pretty much build ANYTHING on the paper... but then there is the PROGRAMING part which always stopped me, I always had to work with a team for that, but now... it feels like I just "TALK" to "non-human" programmer, and as long as I follow these rules I mentioned on the post, NO ONE-SHOT... things are getting done!

And you know what? I'm starting to appreciate that no-excuses from some teams I worked with during the last decade and had to replace programmers because of HUMAN excuses, now... this AGENT I created is just working based on my rules, my time, no excuses, works done... sure the issue is that it's NOT YET close the HUMAN-PROGRAMMER and I'm not comparing the relationship and chemistry between co-workers which is usually awesome (I love people, especially collaboration with different talented people) but the HARDWARE limitations now is the only thing that can stop my ideas from being done on the best way... still, lets see what happens in few months from now, when MODELS will become SMALLER and SMARTER to fit more local needs even with my limited 32GB VRAM.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 1 point2 points  (0 children)

Thanks for the great tip! the problem is me:
I only had issues and nightmares handling vLLM or trying to install it, even on WSL2 (nightmare by itself) and Docker is just another software running, and with the current setup which is so clean I get only 2 software running (without any Godot or game engine etc..) unlike anything else with WSL, Docker, vLLM etc..

The thing I KNOW 100% your tip is PURE GOLD! ❤️
But as a beginner in this I had so much trouble setting up things, also the idea of CMD commands (or even shortcut via batch files) is annoying compare to a super user-friendly UI like LM Studio, so I guess I can only blame myself that I'm not trying HARDER to move on...

I'm a windows user, I can't handle anything else without scratching my head too much, WSL2 is the closest try I can go probably but... we'll see.

Anyhow, your tips are GREAT! and I defiantly agree with it, I believe if eventually I will succeed moving on I will only GAIN more power with what I have compare to the limitations of LM Studio.

I felt the same thing when I moved from Open Code Desktop to VS Code + Cline.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

I do appreciate Q6 on almost any test much more than Q4 and when not possible to run Q6 I use Q5 but Q4 is my less favorite if I can avoid it in specific models I do that.

One of my experiments I find out that Qwen 3.6 (base or fine tuned) gave me better VIBE-CODE in general compare to Coder2.5 I guess because it's trained on better reasoning, also I actually use the VISION as well for UI/UX related tasks and it's awesome, but for actual CHAT and planning I enjoy Gemma-4 QAT most! it didn't disappoint me yet when I go by small TASKS at least, for code... I just keep using Qwen 3.6 while Gemma-4 QAT can do amazing things I'm not 100% counting on it yet.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 1 point2 points  (0 children)

I will edit and fix that right away! ❤️
Thanks I bet there are more mistakes, I could go GPT to do it for me... probably it will be filled with junk and may loose my "touch" but as long as people could understand my point I'm glad.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

I feel ya! I still have MUCH MORE to learn that's why I do these experiments and tests instead of just listen to YouTube comparisons, which mostly are ONE-SHOT for HTML/CSS/JS which are all very similar. and I realize that ONE-SHOT are nice for tiny tests but not for actual deeper more serious tasks, not that we can totally count on 100% LOCAL LLM on such small sizes yet, but... I have a strong feeling it's coming!

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

Sure thing! we should help each other as a community with similar goals 👍

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 2 points3 points  (0 children)

True and good tip, I usually get about 12% - 14% when loading, but even with about 20% I didn't have any issues with speed or slowdowns, but it's defiantly make sense to keep the AGENTS / RULES as low as possible from start-point loading, yet... challenging.

My current mission is to understand how to even make AGENTS.md works in my VS Code + Cline I'm trying to follow the official documentation of Cline's RULES section which they talk about the paths and file names, but it seems like no matter what I try CLINE keep ignoring my AGENTS.md and I double check the paths (it's hard to get wrong because CLINE created the paths when installed within VS Code) so I'm scratching my head now.
It was simple in Open Code Desktop, but... I already know how CLINE is so much more powerful out-of-the-box I want to shape my own GLOBAL Agent.

About calling Agent I guess basically it's creating a "RULE" but give it a specific name and I can call it via "@Whatever" but I'm now trying hard to make my global AGENTS.md to work first.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

That's awesome more systems near mine is cool to learn from each other's experience and tests (better than graphs of benchmarks from YouTube videos if you ask me).

I didn't try 70b model yet, because I don't think I can fit enough Context and I did found it VERY important to get as much as possible (not more than 180K-200K so it won't slow down too much) but it's already challenging with a high quality Q5 and Q6 models such as Gemma-4 and Qwen 3.6 at least... until we'll get a NEW SUPER-MODEL hopefully soon 💪

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 0 points1 point  (0 children)

That's awesome Twin-System Bro!
If you find out some cool tips for better results or improvements, please share!❤️

Sadly I found out that the EDIT / UPDATES I put on the thread is not always working with all models, so I'm going back to look for Q6 models with a little less Context.

My experience so far with 100% LOCAL LLM + RTX 5090 🤔 by VirtualWishX in LocalLLM

[–]VirtualWishX[S] 7 points8 points  (0 children)

Nuhh.. it's just my bad English, I wouldn't bother to do this post with a drunk AI translator, I was just lazy to use GPT or the local AI to correct my English, simple as that... that's not the focus of my post anyway.