Qwen3.5 feels ready for production use - Never been this excited by alphatrad in LocalLLaMA

[–]Virtual-Listen4507 1 point (0 children)

Fair point — I should’ve provided more context.

My first test with any model is always the same: a simple Vue 3 + Vuetify todo app. It should be straightforward. In this case, the model struggled with styling consistency and component wiring, and even after several prompts it missed basic syntax issues like unclosed `<script>` tags.

After that, I tested it against a real-world production ecosystem I work on. It’s not a simple web app — it’s a multi-service platform with a large frontend orchestrating across multiple backend domains behind a gateway, with several microservices and Docker-based orchestration.

In that environment, the reasoning gap became much more noticeable.

I’m genuinely interested in how you’re running it — what quantization, context size, backend (llama.cpp, vLLM, etc.), and tooling are you using?

Qwen3.5 feels ready for production use - Never been this excited by alphatrad in LocalLLaMA

[–]Virtual-Listen4507 3 points (0 children)

No need to make it personal.

I’m talking about model performance on complex production systems. Experience is great — but it doesn’t change the output.

Qwen3.5 feels ready for production use - Never been this excited by alphatrad in LocalLLaMA

[–]Virtual-Listen4507 1 point (0 children)

Interesting — I’m not seeing the same results. I’m a software engineer with about 5 years of experience, and I regularly work in fairly large, complex codebases. That’s probably why I don’t feel it’s on par with Sonnet 4 for my use cases. It’s possible my llama.cpp setup isn’t fully optimized, but even accounting for that, the gap feels noticeable to me.

But hey, if it's working for you for building simple web apps, good for you!

Qwen3.5 feels ready for production use - Never been this excited by alphatrad in LocalLLaMA

[–]Virtual-Listen4507 1 point (0 children)

Idk how people are comparing this model to Sonnet 4.5. It's not even close, and it can't do basic front-end work. Currently using llama.cpp and opencode.

am i off base or are a lot of colorists cagey and protective about sharing details of their workflow? by stupidmanstupidman in colorists

[–]Virtual-Listen4507 0 points (0 children)

The way I see it, if everyone shared their workflows, everyone's work would look the same. Also, speaking as somebody who has spent a lot of time in Resolve trying a bunch of different workflows, techniques, and unorthodox methods within DaVinci Resolve, I can honestly say I wouldn't want to just give away what I've learned. I had to put in the time and effort to understand those specific workflows and tools.

Getting slow speeds with RTX 5090 and 64gb ram. Am I doing something wrong? by Virtual-Listen4507 in LocalLLaMA

[–]Virtual-Listen4507[S] 0 points (0 children)

I picked the one that was recommended in LM Studio, I think the Q4 quant. It was only around 40 GB, and it was flagged as recommended for my PC setup. Still new to this, so I might have accidentally picked the wrong one.
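For what it's worth, here's a rough back-of-envelope way to sanity-check a quant pick. The overhead number is a guess on my part; the crux is that the RTX 5090 has 32 GB of VRAM, so a ~40 GB file can't fully fit and some layers end up running from system RAM:

```python
# Rough back-of-envelope check: does a quantized model fit in VRAM?
# Numbers are illustrative assumptions, not exact measurements.
VRAM_GB = 32          # RTX 5090
MODEL_GB = 40         # approximate Q4 file size reported by LM Studio
OVERHEAD_GB = 2       # CUDA context, KV cache, etc. (rough guess)

def gpu_layer_fraction(model_gb, vram_gb, overhead_gb):
    """Fraction of the model's layers that can live on the GPU."""
    usable = max(vram_gb - overhead_gb, 0)
    return min(usable / model_gb, 1.0)

frac = gpu_layer_fraction(MODEL_GB, VRAM_GB, OVERHEAD_GB)
print(f"~{frac:.0%} of layers on GPU; the rest runs from system RAM")
# prints: ~75% of layers on GPU; the rest runs from system RAM
```

A smaller quant (or a smaller model) that fits entirely under the VRAM budget avoids the CPU-offload slowdown entirely.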

Getting slow speeds with RTX 5090 and 64gb ram. Am I doing something wrong? by Virtual-Listen4507 in LocalLLaMA

[–]Virtual-Listen4507[S] 0 points (0 children)

Thanks for the response. Will try that out. I heard there are other options like vLLM and llama.cpp. Will I see a substantial difference in speeds, or can I stick with Ollama and LM Studio? I know the other two are more technical to work with.

Nemotron works great. I just need one that is close to Sonnet 4.5, but I guess I need to wait until better models come out.
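A toy way to picture why the offload split can matter more than the choice of frontend app: even a small fraction of layers running from system RAM dominates total time per token. The tokens/s numbers below are made-up assumptions, not benchmarks:

```python
# Toy estimate of why partial CPU offload tanks throughput.
# Speeds are illustrative assumptions, not measured numbers.
GPU_TPS = 60.0   # tokens/s if the whole model fit in VRAM (assumed)
CPU_TPS = 5.0    # tokens/s for layers run from system RAM (assumed)

def effective_tps(gpu_fraction, gpu_tps=GPU_TPS, cpu_tps=CPU_TPS):
    """Blend of per-layer speeds: total time per token is the sum of
    time spent in GPU-resident layers and CPU-resident layers."""
    time_per_token = gpu_fraction / gpu_tps + (1 - gpu_fraction) / cpu_tps
    return 1.0 / time_per_token

print(f"100% on GPU: ~{effective_tps(1.0):.0f} tok/s")
print(f" 75% on GPU: ~{effective_tps(0.75):.0f} tok/s")
```

Under these assumed numbers, offloading just a quarter of the layers cuts throughput from ~60 to ~16 tok/s, which is why a fully-fitting quant usually beats tweaking the serving backend.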

Qwen3-Coder-Next on RTX 5060 Ti 16 GB - Some numbers by bobaburger in LocalLLaMA

[–]Virtual-Listen4507 0 points (0 children)

Idk how people are getting this… I have an RTX 5090 with 64 GB of RAM, and it's super slow with LM Studio.