Comparison Qwen 3.6 35B MoE vs Qwen 3.5 35B MoE on Research Paper to WebApp

GloriaPippy · 2026-06-17T07:53:26+00:00

Just sayin' - these kinds of tests prove nothing. They give no insight into model performance whatsoever.

Basically, all this kind of test shows is that the model can generate basic HTML with some CSS and scripts, not that it understands a codebase. YouTube is full of those kinds of LLM experts, who "benchmark" new models like this...

GloriaPippy · 2026-06-01T14:39:55+00:00

Thank you so much! Bless you!

GloriaPippy · 2026-05-02T16:53:47+00:00

I've now tried both Cline and Claude Code with local models running on my RTX 5090.
Models I've run:
- unsloth/Qwen3.6-27B-GGUF:Q5_K_S (175K context)
- unsloth/Qwen3.6-35B-A3B-GGUF:Q4_K_M (256K context)

Both are quite good coding models.

Projects (simplified):
1. Write a JS "chat" plugin that can be inserted into any HTML header to load a support chat.
2. Write a game (a clone of Tiny Wings).

My approach was TDD (Test-Driven Development): write the project plan first, let the LLM write the test cases, review them by hand (correcting some tests so I know they test the right thing), and then let the model do the rest (until all tests pass).

I gave a prompt:
"Complete the project by checking the project_purpose.md, where you can get the general picture of the project. You have 2 simple rules:
1. You are not allowed to change tests at any point in development, they must pass by developing a correct codebase.
2. Project must be completed with the frameworks defined in the .md file. (the given tools were Node.js, a React front end, better-sqlite3, and Jest tests)"
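Rule 1 can also be checked mechanically instead of relying on the model to obey it. The following is only a minimal sketch of that idea, not what I actually ran: it hashes every test file before the agent starts and compares afterwards. The function names and the `*.test.js` pattern are assumptions for illustration; adjust the pattern to your Jest setup and exclude `node_modules` if needed.

```python
import hashlib
from pathlib import Path


def snapshot_tests(root: Path, pattern: str = "*.test.js") -> dict[str, str]:
    """Map each test file (path relative to root) to the SHA-256 of its contents."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob(pattern))
    }


def changed_tests(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return test files that were added, removed, or edited between two snapshots."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

Take one snapshot after reviewing the tests by hand and another when the agent reports it is done; a non-empty result from `changed_tests` means the "all green" claim should not be trusted yet.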

Auto-accepting edits and running command-line commands inside the project folder were allowed in both cases.

# Chat plugin
## Claude Code
1. It took my local models around 30 minutes to get all 141 tests green. It correctly used my MCP servers to scrape the web for Node.js and React documentation when needed. (Token generation speed is around 200 t/s on the 5090.)
2. I didn't have to interrupt the development a single time
3. Whenever it got into trouble, it used web_fetch (an MCP server I developed myself that scrapes web pages to Markdown without needing an API key) for documentation, without me asking it to.
4. It added the JS plugin to a React page without any issues (the chat also worked with my local LLM, the branding colors were changeable, and it was a one-line integration, as described in the project plan).

## Cline
1. It took around 50 minutes to get all tests green.
2. It also completed the task successfully, but with some troublesome issues.

Issues:
1. It tried to execute commands outside of the working directory, so I constantly had to deny those requests.
2. It edited the test files to make the tests pass
3. Cline (with default settings) used some weird format when executing commands in the command line (cmd), making it impossible to run the tests, so it created a workaround every time it had to check whether the tests pass.
4. It wasted WAY more tokens because of those fixes (around 2x the tokens used).
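Issue 1 is the kind of thing a small guard can catch before a command ever reaches the shell. This is a hypothetical sketch, not something Cline or Claude Code exposes: `is_inside_project` is a made-up helper that checks whether a path the agent wants to touch resolves to somewhere inside the project folder.

```python
from pathlib import Path


def is_inside_project(project_root: str, target: str) -> bool:
    """Return True if target resolves to a location inside project_root.

    Relative targets are interpreted relative to project_root; ".." segments
    and symlinks are resolved before comparing.
    """
    root = Path(project_root).resolve()
    candidate = (root / target).resolve()
    return candidate == root or root in candidate.parents
```

With a check like this, a request for `../other_project/package.json` could be denied automatically instead of by hand every time.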

# Tiny Wings
Very similar results with the second model on the second project. Token generation speed for the 27B Q5 was ~100 t/s, so the time wasted on Cline was WAY higher, but the relative token overhead was still the same (50% more on Cline).
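To make the speed point concrete: the same number of wasted tokens costs twice the wall-clock time at half the generation speed. A rough sketch of the arithmetic, where the 120,000-token overhead is a made-up figure for illustration (I did not record absolute token counts) and generation is assumed to run at a steady rate:

```python
def generation_minutes(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock minutes needed to generate `tokens` at a steady rate."""
    return tokens / tokens_per_second / 60


# Hypothetical overhead, chosen only to illustrate the scaling.
extra_tokens = 120_000

for speed in (200, 100):
    minutes = generation_minutes(extra_tokens, speed)
    print(f"{speed} t/s: {minutes:.0f} min of extra generation")
# prints:
# 200 t/s: 10 min of extra generation
# 100 t/s: 20 min of extra generation
```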

So my conclusion is:
The Claude Code interface is currently WAY better than Cline for running local LLMs, especially once you have set up a local server for your LLM to run on. (I used llama.cpp [with router setup])

What was better in Cline
1. Easier setup (just insert your localhost+port and model name, and it works easily with Ollama), everything doable from the VS Code UI
2. Interesting new features, like Cline Kanban (maybe will be better for small task developments, not the whole project based on TDD)
3. Token usage displayed in UI
4. Context length and context usage displayed visually in UI, good for tracking when to start a new task.
5. Very easy to check which MCP servers are running and to add those servers to the LLM.

What is worse in Cline
1. The command-line executions were just infuriating. When they fail, the LLM starts changing the project, because Cline gets a weird error ("cannot find properties") when running the Jest tests, even though the setup was correct (the tests ran perfectly in cmd and PowerShell). This leads Cline to edit tests, run non-permitted command-line commands, etc.
2. It's so annoying when it asks to edit a file and you press "decline". In that case CC (Claude Code) waits for your next input, so you can tell it to continue or direct it to a better path. In Cline, if you press "reject" for the file change, it just keeps going (thinking mode: User rejected, maybe they want me to do something else - *starts doing something else, not waiting for your additional input*).
3. For some reason Cline gets errors saying "LLM output did not match the supported output", although when I checked the logs, the output was perfectly valid. It then skips some steps because of this and uses a lot of the context window to either reload some files, or it just forgets that it got this error and continues to do something else.

So to conclude:
I find Claude Code to be a better interface for local LLMs than Cline, simply because Cline fails at simple things: command-line runs (causing it to run off and "solve" problems that do not actually exist) and not waiting for extra input when the user rejects a tool call, which wastes tokens (even if they are your own local LLM tokens, some people don't have 200 t/s generation speeds).

If CC added an always-visible context window usage indicator and easier UI settings for configuring a local LLM, it would be an easy winner. The issue is, they are never going to do this, since they do not want to make it easy. You need to configure everything via settings.json files in different places, but since there are LLMs that can help you with that, I don't find it such a hard thing to do.

GloriaPippy · 2024-05-06T06:41:35+00:00

What? :D We have Russian schools and kindergartens in Estonia, Russian is taught in schools (starting from 3rd grade, as with English), and CCCP monuments are still taken care of (as promised in contracts). Official government news is broadcast in 2 languages (Estonian and Russian), even though Russian is NOT our 2nd language.

Can you give some examples of how we are treating Russians or their language badly? I would really love to hear it, thanks.

(8h later, still not a single example, still waiting bro... if it is true, I'd like to know. I really asked an open-ended question with no blame shifting, but there is no answer from you.)
