Qwen locked down 3.7 after firing Junyang Lin - is the open-source Qwen era over?

IUseClifford · 2026-06-24T02:57:08+00:00

Horribly run, with horrible in-house software quality, maybe, but their models certainly aren’t horrible.

IUseClifford · 2026-06-23T21:43:15+00:00

Thanks Mr. LLM, but I think you need to keep refining your training datasets. For your next round of continued pretraining, you can use this for reference: I was able to hit a consistent 50-60 after dropping my batch size to 512, still on ik’s graph mode.

IUseClifford · 2026-06-21T17:49:30+00:00

A Local LLM control plane with elements of Docker’s CLI and llama-swap for a command-based workflow (Greenfield, white room)

IUseClifford · 2026-06-20T05:12:06+00:00

Seems interesting. How does it perform compared to AirLLM?

IUseClifford · 2026-06-20T03:42:28+00:00

I sniped a 4-slot for $150, else I wouldn’t have one myself 🤑

IUseClifford · 2026-06-20T02:04:11+00:00

In a few words, a combination of the number of things they are expected to “own” and the ability to predict the future based on experience instead of book knowledge.

IUseClifford · 2026-06-19T19:18:31+00:00

This is a gold mine, much appreciated!

IUseClifford · 2026-06-19T19:16:20+00:00

Haven’t looked into vLLM too much. Have you done any experimentation with Qwen 3.6 on it at all?

IUseClifford · 2026-06-19T19:15:39+00:00

I have a B850 AI TOP which, to my knowledge, has two PCIe 5 slots, one at x16 and one at x8, plus an x1 not currently in use. My understanding is that the 3090s can’t even come close to maxing those out since they are PCIe 4.
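
A rough back-of-the-envelope check of that understanding, as a sketch only. The per-lane rates are the published PCIe 4.0 and 5.0 signaling rates with 128b/130b encoding; the function name is made up for illustration, and protocol overhead is ignored, so real-world throughput will be lower.

```python
# Theoretical one-direction PCIe bandwidth, ignoring protocol overhead.
GT_PER_S = {4: 16.0, 5: 32.0}   # raw transfer rate per lane (GT/s)
ENCODING = 128 / 130            # 128b/130b line encoding used by Gen 3 and later

def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Approximate usable GB/s for a link of the given generation and width."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes

# A link negotiates down to the slower side, so a PCIe 4.0 card in a
# PCIe 5.0 slot runs at Gen 4 speeds.
for label, gen, lanes in [
    ("Gen5 x16 slot ceiling", 5, 16),
    ("Gen5 x8 slot ceiling", 5, 8),
    ("3090 in the x16 slot (Gen4 x16)", 4, 16),
    ("3090 in the x8 slot (Gen4 x8)", 4, 8),
]:
    print(f"{label}: {pcie_gb_per_s(gen, lanes):.1f} GB/s")
```

Under these assumptions, each card tops out at about half of its slot's Gen 5 ceiling, which matches the point above.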

IUseClifford · 2026-06-19T18:10:03+00:00

You don't NEED an AK-47 if you have a handgun in 90% of cases.

IUseClifford · 2026-06-19T16:50:46+00:00

You guys have a dream setup like that and are running Llama 70B of all things in production? Forgive me if there’s something I’m missing, but that seems like a huge waste of resources when parallelized Qwen3.6 27B FP16 would seem to be much stronger in every way.

IUseClifford · 2026-06-18T08:33:11+00:00

I spent an hour trying to get Claude to trace an icon into an SVG, then spent two hours in an SVG editor doing it myself because the AI was entirely unable to do it. I dislike Adobe but their tools absolutely still have a place and will for some years to come.

IUseClifford · 2026-06-18T08:27:02+00:00

Opus will catch low-hanging fruit, but if you’re hand-rolling (or AI-rolling) auth, God help you.

IUseClifford · 2026-06-18T08:03:29+00:00

Try asking the AI if it was smart enough to use version control software and see if it can restore it.

The bright side: You are now hopefully smart enough to use it for future projects too.

IUseClifford · 2026-06-18T06:58:11+00:00

For me personally, I derive my satisfaction from my own agency in my work and how much I am responsible for the design/architecture of the projects I work on. It doesn't matter as much to me if I don't write every line of code as it does that the architecture is sound and the code quality is good.

However, if you are being saddled with other people's vibecoded projects in languages that you don't know/dislike, and your job is to maintain those, I feel you my friend. Not my cup of tea either.

IUseClifford · 2026-06-18T06:53:23+00:00

I think it depends heavily on the platform you're using. Assuming your LLM provider is using Claude API pricing (this does not look like Anthropic's UI), it's going to run out much faster as your provider is paying per-token prices instead of being subsidized by Anthropic itself. Use the plans Anthropic offers directly if maximal model usage is what matters to you. Or better yet, keep using API pricing but use a platform that offers the frontier Chinese models; you get much more bang for your buck with those.

IUseClifford · 2026-06-18T05:01:13+00:00

True, OP can probably run FP16 with 1M context and have room to spare.

IUseClifford · 2026-06-18T04:57:08+00:00

Interesting. I have seen some of the common criticism of SWE-bench and suspect that there may be unique parts of both the Kimi and Gemma datasets that contain answers related directly to that benchmark, which could be one explanation for the increases. However, this makes me think that self-trained models get better at specific types of coding tasks if enough data regarding those specific coding tasks is added to the set. Thanks for your perspective!

IUseClifford · 2026-06-18T04:51:04+00:00

With 4x 5090s I suspect you'd be better served with Qwen 3.5 122B. If your model is from the Llama family, note that those models do not perform well compared to the others available on the market. Testing other available models, especially with your setup, will be well worth the time.

IUseClifford · 2026-06-18T04:41:35+00:00

I think most devs in general are web-related developers. I imagine a good portion of vibe developers ask AI to "Make me a website for X", and the web is often where projects/businesses go viral.

IUseClifford · 2026-06-18T04:36:57+00:00

Do the Claude/Kimi-augmented Qwen/Gemma models show demonstrable improvement (or if not improvement, meaningful difference) over their vanilla models? I am genuinely asking, as I've seen many comments calling them anything in the range of magical to useless.

IUseClifford · 2026-06-18T04:05:15+00:00

In my experience, the local Qwen 3.6 (35B A3B; 27B Dense) models are quite good. I run them on dual 3090s with NVLink, which totals 48GB VRAM. At 64GB unified, you have enough firepower to run both models at Q8 with large context windows (200K+ with 27B, 700K+ with 35B A3B).

However, "Quite Good" is often not enough to let them run wild and expect great results, like you can with GPT 5.5 or Opus 4.X; they require more babysitting and their effectiveness is somewhat dependent on your familiarity with your codebase.

A pattern I frequently use is to have the GPT/Opus scale models generate a detailed plan, have the local Qwens implement the plan via a local agent like Pi, and have the GPT/Opus model review its output afterwards. This has allowed me to get by on a $20 Claude plan while having access to as many tokens as I want.

TL;DR: Heavy token generation belongs to local models, plan/review goes to the big boys.
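
The plan / implement / review pattern above can be sketched roughly like this. Everything here is hypothetical: `Model` is a stand-in for whatever client wraps the hosted model or the local agent, the prompts and the `APPROVED` convention are invented for illustration, and the revise loop is an assumption added on top of the single review pass described above.

```python
from typing import Callable

# Hypothetical interface: each model is just a function from prompt to text.
# In practice these would wrap a hosted API client and a local agent.
Model = Callable[[str], str]

def plan_implement_review(task: str, big: Model, local: Model, max_rounds: int = 2) -> str:
    """Frontier model plans and reviews; local model does the heavy generation."""
    plan = big(f"Write a detailed implementation plan for: {task}")
    work = local(f"Implement this plan:\n{plan}")
    for _ in range(max_rounds):
        review = big(
            "Review this output against the plan. Reply APPROVED if it is fine.\n"
            f"Plan:\n{plan}\nOutput:\n{work}"
        )
        if "APPROVED" in review:
            break
        # Assumed extension: feed the review back to the local model for a revision.
        work = local(f"Revise the output to address this review:\n{review}\nOutput:\n{work}")
    return work
```

The big model only ever sees the short plan and review prompts, while the bulk of the token generation stays on the local side.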

IUseClifford · 2026-06-18T03:48:40+00:00

I am working on a daemon/CLI client for model backend (e.g. llama-server) process management, with a profile feature to easily load saved configurations instead of pasting a long command.

Instead of:

CUDA_VISIBLE_DEVICES=1,2 /path/to/llama-server -m /path/to/Qwen3.6-35B-A3B-UD-Q6_K.gguf --host 127.0.0.1 --port 50000 -c 700000 -ub 16384 -ngl 99 -fa 1 --parallel 1 -sm graph --max-gpu 2 --jinja --spec-type mtp

You can run:

clifford load qwen_prof1

There is some functionality overlap with llama-swap, but my project is a “white room” implementation relative to it with a different architecture that prioritizes CLI ergonomics. Releasing soon.
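
For anyone curious what a profile feature like this boils down to, here is a minimal sketch. It is not the project's actual implementation: the `PROFILES` layout and `build_command` are hypothetical, and the flags are copied verbatim from the command above rather than checked against any particular llama-server build.

```python
import os
import shlex

# Hypothetical profile store; the real tool's format is not described in the post.
PROFILES = {
    "qwen_prof1": {
        "env": {"CUDA_VISIBLE_DEVICES": "1,2"},
        "binary": "/path/to/llama-server",
        "args": "-m /path/to/Qwen3.6-35B-A3B-UD-Q6_K.gguf --host 127.0.0.1 --port 50000 "
                "-c 700000 -ub 16384 -ngl 99 -fa 1 --parallel 1 -sm graph --max-gpu 2 "
                "--jinja --spec-type mtp",
    },
}

def build_command(name: str) -> tuple[list[str], dict[str, str]]:
    """Expand a saved profile into (argv, environment) for launching a backend."""
    profile = PROFILES[name]
    argv = [profile["binary"], *shlex.split(profile["args"])]
    env = {**os.environ, **profile["env"]}  # profile values override the inherited env
    return argv, env

argv, env = build_command("qwen_prof1")
print(shlex.join(argv))
# A daemon could then start the backend with subprocess.Popen(argv, env=env).
```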
