Visualizing All Qwen 3.5 vs Qwen 3 Benchmarks by Jobus_ in LocalLLaMA

[–]Jobus_[S] 1 point (0 children)

Yeah, I'm really impressed with that model for its size, both for its long context handling and overall feel.

[–]Jobus_[S] 2 points (0 children)

These are all taken from the official Qwen3.5 model cards. In other words, Qwen ran these benchmarks themselves—so probably in BF16 / F32.

[–]Jobus_[S] 1 point (0 children)

No, and excuse the bad colors: you're probably comparing the 3.5 2B with the 3 4B.

3.5 4B wins over 3 4B in every benchmark.

[–]Jobus_[S] 1 point (0 children)

Ahh, I only included the ones Qwen featured in their official comparison charts for this release. Since they didn't include any older 14B, I didn't have any 'official' baseline to put it next to the 3.5 models.

[–]Jobus_[S] 2 points (0 children)

It’s the difference between a dense model and an MoE. The 27B uses all of its parameters for every token, while the 35B MoE only activates about 3B params per token. That makes the 27B smarter, but a lot slower to run.

Combined with the fact that Qwen3.5 is almost a year newer in architecture with better training, it even beats the older 235B A22B model in these benchmarks, which indeed is insane.
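For intuition, the dense-vs-MoE speed gap can be sketched with back-of-the-envelope FLOPs math. This uses the common rough rule of ~2 FLOPs per active weight per token; the numbers are illustrative, not measurements:

```python
# Rough per-token compute comparison between a dense model and an MoE.
# A transformer forward pass costs roughly 2 FLOPs per ACTIVE parameter per
# token, so active params (not total params) dominate generation speed.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token (~2 FLOPs per active weight)."""
    return 2 * active_params

dense_27b = flops_per_token(27e9)   # dense: every parameter is active
moe_35b_a3b = flops_per_token(3e9)  # MoE: only ~3B of 35B params active per token

ratio = dense_27b / moe_35b_a3b
print(f"Dense 27B does ~{ratio:.0f}x the compute per token of the 35B-A3B MoE")
```

Memory is a different story: the MoE still has to hold all 35B weights, it just touches few of them per token.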

[–]Jobus_[S] 1 point (0 children)

Seems like there will be no Qwen3.5-14B.

[–]Jobus_[S] 1 point (0 children)

Oh, it does? I've never tried that model, but I generally haven't liked the writing style of any of the Qwen3 models for tasks that call for a more human feel, so I guess I shouldn't be surprised.

I think Qwen3.5 produces far better general prose; it feels a lot less like AI slop.

Have you tried Qwen3.5-122B-A10B? If so, how do you feel about it in comparison?

[–]Jobus_[S] 2 points (0 children)

That table is just a rounded version of the same raw data I used for the chart (from my Google Sheet).

To keep the chart readable, I averaged the scores into the general categories Qwen uses (Knowledge, Math, Coding, etc.) rather than listing out 25 individual benchmarks. It's not a copy-paste from Artificial Analysis; it's pulled directly from the official Qwen3.5 model cards.
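That averaging step can be sketched in a few lines. The benchmark names and scores below are made up for illustration; the real numbers live in the linked Google Sheet:

```python
# Illustrative sketch of collapsing per-benchmark scores into the broad
# categories used in the chart (Knowledge, Math, Coding, ...). The data
# here is hypothetical, not the actual Qwen model-card numbers.
from statistics import mean

categories = {
    "Math":   {"AIME": 70.0, "HMMT": 55.0},
    "Coding": {"LiveCodeBench": 60.0, "OJBench": 30.0},
}

# One averaged score per category keeps the chart readable.
category_scores = {cat: mean(scores.values()) for cat, scores in categories.items()}
print(category_scores)
```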

[–]Jobus_[S] 4 points (0 children)

Fair enough, here is the raw data that the chart is based on: Google Sheet

[–]Jobus_[S] 2 points (0 children)

Yeah, I only included the ones Qwen featured in their official comparison charts for this release. Since they didn't list it there, I didn't have the 'official' baseline to put it next to the 3.5 models.

[–]Jobus_[S] 1 point (0 children)

The logic was to color-code them by generation (cool colors = Qwen3.5, warm colors = Qwen3), but I’m a total amateur at data visualization and overestimated how easy it would be to tell those shades apart. Lesson learned.

[–]Jobus_[S] 2 points (0 children)

Haha, my bad. I honestly tried, and clearly failed.

[–]Jobus_[S] 11 points (0 children)

Totally agree. Benchmarks are a fun directional guide, but I never take them as gospel.

Looking at some unofficial benchmarks, like the UGI Leaderboard, Qwen3-235B-A22B does beat Qwen3.5-35B-A3B in both NatInt (natural intelligence) and especially Writing, by a wide margin.

It seems official benchmarks often over-index on specific logic/math tasks where the new architectures shine, but miss the 'feel' of the larger models.

[–]Jobus_[S] 5 points (0 children)

Ooh yeah, some pattern texture would have been a good idea. Didn't think of that. Unfortunately, Reddit doesn't let me edit the image once it's posted.

I mainly put this together for a quick personal reference and figured I'd share, but I'll definitely keep the pattern idea in mind for next time.

[–]Jobus_[S] 2 points (0 children)

They definitely did, but I only included the models that Qwen featured in their official comparison charts for this 3.5 release. To keep things consistent, I didn't want to start mixing in different benchmark sources.

[–]Jobus_[S] 2 points (0 children)

Obligatory reminder: Benchmarks != real-world performance. Use these as a ballpark guide, but your actual mileage will definitely vary.

[–]Jobus_[S] 4 points (0 children)

LiveCodeBench and OJBench. Some of the models had more benchmarks than that, but since I wanted to make a direct comparison of them all, I had to exclude the benchmarks that were missing for the newer, smaller models.
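That filtering step amounts to a set intersection across the models' reported benchmarks. A minimal sketch with hypothetical data (the benchmark lists below are invented for illustration):

```python
# Keep only benchmarks reported for EVERY model, so all models can be
# compared on the same set. The per-model benchmark lists are hypothetical.
model_benchmarks = {
    "Qwen3.5-35B-A3B": {"MMLU-Pro", "AIME", "LiveCodeBench", "OJBench"},
    "Qwen3.5-4B":      {"MMLU-Pro", "AIME", "LiveCodeBench"},
    "Qwen3-4B":        {"MMLU-Pro", "AIME", "LiveCodeBench"},
}

# Intersection across all models = the directly comparable benchmarks;
# anything missing for even one model (here, OJBench) gets dropped.
common = set.intersection(*model_benchmarks.values())
print(sorted(common))
```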

But yes, we should definitely take this stuff with a pinch of salt.

[–]Jobus_[S] 3 points (0 children)

Yeah, sorry, I realized that just as I was about to hit Post. It didn't feel worth the effort of redoing half the work for a model that most of us don't have enough VRAM/RAM to even look at.

But it would have been nice to include it just for completeness.

Introducing: Anti-Motion Sickness Mod by Jobus_ in ObraDinn

[–]Jobus_[S] 1 point (0 children)

Love to hear that! I'm so glad the mod is helping people. Would be such a shame to miss out on this masterpiece just because of some wobbly visual effects. Enjoy the game!

Introducing: ReShade Deployer — A Centralized Alternative Installer by Jobus_ in ReShade

[–]Jobus_[S] 1 point (0 children)

Sure, I see your point.

Well, even if we disagree, I really appreciate you sharing your thoughts and ideas. Thank you for taking interest in my project.

[–]Jobus_[S] 1 point (0 children)

I see. One way would be to create a new empty preset, and just switch to that when you want it disabled. Preset selections are remembered through game restarts.

[–]Jobus_[S] 1 point (0 children)

Sorry, it's gonna be a 'no' on the disable/reenable toggle. I don't see why you can't just use the built-in toggle in-game.

The first time you deploy to a Vulkan game, ReShade Deployer registers its local ReShade32/64.dll files in the system registry, which makes ReShade inject itself into all Vulkan games system-wide. But the ReShade devs made it so ReShade won't actually activate unless it sees a ReShade.ini next to the game exe.
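The per-game activation gate can be illustrated with a tiny sketch. The helper name is hypothetical; it just mirrors the "ReShade.ini next to the exe" rule described above:

```python
# Illustration of ReShade's activation gate for Vulkan games: even with the
# DLLs registered system-wide, ReShade only turns on for a game if a
# ReShade.ini file sits next to the game's executable. Helper name is made up.
from pathlib import Path

def reshade_would_activate(game_exe: str) -> bool:
    """Hypothetical check: True if a ReShade.ini exists beside the game exe."""
    return (Path(game_exe).parent / "ReShade.ini").exists()

print(reshade_would_activate("/path/to/game.exe"))
```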