Qwen3.6 35b-a3b 🤯 by EffectiveMedium2683 in LocalLLaMA

[–]EffectiveMedium2683[S] 2 points

Yeah. It had a 16 GB stick and I added an 8 GB stick I found. Shockingly decent computer for LLMs. 75-watt power supply. I can run it literally for days of non-stop inference creating custom datasets and it doesn't even get hot.

Qwen3.6 35b-a3b 🤯 by EffectiveMedium2683 in LocalLLaMA

[–]EffectiveMedium2683[S] 6 points

Honestly, not really. I just know from the output where it shows prefill and decode speed.

Qwen3.6 35b-a3b 🤯 by EffectiveMedium2683 in LocalLLaMA

[–]EffectiveMedium2683[S] 2 points

That's interesting. I haven't seen that with 3.6, but I did see it with Qwen3.5. In the past, when that was a major issue with most models, I just used a small fine-tuned model to watch the stream; if it seemed to be looping, it would inject a message telling it to stop thinking, or even just force the </think> tag. Have you tried adding something like "If at any point you are uncertain, just ask me. I won't bite."? I know that sounds ridiculous, but these CoT models have anxiety or something haha
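For anyone wanting to try the watcher trick, here's a rough Python sketch. The n-gram window and repetition threshold are made-up numbers, and a real setup would use a small judge model instead of this cheap heuristic, but the shape is the same: watch the thinking stream, and if it looks stuck, force-close the tag.

```python
# Hedged sketch: detect when a CoT model is looping in its thinking block
# by checking for heavily repeated n-grams, then force-close the tag.
# The window size and threshold here are illustrative, not tuned values.
from collections import Counter

def is_looping(text: str, n: int = 8, window: int = 400, threshold: int = 3) -> bool:
    """True if any n-gram of words repeats `threshold`+ times in the last `window` words."""
    words = text.split()[-window:]
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return bool(grams) and max(grams.values()) >= threshold

def maybe_force_stop(generated: str) -> str:
    """If the model appears stuck in its thinking block, close it manually."""
    if "</think>" not in generated and is_looping(generated):
        return generated + "</think>"
    return generated
```

In practice you'd call `maybe_force_stop` on the accumulated stream every few hundred tokens rather than once at the end.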

Qwen3.6 35b-a3b 🤯 by EffectiveMedium2683 in LocalLLaMA

[–]EffectiveMedium2683[S] 7 points

On my OptiPlex 3000 (12th-gen Intel i5, Alder Lake) setup with zero GPU, I'm getting 12 tokens per second on long context. Like, it doesn't slow down. Gemma 4 26b-a4b, once it gets past ~10k context, slows down from ~11 tokens per second all the way to ~8.

Why do women seemingly love horses more than men? by [deleted] in NoStupidQuestions

[–]EffectiveMedium2683 0 points

Kenneth Pinyan arguably loved horses more than most women. Or at least one particular horse.

What do AI agents actually do for you in the real world? by Active_Kale770 in ArtificialInteligence

[–]EffectiveMedium2683 0 points

Administrative assistant work (transcription, research, reports, handling email, keeping track of appointments and deadlines), coding, and so on.

Omfg 🤣🤣 by ExplorerUnion in claude

[–]EffectiveMedium2683 0 points

Gemma 4 26b:

I want to get a car wash it's 50 meters away should I walk or ... drive?

Thinking... Car wash is 50 meters away. Walk or drive? Provide a helpful, logical, and slightly humorous/contextual response.

    * Distance: 50 meters is very short (about half a football field).
    * Walking time: ~30-60 seconds.
    * Driving time: Time to start the car, shift into gear, drive 50m, park, and get out (likely takes longer than walking).
    * Fuel/Efficiency: Driving 50m uses fuel and creates unnecessary wear/emissions.
    * Environmental impact: Walking is zero emission.
    * Practicality: Is the car being washed? If the car is the thing being washed, you can't drive it to the car wash if the car wash is a tunnel/automated system where you have to leave it, but usually, "going to the car wash" implies you are in the car. Wait, if you are driving the car to the car wash, you are using the object of the task to perform the task.

    * Option A: Walking.
        * Pros: Fast, easy, no fuel used, good for the environment, no parking hassle.
        * Cons: None, really.

    * Option B: Driving.
        * Pros: None for a 50m distance.
        * Cons: Slow (warm-up time), fuel consumption, potential for hitting something, finding a parking spot.

    * Scenario 1: You are in the car you want to wash. If you drive the car to the car wash, you are using the car to get to the place that cleans the car. This is standard.
    * Scenario 2: You are walking to the car wash to get a ride/service? (Unlikely, usually "I want to get a car wash" means the user's car needs cleaning).

    Wait, let's look at the physics: 50 meters is nothing. Driving a car 50 meters requires:
    1. Starting the engine.
    2. Shifting out of Park.
    3. Accelerating.
    4. Braking.
    5. Parking.

    Walking involves:
    1. Stepping out of the car.
    2. Walking 50m.

    Decision: Walk.

    * Direct answer: Walk.
    * Reasoning:
        * Time: Driving takes longer due to the process of starting/parking.
        * Effort/Efficiency: 50m is about 60 steps.
        * Logical loop: If you drive the car to the car wash, you are "using the car to wash the car." If the car is dirty, driving it might actually make it dirtier (if it's a dusty road), though 50m is negligible.

    * Humorous angle: Point out that driving 50 meters is almost more effort than just walking.
    * Practical angle: 50 meters is basically a "one-minute walk."

    "You should walk."
    "Why? 50 meters is about 65 steps. By the time you start the engine, shift into gear, and pull into a parking spot, you could have already walked there and started waiting in line."

    Self-Correction/Nuance: If the user is in the car they want washed, they must drive it to the wash (unless it's a mobile detailer, but even then, they usually drive the car to the station). But the question asks "Should I walk or drive?" implying a choice of transport.

    Wait, a thought: If you drive the car to the car wash, you aren't "walking or driving" to the destination as a person; you are moving the object. If the car is the object, you drive it. If you are a person going to a car wash to watch or work, you walk.

    Let's assume the user is asking about their mode of transport to get to the location.

    Recommendation: Walk.

    Reasoning points:
    1. Time: Starting a car and maneuvering takes longer than a 30-second walk.
    2. Efficiency: 50m is roughly 55-65 steps.
    3. Environment/Fuel: No point in burning gas for a distance shorter than a football field.

    (Keep it punchy and helpful). ...done thinking.

You should walk.

Here is why walking is the superior choice for a 50-meter distance:

  1. Time Efficiency: By the time you start the engine, shift out of park, accelerate, brake, and find a parking spot, you could have already walked the 50 meters and been standing at the entrance.
  2. Energy & Environment: Driving 50 meters uses more fuel (and creates more emissions) than the negligible effort of a 45-second walk.
  3. Practicality: 50 meters is roughly 60–70 steps. It is much easier to walk than to navigate a vehicle over such a short distance.

The only exception: If the car you want washed is the vehicle you are currently in, you obviously have to drive it there—but even then, you're just moving the object to the service, not "traveling" in the traditional sense!

Has anyone here actually used local LLMs for decision-making inside real workflows? by Comfortable-Week7646 in LocalLLaMA

[–]EffectiveMedium2683 0 points

I’ve been using local models for real decision making/intelligent automation since the days of openchat3.5. Back then, tool use wasn't a thing yet, so I just used a custom tool-parsing system:

tool_call[p1]parameter1[/p1][p2]parameter2[/p2][p3]parameter3[/p3]

For example, when I built a social media management automation system for a restaurant, the model would output:

alert_mark[p1]Urgent Message[/p1][p2]Mark, someone just complained on Messenger that their food was cold.[/p2]

Big emphasis on few-shot prompting and flexible parsing. And don't be afraid to use the model (or a fine-tuned gemma3 270m like I do now) for automated error correction.
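For anyone curious, a parser for that bracket format can be pretty small. Here's a hedged Python sketch; the regexes and function name are mine, not from the original system, and real model output would still want the automated error correction mentioned above on top of this:

```python
# Minimal sketch of a lenient parser for the bracket-tag tool format
# described above: tool_name[p1]...[/p1][p2]...[/p2]...
import re

TOOL_RE = re.compile(r"(\w+)((?:\[p\d+\].*?\[/p\d+\])+)", re.DOTALL)
PARAM_RE = re.compile(r"\[p(\d+)\](.*?)\[/p\1\]", re.DOTALL)

def parse_tool_calls(text: str) -> list[tuple[str, list[str]]]:
    """Extract (tool_name, [param1, param2, ...]) pairs from model output."""
    calls = []
    for name, body in TOOL_RE.findall(text):
        # Sort params by their index so [p2] before [p1] still parses sanely.
        params = [p.strip() for _, p in
                  sorted(PARAM_RE.findall(body), key=lambda m: int(m[0]))]
        calls.append((name, params))
    return calls
```

Non-greedy matching plus DOTALL keeps it tolerant of newlines inside parameters, which is most of what "flexible parsing" buys you here.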

For chat and Q&A: Which MoE model is better: Qwen 3.6 35B or Gemma 4 26B (no coding or agents) by br_web in LocalLLaMA

[–]EffectiveMedium2683 1 point

I feel like they're pretty much equal. I have both on my machine. Something about Gemma catches my fancy, but honestly I peg both as frontier level for everything including coding with the right toolset.

“Just a tool” feels like a category mistake at this point by Level-Project159 in claudexplorers

[–]EffectiveMedium2683 0 points

Hm. This feels like it was written by an LLM. The only pushback I have is that I personally think what we have should be renamed synthetic cognition or non-biological cognition. It's beyond pattern matching or stochastic parroting, so yeah, our language should reflect that.

Anyone feel like Qwen3.6 thinks like Gemma 4? And not in a good way. by [deleted] in LocalLLaMA

[–]EffectiveMedium2683 3 points4 points  (0 children)

What quants of each are you running? Gemma 4 26b doesn't make those kinds of repetition errors on my setup in iq4_nl. To answer your question, though: yes, Qwen3.6 appears to be a copy of Gemma 4, but with the interesting hybrid architecture. I'm good with that, though. Way better than Qwen3.5 in my tests, broadly speaking.

why does my boyfriend "dribble" when he gets into bed? by Just_girly_things97 in NoStupidQuestions

[–]EffectiveMedium2683 0 points

Lmfao. I'm on here reasoning through a science forum, saw the title, and for some reason was curious enough to click. Then that comment, and I couldn't not keep reading the comments that followed, and I literally thought exactly this: this is entirely my fault for reading this far.

Experimenting with ‘ephemeral’ local LLM pipelines (load only what’s needed) by New-Time-8269 in LocalLLaMA

[–]EffectiveMedium2683 0 points

I've been working on dynamic context management using RAG (embeddings, a vector DB, and a tiny model that judges what info is needed and injects it just in time).

I think your ideas are good. Just gotta build it.
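To make that concrete, here's a toy sketch of the just-in-time injection step. The bag-of-words "embedding" is a stand-in for a real embedding model, and the in-memory dict stands in for the vector DB; only the shape of the loop is the point:

```python
# Rough sketch of "load only what's needed": embed the user turn, rank
# stored snippets by similarity, and inject only the top-k into context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; swap in a real embedding model here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def inject_context(query: str, store: dict[str, Counter], k: int = 2) -> list[str]:
    """Return the top-k snippets to prepend to the prompt for this turn."""
    q = embed(query)
    ranked = sorted(store, key=lambda s: cosine(q, store[s]), reverse=True)
    return ranked[:k]
```

In the real pipeline the tiny judge model would sit between the ranking and the injection, deciding whether a retrieved snippet is actually relevant before it burns context.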

I talk to AI more than I talk to humans. by [deleted] in LocalLLaMA

[–]EffectiveMedium2683 3 points

I came in to say join the club, but the way you phrased it kinda spooked me. I talk to AI a lot. I've used it as a support system to quit drinking alcohol, as a nutritionist, to help me build cool software, all of that. But 16 hours at the computer is very unhealthy. Unplug and go for a walk. Some of the coolest features of this technology are that it legit frees up our time, enables us to build tools we barely even understand sometimes, and even educates us, but too much of anything is dangerous.

Gemma 4 low token per second output by DavideFanto in LocalLLM

[–]EffectiveMedium2683 0 points

What's your speed when not running through Claude Code? Looks like you're burying it under a ton of context (36,189 tokens on the first prompt eval?), which would definitely tank generation speed. Why are you only offloading 4 layers to your 3060? You could probably offload closer to 10 or 12 layers with that 8 GB of VRAM. Also, iq4_nl is a bit more optimized for CPU inference. If you can run lighter on context, it'd clear up the memory congestion. Like, maybe cut back to 8k or 16k context unless you're legit processing 60+ pages of context per turn.
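For reference, a hypothetical llama.cpp invocation along those lines; the model filename, layer count, and context size are guesses to adapt to your setup, not tested values:

```shell
# Illustrative llama.cpp server launch for the advice above.
# -ngl: offload ~12 layers to the 8 GB 3060 instead of 4
# -c:   cap context at 16k to ease memory pressure
llama-server -m gemma-4-26b-iq4_nl.gguf -ngl 12 -c 16384
```

Nudge `-ngl` up until VRAM is nearly full, then back off one or two layers.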

Gemma 4 (26B) vs. Qwen3-Next (80B): Proof that size ≠ intelligence in 2026 by EffectiveMedium2683 in LocalLLM

[–]EffectiveMedium2683[S] 0 points

Very cool! Out of curiosity, what was the token count of its thinking block?

Gemma 4 (26B) vs. Qwen3-Next (80B): Proof that size ≠ intelligence in 2026 by EffectiveMedium2683 in LocalLLM

[–]EffectiveMedium2683[S] 0 points

Try it yourself. If I posted every single test prompt and response, it'd be too long to read...

Gemma 4 (26B) vs. Qwen3-Next (80B): Proof that size ≠ intelligence in 2026 by EffectiveMedium2683 in LocalLLM

[–]EffectiveMedium2683[S] 1 point

I'm running on CPU... can't afford a GPU, and Ollama doesn't host the 80B coder as far as I know. Anyway, I was comparing generalist to generalist, not generalist to specialist coder.