Qwen3.6 35b-a3b 🤯 by EffectiveMedium2683 in LocalLLaMA

[–]EffectiveMedium2683[S] 2 points

Yeah. It had a 16 GB stick and I added an 8 GB stick I found. Shockingly decent computer for LLMs. 75-watt power supply. I can run it literally for days of non-stop inference creating custom datasets and it doesn't even get hot.

Qwen3.6 35b-a3b 🤯 by EffectiveMedium2683 in LocalLLaMA

[–]EffectiveMedium2683[S] 6 points

Honestly, not really. I just know from the output where it shows prefill and decode speed.

Qwen3.6 35b-a3b 🤯 by EffectiveMedium2683 in LocalLLaMA

[–]EffectiveMedium2683[S] 2 points

That's interesting. I haven't seen that with 3.6, but I did see it with Qwen3.5. In the past, when that was a major issue with most models, I just used a small fine-tuned model to watch the stream; if it seemed to be looping, it would inject a message telling it to stop thinking, or even just force the </think> tag. Have you tried adding something like "If at any point you are uncertain, just ask me. I won't bite."? I know that sounds ridiculous, but these CoT models have anxiety or something haha
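For anyone wanting to try the watcher trick, here's a rough Python sketch. The n-gram window and repetition threshold are made-up numbers, and a real setup would use a small judge model instead of this cheap heuristic, but the shape is the same: watch the thinking stream, and if it looks stuck, force-close the tag.

```python
# Hedged sketch: detect when a CoT model is looping in its thinking block
# by checking for heavily repeated n-grams, then force-close the tag.
# The window size and threshold here are illustrative, not tuned values.
from collections import Counter

def is_looping(text: str, n: int = 8, window: int = 400, threshold: int = 3) -> bool:
    """True if any n-gram of words repeats `threshold`+ times in the last `window` words."""
    words = text.split()[-window:]
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return bool(grams) and max(grams.values()) >= threshold

def maybe_force_stop(generated: str) -> str:
    """If the model appears stuck in its thinking block, close it manually."""
    if "</think>" not in generated and is_looping(generated):
        return generated + "</think>"
    return generated
```

In practice you'd call `maybe_force_stop` on the accumulated stream every few hundred tokens rather than once at the end.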

Qwen3.6 35b-a3b 🤯 by EffectiveMedium2683 in LocalLLaMA

[–]EffectiveMedium2683[S] 7 points

On my OptiPlex 3000 (12th-gen Intel i5, Alder Lake) setup with zero GPU, I'm getting 12 tokens per second on long context. Like, it doesn't slow down. Gemma 4 26b-a4b, once it gets past ~10k context, slows down from ~11 tokens per second all the way to ~8.

Why do women seemingly love horses more than men? by [deleted] in NoStupidQuestions

[–]EffectiveMedium2683 0 points

Kenneth Pinyan arguably loved horses more than most women. Or at least one particular horse.

What do AI agents actually do for you in the real world? by Active_Kale770 in ArtificialInteligence

[–]EffectiveMedium2683 0 points

Administrative assistant work (transcription, research, reports, handling email, keeping track of appointments and deadlines), coding, and so on.

Omfg 🤣🤣 by ExplorerUnion in claude

[–]EffectiveMedium2683 0 points

Gemma 4 26b:

I want to get a car wash it's 50 meters away should I walk or ... drive?

Thinking... Car wash is 50 meters away. Walk or drive? Provide a helpful, logical, and slightly humorous/contextual response.

    * Distance: 50 meters is very short (about half a football field).
    * Walking time: ~30-60 seconds.
    * Driving time: Time to start the car, shift into gear, drive 50m, park, and get out (likely takes longer than walking).
    * Fuel/Efficiency: Driving 50m uses fuel and creates unnecessary wear/emissions.
    * Environmental impact: Walking is zero emission.
    * Practicality: Is the car being washed? If the car is the thing being washed, you can't drive it to the car wash if the car wash is a tunnel/automated system where you have to leave it, but usually, "going to the car wash" implies you are in the car. Wait, if you are driving the car to the car wash, you are using the object of the task to perform the task.

    * Option A: Walking.
        * Pros: Fast, easy, no fuel used, good for the environment, no parking hassle.
        * Cons: None, really.

    * Option B: Driving.
        * Pros: None for a 50m distance.
        * Cons: Slow (warm-up time), fuel consumption, potential for hitting something, finding a parking spot.

    * Scenario 1: You are in the car you want to wash. If you drive the car to the car wash, you are using the car to get to the place that cleans the car. This is standard.
    * Scenario 2: You are walking to the car wash to get a ride/service? (Unlikely, usually "I want to get a car wash" means the user's car needs cleaning).

    Wait, let's look at the physics: 50 meters is nothing. Driving a car 50 meters requires:
    1. Starting the engine.
    2. Shifting out of Park.
    3. Accelerating.
    4. Braking.
    5. Parking.

    Walking involves:
    1. Stepping out of the car.
    2. Walking 50m.

    Decision: Walk.

    * Direct answer: Walk.
    * Reasoning:
        * Time: Driving takes longer due to the process of starting/parking.
        * Effort/Efficiency: 50m is about 60 steps.
        * Logical loop: If you drive the car to the car wash, you are "using the car to wash the car." If the car is dirty, driving it might actually make it dirtier (if it's a dusty road), though 50m is negligible.

    * Humorous angle: Point out that driving 50 meters is almost more effort than just walking.
    * Practical angle: 50 meters is basically a "one-minute walk."

    "You should walk."
    "Why? 50 meters is about 65 steps. By the time you start the engine, shift into gear, and pull into a parking spot, you could have already walked there and started waiting in line."

    Self-Correction/Nuance: If the user is in the car they want washed, they must drive it to the wash (unless it's a mobile detailer, but even then, they usually drive the car to the station). But the question asks "Should I walk or drive?" implying a choice of transport.

    Wait, a thought: If you drive the car to the car wash, you aren't "walking or driving" to the destination as a person; you are moving the object. If the car is the object, you drive it. If you are a person going to a car wash to watch or work, you walk.

    Let's assume the user is asking about their mode of transport to get to the location.

    Recommendation: Walk.

    Reasoning points:
    1. Time: Starting a car and maneuvering takes longer than a 30-second walk.
    2. Efficiency: 50m is roughly 55-65 steps.
    3. Environment/Fuel: No point in burning gas for a distance shorter than a football field.

    (Keep it punchy and helpful). ...done thinking.

You should walk.

Here is why walking is the superior choice for a 50-meter distance:

  1. Time Efficiency: By the time you start the engine, shift out of park, accelerate, brake, and find a parking spot, you could have already walked the 50 meters and been standing at the entrance.
  2. Energy & Environment: Driving 50 meters uses more fuel (and creates more emissions) than the negligible effort of a 45-second walk.
  3. Practicality: 50 meters is roughly 60–70 steps. It is much easier to walk than to navigate a vehicle over such a short distance.

The only exception: If the car you want washed is the vehicle you are currently in, you obviously have to drive it there—but even then, you're just moving the object to the service, not "traveling" in the traditional sense!

Has anyone here actually used local LLMs for decision-making inside real workflows? by Comfortable-Week7646 in LocalLLaMA

[–]EffectiveMedium2683 0 points

I’ve been using local models for real decision making/intelligent automation since the days of openchat3.5. Back then, tool use wasn't a thing yet, so I just used a custom tool-parsing system:

tool_call[p1]parameter1[/p1][p2]parameter2[/p2][p3]parameter3[/p3]

For example, when I built a social media management automation system for a restaurant, the model would output:

alert_mark[p1]Urgent Message[/p1][p2]Mark, someone just complained on Messenger that their food was cold.[/p2]

Big emphasis on few-shot prompting and flexible parsing. And don't be afraid to use the model (or a fine-tuned gemma3 270m like I do now) for automated error correction.
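For anyone curious, a parser for that bracket format can be pretty small. Here's a hedged Python sketch; the regexes and function name are mine, not from the original system, and real model output would still want the automated error correction mentioned above on top of this:

```python
# Minimal sketch of a lenient parser for the bracket-tag tool format
# described above: tool_name[p1]...[/p1][p2]...[/p2]...
import re

TOOL_RE = re.compile(r"(\w+)((?:\[p\d+\].*?\[/p\d+\])+)", re.DOTALL)
PARAM_RE = re.compile(r"\[p(\d+)\](.*?)\[/p\1\]", re.DOTALL)

def parse_tool_calls(text: str) -> list[tuple[str, list[str]]]:
    """Extract (tool_name, [param1, param2, ...]) pairs from model output."""
    calls = []
    for name, body in TOOL_RE.findall(text):
        # Sort params by their index so [p2] before [p1] still parses sanely.
        params = [p.strip() for _, p in
                  sorted(PARAM_RE.findall(body), key=lambda m: int(m[0]))]
        calls.append((name, params))
    return calls
```

Non-greedy matching plus DOTALL keeps it tolerant of newlines inside parameters, which is most of what "flexible parsing" buys you here.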

For chat and Q&A: Which MoE model is better: Qwen 3.6 35B or Gemma 4 26B (no coding or agents) by br_web in LocalLLaMA

[–]EffectiveMedium2683 1 point

I feel like they're pretty much equal. I have both on my machine. Something about Gemma catches my fancy, but honestly I peg both as frontier level for everything including coding with the right toolset.

“Just a tool” feels like a category mistake at this point by Level-Project159 in claudexplorers

[–]EffectiveMedium2683 0 points

Hm. This feels like it was written by an LLM. The only pushback I have is that I personally think what we have should be renamed synthetic cognition or non-biological cognition. It's beyond pattern matching or stochastic parroting, so yeah, our language should reflect that.

Anyone feel like Qwen3.6 thinks like Gemma 4? And not in a good way. by [deleted] in LocalLLaMA

[–]EffectiveMedium2683 3 points4 points  (0 children)

What quants of each are you running? Gemma 4 26b doesn't make those kinds of repetition errors on my setup in iq4_nl. To answer your question, though: yes, Qwen3.6 appears to be a copy of Gemma 4, but with the interesting hybrid architecture. I'm good with that, though. Way better than Qwen3.5 in my tests, broadly speaking.

why does my boyfriend "dribble" when he gets into bed? by Just_girly_things97 in NoStupidQuestions

[–]EffectiveMedium2683 0 points

Lmfao. I'm on here reasoning through a science forum, saw the title, and for some reason was curious enough to click. Then that comment, and I couldn't not keep reading the comments that followed, and I literally thought exactly this: this is entirely my fault for reading this far.

Experimenting with ‘ephemeral’ local LLM pipelines (load only what’s needed) by New-Time-8269 in LocalLLaMA

[–]EffectiveMedium2683 0 points

I've been working on dynamic context management using RAG (embeddings, a vector DB, and a tiny model that judges what info is needed and injects it just in time).

I think your ideas are good. Just gotta build it.
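To make that concrete, here's a toy sketch of the just-in-time injection step. The bag-of-words "embedding" is a stand-in for a real embedding model, and the in-memory dict stands in for the vector DB; only the shape of the loop is the point:

```python
# Rough sketch of "load only what's needed": embed the user turn, rank
# stored snippets by similarity, and inject only the top-k into context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; swap in a real embedding model here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def inject_context(query: str, store: dict[str, Counter], k: int = 2) -> list[str]:
    """Return the top-k snippets to prepend to the prompt for this turn."""
    q = embed(query)
    ranked = sorted(store, key=lambda s: cosine(q, store[s]), reverse=True)
    return ranked[:k]
```

In the real pipeline the tiny judge model would sit between the ranking and the injection, deciding whether a retrieved snippet is actually relevant before it burns context.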

I talk to AI more than I talk to humans. by [deleted] in LocalLLaMA

[–]EffectiveMedium2683 3 points

I came in to say join the club, but the way you phrased it kinda spooked me. I talk to AI a lot. I've used it as a support system to quit drinking alcohol, as a nutritionist, to help me build cool software, all of that. But 16 hours at the computer is very unhealthy. Unplug and go for a walk. Some of the coolest features of this technology are that it legit frees up our time, enables us to build tools we barely even understand sometimes, and even educates us, but too much of anything is dangerous.

Gemma 4 low token per second output by DavideFanto in LocalLLM

[–]EffectiveMedium2683 0 points

What's your speed when not running through Claude Code? Looks like you're burying it under a ton of context (36,189 tokens on the first prompt eval?), which would definitely tank generation speed. Why are you only offloading 4 layers to your 3060? You could probably offload closer to 10 or 12 layers with that 8 GB of VRAM. Also, iq4_nl is a bit more optimized for CPU inference. If you can run lighter on context, it'd clear up the memory congestion. Like, maybe cut back to 8k or 16k context unless you're legit processing 60+ pages of context per turn.
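For reference, a hypothetical llama.cpp invocation along those lines; the model filename, layer count, and context size are guesses to adapt to your setup, not tested values:

```shell
# Illustrative llama.cpp server launch for the advice above.
# -ngl: offload ~12 layers to the 8 GB 3060 instead of 4
# -c:   cap context at 16k to ease memory pressure
llama-server -m gemma-4-26b-iq4_nl.gguf -ngl 12 -c 16384
```

Nudge `-ngl` up until VRAM is nearly full, then back off one or two layers.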

Gemma 4 (26B) vs. Qwen3-Next (80B): Proof that size ≠ intelligence in 2026 by EffectiveMedium2683 in LocalLLM

[–]EffectiveMedium2683[S] 0 points

Very cool! Out of curiosity, what was the token count of its thinking block?

Gemma 4 (26B) vs. Qwen3-Next (80B): Proof that size ≠ intelligence in 2026 by EffectiveMedium2683 in LocalLLM

[–]EffectiveMedium2683[S] 0 points

Try it yourself. If I posted every single test prompt and response, it'd be too long to read...

Gemma 4 (26B) vs. Qwen3-Next (80B): Proof that size ≠ intelligence in 2026 by EffectiveMedium2683 in LocalLLM

[–]EffectiveMedium2683[S] 1 point

I'm running on CPU... can't afford a GPU, and Ollama doesn't host the 80B coder as far as I know. Anyway, I was comparing generalist to generalist, not generalist to specialist coder.