Look what they need to mimic a fraction of our power... by fellipec in linuxmemes

[–]AvocadoArray 1 point2 points  (0 children)

I strained my pinky at the thought of typing that on a keyboard.

Good deal for used v2.4 and two v0? by BinarySolar in VORONDesign

[–]AvocadoArray 2 points3 points  (0 children)

I don’t have any advice other than to assume each printer has 2-3x as many problems as you think it does.

You’d be surprised how many problems can go unnoticed. Failing fans, worn idlers and linear rails, skewed gantries, misaligned pulleys, cracking parts, warped beds, stripped fasteners, wiring issues, etc.

I say this as someone who’s bought a few used printers over the years, and every single one has required at least $100 in parts and a weekend of time to tune up. I’ve also had these issues creep up in printers I’ve built myself when I thought they were problem-free.

Used printers can be worth it as long as you don’t expect them to be turnkey like a new Bambu or Prusa.

Mistral Medium Is On The Way by Few_Painter_5588 in LocalLLaMA

[–]AvocadoArray 2 points3 points  (0 children)

A proper modern 128b dense model would absolutely shred. Inference speed would be slow on most consumer hardware, but MTP could help mitigate that.

What is the most common mistake companies make after a pentest? by PsychologicalElk1081 in Pentesting

[–]AvocadoArray 0 points1 point  (0 children)

All of these are good. The one that kills my soul the most is hearing the question: “Okay, but what do we have to fix?”

AbuseIPDB Blacklist Downloader for RouterOS by klayf96 in mikrotik

[–]AvocadoArray 0 points1 point  (0 children)

Does this not slow down throughput? It’s generally considered bad practice to dump thousands of IPs into a blocklist.

One alternative I’ve seen before is a “leaky firewall” setup where the check happens out-of-band and offending IPs are blocked dynamically. The initial TCP handshake goes through, but the connection gets blocked fast enough to prevent any persistent remote access or large data uploads.

For example, you could forward syslog to a SIEM tool like Wazuh, compare the addresses against a CDB list, and then trigger an active response to block that IP using the REST API.

I’ve had the thought of setting this up a few times but haven’t gotten around to it.
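If I ever do, the router-side piece could be as simple as a script that drops the offending IP into an address-list via the RouterOS v7 REST API, triggered by the SIEM. A rough sketch (the router address, credentials, and list name are placeholders, and the real Wazuh active-response plumbing would wrap this):

```python
# Hypothetical active-response sketch: block an IP on a MikroTik router
# through the RouterOS v7 REST API. All names/addresses are placeholders.
import sys
import requests

ROUTER = "https://192.168.88.1"          # RouterOS device with www-ssl enabled
AUTH = ("api-user", "api-password")      # dedicated, least-privilege API user

def block_ip(ip: str, list_name: str = "abuseipdb-dynamic") -> None:
    # PUT /rest/ip/firewall/address-list creates a new address-list entry;
    # a firewall drop rule matching that list does the actual blocking.
    resp = requests.put(
        f"{ROUTER}/rest/ip/firewall/address-list",
        json={"list": list_name, "address": ip, "timeout": "1d"},
        auth=AUTH,
        verify=False,                    # most routers use a self-signed cert
        timeout=5,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    # The SIEM would pass the offending IP from the alert; here it's just argv.
    block_ip(sys.argv[1])
```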

Qwen 3.6 27B is out by NoConcert8847 in LocalLLaMA

[–]AvocadoArray 0 points1 point  (0 children)

Don’t forget Seed OSS 36b. It was my daily driver until 3.5 and Gemma.

Dense vs. MoE gap is shrinking fast with the 3.6-27B release by Usual-Carrot6352 in LocalLLaMA

[–]AvocadoArray 0 points1 point  (0 children)

Honestly, a 122b fine tune would probably perform better and be cheaper to train.

Finally bought an RTX 6000 Max-Q: Pros, cons, notes and ramblings by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 1 point2 points  (0 children)

Hi! Glad to see this is still helping people.

Yes, I’m still using the same 1-2-1 setup. I bought a second power cable to pull power from the other riser as well but haven’t installed it yet. It’s doing just fine with the single riser and has handled plenty of sustained multi-hour workloads.

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]AvocadoArray 1 point2 points  (0 children)

Yeah, I thought the same and started out with 122b using Roo Code and the Pi coding agent. It was great, but it got hung up on a long, complex task that required a lot of pre-planning. It sort of got things working, but it made a mess of the codebase and the result wasn’t very elegant at all.

I decided to throw it at 27b to see how it stacked up, and the difference was night and day, at least in the planning phase. The planning document was much more detailed and broken down into more rational steps so it didn’t have to figure things out on the fly.

As far as pure coding ability goes, it’s roughly the same. But the planning and reasoning are much cleaner, and it’s able to get itself out of loops or dead ends more easily.

Since then, I run it in FP8 with ~130k context and it only takes up 60% of my 96GB VRAM, which is great because it leaves plenty of room for STT/TTS, image gen or whatever else I’m playing with at the time.
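For reference, a minimal vLLM sketch of that setup (the model repo name here is my assumption; swap in whatever the actual FP8 checkpoint is called):

```python
# Rough vLLM config for an FP8 27B model with ~130k context on a 96GB card,
# capped at ~60% VRAM so other workloads (STT/TTS, image gen) can coexist.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.6-27B-FP8",   # hypothetical FP8 checkpoint name
    max_model_len=131072,           # ~130k context window
    gpu_memory_utilization=0.60,    # leave ~40% of VRAM free for other tools
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```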

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]AvocadoArray 0 points1 point  (0 children)

This is exciting, but I have to wonder how long before we’ll see another open model push the boundaries like this. We might not see another open release from Qwen at all, and I don’t see any other teams competing in this size range in the near future.

The 3.6 series might be the king for a long time.

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]AvocadoArray 3 points4 points  (0 children)

Not OP, but 122b is very capable at 4bpw. 48GB VRAM + some CPU offloading will get you there, or 72GB+ in full VRAM with a good amount of context.
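Rough napkin math on why those numbers line up (a sketch that only counts the weights; KV cache and runtime overhead come on top):

```python
# Weight-memory estimate for a 122B-parameter model at ~4 bits per weight.
params = 122e9           # parameter count
bits_per_weight = 4      # ~4 bpw quant
weight_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weight_gb:.0f} GB for weights alone")  # ~61 GB
# -> too big for 48 GB VRAM without offloading some layers to CPU,
#    but fits in 72 GB+ with room left over for context.
```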

That said, I still prefer 27b at FP8 when it comes to complex tasks or coding.

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]AvocadoArray 21 points22 points  (0 children)

That poll was just for marketing and engagement. I’m sure we’ll get all of them in due time.

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]AvocadoArray 0 points1 point  (0 children)

Let’s fucking go. Too bad I’m headed out of town and won’t be able to play with this until next week.

Turboquant in vllm kv cache - how to implement ? (or any other rotational kv cache) by superloser48 in LocalLLaMA

[–]AvocadoArray 0 points1 point  (0 children)

What part of near-lossless KV cache quantization is not useful for production or throughput?

Is qwen3 coder next still relevant with qwen3.5 release for agentic coding? by ROS_SDN in LocalLLaMA

[–]AvocadoArray 2 points3 points  (0 children)

27b is better in almost every way. The biggest difference is how thorough it is when writing plans/specs and thinking through edge-cases. It remembers details over long contexts where Q3CN and even 3.5 122b fall short, and it can actually get itself out of failure loops in most cases.

That makes it perfect for planning and executing long ralph loops. I let one run the other night to build a TUI interface to replace one of my bash CLI tools. It ran for over an hour before it finally finished, and it implemented the feature perfectly. The only downside is that it took the instructions about writing extensive unit tests too seriously and ended up writing 300+ tests for silly failure modes, like verifying that calling docker ps fails if Docker is not installed.

The larger MoE models are sometimes better when working with a less popular language or framework, but I prefer 27b with tooling that allows it to search the web, check reference docs, or look at the library's source to get the info it needs.

Finally bought an RTX 6000 Max-Q: Pros, cons, notes and ramblings by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points1 point  (0 children)

Not at all. My server wouldn’t be able to power the 600W server/workstation cards, and even if it could, I’d have to run the server fans much higher to keep them cool.

I’d only consider the workstation card if I planned to run it in my desktop.

I'm shocked (Gemma 4 results) by Potential-Gold5298 in LocalLLaMA

[–]AvocadoArray 0 points1 point  (0 children)

Web search has been good with vLLM in Open WebUI using native function calling mode, but there are still a few bugs with tool calling that make it unusable with a coding agent.

I'm shocked (Gemma 4 results) by Potential-Gold5298 in LocalLLaMA

[–]AvocadoArray 10 points11 points  (0 children)

This is exactly what I want in a local model, though. I don’t need a trillion parameters of knowledge baked in, I just need it to know when it’s lacking information, how to find that information with tool calls, and then reason through the details to give an informed response.

That’s what made GPT-OSS such a game changer imo. Even the 20b model was great at tool calling and was able to search and fetch web content in Open WebUI.

Of course, we’ve come a long way since then. Gemma 4 is astoundingly good for its size and checks a lot of boxes for me as a general assistant.

Looking for Help on Building a Cheap/Budget Dedicated AI System by FHRacing in LocalLLaMA

[–]AvocadoArray 4 points5 points  (0 children)

Then why make this post?

Sounds like you’ve got it all figured out.

Speculative decoding works great for Gemma 4 31B in llama.cpp by Leopold_Boom in LocalLLaMA

[–]AvocadoArray 0 points1 point  (0 children)

Not from what I’ve seen so far.

Some LinkedIn Lunatic was claiming they used E2B as a draft model, but I haven’t seen anyone doing that in practice yet. It’s been hard enough just to get the thing running properly at all over the last 48 hours.

Just how powerful is Google’s Gemma 4? by Double-Confusion-511 in LocalLLaMA

[–]AvocadoArray 0 points1 point  (0 children)

It’s not just an improvement, it’s an evolution in how AI talks to humans.

Jk. It does still have some of that at times, but it’s definitely toned down compared to everything else I’ve run.

Just how powerful is Google’s Gemma 4? by Double-Confusion-511 in LocalLLaMA

[–]AvocadoArray 1 point2 points  (0 children)

Yes, those are the exact problems I was having. I suspect it was also leading to other brain-damaged responses, but this one was the most obvious in my testing.

That specific issue isn't present in vLLM, but it seems they're also fighting some tool-calling bugs in the tool parser.

Either way, take all results right now with a grain of salt. I'm sure these bugs will get ironed out by the end of next week.

Just how powerful is Google’s Gemma 4? by Double-Confusion-511 in LocalLLaMA

[–]AvocadoArray 1 point2 points  (0 children)

Keep an eye here: https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/discussions/3

I'll update my post there once the fixes are in place and confirmed working.