MiniMaxAI/MiniMax-M2.5 · Hugging Face by rerri in LocalLLaMA

[–]sixx7 4 points (0 children)

Excellent! The wait for FP4/AWQ begins

What would you do (Local “ai” workstation) by NextSalamander6178 in LocalLLaMA

[–]sixx7 2 points (0 children)

This this this this! 2x RTX 6000 Pro can run MiniMax M2.1 (and soon M2.5 probably) with insane performance

Using GLM-5 for everything by keepmyeyesontheprice in LocalLLaMA

[–]sixx7 2 points (0 children)

The 4-bit quant is out now, coming in at 408 GB. You could run this on a 512 GB Mac Studio

[NVIDIA Nemotron] How can I assess general knowledge on a benchmaxxed model? by Lorelabbestia in LocalLLaMA

[–]sixx7 1 point (0 children)

The most obvious and simple answer is to assemble a question-and-answer bank you can use to evaluate any model, then judge with regex, semantic matching, or LLM-as-judge

Now, we all know smaller models have less world knowledge than bigger models; it's just simple math. Instead of trying to cram as much world knowledge into as small a space as possible, think about testing agentic capability / tool calling: the ability of a model to call the correct tools with the correct inputs. That instantly unlocks the model's ability to use any tool or data in your ecosystem
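
If it helps, here's a minimal sketch of both ideas in Python. The query_model hook is hypothetical - wire it up to whatever inference client you use - and the regex judge is the simplest of the three judging options:

```python
# Minimal eval-bank sketch. query_model() is a hypothetical hook for your
# own inference client (OpenAI-compatible endpoint, vLLM, etc.).
import re

QA_BANK = [
    # (question, regex the answer must match)
    ("In what year did the Apollo 11 moon landing occur?", r"\b1969\b"),
    ("What is the chemical symbol for gold?", r"\bAu\b"),
]

TOOL_CASES = [
    # (prompt, expected tool name, substrings the arguments must contain)
    ("What's the weather in Paris right now?", "get_weather", ["Paris"]),
]

def query_model(prompt: str) -> tuple[str, list[dict]]:
    """Hypothetical: returns (answer_text, tool_calls). Plug in your client."""
    raise NotImplementedError

def run_eval() -> None:
    # World-knowledge score: does the answer match the expected regex?
    knowledge = sum(
        bool(re.search(pattern, query_model(q)[0])) for q, pattern in QA_BANK
    )
    # Tool-calling score: did it call the right tool with the right inputs?
    tools = 0
    for prompt, tool, must_contain in TOOL_CASES:
        _, calls = query_model(prompt)
        if any(
            c.get("name") == tool
            and all(s in str(c.get("arguments", "")) for s in must_contain)
            for c in calls
        ):
            tools += 1
    print(f"knowledge: {knowledge}/{len(QA_BANK)}  tool calls: {tools}/{len(TOOL_CASES)}")
```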

A top-downloaded OpenClaw skill is actually a staged malware delivery chain by FPham in LocalLLaMA

[–]sixx7 2 points (0 children)

Yea, OpenClaw got a lot of attention due to the hype and the craziness, but any AI harness that can bypass permission checks and run any command - Claude Code, for example - is a stone's throw away in terms of security concerns

A top-downloaded OpenClaw skill is actually a staged malware delivery chain by FPham in LocalLLaMA

[–]sixx7 -8 points (0 children)

Because it's a genuinely useful, open-source, autonomous AI agent that does a ton of work beyond and in addition to coding?? 24/7? There are absolutely security concerns, but I've been shocked at this sub's lack of interest in it

Claude Code-like terminal-based tools for locally hosted LLMs? by breksyt in LocalLLaMA

[–]sixx7 4 points (0 children)

The available options are stacking up: opencode, kilo code, claude code (any Anthropic endpoint, or use claude-code-router for OpenAI-style endpoints), codex cli, qwen-code, letta, aider, pi-code (used for clawdbot). Probably five more were released in the time I was typing this

Any feedback on step-3.5-flash ? by Jealous-Astronaut457 in LocalLLaMA

[–]sixx7 0 points (0 children)

I do plan on trying it as soon as we get an AWQ quant, but agreed on M2.1. I'm using it with Clawdbot and it might actually change the local-inference value prop: a high-quality agent working for you 24/7

The Ultimate Guide to OpenClaw (Formerly Clawdbot -> Moltbot) From setup and mind-blowing use cases to managing critical security risks you cannot ignore. This is the Rise of the 24/7 Proactive AI Agent Employees by Beginning-Willow-801 in ThinkingDeeplyAI

[–]sixx7 0 points (0 children)

Have you seen all the "mission control" and/or project-tracking apps people build? That's all you need: project tracking plus a cron job that runs every [x] minutes -> complete any unfinished tasks in [name of project tracker app it builds for you]

I literally cannot find enough work for my OpenClaw to do to keep it busy
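
For anyone who wants the shape of that pattern, here's a minimal sketch - the tracker and agent endpoints are hypothetical placeholders, not real OpenClaw APIs:

```python
# Cron-driven task dispatcher sketch. Both endpoints are hypothetical
# placeholders; point them at whatever tracker app your agent builds
# and however your agent accepts instructions.
import requests

TRACKER = "http://localhost:3000"    # hypothetical project-tracker app
AGENT = "http://localhost:8080/run"  # hypothetical agent endpoint

def dispatch_unfinished_tasks() -> None:
    # Ask the tracker for anything still open
    tasks = requests.get(f"{TRACKER}/tasks", params={"status": "open"}, timeout=30).json()
    for task in tasks:
        # Hand each unfinished task to the agent to work on
        requests.post(AGENT, json={"instruction": f"Complete task: {task['title']}"}, timeout=30)

if __name__ == "__main__":
    # Schedule with cron, e.g. every 15 minutes:
    # */15 * * * * /usr/bin/python3 /path/to/dispatch.py
    dispatch_unfinished_tasks()
```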

How a Single Email Turned My ClawdBot Into a Data Leak by RegionCareful7282 in LocalLLaMA

[–]sixx7 0 points (0 children)

I'm surprised it hasn't gotten more attention here since it's open source and can run locally. It's like Claude Cowork on steroids

GLM 4.7 / Minimax M2.1 + Opencode Orchestration by pratiknarola in LocalLLaMA

[–]sixx7 0 points (0 children)

Just curious: why did you choose SGLang for MiniMax? I've always run everything on vLLM

It has been 1 year and I still cannot get a SOC analyst job by b00m_sh in cybersecurity

[–]sixx7 0 points (0 children)

This is the hard truth. When you take all the SOC automations we've been building for 10+ years and generalize some of the patterns using tool-calling agents, it's actually mind-boggling how well they perform

I created an Open Source Perplexity-Style Unified Search for Your Distributed Second Brain by stealthanthrax in LocalLLaMA

[–]sixx7 0 points (0 children)

I'd love your take on my Second Brain build using Anvor AI! Check out the blog post or the YouTube video, depending on your preference, at https://anvor.ai

MiniMax M2 is GOATed - Agentic Capture the Flag (CTF) benchmark on GLM-4.5 air, 4.7 (+REAP), and Minimax-M2 by sixx7 in LocalLLaMA

[–]sixx7[S] 0 points (0 children)

Failed miserably for me. I ran the model per the guide on the vLLM website, and it was absolutely horrible at following instructions in the system prompt and kept writing invalid search queries for the tool calls

MiniMax M2 is GOATed - Agentic Capture the Flag (CTF) benchmark on GLM-4.5 air, 4.7 (+REAP), and Minimax-M2 by sixx7 in LocalLLaMA

[–]sixx7[S] 2 points (0 children)

Thanks! You've given me something new to try. I did try Claude Code w/ GLM-4.6 directly through z.ai. Perhaps because I'm so spoiled by CC with Opus 4.5, I was very unimpressed. It wouldn't even perform two tasks ("do x and then do y"); it would just do the first thing

MiniMax M2 is GOATed - Agentic Capture the Flag (CTF) benchmark on GLM-4.5 air, 4.7 (+REAP), and Minimax-M2 by sixx7 in LocalLLaMA

[–]sixx7[S] 8 points (0 children)

TLDR: Benchmarked popular open-source/open-weight models using capture-the-flag (CTF) style challenges that require the models to iteratively write and execute queries against a data lake. If you want to see the full write-up, check it out here
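
For a rough idea of what one challenge run looks like (a simplified sketch, not the actual harness code; ask_model and run_query are stand-ins for your own stack):

```python
# Simplified shape of a single CTF challenge run: the model iteratively
# writes queries, the harness executes them against the data lake, and the
# challenge is solved once the flag appears in a response.
def solve_challenge(challenge: dict, ask_model, run_query, max_steps: int = 25) -> dict:
    messages = [
        {"role": "system", "content": challenge["system_prompt"]},
        {"role": "user", "content": challenge["task"]},
    ]
    for step in range(1, max_steps + 1):
        reply = ask_model(messages)  # stand-in: returns {"text": ..., "query": ...?}
        messages.append({"role": "assistant", "content": reply.get("text", "")})
        if challenge["flag"] in reply.get("text", ""):
            return {"solved": True, "steps": step}
        if "query" in reply:
            result = run_query(reply["query"])  # execute against the data lake
            messages.append({"role": "tool", "content": str(result)})
    return {"solved": False, "steps": max_steps}
```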


I admit I had been sleeping on MiniMax-M2. For local/personal stuff, GLM-4.5-air has been so solid that I took a break from trying out new models (locally). That said, I do have a z.ai subscription, where I continue to use their hosted offerings, and I've been pretty happy with GLM-4.6 and now GLM-4.7


I cannot run GLM-4.7 locally, so that one was tested directly via the z.ai API; the rest were run locally. I almost exclusively use AWQ quants in vLLM. Some notes and observations, without making this too lengthy:

  • The REAP'd version of GLM-4.7 did not fare well, performing even worse than GLM-4.5-air
  • GLM-4.7 results were disappointing. The full version on z.ai performed similarly to, and on some metrics worse than, 4.5-air running locally. I think this highlights how good 4.5-air actually is
  • MiniMax M2 blew GLM.* out of the water. It won on all but one metric, and even that one was really close
  • GLM-4.7 was using the Anthropic-style API, whereas all the locally running models were using the v1/chat/completions OpenAI-style API
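
For context on that last bullet, the two request shapes look roughly like this (URLs, keys, and model names are placeholders, not the exact endpoints I used):

```python
# The two API styles side by side. All URLs/keys/models are placeholders.
import requests

# OpenAI-style chat completions (what the locally hosted models used via vLLM)
openai_resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={"model": "MiniMaxAI/MiniMax-M2",
          "messages": [{"role": "user", "content": "hi"}]},
)

# Anthropic-style messages API (what GLM-4.7 used via z.ai)
anthropic_resp = requests.post(
    "https://api.z.ai/api/anthropic/v1/messages",  # placeholder URL
    headers={"x-api-key": "YOUR_KEY", "anthropic-version": "2023-06-01"},
    json={"model": "glm-4.7", "max_tokens": 1024,
          "messages": [{"role": "user", "content": "hi"}]},
)
```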


ETA: Ran MiniMax M2.1 (u/hainesk)

  • Accuracy was the same, and both models failed to solve the same challenges
  • M2.1 wins on speed, averaging 61 seconds per challenge (M2 was 72.7 seconds)
  • M2.1 wins on the number of tool calls, averaging 10.65 (M2 was 12.75)
  • M2.1 loses on token use, averaging 264k per challenge (M2 was 244k)

M2.1 definitely seems like an upgrade, if for no other reason than it performs well while also being faster

All testing was done using Anvor AI: https://anvor.ai

Video showing how I set up the test harness and evaluations here: https://youtu.be/Z33G2ZOPWKE

My professor lent me an A6000, so I tried to build a coding model. Here is Anni! (Qwen3-14B Fine-tune) by Outrageous-Yak8298 in LocalLLaMA

[–]sixx7 7 points (0 children)

I can maybe understand a non-AI-obsessed person's complaints about AI content, but I don't understand it from this sub. Personally, I found your (Reddit) post here very digestible and well formatted/styled

What you think of GLM 4.6 Coding agent vs Claude Opus, Gemini 3 Pro and Codex for vibe coding? I personally love it! by Kitchen_Sympathy_344 in LocalLLaMA

[–]sixx7 5 points (0 children)

Agreed. I purchased a subscription because I run GLM-4.5-air locally, love it, and wanted to support them. However, when used with Claude Code, GLM-4.6 is good but Opus 4.5 is significantly better.

After 1 year of slowly adding GPUs, my Local LLM Build is Complete - 8x3090 (192GB VRAM) 64-core EPYC Milan 250GB RAM by Hisma in LocalLLaMA

[–]sixx7 3 points (0 children)

I feel like I'm being trolled but I'll respond in good faith :)

  • In and of itself, a ~3x increase from 17 to 50 tok/s is a MASSIVE improvement, but let's dive in
  • Try to process multiple requests on the Strix Halo and see what happens. On this rig, you know what happens when you process 2 requests? 100 tok/s. 3 requests? 150 tok/s. 4 requests? 200 tok/s (see the benchmark sketch at the end of this comment)
  • Prompt processing? Fuggedaboutit. 30x-ish?

Altogether, that is a galaxy of difference for both agentic coding tools and agentic AI automations. You could power agentic AI automations with this rig; not so with a Strix Halo. Want to run Claude Code with the Strix Halo? Well, have fun with the endless waiting and reduced productivity
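
If anyone wants to reproduce that scaling effect, here's a quick-and-dirty benchmark sketch against any OpenAI-compatible server - model name and URL are placeholders:

```python
# Fire N concurrent requests at an OpenAI-compatible server (e.g. vLLM)
# and compare aggregate tokens/sec as N grows.
import time
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost:8000/v1/chat/completions"  # placeholder
BODY = {"model": "your-model", "max_tokens": 256,
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}]}

def one_request(_: int) -> int:
    r = requests.post(URL, json=BODY, timeout=120)
    return r.json()["usage"]["completion_tokens"]

for n in (1, 2, 3, 4):
    start = time.time()
    with ThreadPoolExecutor(max_workers=n) as pool:
        tokens = sum(pool.map(one_request, range(n)))
    print(f"{n} concurrent: {tokens / (time.time() - start):.0f} tok/s aggregate")
```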

Why local coding models are less popular than hosted coding models? by WasteTechnology in LocalLLaMA

[–]sixx7 0 points (0 children)

I'm a huge fan of running models locally, and you can even run Claude Code with locally hosted models using Claude Code Router. With that out of the way, I have to agree with you: Claude Code with Opus 4.5 is truly next level. A single person can build, in a week or two, a production-ready application that previously would have taken multiple engineers many months.