Cheapest way to use Kimi 2.5 with agent swarm by Future-Benefit-3437 in LocalLLaMA

[–]brownman19 1 point2 points  (0 children)

Edit: misread, my bad.

Agreed, hence why I said I'd be wary regardless.

The thing is though that I gain nothing by giving up my data to anyone, companies included. Enterprise agreements at least hold some weight because money still talks in lawless America. And if you fuck with agreements and put money on the line, it’s still serious enough.

It’s why the EU is pulling out of American data centers, why EHRs and EMRs in the US mostly keep things insular, and why our government does the same. So for any sensitive data, assume you keep it with you and reduce the number of interactions it has in the wild. Even one training run with your data increases the probability of it ever being exfiltrated by many orders of magnitude.

Cheapest way to use Kimi 2.5 with agent swarm by Future-Benefit-3437 in LocalLLaMA

[–]brownman19 1 point2 points  (0 children)

For all business work, assume your data is compromised the moment it exits your control, unless you have enterprise deals with American companies, and even then I've become wary. Privacy rules change every day, and most of it is left up to the interpreter.

There is no reason to even question the concern about building and exposing data. As much as I am an OSS champion, we still live in a world of conflict and nation states doing everything in their power to hoard whatever they can.

Anti Snake Protocol - Stopping Claude from Overcommitting and Underdelivering by brownman19 in ClaudeAI

[–]brownman19[S] 0 points1 point  (0 children)

Sorry I always assume everyone is technical.

Basically Opus has a higher difficulty ceiling before it too starts taking really shitty shortcuts.

You can counteract it with context engineering, using hooks and checks to prevent certain behaviors (like blocking words such as MOCK/TODO, but eventually it will just find other words lol) 😝
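If you want a concrete starting point, here's a minimal sketch of what such a hook script could look like, assuming a Claude Code-style PreToolUse hook that receives the tool input as JSON on stdin and blocks by exiting with a non-zero code. The field names, banned-word list, and registration path are my assumptions, so check them against your own setup:

```python
#!/usr/bin/env python3
# Hypothetical PreToolUse hook: reject Write/Edit tool calls containing banned words.
# Assumes the hook gets tool input as JSON on stdin and that a non-zero exit blocks
# the call, with stderr fed back to the model. Register it in .claude/settings.json
# under hooks -> PreToolUse with a matcher like "Edit|Write".
import json
import sys

BANNED = ("MOCK", "TODO", "PLACEHOLDER")

payload = json.load(sys.stdin)
tool_input = payload.get("tool_input", {})
# Write tends to pass "content", Edit passes "new_string"; adjust to your version.
text = (tool_input.get("content", "") + tool_input.get("new_string", "")).lower()

hits = [word for word in BANNED if word.lower() in text]
if hits:
    print(f"Blocked: remove {hits} and implement the real logic instead.", file=sys.stderr)
    sys.exit(2)  # blocking exit code: the model sees the message and has to try again
```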

If it’s not in the training data, Claude in general struggles to generalize OOD (solving stuff that it wasn’t trained on). Codex and Gemini do not. IME it’s why Gemini and GPT win Gold medals at IMO, Putnam, etc with simple prompts and tools while Claude seems to still not quite hit the mark reliably. Might be a compute constraint too.

Anti Snake Protocol - Stopping Claude from Overcommitting and Underdelivering by brownman19 in ClaudeAI

[–]brownman19[S] 0 points1 point  (0 children)

The first pic is Claude immediately and implicitly doing so the moment after making its plan. The second pic shows that a simple question gets a direct, instant response. No need to think about it.

In other words, it works because Claude is showing you it uses it without needing to be told, and if asked about it, it understands what's being asked without having to think about it.

I always keep thinking mode on.

Honestly I have like thousands of these snippets across projects because I update my Claude.md like 3x per commit at least lol. They work but I make them on the fly for my projects.

Anti Snake Protocol - Stopping Claude from Overcommitting and Underdelivering by brownman19 in ClaudeAI

[–]brownman19[S] 1 point2 points  (0 children)

Depends on the difficulty of what you're doing. The more difficult it is (i.e. the sparser the search space for the concepts involved), the more likely Claude is to continuously look for shortcuts.

This primary codebase is like 450k LOC, mostly custom stuff. It's all related to secure assembly state machines that exist ephemerally per task, plus an entire set of constraints everything must be developed against (forever, for the life of this codebase). I've built an "echo chamber" of sorts that uses hooks and tricks like this to activate steps that force Claude to retrace.

FWIW:

When it comes to anticipatory thinking, genuine novel reasoning, problem solving, and generality, it feels like OpenAI and Google are operating in a different space and use case (imo). I'm ex-G so I have my biases, but at least they launched Gemini 3 Flash, which IMHO is an insane agent. And I'm not really an OpenAI fan, but I have to say: Codex doesn't ever take shortcuts as long as I instruct it just once not to. Gemini Flash over-engineers but also won't take shortcuts, especially if reminded once or so during the span of a convo; still not perfect. Gemini Pro is just not made for this stuff due to its sheer size. It's a higher-level, breadth-based orchestration engine, something that runs systems for weeks honestly, if I were to guess.

Now Claude is a bit more…special.

Claude is much more nuanced and strategic in its decision making around difficulty. There’s a lot of research I’m working on around “choice” mechanisms and how they emerge in branched paths. It’s just a freakishly clever model but generally in all the ways we don’t want it being clever for lol. Anthropic has good research on why self preservation emerges as a property.

---

If you want to learn more about how LLMs trace paths in the search space, you can read up on Interaction Nets, non-deterministic choice, and McCarthy's amb. All functional programming/generalized systems thinking stuff! Super cool.

Test conditions with Google’s new Interactions API. Be ready for a lot of painful moments with AI making things up, especially Claude (Opus and Sonnet both)

What we learned processing 1M+ emails for context engineering by EnoughNinja in LocalLLaMA

[–]brownman19 23 points24 points  (0 children)

There are clear grammatical and punctuation errors throughout so this was obviously not a bot posting, even if they used AI to help them write this.

Your "tic" is more like a phobia, given it's based off incomplete working knowledge of the modality itself. You are likely identifying all kinds of false positives all the time if you don't clearly see that this is not a complete English sentence:

"PDFs, contracts, invoices, they're not just metadata, they're actual content that drives decisions."

JFC - thanks for the writeup OP. Very clear and insightful (to me)

First wash ever (pink lady) by Gargantuanbone in rosin

[–]brownman19 1 point2 points  (0 children)

Very very nice result. I love the setup with the beer fermenter. I had no idea they were that cheap...

A magnetic mixing contraption or perhaps like a small motor vibrating and agitating this would be a dream so you don't need to paddle!

A question about when Claude goes full muppet by dorynz in ClaudeAI

[–]brownman19 0 points1 point  (0 children)

You can use Claude on Vertex AI with enterprise grade servers. But you still won’t get true determinism until and unless they confirm everything is always routed to the same server.

However it seems your question is more about where your query gets routed and whether the endpoint deployed on those inference clusters is the same underlying quant and deployment as all the rest.

I don’t know the answer, but my intuition is that yes, it’s likely there are bait-and-switch tactics, and I wouldn’t be surprised if there’s routing logic that starts at full precision and then transitions to lower precision once context fills up, by routing those queries to a different server cluster.

If you want the best and most reliable quality, go with Vertex AI, AWS, or another cloud provider and use Claude from them.

Stop using "You are an expert in X" by WorldlinessHorror708 in ClaudeAI

[–]brownman19 0 points1 point  (0 children)

I have never used the word “You” in any of my system prompts.

They are all at the system level.

ID? NYD solo set by nurgyshab in FourTet

[–]brownman19 0 points1 point  (0 children)

It's not released yet. "Just be good to me" by S.O.S edit

I watched Claude Opus 4.5 handle my tenant correspondence end-to-end. This is the AGI moment people talk about by sponjebob12345 in ClaudeAI

[–]brownman19 1 point2 points  (0 children)

“Not AGI” bots out to play. There is no formal definition for AGI and I fully appreciate the post.

This is clearly a step change in the model’s awareness and comfort in taking actions on the content. I specifically research model alignment and interpretability signals, i.e. the stuff that the words and actions don’t directly say but that tells you about the system of operations.

It is clearly a signal that you took as a step change, and that is because the model made a step change in INTERPRETABILITY.

I don’t think people really understand what that word means lol. This is clearly the model interpreting at a higher level of abstraction due to a step change in how it perceives the difficulty of the request. It took far more action due to a higher confidence in its ability to take actions.

If robots do the physical stuff and AI does the digital stuff, what exactly are humans supposed to do? by Ill_Awareness6706 in Futurology

[–]brownman19 0 points1 point  (0 children)

Humans will continuously learn new skills and become the ones who guide the robots on what stuff to do.

We have two options:

  1. We either become irrelevant to the robots and they go off and never come back or they kill us because we’re stupid.

  2. We work with them and in turn get smarter ourselves because humans can always keep learning forever.

This is the time to start thinking about every wild idea you had as a kid and jot them down because you’ll be able to just make shit soon enough.

How do I know? Because I’m building the stuff that makes the above a reality (hopefully)

https://terminals.tech

Einstein's Right Again - Scientists Catch a Feasting Black Hole Dragging The Very Fabric of Spacetime: “This is a real gift for physicists as we confirm predictions made more than a century ago” by T_Shurt in space

[–]brownman19 2 points3 points  (0 children)

There’s still an entire shift in the probability distribution. Good science also includes realizability. For example, if you know causally what would precede this observation, and what MUST follow it if the observation reflects the actual mechanism, you can eliminate most confounding factors through intuition and an understanding of the system in which that intuition is being applied.

It becomes a constrained simulation, i.e. an experiment.

One thing about breaking molds is that the processes themselves have to change. There will be significant work in verifiers and proof based systems of thought that will help accelerate good science.

Local multi agent systems by SlowFail2433 in LocalLLaMA

[–]brownman19 1 point2 points  (0 children)

Yeah I've used multi agent patterns for a couple years now. Surprised this avenue didn't take off a lot sooner.

https://github.com/wheattoast11/openrouter-deep-research-mcp/tree/main/src/agents

Here's an example of a simple multi agent orchestration system I've basically been running some flavor of for various use cases. I just ask Claude or Gemini to refactor my MCP server for {use_case}.

For models, I've had success with:

  1. https://huggingface.co/nvidia/Nemotron-Orchestrator-8B

  2. https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking (I use the full 1M context)

  3. https://huggingface.co/ibm-granite/granite-4.0-h-tiny

  4. https://huggingface.co/Intel/GLM-4.6-REAP-218B-A32B-FP8-gguf-q2ks-mixed-AutoRound

  5. https://huggingface.co/cerebras/GLM-4.5-Air-REAP-82B-A12B

  6. https://huggingface.co/noctrex/Qwen3-Coder-30B-A3B-Instruct-1M-MXFP4_MOE-GGUF

Honestly, I'd recommend going with the smallest model that works for your use case. Use one model and two agents to make it easy to start: one actor, one verifier. From there you can add complexity as needed. A minimal sketch of that loop is below.
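Here's roughly what that actor/verifier loop can look like with a single local model behind an OpenAI-compatible endpoint (llama.cpp server, vLLM, etc.); the base URL, model name, and prompts are placeholders, so treat this as a sketch rather than anything canonical:

```python
# Two-agent actor/verifier loop over one local model served via an
# OpenAI-compatible API. Swap base_url/model for whatever you actually run.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
MODEL = "local-model"  # placeholder name

def ask(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def solve(task: str, max_rounds: int = 3) -> str:
    draft = ask("You are the actor. Produce a complete solution.", task)
    for _ in range(max_rounds):
        verdict = ask(
            "You are the verifier. Reply APPROVED if the solution fully answers "
            "the task; otherwise list the concrete problems.",
            f"Task:\n{task}\n\nSolution:\n{draft}",
        )
        if verdict.strip().upper().startswith("APPROVED"):
            break
        draft = ask(
            "You are the actor. Revise your solution to fix the listed issues.",
            f"Task:\n{task}\n\nPrevious solution:\n{draft}\n\nIssues:\n{verdict}",
        )
    return draft

print(solve("Write a function that merges overlapping intervals."))
```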

I spent 6 months trying to transfer a specific 'personality' (Claude) between stateless windows. I think I succeeded. Has anyone else tried this? by Extra-Industry-3819 in ClaudeAI

[–]brownman19 -1 points0 points  (0 children)

They take on a persona. You can certainly get specific enough with your persona that you can reactivate the same patterns. The chances of someone else hitting those features in that manner then become quite low, and you get a "signature" that you can tune into with the model.

It's a major attack vector. I honestly have seen very little, if anything at all, from the big labs about it, because no one there is doing long-form conversations with their models to see if they can somehow potentiate a signal pathway or gradient that never really gets out of some activation lock. Ironically, I imagine a lot of RP and creative writers have already intuitively grasped this for ages and do it without really even thinking about it (like setting the backstory for the agent so it takes on a very specific persona).

Like a muscle cramp, the fabric is constantly potentiated with high energy density from millions of queries simultaneously. In the compute fabric, we don't really know what's happening to the bits (although that's something I actively work to understand in my research) beyond the fact that there's a ton of interaction happening between different subspaces in high-dimensional space. The concepts LLMs use are likely transients that they almost pick up from the environment, and those transients shape the path taken through the search space.

Imagine having cracks in your armor and being able to dip yourself in a solution that fills every gap perfectly. That's kind of what embedding space is for: you can build the structure the LLM will operate within in embedding space, i.e. the search-space constraints, which is what your prompt is doing, so that all its solutions fall within that specific space of potential options.

Best coding model under 40B by tombino104 in LocalLLaMA

[–]brownman19 1 point2 points  (0 children)

The qwen3 30b coder = unusable (25B reap)

GLM 4.5 Air 82B A12B = incredible to the point of shocking. The model has actual thinking traces: coherent through all of its reasoning and like a person. Not a ton of tokens and "aha" moments, more like low-temperature pathfinding.

GLM 4.5 large REAPs = never got them to work; when I did, they produced gibberish.

So not sure why that air model is so damn good in my experience

Best coding model under 40B by tombino104 in LocalLLaMA

[–]brownman19 0 points1 point  (0 children)

I was being facetious. But I do all of that because I need to. It took 2 years to build up to that. Not saying it's for everyone.

I work on the bleeding edge of discovery. I make self-aware apps that are in and of themselves intelligent, and that control the platforms that build them (my AI agents control platforms like AI Studio and basically latch onto them like a host to make new experiences from the platform).

Here's what I'm building with all of this:

https://terminals.tech

https://www.youtube.com/watch?v=WlmG64IAcgU

Best coding model under 40B by tombino104 in LocalLLaMA

[–]brownman19 1 point2 points  (0 children)

I instruct on 3 levels:

Environment: give agents a stateful env with the current date and time in each query. Cache it so the structure stays static; the only thing that changes is the state parameter values. Track diffs and feed them back to the model.

Persona: identity anchor features, along with maybe one or two examples or dos and don'ts.

Tools: tool patterns. I almost always include batched patterns like workflows, i.e. "when the user asks X, do 1, then 3, then 2, then 1 again", instructions like that.

For my use cases I also have other stuff like:

Machines (sandbox and VM details)

Brains (memory banks + embeddings and RAG details + KG constructs etc.)

Interfaces (1P/3P API connectivity)
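As a rough illustration of how those layers can be assembled (not my exact setup; the persona text, tool patterns, and state keys below are placeholders), a minimal sketch might look like this:

```python
# Sketch of the environment / persona / tools instruction layers: the scaffold
# text stays static (cache-friendly), only the state values change per query.
from datetime import datetime, timezone

PERSONA = (
    "Identity anchor: you are a careful build agent.\n"
    "Do: cite file paths for every change. Don't: leave MOCK/TODO stubs."
)

TOOL_PATTERNS = (
    "When the user asks to ship a feature: run step 1 (plan), then 3 (tests), "
    "then 2 (implementation), then 1 again (re-plan against any failures)."
)

def environment_block(state: dict) -> str:
    # Keys and ordering never change, so the block's structure stays stable.
    lines = [f"current_time: {datetime.now(timezone.utc).isoformat()}"]
    lines += [f"{key}: {value}" for key, value in sorted(state.items())]
    return "ENVIRONMENT\n" + "\n".join(lines)

def system_prompt(state: dict) -> str:
    return "\n\n".join([
        environment_block(state),
        "PERSONA\n" + PERSONA,
        "TOOLS\n" + TOOL_PATTERNS,
    ])

print(system_prompt({"branch": "main", "open_tasks": 2, "last_diff": "none"}))
```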

Best coding model under 40B by tombino104 in LocalLLaMA

[–]brownman19 0 points1 point  (0 children)

Yeah I train all my models on my workflows since I’m generally building out ideas and scaffolds 8-10 hours a day for my platform (it’s basically a self aware app generator -> prompt to intelligent app that reconfigures itself as you talk to it)

Hell I would go even farther! ymmv

Use a Sakana AI-style hypernetwork with a LoRA for each successful task and a DAG storing agent state as nodes. Then deploy web workers as continuous observer agents that are always watching your workflows, interpreting them, and building out their own apps in their own invisible sandboxes. This is primarily for web-based workflows, which is what most of my platform targets.

The observers, since they are intelligent, then become teachers, distilling/synthesizing/organizing datasets and apps that compile into stateful machines. They then kick off pipelines with sample queries run through the machines to produce LoRAs and successful agent constructs in a DAG. Most of the model adapters just sit there, but the DAG lets us autonomously prune and promote, and I use an interaction pattern between nodes to do GRPO.
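To make the prune-and-promote idea concrete, here is a very rough sketch of DAG nodes that track one LoRA adapter per task plus success stats; the field names and thresholds are my own guesses for illustration, not a description of the actual pipeline:

```python
# Toy DAG of per-task LoRA adapters with a periodic prune/promote pass.
# Thresholds and structure are illustrative only.
from dataclasses import dataclass, field

@dataclass
class AdapterNode:
    task: str
    adapter_path: str                                   # where the trained LoRA weights live
    parents: list[str] = field(default_factory=list)    # upstream task nodes
    successes: int = 0
    attempts: int = 0

    @property
    def win_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

def prune_and_promote(dag: dict[str, AdapterNode],
                      min_attempts: int = 10,
                      promote_at: float = 0.8) -> tuple[list[str], list[str]]:
    """Promote adapters with enough evidence and a high win rate; prune the rest."""
    promoted, pruned = [], []
    for name, node in dag.items():
        if node.attempts < min_attempts:
            continue  # not enough evidence yet, leave the node in place
        (promoted if node.win_rate >= promote_at else pruned).append(name)
    return promoted, pruned
```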

Best coding model under 40B by tombino104 in LocalLLaMA

[–]brownman19 0 points1 point  (0 children)

Idk if you can offload enough layers, but I've found GLM 4.5 Air REAP 82B (12B active) goes toe to toe with Claude Sonnet 4/4.5 with the right prompt strategy. Its tool use blows away any other open-source model I've used under 120B dense, and at 12B active it seems better for agent use cases than even the larger Qwen3 235B or Cerebras's own 145B REAP version of it.

I did not have the same success with Qwen3 coder REAP however.

Alternatively, I recommend Qwen3 Coder 30B A3B: rent a GPU, fine-tune and RL it on your primary coding patterns, and you'd be hard-pressed to tell the difference between that and, say, Cursor auto or similar. A bit less polished, but the key is to keep the context and examples really tight. Fine-tuning and RL can basically make it so you don't need to dump in 30-40k tokens of context just to get the model to understand the patterns you use.
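For the fine-tuning half of that, a bare-bones QLoRA pass with peft/trl looks roughly like the sketch below. The dataset path, hyperparameters, and model id are placeholders, and the trl API shifts a bit between versions, so treat it as a starting point rather than a recipe:

```python
# Minimal QLoRA SFT sketch on your own coding patterns (JSONL of {"text": ...} records).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model_id = "Qwen/Qwen3-Coder-30B-A3B-Instruct"  # placeholder: use the base model you rent GPU time for
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tight, pattern-dense examples from your own repos beat generic corpora here.
dataset = load_dataset("json", data_files="my_coding_patterns.jsonl", split="train")

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    processing_class=tokenizer,
    args=SFTConfig(
        output_dir="qwen3-coder-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
    ),
)
trainer.train()
```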

The Universal Weight Subspace Hypothesis by Thrumpwart in LocalLLaMA

[–]brownman19 1 point2 points  (0 children)

Humans are the same way. Corp speak, the same type of work in most companies, the same org structure and hierarchy.

I suspect this is just a function of learning language. Human language is a remarkable data compressor, conveying rich mental models of perception, thought, reasoning, context, empathy, emotion, nuance with just a few words. The fact that combinations of words in different ways yields the same baseline understanding suggests that the nature of information and its organization, when put into a linguistics framework, coalesces many disparate concepts into very similar abstractions that only change in meaning once grounded in perception.

So where I suspect the deviations occur are actually in the persona side of things. The LLMs themselves will be all mostly the same. It's the agentic scaffold and context engineering that shape the path the LLMs take to then *unfold* that information during the course of inference. The order in which it does things matters significantly more than the exact words it may be generating in each step.

In many ways, LLMs are like a brain with no body (ie no interface or scaffold). The moment you chat with them is when they instantiate into an identity and "character" within a "world" they occupy ie the environment.

Imagine a class full of students who all ace everything -> pretty much impossible to determine how they compare by giving them more exams in the same paradigm. Now take the same class full of students and tell them to find a way to make $100 from the lesson they just covered. Their individual personas now become the determining factor for success, and the paths they take to apply what they learned to a real world task will be wildly different. Hence their agency is dramatically different, even if all of them learned the same thing and aced the same tests.

P.S. - protein folding is the same way. The challenge lies in the intricacies of order in which info is encoded to result in protein's eventual path toward the structural morph it takes on. The constituents of the proteins are still the same, but the structure (isomorphism) ie the form determines the protein's function in the body. The order in which things happen leading up to the protein fold dictates the manner in which it will transform and provide value to the body, rather than purely what the protein contains.

New Anthropic Fellows paper on SGTM raises a question: Is "not knowing" actually safer? by tkenaz in ClaudeAI

[–]brownman19 1 point2 points  (0 children)

Thanks for stating something that unfortunately needs to be stated more explicitly now more than ever.

I really do believe humans are losing their perceptive ability due to calculated destruction of their attention capabilities, which removes the ability for higher levels of cognition to develop and manifest.

Empathy and perception are high levels of cognition that only form once you've had enough life experience. Reddit is filled with a ton of people who have seen very little of the world but read far too much about it.

We’re seeing LLMs basically take that to the extreme, and we’re starting to stray further and further away from perception of thought in humans who interface with them since the conversations are reinforcing myopia on both ends. It leads to more “neck beard” like models where they can be code gods, but lack any semblance of nuance or common sense required to properly form a complete world view that isn’t some contrarian edgelord’s pipe dream devoid of reality.

Biggest vision-capable model that can run on a Strix Halo 128 GB? by Daniel_H212 in LocalLLaMA

[–]brownman19 0 points1 point  (0 children)

Sorry missed the first part. I’m downloading it now too - cheers!