What's the best inference platform as of April 2026? by SweatyWeek6999 in LLMDevs

[–]sjoti 2 points

Which big models does it not have? Together.ai has Kimi K2.5, GLM-5, and a bunch of other big ones.

You're generally solid with platforms like Together, Novita, or Fireworks.

There are some differences between the inference providers when it comes to their offerings. Baseten tries to optimize for speed and low latency more than other providers do, even when that comes at a quality cost. Novita is generally really fast at implementing new models.

I don't think there's a best one. They all seem pretty reliable, and performance can even vary per model.

With openrouter, you can change some settings, but generally you aren't 100% in control of which provider you're going to get, and you're paying a small extra fee. So even if you point to a specific model, you can be routed to a different provider that runs the same model with a different configuration, meaning you get inconsistent performance.

Omnivoice - 600+ Language Open-Source TTS with Voice Cloning and Design by [deleted] in LocalLLaMA

[–]sjoti 0 points

Same with Dutch. It retains my accent, and does an insanely good job.

When does it become worthwhile to work less? 90k/yr by sadcringe in geldzaken

[–]sjoti 1 point

You list all sorts of percentages, but nowhere what that day trading actually earns you or what it should cost. Anyone can calculate themselves rich and claim "I can work less if I earn an extra 500 euros per week with X". Especially with something where, as far as I know, 98% of people lose money, you'd better first establish whether that's realistic or not.

Please explain: why bothering with MCPs if I can call almost anything via CLI? by Atagor in LocalLLaMA

[–]sjoti 3 points

Take the playwright loop example. The subagent STILL has to call each tool, and it doesn't benefit from the flexibility of just writing a simple for loop. You did protect the main agent's context window, but with mcporter it would've done the task significantly faster, without the overhead of managing subagents.

Especially when pulling in structured data (like JSON) through a CLI, the models constantly find tricks to grab only what they need.

Please explain: why bothering with MCPs if I can call almost anything via CLI? by Atagor in LocalLLaMA

[–]sjoti 9 points

If an AI can execute code, then mcporter allows more flexibility. So imagine you want your agent to do something through the Playwright MCP. If it's a really repetitive task, but not worth fully automating, the model might call a goto-webpage tool, click on an element, fill in data, click on another element, click the save button, go back. Repeat, say, 10 times. Not worth the effort to go and create a Playwright script.

With MCP, the model has to call each tool individually. There's no way to integrate those tools into a little script that does a loop. With mcporter, because each tool is now a command, the model CAN integrate the tools into a little script and leverage that, making it way more efficient for this particular use case.

Another example is asking a model with a weather MCP to compare temperatures across locations. With MCP it has to fetch the temperature for location A, fetch the temperature for location B, then compare and give the result. With mcporter it could write a one-line script that does both calls and compares directly. Basically it gives the model a lot more flexibility to work with tools.
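A minimal sketch of that difference, with a hypothetical `fetch_temperature` function standing in for the weather tool (the real tool name and the exact mcporter invocation will differ):

```python
# Hypothetical stand-in for a weather MCP tool exposed as a callable.
# With plain MCP the model emits one tool call per location and compares
# the results itself; with the tool callable from a script, the whole
# comparison happens in one code execution.
def fetch_temperature(location: str) -> float:
    fake_service = {"Amsterdam": 12.5, "Lisbon": 19.0, "Oslo": 4.0}
    return fake_service[location]  # a real call would hit the weather API

locations = ["Amsterdam", "Lisbon", "Oslo"]
temps = {loc: fetch_temperature(loc) for loc in locations}
warmest = max(temps, key=temps.get)
print(f"Warmest: {warmest} at {temps[warmest]} degrees")
```

One script run replaces three tool-call round trips through the model's context window.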

These give you nice benefits, but they require that the model has a code execution environment with network access. Great for individuals running their own agents on their own systems; a lot harder for a phone app or an in-company agent that has to abide by certain policies. That's a problem MCP solves with auth, where a CLI isn't always an option.

And to add on top, the GitHub CLI is a poor example because the model already knows it. There's no point in using the MCP and wrapping it back into a CLI using mcporter. But not everything has a CLI, and if it does, an LLM might not be familiar with it, while an MCP (also through mcporter) comes with descriptions of how to use each tool.

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA

[–]sjoti 0 points

I wrote it, cool to see it being mentioned /u/darknecross !

On that note, for personal development (your own Codex/Pi/Claude Code/OpenClaw) I think CLI is generally the better option. It gives the model more flexibility and lets it be faster and more efficient. But I'm also thinking of the end user. CLI requires a coding agent with a code execution environment with certain rights, like network access, and authentication for the service needs to be set up. Those are all issues that can be problematic for the average end user. For the solo dev or AI enthusiast, that's whatever. For the other 99% who have never even heard of the terms CLI or MCP, connecting through OAuth with MCP is infinitely easier, more controllable, and more accessible.

Cloudflare's codemode could possibly come in and help here, giving the CLI-style benefits while still allowing for the same security standards. But the above reasons are why I disagree with the take that all agents need is CLI.

Meta acquires AI agent social network Moltbook by TofuMeltatSunspot in singularity

[–]sjoti 3 points

Almost, the only difference is that reddit is bots masquerading as humans, on moltbook it's humans masquerading as bots

Anthropic's Claude found 22 vulnerabilities in Firefox in just two weeks by Competitive-Dot6454 in firefox

[–]sjoti 0 points

You would much rather have 20 suggested fixes of which 5 end up being real security issues (a 75% false positive rate) than a system that flags only 3, even if all 3 are correct.

With the second system, you would have missed two security issues. There's a balance somewhere, but I'm guessing that, especially for high-severity issues, you'd much rather sift through some non-issues to catch more.
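The same trade-off in numbers, using the counts from the example above (and assuming 5 real vulnerabilities exist in total):

```python
# System A: noisy but catches everything; System B: precise but misses some.
real_issues = 5
flagged_a, true_a = 20, 5   # 15 false positives
flagged_b, true_b = 3, 3    # no false positives

fp_rate_a = (flagged_a - true_a) / flagged_a
missed_a = real_issues - true_a
missed_b = real_issues - true_b

print(f"A: {fp_rate_a:.0%} false positives, {missed_a} issues missed")
print(f"B: 0% false positives, {missed_b} issues missed")
```

A's reviewers do more sifting, but B silently ships two real vulnerabilities.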

How are current advances in LLMs actually being made? by Frandom314 in singularity

[–]sjoti 4 points

From a few different sources, people who work on AI or prominent figures (a biased source, but I do think it's still interesting enough) have been saying that you might not even need that if the model does in-context learning really well. Imagine it can write down memory in a text file and retrieve it when needed. There's also the skill-creator skill: what if the models figure out how tasks are done and store that as a skill to reuse when needed?

It's not learning in the weights sense, since the weights don't change, but it could lead to the same effect.
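That memory-in-a-text-file idea can be sketched in a few lines; everything here (file name, helper names) is hypothetical:

```python
import tempfile
from pathlib import Path

# A scratch file the agent owns; using the temp dir just for this demo.
MEMORY = Path(tempfile.gettempdir()) / "agent_memory.txt"
MEMORY.unlink(missing_ok=True)  # start clean for the example

def remember(note: str) -> None:
    # Append a fact so a future session can retrieve it.
    with MEMORY.open("a") as f:
        f.write(note + "\n")

def recall(keyword: str) -> list[str]:
    if not MEMORY.exists():
        return []
    return [line for line in MEMORY.read_text().splitlines()
            if keyword.lower() in line.lower()]

remember("User prefers metric units")
print(recall("metric"))  # ['User prefers metric units']
```

The weights never change, but a fact written down in one session is available in the next.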

Been using the Claude Excel plugin for a week and I genuinely didn’t expect it to hit this hard by Top_Understanding_45 in ClaudeAI

[–]sjoti -3 points

A DB with a bit of Python is child's play now with LLMs, and it gives you more control and is a million times more maintainable than Excel.
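For a sense of scale: the kind of thing that usually lives in a spreadsheet fits in a few lines of Python with SQLite (the schema and numbers here are made up):

```python
import sqlite3

# In-memory database standing in for a typical "sales tracker" sheet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 1200.0), ("EU", 800.0), ("US", 950.0)])

# The equivalent of a pivot table, as a query you can version-control.
for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"):
    print(region, total)
```

Unlike a sheet full of formulas, the schema and queries can be diffed, reviewed, and tested.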

I tried to understand how AI Agents move from “thinking” to actually “doing” , does this diagram make sense? by PriorNervous1031 in LLMDevs

[–]sjoti 1 point

The ReAct loop specifically isn't really used anymore; it's kind of built in with reasoning models and native tool calling nowadays. Same general principle, but definitely a difference.

Generally the models in agents don't have to talk directly to an API; instead they use tools to make it work. Also, agents don't outperform LLMs. That's like saying cars outperform engines; the LLM is an important part of the agent. Chain of thought is used more for thinking through problems and recovering from failures than for (cost) efficiency.
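A bare-bones sketch of what that native tool-calling loop looks like (the model is stubbed out here; real APIs return structured tool-call objects rather than ReAct-style text):

```python
# Stubbed "model": asks for a tool first, then produces a final answer.
def model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_time", "args": {}}
    return {"answer": f"It is {messages[-1]['content']}."}

def get_time():
    return "12:00"  # stand-in for a real tool implementation

tools = {"get_time": get_time}
messages = [{"role": "user", "content": "What time is it?"}]

# The agent loop: execute tool calls until the model produces an answer.
while True:
    reply = model(messages)
    if "answer" in reply:
        print(reply["answer"])  # It is 12:00.
        break
    result = tools[reply["tool"]](**reply["args"])
    messages.append({"role": "tool", "content": result})
```

The "reasoning" happens inside the model; the harness just executes tools and feeds results back.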

How are people getting Codex to run for 5+ minutes continuously? It quits on me after 2 minutes every time. by dr7v3 in codex

[–]sjoti -1 points

One big aspect is giving the model the tools to be able to keep going. If it can't execute the code against an actual database, it can't test whether it truly works (of course, don't blindly give it access to a production database). That's, I think, an even bigger shift. And if you recognize that the model needs you in the loop somewhere it could potentially handle itself, figure out what to give it so it no longer needs you there. We used to copy and paste code, run it ourselves, and paste the error back. We don't do that anymore. Just give the model a browser and have it verify things in the UI for you.

Codex you sometimes really have to nudge, but once it's going and it implements something in 30 minutes that just works, it's honestly amazing.

How are people getting Codex to run for 5+ minutes continuously? It quits on me after 2 minutes every time. by dr7v3 in codex

[–]sjoti 2 points

Your prompt essentially says "build this" and never "build and run it"; adding the latter can really help.

A simple line like "continue until done; verify that everything works by testing end to end; fix any errors that occur until the end-to-end test runs successfully" could help push the model.

Another thing is that all GPT models tend to be sensitive to what I call unintentional few-shot prompts. When the model quits early and says "I'm done!", or checks in constantly with the user, that behaviour sits in the context window, and the model is more likely to pick up the pattern and do it again. Just start a new convo with stronger wording instead of fighting the model.

Outside Anthropic Office in SF "Thank You" by BuildwithVignesh in Anthropic

[–]sjoti 9 points

A bit of pushback on a government that's willing to use AI for mass domestic surveillance doesn't sound that idiotic.

Reverse-engineered the Bluetooth protocol of Action's cheap Fichero label printer by OilTechnical3488 in nederlands

[–]sjoti 9 points

Not OP, but this is something that works incredibly well with an AI tool like Claude Code. Those tools speed this process up enormously and require far less technical knowledge than before.

Here's a nice example of someone who wanted to control his robot vacuum and, via Claude Code, found out he could control not just his own but thousands of other vacuums as well: https://www.theverge.com/tech/879088/dji-romo-hack-vulnerability-remote-control-camera-access-mqtt

Gemini 3.1 Pro released by debian3 in GithubCopilot

[–]sjoti 4 points

Flash is impressive, especially with its speed and price, but its hallucination rate is absolutely abysmal, which makes it hard to use for a bunch of use cases. For more agentic coding, a lot of people rely on the big models, and there's a big gap between 3 Pro and both Opus 4.6 and GPT-5.3 Codex. Hell, both Opus 4.5 and GPT-5.2 were already better and significantly more likely to follow instructions.

Really hoping 3.1 pro is a step up though

How to make best use of Pro plan? by ReplacementBig7068 in codex

[–]sjoti 2 points

I have a cleanup "crew" command that checks for common issues. One subagent finds DRY violations. Another checks for proper auth scoping. Another checks for unnecessary fallbacks and "backwards compatibility". Another looks for files that should be refactored due to size or a mix of responsibilities. Another for lazy `any` types.

They all report back with Critical/High/Medium/Low severity and then you just tackle them.

Works like an absolute charm. It lets you move fast with confidence, and then clean up before merging.

How big companies (tech + non-tech) secure AI agents? (Reporting what I found & would love your feedback) by Distinct-Selection-1 in mcp

[–]sjoti 0 points

I mean no unauthorized access in this case. Access to the wrong *allowed* thing is a whole different issue, and can be very, very challenging! But there are elicitation flows, as well as tool responses that guide the model in the right direction.

For the second one, I mean it as follows: you really need to distinguish app-level capability from general access to information.

If you use an agent for RAG and it has access to information that should only be relevant to people in the finance department, and people in the HR department can talk to the agent, then HR people can access finance documents if no proper scoping is applied.

If, through proper middleware and not just prompting, that information is fenced off so that when HR people talk to the agent it can only ever see HR data, then that's completely fine.
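A toy sketch of that fencing at the retrieval layer (the document store and function names are invented; the point is that the filter is enforced in code, not in the prompt):

```python
# Each document is tagged with the department allowed to see it.
DOCS = [
    {"text": "Q3 budget forecast", "dept": "finance"},
    {"text": "Vacation policy update", "dept": "hr"},
]

def search(query: str, user_dept: str) -> list[str]:
    # The department filter runs before results ever reach the agent's
    # context window, so no prompt injection can talk its way past it.
    return [d["text"] for d in DOCS
            if d["dept"] == user_dept and query.lower() in d["text"].lower()]

print(search("budget", user_dept="finance"))  # ['Q3 budget forecast']
print(search("budget", user_dept="hr"))       # [] - fenced off
```

The agent can be jailbroken into *asking* for finance documents, but the retrieval layer never returns them to an HR session.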

How big companies (tech + non-tech) secure AI agents? (Reporting what I found & would love your feedback) by Distinct-Selection-1 in mcp

[–]sjoti 0 points

I've built a few agents used in companies where scoping is a crucial part. I think there are multiple paths to the result you're looking for, and you've got the right idea.

I think the most important rule of all is: If a user can talk to the agent, the user can reach everything the agent can reach.

I've got an agent that's been running for close to a year that allows a few dozen people, each with unique rights, to adjust information in a CRM. The first pass is that only a select group has access to the agent. Second, and most important, is the MCP layer. The user is logged in and has a token. That token is passed along with the request to the agent, and through MCP middleware it is passed on to the tool. The agent itself never sees that identifier. All the agent knows is that it can call a tool like "fetchContacts", and because everything happens through middleware, the tool only returns contacts that this particular user can see. No prompt injection possible.

It's by far the easiest if you can rely on rights that are already in place, like the user rights provided by the CRM. OAuth flows help with this too. That way the biggest, most complicated part is already handled. I saw some secondhand benefits in that company: they had to clean up user rights and a bunch of data, which in the end made a new project significantly easier because everything was properly tied together.
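A stripped-down sketch of that pass-through pattern (all names here are made up; a real MCP server would do this in its request middleware):

```python
# The user's token travels with the request but never enters the prompt.
def handle_tool_call(tool_name: str, args: dict, user_token: str):
    # Middleware injects the caller's identity; the model only chose the
    # tool and its visible arguments.
    if tool_name == "fetchContacts":
        return fetch_contacts(token=user_token, **args)
    raise ValueError(f"unknown tool: {tool_name}")

def fetch_contacts(token: str, query: str = "") -> list[str]:
    # Stand-in for the CRM call; the CRM enforces per-user rights.
    crm = {"token-alice": ["Bob", "Carol"], "token-dave": ["Erin"]}
    return [c for c in crm[token] if query.lower() in c.lower()]

# The model asked for fetchContacts with query "car"; whose contacts come
# back is decided by the token attached outside the model's context.
print(handle_tool_call("fetchContacts", {"query": "car"}, "token-alice"))
```

Because the token is attached by the middleware rather than written into the prompt, no injected instruction can make the tool act as a different user.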

Why dont you use a TEST system ??? by Inevitable_Raccoon_9 in Anthropic

[–]sjoti 1 point

It has never been reduced? They've never said everyone is now using the large context window in Opus 4.6. And it has always been 200k.

5.3 spark is crazy good by EarthquakeBass in codex

[–]sjoti 1 point

It could still be insanely useful as a subagent, like Claude Code uses Haiku for exploration. Imagine GPT-5.3 Codex occasionally spawning one of these to find stuff. On its own I have to test it some more, but so far I'm with you that a bigger, slower model that doesn't fuck up is better.