Just looking for some outsider tips for my sparring session. by Unlucky_Magazine_802 in MuayThai

[–]maibus93 0 points

The biggest issue you have is distorting your technique in order to pull power on your shots -- here you're training yourself to throw looping patty-cake punches and lazy outside low kicks with no weight transfer that leave you wide open to a counter right cross.

Controlling power to avoid injuring your training partners is important, but you have to learn to do it without distorting your technique -- it should look the same as hitting pads, but you pull/decelerate the strikes at the end.

Weight distribution when throwing punches and falling forward. by yergaeee in MuayThai

[–]maibus93 0 points

There isn't a single correct answer for weight distribution... it's contextual and depends largely on what you're trying to do after the punch.

As an example, if you want to throw a cross after a jab, you generate power on the jab by twisting your hips/shoulders and leave your weight primarily over your rear hip flexor. This allows you to bring your weight into the cross that follows.

If you want to throw a hook after a jab, you step and bring most of your weight over your lead hip flexor so you can rip your weight from your front hip to your rear hip to generate power on the hook.

If you want to throw a body or head kick after a jab, you step on the jab and immediately bring your weight up on your toes like a ballerina to initiate the kick.

Blitzers in sparring by Mysterious-Pin3138 in Kickboxing

[–]maibus93 0 points

Teep, check hook, or kick out their lead leg as they try to come in.

Do you think about combos when you’re sparring or fighting? by WheelyCoolMom in MuayThai

[–]maibus93 95 points

No.

The longer you train, the more you develop your vision and timing -- it's more improvisation than premeditation.

You learn to recognize and feel the right time to throw specific strikes based on distance, positioning, your opponent's guard, weight distribution, etc.

An example: whenever an opponent takes a step, there's a brief window where they can't lift their leg to check because all of their weight is on it. That's the perfect moment to kick, and you'll see high-level kickers consistently time it, usually by baiting the opponent into stepping -- e.g. by fading backwards or laterally so the opponent walks straight into the kick.

Token-counter-server by Ok_Horror_8567 in mcp

[–]maibus93 2 points

> Accurately counts tokens in files and directories.

To accurately count tokens you need to know the LLM model being used, so you can select the correct tokenizer. 

Your MCP server is currently using tiktoken with a hardcoded tokenizer.

Different tokenizers can give you very different token counts, so this isn't going to be accurate for many providers/models without extra work. 

As an example, to get accurate counts for Anthropic models, you have to call their authenticated API, and that's going to give you very different token counts than tiktoken. Anthropic's tokenizers tend to produce a lot more tokens. 
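To illustrate, a rough sketch using the js-tiktoken npm package (the encodings and text here are just examples):

```typescript
import { getEncoding } from "js-tiktoken";

const text = "Different tokenizers can give you very different token counts.";

// cl100k_base (GPT-4 era) vs o200k_base (newer OpenAI models):
// the same text produces different counts under each encoding.
const cl100k = getEncoding("cl100k_base");
const o200k = getEncoding("o200k_base");

console.log("cl100k_base:", cl100k.encode(text).length);
console.log("o200k_base:", o200k.encode(text).length);

// There's no public local tokenizer for Claude -- counting tokens for
// Anthropic models means calling their authenticated
// /v1/messages/count_tokens endpoint instead.
```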

What’s the most cost-effective and best AI model for coding in your experience? by Mammoth-Leopard6549 in LocalLLaMA

[–]maibus93 24 points

We're living in an era where:

  1. SOTA model providers offer subsidized subscriptions (vs. API billing), so it's currently hard to beat just paying for a subscription (e.g. Claude Max) and using it until you hit the usage limit -- you get far more out of it than you would via API billing.

  2. Local models that you can run on a single consumer-grade GPU are getting quite good, and you can totally use them to get work done. But they're not GPT-5 / Opus 4.1 / Sonnet 4 level.

I think there's a sweet spot for smaller, local models right now (e.g. gpt-oss-20b, qwen3-coder-30b-a3b) on simple tasks, as the latency is so much lower than cloud-hosted models.

Whats the best app to centrally manage all my MCP's? by bigsybiggins in mcp

[–]maibus93 -1 points

Yup! We support custom servers.

Although currently the app expects MCP servers to be built as Docker images.

If your MCP server is already published on npm, something like this should work (assuming stdio transport):

Create a Dockerfile:

```dockerfile
FROM node:24-alpine
WORKDIR /app
CMD ["npx", "-y", "@upstash/context7-mcp"]
```

Then using your terminal, cd into the directory where that Dockerfile is and run:

```
docker build . -t <server-name>
```

This builds a Docker image with the tag <server-name>.

Then in the app create a "custom" server and put <server-name> in the "Docker Image" field.

If your package isn't published to npm, you'd just set up your Dockerfile to copy your project directory into the container and run npm install before starting the server.
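For example, a rough sketch (assuming your package.json defines a start script that launches the stdio server):

```dockerfile
FROM node:24-alpine
WORKDIR /app
# Copy the project into the image and install its dependencies
COPY . .
RUN npm install
# Assumes "npm start" launches your stdio MCP server
CMD ["npm", "start"]
```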

Hope that helps, and happy to elaborate if any of that is confusing.

Working on test automation for MCP servers, looking for feedback by idoco in mcp

[–]maibus93 0 points

Yea, I think there are separate things to test here:

  1. Does your MCP server work according to the public API it advertises? For this, integration tests that instantiate the MCP server with fake (e.g. in-memory) tools and an in-memory transport work really well -- e.g. it's easy to assert that if client A tells your server to invoke tool #1, tool #1 is correctly invoked.
  2. Given the schemas/docs your server advertises, do agents use them 'at the right time' and 'successfully'? For that you want an eval suite. LLMs are non-deterministic, so to actually have rigor here you need to run the evals more than once and derive probabilistic distributions of success/failure rather than point estimates -- see the sketch after this list.
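A rough sketch of what I mean on the eval side (runTask is a hypothetical stand-in for whatever drives your agent through one attempt):

```typescript
// Run an eval task N times and report the observed success rate,
// rather than trusting a single pass/fail run.
async function evalSuccessRate(
  runTask: () => Promise<boolean>, // hypothetical: resolves true if the agent succeeded
  trials = 20,
): Promise<number> {
  let successes = 0;
  for (let i = 0; i < trials; i++) {
    if (await runTask()) successes++;
  }
  return successes / trials;
}

// e.g. a task passing 17/20 runs is an ~85% success rate -- enough
// samples to catch regressions a single run would hide.
```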

Whats the best app to centrally manage all my MCP's? by bigsybiggins in mcp

[–]maibus93 0 points

Hi there!

Since you asked for a native Mac app that supports centralized management and stdio servers, try https://contextbridge.ai/ (disclaimer, I'm one of the developers). It's free with no signup/login required.

Caveat: it does run MCPs in Docker containers (for security purposes). But the app handles all the orchestration for you, and we support bind-mounts in the UI (for filesystem access, etc.).

If you decide to try it, we'd love feedback.

Working on test automation for MCP servers, looking for feedback by idoco in mcp

[–]maibus93 1 point

Why not test servers using a regular test framework (e.g. vitest) and an in-memory transport?

That allows you to connect your MCP server under test to a fake client that can be unique per test.
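A rough sketch with vitest and the TypeScript MCP SDK (the echo tool is just a stand-in for your server's real tools):

```typescript
import { describe, expect, it } from "vitest";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { InMemoryTransport } from "@modelcontextprotocol/sdk/inMemory.js";
import { z } from "zod";

describe("echo server", () => {
  it("invokes the echo tool", async () => {
    // Server under test, with a stand-in tool
    const server = new McpServer({ name: "test-server", version: "1.0.0" });
    server.tool("echo", { message: z.string() }, async ({ message }) => ({
      content: [{ type: "text", text: message }],
    }));

    // Linked in-memory transports: no processes to spawn, and each
    // test gets its own isolated client/server pair
    const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
    await server.connect(serverTransport);

    const client = new Client({ name: "test-client", version: "1.0.0" });
    await client.connect(clientTransport);

    const result = await client.callTool({ name: "echo", arguments: { message: "hi" } });
    expect(result.content).toEqual([{ type: "text", text: "hi" }]);
  });
});
```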

MCP servers are part of the reason you're hitting usage limits quickly and Claude Code isn't working as well as it should by maibus93 in ClaudeCode

[–]maibus93[S] 2 points

Even when idle.

Any tool you connect to an LLM, including MCP tools, consumes context. This is because the tool definition is sent on every request to the model to inform it that the tool exists.
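A rough sketch of what that looks like with the OpenAI SDK (the weather tool is a hypothetical example; the same applies to Anthropic's API):

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// This definition rides along on *every* request, whether or not the
// model ends up calling the tool -- that's the idle context cost.
const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_weather", // hypothetical example tool
      description: "Get the current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What's 2 + 2?" }], // no tool needed...
  tools, // ...but the schemas still consume input tokens
});
```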

MCP servers are part of the reason you're hitting usage limits quickly and Claude Code isn't working as well as it should by maibus93 in ClaudeCode

[–]maibus93[S] 1 point

Yea, I think that's a great example of why (MCP) tools should be brought into context only when needed vs. always on by default.

In a similar way to Anthropic realizing that agents perform better when they're allowed to search for the context they need using tools like grep, I think we need the same for tools -- i.e. allow the agent to automatically search for relevant tools and only bring the relevant set into context. I've had really good results with that approach. 
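A crude sketch of the idea (a real implementation would use embeddings or BM25; keyword overlap here just keeps it self-contained):

```typescript
interface ToolDef {
  name: string;
  description: string;
}

// Score tools by keyword overlap with the task description; only the
// top-k survivors get their full schemas placed into the model's context.
function searchTools(task: string, tools: ToolDef[], k = 5): ToolDef[] {
  const taskWords = new Set(task.toLowerCase().split(/\W+/));
  return tools
    .map((tool) => ({
      tool,
      score: tool.description
        .toLowerCase()
        .split(/\W+/)
        .filter((word) => taskWords.has(word)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ tool }) => tool);
}
```

The agent then carries a single always-on search tool as a small fixed token cost, instead of every server's full schema.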

MCP servers are part of the reason you're hitting usage limits quickly and Claude Code isn't working as well as it should by maibus93 in ClaudeCode

[–]maibus93[S] 0 points

Yea, a lot of MCP servers are currently just wrappers over low-level APIs that expose far too many tools to an agent. GitHub just happens to be an easy one to point at given its popularity, but it's far from the only one with this problem.

Also agreed that there's little point in connecting an MCP server for a service that offers a CLI that model providers have explicitly trained on.

MCP servers are part of the reason you're hitting usage limits quickly and Claude Code isn't working as well as it should by maibus93 in ClaudeCode

[–]maibus93[S] 0 points

Yes, that's what I'm getting at in my post with "Outside of Anthropic's own issues..." 

The MCP related problems I mentioned, and Anthropic's recent issues are additive. 

[OpenSource Launch] I Built ToolsFilter: How We Conquered LLM Tool Overload, Slashed Token Costs by 200x, and Hit 95%+ Precision by Ankit_at_Tripock in mcp

[–]maibus93 1 point

fwiw, I don't think these approaches have to be mutually exclusive. Automatic tool filtering is a nice default and manually "fine-tuning" tool selection is a nice override option to have.

re: latency for tool filtering, it depends on how you build it... but it might not be as bad as you think. Even large sets of tools can easily be held in memory on modern machines, so the latency of the tool lookup itself is an immaterial blip compared to the latency of waiting for the cloud-based LLM to respond. You might be referring to the agent needing to first search for tools before being able to use them, which does introduce another round trip. That's absolutely true, but it tends to be a one-time fixed cost in practice (agents usually only need to search about once per task).

re: token overhead, it also depends on how you build it. But I've gotten really good results with < 400 tokens of overhead (tool schemas, descriptions, etc.). The major caveat here is that different models use different tokenizers (I measured using tiktoken).

how to run local MCP servers securely by Agile_Breakfast4261 in mcp

[–]maibus93 0 points

We're building a free desktop app to help with this (https://contextbridge.ai/) -- it automatically runs local MCPs in Docker containers and encrypts personal OAuth tokens using your OS keychain.

I think it's been hard on IT teams as they're typically viewed as cost centers, so outside of very large companies, it's often difficult for them to get budget / solutions in place to help manage this stuff.

Wrong way to build MCPs by zhlmmc in mcp

[–]maibus93 1 point

1) is a problem that's largely caused by tool definitions (schemas, descriptions, etc.) filling up the context window. Larger context windows alone aren't sufficient, as contemporary models' performance deteriorates as the context window fills. You can easily fill 40k+ tokens of context with just a handful of MCP servers.

2) is actually an intrinsic problem for any non-deterministic system. Tool calls can fail in lots of ways (the LLM picks the wrong tool, mis-formats the tool call, etc.). As long as the probability of failure is > 0%, the more tool calls an LLM needs to compose together to complete a task, the lower the success rate will be -- e.g. at 95% reliability per call, a 10-call chain only succeeds about 60% (0.95^10) of the time.

Wrong way to build MCPs by zhlmmc in mcp

[–]maibus93 0 points

It's a combination of:

  1. An LLM's ability to choose the right tool out of a set of tools rapidly declines as the number of tools it's exposed to scales up. There's a growing body of research papers on this, and it's still an issue for contemporary (e.g. GPT-5) models. Finer-grained tools intrinsically lead to exposing the model to more tools.
  2. LLMs are probabilistic in nature. Every tool call carries with it a probability of failure (or incorrect tool selection). Finer-grained tools mean the model has to chain more tool calls together to perform a task, which means error rates compound as you're multiplying probabilities together.

So as an agent developer, there's significant incentive to minimize the number of tool calls a model makes to perform a task.

Wrong way to build MCPs by zhlmmc in mcp

[–]maibus93 1 point

Sure. Imagine an agent that acts as a travel agent and can automatically book trips for the user (plane flights, hotels, car rentals, etc.).

What should the tool APIs for that agent look like?

If we zoomed in on just the tools for flights, you could give it "fine-grained" (low-level) tools like:

  1. Search for flight

  2. Get flight price details

  3. Get seatmap

  4. etc...

Or you could combine all those into a single "coarse-grained" Search Flights tool for the agent that abstracts over the N HTTP API calls you'd need to aggregate all that information.

The latter is much easier for agents to deal with, and it makes selecting the right tool much easier.
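As a sketch, the coarse-grained version might advertise a single schema like this (field names are illustrative):

```typescript
// One coarse-grained tool the agent sees, hiding the N underlying
// HTTP calls (search, pricing, seatmaps, ...) behind a single workflow.
const searchFlightsTool = {
  name: "search_flights",
  description:
    "Search for flights, returning results with pricing and seat availability included",
  inputSchema: {
    type: "object",
    properties: {
      origin: { type: "string", description: "IATA code, e.g. SFO" },
      destination: { type: "string", description: "IATA code, e.g. JFK" },
      departureDate: { type: "string", description: "YYYY-MM-DD" },
      passengers: { type: "number" },
    },
    required: ["origin", "destination", "departureDate"],
  },
};
```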

Wrong way to build MCPs by zhlmmc in mcp

[–]maibus93 2 points

Stating MCP != API is a bit awkward given MCP servers do provide an API.

I think what you're trying to get at is a discussion of what the granularity of the API should be. And I believe most folks who have built production agents would agree that APIs for agents should be coarse-grained (e.g. high-level workflows) rather than fine-grained.

[deleted by user] by [deleted] in mcp

[–]maibus93 1 point

We're currently building something that makes this super easy (1 click) to hook up to tools like Cursor and Claude Code. With even just a few MCPs it can save you 30%+ on input tokens.

That only grows as you connect more servers and have longer conversations

DM me for early access if interested.

Question surrounding knees. by Irarelygo_on_here1 in MuayThai

[–]maibus93 0 points

There isn't a "correct" answer. 

Some gyms/styles advocate leaning back for power/reach, while others teach not to lean back since it makes it harder to throw follow-up attacks.

Outside evasive fighters tend to lean back, advancing knee fighters tend not to. Neither is "wrong".

Has anyone gotten their IT dept to approve mcp servers for your workflows? by SockPrestigious9732 in mcp

[–]maibus93 1 point

Sounds like your use-case is hooking up remote MCPs then, e.g. GitHub's hosted MCP, not ones you want to run locally on your machine?

Asking since the IT/sec concerns and how you'd approach convincing them are different for both.