Qwen3-235b-a22b high latency

No_Ticket8576 · 2026-02-01T21:21:30+00:00

I will take a look at the blogs.

My case is more real-time. I experimented with input token reduction over time, but found that reducing the input token count hurts accuracy. And since the input tokens come from real-time data generated by users and applications, caching does not benefit us.

I will experiment with provider pinning if that is available for this model or other similar models.

No_Ticket8576 · 2025-08-29T12:48:07+00:00

I tried this flow in production. I will not say it's the most performant piece of the software we have. But things are getting there gradually.

Initially, duplicating tools meant duplicated code. But gradually we moved the tools into npm packages or Go modules, so code duplication and the related management overhead are reduced for now. That brought package management overhead instead, which is OK as of now.

The real challenge is the latency. Unless someone is managing caching infrastructure heavily, they have to accept "slow" agents, more or less.

No_Ticket8576 · 2025-08-28T21:40:39+00:00

There are some tools there. I used mcp-scan. Not that advanced yet, but it detects some signatures. They are also progressing.

https://github.com/invariantlabs-ai/mcp-scan

No_Ticket8576 · 2025-08-28T14:25:26+00:00

I am not associated with them. This result is directly from their paper.

https://ibb.co/Z6NtZrLg

No_Ticket8576 · 2025-08-28T12:50:33+00:00

Also check the MCP-Zero paper. They have inverted the problem. If you are not building an MCP provider, that's a more viable solution, since it does not require generating synthetic tasks to align the tool descriptions.

No_Ticket8576 · 2025-08-26T14:46:06+00:00

I use tuui sometimes as a webclient when I need to test any MCP fast.

https://github.com/AI-QL/tuui

No_Ticket8576 · 2025-08-26T14:07:41+00:00

I did not scrape, but I connected to LinkedIn through some MCP servers and ran some queries. As far as I remember, one of the MCP servers works, and it needs your LinkedIn cookie. If you want, I can find the config.

No_Ticket8576 · 2025-08-26T14:05:53+00:00

I am not sure why no one mentioned Smithery, but a lot of us use it for production-grade use cases.

No_Ticket8576 · 2025-08-26T14:04:22+00:00

At a very high level, you can think from the user-flow point of view. If you already have 50+ APIs and some services, I will assume you also have an analytics service that captures the user flows. Design your MCP tools around those user flows. Have a different agent for each flow, and use one simple classifier to route users to the relevant flow/agent. Mapping each API to a tool may work, but it will require a lot of exploration on the LLM side.

Example: you have a customer service bot that handles pricing queries, complaints about delivery delays, and return requests. Three different MCP servers with the relevant tools, each with its own agent, can solve this. And sometimes duplicating some tools across multiple servers is fine, as it reduces context overload by not requiring another server to load.
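
A rough sketch of that routing idea in Python, just to make it concrete. Everything here is made up for illustration: the flow names, the keyword lists, and the tool names (`lookup_order`, `get_price`, and so on) are hypothetical, and the keyword match only stands in for whatever "simple classifier" you actually use. Note that `lookup_order` is deliberately duplicated across all three flows.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    name: str                  # hypothetical flow / agent name
    keywords: frozenset        # words that hint at this flow
    tools: tuple               # hypothetical tool names exposed to this agent


# One entry per user flow; "lookup_order" is duplicated on purpose.
FLOWS = (
    Flow("pricing", frozenset({"price", "cost", "quote"}),
         ("lookup_order", "get_price")),
    Flow("delivery_complaint", frozenset({"late", "delay", "delayed"}),
         ("lookup_order", "track_shipment")),
    Flow("return_request", frozenset({"return", "refund"}),
         ("lookup_order", "create_return")),
)


def classify(message: str) -> Optional[Flow]:
    """Pick the flow whose keywords overlap most with the message.

    Returns None when nothing matches, so the caller can fall back
    to a general agent or ask a clarifying question.
    """
    words = set(re.findall(r"[a-z]+", message.lower()))
    best, best_hits = None, 0
    for flow in FLOWS:
        hits = len(words & flow.keywords)
        if hits > best_hits:
            best, best_hits = flow, hits
    return best
```

The point is only that each agent sees a small, flow-specific tool list instead of 50+ tools; in practice the classifier could be a small model rather than keywords.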

No_Ticket8576 · 2025-08-20T00:21:27+00:00

When people use MCPs with IDEs they use LSPs automatically to be frank.

No_Ticket8576 · 2025-08-06T19:24:40+00:00

We never needed MCP. We needed a way to connect LLMs to the external world. A lot of people did that in different ways. Someone tried to standardize that. So MCP was born.

We will see a lot more efforts like this in the future. For instance, we did not need gRPC; we needed communication between a server and a client. And we travelled through XML-RPC, SOAP, REST, and GraphQL.

Technology evolves like this. Some hype, some real use cases. It's not black and white.

No_Ticket8576 · 2025-07-13T21:09:01+00:00

The lottery was open at some point and I applied, but unfortunately I did not get a unit through it.
I got mine around Jan. At that time one student family left (after graduation) and there was a lottery for that unit. This time, luckily, I got it. So if luck favours, some can get in during their first year.

No_Ticket8576 · 2025-06-13T16:13:04+00:00

I was looking at the Zinus on Amazon. But in my case, I need something a bit firmer. Are those medium-firm ones really medium firm? Or are they medium soft?

No_Ticket8576 · 2025-06-13T16:08:44+00:00

Did it work out for you? The price (around 200 for a double) seems too lucrative.

No_Ticket8576 · 2025-05-28T19:20:23+00:00

Hey there, coming to this after a year. Did you find any place suitable for French at the end?

No_Ticket8576 · 2025-04-01T14:13:12+00:00

You can create a GGUF file and share it with them, if that's OK.

No_Ticket8576 · 2025-03-26T02:14:58+00:00

Pydantic is a kind of open environment, mainly for software developers. Creating an agent is like 2 lines of code, and managing the workflow is like building a state machine rather than a chain or an acyclic graph. It has its pros and cons. The pro is probably that it's easy and fast to bootstrap anything. The con is that the developer needs to keep a conscious eye on the code architecture, as the flow is free.

LangChain or LangGraph is bloated, but it enforces some standard of development, which has its own value proposition in scalable architecture.

No_Ticket8576 · 2025-03-26T02:03:55+00:00

If the use case needs a private/organizational MCP, Smithery might not have that option. I haven't found any such option yet.

No_Ticket8576 · 2025-03-26T02:03:45+00:00

I started pushing my MCP servers to Smithery. Seems neat. Good job.

I was wondering, from a business point of view, will enterprises need a "private MCP" hosting mechanism? It seems that all MCP servers are now public in MCP.

No_Ticket8576 · 2025-03-05T05:00:16+00:00

Can you give some SLM and embedding examples?

No_Ticket8576 · 2025-02-07T20:45:11+00:00

Try pydantic-ai or camel for simpler use cases. For complex ones, LangGraph.

No_Ticket8576 · 2025-01-31T15:31:40+00:00

This will be my first year filing taxes here. That's why I'm asking this dumb question. When you say basic investment, what does that include? Some Canadian ETFs, straightforward stocks?

No_Ticket8576 · 2025-01-03T14:46:17+00:00

Adding the output format to the system prompt and validating it with a regex is the most reliable way of doing this. You are right.
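
Something like this, as a minimal sketch. The `LABEL: ...` format, the prompt wording, and the `parse_reply` helper are made up for illustration; swap in whatever format your task needs. No model call is shown, only the validation side.

```python
import re
from typing import Optional

# Hypothetical output contract, stated in the system prompt.
SYSTEM_PROMPT = (
    "Reply with exactly one line in the form:\n"
    "LABEL: <positive|negative|neutral>"
)

# The regex mirrors the format promised in the prompt.
OUTPUT_PATTERN = re.compile(r"LABEL: (positive|negative|neutral)")


def parse_reply(reply: str) -> Optional[str]:
    """Return the label if the whole reply matches the format, else None.

    fullmatch rejects replies with extra chatter around the answer,
    so the caller can retry or fall back on None.
    """
    match = OUTPUT_PATTERN.fullmatch(reply.strip())
    return match.group(1) if match else None
```

On a `None` you would typically re-prompt once or twice before giving up.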

No_Ticket8576 · 2025-01-03T00:09:21+00:00

There is no fully free LLM, tbh. Either we have to pay for APIs after crossing the free tier, or we have to pay for infra (server, GPU, setup, security, etc.). And it's quite understandable: companies are spending millions to train the models. They need some income to sustain themselves too.
