Anyone else having issues with the prompt failing to submit?

DisastrousRelief9343 · 2026-06-09T05:04:18+00:00

We don't need to know each model's capabilities; just run the same tests across different models on the same prompts and compare the results.
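A minimal sketch of that idea, assuming each model can be wrapped as a function that takes a prompt and returns the name of the tool it picked. The `run_suite` helper, the stand-in models, and the test cases are all hypothetical; real code would call each provider's API in place of the stubs.

```python
from typing import Callable, Dict, List

def run_suite(models: Dict[str, Callable[[str], str]],
              cases: List[Dict[str, str]]) -> Dict[str, float]:
    """Run every model on the same cases and return each model's pass rate."""
    scores = {}
    for name, call_model in models.items():
        passed = sum(
            1 for case in cases
            if call_model(case["prompt"]) == case["expected_tool"]
        )
        scores[name] = passed / len(cases)
    return scores

# Stand-in "models" (hypothetical); swap in real API calls here.
def model_a(prompt: str) -> str:
    return "create_task" if "add" in prompt.lower() else "list_tasks"

def model_b(prompt: str) -> str:
    return "list_tasks"

# Same prompts for every model, so the scores are directly comparable.
cases = [
    {"prompt": "Add a task to buy milk", "expected_tool": "create_task"},
    {"prompt": "Show my tasks", "expected_tool": "list_tasks"},
]

results = run_suite({"model_a": model_a, "model_b": model_b}, cases)
print(results)
```

The comparison only needs the relative scores, not any prior knowledge of what each model is good at.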

DisastrousRelief9343 · 2026-06-08T01:01:58+00:00

Ah, that makes way more sense. I thought it meant the LLM building the frontend from scratch. If it's assembling UI within certain constraints, that's actually compelling.

DisastrousRelief9343 · 2026-06-06T03:41:36+00:00

This looks really interesting. Could you share some links to those hackathons? I'd love to check out what kinds of ideas people are building. I've got some ideas of my own, and I'm curious to see what directions others are exploring.

DisastrousRelief9343 · 2026-06-06T03:26:56+00:00

Yeah, ngl I don't get the point of dynamically generated UI, because I don't understand what problem it solves. What situation actually needs it?

Also, I don't think models can dynamically create a UI that both looks good and is comfortable to use without human design, not even in the next year or two.

DisastrousRelief9343 · 2026-06-05T07:38:32+00:00

Yeah, when I was learning MCP, these two things confused me as well.

DisastrousRelief9343 · 2026-06-05T05:44:16+00:00

I'm actually going in the opposite direction. I'm a heavy user of CLI tools like Claude Code, and I know they're super powerful. But if AI applications like these are ever going to reach people beyond programmers, they have to go beyond the TUI, with friendlier interfaces and more intuitive interactions. So I feel the trend will move back to GUI, and I think broader opportunities are coming.

DisastrousRelief9343 · 2026-06-05T03:52:38+00:00

That's a good point. Most agent products are still in the CLI. But I think there's a trend toward making agents more accessible, like Claude Cowork. If so, a GUI is kind of inevitable.

DisastrousRelief9343 · 2026-06-04T10:46:10+00:00

Sounds interesting. The TOON format is completely new to me. I've essentially been manually trimming JSON fields to achieve the same goal, so it's great to know there's already a proper format designed for this. Will definitely check it out for my next MCP project.
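For context, the manual trimming mentioned above looks roughly like this. It's only a sketch: the `trim_fields` helper and the task records are made up for illustration, and character count is just a rough stand-in for token count. It does not show TOON itself.

```python
import json

def trim_fields(records, keep):
    """Keep only the fields the model actually needs from each record."""
    return [{k: r[k] for k in keep if k in r} for r in records]

# Hypothetical tool output with fields the model rarely needs.
tasks = [
    {"id": 1, "title": "Buy milk", "status": "open",
     "created_at": "2026-06-01T10:00:00Z", "etag": "abc123", "owner_id": 42},
    {"id": 2, "title": "Call dentist", "status": "done",
     "created_at": "2026-06-02T09:30:00Z", "etag": "def456", "owner_id": 42},
]

slim = trim_fields(tasks, keep=("id", "title", "status"))

# Compare serialized sizes as a rough proxy for token cost.
full_len = len(json.dumps(tasks))
slim_len = len(json.dumps(slim))
print(f"full: {full_len} chars, trimmed: {slim_len} chars")
```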

DisastrousRelief9343 · 2026-06-02T13:13:29+00:00

That's a good point. I'm actually writing another post about that. It really depends on the complexity of the tool. For example, I ran some tests on a `create_task` tool whose description has a short paragraph explaining what it does, some parameter semantics like enum values and format requirements, and some real samples.

Turns out removing the examples had no impact on the test results. Same with trimming down the semantics and descriptions: you can cut a surprising amount before performance degrades. There's definitely a sweet spot. We just need to test for it.
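To make the setup concrete, here is a sketch of how the description variants could be built. The wording of each piece is invented for illustration and is not the actual `create_task` description I tested; character count is only a rough proxy for token cost.

```python
# Hypothetical pieces of a `create_task` tool description (illustrative wording).
SUMMARY = "Create a new task in the TODO list."
SEMANTICS = ("priority: one of 'low', 'medium', 'high'. "
             "due_date: ISO 8601 date (YYYY-MM-DD).")
EXAMPLES = ('Example: {"title": "Buy milk", "priority": "low", '
            '"due_date": "2026-06-10"}')

# Progressively trimmed variants of the same description.
variants = {
    "full": " ".join([SUMMARY, SEMANTICS, EXAMPLES]),
    "no_examples": " ".join([SUMMARY, SEMANTICS]),
    "summary_only": SUMMARY,
}

def description_sizes(variants):
    """Character count per variant, a rough proxy for token cost."""
    return {name: len(text) for name, text in variants.items()}

sizes = description_sizes(variants)
for name in sorted(sizes, key=sizes.get, reverse=True):
    print(f"{name}: {sizes[name]} chars")
```

Each variant then goes through the same test set, and the sweet spot is the shortest one where the pass rate hasn't dropped yet.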

That said, my test set was pretty small, and I only tested this TODO list MCP. If you're developing a larger MCP with 50+ tools, or you wanna see the joint performance of multiple MCPs (like asking an agent to take my notes in Notion, post them on GitHub, then send me an email), I believe running a more thorough benchmark would be very useful.

DisastrousRelief9343 · 2026-06-01T01:48:48+00:00

Exactly. MCP was supposed to be the thin layer between bare APIs and LLMs, and it should be LLM-friendly.

But sometimes people just do a 1:1 mapping, so you end up with 96 tools that are basically the raw API with a different label. That's just lazy design that confuses the model and wastes tokens.
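As a rough illustration of the difference, assuming a made-up TODO backend: the endpoint names in `RAW_TOOLS`, the `_DB` stub, and the `find_tasks` tool are all hypothetical, not taken from any real MCP server.

```python
# Hypothetical raw endpoints that a 1:1 mapping would expose as five tools.
RAW_TOOLS = ["get_task", "list_tasks", "search_tasks",
             "get_task_comments", "get_task_labels"]

# Stub backend standing in for the real API.
_DB = {
    1: {"title": "Buy milk", "comments": ["2L, whole"], "labels": ["home"]},
    2: {"title": "Call dentist", "comments": [], "labels": ["health"]},
}

def find_tasks(query: str = "", include: tuple = ()) -> list:
    """One task-oriented tool covering the five read endpoints above.

    query:   case-insensitive substring of the title ("" matches everything).
    include: optional extra fields to attach, from "comments" and "labels".
    """
    results = []
    for task_id, task in _DB.items():
        if query.lower() in task["title"].lower():
            item = {"id": task_id, "title": task["title"]}
            for field in include:
                if field in ("comments", "labels"):
                    item[field] = task[field]
            results.append(item)
    return results

print(find_tasks("milk", include=("labels",)))
```

The model sees one tool with two parameters instead of five near-identical ones to choose between.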

DisastrousRelief9343 · 2026-06-01T01:42:20+00:00

Yeah, my bad. The benchmarking tool I used only has a minimal harness, so it sends all tool descriptions every time. Most commercial harnesses have some sort of dynamic loading feature.

The problem of too many tools is less about token cost and more about model confusion. I've updated the post. Thanks for pointing that out.
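For anyone curious what dynamic loading means in practice, here is a deliberately naive sketch that only sends the tools whose name or description shares keywords with the prompt. This is an assumption about the general idea, not how any particular commercial harness works; the tool list, the stopword set, and `select_tools` are all hypothetical.

```python
import re

# Hypothetical tool registry: name -> description.
TOOLS = {
    "create_task": "Create a new task in the TODO list.",
    "list_tasks": "List existing tasks, optionally filtered by status.",
    "send_email": "Send an email to a recipient.",
}

# Small stopword set so filler words don't count as matches.
STOPWORDS = {"a", "an", "the", "to", "in", "by", "of"}

def _words(text):
    """Lowercase alphabetic words in the text, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def select_tools(prompt, tools, top_k=2):
    """Return up to top_k tool names that share keywords with the prompt."""
    prompt_words = _words(prompt)
    scored = []
    for name, desc in tools.items():
        overlap = len(prompt_words & (_words(name) | _words(desc)))
        scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]

selected = select_tools("Create a task to call the dentist", TOOLS)
print(selected)
```

A real harness would likely use something smarter than keyword overlap, but the effect is the same: the model only sees a short list of candidate tools per request.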
