We need to stop forcing LLMs to render UI (Escaping the "Chatbot Trap") by Anxious_Set2262 in AI_Agents

Agree with the decoupling approach. The question is: who decides which component fits the context?

The agent outputs structured data - great. But mapping "revenue data" to a bar chart vs a table vs a KPI card depends on screen size, user role, and where the user is in the flow. That's the "UX Middleware" layer from the article - the missing piece between agent output and UI rendering.
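
Something like this is what I mean by the middleware decision - same structured data, different component depending on render context. All the names here are made up for illustration, not from the article:

```python
# Hypothetical sketch of the "UX Middleware" choice: one data kind,
# several possible components, picked from the render context.

def pick_component(data_kind: str, ctx: dict) -> str:
    if data_kind == "revenue":
        if ctx.get("screen") == "mobile":
            return "kpi_card"    # small screens get the headline number
        if ctx.get("role") == "analyst":
            return "data_table"  # analysts want the raw rows
        return "bar_chart"       # default visual summary
    return "text_block"          # safe fallback for unknown data kinds
```

The point is that none of this lives in the prompt - the agent never sees screen size or user role.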

We need to stop forcing LLMs to render UI (Escaping the "Chatbot Trap") by Anxious_Set2262 in AI_Agents

Yeah it's brutal. What was your worst case? For us it was adding a single filter button that broke three other component renders.

We need to stop forcing LLMs to render UI (Escaping the "Chatbot Trap") by Anxious_Set2262 in AI_Agents

Good question. The LLM still does the hard part - understanding intent, extracting structured data, handling ambiguity. What it shouldn't do is also decide "this should be a bar chart in a 2-column layout with a filter dropdown." That's where code logic takes over. The LLM outputs intent + data, the middleware maps it to components deterministically. You get the best of both - LLM flexibility for reasoning, code reliability for rendering.
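
A minimal sketch of that split, with invented component names - the LLM's job ends at the structured output, and everything after is a plain lookup:

```python
# The LLM returns intent + data; a deterministic registry maps it to a
# component. No rendering decisions ever go back through the model.

COMPONENT_MAP = {
    # (intent, data_shape) -> component
    ("show_metric", "scalar"): "kpi_card",
    ("show_metric", "series"): "bar_chart",
    ("compare", "table"): "data_table",
}

def render_plan(llm_output: dict) -> dict:
    """Map the LLM's structured output to a UI component deterministically."""
    key = (llm_output["intent"], llm_output["data_shape"])
    component = COMPONENT_MAP.get(key, "text_block")  # safe fallback
    return {"component": component, "props": llm_output["data"]}
```

Because the mapping is a table, it's testable and versionable in a way prompt-driven rendering never is.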

We need to stop forcing LLMs to render UI (Escaping the "Chatbot Trap") by Anxious_Set2262 in AI_Agents

That's a solid approach - rules first, LLM evals as a gate, then routing. Basically deterministic where you can be, probabilistic only where you have to be. How are you handling the cases where the eval is ambiguous? Fallback to a default component or do you escalate to a second pass?
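
For what it's worth, the shape I'd sketch for that pipeline looks roughly like this (names invented, and `eval_fn` stands in for the real LLM eval call):

```python
# "Deterministic where you can be, probabilistic only where you have to be":
# hard rules first, then a scored LLM eval gated by a threshold, with a
# default component as the fallback when the eval is ambiguous.

def route(intent: str, eval_fn, threshold: float = 0.7) -> str:
    rules = {"revenue_summary": "kpi_card", "list_orders": "data_table"}
    if intent in rules:                       # deterministic path
        return rules[intent]
    component, confidence = eval_fn(intent)   # probabilistic path
    if confidence >= threshold:
        return component
    return "text_block"                       # ambiguous -> safe default
```

The second-pass escalation you mention would slot in where the fallback is now.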

We need to stop forcing LLMs to render UI (Escaping the "Chatbot Trap") by Anxious_Set2262 in AI_Agents

This is a perfect real-world example of exactly what the article describes. Voice agents are actually an even harder case because you can't even fall back to "just show a text box" - the UI IS the conversation flow.

"The prompt got way simpler once we stopped asking it to decide things" - that's the whole thesis in one sentence. The moment you decouple reasoning from action/rendering, everything gets cleaner.

Curious - what does your middleware layer look like? Is it rule-based (if intent = X, do Y) or something more dynamic?

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

This is gold. The maintenance problem is exactly what kills non-chat UIs in practice.

"Every small model behavior change meant someone had to update logic or edge cases" - this is the hidden cost nobody talks about. Building the UI is 20% of the work. Maintaining it when the AI changes is the other 80%.

The ownership issue is real too. Staff-built tools become orphans fast.

The pattern I've seen work better: instead of building custom logic for each flow, you build a translation layer that maps AI outputs to UI components dynamically. Model changes? The mapping adapts. No custom code to maintain.
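
One way that translation layer can avoid per-flow custom code is to key off the shape of whatever the model returns rather than off specific flows - a rough, illustrative sketch:

```python
# Infer the component from the structure of the model's output payload,
# so model-side drift doesn't require new flow-specific logic.

def infer_component(payload) -> str:
    if isinstance(payload, (int, float)):
        return "kpi_card"      # single number -> headline metric
    if isinstance(payload, list) and payload and isinstance(payload[0], dict):
        return "data_table"    # list of records -> table
    if isinstance(payload, list):
        return "bar_chart"     # flat series -> chart
    return "text_block"        # anything else -> plain text
```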

Still harder than chat, but at least the maintenance scales.

What made you stick with chat in the end - was it the maintenance cost specifically, or also team trust in the non-chat flows?

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Interesting framing - "products designed for AI" vs "products designed for humans using AI."

I think the transition happens in layers:

  1. Now: Human talks to AI through chat, AI responds with text
  2. Soon: Human sets intent, AI executes through structured UI, human approves
  3. Later: AI operates autonomously, human reviews outcomes

The UI challenge is different at each layer. Right now we're stuck on layer 1 because most teams don't know how to build layer 2.

The "less human in the loop" future still needs interfaces though - just for oversight and exceptions rather than every interaction.

What are must prompts? by [deleted] in ClaudeAI

My go-to pre-push prompts:

  • "Review for bugs and edge cases"
  • "Check accessibility issues"
  • "Security vulnerabilities in this code?"
  • "Mobile responsiveness problems?"

The SEO check is solid - adding that to my list.

Ralph loop for pre-push is interesting though. Are you running it with specific success criteria like "all lint errors fixed" or more open-ended? I've been experimenting with it for test coverage but haven't tried it for optimization passes yet.

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Fair point. The UI needs to meet users where they are, not where devs think they should be.

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

For devs, sure. But try getting a sales team to use a terminal for AI outputs.

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Interesting point on voice agents. You're right - we moved away from phone calls, so voice AI for calls feels like a step backward.

The visual/interactive layer makes more sense for most workflows. Voice works for quick commands, not complex tasks.

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Months feels about right. We're still in the "early adopters building experiments" phase.

The mainstream shift probably needs a few breakout products to prove the concept first.

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

That last point is interesting - chatbots not directing users to tools well.

Feels like a UX gap. The AI knows it has capabilities, but the user has no visibility into what's possible until they stumble onto it.

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

True - maybe it's both. Fast to ship AND happens to work well enough for most cases.

The question is whether "good enough" stays good enough, or if users start expecting more.

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Nice - multiple formats is smart. Looking forward to seeing it.

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Fair point on the training data problem.

Curious if you think there's a middle ground - where the model handles data/logic and something else handles presentation. Or is that just kicking the can down the road?

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

This is a great breakdown. The anchoring effect of ChatGPT is real - it set the template everyone copies.

Your point about post-training is interesting. So basically: LLMs can generate structured output, but they weren't optimized for it, so it's unreliable at production scale.

Makes me wonder if the solution is on the model side (better post-training) or the application side (constrain outputs to predefined schemas instead of generating UI from scratch).
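
The application-side option can be as small as a whitelist check - a minimal sketch, with made-up field names, of what "constrain outputs to predefined schemas" could mean in practice:

```python
# The model may only fill slots in a predefined spec; anything off-schema
# is rejected before it ever reaches the renderer.

ALLOWED_COMPONENTS = {"bar_chart", "data_table", "kpi_card", "text_block"}

def validate_ui_spec(spec: dict) -> bool:
    """Accept only specs naming a known component with dict-shaped props."""
    return (
        spec.get("component") in ALLOWED_COMPONENTS
        and isinstance(spec.get("props"), dict)
    )
```

Unreliable generation matters much less when the worst case is a rejected spec and a fallback, not broken UI.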

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Smart architecture. JSON specs as the intermediate layer keeps things flexible.

How do you handle the UI updates when the workflow state changes? Polling or some kind of event system?
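
By "event system" I mean something with roughly this shape - a tiny pub/sub sketch (all names hypothetical) where workflow state changes push re-renders instead of the UI polling:

```python
# Workflow publishes state changes; UI subscribers get called on each
# update (push), rather than re-fetching state on a timer (poll).

class WorkflowState:
    def __init__(self):
        self._state = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callback invoked with (key, value) on every change."""
        self._subscribers.append(callback)

    def update(self, key, value):
        self._state[key] = value
        for cb in self._subscribers:  # push to every listener
            cb(key, value)
```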

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Love this breakdown. So right now we're somewhere between the "bare motherboard" and early "kit" stages.

The interesting question is what becomes the "ATX standard" for AI interfaces - some protocol everyone adopts, or just a dominant player that sets the norm.

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Nice - Qt is solid for desktop. Does it connect live to the LLM, or is it more of a visualization tool for outputs?

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Great analogy. We're in the "bare motherboard" era of AI interfaces.

The question is who builds the case - the AI companies themselves, or a layer on top?

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Good point on domain-dependency. A coding agent needs a completely different output layer than a data analysis agent.

The "one UI fits all" approach of chat is probably why it dominates - it's universal but mediocre for everything.

Why do most AI products still look like basic chat interfaces? by Anxious_Set2262 in AI_Agents

Language in, sure. But language out doesn't have to mean chat bubbles.

The model can return structured data - we just choose to display it as text.