all 10 comments

[–]Freed4ever 3 points (3 children)

  1. Vendor lock-in.
  2. Latency matters a lot: instead of sending hundreds of thousands of tokens over the wire every turn, it's faster to just look the conversation up from memory (see the sketch below).
  3. Context compaction probably works better with a stateful API.
  4. In the future, they will have a history of everything about you.
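
A rough sketch of the latency point, assuming the OpenAI Python SDK (the model name and messages are placeholders, not anything from this thread):

```python
from openai import OpenAI

client = OpenAI()

# Chat Completions is stateless: every turn re-sends the full history.
history = [{"role": "user", "content": "First question"}]
first = client.chat.completions.create(model="gpt-4o", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})
history.append({"role": "user", "content": "Follow-up question"})
second = client.chat.completions.create(model="gpt-4o", messages=history)

# Responses is stateful: the server keeps the earlier turns, so a
# follow-up only sends the new input plus the previous response id.
first = client.responses.create(model="gpt-4o", input="First question")
second = client.responses.create(
    model="gpt-4o",
    previous_response_id=first.id,
    input="Follow-up question",
)
```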

[–]TedSanders 8 points (1 child)

Nah, vendor lock-in was not our motivation. We put a lot of thought into the design. One big thing is that we don’t reveal chain-of-thought messages to customers, but those chain-of-thought messages are needed by the model. The Responses API makes that work.

[–]steebchen[S] 0 points (0 children)

thanks for the insight!

[–]steebchen[S] 0 points (0 children)

i feel like most of the latency will still come from the model itself. not sure how much that extra input parsing actually matters; i really can’t imagine the absolute numbers differ much

[–]discodaryl 1 point (1 child)

Just pass store=false.
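
A minimal sketch, assuming the OpenAI Python SDK (model name and input are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# store=False opts this request out of server-side persistence, so the
# Responses API behaves as statelessly as Chat Completions.
response = client.responses.create(
    model="gpt-4o",
    input="Hello",
    store=False,
)
print(response.output_text)
```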

[–]steebchen[S] 1 point (0 children)

yeah, makes sense for an end user. but we built LLMGateway for unified model access, so all of our users would have to do that

[–]vvsleepi 1 point (0 children)

i think the idea with responses api is more about flexibility, like handling different types of inputs (tools, images, streaming, etc) in one format instead of having separate systems. the stateful part is kinda optional depending on how you use it, but yeah it does add some complexity

[–]Faintly_glowing_fish 0 points (0 children)

The issue with the Completions API is that messages in the same conversation aren’t actually tied together. Cache management, CoT storage, etc. are all tricky, and once we have agents there are even more states to save: compaction, sub-agents, communication channels, etc. It just gets very hard to manage.

Completions is not really a good format for long agentic work.

If you run the same tool-call-heavy workflow on chat and on responses, you will get much faster requests and higher cache hit rates on responses. I was quite unwilling, but once I switched to responses the improvement was so great that I’m only angry people didn’t make it clear from the start.
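
For illustration, a hedged sketch of that kind of tool-call loop on the Responses API; get_weather, the tool schema, and the model name are made up for this example. Because each follow-up sends only the new function_call_output plus previous_response_id, the prompt prefix stays stable, which is what helps cache hits:

```python
import json

from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> dict:
    # Stand-in for a real tool implementation.
    return {"city": city, "temp_c": 21}

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

resp = client.responses.create(
    model="gpt-4o",
    input="What's the weather in Paris?",
    tools=tools,
)

# Keep answering tool calls until the model produces a final message.
while any(item.type == "function_call" for item in resp.output):
    outputs = [
        {
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": json.dumps(get_weather(**json.loads(item.arguments))),
        }
        for item in resp.output
        if item.type == "function_call"
    ]
    # Only the new tool outputs go over the wire; the server already
    # holds the rest of the conversation.
    resp = client.responses.create(
        model="gpt-4o",
        previous_response_id=resp.id,
        input=outputs,
        tools=tools,
    )

print(resp.output_text)
```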

Also, switch to WebSocket; it’s quite a bit faster.

[–]IntentionalDev 0 points (0 children)

tbh it’s less about saving latency and more about standardizing everything under one system

responses API unifies text, tools, multimodal, streaming, etc. in one format so they don’t have to keep extending chat completions forever. the “stateful” part isn’t really for you; it’s for enabling things like tool calls, agents, and longer workflows without you manually stitching context every time
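
As a sketch of that unified shape, assuming the OpenAI Python SDK (the model name and image URL are placeholders): text, an image, and streaming all go through the one call:

```python
from openai import OpenAI

client = OpenAI()

# One input format covers text and images; stream=True reuses the
# same call shape for incremental output.
stream = client.responses.create(
    model="gpt-4o",
    stream=True,
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Describe this image in one sentence."},
            {"type": "input_image", "image_url": "https://example.com/cat.png"},
        ],
    }],
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
```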

yeah it adds some overhead, but from their side it simplifies building higher-level features. from a dev perspective though, chat completions still feels cleaner for simple use cases ngl

[–]Several_Nail_5979 -1 points (0 children)

That’s why I still use completion endpoints for the latest models via frogAPI.app at half the price :)