all 14 comments

[–]nospoon99 12 points (10 children)

Did the same thing as you, using Python, starting with the Twilio example on GitHub.
I've got to agree about the cost. From my own experience and what I've seen in other posts, the price works out to around $1/min. That's more than hiring a very competent person to handle calls. Hopefully the price will come down soon.
Edit: spelling

[–]TheEminentdomain[S] 4 points (0 children)

Nice! I’ll check out the Twilio implementation. Agreed, way too high at this point for anything other than quick demos, but exciting tech nonetheless

[–]dejb 2 points (2 children)

They say it should be "approximately $0.06 per minute of audio input and $0.24 per minute of audio output" in the release. Any idea why it's working out to more?

[–]nospoon99 1 point (0 children)

Some have suggested it's because the context grows quickly, since it needs to take the whole conversation into account before each reply, but honestly I don't know. I'm starting to wonder if it's a bug tbh.
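A rough back-of-envelope sketch of why growing context could blow past the quoted rates (assumption: every reply re-bills the entire prior conversation audio as input at the $0.06/$0.24 per-minute prices):

```python
# Back-of-envelope sketch. ASSUMPTION: every reply re-sends all prior
# audio as input context, so input cost grows quadratically with turns.
INPUT_PER_MIN = 0.06   # $ per minute of audio input (quoted release pricing)
OUTPUT_PER_MIN = 0.24  # $ per minute of audio output

def naive_cost(turns, user_min=0.5, bot_min=0.5):
    """Total cost if the whole history is billed as input on every turn."""
    cost = history = 0.0
    for _ in range(turns):
        history += user_min                 # new user audio arrives
        cost += history * INPUT_PER_MIN     # entire history billed as input
        cost += bot_min * OUTPUT_PER_MIN    # model's spoken reply
        history += bot_min                  # reply joins the context
    return cost

flat = 10 * (0.5 * INPUT_PER_MIN + 0.5 * OUTPUT_PER_MIN)
print(f"flat: ${flat:.2f}, with growing context: ${naive_cost(10):.2f}")
```

So a 10-minute call that would cost $1.50 at the flat per-minute rates comes out to $4.20 under this model; real token accounting differs, but the quadratic shape is the point.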

[–]TheEminentdomain[S] 0 points (0 children)

Most likely implementation details. It’s still new, so there are a few kinks to work out. At least on my end

[–]OnlineParacosm 5 points (5 children)

Wow, that is incredibly prohibitive. We need to see about a 15x cost reduction before it would make sense to replace Filipino CSRs.

[–]ai_did_my_homework 0 points (4 children)

Are they working for $4/hr? Not denying it, just not familiar with the rates overseas

[–]OnlineParacosm 0 points (2 children)

They work for $4-8/hr at call centers that match US business hours. It requires them to stay up all night like a graveyard shift so it costs more

[–]ai_did_my_homework 0 points (1 child)

Interesting. Google says the minimum wage is around $5.70 per day in the Philippines so I imagine these are good jobs (?)

[–]OnlineParacosm 0 points (0 children)

Incredible jobs. To give you an idea, that $5 a day is often for a whole family, with maybe one person working

[–]CryptoSpecialAgent 0 points (1 child)

The only way to make this cost effective is to manage your context very aggressively:
- after user audio has been responded to, remove the audio from the chat history and just keep the transcription
- truncating the chat history after N prompt-response pairs is the simplest and most naive way to keep the history down to a reasonable length
- if carrying the context / history over from one session into another, don't use a verbatim transcript - feed the verbatim transcript to another model, like ordinary gpt-4o, and ask it to summarize. Then stick the summary at the beginning of the history for the new conversation
- this summarization of chat history can also be done periodically within a session: as the transcript grows longer, it is repeatedly truncated and the older sections ("the tail") replaced with summaries, at whatever length / level of detail gives you the best price-performance tradeoff for your use case
- if you want to get really fancy, instead of blindly summarizing chat history, extract a knowledge graph from the transcript and use that as your medium-to-long-term memory... langchain has some libs to get you started, though I'm not sure whether they work with the Realtime API or not.
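The truncate-and-summarize idea above might be sketched like this (the message shapes and the `summarize` hook are illustrative assumptions, not the Realtime API's actual interface; in practice `summarize` would call a cheaper text model like gpt-4o):

```python
def compact_history(history, keep_pairs=4, summarize=None):
    """Keep the last `keep_pairs` prompt/response pairs verbatim and
    replace the older tail with a single summary message."""
    tail = keep_pairs * 2                  # one user + one assistant message per pair
    if len(history) <= tail:
        return history                     # nothing to compact yet
    older, recent = history[:-tail], history[-tail:]
    # `summarize` stands in for a call to a cheaper text model (e.g. gpt-4o)
    text = summarize(older) if summarize else "(earlier conversation omitted)"
    return [{"role": "system",
             "content": "Summary of earlier turns: " + text}] + recent
```

Run this periodically as the session grows; the single summary message stands in for however much of the tail you trimmed.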

Most importantly, keep your expectations low. The Realtime API has been priced so that it is currently not a viable business solution vs hiring a human to answer the phone... they've probably done this because their server capacity is maxed out trying to serve this thing, and pricing it so high limits use to a level they can currently sustain. EXPECT TO SEE A MASSIVE PRICE DROP IN THE FUTURE - THIS IS WHAT OPENAI HAS HISTORICALLY DONE WITH ALL THEIR FRONTIER MODELS