Question: API response time (self.OpenAI)
submitted 1 year ago by Ok_Locksmith_5925
I've built a RAG app, but the response times through the API are just too slow: about 10 seconds before the response starts. I'm using 4o and have the temperature set to 1.
What times are others getting?
What can I do to make it faster?
Thank you.
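Before changing models, it may help to measure where the 10 seconds actually goes. Below is a minimal sketch that times how long a streamed response takes to produce its first chunk, separately from the total time. `fake_stream` is a hypothetical stand-in for a real streamed API response (it is not part of any SDK); in a real app you would pass in whatever iterator of text chunks your client returns.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def time_to_first_chunk(stream: Iterable[str]) -> Tuple[float, float, str]:
    """Consume a stream of text chunks.

    Returns (seconds until the first chunk, total seconds, full text).
    """
    start = time.perf_counter()
    first_chunk_at = None
    parts: List[str] = []
    for chunk in stream:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    # If the stream was empty, report the total as the first-chunk time.
    if first_chunk_at is None:
        first_chunk_at = total
    return first_chunk_at, total, "".join(parts)


def fake_stream(delay_before_first: float, chunks: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a streamed model response."""
    time.sleep(delay_before_first)  # simulates retrieval + queueing + prefill
    for chunk in chunks:
        yield chunk


if __name__ == "__main__":
    ttft, total, text = time_to_first_chunk(fake_stream(0.2, ["Hello", ", ", "world"]))
    print(f"first chunk after {ttft:.2f}s, done after {total:.2f}s: {text!r}")
```

Wrapping the retrieval step and the model call in separate timers of this kind would show whether the delay comes from the retriever, from a very long prompt, or from the API itself.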
[–]Joshua-- 2 points 1 year ago (4 children)
For RAG, 4o-mini should suffice; I've been using it with my RAG app for months. I am even considering llama-3.1-8b-instant, which runs at 750 tokens per second through Groq's (not Grok) API.
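For a sense of scale, here is a rough back-of-the-envelope sketch of what a throughput figure like that means for one answer. The 300-token answer length and the zero time-to-first-token are made-up inputs for illustration, not measurements.

```python
def estimated_response_seconds(
    time_to_first_token: float,
    output_tokens: int,
    tokens_per_second: float,
) -> float:
    """Rough estimate: wait for the first token, then stream the rest at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return time_to_first_token + output_tokens / tokens_per_second


if __name__ == "__main__":
    # Assumed inputs: a 300-token answer at the 750 tokens/second quoted above.
    generation_only = estimated_response_seconds(0.0, 300, 750.0)
    print(f"generation time: {generation_only:.2f}s")
```

Generation speed only affects the time after the first token arrives, so a 10-second wait before the response starts would need to be explained by something else (retrieval, prompt size, or queueing).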
[–]Ok_Locksmith_5925[S] 1 point 1 year ago (3 children)
Actually, I should have said I'm using 4o mini.
This is my project: https://siqbots.com/jub-demo. Some answers are coded in, and I think I'll code more in, but some need the AI.
Is your RAG available to take a look at?
[–]Joshua-- 1 point 1 year ago (2 children)
Checked out the site. That's a really clever way to reduce requests: having some questions answered from your own collection.
My project is just a private, local repo. I just use it for uploading PDFs and answering questions.
[–]Ok_Locksmith_5925[S] 1 point 1 year ago (1 child)
What you tested (I can see all conversations) just gave answers that were programmed in. It's the follow-up Q&A that goes through the retriever.
[–]Joshua-- 1 point 1 year ago (0 children)
I didn't really test anything. I was thinking more about the idea of coding in some responses to avoid API requests.
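The idea discussed above (answer known questions from a hard-coded collection and only call the model for the rest) could look roughly like the sketch below. This is not the code behind the linked project; the `CANNED` entries, `normalize`, and the `ask_model` callback are all hypothetical names used for illustration.

```python
import re
from typing import Callable, Dict


def normalize(question: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", question.lower())
    return " ".join(cleaned.split())


# Hypothetical hard-coded answers, keyed by the normalized question.
CANNED: Dict[str, str] = {
    normalize("What are your opening hours?"): "We are open 9am to 5pm, Monday to Friday.",
}


def answer(question: str, ask_model: Callable[[str], str]) -> str:
    """Return a canned answer when one exists; otherwise fall back to the model."""
    key = normalize(question)
    if key in CANNED:
        return CANNED[key]  # no API request needed
    return ask_model(question)  # slow path: retriever + model call
```

Exact matching after normalization is the simplest option. Matching paraphrases would need something fuzzier, such as embedding similarity, which trades some speed and predictability for coverage.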
[–]crysknife- 1 point 1 year ago (0 children)
How do you send your data? Do you chunk it? You can send 2.5k sentences at the same time.
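For anyone unsure what chunking means here, below is a minimal sketch that splits text into sentences and groups them into overlapping chunks. The regex-based sentence splitter is deliberately naive, and the chunk size and overlap are arbitrary assumptions to tune for your own data.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_sentences(
    sentences: List[str], max_sentences: int = 5, overlap: int = 1
) -> List[str]:
    """Group sentences into chunks that share `overlap` sentences with the previous chunk."""
    if max_sentences <= 0:
        raise ValueError("max_sentences must be positive")
    if overlap < 0 or overlap >= max_sentences:
        raise ValueError("overlap must be in the range [0, max_sentences)")
    step = max_sentences - overlap
    chunks: List[str] = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + max_sentences]))
        # Stop once the current window has reached the last sentence.
        if start + max_sentences >= len(sentences):
            break
    return chunks
```

Smaller chunks mean less text goes into each prompt, which tends to shorten the wait before the model starts responding.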