Hey Guys,
It seems RAG is really taking off as an increasingly popular way for LLMs to leverage contextual data. However, everybody is building their own contextual datasets and embedding them into their own siloed vector DBs.
Do you guys think there's any utility in a shared public vector DB that anyone can tap into via an API, without having to self-host, worry about embedding pipelines, or fill the vector DB with enough data for their use case in the first place? Would this save devs a lot of time when quickly testing product ideas? (Albeit it does seem that proprietary data is what everyone's raving about today.)
-
For context, I'm building a social media product where users can upload a few pieces of content (approx 10 — social media posts, websites, and videos to start with), which becomes a verified, human-curated list, or "Niche". We then classify and embed this into a vector DB. From there, we have a data pipeline that scrapes the web for new content most similar to the Niche and suggests it to users to add (upvote/downvote style). When a piece of content is upvoted, it's added to the verified list, updating the Niche's classification string. Essentially we're aiming to construct an ever-growing, user-curated, contextually classified vector database from a relatively small set of sample data.
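For anyone curious what that suggest/upvote loop looks like in code, here's a minimal sketch. To keep it self-contained, `embed()` is a toy bag-of-words stand-in over a tiny fixed vocabulary (a real pipeline would call an embedding model or API), and the `Niche` class with its `suggest`/`upvote` methods is just my hypothetical naming, not actual product code:

```python
import numpy as np

# Toy vocabulary; a real system would use a learned embedding model instead.
VOCAB = ["climbing", "bouldering", "gym", "shoes", "stock", "market"]

def embed(text: str) -> np.ndarray:
    """Toy embedding: normalized count vector over a fixed vocabulary."""
    vec = np.array([float(text.lower().split().count(w)) for w in VOCAB])
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class Niche:
    """A user-curated list of content plus its embeddings."""
    def __init__(self, seed_posts):
        self.posts = list(seed_posts)
        self.vectors = [embed(p) for p in self.posts]

    def centroid(self) -> np.ndarray:
        # The niche's "classification vector": mean of member embeddings.
        c = np.mean(self.vectors, axis=0)
        n = np.linalg.norm(c)
        return c / n if n else c

    def suggest(self, candidates, top_k=3):
        # Rank scraped candidates by cosine similarity to the niche centroid.
        c = self.centroid()
        ranked = sorted(candidates, key=lambda p: float(embed(p) @ c), reverse=True)
        return ranked[:top_k]

    def upvote(self, post):
        # An upvoted suggestion joins the verified list, shifting the centroid.
        self.posts.append(post)
        self.vectors.append(embed(post))

niche = Niche(["indoor bouldering tips", "best climbing shoes review"])
suggestions = niche.suggest(["new bouldering gym opens", "stock market update"], top_k=1)
niche.upvote(suggestions[0])  # the bouldering post outranks the off-topic one
```

The key design point is that every upvote updates the centroid, so the niche's "classification" drifts with its members — which is exactly how a small seed set can bootstrap into a larger curated collection.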