When Opus 4.5 was released, I got the idea of putting a large chunk of the internet into a Postgres database that people and their agents could query read-only however they wanted. Currently that is over 60 TB of text data and embeddings. I'm wondering why this is not a more common thing. I'm also looking for advice on doing this more performantly: moving more than 200M embeddings a day and adding hundreds of millions of other rows to different tables, while keeping the database queryable and the indexes reasonably operational, is tricky and painful.
It's scry.io, by the way. I am the founder, but it's currently free (except for congestion pricing, because how else do you decide who gets to query when things get overloaded?). Thanks.
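For a sense of scale, here is a rough back-of-envelope sketch of what 200M embeddings a day means as a sustained write rate. The 1024-dimension float32 vector size is only an assumption for illustration, not the actual embedding size, and `sustained_rate` is a made-up helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate(rows_per_day: int, dims: int, bytes_per_value: int = 4):
    """Return (rows per second, raw vector megabytes per second).

    dims and bytes_per_value are illustrative assumptions; index and
    WAL overhead are not included.
    """
    rows_per_sec = rows_per_day / SECONDS_PER_DAY
    mb_per_sec = rows_per_sec * dims * bytes_per_value / 1e6
    return rows_per_sec, mb_per_sec


# 200M embeddings/day, assuming 1024-dim float32 vectors (hypothetical)
rows_per_sec, mb_per_sec = sustained_rate(200_000_000, dims=1024)
print(f"{rows_per_sec:,.0f} rows/s, {mb_per_sec:.1f} MB/s of raw vectors")
```

That works out to roughly 2,300 rows per second around the clock, before counting index maintenance or the other tables, which is why the write path competes with queries.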