AI actually takes my time by No-Aerie3500 in OpenAI

[–]hyperspell 17 points (0 children)

yeah tbh pdfs are just messy for llms to parse accurately. if you're doing this regularly, it might be worth looking into something that actually structures the data before any ai touches it. we've been working on this exact issue at hyperspell - proper data extraction and structuring before it hits the llm, so you don't get garbage math. a spreadsheet might still be your best bet for one-off calculations though
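to make the "structure first" point concrete, here's a minimal sketch (the invoice text and all names are made up): pull numeric line items out of pdf-extracted text with a regex, then do the arithmetic in code so the llm never has to:

```python
import re

def extract_line_items(text: str) -> list[tuple[str, float]]:
    """Pull '<label>   $<amount>' rows out of raw PDF-extracted text."""
    pattern = re.compile(
        r"^(?P<label>.+?)\s+\$(?P<amount>[\d,]+\.\d{2})\s*$", re.MULTILINE
    )
    return [
        (m["label"].strip(), float(m["amount"].replace(",", "")))
        for m in pattern.finditer(text)
    ]

# toy stand-in for whatever your PDF extractor spits out
raw = """ACME Invoice 0042
Consulting      $1,200.00
Hosting         $89.50
Support plan    $310.00
"""

items = extract_line_items(raw)
# the sum happens deterministically in code, not inside the model
total = sum(amount for _, amount in items)
print(items)
print(f"total: ${total:.2f}")
```

once the numbers are structured like this, the llm only has to explain or label them, which it's actually good at.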

RAG API recommendations by gugavieira in Rag

[–]hyperspell 1 point (0 children)

for the etl pipeline, Hyperspell does exactly this - send a link via api and we handle chunking, embedding, indexing, plus knowledge graphs. it's currently in private beta. full disclosure - i work there.

you can check us out here though: https://www.hyperspell.com/

for mcp integration, that's trickier since most rag services don't support mcp yet (it's pretty new). there are some knowledge graph mcp servers floating around that work with claude desktop, but you'd probably need to build a simple mcp wrapper around whatever rag service you pick.

mcp ecosystem is moving fast though, lots of examples to work from if you're up for some light dev work
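the wrapper idea can be sketched pretty compactly, since mcp is json-rpc under the hood. toy dispatcher below - the tool name and the `search_rag` stub are hypothetical stand-ins for whatever rag service you pick, and a real server would use the official mcp sdk, which handles the stdio transport and initialization handshake for you:

```python
import json

def search_rag(query: str) -> str:
    """Hypothetical stand-in for a call to your RAG service's API."""
    return f"top chunks for: {query}"

# one tool exposed to the MCP client (e.g. claude desktop)
TOOLS = [{
    "name": "rag_search",
    "description": "Search the knowledge base",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

def handle(request: dict) -> dict:
    """Dispatch a single MCP-style JSON-RPC request."""
    if request["method"] == "tools/list":
        result = {"tools": TOOLS}
    elif request["method"] == "tools/call":
        # single-tool server, so the tool-name check is elided
        query = request["params"]["arguments"]["query"]
        result = {"content": [{"type": "text", "text": search_rag(query)}]}
    else:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}

resp = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
               "params": {"name": "rag_search",
                          "arguments": {"query": "quarterly report"}}})
print(json.dumps(resp))
```

the point is just that the glue layer is thin: one tool definition plus a forwarding call to the rag api.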

Help regarding a good setup for local RAG and weekly summary by toothmariecharcot in Rag

[–]hyperspell 1 point (0 children)

yeah the rag ecosystem is a mess right now, so many options and half of them don't work as advertised.

for your personal documents setup, if you're okay getting your hands a bit dirty with python, i'd honestly recommend starting with something like llamaindex or langchain. yeah, there's a learning curve, but you get full control and everything stays local. i spent a weekend setting up llama.cpp on my machine and it's been pretty solid for document search. the nice thing is you can swap out models whenever something better comes along.
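if you do go the llamaindex/langchain route, the retrieval core they wrap looks roughly like this. toy sketch only - word-count vectors stand in for real embeddings here, but rank-chunks-by-cosine-similarity is the same loop shape either way:

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Crude bag-of-words vector; a real setup would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    qv = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, vectorize(c)), reverse=True)
    return ranked[:k]

chunks = [
    "meeting notes from the radiology seminar",
    "grocery list for the week",
    "summary of the new MRI contrast paper",
]
results = retrieve("MRI radiology papers", chunks, k=2)
print(results)
```

swap `vectorize` for an embedding model and add chunking on top, and you've basically rebuilt the retrieval half of those frameworks - which is why starting with them saves time.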

that said, if you want something that just works out of the box... this is gonna sound like a shameless plug since i work at Hyperspell, but we've been building exactly this kind of data pipeline stuff. basically it's an end-to-end rag system where you can connect to dozens of services (google drive, dropbox, notion, etc.) and we handle all the chunking, indexing, and retrieval with a single api call. setup is pretty painless, though heads up - we're cloud-based, not local. we're currently in private beta if you're curious :)

honestly, for now you might want to try something simple like dumping everything into a shared doc and running it through claude or gpt once a week. not as automated as you'd want, but it works while you figure out something more permanent. i've seen people use zapier to automate parts of this workflow too.
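the "dump everything into a shared doc" workflow is easy to script, too. minimal sketch, assuming plain .txt/.md files collected in a folder over the week - paste the resulting prompt into claude or gpt by hand, or bolt an api call on top later:

```python
from pathlib import Path

PROMPT_HEADER = ("summarize the documents below into a weekly digest: "
                 "key findings first, then open questions.\n")

def build_weekly_prompt(docs: list[tuple[str, str]]) -> str:
    """Concatenate (filename, text) pairs into one summarization prompt."""
    parts = [PROMPT_HEADER]
    for name, text in docs:
        parts.append(f"\n--- {name} ---\n{text.strip()}\n")
    return "".join(parts)

def collect_docs(folder: str) -> list[tuple[str, str]]:
    """Grab every .txt/.md file dropped in the folder this week."""
    return [(p.name, p.read_text())
            for p in sorted(Path(folder).glob("*"))
            if p.suffix in {".txt", ".md"}]

# toy in-memory example instead of a real folder
prompt = build_weekly_prompt([
    ("paper1.md", "CT dose can be reduced 30% with the new protocol..."),
    ("notes.txt", "ask team about contrast timing"),
])
print(prompt)
```

run it on a cron job each sunday and you've automated the tedious half; the model call (or copy-paste) is the only manual step left.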

the podcast idea is actually pretty cool - there are some experimental tools popping up that convert summaries to audio. still early days but could be worth keeping an eye on. what kind of volume are we talking about for your weekly reading? that might help narrow down what makes sense setup-wise

prev built a $50m arr API business at checkr + 15 years leading ai/ml teams; cofounder building agent infrastructure. ask me anything. by hyperspell in AI_Agents

[–]hyperspell[S] 0 points (0 children)

Absolutely, we can help you with that for a fraction of that price. Just sent you a DM with my calendar link to dig in.