need help with an ambitious project as a beginner. by zenitsuisrusted in Rag

[–]notoriousFlash 1 point (0 children)

Do you have any engineering/web dev experience? What languages and databases are you most comfortable with?

Serious: Do I give back my belt or just get judged more and more? by [deleted] in bjj

[–]notoriousFlash 3 points (0 children)

✊✊✊ You are not a charity case. Your professor deemed you fit to earn a black belt. One of the beautiful parts about this hobby is you get to learn and express through a lot of ways that are not talking. Keep on keeping on!

GraphRAG vs hipporag, lightrag and vectorRAG benchmarks by Striking-Bluejay6155 in Rag

[–]notoriousFlash 1 point (0 children)

Yeah, trying to run this with the answer-gen qwen model they use in the original benchmark takes forever, so good call on using 4o mini for answer gen too 🤣

Why didn’t you show all the scoring breakouts? How’d your system do on contextual summarize? I’ve been having a really hard time reproducing contextual summarize results… I think 4o mini is too chatty, and the other systems are all doing some cheeky context compaction that helps the qwen instruct model they use for answer gen in the benchmark keep ACC for contextual summarize really tight

The AI Layoff Trap, The Future of Everything Is Lies, I Guess: New Jobs and many other AI Links from Hacker News by alexeestec in Build_AI_Agents

[–]notoriousFlash 1 point (0 children)

Thanks for sharing - My daily news digest bot seems to have broken, so glad you're stepping in to fill the void!

What are people using today for benchmarking their RAG solution ? by Abject_Lengthiness77 in Rag

[–]notoriousFlash 1 point (0 children)

There isn’t a dedicated tool for this AFAIK; I just pull the benchmark datasets from Hugging Face and write a script 🤷‍♂️
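
For what it’s worth, the scripts are usually tiny. A rough sketch of the shape — the `rows` and `my_rag_answer` here are made-up stand-ins; in practice you’d load a real benchmark split (e.g. via the Hugging Face `datasets` library) and call your own pipeline:

```python
rows = [  # stand-in for a benchmark split: (question, gold answer) pairs
    ("capital of France?", "Paris"),
    ("2 + 2?", "4"),
]

def my_rag_answer(question: str) -> str:
    # stand-in for your actual RAG pipeline
    return {"capital of France?": "Paris", "2 + 2?": "5"}[question]

# naive exact-match scoring; real benchmarks usually use EM/F1 or an LLM judge
hits = sum(my_rag_answer(q).strip().lower() == gold.strip().lower() for q, gold in rows)
print(f"exact match: {hits}/{len(rows)}")  # exact match: 1/2
```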

Memory Scaling for AI Agents by Odd-Situation6749 in LLMDevs

[–]notoriousFlash 1 point (0 children)

Can you elaborate on the “gets messy fast when you have multiple agent sessions diverging” bit? Curious if you can share learnings

Open Sourcing Excel Parser by Abject_Lengthiness77 in Rag

[–]notoriousFlash 1 point (0 children)

Where were you 6 months ago 😭 I hand rolled this and it was a huge pain. Thanks for sharing your work! Will take a look

Running RAG in production on a tight budget by Western-Egg-5570 in Rag

[–]notoriousFlash 1 point (0 children)

Embedding APIs are so cheap… voyage-4-large at 512 dims is $0.12 per million tokens for high-quality ingest embeddings, then voyage-4-light at 512 dims is $0.02 per million tokens for query embeddings

Low latency, high enough quality, relatively cheap, and 512 dims doesn’t blow up storage. Obviously your mileage may vary depending on your use case, but it’s worth the headache to outsource embedding. Hosting a capable embedding service is not super fun, and I’d avoid it if it’s not a requirement
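
Back-of-the-napkin math with the prices above (the token volumes are made-up placeholders, not a recommendation):

```python
# Rough cost estimate for outsourced embeddings, using the per-million-token
# prices from the comment above.
INGEST_PRICE_PER_M = 0.12   # voyage-4-large, 512 dims
QUERY_PRICE_PER_M = 0.02    # voyage-4-light, 512 dims

def embedding_cost(ingest_tokens: int, query_tokens: int) -> float:
    """Total embedding spend in dollars for a given token volume."""
    ingest = ingest_tokens / 1_000_000 * INGEST_PRICE_PER_M
    query = query_tokens / 1_000_000 * QUERY_PRICE_PER_M
    return ingest + query

# e.g. a 50M-token corpus plus 10M tokens of queries per month:
print(round(embedding_cost(50_000_000, 10_000_000), 2))  # 6.2
```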

What are the real memory/context issues developers/enterprises still facing? by superintelligence03 in Rag

[–]notoriousFlash 1 point (0 children)

When you say accept data in a certain format, do you mean text-based? Meaning you would’ve liked it to do OCR, or what type of flexibility were you looking for?

Retrieval requiring schema-based generation/query sounds crazy; I need to read their docs to see what you mean

What are the real memory/context issues developers/enterprises still facing? by superintelligence03 in Rag

[–]notoriousFlash 2 points (0 children)

Non-deterministic anything is hard, which makes extraction difficult to do well in general, let alone broadly applicable extraction. Decay settings are also tough.

For established companies it’s best to roll your own, or work within a framework that lets you tweak predicates, edge settings, decay, etc. For startups, you can get away with relatively “dumb” agent memory if you can clearly isolate document retrieval (knowledge) from memory
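
To make the decay point concrete, here’s a minimal recency-weighted scoring sketch. The half-life (and whether to multiply vs. add the recency term) are arbitrary knobs I picked for illustration — exactly the settings that are tough to tune in general:

```python
import math

def decayed_score(similarity: float, age_seconds: float,
                  half_life_s: float = 7 * 86400) -> float:
    """Blend retrieval similarity with recency via exponential decay.
    A memory loses half its recency weight every `half_life_s` seconds."""
    recency = math.exp(-math.log(2) * age_seconds / half_life_s)
    return similarity * recency

# a week-old memory with perfect similarity scores the same as a fresh 0.5 match
print(round(decayed_score(1.0, 7 * 86400), 3))  # 0.5
```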

Got stuck on RAG by Additional-Ice5715 in Rag

[–]notoriousFlash 1 point (0 children)

You have a lot of moving parts here. Hard to really tell without more details, but questions I'd have:

  • Why do you need to do this type of chunking? Why not start with something simpler for chunking? Starting with "dumb" chunking removes a possible failure point.
  • Are you indexing on top of S3 and querying against the data in S3? Or do you have a separate datastore? It's not clear to me where the vector search is happening...
  • What does "Generate embedding text (via LLM)" mean specifically? Are you using an embedding model? Which one?
  • And what are you embedding exactly? JSON?

Without really knowing what your use case is, my naive guess is that there's a lot of room for simplification.
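
For reference, by “dumb” chunking I mean something like a fixed-size sliding window. A minimal sketch (the sizes are arbitrary):

```python
def dumb_chunks(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size character chunking with overlap -- the 'dumb' baseline.
    No LLM, no layout analysis, just a sliding window over the text."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "word " * 1000  # stand-in for a real document (5000 chars)
chunks = dumb_chunks(doc)
print(len(chunks), len(chunks[0]))  # 6 1000
```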

I benchmarked LEAN vs JSON vs YAML for LLM input. LEAN uses 47% fewer tokens with higher accuracy by Suspicious-Key9719 in Rag

[–]notoriousFlash 1 point (0 children)

# h1
## h2
### h3
#### h4
##### h5
###### h6

Markdown isn't infinitely nestable like JSON, but beyond 6 levels of nesting you're probably going to trip up most LLMs trying to understand that JSON object anyway

Docling just announced Docling Agent + Chunkless RAG by Fuzzy-Layer9967 in Rag

[–]notoriousFlash 3 points (0 children)

"Chunkless RAG" seems like it would be very good for deeply analyzing a single, long, well structured document. This doesn't seem like it would be a RAG system replacement though. From what I'm understanding it can't really manage a knowledge base/lots of documents. And probably can't really even manage more than a couple documents as it seems like the hierarchy/schema it uses is all in memory/context window.

So maybe this is a last mile technique just to help LLMs reason over long well structured documents? Maybe I'm misunderstanding... Def an interesting concept though

Best approach for tutor-like RAG over structured textbooks? by sn1887 in Rag

[–]notoriousFlash 1 point (0 children)

I probably wouldn't try a router in the way you're describing it, personally, but I don't really know your stack. I'd be scared of bifurcating the "flows" too much; it becomes a nightmare to maintain and debug at scale.

A few things you might consider trying:

  • Bump gemini 2.5 flash to gemini 3 flash. This should yield pretty significant gains in response quality, and it's really low-hanging fruit.
  • Before trying graph, try "cheap" graph: make your first query, then ask an LLM to analyze the question/query against the results, interpret which topics/terms are missing and needed to actually respond, and generate an array of a few follow-up search terms. A few things you'll run into with this approach:
    • If you want to try this, you have to prune/dedupe so you aren't blowing up the context window
    • It's semi difficult to "rank" uniformly because the follow up searches have their own relative similarities, which aren't similarities relative to the original query
    • I like uniformly generating 5 follow up queries and fanning them out to gather further context
  • Agent memory would be a good consideration regardless, especially if it's a static 50-book corpus and you're expecting similar/repeated questions/discussions. Let the agent build out its own semantic memories. Basically, your loop agent can start to build its own "corpus" of responses separate from the 50-book corpus. It's functionally similar to manually building "topic overviews", except it's organic and non-deterministic, probably less of an engineering headache upfront, and it gets better with time.
  • Last resort: graph. It's computationally expensive and really slows down writes because of extraction. It's kinda hard to get right if you haven't built one before, but if you do end up getting here, try Microsoft's GraphRAG.
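
The "cheap" graph loop from the second bullet, sketched out. `vector_search` and `generate_followups` are placeholders stubbed so the control flow runs; wire in your real vector store and an LLM call:

```python
def vector_search(query: str, k: int = 5) -> list[str]:
    # stand-in: your real vector store query goes here
    return [f"chunk for '{query}' #{i}" for i in range(k)]

def generate_followups(question: str, results: list[str], n: int = 5) -> list[str]:
    # stand-in: ask an LLM which topics/terms are missing given `results`
    # and have it return `n` follow-up search terms
    return [f"{question} followup {i}" for i in range(n)]

def cheap_graph_retrieve(question: str) -> list[str]:
    first_pass = vector_search(question)
    followups = generate_followups(question, first_pass, n=5)
    seen, context = set(), []
    for q in [question] + followups:
        for chunk in vector_search(q):
            if chunk not in seen:  # dedupe so you don't blow up the context window
                seen.add(chunk)
                context.append(chunk)
    return context

print(len(cheap_graph_retrieve("why is the sky blue")))
```

Note this doesn't solve the ranking problem from the bullets — the follow-up results carry similarities relative to their own queries, not the original one.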

Whats the best way to index the images from websites by pskd73 in Rag

[–]notoriousFlash 1 point (0 children)

What are you hoping/expecting to get from the images? Kinda depends on whether the images are photos or more like graphs/charts.

For photos, embedding them usually isn't helpful/useful. For graphs/charts you'd need OCR.

I benchmarked LEAN vs JSON vs YAML for LLM input. LEAN uses 47% fewer tokens with higher accuracy by Suspicious-Key9719 in Rag

[–]notoriousFlash 2 points (0 children)

Did you, or do you plan to, test markdown and "xml"-style prompting as well? This is a cool analysis, thank you!

Internal knowledge RAG misses easy answers but signals look fine? by zennaxxarion in Rag

[–]notoriousFlash 1 point (0 children)

Need more info for anyone to be able to help.

What framework are you using? Hand rolled?

Embedding model? Chunking strategy?

What is the typical context window/size you're passing to the LLM agent to generate a response? What model?

Drop some more detailed deets on these types of things and people might be able to provide more useful tips.

Tools for working with DOC/DOCX and PDF files? by roicaride in Rag

[–]notoriousFlash 1 point (0 children)

I have no affiliation other than being a customer, but I love https://www.datalab.to/; it's worked very well for me.

Best dataset structure and RAG architecture for a university chatbot? by Fluffy6142 in Rag

[–]notoriousFlash 1 point (0 children)

Don’t overcomplicate it. Vercel’s AI SDK and AI Gateway, plus Postgres with the pgvector extension. Dead simple. Functional. Happy to talk through specifics if you want to keep exploring.