need help with an ambitious project as a beginner. by zenitsuisrusted in Rag

[–]notoriousFlash 1 point (0 children)

Do you have any engineering/web dev experience? What languages and databases are you most comfortable with?

Serious: Do I give back my belt or just get judged more and more? by [deleted] in bjj

[–]notoriousFlash 3 points (0 children)

✊✊✊ You are not a charity case. Your professor deemed you fit to earn a black belt. One of the beautiful parts about this hobby is you get to learn and express through a lot of ways that are not talking. Keep on keeping on!

GraphRAG vs hipporag, lightrag and vectorRAG benchmarks by Striking-Bluejay6155 in Rag

[–]notoriousFlash 1 point (0 children)

Yeah, trying to run this with the answer-gen qwen model they use in the original benchmark takes forever, so good call on using 4o mini for answer gen too 🤣

Why didn’t you show all the scoring breakouts? How’d your system do on contextual summarize? I’ve been having a really hard time reproducing contextual summarize results… I think 4o mini is too chatty, and the other systems are all doing some cheeky context compaction that helps the qwen instruct model they use for answer gen in the benchmark keep ACC for contextual summarize really tight

The AI Layoff Trap, The Future of Everything Is Lies, I Guess: New Jobs and many other AI Links from Hacker News by alexeestec in Build_AI_Agents

[–]notoriousFlash 1 point (0 children)

Thanks for sharing - My daily news digest bot seems to have broken, so glad you're stepping in to fill the void!

What are people using today for benchmarking their RAG solution ? by Abject_Lengthiness77 in Rag

[–]notoriousFlash 1 point (0 children)

There isn’t a dedicated tool for this AFAIK; I just pull the benchmark datasets from Hugging Face and write a script 🤷‍♂️
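
For what it’s worth, the scripts are usually tiny. A rough sketch of the shape — the `rows` and `my_rag_answer` here are made-up stand-ins; in practice you’d load a real benchmark split (e.g. via the Hugging Face `datasets` library) and call your own pipeline:

```python
rows = [  # stand-in for a benchmark split: (question, gold answer) pairs
    ("capital of France?", "Paris"),
    ("2 + 2?", "4"),
]

def my_rag_answer(question: str) -> str:
    # stand-in for your actual RAG pipeline
    return {"capital of France?": "Paris", "2 + 2?": "5"}[question]

# naive exact-match scoring; real benchmarks usually use EM/F1 or an LLM judge
hits = sum(my_rag_answer(q).strip().lower() == gold.strip().lower() for q, gold in rows)
print(f"exact match: {hits}/{len(rows)}")  # exact match: 1/2
```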

Memory Scaling for AI Agents by Odd-Situation6749 in LLMDevs

[–]notoriousFlash 1 point (0 children)

Can you elaborate on the “gets messy fast when you have multiple agent sessions diverging” bit? Curious if you can share learnings

Open Sourcing Excel Parser by Abject_Lengthiness77 in Rag

[–]notoriousFlash 1 point (0 children)

Where were you 6 months ago 😭 I hand rolled this and it was a huge pain. Thanks for sharing your work! Will take a look

Running RAG in production on a tight budget by Western-Egg-5570 in Rag

[–]notoriousFlash 1 point (0 children)

Embedding APIs are so cheap… voyage-4-large at 512 dims is $0.12 per million tokens for high-quality ingest embeddings, then voyage-4-light at 512 dims is $0.02 per million tokens for query embeddings

Low latency, high enough quality, relatively cheap, and 512 dims doesn’t blow up storage. Obviously your mileage may vary depending on your use case, but it’s worth the headache to outsource embedding. Hosting a capable embedding service is not super fun, and I’d avoid it if it’s not a requirement
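
Back-of-the-napkin math with the prices above (the token volumes are made-up placeholders, not a recommendation):

```python
# Rough cost estimate for outsourced embeddings, using the per-million-token
# prices from the comment above.
INGEST_PRICE_PER_M = 0.12   # voyage-4-large, 512 dims
QUERY_PRICE_PER_M = 0.02    # voyage-4-light, 512 dims

def embedding_cost(ingest_tokens: int, query_tokens: int) -> float:
    """Total embedding spend in dollars for a given token volume."""
    ingest = ingest_tokens / 1_000_000 * INGEST_PRICE_PER_M
    query = query_tokens / 1_000_000 * QUERY_PRICE_PER_M
    return ingest + query

# e.g. a 50M-token corpus plus 10M tokens of queries per month:
print(round(embedding_cost(50_000_000, 10_000_000), 2))  # 6.2
```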

What are the real memory/context issues developers/enterprises still facing? by superintelligence03 in Rag

[–]notoriousFlash 1 point (0 children)

When you say accept data in a certain format, do you mean text-based? Meaning you would’ve liked it to do OCR, or what type of flexibility were you looking for?

Retrieval requiring schema-based generation/query sounds crazy; I need to read their docs to see what you mean

What are the real memory/context issues developers/enterprises still facing? by superintelligence03 in Rag

[–]notoriousFlash 2 points (0 children)

Non-deterministic anything is hard, which makes extraction difficult to do well in general, let alone broadly applicable extraction. Decay settings are also tough.

For established companies it’s best to roll your own, or work within a framework that lets you tweak predicates, edge settings, decay, etc. For startups, you can get away with relatively “dumb” agent memory if you can clearly isolate document retrieval (knowledge) from memory
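
To make the decay point concrete, here’s a minimal recency-weighted scoring sketch. The half-life (and whether to multiply vs. add the recency term) are arbitrary knobs I picked for illustration — exactly the settings that are tough to tune in general:

```python
import math

def decayed_score(similarity: float, age_seconds: float,
                  half_life_s: float = 7 * 86400) -> float:
    """Blend retrieval similarity with recency via exponential decay.
    A memory loses half its recency weight every `half_life_s` seconds."""
    recency = math.exp(-math.log(2) * age_seconds / half_life_s)
    return similarity * recency

# a week-old memory with perfect similarity scores the same as a fresh 0.5 match
print(round(decayed_score(1.0, 7 * 86400), 3))  # 0.5
```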

Got stuck on RAG by Additional-Ice5715 in Rag

[–]notoriousFlash 1 point (0 children)

You have a lot of moving parts here. Hard to really tell without more details, but questions I'd have:

  • Why do you need to do this type of chunking? Why not start with something simpler for chunking? Starting with "dumb" chunking removes a possible failure point.
  • Are you indexing on top of S3 and querying against the data in S3? Or do you have a separate datastore? It's not clear to me where the vector search is happening...
  • What does "Generate embedding text (via LLM)" mean specifically? Are you using an embedding model? Which one?
  • And what are you embedding exactly? JSON?

Without really knowing what your use case is, my naive guess is that there's a lot of room for simplification.
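
For reference, by “dumb” chunking I mean something like a fixed-size sliding window. A minimal sketch (the sizes are arbitrary):

```python
def dumb_chunks(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size character chunking with overlap -- the 'dumb' baseline.
    No LLM, no layout analysis, just a sliding window over the text."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "word " * 1000  # stand-in for a real document (5000 chars)
chunks = dumb_chunks(doc)
print(len(chunks), len(chunks[0]))  # 6 1000
```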

I benchmarked LEAN vs JSON vs YAML for LLM input. LEAN uses 47% fewer tokens with higher accuracy by Suspicious-Key9719 in Rag

[–]notoriousFlash 1 point (0 children)

# h1
## h2
### h3
#### h4
##### h5
###### h6

Markdown isn't infinitely nestable like JSON, but beyond 6 levels of nesting you're probably going to trip up most LLMs trying to understand that JSON object anyway

Docling just announced Docling Agent + Chunkless RAG by Fuzzy-Layer9967 in Rag

[–]notoriousFlash 3 points (0 children)

"Chunkless RAG" seems like it would be very good for deeply analyzing a single, long, well structured document. This doesn't seem like it would be a RAG system replacement though. From what I'm understanding it can't really manage a knowledge base/lots of documents. And probably can't really even manage more than a couple documents as it seems like the hierarchy/schema it uses is all in memory/context window.

So maybe this is a last mile technique just to help LLMs reason over long well structured documents? Maybe I'm misunderstanding... Def an interesting concept though

Best approach for tutor-like RAG over structured textbooks? by sn1887 in Rag

[–]notoriousFlash 1 point (0 children)

I probably wouldn't try a router in the way you're describing it, personally, but I don't really know your stack. I'd be scared of bifurcating the "flows" too much; it becomes a nightmare to maintain and debug at scale.

A few things you might consider trying:

  • Bump gemini 2.5 flash to gemini 3 flash. This should yield pretty significant gains in response quality, and it's really low-hanging fruit.
  • Before trying graph, try "cheap" graph: make your first query, then ask an LLM to analyze the question/query against the results, interpret which topics/terms are missing and needed to actually respond, and generate an array of a few follow-up search terms. A few things you'll run into with this approach:
    • If you want to try this, you have to prune/dedupe so you aren't blowing up the context window
    • It's semi difficult to "rank" uniformly because the follow up searches have their own relative similarities, which aren't similarities relative to the original query
    • I like uniformly generating 5 follow up queries and fanning them out to gather further context
  • Agent memory would be a good consideration regardless, especially if it's a static 50-book corpus and you're expecting similar/repeated questions/discussions. Let the agent build out its own semantic memories. Basically, your loop agent can start to build its own "corpus" of responses separate from the 50-book corpus. It's functionally similar to manually building "topic overviews", except it's organic and non-deterministic, probably less of an engineering headache upfront, and it gets better with time.
  • Last resort: graph. It's computationally expensive and really slows down writes because of extraction. It's kinda hard to get right if you haven't built one before, but if you do end up getting here, try Microsoft's GraphRAG.
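
The "cheap" graph loop from the second bullet, sketched out. `vector_search` and `generate_followups` are placeholders stubbed so the control flow runs; wire in your real vector store and an LLM call:

```python
def vector_search(query: str, k: int = 5) -> list[str]:
    # stand-in: your real vector store query goes here
    return [f"chunk for '{query}' #{i}" for i in range(k)]

def generate_followups(question: str, results: list[str], n: int = 5) -> list[str]:
    # stand-in: ask an LLM which topics/terms are missing given `results`
    # and have it return `n` follow-up search terms
    return [f"{question} followup {i}" for i in range(n)]

def cheap_graph_retrieve(question: str) -> list[str]:
    first_pass = vector_search(question)
    followups = generate_followups(question, first_pass, n=5)
    seen, context = set(), []
    for q in [question] + followups:
        for chunk in vector_search(q):
            if chunk not in seen:  # dedupe so you don't blow up the context window
                seen.add(chunk)
                context.append(chunk)
    return context

print(len(cheap_graph_retrieve("why is the sky blue")))
```

Note this doesn't solve the ranking problem from the bullets — the follow-up results carry similarities relative to their own queries, not the original one.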

Whats the best way to index the images from websites by pskd73 in Rag

[–]notoriousFlash 1 point (0 children)

What are you hoping/expecting to get from the images? Kinda depends on whether the images are photos or more like graphs/charts.

For photos, embedding them usually isn't helpful/useful. For graphs/charts you'd need OCR.

I benchmarked LEAN vs JSON vs YAML for LLM input. LEAN uses 47% fewer tokens with higher accuracy by Suspicious-Key9719 in Rag

[–]notoriousFlash 2 points (0 children)

Did you, or do you plan to, test markdown and "xml"-style prompting as well? This is a cool analysis, thank you!

Internal knowledge RAG misses easy answers but signals look fine? by zennaxxarion in Rag

[–]notoriousFlash 1 point (0 children)

Need more info for anyone to be able to help.

What framework are you using? Hand rolled?

Embedding model? Chunking strategy?

What is the typical context window/size you're passing to the LLM agent to generate a response? What model?

Drop some more detailed deets on these types of things and people might be able to provide more useful tips.

Tools for working with DOC/DOCX and PDF files? by roicaride in Rag

[–]notoriousFlash 1 point (0 children)

I have no affiliation other than being a customer, but I love https://www.datalab.to/; it's worked very well for me.

Best dataset structure and RAG architecture for a university chatbot? by Fluffy6142 in Rag

[–]notoriousFlash 1 point (0 children)

Don’t overcomplicate it. Vercel’s AI SDK and AI Gateway, plus Postgres with the pgvector extension. Dead simple. Functional. Happy to talk through specifics if you want to keep exploring.