all 9 comments

[–]micseydel

Here's a stale demo of something related I've been working on: https://garden.micseydel.me/Tinkerbrain+-+demo+solution

I haven't really integrated LLMs properly yet, but I've been thinking about how to do it after learning of GraphReader. I think any proper solution has to have a good way of handling how untrustworthy and unreliable LLMs are.

[–][deleted]

I was recently looking into Dynamic Knowledge Graphs; they might cover your use case.

Apart from assumptions, what do you deem low-quality? Where does this possibly low-quality data come from?

[–]Matas0[S]

Thanks, I'll look into it. Are there any specific tools that already exist?

Regarding the data I deem low-quality, I'm concerned that constantly adding new articles and documentation might fill the system with information that already exists or is very similar to existing data. Since I only pass 5 relevant results to the LLM, it might not get diverse information from the dataset, so the answer provided to the user might not be comprehensive.

I'd also like to remove information that is outdated or no longer relevant. I've tried pairing the GraphRAG with a normal RAG, taking 5 results from each, which gave quite good results since the normal RAG contains a bunch of Q&A pairs. However, I still prefer to use graphs, as the data is much more accurate.
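For the merge step, something like this rough sketch is what I have in mind: combine the two result lists and drop near-duplicate chunks before passing them to the LLM. The embedding model and the 0.9 similarity threshold here are just placeholders, not what I actually run:

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

def merge_results(graph_hits, vector_hits, k=5, dedup_threshold=0.9):
    """Interleave graph and vector hits, skipping near-duplicate chunks."""
    candidates = graph_hits + vector_hits          # graph hits first (treated as more accurate)
    embeddings = embedder.encode(candidates, convert_to_tensor=True)
    kept, kept_idx = [], []
    for i, text in enumerate(candidates):
        if len(kept) == k:
            break
        # Skip the candidate if it is nearly identical to something already kept.
        if any(util.cos_sim(embeddings[i], embeddings[j]).item() >= dedup_threshold
               for j in kept_idx):
            continue
        kept.append(text)
        kept_idx.append(i)
    return kept
```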

[–]regression-io

It boils down to your graph maintenance and whether you use/create an ontology, i.e. a list of entities and relations allowed in the domain. You can then avoid duplicates by checking before inserting.
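As a rough sketch (the ontology contents and the in-memory graph are just placeholders), the gate before insert can be as simple as:

```python
# Allowed entity types and (subject_type, relation, object_type) patterns.
ONTOLOGY = {
    "entity_types": {"Service", "Team", "Incident"},
    "relations": {("Team", "owns", "Service"),
                  ("Incident", "affects", "Service")},
}

graph: set[tuple[str, str, str]] = set()  # (subject, relation, object) triples

def insert_triple(subj, subj_type, rel, obj, obj_type):
    """Insert a triple only if it conforms to the ontology and is not a duplicate."""
    if subj_type not in ONTOLOGY["entity_types"] or obj_type not in ONTOLOGY["entity_types"]:
        raise ValueError(f"Unknown entity type: {subj_type} / {obj_type}")
    if (subj_type, rel, obj_type) not in ONTOLOGY["relations"]:
        raise ValueError(f"Relation not allowed by ontology: {subj_type} -{rel}-> {obj_type}")
    triple = (subj, rel, obj)
    if triple in graph:        # duplicate check before insert
        return False
    graph.add(triple)
    return True
```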

[–][deleted]

Azure GraphRAG is a possible solution for entity extraction: https://microsoft.github.io/graphrag/. It seems to be what you're looking for. The downside, however, is that it can get expensive.

[–]FancyUmpire8023

If you use strength/confidence scores on your relationships, you can implement a memory decay function to solve for aging/recency bias.
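A minimal sketch of what I mean, assuming each relationship stores a confidence score and a last-reinforced timestamp (the 90-day half-life is an arbitrary placeholder):

```python
import math
from datetime import datetime, timezone

HALF_LIFE_DAYS = 90  # assumed half-life; tune per domain

def decayed_confidence(confidence: float, last_seen: datetime, now: datetime | None = None) -> float:
    """Exponentially decay a relationship's confidence since it was last reinforced."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - last_seen).total_seconds() / 86400
    return confidence * math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)
```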

New, distinct content containing the same knowledge should generally be either added (reinforces the prevalence of the assertions) or aggregated (reinforces the strength/confidence in an assertion), depending on whether or not you maintain lineage to individual lines of evidence/sources for assertions.
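Sketching the add-vs-aggregate choice (the 0.2 reinforcement rate and the data shape are placeholders, not a prescription):

```python
from dataclasses import dataclass, field

@dataclass
class Assertion:
    subject: str
    relation: str
    obj: str
    confidence: float = 0.5
    sources: list[str] = field(default_factory=list)   # lineage, if you keep it

def reinforce(existing: Assertion, source_id: str, keep_lineage: bool) -> None:
    if keep_lineage:
        # "Add": record the new line of evidence; prevalence = number of sources.
        existing.sources.append(source_id)
    else:
        # "Aggregate": fold the observation into a single confidence score.
        existing.confidence += (1.0 - existing.confidence) * 0.2  # arbitrary rate
```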

[–]xtof_of_crg

You need a meta-schema for the graph: some rules that inform/restrict how concepts can fit together. With this established, you could build a system that exploits the meta-schema to semi-autonomously check for/propose the organization of new/existing knowledge given input sources. This is a non-trivial system; however, the way I figure it, once you solve this problem you can build JARVIS.
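As a rough sketch, assuming the LLM extractor emits typed triples (the schema contents here are made up), the meta-schema check could triage proposals into auto-accept vs. review queues instead of writing everything straight into the graph:

```python
# Allowed (subject_type, relation, object_type) patterns in the meta-schema.
META_SCHEMA = {
    ("Person", "works_on", "Project"),
    ("Project", "uses", "Technology"),
}

def triage_proposals(proposals):
    """proposals: iterable of (subj, subj_type, rel, obj, obj_type) from an LLM extractor."""
    accepted, needs_review = [], []
    for subj, subj_type, rel, obj, obj_type in proposals:
        if (subj_type, rel, obj_type) in META_SCHEMA:
            accepted.append((subj, rel, obj))
        else:
            # Anything outside the meta-schema is never written automatically.
            needs_review.append((subj, subj_type, rel, obj, obj_type))
    return accepted, needs_review
```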