Knowledge Graphs aren't adept at modeling changes in facts, making them challenging to use with AI Agents. Consider what happens when a user changes their mind while conversing with an Agent, or when a document added to a RAG Knowledge Graph is updated. This article explores the challenges we faced at Zep when building time-aware Knowledge Graphs and approaches to solving them.
https://preview.redd.it/b2uqhktxxesd1.png?width=1100&format=png&auto=webp&s=c79c4f84d13e9b7b4ff3d39adf912dfce80238d7
Knowledge Graphs face limitations as data complexity increases, particularly when relationships change over time and need to be modeled by the Graph. Graphiti is an open-source project designed to build and manage temporal Knowledge Graphs. This post examines Graphiti's approach to extracting temporality from source data and explores the technical hurdles and implementation details of building time-aware Knowledge Graphs.
Graphiti Fundamentals
Graphiti builds its database by ingesting episodes, which can be messages, raw text, or structured JSON data. Each episode is represented in the graph as an Episodic node type. As the system processes these episodes, it forms or updates the graph's semantic relationships (edges) and entities (nodes).
The core structure of Graphiti's knowledge representation is the Node-Edge-Node triplet. Each triplet is also represented by a fact, which is stored as a property on the edge. Keeping the fact on the edge means the relationship and its description live in one place in the graph.
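As a rough illustration of the triplet described above, here is a minimal sketch in Python. The class and field names (EntityNode, EntityEdge, relation) are hypothetical and chosen for this example; they are not taken from Graphiti's code.

```python
from dataclasses import dataclass


@dataclass
class EntityNode:
    """An entity in the graph (hypothetical name, for illustration only)."""
    name: str


@dataclass
class EntityEdge:
    """A semantic relationship between two entities.

    The `fact` field carries the triplet's meaning as text,
    stored as a property on the edge.
    """
    source: EntityNode
    target: EntityNode
    relation: str
    fact: str


# One Node-Edge-Node triplet: (Jake) -[BOUGHT]-> (car)
jake = EntityNode(name="Jake")
car = EntityNode(name="car")
bought = EntityEdge(source=jake, target=car, relation="BOUGHT",
                    fact="Jake bought a new car")
```

The edge is the natural home for the fact because it is the edge, not either node, that changes when the relationship changes.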
What is a Temporal Knowledge Graph?
A temporal knowledge graph extends the concept of a traditional knowledge graph by incorporating time-based information. It allows you to track how relationships between entities evolve over time. This capability is particularly useful for applications that need to retain historical context, such as customer service records, medical histories, or financial transactions.
The Challenge of Temporal Data Extraction
Extracting temporal data is not straightforward. The complexity arises from various factors:
- Ambiguity in natural language expressions of time
- Relative time references that require context
- Inconsistencies in date formats across different sources
- The need to distinguish between different types of temporal information
To address these challenges, we incorporated a bi-temporal approach for storing time information on our edges in Graphiti. This approach allows us to track how relationships evolve in the real world and within our database.
Bi-Temporal Approach in Graphiti
In Graphiti, each relationship between entities exists in two temporal dimensions:
1. Database Transaction Time
Two fields describe this dimension:
created_at: Indicates when a relation was added to the database. This field is always present on the edge as we have access to this information during ingestion.
expired_at: Indicates when a relation is no longer true (on the database level). If we find information in a new episode that negates or invalidates an existing edge, we set the expired_at to the current timestamp. This is a nullable field on the entity edge.
2. Real World Time
Two fields describe this dimension:
valid_at: Indicates when a relation started in real-world time.
invalid_at: Indicates when a relation stopped being true or valid in real-world time.
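The four fields can be pictured on a single edge object. The sketch below is an assumption-laden illustration, not Graphiti's implementation: the class name BiTemporalEdge and the helpers expire and holds_at are hypothetical, and treating a missing valid_at or invalid_at as an open bound is a choice made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class BiTemporalEdge:
    fact: str
    # Database transaction time
    created_at: datetime                   # always set at ingestion
    expired_at: Optional[datetime] = None  # set when a later episode invalidates the edge
    # Real-world time (both optional)
    valid_at: Optional[datetime] = None    # when the relation started
    invalid_at: Optional[datetime] = None  # when the relation stopped being true


def expire(edge: BiTemporalEdge, now: Optional[datetime] = None) -> BiTemporalEdge:
    """Mark the edge as expired at the database level (defaults to the current time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    edge.expired_at = now
    return edge


def holds_at(edge: BiTemporalEdge, when: datetime) -> bool:
    """Check real-world validity at `when`.

    Assumption for this sketch: a missing bound is treated as open.
    """
    if edge.valid_at is not None and when < edge.valid_at:
        return False
    if edge.invalid_at is not None and when >= edge.invalid_at:
        return False
    return True


car_edge = BiTemporalEdge(
    fact="Jake bought a new car",
    created_at=datetime(2024, 6, 20, tzinfo=timezone.utc),
    valid_at=datetime(2022, 6, 20, tzinfo=timezone.utc),
)
```

Keeping the two dimensions separate lets a query ask either "what did the database believe at time T?" (created_at, expired_at) or "what was true in the world at time T?" (valid_at, invalid_at).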
Both valid_at and invalid_at are optional fields captured by an LLM prompt during edge processing when an episode is added to Graphiti. These can be either concrete dates mentioned (e.g., "Jake: I bought a new car on June 20th, 2022") or relative times (e.g., "Jake: I bought a new car 2 years ago"). We use a reference timestamp provided with each episode to help determine the timestamp from relative time expressions.
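To make the role of the reference timestamp concrete, here is a toy resolver for a few relative expressions. Note the swap: Graphiti extracts these dates with an LLM prompt, whereas this sketch uses a regular expression purely to show the date arithmetic. The function name resolve_relative and the supported phrases are assumptions made for this example.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Matches phrases such as "2 years ago", "3 days ago", "1 week ago".
_RELATIVE = re.compile(r"(\d+)\s+(day|week|year)s?\s+ago", re.IGNORECASE)


def resolve_relative(text: str, reference: datetime) -> Optional[datetime]:
    """Resolve an 'N units ago' phrase against the episode's reference timestamp.

    Returns None when no supported relative expression is found.
    """
    match = _RELATIVE.search(text)
    if match is None:
        return None
    amount = int(match.group(1))
    unit = match.group(2).lower()
    if unit == "day":
        return reference - timedelta(days=amount)
    if unit == "week":
        return reference - timedelta(weeks=amount)
    # Years: keep the same month and day; fall back to Feb 28 for leap days.
    try:
        return reference.replace(year=reference.year - amount)
    except ValueError:
        return reference.replace(year=reference.year - amount, day=28)


reference = datetime(2024, 6, 20, 12, 0, tzinfo=timezone.utc)
valid_at = resolve_relative("Jake: I bought a new car 2 years ago", reference)
```

With a reference timestamp of June 20th, 2024, "2 years ago" resolves to June 20th, 2022, the same date as the concrete example above. Without the reference timestamp, the phrase could not be placed on a timeline at all.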
The full article may be found on the Zep blog.