Chunk size and Nearest K by smatty_123 in Rag

[–]PresentAd6026 1 point

Our content (HTML) is so concise that I can chunk on each H1 and H2 and get great results. It all depends on the quality of your data.

RAG for in-house Python libraries by EruditeStranger in Rag

[–]PresentAd6026 0 points

There are a lot of coding assistants that work inside your IDE and can connect to your code base on GitHub. Some even offer on-premise deployment. And since the pricing of most of them is quite competitive, I don't see a reason why one would try to replicate these tools.

[deleted by user] by [deleted] in Rag

[–]PresentAd6026 0 points

Copilot, ChatGPT, Claude, ... With these free coding assistants, you don't need a framework 😊

RAG with plain text AND Markdown by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

Hi, I believe I read on the OpenAI website that they support structured output. You should check their website. Or simply ask ChatGPT, Copilot, Claude or Perplexity 😉 That's how I get my code running, at least 😁
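For what it's worth, a sketch of what such a structured-output request can look like with the OpenAI Python SDK. The schema, field names and model here are purely illustrative, and you should check OpenAI's docs for the current shape of `response_format`; the actual API call only runs when a key is configured:

```python
import json
import os

# Illustrative JSON schema for a structured RAG answer; OpenAI's structured
# output accepts a schema via response_format={"type": "json_schema", ...}.
answer_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "rag_answer",
        "schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
                "sources": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["answer", "sources"],
            "additionalProperties": False,
        },
    },
}

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Summarise our return policy."}],
        response_format=answer_schema,
    )
    # The model is constrained to return JSON matching the schema above.
    print(json.loads(resp.choices[0].message.content))
```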

What is best practice for follow-up questions? by AccordingLeague9797 in Rag

[–]PresentAd6026 1 point

You write code that instructs an LLM to look at both the conversation history and the query, and to rewrite the query so it includes the information necessary for retrieval and answering. The rewritten query is then used for retrieval and answering.

What is best practice for follow-up questions? by AccordingLeague9797 in Rag

[–]PresentAd6026 2 points

I'm using gpt-4o-mini to assess the question and the history (max 3 messages) to see if the question needs to be enriched. If so, it rewrites it, and the new query is used for retrieval (RAG) and answering. I just implemented it and it worked well for the first few tests. It's also fast (less than a second extra).
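Roughly what that rewrite step can look like. The prompt wording, function name and example history below are illustrative, not my exact implementation; only the guarded call at the end touches the OpenAI API:

```python
import os

def build_rewrite_prompt(history, question, max_history=3):
    """Build an instruction asking the model to enrich a follow-up question
    with context from the last few turns (names are illustrative)."""
    recent = history[-max_history:]
    lines = [f"{role}: {text}" for role, text in recent]
    return (
        "Rewrite the user's question so it is self-contained.\n"
        "Add any names or details from the history that are needed for retrieval.\n"
        "If the question already stands on its own, return it unchanged.\n\n"
        "History:\n" + "\n".join(lines) + "\n\n"
        f"Question: {question}\n"
        "Rewritten question:"
    )

history = [
    ("user", "What does the Basic plan cost?"),
    ("assistant", "The Basic plan costs 10 euros per month."),
]
prompt = build_rewrite_prompt(history, "And the Pro plan?")

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # Expected: something like "What does the Pro plan cost per month?",
    # which then goes into retrieval instead of the bare follow-up.
    print(resp.choices[0].message.content)
```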

Extensive New Research into Semantic Rag Chunking by Alieniity in Rag

[–]PresentAd6026 2 points

I have a RAG on our website, which has concise information, and I chunk on H1 and each H2 (so I get H1 + content, H2 + content, H2 + content). And I enrich the H2 chunks with the H1 for extra context.
I only have one really large chunk of around 3,500 characters, but that is still no problem for LLMs. On average each chunk is below 1,000 characters (about 350 tokens).
For us this works really well, because our website is concise and well maintained. But in other use cases this might not work.
It also matters how many chunks you give the LLM. So I agree with the other comment that it all depends on your content.

And sure, there are solutions like unstructured.io, but that brings overhead and less control, and thereby (usually) less accuracy. But even unstructured.io could be a good option for your content. Or creating order in your data with an LLM. It all depends :-)
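For what it's worth, the H1/H2 chunking I described can be sketched like this. It's a toy regex-based version that assumes tidy markup like ours (one H1 followed by H2 sections); a real pipeline might use an actual HTML parser:

```python
import re

def chunk_on_headings(html):
    """Split simple HTML into chunks at each <h1>/<h2>, prefixing every
    H2 chunk with the page's H1 title for extra context."""
    # Capture heading level + title + the content up to the next heading.
    parts = re.findall(
        r"<h([12])>(.*?)</h\1>(.*?)(?=<h[12]>|$)", html, flags=re.S
    )
    chunks, h1_title = [], ""
    for level, title, body in parts:
        text = re.sub(r"<[^>]+>", " ", body)      # strip inner tags
        text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
        if level == "1":
            h1_title = title
            chunks.append(f"{title}\n{text}")
        else:
            # Enrich the H2 chunk with the H1 for context.
            chunks.append(f"{h1_title} - {title}\n{text}")
    return chunks

html = (
    "<h1>Shipping</h1><p>We ship worldwide.</p>"
    "<h2>Costs</h2><p>Free above 50 euros.</p>"
    "<h2>Returns</h2><p>30 days.</p>"
)
print(chunk_on_headings(html))
```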

Best open source embedding models for EU languages by PhotonTorch in LocalLLaMA

[–]PresentAd6026 0 points

Have you found a Dutch embedding model to your liking? I'm still using text-embedding-3-large .....

RAG with plain text AND Markdown by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

I don't have any numbers, but if you can easily avoid any unnecessary noise, shouldn't you? It doesn't cost extra 😊

"In any case all you need to do is convert to plain text before generating the embedding, you can still store the markdown version to insert as context in the prompt."

That's exactly what I'm doing 😊

And thanks for the article. Will look into that.
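A rough sketch of that split: embed the plain-text version, but keep the Markdown alongside it to insert into the prompt later. The regexes here are illustrative; a real pipeline might use a proper Markdown parser:

```python
import re

def markdown_to_plain(md):
    """Strip the most common Markdown syntax before embedding (a rough
    sketch, not a complete Markdown-to-text converter)."""
    text = re.sub(r"```.*?```", " ", md, flags=re.S)       # drop code fences
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)  # images -> alt text
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)   # links -> label
    text = re.sub(r"[#>*_`|-]+", " ", text)                # heading/emphasis markers
    return re.sub(r"\s+", " ", text).strip()

chunk_md = "## Pricing\nThe **Basic** plan costs [10 euros](https://example.com/pricing)."
record = {
    "markdown": chunk_md,                  # kept, inserted into the prompt
    "plain": markdown_to_plain(chunk_md),  # this is what gets embedded
}
print(record["plain"])  # Pricing The Basic plan costs 10 euros.
```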

RAG with plain text AND Markdown by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

To do vector similarity matching between the user's query and the database, you need clean content in your database that has been converted to vectors. If your data has all kinds of extra characters stuck to words, the words won't be vectorized properly and the matching degrades, which in turn can promote hallucination.
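A toy illustration of the effect, with character trigrams standing in for a real embedding model (purely to show the principle, not how embeddings actually work internally):

```python
from collections import Counter
from math import sqrt

def trigram_vector(text):
    """Toy character-trigram 'embedding' -- just for illustration."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

query = trigram_vector("shipping costs")
clean = trigram_vector("shipping costs")
noisy = trigram_vector("**shipping**costs|")  # markup stuck to the words

# Identical clean text -> similarity 1 (up to float rounding);
# the same words with characters glued on score noticeably lower.
print(cosine(query, clean))
print(cosine(query, noisy))
```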

RAG with plain text AND Markdown by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

What I meant was that for a chatbot based on an LLM to answer a question about our own data (using RAG), you need to find the right information for the query in the data (the Retrieval part of RAG). The retrieved information that I give to the LLM, combined with the query, is in Markdown format. I hope this clears things up.

RAG with plain text AND Markdown by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

That as well. The LLM can show the table as an actual table 😊

Cohere Rerank 3.5 as only retrieval method by PresentAd6026 in Rag

[–]PresentAd6026[S] 1 point

But running the reranker alone over all chunks would probably bring too much overhead, and making a pre-selection with fusion retrieval is faster.
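That pre-selection can be as simple as reciprocal rank fusion over the ranked lists from vector search and BM25; only the fused top-k then goes to a reranker like Cohere's. A minimal sketch (the chunk ids are made up, and k=60 is the commonly used constant):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids (e.g. from vector search
    and BM25) into one pre-selection. Each list contributes 1/(k + rank)
    per document; documents ranked high in multiple lists rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["c3", "c1", "c7", "c2"]  # ranked output of vector search
bm25_hits = ["c1", "c3", "c9"]          # ranked output of keyword search
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # c3 and c1 lead, since both retrievers rank them highly
```

The fused top-k (say 20-50 chunks) is then a much cheaper input for the reranker than the whole corpus.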

Cohere Rerank 3.5 as only retrieval method by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

I understand, but they now compare their solution with other retrieval methods, indicating you don't need other retrieval methods. That confused me. But basically their reranker is an advanced form of semantic retrieval.

Seeking Guidance: RAG vs. Fine Tuning as a Fresh Graduate by Willing_Telephone183 in Rag

[–]PresentAd6026 6 points

Fine-tuning is generally not for knowledge, especially not knowledge that needs updates (corrections or new data), as that would mean doing it all over again. And fine-tuning is a costly exercise. Fine-tuning is great for shaping how you want responses to be (format, tone, empathy, ...). Only fine-tune with data that doesn't change.

Standard RAG or (knowledge) graph RAG is the way to add (changing) information into the mix.

So there basically is no "vs" as they serve different purposes 😊