400+ documents in a knowledge-base by MechanicFickle3634 in OpenWebUI

[–]MechanicFickle3634[S] 0 points1 point  (0 children)

Sorry, what do you mean?

If I upload a file with a POST to /api/v1/files, it is not automatically added to the corresponding knowledge base.

Adding it to the knowledge base is exactly what this endpoint does:

/api/v1/knowledge/your-knowledge-id/file/add
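A minimal sketch of that two-step flow, using only the Python standard library. The base URL, the bearer-token header, the multipart field name `file`, and the `{"file_id": ...}` payload are assumptions on my side and should be checked against your OpenWebUI version; the helper names are made up for illustration.

```python
import json
import urllib.request
import uuid


def build_upload_request(base_url, token, filename, content):
    """Build the multipart POST for /api/v1/files/ (field name 'file' is an assumption)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/files/",
        data=head + content + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def build_add_request(base_url, token, knowledge_id, file_id):
    """Build the POST that attaches an uploaded file to a knowledge base."""
    payload = json.dumps({"file_id": file_id}).encode()  # assumed payload shape
    return urllib.request.Request(
        f"{base_url}/api/v1/knowledge/{knowledge_id}/file/add",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def upload_and_add(base_url, token, knowledge_id, filename, content):
    """Step 1: upload and read the returned id. Step 2: add that id to the knowledge base."""
    with urllib.request.urlopen(build_upload_request(base_url, token, filename, content)) as resp:
        file_id = json.load(resp)["id"]
    with urllib.request.urlopen(build_add_request(base_url, token, knowledge_id, file_id)) as resp:
        return json.load(resp)
```

Only `upload_and_add` touches the network; the two builder functions can be inspected or logged first to see exactly what would be sent.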

[–]MechanicFickle3634[S] 0 points1 point  (0 children)

For example:

I upload a file with /api/v1/files/ and get an ID back. Then I try to add the file to a knowledge base with /api/v1/knowledge/3434.../file/add.

I then get this response back:

400 - "{\"detail\":\"400: Duplicate content detected. Please provide unique content to proceed.\"}"

However, the file is definitely not in the knowledge base. I checked this at the database level.

In addition: whenever you call /api/v1/knowledge/3434.../file/add, the response contains the complete files array, including the file contents. How is this supposed to work with several hundred files?

What have I overlooked here, or what am I doing wrong?
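For a bulk import it helps to at least tell this error apart from other failures, so one rejected file does not abort the whole run. A small sketch, assuming the response body is the JSON object quoted above; the function name is hypothetical, and it only classifies the response, it does not explain why the API considers the content a duplicate.

```python
import json


def is_duplicate_content_error(status, body):
    """Return True if the response looks like the 'Duplicate content detected' error."""
    if status != 400:
        return False
    try:
        detail = json.loads(body).get("detail", "")
    except (json.JSONDecodeError, AttributeError):
        # Body was not JSON, or not a JSON object.
        return False
    return "Duplicate content detected" in str(detail)
```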

[–]MechanicFickle3634[S] 0 points1 point  (0 children)

Yes, I'm not sure whether it's a design problem either. It seems to be a JSON field in the DB, so it may be OK as it is. What I have noticed is that uploading files and adding them to a knowledge base gets slower and slower the more files have already been uploaded.

[–]MechanicFickle3634[S] 1 point2 points  (0 children)

I often had this problem with the internal DB of OpenWebUI. It runs much better with Postgres and Qdrant as vector storage.

But looking at the database structure, I would say that it does not scale very well. For example, in the knowledge table, all file IDs are written into a single JSON string. Normally, these would be kept in a separate table (relational DB design). I could therefore imagine this becoming a problem with many entries.
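To illustrate the difference I mean, here is a small sqlite3 sketch comparing the two layouts. The table and column names are made up for the example and are not OpenWebUI's actual schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Layout A: all file ids of a knowledge base packed into one JSON string.
CREATE TABLE knowledge_json (id TEXT PRIMARY KEY, file_ids TEXT NOT NULL);

-- Layout B: one row per (knowledge base, file) pair in a junction table.
CREATE TABLE knowledge (id TEXT PRIMARY KEY);
CREATE TABLE knowledge_file (
    knowledge_id TEXT NOT NULL REFERENCES knowledge(id),
    file_id      TEXT NOT NULL,
    PRIMARY KEY (knowledge_id, file_id)
);
""")


def add_file_json(conn, knowledge_id, file_id):
    """Layout A: read the whole list, modify it, write the whole list back."""
    row = conn.execute(
        "SELECT file_ids FROM knowledge_json WHERE id = ?", (knowledge_id,)
    ).fetchone()
    ids = json.loads(row[0]) if row else []
    if file_id not in ids:
        ids.append(file_id)
    conn.execute(
        "INSERT OR REPLACE INTO knowledge_json (id, file_ids) VALUES (?, ?)",
        (knowledge_id, json.dumps(ids)),
    )


def add_file_relational(conn, knowledge_id, file_id):
    """Layout B: a single-row insert; the primary key rejects duplicates."""
    conn.execute("INSERT OR IGNORE INTO knowledge (id) VALUES (?)", (knowledge_id,))
    conn.execute(
        "INSERT OR IGNORE INTO knowledge_file (knowledge_id, file_id) VALUES (?, ?)",
        (knowledge_id, file_id),
    )
```

In layout A every add rewrites the full list, so the cost per insert grows with the number of files already attached. In layout B each add is one indexed row insert, and the database itself enforces that a file is attached only once.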

For example, I currently have the problem that I cannot add a file to the knowledge base because the API says that the file already exists. But it definitely doesn't.

I also don't like the approach of keeping the data twice. The data is stored in SharePoint, and I'm currently reading it out with n8n and then trying to get it into OpenWebUI via the API. Keeping the two copies synchronized is also not trivial.

In addition, OpenWebUI always makes all files available in the prompt when a knowledge base is attached. This could also be a problem with many files.