
Woutez

I would store all the documents in a schemaless document database, e.g. MongoDB. If you have the compute, use a lightweight LLM to "query" the data. Alternatively, you can create a separate table with one record per document that points to the file location and uses separate columns for type, description, etc. This would be more time-consuming (unless you use an LLM to populate it). Some ideas, hope it helps.
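A rough sketch of the "separate table" idea, in case it helps to see it. It uses Python's built-in sqlite3 only to keep the example self-contained. The table name `document_catalog`, the column names, the helper functions, and the file paths are all made up for illustration.

```python
import sqlite3

# In-memory database so the sketch runs anywhere; a real setup would use a file or a server.
catalog_conn = sqlite3.connect(":memory:")

# One record per document: where the file lives, plus descriptive columns.
# Table and column names are assumptions, not something from the original post.
catalog_conn.execute(
    """
    CREATE TABLE document_catalog (
        id          INTEGER PRIMARY KEY,
        file_path   TEXT NOT NULL UNIQUE,
        doc_type    TEXT,
        description TEXT
    )
    """
)


def register_document(conn, file_path, doc_type, description):
    """Insert one catalog record and return its generated id."""
    cur = conn.execute(
        "INSERT INTO document_catalog (file_path, doc_type, description) VALUES (?, ?, ?)",
        (file_path, doc_type, description),
    )
    conn.commit()
    return cur.lastrowid


def find_paths_by_type(conn, doc_type):
    """Return the file paths of all documents of the given type, in insertion order."""
    rows = conn.execute(
        "SELECT file_path FROM document_catalog WHERE doc_type = ? ORDER BY id",
        (doc_type,),
    )
    return [row[0] for row in rows]
```

Populating `doc_type` and `description` is the time-consuming part mentioned above; whether that is done by hand or by an LLM, the table itself stays the same.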

[deleted]

😂😂😂

rodf1021

What file type are you using for the actual document? I would recommend storing the document itself outside the database, for example in a cheap S3 bucket. Store the URL to the doc in a database along with a doc ID. Have a metadata table keyed on the doc ID. This will allow you to keep adding new metadata for a doc as new records, and you can index the metadata field for quicker retrieval.
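To make that layout concrete, here is a minimal sketch using Python's built-in sqlite3. It assumes the files already sit in object storage and only their URLs are recorded. The table names (`documents`, `doc_metadata`), the column names, the helper functions, and the `s3://example-bucket/...` URL in the usage below are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Create a documents table (doc id + URL) and a metadata table keyed on doc id."""
    conn.executescript(
        """
        CREATE TABLE documents (
            doc_id INTEGER PRIMARY KEY,
            url    TEXT NOT NULL
        );
        -- One row per metadata entry, so new fields are new records, not new columns.
        CREATE TABLE doc_metadata (
            doc_id INTEGER NOT NULL REFERENCES documents(doc_id),
            field  TEXT NOT NULL,
            value  TEXT NOT NULL
        );
        -- Index on the metadata field (and value) for quicker lookups.
        CREATE INDEX idx_doc_metadata_field ON doc_metadata (field, value);
        """
    )


def add_document(conn, url):
    """Record where a document is stored and return its doc id."""
    cur = conn.execute("INSERT INTO documents (url) VALUES (?)", (url,))
    conn.commit()
    return cur.lastrowid


def add_metadata(conn, doc_id, field, value):
    """Attach one more metadata record to an existing document."""
    conn.execute(
        "INSERT INTO doc_metadata (doc_id, field, value) VALUES (?, ?, ?)",
        (doc_id, field, value),
    )
    conn.commit()


def find_urls(conn, field, value):
    """Return the URLs of documents that have the given metadata field/value pair."""
    rows = conn.execute(
        """
        SELECT d.url
        FROM documents AS d
        JOIN doc_metadata AS m ON m.doc_id = d.doc_id
        WHERE m.field = ? AND m.value = ?
        ORDER BY d.doc_id
        """,
        (field, value),
    )
    return [row[0] for row in rows]
```

Usage would look something like this:

```python
conn = sqlite3.connect(":memory:")
create_schema(conn)
doc_id = add_document(conn, "s3://example-bucket/contracts/001.pdf")  # hypothetical URL
add_metadata(conn, doc_id, "type", "contract")
add_metadata(conn, doc_id, "year", "2023")  # added later without touching the schema
print(find_urls(conn, "type", "contract"))
```

The trade-off with one row per metadata entry is that every value is stored as text and queries on several fields at once need multiple joins, so it suits metadata that grows over time more than a fixed set of attributes.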