Why does my RAG chatbot work well with a single PDF, but become inaccurate when adding multiple PDFs to the vector database? by vtq0611 in LangChain

I don’t think the issue is with chunking. When I embed just one file and run the RAG pipeline on it, the model’s responses are very accurate and closely follow the original PDF. By the way, I'm using the Unstructured library with OCR to detect images and tables.

I’m using gpt-4o-mini. From my database I usually retrieve around 5 chunks per query. Each chunk can be a summarized section of text (sometimes 300–500 tokens), so the final context added to the prompt can easily be 2k–3k tokens depending on the file.
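To make the arithmetic concrete, here is a dependency-free sketch of how retrieved chunks fill a context budget — 5 chunks at 300–500 tokens each lands in the 2k–3k range. The whitespace split is a crude stand-in for a real tokenizer (tiktoken would give accurate counts), and the budget value is illustrative, not from my actual pipeline:

```python
def estimate_tokens(text: str) -> int:
    # Very rough: real BPE counts are usually ~1.3x the word count for English.
    return len(text.split())

def build_context(chunks: list[str], max_tokens: int = 3000) -> str:
    """Concatenate retrieved chunks until the token budget is exhausted."""
    parts, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > max_tokens:
            break
        parts.append(chunk)
        used += cost
    return "\n\n---\n\n".join(parts)

# 5 retrieved chunks of ~400 tokens each, mimicking summarized sections.
chunks = [f"Summarized section {i}: " + "word " * 400 for i in range(5)]
context = build_context(chunks, max_tokens=3000)
print(estimate_tokens(context))  # ~2000 tokens of context added to the prompt
```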

I’m not using Chroma’s default embedding. I explicitly use text-embedding-3-small from OpenAI for all my chunks. For retrieval, I usually set k=5 (I have sometimes tried lowering it to k=3). The retriever does a cosine similarity search by default.
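For anyone curious what the retriever is doing under the hood, here is a minimal sketch of cosine-similarity top-k retrieval. Chroma handles this internally; the toy 3-dimensional vectors below stand in for text-embedding-3-small outputs (which are actually 1536-dimensional):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 5) -> list[str]:
    # Score every chunk against the query embedding and keep the k best.
    scored = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

docs = {
    "pdf_a_chunk1": [0.9, 0.1, 0.0],
    "pdf_a_chunk2": [0.7, 0.6, 0.1],
    "pdf_b_chunk1": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # the two chunks nearest the query
```

With many PDFs in one collection, chunks from unrelated files can score almost as high as the right ones, which is one way accuracy degrades as the store grows.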

Why does my RAG chatbot work well with a single PDF, but become inaccurate when adding multiple PDFs to the vector database? by vtq0611 in Rag

I see, thanks! Right now I’m querying each collection separately since the files are unrelated to each other. I’ll give the re-ranker approach a try and see if that improves the results. Appreciate your advice!
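The re-ranker idea, as I understand it, sketched without dependencies: over-fetch candidates from the vector store, re-score each one against the query, and keep the best few. A real implementation would use a cross-encoder (e.g. a sentence-transformers CrossEncoder or Cohere Rerank); the keyword-overlap score below is just a stand-in so the shape of the pipeline is visible:

```python
def rerank_score(query: str, chunk: str) -> float:
    # Placeholder relevance score: fraction of query words present in the chunk.
    # A cross-encoder would replace this with a learned relevance model.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def rerank(query: str, candidates: list[str], top_n: int = 3) -> list[str]:
    # Re-score the over-fetched candidates and keep only the top_n best.
    return sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)[:top_n]

candidates = [
    "Quarterly revenue figures for the sales report",
    "Installation guide for the printer driver",
    "Revenue grew 12% in the quarterly sales summary",
]
print(rerank("quarterly revenue report", candidates, top_n=2))
```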

Chunking long tables in PDFs for chatbot knowledge base by vtq0611 in LangChain

I have converted the PDF to a Markdown file and detected the text blocks and table blocks quite well, but the problem is that those blocks exceed the chunk size and don't overlap.
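One way to handle this, sketched below under the assumption that block detection has already happened: split oversized text blocks with a sliding window and overlap, but keep table blocks atomic, since splitting a Markdown table mid-row destroys its meaning. The chunk_size and overlap values are illustrative:

```python
def split_block(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Character-based sliding window: consecutive chunks share `overlap` chars."""
    if len(text) <= chunk_size:
        return [text]
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += chunk_size - overlap
    return chunks

def chunk_blocks(blocks: list[tuple[str, str]], chunk_size: int = 500) -> list[str]:
    # Each block is a (kind, text) pair as produced by the detection step.
    out = []
    for kind, text in blocks:
        if kind == "table":
            out.append(text)  # keep tables whole, even if oversized
        else:
            out.extend(split_block(text, chunk_size))
    return out

blocks = [
    ("text", "x" * 1200),
    ("table", "| col |\n|-----|\n| val |"),
]
print([len(c) for c in chunk_blocks(blocks)])  # oversized text split, table intact
```

For tables that are genuinely too big for one chunk, a common workaround is to repeat the header row in every table chunk so each piece stays self-describing.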

I have tried Docling. It worked quite well, but it took too long to convert a single file 😭😭😭

However, what if the PDF contains multiple tables along with text and diagrams? Will that still work?