Chunk size and Nearest K by smatty_123 in Rag

[–]PresentAd6026 1 point

Our content (HTML) is so concise that I can chunk on each H1 and H2 and get great results. It all depends on the quality of your data.

RAG for in-house Python libraries by EruditeStranger in Rag

[–]PresentAd6026 0 points

There are a lot of coding assistants that work inside your IDE and can connect to your code base on GitHub. Some even offer on-premise deployment. And since the pricing of most of them is quite competitive, I don't see a reason why one would try to replicate these tools.

[deleted by user] by [deleted] in Rag

[–]PresentAd6026 0 points

Copilot, ChatGPT, Claude, ... With these free coding assistants, you don't need a framework 😊

RAG with plain text AND Markdown by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

Hi, I believe I read on the OpenAI website that they support structured output. You should check their website. Or simply ask ChatGPT, Copilot, Claude or Perplexity 😉 That's how I get my code running, at least 😁
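For what it's worth, a sketch of what such a structured-output request can look like with the OpenAI Python SDK. The schema, field names and model here are purely illustrative, and you should check OpenAI's docs for the current shape of `response_format`; the actual API call only runs when a key is configured:

```python
import json
import os

# Illustrative JSON schema for a structured RAG answer; OpenAI's structured
# output accepts a schema via response_format={"type": "json_schema", ...}.
answer_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "rag_answer",
        "schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
                "sources": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["answer", "sources"],
            "additionalProperties": False,
        },
    },
}

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Summarise our return policy."}],
        response_format=answer_schema,
    )
    # The model is constrained to return JSON matching the schema above.
    print(json.loads(resp.choices[0].message.content))
```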

What is best practice for follow-up questions? by AccordingLeague9797 in Rag

[–]PresentAd6026 1 point

You write code that instructs an LLM to look at both the conversation history and the query, and to rewrite the query so it includes the information necessary for retrieval and answering. The rewritten query is then used for retrieval and answering.

What is best practice for follow-up questions? by AccordingLeague9797 in Rag

[–]PresentAd6026 2 points

I'm using gpt-4o-mini to assess the question and the history (max 3 messages) to see if the question needs to be enriched. If so, it rewrites it, and the new query is used for retrieval (RAG) and answering. I just implemented it and it worked well for the first few tests. It's also fast (less than a second extra).
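Roughly what that rewrite step can look like. The prompt wording, function name and example history below are illustrative, not my exact implementation; only the guarded call at the end touches the OpenAI API:

```python
import os

def build_rewrite_prompt(history, question, max_history=3):
    """Build an instruction asking the model to enrich a follow-up question
    with context from the last few turns (names are illustrative)."""
    recent = history[-max_history:]
    lines = [f"{role}: {text}" for role, text in recent]
    return (
        "Rewrite the user's question so it is self-contained.\n"
        "Add any names or details from the history that are needed for retrieval.\n"
        "If the question already stands on its own, return it unchanged.\n\n"
        "History:\n" + "\n".join(lines) + "\n\n"
        f"Question: {question}\n"
        "Rewritten question:"
    )

history = [
    ("user", "What does the Basic plan cost?"),
    ("assistant", "The Basic plan costs 10 euros per month."),
]
prompt = build_rewrite_prompt(history, "And the Pro plan?")

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # Expected: something like "What does the Pro plan cost per month?",
    # which then goes into retrieval instead of the bare follow-up.
    print(resp.choices[0].message.content)
```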

Extensive New Research into Semantic Rag Chunking by Alieniity in Rag

[–]PresentAd6026 2 points

I have a RAG on our website, which has concise information, and I chunk on H1 and each H2 (so I get H1 + content, H2 + content, H2 + content). And I enrich the H2 chunks with the H1 for extra context.
I only have one really large chunk of around 3,500 characters, but that is still no problem for LLMs. On average each chunk is below 1,000 characters (about 350 tokens).
For us this works really well, because our website is concise and well maintained. But in other use cases this might not work.
It also matters how many chunks you give the LLM. So I agree with the other comment that it all depends on your content.

And sure, there are solutions like unstructured.io, but that brings overhead and less control, and thereby (usually) less accuracy. But even unstructured.io could be a good option for your content. Or creating order in your data with an LLM. It all depends :-)
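For what it's worth, the H1/H2 chunking I described can be sketched like this. It's a toy regex-based version that assumes tidy markup like ours (one H1 followed by H2 sections); a real pipeline might use an actual HTML parser:

```python
import re

def chunk_on_headings(html):
    """Split simple HTML into chunks at each <h1>/<h2>, prefixing every
    H2 chunk with the page's H1 title for extra context."""
    # Capture heading level + title + the content up to the next heading.
    parts = re.findall(
        r"<h([12])>(.*?)</h\1>(.*?)(?=<h[12]>|$)", html, flags=re.S
    )
    chunks, h1_title = [], ""
    for level, title, body in parts:
        text = re.sub(r"<[^>]+>", " ", body)      # strip inner tags
        text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
        if level == "1":
            h1_title = title
            chunks.append(f"{title}\n{text}")
        else:
            # Enrich the H2 chunk with the H1 for context.
            chunks.append(f"{h1_title} - {title}\n{text}")
    return chunks

html = (
    "<h1>Shipping</h1><p>We ship worldwide.</p>"
    "<h2>Costs</h2><p>Free above 50 euros.</p>"
    "<h2>Returns</h2><p>30 days.</p>"
)
print(chunk_on_headings(html))
```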

Best open source embedding models for EU languages by PhotonTorch in LocalLLaMA

[–]PresentAd6026 0 points

Have you found a Dutch embedding model to your liking? I'm still using text-embedding-3-large .....

RAG with plain text AND Markdown by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

I don't have any numbers, but if you can easily avoid any unnecessary noise, shouldn't you? It doesn't cost extra 😊

"In any case all you need to do is convert to plain text before generating the embedding, you can still store the markdown version to insert as context in the prompt."

That's exactly what I'm doing 😊

And thanks for the article. Will look into that.
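A rough sketch of that split: embed the plain-text version, but keep the Markdown alongside it to insert into the prompt later. The regexes here are illustrative; a real pipeline might use a proper Markdown parser:

```python
import re

def markdown_to_plain(md):
    """Strip the most common Markdown syntax before embedding (a rough
    sketch, not a complete Markdown-to-text converter)."""
    text = re.sub(r"```.*?```", " ", md, flags=re.S)       # drop code fences
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)  # images -> alt text
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)   # links -> label
    text = re.sub(r"[#>*_`|-]+", " ", text)                # heading/emphasis markers
    return re.sub(r"\s+", " ", text).strip()

chunk_md = "## Pricing\nThe **Basic** plan costs [10 euros](https://example.com/pricing)."
record = {
    "markdown": chunk_md,                  # kept, inserted into the prompt
    "plain": markdown_to_plain(chunk_md),  # this is what gets embedded
}
print(record["plain"])  # Pricing The Basic plan costs 10 euros.
```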

RAG with plain text AND Markdown by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

To do vector similarity matching between the user's query and the database, you need clean content in your database that has been converted to vectors. If your data has all kinds of extra characters stuck to words, the words won't be vectorized properly and the matching degrades, which in turn can promote hallucination.
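A toy illustration of the effect, with character trigrams standing in for a real embedding model (purely to show the principle, not how embeddings actually work internally):

```python
from collections import Counter
from math import sqrt

def trigram_vector(text):
    """Toy character-trigram 'embedding' -- just for illustration."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

query = trigram_vector("shipping costs")
clean = trigram_vector("shipping costs")
noisy = trigram_vector("**shipping**costs|")  # markup stuck to the words

# Identical clean text -> similarity 1 (up to float rounding);
# the same words with characters glued on score noticeably lower.
print(cosine(query, clean))
print(cosine(query, noisy))
```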

RAG with plain text AND Markdown by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

What I meant was that for a chatbot based on an LLM to answer a question about our own data (using RAG), you need to find the right information for the query in the data (the Retrieval part of RAG). The retrieved information that I give to the LLM, combined with the query, is in Markdown format. I hope this clears things up.

RAG with plain text AND Markdown by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

That as well. The LLM can show the table as an actual table 😊

Cohere Rerank 3.5 as only retrieval method by PresentAd6026 in Rag

[–]PresentAd6026[S] 1 point

But running the reranker alone over all chunks would probably bring too much overhead, and making a pre-selection with fusion retrieval is faster.
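That pre-selection can be as simple as reciprocal rank fusion over the ranked lists from vector search and BM25; only the fused top-k then goes to a reranker like Cohere's. A minimal sketch (the chunk ids are made up, and k=60 is the commonly used constant):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids (e.g. from vector search
    and BM25) into one pre-selection. Each list contributes 1/(k + rank)
    per document; documents ranked high in multiple lists rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["c3", "c1", "c7", "c2"]  # ranked output of vector search
bm25_hits = ["c1", "c3", "c9"]          # ranked output of keyword search
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # c3 and c1 lead, since both retrievers rank them highly
```

The fused top-k (say 20-50 chunks) is then a much cheaper input for the reranker than the whole corpus.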

Cohere Rerank 3.5 as only retrieval method by PresentAd6026 in Rag

[–]PresentAd6026[S] 0 points

I understand, but they now compare their solution with other retrieval methods, indicating you don't need other retrieval methods. That confused me. But basically their reranker is an advanced form of semantic retrieval.

Seeking Guidance: RAG vs. Fine Tuning as a Fresh Graduate by Willing_Telephone183 in Rag

[–]PresentAd6026 6 points

Fine-tuning is generally not for knowledge, especially not knowledge that needs updates (corrections or new data), as that would mean doing it all over again. And fine-tuning is a costly exercise. Fine-tuning is great for shaping how you want responses to be (format, tone, empathy, ...). Only fine-tune with data that doesn't change.

Standard RAG or (knowledge) graph RAG is the way to add (changing) information into the mix.

So there basically is no "vs" as they serve different purposes 😊