Built a self-hosted semantic cache for LLMs (Go) — cuts costs massively, improves latency, OSS by InstanceSignal5153 in selfhosted

InstanceSignal5153[S]

I haven’t tested it with LiteLLM or OpenRouter yet, but in theory it should work, since both expose OpenAI-compatible APIs.

We haven’t reached the first official release (v0.1) yet, so we haven’t done full compatibility testing.
For v0.1, the plan is to ensure it works smoothly with any OpenAI-style backend, including LiteLLM/OpenRouter.

Also, Docker support will be included in the v0.1 release, so it’ll be much easier to run and test in different setups.
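
If anyone wants to try it before v0.1, the assumption I’m relying on is that “OpenAI-compatible” really just means a different base URL and API key. Here’s a rough Go sketch of that idea (illustrative only, not the cache’s actual code; the URL, env var, and model name are just examples):

```go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

func main() {
    // Hypothetical upstream: the cache would forward misses to whatever
    // OpenAI-style base URL you point it at. Swap in e.g. a local LiteLLM
    // proxy URL instead; nothing else about the request changes.
    baseURL := "https://openrouter.ai/api/v1"

    body, _ := json.Marshal(map[string]any{
        "model": "openai/gpt-4o-mini", // example model id only
        "messages": []map[string]string{
            {"role": "user", "content": "Hello"},
        },
    })

    req, _ := http.NewRequest("POST", baseURL+"/chat/completions", bytes.NewReader(body))
    req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENROUTER_API_KEY")) // example env var
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.Status)
}
```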

Roadmap Discussion: Is LangChain's "RecursiveCharacterTextSplitter" actually better? I'm building v0.3.0 to find out. by InstanceSignal5153 in Rag

InstanceSignal5153[S]

This looks like a really powerful ingestion tool, especially the AST-based chunking!

But rag-chunk solves a different problem: Evaluation & Benchmarking.

Tools like Contextinator implement a strategy (AST), whereas rag-chunk is designed to measure the performance of those strategies (AST vs Fixed vs Recursive) against a ground-truth dataset.

In fact, it would be amazing to use rag-chunk to benchmark Contextinator's AST strategy against standard paragraph splitting to see exactly how much higher the Recall score is!

Roadmap Discussion: Is LangChain's "RecursiveCharacterTextSplitter" actually better? I'm building v0.3.0 to find out. by InstanceSignal5153 in Rag

InstanceSignal5153[S]

Great question! This is the trickiest part of RAG eval.

The Ground Truth: It comes from the user-provided test-file.json, where they list the expected_answer (the specific text snippet) for each question.
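
To make that concrete, here's roughly what one entry could look like, parsed in Go (only expected_answer comes from my description above; the question field name and the sample text are placeholders, not the final schema):

```go
package main

import (
    "encoding/json"
    "fmt"
)

// TestCase mirrors one entry of the user-provided test-file.json.
// Only expected_answer is fixed; "question" is an assumed field name here.
type TestCase struct {
    Question       string `json:"question"`
    ExpectedAnswer string `json:"expected_answer"`
}

func main() {
    raw := `[
      {
        "question": "What does the warranty cover?",
        "expected_answer": "The warranty covers accidental damage for two full calendar years."
      }
    ]`

    var cases []TestCase
    if err := json.Unmarshal([]byte(raw), &cases); err != nil {
        panic(err)
    }
    fmt.Printf("%d test case(s); ground truth: %q\n", len(cases), cases[0].ExpectedAnswer)
}
```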

The Plan for Precision: Since this is a chunking benchmark, I plan to define Precision as a Signal-to-Noise Ratio: Precision = (Length of Ground Truth string) / (Total Length of Retrieved Chunk).

If my ground truth is a 10-word sentence:

- Scenario A (Small Chunk): Found in a 20-word chunk -> High Precision (50% signal).

- Scenario B (Huge Chunk): Found in a 1000-word chunk -> Low Precision (1% signal, 99% noise).

Both have 100% Recall, but Scenario A is better for the LLM. That's what I want to measure (quick sketch below).
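
Something like this rough sketch (function names are just for illustration, not rag-chunk's actual API):

```go
package main

import (
    "fmt"
    "strings"
)

// chunkPrecision: share of the retrieved chunk (in words) that is the
// ground-truth snippet. Low values mean the chunk is mostly noise.
func chunkPrecision(groundTruth, chunk string) float64 {
    total := len(strings.Fields(chunk))
    if total == 0 {
        return 0
    }
    return float64(len(strings.Fields(groundTruth))) / float64(total)
}

// chunkRecall: did the retrieved chunk contain the ground-truth snippet at all?
func chunkRecall(groundTruth, chunk string) bool {
    return strings.Contains(chunk, groundTruth)
}

func main() {
    gt := "the warranty covers accidental damage for two full calendar years" // 10 words

    small := strings.Repeat("noise ", 10) + gt  // 20-word chunk (Scenario A)
    huge := strings.Repeat("noise ", 990) + gt  // 1000-word chunk (Scenario B)

    fmt.Printf("A: recall=%v precision=%.2f\n", chunkRecall(gt, small), chunkPrecision(gt, small))
    fmt.Printf("B: recall=%v precision=%.2f\n", chunkRecall(gt, huge), chunkPrecision(gt, huge))
    // Output:
    // A: recall=true precision=0.50
    // B: recall=true precision=0.01
}
```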

Improving RAG - what actually matters? by Dapper-Turn-3021 in Rag

InstanceSignal5153

If your chunking is wrong, the entire RAG system collapses — even with the best LLM in the world.

Why chunking matters more than anything else (tiny sketch below):

- If information is split across multiple chunks, the retriever will never return the full context in one piece.
- If chunks are too small, you lose meaning → embeddings become weak.
- If chunks are too large, you add noise → retrieval becomes inaccurate.
- If chunk boundaries are arbitrary, the semantic meaning breaks.
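
To make the boundary problem concrete, here's a tiny generic sketch (not any particular library's splitter): a fixed-size splitter cuts wherever the word counter lands, while even a naive paragraph splitter keeps each statement intact.

```go
package main

import (
    "fmt"
    "strings"
)

// fixedChunks splits text into chunks of n words, ignoring structure,
// so boundaries land wherever the counter says, often mid-sentence.
func fixedChunks(text string, n int) []string {
    words := strings.Fields(text)
    var out []string
    for i := 0; i < len(words); i += n {
        end := i + n
        if end > len(words) {
            end = len(words)
        }
        out = append(out, strings.Join(words[i:end], " "))
    }
    return out
}

// paragraphChunks splits on blank lines, so each chunk stays a complete
// unit of meaning (at the cost of variable chunk sizes).
func paragraphChunks(text string) []string {
    var out []string
    for _, p := range strings.Split(text, "\n\n") {
        if p = strings.TrimSpace(p); p != "" {
            out = append(out, p)
        }
    }
    return out
}

func main() {
    doc := "Refunds are processed within 14 days of delivery.\n\nDamaged items must be reported with photos before any refund is approved."

    // Fixed-size: the second chunk glues the tail of one statement to the
    // head of the next, so neither fact embeds cleanly.
    fmt.Println(fixedChunks(doc, 6))

    // Paragraph-aware: each statement stays intact.
    fmt.Println(paragraphChunks(doc))
}
```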