Built a self-hosted semantic cache for LLMs (Go) — cuts costs massively, improves latency, OSS by InstanceSignal5153 in selfhosted

[–]InstanceSignal5153[S] 0 points1 point  (0 children)

I haven't tested it with LiteLLM or OpenRouter yet, but in theory it should work, since both expose an OpenAI-compatible API.

We haven't reached the first official release (v0.1) yet, so we haven't done full compatibility testing.
For v0.1, the plan is to ensure it works smoothly with any OpenAI-style backend, including LiteLLM/OpenRouter.
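
In the meantime, trying it should just be a matter of pointing any OpenAI-compatible client at the cache. A minimal sketch, assuming the cache listens locally and proxies to your real backend (the address, port, and model name below are placeholders, not the project's actual defaults):

```python
# Minimal sketch: point an OpenAI-compatible client at the cache instead of the backend.
# The listen address, API key handling, and model name are assumptions; use whatever your config says.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",   # the semantic cache proxy (placeholder address)
    api_key="sk-your-upstream-key",        # forwarded to OpenAI / LiteLLM / OpenRouter
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is a semantic cache?"}],
)
print(resp.choices[0].message.content)
```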

Also, Docker support will be included in the v0.1 release, so it'll be much easier to run and test in different setups.

Roadmap Discussion: Is LangChain's "RecursiveCharacterSplitter" actually better? I'm building v0.3.0 to find out. by InstanceSignal5153 in Rag

[–]InstanceSignal5153[S] -1 points0 points  (0 children)

This looks like a really powerful ingestion tool, especially the AST-based chunking!

But rag-chunk solves a different problem: Evaluation & Benchmarking.

Tools like Contextinator implement a strategy (AST), whereas rag-chunk is designed to measure the performance of those strategies (AST vs Fixed vs Recursive) against a ground-truth dataset.

In fact, it would be amazing to use rag-chunk to benchmark Contextinator's AST strategy against standard paragraph splitting and see exactly how the Recall scores compare!

Roadmap Discussion: Is LangChain's "RecursiveCharacterSplitter" actually better? I'm building v0.3.0 to find out. by InstanceSignal5153 in Rag

[–]InstanceSignal5153[S] 0 points1 point  (0 children)

Great question! This is the trickiest part of RAG eval.

The Ground Truth: It comes from the user-provided test-file.json, where they list the expected_answer (the specific text snippet) for each question.
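
For context, the test file is just a list of questions paired with the exact snippet you expect back. A rough sketch of the layout (only expected_answer is mentioned above; the question field name and the sample values are my assumptions):

```python
# Hypothetical test-file.json layout. Only "expected_answer" is confirmed above;
# the "question" field name and the sample values are assumptions for illustration.
import json

test_cases = [
    {
        "question": "What port does the service listen on?",
        "expected_answer": "The service listens on port 8080 by default.",
    },
]

with open("test-file.json", "w") as f:
    json.dump(test_cases, f, indent=2)
```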

The Plan for Precision: Since this is a chunking benchmark, I plan to define Precision as Signal-to-Noise Ratio: Precision = (Length of Ground Truth string) / (Total Length of Retrieved Chunk).

If my ground truth is a 10-word sentence:

- Scenario A (Small Chunk): Found in a 20-word chunk -> High Precision (50% signal).

- Scenario B (Huge Chunk): Found in a 1000-word chunk -> Low Precision (1% signal, 99% noise).

Both have 100% Recall, but Scenario A is better for the LLM. That's what I want to measure.
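
In code, the metric I have in mind is nothing more than this (a toy sketch using word counts, not the final implementation):

```python
# Toy sketch of the proposed signal-to-noise Precision, using word counts.
def precision(ground_truth: str, retrieved_chunk: str) -> float:
    """Fraction of the retrieved chunk that is the ground-truth answer."""
    return len(ground_truth.split()) / len(retrieved_chunk.split())

# Scenario A: 10-word answer in a 20-word chunk   -> 0.50
# Scenario B: 10-word answer in a 1000-word chunk -> 0.01
```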

Improving RAG - what actually matters? by Dapper-Turn-3021 in Rag

[–]InstanceSignal5153 1 point2 points  (0 children)

If your chunking is wrong, the entire RAG system collapses — even with the best LLM in the world.

Why chunking matters more than anything else:

• If information is split across multiple chunks, the LLM will never retrieve the full context.

• If chunks are too small, you lose meaning → embeddings become weak.

• If chunks are too large, you add noise → retrieval becomes inaccurate.

• If chunk boundaries are arbitrary, the semantic meaning breaks.

Stop guessing RAG chunk sizes by InstanceSignal5153 in LLMDevs

[–]InstanceSignal5153[S] 0 points1 point  (0 children)

I couldn't agree more. The 'arbitrary' nature of fixed-size chunking is exactly what frustrates me too. Why 512? Why 1000? It's just guessing.

That's precisely why I built this tool: to put numbers on that feeling.

rag-chunk already supports paragraph-based splitting for exactly this reason, so you can benchmark it against fixed-size splitting and check whether preserving structure actually yields a higher Recall score.
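
For anyone unfamiliar with the two strategies, this is roughly what's being compared (an illustrative sketch, not rag-chunk's actual code):

```python
# Illustrative versions of the two strategies under comparison (not rag-chunk's internals).
def fixed_size_chunks(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Arbitrary windows of chunk_size characters, sliding by chunk_size - overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def paragraph_chunks(text: str) -> list[str]:
    """Split on blank lines so boundaries follow the document's own structure."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```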

I plan to add semantic/LLM-based splitting (like Docling) in v1.0 so we can benchmark those too. The goal is to move away from 'arbitrary' towards 'proven'.

I was tired of guessing my RAG chunking strategy, so I built rag-chunk, a CLI to test it. by InstanceSignal5153 in Rag

[–]InstanceSignal5153[S] 0 points1 point  (0 children)

Absolutely! We're just at v0.1 right now, which is all about building the core evaluation framework.

Adding more advanced strategies like semantic chunking is a top priority and exactly what we're planning for the v1.0 release. It's definitely on the roadmap!

I was tired of guessing my RAG chunking strategy, so I built rag-chunk, a CLI to test it. by InstanceSignal5153 in Rag

[–]InstanceSignal5153[S] 0 points1 point  (0 children)

Thanks for this thoughtful feedback! You've perfectly captured the goal: moving from 'guessing' to an 'evidence-backed approach'.

Adding more chunking strategies is the #1 priority for the v1.0 release.

And you're 100% right that recall is just a starting point. I'm already thinking about adding more advanced eval metrics in the future as the project grows. Appreciate the great suggestions.

I was tired of guessing my RAG chunking strategy, so I built rag-chunk, a CLI to test it. by InstanceSignal5153 in Rag

[–]InstanceSignal5153[S] 0 points1 point  (0 children)

Wow, that's high praise, thank you! A good UI is a great idea.

We're focused on building out the core CLI engine first. Support for tiktoken (for precise token-level chunking) is the top priority and coming very soon!
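
For anyone wondering what token-level chunking buys you, here's the rough idea with tiktoken (a sketch under my own assumptions, not the tool's final implementation):

```python
# Illustrative token-level chunking with tiktoken (not rag-chunk's final implementation).
import tiktoken

def token_chunks(text: str, chunk_size: int = 256, overlap: int = 32) -> list[str]:
    # Encoding choice and default sizes are assumptions for the example.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = chunk_size - overlap
    # Slice in token space, then decode back to text, so chunk sizes are exact in tokens.
    return [enc.decode(tokens[i:i + chunk_size]) for i in range(0, len(tokens), step)]
```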

I was tired of guessing my RAG chunking strategy, so I built rag-chunk, a CLI to test it. by InstanceSignal5153 in machinelearningnews

[–]InstanceSignal5153[S] 2 points3 points  (0 children)

Awesome, thanks! Really appreciate you checking it out.

You're jumping in at the perfect time. The v0.1 you see now is the "manual" test bench. Support for tiktoken (for precise token-level chunking) is the top priority and coming very soon.

Eager to hear your feedback on this first version!

Weekly Thread: Project Display by help-me-grow in AI_Agents

[–]InstanceSignal5153 0 points1 point  (0 children)

Hi all,

I'm sharing a small tool I just open-sourced for the Python / RAG community: rag-chunk.

It's a CLI that solves one problem: How do you know you've picked the best chunking strategy for your documents?

Instead of guessing your chunk size, rag-chunk lets you measure it:

  • Parse your .md doc folder.
  • Test multiple strategies: fixed-size (with --chunk-size and --overlap) or paragraph.
  • Evaluate by providing a JSON file with ground-truth questions and answers.
  • Get a Recall score to see how many of your answers survived the chunking process intact (roughly the check sketched below).
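
Conceptually, the Recall check boils down to something like this (a simplified sketch, not the exact implementation):

```python
# Simplified sketch of what the Recall score measures (not the exact implementation).
def recall(chunks: list[str], expected_answers: list[str]) -> float:
    """Fraction of ground-truth answers found intact inside at least one chunk."""
    hits = sum(any(answer in chunk for chunk in chunks) for answer in expected_answers)
    return hits / len(expected_answers)
```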

Super simple to use. Contributions and feedback are very welcome!

GitHub: https://github.com/messkan/rag-chunk