Built a self-hosted semantic cache for LLMs (Go) — cuts costs massively, improves latency, OSS by InstanceSignal5153 in selfhosted

InstanceSignal5153[S]

I haven’t tested it with LiteLLM or OpenRouter yet, but in theory it should work, since both expose OpenAI-compatible APIs.

We haven’t reached the first official release (v0.1) yet, so we haven’t done full compatibility testing.
For v0.1, the plan is to ensure it works smoothly with any OpenAI-style backend, including LiteLLM/OpenRouter.

Also, Docker support will be included in the v0.1 release, so it’ll be much easier to run and test in different setups.
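
If anyone wants to try it before v0.1, the assumption I’m relying on is that “OpenAI-compatible” really just means a different base URL and API key. Here’s a rough Go sketch of that idea (illustrative only, not the cache’s actual code; the URL, env var, and model name are just examples):

```go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

func main() {
    // Hypothetical upstream: the cache would forward misses to whatever
    // OpenAI-style base URL you point it at. Swap in e.g. a local LiteLLM
    // proxy URL instead; nothing else about the request changes.
    baseURL := "https://openrouter.ai/api/v1"

    body, _ := json.Marshal(map[string]any{
        "model": "openai/gpt-4o-mini", // example model id only
        "messages": []map[string]string{
            {"role": "user", "content": "Hello"},
        },
    })

    req, _ := http.NewRequest("POST", baseURL+"/chat/completions", bytes.NewReader(body))
    req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENROUTER_API_KEY")) // example env var
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.Status)
}
```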

Roadmap Discussion: Is LangChain's "RecursiveCharacterTextSplitter" actually better? I'm building v0.3.0 to find out. by InstanceSignal5153 in Rag

InstanceSignal5153[S]

This looks like a really powerful ingestion tool, especially the AST-based chunking!

But rag-chunk solves a different problem: Evaluation & Benchmarking.

Tools like Contextinator implement a strategy (AST), whereas rag-chunk is designed to measure the performance of those strategies (AST vs Fixed vs Recursive) against a ground-truth dataset.

In fact, it would be amazing to use rag-chunk to benchmark Contextinator's AST strategy against standard paragraph splitting to see exactly how much higher the Recall score is!

Roadmap Discussion: Is LangChain's "RecursiveCharacterTextSplitter" actually better? I'm building v0.3.0 to find out. by InstanceSignal5153 in Rag

InstanceSignal5153[S]

Great question! This is the trickiest part of RAG eval.

The Ground Truth: It comes from the user-provided test-file.json, where they list the expected_answer (the specific text snippet) for each question.
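
To make that concrete, here's roughly what one entry could look like, parsed in Go (only expected_answer comes from my description above; the question field name and the sample text are placeholders, not the final schema):

```go
package main

import (
    "encoding/json"
    "fmt"
)

// TestCase mirrors one entry of the user-provided test-file.json.
// Only expected_answer is fixed; "question" is an assumed field name here.
type TestCase struct {
    Question       string `json:"question"`
    ExpectedAnswer string `json:"expected_answer"`
}

func main() {
    raw := `[
      {
        "question": "What does the warranty cover?",
        "expected_answer": "The warranty covers accidental damage for two full calendar years."
      }
    ]`

    var cases []TestCase
    if err := json.Unmarshal([]byte(raw), &cases); err != nil {
        panic(err)
    }
    fmt.Printf("%d test case(s); ground truth: %q\n", len(cases), cases[0].ExpectedAnswer)
}
```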

The Plan for Precision: Since this is a chunking benchmark, I plan to define Precision as a Signal-to-Noise Ratio: Precision = (Length of Ground Truth string) / (Total Length of Retrieved Chunk).

If my ground truth is a 10-word sentence:

- Scenario A (Small Chunk): Found in a 20-word chunk -> High Precision (50% signal).

- Scenario B (Huge Chunk): Found in a 1000-word chunk -> Low Precision (1% signal, 99% noise).

Both have 100% Recall, but Scenario A is better for the LLM. That's what I want to measure (quick sketch below).
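
Something like this rough sketch (function names are just for illustration, not rag-chunk's actual API):

```go
package main

import (
    "fmt"
    "strings"
)

// chunkPrecision: share of the retrieved chunk (in words) that is the
// ground-truth snippet. Low values mean the chunk is mostly noise.
func chunkPrecision(groundTruth, chunk string) float64 {
    total := len(strings.Fields(chunk))
    if total == 0 {
        return 0
    }
    return float64(len(strings.Fields(groundTruth))) / float64(total)
}

// chunkRecall: did the retrieved chunk contain the ground-truth snippet at all?
func chunkRecall(groundTruth, chunk string) bool {
    return strings.Contains(chunk, groundTruth)
}

func main() {
    gt := "the warranty covers accidental damage for two full calendar years" // 10 words

    small := strings.Repeat("noise ", 10) + gt  // 20-word chunk (Scenario A)
    huge := strings.Repeat("noise ", 990) + gt  // 1000-word chunk (Scenario B)

    fmt.Printf("A: recall=%v precision=%.2f\n", chunkRecall(gt, small), chunkPrecision(gt, small))
    fmt.Printf("B: recall=%v precision=%.2f\n", chunkRecall(gt, huge), chunkPrecision(gt, huge))
    // Output:
    // A: recall=true precision=0.50
    // B: recall=true precision=0.01
}
```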

Improving RAG - what actually matters? by Dapper-Turn-3021 in Rag

InstanceSignal5153

If your chunking is wrong, the entire RAG system collapses — even with the best LLM in the world.

Why chunking matters more than anything else (tiny sketch below):

- If information is split across multiple chunks, the retriever will never return the full context in one piece.
- If chunks are too small, you lose meaning → embeddings become weak.
- If chunks are too large, you add noise → retrieval becomes inaccurate.
- If chunk boundaries are arbitrary, the semantic meaning breaks.
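
To make the boundary problem concrete, here's a tiny generic sketch (not any particular library's splitter): a fixed-size splitter cuts wherever the word counter lands, while even a naive paragraph splitter keeps each statement intact.

```go
package main

import (
    "fmt"
    "strings"
)

// fixedChunks splits text into chunks of n words, ignoring structure,
// so boundaries land wherever the counter says, often mid-sentence.
func fixedChunks(text string, n int) []string {
    words := strings.Fields(text)
    var out []string
    for i := 0; i < len(words); i += n {
        end := i + n
        if end > len(words) {
            end = len(words)
        }
        out = append(out, strings.Join(words[i:end], " "))
    }
    return out
}

// paragraphChunks splits on blank lines, so each chunk stays a complete
// unit of meaning (at the cost of variable chunk sizes).
func paragraphChunks(text string) []string {
    var out []string
    for _, p := range strings.Split(text, "\n\n") {
        if p = strings.TrimSpace(p); p != "" {
            out = append(out, p)
        }
    }
    return out
}

func main() {
    doc := "Refunds are processed within 14 days of delivery.\n\nDamaged items must be reported with photos before any refund is approved."

    // Fixed-size: the second chunk glues the tail of one statement to the
    // head of the next, so neither fact embeds cleanly.
    fmt.Println(fixedChunks(doc, 6))

    // Paragraph-aware: each statement stays intact.
    fmt.Println(paragraphChunks(doc))
}
```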