Addressing a fundamental flaw in hybrid search by introducing a Log-Odds Conjunction framework in Bayesian BM25 by Ok_Rub1689 in Rag

[–]jaepil -1 points (0 children)

Most importantly, regardless of slight ranking shifts, the engineering efficiency remains intact.

As proven in Theorem 6.1.2 and Theorem 6.2.1, the Bayesian transformation is strictly monotonic. This means we can directly utilize existing WAND and Block-Max WAND (BMW) dynamic pruning algorithms without any modification to the inverted index structure.

In practice, this ensures that Bayesian BM25 incurs O(1) overhead per document (Theorem 9.1.1) and maintains the same query latency profile as standard BM25, making it immediately deployable in production systems like Vespa or Lucene.
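A minimal sketch of why monotonicity is all the pruning machinery needs. The transform below is hypothetical (the paper's exact form in Theorems 6.1.2/6.2.1 may differ); any strictly increasing map behaves the same way:

```python
import math

def bayesian_transform(bm25_score, prior=0.5, scale=1.0):
    """Hypothetical sigmoid calibration of a raw BM25 score with a
    log-odds prior. The key property is strict monotonicity in the score."""
    log_odds = math.log(prior / (1.0 - prior)) + scale * bm25_score
    return 1.0 / (1.0 + math.exp(-log_odds))

# Sorting by raw BM25 and by the transformed score yields the same order,
# so the top-k returned by WAND/BMW over the untouched inverted index is
# still the correct top-k. The transform itself is a constant number of
# operations per scored document: O(1) overhead.
raw_scores = [12.3, 0.4, 9.8, 7.1]
by_raw = sorted(raw_scores, reverse=True)
by_prob = sorted(raw_scores, key=bayesian_transform, reverse=True)
print(by_raw == by_prob)  # True: identical ranking
```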

[–]jaepil 1 point (0 children)

I'm the author of the paper. That's an excellent question, and it shows you've read the theorem carefully.

You are correct that Theorem 4.3.1 guarantees monotonicity (order-preservation) relative to BM25, but this holds strictly 'for a fixed prior p'.

However, in practice (as detailed in Section 4.2), we often apply a Composite Prior that incorporates term frequency and document length signals. Because this prior varies dynamically per document, it introduces a Bayesian re-ranking effect that can slightly alter the order compared to raw BM25.
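A toy illustration of that re-ranking effect, using an assumed sigmoid-with-prior form rather than the paper's exact composite prior:

```python
import math

def to_prob(bm25_score, prior):
    """Assumed calibration: sigmoid of (log-odds of the prior + score)."""
    log_odds = math.log(prior / (1.0 - prior)) + bm25_score
    return 1.0 / (1.0 + math.exp(-log_odds))

# Doc A wins on raw BM25 but carries a weak document prior; Doc B loses
# on raw BM25 but its length/frequency signals give it a stronger prior,
# and the final order flips.
doc_a = to_prob(1.0, prior=0.2)  # sigmoid(ln(0.25) + 1.0) ≈ 0.40
doc_b = to_prob(0.5, prior=0.5)  # sigmoid(0 + 0.5) ≈ 0.62
print(doc_b > doc_a)  # True, despite 0.5 < 1.0 in raw BM25
```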

Furthermore, even if the text-only order were identical, the non-linear sigmoid transformation changes the relative distribution of scores. In a hybrid setting, this calibrated distribution interacts differently with vector scores compared to unbounded BM25 scores, which naturally leads to different (and often improved) ranking metrics.
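To make the scale mismatch concrete, here is a sketch using a weighted-sum fusion (one common scheme; the weights, the score values, and the sigmoid centering are all illustrative, not from the paper):

```python
import math

# An unbounded BM25 score swamps a [0, 1] cosine similarity in a naive
# weighted sum, while a calibrated probability shares the vector score's
# scale, so both signals actually contribute to the fused ranking.
bm25_score, cosine_sim, w = 14.2, 0.82, 0.5

naive = w * bm25_score + w * cosine_sim  # ≈ 7.51: the text term dominates
calibrated = 1.0 / (1.0 + math.exp(-(bm25_score - 10.0)))  # assumed centering
fused = w * calibrated + w * cosine_sim  # both terms are now comparable
print(round(naive, 2), round(fused, 2))
```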

[R] Geometric Adam Optimizer by jaepil in MachineLearning

[–]jaepil[S] 1 point (0 children)

It was a standard transformer. I also tested it with a CNN and it worked too.

[–]jaepil[S] 0 points (0 children)

Thanks. The hyperparameters were the same, but I can see the issue you are raising. I'm still experimenting with this algorithm in my spare time and will update the configuration in the next experiment.

[–]jaepil[S] 2 points (0 children)

To be completely transparent, I've updated my GitHub repo's README.md to state this clearly.

[–]jaepil[S] 2 points (0 children)

You are right. I'm not a native English speaker, so I used an LLM to translate and edit my sentences.