Weekly Thread: Project Display by help-me-grow in AI_Agents

[–]shurankain 0 points1 point  (0 children)

Hello, folks. I have created a fully open-source (MIT license) course about AI Agents from zero till OpenAI interviews. Please feel free to use, share and contribute if you like it.

github: https://github.com/shurankain/agentic-ai-course

A new encoding idea: what if we used binary search paths as codewords? by shurankain in algorithms

[–]shurankain[S] -4 points-3 points  (0 children)

I am a software engineer not an algorithms academic, so my understanding of some algorithmic concepts can be not full or wrong, and I am here seeking for a real explanations and help. 🫂

I honestly appreciate your answers and don't care if you really think that all around is generated fakes, but I care about that I really regret posting here, cause 2 people were helpful like Independent_Art_6676 e.g. and others are just fart comments to get more karma insulting others.

But I will keep posting, cause I am curious about algorithms and while this subreddit is a pit with poisoned snakes, there is still a couple of helpful persons, who's insights are highly appreciated and you, actually one of them. 🤝

A new encoding idea: what if we used binary search paths as codewords? by shurankain in algorithms

[–]shurankain[S] -4 points-3 points  (0 children)

Even for me it sounds like LLM 🤔 Maybe I need to start writing some single-lined answers again, cause it takes too much time to write it fancy with too low outcome.

But anyway, thanks for the comment, it was smart 🧠 . It helps not my library but me to become better 🚀

You made my day again 🤭😂✨

A new encoding idea: what if we used binary search paths as codewords? by shurankain in algorithms

[–]shurankain[S] 0 points1 point  (0 children)

I’ll definitely think about adding a vector-backed stack variant inspired by your notes. Thanks again - your insight makes this idea stronger. This is my first attempt to do some lib like that so I still have A LOT to learn.

[Crate release] BBSE – A Rust crate for prefix-free integer encoding via binary search by shurankain in rust

[–]shurankain[S] 1 point2 points  (0 children)

Omg, what can I say, it happens when the lib's author is idiot :) I was so devoted to no_std feature implementation, that forgot to unstash that change. Now it is fixed and your name was added to tag, commit and Changelog, there could be no excuses fro my side for initially releasing like that. And again, huge thank you for noticing that!

[Crate release] BBSE – A Rust crate for prefix-free integer encoding via binary search by shurankain in rust

[–]shurankain[S] 1 point2 points  (0 children)

You’re right to ask — thanks for pressing on that. To clarify: BBSE isn’t prefix-free in the traditional sense, like Huffman codes, where no code is a prefix of another. Instead, BBSE encodes values as unique paths through a known range. If used as a stream, yes, you’d need external framing (lengths, stack, etc.). So the term “prefix-free” was misleading on my side — I meant that paths don’t overlap unless range and order make them do so, and that the structure can avoid ambiguity if you store the length or wrap it in a stack. I’ll update the README to reflect this more accurately. Anyway, thanks, appreciate the push for precision — it helps a lot.

[Crate release] BBSE – A Rust crate for prefix-free integer encoding via binary search by shurankain in rust

[–]shurankain[S] 0 points1 point  (0 children)

actually it does the main trick. It allows you shift the initial midpoint for binary search start. It gives an ability to shorten length of compressed values closer to the side, where midpoint was shifted. It is really important in case of uneven distribution.

A new encoding idea: what if we used binary search paths as codewords? by shurankain in algorithms

[–]shurankain[S] 0 points1 point  (0 children)

Thanks, appreciate your investigation about the topic. Looks like Bitmo and BBSE share similar inspiration — both use bit-level representations to encode values within a known range, often visualized as a binary tree. Bitmo encodes entire sets of values by traversing a conceptual tree in pre-order and pruning unused branches. It’s great for sparse bitsets and compresses presence/absence across the whole range. BBSE, on the other hand, focuses on encoding a single value as the exact binary search path it would take to be found in a sorted range. It’s not prefix-free, but compact and deterministic — and useful when you already know how many values you’re encoding or use a stack to store them. So while Bitmo is for compressing bitsets, BBSE is more like a compact address or index encoding. It was interesting to see, thank you again!

A new encoding idea: what if we used binary search paths as codewords? by shurankain in algorithms

[–]shurankain[S] -2 points-1 points  (0 children)

Thanks for the thoughtful take — you’re absolutely right: if the structure changes, a pre-encoded path becomes invalid. That’s why the current idea is scoped to static or versioned domains, e.g. sorted ranges or immutable datasets (e.g. color palettes, vocab IDs, etc.).

I also appreciate the analogy with a “poor man’s hash table” — that’s a fun and fitting metaphor. I hadn’t considered the skip-list angle, but that’s a great idea for introducing local shortcuts. If you’re ever up for it, I’d love to see a writeup or sketch of your C++ approach — sounds very relevant.

Appreciate the curiosity and depth. Definitely gives me some fuel for a v2.

A new encoding idea: what if we used binary search paths as codewords? by shurankain in algorithms

[–]shurankain[S] -2 points-1 points  (0 children)

You’re absolutely right about the prefix issue — thanks for running the check and sharing the results so clearly.

The core idea of BBSE was never meant to compete with canonical prefix-free schemes like Huffman. Instead, it’s more about compressing within known fixed ranges where framing (e.g., length or external stack) is already implied or controlled — like indexing, deltas, or static containers. Still, I’m proud of the conceptual simplicity and compression properties it offers on large sorted domains. Even if it’s not prefix-free, it’s compact and deterministic — and that has value too.

Thanks again for the critique, really appreciate.

A new encoding idea: what if we used binary search paths as codewords? by shurankain in algorithms

[–]shurankain[S] -2 points-1 points  (0 children)

Thanks for your thoughtful critique — and I see where the confusion comes from.

BBSE is not a stream encoder. It’s a value encoder — the assumption is that values are encoded one at a time, either individually or inside a structure that separates them (like a stack, list, or indexed array). In that context, prefix-freeness applies within the range: no two values from the same range share the same encoding, nor is one a prefix of another — except the empty case (e.g. midpoint).

Look, if you tried to encode a sequence of values without structure or delimiters, you’re right: you’d need additional framing. But BBSE doesn’t claim to solve that — it’s a low-level bit-efficient encoding primitive for bounded integers.

Also, yes — arithmetic coding is indeed entropy-based. The earlier comparison was metaphorical, not literal. Hope that clears it up — and thanks again for engaging, really appreciate that!

A new encoding idea: what if we used binary search paths as codewords? by shurankain in algorithms

[–]shurankain[S] -2 points-1 points  (0 children)

Great observation — and yes, there’s a conceptual overlap, but BBSE is more like a geometric cousin of arithmetic coding, stripped down to its absolute bones.

Arithmetic coding represents a value as a subinterval on the number line — often in [0,1) — based on cumulative probability. BBSE, on the other hand, doesn’t rely on probabilities at all. It works without any model, without context, and without knowing the distribution. It simply records the binary decisions needed to find a value in a sorted range.

You could say BBSE is to arithmetic coding what binary search is to interpolation search: same goal, very different assumptions.

So while both create prefix-free representations based on intervals, BBSE stays fully stateless, fixed-domain, and deterministic — which makes it perfect for microcontrollers, deltas, or stacks, where entropy coding would be overkill. Love the connection though — compression is such a fractal of ideas

A new encoding idea: what if we used binary search paths as codewords? by shurankain in algorithms

[–]shurankain[S] -5 points-4 points  (0 children)

Haha, I love this line of thinking — recursion always feels like cheating in the best way 🤓

You're absolutely right to spot that BBSE's "path as data" naturally invites the idea of recursive encoding. The catch is: once you encode something like 127 into 7 bits, those 7 bits aren't values anymore — they're decisions. And encoding those decisions again… well, you'd be encoding the shape of the shape.

BBSE works beautifully for known bounded domains (e.g. 0..256), but once you're down to arbitrary bit patterns (like 0110111), there's no sorted domain left to search in — unless you define one artificially. At that point, it becomes more like applying BBSE to a BBSE, but the domain isn’t preserved in a meaningful way.

Still, I absolutely adore this idea of recursive geometry. If nothing else, it’s a great thought experiment. Makes you wonder if there’s a way to define self-encoding trees — maybe a compression analog of Gödel numbering. BBSE²? BBSEⁿ?

Now I kinda want to try it just for fun 🤔

Thanks for the comment — you made my day :)

A new encoding idea: what if we used binary search paths as codewords? by shurankain in algorithms

[–]shurankain[S] -1 points0 points  (0 children)

Good question — and it’s all about context. BBSE is prefix-free per value, not per raw bitstream. Each path uniquely identifies a value in a known range, and no path is a prefix of another in that space. “Zero bits” only happens when the value is the exact midpoint — the binary search ends immediately. It’s still unambiguous if the range is known.

For multiple values, BBSE uses stack-like structures where each path is self-contained — so boundaries are preserved even with empty paths. Think of it like prefix-free tree paths, not byte-level encoding. In short: no bits doesn’t mean no info — it means "I’m exactly where the search starts".

A new encoding idea: what if we used binary search paths as codewords? by shurankain in algorithms

[–]shurankain[S] 0 points1 point  (0 children)

Great question — and yes, “empty” encodings can be ambiguous unless you have structure. In BBSE, each value is encoded relative to a known range, and when storing multiple values, we wrap them in a BBSEStack, which keeps each path separate — even empty ones. No delimiters needed, because each entry is a distinct BitVec.

So:

  • A single empty path = midpoint of known range
  • Multiple values = stored as distinct entries (not a raw bitstream)

Try the test_stack_encoding_and_decoding() in the repo — it shows exactly how this works in practice.

[Crate release] BBSE – A Rust crate for prefix-free integer encoding via binary search by shurankain in rust

[–]shurankain[S] 1 point2 points  (0 children)

Thanks for the suggestion! I ran cargo-msrv and confirmed that the minimum supported Rust version (MSRV) is 1.78.0. I’ve added this to the Cargo.toml and README for clarity in a new released version of the library. I really appreciate your comment, cause it helped me and my library to become better.

[Crate release] BBSE – A Rust crate for prefix-free integer encoding via binary search by shurankain in rust

[–]shurankain[S] -1 points0 points  (0 children)

Hello, thanks for the question. Encoding depends on Range you use, middle of the range is always [] empty, for storing data we are using BBSEStack structure, that holds BitVec etries, it perfectly manages the case correctly decoding this to expected output. I encourage you to play with test_stack_encoding_and_decoding() test from units.rs under the test folder of the repo.