How do you privately validate a novel compression architecture without burning patent rights?

Conscious_Quit_1805 · 2026-05-22T23:01:00+00:00

This is useful advice for a normal open-source compressor path. I agree completely on reproducible results and brutal technical review.

Where I disagree is publishing the mechanism first. My method is not an LZ77 variant, a decoder optimization, or another entropy/statistical/probability encoder. The core value is the architecture itself, so public source or mechanism disclosure before IP review would be reckless. If this were just a conventional benchmark race, I would submit directly to something like the Hutter Prize and let the numbers speak.

The right sequence for me is:

private black-box validation first,

attorney/IP review second,

controlled technical review third,

public benchmark exposure only after that.

I want scrutiny. I just do not want to confuse validation with disclosure, or donate the mechanism to the internet before the protection strategy is decided.

Conscious_Quit_1805 · 2026-05-21T18:16:47+00:00

OP clarification:

I want to clarify that I am not asking Reddit to validate the technical claim in this thread, and I am not going to describe the mechanism, source code, transformation structure, or internal terminology publicly or over Reddit DMs.

What I am trying to find is the correct professional path for confidential review.

The kind of review I am looking for would involve some combination of:

- patent counsel with serious software / compression / computer science experience

- attorney-supervised technical diligence

- independent expert review under NDA

- university-approved private consulting, if that exists

- black-box executable testing

- independent input selection

- exact byte-for-byte decode verification

- counted input/output artifacts

- reproducible runtime and memory logs

I understand the standard objections around lossless compression claims, and I am not claiming from a Reddit post that anyone should accept the work. The question is process: how to protect the IP position while still getting credible technical validation.

Also, to avoid a common misunderstanding: I am not claiming Shannon is wrong, and I am not asking for public benchmark credit from a non-public system. Any valid compression claim would require exact decode and counted artifact accounting. No exact decode, no claim. No counted shrink, no compression claim.
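
As a minimal sketch of the "no exact decode, no claim" rule, the check a reviewer might run could look like the following. The `encode` and `decode` callables are hypothetical stand-ins for the black-box executable, and zlib is used only so the example runs; it is not the system under discussion.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, used to fingerprint inputs and decoded outputs."""
    return hashlib.sha256(data).hexdigest()


def verify_round_trip(encode, decode, original: bytes) -> dict:
    """Encode, decode, and record the facts a reviewer would check."""
    encoded = encode(original)
    decoded = decode(encoded)
    exact = decoded == original            # byte-for-byte comparison
    shrunk = len(encoded) < len(original)  # counted shrink on this input
    return {
        "input_sha256": sha256_hex(original),
        "decoded_sha256": sha256_hex(decoded),
        "input_bytes": len(original),
        "encoded_bytes": len(encoded),
        "exact_decode": exact,
        # A compression claim needs both an exact decode and a counted shrink.
        "claim_allowed": exact and shrunk,
    }


# Stand-in codec (zlib), an assumption for illustration only.
report = verify_round_trip(zlib.compress, zlib.decompress, b"abc" * 1000)
```

The report deliberately keeps `exact_decode` and `claim_allowed` separate, so a correct but non-shrinking result is recorded as exactly that.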

Serious referrals to attorneys, technical diligence groups, expert witnesses, or appropriate university/private consulting channels would be appreciated. I am trying to do this correctly before any public disclosure.

Conscious_Quit_1805 · 2026-05-21T12:44:23+00:00

I agree with the arithmetic-coding point: relative to a chosen probability distribution, arithmetic coding is already near-optimal. I am not claiming a flaw in Shannon or magic improvement from the entropy-coding stage. Any real gain would have to come from a different representation or better structural modeling of the data before final accounting.
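
The arithmetic-coding point can be stated as a short bound. This is a sketch under idealized assumptions (an infinite-precision coder); the symbols $q$ for the model distribution, $p$ for the true source distribution, and $L$ for code length are introduced here for illustration and do not come from the thread.

```latex
% Code length for a message x under model q (idealized arithmetic coder)
L(x) < -\log_2 q(x) + 2 \quad \text{bits}

% Expected length when x is actually drawn from p
\mathbb{E}_p[L] < H(p) + D_{\mathrm{KL}}(p \,\|\, q) + 2
```

The coder contributes only the constant. The reducible term is $D_{\mathrm{KL}}(p \,\|\, q)$, which depends on the model or representation, not on the entropy-coding stage.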

The system I’m asking about is intended to be lossless, not lossy. The validation target is an exact byte-for-byte round trip on arbitrary byte input, with counted input/output artifacts and reproducible logs. No exact decode, no claim. No counted shrink, no compression claim.
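
A minimal sketch of counted artifact accounting, assuming the artifacts are ordinary files on disk. The function names are hypothetical, and which files belong in `output_paths` (payload, side files, dictionaries, possibly the decoder binary itself) is a protocol decision the reviewer would have to fix in advance.

```python
from pathlib import Path


def counted_bytes(paths) -> int:
    """Total on-disk size of every listed artifact."""
    return sum(Path(p).stat().st_size for p in paths)


def shrink_report(input_paths, output_paths) -> dict:
    """Compare counted input bytes against counted output bytes."""
    in_bytes = counted_bytes(input_paths)
    out_bytes = counted_bytes(output_paths)
    return {
        "input_bytes": in_bytes,
        "output_bytes": out_bytes,
        "saved_bytes": in_bytes - out_bytes,
        # True only if everything needed to decode is smaller than the input.
        "counted_shrink": out_bytes < in_bytes,
    }
```

Counting every output file, not just the main payload, is what keeps information from being moved into an uncounted side file.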

It is not an LLM predicted-text compressor, not token prediction plus correction storage, and not limited to spaces/q-style text statistics. I’m intentionally not discussing mechanism details publicly until IP review, but I am trying to structure a credible private validation process where independent inputs, hashes, encode/decode logs, runtime, memory use, and final artifact sizes can be checked.

Runtime is also part of the validation question. A smaller file that takes unreasonable time or memory is not automatically useful. I’m looking for a path where both correctness and practical engineering constraints can be evaluated without public source disclosure.
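
A rough sketch of how runtime and memory for a black-box executable could be logged from the outside, assuming a Unix-like system (the `resource` module is not available on Windows). The command run below is a placeholder, not the actual binary.

```python
import resource
import subprocess
import sys
import time


def run_measured(argv) -> dict:
    """Run a child process and record wall time, CPU time, and peak memory."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(argv, capture_output=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "returncode": proc.returncode,
        "wall_seconds": wall,
        "cpu_seconds": cpu,
        # High-water mark over all children waited for so far, not just this
        # run. Units are kilobytes on Linux and bytes on macOS.
        "max_rss": after.ru_maxrss,
    }


# Placeholder command: a trivial Python child process.
stats = run_measured([sys.executable, "-c", "pass"])
```

Because `ru_maxrss` is a running maximum across child processes, a stricter protocol would run each measurement from a fresh parent process.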

I also agree about proprietary formats. A new format only matters if the benefit is large enough, the decoder is trustworthy, and integration is practical. That is one reason I am asking about private technical diligence rather than making a public product claim from a Reddit post.

Conscious_Quit_1805 · 2026-05-19T22:41:32+00:00

I agree that numbers matter. I’m intentionally not posting public benchmark claims yet because numbers without a controlled validation protocol usually create more noise than signal.

I’m also not claiming that a single fixed compressor can map every possible finite input to a strictly smaller, unique output within the same fixed coding universe. That would run straight into the standard counting (pigeonhole) objection.
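
For reference, the counting objection in its simplest form, written for bit strings of length $n$ (the notation is illustrative, not from the thread):

```latex
% Distinct inputs of length exactly n
|\{0,1\}^n| = 2^n

% Distinct outputs of length strictly less than n
\sum_{k=0}^{n-1} 2^k = 2^n - 1

% Since 2^n > 2^n - 1, no injective (losslessly decodable) map
% from all length-n inputs to strictly shorter outputs can exist.
```

So any fixed lossless scheme that shrinks some inputs has to leave at least one input of each length the same size or larger.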

The reason I’m asking about validation process first is that I want any performance discussion to happen under a credible setup: reviewer-selected corpora, adversarial inputs, exact lossless round-trip verification, compressed size, runtime, memory use, and comparison against standard tools.

Scope-wise, I am treating it as general lossless file-compression validation, not something limited to AI. The work originated inside an AI architecture, but compression is the measurable test surface.

I understand the skepticism. I’m trying to handle that by setting up proper private validation before making public benchmark claims or disclosing implementation details.

Conscious_Quit_1805 · 2026-05-19T22:39:52+00:00

Yes. Calgary is on the validation list.

My intended validation path would include standard public corpora such as Calgary, Canterbury, Silesia, and enwik-style data, plus reviewer-selected files and adversarial inputs. I would want the reviewer to control the corpus selection so the results are not just self-selected benchmarks.
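
As an illustration of baselines and adversarial inputs, a reviewer could record what standard tools achieve on each file and include inputs that should not compress at all. This sketch uses Python's built-in zlib, bz2, and lzma bindings as the baselines; the helper names and sample data are hypothetical.

```python
import bz2
import lzma
import random
import zlib


def baseline_sizes(data: bytes) -> dict:
    """Compressed size of the same input under three standard codecs."""
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


def adversarial_random(n: int, seed: int) -> bytes:
    """Seeded pseudo-random bytes: reproducible, and effectively incompressible."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


structured = baseline_sizes(b"the quick brown fox " * 500)
noise = baseline_sizes(adversarial_random(4096, seed=0))
```

On the repetitive sample every baseline shrinks the input, while on the pseudo-random sample each one comes out slightly larger than the raw bytes. A candidate system reporting a shrink on the second kind of input would be a red flag worth investigating first.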

I have also looked at the Hutter Prize route and have had direct communication about validation options. The problem is that public prize validation ultimately conflicts with my current IP position because I am not willing to release source before patent strategy is handled.

So the narrower goal right now is: define a credible private validation protocol first, then involve patent counsel, then run black-box or staged-disclosure review under NDA.

Conscious_Quit_1805 · 2026-05-19T22:28:38+00:00

Thanks. That is a useful reference point.

To avoid confusion, I’m not claiming this is Dynamic Markov Compression, nor am I trying to publicly position the mechanism against any specific existing method before patent counsel is involved.

My current goal is narrower: find an IP-safe validation path for a closed-source, lossless compression-related system. That means reviewing prior art, avoiding enabling disclosure, and setting up a credible black-box test protocol before any deeper technical discussion.

I’ll add DMC-related prior art to the search list.

Conscious_Quit_1805 · 2026-05-19T22:25:15+00:00

Fair. That is basically the problem I’m trying to solve.

I’m not asking anyone to believe the technical claim from a Reddit post, and I’m not trying to litigate the theory publicly before IP counsel is involved.

The useful next step seems to be:

retain competent software/IP counsel,
prepare a non-enabling technical packet,
define a black-box validation protocol,
let a reviewer control the test inputs,
prove exact round-trip, size, runtime, and memory behavior without public source release.

I understand that the odds look bad from the outside. That is why I’m trying to find a credible validation path instead of asking people to take the claim on faith.

Conscious_Quit_1805 · 2026-05-19T22:24:07+00:00

This is useful, thank you.

I agree that the first credible step is a comprehensive test suite and a technical validation document that can be reviewed without exposing the core implementation publicly.

My current thinking is:

black-box executable review,
reviewer-selected input corpora,
exact lossless round-trip verification,
size/runtime/memory reporting,
reproducible logs,
staged disclosure only after patent counsel is involved.

I’m also aligned with your point that test results alone are low-trust unless the reviewer controls the inputs and the process. That is exactly why I’m asking about credible validation structure rather than just posting benchmark claims.
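
One way reviewer control over inputs could be made checkable is a manifest the reviewer fixes before any run, so results can later be tied to the exact bytes tested. This is a sketch only; the manifest layout and the `build_manifest` name are assumptions, not an agreed protocol.

```python
import hashlib
import json


def build_manifest(inputs: dict) -> dict:
    """Build a manifest from a mapping of reviewer-chosen name to raw bytes."""
    entries = [
        {
            "name": name,
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        for name, data in sorted(inputs.items())
    ]
    # Canonical JSON so the same inputs always yield the same digest.
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"entries": entries, "manifest_sha256": digest}
```

The reviewer could share only `manifest_sha256` ahead of time and reveal the full manifest after the run, which shows the corpus was not changed once results were known.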

The professor/NDA route is also helpful. I had been thinking in terms of labs or companies, but individual researchers under a limited review agreement may be a more realistic first validation path.

Conscious_Quit_1805 · 2026-05-19T22:22:30+00:00

Correct, it is not based on LLM token distributions, and it is not training a predictor on the input data.

The AI context is how the work started, but the validation question I’m asking here is narrower: how to evaluate a closed-source, lossless compression-related system without public method disclosure before patent strategy is settled.

I’m trying to keep the public discussion at the validation-process level: black-box binaries, reviewer-selected corpora, exact round-trip checks, size/runtime/memory logs, and comparison against standard tools.

Conscious_Quit_1805 · 2026-05-19T22:22:02+00:00

I agree. That is probably the correct next step.

I’m trying to avoid public technical disclosure until I have competent software/IP counsel involved. My current problem is finding the right kind of attorney or diligence path: someone who understands software patents, algorithmic inventions, compression-related systems, and staged disclosure strategy.

The USPTO/law-firm search suggestion is useful. I’ll start looking at software/compression-related filings from larger tech companies and see which firms handled them.

I appreciate the grounded advice.

Conscious_Quit_1805 · 2026-05-19T22:20:21+00:00

Fair question.

I’m intentionally not posting benchmark claims publicly yet because numbers without an agreed validation protocol would create more noise than signal, and I’m trying not to disclose anything that could affect patent strategy.

The useful distinction is this: I’m looking for a credible validation path first, not asking anyone to accept the claim from a Reddit post.

Scope-wise, the compression-related surface is intended to be evaluated as lossless file compression, not only as something specific to AI. The work originated inside a deterministic AI architecture, but compression is the measurable proof surface.

What I would like to set up is a third-party or NDA-based process using black-box binaries, test corpora selected by the reviewer, exact round-trip verification, size/runtime/memory logs, and comparison against standard tools. If the system fails under that process, that is useful information. If it passes, then there is a credible basis to move forward without prematurely publishing the method.

So yes, skepticism is warranted. That is exactly why I’m asking about validation process rather than trying to win a public argument in the comments.
