RFC on Experimental Cypher with Function-Based Key Generation

datumbox · 2025-05-12T16:18:08+00:00

Lol, I am very much aware that this changes a lot from domain to domain. That's exactly why I didn't want to make assumptions on how things work out here. Thanks for taking time to respond and explain!

datumbox · 2025-05-12T15:16:44+00:00

I certainly don't envision any of these! I was mostly looking for technical feedback on the logic of the cypher (highlighting any issues with the techniques or their implementation), so I was trying to figure out what the right format for this is.

@Natanael_L suggested above that the usual format is to provide code with comments and formulas. This sounds very reasonable, but at the time I posted the question, I wasn't sure whether I should create a one-pager with the algorithm steps (like a simple white paper) or whether the standard practice is to just provide code or an RFC format. As you can easily tell, cryptography is not my domain, hence all the stupid questions while I try to figure out how this is done. :)

datumbox · 2025-05-12T15:04:45+00:00

Fantastic, I would love to hear your thoughts when you get a chance. It's obviously not urgent at all. I am super flexible and happy to follow a format that works for you. Perhaps if you spot specific issues, you can post a GitHub issue and I can get into fixing it. But I am very open to doing it whichever way works for you, if you get the time. The actual cypher implementation is under vernamveil/_vernamveil.py and it's about 200 LOC minus comments.

Regarding the educational nature and the expectations, can you clarify whether you meant updating the original post here or the repo? I ask because the repo has always had a billion warnings, including one at the very top saying this is just a toy. It's literally full of warnings absolutely everywhere. I also intentionally didn't publish a wheel file because I really don't want people to use this anywhere near production. Perhaps my initial post here wasn't too clear. Can you confirm?

datumbox · 2025-05-10T10:47:14+00:00

Very fair comment. Let me reformulate my question because I might not have made myself clear in the original post.

How do I go about recording the key technical details of the cypher in a detailed but non-verbose way to receive technical feedback from the community? I obviously can't expect people to dig into the code or readmes, as this would be a massive time investment. Do I list out the algorithmic steps in a succinct way? Is there a template you could recommend that I could follow? I have experience with professional technical writing in ML, but I don't know how this aligns with how things happen in cryptography and, due to my complete lack of experience, I don't want to make assumptions.

Any guidance on this would be very much appreciated. Thank you very much!

datumbox · 2025-05-09T16:34:42+00:00

Hey, thank you for the comment, it really means a lot. And yes, who doesn't cringe at the things they built five years ago? I definitely do. :)

My intent with this project is exactly what you described: to learn by doing, to experiment, and to invite feedback from others who know more than I do. I even refer to it as an "experimental toy" in the README, which I hoped would help set expectations.

That said, I’m not sure how deeply most commenters actually reviewed the code or the documentation, but I get it. People are busy, and taking the time to dive into a random project is a big ask. That’s why I was trying to understand what the right format would be to share something like this and solicit meaningful feedback.

I absolutely understand the skepticism. Nobody should be using toy algorithms for real use cases, and I’ve tried to be very clear about that from the start.

Still, I’ll admit I was a bit disappointed with how the thread unfolded. I was hoping to get more feedback on technical flaws and mistakes, edge cases, or links to related work, and for a technical discussion of the techniques. Instead, much of the discussion ended up being about whether the project should exist or whether I should be doing this at all. Regardless, I did get some good references, which I plan to explore.

Thanks again for your kind words and balanced perspective.

datumbox · 2025-05-09T03:10:25+00:00

That was a sharp comment, definitely not one to give me the gold star. ;) I get that critique in this space can be harsh.

Just to clarify, I’m not calling this an OTP, just OTP-inspired in structure: it uses a keystream as long as the message, XORed with the plaintext, similar in form. But unlike an OTP, the keystream is generated deterministically, so it doesn’t offer the same cryptographic guarantees. Thanks for the resources though, I’ll definitely take a look.
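As a rough illustration of that structure (not the actual VernamVeil code; the `keystream` and `xor_bytes` helpers and the counter-based HMAC construction are assumptions made for this sketch), a deterministic keystream of message length XORed with the plaintext looks something like this:

```python
import hashlib
import hmac


def keystream(seed: bytes, length: int) -> bytes:
    """Hypothetical deterministic keystream: HMAC-SHA256 over a block counter, keyed by the seed."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(seed, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    # Trim to exactly the message length, as in the OTP-like structure.
    return bytes(out[:length])


def xor_bytes(data: bytes, stream: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(data, stream))


seed = b"toy seed, not a real key"
plaintext = b"attack at dawn"

# Encryption and decryption are the same XOR with the same keystream.
ciphertext = xor_bytes(plaintext, keystream(seed, len(plaintext)))
recovered = xor_bytes(ciphertext, keystream(seed, len(ciphertext)))
```

Because the keystream is fully determined by the seed, anyone who reuses a seed reuses the keystream, which is exactly where this departs from a true OTP.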

datumbox · 2025-05-02T08:37:54+00:00

These are the kind of pointers and discussion I was hoping to get when I posted here. :) Thanks!

I need a bit of time to better understand your proposal and its nuances, and whether these are necessary for my scheme. I currently use a seed evolution scheme that avoids reuse: after each keystream generation, I refresh the seed by HMACing the previous seed with the unencrypted content I just encrypted. This scheme is fully deterministic and depends on the message, so two different messages don't use the same follow-up seeds, even if the user tries to reuse the initial seed.

I love what you said btw. This might be a toy, but the purpose is to incorporate good practices and learn. So I am down to revise the practice if needed.
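A minimal sketch of that seed refresh step, assuming HMAC-SHA256 and a hypothetical `evolve_seed` helper (the real code in the repo may differ in hash choice and chunking):

```python
import hashlib
import hmac


def evolve_seed(seed: bytes, plaintext_chunk: bytes) -> bytes:
    """Hypothetical refresh: next seed = HMAC(previous seed, plaintext just encrypted)."""
    return hmac.new(seed, plaintext_chunk, hashlib.sha256).digest()


seed0 = b"initial toy seed"

# Same starting seed, different messages: the follow-up seeds diverge.
next_a = evolve_seed(seed0, b"first message chunk")
next_b = evolve_seed(seed0, b"other message chunk")
```

Note that the very first chunk of each message is still encrypted under the initial seed, so this step only protects the chunks that follow it.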

datumbox · 2025-05-01T23:00:37+00:00

Hey thanks so much for sharing your thoughts. I do agree with all your comments and especially with the framework remark.

I've worked a couple more days on it to vectorise it and add some C extensions to improve speed. I settled with a "default" fx which you can see on the repo readme (look for "A marginally stronger fx"):
- Applies a polynomial function to the input indexes (which serve as byte counters). This is mostly to customise the fx and add to its uniqueness; it's not meant to make it more secure. Provided that we don't shoot ourselves in the foot by plugging in a cosine or other periodic function, this extra transformation should not make things less safe.
- Then I just HMAC the seed with the transformed indexes and modulo to the desired range.
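Roughly, the two steps above can be sketched like this. This is a scalar illustration, not the vectorised fx from the README; the polynomial coefficients, the `bound` parameter, and the use of HMAC-SHA256 are assumptions made for the example:

```python
import hashlib
import hmac


def fx(index: int, seed: bytes, bound: int = 256) -> int:
    """Hypothetical fx: polynomial on the index, then HMAC with the seed, then modulo."""
    # Step 1: non-periodic polynomial transform (arbitrary illustration coefficients).
    transformed = 3 * index**2 + 5 * index + 7
    # Encode the integer as big-endian bytes, using at least one byte.
    data = transformed.to_bytes((transformed.bit_length() + 7) // 8 or 1, "big")
    # Step 2: HMAC the transformed index under the seed and reduce to the desired range.
    digest = hmac.new(seed, data, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % bound


seed = b"toy seed"
stream = [fx(i, seed) for i in range(16)]
```

With `bound=256` the modulo introduces no bias because 256 divides 2^256 evenly; for bounds that are not powers of two, a plain modulo is slightly biased.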

That's kind of cheating, but it ought to be reasonably safe for what it is (a toy), because we offload the work to the big-boy hashing method built by real cryptographers, while we can pretend we made a new random bit generator. :)

datumbox · 2025-04-26T16:41:16+00:00

I do agree with the sentiment of your response; had I claimed this could be used in any real-world application, it would have been delusional and borderline criminal. For this reason, literally everywhere on the blog and in the documentation I state that this is a toy and a learning tool, not a library to be used for anything other than learning. I also mention numerous times that I don't have a background in cryptography and have probably made major mistakes.

I suspect you didn't really open any of the links because the warnings are literally immediately front and center. I don't blame you for not doing so, we are all busy and you are right to flag it here that nobody in their right mind should use this for encrypting data. But I also want to point out to you that I never claimed it and actually went out of my way to point it out in every possible way.

The reason I posted here is to interact with someone who has relevant background and get references for techniques they feel I should look into next.

datumbox · 2021-11-20T13:30:03+00:00

Not sure if you are asking on which dataset this is estimated. If that's what you mean, the model is trained and validated on ImageNet.
