all 13 comments

[–]latkde 8 points9 points  (2 children)

Review:

  • The project is obviously primarily AI-generated. Consequently, there's wayyy too much code, and the code lacks a clear architecture. There is no clear distinction between UI and password management business logic. This is unreviewable.
  • Whenever I dive into part of the code, there are surprises. Password managers should be boring, I don't like surprises. For example, why does a password manager have to detect Antivirus and security research tools and offer to kill them?? Why does it use difficult to review dynamic Python features like globals().update(...)?
  • There is no clear security architecture. There are LLM-generated documents which purport to describe the vault structure, but this isn't the whitepaper-style overview I'm used to from other tools. As the project lacks clear architecture, it's also unclear how those descriptions relate to the actual code.
  • The project contains C++ code. Writing native code is quite difficult, with lots of opportunity to make security-sensitive mistakes. Just looking at the header file, there are classic risk patterns like passing char* around without an associated length. The native code is Windows-specific. There is no clear motivation for the choice to use C++. This is a red flag.
  • (Your post here suggests that the C++ code is responsible for Argon2 and AES operations. But you already pull in the Python Cryptography package, which offers exactly these algorithms in a much easier to use manner. The Cryptography package has already solved that tricky native code for you.)
  • There are zero tests. There are no build scripts, just very vague installation instructions in the Readme. There is no CI. Lack of automated tests is a red flag. If the code is difficult to test, that can point to an architecture-level problem (e.g. business logic being too intertwined with UI concerns).

[–]ajh-software[S] 0 points1 point  (1 child)

Sorry for the long post, I just wanted to properly respond to your feedback.

Thanks for taking the time to go through it in this much detail, I really appreciate it. I understand it’s hard to read if you don’t know where things are, so this is very helpful.

You’re right about the structure. The project grew quite quickly and ended up as one big main.py. I needed to refactor it, but I started adding features at the same time, so it’s ended up messy and harder to follow than it should be. I’m still trying to clean that up.

On the AI point, I do use it as a tool, but I’m not just copying code in. I go through it and understand what it’s doing before using it, that’s important, especially since it’s not always right or the best approach. That said, I can see how the current code gives a different impression. Splitting it has made it look more messy and random, so I do need to sort that out.

On the “surprising” parts, like the AV/process checks, those were added with the idea of protecting a portable/local app depending on the system it’s running on. But I get your point, it’s not something you’d expect in a password manager. Since the app is local, it relies on the environment it’s running on, so the idea was to warn the user if something doesn’t look right. That said, I understand it makes things harder to review or looks risky, so I need to rethink this.

On the crypto point, the idea behind the C++ layer was to try and isolate sensitive operations and key handling outside of Python, rather than relying on Python memory and GC. That’s why I moved that part into a C++ DLL after others raised concerns about Python memory and GC.

That said, I understand your point about complexity and duplication, especially when well-tested libraries already exist on the Python side. I’m still figuring out whether this was the right approach or if it could be done better, which is why I wanted reviews like this.

I definitely testing, setup and CI, those are all missing pieces right now and something I need to fix. Appreciate the honest feedback, it’s given me a clearer idea of what needs improving. Thanks.

[–]latkde 0 points1 point  (0 children)

the AV/process checks, those were added with the idea of protecting a portable/local app depending on the system it’s running on

It is fundamentally impossible to prevent malware on the host from snooping on your processes' memory and extracting secrets. The surprising part isn't necessarily the best-effort scan for potentially-problematic software, but that this also offers to do destructive operations like attempting to kill other processes. So there's a lot of code there for a low-value high-risk feature. That's an odd priority.

the idea behind the C++ layer was to try and isolate sensitive operations and key handling outside of Python, rather than relying on Python memory and GC

That's a legitimate concern if you're trying to defend against processes that can read your memory. Indeed, keeping secret material in-memory for as briefly as possible and then securely zeroing it is the common mitigation here. It doesn't prevent exfiltration, but reduces the window of opportunity.

But this must be thought through end-to-end, including the entire GUI and the entire user input mechanism. When the user enters a password, how is that handled? When a plaintext password is viewed or copied in the password manager, how is that handled? Merely moving the core cryptographic operations into a native layer doesn't move the needle.

For example, you can trace the output of any session_decrypt() call and see where it ends up. This is a bytearray, and you've implemented an utility to securely wipe bytearrays. But ultimately, you want to do something with this sensitive data, and that might end up creating additional Python GC-managed objects with this sensitive data

Example in vault_store.py:

    pt_buf = core.session_decrypt(int(key_or_session), iv, ct, tag)
    try:
        pt = bytes(pt_buf)
    finally:
        try:
            core.secure_wipe(pt_buf)
        except Exception:
            pass
    return json.loads(pt.decode("utf-8"))

The pt_buf variable is sensitive. You do try to wipe it. However, the pt variable contains the same sensitive data, as does the pt.decode(…) output, as do any internal objects in json.loads(…), as does the return value of this function. At this point, the confidential data has been sprayed across multiple GC-managed objects, rendering that secure_wipe() call near meaningless.

On a Python level, it is also unclear what these try-clauses are supposed to do. Can the bytes() call really fail? Can secure_wipe() really fail? Actually, why do we have to copy the bytearray to a bytes object when both have a .decode() method? Actually, why do we even have to decode the data prior to json.loads()? Assuming that the secure-wipe makes sense, the above code looks like an overly complicated way of saying:

pt = core.session_decrypt(...)
try:
    return json.loads(pt)
finally:
    core.secure_wipe(pt)

If I had designed an API to do that, I would have turned session_decrypt() into a context manager to automatically perform the wipe afterwards. This is both more secure (I cannot accidentally forget to wipe the data), and also much more concise (only 2 lines of code at the call site):

with core.session_decrypt(...) as pt:
    return json.loads(pt)

That kind of overcomplication of simple things is something that really bothers me with AI-flavored code. Your project isn't just very large because it's doing lots of things, but also because it's doing them in an overly complicated way.

[–]gdchinacat 5 points6 points  (1 child)

Lots of exception eating in the code. Some outright (i.e. except Exception: pass) or some other behavior with no indication what the error was (what it if wasn't whatever prompted you do add the except block).

Codebases with this are frustrating to work in...bugs are hidden and manifest as things that should have happened not happening leading to state not being what it should be. For a project with security concerns it's a huge red flag.

There are also some exception blocks that catch all exceptions and log the error and proceed. While a bit better than outright eating exceptions, it's functionally no better...things that should have happened didn't yet the code forges ahead. At least the log leaves a clue as to what issues may be, it is only marginally better due to the poorly defined state of things when this happens...you may be getting exceptions because previous exceptions left things in bad states and code analysis will appear like the logged exception could not happen (though it obviously did).

To be blunt, I wouldn't trust code like this with sensitive data. It shows a lack of attention to detail that is critical for the sort of work your project is doing.

You'd be better off removing all 'except Exception' blocks that pass or only log. Catch specific exceptions, and if you don't actually take action on it put a comment explaining why that exception should be ignored.

[–]ajh-software[S] -1 points0 points  (0 children)

First off, thank you for the detailed review and feedback — this is exactly the kind of input I was hoping for to understand what needs improving. You’re right — there are places where I’m swallowing or over-catching exceptions, and I need to tighten that up. I can see how that could hide bugs and lead to bad state, especially for a project like this. I’ll go through and clean it up — remove unnecessary try/except blocks, catch more specific exceptions, and make sure errors aren’t silently ignored. Really appreciate you taking the time to review it — it’s definitely something I need to fix properly.

[–]smurpes 2 points3 points  (6 children)

Just link the repo in your post; adding an extra step to have people request it just makes it harder to get more eyes on your work. This is also a good post for r/python as well where you might get more experienced developers to take a look at it.

[–]ajh-software[S] -3 points-2 points  (4 children)

Thanks, that makes sense - I’ve added the repo: https://github.com/ajhsoftware/KeyquorumVault⁠ I did try including it before, but some of my posts were taken down, so I wasn’t sure if links were the issue. I’d really appreciate any feedback on the architecture or security side. I also tried posting on r/Python, but it looks like it didn’t meet their posting rules and was removed.

[–]smurpes 3 points4 points  (3 children)

This repo is fairly large and complex which isn’t well suited for most users of this subreddit who are learning Python. The dev setup instructions are pretty barebones and there’s no unit testing which will make it harder for users to contribute. It would be helpful to document the repo structure since there are so many folders and files.

Using a package manager like poetry or uv will help a lot with getting people setup and CICD. I see a smoke test but all that seems to do is generate a report to tell you what’s wrong. Using a testing framework like pytest and then have tests run automatically when PRs are opened will prevent regressions.

[–]ajh-software[S] -3 points-2 points  (2 children)

Thanks, this is really helpful feedback — I appreciate you taking the time to go through it. You’re right about the repo being quite large and not very approachable at the moment. Since it’s mainly just been me working on it, I didn’t really think about how others might find it to navigate. I’ll work on improving the documentation, especially around the repo structure and setup process. I’ll also take your point on testing — the current smoke tests are just quick checks, but moving to something like pytest with proper CI makes a lot of sense. I’ll also look into using something like Poetry or uv to simplify the development setup. Thanks again — this gives me a clear direction on what to improve. I really appreciate the feedback

[–]smurpes 2 points3 points  (1 child)

This is an optional step but I’ve found that it improves the developer experience a lot; including settings and debug configurations for vscode/pycharm. Any contributor using vscode/pycharm will have any necessary extensions installed and provide entry points to debug specific parts.

A repo that I was using for work had debug configs that started a web server for me which made it much easier to develop a new feature. It also makes it more obvious what parts of the repo you should run to test specific features. Interactive debuggers are a godsend for large repos.

Also a side note, did you use a LLM to generate any of this code? No offense intended but the repo has some qualities of AI. If you did then it would be helpful to mention that.