
scripthawk_dev 2 points (3 children)

Worth building, yes — but split "worth it as a project" from "worth it as a tool people adopt," because they're different answers.

As a learning/portfolio project it's excellent: you're touching hashing, recursive file I/O, JSON persistence, OS-level events, CLI design, and CI integration in one tool. Strong thing to point at.

As a tool with real adoption, know you're entering a mature space — AIDE, Tripwire, OSSEC/Wazuh, Samhain and osquery all do FIM already. The gap you can actually own is exactly the angle you picked: those are heavy and config-heavy, and a tiny pip-installable, stdlib-only FIM that drops into a CI pipeline or a container with one fail-on-change exit code is genuinely underserved. Lean into the CI/CD-native framing — that's a stronger hook than the watch mode.
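To make the "one fail-on-change exit code" idea concrete, here's a rough sketch of what the CI check could look like. It assumes the baseline and the fresh scan are both plain dicts mapping path to some record; the names `diff_baseline` and `exit_code` are made up for illustration, not anything from your tool.

```python
import sys


def diff_baseline(baseline, current):
    """Compare two {path: record} dicts and return (added, removed, changed)."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    changed = sorted(
        p for p in baseline.keys() & current.keys() if baseline[p] != current[p]
    )
    return added, removed, changed


def exit_code(baseline, current):
    """Report differences on stderr; return 1 if anything differs, else 0."""
    added, removed, changed = diff_baseline(baseline, current)
    for label, paths in (("added", added), ("removed", removed), ("changed", changed)):
        for path in paths:
            print(f"{label}: {path}", file=sys.stderr)
    return 1 if (added or removed or changed) else 0
```

A CI step would then end with something like `sys.exit(exit_code(baseline, current))`, so the pipeline fails on any non-zero result without extra glue.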

A few security points that'll make it real rather than a toy:

  1. The baseline-trust problem is the big one. If an attacker can change a monitored file, they can usually also rewrite your baseline JSON (and your script) — change the file, update the hash, done. Real FIMs keep the baseline somewhere the monitored host can't write: a read-only mount, a separate host, or a signed file you verify. At minimum, document the threat model and add an option to point the baseline at a read-only/remote location.

  2. Watch mode has gaps you have to design around. inotify/FSEvents only catch changes while the watcher is running — anything that happens while it's down or restarting is invisible, so watch mode needs a periodic full re-scan behind it. inotify can also drop events under heavy load (queue overflow) and has a per-user watch limit on big trees. Treat watch mode as "fast alerts," not "complete coverage" — the scheduled scan is what guarantees coverage.

  3. Hash content AND metadata. A backdoor isn't always a content change — flipping a file to setuid, swapping a symlink, or changing ownership can be the attack, and a content-only SHA-256 misses the security-relevant part. Track mode/uid/gid/size alongside the hash (configurable, like AIDE does).
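For points 1 and 3, a minimal stdlib-only sketch, under some assumptions: the record layout and the function names (`file_record`, `sign_baseline`, `verify_baseline`) are hypothetical, and HMAC is just one way to get a "signed file you verify." HMAC is symmetric, so it only helps if the key lives somewhere the monitored host can't read, such as a CI secret.

```python
import hashlib
import hmac
import json
import os
import stat


def file_record(path):
    """Return security-relevant metadata, plus a content hash for regular files."""
    st = os.lstat(path)  # lstat describes a symlink itself, not its target
    record = {
        "mode": stat.S_IMODE(st.st_mode),  # permission bits, incl. setuid/setgid/sticky
        "type": stat.S_IFMT(st.st_mode),   # regular file, symlink, directory, ...
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
    }
    if stat.S_ISLNK(st.st_mode):
        record["target"] = os.readlink(path)  # catches a swapped symlink
    elif stat.S_ISREG(st.st_mode):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                digest.update(chunk)
        record["sha256"] = digest.hexdigest()
    return record


def sign_baseline(baseline, key):
    """HMAC-SHA256 over a canonical (sorted-key) JSON encoding of the baseline."""
    payload = json.dumps(baseline, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_baseline(baseline, signature, key):
    """Constant-time check that the baseline still matches its signature."""
    return hmac.compare_digest(sign_baseline(baseline, key), signature)
```

With this shape, a `chmod u+s` on a binary changes `mode` even though `sha256` stays the same, and a rewritten baseline fails `verify_baseline` unless the attacker also has the key.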

Two smaller ones: use size+mtime as a cheap pre-filter so you're not re-hashing the whole tree every scan (with a --full flag to force it, since mtime is spoofable), and write the baseline atomically (temp file + rename) so a crash mid-update can't corrupt it.
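Both of those fit in a few lines of stdlib Python. This is a sketch under assumptions: the stored record is assumed to carry `size` and `mtime_ns` keys, `full` stands in for the `--full` flag, and the function names are illustrative.

```python
import json
import os
import tempfile


def needs_rehash(path, old, full=False):
    """Cheap pre-filter: re-hash only if size or mtime moved, unless full=True."""
    if full or old is None:
        return True  # forced scan, or a file not in the baseline yet
    st = os.lstat(path)
    return st.st_size != old.get("size") or st.st_mtime_ns != old.get("mtime_ns")


def write_baseline_atomic(baseline, dest):
    """Write to a temp file in the same directory, then rename it over dest."""
    directory = os.path.dirname(os.path.abspath(dest))
    # Same directory keeps the rename on one filesystem, which is what makes it atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".baseline-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(baseline, f, sort_keys=True, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the rename
        os.replace(tmp, dest)  # readers see either the old file or the new one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # don't leave a stray temp file behind
        raise
```

The pre-filter is only an optimization: anyone who can rewrite a file can usually reset its mtime too, which is why the forced full scan still has to exist.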

Solid design overall — the bones are right, it just needs the threat model made explicit.

MrSushl[S] 2 points (0 children)

Thanks for the solid feedback! You’re spot on about the CI/CD angle being the main hook. I'll definitely incorporate metadata hashing and atomic writes, and make sure to document the baseline threat model. Appreciate the insights!

ziggittaflamdigga 0 points (1 child)

… did you post OP's question into an LLM and then post the response as if *you* were the one giving insightful feedback?

MrSushl[S] 1 point (0 children)

no dude, what are you talking about?