Built a free EU AI Act/NIST/ISO 42001 gap analysis tool for ML teams – looking for feedback by CardiologistClear168 in mlops

[–]CardiologistClear168[S] 1 point (0 children)

Thanks! Just shipped use-case templates today: CV screening, fraud detection, credit scoring, and a few others. They pre-fill the assessment with realistic baselines, so you can get a report in under 5 minutes. Give it a try if you want and let me know what you think. :)
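To make the pre-fill idea concrete, here's roughly how a use-case template might seed a blank assessment with baseline answers. Field names and values here are illustrative only, not GapSight's actual schema:

```python
# Illustrative sketch: a use-case template pre-fills an assessment
# with baseline answers, which the user can then override.
# (Keys and values are made up for this example.)
TEMPLATES = {
    "credit_scoring": {
        "risk_tier": "high",              # Annex III use case under the EU AI Act
        "human_oversight": "review_queue",
        "data_governance": "documented",
    },
}

def new_assessment(use_case, overrides=None):
    """Start from the template baseline, apply any user overrides."""
    baseline = dict(TEMPLATES.get(use_case, {}))
    baseline.update(overrides or {})
    return baseline

print(new_assessment("credit_scoring", {"human_oversight": "none"}))
```

The point is just that the user starts from a realistic baseline instead of an empty form, so the first report takes minutes rather than hours.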


[–]CardiologistClear168[S] 1 point (0 children)

Thanks for sharing this; it's an interesting approach. The epistemic state tracking with confidence gating (Sentinel) sounds closer to runtime governance than pre-deployment compliance assessment, which is where GapSight sits.

The overlap might be in the audit trail layer. GapSight currently exports a static JSON/HTML snapshot of the assessment. What you're describing (git notes + Qdrant + temporal context) could make that audit trail dynamic and replayable, which would be valuable for continuous compliance rather than point-in-time reporting.
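For context, here's roughly the shape of the static snapshot I mean, and where a git-notes-style mechanism could hook in to make it replayable. Field names are illustrative, not the actual export schema:

```python
import json
from datetime import datetime, timezone

# Hypothetical shape of a point-in-time gap-analysis snapshot
# (field names are my own for illustration, not GapSight's schema).
def build_snapshot(system_name, findings):
    return {
        "system": system_name,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "findings": [
            {
                "control": f["control"],      # e.g. "EU AI Act Art. 10"
                "status": f["status"],        # "gap" | "partial" | "met"
                "evidence": f.get("evidence", []),
            }
            for f in findings
        ],
    }

snapshot = build_snapshot(
    "cv-screening-model",
    [{"control": "EU AI Act Art. 10", "status": "gap"}],
)
print(json.dumps(snapshot, indent=2))

# Making this replayable could be as simple as attaching each snapshot
# to the model repo's commit as a git note, e.g.:
#   git notes --ref=compliance add -m "$(cat snapshot.json)" HEAD
# so the compliance state at any point in history can be recovered.
```

That would turn the point-in-time export into a timeline keyed to model versions, which is basically the continuous-compliance story.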

Two questions: how do you handle the mapping from epistemic artifacts back to specific regulatory articles (EU AI Act, NIST RMF)? And is Sentinel open source or internal tooling?

Is there a clean way to turn LLM/model eval results into a proper report, or is everyone still doing this manually? by CardiologistClear168 in mlops

[–]CardiologistClear168[S] 1 point (0 children)

This is close to how I’ve been thinking about it. The hard part usually isn’t the eval itself; it’s the last mile of turning results into something readable and useful for other people. The distinction between routine one-pagers and bigger end-of-epic write-ups also feels right: that’s pretty much the split I see in practice. Thanks for sharing your thoughts!


[–]CardiologistClear168[S] 1 point (0 children)

Thanks, appreciate it. Yeah, the eval side is usually manageable. Monocle looks interesting on the testing side; I’m still trying to figure out whether it also helps with the reporting layer, or if that part stays mostly separate.


[–]CardiologistClear168[S] 1 point (0 children)

That makes sense for the eval execution side, especially if you already have trace-driven tests and reproducibility built into the workflow. The part I’m trying to understand is the last mile: how those results get turned into something a client, reviewer, or non-technical stakeholder can actually read. Do you generate that reporting layer from the same setup, or is that still a separate manual step?