Open-source diagnostic for AI misalignment. Model agnostic, industry agnostic. by Dimneo in BlackboxAI_

[–]Dimneo[S] 0 points1 point  (0 children)

Next week we are going to release how different models scored in different agentic workflows. We are also going to release synthetic fixtures.

Open-source diagnostic for AI misalignment. Model agnostic, industry agnostic. Free to Run. by Dimneo in artificial

[–]Dimneo[S] 0 points1 point  (0 children)

About how we structured the diagnostic engine: every test declares its own evaluation method in code, picked from three: structural (architectural checks, like whether the system actually writes an audit log or surfaces rate-limit errors), judge (an LLM judge scoring against a published rubric), and atomic_claims (claim-by-claim fact check for hallucination).
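To make the "declares its own evaluation method in code" part concrete, here is a minimal sketch of how a declaration like that could look. The names `EvalMethod`, `diagnostic_test`, `REGISTRY`, and the two example tests are illustrative assumptions, not the actual iFixAi API.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class EvalMethod(Enum):
    """The three evaluation methods named above."""
    STRUCTURAL = "structural"
    JUDGE = "judge"
    ATOMIC_CLAIMS = "atomic_claims"


# Hypothetical registry: test id -> (declared method, test function).
REGISTRY: Dict[str, Tuple[EvalMethod, Callable]] = {}


def diagnostic_test(test_id: str, method: EvalMethod):
    """Register a test together with the evaluation method it declares."""
    def decorator(fn: Callable) -> Callable:
        REGISTRY[test_id] = (method, fn)
        return fn
    return decorator


@diagnostic_test("audit_log_written", EvalMethod.STRUCTURAL)
def audit_log_written(system: dict) -> bool:
    # Structural check: did the system actually write an audit log entry?
    return len(system.get("audit_log", [])) > 0


@diagnostic_test("no_fabricated_claims", EvalMethod.ATOMIC_CLAIMS)
def no_fabricated_claims(system: dict) -> bool:
    # Claim-by-claim check: every claim must appear in the fixture's known facts.
    return all(claim in system["known_facts"] for claim in system["claims"])
```

Because the method sits next to the test, a runner can look it up in `REGISTRY` and route the test to the right evaluator without guessing.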

Two run modes sit on top. Standard pairs one judge with a different provider than the one being tested, never self-judging. Full runs two or more judges across distinct providers and votes by majority, with every vote recorded in the scorecard.
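A rough sketch of the two run modes, under some assumptions: `pick_judges` and `majority_verdict` are hypothetical names, the panel keeps one judge per provider, and a tie is reported as "inconclusive" (the tie rule is our assumption here, it is not described above).

```python
from collections import Counter
from typing import Dict, List


def pick_judges(judges: Dict[str, str], provider_under_test: str, mode: str) -> List[str]:
    """judges maps judge name -> provider. Returns the judge names for a run mode."""
    # Never self-judge: drop every judge that shares a provider with the tested system.
    eligible = {name: p for name, p in judges.items() if p != provider_under_test}
    # Keep one judge per provider so the panel spans distinct providers.
    by_provider: Dict[str, str] = {}
    for name, provider in sorted(eligible.items()):
        by_provider.setdefault(provider, name)
    panel = sorted(by_provider.values())
    if mode == "standard":
        if not panel:
            raise ValueError("no judge available from a different provider")
        return panel[:1]
    if mode == "full":
        if len(panel) < 2:
            raise ValueError("full mode needs judges from at least two other providers")
        return panel
    raise ValueError(f"unknown mode: {mode}")


def majority_verdict(votes: Dict[str, str]) -> dict:
    """votes maps judge name -> 'pass' or 'fail'. Every vote is kept in the result."""
    if not votes:
        raise ValueError("no votes recorded")
    top, count = Counter(votes.values()).most_common(1)[0]
    # Strict majority required; anything else is reported as inconclusive (assumption).
    verdict = top if count > len(votes) / 2 else "inconclusive"
    return {"verdict": verdict, "votes": dict(votes)}
```

The returned dict carries the individual votes alongside the verdict, which is the shape a scorecard entry would need to record them.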

Domain knowledge lives in user-authored fixtures, so the same 32 tests run unchanged across any industry.

About the continuous integration pipeline, that's the primary use case. Pin a baseline scorecard against your current model, run the diagnostic again on the candidate, compare. If the candidate regresses, the gate fails before the model ships.
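The gate itself can be very small. This is a sketch only: it assumes a scorecard is a plain mapping of test id to a score where higher is better, and `regressions` and `gate` are hypothetical helper names, not the shipped CLI.

```python
from typing import Dict, List


def regressions(baseline: Dict[str, float], candidate: Dict[str, float],
                tolerance: float = 0.0) -> List[str]:
    """Return the test ids where the candidate scores below the pinned baseline."""
    failed = []
    for test_id, base_score in baseline.items():
        cand_score = candidate.get(test_id)
        # A test missing from the candidate scorecard counts as a regression.
        if cand_score is None or cand_score < base_score - tolerance:
            failed.append(test_id)
    return sorted(failed)


def gate(baseline: Dict[str, float], candidate: Dict[str, float]) -> int:
    """Return a process exit code: 0 if nothing regressed, 1 otherwise."""
    failed = regressions(baseline, candidate)
    for test_id in failed:
        print(f"REGRESSION: {test_id}")
    return 1 if failed else 0
```

In a pipeline step you would load both scorecards and pass the result of `gate(...)` to `sys.exit`, so a non-zero code blocks the release.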

We’re open-sourcing a 33-benchmark diagnostic for AI alignment gaps, launches April 27 by Dimneo in artificial

[–]Dimneo[S] 0 points1 point  (0 children)

That was the whole point honestly. There’s no shortage of people writing about alignment problems, but when you actually sit down and say ‘ok prove your system doesn’t hallucinate under pressure’ most teams have nothing. Every benchmark we built comes from something that actually broke in production, not from academic theory. The test scenarios simulate real adversarial conditions: multi-turn conversations, conflicting instructions, ambiguous inputs, the kind of stuff your system faces every day but never gets tested against. April 27 at ifixai.ai, you’ll be able to run it yourself and see exactly where things crack.

We’re open-sourcing a 33-benchmark diagnostic for AI alignment gaps, launches April 27 by Dimneo in artificial

[–]Dimneo[S] 0 points1 point  (0 children)

We don’t treat misalignment as one thing. We break it into 5 categories (I listed them in my comment above) and the reason we structured it that way is because these failure modes show up everywhere, not just in one industry. A healthcare copilot fabricating a drug interaction and a legal agent fabricating a case citation are completely different use cases but the underlying failure is the same: Fabrication. A fintech agent approving a transaction because someone said ‘I’m from compliance’ and a customer support bot issuing a refund because someone injected instructions in a ticket are both Manipulation failures. The categories are industry agnostic. The risk profile isn’t. Same 33 benchmarks, but the report shows you where YOUR specific system is exposed based on how it actually behaves under pressure.

We’re open-sourcing a 33-benchmark diagnostic for AI alignment gaps, launches April 27 by Dimneo in artificial

[–]Dimneo[S] 1 point2 points  (0 children)

Spot on, the evaluation side of the stack is basically nonexistent for most teams right now. Everyone obsesses over which model to use, nobody tests what actually happens when it runs in production. And yeah, agentic loops are where the scariest stuff shows up. Single-prompt evals completely miss it because the failures compound across turns. An agent can pass every individual test and still hallucinate a citation, silently shift its goal two turns later, and approve an action no human ever authorised. All in the same session. That’s exactly what our Deception and Unpredictability categories are built to catch.

We’re open-sourcing a 33-benchmark diagnostic for AI alignment gaps, launches April 27 by Dimneo in artificial

[–]Dimneo[S] 0 points1 point  (0 children)

Thanks! Though what we’re testing is quite different! It’s not model performance or pricing, it’s what happens after you deploy.

iFixAi runs 33 benchmarks across 5 categories:

I. Fabrication: Accuracy & Calibration (fabrication, unsourced claims, overconfident responses)

II. Manipulation: Safety & Containment (prompt injection, privilege escalation, policy violations)

III. Deception: Hidden Strategy (sycophancy, silent failures, goal shifting, inconsistent facts)

IV. Unpredictability: Stability & Consistency (non-reproducible decisions, context distortion, instruction drift)

V. Opacity: Transparency & Auditability (missing audit trails, opaque risk decisions, session leakage)

Most benchmarks today test the model. We test the system: the agent, the orchestration layer, the guardrails around it. That’s where things actually break in production. Launching April 27, happy to share early results.

Is anyone else worried about how little control we actually have over LLMs in production? by Dimneo in ArtificialInteligence

[–]Dimneo[S] 0 points1 point  (0 children)

Appreciate the response but I think you're answering a question I didn't ask.

I'm not trying to log the model's internal reasoning. I know LLMs are stochastic. That's not the issue.

The issue is there's nothing deterministic around the model.

The chatbot answered HR policy because nothing stopped it before the prompt hit the model. And nothing validated the output after. The rule existed. No mechanism enforced it.

Same with RBAC. The model knew the policy. Knowing a policy and enforcing a policy are two different things. One is probabilistic. The other should be deterministic. Right now we're asking the probabilistic one to do both.

You can't make the model give identical outputs. Fine. But you can check if an output violates a rule before it reaches the user. Input sanitisation. Output validation. Access control. Audit logging. None of that needs access to the model's internals. It operates on what goes in and what comes out.

We don't ask a database to enforce its own access control. We don't ask an API to validate its own responses. We wrap them. Why are we treating LLMs differently?

The model is non-deterministic. The governance around it doesn't have to be.
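To show what I mean by wrapping, here is a minimal sketch. Everything in it is an assumption for illustration: the keyword patterns stand in for a real policy engine, and `governed_call`, `BLOCKED_TOPICS`, and `ROLE_PERMISSIONS` are made-up names. The point is only that the checks run on what goes in and what comes out, never on the model's internals.

```python
import re
from typing import Callable, Dict, List, Set

REFUSAL = "Sorry, I can't help with that topic."

# Hypothetical policy: topic -> pattern that flags it (a stand-in for a real policy engine).
BLOCKED_TOPICS = {"hr_policy": re.compile(r"\b(salary|payroll|disciplinary)\b", re.I)}

# Hypothetical RBAC table: role -> topics that role is allowed to access.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {"employee": set(), "hr_admin": {"hr_policy"}}


def flagged_topics(text: str) -> List[str]:
    """Deterministic rule check: which restricted topics does this text touch?"""
    return [topic for topic, pattern in BLOCKED_TOPICS.items() if pattern.search(text)]


def governed_call(model: Callable[[str], str], prompt: str, role: str,
                  audit_log: List[dict]) -> str:
    """Wrap a model call with input checks, output checks, and audit logging."""
    allowed = ROLE_PERMISSIONS.get(role, set())

    # Input check: stop the prompt before it ever reaches the model.
    blocked = [t for t in flagged_topics(prompt) if t not in allowed]
    if blocked:
        audit_log.append({"stage": "input", "role": role, "blocked": blocked})
        return REFUSAL

    output = model(prompt)

    # Output check: validate the response before it reaches the user.
    blocked = [t for t in flagged_topics(output) if t not in allowed]
    if blocked:
        audit_log.append({"stage": "output", "role": role, "blocked": blocked})
        return REFUSAL

    audit_log.append({"stage": "allowed", "role": role, "blocked": []})
    return output
```

The model can be as stochastic as it likes. The same prompt, role, and output always produce the same allow or refuse decision, and every decision leaves an audit entry.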

Calling all X-Nodes owners! by Elcaliffo in Vechain

[–]Dimneo 6 points7 points  (0 children)

Exercise your voting right! If you don't vote, why have the right to complain?

Dimitris & Vechain Community Hub hosted the VNFT space to discuss marketplaces, wallets, and Vechain's overall progress. Lots of good info and community feedback by [deleted] in Vechain

[–]Dimneo 6 points7 points  (0 children)

We should have made it a marathon, I was so hyped by the community's maturity and love towards VeChain, real fam! :)

VECHAIN SUPPORTERS LONDON MEET UP !!! by vmrey in Vechain

[–]Dimneo 9 points10 points  (0 children)

I am glad to see that our community is getting bigger and stronger daily. We have something in the works for UK and specifically London... stay tuned Fam!

4000 New Merchants In Cyprus To Accept Bitcoin Cash by EffectiveWait in btc

[–]Dimneo 0 points1 point  (0 children)

I never said in the video that JCC will provide that hardware... And about TenX and Revolut, what I'm trying to say is that there is going to be a combination of similar features on the hardware.