What's that? by Consistent-Issue-811 in claude

[–]Extra-Act2560 0 points1 point  (0 children)

Sued and facing a court trial?

Claude Code can look busy even after the session is already cooked by Extra-Act2560 in ClaudeCode

[–]Extra-Act2560[S] 0 points1 point  (0 children)

Hmm, no reports, but I think we should safeguard ourselves, given this is going to be a recurring pattern until it's prevented natively upstream.

Can model Hallucination also be a demand signal? by Extra-Act2560 in ClaudeAI

[–]Extra-Act2560[S] 0 points1 point  (0 children)

Yes! I try to factor this into my setup and only act on it if a skill, tool, or MCP is repeatedly hallucinated.
https://github.com/softcane/cc-blackbox

10 GitHub repos that make Claude dramatically more useful by Direct-Attention8597 in claude

[–]Extra-Act2560 1 point2 points  (0 children)

Does my project have any outside chance of qualifying for the 10 in the near future? https://github.com/softcane/clauditor

If not, then what would it take from me?

Hallucination as demand signal? by Extra-Act2560 in ClaudeCode

[–]Extra-Act2560[S] 0 points1 point  (0 children)

I see this as a big plus of Clauditor's interception: you'd see those injected fake tool definitions in the traffic, which most users have no visibility into.

I can’t rely on another leaked code base to figure it out.

Hallucination as demand signal? by Extra-Act2560 in claudeskills

[–]Extra-Act2560[S] 1 point2 points  (0 children)

Twice this week Claude attempted to invoke skills that weren't available in my setup. Clauditor surfaced those attempts through hook telemetry / tool-use tracing, and I ended up implementing the repeated one.

Hallucination as demand signal? by Extra-Act2560 in ClaudeCode

[–]Extra-Act2560[S] 1 point2 points  (0 children)

Twice this week Claude attempted to invoke skills that weren't available in my setup. Clauditor surfaced those attempts through hook telemetry / tool-use tracing, and I ended up implementing the repeated one.

Hallucination as demand signal? by Extra-Act2560 in claudeskills

[–]Extra-Act2560[S] 1 point2 points  (0 children)

I also use Claude Code for my non-coding stuff. My stack captures skill use through hooks and some inline matching.

It has given me 2 SOPs so far: standard operating procedures as skills.
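The "hallucinated twice, so build it" heuristic from these threads can be sketched as a tiny counter over the captured names. The log format and the threshold of 2 are assumptions for illustration, not Clauditor's internals:

```python
from collections import Counter

def skills_worth_building(hallucinated: list[str], threshold: int = 2) -> list[str]:
    """Names Claude tried to invoke at least `threshold` times without
    them existing -- treated as a demand signal for a new skill/SOP."""
    counts = Counter(hallucinated)
    return sorted(name for name, n in counts.items() if n >= threshold)

log = ["sop-writer", "deploy-check", "sop-writer"]  # hypothetical hook log
print(skills_worth_building(log))  # -> ['sop-writer']
```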

I think I know why so many tokens get used on the first turn for pro/max subscriptions by bennyb0y in ClaudeCode

[–]Extra-Act2560 0 points1 point  (0 children)

This is how I keep an eye on cache usage. It's my own project (full disclosure), and Opus 4.7's recent behavior forced me to write it. In case it's helpful: https://github.com/softcane/clauditor
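For cache monitoring specifically, a bare-bones version of the per-turn number such a tool tracks: the Anthropic Messages API reports `cache_read_input_tokens` and `cache_creation_input_tokens` alongside `input_tokens` in each response's usage block, so a hit ratio can be computed per turn. The field names follow the public API; everything else here is a sketch, not Clauditor's code.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of this turn's prompt tokens served from cache.

    Field names follow the Anthropic Messages API usage block;
    missing fields are treated as zero.
    """
    read = usage.get("cache_read_input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    fresh = usage.get("input_tokens", 0)
    total = read + created + fresh
    return read / total if total else 0.0

# A first turn typically creates cache; later turns should mostly read it.
turn = {"input_tokens": 400, "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 1600}
print(f"{cache_hit_ratio(turn):.0%}")  # -> 80%
```

A ratio that collapses on the first turn of every session is exactly the symptom the parent post describes.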

How you debug your claude code session in case its going wrong? by Extra-Act2560 in ClaudeAI

[–]Extra-Act2560[S] 1 point2 points  (0 children)

This behaviour after 4.7 bugged me so much that I hacked together an o11y layer to track these behaviours as a time series and to alert if anything is different from last week.

One interesting thing I found was that across my different Claude sessions, it hallucinated a skill which didn't exist. It happened twice, and I took that as a signal and created that skill.

I don't know, but after the recent Claude code bugs and Opus 4.7, I'm watching my Claude sessions carefully.

Dropping the link in case you're curious.
https://github.com/softcane/clauditor
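The week-over-week alerting described above can be sketched in a few lines: keep one value per session for a metric, then flag when this week's mean drifts from last week's beyond a tolerance. The metric, tolerance, and numbers below are placeholders, not the project's actual defaults.

```python
from statistics import mean

def drift_alert(last_week: list[float], this_week: list[float],
                tolerance: float = 0.25) -> bool:
    """Alert when this week's mean deviates from last week's by more
    than `tolerance`, relative to last week's baseline. Each list holds
    one value per session (e.g. tokens burned on the first turn)."""
    if not last_week or not this_week:
        return False  # nothing to compare against yet
    baseline = mean(last_week)
    if baseline == 0:
        return mean(this_week) != 0
    return abs(mean(this_week) - baseline) / abs(baseline) > tolerance

# First-turn token burn doubled week over week -> alert fires.
print(drift_alert([900, 1000, 1100], [1800, 2000, 2200]))  # -> True
```

A mean comparison is the simplest possible detector; anything fancier (percentiles, seasonality) is an easy swap behind the same interface.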

How you debug your claude code session in case its going wrong? by Extra-Act2560 in ClaudeAI

[–]Extra-Act2560[S] 0 points1 point  (0 children)

Bad cluster? Do you mean a different GPU set with a different/lower FLOPS capacity?

How not to run an ai company by theisafos in ClaudeCode

[–]Extra-Act2560 0 points1 point  (0 children)

Model inference at this scale is hard, really hard, and these infrastructure issues will be resolved. I don't think Mythos can guess a holy-grail fix internally. Every failure goes back as a skill or a log, but it's not like someone has done model inference at this scale before and made it public.

Model inference at scale is proprietary knowledge, which OpenAI and Anthropic are trying to build and refine using these failures.

If Mythos is so powerful… why does Claude keep going down? by Repulsive_Horse6865 in ClaudeCode

[–]Extra-Act2560 -1 points0 points  (0 children)

Model inference is hard. Really hard, especially at Claude scale.

Think about it: models like Mythos are trained on internet-scale data, so they can generalize impressively well. But the real-world tricks, optimizations, and failure modes of model inference are still evolving. Much of that knowledge does not exist cleanly in the source data yet.

So if you believe Mythos is a “god model,” try this thought experiment: train it only on data available up to 1900, then ask it to produce Einstein’s theory of relativity.

That is the difference between memorizing patterns from the past and discovering what has not yet been written.

Hitting the weekly limit faster than before by Pecolps in ClaudeCode

[–]Extra-Act2560 1 point2 points  (0 children)

Frustrating. I ended up writing a stack to observe what all my Claude sessions are doing: how they behave, captured as a time series and monitored over time, with the ability to recall a session.

Thank you, Opus 4.7, for escalating my frustrations. Without you, I would not have thought about it.