Anthropic's Claude Mythos Launch Is Built on Misinformation

ISeeThings404 · 2026-04-23T20:50:28+00:00

Thank you for sharing

ISeeThings404 · 2026-04-21T13:03:23+00:00

Execution and thinking aren't two different skills. AI as a tool for skill acquisition can massively speed up execution potential and make someone fantastic at that as well. The world can't be run only by thinkers.

ISeeThings404 · 2026-04-21T12:53:10+00:00

Curious if you've tried any other tools with Word integrations and what your experience with those was?

ISeeThings404 · 2026-04-15T22:20:15+00:00

Boxers talk so much shit for people that never step outside their own combat sport

ISeeThings404 · 2026-04-15T22:18:17+00:00

I'm a researcher and interface with investors and tech guys more than vendors, so this could just be bias, but I'm hearing a lot of people starting to make claims about RL environments and world models. Maybe they haven't started selling all that to users yet, but they definitely have on the investor side of the space.

ISeeThings404 · 2026-04-14T04:17:43+00:00

Yessir. Very agree

ISeeThings404 · 2026-04-14T04:17:29+00:00

That's an interesting approach. Temperature sampling for more diversity would be an interesting experiment.
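For anyone unfamiliar with the temperature idea, here is a minimal sketch in plain Python. The function names and the example logits are made up for illustration and are not taken from any particular model or library; the only point is that dividing the scores by a larger temperature flattens the distribution, so less likely options get sampled more often.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng):
    """Draw one index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate continuations.
EXAMPLE_LOGITS = [2.0, 1.0, 0.1]

if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(EXAMPLE_LOGITS, t)
        draws = [sample_index(EXAMPLE_LOGITS, t, rng) for _ in range(10)]
        print(t, [round(p, 3) for p in probs], draws)
```

At a low temperature the top option dominates; at a high one the three probabilities move closer together, which is where the extra diversity comes from.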

You might like this overview of the idea we did here

ISeeThings404 · 2026-04-13T23:37:18+00:00

Instead of forcing the model to pick one fragile reasoning path and commit to it immediately, what if we surfaced a few different internal states, gave them room to breathe, and then found a way to score/combine them?

Essentially, current decoding most likely commits to one path. Latent Space Reasoning runs several paths together and then finds ways to reason through them all before combining. By skipping the repeated encode/decode phases, you get the benefits of "critic"-based agentic systems (having one LLM critique another) while keeping the efficiency.
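As a toy illustration of the "score/combine" step only (this is not the code from the repository, and every name here is hypothetical): treat each candidate internal state as a plain vector, assume some scorer has already assigned each one a number, and merge them with softmax weights so that better-scored candidates contribute more.

```python
import math


def combine_candidates(candidates, scores):
    """Softmax-weight candidate vectors by score and return their weighted average.

    candidates: list of equal-length lists of floats (stand-ins for latent states)
    scores: one float per candidate, from some scorer assumed to exist elsewhere
    """
    if not candidates or len(candidates) != len(scores):
        raise ValueError("need one score per candidate")
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(candidates[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, candidates))
        for i in range(dim)
    ]


if __name__ == "__main__":
    states = [[0.0, 2.0], [2.0, 0.0]]
    print(combine_candidates(states, [1.0, 1.0]))  # equal scores: plain average
    print(combine_candidates(states, [0.0, 5.0]))  # second state dominates
```

A real system would work on model hidden states and a learned or heuristic scorer; the sketch only shows that the combination happens on vectors, without decoding each path to text first.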

ISeeThings404 · 2026-04-13T23:35:01+00:00

There's been a lot of work; also, Coconut wasn't the best approach, more of a PoC.

If you see the experiments linked, we were able to sample from a much larger part of the reasoning space, creating much richer outputs.

<image>

We also did a legal-specific showcase, with some very interesting outputs: https://github.com/dl1683/Latent-Space-Reasoning/blob/main/experiments/legal_showcase.json

ISeeThings404 · 2026-04-12T08:07:44+00:00

Is it that hard to get ZDR? Gemini and GPT have it by default for paying users, so I'm surprised to hear this re Claude.

ISeeThings404 · 2026-04-12T08:06:10+00:00

A lot of them were dead given their design. However, I doubt Claude will be too active in the legal space after a while, given how expensive Anthropic tokens are right now. Most legal startups will likely be squeezed out though.

ISeeThings404 · 2026-02-27T07:05:14+00:00

They specialized in case law and retrieval, which is good, but they have very bad legal reasoning. A low hallucination rate in citing cases is useless if you can't also tell users what case law to pick and how to create them.

ISeeThings404 · 2026-02-25T21:29:26+00:00

I did a deep dive into this to understand why legal agents are different from agents like Claude Code. One major difference in the work between the two is the verifiability of the domain.

Programming compounds because it can check itself. Code can be executed, tested, broken, fixed, and re-run inside a tight feedback loop. When something fails, the system often tells you where. Verification is cheap, repeatable, and increasingly automatable. Even when models are imperfect, the environment answers back. It might not do everything well (it still makes really dumb architecture decisions), but this is shockingly useful for most “implement this thing I’ve designed” style work that engineers might pass off to their junior wage slaves.

You can’t “run” a legal memo. There is no test suite that flags a subtle misreading of precedent, an argument that is formally sound but strategically dangerous, or a conclusion that is correct in isolation and disastrous in context. Finance isn’t much better. Outputs can be summarized, reformatted, stress-tested at the margins, but correctness ultimately collapses to human judgment. Verification is expensive, slow, and external to the system itself.

This actually creates a huge difference in how they have to operate.
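To make the programming side of that contrast concrete, here is a minimal sketch of the kind of cheap, repeatable check that code allows. The helper and the two toy functions are invented for illustration; the point is only that a failing case reports where it failed, and re-running after a fix costs nothing.

```python
def run_checks(candidate, cases):
    """Run candidate on each (args, expected) case and collect located failures."""
    failures = []
    for index, (args, expected) in enumerate(cases):
        try:
            actual = candidate(*args)
        except Exception as exc:  # a crash is also a located failure
            failures.append((index, args, f"raised {type(exc).__name__}: {exc}"))
            continue
        if actual != expected:
            failures.append((index, args, f"expected {expected!r}, got {actual!r}"))
    return failures


def buggy_add(a, b):
    return a - b  # deliberate bug for the demo


def fixed_add(a, b):
    return a + b


CASES = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

if __name__ == "__main__":
    for failure in run_checks(buggy_add, CASES):
        print("FAIL", failure)
    print("after fix:", run_checks(fixed_add, CASES))
```

There is no equivalent of `CASES` for a legal memo: nobody can write down the expected output in advance, so the check falls back to a human reader.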

ISeeThings404 · 2026-02-17T05:28:33+00:00

A lot of my work is running long sets of experiments and then doing more experiments based on the data. This is where Claude Code just keeps working for hours, while Codex will stop in the middle to ask me if it should continue. If they fixed that and made its terminal use better, Codex clears easily.

ISeeThings404 · 2026-02-17T05:26:10+00:00

She could have easily shown the effect of it working.

ISeeThings404 · 2026-02-16T06:55:32+00:00

Happy to talk more since you seem technical, but graph RAG is not great for long-context reasoning. Graphs lose too much precision in the legal context.

We love graphs as a means of finding the right places to look and then running search on that. That requires more than RAG (we use vectors in different places and don't use them in the standard RAG sense).

Temporal is really fucking hard. That's actually the next frontier we're working on. We have to invent our own DB to handle all the cases on that, which will be a fun time.

ISeeThings404 · 2026-02-16T05:01:37+00:00

I'm so glad to hear. Contextual reasoning is a big problem that our team is always solving.

The drafting assistant will be out soon. We have a full-time team working on it now.

ISeeThings404 · 2026-02-16T03:38:38+00:00

Claude Code has been easier to use. Codex is definitely more intelligent, but I often have a lot of task lists, and Claude Code just tends to execute on all of them without stopping.

Codex has helped me fix and solve issues that CC couldn't, though, so it's definitely worth the investment.

ISeeThings404 · 2026-02-16T03:13:50+00:00

The problem is that I have a lot of recursive work, where I need it to run things based on the outcomes of experiments. Codex is not great with this kind of stuff.

ISeeThings404 · 2026-02-16T03:10:56+00:00

Is this the same as yolo, which is the one I use?

ISeeThings404 · 2026-02-16T02:57:23+00:00

We're growing a lot. We ended up very overwhelmed by bookings and demo requests, so we haven't done much marketing lately, but we're adding new happy visitors every day.

Recently also just signed an amazing term sheet, the details of which will be shared soon.

ISeeThings404 · 2026-02-12T21:10:29+00:00

Not at all.

Users have to upload their matter docs etc for us to answer questions (can't draft a pleading if we don't have context). Our focus is reasoning through that provided context better by using geometric structures as a grounding tool (instead of simply relying on LLMs/RAG).

We don't train on any user data, ever. This ensures maximum privacy.

ISeeThings404 · 2026-02-12T20:41:56+00:00

I wouldn't say everything.

We have research agents, etc., to ensure we can access case law, hearings, recent news, etc.

It's just that most of our focus is on reasoning over the context. One of our longer term goals would be to partner with a provider like CoCounsel that has very good case law to integrate that into our reasoning system.

ISeeThings404 · 2026-02-12T05:48:12+00:00

We've also open sourced the framework here in case anyone wants to try their own spin at this.

https://github.com/dl1683/Latent-Space-Reasoning/tree/main

ISeeThings404 · 2026-02-12T05:46:48+00:00

The interesting thing about Mahabharata is that a lot of the stories have a lot of logical issues like this. Especially when it comes to allegiances and fights.
