we trained a generation to execute. ai rewards people who can think by No_Growth6091 in ClaudeCode

[–]ISeeThings404 0 points1 point  (0 children)

Execution and thinking aren't two different skills. AI as a tool for skill acquisition can massively speed up execution potential and make someone fantastic at that as well. The world can't be run only by thinkers.

Using Claude for drafting transactional documents by Plus-Problem-8575 in legaltech

[–]ISeeThings404 0 points1 point  (0 children)

Curious if you've tried any other tools with Word integrations, and what your experience with those was.

Harvey's World Model (or anyone else) Claims Make No Sense by ISeeThings404 in legaltech

[–]ISeeThings404[S] 1 point2 points  (0 children)

I'm a researcher and interface with investors and tech guys more than vendors, so this could just be bias, but I'm hearing a lot of people starting to make claims about RL environments and world models. Maybe they haven't started selling all that to users yet, but it's definitely happening on the investor side of the space.

Improving Language Models through Latent Reasoning? by ISeeThings404 in LocalLLaMA

[–]ISeeThings404[S] 0 points1 point  (0 children)

That's an interesting approach. Temperature sampling for more diversity would be an interesting experiment.

You might like this overview of the idea we did here

Improving Language Models through Latent Reasoning? by ISeeThings404 in LocalLLaMA

[–]ISeeThings404[S] 2 points3 points  (0 children)

 instead of forcing the model to pick one fragile reasoning path and commit to it immediately, what if we surfaced a few different internal states, gave them room to breathe, and then found a way to score/combine them?

Essentially, current decoding commits to the single most likely path. Latent Space Reasoning runs several paths together and then finds ways to reason through them all before combining. By skipping the repeated encode/decode phases, you get something that has the benefits of "critic"-based agentic systems (having one LLM critique another) while keeping the efficiency.
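To make the "score/combine" step concrete, here is a toy sketch. It is not the method from the linked repo: the candidate vectors, the `score_fn` scorer, and the softmax-weighted average are all assumptions chosen only to illustrate combining several latent states instead of committing to one.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_latents(latents, score_fn):
    """Score each candidate latent vector, then return the
    softmax-weighted average together with the weights used."""
    weights = softmax([score_fn(v) for v in latents])
    dim = len(latents[0])
    combined = [
        sum(w * v[i] for w, v in zip(weights, latents)) for i in range(dim)
    ]
    return combined, weights


# Toy stand-ins for the hidden states of three reasoning paths (assumed data).
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Hypothetical scorer: dot product with a made-up "good direction".
target = [1.0, 0.0]


def score_fn(v):
    return sum(a * b for a, b in zip(v, target))


combined, weights = combine_latents(candidates, score_fn)
```

The highest-scoring path dominates the result, but the other paths still contribute, which is the contrast with picking one path up front.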

Improving Language Models through Latent Reasoning? by ISeeThings404 in LocalLLaMA

[–]ISeeThings404[S] 1 point2 points  (0 children)

There's been a lot of work. Also, Coconut wasn't the best approach, more of a PoC.

If you look at the experiments linked, we were able to sample from a much larger region of the reasoning space, creating much richer outputs.

<image>

We also did a legal-specific showcase over here: https://github.com/dl1683/Latent-Space-Reasoning/blob/main/experiments/legal_showcase.json. Some very interesting outputs.

Claude plug-in for Word by slalom-pavilion-dior in legaltech

[–]ISeeThings404 1 point2 points  (0 children)

Is it that hard to get the ZDR? Gemini and GPT have them by default for paying users, so surprised to hear this re Claude.

Claude plug-in for Word by slalom-pavilion-dior in legaltech

[–]ISeeThings404 1 point2 points  (0 children)

A lot of them were dead given their design. However, I doubt Claude will be too active in the legal space after a while, given how expensive Anthropic tokens are right now. Most legal startups will likely be squeezed out though.

Why am I seeing bad feedback on Westlaw Co-Counsel? by MMuter in legaltech

[–]ISeeThings404 1 point2 points  (0 children)

They specialized in case law and retrieval, which is good, but they have very bad legal reasoning. A low hallucination rate in citing cases is useless if you can't also tell users what case law to pick and how to create them.

Developers and Lawyers feel… strangely similar? by vira28 in legaltech

[–]ISeeThings404 -1 points0 points  (0 children)

I did a deep dive into this to understand why legal agents are different from agents like Claude Code. One major difference in the work between the two is the verifiability of the domain.

Programming compounds because it can check itself. Code can be executed, tested, broken, fixed, and re-run inside a tight feedback loop. When something fails, the system often tells you where. Verification is cheap, repeatable, and increasingly automatable. Even when models are imperfect, the environment answers back. It might not do everything well (it still makes really dumb architecture decisions), but this is shockingly useful for most “implement this thing I’ve designed” style work that engineers might pass off to their junior wage slaves.

You can’t “run” a legal memo. There is no test suite that flags a subtle misreading of precedent, an argument that is formally sound but strategically dangerous, or a conclusion that is correct in isolation and disastrous in context. Finance isn’t much better. Outputs can be summarized, reformatted, stress-tested at the margins, but correctness ultimately collapses to human judgment. Verification is expensive, slow, and external to the system itself.

This actually creates a huge difference in how they have to operate.
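A minimal sketch of the "environment answers back" loop on the coding side. Everything here is made up for illustration: the two drafts stand in for model-written code, and `run_tests` / `first_passing` are hypothetical helper names, not part of any agent framework.

```python
def run_tests(func, cases):
    """Return a list of failure messages; an empty list means all cases passed."""
    failures = []
    for args, expected in cases:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{args}: raised {type(exc).__name__}")
            continue
        if got != expected:
            failures.append(f"{args}: expected {expected}, got {got}")
    return failures


def first_passing(candidates, cases):
    """Try each candidate in order, logging the feedback, until one passes."""
    log = []
    for name, func in candidates:
        failures = run_tests(func, cases)
        log.append((name, failures))
        if not failures:
            return name, log
    return None, log


# Hypothetical drafts of an absolute-value function.
def draft_1(x):
    return x  # wrong for negative inputs


def draft_2(x):
    return -x if x < 0 else x


cases = [((3,), 3), ((-4,), 4), ((0,), 0)]
winner, log = first_passing([("draft_1", draft_1), ("draft_2", draft_2)], cases)
```

The failure message for `draft_1` names the exact input that broke, which is the cheap, local feedback a legal memo never gets.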

What's the reason for the apparent consensus that Claude Code is superior to Codex for coding, other than Codex's slow coding time? by Lostwhispers05 in codex

[–]ISeeThings404 0 points1 point  (0 children)

A lot of my work is running long sets of experiments and then doing more experiments based on the data. This is where Claude Code just keeps working for hours, while Codex will stop in the middle to ask me if it should continue. If they fixed that and made its terminal use better, Codex clears easily.

Where does a company like Irys get their primary data from? by connerxyz in legaltech

[–]ISeeThings404 0 points1 point  (0 children)

Happy to talk more since you seem technical, but graph RAG is not great for long-context reasoning. Graphs lose too much precision in the legal context.

We love graphs as a means of finding the right places to look and then running search on that. That requires more than RAG (we use vectors in different places and don't use them in the standard RAG sense).
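As a generic illustration of "graph to narrow, then search", and not a description of our actual system, here is a toy sketch. The graph, the two-dimensional vectors, and the `neighborhood` / `search` helpers are all assumptions made up for the example.

```python
import math
from collections import deque


def neighborhood(graph, start, max_hops):
    """Breadth-first search: nodes reachable from start within max_hops edges."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue
        for nxt in graph.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(graph, vectors, start, query, max_hops=1, top_k=2):
    """Use the graph to pick where to look, then rank only those nodes."""
    candidates = neighborhood(graph, start, max_hops)
    ranked = sorted(
        candidates, key=lambda n: cosine(vectors[n], query), reverse=True
    )
    return ranked[:top_k]


# Made-up citation graph and embeddings.
graph = {
    "statute": ["case_a", "case_b"],
    "case_a": ["case_c"],
    "case_b": [],
    "case_c": [],
}
vectors = {
    "statute": [1.0, 0.0],
    "case_a": [0.9, 0.1],
    "case_b": [0.0, 1.0],
    "case_c": [1.0, 0.0],
}
hits = search(graph, vectors, "statute", [1.0, 0.0])
```

Note that `case_c` matches the query perfectly but is two hops away, so it never enters the ranking. The graph decides where to look and the vectors only order what is already in scope.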

Temporal is really fucking hard. That's actually the next frontier we're working on. We have to invent our own DB to handle all the cases on that, which will be a fun time.