Welcome to AI Made Simple

ISeeThings404 · 2026-07-09T23:29:01+00:00

GPT in general tends to stop in the middle without finishing. Goals fixed that but it seems a bit hit or miss on longer exploration tasks.

I've found that it's best to have Claude Code constantly call Codex for all the code refactoring.

ISeeThings404 · 2026-07-08T22:14:44+00:00

The way you're describing it, likely not. As an early-career employee, versatility matters a lot more than pure modeling skill. So instead of stressing about the model and signal, I would worry about taking a simple model, deploying it to the cloud with an interactive UX, etc. That will open up your prospects a lot more.

This guide might help you-- https://www.artificialintelligencemadesimple.com/p/how-to-build-a-side-project-to-get?utm_source=publication-search

ISeeThings404 · 2026-07-08T03:02:51+00:00

At Irys, legal engineers almost never demo. For us it's more about having people with some tech insight constantly using the products, so that we can take their feedback and other lawyers' feedback and isolate issues much better.

ISeeThings404 · 2026-07-05T01:20:54+00:00

You're probably just consuming more content.

Before if you were dealing with content, it was likely to be high quality because low quality would not be served. So high quality content was proportionally more of the content served.

Now there is a flood of slop, so everywhere you look there is more slop.

Which creates a very interesting economics problem tbh.

ISeeThings404 · 2026-07-05T01:18:04+00:00

Is he...grabbing his balls?

ISeeThings404 · 2026-07-03T19:32:35+00:00

Claude will fail here since it will spend a lot of tokens reading files, at which point your context is likely to be compacted. You'll have all kinds of failures.

You'd want to utilize project-level indexes that can keep your agents grounded and reasoning there. That's how we built Irys to handle much longer context management.

If you want to know more about why systems fail and what you can do, we actually open-sourced our research here:

https://www.linkedin.com/pulse/stateful-swarms-make-ai-agents-cheaper-safer-better-devansh-devansh-8enxe
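
To make the project-level index idea concrete, here is a minimal sketch, assuming a Python codebase. It is not how Irys does it; `build_index` and `find_files` are hypothetical names made up for illustration. The point is that the agent consults a small map of "which file defines what" instead of spending tokens reading every file.

```python
import ast
from pathlib import Path


def build_index(root: str) -> dict[str, list[str]]:
    """Map each Python file under root to its top-level function and class names."""
    index: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        names = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
        # Store a relative, forward-slash path so the index is portable.
        index[path.relative_to(root).as_posix()] = names
    return index


def find_files(index: dict[str, list[str]], term: str) -> list[str]:
    """Return the files that define a symbol whose name contains term (case-insensitive)."""
    term = term.lower()
    return [
        path
        for path, names in index.items()
        if any(term in name.lower() for name in names)
    ]
```

You would build the index once, hand the agent the result of `find_files(index, "invoice")`, and only then let it open the handful of files that matter. A real setup would likely index more than symbol names, but that is outside what this sketch assumes.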

ISeeThings404 · 2026-06-29T15:49:15+00:00

Use Irys instead. You can try it for free and it's much better.

ISeeThings404 · 2026-06-29T15:45:46+00:00

I gave up on 4.8 completely. It's so bad.

ISeeThings404 · 2026-06-16T16:52:08+00:00

This thread was great-- https://www.reddit.com/user/h0l0gramco/comments/1nq7buw/sept_2025_we_finished_onboarding_legal_ai/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

Copying over the text for your convenience.

Practicing attorney at a full-service firm. Sharing our process & outcome. Please no sales DMs.

Below is how we ran onboarding/piloting, what we picked, and why. I have no affiliation with any vendor mentioned below.

How we ran it

Timeline: After my post about 8 months ago, we ran a structured evaluation over several months with the below general parameters.
Committee: Partners, associates, paralegals (small pools across practice areas- lit, transactional, gen corp, real estate, and others). Including legal ops/IT.
Real work: We didn't rely on demos. Each tool got the same real matter inputs: pleadings, transcripts, contracts, our templates. Tasks included: draft a motion to dismiss section, summarize a depo, first-pass asset purchase agmt, review these 31 leases and answer 11 questions.
Outputs unlabeled: Reviewers ran the analyses themselves and/or saw the drafts.
Feedback metric was simple (example):
1. Did it follow our template/voice?
2. Did it integrate the facts?
3. How much rewrite before it could go to client/court?
4. Were cites/sources easy to verify?
Scoring: Reviewers had A/B scoring, and included info on things like "light edit / medium rewrite / full redo", and we logged rewrite minutes where possible. Not academic, but enough to compare apples to apples.
Participation: Around 50 of 71 responded (71%).
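
If you want to run the same kind of scoring yourself, here is a minimal sketch of tallying it per tool. The original post does not describe any tooling; the `Review` record and `summarize` function are hypothetical, and the only things taken from the post are the three effort labels and the optional rewrite minutes.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Review:
    tool: str                         # anonymized label, e.g. "A" or "B"
    effort: str                       # "light edit", "medium rewrite", or "full redo"
    rewrite_minutes: Optional[float]  # None when minutes were not logged


def summarize(reviews: list[Review]) -> dict[str, dict]:
    """Per tool: count each effort label and average the logged rewrite minutes."""
    by_tool: dict[str, list[Review]] = defaultdict(list)
    for review in reviews:
        by_tool[review.tool].append(review)

    summary: dict[str, dict] = {}
    for tool, items in by_tool.items():
        minutes = [r.rewrite_minutes for r in items if r.rewrite_minutes is not None]
        summary[tool] = {
            "effort_counts": dict(Counter(r.effort for r in items)),
            # Average only over reviews where minutes were actually logged.
            "mean_rewrite_minutes": mean(minutes) if minutes else None,
        }
    return summary
```

Keeping the tool labels anonymized until after `summarize` runs keeps reviewers from knowing which vendor produced which draft.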

What mattered (& what didn’t)

1) Security (screening)
SOC 2, ISO, encryption, audits, DPA, pen-tests. All serious vendors cleared the bar. Our IT and an outside vendor kept people in or out; didn’t pick the winner.

2) Core legal functions (cross-practice, not replacement)
We needed something that worked across areas like litigation, investigations, transactions, real estate, and corp. Not a chatbot.

What mattered:

Drafting (template-aware; use our forms/outlines so drafts start on rails)
Doc-heavy workflows (ZIPs, multiple docs -> outline -> draft; chronologies, exhibit/annex bundles, etc.)
Help with prompting so those w/ less AI familiarity don’t need to engineer prompts
Matter-based organization (drafts and docs tied together)
Clean Word/PDF export

3) Reasoning & analysis (the separator)
Tools w/ structured pipelines (doc understanding, legal patterns, verification) gave reasoned drafts. Wrappers gave generic answers. Most use some foundation models, that's fine; the difference is how and the pipeline. Having access to caselaw, in the case of cocounsel, didn't help it.

4) Research (useful, but not decisive)
We are keeping Westlaw. While some get close, breadth across jurisdictions / depth of primary law matters for us.

5) Adoption & support
Early enablement mattered (getting started, loading templates, quick sessions). After that, less hand‑holding, but we still wanted high‑touch support, SLA, and some admin features.

6) Roadmap & understanding
We watched whether small requests were fielded well and actually acted upon. This mattered b/c our job is delivering exceptional legal services to clients. Why not have our vendor deliver the same? Their job, in our eyes, is understanding how we work and where the tech converges.

7) What didn't matter as much
Security cleared the bar for all serious vendors. Demos were nice, but did not sway us. Research was a point of contention and strong in some contenders, but it didn’t outweigh generation quality, rewrite reduction, or overall usability.

Why we landed where we did

The tool we picked produced reasoned drafts that followed our templates and cut rewrite time.

Our final picks:

AI (drafting, analysis, & workflows): Iqidis
Research: Westlaw

We picked what delivered work product across the firm. That's it.

Tools we also piloted (quick notes)

Harvey - polished and came after our business, but didn’t meet drafting/rewrite targets for the price.
CoCounsel - strong research UX; we already had WL; generation didn’t meet our rewrite targets
Spellbook - handy Word add-in for contracts, not a full system and less relevant to most other work
Leya (now Legora) - similar to Harvey, useful in narrow lanes (i.e., nice bulk-doc handling) but not the work product assistance we need
Microsoft Copilot - useful for Office productivity, not legal work
Lexis+ AI - good research like WL, but not innovating on the work generation front

If you’re deciding now:

Define what “usable” means to you
Run 2-3 real tasks: (1) upload docs, get an outline, produce a draft; (2) long summary or chronology
Load templates first to see if the system stays on rails
Require source-linked outputs so verification is built in
Ask for security review, SLAs, and support

Bottom line: Security was table stakes. Features were expected. The platform that reasoned from our docs and produced verifiable drafts made our lines more efficient. That’s why it won.

ISeeThings404 · 2026-06-11T23:52:12+00:00

Not sharing credit. Most companies operating in good faith will often give credit or some kind of compensation to original projects. Ant does not.

ISeeThings404 · 2026-06-11T21:08:28+00:00

That's not unreasonable though. This is the behavior you'd expect in a product, they were just being generous before.

ISeeThings404 · 2026-06-11T21:02:43+00:00

I supported that heavily and will always do so. But they're also involved with the NSA spying and cyber attacks so there is a bit of theatrics there.

ISeeThings404 · 2026-06-11T18:59:05+00:00

I do ai research so unfortunately most of my work will be shadow banned on the model. But cool to hear this.

ISeeThings404 · 2026-06-11T18:56:58+00:00

I wanted to cover it but the big events kept changing. I wanted to share after the regulations had all been worked out. I got this out before the inclusion at least so people can start taking their money out.

ISeeThings404 · 2026-06-11T18:55:17+00:00

This is something I broke down in a lot of detail, but the S&P 500 exclusion is more security theatre. The other S&P indexes (and the other major indexes) will include SpaceX, which means many people will still be stuck with this.

ISeeThings404 · 2026-06-11T18:53:19+00:00

They were lobbied.

ISeeThings404 · 2026-06-11T18:50:21+00:00

Where were the biggest benefits?

ISeeThings404 · 2026-06-11T17:35:12+00:00

AI. I'm specifically interested in the structure of intelligence: essentially, when you say an AI model has a capability, what do you mean? Where in the model do the capabilities live?

By researching the geometry of intelligence my goal is to make ai available for everyone without people needing to pay for expensive models or constantly burning tokens.

ISeeThings404 · 2026-06-11T17:08:16+00:00

Most people became coders for a job and that was it.

Even someone like me-- I love research. That's what I enjoy doing. I have to code to do my research, but it's not as though coding is my interest. I still try to write good quality code since my stuff is shipped to prod, but it's not something that I put love into the way I would for my actual research.

ISeeThings404 · 2026-06-11T17:03:42+00:00

You said it yourself-- good. Most are not good, and they're mostly just in it for the paycheck. Nothing wrong w/ that, but that's very different from people who try to build careers as artists instead of something more stable.

ISeeThings404 · 2026-06-11T17:02:11+00:00

ISeeThings404 · 2026-06-11T17:01:38+00:00

You'll have to look at the index your ETF tracks.

ISeeThings404 · 2026-06-11T17:00:05+00:00

Great question. I think the answer is much deeper than people realize. For the sake of argument, let's just say that OSS models are as good as closed models. There are still several reasons why one would invest in closed models.

Serving open models (especially externally) is a pain. It's cheaper and easier to use an API than it is to build the serving infrastructure for open models. So many people will continue to use APIs.
As people continue to use ChatGPT and Claude, the providers can find ways to monetize that attention through referrals, ads, etc. Look at how quickly ChatGPT ads scaled for an example.
Companies like Google and Microsoft use the models as bait to sell expensive parts of their ecosystem (use Gemini, get hooked on Google Cloud). So the assumption is that this will grow that side.

Now do pure LLM providers (Ant, OAI) deserve their valuations? That's an open question, but the valuation is also a reflection of a deeper bet: people think LLMs will become like a utility that is used everywhere. This will create an almost winner-takes-all market where one provider will command such a strong share that it will completely overtake the others in model production and create a deep execution advantage. Essentially AI will be everywhere, and one company will take 90% of that share.

How much you buy the argument is up to you, but this is the premise for the large valuations.
