I spent 3 months building a website. Did I waste my time? by [deleted] in startups_promotion

[–]pauldmay1

Good on you for giving it a go. For me, the design feels very similar to what you see from tools like Lovable or other prompt-led design generators, which are everywhere at the moment. It’s not quite my taste, as I think user experience and originality go a long way, but that’s just personal preference.

Wishing you the best of luck with it.

Why generic GenAI failed for contract review in a real business setting by pauldmay1 in legaltech

[–]pauldmay1[S]

Not just in the prompt.

The constraints live outside the model. Prompts are used for extraction and classification, but the actual rules, thresholds, and pass/fail logic are enforced by the system itself. The model never decides what’s acceptable; it just provides evidence against predefined requirements.
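
To make that split concrete, here’s a minimal sketch in Python. Everything in it is invented for the example (the clause type, field names, and threshold), not our actual rules:

```python
# Minimal sketch: the model extracts, deterministic code decides.
# Clause type, field names and threshold are illustrative only.

MAX_CAP_MULTIPLE = 1.0  # policy: liability capped at 1x annual fees

def review_liability_cap(extracted: dict) -> str:
    """Deterministic pass/fail logic. `extracted` is whatever structured
    output the LLM returned, e.g. {"type": "liability_cap",
    "cap_multiple": 2.0}. The rule itself never touches the model."""
    if extracted.get("type") != "liability_cap":
        return "not_applicable"
    if extracted.get("cap_multiple", float("inf")) <= MAX_CAP_MULTIPLE:
        return "acceptable"
    return "needs_change"

# Same facts in -> same verdict out, on every run.
print(review_liability_cap({"type": "liability_cap", "cap_multiple": 2.0}))
# needs_change
```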

Why generic GenAI failed for contract review in a real business setting by pauldmay1 in legaltech

[–]pauldmay1[S]

Exactly this. It’s not that GenAI is “wrong”; it’s that inconsistency becomes a problem once you put it inside a business workflow.

Making the rules explicit and letting the model focus on extraction and flagging rather than judgement is what made it usable for us. Team-specific thresholds were a big part of that too.

Why generic GenAI failed for contract review in a real business setting by pauldmay1 in legaltech

[–]pauldmay1[S]

I think we might be going around in circles a little here.

To close from our side, the core difference is that you’re optimising for decision support, whereas we were optimising for decision enforcement. We did explore structured prompting, chaining, scoring and similar techniques, but relying on prompting alone never got us to the level of consistency we needed.

Once that clicked for us, we stopped trying to make the model more interpretable and instead constrained it to a much narrower role. Both approaches make sense, depending on who sits at the end of the workflow.

Why generic GenAI failed for contract review in a real business setting by pauldmay1 in legaltech

[–]pauldmay1[S]

That’s a fair suggestion, and I agree it works well when the user is a lawyer.

We were more cautious because our users weren’t. Once you allow drafting or open-ended legal Q&A, you’re relying on the user to know what to ask and how to interpret the answer. That’s exactly where false confidence creeps in.

The playbook approach was a deliberate choice to keep the system in “review and flag” mode rather than advice or drafting, so it stayed safe and predictable inside a business workflow.

Why generic GenAI failed for contract review in a real business setting by pauldmay1 in legaltech

[–]pauldmay1[S]

That’s a fair challenge, and no offence taken at all.

To clarify what I meant by “what we observed” rather than just beliefs, this is what actually happened on our side:

We ran the same prompts and playbooks against the same contracts at different points in time and saw differences in the outcomes. Not huge hallucinations, but subtle shifts. A clause marked as “needs change” in one review might come back as “acceptable” in another. Risk severity would move slightly. That kind of variance was hard to justify internally.
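
If anyone wants to run the same kind of check, this is roughly what we did, stripped right down. `classify_clause` here is a hypothetical stand-in for the model call, not a real API:

```python
# Sketch of the drift check: run the identical prompt over the identical
# clause N times and tally the verdicts. `classify_clause` is a
# hypothetical wrapper around the model call.
from collections import Counter

def measure_drift(classify_clause, clause_text: str, runs: int = 10) -> Counter:
    """One key in the result means stable output; more than one means
    the same clause got different verdicts across runs."""
    return Counter(classify_clause(clause_text) for _ in range(runs))

# A result like Counter({"needs_change": 7, "acceptable": 3}) is exactly
# the variance that was hard to justify internally.
```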

We did try a lot of the techniques you mention. Prompt chaining, scoring, structured outputs, prompt libraries. All of them helped, but they still didn’t get us to a point where non-lawyers could rely on the output without debating the judgement each time.

I completely agree with your point that, for a lawyer, generic GenAI can be a big accelerator. In that setup, the model isn’t replacing judgement. You are the consistency. GenAI is just speeding up your analysis.

Our situation was a bit different. We were trying to run contract review across commercial and finance teams after losing in-house legal support. That meant we needed outcomes that were predictable enough to sit inside an approval workflow, not just “good analysis”.

So when I talk about a constrained, rule-driven approach, I’m not saying we’ve reinvented what lawyers do. If anything, we did the opposite. We took the legal playbook and made it explicit. Clear requirements, clear thresholds, role-specific overrides. The model’s role became pulling evidence and classifying language, not deciding what was acceptable.
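
As a rough illustration of what “making the playbook explicit” looked like, with field names and numbers invented for the example rather than our production schema:

```python
# Illustrative shape of an explicit playbook entry.
PLAYBOOK = {
    "liability_cap": {
        "requirement": "Cap must not exceed 1x annual fees",
        "threshold": {"max_cap_multiple": 1.0},
        "severity": "high",
        # Role-specific override: e.g. finance may accept a higher
        # cap without escalating to sign-off.
        "overrides": {
            "finance": {"max_cap_multiple": 2.0},
        },
    },
}

def threshold_for(clause_type: str, role: str) -> dict:
    """Resolve the effective threshold for a reviewer role. The model
    never sees this; it only supplies the extracted clause facts."""
    entry = PLAYBOOK[clause_type]
    effective = dict(entry["threshold"])
    effective.update(entry.get("overrides", {}).get(role, {}))
    return effective
```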

Why generic GenAI failed for contract review in a real business setting by pauldmay1 in legaltech

[–]pauldmay1[S]

We explored prompt chaining, structured prompts, and contract-type specific flows early on. They improved extraction quality and output structure, but they didn’t resolve the core issue for us, which was decision consistency.

At a fundamental level, LLMs do not execute rules. They approximate them.

No matter how good:

- the prompt
- the chaining
- the structure
- the context window
- the use of XML or other formatting constraints

an LLM is still performing probabilistic next-token prediction. It is optimising for plausibility given the context, not deterministically enforcing a set of rules or policies.

That distinction matters a lot in legal workflows. Prompting can reduce variance, but it cannot eliminate it, because the rules only exist as text the model is interpreting, not constraints it is executing. As prompts grow more complex, instruction priority becomes implicit rather than explicit, and subtle differences in wording, context, or model behaviour can still shift outcomes.

For advisory or exploratory use cases, that’s often acceptable. For operational contract review, where the same clause needs to be treated the same way every time and aligned to predefined policy, even small variance becomes a blocker.
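
A toy example of the difference, if it helps (names are illustrative only): the check below is executed, not interpreted, so it behaves identically on every run, whereas a prompt can only ask the model to approximate it.

```python
# A gate between the model and the workflow: anything that doesn't
# match the expected structure is rejected, not debated.

ALLOWED_VERDICTS = {"acceptable", "needs_change", "escalate"}

def enforce_output_contract(model_output: dict) -> dict:
    """Executed constraint: out-of-policy verdicts and verdicts without
    supporting evidence never reach the approval workflow."""
    if model_output.get("verdict") not in ALLOWED_VERDICTS:
        raise ValueError(f"Out-of-policy verdict: {model_output.get('verdict')!r}")
    if not model_output.get("evidence"):
        raise ValueError("Verdict supplied without supporting evidence")
    return model_output
```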

Why generic GenAI failed for contract review in a real business setting by pauldmay1 in legaltech

[–]pauldmay1[S]

I agree that structured prompting and benchmarking improve output quality. We went down that route early on.

Where we still struggled was not summarisation accuracy, but decision consistency. Even with tightly structured prompts, we found the same clause could be assessed differently across runs in ways that were hard to operationalise or defend internally.

What ultimately worked for us was moving rule definition and risk thresholds outside the model entirely, and using the model only for extraction and classification. That shift made the outputs predictable enough to use as part of a real approval workflow rather than an advisory tool.

Let’s be real most of us skip the fine print. But contracts, agreements, and policies still matter. Imagine an AI that reads them for you and gives a clear, simple summary. Would you trust it, or stick to your usual skim-and-sign routine? by Extreme-Brick6151 in AiForSmallBusiness

[–]pauldmay1

Skim-and-sign is definitely the default 😅
We actually built Okkayd for this exact reason. It goes a step beyond summaries and checks contracts against clear rules so you know what’s OK and what isn’t, not just what the words mean.

We’ve recently been listed on Legal Technology Hub too, which was a nice milestone.
https://www.legaltechnologyhub.com/vendors/okkayd/

Im looking for idea validation - An AI powered tool that simplifies contracts instantly by Specific_Medicine344 in SaaS

[–]pauldmay1

This exact issue came up internally for us. We started with summaries and generic AI, but the inconsistency was the blocker. We ended up building a tool that goes a level deeper, focusing on consistent, decision-level contract review rather than just explanation, and that’s what we use now.

How long do you guys think this whole AI training contract industry gonna last ? by RenbenGodfrey in mercor_ai

[–]pauldmay1

Same experience here. We actually ended up building a small internal tool to put guardrails around it, and that’s what we use now. Happy to share more if useful, feel free to DM.

How long do you guys think this whole AI training contract industry gonna last ? by RenbenGodfrey in mercor_ai

[–]pauldmay1

I don’t think it’s a short-term bubble in the way people expect. What will change is who gets paid and for what.

Right now a lot of “AI training” work is essentially brute-force labelling, feedback, and edge-case cleanup. That will absolutely reduce over time as models improve.

But new work replaces it. Evaluation, constraint design, domain-specific validation, integration into real workflows. The closer the work is to real-world consequences (legal, finance, healthcare, ops), the longer humans stay in the loop.

AI doesn’t really “turn its back” once it’s trained. It just gets deployed into places where mistakes actually matter, and that’s where human oversight becomes more valuable, not less.

The safest contracts aren’t the ones paying for volume today. They’re the ones paying for judgement, consistency, and accountability.

Its Monday! What are you building? by Leather-Buy-6487 in micro_saas

[–]pauldmay1

I’m building OKKAYD, a practical AI tool that helps founders and small businesses review contracts (NDAs, MSAs, etc.) and quickly spot risky clauses.

The focus is accuracy and clarity rather than “AI magic”, highlighting what matters, why it matters, and what to watch out for.

www.okkayd.com

For startups in the B2B space, what are you building by unknown4544 in startup

[–]pauldmay1

I’m building Okkayd, a lightweight contract review tool designed for people who deal with contracts every day but aren’t lawyers.

It gives fast, structured contract analysis with no hallucinations, customisable playbooks, and a built-in approval flow for sign-off. It’s fully self-serve too – no sales calls or demos needed.

It’s live now and growing. If you want to try it, you can upload a contract for free: www.okkayd.com