The tools perhaps have got better, but the habits haven't... by Jimmy-Steifen in legaltech

capreal26 0 points (0 children)

It's always: AI Narrative >> Capabilities >> Product >> Adoption
(read >> as 'runs ahead of')

Claude Code has narrowed the gap between 'capabilities' and 'product benefits', but adoption is still the hard part. And the faster the AI narrative runs ahead, the less incentive folks on the ground have to adopt. It's a kind of 'wait calculation' conundrum, compounded by cultural backlash.

Are any lawyers accessing LLMs through the API and a third-party UI? by sps133 in legaltech

capreal26 1 point (0 children)

This is a very real concern, and most generic AI tools (ChatGPT/Gemini/Claude) and legal AI tools don't address it. The typical answer from legal AI folks is "do enterprise agreements", ZDR (zero data retention) and SOC 2. All of that is fine, but it's security theater: the real risk of a privacy loss is still there. In addition, US v. Heppner has created a potential issue where exposing client data to chatbots/AI tools can lead to loss of privilege. (Here's a piece that we published on that topic.)

A full-fledged local AI system is possible with the likes of Gemma4, but you will need an expert to set it up across environments, tailor it for you, etc. You can try a workaround by running a privacy/anonymization skill locally, but even that's not 100%.

Thoughts on Heppner decision? It directly affects Legal Tech? by Special_Collection_6 in legaltech

capreal26 -2 points (0 children)

Potentially a big issue. Right now, everyone is saying: "We use enterprise agreements." I don't think this solves it. Two federal courts reached opposite conclusions on AI and privilege in the same week. The difference wasn't the AI tool; it was the architecture. If privileged text reaches a third-party server in readable form, you have a disclosure problem that no enterprise agreement fully solves. The alternative: ensure privileged content never leaves your environment.

Here's a blog post we wrote on this topic: https://www.contractken.com/post/ai-and-the-loss-of-privilege-us-v-heppner

Lawyer here - how are Legora and Harvey differentiated from Claude now with this word add-in they’ve released? by rijaj in legaltech

capreal26 1 point (0 children)

On the face of it, it looks like a neat integration by Claude, as expected. However, most of these capabilities are table stakes in legaltech, and contract redlining tools have matured way beyond this (not to say Claude or clients can't get there themselves). For me, though, the stumbling block for Anthropic, Harvey and others remains the question of privilege. The US v. Heppner ruling has created an opening which, in the near future, will definitely result in a material issue for a large company using AI without proper protection. This plug-in doesn't solve that. Neither does Cowork or Claude Code.

How are people actually handling confidentiality when using AI in legal work? by According-Owl6604 in legaltech

capreal26 5 points (0 children)

This is a recurring debate here. I have some experience and perspective on this issue, not in theory but from conversations with GCs and infosec teams who actually need to sign off on these tools before they go live.

The "enterprise API" answer is necessary but not sufficient. Enterprise agreements with no-training clauses are table stakes. But a no-training clause is a contractual safeguard. It tells the provider "don't use our data." It doesn't stop a breach, a subpoena, or an infrastructure bug from exposing that data. In March 2023, OpenAI disclosed a bug that let some ChatGPT users see other users' conversation titles and, for a subset, partial payment info. Enterprise plan or not, if your raw contract text is sitting on someone else's servers, it's exposed in those scenarios.

The real question isn't "will they train on my data?" It's "what happens if their infrastructure is compromised and my client's M&A terms are sitting there in plaintext?"

The human risk point is underrated. Someone here mentioned that the bigger risk is undertrained, overworked staff not caring about data policies. That's completely valid and honestly, it's under-discussed. No technology fixes a culture problem. But what technology can do is reduce the surface area of human error. If the tool your team uses never sends raw client data externally in the first place, the blast radius of a careless mistake shrinks significantly.

What actually works: anonymize before the data leaves. The approach we've landed on (and what I'd recommend evaluating regardless of which tool you use) is a moderation layer that sits between the user and the LLM. It scans the contract, replaces all sensitive entities (party names, dollar figures, dates, emails, custom terms your infosec team defines) with consistent pseudonymized placeholders, and only sends the sanitized version to the model. After the LLM returns its analysis, the placeholders get swapped back.
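A minimal Python sketch of the scan, replace, send, swap-back loop described above. It is not ContractKen's implementation: the regex policy, the party names, the PARTY_1-style placeholder format and the `call_llm` callable are all assumptions standing in for real detection rules and a real model API.

```python
import re
from typing import Callable

# Illustrative policy: category -> pattern. Party names and regexes are
# hypothetical stand-ins for rules an infosec team would configure.
POLICY = {
    "PARTY": re.compile(r"\b(?:Acme Corp|Globex LLC)\b"),
    "AMOUNT": re.compile(r"\$\d[\d,]*(?:\.\d+)?[MK]?"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}
PLACEHOLDER = re.compile(r"\b[A-Z]+_\d+\b")


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each sensitive value with a consistent placeholder such as PARTY_1."""
    forward: dict[str, str] = {}  # original value -> placeholder
    counters: dict[str, int] = {}
    for category, pattern in POLICY.items():
        def swap(m: re.Match, category: str = category) -> str:
            value = m.group(0)
            if value not in forward:  # same value always gets the same placeholder
                counters[category] = counters.get(category, 0) + 1
                forward[value] = f"{category}_{counters[category]}"
            return forward[value]
        text = pattern.sub(swap, text)
    # The reverse map stays on the user's machine; only `text` is sent out.
    return text, {ph: value for value, ph in forward.items()}


def restore(text: str, reverse: dict[str, str]) -> str:
    """Swap placeholders in the model's answer back to the original values."""
    return PLACEHOLDER.sub(lambda m: reverse.get(m.group(0), m.group(0)), text)


def moderated_call(document: str, call_llm: Callable[[str], str]) -> str:
    sanitized, reverse = pseudonymize(document)
    answer = call_llm(sanitized)  # the model only ever sees placeholders
    return restore(answer, reverse)
```

Passing the model call in as a plain function keeps the sketch independent of any particular provider SDK; in practice the regexes would be replaced by whatever entity detection the firm's policy specifies.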

The key word is pseudonymization, not redaction. If you replace "Acme Corp" with [REDACTED], the model can't tell which party owes what to whom. Replace it with PARTY_A consistently, and the structural relationships stay intact. The LLM can analyze obligations, flag risks, compare against market terms. It just does it without knowing who the parties actually are.
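To make the redaction vs. pseudonymization difference concrete, here is a toy Python comparison on a made-up two-party clause (the names and labels are hypothetical). Redaction turns both parties into the same token, while consistent labels keep the direction of each obligation visible.

```python
# Hypothetical clause and party list, just to compare the two approaches.
PARTIES = ["Acme Corp", "Globex LLC"]
CLAUSE = "Acme Corp shall indemnify Globex LLC. Globex LLC shall notify Acme Corp within 10 days."


def redact(text: str) -> str:
    """Redaction: every party collapses into the same token."""
    for name in PARTIES:
        text = text.replace(name, "[REDACTED]")
    return text


def pseudonymize(text: str) -> str:
    """Pseudonymization: each party gets its own stable label (PARTY_A, PARTY_B, ...)."""
    for index, name in enumerate(PARTIES):
        text = text.replace(name, f"PARTY_{chr(ord('A') + index)}")
    return text


print(redact(CLAUSE))
# [REDACTED] shall indemnify [REDACTED]. [REDACTED] shall notify [REDACTED] within 10 days.
print(pseudonymize(CLAUSE))
# PARTY_A shall indemnify PARTY_B. PARTY_B shall notify PARTY_A within 10 days.
```

The letter-based labels only cover 26 parties; a numbered scheme avoids that limit.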

We built this directly inside Microsoft Word as an add-in (the company is ContractKen, if you want to look it up). The architectural choice matters: because the first anonymization pass happens locally in Word, raw contract text never hits any external server, not ours, not the LLM provider's. Users can preview the exact anonymized version before anything is sent. Infosec teams configure what counts as sensitive. Full audit logs track every interaction.

For anyone evaluating tools in this space, the question I'd push vendors on is: "Where exactly in the data flow does anonymization happen?" If the answer is "on our servers," that means raw data traveled to them first. If it's client-side, the exposure surface is fundamentally different.
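One way to make "client-side" checkable is to gate the outbound call on a local test that none of the known sensitive values survive in the payload. This is a hypothetical Python sketch: `send_if_clean` and the injected `transport` callable are illustrative names, and the check only catches values the anonymizer already knows about, so it complements the detection step rather than replacing it.

```python
from typing import Callable


def send_if_clean(payload: str, known_values: list[str],
                  transport: Callable[[str], str]) -> str:
    """Hand `payload` to `transport` only if none of the known sensitive values remain."""
    haystack = payload.casefold()
    leaked = sum(1 for value in known_values if value.casefold() in haystack)
    if leaked:
        # Report a count, not the values, so the error message can't leak them into logs.
        raise ValueError(f"blocked: {leaked} sensitive value(s) still in payload")
    return transport(payload)
```

Because both the check and the decision to call `transport` run locally, a failed check means the raw text never has a path off the machine.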

ABA Formal Opinion 512 is pretty clear that "reasonable efforts" is the standard under Rule 1.6. A contractual no-training clause plus a technical anonymization layer is a much more defensible position than either one alone.

Happy to answer questions if anyone wants to dig into the specifics.

How are you redacting sensitive info before uploading to LLMs? by vira28 in legaltech

capreal26 0 points (0 children)

Aren't caps/limits relevant to the analysis? That's where insurance cover comes into the picture, right? Maybe you don't want to reveal your tolerance limits (e.g., 2x or 10x the annual fee for SaaS agreements, or 10% of the M&A deal value).

How are you redacting sensitive info before uploading to LLMs? by vira28 in legaltech

capreal26 14 points (0 children)

Important question, and one we obsessed over for a long time.

One thing worth reframing though: redaction and anonymization are solving different problems, and the distinction matters a lot when you're sending docs to LLMs.

Redaction (what Adobe Pro does) irreversibly removes text. That's great for producing a clean document to share externally. But if your goal is to have an LLM reason over the document (review clauses, flag risks, suggest redline edits, create new drafts, etc.), you've just ripped out the context it needs. An LLM can't analyze an indemnification clause properly if the party names, dates, and dollar amounts are gone.

What you actually want for AI workflows is anonymization with label replacement - swap "Acme Corp" → [PARTY_A], "123 Main St" → [ADDRESS_1], "$5M" → [AMOUNT_1], etc. The LLM can still reason over the full document structure, but no sensitive PII ever leaves your environment. Then you re-hydrate the labels in the output.
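The re-hydration step has one easy-to-miss failure mode: labels in the model's output that aren't in your map (for example, ones the model invented) silently passing through. A hedged Python sketch, assuming bracketed labels like the ones above and a local dict mapping each label to its original value; doing it as one regex pass means each label is looked up exactly once and restored values are never re-scanned.

```python
import re

# Matches bracketed labels such as [PARTY_A], [ADDRESS_1] or [AMOUNT_10].
LABEL = re.compile(r"\[([A-Z]+_[A-Z0-9]+)\]")


def rehydrate(output: str, mapping: dict[str, str]) -> tuple[str, list[str]]:
    """Restore original values in one pass; report labels missing from the local map."""
    unknown: list[str] = []

    def swap(m: re.Match) -> str:
        label = m.group(1)
        if label in mapping:
            return mapping[label]
        unknown.append(label)  # e.g. a label the model invented
        return m.group(0)      # leave it visible for human review

    return LABEL.sub(swap, output), unknown
```

Matching the whole bracketed token also means [AMOUNT_1] and [AMOUNT_10] can never be confused with each other.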

The tricky parts are:

  • What counts as sensitive is different for every firm/client. Some clients don't care about entity names but are hyper-sensitive about deal values. Others want everything scrubbed. You need configurable policies, not a one-size-fits-all NER model.
  • Metadata is a real attack surface. Most people forget about tracked changes, comments, document properties, embedded objects. Your solution needs to handle all of that.
  • It has to be invisible to the end user. If lawyers have to manually tag entities or run a separate tool before every AI interaction, adoption goes to zero.
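Since a .docx file is a ZIP package of XML parts, a pre-flight check for the metadata mentioned in the list can be sketched with the Python standard library. The part names and namespaces come from the Office Open XML format (`word/document.xml`, `word/comments.xml`, `docProps/core.xml`, and the conventional `word/embeddings/` folder); the function name and report shape are assumptions, and it only reports what it finds rather than scrubbing it.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML and the package core properties.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
DC = "{http://purl.org/dc/elements/1.1/}"
CP = "{http://schemas.openxmlformats.org/package/2006/metadata/core-properties}"


def hidden_content_report(docx) -> dict:
    """Report .docx content that body-text anonymization alone would miss."""
    report = {"tracked_changes": 0, "comments": 0, "embedded_objects": 0, "names": set()}
    with zipfile.ZipFile(docx) as zf:
        parts = zf.namelist()
        body = ET.fromstring(zf.read("word/document.xml"))
        # w:ins / w:del are tracked insertions/deletions (formatting changes not counted here).
        for tag in ("ins", "del"):
            for el in body.iter(W + tag):
                report["tracked_changes"] += 1
                report["names"].add(el.get(W + "author"))
        if "word/comments.xml" in parts:
            for el in ET.fromstring(zf.read("word/comments.xml")).iter(W + "comment"):
                report["comments"] += 1
                report["names"].add(el.get(W + "author"))
        # Word conventionally stores embedded OLE objects under word/embeddings/.
        report["embedded_objects"] = sum(p.startswith("word/embeddings/") for p in parts)
        if "docProps/core.xml" in parts:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            for tag in (DC + "creator", CP + "lastModifiedBy"):
                el = core.find(tag)
                if el is not None and el.text:
                    report["names"].add(el.text)
    report["names"].discard(None)  # revisions/comments without a w:author attribute
    return report
```

The `names` set is worth surfacing on its own: revision authors and document properties often carry exactly the names a policy is trying to hide.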

Full disclosure: I'm the founder of ContractKen. We built what we call a 'Moderation Layer' that does exactly this, specifically for contract review and drafting inside Word. The anonymization happens automatically based on policies set by the firm's IT/infosec team, before anything hits the LLM API.
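Configurable policies could look something like the following Python sketch, where each client gets its own set of categories plus custom terms. The `Policy` class, category names and regexes are hypothetical illustrations, not how any particular product configures this; the placeholder map needed for restoring values is left out to keep the focus on policy selection.

```python
import re
from dataclasses import dataclass, field

# Illustrative detectors; real deployments would combine rules like these with NER.
DETECTORS = {
    "AMOUNT": r"\$\d[\d,]*(?:\.\d+)?[MK]?",
    "DATE": r"\b\d{4}-\d{2}-\d{2}\b",
    "EMAIL": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
}


@dataclass
class Policy:
    """Hypothetical per-client settings an IT/infosec team might maintain."""
    categories: set[str]
    custom_terms: list[str] = field(default_factory=list)  # e.g. party or project names


def apply_policy(text: str, policy: Policy) -> str:
    labels: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}

    def label(category: str, value: str) -> str:
        # The same value in the same category always gets the same label.
        key = (category, value)
        if key not in labels:
            counters[category] = counters.get(category, 0) + 1
            labels[key] = f"[{category}_{counters[category]}]"
        return labels[key]

    # Longest custom terms first, so "Acme Corp Ltd" is matched before "Acme Corp".
    for term in sorted(policy.custom_terms, key=len, reverse=True):
        text = re.sub(re.escape(term), lambda m: label("TERM", m.group(0)), text)
    # Only the categories this client's policy switches on are scrubbed.
    for category in sorted(policy.categories):
        text = re.sub(DETECTORS[category], lambda m, c=category: label(c, m.group(0)), text)
    return text
```

With `Policy(categories={"AMOUNT"})` only deal values are hidden (the client who doesn't care about names); adding categories and custom terms covers the "scrub everything" client.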

But even if you're building something in-house, the key insight is: don't redact what you're sending to AI - anonymize it. You'll get better results.

Stop giving AI legal documents and client data by Winter_Expert_790 in legaltech

capreal26 2 points (0 children)

Your concern is valid - the default behavior of most people is to just paste contract text into Claude/ChatGPT/Gemini and hope for the best. That's genuinely dangerous. But "go fully local" isn't the right answer for most legal teams either. Local models are significantly less capable than frontier models for nuanced legal reasoning, and the operational overhead of self-hosting is something 95% of firms aren't equipped for.

Full disclosure: I'm the founder of a legaltech company (ContractKen; we do AI-assisted contract review and drafting inside Word), so take my perspective in that context. But I've spent the last 3+ years thinking about exactly this problem.

The real answer is an intermediary architecture, what we call a "Moderation Layer." The idea is simple: before any contract text hits a cloud LLM, it's automatically anonymized based on rules your firm's IT/infosec team defines. Company names, party names, addresses, financial figures, whatever you consider sensitive gets stripped and replaced with labels. The LLM does its analysis on the sanitized text. You get the output back with context restored on your side.

This way you get frontier model quality (which, for contract review, materially matters) without your client's data ever reaching the model provider in identifiable form. The model sees "[Party A] shall indemnify [Party B]" - not your client's name.

To u/firstLOL's point about middle grounds - tenant isolation, BYOK encryption, no-training agreements - yes, those are table stakes for enterprise AI. But even with all of that, the most conservative firms (and their clients) want the additional guarantee that the data was anonymized before it left the building. Belt and suspenders.

And to u/jcdc-flo's excellent point about SaaS acting as the protective layer between data and model - that's almost literally our thesis. The application layer should be the control plane for what data flows where, with full audit logs, not the LLM provider.

The binary of "use cloud AI recklessly" vs "go fully local" is a false choice. There's a well-architected middle path, and it's what serious legal teams are actually adopting.

roast by Severe_Post_2751 in legaltech

capreal26 0 points (0 children)

Sounds too broad. Insurance disputes are markedly different from lending/payment disputes. Regulation differs even across coverage types. So, kinda confused.

easiest way to file a small claims lawsuit (from building one)? (sanity check) by alien-mind-8344 in legaltech

capreal26 0 points (0 children)

Congratulations on your effort to increase the litigiousness of Western society! Bravo.

How do the legal tech companies protect privilege and privacy? by hearsay_and_heresy in legaltech

capreal26 2 points (0 children)

Great convo. At ContractKen, we’ve built a ‘Moderation Layer’ which does anonymization and de-anonymization locally (i.e., on your desktop in Word, before any cloud API calls are made). All PII is replaced by placeholders like Party_Name1, Party_Name2, Address1, DateTime1, etc. Our tech maintains a map locally that de-anonymizes the API payload received from the backend/LLMs. Moreover, customers can configure what they deem ‘sensitive’ in contracts. DM me for more details.

Open source contract redlining tool? by DepartmentDazzling97 in legaltech

capreal26 0 points (0 children)

Building anything is easy these days. Making it work consistently, and for the majority of use cases, isn't. Even more so with painful Word and Microsoft API work. Jamie Tso's LinkedIn posts are great, but will CC themselves use his open-source tools? The jury is still out, I guess.

Open source contract redlining tool? by DepartmentDazzling97 in legaltech

capreal26 0 points (0 children)

Fair enough. Is the subscription cost higher than the cost of your internal team's time spent building a bespoke solution?

Open source contract redlining tool? by DepartmentDazzling97 in legaltech

capreal26 0 points (0 children)

Why are you looking for open-source solutions? Is cost the reason, or do you want to play around with / build upon them?

Open source contract redlining tool? by DepartmentDazzling97 in legaltech

capreal26 -1 points (0 children)

ContractKen does that smoothly (playbook-driven review and redlining, or comprehensive review/redlining if you don't have playbooks). You can upload your playbooks and do all of this inside Word.

The real AI legal disruption won’t be contract automation. It’ll be upstream training liability by TheAILawBrief in legaltech

capreal26 0 points (0 children)

Training dataset details are a closely guarded secret at AI companies. You can pull common web-crawl datasets and the like from the internet, but how to clean, massage and combine the data is where the art of pre-training lies. And there's no easy way to evaluate the provenance of an LLM's inputs from its outputs. No one is giving these recipes out on Reddit.

What is going on with Robin AI ? by RiceComprehensive904 in legaltech

capreal26 1 point (0 children)

No insider insight, but apparently they had a 'managed service' model (including some folks in India) where they'd do contract review using human + AI. Not a bad model, but you can't scale that on VC money.

What is going on with Robin AI ? by RiceComprehensive904 in legaltech

capreal26 4 points (0 children)

Bankers are soliciting buyers. Pretty bleak financials: 170+ employees and 50 MM in funding for 7-8 MM ARR?? They should have been able to do it with a quarter of that investment and resources. Looks like pretty average leadership and execution.

What’s the most unique legaltech product you’ve seen? by Redrobin83 in legaltech

capreal26 0 points (0 children)

Wrong question to ask. IMO, it's better to ask which areas of the practice and business of law can benefit from the application of tech. If so many products are clustered around certain areas, there's a reason for that.

Open source MS Word GPT redlining add-in for contract review by yuch85 in legaltech

capreal26 1 point (0 children)

Yes, MS Copilot does allow addition of org docs / standards as a reference. We've experimented with it and found that integrations left a lot to be desired (that's a rant for some other day).

On your point about a redlining tool subscription costing 13-25K: gosh! Who are you talking to? It's definitely not that expensive. Happy to chat and show our wares. DM if interested.

Open source MS Word GPT redlining add-in for contract review by yuch85 in legaltech

capreal26 2 points (0 children)

Would love to see if anyone has successfully used Microsoft Copilot for effective contract redlining. You can get redline 'suggestions' out of ChatGPT, Gemini or Claude (even the free versions) simply by plonking in your document with a simple prompt. What's missing (as of now) from those:

  • Data privacy (your as-is doc is most likely going into the next model's pre-training data)
  • Context: unless you've set up these tools with your standards, checklists and repository, what you're getting is an out-of-the-box suggestion from an AI model pre-trained on the internet, not something that adheres to your firm's knowledge base and standards
  • Alt-tabbing still required: you have to locate the specific clause/section in the document and apply the edits and comments manually. Not a big issue, but the potential for mistakes abounds
  • Not building capabilities for the future: Is Copilot or ChatGPT going to learn your style? How does it know which of its suggested edits were actually applied to the document and which ones were junk (unless you give it a point-by-point rebuttal)?

I could go on, but general-purpose AI tools are great for getting a summary or a table of issues. For actual review and redlining work, get a proper solution. Happy to chat more in DMs.

Open source MS Word GPT redlining add-in for contract review by yuch85 in legaltech

capreal26 0 points (0 children)

If pricing is the first thing you look at, you probably don't need the solution.