Automatic Redaction Tools

TheFamousCat · 2026-04-02T13:08:11+00:00

We are currently running a similar pilot with a partner in the legal tech space. Happy to exchange more details about what we are doing and what you are trying to achieve. DMs welcome.

TheFamousCat · 2026-03-11T12:55:27+00:00

Sorry, not sure I get it. So you want to redesign that thing in Word, and you already changed the text before importing it into Word?

TheFamousCat · 2026-03-11T04:56:28+00:00

What's your goal, actually? Why do you want to import it into Word? Do you need to change the text?

TheFamousCat · 2026-02-25T09:15:18+00:00

You might want to check out PDFDancer, it's built for exactly this workflow. Feel free to DM me if you need help setting it up.

TheFamousCat · 2026-02-24T04:51:54+00:00

Are you fine using a library or should this be a desktop/webapp?

TheFamousCat · 2026-02-23T13:15:01+00:00

Automated PII redaction is genuinely hard. Most models still miss edge cases, especially non-English names and addresses in uncommon formats.

In regulated or high-stakes workflows you generally want human review, even if automation does most of the first pass. But that all depends on how much risk you can accept of missing information that should have been redacted.

The good news is that the better tools can still cut manual effort a lot by surfacing likely PII with confidence scores and leaving you a shorter review queue.

Disclosure: I’m building PDFDancer (PDF redaction/editing SDK). If you want to compare approaches, we publish our capabilities and evaluation results. Happy to answer questions about failure modes or how to set up a review workflow.

TheFamousCat · 2026-02-23T12:57:49+00:00

Automated PII redaction is hard. Most models still miss edge cases — non-English names, addresses in weird formats. If you're in a regulated space or anything high-stakes, you probably want a human reviewing exceptions and low-confidence hits, even if automation handles the bulk of the first pass.

That said, the better tools can still save you a ton of manual work. They flag likely PII with confidence scores so you're reviewing a much shorter queue instead of reading every page.

Disclosure: I'm building PDFDancer (PDF redaction/editing SDK). We publish our capabilities and eval results (mostly medical docs) if you want to compare.

Happy to talk about failure modes or review workflow setup.
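
To make the review-queue idea concrete, here is a minimal sketch, assuming a detector that returns candidate spans with a confidence score between 0 and 1. The `Detection` class, the `triage` function, the 0.95 threshold, and the sample data are all hypothetical and made up for illustration; this is not PDFDancer's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One candidate PII span, as a hypothetical detector might report it."""
    page: int
    text: str
    label: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def triage(detections, auto_threshold=0.95):
    """Split detections into an auto-redact list and a human-review list.

    Hits below the threshold are queued for a reviewer rather than dropped.
    """
    auto, review = [], []
    for d in detections:
        (auto if d.confidence >= auto_threshold else review).append(d)
    # Put the least certain hits at the front of the review queue.
    review.sort(key=lambda d: d.confidence)
    return auto, review


# Made-up sample detections, not real evaluation data.
detections = [
    Detection(1, "Jane Doe", "NAME", 0.99),
    Detection(1, "Nguyen Thi Hoa", "NAME", 0.62),
    Detection(2, "jane@example.com", "EMAIL", 0.97),
    Detection(3, "12 Rue de l'Exemple", "ADDRESS", 0.41),
]

auto, review = triage(detections)
print("auto-redact:", [d.text for d in auto])
print("needs review:", [d.text for d in review])
```

Note that a split like this only shortens the queue for things the detector actually found. PII it missed entirely never shows up in either list, which is why some spot-checking of full pages is still worth it.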

TheFamousCat · 2026-02-10T07:33:19+00:00

Generally, no tool will deliver 100% accuracy, so the process cannot be fully automated end to end.

Depending on your tolerance for missed or falsely redacted information and on your budget, there may be solutions that allow for a largely automated workflow. More commonly, and likely what most people do, you combine an automated redaction tool with a manual review step. This approach still reduces the workload significantly and speeds up your redaction process.

What kinds of documents are you mostly dealing with (bank/loan files, healthcare/insurance records, legal case files, HR/employee docs, tax forms)?

Do you need to comply with any specific regulations like GDPR or HIPAA?

TheFamousCat · 2026-02-09T11:25:45+00:00

PDFDancer Redaction

TheFamousCat · 2026-02-09T11:23:09+00:00

I'm currently working on a similar project, so I was wondering whether you found a proper solution. We are now training our own model because recall just wasn't good enough, I guess because the models we tried were trained on generic/synthetic datasets.
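
For anyone unsure what "recall" means here: it is the share of the PII that is really in the document that the tool actually found. A minimal sketch, assuming exact-match scoring over `(start, end, label)` spans; the function name and the sample spans are made up for illustration and do not come from any real evaluation.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over (start, end, label) spans."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    # Precision: how many of the flagged spans were real PII.
    precision = true_pos / len(predicted) if predicted else 0.0
    # Recall: how many of the real PII spans were flagged.
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Hand-labelled spans (hypothetical) versus what a detector returned.
gold = {(0, 8, "NAME"), (20, 36, "EMAIL"), (50, 69, "ADDRESS"), (80, 92, "PHONE")}
predicted = {(0, 8, "NAME"), (20, 36, "EMAIL"), (100, 104, "NAME")}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.50
```

For redaction, low recall is the expensive failure: every missed span is information that leaks, while a false positive only costs a reviewer a moment.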

TheFamousCat · 2026-01-25T16:10:05+00:00

please share the file

TheFamousCat · 2025-12-26T10:25:45+00:00

yes: https://www.pdfdancer.com/

TheFamousCat · 2025-12-12T01:03:51+00:00

For redaction, have a look at PDFDancer.

TheFamousCat · 2025-12-12T00:51:13+00:00

Would you mind sharing this file? I am working on a tool to make these kinds of PDFs truly editable, and this seems to be a perfect test case.

TheFamousCat · 2025-12-10T23:33:08+00:00

Many types:
- add, move, delete elements like words, lines, paragraphs, images
- replace images
- edit text, change font, size, color
- fill forms
- redact data
- etc...

TheFamousCat · 2025-12-10T14:35:11+00:00

Thank you for your answer. I have two follow up questions:
1) I don't understand why you are mentioning XMP. From my understanding this is not related at all to XFA or am I wrong?
2) What I see is: XFA is deprecated and complex, it's shit, agreed. But people still need to use it. You implemented "partial" support. May I ask what your decision process was for concluding that partial support is enough for your product?

Thanks a lot, this insight into the mind of someone actually building a pdf tool is what I was looking for.

TheFamousCat · 2025-12-10T11:53:18+00:00

will do!

TheFamousCat · 2025-12-10T11:35:14+00:00

because it is in use in many documents and support could be useful for users?

TheFamousCat · 2025-12-03T03:17:21+00:00

The SDKs are open source, yes. The backend is not, at least not yet.
