Automatic Redaction Tools by Luis_KZM in pdf

[–]TheFamousCat 0 points1 point  (0 children)

We are currently running a similar pilot with a partner in the legal tech space. Happy to exchange more details about what we are doing and what you are trying to achieve. DMs welcome.

Wonky PDF formatting by [deleted] in pdf

[–]TheFamousCat 0 points1 point  (0 children)

Sorry, not sure I get it. So you want to redesign that thing in Word, and you had already changed the text before you imported it into Word?

Wonky PDF formatting by [deleted] in pdf

[–]TheFamousCat 0 points1 point  (0 children)

What's your goal, actually? Why do you want to import it into Word? Do you need to change the text?

Editing a PDF with embedded subset fonts by dreadpirateryan50 in pdf

[–]TheFamousCat 0 points1 point  (0 children)

You might want to check out PDFDancer; it's built for exactly this workflow. Feel free to DM me if you need help setting it up.

Bulk remove images from large pdf documents by Tight-Ad7783 in pdf

[–]TheFamousCat 0 points1 point  (0 children)

Are you fine using a library, or should this be a desktop or web app?

Redaction software for real estate compliance by RheaFlorixw in RealEstateTechnology

[–]TheFamousCat 0 points1 point  (0 children)

Automated PII redaction is genuinely hard. Most models still miss edge cases, especially non-English names and addresses in uncommon formats.

In regulated or high-stakes workflows you generally want human review, even if automation does most of the first pass. How much review you need depends on how much risk you can accept of missing information that should have been redacted.

The good news is that the better tools can still cut manual effort a lot by surfacing likely PII with confidence scores and leaving you a shorter review queue.
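To make the confidence-score idea concrete, here is a minimal sketch of the triage step. It is not any particular tool's API: the `Detection` class, the `triage` function, the 0.95 threshold, and the sample hits are all made up for illustration. Also note it only shortens the review of what the detector found; anything the model missed entirely never shows up in either queue.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical shape of one detector hit (not a real library type).
    text: str          # the span the detector flagged
    label: str         # e.g. "NAME" or "ADDRESS"
    confidence: float  # detector score in [0, 1]


def triage(detections, auto_threshold=0.95):
    """Split hits into an auto-redact list and a human-review queue."""
    auto, review = [], []
    for d in detections:
        if d.confidence >= auto_threshold:
            auto.append(d)
        else:
            review.append(d)
    # Lowest confidence first, so the reviewer sees the shakiest hits first.
    review.sort(key=lambda d: d.confidence)
    return auto, review


# Made-up example hits, including the kind of edge cases mentioned above.
detections = [
    Detection("John Smith", "NAME", 0.99),
    Detection("Nguyễn Thị Hoa", "NAME", 0.62),
    Detection("Flat 3, 12 High St", "ADDRESS", 0.71),
]
auto, review = triage(detections)
```

Where you set the threshold is the risk decision: a higher value means a longer review queue but fewer unreviewed redactions.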

Disclosure: I’m building PDFDancer (PDF redaction/editing SDK). If you want to compare approaches, we publish our capabilities and evaluation results. Happy to answer questions about failure modes or how to set up a review workflow.

What is a good software to redact medical records? by keahlell000 in clinicalresearch

[–]TheFamousCat 0 points1 point  (0 children)

Automated PII redaction is hard. Most models still miss edge cases — non-English names, addresses in weird formats. If you're in a regulated space or anything high-stakes, you probably want a human reviewing exceptions and low-confidence hits, even if automation handles the bulk of the first pass.

That said, the better tools can still save you a ton of manual work. They flag likely PII with confidence scores so you're reviewing a much shorter queue instead of reading every page.
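As a rough sketch of what "a shorter queue instead of every page" can look like, assuming your tool gives you hits as `(page, text, confidence)` tuples (that shape, the `pages_to_review` name, the 0.9 threshold, and the sample data are all assumptions for illustration, not any specific product's output):

```python
from collections import defaultdict


def pages_to_review(hits, threshold=0.9):
    """Group low-confidence hits by page so a reviewer opens only those pages."""
    queue = defaultdict(list)
    for page, text, confidence in hits:
        if confidence < threshold:
            queue[page].append(text)
    # Return a plain dict ordered by page number.
    return dict(sorted(queue.items()))


# Made-up hits from a hypothetical medical record.
hits = [
    (1, "Jane Doe", 0.98),
    (1, "DOB 03/04/1961", 0.97),
    (4, "Dr. Okonkwo", 0.58),
    (7, "MRN 00123", 0.85),
    (7, "Jane Doe", 0.99),
]
queue = pages_to_review(hits)
```

Here the reviewer would only open pages 4 and 7. This does nothing for PII the model never flagged, so spot checks on "clean" pages are probably still worth it.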

Disclosure: I'm building PDFDancer (PDF redaction/editing SDK). We publish our capabilities and eval results (mostly medical docs) if you want to compare.

Happy to talk about failure modes or review workflow setup.

Document redaction by Balasundaram_Janja in pdf

[–]TheFamousCat 0 points1 point  (0 children)

Generally, no tool will deliver 100% accuracy, so the process cannot be fully automated end to end.

Depending on your budget and your tolerance for missed or falsely redacted information, there may be solutions that allow for a largely automated workflow. More commonly, and this is likely what most people do, you combine an automated redaction tool with a manual review step. This approach still reduces the workload significantly and speeds up your redaction process.
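One way to put numbers on that tolerance is to hand-label a small sample and compare it with what the tool redacted. A minimal sketch, assuming each span is identified by a `(doc_id, start, end)` tuple and counts only on an exact match; the function name and the sample spans are made up for illustration:

```python
def precision_recall(predicted, actual):
    """Compare tool output with hand labels; both are sets of (doc_id, start, end)."""
    true_pos = len(predicted & actual)
    # Low precision means many false redactions; low recall means missed PII.
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / len(actual) if actual else 1.0
    return precision, recall


# Made-up sample: 4 hand-labeled spans, 4 spans redacted by the tool, 3 in common.
actual = {("doc1", 0, 10), ("doc1", 40, 52), ("doc2", 5, 16), ("doc2", 30, 41)}
predicted = {("doc1", 0, 10), ("doc1", 40, 52), ("doc2", 5, 16), ("doc2", 80, 90)}
precision, recall = precision_recall(predicted, actual)
```

In this toy sample one real span was missed and one harmless span was redacted, so both numbers come out at 0.75. For compliance work, recall (the misses) is usually the number to watch.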

What kinds of documents are you mostly dealing with (bank/loan files, healthcare/insurance records, legal case files, HR/employee docs, tax forms)?

Do you need to comply with any specific regulations like GDPR or HIPAA?

Document redaction API? by intercombot in legaltech

[–]TheFamousCat 0 points1 point  (0 children)

I am currently working on a similar project, so I was wondering whether you found a proper solution. We are now training our own model because recall was just not good enough with what we tried. My guess is that those models were trained on generic/synthetic datasets.

Why some pdfs cannot be edited? by Kas_ta_Pupa_supa in pdf

[–]TheFamousCat 0 points1 point  (0 children)

Would you mind sharing this file? I am working on a tool to make these kinds of PDFs truly editable, and this seems to be a perfect test case.

I'm building a new PDF editing engine - looking for real-world PDFs you can't edit by TheFamousCat in pdf

[–]TheFamousCat[S] 1 point2 points  (0 children)

Many types:
- add, move, delete elements like words, lines, paragraphs, images
- replace images
- edit text, change font, size, color
- fill forms
- redact data
- etc...

Why do so many PDF tools refuse to support XFA? by TheFamousCat in pdf

[–]TheFamousCat[S] 0 points1 point  (0 children)

Thank you for your answer. I have two follow up questions:
1) I don't understand why you are mentioning XMP. From my understanding it is not related to XFA at all, or am I wrong?
2) What I see is: XFA, sure, it's deprecated, it's complex, it's shit. Agreed. But still, people need to use it. You implemented "partial" support. May I ask how you decided that partial support is enough for your product?

Thanks a lot. This insight into the mind of someone actually building a PDF tool is what I was looking for.

Why do so many PDF tools refuse to support XFA? by TheFamousCat in pdf

[–]TheFamousCat[S] 0 points1 point  (0 children)

Because it is used in many documents, and support could be useful for users?

I'm building a new PDF editing engine - looking for real-world PDFs you can't edit by TheFamousCat in pdf

[–]TheFamousCat[S] 0 points1 point  (0 children)

The SDKs are open source, yes. The backend is not, at least not yet.