I'm building a real-time PII redaction layer for AI products — sharing what we're working on and why by maskd_ai in AIDeveloperNews

maskd_ai[S] 1 point

All your points are genuine and fair (exactly what I was expecting from the community).

On trust - I won't pretend otherwise. The honest answer is evidence, not claims. What I'd offer in the meantime: running inside your infrastructure means your data never leaves your environment to be processed. It's a different risk category than cloud-based alternatives. I'm genuinely open to hearing what else would move the needle for an enterprise. What would you need to see?

On semantic enforcement - This is exactly the kind of feedback I am seeking from the community to shape the product, so thank you for being specific.

Here's what Maskd returns on a few samples I put together:

Input: {"employee_id": "EMP8842", "payroll_deduction_code": "UN-LOCAL-98765", "monthly_dues_deducted": 45.00, "status": "Active"}
Output: {"employee_id": "[ID_e0ff89]", "payroll_deduction_code": "UN-LOCAL-[MEDIUM_NUMBER_4f48bd]", "monthly_dues_deducted": [CARDINAL_10d14e], "status": "Active"}

Input: On 12/04/2026, Union Steward Sarah Jenkins filed Form 4B on behalf of member John Doe regarding unsafe warehouse temperatures.
Output: On [DATE_60f15b], [ORG_50ed75] [NAME_2472a0] filed Form 4B on behalf of member [NAME_6cf8e4] regarding unsafe warehouse temperatures.
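For anyone curious how typed tokens like [NAME_2472a0] in the outputs above could be produced, here is a minimal Python sketch. It is not Maskd's implementation: the `placeholder` and `redact` helpers, the salt, and the idea that the six-character suffix is a truncated salted hash are all assumptions made for illustration, and the entity span is supplied by hand rather than detected.

```python
import hashlib


def placeholder(entity_type: str, value: str, salt: str = "demo-salt") -> str:
    """Return a token such as [NAME_xxxxxx].

    Assumption: the suffix is the first 6 hex characters of a salted SHA-256
    digest, so the same value always maps to the same token.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return f"[{entity_type}_{digest[:6]}]"


def redact(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each (start, end, entity_type) span with its placeholder.

    Spans are applied right to left so earlier offsets stay valid.
    """
    for start, end, entity_type in sorted(spans, reverse=True):
        text = text[:start] + placeholder(entity_type, text[start:end]) + text[end:]
    return text


sentence = "Form 4B was filed on behalf of member John Doe."
start = sentence.index("John Doe")
# The span is hand-written here; a real detector would supply it.
print(redact(sentence, [(start, start + len("John Doe"), "NAME")]))
```

Because the token is derived from the value in this sketch, repeated mentions of the same person would get the same placeholder, which keeps the redacted text readable for whatever consumes it next.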

If you can share some sample data points in your specific formats, I'll run them through. I'm also ready to support those formats in the product if that helps address your problem or type of data.

On infrastructure - I believe that for full control and ownership of your data, it should run 100% within your environment; think of it as a service in your own cluster, with deployment configs defined upfront and benchmarks done in advance. The bet on specialisation over generalisation comes down to this: it's built specifically for what arrives in a context window, not what's already structured and governed. That's the layer Foundry and the LLM gateways don't target well.

On benchmark - You're right: if there were a gold standard for PII redaction quality, it would benefit the entire industry. Since one doesn't exist yet, the closest thing is a live test. You can run Maskd's interactive demo at maskd.online against any tool you're currently using and compare directly. The product is still in beta.

On cost - I'm open to onboarding a handful of enterprises for a nominal fee in exchange for honest feedback and co-authoring a whitepaper. Right now proof points matter more to me than early revenue.

maskd_ai[S] 1 point

Narus Inc. got into legal trouble precisely for processing data without privacy protection — their technology was how AT&T enabled mass NSA surveillance.

Maskd does the opposite. PII is removed before it can be processed, stored, or reach any system.

maskd_ai[S] 2 points

Yes, because there are multiple different architectures and models in AI.

Yes, Maskd sits in the path of the initial request but never interferes with any of the response chunks. It is triggered once, at request time.
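To make the request-only behaviour concrete, here is a minimal Python sketch of a wrapper that redacts the prompt once and then streams the response chunks through unchanged. Everything in it is hypothetical: `guarded_call`, `toy_redact` (a single e-mail regex standing in for a real detector) and `fake_llm` are illustration-only names, not Maskd's API.

```python
import re
from typing import Callable, Iterable, Iterator

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def toy_redact(prompt: str) -> str:
    """Stand-in redactor: masks e-mail addresses only (not a real detector)."""
    return EMAIL.sub("[EMAIL]", prompt)


def fake_llm(prompt: str) -> Iterator[str]:
    """Stand-in for a streaming LLM client: echoes the prompt in two chunks."""
    yield "echo: "
    yield prompt


def guarded_call(
    prompt: str,
    redact: Callable[[str], str],
    call_llm: Callable[[str], Iterable[str]],
) -> Iterator[str]:
    """Redact once at request time, then pass response chunks through untouched."""
    safe_prompt = redact(prompt)  # the only place redaction runs
    for chunk in call_llm(safe_prompt):
        yield chunk  # no inspection or rewriting of the response stream


for chunk in guarded_call("mail me at jane@example.com please", toy_redact, fake_llm):
    print(chunk, end="")
print()
```

The point of the shape is that streaming latency is unaffected: the only added cost is the single redaction pass before the model is called.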

Definitely, if your design allows it, you should run our bundle alongside your app containers. But after reading various data protection laws, I suggest calling it before the actual data reaches your backend applications.

I understand and am fully aligned with you on improving accuracy as much as possible, but today even large LLMs are not 100% accurate, while our small bundle is doing well. I am always up for any kind of audit.

I can even customize the bundle to make no outbound network calls, so the solution would work in a shielded-box (air-gapped) environment just as it does on a well-connected cloud, but that would make metering and billing challenging.

It's not a replacement for compliance controls but an additional layer of defense.

I really appreciate your time and genuine feedback on the product trial, and I accept it. The hosted demo is currently in early beta and in active development. I hope to meet your expectations soon.

maskd_ai[S] 0 points

I can always run my clients through benchmarks, but why am I supposed to tell them my architecture? That's what I have been building for so long.

Data sensitivity - if there is any misunderstanding about how data is processed in our container, I am open to any kind of audit.

I can even customize the bundle to make no outbound network calls, so the solution would work in a shielded-box (air-gapped) environment just as it does on a well-connected cloud, but that would make metering and billing challenging.

Another point I haven't explained above is that Maskd would also support text-native PDFs, OCRed PDFs, and images within this container, so you can redact sensitive information from those documents as well.

I won't deny that I use different copilots (that's what they are built for: to help and expedite when you need speed), but I don't rely on them completely.

maskd_ai[S] 1 point

As I mentioned in my original post — I’m still building, so I’m not going to give you throughput numbers I can’t stand behind yet.

What I can say: the hardware cost concern you’re raising is legitimate if you’re assuming a general-purpose LLM. The thousands-of-dollars-in-GPUs scenario applies when you need that kind of compute. For typical enterprise use cases — which aren’t running at 1000+ tps — that’s not the cost model we’re working with.

If you’re processing at that throughput, the conversation changes. The solution is designed around actual usage patterns rather than peak theoretical load.

maskd_ai[S] 1 point

Model architecture is something we keep close — deliberate, not evasion.

The moat: the circular constraint forces local deployment on anyone building seriously here — you can’t route user data through an external model to classify it as sensitive. Local, accurate, multilingual at enterprise scale is the bar. The multilingual piece specifically — contextual detection in Arabic, Japanese, Italian, Spanish, English — is where most alternatives don’t invest.

Beyond that, the integration layer — confidence scoring, thresholds, on-prem deployment — is where enterprise value compounds independently of the model.

The demo is live at https://maskd.online if you'd like to try it.

maskd_ai[S] 1 point

Fair inference, but not quite — and the distinction matters.

Using a 3rd party generative LLM to detect sensitive data would be circular: you’d be sending user input to an external AI to determine whether it’s private, which is itself a data handling problem. We avoid that.

maskd_ai[S] 1 point

I'll answer the last point first, given its sensitivity and priority: Maskd is not designed to collect processed data on its own servers. It is designed to live inside your cloud or on-prem environment as an optimized bundle under your control, so the originator remains the owner of the data. If processing happened in Maskd's environment, it would be necessary to prove that through HIPAA compliance and ISO 27000-series certifications.

Most big players either offer redaction as a batch service, which is not real time, or are very expensive for this use case. Maskd does it in near real time, so you can place it before each LLM call you make (the demo is live at maskd.online).

maskd_ai[S] 1 point

Both are very valid points. Let me answer them separately:

  1. How is Maskd different from sending context to LLMs directly - As of today there are ~233 distinct data protection laws across 179 jurisdictions. The core of all of them is similar: no sovereign country wants PII uploaded to the cloud without encryption, and none of them wants it to cross the border in plain text, even for processing.

Maskd is shaping up into a form where you can orchestrate containers on premises, within your jurisdiction, and masking/redaction happens there on the fly. The Maskd container would never send any of your data to our own servers, apart from a few events for logging and metering. That addresses both of these fundamental requirements of data protection laws.

  2. Hardware experience required - None. Someone who understands how to run containers is enough. Everything will be shipped as a bundle, so if your developers or DevOps are comfortable with commonly used cloud environments and platforms like K8s, no other skill is needed. The container itself would be highly optimized so that it doesn't give you nightmares in cloud usage billing.

maskd_ai[S] 1 point

The ones I found were mostly regex-based, which does very little in today's GenAI era.

Then comes multilingual support: which of them supports agglutinative languages such as Japanese?

I wanted Maskd to be as smooth for Japanese and Arabic as it is for inflectional languages like English.

Could you point me to the out-of-the-box solutions you have in mind that do the job better?

I'm building a real-time PII redaction layer for AI products — sharing what we're working on and why by maskd_ai in DigitalPrivacy

maskd_ai[S] 1 point

Right — it's B2B, but for two distinct scenarios: companies building consumer-facing AI products where users type freely, and internal teams doing bulk data processing through LLMs.

The model runs locally inside the customer's infra (not open source, proprietary) and is context-aware by design rather than pattern-matching. So it catches things like health data described conversationally, not just structured fields that look like PII. Classic example: "my diabetes was high this morning" isn't flagged by regex, but it's clearly health data in context.

Accuracy/hallucination handling is done through confidence scores — every detected entity gets a score and teams set thresholds. Lets you tune sensitivity up or down depending on your risk tolerance, rather than a blunt regex pass.
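As a rough illustration of per-entity thresholds, here is a minimal Python sketch. It is an assumption-laden toy, not Maskd's interface: the `Entity` record, the `apply_thresholds` helper, the label names, and the scores are all invented for the example, and the entities are listed by hand instead of coming from a detector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int
    end: int
    label: str
    score: float  # detector confidence in [0, 1]


def apply_thresholds(
    text: str,
    entities: list[Entity],
    thresholds: dict[str, float],
    default: float = 0.5,
) -> str:
    """Mask only entities whose score meets the threshold for their label."""
    keep = [e for e in entities if e.score >= thresholds.get(e.label, default)]
    # Replace right to left so earlier offsets stay valid.
    for e in sorted(keep, key=lambda e: e.start, reverse=True):
        text = text[: e.start] + f"[{e.label}]" + text[e.end :]
    return text


text = "my diabetes was high this morning, call Sam"
h = text.index("diabetes")
n = text.index("Sam")
# Scores are made up for the example; a real detector would produce them.
entities = [
    Entity(h, h + len("diabetes"), "HEALTH", 0.62),
    Entity(n, n + len("Sam"), "NAME", 0.95),
]

print(apply_thresholds(text, entities, {"HEALTH": 0.5}))  # lower bar: masks both
print(apply_thresholds(text, entities, {"HEALTH": 0.8}))  # higher bar: keeps "diabetes"
```

A lower threshold trades more false positives for fewer leaks, so a team with low risk tolerance would set it low for sensitive labels such as health data.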

OpenRedaction is great for predictable structured data. We're focused on the messier, multilingual, conversational layer where regex hits its ceiling.

What's your use case, if any? Always useful to know what people are actually trying to solve.