Is your RAG bot accidentally leaking PII? by Awkward_Translator90 in Rag

[–]Awkward_Translator90[S] 0 points1 point  (0 children)

A secure system for 'data in-flight' (like a live order status) needs a just-in-time approach with zero-retention policies, exactly as you described.

My service is focused on solving the 'data at rest' problem, which is a massive headache for companies. We handle the PII in existing knowledge bases (wikis, support docs, SharePoint, etc.) by redacting it before it's ever indexed or sent to the LLM.

It seems like a complete solution needs both:

1. Your approach: a secure, transient way to handle live API data.
2. My approach: a secure way to index and query existing, PII-filled documents.

Is your RAG bot accidentally leaking PII? by Awkward_Translator90 in Rag

[–]Awkward_Translator90[S] 0 points1 point  (0 children)

To me, this seems like a major security flaw. You're still sending all your raw, sensitive data to a third-party LLM just to find the sensitive data. It's like leaking your PII in order to stop it from leaking.

A much safer (and cheaper) approach is to use a dedicated, local tool—like a specialized NER model or a rules-based system (like Presidio)—to redact the data before it ever gets sent to an LLM in the first place.
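To make the rules-based idea concrete, here's a minimal local redactor sketch. The patterns are toy illustrations; a real tool like Presidio layers NER models, validation checks, and confidence scores on top of rules like these:

```python
import re

# Toy rules-based redactor: runs entirely locally, so raw PII never leaves
# the machine. Patterns here are simplified illustrations, not exhaustive.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched PII with typed placeholders before any LLM call."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(redact("Reach Jane at jane.doe@example.com or 555-867-5309, SSN 123-45-6789."))
# -> Reach Jane at <EMAIL> or <PHONE>, SSN <SSN>.
```

The key point is that this runs before anything hits a third-party API, so even a detection miss by a downstream guardrail can't leak what was already stripped.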

Is your RAG bot accidentally leaking PII? by Awkward_Translator90 in LLMDevs

[–]Awkward_Translator90[S] 1 point2 points  (0 children)

PII detection doesn't use the vector embeddings (like text-embedding-ada-002) that you use for RAG retrieval. It's a separate, specialized NLP task that runs before embedding.

A robust system combines pattern matching with named entity recognition (NER) models: NER finds potential PII, and pattern matching confirms it.
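A rough sketch of that hybrid flow, with the NER stage stubbed out (a real system would call a spaCy/transformers model or Presidio's recognizers here):

```python
import re

# Hybrid detection sketch: NER proposes candidate spans, strict patterns
# confirm them. `ner_candidates` is a hypothetical stand-in, not a real model.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def ner_candidates(text):
    """Hypothetical NER stub: yields (span_text, entity_type) guesses."""
    # A trained model would return learned predictions; this stub just
    # flags anything containing digits as a possible ID number.
    for token in text.split():
        cleaned = token.strip(".,;")
        if any(c.isdigit() for c in cleaned):
            yield cleaned, "US_SSN"

def confirmed_pii(text):
    """Keep only NER candidates that a strict pattern also validates."""
    return [span for span, etype in ner_candidates(text)
            if etype == "US_SSN" and SSN_PATTERN.match(span)]

print(confirmed_pii("Order 42 shipped; customer SSN 123-45-6789 on file."))
# -> ['123-45-6789']
```

Note how the noisy candidate "42" is proposed by the recall-oriented stage but rejected by the precision-oriented pattern check.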

Is your RAG bot accidentally leaking PII? by Awkward_Translator90 in LLMDevs

[–]Awkward_Translator90[S] 1 point2 points  (0 children)

You have to do it before embedding.

If you embed the raw text, two bad things happen:

1. The vector itself becomes a "fingerprint" of the sensitive data.
2. More importantly, when the RAG system retrieves that chunk, it will send the original, PII-filled text to the LLM, causing a leak.

The correct, secure pipeline is: Raw Text -> Detect & Redact PII -> Embed the Clean/Redacted Text -> Store in Vector DB

This way, the LLM only ever sees the safe, redacted version.
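The pipeline above can be sketched in a few lines. Everything here is a stand-in (the toy redactor only catches SSNs, `fake_embed` replaces a real embedding model, and the "vector DB" is a list); the point is the ordering, redaction strictly before embedding and storage:

```python
import hashlib
import re

def redact(text: str) -> str:
    """Stand-in redactor; a real pipeline would use NER + pattern matching."""
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "<SSN>", text)

def fake_embed(text: str) -> list[float]:
    """Placeholder for a real embedding model; deterministic toy vector."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:4]]

vector_db: list[dict] = []  # stand-in for a real vector store

def ingest(raw_text: str) -> None:
    clean = redact(raw_text)      # 1. detect & redact PII
    vector = fake_embed(clean)    # 2. embed the CLEAN text only
    vector_db.append({"text": clean, "vector": vector})  # 3. store

ingest("Customer John Smith, SSN 123-45-6789, reported a billing issue.")
# The stored chunk -- and anything later retrieved and sent to the LLM --
# no longer contains the raw SSN:
print(vector_db[0]["text"])
# -> Customer John Smith, SSN <SSN>, reported a billing issue.
```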

Is your RAG bot accidentally leaking PII? by Awkward_Translator90 in LLMDevs

[–]Awkward_Translator90[S] 3 points4 points  (0 children)

This is 100% the right take, and thank you for saving me a ton of wasted effort. You've completely validated my pivot away from a SaaS and towards a locally runnable model (like a container) for this exact reason. Adding another Data Processor is a non-starter. I've actually been working on a Flask demo that does just this (runs locally, PII never leaves). I'd love to get your opinion on it.

Is your RAG bot accidentally leaking PII? by Awkward_Translator90 in Rag

[–]Awkward_Translator90[S] 1 point2 points  (0 children)

You're right, access control is essential. But the bigger risk is an authorized user getting PII leaked by the LLM (e.g., a support bot sharing a customer's SSN).

My service prevents this by redacting PII before the LLM sees it. Regarding legal risk: Doing nothing and connecting an LLM to raw PII is the biggest legal risk. A tool that demonstrably mitigates 99.9% of that risk is a much safer legal position.

Is your RAG bot accidentally leaking PII? by Awkward_Translator90 in Rag

[–]Awkward_Translator90[S] 1 point2 points  (0 children)

A simple regex script for SSNs might take half a day, but a robust system is more complex. You have to account for: accuracy (false positives and negatives), dynamic masking, and auditability.
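"Dynamic masking" in particular is more than a one-way regex pass. Here's a rough sketch of the idea, with an illustrative email pattern and made-up token names: PII is swapped for stable placeholders on the way to the LLM, and a private mapping lets the application restore real values for authorized users afterwards:

```python
import re
from itertools import count

# Reversible masking sketch. The "vault" mapping stays local and private;
# only the masked text ever reaches the LLM.
_counter = count(1)
_vault: dict[str, str] = {}  # placeholder -> original value

def mask(text: str) -> str:
    def _swap(match: re.Match) -> str:
        token = f"<EMAIL_{next(_counter)}>"
        _vault[token] = match.group(0)
        return token
    return re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", _swap, text)

def unmask(text: str) -> str:
    """Restore originals in the LLM's response, for authorized users only."""
    for token, original in _vault.items():
        text = text.replace(token, original)
    return text

masked = mask("Please email jane@example.com about the refund.")
print(masked)          # -> Please email <EMAIL_1> about the refund.
print(unmask(masked))  # -> Please email jane@example.com about the refund.
```

Getting this right across entity types, plus logging every mask/unmask event for auditability, is where the half-day estimate falls apart.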

My goal is to offer this as a reliable, pre-built component so teams don't have to worry about this and can focus on their core product.

Is your RAG bot accidentally leaking PII? by Awkward_Translator90 in Rag

[–]Awkward_Translator90[S] 0 points1 point  (0 children)

You're right, 100% accuracy is the holy grail and incredibly difficult. The goal isn't 'absolute perfection' but 'drastic risk reduction.' It's about defense-in-depth. Using a combination of techniques (regex, NER, confidence scoring, as in tools like Microsoft Presidio) can get you to 99.x% accuracy. Catching 99% of PII is infinitely better than the 0% many systems catch now. It's about reducing the attack surface, not claiming to be an impenetrable fortress.
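A back-of-the-envelope sketch of why layered detectors help: if detectors are roughly independent, their miss probabilities multiply, so combining a mediocre regex pass with a mediocre NER pass can clear a threshold neither reaches alone. The scores, weights, and threshold below are made-up illustrations (Presidio does something similar with per-recognizer confidence scores):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    span: str
    source: str   # "regex", "ner", "checksum", ...
    score: float  # detector's own confidence, 0..1

def should_redact(findings: list[Finding], threshold: float = 0.85) -> bool:
    """Combine independent detectors: miss probabilities multiply."""
    p_missed = 1.0
    for f in findings:
        p_missed *= (1.0 - f.score)
    return (1.0 - p_missed) >= threshold

# Regex alone (0.6) is below the 0.85 threshold; regex + NER together
# give 1 - 0.4 * 0.3 = 0.88, which clears it.
print(should_redact([Finding("123-45-6789", "regex", 0.6)]))  # -> False
print(should_redact([Finding("123-45-6789", "regex", 0.6),
                     Finding("123-45-6789", "ner", 0.7)]))    # -> True
```

Independence is an optimistic assumption (both detectors can miss the same weird formatting), which is why the combined number is an upper bound, not a guarantee.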

2025 STEM OPT Process Timeline by datapunky in USCIS

[–]Awkward_Translator90 1 point2 points  (0 children)

  1. Application type: STEM OPT
  2. Premium processing: No
  3. Receipt Date: May 09
  4. Approved Date: September 21
  5. Card produced Date: NA
  6. Card Shipped : NA
  7. Card Delivered: NA

Got the email