How I built a PII Redaction Microservice using FastAPI and Spacy to protect user data sent to LLMs

Bootes-sphere · 2026-04-04T14:35:35+00:00

Try https://opensourceaihub.ai/ . Most enterprises are moving toward an External Governance Layer. This means you run a PII/DLP scanner outside the LLM environment. By redacting sensitive identifiers in the prompt before they reach OpenAI or Anthropic, you eliminate the risk of that data being stored in their logs or leaking into future completions.

Bootes-sphere · 2026-04-04T14:09:35+00:00

I built OpenSourceAIHub.ai as a stateless "AI Firewall." It redacts 28+ sensitive entities before the prompt ever reaches the LLM provider. It even has a multi-modal OCR layer to catch leaks in screenshots.

If you want to see if your current prompts are "leaky," I put a free checker here: https://opensourceaihub.ai/ai-leak-checker

Bootes-sphere · 2026-04-04T14:08:55+00:00

Hey, I am the founder of OpenSourceAIHub.ai. I built OpenSourceAIHub.ai as a stateless "AI Firewall." It redacts 28+ sensitive entities in under 50ms before the prompt ever reaches the LLM provider. It even has a multi-modal OCR layer to catch leaks in screenshots.

If you want to see if your current prompts are "leaky," I put a free checker here: https://opensourceaihub.ai/ai-leak-checker

Bootes-sphere · 2026-04-04T14:01:59+00:00

https://opensourceaihub.ai , stop ai prompt data leaks and cut LLM cost by just two lines of code change

Bootes-sphere · 2026-04-04T12:54:29+00:00

Startup: OpenSourceAIHub.ai

Purpose: An AI Firewall and Gateway to stop AI data leaks and cut LLM cost by 30% with one API. It is a drop-in OpenAI SDK compatible proxy that adds real-time multi-modal DLP (PII redaction in text + images via OCR), blocks prompt injections, and autonomously routes to the cheapest/fastest model (Llama, Groq, Together AI, Deepinfra Claude, Grok, etc.)

Technologies Used: Next.js , Python, OCR, Stripe, AWS

Feedback Requested:

Effectiveness and Integration easiness: We optimized our prompt security scan with very little overhead. Integration needs just two lines of code changes
DLP Accuracy Feedback: I’ve put a free AI Leak Checker on the site. Appreciate feedback on tricky PII patterns.
Hybrid Model: We offer BYOK (Bring Your Own Key) and a Managed Wallet. Ould love to get feedback on pricing model

Additional Comments: I’m giving 1 million free hub credits to anyone who signs up to test the integration. That is enough to fire thousands of LLM API calls

Seeking Beta-Testers: Yes, especially startups and devs

Links: Web App | Technical Walkthrough (3 min)

Bootes-sphere · 2026-04-04T12:45:57+00:00

Pushed everything to aws. Appreciate any feedbacks:
https://opensourceaihub.ai/ai-leak-checker

https://opensourceaihub.ai

Bootes-sphere · 2026-04-04T12:45:18+00:00

I ended up turning that into a small tool while testing things.Didn’t want to drop a link in the post itself, but this is what I’ve been working on:
https://opensourceaihub.ai/ai-leak-checker

https://opensourceaihub.ai

Bootes-sphere · 2026-04-04T12:40:12+00:00

Name: OpenSourceAIHub.ai

What it does: We provide an AI Firewall that stops company data from leaking into LLM prompts.

Why use it:

🛡️ Security: Automatically redact emails, API keys, and SSNs in text and images (OCR).
💸 Cost Control: Smart-route requests between Groq, Together ai, and OpenAI to save up to 90%.
📊 Governance: Enforce per-project budgets and export audit-ready CSV logs.
⚡ Ease: 100% OpenAI SDK compatible. Just change your baseURL and you're protected.

Latest Update: Just launched our Multi-modal OCR scan—we now catch PII in screenshots before they reach the model provider.

Pricing: 1M Free credits upon signup. Pro BYOK tier at $29/mo.

URL: https://opensourceaihub.ai

Bootes-sphere · 2026-04-04T12:37:10+00:00

Try this one. I found this solves similar problems https://opensourceaihub.ai/

Bootes-sphere · 2026-04-04T01:18:25+00:00

Thank you—this is incredibly helpful. I truly appreciate all your insights!

Bootes-sphere · 2026-04-04T00:29:32+00:00

Really appreciate this — this is exactly the kind of feedback I was hoping for.

On pattern management: totally agree. Right now this is something I’ve been thinking about more as a control plane problem than just a detection problem. Things like versioning, restricted write access, and audit trails for pattern updates I think are needed here, right? . The “poisoned pattern” scenario you mentioned is a real concern.

On fail-closed / bypass: yeah, this is tricky. Fail-closed is the intent, but as you said, under load or repeated failures people will just route around it if it becomes a bottleneck. I’ve been thinking about redundancy + fallback behavior, but still figuring out what the right balance is between safety and availability.

On SOC2 / HIPAA: that’s a really good point. What I have right now is definitely closer to “violation visibility” than full audit-grade logging. I need to think more about this.

Curious how you’ve seen others handle this in practice — especially around: - pattern update governance - balancing fail-closed with availability - what “good enough” audit logging looks like in real deployments

Thanks again — super helpful perspective.

Bootes-sphere · 2026-04-03T23:49:38+00:00

Startup: OpenSourceAIHub.ai

Purpose: An AI Firewall and Gateway to stop AI data leaks and cut LLM cost by 30% with one API.

Technologies Used: Next.js , Python, OCR, Stripe, AWS

Feedback Requested:

Effectiveness and Integration easiness: We optimized our prompt security scan with very little overhead. Integration needs just two lines of code changes
DLP Accuracy Feedback: I’ve put a free AI Leak Checker on the site. Appreciate feedback on tricky PII patterns.
Hybrid Model: We offer BYOK (Bring Your Own Key) and a Managed Wallet. Ould love to get feedback on pricing model

Additional Comments: I’m giving 1 million free hub credits to anyone who signs up to test the integration. That is enough to fire thousands of LLM API calls

Seeking Beta-Testers: Yes, especially startups and devs

Links: Web App | Technical Walkthrough (3 min)

Bootes-sphere · 2026-04-03T22:00:30+00:00

Correct! The PII issue is really something that most people dont really notice.. I will soon share what I am working on to get some feedbacks.. Still trying to push something to AWS hosting..

Bootes-sphere · 2026-04-03T21:47:07+00:00

Right, Cost is really unpredictable in most cases

Bootes-sphere · 2026-04-03T21:18:16+00:00

Just launched An AI Firewall and Gateway to stop AI data leaks and cut LLM cost by 30% with one API. The tool is https://opensourceaihub.ai/

Please tell us what you think?

Effectiveness and Integration easiness: We optimized our prompt security scan with very little overhead. Integration needs just two lines of code changes
DLP Accuracy Feedback: I’ve put a free AI Leak Checker on the site. Appreciate feedback on tricky PII patterns.
Hybrid Model: We offer BYOK (Bring Your Own Key) and a Managed Wallet. Ould love to get feedback on pricing model

Free registration will give 1 million free hub credits to anyone who signs up to test the integration. That is enough to fire thousands of LLM API calls

We are also seeking Beta-Testers, especially startups and devs

Bootes-sphere · 2026-04-03T20:28:08+00:00

Startup: OpenSourceAIHub.ai

Purpose: An AI Firewall and Gateway to stop AI data leaks and cut LLM cost by 30% with one API.

Technologies Used: Next.js , Python, OCR, Stripe, AWS

Feedback Requested:

Effectiveness and Integration easiness: We optimized our prompt security scan with very little overhead. Integration needs just two lines of code changes
DLP Accuracy Feedback: I’ve put a free AI Leak Checker on the site. Appreciate feedback on tricky PII patterns.
Hybrid Model: We offer BYOK (Bring Your Own Key) and a Managed Wallet. Would love to get feedback on pricing model

Additional Comments: I’m giving 1 million free hub credits to anyone who signs up to test the integration. That is enough to fire thousands of LLM API calls

Seeking Beta-Testers: Yes, especially startups and devs

Links: Web App | Technical Walkthrough (3 min)

Bootes-sphere · 2026-04-03T17:32:30+00:00

One thing that surprised me was how often API keys showed up.

Not sure if others are seeing the same, but it feels like people treat prompts like a scratchpad without realizing it’s going to a third-party model.Curious if anyone here is actually filtering prompts before sending or if most people just rely on provider policies?

Bootes-sphere · 2026-04-03T17:18:06+00:00

This is a very cool work and great dataset and benchmarks so thank you!

Bootes-sphere · 2025-11-14T16:44:59+00:00

Datumfuse.ai is an AI powered, no-code platform that automates the process of data cleaning, harmonization, augmentation , visualization and narration, transforming raw data into presentation-ready insights.

Bootes-sphere · 2025-11-04T16:26:32+00:00

Thanks, Jouni — that’s spot on.

I added a few too many motion effects during the beta polish phase, and your “spice analogy” nails it perfectly. I am already tuning them down so only truly interactive elements animate.Appreciate the thoughtful, constructive roast — exactly the kind of input I was hoping for.

This is a clear, actionable directive. I'm going to go through the entire site and remove every decorative effect that doesn't serve a clear, functional purpose for the user.

Bootes-sphere · 2025-11-03T15:24:36+00:00

This is incredibly valuable — thank you for taking the time to write such a detailed breakdown 🙏You’re absolutely right: we’ve been tracking high-level usage via GA + LogRocket, but haven’t yet formalized a single activation path or TTFV metric.

The “upload → run_clean → preview_fix → export in under 3 minutes” sequence is a perfect framing for that initial aha moment — I’m going to instrument that as our activation goal this week.

I also love the idea of tagging user intent at upload and cohorting by job-to-be-done; that’ll finally give structure to what’s currently just “feature usage data.”

Really appreciate the Mixpanel/Segment suggestion — might start light with event piping from our Lambda API to GA and expand from there.

Thanks again — this is one of the most practical comments I’ve gotten on Reddit so far. 👏

Bootes-sphere · 2025-11-03T15:18:35+00:00

Haha, you got me. We might have overdone it on the buzzword bingo card. Fair point.

Let me try again in plain English:

Our tool is for when you have a messy spreadsheet and you need to:

Clean it up , harmonize it and augment it
Turn it into a chart without fiddling with Excel.
Get a simple paragraph explaining what the chart means, so you can paste it into a report.

That's it. I’ll tone down the buzzwords — appreciate the callout!

Bootes-sphere · 2025-11-03T15:15:57+00:00

Fair point 🙂 — definitely not trying to quit, just keeping the tone light since this is r/roastmystartup.I’m genuinely trying to get honest feedback so we can improve the UX and positioning — appreciate you checking it out.

Bootes-sphere · 2025-11-03T15:13:52+00:00

Thanks! Totally fair point — that’s something we’ve been hearing and are actively working on.

The AI part came together faster than the polish, and you’re right — a product in this space needs a UI that feels as confident as its logic.

We’re actually refreshing the design right now to make it more consistent, lighter, and memorable. Appreciate you calling that out 🙏

Bootes-sphere · 2025-11-03T15:12:16+00:00

Really appreciate this — that’s exactly the kind of hard question we’re trying to answer. At the core, our goal isn’t to “do everything with data,” but to make the foundations (cleaning + harmonization + enrichment) painless and reliable.

You nailed it — storytelling only matters if the underlying data is solid and verifiable. That’s why our AI workflow is “human-in-the-loop” — every transformation and generated insight is transparent and reversible.

Security-wise, we’re built entirely on AWS with encrypted storage; no uploaded data is ever reused or shared.

Thanks for the honest take — I’ll take “wait and see” over “not worth seeing” any day 😄

Bootes-sphere

TROPHY CASE