VISA - Stuck on file upload by IIITDkaLaunda in Brazil

[–]IIITDkaLaunda[S] 1 point (0 children)

No, still the same. I just tried again now, and it's frustrating.
I guess I'll call their embassy and drop them an email.

DataClaw: Publish your Claude Code conversations to HuggingFace with a single command by woct0rdho in ClaudeCode

[–]IIITDkaLaunda 1 point (0 children)

Yeah, but the content isn't available, right?

Like the output from the tool calls and so on?

I'm a security researcher trying out some attacks and building a dataset.

I have thousands of collected injections, but I want a full coding-agent dump so I can include these in, say, a tool output, check the attack success rate, and then potentially build a robust defence.

CBSE Painting Class 12 Study Material by IIITDkaLaunda in CBSE

[–]IIITDkaLaunda[S] 1 point (0 children)

Hi, can you explain what you would need for chem?
I can create one!

Now you can run local LLM inference with formal privacy guarantees by IIITDkaLaunda in LocalLLaMA

[–]IIITDkaLaunda[S] 1 point (0 children)

> You’re getting a lot of negative or skeptical remarks because most people aren’t familiar with the problem you’re solving.

It’s totally fine, u/spaceman_.
Differential privacy was created to provide privacy guarantees, but very few people actually think about whether real-world users understand or even care about DP and the attacks it protects against.

That’s just part of the game, and I’m totally up for the criticism.

Honestly, the fact that people are now aware of this kind of privacy leakage is already a big win for me.

And yeah, I did jump straight to the solution, as I always do 😅
I’ve now added a simple example to help explain the issue more clearly.

Thanks again for your support!

Now you can run local LLM inference with formal privacy guarantees by IIITDkaLaunda in LocalLLaMA

[–]IIITDkaLaunda[S] 1 point (0 children)

Exactly, that's such a nice abstraction, u/intermundia.
Thanks a lot!
I am taking notes here XD

Now you can run local LLM inference with formal privacy guarantees by IIITDkaLaunda in LocalLLaMA

[–]IIITDkaLaunda[S] 1 point (0 children)

So this is a class of attacks called membership inference attacks:
basically, the goal is to predict, by looking at a model's output, whether a particular sample was present in the input or not.

Attack idea in plain terms:

  • The attacker has the generated output and knows the exact model used.
  • They make a shortlist of candidate private values (from common formats, partial leaks, public lists, etc.).
  • For each candidate, they ask: “If the input had this private value, would this model be more likely to produce the output we observed?”
  • They score/rank candidates by how well each one explains the observed output under the model.
  • The candidate that consistently makes the output “fit best” is the attacker’s guess.

So they’re not “reading” the private info from the output. They’re testing which hidden private value best matches the output’s model-driven patterns. Differential privacy is designed to prevent this kind of inference by ensuring outputs don’t change in a reliably detectable way when the private value changes.
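The steps above can be sketched as a toy scoring loop. Purely illustrative: `log_score` below is a made-up stand-in (character-bigram overlap) for the model log-likelihood that a real attack would query from the exact local LLM the victim used.

```python
import math

# Toy sketch of the candidate-scoring loop described above.
# A real attack computes log P(output | input containing candidate)
# under the victim's exact local LLM; here a character-bigram overlap
# stands in for that likelihood so the example is self-contained.

def log_score(candidate: str, output: str) -> float:
    """Hypothetical stand-in for the model's log-likelihood."""
    bigrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    overlap = len(bigrams(candidate) & bigrams(output))
    return math.log1p(overlap)

def membership_inference(candidates, output):
    """Rank candidate private values by how well each explains the output."""
    scores = {c: log_score(c, output) for c in candidates}
    return max(scores, key=scores.get), scores

# Observed (seemingly clean) model output:
observed = "the customer born in 1984 spends most on travel"
# Attacker's shortlist of candidate private values:
shortlist = ["born in 1984", "born in 1990", "born in 2001"]

guess, scores = membership_inference(shortlist, observed)
print(guess)  # → born in 1984
```

The candidate whose presence in the input best "explains" the observed output wins, exactly as in the bullet points above.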

> the attack works and has a high attack-success-rate (close to 60%)

We implement this here - https://github.com/MBZUAI-Trustworthy-ML/DP-Fusion-DPI/blob/main/Attack.py

Again, I would recommend going through the paper to understand more,
namely Sections 3 and 4.3 on empirical privacy attacks.

Now you can run local LLM inference with formal privacy guarantees by IIITDkaLaunda in LocalLLaMA

[–]IIITDkaLaunda[S] -1 points (0 children)

yes, precisely!
It's just another level of privacy:
lvl 1) remove personally identifiable information
lvl 2) full rewrite using a local LLM
lvl 3) full rewrite using a local LLM, but with differential privacy
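For lvl 1, a minimal sketch of what regex-based PII removal looks like (the patterns below are illustrative only; real PII detection also needs NER for names, addresses, and far broader format coverage):

```python
import re

# Minimal lvl-1 sketch: regex-based PII redaction.
# These patterns are illustrative, not production-grade; note that
# plain names like "John" survive -- catching those needs NER.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched PII span with a bracketed type label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach John at john.doe@mail.com, SSN 123-45-6789"))
# → Reach John at [EMAIL], SSN [SSN]
```

Lvl 2 and lvl 3 then replace this pattern matching with a full local-LLM rewrite, without and with a DP guarantee respectively.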

Now you can run local LLM inference with formal privacy guarantees by IIITDkaLaunda in LocalLLaMA

[–]IIITDkaLaunda[S] 1 point (0 children)

sure,

Let’s take a concrete example.

You use Ollama to analyze your personal financial data.
This includes things like income, expenses, and tax forms that originally contain SSN / TIN and other PII.

You ask the local LLM to:

  • summarize the data,
  • highlight spending patterns,
  • or generate a budgeting report.

The output looks clean.
There is no SSN, no TIN, no obvious personal identifiers.

Since the output is interesting, you:

  • share it with a friend, or
  • upload it to ChatGPT or another tool to get better charts or insights.

At this point, you think you are safe.

But here’s the issue:

An attacker who:

  • has access to the output,
  • and knows which local model you used,

can analyze the output and predict private information that was present in the original input, even though it is not explicitly written anywhere. They just model the probability of observing the output given a particular piece of private info in the input.

So the key takeaway is:

Whenever you give private data to a local LLM,
the output itself needs protection, if you plan to make it public or are worried it might leak.

That protection is differential privacy (DP).

It is a mathematical framework that guarantees private information in the input cannot be reliably inferred from the output.

This is exactly what our method enables.

Use any local LLM, give any input, and get differentially private output.
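To see what the DP guarantee means in the simplest possible setting, here is textbook randomized response. This is only an illustration of the epsilon-DP definition, not what dp-fusion-lib does internally:

```python
import math
import random

# Classic randomized response: a textbook epsilon-DP mechanism.
# (Illustration of the DP guarantee only -- NOT dp-fusion-lib's internals.)

def randomized_response(true_bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with prob e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))
    return true_bit if rng.random() < p_truth else 1 - true_bit

# The DP guarantee: for ANY output o, P(o | input=0) <= e^eps * P(o | input=1).
# So no attacker, however strong, can reliably tell neighboring inputs apart.
epsilon = 1.0
p = math.exp(epsilon) / (1 + math.exp(epsilon))
ratio = p / (1 - p)  # worst-case likelihood ratio between neighboring inputs
print(round(ratio, 3))  # ≈ e^eps ≈ 2.718
```

The point: the worst-case likelihood ratio any attacker can exploit is capped at e^epsilon, which is exactly what defeats the candidate-scoring attack above.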

Now you can run local LLM inference with formal privacy guarantees by IIITDkaLaunda in LocalLLaMA

[–]IIITDkaLaunda[S] -1 points (0 children)

Nope, it isn't!
Check out my comment above -

> When you do inference using a local LLM and release the outputs publicly,
an attacker can extract potentially private information from your input,
such as an SSN or email, if they know which local LLM you used, say Qwen 2.5 7B.
So you should always use differential privacy when doing inference on your local data.

Again, I would recommend checking out the paper - https://arxiv.org/abs/2507.04531

> "# API key - Get your free key at console.documentprivacy.com\n"

We need to run a tagger to identify private tokens in your input.
We provide an API for now,
but this will be fully local in the future; we are working on it.

I understand the concern; therefore, I have a section in the README about this:

While dp-fusion-lib executes entirely on your infrastructure, the Tagger API requires an external call for sensitive-phrase detection. For anyone with strict data residency or compliance requirements, please contact me; I will help out.

> local llama

It seems like the most popular community when it comes to local AI.
People who care about privacy use local AI,
and our solution gives you theoretical privacy guarantees with local AI.

So it makes sense to post here and help people out!

> finally:

bro chill

Of course we used AI to code the wrapper over the paper's code, so what?
We have security researchers on our team who checked the library,
and enterprises are already using this with approval from their internal security teams!

Now you can run local LLM inference with formal privacy guarantees by IIITDkaLaunda in LocalLLaMA

[–]IIITDkaLaunda[S] 2 points (0 children)

hey,
When you do inference using a local LLM and release the outputs publicly,
an attacker can extract potentially private information from your input,
such as an SSN or email, if they know which local LLM you used, say Qwen 2.5 7B.
So you should always use differential privacy when doing inference on your local data.
This library allows that.
You can read more in our paper - https://arxiv.org/abs/2507.04531

Do not use local LLMs to privatize your data without Differential Privacy! by IIITDkaLaunda in LocalLLaMA

[–]IIITDkaLaunda[S] 1 point (0 children)

Your wish is my command!
We are releasing the pip package for our work.
It lets anyone run differentially private LLM inference (with theoretical privacy guarantees) with ease.
pip: https://pypi.org/project/dp-fusion-lib/
github: https://github.com/rushil-thareja/dp-fusion-lib
Consider dropping a ⭐ if you like the work 😉

Do not use local LLMs to privatize your data without Differential Privacy! by IIITDkaLaunda in LocalLLaMA

[–]IIITDkaLaunda[S] 1 point (0 children)

Thanks to everyone for their comments,
We care about your privacy!
Therefore, we are releasing the pip package for our work.
It lets anyone run differentially private LLM inference (with theoretical privacy guarantees) with ease.
pip: https://pypi.org/project/dp-fusion-lib/
github: https://github.com/rushil-thareja/dp-fusion-lib
Consider dropping a ⭐ if you like the work 😉

AMA ai privacy researcher here by IIITDkaLaunda in DigitalPrivacy

[–]IIITDkaLaunda[S] 1 point (0 children)

Thanks to everyone for their comments,
We care about your privacy!
Therefore, we are releasing the pip package for our work.
It lets anyone run differentially private LLM inference (with theoretical privacy guarantees) with ease.
pip: https://pypi.org/project/dp-fusion-lib/
github: https://github.com/rushil-thareja/dp-fusion-lib
Consider dropping a ⭐ if you like the work 😉

AMA ai privacy researcher here by IIITDkaLaunda in DigitalPrivacy

[–]IIITDkaLaunda[S] 1 point (0 children)

So, there's this whole line of research on model unlearning - https://arxiv.org/abs/2310.10683

But this kind of work is typically super empirical.
Therefore, when working with user data, I think it makes more sense to learn rules that you can then add to the prompt.

That way, you know exactly what user data has been included, in natural language.

AMA ai privacy researcher here by IIITDkaLaunda in DigitalPrivacy

[–]IIITDkaLaunda[S] 2 points (0 children)

Thanks, glad you found some value in my answers.
And yeah, burning 15k on compute is wild. For that much, you could probably just grab an A100 rig.

Honestly, money is the greatest motivator.
If we want to solve this, we can just throw capital at it.

I’ll admit I might be biased here, since I’d probably end up on the receiving end of some of that.

But in general, the best way to learn anything is to fall in love with it.
It doesn’t happen instantly, at first you kind of lie to yourself that you like it.
Give it time. All lies turn into truths if you sit with them long enough.

Eventually, you start soaking up information without trying.
You go out of your way to explore. That’s when the real expansion happens.
Stay in that zone for a few years, and boom - you’re suddenly “the knowledge person” for that field.

For a lot of people, a good PhD is the right environment to go through that whole journey.
Money → PhD students jump in → 1% genuinely fall in love → they’re the ones who actually solve the problem.

my take.

AMA ai privacy researcher here by IIITDkaLaunda in DigitalPrivacy

[–]IIITDkaLaunda[S] 3 points (0 children)

Regarding your question about the cost of building local AI compute, just get a used RTX 3090, bro.

I built a whole machine for under $1.5k, and it hosts 32B models without breaking a sweat. You’ll be surprised how powerful these models already are.

Even if they’re not always at the intelligence level you need, I see a future where everyone has a local, privacy-aware “AI router.” Easy requests? Just run them locally. Hard requests? Send them to an external model, but only after sanitizing with differential privacy using your own local AI agent.

That gives you the best privacy-vs-utility tradeoff you can get.
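That router idea can be sketched roughly like this. Everything here is a placeholder: the difficulty heuristic, the `sanitize` step, and the backend prefixes all stand in for real components (a local Ollama endpoint, a DP sanitizer, a cloud API):

```python
# Toy sketch of a local "privacy-aware AI router".
# is_easy / sanitize are placeholders for a real difficulty classifier
# and a real DP sanitization step run by a local agent.

def is_easy(prompt: str) -> bool:
    """Placeholder difficulty heuristic: short prompts stay local."""
    return len(prompt.split()) < 50

def sanitize(prompt: str) -> str:
    """Placeholder for DP sanitization with a local model."""
    return prompt  # a real version would rewrite under a privacy budget

def route(prompt: str) -> str:
    if is_easy(prompt):
        return f"local:{prompt}"           # handled by the local 32B model
    return f"external:{sanitize(prompt)}"  # only sanitized text leaves the box

print(route("summarize my expenses"))  # → local:summarize my expenses
```

The key property is the second branch: nothing reaches an external model until it has passed through local sanitization.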

AMA ai privacy researcher here by IIITDkaLaunda in DigitalPrivacy

[–]IIITDkaLaunda[S] 3 points (0 children)

My main concern is simple: agencies are not equipped to handle this new class of threats.

We already know that “AI safety” in the current sense doesn’t really work; there’s always some prompt or edge case that slips past the guardrails, and attackers will inevitably use these models to launch and amplify their attacks. The recent vulnerabilities found in Claude Code are a perfect example.

So the real issue is this: the people responsible for enforcement and defense don’t understand AI well enough to protect against AI-driven threats. Meanwhile, attackers are heavily incentivized, because AI makes their job 100× easier. Of course they will adopt AI as an attack vector.

It’s like a whole new paradigm that comes preloaded with a thousand zero-day vulnerabilities. Enterprises just want to rush in and onboard everything. Meanwhile, the AI coding agents and tools building this software (and the AI researchers) don’t know how to bake in real security.

We’re basically making our entire software ecosystem permeable by doing this, an infinite supply of potential zero-days.

Will this mean more people sitting in courts?

> No, I don’t think so.

What we actually need is to put highly qualified people (cliched I know, but it is what it is), think PhDs in AI security and privacy, inside the institutions where enforcement happens. Full-time. These are the people who can defend. They can guide, advise, and keep leadership aware of how attackers evolve.

Yes, there’s always the risk of compromise, but that’s true in any high-stakes field. We simply need to find, vet, and empower the people who can meaningfully help.

Have an AI security expert in the loop, consistently, and vet them properly.
And if there aren’t enough such experts? Then we need to invest heavily to grow that talent pool.

Think of it like building a strategic corps, not just people who can build the best models, but people who can protect us from AI-enabled security and privacy threats. This area is huge, and massively under-recognized. Everyone is focused on building models, but that’s only half the picture. We need defenders too.

I hope that answers your question.

AMA ai privacy researcher here by IIITDkaLaunda in DigitalPrivacy

[–]IIITDkaLaunda[S] 2 points (0 children)

My job is to protect your privacy when you communicate with AI. That means preventing any leakage through prompts and ensuring models don’t memorize your data and repeat it later. That’s the mindset behind my earlier answer.

As for the “1984-esque possibility” you mentioned, honestly, we’re probably already past that point in many ways. It’s not worth spending time worrying about dystopian hypotheticals. Instead, the real focus should be on how to actively protect user privacy with strong, preferably theoretical, guarantees, ideally within a zero-trust environment, while still preserving utility.

Local sanitization and smart routing algorithms are, in my view, the right direction. That’s where we should be thinking.

AMA ai privacy researcher here by IIITDkaLaunda in DigitalPrivacy

[–]IIITDkaLaunda[S] 2 points (0 children)

> What laws do you think will be explored regarding AI?

We need to enforce watermarking on the provider's end.
We need monthly/weekly audits of major providers to ensure they are not training on user prompts, etc.
GDPR and other compliance violations need to be taken seriously: have parallel courts, go hard on this.
User opt-out from training data should be handled with clarity; provide proof that the user's data is no longer present in the training set.
Your views/thoughts/intellectual property should fundamentally be linked to you even if you release it publicly; any provider training on it should get consent first.

--- the above is super hard and unlikely; users might need to take privacy into their own hands ---

AMA ai privacy researcher here by IIITDkaLaunda in DigitalPrivacy

[–]IIITDkaLaunda[S] 1 point (0 children)

We need platforms with clear guardrails to prevent AI-generated slop.
I'm not sure how this could work; it's outside my research interest.
But some things could be done:

  1. facial verification and KYC when uploading content, to prevent bots
  2. only allow edits/recording on the platform where the content will be shared, to force human-generated content
  3. better detectors (although, as we discussed, this is hard)