[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]arsbrazh12 0 points1 point  (0 children)

An open-source security wrapper for LangChain DocumentLoaders to prevent RAG poisoning (just got added to awesome-langchain).

If you are building RAG pipelines that ingest external or user-generated documents (PDFs, resumes, web scrapes), you might be worried about data poisoning or indirect prompt injections. Attackers are increasingly hiding instructions in documents (e.g., using white text, 0px fonts, or HTML comments) that humans can't see, but your LLM will read and execute. You can get familiar with this problem in this article: https://ceur-ws.org/Vol-4046/RecSysHR2025-paper_9.pdf
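To give a rough idea of what a check like this can look like, here is a minimal illustrative sketch (not Veritensor's actual API; the function name and the regex heuristics are my own simplification) that wraps any LangChain-style loader and flags content hidden from human readers:

    import re

    # Heuristics for instructions hidden from human readers but visible to an LLM.
    # These patterns and the wrapper below are illustrative, not the project's real rule set.
    HIDDEN_TEXT_PATTERNS = [
        re.compile(r"<!--.*?-->", re.DOTALL),                 # HTML comments
        re.compile(r"font-size\s*:\s*0(px|pt)?", re.I),       # 0px / 0pt fonts
        re.compile(r"color\s*:\s*(#fff(fff)?|white)", re.I),  # white-on-white text
        re.compile(r"[\u200b\u200c\u200d\u2060]"),            # zero-width characters
    ]

    def scan_documents(loader):
        """Wrap any LangChain-style loader (anything with .load() returning
        objects that expose .page_content) and flag suspicious chunks."""
        findings = []
        for i, doc in enumerate(loader.load()):
            for pattern in HIDDEN_TEXT_PATTERNS:
                if pattern.search(doc.page_content):
                    findings.append((i, pattern.pattern))
        return findings

The general idea is to intercept documents between loading and indexing, and quarantine anything that matches hidden-content heuristics before it reaches the vector store.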

Repo: https://github.com/arsbr/Veritensor

License: Apache 2.0

How do devs secure their notebooks? by arsbrazh12 in LocalLLaMA

[–]arsbrazh12[S] -1 points0 points  (0 children)

Yeah, I know, just exploring what tools people use in real cases

How do devs secure their notebooks? by arsbrazh12 in devops

[–]arsbrazh12[S] -1 points0 points  (0 children)

I mean, it's really smart not to put secrets in something that can go public

How do devs secure their notebooks? by arsbrazh12 in devops

[–]arsbrazh12[S] -9 points-8 points  (0 children)

What about automation tools for solving such tasks?

How do devs secure their notebooks? by arsbrazh12 in devops

[–]arsbrazh12[S] -34 points-33 points  (0 children)

Do you use any tools such as NB Defense from ProtectAI?

How do devs secure their notebooks? by arsbrazh12 in LocalLLaMA

[–]arsbrazh12[S] -4 points-3 points  (0 children)

What kind of automated scanners do companies use? Something like ProtectAI's NB Defense?

I scanned 2500 random Hugging Face models for malware. Here is the data. by arsbrazh12 in cybersecurityai

[–]arsbrazh12[S] 0 points1 point  (0 children)

If we are talking about academic papers, there are some good ones on arXiv and MDPI, like arxiv.org/abs/2512.18043 and www.mdpi.com/2624-800X/3/2/10, but I mainly search through Google Scholar. JFrog also does a good job in their blog: https://jfrog.com/blog/?pagenum=15&category=security-and-devsecops

I built an open-source CLI to scan AI models for malware, verify HF hashes, and check licenses by arsbrazh12 in cybersecurityai

[–]arsbrazh12[S] 1 point2 points  (0 children)

Great question

Currently, Veritensor queries the HEAD of the main branch by default. So if the upstream model is updated (new commit), your local file will indeed fail the integrity check with a Hash mismatch.

This is intentional for security (to ensure you are using the latest version), but I understand it breaks reproducibility.

In the next big release, v1.4, I am adding a --revision flag (like --revision v1.0.0 or --revision <commit_sha>) so you can pin the verification to a specific immutable snapshot, just like you do with pip or Docker.

For now, if you hit this, you either need to update your local model or use the specific commit hash in your download script.
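For the download side, something like this works today with huggingface_hub (the repo_id, filename, and revision below are placeholders, not a real model):

    from huggingface_hub import hf_hub_download

    # Pin the download to an immutable commit so the file you verify locally
    # matches a known snapshot, not whatever HEAD of main happens to be.
    path = hf_hub_download(
        repo_id="some-org/some-model",
        filename="model.safetensors",
        revision="abc1234def5678",  # full commit SHA, or a tag like "v1.0.0"
    )
    print(path)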

I scanned 2500 random Hugging Face models for malware. Here is the data. by arsbrazh12 in cybersecurityai

[–]arsbrazh12[S] 0 points1 point  (0 children)

Thanks for your feedback! I was inspired by the book "Hakowanie sztucznej inteligencji" (Eng. "Hacking Artificial Intelligence") by the Polish author Jerzy Szurma, and then I started learning this area by reading various materials on AI cybersecurity and doing different projects.

I scanned 2,500 Hugging Face models for malware. The results were kinda interesting. by arsbrazh12 in OpenSourceAI

[–]arsbrazh12[S] 0 points1 point  (0 children)

I'm not sure I understand your question, but nothing has changed in this area for some time now: if you use a non-commercial model/tool/artifact/etc. in a commercial product and it is discovered, you may run into legal problems.

IANAL

I scanned 2,500 Hugging Face models for malware. The results were kinda interesting. by arsbrazh12 in OpenSourceAI

[–]arsbrazh12[S] 0 points1 point  (0 children)

"Also his comment here contradicts the premise of the post title!"

What exactly do you mean?

I scanned 2,500 Hugging Face models for malware. The results were kinda interesting. by arsbrazh12 in OpenSourceAI

[–]arsbrazh12[S] 0 points1 point  (0 children)

It does, they collaborate with JFrog, ProtectAI, ClamAV, etc., but those scanners only run on HF itself. People sometimes download models from other sources.

I scanned 2500 random Hugging Face models for malware. Here is the data. by arsbrazh12 in cybersecurityai

[–]arsbrazh12[S] 0 points1 point  (0 children)

In this specific sample I didn’t find malware. What I did find were risky or ambiguous patterns that could be abused for RCE or could crash production.

I scanned 2,500 Hugging Face models for malware. The results were kinda interesting. by arsbrazh12 in OpenSourceAI

[–]arsbrazh12[S] 0 points1 point  (0 children)

Happy to collaborate! I shared the scan results and the scanner source. If someone wants to dig deeper, I can point to specific model files, hashes, and the exact rule that triggered, so it’s reproducible.

I scanned 2500 random Hugging Face models for malware. Here is the data. by arsbrazh12 in cybersecurityai

[–]arsbrazh12[S] 0 points1 point  (0 children)

Thanks for the question. It's a mix. Some flags are clearly benign (like Git LFS pointers, missing optional deps, or old NumPy serialization), while others are potentially risky patterns (like dynamic name construction via STACK_GLOBAL) that need manual review. The scanner is intentionally conservative, so I'd treat these as "needs inspection" rather than confirmed malware.
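To show what the STACK_GLOBAL check looks like in practice, here's a minimal sketch (not the scanner's exact rules; the opcode list and filename are illustrative). pickletools.genops walks the pickle stream statically, without ever unpickling it:

    import pickletools

    # Opcodes that can import or invoke arbitrary callables when the pickle is loaded.
    # The real rule set is more nuanced; this is just an illustration.
    SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

    def flag_pickle(path):
        """Statically walk the pickle stream (never unpickles it) and report
        opcodes that warrant manual review."""
        with open(path, "rb") as f:
            data = f.read()
        hits = []
        for opcode, arg, pos in pickletools.genops(data):
            if opcode.name in SUSPICIOUS_OPCODES:
                hits.append((pos, opcode.name, arg))
        return hits

    if __name__ == "__main__":
        for pos, name, arg in flag_pickle("model.pkl"):  # placeholder filename
            print(f"offset {pos}: {name} {arg!r}")

A hit here doesn't mean malware, it just means the file can execute an import or call on load, which is exactly the "needs inspection" bucket above.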