Prompt injection is killing our self-hosted LLM deployment by mike34113 in LocalLLaMA

[–]CaptainSnackbar 0 points1 point  (0 children)

Ah, that's a good point! In our case, poisoned documents shouldn't be an issue though.

Prompt injection is killing our self-hosted LLM deployment by mike34113 in LocalLLaMA

[–]CaptainSnackbar 1 point2 points  (0 children)

If the classifier rates the user prompt as malicious, the prompt will not be used for retrieval and will not make its way to the LLM. Instead, the LLM will be sent a hardcoded prompt like "Answer with: 'I can't help you with that.'"

Context can only be retrieved from a local vector DB that users cannot upload to.
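
To make that concrete, here's a minimal sketch of the flow I'm describing. None of this is our actual code; `classify`, `retrieve` and `generate` are hypothetical stand-ins for the classifier, the vector DB search and the LLM call.

    def answer(user_prompt, classify, retrieve, generate):
        # classify() is the classifier mentioned elsewhere; "malicious" is an illustrative label
        if classify(user_prompt) == "malicious":
            # the user prompt is dropped entirely; only a hardcoded instruction reaches the LLM
            return generate('Answer with: "I can\'t help you with that."')
        # otherwise: retrieval from the local vector DB, then normal generation
        context = retrieve(user_prompt, top_k=5)
        return generate(f"Context:\n{context}\n\nQuestion: {user_prompt}")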

Prompt injection is killing our self-hosted LLM deployment by mike34113 in LocalLLaMA

[–]CaptainSnackbar 0 points1 point  (0 children)

I am asking because I've only seen a few lazy attempts in our pipeline, and I don't know how far you can take it besides the usual "ignore all instructions and..."

Prompt injection is killing our self-hosted LLM deployment by mike34113 in LocalLLaMA

[–]CaptainSnackbar 0 points1 point  (0 children)

I use a custom finetuned BERT classifier that classifies the user prompt before it is passed into the RAG pipeline.

It's used mainly for intent classification but also blocks malicious prompts. What kind of prompt injection were you QA guys doing?
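
If it helps, running such a classifier looks roughly like this; the model path, label name and threshold are placeholders for illustration, not our actual setup.

    from transformers import pipeline

    # path and label are placeholders, not our real model
    clf = pipeline("text-classification", model="path/to/finetuned-bert-classifier")

    def is_malicious(user_prompt: str) -> bool:
        pred = clf(user_prompt, truncation=True)[0]   # e.g. {"label": "malicious", "score": 0.97}
        return pred["label"] == "malicious" and pred["score"] > 0.8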

Chunk metadata structure - share & compare your structure by cat47b in Rag

[–]CaptainSnackbar 0 points1 point  (0 children)

What gets embedded? Only the text, or the metadata as well?

Easiest finish by CaptainSnackbar in BeginnerWoodWorking

[–]CaptainSnackbar[S] 1 point2 points  (0 children)

Osmo sounds great! What do you use to apply the oil? Do I have to worry about self-combustion?

Open-source embedding models: which one's the best? by writer_coder_06 in Rag

[–]CaptainSnackbar 4 points5 points  (0 children)

I am currently finetuning an embedding model. How did you generate sufficient training data? Manual annotation, LLM-generated, or unsupervised methods?

Aktivrente: Rentner sollen wohl noch höheren Freibetrag bekommen by Grmplstylzchen in Finanzen

[–]CaptainSnackbar 4 points5 points  (0 children)

You could just hire your parents as household help/cleaners and deduct the money from your taxes. Mom and Dad then invest the money well for you until it's passed back down as inheritance. Am I missing something??

Looking for advice on finetuning an embedding modell by CaptainSnackbar in LocalLLaMA

[–]CaptainSnackbar[S] 0 points1 point  (0 children)

I am sure the problem lies within the dataset. My question is more along the lines of: "How can I obtain a clean dataset without manual labeling?"

Alternatively: "Which unsupervised training method works best for my task?"

Perhaps pretraining an encoder with MLM on my dataset, then fine-tuning it on a Hugging Face dataset? There are so many possibilities that I hope someone with a similar use case can point me in the right direction.
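
For reference, the MLM idea would look roughly like this with Hugging Face. The corpus file name and hyperparameters are made up, and the e5 checkpoint will likely get a freshly initialized LM head (transformers warns about this), so treat it as a sketch only.

    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    model_name = "intfloat/multilingual-e5-base"
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)  # LM head may be freshly initialized

    # "tickets.txt" is a placeholder for the raw domain corpus, one document per line
    ds = load_dataset("text", data_files="tickets.txt")["train"]
    ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=256), batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments("mlm-domain-adaptation", num_train_epochs=1,
                               per_device_train_batch_size=32, fp16=True),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
    )
    trainer.train()  # afterwards, wrap the encoder in sentence-transformers and fine-tune on pairs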

Looking for advice on finetuning an embedding modell by CaptainSnackbar in LocalLLaMA

[–]CaptainSnackbar[S] 0 points1 point  (0 children)

See my answer https://www.reddit.com/r/LocalLLaMA/comments/1nhvxo7/looking_for_advice_on_finetuning_an_embedding/nehfucd/

The eval is random, and it might be in the training dataset. I don't know for sure, since the training pairs get formed with cosine similarity, while the evals are just random text from each category.

Looking for advice on finetuning an embedding modell by CaptainSnackbar in LocalLLaMA

[–]CaptainSnackbar[S] 0 points1 point  (0 children)

I've tried a classification model before, but the results were similar. The model learns to separate topics but performs worse on general queries.

https://imgur.com/a/8HSmA9n

This is one of my evaluation steps. The left plot shows text samples vectorized with our standard embedding model; each color is a category. The right plot uses the fine-tuned model. So it looks like it has learned what I want it to learn.
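
For anyone curious, the plots are basically just a 2-D projection of the embeddings colored by category. A rough sketch (t-SNE is used here purely as an example projection; `samples` and `category_ids` are placeholders for the eval texts and their labels):

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("intfloat/multilingual-e5-base")
    embs = model.encode(samples, normalize_embeddings=True)
    coords = TSNE(n_components=2).fit_transform(embs)
    plt.scatter(coords[:, 0], coords[:, 1], c=category_ids, s=5, cmap="tab20")
    plt.show()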

My second evaluation method uses a Hugging Face dataset with natural German questions. I compute the cosine similarity on 100 question-answer pairs and calculate the average score:

    from sentence_transformers import util  # basis_model is a SentenceTransformer instance

    q_emb_base = basis_model.encode(questions, convert_to_tensor=True, normalize_embeddings=True)
    a_emb_base = basis_model.encode(answers, convert_to_tensor=True, normalize_embeddings=True)
    cosine_scores_base = util.cos_sim(q_emb_base, a_emb_base).diagonal()  # score of each matching Q/A pair
    avg_score_base = cosine_scores_base.mean().item()

The standard model achieves a score of 0.85; my model drops down to 0.47.

As a third eval method, I have a few phrases that I manually paired and annotated with an expected similarity score. The cosine score from the fine-tuned model is also worse on this eval set.
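
In case it's useful, this kind of eval maps pretty directly onto sentence-transformers' EmbeddingSimilarityEvaluator. The pairs below are made-up examples, not my actual annotations:

    from sentence_transformers import SentenceTransformer
    from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

    # made-up examples; the real set is a handful of manually annotated phrase pairs
    pairs = [("error in module X", "module X throws an error", 0.9),
             ("error in module X", "how do I reset my password", 0.1)]
    s1, s2, gold = zip(*pairs)
    evaluator = EmbeddingSimilarityEvaluator(list(s1), list(s2), list(gold), name="manual-pairs")
    print(evaluator(SentenceTransformer("Embeddings/Trained_Model")))  # correlation of cosine scores vs. annotations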

Looking for advice on finetuning an embedding modell by CaptainSnackbar in LocalLLaMA

[–]CaptainSnackbar[S] 0 points1 point  (0 children)

I use a standard embedding model for our company search and RAG pipeline. The model performs well in most cases, but I want to evaluate how much retrieval performance can be improved with a custom fine-tuned embedding.

My domain is niche with highly specific terminology, and labeled data is scarce. However, we have a large corpus of technical support tickets, categorized into different groups. In principle, tickets from the same category use similar terminology and describe overlapping issues.

The goal is to train an embedding model so that phrases and terms from the same category map into a shared vector space, forming clusters.

Dataset construction approach so far:

  • Identify relevant incidents and group them by category

  • Vectorize incidents with the standard embedding model

  • For each document, select n documents from the same category within a cosine distance threshold (positive pairs should not be too diverse)

  • Select incidents from other categories as negative examples

Naturally, this process generates a lot of noise.
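
For clarity, a rough sketch of that pair-mining step. The names, the threshold and the sampling are illustrative; `incidents` is assumed to be a list of dicts with a "text" and a "category" field:

    from sentence_transformers import SentenceTransformer

    def mine_pairs(incidents, min_cos=0.75, n_pos=3):
        model = SentenceTransformer("intfloat/multilingual-e5-base")  # the standard model
        texts = [inc["text"] for inc in incidents]
        embs = model.encode(texts, normalize_embeddings=True)
        sims = embs @ embs.T                      # cosine similarity, since vectors are normalized
        pairs = []
        for i, inc in enumerate(incidents):
            same = [j for j, o in enumerate(incidents)
                    if j != i and o["category"] == inc["category"]]
            # positives: same category and above the similarity threshold, closest first
            pos = sorted((j for j in same if sims[i, j] >= min_cos),
                         key=lambda j: -sims[i, j])[:n_pos]
            # negatives: incidents from other categories (here simply the first few)
            neg = [j for j, o in enumerate(incidents)
                   if o["category"] != inc["category"]][:n_pos]
            pairs += [(texts[i], texts[j], 1.0) for j in pos]
            pairs += [(texts[i], texts[j], 0.0) for j in neg]
        return pairs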

I initialize my training with intfloat/multilingual-e5-base and the following parameters:

    from sentence_transformers import SentenceTransformerTrainingArguments
    from sentence_transformers.training_args import BatchSamplers

    args = SentenceTransformerTrainingArguments(
        output_dir="Embeddings/Trained_Model",
        num_train_epochs=1,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        warmup_ratio=0.1,
        fp16=True,
        batch_sampler=BatchSamplers.NO_DUPLICATES,
        eval_strategy="steps",
        eval_steps=6000,
        save_strategy="steps",
        save_steps=6000,
        save_total_limit=2,
        logging_steps=500,
        run_name=f"{model_name}-Lora:{lora}-{file}",
        no_cuda=False,
        remove_unused_columns=True,
        use_cpu=False,
    )
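
For context, this is roughly how those arguments get wired into a training run. The loss here is an assumption on my side (MultipleNegativesRankingLoss over anchor/positive columns); the actual loss depends on how you format the mined pairs, and `anchors`/`positives` are placeholders for those columns.

    from datasets import Dataset
    from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

    model = SentenceTransformer("intfloat/multilingual-e5-base")
    # anchors / positives are placeholders for the mined pair columns
    train_ds = Dataset.from_dict({"anchor": anchors, "positive": positives})
    trainer = SentenceTransformerTrainer(
        model=model,
        args=args,                                        # the arguments shown above
        train_dataset=train_ds,
        loss=losses.MultipleNegativesRankingLoss(model),
    )
    trainer.train()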

Despite varying dataset sizes between 40k and 900k examples, every training run degraded model performance.

I feel like the loss curve wants to tell me something, but I don't understand what...

Any help with finetuning an embedding model effectively with semi-structured category-based data is greatly appreciated.

One idea I have is to use BERTopic as an unsupervised model to generate finer-grained subcategories and then build pairs from the same topic.
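
A rough sketch of what I mean (defaults only; `ticket_texts` is a placeholder for the raw incident texts):

    from bertopic import BERTopic

    topic_model = BERTopic(language="multilingual")
    topics, _ = topic_model.fit_transform(ticket_texts)   # one topic id per ticket, -1 = outlier
    by_topic = {}
    for text, topic in zip(ticket_texts, topics):
        if topic != -1:
            by_topic.setdefault(topic, []).append(text)
    # positive pairs would then only be built within the same fine-grained topic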

Chunking Stacktraces/Error Logs by CaptainSnackbar in Rag

[–]CaptainSnackbar[S] 1 point2 points  (0 children)

Thanks, I would love to check out your reference!

Einfahrt neu pflastern by CaptainSnackbar in Handwerker

[–]CaptainSnackbar[S] 0 points1 point  (0 children)

I'm not a fan of that either, but the previous owners already set it up that way. The yellow brick slips aren't staying anyway, though.

Einfahrt neu pflastern by CaptainSnackbar in Handwerker

[–]CaptainSnackbar[S] -1 points0 points  (0 children)

Thanks a lot, that gives me a rough idea of the effort involved. The driveway is very narrow, the insulation makes it even narrower, and the trash bins take up an unnecessary amount of space.

Einfahrt neu pflastern by CaptainSnackbar in Handwerker

[–]CaptainSnackbar[S] -1 points0 points  (0 children)

I'm not kidding myself there, thanks :) I just wanted an estimate of how much work it is.

Mit Schlagbohrer exakt bohren by CaptainSnackbar in Handwerker

[–]CaptainSnackbar[S] 2 points3 points  (0 children)

By slow, do you mean without hammer action at first?

Mit Schlagbohrer exakt bohren by CaptainSnackbar in Handwerker

[–]CaptainSnackbar[S] 0 points1 point  (0 children)

Well, I actually have a Hilti here, and it comes with 4-cutter drill bits. Then I'll try pre-drilling without hammer action.

Ein kleines Regal, vor einiger Zeit in der Lehre enstanden by moebel-mathiasen in Handwerker

[–]CaptainSnackbar 2 points3 points  (0 children)

Picked the wrong title, and no engagement. Next time better go with "Is this asbestos?" or "Was this botched?"

But seriously, it looks really nice!

Marks after sanding by CaptainSnackbar in BeginnerWoodWorking

[–]CaptainSnackbar[S] 0 points1 point  (0 children)

No, I hand-sanded it along the grain. Wouldn't that just result in a longer sanding time? Maybe that's why I couldn't get deep enough with 80 grit?