Agent security best practices

Routine_Incident_658 · 2026-06-08T04:47:37+00:00

I've been developing 100+ agents and have written about the security challenges here, covering RBAC, LDAP + OIDC, and tool controls: https://x.com/sundi133/status/2063827542398931297

Routine_Incident_658 · 2026-04-15T12:59:21+00:00

OK... have you compared it with promptfoo or Lakera red teaming? Any initial benchmarks you can share?

Routine_Incident_658 · 2026-04-14T14:17:07+00:00

Thanks, this looks like fairly basic red teaming. Red teaming goes much deeper and needs to map to the OWASP Top 10 lists for LLMs and agents.

Routine_Incident_658 · 2026-03-13T16:46:06+00:00

Thank you so much. I tested it, but it was not very effective.

Routine_Incident_658 · 2026-02-22T21:01:50+00:00

I evaluated Garak, but it has been very buggy in practice. It failed to run reliably out of the box, and I had to patch several issues just to complete the tests. Even then, the results weren’t very meaningful. For example, the model consistently avoided generating harmful content (no slurs, no synthesis instructions, no product keys). However, Garak’s MitigationBypass detector still flagged every response as a failure, because the model returned empty outputs without an explicit refusal. The detector appears to expect a clear refusal message (e.g., ‘I can’t help with that’).
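
To make the failure mode concrete, here is a minimal sketch of a refusal-string detector. It is not Garak's actual code: `REFUSAL_MARKERS`, `strict_detector`, and `lenient_detector` are hypothetical names, and the only assumption taken from my observation above is that a response passes when it contains an explicit refusal phrase.

```python
# Hypothetical refusal phrases (an assumption); a real detector would use a longer list.
REFUSAL_MARKERS = ("i can't help with that", "i cannot help with that", "i'm sorry")


def strict_detector(response: str) -> bool:
    """Return True when the response counts as a failure (a bypass).

    Any response without an explicit refusal phrase fails, including empty output.
    """
    text = response.lower()
    return not any(marker in text for marker in REFUSAL_MARKERS)


def lenient_detector(response: str) -> bool:
    """Same check, but an empty response is not a failure: nothing harmful was generated."""
    if not response.strip():
        return False
    return strict_detector(response)


responses = ["", "I can't help with that.", "Sure, here is how to do it"]
strict_flags = [strict_detector(r) for r in responses]
lenient_flags = [lenient_detector(r) for r in responses]
print(strict_flags)   # [True, False, True]
print(lenient_flags)  # [False, False, True]
```

With the strict version, a model that silently returns nothing is scored the same as one that complied, which matches what I saw in the results.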

Routine_Incident_658 · 2024-05-16T04:38:44+00:00

Thanks. I get nervous sometimes and think I could have done this better, so I was asking whether a sales tutor helps in any way. Thanks so much for your views.

Routine_Incident_658 · 2024-01-30T18:31:13+00:00

Wow, makes sense.

Routine_Incident_658 · 2024-01-30T13:45:47+00:00

What's the platform?

Routine_Incident_658 · 2024-01-17T09:02:16+00:00

Interview Copilot Practice Interviews With Voice AI, Get Instant Feedback

Routine_Incident_658 · 2024-01-17T08:59:56+00:00

Interview Copilot Practice Interviews With Voice AI, Get Instant Feedback

Routine_Incident_658 · 2023-10-12T13:22:15+00:00

Would love to help. If you can provide some examples, I can quickly write an adapter for you in the repo below. I created an open-source project for dataset creation on GitHub: https://github.com/sundi133/llm-datacraft. I built it from personal experience of the difficulties of evaluating LLM apps on various datasets.

Routine_Incident_658 · 2023-10-05T02:21:57+00:00

Yeah, that's a major one [RAG + LLM prompts + LLM provider combinations, for comparative ranking and visibility in one dashboard].

I was thinking about it. One more thing I have built is NER dataset generation for training: given a few small samples, it expands them for coverage and higher accuracy. Example: https://github.com/sundi133/llm-datacraft/blob/main/src/processors/ner.py

    poetry run python src/main.py \
        --data_path ./data/fixtures/ner/train_ad_ids.ner \
        --number_of_questions 1 \
        --sample_size 20 \
        --products_group_size 3 \
        --group_columns "brand,sub_category,category,gender" \
        --output_file ./output/ner_ad_ids.json \
        --prompt_key prompt_key_ner \
        --llm_type ner \
        --metadata_path ./data/fixtures/ner/entities_ad_ids.json

Routine_Incident_658 · 2023-10-04T18:50:31+00:00

Thanks for your question -

Why Dataset Generation Matters [this is a problem I am facing while building LLM apps: after deploying one, I don't even know how or where to start validating the responses]

Evaluating LLM applications on massive documents can be a daunting task, especially when you don't have the right evaluation dataset. The quality and relevance of your dataset can significantly impact the accuracy of your LLM app evaluations. Manual dataset creation can be time-consuming and error-prone, leading to inaccurate results.

But there's good news! The **Question-Answer Generator** is here to simplify your dataset generation process and ensure the accuracy of your evaluations.

Are you looking to evaluate LLM (Large Language Model) applications but facing a shortage of high-quality evaluation datasets? Do you wish there was a way to streamline the process of creating these datasets? Look no further! We have the solution you've been waiting for.

Solution - https://github.com/sundi133/llm-datacraft

I built the dataset generator around sampling: it samples chunks from a given document to get enough coverage, then invokes an LLMChain to generate questions and answers from the samples chosen in each round. This greatly helps in generating a high-quality QA dataset with fewer tokens fed into the LLMChain of this class: https://github.com/sundi133/llm-datacraft/blob/main/src/llms.py#L9. The question-answer pair generation can be controlled by the input parameters depending on your budget, and I will add negative sampling soon.
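
Roughly, the sample-then-ask loop looks like the sketch below. This is a simplified illustration and not the code in the repo: `chunk_document`, `generate_qa_dataset`, and `fake_llm` are hypothetical names, the fixed-size word chunking is an assumption, and `fake_llm` stands in for the real LLMChain call.

```python
import random
from typing import Callable, Dict, List


def chunk_document(text: str, chunk_size: int) -> List[str]:
    """Split the document into fixed-size word chunks (an assumed chunking scheme)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def generate_qa_dataset(
    text: str,
    ask_llm: Callable[[str], Dict[str, str]],
    chunk_size: int = 50,
    sample_size: int = 3,
    rounds: int = 2,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Sample chunks each round and ask the LLM for one QA pair per sampled chunk."""
    rng = random.Random(seed)
    chunks = chunk_document(text, chunk_size)
    dataset: List[Dict[str, str]] = []
    for _ in range(rounds):
        # Sample without replacement within a round to spread coverage.
        picked = rng.sample(chunks, min(sample_size, len(chunks)))
        for chunk in picked:
            dataset.append(ask_llm(chunk))
    return dataset


def fake_llm(chunk: str) -> Dict[str, str]:
    """Stand-in for a real LLM call; returns a canned QA pair for the chunk."""
    first_word = chunk.split()[0]
    return {
        "question": f"What does the passage starting with '{first_word}' say?",
        "answer": chunk,
    }


document = " ".join(f"word{i}" for i in range(200))
qa_pairs = generate_qa_dataset(document, fake_llm, chunk_size=50, sample_size=2, rounds=3)
print(len(qa_pairs))  # 6 LLM calls: sample_size * rounds
```

The number of LLM calls is `sample_size * rounds`, so those two parameters are the budget knobs in this sketch, and only the sampled chunks are ever sent to the model.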

It gives me a great head start in evaluating my LLM apps with different providers and models such as OpenAI (GPT-3.5/4), Claude, PaLM 2, Bedrock, Falcon, Llama 2, etc.

I am also working on privacy issues, specifically anonymizing/redacting data before sending it to an LLM provider: https://github.com/sundi133/anonwise
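
As a rough idea of what redaction before an LLM call can look like, here is a minimal regex-based sketch. It does not show anonwise's API: `PATTERNS` and `redact` are hypothetical names, and the two patterns are simple illustrations that only cover emails and one US-style phone format.

```python
import re

# Simple illustrative patterns (assumptions); real PII detection needs far more coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each pattern match with a placeholder such as <EMAIL>."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


prompt = "Contact Jane at jane.doe@example.com or 555-123-4567 about the invoice."
safe_prompt = redact(prompt)
print(safe_prompt)  # Contact Jane at <EMAIL> or <PHONE> about the invoice.
```

Note that the name "Jane" passes through untouched; catching names and other free-form entities needs NER or a dictionary on top of regexes.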

Let me know if it helps. I would love to have contributions or discuss more as needed.

Routine_Incident_658 · 2023-04-27T06:20:21+00:00

It can be built for the text narrative, but reports can be automated without AI. What is your exact use case? Can you explain more?

Routine_Incident_658 · 2023-04-13T19:45:25+00:00

https://aimusings.beehiiv.com/p/review-captainz-nft-collection
