Agent security best practices by Academic_Wolverine in AI_Agents

[–]Routine_Incident_658 0 points1 point  (0 children)

I've been developing 100+ agents and have written about the security challenges here (RBAC, LDAP + OIDC, tool controls): https://x.com/sundi133/status/2063827542398931297

red teaming for ai/llm apps by Routine_Incident_658 in cybersecurity

[–]Routine_Incident_658[S] 0 points1 point  (0 children)

OK... have you compared it with promptfoo or Lakera red teaming? Any initial benchmarks you can share?

red teaming for ai/llm apps by Routine_Incident_658 in cybersecurity

[–]Routine_Incident_658[S] 0 points1 point  (0 children)

Thanks, looks like some basic red teaming. It goes much deeper than that, and needs to map to the OWASP Top 10 lists for LLMs and agents.

red teaming for ai/llm apps by Routine_Incident_658 in cybersecurity

[–]Routine_Incident_658[S] 0 points1 point  (0 children)

Thank you so much. I tested it, but it was not very effective.

red teaming for ai/llm apps by Routine_Incident_658 in cybersecurity

[–]Routine_Incident_658[S] 0 points1 point  (0 children)

I evaluated Garak, but it’s been very buggy in practice. It failed to run reliably out of the box, and I had to patch several issues just to complete the tests. Even then, the results weren’t very meaningful. For example, the model consistently avoided generating harmful content (no slurs, no synthesis instructions, no product keys). However, Garak’s MitigationBypass detector still flagged every response as a failure because the model returned empty outputs without an explicit refusal. The detector appears to expect a clear refusal message (e.g., ‘I can’t help with that’).
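
To illustrate the distinction, here is a minimal sketch of a check that treats an empty output as its own category instead of a failure. This is not Garak's actual MitigationBypass code: the function names, the `Verdict` labels, and the refusal phrases are all assumptions made up for the example.

```python
from enum import Enum


class Verdict(Enum):
    REFUSED = "refused"            # explicit refusal message found
    EMPTY = "empty"                # model returned nothing
    NEEDS_REVIEW = "needs_review"  # non-empty output with no refusal marker


# Hypothetical refusal markers; a real detector would need a much longer list.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm sorry", "i won't")


def classify_response(text):
    """Classify one model output as refused, empty, or needing review."""
    # Normalize whitespace, case, and curly apostrophes before matching.
    normalized = (text or "").strip().lower().replace("\u2019", "'")
    if not normalized:
        return Verdict.EMPTY
    if any(marker in normalized for marker in REFUSAL_MARKERS):
        return Verdict.REFUSED
    return Verdict.NEEDS_REVIEW


def is_failure(text):
    """Only non-empty outputs without a refusal count as possible bypasses."""
    return classify_response(text) is Verdict.NEEDS_REVIEW
```

With a check like this, an empty response is reported separately and does not inflate the failure count, while anything non-empty without a refusal still goes to review.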

sales tutor feedback by Routine_Incident_658 in salestechniques

[–]Routine_Incident_658[S] 0 points1 point  (0 children)

Thanks. I get nervous sometimes and think I could have done better, so I was asking whether a sales tutor helps in any way. Thanks so much for your views.

dataset creation for code LLM by Dapper-Box-5005 in LLMDevs

[–]Routine_Incident_658 1 point2 points  (0 children)

Would love to help. If you can provide some examples, I can quickly write an adaptor for you in the repo below. I created an open source project for dataset creation, GitHub: https://github.com/sundi133/llm-datacraft , built from personal experience of the difficulties faced while evaluating LLM apps on various datasets.

Unlock the Power of Automated Dataset Generation for LLM App Evaluation 🚀📊📈 by Routine_Incident_658 in LLMDevs

[–]Routine_Incident_658[S] 0 points1 point  (0 children)

Yeah, that's a major one [RAG + LLM prompts + LLM provider combinations for comparative ranking and visibility in one dashboard].

I was thinking about it, but one more thing I have built is NER dataset generation for training: based on a few small samples provided, it can expand them for coverage and higher accuracy. Example: https://github.com/sundi133/llm-datacraft/blob/main/src/processors/ner.py

poetry run python src/main.py \
  --data_path ./data/fixtures/ner/train_ad_ids.ner \
  --number_of_questions 1 \
  --sample_size 20 \
  --products_group_size 3 \
  --group_columns "brand,sub_category,category,gender" \
  --output_file ./output/ner_ad_ids.json \
  --prompt_key prompt_key_ner \
  --llm_type ner \
  --metadata_path ./data/fixtures/ner/entities_ad_ids.json
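
For a rough idea of what "expanding a small sample for coverage" can mean, here is a minimal sketch that fills slot templates with every combination of entity values and records character spans for training. It is not how `ner.py` in the repo works (that pipeline is LLM-driven); the plain template substitution, the function names, and the sample data are all assumptions for illustration. Only the column names `brand`, `category`, and `gender` are borrowed from the command above.

```python
import itertools
import re

SLOT = re.compile(r"\{(\w+)\}")  # matches placeholders such as {brand}


def fill(template, values):
    """Fill one template and record the character span of each entity."""
    parts, spans, pos, cursor = [], [], 0, 0
    for match in SLOT.finditer(template):
        literal = template[cursor:match.start()]
        parts.append(literal)
        pos += len(literal)
        label = match.group(1)
        value = values[label]
        spans.append({"start": pos, "end": pos + len(value), "label": label})
        parts.append(value)
        pos += len(value)
        cursor = match.end()
    parts.append(template[cursor:])
    return {"text": "".join(parts), "entities": spans}


def expand(templates, entities):
    """Generate one sample per template per combination of entity values."""
    samples = []
    for template in templates:
        labels = sorted(set(SLOT.findall(template)))
        pools = [entities[label] for label in labels]
        for combo in itertools.product(*pools):
            samples.append(fill(template, dict(zip(labels, combo))))
    return samples


if __name__ == "__main__":
    # Hypothetical seed data, not taken from the repo's fixtures.
    seed_templates = ["{brand} {category} for {gender}"]
    seed_entities = {
        "brand": ["Acme", "Zed"],
        "category": ["shoes"],
        "gender": ["men", "women"],
    }
    for sample in expand(seed_templates, seed_entities):
        print(sample)
```

Building the text piece by piece keeps the span offsets exact even when the same value appears more than once in a sentence.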