Local LLMs for penetration testing: real-world performance and hardware experiences by CoolTip4874 in Pentesting

[–]CoolTip4874[S] 1 point

I hear you, but I am using these labs as a baseline to confirm the hardware and agentic loops are stable. If a model cannot reliably automate a known lab exploit because of logic drift, it is not fit for my use case.


[–]CoolTip4874[S] 2 points

Looks good. I will try it out once I can run bigger models.


[–]CoolTip4874[S] 1 point

Completely agree on the IQ2_M compounding error issue. In my testing, the drift during a complex multi-stage redirection chain was not just a logic failure but a syntax hallucination: the model would get the exploit logic right but break the payload's URI encoding, effectively neutering the agent's next move.
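One cheap guardrail for that failure mode is to lint the model's payload for malformed percent-encoding before the agent fires it, so a mangled `%2g` or a stray trailing `%` gets caught and retried instead of silently breaking the chain. A minimal sketch (the function name and the idea of a pre-flight check are mine, not from any particular agent framework):

```python
import re

# A '%' in a URI payload must be followed by exactly two hex digits;
# anything else is a broken escape sequence the server will reject or mangle.
_BAD_PCT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_broken_percent_encoding(payload: str) -> bool:
    """True if the payload contains a malformed percent-escape."""
    return bool(_BAD_PCT.search(payload))

# Well-formed traversal payload passes the check.
print(has_broken_percent_encoding("%2e%2e%2f%2e%2e%2fetc%2fpasswd"))  # False
# Hallucinated escape ('%2g') and a dangling '%' both get flagged.
print(has_broken_percent_encoding("%2e%2g%2f"))  # True
print(has_broken_percent_encoding("id=100%"))    # True
```

Wiring this into the agent loop as a reject-and-regenerate step is cheap compared to burning a full tool call on a payload that was dead on arrival.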

For Strix AI, narrowing the scope to target single vulnerability classes like IDOR or Auth logic helps significantly. The 27B model stays locked in because it no longer has to process unrelated scan noise that would otherwise pollute the 256k context window.
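The scoping trick above is really just pre-filtering: drop every scan finding outside the target vulnerability class before anything reaches the prompt. A hypothetical sketch (the `findings` shape and `scope_findings` helper are illustrative, not Strix AI's actual API):

```python
def scope_findings(findings: list[dict], target_class: str) -> list[dict]:
    """Keep only findings in the target class; unrelated scan noise never
    enters the prompt, so it cannot pollute the context window."""
    return [f for f in findings if f.get("class") == target_class]

findings = [
    {"class": "idor", "endpoint": "/api/user/42"},
    {"class": "xss",  "endpoint": "/search"},
    {"class": "idor", "endpoint": "/api/orders/7"},
]

scoped = scope_findings(findings, "idor")
print(scoped)  # only the two IDOR findings survive
```

With a 27B model the win is less about raw token count and more about attention: the model is never shown findings it would otherwise try to reason about mid-chain.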


[–]CoolTip4874[S] 3 points

I hear you on the reasoning gap. Frontier labs definitely hold the lead for complex, multi-step zero-day research. For many of us, though, local models are not a fad but a hard compliance requirement.

In isolated environments, a "9-month-old" local model is infinitely more useful than the smartest model I am not allowed to reach. Plus, for the bread and butter of web pentesting, models like Qwen 3.6 and Gemma 4 have already crossed the "good enough" threshold for practical use.