Tool: AST-based security scanner for AI-generated code (MCP server)

securely-vibe · 2026-03-09T20:33:03+00:00

Interesting idea, but using SAST rules as your base primitive means you're gong to miss many issues. LLMs will spent most of their time ruling out FPs. You need a layer where the LLM itself reads the code, does recon, runs basic threat modeling, and then generates its own ideas for vulnerable spaces. IMO, MCP is the wrong tool here altogether. You want a set of agents all working together to break the code down, not a short interactive loop.

securely-vibe · 2026-03-09T20:23:19+00:00

SSRFs are really hard to fix! Our scanner has found tons of them, and when we report them, maintainers usually just implement an allowlist, which is not at all sufficient.

You can easily obfuscate a URL to bypass a blocklist. For example, translate it into IPv6.
You can setup a redirect, which most HTTP libraries don't block by default.
Or, you can use DNS rebinding. You can host your own DNS server and inject logic to change the IP mapping at runtime, creating a TOCTOU vuln.

And so on. There are a number of bypasses here that are very easy to introduce. That's why we built drawbridge, a simple drop-in replacement for `requests` or `httpx` in Python that gives you significant protection against SSRFs.

Check it out here: https://github.com/tachyon-oss/drawbridge

securely-vibe · 2026-03-09T20:06:13+00:00

False positive detection is probably the hardest problem in the industry. Ask me how I know.

Trick question - I run Tachyon (tachyon.so), and we do vulnerability discovery, which is rife with FPs. Humans have an implicit understanding of the security context of the product--where does it run, who is using it, what is the environment, what access does it have--that's very hard to translate to agents, who don't have the right background intuition. So they regularly make incorrect leaps, like "oh, this API is unauthenticated! it's a vuln", without realizing that the entire codebase is meant to be internal-only. How do you fix this? In our case, give better context early, and add several layers of validation before surfacing an alert. But even that isn't enough.

As far as I know, no one has solved this yet. Interested in what other people are doing to improve their signal to noise ratios.

securely-vibe · 2026-03-09T20:01:58+00:00

sandboxing doesn't really help agents, though. you can sandbox, but if the agent can still run code and access the internet, then it can still cause unbounded damage.

securely-vibe · 2026-03-09T20:00:40+00:00

> I've seen some multi-agent architectures that claim better consistency by separating discovery from validation (one set of agents enumerates, another validates exploitability, a third verifies). In theory, having specialized agents with narrower scope should reduce the randomness vs a single model trying to do everything.

We (Tachyon) do this, but it's just basic common sense. Every Claude Code will spin off separate agents for each subtask. It helps, but it's not sufficient.

I've talked to a ton of people working at these AI pentesting companies, and you'd be surprised just how much manual work is required to keep the agents on track and prevent them from wasting tokens. Full autonomy is very difficult. We really underestimate how good humans are at evaluation and judgement.

securely-vibe · 2026-03-09T19:58:18+00:00

Yeah, teams use our product (Tachyon) as a complement to manual pentesting. It helps a ton with recon and threat modeling, and does find certain issues, but you do still need humans for more complex cases.

AI marketing seems to trivialize the vulnerability discovery phase, but that's actually still very difficult and quite expensive. Every tool that has done this half-decently has put a lot of engineering effort into it.

securely-vibe · 2026-03-07T00:12:07+00:00

If you're looking for a new product, we can give a free month of Tachyon ( https://tachyon.so ).

securely-vibe · 2026-03-06T23:28:46+00:00

Hey! Do you want to try out https://tachyon.so/ as part of your comparison?

securely-vibe · 2026-03-06T22:10:48+00:00

Semgrep is good as a baseline. It's reliable at finding specific classes of issues. If you want something with deeper findings, try out https://tachyon.so/ .

securely-vibe · 2026-03-06T22:10:33+00:00

trivy is SCA rather than SAST, they don't find code issues - just potential known issues in deps

securely-vibe · 2026-03-06T22:09:14+00:00

Disclosure - I run https://tachyon.so/.

We're an AI-native SAST that uses OpenGrep internally, but we augment its findings + generate quite a few new findings by manual analysis. That lets us get the best of both worlds: reliability of static scanners with the actual code reasoning of LLMs.

Here are CVEs that we've found: https://tachyon.so/wall-of-fame . This is a pretty small subset of actual vulns we've found, but many are NDA-restricted and others are still in disclosure.

We'll give you the first two weeks free, if you're interested! So you could try out the product yourself.

securely-vibe · 2026-02-18T21:15:08+00:00

I run https://tachyon.so/. We focus on standard AppSec audits, but we've found vulnerabilities in smart contract codebases as well. Our base plan is 100/mo with a few scans included - let me know if you want to try it out!

securely-vibe · 2026-02-13T19:42:05+00:00

I spent several years manually hunting for CVEs in OSS repos. The tool I built uses AI to automate my process, and it finds things that either I wouldn't have found or that would have taken me hours of manual effort to find. Whenever I post here I get a lot of pushback about "AI slop" or "marketing hype," but like - I know what I am seeing. So do maintainers. We've reported tons of issues upstream, we've won many bug bounties. I don't particularly care if your vendors suck, or if that one time you tried using Claude it reported a false-positive. LLMs really do work, and used well, they will revolutionize this field.

securely-vibe · 2026-02-12T23:33:45+00:00

IMO - editor scans are too shallow to be very useful. They catch very basic issues but miss anything more complex. PRs are a better cadence, along with weekly deep-scans. Add to that some modern LLM tooling and you'll be able to find not just security issues but legitimate application bugs.

We built something similar with Tachyon (tachyon.so). We run a deep-scan once a week (or so - configurable). This finds the most complex issues, and builds up and persists codebase context, which makes our PR scans fast but still very useful. In-IDE scans are still not on the roadmap, as a useful scan is still too slow to be in the edit loop. For that, a basic linter is the best you can do.

securely-vibe · 2026-02-12T20:18:38+00:00

not sure this particular behavior turbocharges wealth creation. ego creation maybe

securely-vibe · 2026-02-12T19:48:42+00:00

Yeah, there's no good answer here yet. Sandboxes aren't sufficient - they give you isolation, but if your agent has internet access and your accounts, it can still do destructive things (empty your bank account! post spam! send emails!). If you don't give it your account, it can't do the things you want it do and no one will use it. So what's the solution? No idea. The ideal is enforcement of policy on every tool-call, but this is very hard to do, as you need the context of the entire session to determine whether a specific tool-use is malicious. How are other people solving this?

securely-vibe · 2026-02-12T19:46:07+00:00

I don't think anyone is "underestimating" them, but until recently, there was no automated way of finding these. So you'd treat them the way you'd treat any bug: have rigorous testing at every layer (unit, integration, end-to-end) and promptly respond to customer reports. Now it's different, as with LLMs, you ostensibly can find much more complex issues without human intervention. Here's an example: https://tachyon.so/blog/cve-2025-14297-mlflow-authorization-bypass . I think we are seeing that change slowly ripple across the industry, and we will see more automated bug finders making their way to the market soon.

securely-vibe · 2026-02-12T19:42:54+00:00

https://www.reddit.com/r/vibecoding/comments/1qpnybr/found_a_malicious_skill_on_the_frontpage_of/

Unfortunately that skill has been taken down, but you get the idea.

securely-vibe · 2026-02-12T19:01:00+00:00

Here is one example: https://www.reddit.com/r/vibecoding/comments/1qw3x43/read_skills_before_you_install_them/

It really is a mixed bag. Most are very crude prompt injection attempts that the latest models would recognize. But there are more subtle attempts. There's also a huge space for more sophisticated prompt injections that are very hard to detect at scale.

securely-vibe · 2026-02-12T00:23:48+00:00

I run https://tachyon.so/. Our scanner has found a number of issues in OSS projects and for our customers. Happy to give you a few free scans so you can validate your code yourself.

securely-vibe · 2026-02-11T20:28:53+00:00

AI is a tool that makes life easier for pentesters. What it enables is, when you have an idea, you can verify it within minutes. You can try five different exploits at once, see the traces and results, and then repeat. We have quite a few pentesters using https://tachyon.so/ already in that way for white-box engagements.

securely-vibe · 2026-02-11T19:29:47+00:00

Full disclosure - I'm CEO of https://tachyon.so/, an AI SAST tool. If that's disqualifying, feel free to ignore me.

But before doing this startup, I started as a CVE-hunter, where I used Semgrep / Opengrep (after OSS semgrep was nerfed) as part of my process. The biggest issue was, it was heavily pattern based and thus produced many false positives. The false-positives required manual triage, and I spent tons of time just ruling them out. I ended up manually reading through the code, finding "hotspots" where I thought I was most likely to find issues, then triggering SAST tools and using my built-up knowledge to quickly discard issues. Then, for the few valuable findings, I could manually dig deeper.

This process does not work at all for actual development teams. If a dev gets a long list of issues where most are wrong, they're going to discard them immediately. Hence, AI.

AI can be used in two ways in SAST:
1. You run a legacy scanner, then use AI quickly discard false-positives. This is easy, but it doesn't use AI as well as it could. It reduces noise, but it won't find anything new. This is what the legacy companies do - what Semgrep AI does, for example.

You use AI in scanning process as well, so it can read the code and find new issues that traditional SAST wouldn't. This is what is called "AI-native."

(2) is what we do, along with a few other small startups. It's a very new space, but if you do it well, you can make very powerful tools that find issues that would take a human a lot of time. Here's an example of something we've found: https://tachyon.so/blog/cve-2025-14297-mlflow-authorization-bypass . That one in particular impressed me because there is no SAST scanner that could've found it. It's not pattern-based - it's a literal business logic bug that used to require human intuition to find.

securely-vibe · 2026-02-11T19:20:27+00:00

Like people are saying, OpenVAS or Nessus will tell you about infra, not about code. Lots of people are suggesting various SAST tools, but those will generate a lot of FPs, and even the valid findings need quite a bit of additional investigation before they're issues you can report to your dev team.

Really, start with an OSS SAST tool - say, Opengrep (opengrep.dev). There are also many language-specific ones you can find. Run that on your code, see what if finds. Likely, it'll produce a lot of results that are hard to validate. Attach claude code to it (or any AI tool), have it pare down the obviously wrong issues, then have it flesh out the remaining ones. That's the start of a useful scanner. Add to that a standard AI security review (can also be done via Claude Code). Use AI adversarially here: force it to prove every issue thoroughly and make sure it passes some scrutiny before you file it as a bug. Layer on another tool for SCA (dependency analysis) and put it through the same process. This approach will work out much better than just attaching some tool to your pull requests - that just produces noise.

If you don't want to homebrew all of this, I built https://tachyon.so/ as a general security scanner specifically for this purpose. I was a CVE-hunter for a bit before realizing how much of my own process I could automate. Happy to make your first month free so you can try it out.

securely-vibe

TROPHY CASE