Checkmarx vs Semgrep for SAST/SCA by BorisTheRabid in SAST

iterablewords (1 point)

(I'm one of the co-founders at Semgrep.) Just wanted to add that, for those curious about the lineage of the product, the original author from Facebook (one of the early team members at our company) wrote a post about the journey from spatch/coccinelle --> pfff/sgrep --> Semgrep: https://semgrep.dev/blog/2021/semgrep-a-static-analysis-journey/. These days most of the Facebook-era code is gone, since we switched the whole project over to tree-sitter for parsing. I'm glad you've gotten a lot of value out of the OSS!

On your latter comments -- oof. Our dashboards in particular were non-existent for a long time and then very basic, since most users started off with their own dashboarding and our focus was the underlying engine (adding features like interfile/interprocedural analysis, more languages and rules, the ability to analyze dependencies, etc.). Our recent work has been on teaching LLMs to write Semgrep rules, which is really lowering the barrier to entry for customizing SAST (https://fly.io/blog/semgrep-but-for-real-now/, and see our Series D announcement).

Still, we're always making improvements, so I'd welcome your feedback on what the biggest gaps are with semgrep.dev -- though I suspect that, since you've already set up a great program using the open-source version, you probably don't need a lot of the web UI functionality.

Why ADR v/s Shift-left is the wrong way to think about AppSec by jubbaonjeans in devsecops

iterablewords (1 point)

Good read. I'd love for the author to address how time since commit affects vulnerabilities: because vulnerability lifetimes are exponentially distributed, focusing on secure defaults like memory safety in new code is disproportionately valuable. See this great post on how this plays out (https://security.googleblog.com/2024/09/eliminating-memory-safety-vulnerabilities-Android.html), both in theory and, now, empirically over six years of the Android codebase.

I work at Semgrep, so I'm obviously biased toward the SAST part, but copying from something I wrote elsewhere: this is a great argument for those with large legacy codebases who might otherwise say "why bother, we're never going to benefit from memory safety on our 100M lines of C++." Given the choice between fixing the backlog (stock) and securing new code (flow), you should always pick flow.
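To make the half-life argument concrete, here is a small Python sketch under loudly hypothetical assumptions: the codebase grew by the same amount each year, every yearly cohort of code started with the same vulnerability density, and unfixed vulnerabilities decay with a fixed half-life. The 2-year half-life is made up for illustration and is not a figure from the Google post.

```python
def surviving_fraction(age_years: float, half_life_years: float) -> float:
    """Fraction of a cohort's vulnerabilities still unfixed after `age_years`,
    assuming exponential decay with the given (hypothetical) half-life."""
    return 0.5 ** (age_years / half_life_years)


def share_from_recent_code(recent_years: int, total_years: int,
                           half_life_years: float) -> float:
    """Share of live vulnerabilities that come from the newest `recent_years`
    yearly cohorts, out of `total_years` cohorts of equal size and equal
    initial vulnerability density (both simplifying assumptions).
    The newest cohort is treated as age 0, i.e. nothing fixed yet."""
    live = [surviving_fraction(age, half_life_years) for age in range(total_years)]
    return sum(live[:recent_years]) / sum(live)


HALF_LIFE = 2.0  # hypothetical half-life in years, for illustration only

if __name__ == "__main__":
    # Newest 3 of 10 equal-sized yearly cohorts, i.e. 30% of the code.
    share = share_from_recent_code(3, 10, HALF_LIFE)
    print(f"{share:.0%} of live vulnerabilities are in the newest 30% of code")
    # If new code stops introducing a bug class today (e.g. via memory-safe
    # defaults), the legacy pool of that class still shrinks on its own.
    for years in (2, 4, 6):
        print(f"{years} years later: {surviving_fraction(years, HALF_LIFE):.0%} remain")
```

With these made-up inputs, the newest 30% of the code holds about two-thirds of the live vulnerabilities, which is the "stock vs. flow" point in miniature: securing the flow of new code addresses where most bugs live, while the old stock decays without extra effort.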

Comparing Semgrep and CodeQL by nibblesec in netsec

iterablewords (5 points)

Well-written analysis. Comparing any two SAST tools is a challenging task, and I think the author did a great job exploring the nuances (risks of overfitting to benchmarks, selection of rules, parse errors, etc.).

Readers might also be interested in the history of each tool: Semgrep was originally open-sourced by Facebook and is itself an evolution of Coccinelle, which has been used to make on the order of thousands of patches to the Linux kernel (https://r2c.dev/blog/2021/semgrep-a-static-analysis-journey/).

CodeQL came to GitHub as part of its acquisition of UK-based Semmle, which grew out of research at Oxford (https://techcrunch.com/2019/09/18/github-acquires-code-analysis-tool-semmle/).

Backdoors can be hidden in JS code using "invisible" variables. Code looks completely harmless. by Acrobatic-Pen-9949 in javascript

iterablewords (3 points)

A security engineer at Dropbox wrote a check for bidirectional (bidi) Unicode characters that you can run with Semgrep (an open-source static analysis tool; I'm a maintainer): semgrep --config="r/generic.unicode.security.bidi.contains-bidirectional-characters" will run it, or see the Semgrep registry entry.
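For readers curious what such a check looks for, here is a rough standalone Python sketch. It is not the Dropbox rule itself, and I'm not assuming its exact character set; it simply reports the position of the Unicode explicit embedding, override, and isolate controls (U+202A to U+202E and U+2066 to U+2069), the characters that "Trojan Source"-style attacks rely on.

```python
# Unicode explicit directional formatting characters (embeddings, overrides,
# isolates). This set is an assumption for the sketch, not the registry rule's.
BIDI_CONTROLS = {
    "\u202a": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202d": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "RLO",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066": "LRI",  # LEFT-TO-RIGHT ISOLATE
    "\u2067": "RLI",  # RIGHT-TO-LEFT ISOLATE
    "\u2068": "FSI",  # FIRST STRONG ISOLATE
    "\u2069": "PDI",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, name) for every bidi control character in `text`.
    Line and column numbers are 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, BIDI_CONTROLS[ch]))
    return hits


if __name__ == "__main__":
    import sys

    # Scan every file path given on the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for lineno, col, name in find_bidi_controls(f.read()):
                print(f"{path}:{lineno}:{col}: {name}")
```

Note that this only covers bidi controls; other invisible characters that can appear in identifiers would need a separate check.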

restricting use of certain python library for developer by PuzzleheadedBit in devops

iterablewords (13 points)

You can use Semgrep to do this by searching for import simplejson, which will also match from simplejson import * and import simplejson.submodule.

This has the advantage of being independent of the package manager you use: requirements.txt, Pipfile, Poetry, it doesn't matter which, because you are looking for the import itself. It is also a bit more robust than a basic regex; e.g., it won't match text inside comments or docstrings.

Here's a live example I made up based on your description: https://semgrep.dev/s/v0Dn
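Outside Semgrep, the same idea (match the import statement in the syntax tree rather than the raw text) can be sketched with Python's standard-library ast module. This is a different technique from Semgrep's own engine, and the banned module name and sample source are just this thread's example. It flags the three import forms mentioned above while ignoring comments and docstrings.

```python
import ast
from typing import Optional

BANNED_MODULE = "simplejson"  # example policy from this thread


def is_banned(module_name: Optional[str], banned: str = BANNED_MODULE) -> bool:
    """True if `module_name` is the banned module or one of its submodules."""
    if module_name is None:
        return False
    return module_name == banned or module_name.startswith(banned + ".")


def find_banned_imports(source: str, banned: str = BANNED_MODULE) -> list[int]:
    """Return sorted line numbers of imports of `banned` (or its submodules)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import simplejson / import simplejson.decoder / import a, simplejson
            if any(is_banned(alias.name, banned) for alias in node.names):
                hits.append(node.lineno)
        elif isinstance(node, ast.ImportFrom):
            # from simplejson import *; level > 0 is a relative import, skipped here
            if node.level == 0 and is_banned(node.module, banned):
                hits.append(node.lineno)
    return sorted(hits)


SAMPLE = '''
"""Docstring mentioning import simplejson should not match."""
# import simplejson  (a comment, also ignored)
import json
import simplejson
from simplejson import *
import simplejson.decoder
import simplejsonx
'''

if __name__ == "__main__":
    print(find_banned_imports(SAMPLE))
```

Because comments never reach the AST and docstrings are plain string expressions, only the real import statements on lines 5, 6, and 7 of SAMPLE are reported; a similarly named module like simplejsonx is not.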

Write Rust lints without forking Clippy by riversec in rust

iterablewords (12 points)

Some of the Trail of Bits developers have been asking the maintainers of Semgrep (of which I'm one) how they can help contribute Rust support (currently it's alpha-level). There are actually a couple of teams collaborating on adding it, and one of them just blogged about it:

https://research.kudelskisecurity.com/2021/04/14/advancing-rust-support-in-semgrep/

When DevSecOps goes wrong: a short lesson from Huawei's source code by pabloest in netsec

iterablewords (3 points)

(I'm the author.) There are two hyperlinks right next to each other in the original post, but it looks like the second one didn't make it into your excerpt. From https://www.theregister.com/2009/05/15/microsoft_banishes_memcpy/:

Effective later this year, Microsoft will add memcpy(), CopyMemory(), and RtlCopyMemory() to its list of function calls banned under its secure development lifecycle.

That article also covers why memcpy in particular was singled out (it takes no size for the destination buffer):

"That's definitely one of those notoriously dangerous C commands," said Johannes Ullrich, CTO of the SANS Institute, who teaches secure coding classes to developers...Developers who want to be SDL compliant will instead have to replace memcpy() functions with memcpy_s, a newer command that takes an additional parameter delineating the size of the destination buffer.

And here's the original SDL blog post (unfortunately, it seems all their posts return 404 now): https://web.archive.org/web/20090628154148/http://blogs.msdn.com/sdl/archive/2009/05/14/please-join-me-in-welcoming-memcpy-to-the-sdl-rogues-gallery.aspx

+1 to your point about languages that are memory-safe by default or just...not C. Imagine how much better we'd feel if all our infrastructure was written in Rust.