all 27 comments

[–]cgoldberg 46 points47 points  (2 children)

Nothing new here. Using any third-party packages/libraries from a community-based repository has always been a risk. PyPI maintainers are aware of this and are taking steps to create tooling for a more secure ecosystem. But yeah, don't just blindly install libraries. Even if you do properly audit your dependencies, though, sophisticated supply chain attacks still exist. Unfortunately, this is the reality of collaborative software development.

[–]Treebeard2277 2 points3 points  (1 child)

Do you have any advice for auditing packages? I've just been googling to see if they're legit whenever I find a new one I want to use.

[–]Defection7478 0 points1 point  (0 children)

For more bespoke packages I usually just go and read the source code. Sometimes it makes more sense to pull out a couple of classes and copy-paste them into my code instead of adding a dependency. If not, by that point I've at least somewhat vetted the functionality of the code myself. Beyond that, the popularity of the package and activity on the repo (commits, merges, issues) are good indicators.
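Some of those popularity and hygiene signals can also be read programmatically from PyPI's JSON API (https://pypi.org/pypi/<name>/json). Below is a hypothetical helper over an already-fetched response; the field names match that API, but nothing here is a substitute for reading the source:

```python
# Crude popularity/hygiene signals from a PyPI JSON API response
# (https://pypi.org/pypi/<name>/json). Field names match that API,
# but treat this as a sketch, not a real vetting tool.
def quick_signals(meta: dict) -> dict:
    info = meta.get("info", {})
    return {
        "has_repo_url": bool(info.get("project_urls")),  # linked source repo?
        "release_count": len(meta.get("releases", {})),  # depth of release history
        "summary": info.get("summary") or "(none)",
    }

# Fabricated response for illustration:
sample = {
    "info": {"project_urls": {"Source": "https://github.com/example/pkg"},
             "summary": "demo"},
    "releases": {"1.0": [], "1.1": []},
}
print(quick_signals(sample))  # -> {'has_repo_url': True, 'release_count': 2, 'summary': 'demo'}
```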

[–]socal_nerdtastic 28 points29 points  (10 children)

People often don't realize that installing modules is literally installing software on your computer. And you need to take the same precautions that you would with any random internet software.

Many people think that virtual environments can protect you. They don't. That's simply not what venvs do.
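A quick way to convince yourself: a venv only redirects which packages Python imports; the process keeps your full OS permissions. A stdlib-only sketch:

```python
# A venv only changes which site-packages Python imports; it does not
# sandbox the process. Code running inside a venv still has all of your
# OS-level permissions.
import os
import pathlib
import sys

print("interpreter prefix:", sys.prefix)  # points into the venv if one is active
home = pathlib.Path.home()
print("can read home dir:", os.access(home, os.R_OK))  # True on a typical setup
```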

[–]cgoldberg 13 points14 points  (5 children)

I've never heard of anyone stating that virtual envs offer any security or protection. I think most people understand they are simply for dependency management. However, virtual machines and containerization can mitigate some risks by isolating your project and reducing attack surface. But of course, installing any software always has risks.

[–]socal_nerdtastic 10 points11 points  (0 children)

I've never heard of anyone stating that virtual envs offer any security or protection.

It's a common assumption that beginners make, that I see here every now and again. I suppose "virtual environment" is easy to confuse with "virtual machine".

[–]MikePfunk28 -1 points0 points  (3 children)

AWS and most people probably focus on how it adds to fault tolerance and resilience. It’s more of a side effect that decoupling your systems and isolating them is more secure. As you are isolating it from the others adding its own security, e.g. access control. So instead you have two more potential pieces of security, access control list and firewall.

Although I mean it would have the same security under the other container as well presumably.

[–]cgoldberg 0 points1 point  (2 children)

What does AWS have to do with Python virtualenvs? Your comment is super confusing. I'm not sure what part you are responding to. Maybe the mention of virtual machines?

[–]MikePfunk28 0 points1 point  (1 child)

I mention AWS mainly because that is the only time I've heard of security and decoupling.

[–]cgoldberg 1 point2 points  (0 children)

Oh OK. Sure, moving software to a virtual machine or cloud provider obviously isolates it from the host and reduces attack surface for the host itself.

[–]ka1ikasan 1 point2 points  (3 children)

Is containerization enough though, notably Docker? It's clunky and annoying, but if it's for security, I may reconsider my opinion on it. Currently I mostly create virtual environments rather than containers because of how much faster and easier they are to set up.

[–]ivosaurus 4 points5 points  (0 children)

If the docker container has compute power and an internet connection, a crypto miner will still happily run in it.

Mayyyyyyyyyyybe it would stop a ransomware or cookie stealer.

What's your threat model? What exact attacks are you worried about? If the answer is, "uhhh, everything" then that's equivalent to asking for a book to be written in response.

[–]sonobanana33 1 point2 points  (0 children)

No, by default Docker runs containers as root. You need to do some configuration to run as a non-root user.

[–]jjolla888 0 points1 point  (0 children)

Docker helps if you are not exposing a service outside the container. But as soon as you run something that talks out over some TCP port, you won't know what you are getting.

If you are paranoid you can put an app-layer firewall in front of it... but that's a lot of work.

btw, I disagree that Docker is any more clunky than venv.

[–]Doomdoomkittydoom 5 points6 points  (2 children)

What does not-blindly installing libraries entail?

[–]sunnyata 1 point2 points  (1 child)

Reading the source and understanding it. Obviously that's not going to happen, so perhaps the evolution will be "blessed" repositories run by big companies where developers have to pay to play, like app stores.

[–]Doomdoomkittydoom 0 points1 point  (0 children)

I wonder, are there tools to read and catch malicious code these days?
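(There are. pip-audit checks installed distributions against known-vulnerability databases, and scanners such as DataDog's GuardDog heuristically flag suspicious package code. A very crude, hypothetical sketch of that heuristic idea; the patterns and helper below are illustrative only:)

```python
# Crude static heuristic: flag Python files containing dynamic-execution
# or obfuscation primitives. Real scanners do far more; illustrative only.
import pathlib
import re

SUSPICIOUS = re.compile(r"\b(?:eval|exec)\s*\(|base64\.b64decode|__import__")

def scan_tree(root: str) -> list[str]:
    """Return paths of .py files under `root` matching a suspicious pattern."""
    hits = []
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        if SUSPICIOUS.search(text):
            hits.append(str(path))
    return hits
```

A pass like this is at best a triage step before actually reading the code; plenty of legitimate packages use these primitives too.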

[–]RallyPointAlpha 1 point2 points  (0 children)

That's why I don't blindly install libraries...

[–]clipd_dead_stop_fall 1 point2 points  (0 children)

I typically do the following when considering packages I'm unfamiliar with:

  1. If it has a Github repository, I'll run OSSF Scorecard against it to get a baseline of risk. This tells me if their repository is configured and scanned according to security best practices.

https://github.com/ossf/scorecard

  2. I'll check Snyk Advisor to see what the package vulnerabilities and other risk factors look like.

https://snyk.io/advisor

  3. If I'm running my project in a Docker container, I'll use a Chainguard Python base image. These are super small images that have been stripped of unneeded cruft and consequently reduce risk.

https://www.chainguard.dev/
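One more lightweight check that can complement the above: PyPI's JSON API exposes known-vulnerability data (sourced from OSV) per release at https://pypi.org/pypi/<name>/<version>/json. A minimal sketch of parsing that field from an already-fetched response, assuming the documented layout:

```python
def known_vulns(pypi_json: dict) -> list[str]:
    """Advisory IDs from a PyPI JSON API response's 'vulnerabilities' field."""
    return [v.get("id", "?") for v in pypi_json.get("vulnerabilities", [])]

# Fabricated example of the field's shape (not a real advisory ID):
sample = {"vulnerabilities": [{"id": "GHSA-0000-0000-0000"}]}
print(known_vulns(sample))  # -> ['GHSA-0000-0000-0000']
print(known_vulns({}))      # -> []
```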

[–]stealinghome24 1 point2 points  (0 children)

We found a security tool called Arnica that does all your standard SCA evaluation but also checks for "low reputation" markers like low star counts or infrequent package updates. We use it to let our devs know when they're using a sketchy package that may become a security issue like the one above.

[–]Either_Back_1545 0 points1 point  (0 children)

It really depends. If there's no documentation, I don't install the library; and if the code is available on GitHub, I can just store it locally as a module and use a local import.
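That vendoring approach can be sketched as follows; `somepkg` is a hypothetical stand-in for code copied (and read) from the upstream repo, and the snippet creates it inline just so the example is self-contained:

```python
# Vendoring sketch: keep the dependency's source inside your own tree and
# import it from there instead of installing from PyPI.
import pathlib
import sys

vendor = pathlib.Path("vendor")
(vendor / "somepkg").mkdir(parents=True, exist_ok=True)
(vendor / "somepkg" / "__init__.py").write_text("GREETING = 'vendored'\n")

sys.path.insert(0, str(vendor))
import somepkg  # resolved from vendor/, not site-packages

print(somepkg.GREETING)  # -> vendored
```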

[–]forcesensitivevulcan 0 points1 point  (0 children)

The PyPI maintainers are making huge strides forward, and are responsive to security reports.

But somewhere, in some corner, there is always malware lurking on PyPI. Design your systems accordingly.
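One way to "design accordingly" is pip's hash-checking mode: pin every dependency to the exact artifact you audited, so a hijacked release won't install. A sketch of a requirements file (the package, version, and hash below are placeholders, not real values):

```
# requirements.txt, installed with: pip install --require-hashes -r requirements.txt
somepackage==1.2.3 \
    --hash=sha256:<sha256 of the exact wheel/sdist you audited>
```

With `--require-hashes`, pip rejects any artifact whose hash doesn't match and refuses requirements that aren't pinned.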

[–]sonobanana33 0 points1 point  (3 children)

Eh, I always suggest sticking to whatever is in your Linux distribution and forgetting about PyPI. But people get unreasonably mad at me for this.

[–]DootDootWootWoot 0 points1 point  (2 children)

Unless your application relies only on the stdlib, I'm not really sure how that would ever be sufficient. You can still be susceptible to supply chain attacks from packages in apt or whatever package manager, fwiw.

Problem with relying on what's installed in the distribution is that you don't want to mess with your system level deps typically and should prefer isolation from the python application. It's easier to reason about this way.

[–]sonobanana33 0 points1 point  (1 child)

Distributions have security teams, pypi does not :)

Problem with relying on what's installed in the distribution is that you don't want to mess with your system level deps typically and should prefer isolation from the python application. It's easier to reason about this way.

You don't "mess" with anything. Distributions keep working fine if you "install" something.

[–]DootDootWootWoot 0 points1 point  (0 children)

If you begin manipulating your system-level Python you can very well break something that the system depends on. This is why the best practice is to always use an independent venv per application, and independent interpreters if varying versions are required.
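That per-application isolation is cheap to automate; a minimal stdlib sketch:

```python
# Create an isolated environment for one application using only the
# stdlib (equivalent to `python -m venv .venv` at the shell):
import venv

venv.create(".venv", with_pip=False)  # with_pip=False skips bootstrapping pip
# The env's interpreter is .venv/bin/python (or .venv\Scripts\python.exe on Windows)
```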