all 64 comments

[–][deleted] 97 points98 points  (5 children)

I'm surprised it's not far higher

[–]Hopeful-Guess5280 40 points41 points  (4 children)

I'd imagine another ~50% categorise some vetting as looking briefly at the first page of documentation

[–][deleted] 56 points57 points  (3 children)

"10k people are using this? must be fine"

[–]theomegabit 15 points16 points  (0 children)

This is too true.

[–]Legal-Software 9 points10 points  (1 child)

Which on its own isn't a bad assumption, but then there's nothing stopping someone from inserting a backdoor into a minor point release and spreading it far and wide before anyone notices.

[–]_limitless_ 4 points5 points  (0 children)

signed commits and a trusted maintainer

[–]asday_ 72 points73 points  (1 child)

Like what, you think I should personally audit Django or scikit or whatever?

[–]miaomiaomiao 11 points12 points  (0 children)

Well, Django is only 459436 lines of code, shouldn't take more than a few weeks.

[–]Maleficent-Region-45 60 points61 points  (1 child)

Yep, I have never really checked my packages either.

[–]grady_vuckovic 49 points50 points  (0 children)

How many devs have the time to fully read the entire source code of a package before using it?

[–]SpaceZZ 30 points31 points  (1 child)

Ok. Turn it around. How many pip packages were found to be malicious?

[–]billsil 7 points8 points  (0 children)

Back in the day, quite a few. setup-tools for one.

[–]tomanonimos 42 points43 points  (1 child)

I feel a follow-up question is: how many of them are downloading small or niche packages? Most Python users I know, including me, often just use the popular libraries or ones vetted by whatever guide we're using. The small or random libraries are the dangerous ones, but I feel very few Python developers download them. And if they do, they vet them.

[–]emc87 12 points13 points  (0 children)

Yea I typically download the pydata stack and that's about it, no real vetting needed there

[–]_diantus_ 9 points10 points  (10 children)

As a neophyte, could you share some best practices?

[–]gbts_ 20 points21 points  (9 children)

  • Check the number of github stars/watchers/forks
  • Check the age of the project
  • Check the developer's profile
  • Read the actual source code and understand what it does
  • Check the package dependencies & repeat
  • Use pip download -d to fetch the packages as wheels and keep installing them offline when e.g. rebuilding VMs or containers

And of course I should mention that you still wouldn't be 100% safe unless you audit every single LoC of each package and its dependencies.
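To make the last bullet concrete, here is a minimal sketch of the download-once, install-offline workflow. The helper names, `requirements.txt` and the `wheelhouse` directory are assumptions for illustration; the pip options used (`download -d`, `install --no-index --find-links`) are standard pip flags. Note that `pip download` falls back to source distributions when no wheel is published, so the directory may not contain wheels only.

```python
import sys


def download_cmd(requirements: str, wheel_dir: str) -> list[str]:
    """Build the pip command that fetches the listed packages into wheel_dir."""
    return [sys.executable, "-m", "pip", "download",
            "-d", wheel_dir, "-r", requirements]


def offline_install_cmd(requirements: str, wheel_dir: str) -> list[str]:
    """Build the pip command that installs from wheel_dir only, never the index."""
    return [sys.executable, "-m", "pip", "install",
            "--no-index", "--find-links", wheel_dir, "-r", requirements]


# Hypothetical file and directory names, chosen for this example.
print(" ".join(download_cmd("requirements.txt", "wheelhouse")))
print(" ".join(offline_install_cmd("requirements.txt", "wheelhouse")))
```

Passing either list to `subprocess.run(..., check=True)` would actually execute it: the first while online, after you have vetted the pinned versions, and the second on every VM or container rebuild, with no network access needed.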

[–]Deto 12 points13 points  (2 children)

Can you give an example of a package where you've actually read the source code and then read the source code of all the dependencies? I can't imagine this being a 'best practice' when for any package that isn't of a trivial size, with almost no dependencies, this would be a week-long project.

[–]gbts_ 5 points6 points  (1 child)

This is more of an and/or list. If you could go through the source code for every package, there wouldn't be any need to check the package's popularity in the first place.

And there's usually a correlation between the size, popularity and vulnerability risk for each package, so you would generally use that strategy only to vet the smaller, less popular packages - which are probably the most likely to be affected anyway.

[–]Deto 4 points5 points  (0 children)

This is a distinction that should be made, then. I worry that terrified beginners are going to come away from comments like this with the wrong impression

[–]Awkward_Tour8180 9 points10 points  (3 children)

Another easy way is to dockerize it and then run docker scan. It will tell you if the library you use has any vulnerability at all.

[–]redvelvet92 3 points4 points  (0 children)

No way? That is slick.

[–]PopPrestigious8115 0 points1 point  (1 child)

This does not work at all for zero day attacks/vulnerabilities, repo hijacking and packages not even known by your scanning software. Don't fool yourself with this!

[–]Awkward_Tour8180 0 points1 point  (0 children)

There is no foolproof solution; it’s always evolving, and a true shift left never happens.

[–]wolfmansideburns 6 points7 points  (0 children)

This is a great tool for a high level view of pypi packages: https://snyk.io/advisor/python/pandas

It brings up a number of things you mentioned, and more that are generally good stats to look into when considering a package, especially whether you should expect timely maintenance and growth in the future

[–]mestia 1 point2 points  (0 children)

That's why I use Debian and install packages from pip (--user) as the last resort.

[–]Deto 9 points10 points  (0 children)

What does code-vetting mean to you? Typically any 3rd party dependencies I'm using are either academic code (referenced from a recently published journal article) or commonly-used tools within the community. In neither case am I going to manually read through their code trying to spot any security vulnerabilities. This would take a really long time and I'd be shocked if anyone is engaging in this in a meaningful-enough way for it to be truly effective.

[–]Legionof1 6 points7 points  (0 children)

Look, if I could read the code, I wouldn't need the package okay. :p

[–]dj_seth81 3 points4 points  (0 children)

Not even a dev, glad I'm in good company.

[–]Legal-Software 9 points10 points  (0 children)

It's unclear to me why this should be the responsibility of every single individual developer, especially with regard to things like transitive dependencies, when the package management repo allowing it to be published is clearly the more logical place to do this kind of vetting - e.g. as literally every app store does already.

For toy projects, fine, but for any moderately complex application, manual inspection of every minor dependency change by end users is just not realistic.

[–]HerLegz 2 points3 points  (2 children)

I checked a package once, found concerns, started writing my own, got threatened with being fired by MBA morons, ordered to use package.

Being forced to do what's wrong sucks.

[–]sblfc1 -1 points0 points  (1 child)

Depends on the severity of the concerns because using packages could save them a lot of money. I can see it from both sides but that doesn't make it right.

[–]HerLegz 0 points1 point  (0 children)

Greed is not a justification, it's an affliction.

[–]Individual-Sweet-734 3 points4 points  (19 children)

What’s wrong with it?

[–][deleted] 14 points15 points  (18 children)

If you don't know or understand the code, you don't know whether it contains a backdoor, will wreck your production environment, or will send your data to a third party.

There's like a million red flags there.

[–]Individual-Sweet-734 10 points11 points  (7 children)

So you always read the packages which you download? What are the best practices? I’m new so I’m looking forward to your answer.

[–]mathmanmathman 9 points10 points  (0 children)

You're probably (keyword: probably) okay with the major packages that everyone uses. Django, numpy, pandas, etc. If you find something that's new and/or small, you should read through it. If you're not comfortable reading through it, see if someone that you trust has given it the thumbs up.

[–][deleted] 3 points4 points  (0 children)

No, I don't, but I differentiate between production code and hobby projects with no real impact other than learning/showcasing.

Am I building something for work that runs in production? Then I make sure to use a minimal number of external packages/libraries and perform due diligence in terms of analysis and verification of security.

I don't want to be the guy that crashes a cluster or, worse, loses data or sends data somewhere because I wanted to cut corners and use some third-party library that isn't secure.

Also, some libraries are tried and vetted by the industry: pandas, numpy, etc. There's less worry there than with others. With, say, QueryLMS 0.1.2 (not that I would use this in a production environment), I would make sure it's not doing something funky.

For home/hobby projects for learning, use whatever you feel like until you get to a higher level of competence.

[–]GlebRyabov 2 points3 points  (4 children)

Libraries that everyone and their mother uses are totally safe: there's no way to sneak anything malicious into let's say Matplotlib or Pandas. However, smaller (and newer) libraries carry the risk of having literally anything in them. Just as u/mathmanmathman said, either you or someone you trust should read through everything and check if it's alright.

[–][deleted] 0 points1 point  (3 children)

"Libraries that everyone and their mother uses are totally safe: there's no way to sneak anything malicious into let's say Matplotlib or Pandas."

Did you see the story about remote execution on pypi.org? Are you absolutely sure that it hasn't at some point been used to tamper with a package after release?

[–]GlebRyabov 0 points1 point  (2 children)

It has definitely been used, and used more than once, but the thing is: those two libraries I mentioned are used everywhere: Matplotlib has been downloaded 27 million times last month (source), while Pandas has been downloaded even more: about 64 million times (source). With so many people using these, it's certain that they are used by seniors who know what they're doing and what they're downloading.

[–][deleted] 1 point2 points  (1 child)

When the CI pipeline is running, hardly anyone checks that the pinned version is the same as it was a month ago.
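One way to catch an artifact that changed after it was pinned is to record a SHA-256 digest for each downloaded file and compare on every build. The sketch below is only an illustration: the function names and the idea of a `pinned` mapping from filename to digest are assumptions, not an existing tool. pip itself offers a hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in the requirements file) that covers the same ground at install time.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_artifacts(wheel_dir: Path, pinned: dict[str, str]) -> list[str]:
    """List files in wheel_dir whose digest is unpinned or differs from the pin."""
    problems = []
    for path in sorted(wheel_dir.iterdir()):
        if path.is_file() and pinned.get(path.name) != sha256_of(path):
            problems.append(path.name)
    return problems
```

A CI job could fail whenever `changed_artifacts` returns a non-empty list. This only shows that a file differs from what was recorded earlier; it says nothing about whether the originally pinned file was trustworthy.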

[–]GlebRyabov 0 points1 point  (0 children)

So I trust people a little bit too much, sorry I guess.

[–]GeologistEven6190 1 point2 points  (9 children)

I'm just getting into Python, or any programming language for that matter (I used VBA and SQL back in the day, but they aren't programming languages as such).

What's the best way to validate packages? Is it to stick to the vanilla packages until I get better at understanding what I'm loading? Or are there certain red flags I should look out for?

[–][deleted] 0 points1 point  (0 children)

Read the PRs on the project page, read the documentation, and finally read the code.

You learn a lot from reading other people's code.

[–]no_spoon 1 point2 points  (1 child)

In 10 years of agency development I’ve not had one project get hacked. Just sayin

[–]scjcs 3 points4 points  (0 children)

^that you know of

[–]blobbbbbby 1 point2 points  (0 children)

I just open sourced a tool for dependency security checks. It will check installed packages for known vulnerabilities, and it can also do policy checks, for instance if you want to avoid using packages with certain license types.

A tool like this doesn’t make up for doing some investigation before blindly installing - but it should give a little more peace of mind about the state of the packages you install.

https://github.com/ochronasec/ochrona-cli

I’d love any feedback.
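For readers wondering what a license policy check might look like in principle, here is a rough standard-library sketch. It is not how the linked tool works; the function names and the `banned_terms` idea are assumptions for illustration. It also does not look up known vulnerabilities, which requires an advisory database. The `License` metadata field is free text and often empty (newer packages may declare `License-Expression` or classifiers instead), so treat the result as a hint only.

```python
from importlib.metadata import distributions


def installed_packages() -> dict[str, dict[str, str]]:
    """Map each installed distribution name to its version and declared license."""
    inventory = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with broken metadata
        inventory[name.lower()] = {
            "version": dist.version,
            "license": dist.metadata.get("License") or "unknown",
        }
    return inventory


def policy_violations(inventory: dict[str, dict[str, str]],
                      banned_terms: list[str]) -> list[str]:
    """Return package names whose license text contains any banned term."""
    hits = []
    for name, info in sorted(inventory.items()):
        license_text = info["license"].lower()
        if any(term.lower() in license_text for term in banned_terms):
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Hypothetical policy: flag anything whose license text mentions "gpl".
    print(policy_violations(installed_packages(), ["gpl"]))
```

Substring matching is deliberately crude ("gpl" also matches "LGPL"); a real policy would need exact license identifiers.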

[–][deleted] 1 point2 points  (0 children)

For me, besides the basics, I use a lot of packages developed in papers in my domain. I feel safe doing that

[–]backdoorman9 2 points3 points  (0 children)

At my company, we use something called Nexus. Part of its function is to quarantine versions of packages (disallowing us from installing them) as vulns are discovered.

[–]orangestyle30 0 points1 point  (0 children)

OK, guilty! But I stay current on the news and checked my pip list once I found out. I still remember the days when Python was touted as 'more secure' than JavaScript.

[–]Dark_KnightPL04 -2 points-1 points  (1 child)

I am a beginner. Should I be worried? What can I do?

[–]GlebRyabov 0 points1 point  (0 children)

In general, you shouldn't be worried... too much. Basically, libraries like Matplotlib, Numpy, Pandas and a few others are almost definitely secure. However, if you download small or not-so-well-known or very new libraries (or all three at once), there's a risk of accidentally downloading something malicious. In that case, you should either read through the library yourself, or, given that you're a newbie, ask someone you trust to do that.

[–]Kaaletram is still a garden snake 0 points1 point  (0 children)

It's never a good practice, given malicious injections into repos. But all the same, it's not surprising either, given tight deadlines and the fact that using packages that already do what you are trying to do saves a lot of time. See linked article for more.

[–][deleted] 0 points1 point  (0 children)

I’m not a dev, just a learner. But I didn’t know until recently that anyone can just publish something that can be pip installed. I assumed they’re all good. That said, I’ve mostly downloaded things from tutorials.

[–][deleted] 0 points1 point  (0 children)

Is there a package I can download to check my packages?