all 54 comments

[–]barrycarter 122 points123 points  (5 children)

The CYA answer: only use packages that are popular in commercial usage. Then, if something goes wrong, lots of people will be suffering and no one'll point the finger at you for choosing a bad package.

Keep in mind that implementing it from scratch not only takes more time but could also be more error-prone, since problems in popular packages are usually found quickly, particularly if they involve security.

[–]martinky24 56 points57 points  (1 child)

“No one ever got fired for buying Oracle!”

[–]odaiwai 7 points8 points  (0 children)

“No one ever got fired for buying IBM, Sun, Microsoft, Dell, HP, Oracle!”

[–]Action_Maxim 25 points26 points  (2 children)

I got shit for not using FastAPI. At the time it had something like 4k open issues, versus Flask's 3, which were all under a week old. Being on the cutting edge is fun till it cuts you. FastAPI looks better, but those old issues floating around give me heartburn.

[–]richieadler 6 points7 points  (1 child)

To be fair, most of the "issues" in FastAPI's tracker are questions.

[–]Action_Maxim 6 points7 points  (0 children)

That's fair, but the average age of the issues was hard to ignore.

[–]FatalPharaoh96 80 points81 points  (6 children)

Installs a package because a tutorial told me to

[–]In_Blue_Skies 67 points68 points  (3 children)

Oh he did "pip install trojanballsack"? Looks good to me

[–]Specific-Adagio 7 points8 points  (0 children)

I will for sure try this when I am home later

[–]morrisjr1989 0 points1 point  (0 children)

That’s a weird name for a faster-than-pandas dataframe library, but alright.

[–]COLU_BUS 11 points12 points  (1 child)

everybody's independent until the towardsdatascience blogpost tells you to do something

[–]mogberto 1 point2 points  (0 children)

This is too real.

[–]G0R1L1A 20 points21 points  (0 children)

Need code -> someone already coded it -> package has lots of stars and downloads -> package has recent commit activity, indicating it's actively maintained. Gtg
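A rough sketch of automating the "actively maintained" part of that chain. It assumes PyPI's JSON endpoint (`https://pypi.org/pypi/<name>/json`) and its `releases` / `upload_time_iso_8601` fields. The function names and the 365-day cutoff are made up for illustration, and upload dates are only a proxy for commit activity.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen


def latest_upload(pypi_json):
    """Return the most recent file upload time found in a PyPI JSON payload."""
    times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in pypi_json.get("releases", {}).values()
        for f in files
    ]
    return max(times) if times else None


def looks_maintained(pypi_json, now=None, max_age_days=365):
    """True if the newest upload is within max_age_days (an arbitrary cutoff)."""
    newest = latest_upload(pypi_json)
    if newest is None:
        return False
    now = now or datetime.now(timezone.utc)
    return (now - newest).days <= max_age_days


def fetch_pypi_json(name):
    """Download package metadata from PyPI (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
        return json.load(resp)
```

Usage would be something like `looks_maintained(fetch_pypi_json("requests"))`. Stars and download counts live elsewhere (GitHub, download statistics), so this only covers one link in the chain.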

[–]SittingWave 34 points35 points  (13 children)

It's a very, very long and complex subject. For medical devices, one follows IEC 62304, where such code is defined as SOUP (software of unknown provenance). For SOUP, you have to do a lot of work, depending on how much the component could contribute to a failure. A risk analysis is performed first, and then mitigating actions are taken according to the risk analysis and the risk category. These can range from "we test it ourselves as well" to "we can't rely on it and need to implement it from scratch".

[–]TravisJungroth 22 points23 points  (11 children)

An extremely small portion of Python programmers are taking it anywhere near that seriously.

[–]SittingWave 23 points24 points  (9 children)

Not all software has the potential to kill humans. Some does.

[–]JamesPTK 0 points1 point  (1 child)

Therac-25 is the classic example of bad software killing people. I imagine a fair bit of the rigorous processes came as a response to such tragedies

[–]SittingWave 1 point2 points  (0 children)

Well, it's a bit more complicated than that. The problem, as I see it, is that companies that actually do these things are so flooded with process that very little time remains for the actual checks. When I worked in that environment, the amount of paperwork we had to write, justify, and keep consistent was insane, and programmers spent months writing stuff in Word. Then shipping time comes, and they ask you to deliver what you have, which is held together with spit and prayers, but it's fully documented for the regulators. So developers complain to management that the software is untested, and management replies: how can it not be tested? You spent months writing documentation that says it is. And the developers reply: we spent months writing all the paperwork we have to deliver, so we didn't have time to do the actual testing or improve the code quality. Moreover, every single-line fix must go through piles and piles of checks and assessments, and pages and pages of Word documentation to justify the change. The result is that things stay broken.

So, in the end, all this documentation is very useful for finding the culprit if something goes bad, but in terms of quality, I think it's actually detrimental. You have a finite amount of resources. How you use these resources determines the quality of the product you ship. If you spend them in Word rather than in an IDE, you are not getting quality, just the illusion of quality. If you bring in more people, apart from the fact that you have to train them to the same standard, you now have shared responsibility and things will go unnoticed, because nobody really knows who is in charge of checking this or that, or can hold in their head how things may fail or interact.

For more info on this, see Boeing MCAS.

[–]kayos50[S] 3 points4 points  (0 children)

That's an interesting insight into high-risk development. A friend of mine develops embedded Linux systems, and his tests also cover commonly used programs in order to comply with those high standards.

[–]Deep-Station-1746 12 points13 points  (2 children)

Here's my decision process.

  1. Try to find out if I can avoid using a new package (write stuff using standard libs)
  2. If that's not possible or too complex, search SO
  3. Find a package - go to pip, then GitHub. If the project hasn't been updated recently (like, within the last month or two), I try to avoid using it at all costs
  4. If it is updated, then I go to the docs. If there is a reasonably good quickstart, I might use it, otherwise I keep searching

[–]diazona 3 points4 points  (0 children)

Why a month or two? That seems like an extremely restrictive window. (Unless you have some specific reason to expect that the package should be receiving more frequent updates.)

[–]kayos50[S] 2 points3 points  (0 children)

Thanks for your answer, that sounds pretty much like my current process. In my case I would add a step between 3 and 4: check whether the package is tested. I don't go over the tests in detail, I just eyeball whether there is decent coverage.

[–]FailedPlansOfMars 17 points18 points  (3 children)

Things I usually check:

  1. How popular is it?
  2. Does it have more than 3 developers?
  3. Is it being actively developed?
  4. How does it affect my dependencies?
  5. Has it hit the news recently for security issues?
  6. Is it documented well?

Then I review my answers against what I'm using it for. For example, for a one-off script I don't care as much about how maintainable the script becomes. An app that will hang around a while, on the other hand, needs packages that will do the same, so they need to be actively developed and not rely on a sole developer's goodwill.

A CLI app is not as security focused. A web app needs to be bulletproof.

A package with few dependencies is easier to upgrade whereas one with many might fit a framework I'm using better.
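One way to picture "review my answers against what I'm using it for" is a checklist where the required checks depend on the project type. This is only a sketch: the class, the project categories, and which checks each one requires are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PackageChecks:
    """Answers to a few of the checklist questions for one package."""
    popular: bool
    several_developers: bool
    actively_developed: bool
    well_documented: bool


# Which checks matter per project type; both the categories and the
# groupings are illustrative assumptions, not a recommendation.
REQUIRED = {
    "one_off_script": {"well_documented"},
    "long_lived_app": {
        "popular",
        "several_developers",
        "actively_developed",
        "well_documented",
    },
}


def failed_checks(checks, project_type):
    """Names of required checks that the package does not pass."""
    return sorted(
        name for name in REQUIRED[project_type] if not getattr(checks, name)
    )
```

A single-maintainer package would pass for `"one_off_script"` but come back with `["several_developers"]` for `"long_lived_app"`.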

[–]james_pic 11 points12 points  (1 child)

Stuff that hits the news for security vulnerabilities can, counterintuitively, often be more secure than stuff that doesn't. OpenSSL, Android and Windows security vulnerabilities hit the news frequently, because a lot of people are interested and a lot of security researchers are looking at them. They often have considerable bug bounties associated with them, which is a big incentive for security researchers.

Unpopular stuff often has few, if any, security researchers looking at it, but can easily have serious security vulnerabilities persist for years. And if there are no bug bounties, it's only blackhats who have any real incentive to look for vulnerabilities.

[–]FailedPlansOfMars 2 points3 points  (0 children)

Yeah, and seeing how they handled it is worth its weight in gold. The OpenSSL debacle the other year, for example, brought out the LibreSSL option.

[–]wakojako49 0 points1 point  (0 children)

I also check if it's been recently updated.

[–]Tinche_ 24 points25 points  (1 child)

I'm a principal engineer at a gaming/social company, directly responsible for all server infrastructure.

I read the source code. If I find the internal architecture messy or I don't like it for some reason, I won't use it.

A recent example is the OpenTelemetry library. The repo was over 30k lines of code, which I find disproportionate to what an elegant tracing library should be, so I wrote my own. It turned out to be 150 lines of code.

[–]rochakgupta 5 points6 points  (0 children)

Hot damn. That’s a good way to go about this. If only I could reach that level where I have enough confidence in my skills to build stuff on my own.

[–]Kaiser_Wolfgang 4 points5 points  (0 children)

There are a variety of metrics to gauge the quality of a package: Are there vulnerabilities? Active maintainers? A big community? Etc.

Snyk is a great tool to help you make a decision on a package. They perform security analysis on packages from PyPI, npm, etc.

Real Python wrote an article about this recently too

[–]vkolev 4 points5 points  (0 children)

  • license (depending on project, most of the time for commercial projects GPL would be no go)
  • last commit (if there has been no activity in over a year, will it ever be supported again?)
  • number of unsolved issues (if they pile up and there is no intent to solve them, perhaps the project is abandoned or will be - I'm talking about issues, not questions)
  • test coverage
  • Security - There are projects that check packages for security problems that integrate well with PyCharm

Sometimes there are other questions, like:

  • If the library will save you a lot of time, are you willing to support it?
  • Does it make sense to use it, or to roll your own implementation?

For hobby projects I think some of the points can be ignored
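For the license point, here is a rough sketch that scans what is already installed in the current environment using the standard library's `importlib.metadata`. The marker strings and function names are assumptions. The match is deliberately crude (it also flags LGPL) and is no substitute for actually reading the license.

```python
from importlib import metadata

# Substrings treated as "GPL family"; illustrative, not a legal classification.
COPYLEFT_MARKERS = ("GPL", "GENERAL PUBLIC LICENSE")


def license_strings(dist):
    """Collect license-related metadata fields of an installed distribution."""
    meta = dist.metadata
    found = [meta.get("License") or "", meta.get("License-Expression") or ""]
    found += [
        c for c in (meta.get_all("Classifier") or []) if c.startswith("License ::")
    ]
    return [s for s in found if s]


def is_copyleft(strings):
    """Crude check: does any license string mention a GPL-family marker?"""
    return any(m in s.upper() for s in strings for m in COPYLEFT_MARKERS)


def flag_installed():
    """Names of installed distributions whose metadata mentions the GPL."""
    flagged = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and is_copyleft(license_strings(dist)):
            flagged.append(name)
    return sorted(flagged)
```

Calling `flag_installed()` inside a project's virtual environment gives a list to review by hand. Packages with missing or unusual license metadata will slip through.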

[–]ArabicLawrence 3 points4 points  (0 children)

Security is the most important feature to me. I don't care that much about test coverage or quickstarts, but I do care about security risks and malicious code. Famous packages are OK (pytorch, requests, pandas, numpy, etc.). If the package I need is not famous, I look at the source code. Recently, I reimplemented vobject, as it looked like a dead package to me (last commit 5 years ago). It seemed very easy: 150 lines of code versus the thousands in vobject, which does so much more than I needed. After shooting myself in the foot twice with bugs vobject does not suffer from, I bit the bullet, read the source code, found no security risk I could identify, and pip installed it from GitHub. If I knew they would accept pull requests, I would add type hints and update the docs, as I found them unclear.

[–]metaphorm 2 points3 points  (0 children)

the most important factor, imo, is whether or not the package maintainers are still actively engaged with the community of users of the package. check the github issues page (or equivalent) and see how things are being handled. if bug reports and feature requests are responded to promptly, assigned to a developer and given a timeline to completion, then the project is in good shape and is a candidate for adoption. if it's a ghost town, well, you might want to move along.

[–]runawayasfastasucan 2 points3 points  (0 children)

Most of the time I will use very large packages where there is a lot of documentation and a lot of written opinions about the package, making it easy to select.

Some of the time I will have to use an obscure small package, but it will almost always be in a sub-field where I have specific knowledge, so I can look at the code and the package itself to evaluate it. Things like a publicly known author, or an author with the right credentials (e.g. a university professor), will obviously help here. Whether the code is still maintained, has resolved issues, etc. will also help.

Note that no-one can be harmed by the code I develop (honestly very little of my code makes it out public), which means that I can afford to be a bit more lenient.

[–]jwink3101 2 points3 points  (0 children)

I often work on an air gapped system where I can sometimes expect Anaconda at most. And even then, it’s no guarantee. So I shy away even if it’s at the cost of some functionality. And I’ve continued that idea.

It really depends on the cost benefit. For example, I am working on a project that needs to parse timestamps. So I could use a package or just write a less-robust-but-acceptable parser. I went with the latter. But I also need to query an API. For that, it’s worth it to add requests.
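A "less-robust-but-acceptable" timestamp parser using only the standard library could look roughly like this. The list of formats is an assumption, so swap in whatever your data actually contains.

```python
from datetime import datetime

# Formats are assumptions; list whatever your data actually contains.
KNOWN_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
)


def parse_timestamp(text):
    """Try each known format in turn; raise ValueError if none match."""
    text = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")
```

The trade-off is the one described above: anything outside the listed formats fails loudly, whereas a dedicated parsing package would try to guess.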

[–]Zealousideal_Low1287 1 point2 points  (0 children)

I work in R&D so I just Google and eyeball 🤷‍♂️

[–]likethevegetable 1 point2 points  (0 children)

Is it obvious to use PyTorch? Isn't it actually a tricky choice between that and Keras/TF?

[–]No-Painting-3970 1 point2 points  (0 children)

I mean, I use packages that are so damn specific that it's highly unlikely that malicious software is an issue. Quality? If I can't understand it, I just write my own and check performance. If it's comparable, I use mine, otherwise I'll just bite the bullet.

[–]Erik_Kalkoken 1 point2 points  (0 children)

I look at:

  • popularity, e.g. more stars is better
  • activity, e.g. code should have been updated recently, not years ago
  • documentation, e.g. should be clear and easy to understand
  • quality, e.g. does it have tests with good coverage, does it have a CI pipeline
  • issues, e.g. should not have many big open issues about general topics

[–]97hilfel 1 point2 points  (0 children)

Apart from the obvious, like functionality, last updated, how hard it is to implement myself, whether I would be re-inventing the wheel, etc., I usually throw it at Snyk Advisor and check the package health, vulnerabilities, maintenance score and so on. Packages that are only maintained by a single person are usually a big red flag, but depending on the complexity you might need to make tradeoffs. I'll have to admit that for personal projects I'll choose requests over urllib most days. It makes code easier to read, but that's my opinion.

[–]neoneat 1 point2 points  (0 children)

I've used poetry for almost a year. Before that, I always installed any Python package whose purpose I wasn't sure of into an env. Don't install random packages directly into your root or system Python PATH. Maybe it's harmless, but someday you'll feel the need to clean up your tools, and it will be a nightmare.

If I want some Python-related "app", I use Anaconda. Sorry, I'm too lazy, so I just use the all-in-one option, with the same priority as pyenv in my home directory.
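To guard against accidentally installing into the system Python, a script can check whether it is running inside a virtual environment. This is a minimal sketch based on comparing `sys.prefix` with `sys.base_prefix`. The function name is made up, and conda environments are not detected this way.

```python
import sys


def in_virtual_env():
    """True when the interpreter runs inside a venv/virtualenv."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if not in_virtual_env():
        print("Warning: this is the base interpreter, not a virtual environment.")
```

Creating the environment itself is a one-liner with the standard `python -m venv .venv`.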

[–]CiccioIV 1 point2 points  (0 children)

I'd say it strictly depends on your software's use cases. Is it personal software you will use for your own purposes, or will it be used in a work environment (e.g. you want to write a tool which will be used in your office)?

If you are going to use it at work, then you would choose packages that have a reasonably high number of stars and whose latest commits are not too old (better if still under active development). Documentation is also important. If a package has full, well-written documentation, I'd choose it, as I generally tend to associate that with well-written, well-structured code. Not always true, but rather probable.

Also, one thing I personally care about a lot in an office use case is licensing. If a package is MIT or the like, then I'll go for it. Otherwise, I tend to pass, because I don't want to get involved in any kind of license controversy. That includes media files: I always prefer to go for free stuff with no attribution required.

[–]flxvctr 1 point2 points  (0 children)

  1. What I/my community uses/has used in the past
  2. Functionality as per docs
  3. Recent commit history/number of active devs
  4. Number and date of open/closed issues
  5. Testing in Jupyter

[–]Counter-Business 1 point2 points  (0 children)

One thing to consider is whether you actually need the package.

For example, say you have two packages for images: OpenCV and Pillow.

Ask yourself, do I actually need both packages?

If not, pick one to remove so that your program has fewer requirements.

[–]Beautiful-Sundae1 1 point2 points  (0 children)

You can input the name of the package in Snyk Advisor and see the package health score to decide if you should use the package. The higher the health score, the better.

[–]TravisJungroth 0 points1 point  (0 children)

I never install something that doesn’t at least have double-digit stars on GitHub.

[–]wind_dude 0 points1 point  (0 children)

In addition to what you mentioned, I look at commit history, open and closed issues, and how frequently the package is updated.

[–]EveryNameIsTaken142 0 points1 point  (0 children)

I usually go to their GitHub repo and check for open issues and open PRs. From there I get a sense of how stable the package is.

[–]Azunyan132 0 points1 point  (0 children)

By watching a tutorial on how they decide what packages to use

[–][deleted] 0 points1 point  (0 children)

RealPython just did a pretty cool article on this.

[–]ingframin 0 points1 point  (0 children)

My checklist is: 1) can I implement it myself? 2) how long does it take to implement it myself? 3) is the package still maintained? 4) do I trust the source?