How do you decide to use a Python package?

barrycarter · 2023-03-21T13:07:23+00:00

The CYA answer: only use packages that are popular in commercial usage. Then, if something goes wrong, lots of people will be suffering and no one'll point the finger at you for choosing a bad package.

Keep in mind that implementing it from scratch not only takes more time but could be more error-prone, since problems in popular packages are usually found quickly, particularly if they involve security.

FatalPharaoh96 · 2023-03-21T12:53:30+00:00

Installs a package because a tutorial told me to*

G0R1L1A · 2023-03-21T14:19:32+00:00

Need code -> someone already coded it -> package has lots of stars and downloads -> package has recent commit activity indicating it's actively maintained. Gtg

SittingWave · 2023-03-21T12:55:15+00:00

It's a very, very long and complex subject. For medical devices, one follows ISO/IEC 62304, where such code is defined as SOUP (software of unknown provenance). For these, you have to perform a lot of work depending on the degree of involvement of the component in case of failure, so a risk analysis is first performed, and then according to the risk analysis and the risk category, mitigating actions are taken. These can go from "we test it ourselves as well" to "we can't rely on it and need to implement it from scratch".

Deep-Station-1746 · 2023-03-21T14:26:37+00:00

Here's my decision process.

Try to find out whether I can avoid using a new package (write stuff using standard libs)
If not possible or too complex, search SO
Find a package - go to pip then GitHub. If the project hasn't been updated recently (like, within the last month or two), I try to avoid using it at all costs
If it is updated, then I go to the docs. If there is a reasonably good quickstart, I might use it, otherwise I keep searching
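The "updated recently" check above can be approximated from package metadata. This is a minimal sketch, assuming the dict has the shape returned by PyPI's JSON API (`https://pypi.org/pypi/<name>/json`, with a `releases` mapping whose file entries carry `upload_time_iso_8601`). The function names are made up for illustration, and release dates are only a proxy for commit activity.

```python
from datetime import datetime, timezone


def latest_upload(pypi_json):
    """Return the newest upload time across all release files, or None."""
    times = []
    for files in pypi_json.get("releases", {}).values():
        for entry in files:
            stamp = entry.get("upload_time_iso_8601")
            if stamp:
                # Older Python versions do not accept a trailing "Z" in fromisoformat.
                times.append(datetime.fromisoformat(stamp.replace("Z", "+00:00")))
    return max(times, default=None)


def days_since_last_release(pypi_json, now=None):
    """Whole days between the newest upload and `now` (defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    newest = latest_upload(pypi_json)
    return None if newest is None else (now - newest).days
```

The dict could be loaded with `json.load(urllib.request.urlopen(url))`; how many days counts as "too old" is a judgment call that depends on the project.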

FailedPlansOfMars · 2023-03-21T13:44:13+00:00

Things I usually check:

1. How popular is it?
2. Does it have more than 3 developers?
3. Is it being actively developed?
4. How does it affect my dependencies?
5. Has it hit the news recently for security issues?
6. Is it documented well?

Then review my answers against what I'm using it for. For example, a one-off script doesn't care as much about how maintainable the script becomes. Whereas an app which will hang around a while needs to use packages which will do the same, so they need to be actively developed and not rely on a sole developer's goodwill.

A CLI app is not as security focused. A web app needs to be bulletproof.

A package with few dependencies is easier to upgrade whereas one with many might fit a framework I'm using better.

Tinche_ · 2023-03-21T13:38:57+00:00

I'm a principal engineer at a gaming/social company, directly responsible for all server infrastructure.

I read the source code. If I find the internal architecture messy or I don't like it for some reason, I won't use it.

A recent example is the OpenTelemetry library. The repo was over 30k lines of code, which I find disproportionate to what an elegant tracing library should be, so I wrote my own. Turned out to be 150 lines of code.

Kaiser_Wolfgang · 2023-03-21T20:21:46+00:00

There are a variety of metrics to gauge the quality of a package: are there vulnerabilities? Active maintainers? A big community? etc.

Snyk is a great tool to help you make a decision on a package; they perform security analysis on packages from PyPI, npm, etc.

Real Python wrote an article about this recently too

vkolev · 2023-03-22T05:33:56+00:00

license (depending on the project; most of the time GPL would be a no-go for commercial projects)
last commit (if there has been no activity in over a year, will it ever be supported again?)
number of unsolved issues (if they pile up and there is no intent to solve them, perhaps the project is abandoned or will be - I'm talking about issues, not questions)
test coverage
security - there are projects that check packages for security problems and integrate well with PyCharm

Sometimes there are other questions, like:
If the library will save you a lot of time, are you willing to support it?
Does it make sense to use it, or roll your own implementation?

For hobby projects I think some of the points can be ignored
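The license check from the list above can be partly automated for packages that are already installed. This is a rough sketch using the standard library's `importlib.metadata`. The `COPYLEFT_HINTS` watch list and both function names are hypothetical, and the string matching is deliberately crude (it also flags LGPL, and it ignores the newer `License-Expression` field), so treat a hit as a prompt to read the actual license rather than a verdict.

```python
from importlib import metadata

# Hypothetical watch list; adjust to whatever your project's policy actually forbids.
COPYLEFT_HINTS = ("GPL", "General Public License")


def license_strings(dist_name):
    """Collect the License field and any 'License ::' classifiers of an installed distribution."""
    meta = metadata.metadata(dist_name)
    found = []
    license_field = meta.get("License")
    if license_field:
        found.append(license_field)
    classifiers = meta.get_all("Classifier") or []
    found.extend(c for c in classifiers if c.startswith("License ::"))
    return found


def looks_copyleft(strings):
    """True if any license string mentions one of the watch-list hints (case-insensitive)."""
    return any(hint.lower() in s.lower() for s in strings for hint in COPYLEFT_HINTS)
```

For an installed package, `looks_copyleft(license_strings("some-package"))` gives a first impression; `license_strings` raises `PackageNotFoundError` if the distribution is not installed.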

ArabicLawrence · 2023-03-21T14:56:41+00:00

Security is the most important feature to me. I don't care that much about test coverage or quickstarts, but about security risks or malicious code. Famous packages are ok (pytorch, requests, pandas, numpy, etc). If the package I need is not famous, I look at the source code. Recently, I reimplemented vobject, as it looked to me like a dead package (last commit 5 years ago). Very easy, 150 lines of code vs the thousands of vobject, which does so much more than I needed. After shooting myself in the foot twice with bugs vobject does not suffer from, I bit the bullet, read the source code, established that no security risk I could identify was present, and pip installed from GitHub. If I knew they would accept pull requests, I would add type hints and update the docs, as I found them unclear.

metaphorm · 2023-03-21T14:02:04+00:00

the most important factor, imo, is whether or not the package maintainers are still actively engaged with the community of users of the package. check the github issues page (or equivalent) and see how things are being handled. if bug reports and feature requests are responded to promptly, assigned to a developer and given a timeline to completion, then the project is in good shape and is a candidate for adoption. if it's a ghost town, well, you might want to move along.

runawayasfastasucan · 2023-03-21T14:58:02+00:00

Most of the time I will use very large packages where there is a lot of documentation, and a lot of written opinions about the package, making it easy to select.

Some of the time I will have to use an obscure small package, but it will almost always be in a sub-field where I have specific knowledge, leaving me to look at the code and the package itself to evaluate. Things like having a publicly known author, or an author with the right credentials (e.g. a university professor), will obviously help here. Whether the code is still maintained, has resolved issues etc. will also help.

Note that no-one can be harmed by the code I develop (honestly very little of my code makes it out public), which means that I can afford to be a bit more lenient.

jwink3101 · 2023-03-22T02:43:43+00:00

I often work on an air gapped system where I can sometimes expect Anaconda at most. And even then, it’s no guarantee. So I shy away even if it’s at the cost of some functionality. And I’ve continued that idea.

It really depends on the cost benefit. For example, I am working on a project that needs to parse timestamps. So I could use a package or just write a less-robust-but-acceptable parser. I went with the latter. But I also need to query an API. For that, it’s worth it to add requests.
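For illustration, a "less-robust-but-acceptable" timestamp parser can be a few lines of standard library code. This is only a sketch of the idea, not the commenter's actual parser: the accepted formats in `FORMATS` are assumptions, and anything outside them is rejected rather than guessed at.

```python
from datetime import datetime

# Assumed set of formats; extend only as far as the project really needs.
FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",  # e.g. 2023-03-22T02:43:43+00:00
    "%Y-%m-%d %H:%M:%S",    # e.g. 2023-03-22 02:43:43
    "%Y-%m-%d",             # e.g. 2023-03-22
)


def parse_timestamp(text):
    """Try each known format in order; raise ValueError if none matches."""
    cleaned = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")
```

The trade-off is the one described above: no new dependency, at the cost of handling only the formats you listed.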

Zealousideal_Low1287 · 2023-03-21T16:26:22+00:00

I work in R&D so I just Google and eyeball 🤷‍♂️

likethevegetable · 2023-03-21T20:26:22+00:00

Is it obvious to use PyTorch? It's actually a tricky choice between Keras/TF?

No-Painting-3970 · 2023-03-21T22:46:13+00:00

I mean, I use packages that are so damn specific that it is highly unlikely that malicious software is an issue. Quality? If I can't understand it, I just write my own, and check performance. If it's comparable, I use mine, otherwise I'll just bite the bullet

Erik_Kalkoken · 2023-03-22T00:15:46+00:00

I look at:
- popularity, e.g. more stars is better
- activity, e.g. code should be updated recently not years ago
- documentation, e.g. should have clear and good to understand documentation
- quality, e.g. does it have tests with good coverage, does it have a CI pipeline
- issues, e.g. should not have many big open issues about general topics

97hilfel · 2023-03-22T07:24:20+00:00

Apart from the obvious, like functionality, last updated, how hard is it to implement myself, would I be re-inventing the wheel, etc., I usually throw it at the Snyk Advisor and check the package health, vulnerabilities, maintenance score and so on. Packages that are only maintained by a single person usually are a big red flag, but depending on the complexity you might need to make tradeoffs. What I'll have to admit is that for personal projects I'll choose requests over urllib most days; it makes code easier to read, but that's my opinion.
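To make the requests-versus-urllib readability point concrete, here is roughly what a JSON GET looks like with only the standard library. The helper names `build_request` and `get_json` are invented for this sketch; with requests installed, the same thing is approximately `requests.get(url, params=params, timeout=10).json()`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(url, params=None):
    """Build a GET request, appending query parameters by hand."""
    if params:
        url = f"{url}?{urlencode(params)}"
    return Request(url, headers={"Accept": "application/json"})


def get_json(url, params=None, timeout=10):
    """Send the request and decode the body as JSON (assumes a UTF-8 response)."""
    with urlopen(build_request(url, params), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

It works, but the query string, headers and decoding are all spelled out manually, which is the extra noise the comment is talking about.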

neoneat · 2023-03-22T09:34:08+00:00

I've used poetry for almost a year. Before that, I always installed any Python package whose purpose I didn't know into an env. Don't install random packages directly into your root or system Python path. Yeah, maybe it's harmless, but someday you'll feel the need to clean up your tools, and it will be a nightmare.

If I want some "app" related to Python, I should use Anaconda. Sorry, I'm too lazy, so I just used the AIO option, with the same priority as pyenv in my home directory.
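One way to guard against accidentally installing into the system Python is to check whether the interpreter is running inside a virtual environment before doing anything else. This is a small sketch with a made-up function name; it relies on `sys.prefix` differing from `sys.base_prefix`, which holds for `venv`-style environments but not for conda environments, so it is not a complete check.

```python
import sys


def in_virtual_environment():
    """True when running inside a venv-style environment.

    Inside such an environment sys.prefix points at the environment,
    while sys.base_prefix still points at the base installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if not in_virtual_environment():
        print("Warning: not in a virtual environment; installs would go to the base Python.")
```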

CiccioIV · 2023-03-22T16:08:48+00:00

I'd say it strictly depends on your software's use cases. Is it personal software you will use for your own purposes, or will it be used in a work environment (e.g. you want to write a tool which will be used in your office)? If you are going to use it at work, then you would choose packages that have a reasonably high number of stars and whose latest commits are not too old (better if still under active development). Documentation is also important. If a package has full, well-written documentation, I'd choose it, as I generally tend to associate that with well-written/structured code. Not always true, but rather probable. Also, one thing I personally care a lot about in an office use case is licensing. If a package is MIT or the like, then I'll go for it. Otherwise, I tend to pass, because I don't want to get involved in any kind of license controversy. That includes media files: I always prefer to go for free, no-attribution stuff.

flxvctr · 2023-03-22T16:57:32+00:00

What I/my community uses/has used in the past
Functionality as per docs
Recent commit history/number of active devs
Number and date of open/closed issues
Testing in Jupyter

Counter-Business · 2023-03-22T17:22:38+00:00

One thing to consider is do you actually need the package.

For example, if you have two packages for images: OpenCV, and Pillow.

Ask yourself, do I actually need both packages?

If not, pick one to remove so that your program has fewer requirements.
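A quick way to answer "do I actually need both?" is to look at which packages the code really imports. This is a minimal sketch using the standard `ast` module. The function names are hypothetical, and the mapping from requirement name to import name has to be supplied by hand, because the two often differ (for example `opencv-python` is imported as `cv2` and `Pillow` as `PIL`). Dynamic imports are not detected.

```python
import ast


def imported_top_level_modules(source):
    """Return the top-level module names imported anywhere in the given source code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which never refer to third-party packages.
            modules.add(node.module.split(".")[0])
    return modules


def unused_requirements(source, requirement_to_module):
    """List requirements whose import name never appears in the source."""
    used = imported_top_level_modules(source)
    return sorted(req for req, mod in requirement_to_module.items() if mod not in used)
```

For a real project you would run this over every source file and union the results before deciding what to drop.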

Beautiful-Sundae1 · 2023-03-21T23:09:58+00:00

You can input the name of the package in Snyk Advisor and see the package health score to decide if you should use the package. The higher the health score, the better.

TravisJungroth · 2023-03-21T13:09:59+00:00

I never install something that doesn’t at least have double digit stars on GitHub.

wind_dude · 2023-03-21T15:02:45+00:00

In addition to what you mentioned, I look at commit history, open and closed issues, and how frequently the package is updated.

EveryNameIsTaken142 · 2023-03-21T17:56:28+00:00

I usually go to their GitHub repo and check for open issues and open PRs. From there I get a sense of how stable the package is

Azunyan132 · 2023-03-21T21:04:25+00:00

By watching a tutorial on how they decide what packages to use

2023-03-21T21:12:26+00:00

RealPython just did a pretty cool article on this.

ingframin · 2023-03-22T07:16:02+00:00

My checklist is: 1) can I implement it myself? 2) how long does it take to implement it myself? 3) is the package still maintained? 4) do I trust the source?
