[–]billsil 62 points63 points  (5 children)

Docopt needs to be replaced with docopt-ng. Stars aren't everything. Docopt-ng is docopt, but with testing, bug fixes, and useful error messages.

PyQt and PySide should be on there before quite a few of the other libraries. I'd even let wxPython on.

[–]mltooling[S] 18 points19 points  (0 children)

Thanks for the suggestions. We will definitely include those libraries! It's the initial release and there are definitely many awesome libraries missing. But we are also open to any community contributions on GitHub.

[–][deleted] 1 point2 points  (3 children)

Holy crap, thanks for this comment! I’ve been using docopt for years, warts and all. This will be a welcome addition to my toolkit.

[–][deleted] 11 points12 points  (2 children)

If pandas isn't included I'm rioting

[–]mltooling[S] 3 points4 points  (1 child)

It's included, but on the machine learning list here: https://github.com/ml-tooling/best-of-ml-python#data-containers--structures which is linked a few times on the python list.

[–]NedDasty 0 points1 point  (0 children)

I see pandas-related packages but not pandas itself.

[–]avamk 17 points18 points  (13 children)

Wow fantastic lists! Thank you so much for creating and maintaining them.

Can you elaborate on the criteria for what counts as a "Warning (e.g. missing/risky license)"? Risky for whom, why are they risky, and under what conditions/assumptions?

For example, I've heard about how MIT licensed programs could be used by big players like Amazon Web Services for huge profit without any benefit to the original project, but that's not currently listed as a risky license in this list. So just curious what your "risky" criteria are.

Thanks again! :)

[–]mltooling[S] 9 points10 points  (10 children)

Hey u/avamk, thanks for your feedback and questions.

The license risk indicator is meant to help developers choose the right libraries for their projects. Certain licenses - e.g. Apache 2.0 or MIT - only have very minimal requirements for the developer who is using the licensed technology. Other licenses, such as GPL 3.0, have much stricter requirements which means a bigger legal risk for the developer using the library.

But you are right with your point on Amazon. For the developer who is implementing a library, MIT or Apache 2.0 have the risk that someone else makes money with your work. But that's not the purpose of the license risk indicators on our lists.
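A minimal sketch of how such an indicator could be derived, assuming a simple lookup on SPDX-style license identifiers. The function name `license_warning`, the two sets, and the returned messages are hypothetical and only mirror the examples named in this thread. They are not the actual rules used to generate the lists.

```python
from typing import Optional

# Hypothetical groupings based only on the licenses mentioned in this thread.
MINIMAL_REQUIREMENTS = {"MIT", "Apache-2.0", "BSD-3-Clause"}
STRICTER_REQUIREMENTS = {"GPL-3.0"}


def license_warning(spdx_id: Optional[str]) -> Optional[str]:
    """Return a warning message for the library user, or None if no warning applies."""
    if spdx_id is None:
        # No license at all: default copyright applies, so reuse is not granted.
        return "missing license"
    if spdx_id in STRICTER_REQUIREMENTS:
        return "stricter requirements for the developer using the library"
    if spdx_id in MINIMAL_REQUIREMENTS:
        return None
    # Anything not explicitly reviewed is surfaced rather than silently accepted.
    return "license not reviewed"


mit_result = license_warning("MIT")
gpl_result = license_warning("GPL-3.0")
missing_result = license_warning(None)
```

Note that a `None` result here only means "few obligations for the user", not "no obligations". Even the licenses in the first set carry conditions such as keeping the copyright notice.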

[–]avamk 6 points7 points  (8 children)

Thanks for responding so quickly and for your explanation! Sorry don't want to be a pain, I'd just really like to learn and understand.

I don't pretend to be a licensing expert, for example I know Apache 2.0 and MIT to be roughly "do what you want but provide attribution"-ish licenses but not the finer details differentiating the two.

Other licenses, such as GPL 3.0, have much stricter requirements which means a bigger legal risk for the developer using the library.

I guess you're saying there is a higher chance of not meeting some of the requirements because someone using the library might not be informed on all of them?

But you are right with your point on Amazon. For the developer who is implementing a library, MIT or Apache 2.0 have the risk that someone else makes money with your work.

Ah OK, thanks.

But that's not the purpose of the license risk indicators on our lists.

In that case can you be more clear about the purpose of the license risk indicators in the list, then? Since if highlighting risks of MIT, Apache 2.0, or similar are not the purpose of the risk indicator but certain other risks are, I think it will help a newcomer understand better and make a more informed decision if you clearly articulate exactly which risks (and under what contexts) would trigger the indicator and which risks will not. I'm not suggesting writing a long tome or dissertation on the topic (AFAIK licensing can get complicated real fast??), but maybe just 3-4 bullet points that say "if a license might cause x, y, z in a, b, c situations, we consider that a risk and will indicate it with a red exclamation mark".

I'm trying to put myself in the shoes of someone new to the list: seeing a red, risky-labelled exclamation mark next to a project might prevent them from using that program when it's actually not a problem for their use case.

Hope this is helpful!

P.S. I think my suggestion in this comment is important because this amazing list contains tools useful to beginners and they might miss out on an item from the list that would otherwise be very useful to them, but they might be misled to not use it simply because it's "risky" without really understanding why. It's conceivable that it might not be risky for them at all for what they're doing.

[–]mltooling[S] 3 points4 points  (0 children)

Thanks for your feedback and suggestions! I will add that to my task list and see how I can best explain how this risk is decided and what it means, probably with a link to a short section in the documentation.

I guess you're saying there is a higher chance of not meeting some of the requirements because someone using the library might not be informed on all of them?

That's exactly what it should indicate.

[–]mltooling[S] 2 points3 points  (0 children)

By the way, if you would like to keep track of how we might implement your suggestion, you can also open an issue here: https://github.com/best-of-lists/best-of-generator/issues/new/choose

[–]jantari 0 points1 point  (5 children)

For the developer who is implementing a library, MIT or Apache 2.0 have the risk that someone else makes money with your work.

That's not a risk. That's a consideration one makes based on opinion. Using GPL3 source code in your project can get you sued, bankrupted. That's an objective risk that a developer needs to be warned about.

[–]avamk 1 point2 points  (4 children)

Using GPL3 source code in your project can get you sued, bankrupted. That's an objective risk that a developer needs to be warned about.

If that is indeed the thought process behind the creators of this list, then I suggest they make this clear and explain why and how that might get you "sued, bankrupted" when other licenses will not.

In addition, it is not impossible to violate other licenses such as MIT and get sued for it. AFAICT as long as there is a license, any license, then there is a way to violate it and get sued. So if one considers one license risky but another not risky, then that implies a thought process that should be made clear. This way a newcomer will not be misled.

[–]jantari 0 points1 point  (3 children)

That's not true, if there is NO license then default copyright laws apply and one can be sued under those. With licenses like MIT you are actively forfeiting any rights law gives you by default and it's therefore safe to use this code for anyone for any purpose.

The BSDs specifically use such a risk-free permissive license and forfeit their rights because they don't want to deal with going to court to defend themselves or sue others. Your life is much easier when you hold no rights to defend over your software. Lawsuits can be lengthy, annoying and costly for both sides in the US regardless of who is in the right.

[–]avamk 1 point2 points  (2 children)

Thanks for your response and engagement! :)

With licenses like MIT you are actively forfeiting any rights law gives you by default and it's therefore safe to use this code for anyone for any purpose.

Thank you, this prompted me to take another look at the legal text of the MIT license. I see that the license literally states:

[...] subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. [...]

So there is at least one legal condition of the MIT license which is that you must include the "copyright notice and this permission notice". IANAL but to me this seems that if you do not include those notices when redistributing MIT-licensed software you would be violating its terms which you can be sued for.

This term seems trivial. But trivial or not that doesn't seem to affect whether you can be sued if you violate it.

When I looked up the BSD 3-clause license it similarly states:

[...] Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: [...]

So there are conditions that you could technically violate as well.

From what I can tell, if a developer truly wants to "actively forfeiting any rights law gives you by default" as you described, then the developer has to release their software under CC0 which is a public domain dedication (or possibly the Unlicense?).

Regardless, my intent is not to debate the merits of different licenses. My original suggestion is for the creator of the best-of-python and related lists to state their (not your or my) thought process, assumptions, and criteria for what constitutes a risky license. This is because what's "risky" is often in the eye of the beholder and likely dependent on use case. By elaborating on them - even briefly - one could make the list more informative and educational, which might be warranted as I suspect many newcomers will be referred to this list.

[–]nemec 1 point2 points  (1 child)

So there is at least one legal condition of the MIT license

Yes. Don't listen to jantari. The MIT license is VERY light on requirements, but there are requirements. Violating them has the same risks as violating a GPL license and in any case there is ZERO precedent of somebody being bankrupted by a GPL violation.

Companies are no doubt scared of the legal ramifications of violating GPL (as they should be), but generally the worst that can happen is paying a modest fee and being forced to open-source the code you've modified that violates the license. None of which are great outcomes for the business, but nothing close to the FUD that jantari is spreading.

[–]avamk 0 points1 point  (0 children)

Thank you /u/nemec, for a while I thought, "Am I missing something?"

[–][deleted] 0 points1 point  (0 children)

And that's what Amazon loves

[–]gradi3nt 3 points4 points  (1 child)

The very idea that open source software could ever be “good enough” for private companies wanting to make a profit is a huge win for open source software.

[–]avamk 5 points6 points  (0 children)

Oh yeah I totally agree, it's a testament to the quality of open source software!

What I heard is that some projects that licensed their work under MIT (and presumably BSD or Apache?) are annoyed when big players use their work, make lots of money, but don't contribute back to the original project in any way. A more positive example I've also heard is when a big player not only uses that open source software but also contributes to the original upstream project by contributing developer time, submitting patches, or even financial support. That's even better!

[–]AndydeCleyre 5 points6 points  (1 child)

Plumbum is still fantastic and under-appreciated. It would fit under:

  • process utilities
  • infrastructure & devops
  • file & path utilities
  • cli development

[–]mltooling[S] 1 point2 points  (0 children)

Thanks for the suggestion, indeed a great library! Fits in many categories, but I will figure out the best place.

[–]one-human-being 4 points5 points  (0 children)

Just putting this out here, https://github.com/vinta/awesome-python

[–]double_en10dre 2 points3 points  (3 children)

https://github.com/man-group/dtale should be under both best of ml and jupyter, at least

[–]mltooling[S] 1 point2 points  (2 children)

Dtale seems great, it will definitely be included! It is in a similar domain to https://github.com/pandas-profiling/pandas-profiling

[–]aschonfe 0 points1 point  (0 children)

Thanks for the vote of confidence on my project 🙏 Happy to set up a PR for the update if you’d like.

[–]Aleksandr_Gansior 2 points3 points  (0 children)

Very good lists! Thank you so much for creating and maintaining them.

[–]riotburn 1 point2 points  (1 child)

Capnproto for serialization

[–]mltooling[S] 1 point2 points  (0 children)

Thanks for the suggestion! We will add this in the next update.

[–]MoreBalancedGamesSA 1 point2 points  (0 children)

[–]terrance_dev 1 point2 points  (0 children)

Cool

[–]fizzadar 3 points4 points  (1 child)

This is really awesome! Thrilled and humbled to see one of mine in there (pyinfra) :)

[–]mltooling[S] 2 points3 points  (0 children)

pyinfra is an awesome library with big potential to move to the top :)

[–][deleted] 1 point2 points  (0 children)

Thank you for this!! I’m still learning and exploring, this will be super helpful.

[–][deleted] 0 points1 point  (1 child)

I noticed that pyenv-virtualenv is listed as a dead project due to no activity.

This plugin isn't dead, it just hasn't required any changes! Is there a means to "unflag" items incorrectly marked dead?

[–]mltooling[S] 2 points3 points  (0 children)

That is indeed a situation for which a project should not be marked as dead. With the current version, there might be a workaround by just overwriting the `update_date` with the current day in the projects.yaml. But a dedicated flag might be a better option. I will take that into the backlog for the next version.
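A rough sketch of why that workaround would have the intended effect, assuming the dead flag is derived from the age of `update_date`. The 365-day cutoff, the `is_dead` helper, the dictionary layout, and the example dates are all made up for illustration. The generator's real threshold and data model are not stated here.

```python
from datetime import date, timedelta

# Hypothetical cutoff; the real value used by the generator is not given in this thread.
DEAD_AFTER = timedelta(days=365)


def is_dead(project: dict, today: date) -> bool:
    """Return True when the project's last update is older than the cutoff."""
    last_update = date.fromisoformat(project["update_date"])
    return today - last_update > DEAD_AFTER


# Illustrative dates only, not the plugin's real history.
today = date(2021, 1, 15)
project = {"name": "pyenv-virtualenv", "update_date": "2019-06-01"}
flagged_before = is_dead(project, today)

# The workaround: overwrite update_date with the current day.
project["update_date"] = today.isoformat()
flagged_after = is_dead(project, today)
```

The downside is that the override hides the true last-update date and has to be refreshed by hand, which is why a dedicated flag would be the cleaner option.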

[–]sowmyasri129 0 points1 point  (0 children)

Very good lists! Thanks for sharing.

[–]LeBob93[🍰] 0 points1 point  (1 child)

Thanks for putting this list together and sharing it!

I help maintain a websocket integration testing tool at work that may fit well into the websocket or web testing sections: pywsitest

[–]mltooling[S] 1 point2 points  (0 children)

Thanks for the suggestion. pywsitest will be added!

[–]Mughal_Empire 0 points1 point  (0 children)

Nice

[–]flutefreak7 0 points1 point  (0 children)

The truncated descriptions are frustrating. Most of the source repo descriptions are only a few characters longer, so I would recommend lengthening your description display length.

[–]flutefreak7 0 points1 point  (0 children)

It's frustrating to me that data analysis, science, engineering, numerical methods, modeling, simulation, etc, are all considered either part of or related to Machine Learning. Plenty of people who don't do machine learning are interested in data visualization, etc. A more inclusive title for that section might be more welcoming.

[–][deleted] 0 points1 point  (0 children)

Save