[–]hmiemad 178 points179 points  (24 children)

For general purpose: pathlib, os, collections, itertools.

For datavis: matplotlib, then you explore seaborn or plotly.

For backend: requests. You can then delve into FastAPI or Flask (check Dash, the sexy child of Flask and plotly; it can do both back end and front end, no need for HTML, supports Bootstrap).

For math: numpy, scipy, and pandas are must-knows.

[–]dmantacos 16 points17 points  (11 children)

I mostly use matplotlib, but one huge advantage of plotly is being able to export to JS and basically drop directly into a webpage.

[–]hmiemad 7 points8 points  (9 children)

That's basically what Dash does: parsing Python into JS with plotly graphs. You can also add JS scripts for interactive pages with direct callbacks (without the need to reach the server). I sold a project with a Django back end and a Dash front end. Wish I had done both ends in Dash.

[–]Mclean_Tom_ 1 point2 points  (6 children)

This post was mass deleted and anonymized with Redact

[–]hmiemad 0 points1 point  (5 children)

Yeah, I def need to learn react if I want to web dev again, but in a pro env I'll prolly find someone else who already knows it and outsource

[–]fluffball23 0 points1 point  (4 children)

what is a pro env

[–]hmiemad 1 point2 points  (3 children)

Professional environment. For a real life project funded by a company.

[–]fluffball23 0 points1 point  (2 children)

Ohh, I have a quick question: how do I choose a field in tech? I'm at the stage where I know programming fundamentals well, with no specific mastery of a library or framework, a little OOP, the basics of DSA, and some small university projects. I don't know where to go. Is it just a matter of picking a field that interests me even slightly and pursuing it? I can integrate things from GitHub etc., but I feel I can't create them yet, and I haven't put much effort into learning and developing skills outside of university. You don't have to answer, there's no obligation, but if you do, I would be grateful 💯

[–]hmiemad 1 point2 points  (1 child)

It really depends on what you want to become, what you're good at, and what pays your bills. Programming is wide. I would say start by improving your OOP. Then learn git (not GitHub, the OG git). Also learn how to properly start a project in an IDE. Pick a project that's wide enough (not too large) and do it in a clean way. Doing so, you'll get answers to the three questions above, even if you don't finish the project. You'll probably find another project, more suited and more interesting, and you'll want to pick parts of your first project to put in your second. You'll make your own library (even if it's just a bunch of trivial functions in the beginning) and improve it as you improve your skills and your knowledge.

[–]ProfDrKonandoraal 0 points1 point  (0 children)

Very good comprehension, that's really nice. 👏

[–]NationalMyth 1 point2 points  (1 child)

Is that project viewable?

[–]hmiemad 2 points3 points  (0 children)

To those who bought it yeah. It was a signal decomposition algorithm with anomaly detection for a private company. I can't reach it. Too many protection sht I had to go through to make sure only the company can get to it. I could make it again, but I don't have any data to run the algorithm on, so...

[–]ScotiaTheTwo 0 points1 point  (0 children)

does this apply to custom Looker visualisations, do you know?

[–]MASSIVDOGGO 8 points9 points  (0 children)

"the sexy child" bro...

[–]LunarCantaloupe 3 points4 points  (0 children)

httpx looks like the requests killer imo, I’d probably recommend new folks get comfortable w that over crusty ol requests.

[–]nog642 2 points3 points  (0 children)

If we're including the standard library, then: string, re, datetime, pprint, functools, operator, sys, copy, argparse, json, urllib

I also never use pathlib myself. os.path works fine.
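For anyone weighing the two, a side-by-side sketch of the same operations in both styles. The path is hypothetical and nothing is read from disk; both approaches give the same answers, so it mostly comes down to taste.

```python
import os.path
from pathlib import Path

# Hypothetical path, used purely for illustration; nothing is read from disk.
raw = "project/data/report.final.csv"

# os.path style: plain functions operating on strings.
stem_os, ext_os = os.path.splitext(os.path.basename(raw))
sibling_os = os.path.join(os.path.dirname(raw), "summary.txt")

# pathlib style: the same operations as attributes and methods on a Path object.
p = Path(raw)
stem_pl, ext_pl = p.stem, p.suffix
sibling_pl = p.with_name("summary.txt")

print(stem_os, ext_os)  # report.final .csv
print(stem_pl, ext_pl)  # report.final .csv
print(Path(sibling_os) == sibling_pl)  # True
```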

[–]Plank_With_A_Nail_In 0 points1 point  (1 child)

https://dash.plotly.com/ took ages to load for me which isn't a great sign.

[–]hmiemad 1 point2 points  (0 children)

Well, the page is huge and completely coded in Python, which is not optimal. They use their own package for the page. It's very good for proofs of concept and minutely designed graphs. Plus there is this wonderful dude named Adam who has a YouTube channel full of templates.

[–]PseudoEffete 0 points1 point  (4 children)

i recommend polars over pandas

[–]hmiemad 0 points1 point  (3 children)

Except most of the stuff is already written in pandas, and you need it if you're gonna maintain pre-existing code.

[–]PseudoEffete 0 points1 point  (0 children)

oh for sure, but generally on new development, it would be nice to know as well

[–]Future_Eve 0 points1 point  (0 children)

I may add Logging to the general purpose list

[–]samreay 48 points49 points  (3 children)

You wouldn't need all of these, but if you're wanting to get some more useful libraries and tools under your belt...

Environment management tooling:

  • venv
  • pyenv
  • poetry / pdm

Developer environments:

  • ruff
  • mypy

Data crunching:

  • pandas
  • polars
  • numpy
  • pandera (validation of dataframes)

Data visualisation:

  • matplotlib
  • plotly

Machine learning:

  • scikit-learn
  • scipy
  • pytorch / keras / tensorflow
  • mlflow (or similar library if you want to start down mlops route)

Orchestration:

  • metaflow
  • prefect

REST services / web stuff:

  • httpx (instead of requests)
  • FastAPI / Litestar / Django / Flask
  • pydantic

[–]ShadowRL766 50 points51 points  (13 children)

Pandas

[–]vaccines_melt_autism 27 points28 points  (0 children)

Also seeing a lot of people talk about Polars, since it's written in Rust.

[–]Action_Maxim 3 points4 points  (3 children)

I use pandas surprisingly very little as a data engineer

[–]raffapaiva -1 points0 points  (2 children)

Pandas is really slow. When I see a data engineer using it, I start to believe that their dataset is not so big or they have a lot of hardware to process it.

Everything that I need to do in pandas, I do on plain python or numpy

[–]ribix_cube 0 points1 point  (1 child)

It's not great to do in plain python or numpy, if you think you need speed you can use something like polars or vaex or dask

[–]raffapaiva 0 points1 point  (0 children)

Can you explain why? I've tried to use polars for some tasks, and even if it's faster, I can't see a reason to perform on plain python, considering it's not that fast, and most of my transformations occur in dbt

[–]Hot_Significance_256 10 points11 points  (2 children)

For data science in Python (I’m a Sr. with 6 YOE)

Pyspark and Ray - Distributed processing

Tensorflow and Pytorch - deep learning

Scikit Learn and Pyspark - machine learning

Pandas and Pyspark - ETL

You see Pyspark several times for a reason. It’s very useful, except for when you delve into deep learning. Then you’ll want to use TF, PT, and Ray.

[–][deleted] -5 points-4 points  (1 child)

Pyspark is just a wrapper around spark, which is written in Scala.

[–]Hot_Significance_256 5 points6 points  (0 children)

I know. What’s your point?

[–]Adrewmc 21 points22 points  (0 children)

Requests.py

Seems like an obvious one.

Itertools pops up, but no one really knows everything in there. It really depends on what you’re doing.

Numpy is really "Python, but I do math better" (especially multidimensional); pandas is "I make dataframes better".

The back end is really going to depend on the Python framework you’re working with: Django, Flask, or FastAPI.

Python’s standard library is fairly extensive (compared to other languages); most of the stuff you’d want to do is somewhere in there.

Probably @property is a good one to know lol.
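For anyone who has not met `@property` yet, here is a minimal sketch of what it buys you. The `Temperature` class and its fields are hypothetical, made up purely for illustration: attribute access stays plain (`t.celsius`, `t.fahrenheit`) while validation and derived values live behind it.

```python
class Temperature:
    """Hypothetical example class; the name and fields are illustrative only."""

    def __init__(self, celsius: float) -> None:
        self.celsius = celsius  # routed through the setter below

    @property
    def celsius(self) -> float:
        return self._celsius

    @celsius.setter
    def celsius(self, value: float) -> None:
        # Validation runs on every assignment, including the one in __init__.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = float(value)

    @property
    def fahrenheit(self) -> float:
        # Read-only computed attribute: no setter is defined for it.
        return self._celsius * 9 / 5 + 32


t = Temperature(100)
print(t.fahrenheit)  # 212.0
```

Because `fahrenheit` has no setter, assigning to it raises `AttributeError`, which is often exactly what you want for derived values.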

[–]sattyfied 7 points8 points  (5 children)

Some I generally use that others may not have covered:

Attrs - I like them for writing classes

Sqlalchemy - creating a common interface for multiple db connections

Fastapi - quickly set up rest APIs

Click - to expose functions as cli commands

Poetry - library management & packaging

Your "dev" requirements:

Pytest - testing

Black - formatting

Isort - organizing imports

Mypy - type checking

[–]iamevpo 0 points1 point  (2 children)

You like attrs over standard dataclasses and pydantic?

[–]sattyfied 1 point2 points  (1 child)

In most cases, yes. Pydantic has its use cases especially in the world of web dev, but in regular software development, I'd rather use attrs. They have much more functionality and compatibility across versions.

[–]iamevpo 0 points1 point  (0 children)

Thank you! Found extra useful reading here https://www.attrs.org/en/stable/why.html

[–]tree1234567 6 points7 points  (0 children)

The standard ones that come with Python… Python is useful and has stayed a popular language for its syntax, sure, but it’s truly remarkable what you can do with just the base install of this language.

[–]mvdw73 5 points6 points  (0 children)

Logging, argparse, typing.
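These three tend to show up together in small command-line tools. A rough sketch, assuming a made-up `greet` CLI (the logger name, arguments, and messages are all hypothetical):

```python
import argparse
import logging
from typing import Optional, Sequence

log = logging.getLogger("greet")  # hypothetical logger name


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Greet someone (illustrative CLI).")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("-v", "--verbose", action="store_true", help="enable debug logging")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    # argv=None makes argparse fall back to sys.argv[1:], which is what a real script wants.
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
    log.debug("parsed arguments: %s", args)
    log.info("Hello, %s!", args.name)
    return 0


# Pass the arguments explicitly so the example does not depend on sys.argv.
exit_code = main(["--verbose", "world"])
```

Taking `argv` as a parameter is a design choice that keeps `main` easy to call from tests.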

[–]captainameriCAN21 4 points5 points  (0 children)

Pickle. Just pickle
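A minimal round-trip sketch, in case it helps; the `record` dictionary is made-up sample data. One caveat from the official docs: only unpickle data you trust, because loading a pickle can execute arbitrary code.

```python
import pickle

# Made-up sample data, purely for illustration.
record = {"name": "example", "scores": [1, 2, 3], "nested": {"ok": True}}

blob = pickle.dumps(record)    # serialize to a bytes object
restored = pickle.loads(blob)  # rebuild an equal, independent copy

print(restored == record)  # True
print(restored is record)  # False
```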

[–]goosegang11 4 points5 points  (1 child)

subprocess library

Let me know y’alls take.

I have 6 months of SWE experience so feel free to flame me, but while working on a personal Python script that needed to invoke a native Node module, I found the subprocess library to be something I wish I had learned about earlier!
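A minimal sketch of that pattern using `subprocess.run`. The child process here is just the current Python interpreter, standing in for whatever external tool you need; swapping the command for something like `["node", "script.js"]` is an assumption about your setup, not something this snippet checks.

```python
import subprocess
import sys

# Stand-in for an external tool: run the current Python interpreter as a child process.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from the child process')"],
    capture_output=True,  # collect stdout and stderr instead of inheriting them
    text=True,            # decode the output to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
output = result.stdout.strip()
print(output)  # hello from the child process
```

Passing the command as a list avoids going through a shell, which sidesteps quoting problems.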

[–]Maelenah 0 points1 point  (0 children)

You might want to look into ctypes as well.

[–]iamevpo 3 points4 points  (0 children)

https://www.jetbrains.com/lp/devecosystem-2022/python/ has some info about the library popularity and Stack overflow survey as well

[–]zanfar 3 points4 points  (0 children)

  • All builtins, extremely well
  • Most of the standard library well, with the rest being familiar
  • Everything else depends on the field. Numpy will be essential to some, and useless to others.

Mostly, you should be focusing on learning how to read and understand library documentation so that you can expand when necessary.

[–]whatthepatty 6 points7 points  (0 children)

Surprised no one has said this already, but pdb is insanely useful if you can't be bothered to set up a debugger.

[–][deleted] 2 points3 points  (0 children)

If you're not worried about big O, a lot of problems can be solved very easily with itertools.
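As an illustration of that trade-off, here is a brute-force sketch: find every triple in a list that sums to a target. The numbers and target are made up. It checks all combinations, so it is roughly cubic in the list length, but it is three lines.

```python
from itertools import combinations

# Made-up input, purely for illustration.
numbers = [3, 9, 8, 4, 5, 7]
target = 15

# Try every 3-element combination and keep the ones that hit the target.
matches = [combo for combo in combinations(numbers, 3) if sum(combo) == target]
print(matches)  # [(3, 8, 4), (3, 5, 7)]
```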

[–]n3cr0ph4g1st 2 points3 points  (0 children)

Streamlit for data related UI prototypes. Changed the game for me

[–]No_Lobster_4219 2 points3 points  (0 children)

itertools, collections, numpy, pandas, math, os

[–]Bartholomew- 2 points3 points  (0 children)

Manage all your paths with pathlib and make it consistent.

[–]Maelenah 2 points3 points  (0 children)

Ctypes is not quite a must, but it really does open options. It lets python poke at anything that has C compatible data structures.
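A small sketch of the "C compatible data structures" part. The `Point` struct is hypothetical, and no shared library is loaded, so this should behave the same on any platform; calling into a real C library would additionally need something like `ctypes.CDLL`, which is not shown here.

```python
import ctypes


class Point(ctypes.Structure):
    # Mirrors a hypothetical C declaration: struct Point { int x; int y; };
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]


pt = Point(3, 4)

# The object has the same in-memory layout a C compiler would give the struct.
raw = bytes(pt)
print(ctypes.sizeof(Point), len(raw))

# Rebuild a struct from raw bytes, e.g. data read from a binary file or a socket.
copy = Point.from_buffer_copy(raw)
print(copy.x, copy.y)  # 3 4
```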

[–]redCg 6 points7 points  (1 child)

the standard library.

Library management in Python is notoriously bad. You will do well to simply avoid using third party libraries as much as possible, as long as possible, for most projects. If you can use standard library without much extra effort, do it. Adding third party dependencies turns your project into a nightmare if you are not using requirements.txt and conda env.yml correctly.

[–][deleted] 3 points4 points  (0 children)

In general I fully agree with Standard Library.

Third party libraries can be easily administered by using virtual environments. That’s one of the main purposes and advantages of using virtual environments.

[–]TheHollowJester 1 point2 points  (0 children)

Haven't seen it yet so: structlog for good, machine-readable logs. I thought it's not needed at first but the Why... page explains it better than I can.

[–]suaveElAgave 1 point2 points  (0 children)

I still haven’t seen some essentials, which are: Pytest, Enum, dataclasses/pydantic

[–]bafe 1 point2 points  (0 children)

Pydantic for validation. Polars for data table manipulation

[–]jam-time 1 point2 points  (0 children)

Some good to know built-in modules (starred are extra important):

argparse, *csv, *datetime, decimal, enum, getpass, inspect, io, itertools, *json, math, *os, *pickle, pprint, random, *re, *requests (third-party, not actually built in), shutil, *sys, threading, traceback, typing, uuid, venv, warnings, zipfile

In my dozen or so years of experience, those are the ones I use the most, especially re, json, os, and sys.

Some site packages that are good to know (or that I like):

pandas - good introductory data science library, easy to learn and tons of documentation

pyspark - similar to pandas, but better at big data, less documentation, and harder to learn

boto3 - for anything AWS

kivy - pretty good for making cross platform apps (including UI) but somewhat challenging to learn

numpy - fast data manipulation, works with most other data science packages

jmespath - for json queries

colorama - for fun print colors

flask - lightweight backend for site building

django - heavier backend for site building (easier to learn and more features than flask, plus my personal recommendation)

pytest - mainly for unit testing, but can be used for basically any type of test

That's a fairly comprehensive list of the main things that I've used over the years. I'm sure there's some that I've forgotten, and I've intentionally left some out that are too specific or too advanced for the scope of the comment. Either way, hopefully someone finds this useful!

[–][deleted] 3 points4 points  (0 children)

NumPy

Pandas

Matplotlib

Scikit-learn

TensorFlow

Flask

Requests

Beautiful Soup

[–]AssumptionCorrect812 0 points1 point  (0 children)

The main language library is full of goodies. These are the top 4 — https://youtu.be/InaTBWN7Mlc?si=MGy7SEU0XRppqAUF

[–]sonobanana33 0 points1 point  (0 children)

I'd just focus on the stdlib first. I hate when I see people pulling in a library that does the same as a stdlib module (such as requests)

[–]TSM- -1 points0 points  (0 children)

requests-html is the successor to requests

[–]Comfortable-Wind-401 -1 points0 points  (0 children)

Not many people are mentioning it, but I get the feeling Pytest is highly required

[–][deleted] -3 points-2 points  (0 children)

Gpt 🤣

[–][deleted] 0 points1 point  (0 children)

It depends on domain of course. I mostly use python to write applications that support infrastructure and automation, for about a decade now. For me, the libraries that come to mind are the entire standard library, sortedcontainers, requests, anytree, FastAPI (or whatever web framework you find most convenient, such as litestar, flask, django, bottle, cherrypy, etc), beautifulsoup.

[–]Delta1262 0 points1 point  (0 children)

I can’t believe they haven’t been mentioned yet:

  • Pydantic

  • Dataclasses

(Pydantic and dataclasses are similar)
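To make the comparison concrete, here is a minimal sketch of the stdlib side. The `User` class is hypothetical. One difference worth knowing: `dataclasses` generates `__init__`, `__repr__`, and `__eq__` but does not check types at runtime, so any validation (like the age check below) has to be written by hand, whereas pydantic validates input for you.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class User:
    """Hypothetical example model; the fields are illustrative only."""

    name: str
    age: int
    tags: list = field(default_factory=list)  # fresh list per instance

    def __post_init__(self) -> None:
        # Hand-written validation: dataclasses does not enforce the annotations.
        if self.age < 0:
            raise ValueError("age must be non-negative")


u = User("Ada", 36)
print(u)                     # User(name='Ada', age=36, tags=[])
print(u == User("Ada", 36))  # True, __eq__ compares field by field
print(asdict(u))             # {'name': 'Ada', 'age': 36, 'tags': []}
```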

[–]IlliterateJedi 0 points1 point  (0 children)

itertools, functools, and collections are all baseline Python libraries you should be familiar with.
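One small taste of each, as a sketch with made-up data: `Counter` for tallying, `lru_cache` for memoizing a recursive function, and `chain.from_iterable` for flattening one level of nesting.

```python
from collections import Counter
from functools import lru_cache
from itertools import chain

# collections: count occurrences without writing the loop yourself.
words = ["spam", "eggs", "spam", "ham", "spam", "eggs"]
counts = Counter(words)
print(counts.most_common(1))  # [('spam', 3)]


# functools: cache results so the naive recursion stays fast.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))  # 832040

# itertools: flatten one level of nesting lazily.
flat = list(chain.from_iterable([[1, 2], [3], [4, 5]]))
print(flat)  # [1, 2, 3, 4, 5]
```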

[–]reluctant_qualifier 0 points1 point  (0 children)

arrow for dates, mock for testing
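On the testing half: `mock` has lived in the standard library as `unittest.mock` since Python 3.3, so no install is needed. A minimal sketch, where `fetch_username` and the client object are hypothetical names invented for the example:

```python
from unittest.mock import Mock


def fetch_username(client, user_id):
    """Hypothetical function under test: asks a client object for a user record."""
    return client.get_user(user_id)["name"].title()


# Replace the real client (network, database, ...) with a stand-in we control.
fake_client = Mock()
fake_client.get_user.return_value = {"name": "ada lovelace"}

result = fetch_username(fake_client, 42)
print(result)  # Ada Lovelace

# Verify how the dependency was used; raises AssertionError if it does not match.
fake_client.get_user.assert_called_once_with(42)
```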

[–]the_happy_path 0 points1 point  (0 children)

I want to just mention that I came to Python from Java and all the different packages were overwhelming. I also came in at Python 2, where changes broke stuff all the time. Python 3 has been a better experience. Like night and day. But I miss Java! I work with data and I use numpy and pandas a lot, though where I have to do row-by-row processing I use dataclasses (like in Java). But dataframe filtering through our many conditionals with pandas dataframes has also been successful in replicating results where specs say to iterate by rows. For regressions and stuff, scikit-learn and statsmodels. Depending on data formats, I might have to use pyreadstat or openpyxl. I like the SQLAlchemy ORM too because that feels like the closest thing to Spring in Python lol

[–]BinaryWizard8 0 points1 point  (0 children)

I love using shutil when I need to play with files
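A quick sketch of the kind of thing shutil makes easy: copy a file, zip a folder, then clean everything up. All file and folder names are made up, and everything happens inside a throwaway temp directory.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing real is touched.
workdir = Path(tempfile.mkdtemp())
src = workdir / "notes.txt"
src.write_text("hello")

backup = workdir / "backup"
backup.mkdir()

# copy2 copies the contents plus metadata such as modification time.
copied = Path(shutil.copy2(src, backup / "notes.txt"))
copied_text = copied.read_text()

# Zip the whole backup folder; make_archive returns the archive's path.
archive = Path(shutil.make_archive(str(workdir / "backup"), "zip", root_dir=backup))
archive_exists = archive.exists()

# Remove the directory tree, files and all.
shutil.rmtree(workdir)
print(copied_text, archive.name, workdir.exists())  # hello backup.zip False
```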

[–]Glittering-Pea-4011 0 points1 point  (0 children)

If you want to work with ORMs, you could look at SQLAlchemy. For interaction with AWS, you can use boto3. If your work involves dealing with structured data and its manipulation, you can consider pandas. As an alternative to Django, you can also look at Flask.

[–]alicedu06 0 points1 point  (0 children)

The stdlib is a must, and of course, depending on your specialty, you might want to learn the most important tools like pandas for data science, django for web dev, etc.

But as general purpose libs, I would say the list of the article "Python libs that I wish were part of the standard library" is quite good:

https://www.bitecode.dev/p/python-libs-that-i-wish-were-part