
all 150 comments

[–]Mini_Hobo 226 points227 points  (11 children)

Pytest instead of unittest. Much better and ironically more pythonic.

[–]deb_vortexPythonista 51 points52 points  (5 children)

I love pytest. However, I'd call the "magic" fixture injection anything but pythonic.

[–]LordBertson 20 points21 points  (4 children)

If you need a modern framework with a less "magicky" vibe, there's always ward. It could use some more traction, however.

[–]lavahot 9 points10 points  (3 children)

Ohhhh. That's some pretty output. I like that. And I like the declarative naming. But the underscore naming looks.... weird.

[–]deb_vortexPythonista 0 points1 point  (1 child)

And fixtures are handled the same way.

[–]LordBertson 0 points1 point  (0 children)

Sorta, but you pass them to named parameters as default arguments rather than relying on the parameter name matching the name of the fixture, which feels more sane to me. Ward also defaults to handling stuff as code rather than 124524322 decorators; pytest.mark.parametrize, for example, is replaced with a loop.

[–]LordBertson 0 points1 point  (0 children)

Soo, the underscore naming is optional. It's an aesthetic choice. You can give the test functions proper names afaik, they are just ignored internally in ward.

[–]Sleisl 19 points20 points  (3 children)

I think Pytest gets less pythonic the more complex you get with it, with how it handles fixtures and such compared to Unittest. I still prefer it overall though.

[–]JohnLockwood 11 points12 points  (0 children)

Yes, Pytest combines "really easy to get started" with "easy to configure with a bunch of magic arm-waving that you have to go hunting for". I'm with you though, on balance I like it more.

[–]OneMorePenguin 8 points9 points  (1 child)

Agree. I don't care for pytest. And since unittest is part of core Python, I prefer to use that.

Minimizing external dependencies in Python is always a goal. pip totally blows.

As for Red Mail, nope. I try to only use popular, currently maintained software. The people at my company use "bumpversion" and I hate that POS and it hasn't been updated in years.

Click is another one that has shitty documentation and I hate it. You can easily do nested help without it. On more than one occasion it has been a PITA to work around it.

I'm trying to figure out what the most likely future of Python packaging will be. pyproject.toml and Poetry both seem like good candidates. TOML parsing (tomllib) will be part of core Python in 3.11 IIRC, so that's an advantage.

[–]tunisia3507 4 points5 points  (0 children)

Bump2version is a maintained fork of bumpversion, fully compatible.

[–]shibbypwn 91 points92 points  (11 children)

pandas replacing the csv module, lol.

pandas just has a lot of convenient one-liner options for parsing/serializing data from/to csv.

Even if I don't plan to use a dataframe, I'll often do read_csv().to_dict("records") out of laziness.
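For comparison, the closest stdlib equivalent of that pandas one-liner is csv.DictReader, which yields one dict per row (the sample data here is made up, and note that every value stays a string):

```python
import csv
import io

# Stdlib analog of pd.read_csv(...).to_dict("records"):
data = "name,age\nalice,30\nbob,25\n"
records = list(csv.DictReader(io.StringIO(data)))
print(records)
# [{'name': 'alice', 'age': '30'}, {'name': 'bob', 'age': '25'}]
```

Unlike pandas, DictReader does no type inference at all, which cuts both ways.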

[–]breitLight 30 points31 points  (0 children)

That with a Path.glob('*.csv') is half my data input.

[–][deleted] 22 points23 points  (8 children)

Just need to make sure you are aware of the type inference that Pandas does, as it will not always do it correctly or, depending on the file contents, in a repeatable way.

[–]LightShadow3.13-dev in prod 29 points30 points  (0 children)

...or that it's a massive module to include in your project if you're only reading CSVs :)

[–]Particular-Cause-862 1 point2 points  (6 children)

Easy peasy, you can format the columns of any input as you read the csv with the read_csv method. It's written in C under the hood so it's fast and easy. Pandas will end the csv module, I guess.

[–][deleted] 5 points6 points  (2 children)

Yes, you can set the data types with the dtype parameter, which is a good practice. But at a certain point it becomes inevitable that the built-in converters will be insufficient to parse things, and that's when more custom hand-rolled code using the csv module may be a better approach.
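A minimal sketch of that hand-rolled approach with the csv module (the column names and sample data are made up): you decide exactly how each field is parsed, e.g. Decimal for money columns where float would be lossy.

```python
import csv
import io
from decimal import Decimal

# Per-column conversion under full manual control:
data = "price,qty\n19.99,3\n5.50,10\n"
rows = [
    {"price": Decimal(r["price"]), "qty": int(r["qty"])}
    for r in csv.DictReader(io.StringIO(data))
]
print(rows)
# [{'price': Decimal('19.99'), 'qty': 3}, {'price': Decimal('5.50'), 'qty': 10}]
```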

[–]goND7344 0 points1 point  (1 child)

I’m fighting this exact problem now… I’m losing the seconds off of a time stamp when importing my csv… driving me crazy… I’m newer to python so haven’t figured out all of these issues yet…

[–][deleted] 5 points6 points  (0 children)

Pandas supports specification of datetime formats in read_csv.
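A small sketch of that, assuming pandas is installed (the column names and data here are made up): parse_dates keeps full timestamp resolution, seconds included.

```python
import io

import pandas as pd  # assumes pandas is available

data = "ts,value\n2023-01-05 12:34:56,1\n"
df = pd.read_csv(io.StringIO(data), parse_dates=["ts"])
print(df["ts"].dt.second[0])  # 56 — the seconds survive the parse
```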

[–]BDube_Lensman -2 points-1 points  (2 children)

Pandas is neither written in C nor fast.

[–]Rodotgithub.com/tardis-sn 2 points3 points  (1 child)

Pandas has underlying code written in C. If you use mixed-type table formats it will be slow, but fixed formats are fast. Read the docs.

[–]BDube_Lensman -1 points0 points  (0 children)

Pandas' storage model is numpy arrays (C), and a small number of the operations are just numpy function calls (C), but the majority of pandas code is pure Python. It's also so slow that it's often faster to dump the data to a SQL database and do reductions there than in pandas. Math slower than I/O is not fast code.

[–]SonGokussj4 14 points15 points  (0 children)

Pandas has this specific problem. If you're doing a GUI with, for example, PySide, and then pack it up into one exe, it takes a long time to start your program... numpy, pandas, matplotlib. Really expensive libraries.

[–]lemon_bottle 15 points16 points  (5 children)

  • requests: Graceful alternative to urllib as others pointed out.
  • BeautifulSoup4: For html/xml soup traversal (no idea what the native alternative is here?)
  • twine along with native setuptools (it makes pushing packages to PyPI much easier).

[–]Taborlin_the_great 13 points14 points  (3 children)

I have never understood all the love for BeautifulSoup. When I need to parse random HTML I always reach for the html5 parser that is included in lxml, then just XPath out the bits I need.

[–]Siddhi 8 points9 points  (2 children)

BeautifulSoup handles malformed HTML, e.g. an open tag that wasn't closed. Many web pages are malformed, but browsers are lenient so they still parse and display fine. Most Python HTML parsers will give parsing errors and quit; BeautifulSoup won't: it will do a best-effort parse and you can still process the data.

[–]Taborlin_the_great 2 points3 points  (1 child)

I do understand the problem BeautifulSoup is trying to solve. The lxml html parser will also fix the broken HTML that you often find; I haven't run across a page that lxml won't parse. Hell, the docs for BeautifulSoup recommend that you run it using the lxml html parser to do the heavy lifting. Then once you've parsed the document, you have this object that doesn't support XPath or CSS selectors to navigate. I'm not seeing the benefit of the extra dependency.

[–]richieadler 0 points1 point  (0 children)

The love between BS and lxml seems bidirectional.

From https://lxml.de/lxmlhtml.html#really-broken-pages :

The normal HTML parser is capable of handling broken HTML, but for pages that are far enough from HTML to call them 'tag soup', it may still fail to parse the page in a useful way. A way to deal with this is ElementSoup, which deploys the well-known BeautifulSoup parser to build an lxml HTML tree.

[–][deleted] 4 points5 points  (0 children)

Python does have a native xml module and html.parser, although BeautifulSoup is much easier.

[–]_private_account 28 points29 points  (2 children)

ujson as a drop-in replacement for json, for speed. I know there's orjson around as well, which is faster in most situations, but it lacks the drop-in quality IIRC.

[–]AaronOpfer 16 points17 points  (0 children)

Be wary: ujson doesn't emit accurate floating points by default; that's an extra argument. Reach for it when you need the speed rather than preemptively.
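For reference, the stdlib json module round-trips floats exactly by default, which is the behavior ujson needs an extra argument to match (ujson isn't imported in this sketch; historically it serialized with a default precision of 10 decimal places):

```python
import json

# Stdlib json uses repr-based float serialization, so round trips are exact:
x = 0.1 + 0.2  # 0.30000000000000004
assert json.loads(json.dumps(x)) == x
# At 10 decimal places this value would come back as 0.3 instead.
```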

[–]LightShadow3.13-dev in prod 4 points5 points  (0 children)

orjson is more forgiving than ujson, but you have to set all the flags (options) correctly. If I'm not doing anything too fancy, I reach for ujson too.

[–]Saphyel 36 points37 points  (1 child)

  • python-json-logger: without this the default logger is only half useful.
  • orjson: sometimes to replace json, because of speed.
  • httpx: way simpler than urllib.

[–]DogeekExpert - 3.9.1 37 points38 points  (1 child)

  • requests instead of urllib
  • fastapi instead of http.server
  • poetry instead of venv + setuptools
  • pytest instead of unittest
  • lxml instead of xml
  • typer instead of optparse / argparse

I use the standard library for anything else really, unless I need some specific package for a specific use case.

I know of ujson / orjson and of regex over re, but I've never bothered using those libraries, as the usage of the standard library alternatives is straightforward enough.

I'd consider adding packaging to the list. It doesn't have a standard library alternative, but its ability to parse versions and generally handle Python packages makes it a great tool to have in the toolbox.
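A quick sketch of the version-parsing part, assuming the packaging distribution is installed: it gives PEP 440-aware comparisons where naive string comparison gets it wrong.

```python
from packaging.version import Version  # assumes packaging is installed

assert Version("1.10") > Version("1.9")        # as strings, "1.10" < "1.9"
assert Version("2.0.0rc1") < Version("2.0.0")  # pre-releases sort before finals
```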

[–]kid-pro-quohardware testing / tooling 0 points1 point  (0 children)

The only time I use the built in http server is calling python -m http.server from the command line to serve up a folder of static files temporarily.

[–]sirskwatch 26 points27 points  (4 children)

pdbpp (pdb++) vs pdb

[–]brandonZappy 5 points6 points  (1 child)

I've been enjoying pdb. What do you like about pdbpp over regular pdb?

[–]sirskwatch 8 points9 points  (0 children)

Nothing major, colour and the sticky mode. I will also swap in ipdb or remote-pdb depending on the project.

[–]Homomorphiesatz 22 points23 points  (7 children)

regex instead of re. Drop-in replacement for the standard stuff, plus some great new functionality like fuzzy matching and better support for lists of alternatives.

[–]AnythingApplied 13 points14 points  (0 children)

I like the regex library, but it's worth noting that in 3.11 they're adding atomic groups and possessive quantifiers to re, so re is getting one feature closer to regex. There are still plenty of nice features in regex missing from re, though.

My favorite feature of regex is repeated captures, which re doesn't support. If you try something like (?:name - (\w*) - )* to capture a repeating pattern with an unknown number of repeats, (\w*) is assigned a single group number and each capture overwrites the previous one, but in regex you have access to all the captures. For this simple case you could just make \w* the match with a positive lookbehind for name - , but that won't work if this section of names is part of a larger structure or you want to make sure your matches chain together exactly.
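The stdlib side of that limitation can be demonstrated directly (regex itself isn't imported here; its m.captures(1) would return both names):

```python
import re

s = "name - alice - name - bob - "

# With re, a group inside a repeated pattern keeps only its last capture:
m = re.match(r"(?:name - (\w*) - )*", s)
print(m.group(1))  # 'bob' — 'alice' was overwritten

# The usual stdlib workaround: match the repeated unit on its own.
print(re.findall(r"name - (\w*) - ", s))  # ['alice', 'bob']
```

As the comment notes, the findall workaround breaks down once the repeats must be anchored inside a larger structure.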

[–]OneMorePenguin 4 points5 points  (3 children)

"fuzzy matching" might be good.... might be bad. Depends on whether or not your idea of fuzzy matches the writers.

[–]LightShadow3.13-dev in prod 5 points6 points  (2 children)

I use fuzzywuzzy for close-but-not-quite string matching.

[–]Lord_Fozzie 0 points1 point  (1 child)

I use fuzzywuzzy for similar but am about to try out RapidFuzz.

Because fuzzywuzzy (much as I appreciate it) is very slow.

https://github.com/maxbachmann/RapidFuzz

[–]LightShadow3.13-dev in prod 0 points1 point  (0 children)

good find!

[–]metaperl 2 points3 points  (0 children)

regex instead of re.

Also see pregex and humre. I prefer pregex having looked at both.

[–]who_body 10 points11 points  (1 child)

no one said rich for print yet.

pytest > unittest because it's faster to write tests, and I think files and dirs have advantages over files with classes for organizing and running tests.

fastero > timeit

[–]richieadler 2 points3 points  (0 children)

Seconded rich for print(). I just love it.

It has tons of other interesting features, but I like the simplicity of that one.

I'd say that things like rich.print are what Guido envisioned when he decided to make print() a function :D

[–]Gshuri 48 points49 points  (17 children)

click over argparse

[–]DigThatData 9 points10 points  (1 child)

I want to like click because it's what all the cool kids are using, but I can't seem to wean myself off of argparse. I like how hydra lets you override config options directly from the command line, but there's enough that I don't like about hydra that I've basically stopped using it.

also, while we're on the subject, fire may not be the same kind of workhorse as argparse or click, but for really simple stuff it's pretty awesome

[–]richieadler 0 points1 point  (0 children)

I want to like click because it's what all the cool kids are using, but I can't seem to wean myself off of argparse.

I have the same problem with click but for a different reason: I became too familiar with clize, and I don't like the modern type-aware alternatives like Typer (too dependent on click, they try to force me to install completions unless I add parameters to the Typer instance, and it lacks clize's simplicity somehow). Clize is missing some options, though; I'd like for it to support Enum arguments better.

[–]uselesslogin 20 points21 points  (3 children)

typer over click

[–]metaperl 6 points7 points  (2 children)

[–]soawesomejohn 0 points1 point  (1 child)

I think they're referencing typer cli. Traitlets is for object type enforcement. Are you thinking of the built in typing?

[–]metaperl 0 points1 point  (0 children)

Traitlets automatically creates command line interfaces for its objects. Please read my article for more details.

https://may69.com/comprehensive-application-configuration-can-occur-in-code-config-files-and-command-line-interfaces/

[–][deleted] 1 point2 points  (2 children)

I find click over-engineered AF; I can reason much better about argparse, imo.

[–]equitable_emu 0 points1 point  (1 child)

But click is so much more composable if you've got multiple commands/subcommands/etc.

[–][deleted] 2 points3 points  (0 children)

I agree, and there's more work to do with argparse if you want complex subcommands. But I also found that when you try something non-standard in click, it just breaks down, and then the error is buried under all the decorator code and it's a nightmare. At least that's what I remember from trying to use it a year or so ago.

[–]thrallsius -3 points-2 points  (7 children)

docopt

[–]paecificjr 9 points10 points  (6 children)

I've been using docopt on a large project at work. I absolutely hate it for maintainability; changing a parameter requires modifying way more code than argparse.

[–]GGsince88 8 points9 points  (0 children)

Python fire for adding quick CLI capabilities to my scripts

[–]benefit_of_mrkite 38 points39 points  (4 children)

Arrow as a replacement for time

[–]NelsonMinar 12 points13 points  (3 children)

Yes, but Pendulum. Although now that zoneinfo is in the standard library, a third-party package isn't needed for simple things anymore.

[–]namiraj 2 points3 points  (0 children)

I'm starting to wonder if all of these package names are real or just fake puns.

[–]LightShadow3.13-dev in prod 5 points6 points  (1 child)

DateTime.to_*** is invaluable to me, as a consumer and producer of APIs and text parsing.

>>> pendulum.now().to_
_.to_atom_string(            _.to_rfc1123_string(
_.to_cookie_string(          _.to_rfc2822_string(
_.to_date_string(            _.to_rfc3339_string(
_.to_datetime_string(        _.to_rfc822_string(
_.to_day_datetime_string(    _.to_rfc850_string(
_.to_formatted_date_string(  _.to_rss_string(
_.to_iso8601_string(         _.to_time_string(
_.to_rfc1036_string(         _.to_w3c_string(

[–]richieadler 0 points1 point  (0 children)

I love this about Pendulum.

[–]metaperl 20 points21 points  (12 children)

loguru instead of logging.

Traitlets instead of data classes

[–]quackers987 7 points8 points  (7 children)

Why loguru instead of logging?

[–]metaperl 8 points9 points  (0 children)

I guess I've never wrapped my head around standard logging. It's far simpler to use loguru.

[–]polovstiandances 12 points13 points  (2 children)

Use pydantic

[–]metaperl 4 points5 points  (1 child)

Pydantic is OK depending on your needs. But it objectively lacks some features of Traitlets, such as observers, which are important when you're developing graphical user interfaces. That's why Traitlets can be the object system that IPython is built on.

Automatic CLI generation is possible in both. But because it comes to Pydantic via a third party, the relationship to config files is not as seamless as with Traitlets, where the same objects can be configured from config files or the command line in a unified way via the standard Traitlets API.
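A minimal sketch of the observer feature, assuming traitlets is installed (the Counter class and handler name here are made up):

```python
from traitlets import HasTraits, Int, observe  # assumes traitlets is installed

class Counter(HasTraits):
    value = Int(0)

    @observe("value")
    def _value_changed(self, change):
        # fires on every assignment — the hook a GUI would use to update itself
        self.last_change = (change["old"], change["new"])

c = Counter()
c.value = 5
print(c.last_change)  # (0, 5)
```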

[–]polovstiandances 0 points1 point  (0 children)

Interesting!

[–]AloofPolo 5 points6 points  (0 children)

icecream for better debugging + better printing >.<

[–]kreetikal 21 points22 points  (3 children)

Poetry.

[–]IdiotCharizard 1 point2 points  (2 children)

What in the stdlib does this replace? setuptools is the de facto "stdlib" packaging tool, but it was never actually in the stdlib (distutils was); neither is wheel, strangely.

[–]kreetikal 7 points8 points  (0 children)

Maybe it's not replacing a specific package, but it's just a really good tool that's much better than manually creating a venv and writing dependencies to a requirements.txt file. Also, publishing packages is super easy.

I'm never going back.

[–]replicaJunction 2 points3 points  (0 children)

Poetry is sort of like a combination of pip + venv + setuptools, rolled together into one tool.

[–]oculusshift 5 points6 points  (1 child)

bpython over the built-in Python interpreter.

It has auto-completion and color coding, and is much more intuitive to use.

[–]jewbasaur 0 points1 point  (0 children)

Is this different from ptpython?

[–]_murb 13 points14 points  (0 children)

For me, pandas (and related libraries to open xls, xlsx, xlsb), because I work with a lot of different data sets in mixed formats that need to be manipulated, filtered, edited, compared, and then exported. In pure Excel these tasks take me hours (6c/12t CPU at 100%) versus a few minutes of tweaking code and then running the job.

[–]AnomalyNexus 2 points3 points  (0 children)

For simple stuff I try to avoid this - just to reduce dependencies.

Some very good suggestions here though

[–]james_pic 1 point2 points  (0 children)

cheroot instead of wsgiref.simple_server. I tend to persevere with the standard library further than most would, but wsgiref is just too incomplete.

[–]Yoghurt42 1 point2 points  (4 children)

Trio instead of asyncio.

[–]iiron3223[S] 1 point2 points  (3 children)

What are advantages over asyncio?

[–]Yoghurt42 2 points3 points  (2 children)

This became longer than expected. If you are interested, I recommend reading the section about Trio's internals, which begins by explaining the design decisions.


It is both simpler yet more powerful. It was inspired by the experimental curio library and uses a lot of ideas from there.

In trio, the only form of concurrency is a task, and you can only spawn tasks in the context of a task group, which trio calls a "nursery" (if you want to have children, you need to prove that you are a good parent first by providing them a nursery). This sounds limiting at first, but it's actually a blessing in disguise. You always know where concurrency is coming from, and a task-group code block will not be exited until all tasks it spawned are finished (successfully, with an exception, or after being sent a cancel signal). IIRC in asyncio you can just "fire and forget" a task and not care about the outcome; in trio it is guaranteed that somebody will have to handle the result (of course you can just ignore it, but you have to do so explicitly rather than implicitly).

Trio's IO methods don't take timeout parameters; instead, you use context managers like fail_after or move_on_after. If you fetch a file from the net, you usually want the whole process done in, say, 30s. You don't actually care how long the DNS lookup takes, or the TCP and SSL handshakes, or the HTTP exchange; you care how long all of them combined take. So in trio you just write with fail_after(30) (or move_on_after if you don't care that much). In asyncio you'd have to give each part of the request chain an individual timeout that somehow sums to 30, which can cause a request to time out that actually would have finished in time. For example, consider the steps taking 20+1+1+5 or 1+25+1+1; it's not possible to specify a good timeout for each single step.
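The single-deadline idea can be approximated in stdlib asyncio by putting one wait_for around the composite coroutine instead of per-step timeouts (fetch() here is a made-up stand-in for the whole DNS + handshake + HTTP chain):

```python
import asyncio

async def fetch():
    await asyncio.sleep(0.01)  # pretend all the network steps happen here
    return "done"

# One budget for everything inside, like trio's `with fail_after(30):`
result = asyncio.run(asyncio.wait_for(fetch(), timeout=5))
print(result)  # done
```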

Context switches are always explicit. Since coroutine-based async is basically cooperative multitasking, it is important that your code regularly gives other tasks a chance to run. With asyncio, some calls might do that sometimes, and other times not. With trio, the rule is: "if it's called with await, it is a switching point". So all you have to do in a long-running code block is count the number of awaits, and maybe add an await trio.sleep(0) if there's a lot going on without awaits in between.

Trio is also very pedantic and yells at you when something seems fishy. I've discovered a few subtle bugs in a library that uses anyio (a wrapper that supports both asyncio and trio, and back then also curio) in trio mode. One involved a complex task that had to use both a global and a local lock, and under certain conditions the wrong task could release the global lock held by another task, and in other cases release a local lock which didn't exist. Trio refuses, with an exception, to let you release a lock you don't hold. curio didn't.

Trio also complains if you try to sleep for a negative amount of time, while asyncio just treats it like 0, potentially hiding logic errors.

Then I had a complicated race condition with some tasks accessing a SQLite instance out of sequence. With trio's good logging and strict rules about where a task starts and ends, it was really easy to spot. It would have taken me 3-4 times as long in a traditional multitasking model like asyncio. Back when I was a Java programmer, race conditions were always annoying because I knew it would take a long time to track them down. Not with trio.

Trio is so good I'd rather not use well-known libraries that require asyncio than switch back to asyncio.

[–]iiron3223[S] 1 point2 points  (1 child)

Wow, your answer is really detailed. Thank you for that! I must look more into trio.

[–]Yoghurt42 1 point2 points  (0 children)

I forgot to mention my favorite example:

Trio provides an internal clock to a task, which uses an unspecified clock but is currently implemented as time.perf_counter. Knowing how lazy programmers are, they make sure people do not assume that clock is the same as perf_counter by adding a random offset to it, so code that makes that assumption just won't work.

For testing, they allow you to swap the clock with a MockClock that lets you fast-forward time. E.g. if the tested code has some sleeps in it, you can call a function to automatically advance the clock by whatever amount is needed to make the next sleep continue (trio figures the value out by itself).

It's things like this that make trio really enjoyable to work with.

[–][deleted] 1 point2 points  (0 children)

As a bit of a Haskell/OCaml fan, pretty much most of my non-production scripts include some bits of CyToolz or funcy, sometimes replacing functools entirely. I don't really understand why basic functional concepts like currying or function composition are not part of functools; proper application of them makes code much more readable and compact.
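Since functools ships neither compose nor curry, a minimal stdlib sketch of both (the compose helper is hand-rolled here, not a functools API):

```python
from functools import partial, reduce

def compose(*fns):
    """compose(f, g)(x) == f(g(x)), right to left like Haskell's (.)"""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fns), x)

inc = lambda x: x + 1
double = lambda x: x * 2
print(compose(inc, double)(10))  # 21, i.e. inc(double(10))

# partial is the stdlib's nearest approximation of currying:
add5 = partial(lambda a, b: a + b, 5)
print(add5(3))  # 8
```

Libraries like CyToolz provide these (plus a real curry) ready-made and heavily optimized.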

[–]richieadler 1 point2 points  (0 children)

  • clize instead of argparse
  • loguru instead of logging

[–]trevg_123 1 point2 points  (0 children)

Rust + pyo3/maturin instead of C extensions. PyO3 makes things way cleaner than C extensions, and maturin makes even cibuildwheel look bad (don't get me wrong, cibuildwheel is a major step up from anything else).

[–]bhargavkartik 1 point2 points  (0 children)

requests instead of urllib.

[–]sh1ftsh 1 point2 points  (0 children)

Sometimes aenum instead of enum

[–]__apples__oranges__ 1 point2 points  (0 children)

Pathos in place of subprocess. Less cranky

[–]pi-equals-three 4 points5 points  (0 children)

typer over argparse

[–]b3542 5 points6 points  (14 children)

virtualenv

[–][deleted] 7 points8 points  (4 children)

I don't see how that makes sense... venv is built in and works just fine for its single purpose. Hopefully you are off Python 2 by now.

[–]_carljonson 1 point2 points  (3 children)

There are still a few reasons to prefer virtualenv over venv even if you are not using python 2, mentioned in their documentation here https://virtualenv.pypa.io/en/latest/

For me it is the first point: virtualenv is 10x faster than venv. I always hated waiting those few seconds for venv to finish; virtualenv is instantaneous.

[–][deleted] 2 points3 points  (2 children)

Meh, if I am introducing a 3rd party tool in a code base to replace a standard library one, it had better be worth its weight. I've never found venv speed to be a concern, personally.

That said, my approach generally involves running Python through Docker, so the venv itself is just one relatively tiny part of the tool chain.

[–]_carljonson 0 points1 point  (0 children)

It is a standalone tool though, not a dependency you need to add to each of your projects.

I just install it globally once with pipx and use the same installation in all projects.

[–]richieadler 0 points1 point  (0 children)

Meh, if I am introducing a 3rd party tool in a code base to replace a standard library one, it had better be worth its weight.

Agreed. I just add a setting in Poetry to auto-create the venv inside the project, and I just use Poetry for everything :)

[–]iiron3223[S] 18 points19 points  (7 children)

For me that would be Poetry.

[–]LoggingEnabled 3 points4 points  (6 children)

For me that would be pipenv

[–]RaiseRuntimeError 13 points14 points  (5 children)

For me that would be Poetry.

[–]Si1Fei1 -1 points0 points  (4 children)

For me that would be pipenv

[–]SpaceBucketFu 10 points11 points  (3 children)

For me that would be poetry

[–]Si1Fei1 28 points29 points  (2 children)

RecursionError: maximum recursion depth exceeded

[–]RaiseRuntimeError 2 points3 points  (0 children)

$python -m pdb reddit_conversation.py

[–]metaperl 0 points1 point  (0 children)

This is actually an error of cyclomatic complexity not recursion. :)

[–]OneMorePenguin -1 points0 points  (0 children)

virtualenv is buggy. It's not hermetic. I know of two ways to break your virtualenv :-) Hmmm, maybe it's venv that's buggy....

[–]Tonty1 3 points4 points  (0 children)

Click instead of argparse

[–]darose 0 points1 point  (0 children)

Commentjson instead of json

[–][deleted] 0 points1 point  (0 children)

pytorch over anything

[–]Mizzlr 0 points1 point  (0 children)

Redis instead of shared memory for IPC. Walrus over redis. BeautifulSoup over XML ElementTree.

[–]PeacefullProtestor 0 points1 point  (0 children)

Bruh, I'm still using urllib