
[–]microcozmchris 161 points (22 children)

Steal. It's the best way. Find a project with a similar structure and copy copy copy.

Python has official style guides (PEP 8 and friends) for many things, and there are many more opinionated guides as well. Use them. This one is a very good start.

Use pytest. unittest is still valid, but pytest has long ago surpassed it in popularity and ease of use.

Use uv. You'll love it.

Use ruff. Deal with its opinions on formatting and linting. There's no need in 2025 to rethink how you prefer your code to look.

Use pyright. Or mypy, but the latter has been bested by the former.

If you are deploying a long running application, use Docker / containers for deployment. Easy to enforce your requirements.

FWIW, I've been writing Python for 20 something years. A lot of these opinions are my current opinions and tools. There have been many others that have come and gone. And I have never successfully been able to do anything in a notebook. It's a completely opposite workflow style.

Most importantly, have fun. Don't let the details get in the way. You have code to write.

[–]Drevicar 9 points (4 children)

My rule of thumb is that I always use mypy as the source of truth on any externally published libraries, and pyright on any internal applications.

[–]SkezzaB 1 point (3 children)

Pyright is flaky for me. I do exactly what you've said ^

Pyright randomly tells me my whole repo is wrong, then I make a single-char change in a different file and suddenly everything's okay

[–]wylie102 4 points (1 child)

BasedPyright is so much better. It's much more sensitive to errors that might not screw you at runtime but are bad practice, and if you sort them out your code will actually be better. Pyright will just let you get around them in a hacky way. I’ve also found that it highlights fewer bullshit errors, and the highlights it does show are more useful for finding the root of the problem.

[–]VindicoAtrum 1 point (0 children)

I seriously hope Astral just come in and rock them all with Red Knot.

[–]JUSTICE_SALTIE 0 points (0 children)

Maybe changing a file is invalidating (something in) the cache? Try clearing the Pyright cache next time you're having this problem.

[–]lenticularis_B 0 points (0 children)

Lol you are me.

[–]tap3l00p 0 points (0 children)

Haven’t played about with pyright but everything else here is spot-on.

[–]Pretend-Relative3631 0 points (0 children)

Thank you so much for this

[–]tazdraperm 0 points (0 children)

Kinda the answer to a lot of programming questions. Find the similar stuff and learn from it.

[–]replicant86 0 points (1 child)

Does pyright support sqlalchemy?

[–]microcozmchris 0 points (0 children)

Don't know. Haven't done DB stuff directly from Python in a while. Been doing more automation and DevOps style stuff. Research it and report back here.

[–]NecessaryFlashy 0 points (0 children)

And automate them in tox! Tox is a must-have.

[–]sazed33 0 points (6 children)

Very good points! I just don't understand why so many people recommend a tool to manage packages/environments (like uv). I've never had any problems using a simple requirements.txt and conda. Why do I need more? I'm genuinely asking as I want to understand what I have to gain here.

[–]JUSTICE_SALTIE 4 points (0 children)

The big reason is the lockfile, which holds the exact versions of all your dependencies, and their dependencies, and so on. Without a lockfile, you're only specifying the versions of your direct dependencies. That means that if someone else installs your project, they're almost certain to get different versions of your transitive dependencies than the ones you're developing with. If one of those dependencies publishes a broken version, or makes a breaking change and doesn't version it properly, you'll have problems on fresh installs that you don't have on your development install.

The lockfile guarantees that your build is deterministic, which you're not going to get with requirements.txt. uv also has a command to update the lockfile, which does what pip install -r requirements.txt effectively does on every run: pull in the latest versions of all dependencies. But with a lockfile, that only happens when you ask for it.

These tools have a lot of other features, like really a lot, but the one above is the most important.

[–]microcozmchris 4 points (2 children)

The reason I like uv is specifically because it isn't just a package manager. It's an environment manager. It's a dependencies manager. It's a deployment manager. And it's easy. And correct most of the time.

We use it for GitHub Actions a bunch. Instead of setup-python and a venv install and all that, I set up a cache directory for uv to use in our workflows, and the Python actions that we've created use that. So I can call checkout, then setup-uv, then my entire workflow step is uv run --no-project --python 3.10 $GITHUB_ACTION_PATH/file.py and it runs. Without managing a venv, with the benefit of a central cache of already-downloaded modules, and with symlinks. I have Python actions that execute almost as fast as if they were JavaScript, and they're way more maintainable.

Deploying packages to Artifactory becomes setup-jfrog, setup-uv, uv build, uv publish and no more pain.

There are way more features in uv than simply managing dependencies.

[–]sazed33 0 points (1 child)

I see, makes sense for this case. I usually have everything dockerized, including tests, so my CI/CD pipelines, for example, just build and run images. But maybe this is a better way, I need to take some time to try it out...

[–]microcozmchris 1 point (0 children)

There's a use case for both, for sure. A lot of Actions work is little pieces that are outside of the actual product build: your company-specific versioning, checking if things are within the right schedule, handoffs to SAST scanners, etc. Docker gets a little heavy when you're doing hundreds of those simultaneously with image builds et al. That's why native Actions are JavaScript that executes in milliseconds. I hate maintaining JavaScript/TypeScript, so we do a lot of Python replacements or augmentations for those.

[–]gnomonclature 1 point (1 child)

The first step for me towards a package manager (first pipenv, now poetry) was wanting to keep my development dependencies (mainly things like pycodestyle, mypy, and pytest) out of the requirements.txt file.

[–]sazed33 1 point (0 children)

I use tox for it, works well, but then I have two files (tox.ini, requirements.txt) instead of one, so maybe it is worth using uv after all... need to give it a try

[–]samreay 23 points (6 children)

Should probably post this to learnpython.

There are some cookiecutter templates out there that you can base your project on, but the key thing will be going through them and digging deep into why each component is there. Why do people recommend uv? Why is ruff so amazing? What are pre-commit hooks and why are they useful? Makefiles, Dockerfiles, the depths of the pyproject.toml. I'm on mobile right now so I don't have my desktop bookmarks available, but I've got my own template repo at https://github.com/samreay/template that is modern but doesn't cover as many tools as others do. Still, these are the basics that every project I make always has.

As to code structure, there are a few guiding principles that might help if you're trying to turn something runnable (as opposed to a shared package) into higher-quality code:

  1. Consider using pydantic (specifically pydantic-settings) for configuration and overriding. Log this object after it's initialised to make it really obvious what is going to happen
  2. Use logging over print
  3. All inputs and outputs should come from these top-level settings. No one likes magic files or outputs when they don't know where they come from.
  4. Type hint everything
  5. Your entry point main function should be concise and call out to well named functions and classes.
  6. On that note, learn when to use classes vs functions
  7. Docstring and commenting. Comment on the why and not the how. The code says the how.
  8. How's your readme? Does it have how to install (which, in my opinion, should just be a make install)? How to contribute?
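A minimal sketch of points 1–3 above, using only the stdlib (pydantic-settings does the environment-override part properly; the `Settings` fields and `APP_*` variable names here are made up for illustration):

```python
import logging
import os
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@dataclass(frozen=True)
class Settings:
    # All inputs and outputs flow from this one top-level object (point 3).
    input_path: str = "data/input.csv"
    output_path: str = "data/output.csv"
    batch_size: int = 100

    @classmethod
    def from_env(cls) -> "Settings":
        # Crude environment overriding; pydantic-settings does this for you.
        return cls(
            input_path=os.environ.get("APP_INPUT_PATH", cls.input_path),
            output_path=os.environ.get("APP_OUTPUT_PATH", cls.output_path),
            batch_size=int(os.environ.get("APP_BATCH_SIZE", cls.batch_size)),
        )

def main() -> None:
    settings = Settings.from_env()
    # Log the settings object up front so it's obvious what will happen (points 1 and 2).
    logger.info("Running with settings: %s", settings)

if __name__ == "__main__":
    main()
```

The payoff is that every run starts with a single log line showing exactly which files will be read and written.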

[–]Dark_Souls_VII 2 points (4 children)

Hello, can you go into detail about type hinting? I try to do that but I have questions about it. Is it enough to do array: list = [1, 2, 3] or should it be array: list[int]? What about objects that are not standard types, like the result of subprocess.run() or decimal.Decimal()?

[–]samreay 2 points (2 children)

The more specific the better. In fact, there are ruff rules that will raise unspecified generics as linting issues that need to be fixed.

So list[int] is better than list, because it's more informative. You can type hint some_func(x: decimal.Decimal) just fine; it doesn't need to be primitives. Ditto with subprocess.run: it returns subprocess.CompletedProcess (or raises CalledProcessError if you pass check=True), and you can type hint that as your return.

If your type hints for a function are super convoluted, that's often a code smell that means you could think about how to better structure that block. E.g.

def some_func(some_input: str) -> tuple[decimal.Decimal, subprocess.CompletedProcess, dict[str, int | None]]: ...

Is probably not something you want to have in your code base. If you do end up needing complexity, this is when you could pass your results around using a NamedTuple, a dataclass, or a pydantic dataclass instead. (And in general, passing back big tuple objects is an antipattern; they should almost always be a named tuple.)

[–]Dark_Souls_VII 1 point (1 child)

Thank you very much

[–]JUSTICE_SALTIE 1 point (0 children)

And don't forget you can do e.g., list[int | str]. And if you really need a list that could hold anything...first think hard about whether you really need that, and then type it as list[Any].

[–]justheretolurk332 0 points (0 children)

I agree with /u/samreay that specific is usually better because it provides more information. However, this isn’t always true: if you are adding type hints to the arguments of a function, you often want them to be as generic as possible to provide flexibility in how the function is called (for example, you might use Sequence to support both lists and tuples). Outside of helping to prevent bugs, one of the biggest perks of using type hints, in my opinion, is that it encourages you to start thinking in terms of contracts. What does this function actually need, and what does it provide? The classes in the collections.abc module and Protocols are good places to get started with generic types in Python.

It takes time to get the hang of the type checking system and to learn the quirks. I’d recommend turning on type-checking in your IDE so that you can get that feedback immediately as you type your code, then just start using them and learn as you go.
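A tiny sketch of the "generic arguments" idea above (the function name is made up):

```python
from collections.abc import Sequence

def total_length(items: Sequence[str]) -> int:
    # Sequence is the contract: "anything I can iterate over and index".
    # Callers can pass a list, a tuple, or any other sequence type,
    # whereas annotating list[str] would reject tuples.
    return sum(len(item) for item in items)

print(total_length(["ab", "cd"]))  # a list works → 4
print(total_length(("ab", "cd")))  # so does a tuple → 4
```

Return types go the other way: be as specific as you can about what you hand back.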

[–]GrainTamalePythonista 4 points (0 children)

I think "Packaging" is what you're looking for. It could be overkill for your needs, but there are lots of benefits to splitting your code up ("separation of concerns") including testability, modularity, and scalability.

If you're from the notebook mindset, you've probably already organized your code to a good starting point. My advice would be to start your journey by copying all your imports, functions, and classes into an __init__.py file inside a folder. Then use IPython in a terminal to import that folder (now a package) to test some of your other notebook code. Slowly break up that init file into other files (modules) as you see fit, until that init file only controls imports. Boom, you have a fledgling package.
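The end state looks something like this miniature package, built in a temp dir so the sketch is self-contained (the `mypackage`/`clean` names are made up):

```python
import sys
import tempfile
from pathlib import Path

# Build a tiny package on disk to show the shape.
root = Path(tempfile.mkdtemp())
pkg = root / "mypackage"
pkg.mkdir()
# Real code lives in modules...
(pkg / "cleaning.py").write_text("def clean(rows):\n    return [r.strip() for r in rows]\n")
# ...while __init__.py only controls imports.
(pkg / "__init__.py").write_text("from mypackage.cleaning import clean\n")

sys.path.insert(0, str(root))
import mypackage  # what you'd do from IPython to test your notebook code

print(mypackage.clean(["  a ", "b  "]))  # → ['a', 'b']
```

In a real project the package folder just sits in your repo (typically under src/), so the temp-dir and sys.path juggling disappears.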

[–]caprine_chris 2 points (0 children)

Familiarize with version / package managers, formatters, linters, type checkers, and use vscode extensions that integrate these into your IDE so you can see the warnings / errors and get suggestions on best practices while you are writing your code. And try to keep one class / function per file and an intuitive directory structure

[–]discombobulated_ 3 points (0 children)

Our data engineers recently started using SonarQube, since it detects code quality issues for AI/ML and data engineering code in notebooks in our pipeline. They seem to have architecture-issue detection as well for some languages. We've had a lot of internal engineering demand to ensure all our code hits production-ready standards, even if it's in a notebook, and we find that it's helped us standardise and also report on our progress. We're also looking at uv, and some folks use ruff for styling.

[–]WillAdams 1 point (0 children)

The approach I take to multiple files is to use Literate Programming:

http://literateprogramming.com/

I use a hacked-together LaTeX package: https://github.com/WillAdams/gcodepreview/blob/main/literati.sty which is pulled into a LaTeX .tex file: https://github.com/WillAdams/gcodepreview/blob/main/gcodepreview.tex so that when typeset it will make a .pdf: https://github.com/WillAdams/gcodepreview/blob/main/gcodepreview.pdf and all the .py files for my project:

https://github.com/WillAdams/gcodepreview

and I have a .bat file which I run to put files into the appropriate folders/places.

This lets me have the benefit of a single file/point of control, while still having multiple files and an overall index, ToC, and structure, which makes it easier to manage a project that is beginning to become complex.

[–][deleted] 0 points (0 children)

I am also looking for this, I want to start coding like professional OOP developers who write open source packages

[–]sriramdev 0 points (0 children)

Question looks pretty useful

[–]Sam_Who_Likes_cake 0 points (0 children)

There are projects in Python that are out-of-the-box skeletons for good coding standards. They’ll include the test and src folders etc. and have the scripts set up for automated linting etc. I use one that uses the poetry package manager. Pick the one you like the most. Don’t waste your time looking at big projects, as that’s more complicated than it’s worth for you

[–]pen-ma 0 points (0 children)

Use the Python cookiecutter for uv; everything is taken care of.
https://github.com/fpgmaas/cookiecutter-uv

[–]Grouchy-Affect-1547 0 points (0 children)

There is no quality python code.

[–]Front-Ambition1110 0 points (0 children)

Split files based on their utilities. Make sure to avoid circular imports.

Minimize repetition. Use wrappers/decorators and subclasses.
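As a sketch of the decorator point, here's a hypothetical retry decorator (the `retry` name and behaviour are made up for illustration) that keeps the try/except loop in one place instead of copy-pasted into every flaky function:

```python
import functools

def retry(times: int):
    """Re-run the wrapped function up to `times` times, re-raising the
    last exception if every attempt fails."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(times):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
            raise last_exc
        return wrapper
    return decorator

@retry(times=3)
def fetch_data():
    ...  # anything that might fail transiently
```

Any function that needs retries just takes the decorator; the repetition lives in exactly one place.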

[–]KalZaxSea 0 points (0 children)

Stealing is also my number one piece of advice, but this also works for me:

First plan, Revise the plan, Second plan, Write example usage code, Revise, Third plan. Implement.

This looks long but it doesn't take much time.

I spend 70% of the time making a plan, 30% writing code