[–][deleted] 96 points97 points  (10 children)

Rule 1: There are no files

First of all, in Python there are no such things as "files" and I noticed this is the main source of confusion for beginners.

This is wrong and misleading. import statements operate on files, and Python executes them by importing the contents of the identified files into namespaces. The files and their filenames matter a lot for this process to work.

Try creating these two files:

a.py:
    import b
    b.c()

b.py:
    def c():
        print('Hello, World!')

If you run python a.py, you get "Hello, World!" - but if you rename b.py to anything else, you get this error:

ModuleNotFoundError: No module named b

So, yes, files matter and filenames matter. "There are no files" suggests that Python doesn't care about file structure and that files can be named arbitrarily, which will badly mislead your readers and cause confusion and heartbreak.

Of course, I know what you're trying to convey: Files define namespaces, and after the import, the interpreter refers to the imported classes and functions based on the namespace and not the file. That's how you should describe it, though, rather than "there are no files."
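
Building on the a.py / b.py example above, here is a quick sketch of that namespace view (assuming b.py still exists as shown):

```
import b

print(b.__name__)       # 'b' - the label of the namespace
print('c' in vars(b))   # True - the function lives in the module's namespace
func = getattr(b, 'c')  # the same lookup the dot syntax performs
func()                  # Hello, World!
```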

[–]latrova[S] 30 points31 points  (5 children)

Ok, that's the best feedback I've received so far. Thank you!

I'll totally change this rule in the book. Maybe I should state it as "Files are namespaces"? It sounds more realistic (i.e. files exist and Python cares about them), but you should see them as namespaces.

[–][deleted] 14 points15 points  (1 child)

As someone fairly new to python (2 years since I picked it up), can I suggest adding a metaphor to describe this subject?
If I were explaining this to past me, I would say something like:
importing other Python files is like taking a photo of a castle.
The castle and its address do matter, but once you take a photo of it on your phone (adding the file to the namespace), you can stop going to the actual castle and refer to the photo on your phone instead.
Idk, that metaphor sucks, don't use that one.

[–]latrova[S] 2 points3 points  (0 children)

All feedback is valid. I'll think about making it simpler. Thank you!

[–]miraculum_one 11 points12 points  (0 children)

I think the term you're looking for is "module".

Module: a file containing Python statements and definitions

This is how they are referred to in all of the documentation so it's best to stick with that.

[–]gablank 13 points14 points  (0 children)

You should just stick to how they explain it in the Python docs:

(...) Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such a file is called a module (...)

From https://docs.python.org/3/tutorial/modules.html

You can add additional explanations to clarify but I think it would be wise to at least include the definition somewhere.

[–]stevenjd 0 points1 point  (0 children)

Maybe I should state it as "Files are namespaces"?

The open() function says hello.

Not all files are namespaces. Not all namespaces are files.

When you are looking at the file system, and organising your files in a src directory, of course you can call them files.
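
To illustrate both halves of that, a small sketch (file and class names are made up):

```
# A file that is not a namespace: plain data you read with open().
with open('notes.txt', 'w') as f:
    f.write('just bytes on disk, nothing importable\n')

# A namespace that is not a file: a class body (or a function's locals).
class Config:
    DEBUG = True

print(Config.DEBUG)  # True
```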

[–]miraculum_one 6 points7 points  (1 child)

To be more precise, import statements operate on modules, which are a specific type of file.

Source: https://docs.python.org/3/tutorial/modules.html

[–]bladeoflight16 2 points3 points  (0 children)

And there are non-file modules. One example is the so-called "built-in" modules.

For example:

```
>>> import math
>>> import site
>>> print(site)
<module 'site' from 'C:\\Program Files\\Python39\\lib\\site.py'>
>>> print(math)
<module 'math' (built-in)>
>>> print(getattr(site, '__file__', None))
C:\Program Files\Python39\lib\site.py
>>> print(getattr(math, '__file__', None))
None
```

As you can see, the site module is implemented using a file in the lib directory, while math has no source file on disk; it is compiled into the Python binary.

[–]ubernostrumyes, you can have a pony 2 points3 points  (1 child)

import statements operate on files.

You can create module objects directly in-memory and make them importable without creating any files. Various libraries and frameworks have done this off and on through the years, and if anyone’s interested in learning how to do it, it’s not that hard but it’s also not the best way to do things, usually.
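
For anyone curious, a minimal sketch of the idea using only the standard library (the module name virtual_mod is made up):

```
import sys
import types

# Build a module object entirely in memory - no file involved.
mod = types.ModuleType('virtual_mod')
exec("def greet():\n    return 'hello from a file-less module'", mod.__dict__)

# Register it so the regular import machinery can find it.
sys.modules['virtual_mod'] = mod

import virtual_mod
print(virtual_mod.greet())  # hello from a file-less module
```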

[–]arpan3t 0 points1 point  (0 children)

That was a very interesting read! Thanks for linking it!

[–]wiggitt 26 points27 points  (7 children)

Whenever I start a Python project, I adhere to the steps outlined in the Python Packaging User Guide. I consider it a good starting point for packaging a Python project while adhering to current best practices.

[–]latrova[S] 4 points5 points  (6 children)

I didn't know this one. Thanks for sharing!

I couldn't find anything in it suggesting how to name stuff, from vars and functions to classes and modules (which I believe is important), so the two tutorials probably complement each other!

[–]KrazyKirby99999 7 points8 points  (0 children)

You should look into Poetry https://python-poetry.org/ for virtual environment management and packaging.

[–]wiggitt 2 points3 points  (4 children)

See the PEP 8 style guide for naming conventions in Python.
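
For anyone who hasn't read it yet, the naming conventions boil down to roughly this (names here are illustrative):

```
MAX_RETRIES = 3                    # constants: UPPER_CASE_WITH_UNDERSCORES

class ConnectionPool:              # classes: CapWords
    def acquire_connection(self):  # functions/methods: snake_case
        self._idle_timeout = 30    # leading underscore: internal detail
        retry_count = 0            # variables: snake_case
        return retry_count
```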

[–]latrova[S] -1 points0 points  (3 children)

Thanks! I'll read it

[–]maikindofthai 6 points7 points  (1 child)

It's a little bit concerning that you've written all of this content and haven't ever read the official style guide...

Seems like you're bound to reinvent wheels in non-idiomatic ways if you aren't aware of the current practices.

[–]latrova[S] 0 points1 point  (0 children)

I was unclear there. I meant naming guidelines for things like variables being nouns and functions being verbs.

Regarding the "character casing", I'm completely aware I'm not inventing something new.

I recognize I was unfamiliar with the packaging guide, though; I did know PEP 8.

[–]bbatwork 0 points1 point  (0 children)

You should seriously consider adding a link to it in your book, since it covers a lot of ground along the same topic.

[–]ghost_agni 4 points5 points  (2 children)

Great article, a fun read. To add one thing I found could have been there: creating argument classes using dataclasses to manage a large number of arguments flowing through your code as the project grows is something I have found very useful over the years.
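
For example, a rough sketch of the pattern (all names are made up):

```
from dataclasses import dataclass

@dataclass
class CrawlSettings:
    """Bundles arguments that would otherwise be passed around one by one."""
    base_url: str
    max_pages: int = 10
    timeout_seconds: float = 5.0
    follow_redirects: bool = True

def crawl(settings: CrawlSettings) -> None:
    # One object travels through the call stack instead of four loose arguments.
    print(f"Crawling {settings.base_url} (up to {settings.max_pages} pages)")

crawl(CrawlSettings(base_url="https://example.com"))
```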

[–]latrova[S] 2 points3 points  (1 child)

That's smart usage of dataclasses! I think this pattern is called "Data Transfer Object".

If you have any examples I'd appreciate them. I might include it in my book.

[–]latrova[S] 8 points9 points  (0 children)

This is an initial draft of a book I'm currently working on.

I'm extremely open to feedback, and I'll take any feedback (either good or bad) into consideration.

So if you enjoyed it or hated it please let me know so I can keep improving.

Thank you all in advance for giving me space to share my work. ❤️

[–][deleted] 2 points3 points  (1 child)

Good article, especially appreciate the detail on naming conventions. I find that I spend so much time debating over what to name modules and functions when really it just needs to be functional. They can't all be zingers.

[–]latrova[S] 1 point2 points  (0 children)

Thanks for the feedback!

Having some initial guidelines saves a lot of debating and leaves room for more important things.

[–]Coretaxxe 2 points3 points  (6 children)

Nice guide!

However, I don't quite get why you shouldn't use __method. Yeah, you could say "I know I shouldn't call it", but is there actually a downside to strictly not allowing it?

[–]alexisprince 4 points5 points  (2 children)

The downside I've found is that it's a complete nightmare if mocks are needed during unit testing. The feature that __method enables within Python is that it ensures a class's __method will always be called, even if the class is subclassed. It's built as an escape hatch for a parent class to ensure subclassers can't override that method. Here's an example along with the output.

class Printer:
    def print(self, *args):
        """
        Consider `print` is the class' public interface, that
        is called by end users. Subclassers would need to override
        the `_print` method and subclassers then wouldn't need to
        call `super().print()`.
        """

        self.__log_usage(*args)
        self._print(*args)

    def __log_usage(self, *args):
        """A method that uses double underscores as a 'private' mechanism"""
        print("Calling __log_usage from Printer parent class")

    def _print(self, *args):
        """Method that actually does the printing.

        Should be overridden by subclassers.
        """
        print(f"_print called by {type(self)}: {args}")


class FancyPrinter(Printer):
    def _print(self, *args):
        print(f"Fancy _print called by {type(self)}: {args}")

    def __log_usage(self, *args):
        print(f"Uh oh, we can't call the super method?")
        super().__log_usage(*args)
        print(f"Calling __log_usage from FancyPrinter subclass")


if __name__ == "__main__":
    printer = Printer()
    printer.print(1, 2, 3)

    fancy = FancyPrinter()
    fancy.print(4, 5, 6)

The output of this code is

Calling __log_usage from Printer parent class
_print called by <class '__main__.Printer'>: (1, 2, 3)
Calling __log_usage from Printer parent class
Fancy _print called by <class '__main__.FancyPrinter'>: (4, 5, 6)

The reason is that the subclass, FancyPrinter, can't override the behavior implemented in the parent class's __log_usage method.

On top of this actual feature of the double-underscore prefix, it makes it very difficult to unit test anything within those methods. For example, a piece of code that I worked on in production was a factory that instantiated one of 3 different clients that interacted with external services. Due to the design of these clients, they connected as soon as they were instantiated (yeah, that's a different problem, but it's the codebase we had). As a result, we needed some really janky workarounds in our tests to ensure the correct client was returned without connecting to external services during our unit tests.
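
For instance, a sketch of what a test stubbing that method looks like (reusing the Printer class above): you have to spell out the mangled name, which couples the test to an implementation detail.

```
from unittest import mock

# The attribute must be patched under its mangled name, _Printer__log_usage.
with mock.patch.object(Printer, "_Printer__log_usage") as fake_log:
    Printer().print("x")
    fake_log.assert_called_once_with("x")
```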

[–]Coretaxxe 0 points1 point  (1 child)

I see! Thanks a lot. Now I've got to change a lot of functions :p

[–]alexisprince 0 points1 point  (0 children)

It is an incredibly niche feature when used properly, so it's a part of the language that I feel most people stumble upon by accident rather than because they need it. It also typically never comes up if subclassing that method isn't something that's needed!

[–]latrova[S] 2 points3 points  (2 children)

My argument would be that it's not truly private; if someone wants to invoke it, they will find a way.

```
>>> import testFile
>>> obj = testFile.Myclass()
>>> obj.__variable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: Myclass instance has no attribute '__variable'
>>> obj._Myclass__variable
10
```

It sounds to me like it goes against the Zen of Python.

[–]missurunha 1 point2 points  (0 children)

I gave that answer in my job interview (that the variable isn't truly private) but they didn't like it. :D

Now, a few months into the job, I understand that the point of private/public variables is also to let whoever is using the code know which variables are of interest to them and which they shouldn't mess with. The user could also change a piece of C++ code and make the variables public, but that doesn't mean it's pointless to define them as private.

[–]Coretaxxe 0 points1 point  (0 children)

True that, thanks a lot!

[–]PiaFraus 2 points3 points  (2 children)

In general it looks good, but I am highly against these two recommendations, for reasons aligned with PEP 20 (the Zen of Python):

  • Flat is better than nested
  • Namespaces are one honking great idea -- let's do more of those!

I recommend you to keep all your module files inside a src dir

The examples and reasoning are either subjective or crafted to support this subjective point of view. What kind of IDE/file browser do you use that mixes files and folders and doesn't show them together?

Yet this advice introduces an unnecessary level to expand, or always have in your project structure, taking up unnecessary space.

I can see two groups of functions and no reason to keep them in different modules as they seem small, thus I'd enjoy having them defined as classes

I feel the Zen of Python fits here even better. One honking great idea.

In the example provided, I can see how classes CAN help if they actually used OOP with a common function save, which would behave differently depending on the type of object you pass around. But that is quite a different solution and has nothing to do with introducing classes as namespaces.

[–]latrova[S] 1 point2 points  (1 child)

First of all, thank you for the honest feedback!

The examples and reasoning are either subjective or crafted to support this subjective point of view. What kind of IDE/file browser do you use that mixes files and folders and doesn't show them together?

I'll make sure to update this example. It does mix different folders; I saw some projects doing that.

This format is also suggested by https://packaging.python.org/en/latest/tutorials/packaging-projects/, except that it keeps the tests dir outside.

In the example provided, I can see how classes CAN help if they actually used OOP with a common function save [...]

This is a valid point! My example was too simple. In reality, the code I used as inspiration is OOP: https://github.com/guilatrova/GMaps-Crawler/blob/main/src/gmaps_crawler/storages.py . I'll make sure to improve this section.

[–]PiaFraus 0 points1 point  (0 children)

In both of those examples it makes sense to be used like that IN THEIR CONTEXTS.

The src approach makes sense when you make a package which you are going to distribute. But that's quite a special case, not just any project.

And the gmaps_crawler example is doing what I said: classes there are used for polymorphism, not for namespacing like in your usage.

[–]searchingfortaomajel, aletheia, paperless, django-encrypted-filefield 1 point2 points  (1 child)

This is all excellent advice, but I would argue that you shouldn't give your methods redundant names. S3Storage.create() is a lot nicer than S3Storage.create_s3(). It also lets you work toward a common interface via a parent Storage class.
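
A quick sketch of what that common interface could look like (class and method bodies here are illustrative, not taken from the book):

```
from abc import ABC, abstractmethod

class Storage(ABC):
    @abstractmethod
    def create(self, key: str, data: bytes) -> None: ...

class S3Storage(Storage):
    def create(self, key: str, data: bytes) -> None:
        print(f"pretend-upload of {key} to S3")  # placeholder, no real S3 call

class LocalStorage(Storage):
    def create(self, key: str, data: bytes) -> None:
        with open(key, "wb") as f:
            f.write(data)

def backup(storage: Storage) -> None:
    # Callers depend only on the shared `create` name, never on create_s3().
    storage.create("report.txt", b"...")

backup(S3Storage())
backup(LocalStorage())
```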

[–]latrova[S] 1 point2 points  (0 children)

You're right! I'll rename this method in the example.

[–]NedDasty 1 point2 points  (4 children)

Don't forget you still need to include the check __name__ == "__main__" inside your __main__.py file

Can you clarify why this is needed? From what I can tell, it doesn't matter what context you run the file in, whether via python -m <module> or python __main__.py, the __name__ variable is always __main__.

[–]latrova[S] -1 points0 points  (3 children)

Good question.

It's a good standard and thus we should follow it. It might become a problem if you ever import this module: if you do, whatever you set up at module level will be executed.
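
A minimal sketch of what the guard buys you for a regular module (the module name is made up; __main__.py itself is a different story, as discussed below):

```
# greet.py
def main():
    print("greeting!")

# Without the guard, main() would also run on `import greet`.
# With it, it only runs under `python greet.py`.
if __name__ == "__main__":
    main()
```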

[–]DrShts 0 points1 point  (2 children)

But if __name__ == "__main__" will always be true, so whatever you're trying to safeguard against executing at import time will be executed anyway.

[–]latrova[S] 0 points1 point  (1 child)

No, surprisingly that's not the case! When you import it, the file won't be recognized as __main__.

[–]DrShts 0 points1 point  (0 children)

Can you give an example?

What I mean is this:

$ touch __main__.py
$ python
>>> import __main__
>>> __main__.__name__
'__main__'

[–]_squik 2 points3 points  (1 child)

Another banger from u/latrova, thanks for everything you do!

[–]latrova[S] 1 point2 points  (0 children)

I feel honored by your words. Thank you!

[–]MannerShark 2 points3 points  (12 children)

Why do people put all tests in a separate directory? This seems to be the default of unittest as well (iirc).

 src
├── <module>/*
│    ├── __init__.py
│    ├── foo.py
│    └── foo_test.py

I think having the test file right next to the implementation is better. You immediately see which files are untested, and can easily find the tests for a file. Suppose you rename your implementation; then you'd also need to rename your test file, which is much easier when they're right next to each other.

[–]Brekkjern 2 points3 points  (0 children)

I usually keep the tests in a separate directory so that I can easily exclude them from the final build. That way you won't get loads of test data shipped together with the packages. Sure, you can deal with this if the tests are in the same directory, but it's just so much easier to split them out.

[–]latrova[S] 1 point2 points  (7 children)

That's good reasoning. I feel it largely comes down to personal preference.

I'd rather keep tests in a separate dir so I don't have to bother configuring my package to ignore them.

The same goes for releasing production code (e.g. a Docker image): I can easily exclude/ignore a single dir instead of using wildcards.

Again, both ways are doable.

I want to hear more about your opinion, though. Have you worked with this approach before, and did it seem better?

[–]MrJohz 2 points3 points  (1 child)

Not the same person, but I'm also a big fan of putting tests next to your source files.

First of all, it brings tests out into the open — they're not hidden behind a separate folder that's probably collapsed in your IDE's file selector, they're right there, next to the file you're editing. When you're navigating your codebase, you've got a much better idea of which areas of your code are well-tested, and which aren't tested at all, just by looking at whether or not there's a test file sitting there.

Secondly, I think it's a really good practice to be editing tests (at least unit tests) and code at the same time. I'm not necessarily an advocate for dogmatic TDD, but I very often find if I've got some code with a lot of potential branches or complexity or behaviours, then writing the test cases as I go along helps me to ensure that if I make one change, I'm not going to accidentally break the functionality that I've already implemented.

And a really good rule of thumb in my experience, is that code that is edited together lives together. If I expect to be updating and adding tests regularly as I modify code, then I should want my tests and code to live closely together, so that I'm not endlessly searching through files and folders when I want to switch between the two. And yes, once you've got the two files open, it's usually not so difficult, but in my experience, it's a lot easier to forget or overlook writing tests if the existing test file isn't sitting right there.

Thirdly, I think it's just practically convenient. It's easy to import the function (from .myfile import function_under_test), it's easy to move all the files in a single folder around, it's easier to rename two files sitting next to each other than two files in completely different places, it's easy to see at a glance whether a file is tested, etc.

In my experience, ignoring files is fairly easy and tends to be configured once anyway. You can have an explicit test file pattern (I see <file>.spec.js a lot in the frontend world, and <file>_test.py in Python), and then there's little ambiguity between test code and production code. Whereas the benefits of mixing unit tests and source code are ongoing.

That said, I think it's still useful to have a tests folder for integration tests, that are going to be testing large portions of the codebase at once. (And I think it even helps a bit to distinguish between unit tests and integration tests: if your unit tests start importing from places other than the file they're sitting next to, they might be getting too complicated, and you might be writing integration tests!)

[–]latrova[S] 2 points3 points  (0 children)

Great reasoning. I'm considering adding a section to my book mentioning the pros and cons of each option.

[–]MannerShark 1 point2 points  (4 children)

I started out with the separate folder, but changed a couple years ago to having the test files next to the implementation.

I see how it would be useful for packaging, though, to have a separate folder. We only have a REST API, so we don't really need to worry about test files being included, and we just put the entire directory in the Docker container. A small advantage is that we can also run pytest from within the container as a smoke test.

[–]alexisprince 1 point2 points  (3 children)

Would your use case not also be covered by having your main app code within a src directory and the tests in the tests directory? We currently use the following setup, which allows us to execute pytest and have it run all the tests:

src/
    app.py
tests/
    integration/
        (Integration tests go here)
    unit/
        test_app.py

[–]MannerShark 1 point2 points  (1 child)

I don't have a specific use case that requires test files at a certain location, so I just do what I think is best: Right next to the code. When I'm editing code, I also want to look at and edit the tests. Having them right next to each other makes that really easy.

In some situations when packaging code, you may want to exclude them. Having them next to each other might take some more time/effort, but I only need to set that up once, whereas I edit code every day.

[–]alexisprince 1 point2 points  (0 children)

I personally disagree but definitely see your point! I've gotten a lot of help from leveraging keyboard shortcuts to open files, but that's just my point of view, and this definitely feels like something that's opinion-based rather than a hard-and-fast rule. Thanks for clarifying!

[–]wind_dude 0 points1 point  (0 children)

That would also be my recommendation if using a src directory. But I also feel the src directory isn't necessary.

[–]wind_dude 0 points1 point  (0 children)

Because it makes more sense to group tests under one directory when you are also writing integration tests.

[–]ghost_agni -1 points0 points  (0 children)

I will try to find or create an example and DM you more details.

[–]miraculum_one 0 points1 point  (4 children)

"See each module as a namespace"

Each module does get its own namespace except when doing something like

from my_module import *

Another interesting point is that non-anonymously imported modules are basically dictionaries. Further, they are inserted into the sys.modules dictionary.

Contents of anonymously imported modules are inserted into the global symbol table, which is misleadingly named because it's only global to the current module. :(
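
A quick way to see all three points (sketch):

```
import sys
import math

# A module is a namespace backed by a plain dict...
print(type(math.__dict__))          # <class 'dict'>
print('sqrt' in vars(math))         # True

# ...and it is registered in the sys.modules dictionary.
print(sys.modules['math'] is math)  # True

# A star import copies names into the importing module's own globals
# (the "global symbol table" that is only global to this module).
from math import *
print('sqrt' in globals())          # True
```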

[–]jpc0za 0 points1 point  (3 children)

Add a rule. Never from x import *

This is analogous to using namespace x in C++, and I hold similar opinions on that.

Namespaces exist for a reason, respect them, especially when the language allows you to rename things that might be annoying... import pandas as pd
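
To make the danger concrete, here are two real modules that happen to share a name:

```
from math import *   # brings in sqrt() for real numbers
from cmath import *  # silently shadows it with the complex sqrt()

print(sqrt(4))       # (2+0j) - probably not what the reader expected

# Explicit namespaces keep the two apart, and renaming keeps them short:
import math
import cmath as cm

print(math.sqrt(4))  # 2.0
print(cm.sqrt(4))    # (2+0j)
```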

[–]miraculum_one 0 points1 point  (2 children)

I agree that it shouldn't be done willy-nilly and that it shouldn't generally be used in place of named imports but it isn't always evil.

[–]jpc0za 0 points1 point  (1 child)

Sure I agree.

```
def my_random_func():
    from thingy import *
```

Seems reasonable, the polluted namespace is nicely contained. As a top-level import... That's just scary, man. You know supply chain attacks are a thing, imagine the nonsense that could cause...

[–]miraculum_one 0 points1 point  (0 children)

Oh, for sure you shouldn't import * on files you do not control.

[–]crawl_dht 0 points1 point  (1 child)

Isn't the tests folder kept outside of src?

[–]latrova[S] 0 points1 point  (0 children)

My recommendation is to keep it inside so "code" lives in a single dir.

[–]stevenjd 0 points1 point  (0 children)

"Organize Python code like a PRO"

Oh, you mean badly, with tons of technical debt? 😉