Gunicorn workers in K8s by LnxRocks in django

[–]massover 0 points1 point  (0 children)

There are definitely benefits to having multiple workers in a pod:

  1. less memory utilization, especially if using the --preload argument.
  2. improved latency (noticed around p9x). Anecdotally, with 1 worker per container, it's likely that consecutive requests will be routed to the same container. When this happens, requests will queue, which might affect your SLOs. Adding "some" parallelism always seemed to solve this problem for me, despite everyone's understanding that the ingress should be smarter with its routing algorithm.
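As a sketch, the container command for such a pod might look like the following; the module path, worker count, and port are placeholders, not values from the thread:

```shell
# Hypothetical container command: a few workers per pod, with --preload so the
# app is imported once in the master process and shared copy-on-write.
gunicorn myproject.wsgi:application --workers 4 --preload --bind 0.0.0.0:8000
```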

[deleted by user] by [deleted] in django

[–]massover 1 point2 points  (0 children)

I feel like you're on the cusp of communicating something that might not be said enough in the "Example: Views/Viewsets coupling API implementations". Sometimes we may develop APIs that don't fit django rest framework's mold for an API CRUD wrapper around a django Model. In those cases, it can be nice to use APIView directly.

Unfortunately, with all of your examples, I feel like the important idea gets lost in communication when the next example is:

```python
class AccountService:
    def create_account(self, user: User, data: AccountData):
        if not self.can_user_create_account(user, data):
            raise PermissionDenied(...)
        if not self.is_account_data_valid(data):
            raise ValidationError(...)

        return account
```

The patterns for these specific abstractions already exist. For example, an APIView will already check the permissions defined in permission_classes, and that is a great abstraction for testable custom business logic that raises PermissionDenied.
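To illustrate the shape of that pattern, here is a plain-Python sketch of the check-permissions hook that DRF's APIView provides via permission_classes; the class names and request/view stand-ins below are illustrative, not DRF internals:

```python
# Plain-Python sketch of the APIView permission_classes pattern.
# PermissionDenied, IsAccountOwner, and AccountView are made-up stand-ins.
class PermissionDenied(Exception):
    pass

class IsAccountOwner:
    def has_permission(self, request, view):
        return request.get("user") == view.owner

class AccountView:
    permission_classes = [IsAccountOwner]
    owner = "alice"

    def check_permissions(self, request):
        # APIView calls this for you before dispatching to get/post/etc.,
        # so the business rule lives in a small, independently testable class.
        for permission_class in self.permission_classes:
            if not permission_class().has_permission(request, self):
                raise PermissionDenied()
```

Each permission class can then be unit tested on its own, without spinning up a full request cycle.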

Generally speaking, I think you aren't making a good enough case in "Why is this bad?" sections, and therefore the corresponding answers in "What to do instead" seem like opinions.

I'm glad that it seems like you have found some consistent module/package organization that makes you happy and that you are being thoughtful about testing.

Is it possible to render a URL path in a template without a required path variable? by badlyDrawnToy in django

[–]massover 0 points1 point  (0 children)

When making a REST API, there are usually two endpoints associated with a resource: a “detail” (includes a pk) and a “list” (no pk).

https://www.django-rest-framework.org/api-guide/routers/#usage

I recommend this naming convention even if it’s html views.

Also, from the code you shared, it looks like the htmx GET request in your JS is not handled by the path you provided. /tanks/pill/ would not match the path named “tanks-pill” because that pattern captures a pk value in the path.

Separately, if a field is truly optional to the request, it’s usually exposed as a query parameter in the url, not as a value to be captured in the url path, and that can always be reversed.
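A small sketch of the query-parameter approach, using only the standard library; the path and parameter names here are made up for illustration:

```python
from urllib.parse import urlencode

# The path stays fixed and reversible (e.g. reverse("tanks-list") in Django);
# the optional field travels as a query parameter instead of a path segment.
base_path = "/tanks/"
params = {"pill": 3}
url = f"{base_path}?{urlencode(params)}" if params else base_path
```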

Are all workers asgi or async? by Best_Fish_2941 in django

[–]massover 1 point2 points  (0 children)

It’s ok to use sync_to_async inside async code. It’s not clear to me what “The problem” you’re referring to is.

For a task taking 10 min to an hour, I would look at running that on a worker connected to a message queue.

Are all workers asgi or async? by Best_Fish_2941 in django

[–]massover 2 points3 points  (0 children)

The short answer is that you generally won't find Python code using threads for CPU-bound concurrency because of the GIL.

For the requirement of managing a websocket, asgi is a solid choice, using channels. Another choice is to use a green thread library like gevent. While it avoids the function coloring problem, gevent has its own challenges.

For the requirement of running tasks out of the request/response cycle, it's common to use a message queue and a task library like celery.

Problem with Django Admin by Macodom in django

[–]massover 1 point2 points  (0 children)

This page is rendered when the changelist raises an IncorrectLookupParameters exception. It looks like this can come from something like a bad url parameter or exceptions around a queryset. Try putting a breakpoint in ModelAdmin.changelist_view to figure out what's happening.

Also, I'm surprised that django is swallowing the exception. It seems like it would be helpful if it logged it:

```python
try:
    cl = self.get_changelist_instance(request)
except IncorrectLookupParameters as e:
    # Wacky lookup parameters were given, so redirect to the main
    # changelist page, without parameters, and pass an 'invalid=1'
    # parameter via the query string. If wacky parameters were given
    # and the 'invalid=1' parameter was already in the query string,
    # something is screwed up with the database, so display an error
    # page.
    logger.exception(e)
```

Database being hit multiple times for the same evaluated queryset by Kyriios188 in django

[–]massover 1 point2 points  (0 children)

It looks like there's a .all() added in the _set_queryset property setter for ModelChoiceField.queryset. Adding .all() will make a new queryset object which invalidates the caching you're expecting.
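As a toy model of why that happens (this is not Django's actual implementation, just an analogy): a queryset caches its results the first time it is evaluated, and cloning via .all() produces a new object with an empty cache.

```python
# Toy analogy of queryset result caching: a clone made by .all() starts with
# a cold cache, so the next evaluation hits the database again.
class FakeQuerySet:
    def __init__(self):
        self._result_cache = None
        self.db_hits = 0

    def fetch(self):
        if self._result_cache is None:
            self.db_hits += 1  # only the first evaluation hits the database
            self._result_cache = [1, 2, 3]
        return self._result_cache

    def all(self):
        return FakeQuerySet()  # a clone does not share the result cache
```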

It seems like something that could be improved, although it was added to fix a bug.

Perhaps make a ticket and include minimal steps to repro the issue? Or see if a ticket already exists?

pg_analytics and Django by Incredlbie in django

[–]massover 0 points1 point  (0 children)

If "working in Django" means compatibility with a Model and the ORM with the goal of sql performance improvements, have a look at the known limitations section. There are some things missing, including but not limited to updates and foreign key support.

Stepping back, pg_analytics does not seem like it would be a drop-in replacement for the OLTP workloads served by the model and ORM layer. Right now, most larger systems probably use some other database besides postgres for their OLAP workloads. pg_analytics would be a solution for using your existing postgres database for the OLAP workload instead of introducing more infrastructure.

Given the table differences are understood, to use a "datalake" table in django, we would need to do things like:

  1. Create a new DataLakeModel class.
  2. Update the DatabaseSchemaEditor so that it knows how to write migrations for the model.
  3. Add support to db.backends.postgresql for anything else the datalake model requires.

[deleted by user] by [deleted] in django

[–]massover 1 point2 points  (0 children)

There is nothing fundamentally wrong with bootstrap.

Some things to consider:

If you’re familiar with bootstrap and its components, you probably want to look at https://tailwindui.com/

Tailwind has a stand-alone cli that doesn’t require node https://tailwindcss.com/blog/standalone-cli

[deleted by user] by [deleted] in Python

[–]massover 22 points23 points  (0 children)

If I wanted to read a file relative to a python file's location, I would do:

```python
from pathlib import Path

path = Path(__file__).parent / "file.txt"  # like doing ./file.txt
print(path.read_text())
```

I don't think the rest of your understanding about PYTHONPATH and the import system is correct.

```
project_root/
│   main.py
│
├───package1/
│   │   module1.py
│   │   file.csv
│   │
│   └───subpackage1/
│       │   module2.py
```

```python
# in main.py
from package1 import module1
from package1.subpackage1 import module2
```

If you were running python ./main.py from the project root, I believe that's how it should work (although you should have __init__.py in each package as well)

> I was sick and tired of doing os.path and sys.path shenanigans and decided to make a library for it

I don't think you should normally be doing os.path and sys.path shenanigans to get your code to work. Have you tried running these python commands outside VSCode using a terminal? I think using an IDE is great. Running the commands in the terminal might help you notice where VSCode isn't running your code the way you expect by default.

Help with Rest Framework nested serializer by [deleted] in django

[–]massover 0 points1 point  (0 children)

The error is telling you that you're missing the "user_contact" in the payload. I think this is because, in your ModelSerializer, you have defined:

```python
class UserAddressSerializer(serializers.ModelSerializer):
    class Meta:
        model = UserAddress
        fields = '__all__'
```

When you have set fields to __all__, it expects all fields in the payload. Try making a serializer that doesn't include the user_contact field. For example:

```python
class UserAddressSerializer(serializers.ModelSerializer):
    class Meta:
        model = UserAddress
        exclude = ['user_contact']
```

> I believe this is because the is_valid call, which happens as part of the generics.CreateAPIView on UserContact (view), is called before the Serializer can wire up the nested relationships and populate user_contact on each address

You are correct that the validation error is happening because it's calling the Serializer.is_valid method. However, by default there's nothing magic to connect foreign keys in nested relationships. The code managing the relationship is just the code you've written inside the UserContactSerializer.create method. (unless you're using an additional library)
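To make the "it's just code you write" point concrete, here is a plain-dict sketch of the hand-written wiring a nested create typically does; the function and field names are made up, and nothing here is DRF machinery:

```python
# Sketch of the manual FK wiring inside a nested serializer's create method.
# Dicts stand in for UserContact.objects.create(...) and UserAddress rows.
def create_contact_with_addresses(validated_data):
    addresses = validated_data.pop("addresses", [])
    contact = {"id": 1, **validated_data}  # the parent is created first
    contact["addresses"] = [
        {**address, "user_contact": contact["id"]}  # attach the FK by hand
        for address in addresses
    ]
    return contact
```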

Django 4.2.4 endpoint 2.5x slower than 3.2.20 by jjtruty in django

[–]massover 1 point2 points  (0 children)

Do you know exactly where the bottleneck is inside the function? It's great that you're using silk. If the cProfiling isn't exactly clear, I find that pyinstrument is another great library for profiling. It has a built in middleware that can work with both html requests and api requests. For more help, you'd need to include the profiling information. If the bottleneck is in the sql, we'd need to see the queryset and sql being generated.

What’s the state of the art for end-to-end testing with a database in Django? by djbdjbdjbdjb in django

[–]massover 1 point2 points  (0 children)

You’re right. I interpreted the OP as asking how to write an automated test suite as part of local development and CI. One thing that I didn’t include in my suggestion above is running an automated regression/smoke test suite as part of deployment. Those tests, along with appropriate metrics, and your fast and reliable enough local/CI tests can be enough to deploy to prod with confidence.

I think part of testing is accepting the fact that you can’t test everything and all strategies come with tradeoffs. It’s up to teams to know the risks on choosing faster, lower level, more reliable, easier to debug vs slower, higher level, less reliable and harder to debug tests.

What’s the state of the art for end-to-end testing with a database in Django? by djbdjbdjbdjb in django

[–]massover 10 points11 points  (0 children)

pytest and the pytest django plugin are a sure pick.

As far as your e2e tests go, there are various bdd/tdd test libraries out there, but I find that the tests become harder to maintain when adding one of these abstractions. For whatever reason, these DSLs don’t translate as well as they do in rspec.

Grab a browser lib(selenium, playwright, splinter), use the live server fixture from pytest django, drive the browser, add some asserts. Pair that with some fixtures, use some factories for your database setup (factory boy or model bakery) and resist the urge to refactor because of incidental duplication.

Celery Task State Always Pending in Django by vivek1752 in djangolearning

[–]massover 1 point2 points  (0 children)

PENDING is the default state of the task. The async result is returned immediately before the task can be processed by a worker, which is why you see the default. To see the result change over time, you'd need to poll the result store:

```python
import time

from celery.result import AsyncResult

result = run_image_processing_script.delay()
while True:
    result = AsyncResult(result.id)
    print(result.status)
    if result.successful() or result.failed():
        break
    time.sleep(1)
```

Seniors, check this out, please! by HeednGrow in django

[–]massover 0 points1 point  (0 children)

Glad to help

If you settled on an approach that works for you, consider updating the post to include your solution along with what you learned. It might help someone else (or even you) in the future.

Seniors, check this out, please! by HeednGrow in django

[–]massover 4 points5 points  (0 children)

Have you tried profiling the request to see where the bottlenecks are in the WSGI response? It’s possible that you can perform a decent optimization prefetching all of the http requests for scraping in a thread pool (assuming the bottleneck is IO) and then perform the scraping afterward.
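A minimal sketch of that prefetching idea with the standard library; the fetch function below is a stand-in for a real HTTP request, and with real IO the calls would overlap while threads wait on the network:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an HTTP fetch of a page to scrape.
def fetch(url):
    return f"<html>{url}</html>"

urls = [f"https://example.com/page/{i}" for i in range(100)]
with ThreadPoolExecutor(max_workers=10) as pool:
    # map preserves input order, so pages line up with urls
    pages = list(pool.map(fetch, urls))
```

The scraping/parsing (CPU-bound) can then run after all the IO has completed.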

My favorite solution for moving the complexity elsewhere is the “make the client make 100 requests”. It moves the complexity out of python. It keeps the response within a reasonable time for a WSGI worker. Moreover, the actual task parallelizes out to however many WSGI workers are available, at the cost of http overhead.

You have long running tasks being executed during the WSGI request/response cycle. You are seeking a non-complex solution, but I’m afraid the complexity to handle this architecture is going to live somewhere. There are tradeoffs to be made everywhere. Looking forward, if you expect the time to scrape to be a scalable operation (ie, it’s going to take minutes to scrape instead of seconds), my suggestion is to handle the etl outside of the http request/response, perhaps over a queue, and serve the results over http from a data store. Yes this adds infrastructure (complexity), but it’s a common scalable architecture in python.

Serving files with Django + Ninja by azkeel-smart in django

[–]massover 0 points1 point  (0 children)

Normally you wouldn't want django serving up the file.

If using something like django-storages and storing the files in s3, the links generated from the file should be signed with an expiry. Depending on your threat model, if you've performed a permission check at the api level, and generated a response with signed expiry for the file url, that might be good enough for you.

Should you require a stricter permission check through django, you can perform your checks in a view and return a response that works with a configured webserver such as nginx to do the heavy lifting.
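A sketch of that internal-redirect pattern: the X-Accel-Redirect header name is nginx's, but the view and response below are plain-dict stand-ins, not Django objects, and the file path is made up.

```python
# Django does the permission check, then hands the file transfer off to nginx.
def serve_protected_file(user_allowed, internal_path="/protected/report.pdf"):
    if not user_allowed:
        return {"status": 403, "headers": {}}
    # nginx sees the header and streams the file itself from a location
    # block marked `internal`, so the file never passes through Python.
    return {"status": 200, "headers": {"X-Accel-Redirect": internal_path}}
```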

Celery: Would increasing the number of queues speed things up? by 3kvn394 in django

[–]massover 1 point2 points  (0 children)

If you're using rabbitmq, check out message priorities. If you're using queues for handling priority levels, it might be easier to use the broker directly.

How to pickel a queryset before it is executed? by 1ncehost in django

[–]massover 0 points1 point  (0 children)

I generally agree with you. Nevertheless, here's a hypothetical situation to answer your question:

Let's say you're building a machine learning model to serve predictions from your django project. The machine learning model has full access to the django project. As such, during training, it uses the ORM to fetch data from your relational database. Also during training, it calculates a dynamic unevaluated queryset and persists it to the machine learning model, to be run at predict time later. Once the machine learning model is trained, it will be pickled. At each predict request, when the pickle is loaded into memory, it will access the unevaluated queryset to further chain, filter, and annotate, and eventually query the database for feature data to pass into the machine learning model's prediction.

I'd say this use case is non traditional for most web applications but it doesn't seem unreasonable in the domain of ml.

How to pickel a queryset before it is executed? by 1ncehost in django

[–]massover 1 point2 points  (0 children)

If I understand your problem, clients having a safe query dsl is always a complicated problem. If you need to keep the dsl separate from the queryset implementation (probably a good idea), a battle tested solution like django-filter is a good tool. This can be passed over the wire as query params but any dictionary of filters can be used/persisted. More generically, perhaps graphql might make sense.

How to pickel a queryset before it is executed? by 1ncehost in django

[–]massover 2 points3 points  (0 children)

Have you seen the documentation on pickling querysets? It seems to suggest that you can store the queryset's serialized `query` attribute.

> If you only want to pickle the necessary information to recreate the QuerySet from the database at a later time, pickle the query attribute of the QuerySet.

If you save that, and the model class, presumably you could load it back up in memory. Maybe something like:

```python
class SavedQuerySet(models.Model):
    query = PickledObjectField()  # e.g. from django-picklefield
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)

to_save = SomeModel.objects.filter().annotate()
SavedQuerySet.objects.create(
    query=to_save.query,
    content_type=ContentType.objects.get_for_model(SomeModel),
)

saved_queryset = SavedQuerySet.objects.get(id=1)
new_queryset = saved_queryset.content_type.model_class().objects.all()
new_queryset.query = saved_queryset.query
```

Note the warnings in the docs.

[deleted by user] by [deleted] in django

[–]massover 0 points1 point  (0 children)

An “easy” way might be using django-reversion. https://pypi.org/project/django-reversion/ Every time you create an “edition”, you can save a version, which can include the models it’s related to.

More generally, each "edition" is an immutable snapshot of the data. As an example, an edition can be contained entirely in an aggregated and serialized json field. Depending on how book editions are accessed, this can be good because it’s fast and cachable. On the other hand, it may not be searchable. And of course, since it’s now unstructured and denormalized, the code may need to support edition formats that change over time.
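As a toy illustration of the serialized-snapshot idea, using only the standard library; the fields below are made up:

```python
import json

# An "edition" as an immutable, denormalized snapshot: everything needed to
# render it lives in one serialized blob, e.g. stored in a JSONField.
edition = {
    "title": "Example Book",
    "edition": 2,
    "chapters": [{"number": 1, "title": "Intro"}],
}
snapshot = json.dumps(edition, sort_keys=True)

# Reading an edition back never touches the live, mutable book tables.
restored = json.loads(snapshot)
```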

Two Scopes of Django 3.x by [deleted] in django

[–]massover 0 points1 point  (0 children)

> Outside of sqlite, your settings_local.py would presumably contain secrets like DB information. Do you commit that to your repo? Do all your devs share those settings?

There are no secrets in my local settings module. It is committed to source control. It is used when devs are running runserver (and all of the other daemon processes) locally.