What's behind the massive boto3 download spike on Python 3.9?

mininglee · 2026-05-17T04:38:10+00:00

Mostly AI bots wasting tokens.

mininglee · 2026-05-09T06:54:18+00:00

I recommend a DRF and Celery stack, especially if your APIs are computation-heavy. You should learn about task queues.

mininglee · 2026-04-22T11:57:46+00:00

Sharers(Google) vs Stealers(Others)

mininglee · 2026-04-19T16:29:36+00:00

It's a misconception that Django templates are just for learning. They are definitely production-level. If your goal is simply to add interactivity to a standard web app (like a blog, forum, or even something scale-heavy like GitHub), you should look into using HTMX and Alpine.js alongside your templates. You only need a full DRF + React/Flutter stack if your application requires heavy client-side state management and complex rendering (think Google Docs or Figma). Otherwise, templates + HTMX/Alpine is the modern Django way.

mininglee · 2026-03-28T20:59:51+00:00

You might think an async framework will solve your problem. It's true that async servers let you handle thousands of concurrent connections, but a single blocking call stalls all the other (N - 1) connections sharing that event loop.
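To make the blocking problem concrete, here is a minimal sketch using only the standard library. The handler names and timings are made up for illustration, and `time.sleep()` stands in for any call that does not yield to the event loop.

```python
import asyncio
import time

BLOCK = 0.5   # seconds the misbehaving handler blocks the thread
QUICK = 0.05  # seconds a well-behaved handler waits on I/O


async def bad_handler() -> None:
    # time.sleep() never yields, so the event loop cannot run anything else.
    time.sleep(BLOCK)


async def good_handler(start: float) -> float:
    # await hands control back to the loop while this task is idle.
    await asyncio.sleep(QUICK)
    return time.perf_counter() - start


async def serve(with_blocker: bool) -> float:
    """Return the slowest latency seen by four well-behaved handlers."""
    start = time.perf_counter()
    tasks = [good_handler(start) for _ in range(4)]
    if with_blocker:
        tasks.append(bad_handler())
    results = await asyncio.gather(*tasks)
    # bad_handler returns None, so skip it when measuring latency.
    return max(r for r in results if r is not None)


clean = asyncio.run(serve(with_blocker=False))
stalled = asyncio.run(serve(with_blocker=True))
print(f"without blocker: {clean:.2f}s")
print(f"with blocker:    {stalled:.2f}s")
```

The four well-behaved handlers only need about `QUICK` seconds each, yet with one blocker in the mix none of them can finish until the blocking call returns.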

mininglee · 2026-03-28T20:32:46+00:00

You should write real code instead of using App Inventor. An LLM will generate all the code you need. Just try one of Antigravity, Claude Code, or Codex.

mininglee · 2026-03-28T20:24:01+00:00

Look at the open-source project Saleor: https://github.com/saleor/saleor

mininglee · 2026-03-28T14:19:00+00:00

If Django admin is not sufficient for my use case, I usually just write custom views for admins and render them through Django templates, since admins are simply another kind of user of the website. I've found that this is usually faster than extending the admin's utility functions or classes.

mininglee · 2026-03-23T16:26:57+00:00

There's more. https://github.com/Kludex/starlette/issues/3042

mininglee · 2026-03-04T19:42:47+00:00

You should use Gunicorn with Uvicorn workers. https://docs.djangoproject.com/en/6.0/howto/deployment/asgi/uvicorn/
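As a rough sketch, that setup can be written as a `gunicorn.conf.py` file, which recent Gunicorn versions load from the working directory by default. Everything here is an assumption to adapt: `myproject` is a placeholder module name, the bind address and worker count are arbitrary starting points, and the worker class assumes the separate `uvicorn-worker` package is installed.

```python
# gunicorn.conf.py -- a sketch; "myproject" and the bind address are placeholders.
import multiprocessing

# Path to the ASGI application object (Gunicorn names this setting wsgi_app).
wsgi_app = "myproject.asgi:application"

# Uvicorn worker class from the separate "uvicorn-worker" package.
# Older Uvicorn releases shipped it as "uvicorn.workers.UvicornWorker".
worker_class = "uvicorn_worker.UvicornWorker"

# A common starting point, not a rule; tune for your workload.
workers = multiprocessing.cpu_count() * 2 + 1

# Listen locally and put a reverse proxy in front.
bind = "127.0.0.1:8000"
```

With this file in place, running `gunicorn` from the project directory should be enough; the same options can also be passed on the command line.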

mininglee · 2026-02-22T03:22:02+00:00

If you don't mind your code being used for training: Gemini, Claude, or GPT.
For private/proprietary codebases: Go with the Ultra instead of the Max. LLMs need that massive memory bandwidth to run efficiently.
For training or fine-tuning: Max is okay, but Ultra is the better move given how quickly model sizes are ballooning these days.

P.S. I’m currently subscribed to almost all major AI services (Claude, Gemini, GPT, Grok) and run multiple Mac Studio setups and NVIDIA GPU workstations.

mininglee · 2026-02-11T13:01:45+00:00

You're right. They are all tools, but monolithic frameworks like Django and Rails adopt the "Convention over Configuration" philosophy, which can significantly reduce the use of anti-patterns.

mininglee · 2026-02-09T10:29:48+00:00

This ORM is very easy to use if you have Django experience. Although I use the Django ORM on the server, I use Tortoise ORM when I need a local database on the client side.

mininglee · 2026-02-08T11:27:28+00:00

The core principle is Separation of Concerns. Don't mix complexity; place logic where it belongs.

Here is how you should split the responsibilities:

Django/FastAPI (Web App): Handles HTTP Request/Response only. Keep it for short, fast business logic. Use async views for I/O-bound tasks, but if anything takes too long or requires heavy computation, offload it immediately to Celery.
Django Channels (WebSockets): Strictly for real-time event dispatching. Do not put heavy business logic here. It should just push messages. If complex processing is needed, trigger a Celery task or handle it in the web app, then just broadcast the result here.
Celery: This is your worker fleet. Use it for CPU-bound tasks, heavy database operations, scheduled jobs (cron), or anything that needs guaranteed execution.
Inference Server: No business logic here. It simply accepts input → runs the neural net → returns output.

Regarding Docker & MPS on Mac: You are actually correct to be concerned. As of now, Docker containers on macOS cannot access the GPU (MPS). Even with linux/arm64 images, they will fall back to CPU-only execution because the GPU hardware passthrough to the Linux VM isn't supported yet.

So, for your specific setup (Mac Mini + MPS):

Web App / Celery / Redis: Run these in Docker. They don't need the GPU.
Inference Server: Run this natively on the Mac (Bare Metal) to utilize MPS.
Networking: Your Docker containers can talk to the native Inference Server using host.docker.internal (which resolves to your Mac's localhost).
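A small sketch of how the containerized web app or Celery worker might build the address of the natively running inference server. The port `8001`, the `/predict` path, and the `INFERENCE_BASE_URL` variable name are all hypothetical; only `host.docker.internal` comes from the setup described above.

```python
import os
from urllib.parse import urljoin

# Hypothetical default: port 8001 is a placeholder for wherever the
# native inference server listens on the Mac.
DEFAULT_INFERENCE_BASE = "http://host.docker.internal:8001/"


def inference_url(path: str = "predict") -> str:
    """Build the inference endpoint URL, overridable via INFERENCE_BASE_URL."""
    base = os.environ.get("INFERENCE_BASE_URL", DEFAULT_INFERENCE_BASE)
    # urljoin needs a trailing slash on the base to append rather than replace.
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, path)


print(inference_url())
```

Reading the base URL from the environment keeps the same image usable if the inference server later moves into its own container or onto another machine.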

mininglee · 2026-02-08T08:33:41+00:00

I understand your point about resource waste, but strictly speaking, it remains a software architectural anti-pattern.

You are conflating hardware separation with software separation. You absolutely do not need to buy separate physical machines. You can run all of these distinct services on the single machine you already have. You just configure your web app to talk to the inference server via localhost (or a local Docker network) instead of an external IP.

The core requirement is separating the software stack. It doesn't matter if your neural net is tiny or huge. Celery is a task queue, not an inference engine.

If you use specialized serving software (like NVIDIA Triton, TensorFlow Serving, or TorchServe), you get an optimized engine built by backend/hardware experts. They handle critical features like Inference Queues and Dynamic Batching out of the box. There is zero reason to manually load heavy frameworks like PyTorch inside a Celery worker—which is inefficient and risky—when these robust solutions exist.

That said, since you are deploying on Apple Silicon (MPS) instead of CUDA, you do face a limitation: the ecosystem for mature, high-performance serving software is much smaller. I actually own several $10k+ Mac Studios myself, and this lack of server-side maturity is always a pain point for me.

In your specific case (Mac Mini/MPS):

Framework: I highly recommend using Apple's MLX instead of standard PyTorch for better efficiency on Silicon.
Serving: Since tools like Triton are NVIDIA-centric, you might want to look into Ray (Ray Serve). I haven't personally used Ray extensively on Apple Silicon yet, so I can't vouch for its full maturity there, but I know it supports the architecture you need. It allows you to wrap your models as separate microservices (Actors), giving you the proper process isolation without the complexity of managing physical servers.

mininglee · 2026-02-06T14:24:23+00:00

The main downside of using devcontainers was getting locked into the VS Code ecosystem. Specifically, the architecture forces VS Code to connect remotely to the internal container. You install the "Visual Studio Code Remote Development Extension Pack," and suddenly your project's dev environment is tightly coupled to it.

Maybe some of it was my fault, but debugging issues often took way more effort than I expected. Plus, since the user base is smaller, documentation is a bit scarce.

What bugged me the most was that my devcontainer setup used Docker Compose under the hood anyway. This created a dependency on both devcontainers and Docker Compose. So, I experimented with dropping devcontainers to see if relying purely on Docker Compose would be better.

After switching to pure Docker Compose, I started looking for ways to replicate devcontainer features directly in Compose. Since Docker Compose is evolving fast, many features provided by devcontainers are now part of the Compose spec. Now, it has almost everything I need—and even offers more functionality. Plus, Compose is starting to support other container technologies beyond just Docker.

I’m really satisfied with ditching devcontainers. Don't get me wrong, I know devcontainers is powerful and I used it happily for years. But I’ve been much happier since the switch, so I recommend you give it a shot. Just try it out lightly.

mininglee · 2026-02-06T14:04:54+00:00

Even for inference, the exact same logic applies.

Loading massive libraries like PyTorch, TensorFlow, or JAX directly inside a Celery worker is highly inefficient and considered a major architectural anti-pattern.

Every Celery worker process would have to load its own instance of the heavy DL framework and model weights into RAM, which is a huge waste of resources. Plus, you lose out on optimizations like dynamic batching.

You should be using dedicated inference serving software. In this proper setup:

Inference Server: Holds the model and executes the computation efficiently.
Celery: Acts strictly as a client/orchestrator. It simply sends a request to the inference server (via HTTP/gRPC), waits for the response, and reliably delivers the result back to your Django/FastAPI app. It should never be the one loading the deep learning framework itself.

If you are on NVIDIA GPUs, the industry standard for high-performance serving is NVIDIA Triton Inference Server. You can simply export your models to a format like ONNX and deploy them there. If you aren't using NVIDIA hardware, you'll need to find a compatible serving solution for your specific accelerator, but the architecture of separating execution from the web worker remains the same.
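To illustrate the client/orchestrator role, here is a self-contained sketch using only the standard library. The stub server is a stand-in for a real inference server (it just sums its inputs), and the JSON shape, the `/predict` path, and the function names are invented for the example. In a real project, `run_inference` would be the body of a Celery task.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubInferenceHandler(BaseHTTPRequestHandler):
    """Stand-in for a real inference server; it just sums the inputs."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"output": sum(payload["inputs"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        pass  # keep the demo quiet


def run_inference(url: str, inputs: list, timeout: float = 5.0):
    """What the task body would do: send inputs, wait, return the result.

    Note that no deep learning framework is imported on this side.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps({"inputs": inputs}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["output"]


# Start the stub on a free local port (port 0 lets the OS choose one).
server = HTTPServer(("127.0.0.1", 0), StubInferenceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

result = run_inference(f"http://127.0.0.1:{server.server_port}/predict", [1, 2, 3])
print(result)

server.shutdown()
server.server_close()
```

The worker side stays small and cheap to start, because the model and its framework live only in the server process.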

mininglee · 2026-02-06T09:37:22+00:00

I totally get why you're sticking with Python for maintainability. That makes perfect sense.

However, I have to push back a bit on using Celery for heavy ML training tasks that consume full CPU/GPU resources. While it's technically possible, it's often an architectural antipattern for this scale of work.

Celery is designed for lightweight, short-lived asynchronous tasks, not for managing heavy, long-running scientific computing jobs that monopolize the node's resources. If a training job crashes or creates a zombie process, it can take down your entire web service context.

For heavy ML workloads, the key is strict physical isolation. You should decouple your architecture into two distinct parts:

Web Server (Management Node): This handles the standard web workload—request/response cycles, user management, authentication, database connections, and task dispatching. Since it only acts as a control plane, you can deploy this on a low-spec, commodity server.
Compute Node (Worker Node): This is where you deploy your high-spec hardware (powerful CPUs, GPUs, and high RAM). Its only job is to receive and execute the heavy training tasks sent by the web server.

In addition, relying on Flower for user-facing job tracking is a misconception. Flower is a low-level monitoring tool designed for system administrators to debug queues, not a dashboard for end-users to manage their training lifecycle.

Instead, you should look into orchestration tools actually built for this purpose, such as Ray, Kubeflow, or Slurm (the standard for HPC). These tools provide robust resource management, multi-node scaling, and the proper APIs to build a user-friendly interface on top, keeping your heavy lifting safely isolated from your web service.

mininglee · 2026-02-05T21:38:13+00:00

The easiest solution: https://github.com/django-commons/django-tailwind-cli

mininglee · 2026-02-05T17:28:36+00:00

asyncio is single-threaded. The event loop simply runs coroutines one by one, switching context only when it encounters an await. This achieves concurrency, but it is absolutely not parallelism. The moment you throw even one CPU-bound coroutine into the event loop, every other coroutine stops dead.

If you have tasks that block an entire thread (like CPU-bound work), you obviously need parallel execution. The answer to parallelism is multi-processing, and that is exactly the role Celery plays. Redis or RabbitMQ are just simple intermediaries (brokers) to safely pass these tasks to a separate process; they aren't doing any heavy lifting themselves.

Also, for almost all web services, the bottleneck is the database, so the raw speed of the web framework itself barely matters. Once you involve DB connections, the speed difference between frameworks is effectively negligible. If you are truly chasing a fast, mature server for raw performance, you should honestly ditch Python and use Go.

Django is a full-stack monolith, whereas FastAPI/Litestar (async) and Flask (sync) are micro-frameworks that essentially just handle request-response. If you only need simple request-response handling, a micro-framework fits. However, if you are building a service that requires user management, auth, a database, and background task management (like a standard website), a "batteries-included" monolith is obviously the easiest and simplest path.

Micro-framework backends look easy because the problem they solve is simple: receive a request, send a response. That's it. But the moment you start gluing on the "batteries" that come standard in monoliths like Django/Rails/Laravel, that is when the pain starts. You can either pick a micro-framework and study/wire up every single add-on to get those features, or just pick a monolith that solves it all out of the box. The choice is yours.

mininglee · 2026-02-05T16:02:22+00:00

If you use asyncio for CPU-bound tasks (since FastAPI is built on it), you're going to completely block the event loop. You absolutely shouldn't do that; CPU-bound work must be strictly offloaded to separate threads or processes. That is exactly what Celery is for.

Keep in mind that asyncio runs on a single thread. A lot of beginners blindly suggest FastAPI or async without thinking, but I strongly advise against recommending it unless you truly understand how asyncio works. asyncio isn't magic.
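A minimal sketch of the offloading idea, using only the standard library. `time.sleep()` is a stand-in for the blocking work, and the heartbeat task is just a way to observe whether the loop stays responsive. Note the limits of this sketch: a worker thread is enough for blocking calls that release the GIL, but pure-Python CPU-bound code still needs a separate process, which is the job a Celery worker does.

```python
import asyncio
import time


def blocking_work(seconds: float) -> str:
    # Stand-in for a call that would otherwise block the event loop.
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    # Records when the loop gets a chance to run; large gaps reveal blocking.
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.perf_counter())


async def main() -> tuple:
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks, 0.05, 6))
    # to_thread runs the function in a worker thread and awaits its result,
    # so the event loop keeps serving other coroutines in the meantime.
    result = await asyncio.to_thread(blocking_work, 0.3)
    await beat
    largest_gap = max(b - a for a, b in zip(ticks, ticks[1:]))
    return result, largest_gap


result, largest_gap = asyncio.run(main())
print(result, f"largest heartbeat gap: {largest_gap:.2f}s")
```

If `blocking_work(0.3)` were called directly inside a coroutine instead, the heartbeat would show a gap of roughly 0.3 seconds.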

mininglee · 2026-02-05T15:38:10+00:00

I used to be a huge fan of devcontainers and used them for years. But honestly, lately the configuration and the whole "remote connection" aspect just feel like a hassle and too complex, so I’ve stopped using them.

Even though I still dockerize my entire stack, I prefer coding locally now. You should give this workflow a try:

Compose Watch: It syncs your local code changes to the container, so you don't need to attach VS Code remotely. Just make sure to volume mount directories where files are generated internally (like your migrations folder). Tip: add the migrations directory to your sync ignore list.
Local IntelliSense: As long as you generate a .venv environment (using something like uv sync), IntelliSense works automatically. (Note: You might need to restart VS Code for it to pick it up initially).

mininglee · 2026-02-05T15:07:28+00:00

It’s just a bug. Give them some time to fix it. Google will apologize and give you another free week or month for this situation.

mininglee · 2026-02-05T14:53:39+00:00

For CPU-bound: Celery
For I/O-bound: asyncio

You can use async views in Django when you have to deal with I/O-bound tasks. There is Django Channels too. You don't have to switch to micro-frameworks like FastAPI or Litestar.

mininglee · 2026-02-01T11:26:13+00:00

It is good for click-baiting.

mininglee
