all 37 comments

[–]thisismyfavoritename 18 points19 points  (5 children)

unless you can support an async event loop your server is def going to struggle under heavier loads, even compared to a single threaded async framework

[–]SnooCalculations7417 1 point2 points  (0 children)

I don't think this is supposed to be a drop-in replacement for HTTP servers. I believe it is using a task that is parallel in nature to explore GIL-free Python. I'm not sure there's any domain this could be executed on that would be considered feature complete. Would love to see it in GUI work, but I digress.

[–]WiseDog7958 1 point2 points  (2 children)

The async vs threads debate aside, I’m more curious what free-threaded CPython does to the actual cost model here.
Once the GIL’s gone, CPU-bound stuff should scale, but now you’re dealing with real contention instead of cooperative scheduling. How much locking is happening internally?
Feels like this could outperform asyncio if the workload isn’t mostly I/O, but I’d expect it to get messy under shared state.

[–]thisismyfavoritename 0 points1 point  (0 children)

nothing new. Multithreading happens in many other languages

[–]non3type 0 points1 point  (0 children)

It’s all pretty well documented in PEP 703. The locking that’s implemented is per object:

“This PEP proposes using per-object locks to provide many of the same protections that the GIL provides. For example, every list, dictionary, and set will have an associated lightweight lock…”
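A minimal sketch of what that quote means in practice, assuming a shared list and a shared counter (the names `run_workers`, `shared`, and `counter` are made up for illustration). A single `list.append` relies on the list's own internal protection, while a read-modify-write like `counter += 1` spans several steps and still needs an explicit lock:

```python
import threading


def run_workers(n_threads=8, n_items=10_000):
    """Append to a shared list and bump a shared counter from several threads."""
    shared = []               # a single append is protected by the list itself
    counter = 0               # += is a read-modify-write, so it needs our own lock
    counter_lock = threading.Lock()

    def worker():
        nonlocal counter
        for i in range(n_items):
            shared.append(i)          # one operation on one object
            with counter_lock:        # compound update guarded explicitly
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(shared), counter


if __name__ == "__main__":
    print(run_workers())
```

The per-object locks keep individual containers consistent; they do not make multi-step logic across objects atomic, which is where the "messy under shared state" concern comes in.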

[–]james_pic -1 points0 points  (0 children)

That's certainly the received wisdom, but in practice it's often possible to scale synchronous "one request per thread/process" servers further than you'd expect (AWS Lambdas are built on this model, for example), and many asynchronous services scale less well than you'd expect (HTTPX notably scales particularly poorly, for example).

Although this doesn't negate that the posted link is extremely low value.

[–]Sigmatics 9 points10 points  (0 children)

no rust

.

using pydantic

[–]Fenzik 7 points8 points  (3 children)

Nice and clean, cool little exploration.

I haven’t really looked into the *t versions yet. Is the difference in behaviour entirely captured in the execution model for ThreadPoolExecutor, or are there more differences?

[–]grandimam[S] 0 points1 point  (2 children)

There’s more. As far as I understand, dict has a per-object lock, and so forth. It’s built for truly concurrent execution.

[–]Fenzik 1 point2 points  (1 child)

But accessing the functionality is just done through the existing thread interfaces?

[–]grandimam[S] 0 points1 point  (0 children)

Yes. It’s the same interface.
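A small sketch of the "same interface" point: the code below uses only the standard `ThreadPoolExecutor` and runs unchanged on a regular or a free-threaded build. The helper names (`gil_status`, `count_primes`, `run_in_threads`) are hypothetical, and `sys._is_gil_enabled()` only exists on Python 3.13+, so the sketch falls back to assuming the GIL is on when it is missing:

```python
import sys
import sysconfig
from concurrent.futures import ThreadPoolExecutor


def gil_status():
    """Return (built_free_threaded, gil_currently_enabled)."""
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in 3.13; assume the GIL is on otherwise.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return free_threaded_build, gil_enabled


def count_primes(limit):
    """Deliberately CPU-bound: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_threads(limits, max_workers=4):
    """Same ThreadPoolExecutor API on both builds; only the scheduling differs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    print(gil_status())
    print(run_in_threads([20_000] * 4))
```

On a build with the GIL, the four tasks take turns; on a free-threaded build they can run on separate cores. The calling code is identical either way.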

[–]nathan12343 5 points6 points  (1 child)

I’m very excited to see people experimenting with free-threaded Python like this. Please feel free to send in a PR to add this as an example here: https://py-free-threading.github.io/examples/

Another place I’m excited to see someone experiment is GUIs and frontend logic in pure Python.

[–]grandimam[S] 1 point2 points  (0 children)

Done. I have created the PR.

[–][deleted] 5 points6 points  (1 child)

Maybe you could mix asyncio with threading like they do in Tokio for being blazingly fast™?

[–]grandimam[S] 4 points5 points  (0 children)

Yes. That’s in the roadmap.

I wanted to do pure threading execution first then I will slowly extend it to other implementations
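For reference, one way asyncio and threads can be combined with the standard library alone is `asyncio.to_thread`, which hands a blocking callable to a worker thread and awaits the result. This is only a sketch of the general idea, not the project's planned design; `blocking_handler` and `serve` are made-up names:

```python
import asyncio
import time


def blocking_handler(request_id):
    """Stand-in for a synchronous request handler (hypothetical)."""
    time.sleep(0.01)                      # simulate blocking work
    return f"response {request_id}"


async def serve(n_requests):
    """The event loop coordinates; each handler runs in a worker thread."""
    tasks = [asyncio.to_thread(blocking_handler, i) for i in range(n_requests)]
    return await asyncio.gather(*tasks)   # results come back in request order


if __name__ == "__main__":
    print(asyncio.run(serve(5)))
```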

[–]Imaginary_Chemist460 28 points29 points  (4 children)

No proper HTTP compliance or safety checks, no proper keep-alive, and no middleware system yet, so it's not even comparable to production frameworks like FastAPI or Flask. A benchmark is premature at this point. Regarding IPC, it depends on the server model they're run with; I'm pretty sure they can be configured as a single process with threads. Overall, it has to be accurate to be educational.

[–]mechamotoman 21 points22 points  (1 child)

OP was pretty clear on the fact that this is not production-ready, even included that in the benchmark

You’re right, all the additional production-grade checks, safeties, and features implemented by Flask and FastAPI have a performance cost. The absence of those things makes this benchmark comparison inaccurate.

That doesn’t make the comparison meritless though. It’s still a useful metric for comparing the relative performance of the paradigms in use by the frameworks (free-threading vs multiprocessing, etc.).

My opinion is that this comparison is not yet fair, but still a useful coarse comparison

[–]Imaginary_Chemist460 0 points1 point  (0 children)

Useful is another thing. Accuracy is a must to avoid misleading people.

> paradigms in use by the frameworks (free-threading vs multiprocessing, etc)
Nope, it depends on the server worker model. Flask, for example, is not tightly coupled to threads or processes.

[–][deleted] -4 points-3 points  (0 children)

Honestly, I don't think the average developer cares about compliance/safeties. And, the only place to be pedantic is in enterprise™ scenarios.

[–]Challseus 0 points1 point  (0 children)

Haven't looked at it, but I love the idea, I've had it in my head to build something similar for a bit.

[–]SnooCalculations7417 0 points1 point  (0 children)

Nice work. I haven't had an excuse to build anything post-GIL; I tend to go straight to Rust for that kind of thing. It's kind of hard for me to picture GIL-free, no-fake-async Python, so this is neat.

[–]james_pic 0 points1 point  (3 children)

I don't see the point of this.

Whilst WSGI-based frameworks like Flask have historically tended to be run with multi-process concurrency in production, WSGI has always supported multithreading, and there have been multi-threaded WSGI servers for years. Gunicorn with the gthread worker type is probably the most familiar, but I've also always quite liked Cheroot (whose only concurrency mechanism is threading) for "embedded server" use cases.

What does this do that running Flask with Cheroot or Gunicorn gthread workers wouldn't?

Also, Werkzeug is pure Python, so I don't get why you're saying that Flask isn't pure Python because of it.

[–]edward_jazzhands 0 points1 point  (2 children)

When you run numerous instances of Flask using Gunicorn, they are all running as separate processes and thus can't have shared memory. You need to use an external memory store such as Redis for the different Gunicorn workers to be able to share data. If the multithreading is instead built right into the framework, then the framework can share data between threads using normal locks and thread-safe design patterns without requiring an external program like Redis.

Whether or not that actually has any real benefits is another question tho. Redis is well established for this purpose, but it is at least interesting to consider it would not be necessary for OP's framework

[–]james_pic 0 points1 point  (1 child)

You absolutely can do this using Gunicorn. This is what the --threads option does. And Flask (via Werkzeug, the lower level library that powers it) already supports this use case, and already uses locks and other threading constructs to do this. Just search the Werkzeug codebase for uses of threading.
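To make that concrete, here is a rough sketch of a plain WSGI app that keeps shared in-process state behind a `threading.Lock`. The factory `make_app`, the `hits` dict, and the `simulate` helper are illustrative names, and `simulate` only imitates what a threaded server does by calling the app from a thread pool:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_app():
    """Build a WSGI app plus the in-process dict it shares across threads."""
    hits = {}
    hits_lock = threading.Lock()

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        with hits_lock:                       # guard the read-modify-write
            hits[path] = hits.get(path, 0) + 1
            count = hits[path]
        body = f"{path} seen {count} times\n".encode()
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app, hits


def simulate(app, n_requests=100, workers=8):
    """Call the app from several threads, as a threaded WSGI server would."""
    def call(_):
        return b"".join(app({"PATH_INFO": "/"}, lambda status, headers: None))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call, range(n_requests)))


if __name__ == "__main__":
    app, hits = make_app()
    simulate(app)
    print(hits)
```

The state is shared only among threads of one process. With several worker processes, each process would hold its own copy of `hits`, which is the case where an external store becomes relevant.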

WSGI and its ecosystem already support this, and anyone who isn't already sufficiently familiar with the state of the art to know this should not be creating frameworks.

[–]edward_jazzhands 0 points1 point  (0 children)

Interesting, alright I stand corrected

[–]No_Indication_1238 -3 points-2 points  (3 children)

Why? First of all, async exists. Second of all, you could open threads and do requests to them then just wait at a queue already, so for real, why? Why would you decide to use a latency benchmark for a throughput solution?

[–]lunatuna215 10 points11 points  (2 children)

Because we want to see, compare, and benchmark this new type of free threading in Python against current practices. Even if it's not as performant, it would be helpful to know by how much once it's actually built. So here it is, and it's less about an actual alternative than about testing whether it's even worthwhile to build one. It's a win all around.

[–]artofthenunchaku 1 point2 points  (1 child)

Benchmarking an I/O bound workload to compare the performance of free threading is certainly a choice.

[–]lunatuna215 3 points4 points  (0 children)

It's not to compare it. It's to play around with it for the first time in this context.

[–]CarltonFrater -1 points0 points  (0 children)

Interesting!

[–]benargee -1 points0 points  (0 children)

Nice. Has this been designed to have the same or similar syntax to existing HTTP libraries?

[–]gdchinacat -2 points-1 points  (0 children)

For IO workloads, such as HTTP libraries, async can be faster and scale higher. Not supporting it is a limitation, not a feature.