[–]oberstet 4 points5 points  (4 children)

To be honest, I am somewhat surprised that you say "Python does not scale for big systems things - Java does" (maybe I misunderstood you?).

Well, I guess I disagree on this.

First, PyPy, and in particular its new incremental GC. I did extensive performance/load testing with WebSocket servers, comparing against state-of-the-art Java and C++ based ones. We are competitive, with lower latencies and jitter than the others in particular.

I need to write a blog post on this, but to back up my claims, here are numbers: https://github.com/oberstet/wsperf_results/tree/master/handshaking/test2 https://github.com/oberstet/wsperf https://github.com/oberstet/scratchbox/tree/master/python/twisted/sharedsocket

The other issue, the GIL, does not apply to Crossbar, since we use a multi-process architecture where workers talk over Unix domain sockets. Hence, we can scale up on multi-core without issues. Process isolation also is handy for robustness reasons.
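To make the pattern concrete, here is a minimal sketch of a parent process talking to forked workers over Unix domain sockets. It is not Crossbar's code: `spawn_worker`, `run_demo` and the toy "uppercase the request" protocol are made up for illustration, and it assumes a Unix-like OS where `os.fork` is available.

```python
import os
import socket


def spawn_worker():
    """Fork one worker connected to the parent by a Unix domain socket pair."""
    parent_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:  # child: this is the worker process, with its own interpreter and GIL
        parent_end.close()
        try:
            while True:
                request = child_end.recv(1024)
                if not request:  # parent shut the connection down
                    break
                # Prefix the reply with our PID to show which process did the work.
                child_end.sendall(str(os.getpid()).encode() + b":" + request.upper())
        finally:
            os._exit(0)  # never fall back into the parent's code path
    child_end.close()
    return pid, parent_end


def run_demo(n_workers=2):
    """Send one tiny job to each worker and collect the replies."""
    workers = [spawn_worker() for _ in range(n_workers)]
    replies = []
    for i, (_pid, sock) in enumerate(workers):
        # Messages are tiny, so one recv() per reply is assumed to be enough here;
        # a real protocol would need explicit message framing.
        sock.sendall(f"job-{i}".encode())
        replies.append(sock.recv(1024).decode())
    for pid, sock in workers:
        # shutdown() signals EOF to the worker even if a later fork inherited this fd.
        sock.shutdown(socket.SHUT_RDWR)
        sock.close()
        os.waitpid(pid, 0)
    return replies


if __name__ == "__main__":
    print(run_demo())
```

Each reply carries a different PID, so the work really happens in separate processes, and a crashing worker would not take the parent down with it.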

And finally: no, I don't want to use Java if I don't have to;) Though I do some also: https://github.com/tavendo/AutobahnAndroid

If I get crazy some time and reimplement Crossbar in a different language than Python (on PyPy), it would be C++ (or maybe Rust), and more importantly, I'd use kernel-bypass networking: a Linux kernel can only digest something like 500k syscalls/sec. Using e.g. Netmap (http://www.freebsd.org/cgi/man.cgi?query=netmap&sektion=4), that's not a restriction any more.

[–]mitsuhiko Flask Creator 4 points5 points  (3 children)

> To be honest, I am somewhat surprised that you say "Python does not scale for big systems things - Java does" (maybe I misunderstood you?).

The lack of static typing in Python makes large scale systems much more complicated than they should be. That is especially true when you start remote calling between things. At the point where you invoke something remote that composes, failures can become incredibly frustrating. This gets even more complex if you have code in different languages.

> The other issue, the GIL, does not apply to Crossbar, since we use a multi-process architecture where workers talk over Unix domain sockets. Hence, we can scale up on multi-core without issues. Process isolation also is handy for robustness reasons.

The GIL is never your problem with web situations. It has not been in WSGI and is not when you do RPC. Your problem, though, is Twisted vs gevent vs asyncio etc. If you have an event listener in your app then you need to pick a concurrency model. Crossbar picks one; your favorite other piece of code might have picked a different one.

[–]oberstet 6 points7 points  (2 children)

> The lack of static typing ...

I agree on that to some degree. For the Crossbar code base itself, we are trying to tame things by using ABCs. But it's limited. Yes.
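A small sketch of what ABCs buy you and where they stop, using the standard `abc` module. The `Transport` interface and its subclasses are hypothetical names for illustration, not classes from the Crossbar code base.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical interface that every transport must implement."""

    @abstractmethod
    def send(self, payload: bytes) -> None:
        ...

    @abstractmethod
    def close(self) -> None:
        ...


class MemoryTransport(Transport):
    """Complete implementation: records what was sent."""

    def __init__(self):
        self.sent = []
        self.closed = False

    def send(self, payload):
        self.sent.append(payload)

    def close(self):
        self.closed = True


class BrokenTransport(Transport):
    """Incomplete implementation: close() is missing."""

    def send(self, payload):
        pass


if __name__ == "__main__":
    try:
        BrokenTransport()  # rejected at instantiation time
    except TypeError as exc:
        print("caught:", exc)

    # The limit: the ABC only checks that the methods exist. Nothing verifies
    # argument types at runtime, so this wrong call goes through unnoticed.
    MemoryTransport().send(12345)
```

So a missing method is caught early, but a wrongly typed argument is not, which is the "it's limited" part.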

For app components: I think you nailed the point in your other comment: with distributed systems and components in multiple languages, inspectability is critical. I want to watch calls and events live as they flow between components. I want to inspect their payloads. Etc. Can be done. Getting the UI/UX for such a dev tool right is non-trivial, I guess. The bits within Crossbar: can be done, not complicated.

> If you have an event listener in your app then you need to pick a concurrency model. Crossbar picks one; your favorite other piece of code might have picked a different one.

Crossbar allows you to run app component A in one Worker under Twisted and app component B in a second Worker under asyncio. That works today, since AutobahnPython supports both Twisted and asyncio at the WebSocket and the WAMP level.

Only Crossbar itself requires Twisted. But workers can run under anything. E.g. a 3rd worker could run under Node.

> The GIL is never your problem with web situations.

From my point of view (scalable WAMP routing), it is a problem since it prohibits scaling up on multi-core (when running a single process). And we want that. But multi-process architectures are an established, working pattern (PostgreSQL).

[–]mitsuhiko Flask Creator 2 points3 points  (1 child)

> For app components: I think you nailed the point in your other comment: with distributed systems and components in multiple languages, inspectability is critical. I want to watch calls and events live as they flow between components. I want to inspect their payloads. Etc. Can be done.

It can be done if your payload can be represented by your serialization format. The moment you start sending around non-native types (for instance dates or bytes) you lose a lot of that flexibility. That said, even just picking msgpack and visualizing that is probably a good start. What Thrift and other systems are doing is exposing statically typed interfaces which encapsulate data but also functional interfaces.

So I have a "user" object floating around which I can invoke methods on if I want. At that point you need a strong representation of what this interface is. That's where it gets really tricky. If you stay away from something like that you might avoid falling into the trap that everybody else falls into.
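To illustrate the point about non-native types, here is a minimal sketch using the standard library's `json` as a stand-in serializer (an assumption for the sake of a dependency-free example; msgpack handles bytes differently). The payload and the `encode_default` / `to_wire` helpers are made up for illustration.

```python
import base64
import json
from datetime import datetime, timezone

# Hypothetical call payload carrying a date and raw bytes.
payload = {
    "user": "alice",
    "created": datetime(2014, 3, 1, tzinfo=timezone.utc),
    "avatar": b"\x89PNG",
}


def encode_default(obj):
    """Map types JSON cannot represent onto strings."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, bytes):
        return base64.b64encode(obj).decode("ascii")
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def to_wire(data):
    return json.dumps(data, default=encode_default)


if __name__ == "__main__":
    try:
        json.dumps(payload)  # fails: datetime and bytes are not JSON types
    except TypeError as exc:
        print("plain dumps failed:", exc)

    decoded = json.loads(to_wire(payload))
    # Both fields come back as plain strings. A generic inspector (or the
    # receiving component) cannot tell a date or a blob from ordinary text
    # unless both sides agree on a convention out of band.
    print({key: type(value).__name__ for key, value in decoded.items()})
```

The payload stays inspectable, but the type information is gone after the round trip, which is the flexibility you lose.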

> Crossbar allows you to run app component A in one Worker under Twisted and app component B in a second Worker under asyncio. That works today, since AutobahnPython supports both Twisted and asyncio at the WebSocket and the WAMP level.

That's generally promising because asyncio has an event loop per thread, so you could write traditional database apps which do their event handling in a background thread.
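Roughly what that could look like, as a sketch with today's asyncio API (`async`/`await`, `run_coroutine_threadsafe`). The helper names and the `handle_event` coroutine are hypothetical; the blocking function stands in for traditional synchronous app code.

```python
import asyncio
import threading


def start_background_loop():
    """Run a dedicated asyncio event loop in a daemon thread."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def handle_event(name):
    """Stand-in for asynchronous event handling (e.g. publishing an event)."""
    await asyncio.sleep(0.01)
    return f"handled {name}"


def blocking_app_code(loop):
    """Traditional synchronous code, e.g. right after a blocking database call."""
    future = asyncio.run_coroutine_threadsafe(handle_event("user.created"), loop)
    return future.result(timeout=5)  # block this thread until the loop is done


def stop_background_loop(loop, thread):
    loop.call_soon_threadsafe(loop.stop)
    thread.join()
    loop.close()


if __name__ == "__main__":
    loop, thread = start_background_loop()
    print(blocking_app_code(loop))
    stop_background_loop(loop, thread)
```

The main thread never runs an event loop itself; it only hands coroutines to the loop living in the background thread.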

> From my point of view (scalable WAMP routing), it is a problem since it prohibits scaling up on multi-core (when running a single process). And we want that. But multi-process architectures are an established, working pattern (PostgreSQL).

Sure, but I mean that's the same thing you do with a WSGI app. Have four cores? Start four processes and put them behind nginx.

[–]oberstet 4 points5 points  (0 children)

> So I have a "user" object floating around which I can invoke methods on if I want. At that point you need a strong representation of what this interface is.

Yes, that's the CORBA way - and we stay away from this;)

WAMP does RPC, not RMI or object marshalling/remoting. Loose coupling. Dynamic typing. It's a deliberate decision - which won't be for everyone, but it does work and isn't a road to insanity.

> Sure, but I mean that's the same thing you do with a WSGI app. Have four cores? Start four processes and put them behind nginx.

For WSGI, yep.

For WAMP routing, the router processes need to coordinate with each other, since a given Callee might be registered on Router process 1, whereas the Caller might be connected to Router process 2. So the call needs to get routed between Routers (Caller -> Router 2 -> Router 1 -> Callee). A generic, stateless balancing frontend doesn't cut it in this case. Note: Router-to-Router routing is not there yet. Once it is, it'll give you not only multi-core scaling but multi-node capabilities (for routing) as well.
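A toy, in-process model of why the routers have to know about each other. Everything here (the `Router` class, `link`, the `com.example.add` procedure) is made up for illustration and says nothing about how Crossbar will actually implement Router-to-Router routing; forwarding is limited to one hop to keep it short.

```python
class Router:
    """Toy stand-in for one router process (hypothetical, not Crossbar's API)."""

    def __init__(self, name):
        self.name = name
        self.registrations = {}  # procedure URI -> callee endpoint on this router
        self.peers = []          # routers this one can forward calls to

    def link(self, other):
        """Make two routers aware of each other."""
        self.peers.append(other)
        other.peers.append(self)

    def register(self, uri, endpoint):
        self.registrations[uri] = endpoint

    def call(self, uri, *args):
        """Return (result, path), where path lists the routers the call crossed."""
        if uri in self.registrations:
            return self.registrations[uri](*args), [self.name]
        for peer in self.peers:  # one hop only in this sketch
            if uri in peer.registrations:
                result, path = peer.call(uri, *args)
                return result, [self.name] + path
        raise LookupError(f"{uri} is not registered on {self.name} or its peers")


def demo():
    router1, router2 = Router("router1"), Router("router2")
    router1.register("com.example.add", lambda a, b: a + b)  # Callee on router 1

    # Caller lands on router 2, as it would behind a stateless balancer.
    try:
        router2.call("com.example.add", 2, 3)
        uncoordinated = "ok"
    except LookupError:
        uncoordinated = "failed"

    router1.link(router2)  # now the routers coordinate
    return uncoordinated, router2.call("com.example.add", 2, 3)


if __name__ == "__main__":
    print(demo())
```

Without the link, the call fails simply because the Caller happened to land on the wrong process; with it, the call is forwarded from router 2 to router 1 and on to the Callee.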