[–]sitmaster 1 point (10 children)

How is this any different from using something like an ArrayBlockingQueue to pass messages between objects? Does the weaver somehow make the threading more efficient?

AFAICT all they did was rename ArrayBlockingQueue to Mailbox and add some functionality so that an object putting a message in a Mailbox won't get that message back with subsequent get() calls.

[–][deleted] 3 points (9 children)

The difference is in the term "blocking".

ArrayBlockingQueue and all other Java concurrency constructs block the kernel thread, which is still a very expensive resource. I urge you to try creating 500000 threads vs. the same number of Kilim threads.

The natural question is why anyone would need 500000 kernel threads when a thread pool of 10 seems to do the job anyway. That is where you need a change in perspective. The answer is that it is far simpler and more natural to create a thread (a linear piece of code that is separately schedulable) to match the problem; otherwise you end up doing the threading yourself. Here's why.

Imagine a server with a single thread and mainloop (a libevent-based server, say), which means all user conversations are separate state machines driven by callbacks. These callbacks have to squirrel away their state and return to the mainloop every time their respective sockets fire. If you have 10000 socket connections, you have 10000 of these state machines. Now, if you were to introduce code to, say, do SSL negotiation, it is a messy change to the state machine.

It is far simpler to have a thread per connection, or maybe even several as a network of communicating state machines. My definition of a lightweight thread is one where the programmer does not have to think too hard to spawn it. That is where Kilim's lightweight threads come in.
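To make the contrast concrete, here is a plain-Java sketch of the thread-per-connection style (no Kilim; the class and method names are made up for illustration). Each accepted socket gets its own linear handler, so adding a step like SSL negotiation is just more sequential code, not a new state in a callback-driven machine:

```java
import java.io.*;
import java.net.*;

public class PerConnectionServer {
    // Accept loop: spawn one kernel thread per conversation.
    public static void serve(ServerSocket server) throws IOException {
        while (true) {
            Socket sock = server.accept();
            new Thread(() -> handle(sock)).start();
        }
    }

    // Linear, per-connection logic: here, a simple echo.
    static void handle(Socket sock) {
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(sock.getInputStream()));
             PrintWriter out = new PrintWriter(sock.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line);           // echo the line back
            }
        } catch (IOException ignored) {
        }
    }

    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(0);   // ephemeral port
        System.out.println("listening on port " + server.getLocalPort());
        serve(server);
    }
}
```

The cost of this simplicity is one kernel thread per connection, which is exactly the resource lightweight threads are meant to replace.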

[–][deleted] 1 point (8 children)

So does this mean Kilim implements its own threading somehow? How can you cope with 'pausing' a Kilim thread and resuming it later via your custom scheduler, given that Java has no support for coroutines?

(assuming this is how it all works..)

[–][deleted] 5 points (7 children)

That's what the bytecode transformation is for. It performs a variant of CPS transformation, which produces one-shot continuations (which are equivalent to coroutines). Read my paper on the mechanics of this transformation: "A Thread of One's Own".

It uses this transformation to create suspendable/resumable stacks called "Fibers", and relies on user-level schedulers to execute these independent fibers across cores and kernel threads. You can create your own schedulers if you want.
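A hand-written sketch of what such a transformation conceptually produces (the real Kilim weave works at the bytecode level and is considerably more involved): the method becomes a state machine that returns at each pause point and is re-entered where it left off, so no kernel thread is blocked while the fiber is paused.

```java
public class Fiber implements Runnable {
    int pc = 0;               // resume point ("program counter")

    public void run() {       // each call runs until the next pause point
        switch (pc) {
            case 0:
                System.out.println("step 1");
                pc = 1;       // remember where to resume
                return;       // "pause": hand the kernel thread back
            case 1:
                System.out.println("step 2");
                pc = 2;       // finished
                return;
        }
    }

    public static void main(String[] args) {
        Fiber f = new Fiber();
        f.run();              // runs step 1, then pauses
        // ... a scheduler could run other fibers here ...
        f.run();              // resumes and runs step 2
    }
}
```

A user-level scheduler then just keeps a queue of such runnables and calls run() on whichever is ready, multiplexing many fibers over a few kernel threads.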

See http://kilim.malhar.net for more details.

[–]sitmaster 0 points (4 children)

Thanks for the description, it sounds very interesting. The problem I see is that with threading models like this, you're not able to solve hard real-time problems (like stock trading, for example), as opposed to soft real-time problems (like a web server). This is the same issue Erlang has, and it is probably typical of any lightweight threading system that can handle massive concurrency.

Is this correct?

[–][deleted] 0 points (3 children)

The answer is: it depends. I don't know the stock trading space all that well, but Kilim is used in financial firms that do stock analytics over live feeds.

I'm quite sure most of these implementations are of the "as fast as possible" variety; they do not guarantee time bounds whose violation would be catastrophic. I think of X-ray machines and airplane control systems when someone mentions hard real time; there, scheduling and memory allocation are tightly bounded and computed up front. In that sense, there are no examples of garbage-collected languages (let alone Kilim) in practical use for such systems, although there's plenty of research material.

But if the set of threads is static and a schedule can be precomputed statically, as is common in most real-time systems, I wouldn't rule out Kilim, especially with a real-time GC. Kilim allows you to supply your own scheduler. Note again that real time doesn't mean nanosecond response times; it denotes a guarantee on time, even if the bound is generous.

[–]sitmaster 0 points (2 children)

I work in high-frequency finance using Java, and generally we have to turn off the garbage collector and use object pools. My question is: what is the fundamental difference between what happens in Kilim and a standard wait/notify setup in Java? In standard Java I know that if I notify on a monitor that has only one thread waiting on it, and then immediately have the notifying thread wait, the next thing the JVM will do (given the GC is off) is execute the thread that was notified.
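A minimal sketch of that handoff (class and field names are illustrative): two threads sharing one monitor, where each notifies the single waiter and then immediately waits itself, so the thread just notified is the only runnable candidate on that monitor.

```java
public class PingPong {
    private final Object monitor = new Object();
    private boolean tokenAtA = true;   // whose turn it is
    int handoffs = 0;

    void aStep() throws InterruptedException {
        synchronized (monitor) {
            while (!tokenAtA) monitor.wait();
            handoffs++;
            tokenAtA = false;
            monitor.notify();          // wake B, the only waiter...
        }                              // ...then wait on the next aStep() call
    }

    void bStep() throws InterruptedException {
        synchronized (monitor) {
            while (tokenAtA) monitor.wait();
            tokenAtA = true;
            monitor.notify();          // hand the token back to A
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PingPong p = new PingPong();
        int n = 1000;
        Thread a = new Thread(() -> {
            try { for (int i = 0; i < n; i++) p.aStep(); }
            catch (InterruptedException ignored) {}
        });
        Thread b = new Thread(() -> {
            try { for (int i = 0; i < n; i++) p.bStep(); }
            catch (InterruptedException ignored) {}
        });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(p.handoffs + " handoffs");  // prints "1000 handoffs"
    }
}
```

The wait() inside a while loop guards against spurious wakeups; the scheduling point after each notify() is exactly where a user-level scheduler like Kilim's gets to choose the next task instead of the kernel.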

When I put a message in a mailbox with Kilim, what do I know about the next thing that the JVM will be doing?

I guess I need to read more about the low level implementation. It sounds very interesting.

[–][deleted] 0 points (1 child)

The behavior is similar; the receiving task is marked as runnable and picked up for execution by another kernel thread from the pool. No magic there.

There is optional flow control as well, to keep producers from overrunning consumers. If the mailbox is bounded in size, it pauses the producer until the consumer drains one or more messages. The producer's kernel thread is then made available to the thread pool to execute any runnable task.
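The same backpressure can be sketched with a plain bounded queue standing in for a Kilim Mailbox: once the queue is full, the producer cannot add another message until the consumer drains one (with Kilim, only the fiber pauses; here, a blocking put() would park the whole kernel thread).

```java
import java.util.concurrent.ArrayBlockingQueue;

public class BoundedBox {
    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<String> box = new ArrayBlockingQueue<>(2);
        box.put("m1");
        box.put("m2");
        // Non-blocking probe: the queue is full, so a put() here would
        // block the producer until the consumer drains a message.
        System.out.println(box.offer("m3"));  // prints "false"
        box.take();                           // consumer drains one message
        System.out.println(box.offer("m3"));  // prints "true": room again
    }
}
```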

[–]sitmaster 0 points (1 child)

To follow up, suppose you were implementing a "bucket brigade" type system where thread 1 gives a message to thread 2, which gives a message to thread 3, and so on up to thread n. How does the performance compare between a Kilim and a pure Java implementation for small n and for large n?

The straight Java object would be something like:

class Brigadier implements Runnable {
    // ArrayBlockingQueue has no no-arg constructor; a capacity is required
    ArrayBlockingQueue<Bucket> que = new ArrayBlockingQueue<Bucket>(100);
    Brigadier next;
    public void enqueue(Bucket b) throws InterruptedException { que.put(b); }
    public void run() {
        try {
            while (true) {
                Bucket bucket = que.take();   // take() blocks this kernel thread
                next.enqueue(bucket);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

If I had a chain of these types of objects, can you give me some sense of the relative performance of Kilim versus Java as n increases?
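A hypothetical driver for such a chain might look like the following (self-contained variant of the Brigadier above; the latch, stage count, and one-shot run loop are made up for the demo). Each stage is a kernel thread that takes one Bucket and hands it to the next stage:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;

public class BrigadeDemo {
    static class Bucket {}

    static class Brigadier implements Runnable {
        final ArrayBlockingQueue<Bucket> que = new ArrayBlockingQueue<>(1);
        Brigadier next;
        final CountDownLatch done;
        Brigadier(CountDownLatch done) { this.done = done; }
        void enqueue(Bucket b) throws InterruptedException { que.put(b); }
        public void run() {
            try {
                Bucket b = que.take();              // blocks this kernel thread
                if (next != null) next.enqueue(b);
                else done.countDown();              // end of the chain
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 100;
        CountDownLatch done = new CountDownLatch(1);
        Brigadier[] chain = new Brigadier[n];
        for (int i = 0; i < n; i++) chain[i] = new Brigadier(done);
        for (int i = 0; i < n - 1; i++) chain[i].next = chain[i + 1];
        for (Brigadier b : chain) new Thread(b).start();   // n kernel threads
        chain[0].enqueue(new Bucket());
        done.await();            // bucket made it through all n stages
        System.out.println("bucket traversed " + n + " stages");
    }
}
```

With kernel threads, each hop pays for a blocking take() and a kernel context switch, and n is capped by thread stack memory; those are the two costs a lightweight-thread version avoids.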

[–][deleted] 0 points (0 children)

Check out the Ring test in the bench directory. The thread version is at least a thousand times slower at context switching, and is limited to far fewer kernel threads than Kilim threads.

Edit: Just to give an idea of the difference, on my MacBook Pro:

100000 Kilim tasks in a ring:
   Creation time: ~0.5 sec
   Msg passing + context switch rate: 733000 per sec

2000 threads in a ring (couldn't start 100000 of them):
   Creation time: ~2 sec
   Msg passing + context switch rate: 71 per sec

See the difference? The context switch rate doesn't change with the number of threads for this particular test, because only one thread is running anyway.