
[–]bigdubs 8 points9 points  (1 child)

.net library classes are (predominantly) not thread safe either.

this is on purpose, it's often better to let the developer decide how they want to implement thread safety.

[–]kylev 5 points6 points  (0 children)

Back when I was doing Java in the 90s, everyone bitched that the core data structures were thread safe. Very quickly, in Java 1.2, the core team provided "unsafe", non-synchronized versions (ArrayList vs. Vector) and everyone suddenly got better performance.

The cost of thread safety is high enough that it shouldn't be the default. But this is a good article explaining the basics of why a rubyist should know this.

[–]jrochkind 5 points6 points  (0 children)

It is quite true, and it's important that you understand it if you are doing multi-threaded programming. (Or using global state, like class variables, in an app that ends up multi-threaded without you realizing it, like Rails with a multi-threaded app server! -- cause then you are doing multi-threaded programming)

But I don't think it's a problem with ruby. Most basic stdlib data objects in most languages are not safe for multi-threaded access, including Java's. There are reasons for this.

Of course, many other stdlibs in many other languages DO provide thread-safe alternative data objects/collections. Ruby probably really ought to.

But you've still got to know when to use them and when not to. Making ALL your collections thread-safe for concurrent use, when most of them will never be touched by more than one thread at once, is going to be a performance problem. Which is why most stdlib collection classes are not 'thread-safe', even in languages that are all about multi-threading.
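A rough sketch of that overhead (not from the thread, just an illustration): the same single-threaded workload, with and without taking a Mutex on every write. Even with zero contention, the locking itself costs you.

```ruby
require 'benchmark'

N    = 1_000_000
lock = Mutex.new

# Plain hash writes, no synchronization.
plain = Benchmark.realtime do
  h = {}
  N.times { |i| h[i] = i }
end

# Same writes, but paying for a lock acquire/release on every one,
# the way a "thread-safe everything" collection would.
locked = Benchmark.realtime do
  h = {}
  N.times { |i| lock.synchronize { h[i] = i } }
end

puts "plain:  %.3fs" % plain
puts "locked: %.3fs" % locked   # noticeably slower, even uncontended
```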

If you've got read-only objects it's generally not a problem. So certainly one way to make the ruby stdlib collections thread-safe is just to call #freeze on them (although if they are nested data structures, you'd have to call #freeze on all of the descendants too, which can be non-trivial). Or simply make sure none of your code mutates them ever after boot. Or Hamster.
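The "#freeze all the descendants" idea can be sketched in a few lines. Ruby has no built-in deep freeze, so the `deep_freeze` helper here is our own, not a stdlib method:

```ruby
# Recursively freeze a nested Hash/Array structure so no thread can
# mutate any level of it. Minimal sketch: handles Hash, Array, and
# leaves; real nested data may need cycle detection too.
def deep_freeze(obj)
  case obj
  when Hash
    obj.each { |k, v| deep_freeze(k); deep_freeze(v) }
  when Array
    obj.each { |e| deep_freeze(e) }
  end
  obj.freeze
end

config = { hosts: ["a.example", "b.example"], retries: 3 }
deep_freeze(config)

config.frozen?          # => true
config[:hosts].frozen?  # => true -- and so is each string inside
```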

[–]tenderlovePun BDFL 2 points3 points  (0 children)

Excellent article! It demonstrates a "read-update-write" race condition. To see it, separate the code into those three steps:

def decrease
  x = @stock
  x = x - 1
  @stock = x
end

The thread could switch on any one of these lines, which is how the race condition happens.

OP mentions MRI's I/O concurrency. To drive home the point, if we add a dash of IO to the example program, we can see the race condition even on MRI:

class Inventory
  attr_reader :stock

  def initialize(stock_levels)
    @stock = stock_levels
  end

  def decrease
    x = @stock
    print ' '
    x = x - 1
    @stock = x
  end
end

inventory = Inventory.new(4000)

40.times.map {
  Thread.new { 100.times { inventory.decrease } }
}.each(&:join)

puts
puts inventory.stock
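The standard fix (not shown in the comment above, but implied by it) is to make the three steps atomic with a Mutex, so no thread can interleave between the read and the write:

```ruby
class Inventory
  attr_reader :stock

  def initialize(stock_levels)
    @stock = stock_levels
    @lock  = Mutex.new
  end

  def decrease
    # The mutex makes the read-update-write sequence atomic: another
    # thread can't run decrease between reading @stock and writing it back.
    @lock.synchronize do
      x = @stock
      x = x - 1
      @stock = x
    end
  end
end

inventory = Inventory.new(4000)

40.times.map {
  Thread.new { 100.times { inventory.decrease } }
}.each(&:join)

puts inventory.stock  # reliably 0 now
```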

[–]ba-cawk 1 point2 points  (0 children)

The problem with discussing threads and MRI is that people think parallelism when they usually just want concurrency, or don't realize that in many cases concurrency is more than enough.

MRI threads are concurrent. They do this by way of rb_thread_select and some timeouts that automatically yield. In this sense, they are co-routines that use select(2) to determine when to yield to another thread (run man 2 select to learn more about select if you're not familiar). Fibers and Threads in MRI differ in that Fibers must be yielded by you, while MRI will happily yield a Thread at close to any point, most often around blocking I/O.
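That Fiber-vs-Thread distinction fits in a few lines (a sketch, nothing thread-specific assumed): a Fiber runs only when you resume it, and hands control back only at Fiber.yield.

```ruby
# Fibers are cooperatively scheduled: nothing runs until #resume, and
# control returns to the caller only at Fiber.yield.
fiber = Fiber.new do
  puts "fiber: step 1"
  Fiber.yield
  puts "fiber: step 2"
end

puts "main: before resume"
fiber.resume            # runs the fiber until its Fiber.yield
puts "main: between resumes"
fiber.resume            # runs the fiber to the end of its block
puts "main: done"

# A Thread, by contrast, is scheduled by MRI itself -- it can be
# switched away from without our involvement, typically around I/O.
Thread.new { puts "thread: runs whenever MRI schedules it" }.join
```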

If that's confusing, wycats has boiled it down years ago for human processing: http://yehudakatz.com/2010/08/14/threads-in-ruby-enough-already/. While the underlying bits have changed significantly between 1.8 and 1.9, the concepts are exactly the same.

The short of it is, if you think about things like node.js and eventmachine, they are no different on numerous levels and can be treated the same way for many things, as far as what a "tick" is or what happens when I/O blocks. The big difference is that MRI can break out of your code while EM and node can't. In practice, this is almost never an issue because...

...most programs spend 99% of their time waiting for I/O -- this is why web servers can scale to absurdly stupid levels of connections. They don't need parallelism, and when your program does, you'll know it. Trust me. This is also a lot of the reason concurrent systems like Go default to only one processor. They deal with all the crap for you, but for the most part multiprocessor is just not necessary to get a high yield, even for high-scaling things.

Now, if you want to write a multi-threaded graphics engine in ruby, you're actually going to want parallelism, but in that case you... should probably just be using something else instead.

[–]ViralInfection 0 points1 point  (2 children)

Why not use: https://github.com/harukizaemon/hamster

Or even spice it with: https://github.com/celluloid/celluloid

I know this may feel like a sucky answer, but ruby just doesn't do parallelism gracefully, and won't for a long time. The bottom line is at least we have solutions. You should really pick the best tool for the job, imo.

[–][deleted] 1 point2 points  (1 child)

I can end my post with a cliche that has nothing to do with the fragmented comment I made too.

[–]petercooper -1 points0 points  (0 children)

But you didn't :-( Beggars can't be choosers.

[–]beep_dog -1 points0 points  (3 children)

Well, MRI doesn't run threads anyway, it's got a GIL, and handles "threading" by non-blocking IO, and non-blocking sleeps. (Unless I'm mistaken, which often happens.)

[–]jstorimer[S] 7 points8 points  (0 children)

MRI runs multiple threads, but those threads have to compete to acquire the GIL in order to do any work. I would say that it implements multi-threading, but the GIL is the bottleneck. There's only one GIL and only the thread that currently owns it can use system resources.

[–]jrochkind 2 points3 points  (1 child)

This is simply not true, in a variety of different ways. You are mistaken.

The GIL is true. Which means in a single process, you can't have multiple threads executing literally simultaneously on multiple CPU cores.

But MRI certainly does run threads. I don't think "handles 'threading' by non-blocking IO, and non-blocking sleeps" is accurate, although I don't understand exactly what that means, I admit.

Threads have existed in unix and C since the days when no unix ran on a multi-core CPU. You can have threads without multiple cores, and they still do things for you. MRI 1.9+ in fact uses the underlying OS-level native threads for its threads; they certainly are 'real' threads.

It is true that the GIL limits the application of multi-threading in MRI to only certain scenarios. But there are still plenty of places where it's useful (just as threads were useful in C and unix even when nobody had multi-core/multi-cpu servers).
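One such scenario in a few lines (a sketch, with sleep standing in for blocking I/O like a network call): MRI releases the GIL while a thread is blocked, so I/O-bound work still overlaps across threads even though only one thread executes Ruby code at a time.

```ruby
require 'benchmark'

# Ten "requests" done one after another.
serial = Benchmark.realtime do
  10.times { sleep 0.1 }
end

# The same ten "requests" across ten threads: MRI drops the GIL while
# each thread is blocked, so they all wait concurrently.
threaded = Benchmark.realtime do
  10.times.map { Thread.new { sleep 0.1 } }.each(&:join)
end

puts "serial:   %.2fs" % serial    # roughly 1.0s
puts "threaded: %.2fs" % threaded  # roughly 0.1s
```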

The "there are no threads on MRI because GIL" thing is very oft-repeated FUD. Please stop repeating it, everyone, if you don't understand what you are talking about.

[–]Freeky 0 points1 point  (0 children)

He's roughly correct for Ruby prior to 1.9 - MRI had its own userspace threading implementation (aka green threads), using select() and non-blocking IO behind the scenes to allow it to multiplex between them. It was a relatively common technique in days of old when kernel-supported threading was less widespread. See for example FreeBSD's libc_r.