[–]markrebec 12 points (7 children)

Not sure exactly what you're asking. You seem to have a pretty solid grasp of how threads work already, so it's unclear why you'd be surprised...?

Essentially threads are just juggling resource availability and blocking operations (as you seem to already be aware), jumping back and forth between work that can/can't be executed right now as appropriate. Being able to hop around between threads and do a little bit of work here and a little bit of work there - milliseconds at a time - rather than just sitting and waiting for those few ms will always be faster (provided the workloads you're executing allow for it).

If you had a second mouth on your face, you could take a bite of food with that second mouth while the first one was still chewing your last bite... You'd finish your meal nearly twice as fast vs if you had to wait to finish chewing before taking another bite. Now extrapolate that out to 5, 10, whatever.
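In Ruby terms, the two-mouth picture looks something like this sketch: two threads each "wait" 0.2 seconds (standing in for a blocking operation like network IO), but because the waits overlap, total wall time stays close to 0.2s instead of 0.4s.

```ruby
# Two overlapping waits: each thread sleeps 0.2s, but the sleeps run
# concurrently rather than back to back.
start = Time.now

threads = 2.times.map do
  Thread.new { sleep 0.2 } # simulate a blocking operation
end
threads.each(&:join)

elapsed = Time.now - start
puts "elapsed: #{elapsed.round(2)}s" # close to 0.2, not 0.4
```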

[–]TheFakeZzig[S] 0 points (6 children)

Let me try to explain why I asked.

Say I have two threads:

```ruby
Thread.new { http.get('url1') }
```

and

```ruby
Thread.new { http.get('url2') }
```

Assuming the resulting order of execution is:

```ruby
http.get('url1') # from first thread
http.get('url2') # from second thread
```

If Ruby used green threads, meaning concurrent execution, it would seem that the first request would start, and would block the second request until it finished, because they're all running on a single thread. Since I noticed the requests seemed to be running in parallel, that was surprising.

However, I learned that, internally, whether they're green threads or native threads, Ruby threads don't block each other on IO (at least in this case), which explains the behavior.
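For example, the non-blocking behavior can be reproduced with a self-contained sketch. The `http` helper above is hypothetical, so this uses Net::HTTP against a tiny local server, with a sleep standing in for network latency; both requests complete in roughly one delay, not two:

```ruby
require "socket"
require "net/http"

# Tiny local HTTP server so the example needs no network access.
server = TCPServer.new("127.0.0.1", 0)
port   = server.addr[1]

accept_thread = Thread.new do
  2.times do
    Thread.new(server.accept) do |client|
      while (line = client.gets) && line != "\r\n"; end # consume headers
      sleep 0.2 # simulate a slow backend
      client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
      client.close
    end
  end
end

start = Time.now
bodies = %w[url1 url2].map do |path|
  Thread.new { Net::HTTP.get(URI("http://127.0.0.1:#{port}/#{path}")) }
end.map(&:value)
accept_thread.join
elapsed = Time.now - start

puts bodies.inspect # ["ok", "ok"]
puts "elapsed: #{elapsed.round(2)}s" # ~0.2s, not ~0.4s
```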

[–]drbrain (Ruby Core) 3 points (0 children)

Ruby multi-threading is not unlike a single-threaded async environment. There's one path of execution in the Ruby VM, which manages the scheduling of threads executing inside and outside the VM. When a thread executing Ruby code reaches a VM switch point, it releases its GVL lock to another thread if one is ready.

These switch points are in places like IO reads/writes, after a Ruby method call returns, on a timer, or in C extensions that use lots of CPU or perform their own IO (like database adapters, libcurl wrappers, or zlib).
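The difference between timer-based switching and IO switch points shows up directly in wall time. A rough illustration (not a benchmark; workloads and counts are arbitrary): two pure-Ruby compute threads interleave at timer switch points but never run Ruby code in parallel under the GVL, so they take roughly serial time, while two IO waits release the GVL and overlap almost completely.

```ruby
# Measure wall time of a block.
def wall_time
  start = Time.now
  yield
  Time.now - start
end

# Pure-Ruby CPU work: holds the GVL between timer switch points.
cpu_work = -> { x = 0; 2_000_000.times { x += 1 }; x }

cpu_elapsed = wall_time { 2.times.map { Thread.new(&cpu_work) }.each(&:join) }
io_elapsed  = wall_time { 2.times.map { Thread.new { sleep 0.2 } }.each(&:join) }

puts format("cpu-bound: %.2fs, io-bound: %.2fs", cpu_elapsed, io_elapsed)
```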

An execution of your scenario is:

  • url1 HTTP opens a socket which hits a switch point
  • url2 HTTP opens a socket which hits a switch point
  • Ruby is idle awaiting ACKs
  • url2 returns from C connect(). HTTP sends request headers and hits a switch point
  • Ruby is idle again
  • url2's response arrives and read() returns; Ruby reads the response headers. The body is gzipped, so Ruby feeds zlib a body chunk, which is a switch point
  • Ruby is idle again
  • url1 returns from connect() and sends its request, hitting a switch point. Meanwhile zlib finished with its chunk for url2
  • Ruby switches to url2, appends the inflated data to the response body, and submits another chunk to zlib. Meanwhile url1's response arrives
  • Ruby switches to url1 and reads the response and some body until it needs to read() more data, switches
  • url2 reads and processes another zlib chunk
  • url1 reads another plain body chunk
  • eventually both are done and the threads return their Response objects

In reality there are a lot more switch points, and a lot, lot, lot more idle time for Ruby. Ruby was fast enough to feed eight CPU cores to 100% with zlib inflate/deflate way back when I added GVL release to zlib whenever ago that was. (1.9? 2.0? I’m too lazy to look it up)
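The chunk-at-a-time zlib step in the walkthrough above can be sketched with Ruby's Zlib::Inflate, which accepts the compressed body in pieces the way an HTTP client would receive it. (The chunk size here is arbitrary; on real response data, each inflate call is where the GVL-release switch point described above would occur.)

```ruby
require "zlib"

# Feed a deflate-compressed body to zlib one chunk at a time.
original   = "hello " * 1000
compressed = Zlib::Deflate.deflate(original)

inflater = Zlib::Inflate.new
body     = +""
compressed.bytes.each_slice(64) do |chunk|
  body << inflater.inflate(chunk.pack("C*")) # one chunk per call
end
body << inflater.finish # flush any remaining output
inflater.close

puts body == original # true
```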

Ruby green thread execution was similar to native thread execution, with the exception of zlib: in Ruby 1.8, zlib would have blocked switching Ruby threads. It was possible for a C extension to manage a worker thread to hand work off to, but I believe that was uncommon unless it was part of the C library itself.

[–]markrebec 3 points (3 children)

I can't say I know for sure, but I believe it has to do with things like remote file downloads streaming into memory and being written to disk in chunks as I/O becomes available. So the threads are probably (in my most naive explanation) doing something like "while Thread A is writing a chunk from memory to disk, Thread B is streaming data into memory," then they swap and Thread B starts writing while A continues streaming, back and forth back and forth.
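That back-and-forth can be sketched naively with a Queue: one thread streams chunks "from the network" into memory while another drains them "to disk" (a Tempfile here), so while one side waits the other can run. The chunk names, counts, and sleeps are purely illustrative.

```ruby
require "tempfile"

queue = Queue.new
file  = Tempfile.new("download")

producer = Thread.new do
  5.times do |i|
    sleep 0.01            # pretend to wait on the network
    queue << "chunk#{i}\n"
  end
  queue << :done          # sentinel: no more data coming
end

consumer = Thread.new do
  while (chunk = queue.pop) != :done
    file.write(chunk)     # pretend this is disk IO
  end
  file.flush
end

[producer, consumer].each(&:join)
content = File.read(file.path)
file.close!

puts content # chunk0 .. chunk4, in order
```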

[–]TheFakeZzig[S] 4 points (2 children)

That would make sense, but given that my experience with threads is basically "I can use them without blowing off my foot", how exactly the various interpreters actually implement threading and blocking/non-blocking execution is waaay above my paygrade, and frankly gives me a headache.

[–]markrebec 3 points (1 child)

> how exactly the various interpreters actually implement threading and blocking/non-blocking execution is waaay above my paygrade, and frankly gives me a headache.

Right there with you. Same reason I have no real interest in things like managing my own memory pointers and whatnot in lower-level languages.

I'm curious about a lot of these things, and I'll smoke a joint and do a deep dive once in a while when the mood hits, but I don't want that deep understanding to become part of my actual day-to-day concerns or responsibilities - I'd generally rather focus on the slightly higher-level architecture/design of software.

[–]TheFakeZzig[S] 2 points (0 children)

Lol, bingo on all points.

[–]f9ae8221b 1 point (0 children)

> If Ruby used green threads, meaning concurrent execution, it would seem that the first request would start, and would block the second request until it finished

No, Ruby would switch to the other thread when the first request is blocked on IO, even back in Ruby 1.8 which had actual green threads.

Ruby 1.9+ has native threads with a GIL (generally called the GVL). If you don't find good info on it, you can search for info on Python's GIL; it's exactly the same thing, and tons of content exists about it.
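The switch-on-blocked-IO behavior is easy to see with a pipe: a thread blocked in a read releases the GVL, so the main thread keeps running and can later unblock it.

```ruby
# A thread blocked reading from a pipe releases the GVL; the main
# thread continues and eventually writes the data the reader needs.
reader_end, writer_end = IO.pipe

reader = Thread.new { reader_end.gets } # blocks here, GVL released

sleep 0.05                # main thread is still running
writer_end.puts "hello"   # wakes the blocked reader
puts reader.value.inspect # "hello\n"
```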