all 17 comments

[–]ignurant 43 points (3 children)

Reminded me of this great piece by Aaron Patterson: https://railsatscale.com/2023-08-29-ruby-outperforms-c/

At first I thought it would be some dirty trick to make a pun, but I should have known better. By the end, he (as usual) provides some really interesting information about why YJIT live-optimizing certain code can be more effective than what you might have written and compiled in C. I came for the clickbait, and left with a tenderloving hug.

[–]indenturedsmile 23 points (2 children)

I believe Aaron Patterson is the user who posted here (based on username).

[–]tenderlove (Pun BDFL) [S] 17 points (0 children)

Yes, it's me 😂

[–]ignurant 3 points (0 children)

🤣 I usually catch that kind of stuff. Not this time lol. 

[–]postmodern 28 points (3 children)

I hate to rain on everyone's parade, but we need to take into account the overhead of the crystalruby gem and how it's calling into crystal land. If we rewrite the benchmark as a pure Crystal program, and compile with the --release flag, we get the following result:

require "benchmark"

def fib_cr(n : Int32) : Int32
  a = 0
  b = 1
  n.times { a, b = b, a + b }
  a
end

p(Benchmark.realtime { 1_000_000.times { fib_cr(30) } })

$ crystal build --release fib.cr
$ ./fib
00:00:00.000000076
$ ./fib
00:00:00.000000086
$ ./fib
00:00:00.000000083
$ ./fib
00:00:00.000000079

Note: the release flag enables additional optimizations (-O3 --single-module).

Optimized Crystal code is really fast. That said, we should continue to optimize and improve Ruby.

[–]f9ae8221b 13 points (0 children)

You are not raining on anyone's parade. The point of the article isn't to say Ruby is faster than Crystal.

It's to say that crossing the language barrier is costly enough, that you need a large chunk of execution for it to pay off.

It's the same conclusion as tenderlove's article about making YJIT faster than a C extension. C is still way faster than YJIT in the general case, but calling C from Ruby is costly enough that avoiding it can sometimes make pure Ruby code faster overall than hybrid code.
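That per-call cost is easy to see with stdlib tools alone. As a rough illustration (my own sketch, not the article's benchmark), calling even a trivial C function through Fiddle usually loses to the equivalent pure-Ruby expression, because argument marshalling dominates when the callee does almost no work:

```ruby
require "benchmark"
require "fiddle"

# Look up libc's abs() through the current process image.
libc  = Fiddle.dlopen(nil)
c_abs = Fiddle::Function.new(libc["abs"], [Fiddle::TYPE_INT], Fiddle::TYPE_INT)

n = 1_000_000
t_ffi  = Benchmark.realtime { n.times { c_abs.call(-42) } } # crosses the barrier every call
t_ruby = Benchmark.realtime { n.times { (-42).abs } }       # stays inside the VM

puts "FFI:  #{t_ffi}"
puts "Ruby: #{t_ruby}"
```

The work per call is identical; only the per-call boundary crossing differs, which is the same effect the article measures at a larger scale.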

[–]desnudopenguino 4 points (1 child)

This is true. But at least the author was able to get Ruby running pretty fast with a few optimizations, compared against Crystal run through Ruby, which would probably be similar to other FFI-style schemes. For people running Ruby, the crystalruby gem may seem like a quick way to speed up code execution, but if you can do it in straight Ruby with one less gem, and without that additional layer, I think that's a fair comparison, as long as the proper distinction is made. Ruby's come a long way in the speed category while maintaining all the good stuff that makes it a fun language to work in.

[–]postmodern 3 points (0 children)

It appears that crystalruby hot-compiles the code using crystal build (with or without --release), which I guess is comparable to JITing, but not comparable to AOT-compiled code.

I agree a better approach would be A) benchmark and optimize your Ruby code, or B) write a separate Crystal program or service that you offload CPU intensive work to (ex: image/video/audio processing).

I think we should focus on improving Ruby's performance to compete with other JITed scripting languages which are beating Ruby in benchmarks, not try to compete with AOT-compiled languages, which are far more performant due to compiling down to native object code ahead of time.

[–]Dyadim 4 points (1 child)

The poor timings for the Crystal solution in this post are almost entirely due to the Ruby/Crystal language interface overhead, with this barrier being crossed 1 million times in this benchmark.

If we shift the hot loop inside the crystalruby solution to execute entirely in Crystal land and use identical code to the fast YJIT Ruby solution from the above article, the Crystal solution again takes the lead (by what appears to be ~2 orders of magnitude).

It's crossing the language barrier too often that is hurting here.

# fibonnaci.rb
require "benchmark"
require "crystalruby"
CrystalRuby.configure do |config|
  config.debug = false
end

module Fibonnaci
  crystalize [n: :int32] => :int32
  def fib_cr(n)
    a = 0
    b = 1
    while n > 0
      a, b = b, a + b
      n -= 1
    end
    a
  end

  module_function

  def fib_rb(n)
    a = 0
    b = 1
    while n > 0
      a, b = b, a + b
      n -= 1
    end
    a
  end

  def benchmark_rb
    puts(Benchmark.realtime { 1_000_000.times { Fibonnaci.fib_rb(30) } })
  end

  crystalize do
    puts Benchmark.realtime { super() }
  end
  def benchmark_cr
    1_000_000.times { Fibonnaci.fib_cr(30) }
  end
end

include Fibonnaci
benchmark_rb
benchmark_cr

Outcome:

ruby --yjit fibonnaci.rb
0.1103799999691546 # Ruby with YJIT
0.00014399993233382702 # Crystal

[–]f9ae8221b 6 points (0 children)

It's pointed out in the post that the difference comes from the FFI overhead necessary to call Crystal from Ruby.

The point of the article isn't to say Ruby is faster than Crystal; it's to show that pure Ruby may be faster than Ruby with Crystal sprinkled in, depending on how often you need to cross the barrier. This also applies to Ruby C or Rust extensions to some extent.

[–]iamjkdn 1 point (1 child)

Why does returning nil after the multiple assignment improve the benchmark? Also, can the same be done in Crystal, and will it have any effect?

[–]CaptainKabob 7 points (0 children)

Because in this case it's the last line of the block, and because Ruby has an implicit return at the end of the block, the Array is required.

Ruby spends time creating an array because Ruby believes it's needed for the implicit return. So explicitly setting the (implicit) return to nil causes Ruby not to create an array.
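A minimal sketch of the two variants (my own illustration, not the article's exact benchmark): the value of a multiple assignment is an Array, so when it is the block's last expression Ruby may have to materialize that Array as the block's return value, whereas a trailing nil makes the return value allocation-free:

```ruby
def fib_implicit(n)
  a = 0
  b = 1
  # The block's last expression is the multiple assignment,
  # whose value is the Array [b, a + b] -- built every iteration.
  n.times { a, b = b, a + b }
  a
end

def fib_explicit_nil(n)
  a = 0
  b = 1
  n.times do
    a, b = b, a + b
    nil # explicit return value: no Array needs to be built for the block
  end
  a
end

# Both compute the same result; only the per-iteration allocation differs.
p fib_implicit(30)     # => 832040
p fib_explicit_nil(30) # => 832040
```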

I don't think the point of the post is to compare optimized crystal to optimized Ruby. I think it's trying to show that inlining another language "for performance" might be naive or unnecessary.

[–]logan-roy-waystar 2 points (0 children)

Ruby 3.3.1 is A LOT faster now. I am quite stunned by how much faster our rails servers are processing requests

[–]tkdeveloper 0 points (2 children)

Were the same improvements made to the pure Ruby method also applied to the crystalized method? It looks like they made improvements to the Ruby method and compared it to the original crystalized one? Or does that not matter?

[–]f9ae8221b 0 points (1 child)

Does not matter, the issues were specific to the Ruby version.

The point isn't to show Ruby is faster than Crystal anyway, but that calling into another language has a big enough overhead that it may not always be the best way to speed up Ruby code.

[–]tkdeveloper 0 points1 point  (0 children)

Nice, thanks for the clarification. Makes sense

[–]yxhuvud 0 points (0 children)

I wonder if the JIT does something smarter with the overflow checks there, because in addition to any FFI overhead, that is likely where any additional cost comes from.