./ch02.html

ToC

2.1. Raw Floating Point Speed

Bob: Yes, I would like to see the difference in raw speed between Ruby and C, for a few simple tasks. How about this simple Ruby script:

 include Math
 
 N = 1000000
 
 a = 1.0
 
 N.times{a = sqrt( (a * (a + 1))/(a + 0.001*a) )}
 
 print "N = #{N} ; a = #{a}\n"

I will just let it do a million interations of some floating point calculations.

Here is the corresponding C program. Since C is bound to be a lot faster than Ruby, we may as well give C a factor of ten handicap: ten million iterations it will be for the C code:

 #define  N  10000000
 
 int main()
 {
     int i;
     double a;
 
     a = 1.0;
     for (i = 0; i < N; i++)
         a = sqrt( (a * (a + 1))/(a + 0.001*a) );
     
     printf("N = %d ; a = %g\n", N, a);
 }

I'll start and see how the C code fares:

  |gravity> time test1
  test1: Command not found.
  0.000u 0.000s 0:00.00 0.0%    0+0k 0+0io 0pf+0w

2.2. Slooow

Alice: how about compiling it first?

Bob: Ah, of course, in C you have to compile things first. I've gotten spoiled, using so much Ruby! Okay, I'll compile it without any optimizer, since who knows what that will do, in terms of corner cutting in such a simple expression.

 |gravity> gcc test1.c -lm -o test1

And now we can see what C delivers:

 |gravity> time test1
 N = 10000000 ; a = 1.61686
 2.156u 0.004s 0:02.32 92.6%    0+0k 0+0io 0pf+0w

And here is the Ruby counterpart:

 |gravity> time ruby test1.rb
 N = 1000000 ; a = 1.61686424868974
 3.920u 0.005s 0:04.23 92.6%    0+0k 0+0io 0pf+0w

Alice: At least both give us the same output. But boy, is Ruby slow: even with the handicap of a factor ten, C wins hands down. Ruby must be more than twenty times slower then C.

But wait a minute, why would we get the same output? The C code did ten times more work.

Bob: I just threw in some random floating point operations. We must be converging to some limit point.

Alice: Might be. But let me check whether that is reasonable. I don't like a situation where I don't know what's going on, numerically. If we forget about that factor 0.001 that you threw is as well, we basically have , or in other words or . I remember even high school math to solve that one: . So yes, we do seem to reach a limit point. Let's check its value:

  |gravity> irb
  irb(main):001:0> include Math
  => Object
  irb(main):002:0> 0.5*(1+sqrt(5))
  => 1.61803398874989

Close to what we found; your factor 0.001 must make the difference. Good! I'm happy.

2.3. At Least Not Slower

Bob: You may be happy with the value of the calculation, but I'm far from happy with the speed. And we're mainly comparing floating point calculations here. It could be worse once we include function calls and other overhead. Let's do the same thing through a function call.

Here is the Ruby version:

 include Math
 
 N = 1000000
 
 a = 1.0
 
 def new_a(old_a)
   return sqrt( (old_a * (old_a + 1))/(old_a + 0.001*old_a) )
 end
 
 N.times{a = new_a(a)}
 
 print "N = #{N} ; a = #{a}\n"

And here is the C version:

 #define  N  10000000
 
 double new_a(double old_a)
 {
     return sqrt( (old_a * (old_a + 1))/(old_a + 0.001*old_a) );
 }
 
 int main()
 {
     int i;
     double a;
 
     a = 1.0;
 
     for (i = 0; i < N; i++)
         a = new_a(a);
     
     printf("N = %d ; a = %g\n", N, a);
 }

I'll start again with the C code. And this time I'll compile it first:

 |gravity> gcc test2.c -lm -o test1

Let's see:

 |gravity> time test2
 test2: コマンドが見つかりません.
 0.000u 0.000s 0:00.00 0.0%     0+0k 0+0io 0pf+0w

And here is what Ruby delivers

 |gravity> time ruby test2.rb
 N = 1000000 ; a = 1.61686424868974
 5.616u 0.003s 0:06.11 91.8%    0+0k 0+0io 0pf+0w

Alice: That didn't make much difference. It seems that Ruby is slower than C by a bit more than a factor twenty.

2.4. Not So Quick

Bob: I would not be so quick in jumping to conclusions. A real N-body code does a lot more than floating point calculations and a few function calls. Think about the vector notation we introduced in Ruby, through the Vector class that we built on top of the Array class. I bet that will slow things down.

Alice: The only way to measure a really realistic speed difference between Ruby and C would be to rewrite a whole N-body code from Ruby into C.

Bob: Perhaps. But I think we can get a good stab at the speed difference if we simulate just a small part of the work done during one time step, in terms of the pairwise force calculations between particles. As long as we do a double loop over a large array of particles, each of which has at least one of our Vector vectors, and then do some kind of Vector operation. How about this, inspired by the first step in a real pairwise gravity calculation:

 require "vector.rb"
 
 include Math
 
 NDIM = 3
 EPS = 0.000001
 
 n = gets.to_i
 
 r = Array.new(n)
 r.each_index do |i|
   v = Vector.new(NDIM)
   (0...NDIM).each{|k| v[k] = (i * NDIM + k) * EPS}
   r[i] = v
 end
 
 sum = 0
 r.each_index do |i|
   (i+1...n).each do |j|
     rji = r[j]-r[i]
     sum += rji*rji
   end
 end
 
 print "n = #{n} ; sum = #{sum}\n"

Alice: Good idea. You are writing in the number of particles, before giving each a three-dimensional vector with some components which are all chosen differently, so that you can safely subtract vectors later on. Then you go through a double loop to calculate the difference vector, the distance between two particles, and from that vector you compute the inner product with itself. Finally you print something out, to see whether you'll get the same value in C as in Ruby.

Bob: You may not be able to read my mind, but at least you can read my code. Yes, that was exactly my intention.

2.5. Wanna Bet?

Alice: Do you remember how to write all this in C?

Bob: As long as I keep telling myself to put semicolons at the end of every line, it shouldn't be too bad. How about this:

 #define  NMAX  100000
 #define  NDIM  3
 #define  EPS  0.000001
 
 int main()
 {
     int i, j, k, n;
     double r[NMAX][NDIM];
     double sum;
     double rji[NDIM];
     double r2;
 
     scanf("%d", &n);
 
     for (i = 0; i < n; i++)
        for (k = 0; k < NDIM; k++)
            r[i][k] = 1.0 + (i * NDIM + k) * EPS;
 
     sum = 0;
     for (i = 0; i < n; i++){
         for (j = i+1; j < n; j++){
             for (k = 0; k < 3; k++)
                 rji[k] = r[j][k] - r[i][k];
            r2 = 0;
             for (k = 0; k < NDIM; k++)
                 r2 += rji[k] * rji[k];
            sum += r2;
        }
     }
     printf("n = %d ; sum = %.10g\n", n, sum);
 }

Alice: Looks fine. Does it compile?

 |gravity> gcc test3.c -lm -o test3

Bob: It does. Now, before I run both programs, what do you expect to see, for a slowness ratio of Ruby over C?

Alice: The simplest guess would be another factor between twenty and thirty.

Bob: I think it may be quite a bit larger, with all that vector stuff.

Alice: Well, okay, I will predict a factor 35, but that may be an overstatement.

Bob: Wanna bet?

Alice: I don't like to bet, but tell me, what do you expect?

Bob: I think vectors will slow things down by at least a factor two. I'll put my bets on a factor 50.

Alice: May the best predictor win!

2.6. Surprise

Let's start with Ruby first, this time:

 |gravity> time ruby test3.rb
 1000
 n = 1000 ; sum = 2.2499977500002
 14.753u 0.007s 0:16.01 92.1%   0+0k 0+0io 0pf+0w

And why not give C the same number of particles? At least we can see whether we get the same result:

 |gravity> time test3
 1000
 n = 1000 ; sum = 2.24999775
 0.064u 0.001s 0:00.06 100.0%   0+0k 0+0io 0pf+0w

Alice: Now that is fast! Very much faster even than your prediction of a factor 50 speedup. And we do get the same value out, so I guess both codes are really doing the same thing.

Bob: Let's make life a factor hundred more difficult for the C version. Giving ten times more particles should do the job:

 |gravity> time test3
 10000
 n = 10000 ; sum = 22499.99978
 5.008u 0.001s 0:05.41 92.4%    0+0k 0+0io 0pf+0w

Alice: Still C is faster than Ruby, with a hundred times more work. At least, if the work is really quadratic in the particle number.

Bob: It should be. Even so, it is easy to check! We can double the number for Ruby:

 |gravity> time ruby test3.rb
 2000
 n = 2000 ; sum = 35.9999909999736
 64.107u 0.010s 1:11.62 89.5%   0+0k 0+0io 0pf+0w

and for C as well:

 |gravity> time test3
 20000
 n = 20000 ; sum = 359999.9991
 19.543u 0.010s 0:21.19 92.2%   0+0k 0+0io 0pf+0w

Alice: And everything takes about four times as long. Okay, I'm convinced. For this task at least, Ruby seems to be a whopping factor of two hundred slower than C.

Bob: I must say, I'm surprised.

Alice: At least you are a better predictor than I am.

Bob: That may not be such a surprise, since you generally play the role of corrector!

2.7. Wonderful

Alice: Well, somebody has to. But this factor of 200 is shocking. Not only that, it is probably more relevant for our N-body calculations than the first two tests we did, as you already stressed. This is pretty terrible.

Bob: It is wonderful!

Alice: I beg your pardon?

Bob: It's wonderful to have a factor of 200 to play with. Remember that I had asked for a factor 100 improvement in speed? We can actually do it now! We only need to determine which part of the code does at least half of the work, and replace that by C code. We can then leave the rest in Ruby, and by getting a speed-up of a factor 200 for half the work, the total speed of the code will increase by a factor 100.

In general, my experience with N-body codes has shown me that for an N-body integrator much more than half of the computer time is spent in just a few lines of code. It is the innermost loop where the gravitational accelerations are computed between pairs of particles, that takes almost all the time.

So the conclusion is: we should be able to speed up our Ruby codes by a factor of 100.

Alice: That all sounds a bit too optimistic to me, but I see the logic of what you say. Well, let's try it out and see how far we get!

Previous ToC Up Next