./ch03.html

ToC

3.1. Pairing Pairwise Calculations

Bob: Before we start adding pieces of C code to our Ruby code, let us first see whether we can improve the speed of our N-body code purely within the Ruby realm. As we discussed before, that should be the first step, even though the second step, adding C, will buy us more in the end. We may as well get extra speed from wherever we can.

Alice: So this means avoiding double work in pairwise interactions.

Bob: Exactly. What we have been doing so far was to let each particle determine the acceleration it feels by interrogating all other particles. However, the work done in calculating the gravitational acceleration from particle j on particle i:

overlaps a lot with the work done in calculating the gravitational acceleration from particle i on particle j:

Subtracting the two position vectors, and determining the third power of the magnitude of the difference is what takes most of the computer time. The minus sign and the different mass multiplication factor is peanuts, in comparison.

Alice: So you suggest rewriting our N-body code in such a way as to allow both calculations to be done simultaneously.

Bob: Exactly.

Alice: That may not be so easy. Also, it may not be so pretty. I like the modularity of our code, where each particle has its own job to do in finding out how to determine the total gravitational acceleration it feels from all other particles. I don't like the idea of messing up everything by crossing levels of command.

Bob: And I don't like codes that are unnecessarily slow! At the very least we have to try and find out. Only when we see what it really looks like, and how much speed increase we really get, can we decide whether the increas of speed is worth the decrease in prettiness, however you may want to define that.

Alice: Fair enough.

3.2. A Need for Library Structure

Bob: I'll try to be careful with giving names to directories and files. I must admit, I'm getting a bit confused with all the different versions of N-body codes we now have lying around.

Alice: We should soon decide upon a library structure, where we can store those versions we are really happy with.

Bob: Yes. Now that we have a rather versatile N-body code, a general command line argument interpreter, and a generator for Plummer's model realizations, we are beginning to put together a real N-body toolbox.

Well, one thing at a time. It is crucial that we get good speed out of Ruby first. And in order to keep track of our N-body versions, here is the file with the definitons of the Body and Nbody classes, which we called rknbody.rb, after we had finished our command line interpreter, in a directory I called vol-4. Our Plummer's model building versions live in directory vol-5, so let me put a copy of vol-4/rknbody.rb in the current directory vol-6, and let me call it nbody1.rb, for short, with the 1 indicating that we'll probably make a few different versions, for our various attempts at speedup.

Similarly, let me copy the driver from vol-4/rkn2.rb to the current directory, and call it vol-6/nb1.rb, with the one difference of course that the first line is no longer

  require "rknbody.rb"

but

  require "nbody1.rb"

since we just changed the name of that file, even though the contents are exactly the same.

Alice: You'd better keep some good notes! It is high time not only for making a good library structure, but also for organizing our notes. How are we ever going to present this to our students otherwise?

Bob: Yes, and we were going to define a good data format, remember, and there is graphics, something we talked about many times, but never got around too. People have no idea how much work it is to build a good foundation for a software project. They think that an N-body code is just integrating Newton's equations of motion, well, how complex can that bed.

Alice: An understandable misunderstanding. Well, let's hope we can get that misunderstanding out of the way, when we get our act together.

Bob: And no act without speedup. Here we go! I'm copying nbody1.rb now to nbody2.rb, and in parallel I'm copying nb1.rb to nb2.rb, with one distinction . . .

Alice: . . . again the first line.

Bob: Yes: instead of

  require "nbody1.rb"

in nb1.rb, it now reads

  require "nbody2.rb"

nb2.rb. I agree, there ought to be a better way. And I'm sure there is. But first: speed!

Alice: No stopping you at this point!

Bob: I hope not. Give me a few minutes, and let me see how I can implement a more economic pairwise acceleration calculation.

3.3. From `Body` to `Nbody`

Alice: Hi Bob! How's your economic progress?

Bob: Fairly well, I think I just got it working. After scratching my head for a bit, I realized that I had to move the method acc, to calculate the acceleration between particles, from the Body to the Nbody class.

Alice: That's interesting, and a big change. But now that you mention it, yes, of course: a single body can only care about it's own business. It can calculate its own acceleration, but it wouldn't be able to help others.

Bob: Unless it had double pointers. Remember my first attempt at writing an N-body code, way back after we had finished playing with our 2-body version? You didn't like my backward pointers, but they would have enabled one particle to talk directly to another particle.

Alice: Directly, you say? You mean by pointing back to the parent Nbody class instance, and from ther to another particle, crossing boundaries twice! If you can that direct, that you may as well flatten the whole organization of the code into one big fat pancake . . .

Bob: . . . how can the pancake be fat and flat?

Alice: At partial attempt at flattening can still look wobbly and fat. But my point is: such an approach would go against any attempt at modularity.

Bob: Okay, okay, I wasn't proposing to go back to that idea. For one thing, I didn't look forward to having to argue with you about double crossing pointers.

Alice: So you decided to lift the acceleration calculation from the Body class to the Nbody class.

Bob: Yes. I've called it nb_acc now, instead of acc, and it got a bit more complex, but not that much. Let me show them side by side. Here is the old Body#acc, a somewhat confusing Ruby notation meaning the acc method belonging to the class Body, but we may as well get used to it, since it is in general use in the Ruby community.

   def acc(body_array, eps)
     a = @pos*0                              # null vector of the correct length
     body_array.each do |b|
       unless b == self
         r = b.pos - @pos
         r2 = r*r + eps*eps
         r3 = r2*sqrt(r2)
         a += r*(b.mass/r3)
       end
     end
     a
   end

And here is Nbody#nb_acc, my new variation:

   def nb_acc
     null_vector = @body[0].pos*0
     @body.each{|bi| bi.acc = null_vector}
     @body.each_index do |i|
       bi = @body[i]
       (i+1...@body.size).each do |j|
           bj = @body[j]
           r = bj.pos - bi.pos
           r2 = r*r + @eps*@eps
           r3 = r2*sqrt(r2)
           bj.acc -= r*(bi.mass/r3)
           bi.acc += r*(bj.mass/r3)
       end
     end
   end

3.4. Looping Options

Alice: That's not much longer than what we had before. I see that you start out by setting the acceleration to zero for each particle. This means that every body now has an instance variable @acc, on the same level as @pos and @vel, I take it?

Bob: Yes, and before doing any acceleration calculation, each particle's @acc has to be set to zero, so that it can accumulate contributions. A particle cannot delay this action until it starts to calculate its own accelerations, since before doing so, it may already receive contributions from other particles, as side effects of their calculations.

Alice: Then you enter into a double loop over particles, something that you would normally code in an double for loop, using the traditional i and j variables.

Bob: Yes, just as in the C test program, where we used:

    for (i = 0; i < n; i++)
        for (j = i+1; j < n; j++)

Alice: It's interesting to see the options that Ruby offers. You could have used for loops here too, of course.

Bob: Yes, I could have used

    for i in 0...@body.size
      for j in i+1...@body.size

instead of

    @body.each_index do |i|
      (i+1...@body.size).each do |j|

I guess I'm growing fond of the each method that lets Ruby do the counting, rather than having to worry explicitly about reminding myself and the computer how long an array is, where to stop, and whether or not to include the upper bound, using .. notation, or leave that one out, using ....

Alice: Remind me, what is the difference?

Bob: 1...3 counts over 1,2 only, while 1..3 counts over 1,2,3. The more dots, the fewer points. I guess it the .. notation was chosen first as the most obvious interpretation, and then ... was added as a practical after thought, since it happens so often that we count through an array, starting at zero and taking N terms, which means that we have to count up to but excluding the Nth element.

Alice: You could have avoided mentioning the size of the array by writing:

    @body.each_index do |i|
      @body.each_index do |j|
        if j > i

Bob: True, but since our aim is to speed up our calculations, I was afraid that would get unnecessary overhead. Well, we can check later, when we do our timings.

Alice: The rest of nb_acc is straightforward: after calculating the usual r vector and r3 scalar values, you now get two accelerations for the price of one. Can you show me what else you had to change in the program?

3.5. The `calc` Methods

Bob: I already mentioned the addition of an extra Body variable:

   attr_accessor :mass, :pos, :vel, :acc

And the Body#calc method has simplified a lot. It used to be:

   def calc(softening_parameter, body_array, time_step, s)
     ba  = body_array
     dt = time_step
     eps = softening_parameter
     eval(s)
   end

but in the new version it has become:

   def calc(time_step, s)
     dt = time_step
     eval(s)
   end

The reason for this trimming down is that all the hard work of acceleration calculations are now done on the Nbody level. The only job left to do on the Body level is to add, subtract terms containing @pos and @vel and @acc, and multiply those with coefficients and powers of the time step dt.

Consequently, the Nbody#calc method got trimmed down as well. Instead of:

   def calc(s)
     @body.each{|b| b.calc(@eps, @body, @dt, s)}
   end

we now have:

   def calc(s)
     @body.each{|b| b.calc(@dt, s)}
   end

3.6. Integrators

Alice: Can you show me how this affect a simple integrator, such as our forward Euler method?

Bob: It used to be:

   def forward
     calc(" @old_acc = acc(ba,eps) ")
     calc(" @pos += @vel*dt ")
     calc(" @vel += @old_acc*dt ")
   end

while now we have:

   def forward
     nb_acc
     calc(" @old_acc = @acc ")
     calc(" @pos += @vel*dt ")
     calc(" @vel += @old_acc*dt ")
   end

You see, the line containing the acceleration is now much simpler, and the whole expression is more homogeneous -- but of course you first have to give the instruction to do the global acceleration calculation, something that nb_acc takes care off.

Alice: And the other methods are affected similarly.

Bob: Yes, but with one exception. The two Runge Kutta methods are changed the way would expect them to change. Here is the second-order one:

   def rk2
     calc(" @old_pos = @pos ")
     nb_acc
     calc(" @half_vel = @vel + @acc*0.5*dt ")
     calc(" @pos += @vel*0.5*dt ")
     nb_acc
     calc(" @vel += @acc*dt ")
     calc(" @pos = @old_pos + @half_vel*dt ")
   end

and here is the fourth-order version:

   def rk4
     calc(" @old_pos = @pos ")
     nb_acc
     calc(" @a0 = @acc ")
     calc(" @pos = @old_pos + @vel*0.5*dt + @a0*0.125*dt*dt ")
     nb_acc
     calc(" @a1 = @acc ")
     calc(" @pos = @old_pos + @vel*dt + @a1*0.5*dt*dt ")
     nb_acc
     calc(" @a2 = @acc ")
     calc(" @pos = @old_pos + @vel*dt + (@a0+@a1*2)*(1/6.0)*dt*dt ")
     calc(" @vel += (@a0+@a1*4+@a2)*(1/6.0)*dt ")
   end

3.7. The Leapfrog

The exception comes in with the leapfrog method. I could have made a similarly straightforward translation from the old version:

   def leapfrog
     calc(" @vel += acc(ba,eps)*0.5*dt ")
     calc(" @pos += @vel*dt ")
     calc(" @vel += acc(ba,eps)*0.5*dt ")
   end

But I did not like to ask the whole system to calculate all accelerations twice. You see, at the end of each loop, we change only the velocity. This means that the acceleration calculation at the beginning of the next step repeats exactly the same calculation as we already did at the end of the previous, since the acceleration is only dependent on the positions, not on the velocity. If we are interested in speed-up, there is another potential factor of two in speed that we can gain.

Alice: Let me try to remember, there must have been a reason that we wrote it that way in the first place.

Bob: Yes, there was. You can't skip the second call to acc, since otherwise @vel would not be properly updated at the end of the step. But you can't skip the first call either, for two reasons.

The first reason has to do with start-up. When you take the very first step, there is no previous information yet, so you just have to calculate the acceleration at the beginning of the loop, in order to step the position forward.

The second reason is connected with our previous approach of letting each particle calculate its own acceleration, whenever needed, without introducing extra variables unless we needed to do so. That made sense, since we were aiming at clarity and brevity, rather than speed. But now we have to reconsider those choices.

If we were to speed up the old approach, we would have to do two things: acquire the initial acceleration in a special move at the start of an integration, and introduce an extra variable @old_acc. However, in our new approach, where we determine the integration on the Nbody level for all particles at once, we already have the acceleration avaiable immediately after a call to nb_acc, for each particle in the variable @acc. So the only thing left to do is to warm up the engine before getting into an integration loop.

Here is how I implemented this:

   def leapfrog
     if @init_flag
       nb_acc
       @init_flag = false
     end
     calc(" @vel += @acc*0.5*dt ")
     calc(" @pos += @vel*dt ")
     nb_acc
     calc(" @vel += @acc*0.5*dt ")
   end

The flag init_flag tells you whether the acceleration variables @acc have to be initialized. And that flag is set, you guessed it, in the initializer for the Nbody system, since it is within the Nbody class that the leapfrog method is used:

   def initialize
     @body = []
     @init_flag = true
   end

Alice: And in this way the initial nb_acc is invoked only one time. That's a nice solution, which hardly complicates the algorithm. And it is impressive that for the leapfrog we now have a factor four in speed-up, at least with respect to the calculation of pairwise gravitational attractions: we only visit each particle pair once, instead of twice; and apart from the first round, each time the frog leaps it calculations all accelarations only once. Great!

Bob: And that's it! No further changes needed.

Alice: You did not touch the calculation of the potential energy? That is the only other place where there are operations that scale with the square of the particle number.

Bob: One thing at a time. Normally we don't calculate the energy of the system at every time step, but only when we ask for a diagnostics output. Even if we lose a factor of two or four there, it will really make no difference in the total speed.

This may chance when we start using C modules. If we really can get a speedup of a factor 100 there, we may well have to revisit the energy calculation too. Even if we calculate the energy, say, once every thousand time steps, carelessness there could cost us a few tens of percent in total speed.

3.8. Brevity

Alice: I have one more suggestion for a change. Nothing to do with speedup, only with making things look prettier. I bet you'll like it.

Bob: What do you have in mind?

Alice: Remember that you tried so hard to make the individual lines of the integrators look as simple as they did, back in the days that we were working on the two-body problem? Well, now that you have brought the acceleration variables in line with the position and velocity variables, we can finally grant yuor wish!

Let me try a bit of regular expression magic. May I?

Bob: Sure you may! As you know, I love brevity. But let me call the new version nbody3.rb, to keep our versions separate. Here is the keyboard.

Alice: It is the Nbody#calc that I would like to make just a bit more smarter. In your last version it looked like this:

   def calc(s)
     @body.each{|b| b.calc(@dt, s)}
   end

Are you ready for this? Here is a calc on steroids:

   def calc(s)
     @body.each{|b| b.calc(@dt, s.gsub(/([a-z]\w*)/, '@\&').gsub(/@dt/, 'dt'))}
   end

Bob: Ah, good old regular expressions! Let me see. Everywhere in the string s you do two global substitutions, using gsub. First you take any substring that starts with a lower case letter, followed by an arbitrary number of alphanumeric characters -- the name of a variable, I take it.

Alice: Precisely.

Bob: Then you take that name, and you add a @ symbol in front it in, because \& just echoes the previous match of what was found in between the parentheses there, namely the variable name.

Ah, I get it! You add all those annoying @ signs that are needed to tell Ruby that we are dealing with instance variables. In that way we don't need to add those to the code of the integrators. Alice, you're a genius!

Alice: maa, nee.

Bob: What does that mean?

Alice: Oh, I guess I didn't tell you that I started to take some Japanese classes, just for fun. Occasionally I slip into classroom mode. Just ignore that.

Bob: But then there is a second global substitution. Why that? Let's see. Wherever you encounter an expression @dt, you replace it by dt. Ah, of course. When you give every variable an @ sign, it is all nice and well for pos to turn into @pos, and so on, but you will also turn dt into @dt, and that is too much of a good thing. Got it!

May I rewrite the integrators? That will be fun!

Alice: Go right ahead!

Bob: Let's see, almost a global replace of @ by nothing, except for the @init_flag in leapfrog, almost trimmed that one too, by mistake. Ah, that looks wonderful. Just look at the fourth-order Runge Kutta:

   def rk4
     calc(" old_pos = pos ")
     nb_acc
     calc(" a0 = acc ")
     calc(" pos = old_pos + vel*0.5*dt + a0*0.125*dt*dt ")
     nb_acc
     calc(" a1 = acc ")
     calc(" pos = old_pos + vel*dt + a1*0.5*dt*dt ")
     nb_acc
     calc(" a2 = acc ")
     calc(" pos = old_pos + vel*dt + (a0+a1*2)*(1/6.0)*dt*dt ")
     calc(" vel += (a0+a1*4+a2)*(1/6.0)*dt ")
   end

What a beauty!

Alice: Glad you like it!

Previous ToC Up Next