The basic idea of the tree algorithm is to replace the gravitational forces from distant particles with the force from their center of mass, or with multipole expansion if high accuracy is desired. Particles are organized into an octree structure, with the root node covering the entire system and leafs corresponding to each particles.
The force on a particle from a node is defined (and calculated) recursively. If the node and particle are well separated (in terms of the error of the multipole expansion), the force from the node to the particle is calculated by evaluating the multipole expansion of the node at the location of the particle. If they are not well separated, the force is evaluated as the sum of the forces from children of the node. The calculation cost of the force on one particle is , since the cost is proportional to the number of levels of the tree.
In order to use the multipole expansions of the nodes, they must be precomputed. The expansion coefficients for a node can be recursively calculated from those of children nodes. The calculation cost of this part is .
For details of implementation, see [13]. Salmon et al. [15] describes the implementation of the tree algorithm on distributed-memory parallel computers.
In the tree algorithm, the particles which generate the gravitational potential and the particles which feel the potential are not symmetric. The particles which generate potential are treated as clumps whenever possible. However, calculations of the forces on two particles are totally independent, even though the two particles are in small distance.
The basic idea of FMM (Fast Multipole Method, [6,7]) is to locally expand the potential field and use that expansion to obtain the forces on multiple particles. Figure 1 shows the relation between the tree algorithm and FMM.
Figure 1: Approximations in tree (top) and FMM(bottom)
Since neighboring particles share the same expansion, the scaling of the calculation cost changes from of the tree algorithm to . However, for similar accuracy, the actual calculation cost of FMM is significantly higher than that of tree algorithm, even for very large number of particles. [4]
The fundamental reason for the relatively high cost of FMM is that the scaling with the order of the expansion is different. In three dimensions, the calculation cost of the tree algorithm is , where p is the order of the multipole expansion. The number of independent terms of the spherical harmonics of order p is . Since the evaluation of terms can be done using recurrent relation, the calculation cost is proportional to the number of terms. On the other hand, the calculation cost of FMM is , since the translation of the multipole expansion to the local expansion requires the calculation cost proportional to the square of the number of terms.
Of course, it is possible to reduce this scaling to , by increasing the number of particles at the lowest cell. Also, it is possible to apply FFT to the translation [5] to reduce scaling to . When these two are combined, the resulting scaling is . However, it should be noted that FFT is advantageous only for very large values of p. Even for pretty large values of p, the gain by FFT does not exceed a factor of two. In addition, FFT works fine for non-adaptive variant of FMM, but might not work so well for an adaptive version.
At present, FMM does not offer clear advantage in performance compared to the tree algorithms. One of the reason is that the mathematics used in FMM is much more complex and therefore it is more difficult to implement FMM than to implement the tree algorithm. As a result, relatively small number of implementations exist for FMM. These implementations are not widely used and not very highly optimized. Since FMM is a complex algorithm, there are many small places which can easily lead to rather large inefficiency. The tree algorithm is much simpler and therefore easier to achieve high efficiency.
Anderson [1] proposed an alternative formulation of FMM which is based on Poisson's formula. In two dimensions, the gravitational potential outside a disk of radius a containing particles is expressed as
where G is the gravitational constant, M is the total mass of the particles in the disk, is the position in polar coordinate. This formula gives the solution of the boundary value problem of the Laplace equation.
In order to use formula (1) as an replacement of the multipole expansion, the integral must be replaced by some numerical quadrature. The ``best'' method for the numerical integration over a circle is to distribute the points in equal spacing and sum the values on these points with equal weights. When we use 2p+1 points to sample potential, the integration should give exact values for p-th order terms in multipole expansion. Note that the p-th term in the multipole expansion corresponds to p-th term in the Fourier expansion of the potential on the circle .
Anderson found that the naive use of the numerical quadrature in combination with formula (1) gives unacceptable result. The reason is that finite number of sampling points introduces fictitious high-frequency terms in Fourier components. In order to suppress high-frequency terms, we should truncate the multipole expansion at order p. In other words, formula (1) should be replaced by
With this modification, Anderson successfully used Poisson's formula to implement FMM. The local expansion can also given in a similar form.
In three dimensions, the outer and inner expansions are given by
and
In actual implementation, the infinite sum must also be truncated at the order of the integration scheme used.
In two dimensions, the trapezoidal rule is optimum and is directly related to the Fourier expansion. However, in three dimensions, the way to assign points on a sphere is not unique. Anderson followed the formula given in [11], which claims to have constructed 5th, 7th, 9th, 11th and 14th order integration formulae with 12, 24, 32, 50 and 72 points. Recently, Hardin and Sloane [9] suggested a complete set of the achievable orders for integration schemes with up to 100 points. They called an integration schemes which achieved order t as t-designs. Table 2.3 gives the summary of their result. Note that a D-th order scheme (D-design) can express spherical harmonics of the order only up to [1].
Table 1: number of points K and achievable order D
For orders 2, 3 and 5, the corresponding distributions of points are the vertices of tetrahedron(K=4), octahedron(K=6) and icosahedron (K=12).
Blackston and Suel [4] implemented both the classical FMM and Anderson's method, and found that for similar accuracy the latter is faster typically by a factor of few.
The most significant practical advantage of Anderson's method is the ease of the implementation. The classical FMM in three dimensions requires rather complex formulas to be implemented for the shifting the center of the the multipole expansion (``M2M'' part), translation of the multipole expansion to local expansion (``M2L'' part) and shifting the center of the local expansion (``L2L'' part). In Anderson's method, all shiftings and translations are realized by evaluating the potential on the sample points on the sphere. Thus, all mathematics are confined into formulae (4) and (3).