# N-body simulations in astrophysics

## Astrophysics background

• The importance of numerical calculations complementing theory and observation

Simulations may be considered a third pillar of astrophysics, or part of theoretical work. They allow us to connect theory to observation in a very precise way, and make predictions where observation is lacking. Today we are focusing on gravitational N-body simulations only, which help us understand systems such as globular clusters and galaxies where gravity plays a major role in the dynamical behavior and evolution.

## The N-body problem

• Presentation of the problem as an ODE set

The problem of the motion of $$N$$ point masses (where usually $$N\gg1$$) under their own gravity can be described by these $$3N$$ coupled non-linear second order differential equations $\ddot{\mathbf{r}}_{i}=-G\sum_{j\neq i}\frac{m_{j}(\mathbf{r}_{i}-\mathbf{r}_{j})}{|\mathbf{r}_{i}-\mathbf{r}_{j}|^{3}}$

Q: What are the computational difficulties?

A: (1) The inverse $$\sqrt{\phantom{a}}$$ calculation, needed $$N(N-1)/2\approx N^{2}/2$$ times for a “full force evaluation” (2) Also note that the equations may contain singularities, this makes it difficult to choose a timestep because if we want to solve the equations within some tolerance, the required time-step diverges as one approaches the singularities.

• Relationship to Bolzmann & Poisson

In the limit $$N\rightarrow\infty$$, we are in the thermodynamic limit and can talk about a phase space probability distribution function $$f(\mathbf{r},\mathbf{v})$$ instead of the positions and velocities of individual (discrete) particles.

Q: What equations describe the flow of the df in phase space?

A: The Boltzmann equation; but in it hides the potential, which couples it to the Poisson equation. (The Boltzmann equation is the continuity equation or the conservation of probability in phase space) \begin{eqnarray*} \frac{\mathrm{d}f}{\mathrm{d}t} & = & \frac{\partial f}{\partial t}+\mathbf{v}\cdot\boldsymbol{\nabla}_{\mathbf{x}}f-\boldsymbol{\nabla}_{\mathbf{x}}\Phi\boldsymbol{\nabla}_{\mathbf{v}}f=0\\ \nabla_{\mathbf{x}}^{2}\Phi & = & 4\pi G\rho=4\pi G\iiint f\mathrm{d}\mathbf{v} \end{eqnarray*}

Q: Why not solve the Boltzmann equation directly?

A: (1) Sometimes we are interested in discrete particles. (2) It's not easy technically; the df lives in 6D, so if we want to solve it on a grid, it needs to be really huge if we want to have good resolution in each one of those six dimensions.

There are very interesting ways to solve this statistically (Monte Carlo / Fokker--Planck methods) those are called mean field methods and will not be discussed today, although they are just as valuable as full N-body simulations for some problems.

• Reality check (1)

Q: The initial conditions $$\{(\mathbf{r}_{i}(0),\mathbf{v}_{i}(0))\}$$ are propagated with some numerical technique to $$\{(\mathbf{r}_{i}(t),\mathbf{v}_{i}(t))\}$$. What is the easiest and most fundamental reality check?

A: Constants of motion (e.g. energy, linear momentum, angular momentum) must be conserved.

• General vs. specialized Poisson solvers
Just making the point that this ODE set could in principle be solved by general-purpose numerical techniques, but numerical techniques have been developed which are especially suitable for the difficulties discussed earlier. We will now talk about some of these methods.

## Solving the N-body problem

### Early history

The idea of mechanical computers existed since the 19th century; S. Elis Strömgren (Lund University, later Copenhagen Observatory) discussed the possibility to solve the 3-body problem with such a machine.
Erik Holmberg (Lund University) made a primitive simulation with $$N=74$$ light bulbs and photoelectric cells to simulate a merger of two disk galaxies. He used the fact that both light and gravity decay with distance as $$r^{2}$$.
Sebastian R. K. von Hoerner from the Astronomisches Rechen-Institut (ARI) in Heidelberg performed the first N-body simulations on a digital computer in the late 1950s. He used the Siemens 2002 machine in Tübingen. The machine had internal decimal floating/fixed point arithmetics, where a decimal digit is represented by a 4-digit binary number. A “word” was 12 digits long + sign, and the machine could store up to 104 words (meaning it had a memory of nearly 60 kB). The CPU frequency was about 11 kHz and the floating point calculation speed was ~700 FLOPS (for multiply operations). Programming was done in machine language, on punch cards. In the 1960 paper, a simulation with $$N=16$$ is presented, and checked against Chandrasekhar's theory of relaxation.

## Methods

### Collisional vs collisionless

“There are two distinct types of N-body calculation with very different methodologies and problems. Collisional N-body codes simulate the evolution of a system with $$N_{*}$$ stars by numerically integrating the equations of motion of exactly $$N=N_{\ast}$$ particles. Collisionless N-body codes simulate the evolution of $$N_{\ast}$$ stars by following the motion of $$N\ll N_{\ast}$$ particles.” BT08 (p. 122)

The term superparticles is often used in relations to particles in collisionless simulations.

### Direct methods

• Collisional or collisionless (if there is softening)
• Taking the force contribution from all particle into consideration
• Aarseth-type codes (from Aarseth 1963 and on)

Independently of von Hoerner, in 1961 Sverre Aarseth (Cambridge) wrote his own code, which will be retroactively known as NBODY1. The codes will evolve to today's NBODY6(++) and NBODY7, and would be forked along the way into some special-purpose codes.

“The first code was based on the simple concepts of a force polynomial combined with individual timesteps, where numerical problems due to close encounters were avoided by a softened potential.” Aarseth (1999)
The computer used originally was IBM 7090 (London IBM Centre); coding in fortran allowed for a much more complicated software and integration of a $$N=100$$ system, published in 1963.
• Symplectic integrators

A Hamiltonian is split into several parts that are more easily integrated. In the case of the solar system, these could be the motion of a planet in a Keplerian orbit about the Sun, and the motion due to perturbations from the other planets.

Examples: mercury

### Indirect methods

All these methods are practically collisionless (i.e. optimized for that).

• Fourier

The idea is that in Fourier space, the Poisson equation is a simple algebraic one $\hat{\Phi}(\mathbf{k})=-\frac{G}{\pi|\mathbf{k}|^{2}}\hat{\rho}(\mathbf{k})$ so we can FFT the density function, divide by the spatial frequency squared, and rFFT back to real space to get the potential $$\Phi(\mathbf{r})$$. But to do this, we have to work on a grid. This poses some difficulties assigning mass to grid points, and we lose resolution at around the grid cell size (subgrid).

There are more advance techniques utilizing this principle: adaptive mesh refinement (AMR), and particle-particle/particle-mesh (P3M).

Particularly useful for cosmology and other systems where we are interested in a cube, possible with periodic boundary conditions.

Examples: Superbox, ENZO

• Tree (Barnes--Hut)
• The force exerted by many particles in a remote cell can be approximated by the force exerted by one pseudoparticle at that cell's center of mass, with a mass equals to the sum of the cell's particle masses.
• The octree: the root volume is consecutively subdivided or refined into eights, until each node or leaf contains one particle or is empty.
• Near-field (all nodes are open, particle-particle interaction) and far-field (nodes are closed, particle-pseudoparticle interaction) are distinguished by the opening angle θ.

• The Fast Multipole Method (FMM)
• Also using an octree, but leaves contain usually several particles.
• Combines target (or sink) particles into clusters or pseudoparticles as well.
• Instead of particle-pseudoparticle interaction, we now have interactions between remote pseudoparticles.

Examples: PKDGRAV

• Multipole expansion
• The potential as a whole is expanded to a series of terms $\Phi(\mathbf{r})=\sum_{k}a_{k}\Phi_{k}(\mathbf{r})$ where either the functions or coefficients are decided on in advance, and the other is calculated from the density.
• The angular part of $$\Phi_{k}(\mathbf{r})$$ will generally be the spherical harmonics $$Y_{\ell m}(\theta,\phi)$$, and it is the radial part which either needs to be chosen in advance or calculated.
• The fastest method, but gives good results only if the geometry is favorable.

Examples: ETICS (self-promotion)

### Aarseth-type codes

Q: Why use direct methods at all?

A: The approximate methods mentioned introduce various artifacts (depending on the method). Sometimes it's OK but some other times more accurate integration is needed.

Aarseth-type codes contain several ideas that help mitigate the difficulties in direct N-body integration. First we talk about the integration itself and the timestep issue.

• Hermite scheme with block timesteps

This is a predictor-corrector scheme that uses both the explicitly calculated total force and its first derivative (the jerk), which is given by $\dot{\mathbf{F}}_{i}=\sum_{i\neq j}-\frac{3(\mathbf{V}_{ij}\cdot\mathbf{R}_{ij})\mathbf{R}_{ij}}{|\mathbf{R}_{ij}|^{5}}+\frac{\mathbf{V}_{ij}}{|\mathbf{R}_{ij}|^{3}}$ where $$\mathbf{R}_{ij}$$ and $$\mathbf{V}_{ij}$$ are the displacement and relative velocity vectors between particles i and j, respectively. We can increase the order of integration, so each step is more accurate and therefore fewer steps are needed. Today this is done by estimating the higher force derivatives.

First, advance the radius and velocity (of particle i, subscript omitted) using the information we have: \begin{eqnarray*} \mathbf{r} & = & \mathbf{r}_{0}+\mathbf{v}_{0}\Delta t+{\textstyle \frac{1}{2}}\mathbf{F}_{0}\Delta t^{2}+{\textstyle \frac{1}{6}}\Delta\dot{\mathbf{F}}_{0}t^{3}\underbrace{ {\color{magenta}{+{\textstyle \frac{1}{24}}\ddot{\mathbf{F}}_{0}\Delta t^{4}+{\textstyle \frac{1}{120}}\dddot{\mathbf{F}}_{0}\Delta t^{5}}}}_{\Delta\mathbf{r}}\\ \mathbf{v} & = & \mathbf{v}_{0}+\mathbf{F}_{0}\Delta t+{\textstyle \frac{1}{2}}\dot{\mathbf{F}}_{0}\Delta t^{2}\underbrace{ {\color{magenta}{+{\textstyle \frac{1}{6}}\ddot{\mathbf{F}}_{0}\Delta t^{3}+{\textstyle \frac{1}{24}}\dddot{\mathbf{F}}_{0}\Delta t^{4}}}}_{\Delta\mathbf{v}} \end{eqnarray*} At the next time point ($$t+\Delta t$$) we make an explicit calculation of the force and jerk again, but can also write Taylor series for them: \begin{eqnarray*} \mathbf{F} & = & \mathbf{F}_{0}+\dot{\mathbf{F}}_{0}\Delta t+{\textstyle \frac{1}{2}}\ddot{\mathbf{F}}_{0}\Delta t^{2}+{\textstyle \frac{1}{6}}\dddot{\mathbf{F}}_{0}\Delta t^{3}\\ \dot{\mathbf{F}} & = & \dot{\mathbf{F}}_{0}+\ddot{\mathbf{F}}_{0}\Delta t+{\textstyle \frac{1}{2}}\dddot{\mathbf{F}}_{0}\Delta t^{2} \end{eqnarray*} we can isolate $$\ddot{\mathbf{F}}_{0}$$ and $$\dddot{\mathbf{F}}_{0}$$ and substitute them back into the equations for $$\mathbf{r}$$ and $$\mathbf{v}$$.

Q: Why not just calculate the higher derivatives explicitly?

A: We can do that, but this requires more computational effort. By explicitly calculating the higher derivatives we can extend the Hermite scheme to higher orders. Higher orders schemes may therefore not significantly reduce the computational effort, and may have numerical stability issues.

• Timestep criterion $\Delta t=\sqrt{\eta\frac{|\mathbf{F}||\ddot{\mathbf{F}}|+|\dot{\mathbf{F}}|^{2}}{|\dot{\mathbf{F}}||\dddot{\mathbf{F}}|+|\ddot{\mathbf{F}}|^{2}}}$
• Individual and block timesteps

The aim is make the minimum number of force evaluations possible.

The total force on a particle i is a sum of many force contributions. Naturally, the further away the particle j is, the slower its contribution to the total force would vary. We can split the total force into a fast (irregular) and slowly (regular) varying components. \begin{eqnarray*} \mathbf{F}(t+\Delta t) & = & \mathbf{F}_{\mathrm{I}}(t+\Delta t)+\mathbf{F}_{\mathrm{R}}(t)+\dot{\mathbf{F}}_{\mathrm{R}}(t)\Delta t\\ \dot{\mathbf{F}}(t+\Delta t) & = & \dot{\mathbf{F}}_{\mathrm{I}}(t+\Delta t)+\dot{\mathbf{F}}_{\mathrm{R}}(t) \end{eqnarray*} Evaluating $$\mathbf{F}_{\mathrm{I}}$$ and $$\dot{\mathbf{F}}_{\mathrm{I}}$$ is much cheaper because one needs to calculate the distances only within the neighbor sphere, which includes $$n\ll N$$ particles. The more expensive evaluation of $$\mathbf{F}_{\mathrm{R}}$$ can be done on longer intervals.

The scheme eases the computational burden significantly, although its full implementation is quite involved, and has at least the neighbor radius and regular timestep as free parameters.

• Regularization and binaries (2-body and in general)

We still didn't talk about the singularities in the equation of motion. The closer two particles are, the smaller the timestep would be, and more Hermite steps would be required to achieve the desired accuracy. But on the other hand, the closer two particles are, their orbit is more similar to a Kepler's orbit, which we know explicitly. At this stage we can treat the force exerted by all other stars as a perturbation, which varies on longer timescales.

The Levi-Civita transformation: we can transform the equations of motion to a very simple form. Please convince yourselves that $\frac{\mathrm{d}^{2}\mathbf{R}}{\mathrm{d}t^{2}}=-\mu\frac{\mathbf{R}}{|\mathbf{R}|^{3}}\longrightarrow\frac{\mathrm{d}^{2}u}{\mathrm{d}\tau^{2}}={\textstyle \frac{1}{2}}\varepsilon u$ where we transformed time to a “slow motion” variable $$\mathrm{d}t/\mathrm{d}\tau=R$$, defined a new coordinate position in the complex plane such that $$u^{2}=x+\mathrm{i}y$$, and used the fact that the specific energy is a constant $$\varepsilon={\textstyle \frac{1}{2}}|\dot{\mathbf{R}}|^{2}-\mu/R$$ that we can find from the initial conditions.

Q: The transformation maps 2D vectors to the complex plane, so what can we do for 3D vectors?

A: While it is proven that such a trick cannot exist in 3D, it can exist in 4D.

Kustaanheimo & Stiefel (KS; 1965) came up with a 4D transformation. It can be derived in an analogous way using quaternions. It is a non-abelian group extending the complex field with two additional “imaginary” directions, with many useful properties including something analogous to a square root.

The KS regularization scheme has been a part of Aarseth's codes since NBODY3 in 1969. Other schemes exist.

• Softening

Another way to remove the singularities from the equations of motion is to replace the actual gravitational potential $$1/r$$ by something like $$1/\sqrt{r^{2}+b^{2}}$$ (there are many options, called softening kernels). This substitution makes gravitational interactions behave normally where $$r\gg b$$, while keeping the force and its derivatives finite everywhere, and in particular where $$r\ll b$$.

This is particularly useful in collisionless simulations, where particles do not represent single objects anyway, so a strong gravitational interaction (strong scattering or a formation of a binary) is meaningless. Formation of a binary particle could slow the simulation significantly due to the small timesteps (i.e. in the absence of a regularization method).

• Kitchen sinking
• External tidal field
• Stellar evolution
• Binary evolution, GR

• Reality check (2)

We mentioned conservation laws, but in practice they might not be useful because of external forces or stellar evolution. Bookkeeping of the energy budget is in principle possible but often problematic to follow in practice. We can consider two important ways to validate our results:

Convergence check With Aarseth-type codes we at least have η as a free parameter, we should make sure the results are independent of it. In collisionless simulations we would generally want to make sure that the results are independent of N, or that their scaling with N is well understood.

Multiple realizations Doing multiple simulations with different initial conditions, since they are usually randomly generated realizations, and making sure that the results are qualitatively the same (not in the case of the solar system for example).

## Hardware acceleration

almost all the techniques we talked about has some bottleneck that could be accelerated using parallelization.

• HARP
• Multiprocessing
• GRAPE
• GPU