- The importance of numerical calculations complementing theory and observation

Simulations may be considered a third pillar of astrophysics, or part of theoretical work. They allow us to
connect
theory to observation in a very precise way, and make predictions where observation is lacking. Today we are
focusing on *gravitational N-body simulations* only, which help us understand systems such as globular
clusters and galaxies where gravity plays a major role in the dynamical behavior and evolution.

- Presentation of the problem as an ODE set

The problem of the motion of \(N\) point masses (where usually \(N\gg1\)) under their own gravity can be described by these \(3N\) coupled non-linear second order differential equations \[ \ddot{\mathbf{r}}_{i}=-G\sum_{j\neq i}\frac{m_{j}(\mathbf{r}_{i}-\mathbf{r}_{j})}{|\mathbf{r}_{i}-\mathbf{r}_{j}|^{3}} \]

Q: What are the computational difficulties?

A: (1) The inverse square root, needed \(N(N-1)/2\approx N^{2}/2\) times for a “full force
evaluation”, is computationally expensive. (2) Also note that the equations contain **singularities**; this makes it difficult
to choose a **timestep**, because if we want to solve the equations within some tolerance, the required
timestep shrinks toward zero (and the number of steps diverges) as one approaches the singularities.
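The \(N^{2}/2\) pair count is easiest to see in a naive direct-summation sketch (the function name is our own, hypothetical):

```python
import numpy as np

def full_force_evaluation(pos, mass, G=1.0):
    """Direct-sum accelerations: a minimal sketch of a full force evaluation.

    Loops over the N(N-1)/2 unique pairs, so the cost scales as N^2/2.
    """
    n = len(mass)
    acc = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):                         # each pair visited once
            r_ij = pos[i] - pos[j]
            inv_r3 = 1.0 / np.sqrt(r_ij @ r_ij) ** 3      # the costly inverse sqrt
            acc[i] -= G * mass[j] * inv_r3 * r_ij         # Newton's third law lets us
            acc[j] += G * mass[i] * inv_r3 * r_ij         # reuse each pair's result
    return acc
```

Note that as \(|\mathbf{r}_{i}-\mathbf{r}_{j}|\rightarrow0\) the summand blows up: this is the singularity that forces the timestep down.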

- Relationship to Boltzmann & Poisson

In the limit \(N\rightarrow\infty\), we are in the **thermodynamic limit** and can talk about a
phase
space probability **distribution function** \(f(\mathbf{r},\mathbf{v})\) instead of the positions
and
velocities of individual (discrete) particles.

Q: What equations describe the flow of the df in phase space?

A: The **Boltzmann equation**; but in it hides the potential, which couples it to the
**Poisson
equation**. (The Boltzmann equation is the continuity equation or the conservation of probability in
phase space)
\begin{eqnarray*}
\frac{\mathrm{d}f}{\mathrm{d}t} & = & \frac{\partial f}{\partial
t}+\mathbf{v}\cdot\boldsymbol{\nabla}_{\mathbf{x}}f-\boldsymbol{\nabla}_{\mathbf{x}}\Phi\cdot\boldsymbol{\nabla}_{\mathbf{v}}f=0\\
\nabla_{\mathbf{x}}^{2}\Phi & = & 4\pi G\rho=4\pi G\iiint f\,\mathrm{d}\mathbf{v}
\end{eqnarray*}

Q: Why not solve the Boltzmann equation directly?

A: (1) Sometimes *we are* interested in discrete particles. (2) It's not easy technically; the df lives in 6D, so if we want to solve it on a grid, it needs to be
really huge if we want to have good resolution in each one of those six dimensions.
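A quick back-of-the-envelope (the grid resolution here is an arbitrary illustrative choice, not a number from any particular code) shows why a 6D grid is prohibitive:

```python
# Back-of-the-envelope: memory needed to store f on a 6D phase-space grid.
cells_per_dim = 100                   # quite modest resolution per dimension
n_cells = cells_per_dim ** 6          # 10^12 cells in six dimensions
bytes_total = n_cells * 8             # one double-precision value per cell
print(bytes_total / 1e12, "TB")       # ~8 TB just to hold a single snapshot of f
```

Doubling the resolution per dimension multiplies this by \(2^{6}=64\).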

There are very interesting ways to solve this statistically (Monte Carlo / Fokker--Planck methods); these are
called
*mean field methods* and will not be discussed today, although for some problems they are just as valuable as
full *N*-body simulations.

- Reality check (1)

Q: The initial conditions \(\{(\mathbf{r}_{i}(0),\mathbf{v}_{i}(0))\}\) are propagated with some numerical technique to \(\{(\mathbf{r}_{i}(t),\mathbf{v}_{i}(t))\}\). What is the easiest and most fundamental reality check?

A: **Constants of motion** (e.g. energy, linear momentum, angular momentum) must be conserved.
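Such a check can be sketched in a few lines (hypothetical function name; the potential is a direct \(O(N^{2})\) pair sum):

```python
import numpy as np

def conserved_quantities(pos, vel, mass, G=1.0):
    """Total energy, linear momentum and angular momentum of an N-body snapshot.

    A minimal sketch for sanity-checking a simulation between two outputs.
    """
    p = (mass[:, None] * vel).sum(axis=0)                  # linear momentum
    L = (mass[:, None] * np.cross(pos, vel)).sum(axis=0)   # angular momentum
    kinetic = 0.5 * (mass * (vel**2).sum(axis=1)).sum()
    potential = 0.0
    n = len(mass)
    for i in range(n):
        for j in range(i + 1, n):                          # each pair once
            potential -= G * mass[i] * mass[j] / np.linalg.norm(pos[i] - pos[j])
    return kinetic + potential, p, L
```

Comparing these quantities at \(t=0\) and at later times is the cheapest reality check available.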

- General vs. specialized
**Poisson solvers**

- Numerical integration by hand (Strömgren 1900; 1909)

- Light bulb experiment (Holmberg 1941)

- First computer simulation (von Hoerner 1960)

“There are two distinct types of *N*-body calculation with very different methodologies and problems. **Collisional** *N*-body codes simulate the evolution of a system with \(N_{*}\) stars by numerically integrating the equations of motion of exactly \(N=N_{*}\) particles. **Collisionless** *N*-body codes simulate the evolution of \(N_{*}\) stars by following the motion of \(N\ll N_{*}\) particles.” BT08 (p. 122)

The term **superparticles** is often used in relation to particles in collisionless simulations.

- Collisional *or* collisionless (if there is softening)
- Takes the force contributions from all particles into consideration
- Aarseth-type codes (from Aarseth 1963 and on)

Independently of von Hoerner, in 1961 Sverre Aarseth (Cambridge) wrote his own code, which would retroactively become known as NBODY1. The code would evolve into today's NBODY6(++) and NBODY7, and would be forked along the way into several special-purpose codes.

“The first code was based on the simple concepts of a force polynomial combined with individual timesteps, where numerical problems due to close encounters were avoided by a softened potential.” Aarseth (1999)

The computer originally used was an IBM 7090 (London IBM Centre); coding in Fortran allowed for much more complicated software and the integration of an \(N=100\) system, published in 1963.

- Symplectic integrators

A Hamiltonian is split into several parts that are more easily integrated. In the case of the solar system, these could be the motion of a planet in a Keplerian orbit about the Sun, and the motion due to perturbations from the other planets.

Examples: mercury

All these methods are *practically* collisionless (i.e. optimized for that).
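mercury itself uses a mixed-variable symplectic scheme built on the Kepler-plus-perturbation splitting described above; as a simpler, self-contained illustration of a symplectic integrator, here is a kick-drift-kick leapfrog for the Kepler part alone (our own sketch, not mercury's algorithm):

```python
import numpy as np

def kepler_acc(r, mu=1.0):
    """Acceleration of a test particle about a point mass mu at the origin."""
    return -mu * r / np.linalg.norm(r) ** 3

def leapfrog(r, v, dt, n_steps, mu=1.0):
    """Kick-drift-kick leapfrog: the simplest symplectic integrator.

    Splits the Hamiltonian into kinetic (drift) and potential (kick) parts.
    """
    for _ in range(n_steps):
        v = v + 0.5 * dt * kepler_acc(r, mu)   # half kick
        r = r + dt * v                          # full drift
        v = v + 0.5 * dt * kepler_acc(r, mu)   # half kick
    return r, v
```

Being symplectic, the energy error oscillates but shows no secular drift, which is why such schemes are favored for long solar-system integrations.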

- Fourier

The idea is that in Fourier space, the Poisson equation becomes a simple algebraic one \[ \hat{\Phi}(\mathbf{k})=-\frac{G}{\pi|\mathbf{k}|^{2}}\hat{\rho}(\mathbf{k}) \] so we can FFT the density, divide by the spatial frequency squared, and inverse-FFT back to real space to get the potential \(\Phi(\mathbf{r})\). But to do this, we have to work on a grid. This poses some difficulties in assigning mass to grid points, and we lose resolution below roughly the grid cell size (subgrid scales).
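A minimal sketch of such a solver (our own function, not taken from any particular code; note that we use angular wavenumbers \(\mathbf{k}\), for which the same relation reads \(\hat{\Phi}=-4\pi G\hat{\rho}/|\mathbf{k}|^{2}\)):

```python
import numpy as np

def poisson_fft(rho, box_size, G=1.0):
    """Solve nabla^2 Phi = 4 pi G rho on a periodic cube via FFT.

    With angular wavenumbers k, the relation is Phi_hat = -4 pi G rho_hat / |k|^2.
    """
    n = rho.shape[0]
    k1 = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                      # avoid division by zero at k = 0
    phi_hat = -4.0 * np.pi * G * np.fft.fftn(rho) / k2
    phi_hat[0, 0, 0] = 0.0                 # fix the mean potential to zero
    return np.real(np.fft.ifftn(phi_hat))
```

For a density that is a single Fourier mode, this recovers the analytic potential to machine precision; the grid-scale resolution loss appears only once particles have to be assigned to cells.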

There are more advanced techniques utilizing this principle: adaptive mesh refinement (AMR) and
particle-particle/particle-mesh (P\(^{3}\)M).

Particularly useful for cosmology and other systems where we are interested in a cube, possibly with periodic boundary conditions.

Examples: Superbox, ENZO

- Tree (Barnes--Hut)
- The force exerted by many particles in a remote cell can be approximated by the force exerted by one *pseudoparticle* at that cell's center of mass, with a mass equal to the sum of the cell's particle masses.
- The octree: the *root* volume is consecutively subdivided or *refined* into eighths, until each node or *leaf* contains one particle or is empty.
- Near field (all nodes are open, particle-particle interactions) and far field (nodes are closed, particle-pseudoparticle interactions) are distinguished by the opening angle *θ*.

Examples: GADGET-2
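A minimal 2D (quadtree) sketch of the same idea, with our own hypothetical names and no optimizations; setting \(\theta=0\) opens every node and recovers the exact direct sum:

```python
import numpy as np

class Cell:
    """One quadtree node: geometric center, half-width, mass and center of mass."""
    def __init__(self, center, half):
        self.center, self.half = center, half
        self.mass, self.com, self.children = 0.0, np.zeros(2), None

def build(points, masses, center, half):
    """Recursively refine until each leaf holds at most one particle."""
    cell = Cell(np.array(center, dtype=float), half)
    cell.mass = masses.sum()
    if cell.mass > 0.0:
        cell.com = (masses[:, None] * points).sum(axis=0) / cell.mass
    if len(points) > 1:                          # subdivide into quarters
        cell.children = []
        for qx in (0, 1):
            for qy in (0, 1):
                lo = cell.center + half * (np.array([qx, qy]) - 1)
                sel = np.all((points >= lo) & (points < lo + half), axis=1)
                cell.children.append(
                    build(points[sel], masses[sel], lo + 0.5 * half, 0.5 * half))
    return cell

def tree_force(cell, p, theta, G=1.0):
    """Walk the tree; closed (far-field) nodes act as one pseudoparticle."""
    if cell.mass == 0.0:
        return np.zeros(2)
    d = cell.com - p
    r = np.linalg.norm(d)
    if r < 1e-12:                                # skip the particle itself
        return np.zeros(2)
    if cell.children is None or 2.0 * cell.half < theta * r:
        return G * cell.mass * d / r**3          # node is closed: one interaction
    return sum(tree_force(c, p, theta, G) for c in cell.children)
```

The opening criterion compares the cell size \(2\times\)half-width to \(\theta r\); larger \(\theta\) closes more nodes, trading accuracy for speed.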

- The Fast Multipole Method (FMM)
- Also uses an octree, but the leaves usually contain several particles.
- Combines target (or *sink*) particles into clusters or pseudoparticles as well.
- Instead of particle-pseudoparticle interactions, we now have interactions between remote pseudoparticles.

Examples: PKDGRAV

- Multipole expansion
- The potential as a whole is expanded to a series of terms \[ \Phi(\mathbf{r})=\sum_{k}a_{k}\Phi_{k}(\mathbf{r}) \] where either the functions or coefficients are decided on in advance, and the other is calculated from the density.
- The angular part of \(\Phi_{k}(\mathbf{r})\) will generally be the spherical harmonics \(Y_{\ell m}(\theta,\phi)\), and it is the radial part which either needs to be chosen in advance or calculated.
- The fastest method, but gives good results only if the geometry is favorable.

Examples: ETICS (self-promotion)

Q: Why use direct methods at all?

A: The approximate methods mentioned introduce various artifacts (depending on the method). Sometimes this is acceptable, but other times a more accurate integration is needed.

Aarseth-type codes contain several ideas that help mitigate the difficulties in direct *N*-body
integration.
First we talk about the integration itself and the timestep issue.

- Hermite scheme with block timesteps

This is a predictor-corrector scheme that uses both the explicitly calculated total force and its first
derivative
(the jerk), which is given by
\[
\dot{\mathbf{F}}_{i}=G\sum_{j\neq
i}m_{j}\left[\frac{3(\mathbf{V}_{ij}\cdot\mathbf{R}_{ij})\mathbf{R}_{ij}}{|\mathbf{R}_{ij}|^{5}}-\frac{\mathbf{V}_{ij}}{|\mathbf{R}_{ij}|^{3}}\right]
\]
where \(\mathbf{R}_{ij}=\mathbf{r}_{i}-\mathbf{r}_{j}\) and \(\mathbf{V}_{ij}=\mathbf{v}_{i}-\mathbf{v}_{j}\) are the displacement and relative velocity vectors between
particles *i* and *j*, respectively. We can increase the order of the integration, so that each step is more
accurate and therefore fewer steps are needed. Today this is done by estimating the higher force derivatives.

First, advance the radius and velocity (of particle *i*, subscript omitted) using the information we have:
\begin{eqnarray*}
\mathbf{r} & = & \mathbf{r}_{0}+\mathbf{v}_{0}\Delta t+{\textstyle \frac{1}{2}}\mathbf{F}_{0}\Delta
t^{2}+{\textstyle \frac{1}{6}}\dot{\mathbf{F}}_{0}\Delta t^{3}\underbrace{ {\color{magenta}{+{\textstyle
\frac{1}{24}}\ddot{\mathbf{F}}_{0}\Delta t^{4}+{\textstyle \frac{1}{120}}\dddot{\mathbf{F}}_{0}\Delta
t^{5}}}}_{\Delta\mathbf{r}}\\
\mathbf{v} & = & \mathbf{v}_{0}+\mathbf{F}_{0}\Delta t+{\textstyle
\frac{1}{2}}\dot{\mathbf{F}}_{0}\Delta
t^{2}\underbrace{ {\color{magenta}{+{\textstyle \frac{1}{6}}\ddot{\mathbf{F}}_{0}\Delta t^{3}+{\textstyle
\frac{1}{24}}\dddot{\mathbf{F}}_{0}\Delta t^{4}}}}_{\Delta\mathbf{v}}
\end{eqnarray*}
At the next time point (\(t+\Delta t\)) we make an explicit calculation of the force and jerk again, but can
also
write Taylor series for them:
\begin{eqnarray*}
\mathbf{F} & = & \mathbf{F}_{0}+\dot{\mathbf{F}}_{0}\Delta t+{\textstyle \frac{1}{2}}\ddot{\mathbf{F}}_{0}\Delta
t^{2}+{\textstyle \frac{1}{6}}\dddot{\mathbf{F}}_{0}\Delta t^{3}\\
\dot{\mathbf{F}} & = & \dot{\mathbf{F}}_{0}+\ddot{\mathbf{F}}_{0}\Delta t+{\textstyle
\frac{1}{2}}\dddot{\mathbf{F}}_{0}\Delta t^{2}
\end{eqnarray*}
we can isolate \(\ddot{\mathbf{F}}_{0}\) and \(\dddot{\mathbf{F}}_{0}\) and substitute them back into the
equations
for \(\mathbf{r}\) and \(\mathbf{v}\).
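The full predict-evaluate-correct cycle can be sketched for a test particle orbiting a fixed point mass \(\mu\) (a minimal illustration, not an NBODY code; the coefficients of \(\ddot{\mathbf{F}}_{0}\) and \(\dddot{\mathbf{F}}_{0}\) follow from solving the two Taylor expansions above):

```python
import numpy as np

def acc_jerk(r, v, mu=1.0):
    """Force per unit mass and jerk for a test particle about a point mass."""
    r2 = r @ r
    a = -mu * r / r2**1.5
    j = -mu * (v / r2**1.5 - 3.0 * (v @ r) * r / r2**2.5)
    return a, j

def hermite_step(r, v, dt, mu=1.0):
    """One 4th-order Hermite predictor-corrector step (minimal sketch)."""
    a0, j0 = acc_jerk(r, v, mu)
    # predictor: Taylor series using the known force and jerk
    rp = r + v * dt + a0 * dt**2 / 2 + j0 * dt**3 / 6
    vp = v + a0 * dt + j0 * dt**2 / 2
    # explicit force and jerk at the predicted point
    a1, j1 = acc_jerk(rp, vp, mu)
    # isolate the 2nd and 3rd force derivatives from the two Taylor expansions
    a2 = (-6 * (a0 - a1) - dt * (4 * j0 + 2 * j1)) / dt**2
    a3 = (12 * (a0 - a1) + 6 * dt * (j0 + j1)) / dt**3
    # corrector: substitute them back into the series for r and v
    r_new = rp + a2 * dt**4 / 24 + a3 * dt**5 / 120
    v_new = vp + a2 * dt**3 / 6 + a3 * dt**4 / 24
    return r_new, v_new
```

Note that only two force evaluations per step are needed, yet the scheme is 4th order.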

Q: Why not just calculate the higher derivatives explicitly?

A: We can, but explicitly calculating the higher derivatives requires more computational effort per step; it is also how the Hermite scheme is extended to higher orders. Since each step becomes more expensive, higher-order schemes may not significantly reduce the total computational effort, and they may have numerical stability issues.

- Timestep criterion \[ \Delta t=\sqrt{\eta\frac{|\mathbf{F}||\ddot{\mathbf{F}}|+|\dot{\mathbf{F}}|^{2}}{|\dot{\mathbf{F}}||\dddot{\mathbf{F}}|+|\ddot{\mathbf{F}}|^{2}}} \]
- Individual and block timesteps

The aim is to make the **minimum number of force evaluations** possible.
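How the criterion and the block quantization fit together can be sketched as follows (scalar force-derivative magnitudes and hypothetical helper names; \(\eta\) is the dimensionless accuracy parameter):

```python
import math

def aarseth_timestep(F, Fdot, F2, F3, eta=0.02):
    """Aarseth timestep criterion from the magnitudes of F and its derivatives."""
    return math.sqrt(eta * (F * F2 + Fdot**2) / (Fdot * F3 + F2**2))

def block_timestep(dt):
    """Round down to a power of two so particles advance on a shared block grid."""
    return 2.0 ** math.floor(math.log2(dt))
```

Quantizing each particle's timestep to a power of two lets groups ("blocks") of particles be advanced together, instead of every particle having a fully individual time.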

- The Ahmad--Cohen neighbor scheme

The total force on a particle *i* is a sum of many force contributions. Naturally, the further away
particle *j* is, the more slowly its contribution to the total force varies. We can thus split the total force into a
fast-varying (*irregular*) component and a slowly varying (*regular*) one.
\begin{eqnarray*}
\mathbf{F}(t+\Delta t) & = & \mathbf{F}_{\mathrm{I}}(t+\Delta
t)+\mathbf{F}_{\mathrm{R}}(t)+\dot{\mathbf{F}}_{\mathrm{R}}(t)\Delta t\\
\dot{\mathbf{F}}(t+\Delta t) & = & \dot{\mathbf{F}}_{\mathrm{I}}(t+\Delta t)+\dot{\mathbf{F}}_{\mathrm{R}}(t)
\end{eqnarray*}
Evaluating \(\mathbf{F}_{\mathrm{I}}\) and \(\dot{\mathbf{F}}_{\mathrm{I}}\) is much cheaper because one needs
to
calculate the distances only within the **neighbor sphere**, which includes \(n\ll N\) particles.
The
more expensive evaluation of \(\mathbf{F}_{\mathrm{R}}\) can be done on longer intervals.

The scheme eases the computational burden significantly, although its full implementation is quite involved,
and has
at least the *neighbor radius* and *regular timestep* as free parameters.
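The splitting itself is simple, even if the bookkeeping in a production code is not; a schematic sketch (our own names; the real scheme also extrapolates \(\mathbf{F}_{\mathrm{R}}\) with its derivative, as in the equations above):

```python
import numpy as np

def split_force(pos, mass, i, r_nb, G=1.0):
    """Split the force on particle i into irregular (neighbor) and regular parts.

    The irregular part sums only over the neighbor sphere of radius r_nb;
    the regular remainder can be recomputed on a longer timestep.
    """
    d = pos - pos[i]                          # displacements toward each particle
    r = np.linalg.norm(d, axis=1)
    f = np.zeros_like(pos)
    ok = r > 0                                # exclude particle i itself
    f[ok] = (G * mass[ok] / r[ok]**3)[:, None] * d[ok]
    near = ok & (r < r_nb)                    # the neighbor sphere
    return f[near].sum(axis=0), f[ok & ~near].sum(axis=0)
```

By construction the two parts add up to the full direct-sum force; the saving comes from how often each part is evaluated.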

- Regularization and binaries (2-body and in general)

We still have not dealt with the singularities in the equations of motion. The closer two particles are, the smaller the timestep becomes, and the more Hermite steps are required to achieve the desired accuracy. On the other hand, the closer two particles are, the more their orbit resembles a Kepler orbit, which we know explicitly. At this stage we can treat the force exerted by all other stars as a perturbation, which varies on longer timescales.

**The Levi-Civita transformation:** we can transform the equations of motion to a very simple
form.
Please convince yourselves that
\[
\frac{\mathrm{d}^{2}\mathbf{R}}{\mathrm{d}t^{2}}=-\mu\frac{\mathbf{R}}{|\mathbf{R}|^{3}}\longrightarrow\frac{\mathrm{d}^{2}u}{\mathrm{d}\tau^{2}}={\textstyle
\frac{1}{2}}\varepsilon u
\]
where we transformed time to a “slow motion” variable \(\mathrm{d}t/\mathrm{d}\tau=R\), defined a new coordinate
position in the **complex plane** such that \(u^{2}=x+\mathrm{i}y\), and used the fact that the
specific energy is a constant \(\varepsilon={\textstyle \frac{1}{2}}|\dot{\mathbf{R}}|^{2}-\mu/R\) that we can
find
from the initial conditions.
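The verification can be sketched as follows (a prime denotes \(\mathrm{d}/\mathrm{d}\tau\) and an overbar the complex conjugate; we write \(U=x+\mathrm{i}y=u^{2}\) and \(R=|u|^{2}=u\bar{u}\)):

```latex
% Sketch of the Levi-Civita verification, under the stated conventions.
\begin{align*}
\frac{\mathrm{d}U}{\mathrm{d}t} &= \frac{2uu'}{R}, &
\frac{\mathrm{d}^{2}U}{\mathrm{d}t^{2}}
  &= \frac{2}{R^{2}}\left(u''u+u'^{2}\right)-\frac{2uu'}{R^{3}}R'
   = -\mu\frac{u^{2}}{R^{3}}.
\intertext{Multiplying by $R^{3}/2$ and using $R=u\bar{u}$,
$R'=u'\bar{u}+u\bar{u}'$, the $u'^{2}$ terms cancel, leaving}
\bar{u}\,u'' &= |u'|^{2}-\tfrac{1}{2}\mu .
\intertext{The specific energy gives
$\varepsilon=\tfrac{1}{2}\bigl|\mathrm{d}U/\mathrm{d}t\bigr|^{2}-\mu/R
           =(2|u'|^{2}-\mu)/R$, so $|u'|^{2}-\tfrac{1}{2}\mu
           =\tfrac{1}{2}\varepsilon R$, and therefore}
u'' &= \tfrac{1}{2}\varepsilon\,u .
\end{align*}
```

Note that for a bound orbit \(\varepsilon<0\), so the transformed equation is a simple harmonic oscillator, with no singularity at \(R=0\).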

Q: The transformation maps 2D vectors to the complex plane, so what can we do for 3D vectors?

A: While it is proven that no such transformation can exist in 3D, one does exist in 4D.

Kustaanheimo & Stiefel (KS; 1965) came up with a 4D transformation. It can be derived in an analogous way using quaternions: a non-commutative division algebra extending the complex numbers with two additional “imaginary” directions, with many useful properties, including something analogous to a square root.

The KS regularization scheme has been a part of Aarseth's codes since NBODY3 in 1969. Other schemes exist.

- Softening

Another way to remove the singularities from the equations of motion is to replace the actual gravitational potential \(1/r\) by something like \(1/\sqrt{r^{2}+b^{2}}\) (there are many options, called softening kernels). This substitution leaves gravitational interactions essentially Newtonian where \(r\gg b\), while keeping the force and its derivatives finite everywhere, in particular where \(r\ll b\).
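The Plummer kernel named above can be sketched in one line (hypothetical function name):

```python
import numpy as np

def plummer_softened_acc(r_vec, m, b, G=1.0):
    """Plummer-softened acceleration: finite everywhere, Newtonian for r >> b."""
    r2 = r_vec @ r_vec
    return -G * m * r_vec / (r2 + b**2) ** 1.5
```

At \(r=0\) the acceleration vanishes instead of diverging, and for \(r\gg b\) it agrees with the Newtonian \(1/r^{2}\) force to order \((b/r)^{2}\).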

This is particularly useful in collisionless simulations, where particles do not represent single objects
anyway, so
a strong gravitational interaction (strong scattering or the formation of a binary) is meaningless.
**Formation
of a binary** could slow the simulation down significantly due to the small timesteps it imposes (in the
absence of a regularization method).

- Kitchen sinking
- External tidal field
- Stellar evolution
- Binary evolution, GR

- Reality check (2)

We mentioned conservation laws, but in practice they might not be usable because of external forces or stellar evolution. Bookkeeping of the energy budget is possible in principle but often problematic in practice. We can consider two important ways to validate our results:

**Convergence check** With Aarseth-type codes we at
least
have *η* as a free parameter; we should make sure the results are independent of it. In collisionless
simulations we would generally want to make sure that the results are independent of *N*, or that their
scaling
with *N* is well understood.

**Multiple realizations** Perform multiple simulations
with
different initial conditions (which are usually randomly generated realizations) and make sure that the
results are qualitatively the same (this does not apply when the initial conditions are fixed, as for the solar system).

Almost all the techniques we talked about have some bottleneck that could be accelerated using parallelization.

- HARP
- Multiprocessing
- GRAPE
- GPU