# Introduction

This package provides implementations of some general-purpose random-walk based adaptive MCMC algorithms, including the following:

- Adaptive Metropolis, proposal covariance adaptation, (Haario, Saksman and Tamminen, 2001, and Andrieu and Moulines, 2006)
- Adaptive scaling Metropolis, acceptance rate adaptation for scale (e.g. as in Andrieu and Thoms, 2008, and Atchadé and Fort, 2010)
- Robust Adaptive Metropolis, acceptance rate adaptation for shape (Vihola, 2012)
- Adaptive Parallel Tempering, acceptance rate adaptation for temperature levels (Miasojedow, Moulines and Vihola, 2013)

The aim of the package is to provide a simple and modular general-purpose implementation, which may be easily used to sample from a log-target density, but also used in a variety of custom settings.

See also AdaptiveParticleMCMC.jl which uses this package with SequentialMonteCarlo.jl for adaptive particle MCMC.

## Installation

To get the latest registered version:

```
using Pkg
Pkg.add("AdaptiveMCMC")
```

To install the latest development version:

```
using Pkg
Pkg.add(url="https://github.com/mvihola/AdaptiveMCMC.jl")
```

## Sampling from log-posteriors

The package provides an easy-to-use adaptive random-walk Metropolis sampler, which samples (in principle) from any probability distribution $p$, whose log-density values can be evaluated point-wise.

```
# Load the package
using AdaptiveMCMC
# Define a function which returns log-density values:
log_p(x) = -.5*sum(x.^2)
# Run 10k iterations of the Adaptive Metropolis:
out = adaptive_rwm(zeros(2), log_p, 10_000; algorithm=:am)
# Calculate '95% credible intervals':
using Statistics
mapslices(x->"$(mean(x)) ± $(1.96std(x))", out.X, dims=2)
```

See Adaptation state for explanation of the different `algorithm`

options:

`:am`

=`AdaptiveMetropolis`

`:ram`

=`RobustAdaptiveMetropolis`

`:asm`

=`AdaptiveScalingMetropolis`

`:aswam`

=`AdaptiveScalingWithinAdaptiveMetropolis`

There are a number of other optional keyword arguments, too:

`AdaptiveMCMC.adaptive_rwm`

— Function`out = adaptive_rwm(x0, log_p, n; kwargs)`

Generic adaptive random walk Metropolis algorithm from initial state vector `x0`

targetting log probability density `log_p`

run for `n`

iterations, including adaptive parallel tempering.

**Arguments**

`x0::Vector{<:AbstractFloat}`

: The initial state vector`log_p::Function`

: Function that returns log probability density values (up to an additive constant) for any state vector.`n::Int`

: Total number of iterations

**Keyword arguments**

`algorithm::Symbol`

: The random walk adaptation algorithm; current choices are`:ram`

(default),`:am`

,`:asm`

,`:aswam`

and`:rwm`

. (Alternatively, if algorithm is a vector of AdaptState, then this will be used as an initial state for adaptation.)`b::Int`

: Burn-in length:`b`

:th sample is the first saved sample. Default`⌊n/5⌋`

`thin::Int`

: Thinning factor; only every`thin`

:th sample is stored; default`1`

`fulladapt::Bool`

: Whether to adapt after burn-in; default`true`

`Sp`

: Saved adaptive state from output to restart MCMC; default`nothing`

`Rp`

: Saved rng state from output to restart MCMC; default`nothing`

`indp::Int`

: Index of saved adaptive state to restart MCMC; default`0`

`rng::AbstractRNG`

: Random number generator; default`Random.GLOBAL_RNG`

`q::Function`

: Zero-mean symmetric proposal generator (with arguments`x`

and`rng`

); default`q=randn!(x, rng)`

`L::Int`

: Number of parallel tempering levels`acc_sw::AbstractFloat`

: Desired acceptance rate between level swaps; default`0.234`

`all_levels::Bool`

: Whether to store output of all levels; default`false`

`log_pr::Function`

: Log-prior density function; default`log_pr(x) = 0.0`

.`swaps::Symbol`

: Swap strategy, one of:`:single`

(default, single randomly picked swap)`:randperm`

(swap in random order)`:sweep`

(up- or downward sweep, picked at random)`:nonrev`

(alternate even/odd sites as in Syed, Bouchard-Côté, Deligiannidis, Doucet, arXiv:1905.02939)`progress::Union{Bool,Progress}`

: Whether a progress meter is shown; default`false`

Note that if `log_pr`

is supplied, then `log_p(x)`

is regarded as the log-likelihood (or, equivalently, log-target is `log_p(x) + log_pr(x)`

). Tempering is only applied to `log_p`

, not to `log_pr`

.

The output `out.X contains the simulated samples (column vectors).`

out.allX[k]`for`

k>=2` contain higher temperature auxiliary chains (if requested)

**Examples**

```
log_p(x) = -.5*sum(x.^2)
o = adaptive_rwm(zeros(2), log_p, 10_000; algorithm=:am)
using MCMCChains, StatsPlots # Assuming MCMCChains & StatsPlots are installed...
c = Chains(o.X[1]', start=o.params.b, thin=o.params.thin); plot(c)
```

## With adaptive parallel tempering

If the keyword argument `L`

is greater than one, then the adaptive parallel tempering algorithm (APT) of Miasojedow, Moulines & Vihola (2013) is used. This can greatly improve mixing with multimodal distributions.

Here is a simple multimodal distribution sampled with normal adaptive random walk Metropolis, and with APT:

```
# Multimodal target of dimension d.
function multimodalTarget(d::Int, sigma2=0.1^2, sigman=sigma2)
# The means of mixtures
m = [2.18 5.76; 3.25 3.47; 5.41 2.65; 4.93 1.50; 8.67 9.59;
1.70 0.50; 2.70 7.88; 1.83 0.09; 4.24 8.48; 4.59 5.60;
4.98 3.70; 2.26 0.31; 8.41 1.68; 6.91 5.81; 1.14 2.39;
5.54 6.86; 3.93 8.82; 6.87 5.40; 8.33 9.50; 1.69 8.11]'
n_m = size(m,2)
@assert d>=2 "Dimension should be >= 2"
let m=m, n_m=size(m,2), d=d
function log_p(x::Vector{Float64})
l_dens = -0.5*(mapslices(sum, (m.-x[1:2]).^2, dims=1)/sigma2)
if d>2
l_dens .-= 0.5*mapslices(sum, x[3:d].^2, dims=1)/sigman
end
l_max = maximum(l_dens) # Prevent underflow by log-sum trick
l_max + log(sum(exp.(l_dens.-l_max)))
end
end
end
using AdaptiveMCMC
n = 100_000; L = 2
rwm = adaptive_rwm(zeros(2), multimodalTarget(2), n; thin=10)
apt = adaptive_rwm(zeros(2), multimodalTarget(2), div(n,L); L = L, thin=10)
# Assuming you have 'Plots' installed:
using Plots
plot(scatter(rwm.X[1,:], rwm.X[2,:], title="w/o tempering", legend=:none),
scatter(apt.X[1,:], apt.X[2,:], title="w/ tempering", legend=:none), layout=(1,2))
```

What the APT is actually based on? Parallel tempering is a MCMC algorithm which samples from a product density proporitional to:

\[\prod_{i=1}^L p^{\beta(i)}(x^{(i)}),\]

where (the 'inverse temperatures') $1 = \beta(1) > \beta(2) > \cdots > \beta(L) > 0$.

In the end, the 'first level' is of interest (and samples of the first level are usually used for estimation), whereas the tempered levels $i=2,\ldots,L$ are auxiliary, which help the sampler to move between modes of a multi-modal target. The easier moving is because the tempered densities $p^{\beta(i)}$ are 'flatter' than $p$ for any $\beta(i)<1$.

The sampler consists of two types of MCMC moves:

- Independent adaptive random-walk Metropolis moves on individual levels $i$, targetting tempered densities $p^{\beta(i)}$.
- Switch moves, where swaps of adjacent levels $x^{(i)} \leftrightarrow x^{(i+1)}$ are proposed, and the moves are accepted with (Metropolis-Hastings) probability $\min\big\{1, \frac{p^{\beta(i)-\beta(i+1)}(x^{(i+1)})}{p^{\beta(i)-\beta(i+1)}(x^{(i)})}\big\}$.

In the APT, each random walk sampler for each individual level is adapted totally independently, following exactly the same mechanism as before. Additionally, the APT adapts the inverse temperatures $\beta(2),\ldots,\beta(L)$, in order to reach the average switch probability $0.234$. More precisely, adaptation mechanism tunes the parameters $\rho^{(i)}$, which determine

\[\frac{1}{\beta^{(i)}} = \frac{1}{\beta^{(i-1)}} + e^{\rho^{(i)}},\]

and the adaptation is similar to Adaptive scaling Metropolis: if swap $x^{(i-1)}\leftrightarrow x^{(i)}$ is proposed the $k$:th time, the parameter is updated as follows:

\[\rho_k^{(i)} = \rho_{k-1}^{(i)} + \gamma_k (\alpha_k^{(\text{swap }i)} - \alpha_*),\]

where $\alpha_k^{(\text{swap }i)}$ is the swap probability.

## Restarting simulation

Simulation can be restarted, or continued after one simulation. Here is an example:

```
using AdaptiveMCMC, Random
log_p(x) = -.5*sum(x.^2)
Random.seed!(12345)
# Simulate 200 iterations first:
out = adaptive_rwm(zeros(2), log_p, 200)
# Simulate 100 iterations more:
out2 = adaptive_rwm(out.X[:,end], log_p, 100; Sp=out.S, Rp=out.R, indp=200)
# This results in exactly the same output as simulating 300 samples in one go:
Random.seed!(12345)
out2_ = adaptive_rwm(zeros(2), log_p, 300)
```

## Using individual modules in a custom setting

In many cases, the simple samplers provided by `adaptive_rwm`

are not sufficient, but a custmised sampler is necessary. For instance:

- Adaptative sampler is used only for certain parameters, whilst others are updated by another MCMC scheme, such as with Gibbs moves.
- The sampler state is large, and simulations cannot be saved in memory.

The package provides simple building blocks which allow such custom scenarios. Here is a simple example how the individual components can be used:

```
using AdaptiveMCMC
# Sampler in R^d
function mySampler(log_p, n, x0)
# Initialise random walk sampler state: r.x current state, r.y proposal
r = RWMState(x0)
# Initialise Adaptive Metropolis state (with default parameters)
s = AdaptiveMetropolis(x0)
# Other adaptations are: AdaptiveScalingMetropolis,
# AdaptiveScalingWithinAdaptiveMetropolis, and RobustAdaptiveMetropolis
X = zeros(eltype(x0), length(x0), n) # Allocate output storage
p_x = log_p(r.x) # = log_p(x0); the initial log target
for k = 1:n
# Draw new proposal r.x -> r.y:
draw!(r, s)
p_y = log_p(r.y) # Calculate log target at proposal
alpha = min(one(p_x), exp(p_y - p_x)) # The Metropolis acceptance probability
if rand() <= alpha
p_x = p_y
# This 'accepts', or interchanges r.x <-> r.y:
# (NB: do not do r.x = r.y; these are (pointers to) vectors!)
accept!(r)
end
# Do the adaptation update:
adapt!(s, r, alpha, k)
X[:,k] = r.x # Save the current sample
end
X
end
# Standard normal target for testing
normal_log_p(x) = -mapreduce(e->e*e, +, x)/2
# Run 1M iterations of the sampler targetting 30d standard Normal:
X = mySampler(normal_log_p, 1_000_000, zeros(30))
```

See Random-walk sampler state and Adaptation state for more details about these components.