Introduction

This package provides implementations of some general-purpose random-walk based adaptive MCMC algorithms, including the following:

Adaptive Metropolis, proposal covariance adaptation, (Haario, Saksman and Tamminen, 2001, and Andrieu and Moulines, 2006)
Adaptive scaling Metropolis, acceptance rate adaptation for scale (e.g. as in Andrieu and Thoms, 2008, and Atchadé and Fort, 2010)
Robust Adaptive Metropolis, acceptance rate adaptation for shape (Vihola, 2012)
Adaptive Parallel Tempering, acceptance rate adaptation for temperature levels (Miasojedow, Moulines and Vihola, 2013)

The aim of the package is to provide a simple and modular general-purpose implementation, which may be easily used to sample from a log-target density, but also used in a variety of custom settings.

See also AdaptiveParticleMCMC.jl which uses this package with SequentialMonteCarlo.jl for adaptive particle MCMC.

Installation

To get the latest registered version:

using Pkg
Pkg.add("AdaptiveMCMC")

To install the latest development version:

using Pkg
Pkg.add(url="https://github.com/mvihola/AdaptiveMCMC.jl")

With adaptive parallel tempering

If the keyword argument L is greater than one, then the adaptive parallel tempering algorithm (APT) of Miasojedow, Moulines & Vihola (2013) is used. This can greatly improve mixing with multimodal distributions.

Here is a simple multimodal distribution sampled with normal adaptive random walk Metropolis, and with APT:

# Multimodal target of dimension d.
function multimodalTarget(d::Int, sigma2=0.1^2, sigman=sigma2)
    # The means of mixtures
    m = [2.18 5.76; 3.25 3.47; 5.41 2.65; 4.93 1.50; 8.67 9.59;
         1.70 0.50; 2.70 7.88; 1.83 0.09; 4.24 8.48; 4.59 5.60;
         4.98 3.70; 2.26 0.31; 8.41 1.68; 6.91 5.81; 1.14 2.39;
         5.54 6.86; 3.93 8.82; 6.87 5.40; 8.33 9.50; 1.69 8.11]'
    n_m = size(m,2)
    @assert d>=2 "Dimension should be >= 2"
    let m=m, n_m=size(m,2), d=d
        function log_p(x::Vector{Float64})
            l_dens = -0.5*(mapslices(sum, (m.-x[1:2]).^2, dims=1)/sigma2)
            if d>2
                l_dens .-= 0.5*mapslices(sum, x[3:d].^2, dims=1)/sigman
            end
            l_max = maximum(l_dens) # Prevent underflow by log-sum trick
            l_max + log(sum(exp.(l_dens.-l_max)))
        end
    end
end

using AdaptiveMCMC
n = 100_000; L = 2
rwm = adaptive_rwm(zeros(2), multimodalTarget(2), n; thin=10)
apt = adaptive_rwm(zeros(2), multimodalTarget(2), div(n,L); L = L, thin=10)

# Assuming you have 'Plots' installed:
using Plots
plot(scatter(rwm.X[1,:], rwm.X[2,:], title="w/o tempering", legend=:none),
scatter(apt.X[1,:], apt.X[2,:], title="w/ tempering", legend=:none), layout=(1,2))

What the APT is actually based on? Parallel tempering is a MCMC algorithm which samples from a product density proporitional to:

\[\prod_{i=1}^L p^{\beta(i)}(x^{(i)}),\]

where (the 'inverse temperatures') $1 = \beta(1) > \beta(2) > \cdots > \beta(L) > 0$.

In the end, the 'first level' is of interest (and samples of the first level are usually used for estimation), whereas the tempered levels $i=2,\ldots,L$ are auxiliary, which help the sampler to move between modes of a multi-modal target. The easier moving is because the tempered densities $p^{\beta(i)}$ are 'flatter' than $p$ for any $\beta(i)<1$.

The sampler consists of two types of MCMC moves:

Independent adaptive random-walk Metropolis moves on individual levels $i$, targetting tempered densities $p^{\beta(i)}$.
Switch moves, where swaps of adjacent levels $x^{(i)} \leftrightarrow x^{(i+1)}$ are proposed, and the moves are accepted with (Metropolis-Hastings) probability $\min\big\{1, \frac{p^{\beta(i)-\beta(i+1)}(x^{(i+1)})}{p^{\beta(i)-\beta(i+1)}(x^{(i)})}\big\}$.

In the APT, each random walk sampler for each individual level is adapted totally independently, following exactly the same mechanism as before. Additionally, the APT adapts the inverse temperatures $\beta(2),\ldots,\beta(L)$, in order to reach the average switch probability $0.234$. More precisely, adaptation mechanism tunes the parameters $\rho^{(i)}$, which determine

\[\frac{1}{\beta^{(i)}} = \frac{1}{\beta^{(i-1)}} + e^{\rho^{(i)}},\]

and the adaptation is similar to Adaptive scaling Metropolis: if swap $x^{(i-1)}\leftrightarrow x^{(i)}$ is proposed the $k$:th time, the parameter is updated as follows:

\[\rho_k^{(i)} = \rho_{k-1}^{(i)} + \gamma_k (\alpha_k^{(\text{swap }i)} - \alpha_*),\]

where $\alpha_k^{(\text{swap }i)}$ is the swap probability.

Restarting simulation

Simulation can be restarted, or continued after one simulation. Here is an example:

using AdaptiveMCMC, Random
log_p(x) = -.5*sum(x.^2)
Random.seed!(12345)
# Simulate 200 iterations first:
out = adaptive_rwm(zeros(2), log_p, 200)
# Simulate 100 iterations more:
out2 = adaptive_rwm(out.X[:,end], log_p, 100; Sp=out.S, Rp=out.R, indp=200)
# This results in exactly the same output as simulating 300 samples in one go:
Random.seed!(12345)
out2_ = adaptive_rwm(zeros(2), log_p, 300)

Using individual modules in a custom setting

In many cases, the simple samplers provided by adaptive_rwm are not sufficient, but a custmised sampler is necessary. For instance:

Adaptative sampler is used only for certain parameters, whilst others are updated by another MCMC scheme, such as with Gibbs moves.
The sampler state is large, and simulations cannot be saved in memory.

The package provides simple building blocks which allow such custom scenarios. Here is a simple example how the individual components can be used:

using AdaptiveMCMC

# Sampler in R^d
function mySampler(log_p, n, x0)

    # Initialise random walk sampler state: r.x current state, r.y proposal
    r = RWMState(x0)

    # Initialise Adaptive Metropolis state (with default parameters)
    s = AdaptiveMetropolis(x0)
    # Other adaptations are: AdaptiveScalingMetropolis,
    # AdaptiveScalingWithinAdaptiveMetropolis, and RobustAdaptiveMetropolis

    X = zeros(eltype(x0), length(x0), n) # Allocate output storage
    p_x = log_p(r.x)                     # = log_p(x0); the initial log target
    for k = 1:n

        # Draw new proposal r.x -> r.y:
        draw!(r, s)

        p_y = log_p(r.y)                      # Calculate log target at proposal
        alpha = min(one(p_x), exp(p_y - p_x)) # The Metropolis acceptance probability

        if rand() <= alpha
            p_x = p_y

            # This 'accepts', or interchanges r.x <-> r.y:
            # (NB: do not do r.x = r.y; these are (pointers to) vectors!)
            accept!(r)
        end

        # Do the adaptation update:
        adapt!(s, r, alpha, k)

        X[:,k] = r.x   # Save the current sample
     end
    X
end

# Standard normal target for testing
normal_log_p(x) = -mapreduce(e->e*e, +, x)/2

# Run 1M iterations of the sampler targetting 30d standard Normal:
X = mySampler(normal_log_p, 1_000_000, zeros(30))

See Random-walk sampler state and Adaptation state for more details about these components.

Introduction

Installation

Sampling from log-posteriors

With adaptive parallel tempering

Restarting simulation

Using individual modules in a custom setting