# Adaptation state

The state of each adaptive algorithm is encapsulated in a subtype of the abstract type `AdaptState`. Each adaptive algorithm implements two methods:

`draw!(::RWMState, ::AdaptState)`

`adapt!(::AdaptState, ::RWMState, α)`

The `draw!` method forms proposals of the form

\[Y_k = X_{k-1} + S_{k-1} U_k, \quad U_k \sim q,\]

where $q$ is the 'prototype proposal' as discussed in Random walk sampler state, and the form of the 'shape' matrix factor $S_{k-1}$ depends on the chosen adaptive algorithm.

The `adapt!` method updates the adaptation state based on the state of the random walk sampler. Some of the adaptation algorithms use the third argument of `adapt!`: the acceptance probability $\alpha\in[0,1]$.
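A typical sampling loop combining these two methods might look as follows. This is an illustrative sketch, not a prescribed usage pattern: the standard Gaussian target `log_p` and the sampler state fields `s.x` (current point) and `s.y` (proposal) are assumptions of the sketch, and the log-density is recomputed each iteration for simplicity.

```julia
using AdaptiveMCMC

# Assumed example target: standard Gaussian log-density.
log_p(x) = -0.5 * sum(abs2, x)

x0 = zeros(2)
s = RWMState(x0)               # random walk sampler state
r = AdaptiveMetropolis(x0)     # any AdaptState subtype is used similarly

for k in 1:10_000
    draw!(s, r)                # propose Y_k = X_{k-1} + S_{k-1} U_k
    α = min(1.0, exp(log_p(s.y) - log_p(s.x)))
    rand() <= α && accept!(s)  # Metropolis accept/reject
    adapt!(r, s, α)            # update the adaptation state
end
```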

## Adaptive Metropolis

`AdaptiveMCMC.AdaptiveMetropolis` — Type

```julia
r = AdaptiveMetropolis(x0; kwargs)
r = AdaptiveMetropolis(x0, [scale, [step]])
```

Constructor for `AdaptiveMetropolis` state.

**Arguments**

- `x0`: The initial state vector.

**Keyword arguments**

- `scale`: Scaling parameter; default `2.38/sqrt(d)` where `d` is the dimension.
- `step`: Step size object; default `PolynomialStepSize(1.0)`.
- `S_init`: Initial proposal shape Cholesky factor; default `identity_cholesky(x0)`.

If `s` is an `RWMState`, then proposal samples may be drawn by calling `draw!(s, r)`, and adaptation is performed with `adapt!(r, s)` or `adapt_rb!(r, s, α)`.

This is the seminal Adaptive Metropolis (AM) algorithm of Haario, Saksman & Tamminen (2001), where the proposal increment shape is

\[S_{k-1} = s_d L_{k-1},\]

where $s_d$ is a fixed scaling defaulting to $s_d = \frac{2.38}{\sqrt{d}}$ with $d$ the dimension, and $L_{k-1}$ is the Cholesky factor of the estimated covariance matrix.

The covariance matrix is estimated using the recursive Robbins-Monro stochastic approximation update of Andrieu & Moulines (2006), that is,

\[\begin{aligned} \mu_k &= (1-\gamma_k)\mu_{k-1} + \gamma_k X_k, \\ \Sigma_k &= (1-\gamma_k)\Sigma_{k-1} + \gamma_k (X_k - \mu_{k-1})(X_k - \mu_{k-1})^T, \end{aligned}\]

where $\gamma_k$ is one of the Step sizes.

In fact, the $\Sigma_k$ factors are not calculated directly as given above; instead, their Cholesky factors are updated sequentially using rank-1 updates. This is more efficient, as computing a full Cholesky factorisation is $O(d^3)$, whereas a rank-1 update is $O(d^2)$.
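One such Cholesky-based step can be sketched with the rank-1 update from Julia's `LinearAlgebra` standard library. Here `am_update` is a hypothetical helper for illustration, not part of the package's API:

```julia
using LinearAlgebra

# One step of the AM mean/covariance recursion, maintaining a Cholesky
# factorisation of Σ: Σ_new = (1-γ)Σ + γ d dᵀ is obtained by scaling the
# factor by sqrt(1-γ) and then applying an O(d²) rank-1 update with sqrt(γ) d.
function am_update(μ::Vector, C::Cholesky, x::Vector, γ::Real)
    d = x .- μ
    μ_new = μ .+ γ .* d
    C_scaled = Cholesky(UpperTriangular(sqrt(1 - γ) .* C.U))
    C_new = lowrankupdate(C_scaled, sqrt(γ) .* d)
    return μ_new, C_new
end
```

`Matrix(C_new)` reconstructs $\Sigma_k$ if needed; maintaining only the factor is what makes the scheme $O(d^2)$ per step.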

The empirical covariances $\Sigma_k$ of the AM converge to the true target covariance (under technical conditions).

## Robust adaptive Metropolis

`AdaptiveMCMC.RobustAdaptiveMetropolis` — Type

```julia
r = RobustAdaptiveMetropolis(x0; kwargs)
r = RobustAdaptiveMetropolis(x0, [acc_target, [step]])
```

Constructor for `RobustAdaptiveMetropolis` state.

**Arguments**

- `x0`: The initial state vector.

**Keyword arguments**

- `acc_target`: Desired mean accept rate; default `0.234`.
- `step`: Step size object; default `RAMStepSize(0.66, d)` where `d` is the state dimension.
- `S_init`: Initial proposal shape Cholesky factor; default `identity_cholesky(x0)`.

If `s` is an `RWMState`, then proposal samples may be drawn by calling `draw!(s, r)`, and adaptation is performed with `adapt!(r, s, α)`.

This is the Robust Adaptive Metropolis (RAM) of Vihola, 2010. In this algorithm, the shape $S_{k}$ is adapted directly by Cholesky rank-1 updates:

\[S_k S_k^T = S_{k-1} \bigg( I + \gamma_k (\alpha_k - \alpha_*) \frac{U_k U_k^T}{\| U_k \|^2} \bigg) S_{k-1}^T,\]

where $\alpha_k$ is the acceptance probability of the move $X_{k-1} \to Y_k$, $U_k$ is the 'prototype proposal' used when forming the proposal $Y_k$, and $\alpha_*\in(0,1)$ is the desired acceptance rate, defaulting to $0.234$.
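The rank-1 form of this update can be sketched as follows; `ram_update` is a hypothetical illustration, not the package's internal implementation:

```julia
using LinearAlgebra

# RAM shape update: S_new S_newᵀ = S (I + η u uᵀ/‖u‖²) Sᵀ with η = γ(α - α⋆).
# Since S (u uᵀ/‖u‖²) Sᵀ = v vᵀ with v = S u/‖u‖, this is a Cholesky
# rank-1 update (η ≥ 0) or downdate (η < 0) of S Sᵀ.
function ram_update(S::LowerTriangular, u::Vector, α::Real, α_star::Real, γ::Real)
    v = (S * u) ./ norm(u)
    η = γ * (α - α_star)
    C = Cholesky(S)
    C_new = η >= 0 ? lowrankupdate(C, sqrt(η) .* v) :
                     lowrankdowndate(C, sqrt(-η) .* v)
    return LowerTriangular(C_new.L)
end
```

A downdate can fail if the step is too large; in practice, valid step-size sequences keep the update well defined.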

In the case of a finite-variance elliptically symmetric target distribution (and under technical conditions), the shape parameter $S_k$ converges to a Cholesky factor of a matrix proportional to the target covariance. In general, however, the limiting shapes of the RAM and the AM can differ.

The RAM appears to have an empirical advantage over the AM in certain cases, such as in higher dimensions, but the RAM is not directly applicable with, for instance, Pseudo-marginal algorithms.

## Adaptive scaling Metropolis

`AdaptiveMCMC.AdaptiveScalingMetropolis` — Type

```julia
r = AdaptiveScalingMetropolis(x0; kwargs)
r = AdaptiveScalingMetropolis(x0, [acc_target, [scale, [step]]])
r = AdaptiveScalingMetropolis(acc_target, scale, step)
```

Constructor for `AdaptiveScalingMetropolis` state.

**Arguments**

- `x0`: The initial state vector (not used).

**Keyword arguments**

- `acc_target`: Desired mean accept rate; default `0.44` for univariate and `0.234` for multivariate.
- `scale`: Initial scaling; default `1.0`.
- `step`: Step size object; default `PolynomialStepSize(0.66)`.

If `s` is an `RWMState`, then proposal samples may be drawn by calling `draw!(s, r)`, and adaptation is performed with `adapt!(r, s, α)`.

Adaptive scaling Metropolis (ASM) implements a version of adaptive scaling first suggested in the Andrieu and Robert (2001) preprint, but the version implemented by `AdaptiveMCMC.jl` is due to Andrieu and Thoms (2008) and Atchadé and Fort (2010).

In the ASM, the adaptation parameter $S_k$ is a scalar, and it has the following dynamic:

\[\log S_k = \log S_{k-1} + \gamma_k (\alpha_k - \alpha_*).\]

The update has similarities with RAM, which may be regarded as a multivariate generalisation of this rule.
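The scalar rule is simple enough to state directly in code; `asm_update` is an illustrative one-liner, not the package's API:

```julia
# Log-scale scalar adaptation: S_new = exp(log S + γ (α - α⋆)).
# Acceptance above the target inflates the scale; below it, deflates.
asm_update(S, α, α_star, γ) = exp(log(S) + γ * (α - α_star))
```

Working on the log scale guarantees the scaling stays positive regardless of the step size.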

## Adaptive scaling within adaptive Metropolis

`AdaptiveMCMC.AdaptiveScalingWithinAdaptiveMetropolis` — Type

```julia
r = AdaptiveScalingWithinAdaptiveMetropolis(x0; kwargs)
r = AdaptiveScalingWithinAdaptiveMetropolis(x0, [acc_target,
    [scale, [stepAM, [stepASM]]]])
```

Constructor for `AdaptiveScalingWithinAdaptiveMetropolis` state.

**Arguments**

- `x0`: The initial state vector.

**Keyword arguments**

- `acc_target`: Desired mean accept rate; default `0.234`.
- `scale`: Initial scaling; default `2.38/sqrt(d)` where `d` is the dimension.
- `stepAM`: Step size object for covariance adaptation; default `PolynomialStepSize(0.66)`.
- `stepASM`: Step size object for scaling adaptation; default `PolynomialStepSize(0.66)`.

If `s` is an `RWMState`, then proposal samples may be drawn by calling `draw!(s, r)`, and adaptation is performed with `adapt!(r, s, α)` or `adapt_rb!(r, s, α)`.

The adaptive scaling within adaptive Metropolis (ASWAM) combines the AM and the ASM, as suggested (at least) in Andrieu and Thoms, 2008. That is:

- The scaling $\theta_k$ is updated as in the ASM.
- The covariance $\Sigma_k$ is updated as in the AM, with Cholesky factor $L_k$.
- The two are combined into the proposal shape: $S_k = \theta_k L_k$.

This algorithm is often more robust than the AM, but it is often outperformed by the simpler RAM.

## 'Rao-Blackwellised' update

The covariance states of `AdaptiveMetropolis`

and `AdaptiveScalingWithingAdaptiveMetropolis`

also support the alternative adaptation suggested in Andrieu and Thoms, 2008. The alternative update call is the following:

`AdaptiveMCMC.adapt_rb!` — Function

```julia
adapt_rb!(s, r, α)
```

Rao-Blackwellised adaptation step of Adaptive Metropolis as suggested by Andrieu & Thoms (Statist. Comput. 2008).

**Arguments:**

- `s`: AdaptiveMetropolis object
- `r`: RWMState object
- `α`: Acceptance rate

NB: This function should be called *before* calling `accept!(r)`.

This update is not based on the new, accepted state; instead, it involves a convex combination of the current and proposed states, weighted by the acceptance probability:

\[\begin{aligned} \mu_k &= (1-\gamma_k)\mu_{k-1} + \gamma_k \big[(1-\alpha_k) X_{k-1} + \alpha_k Y_k\big], \\ \Sigma_k &= (1-\gamma_k)\Sigma_{k-1} + \gamma_k \big[(1-\alpha_k)(X_{k-1} - \mu_{k-1})(X_{k-1} - \mu_{k-1})^T + \alpha_k (Y_{k} - \mu_{k-1})(Y_{k} - \mu_{k-1})^T\big]. \end{aligned}\]

This update is 'Rao-Blackwellised' in the sense that it may be regarded as a conditional expectation of the AM adaptation rule, integrating out the accept/reject decision. The Rao-Blackwellised rule can slightly stabilise the covariance adaptation.
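The recursion above can be written out directly as a dense-matrix sketch; `am_update_rb` is an illustrative helper (the package itself operates on Cholesky factors):

```julia
# Rao-Blackwellised AM recursion: both mean and covariance updates use a
# convex combination of the current state x and the proposal y, weighted
# by the acceptance probability α, instead of the realised next state.
function am_update_rb(μ, Σ, x, y, α, γ)
    μ_new = (1 - γ) .* μ .+ γ .* ((1 - α) .* x .+ α .* y)
    dx = x .- μ
    dy = y .- μ
    Σ_new = (1 - γ) .* Σ .+ γ .* ((1 - α) .* (dx * dx') .+ α .* (dy * dy'))
    return μ_new, Σ_new
end
```

With $\alpha_k \in \{0, 1\}$ the rule reduces to the plain AM update with the realised state, which makes the conditional-expectation interpretation concrete.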