Adaptation state
The state of each adaptive algorithm is encapsulated in a subtype of the abstract type `AdaptState`. Each adaptive algorithm implements two methods:

```julia
draw!(::RWMState, ::AdaptState)
adapt!(::AdaptState, ::RWMState, α)
```
The `draw!` method forms proposals of the form
\[Y_k = X_{k-1} + S_{k-1} U_k, \quad U_k \sim q,\]
where $q$ is the 'prototype proposal' as discussed in Random walk sampler state, and the form of the 'shape' matrix factor $S_{k-1}$ depends on the chosen adaptive algorithm.
The `adapt!` method updates the adaptation state based on the state of the random walk sampler. Some of the adaptation algorithms use the third argument of `adapt!`: the acceptance probability $\alpha\in[0,1]$.
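To illustrate how the two methods interact, here is a minimal sketch of an adaptive random-walk Metropolis loop in Python (the package itself is Julia); the function name `adaptive_rwm` and the scalar adaptive-scaling rule used for the `adapt!` step are hypothetical choices for this example:

```python
import math
import random

def adaptive_rwm(log_target, x0, n, s0=1.0, gamma=lambda k: k**-0.66):
    """Sketch of a 1-d adaptive RWM loop: draw Y = X + S*U, accept/reject,
    then adapt S using the acceptance probability alpha."""
    x, s = x0, s0            # current state X_{k-1} and scalar shape S_{k-1}
    alpha_star = 0.44        # target acceptance rate (univariate default)
    chain = []
    for k in range(1, n + 1):
        u = random.gauss(0.0, 1.0)     # prototype proposal U_k ~ q
        y = x + s * u                  # 'draw!': Y_k = X_{k-1} + S_{k-1} U_k
        alpha = math.exp(min(0.0, log_target(y) - log_target(x)))
        if random.random() < alpha:    # accept/reject
            x = y
        # 'adapt!': scalar adaptive-scaling update driven by alpha
        s = math.exp(math.log(s) + gamma(k) * (alpha - alpha_star))
        chain.append(x)
    return chain, s
```

The loop shows the division of labour: forming the proposal uses only the current shape, while the adaptation step consumes the acceptance probability $\alpha$.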
Adaptive Metropolis
AdaptiveMCMC.AdaptiveMetropolis — Type

```julia
r = AdaptiveMetropolis(x0; kwargs)
r = AdaptiveMetropolis(x0, [scale, [step]])
```
Constructor for AdaptiveMetropolis state.
Arguments

- `x0`: The initial state vector.

Keyword arguments

- `scale`: Scaling parameter; default `2.38/sqrt(d)` where `d` is the dimension.
- `step`: Step size object; default `PolynomialStepSize(1.0)`.
- `S_init`: Initial proposal shape Cholesky factor; default `identity_cholesky(x0)`.
If `s` is an `RWMState`, then proposal samples may be drawn by calling `draw!(s, r)`, and adaptation is performed with `adapt!(r, s)` or `adapt_rb!(r, s, α)`.
This is the seminal Adaptive Metropolis (AM) algorithm of Haario, Saksman & Tamminen (2001), where the proposal increment shape is
\[S_{k-1} = s_d L_{k-1},\]
where $s_d$ is a fixed scaling defaulting to $s_d = \frac{2.38}{\sqrt{d}}$, with $d$ the dimension, and $L_{k-1}$ is the Cholesky factor of the estimated covariance matrix.
The covariance matrix is estimated using the recursive Robbins-Monro stochastic approximation update of Andrieu & Moulines (2006), that is,
\[\begin{aligned} \mu_k &= (1-\gamma_k)\mu_{k-1} + \gamma_k X_k, \\ \Sigma_k &= (1-\gamma_k)\Sigma_{k-1} + \gamma_k (X_k - \mu_{k-1})(X_k - \mu_{k-1})^T, \end{aligned}\]
where $\gamma_k$ is one of the Step sizes.
In fact, the $\Sigma_k$ factors are not calculated directly as given above; instead, their Cholesky factors are updated sequentially using rank-1 updates. This is more efficient, as computing a full Cholesky factorisation costs $O(d^3)$, whereas a rank-1 update costs $O(d^2)$.
The empirical covariances $\Sigma_k$ of the AM converge to the true target covariance (under technical conditions).
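The mean/covariance recursion above can be written out directly. The following Python sketch implements the mathematical update (not the package's rank-1 Cholesky form); the function name `am_update` is hypothetical:

```python
def am_update(mu, Sigma, x, gamma):
    """One Robbins-Monro update of the AM mean and covariance estimates:
    mu_k    = (1-g)*mu_{k-1} + g*x_k
    Sigma_k = (1-g)*Sigma_{k-1} + g*(x_k - mu_{k-1})(x_k - mu_{k-1})^T
    Vectors are lists, matrices are lists of lists."""
    d = len(x)
    dx = [x[i] - mu[i] for i in range(d)]        # X_k - mu_{k-1}
    mu_new = [(1 - gamma) * mu[i] + gamma * x[i] for i in range(d)]
    Sigma_new = [[(1 - gamma) * Sigma[i][j] + gamma * dx[i] * dx[j]
                  for j in range(d)] for i in range(d)]
    return mu_new, Sigma_new
```

Note that the covariance update uses the previous mean $\mu_{k-1}$, not the freshly updated $\mu_k$, matching the displayed recursion.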
Robust adaptive Metropolis
AdaptiveMCMC.RobustAdaptiveMetropolis — Type

```julia
r = RobustAdaptiveMetropolis(x0; kwargs)
r = RobustAdaptiveMetropolis(x0, [acc_target, [step]])
```
Constructor for RobustAdaptiveMetropolis state.
Arguments

- `x0`: The initial state vector.

Keyword arguments

- `acc_target`: Desired mean accept rate; default `0.234`.
- `step`: Step size object; default `RAMStepSize(0.66,d)` where `d` is the state dimension.
- `S_init`: Initial proposal shape Cholesky factor; default `identity_cholesky(x0)`.
If `s` is an `RWMState`, then proposal samples may be drawn by calling `draw!(s, r)`, and adaptation is performed with `adapt!(r, s, α)`.
This is the Robust Adaptive Metropolis (RAM) of Vihola, 2010. In this algorithm, the shape $S_{k}$ is adapted directly by Cholesky rank-1 updates:
\[S_k S_k^T = S_{k-1} \bigg( I + \gamma_k (\alpha_k - \alpha_*) \frac{U_k U_k^T}{\| U_k \|^2} \bigg) S_{k-1}^T,\]
where $\alpha_k$ is the acceptance probability of the move $X_{k-1} \to Y_k$, and $U_k$ is the 'prototype proposal' used when forming the proposal $Y_k$, and $\alpha_*\in(0,1)$ is the desired acceptance rate, defaulting to $0.234$.
In the case of finite-variance elliptically symmetric target distribution (and technical conditions), the shape parameter $S_k$ converges to a Cholesky factor of a matrix which is proportional to the covariance. However, in general, the limiting shapes of the RAM and the AM can differ.
The RAM appears to have an empirical advantage over the AM in certain cases, such as higher dimensions, but the RAM is not directly applicable with, for instance, Pseudo-marginal algorithms.
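One RAM step can be spelled out concretely for $d = 2$. The Python sketch below forms the right-hand side matrix explicitly and re-factorises it (the package instead applies an $O(d^2)$ rank-1 Cholesky update); the function name `ram_step` is hypothetical:

```python
import math

def ram_step(S, u, alpha, gamma, alpha_star=0.234):
    """One RAM update for d=2: compute
    M = S (I + gamma*(alpha - alpha_star) * u u^T / ||u||^2) S^T
    and return its lower-triangular Cholesky factor as the new S.
    S is a 2x2 lower-triangular list of lists; u is a nonzero length-2 list."""
    c = gamma * (alpha - alpha_star) / (u[0]**2 + u[1]**2)
    # A = I + c * u u^T
    A = [[1.0 + c * u[0] * u[0], c * u[0] * u[1]],
         [c * u[1] * u[0], 1.0 + c * u[1] * u[1]]]
    # M = S A S^T
    SA = [[sum(S[i][k] * A[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    M = [[sum(SA[i][k] * S[j][k] for k in range(2)) for j in range(2)]
         for i in range(2)]
    # Explicit 2x2 Cholesky factorisation M = L L^T
    l11 = math.sqrt(M[0][0])
    l21 = M[1][0] / l11
    l22 = math.sqrt(M[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]
```

When the move is accepted more often than the target rate ($\alpha_k > \alpha_*$), the shape grows in the direction of $U_k$, and shrinks in that direction otherwise.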
Adaptive scaling Metropolis
AdaptiveMCMC.AdaptiveScalingMetropolis — Type

```julia
r = AdaptiveScalingMetropolis(x0; kwargs)
r = AdaptiveScalingMetropolis(x0, [acc_target, [scale, [step]]])
r = AdaptiveScalingMetropolis(acc_target, scale, step)
```
Constructor for AdaptiveScalingMetropolis state.
Arguments

- `x0`: The initial state vector (not used).

Keyword arguments

- `acc_target`: Desired mean accept rate; default `0.44` for univariate and `0.234` for multivariate.
- `scale`: Initial scaling; default `1.0`.
- `step`: Step size object; default `PolynomialStepSize(0.66)`.
If `s` is an `RWMState`, then proposal samples may be drawn by calling `draw!(s, r)`, and adaptation is performed with `adapt!(r, s, α)`.
Adaptive scaling Metropolis (ASM) implements a version of adaptive scaling first suggested in the Andrieu and Robert, 2001 preprint; the version implemented by `AdaptiveMCMC.jl` is due to Andrieu and Thoms, 2008 and Atchadé and Fort, 2010.
In the ASM, the adaptation parameter $S_k$ is a scalar, and it has the following dynamic:
\[\log S_k = \log S_{k-1} + \gamma_k (\alpha_k - \alpha_*).\]
The update has similarities with RAM, which may be regarded as a multivariate generalisation of this rule.
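The scalar recursion is simple enough to state in one line of code. A Python sketch (the function name `asm_step` is hypothetical):

```python
import math

def asm_step(s, alpha, gamma, alpha_star=0.234):
    """One ASM update: log S_k = log S_{k-1} + gamma*(alpha - alpha_star).
    The scaling grows when the acceptance probability exceeds the target
    and shrinks when it falls below it."""
    return math.exp(math.log(s) + gamma * (alpha - alpha_star))
```

Performing the update on the log scale keeps $S_k$ strictly positive regardless of the step size.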
Adaptive scaling within adaptive Metropolis
AdaptiveMCMC.AdaptiveScalingWithinAdaptiveMetropolis — Type

```julia
r = AdaptiveScalingWithinAdaptiveMetropolis(x0; kwargs)
r = AdaptiveScalingWithinAdaptiveMetropolis(x0, [acc_target,
      [scale, [stepAM, [stepASM]]]])
```
Constructor for AdaptiveScalingWithinAdaptiveMetropolis state.
Arguments

- `x0`: The initial state vector.

Keyword arguments

- `acc_target`: Desired mean accept rate; default `0.234`.
- `scale`: Initial scaling; default `2.38/sqrt(d)` where `d` is the dimension.
- `stepAM`: Step size object for covariance adaptation; default `PolynomialStepSize(0.66)`.
- `stepASM`: Step size object for scaling adaptation; default `PolynomialStepSize(0.66)`.
If `s` is an `RWMState`, then proposal samples may be drawn by calling `draw!(s, r)`, and adaptation is performed with `adapt!(r, s, α)` or `adapt_rb!(r, s, α)`.
The adaptive scaling within adaptive Metropolis combines the AM and the ASM, as suggested (at least) in Andrieu and Thoms, 2008. That is:
- The scaling $\theta_k$ is updated as in the ASM.
- The covariance $\Sigma_k$ is updated as in the AM, with Cholesky factor $L_k$.
- The two are combined into the proposal shape: $S_k = \theta_k L_k$.
This algorithm is often more robust than the AM, but often outperformed by the simpler RAM.
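The combination step itself is a scalar-times-matrix product. A Python sketch of one such step, pairing the ASM scaling recursion with a given AM Cholesky factor (the function name `aswam_shape` is hypothetical):

```python
import math

def aswam_shape(theta, alpha, gamma, L, alpha_star=0.234):
    """One ASWAM combination step: update the scalar scaling theta as in
    the ASM, then scale the AM Cholesky factor L to get S = theta * L.
    L is a d x d list of lists."""
    theta_new = math.exp(math.log(theta) + gamma * (alpha - alpha_star))
    d = len(L)
    S = [[theta_new * L[i][j] for j in range(d)] for i in range(d)]
    return theta_new, S
```

The scaling and covariance adaptations run on separate step size sequences (`stepASM` and `stepAM` above), so their rates can be tuned independently.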
'Rao-Blackwellised' update
The covariance states of `AdaptiveMetropolis` and `AdaptiveScalingWithinAdaptiveMetropolis` also support the alternative adaptation suggested in Andrieu and Thoms, 2008. The alternative update call is the following:
AdaptiveMCMC.adapt_rb! — Function

```julia
adapt_rb!(s, r, α)
```

Rao-Blackwellised adaptation step of Adaptive Metropolis as suggested by Andrieu & Thoms (Statist. Comput. 2008).

Arguments:

- `s`: AdaptiveMetropolis object
- `r`: RWMState object
- `α`: Acceptance rate
NB: This function should be called before calling `accept!(r)`.
This update is not based on new, accepted state, but involves a convex combination of the current and proposed states, weighted by the acceptance probability:
\[\begin{aligned} \mu_k &= (1-\gamma_k)\mu_{k-1} + \gamma_k \big[(1-\alpha_k) X_{k-1} + \alpha_k Y_k\big], \\ \Sigma_k &= (1-\gamma_k)\Sigma_{k-1} + \gamma_k \big[(1-\alpha_k)(X_{k-1} - \mu_{k-1})(X_{k-1} - \mu_{k-1})^T + \alpha_k (Y_{k} - \mu_{k-1})(Y_{k} - \mu_{k-1})^T\big]. \end{aligned}\]
This update is 'Rao-Blackwellised' in the sense that it may be regarded as a conditional expectation of the AM adaptation rule, integrating out the accept/reject decision. The Rao-Blackwellised rule can stabilise the covariance adaptation slightly.
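The Rao-Blackwellised recursion differs from the plain AM update only in replacing the realised next state with the $\alpha$-weighted mixture of the current state and the proposal. A Python sketch of the mathematical update (the function name `am_update_rb` is hypothetical):

```python
def am_update_rb(mu, Sigma, x, y, alpha, gamma):
    """Rao-Blackwellised mean/covariance update: instead of the accepted
    state, use the mixture (1-alpha)*x + alpha*y of the current state x
    and the proposal y, and the matching mixture of outer products.
    Vectors are lists, matrices are lists of lists."""
    d = len(x)
    mu_new = [(1 - gamma) * mu[i] + gamma * ((1 - alpha) * x[i] + alpha * y[i])
              for i in range(d)]
    dxm = [x[i] - mu[i] for i in range(d)]   # X_{k-1} - mu_{k-1}
    dym = [y[i] - mu[i] for i in range(d)]   # Y_k    - mu_{k-1}
    Sigma_new = [[(1 - gamma) * Sigma[i][j]
                  + gamma * ((1 - alpha) * dxm[i] * dxm[j]
                             + alpha * dym[i] * dym[j])
                  for j in range(d)] for i in range(d)]
    return mu_new, Sigma_new
```

With $\alpha = 0$ the update coincides with the plain AM rule applied to the (retained) current state, and with $\alpha = 1$ it coincides with the AM rule applied to the proposal.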