Adaptation state

The state of each adaptive algorithm is encapsulated in a subtype of the abstract type AdaptState. Each adaptive algorithm implements two methods:

  • draw!(::RWMState, ::AdaptState)
  • adapt!(::AdaptState, ::RWMState, α)

The draw! method forms proposals of the form

\[Y_k = X_{k-1} + S_{k-1} U_k, \quad U_k \sim q,\]

where $q$ is the 'prototype proposal' as discussed in Random walk sampler state, and the form of the 'shape' matrix factor $S_{k-1}$ depends on the chosen adaptive algorithm.

The adapt! method updates the adaptation state based on the current state of the random walk sampler. Some of the adaptation algorithms use the third argument of adapt!: the acceptance probability $\alpha\in[0,1]$.

Adaptive Metropolis

AdaptiveMCMC.AdaptiveMetropolisType
r = AdaptiveMetropolis(x0; kwargs)
r = AdaptiveMetropolis(x0, [scale, [step]])

Constructor for AdaptiveMetropolis state.

Arguments

  • x0: The initial state vector

Keyword arguments

  • scale: Scaling parameter; default 2.38/sqrt(d) where d is dimension.
  • step: Step size object; default PolynomialStepSize(1.0)
  • S_init: Initial proposal shape Cholesky factor; default identity_cholesky(x0)

If s is an RWMState, proposal samples may be drawn by calling draw!(s, r), and adaptation is performed with adapt!(r, s) or adapt_rb!(r, s, α).

source
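For illustration, a complete sampling loop with this interface might look as follows. This is a schematic sketch, not verbatim package code: it assumes the exported names RWMState, draw!, adapt! and accept!, that the RWMState stores the current state and the proposal in fields x and y, and a user-supplied log-target density log_p.

```julia
using AdaptiveMCMC

log_p(x) = -0.5*sum(abs2, x)        # example target: standard normal

s = RWMState(zeros(2))              # random walk sampler state
r = AdaptiveMetropolis(zeros(2))    # AM adaptation state

for k in 1:10_000
    draw!(s, r)                     # propose Y_k = X_{k-1} + S_{k-1} U_k
    α = min(1.0, exp(log_p(s.y) - log_p(s.x)))
    rand() < α && accept!(s)        # Metropolis accept/reject
    adapt!(r, s, α)                 # update the adaptation state
end
```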

This is the seminal Adaptive Metropolis (AM) algorithm of Haario, Saksman & Tamminen (2001), where the proposal increment shape is

\[S_{k-1} = s_d L_{k-1},\]

where $s_d$ is a fixed scaling, defaulting to $s_d = \frac{2.38}{\sqrt{d}}$ with $d$ the dimension, and $L_{k-1}$ is the Cholesky factor of the estimated covariance matrix.

The covariance matrix is estimated using the recursive Robbins-Monro stochastic approximation update of Andrieu & Moulines (2006), that is,

\[\begin{aligned} \mu_k &= (1-\gamma_k)\mu_{k-1} + \gamma_k X_k, \\ \Sigma_k &= (1-\gamma_k)\Sigma_{k-1} + \gamma_k (X_k - \mu_{k-1})(X_k - \mu_{k-1})^T, \end{aligned}\]

where $\gamma_k$ is one of the Step sizes.

In fact, the $\Sigma_k$ factors are not calculated directly as given above; instead, their Cholesky factors are updated sequentially using rank-1 updates. This is more efficient, as calculating a full Cholesky factorisation costs $O(d^3)$, whereas a rank-1 update costs $O(d^2)$.
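To make the recursion concrete, here is a minimal self-contained sketch of one such update in plain Julia. The function name am_update! and the use of LinearAlgebra's lowrankupdate! are illustrative; this is not the package's internal implementation.

```julia
using LinearAlgebra

# One step of the AM recursion, carried on the Cholesky factorisation:
# Σ_k = (1-γ)Σ_{k-1} + γ (x-μ_{k-1})(x-μ_{k-1})ᵀ, updated in O(d²).
function am_update!(C::Cholesky, μ::Vector, x::Vector, γ::Real)
    v = sqrt(γ) .* (x .- μ)          # rank-1 direction √γ (x - μ_{k-1})
    μ .= (1 - γ) .* μ .+ γ .* x      # mean recursion
    lmul!(sqrt(1 - γ), C.factors)    # factor of (1-γ)Σ_{k-1}
    lowrankupdate!(C, v)             # rank-1 update instead of refactorising
    return C
end
```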

The empirical covariances $\Sigma_k$ of the AM converge to the true target covariance (under technical conditions).

Robust adaptive Metropolis

AdaptiveMCMC.RobustAdaptiveMetropolisType
r = RobustAdaptiveMetropolis(x0; kwargs)
r = RobustAdaptiveMetropolis(x0, [acc_target, [step]])

Constructor for RobustAdaptiveMetropolis state.

Arguments

  • x0: The initial state vector

Keyword arguments

  • acc_target: Desired mean accept rate; default 0.234.
  • step: Step size object; default RAMStepSize(0.66,d) where d is state dimension.
  • S_init: Initial proposal shape Cholesky factor; default identity_cholesky(x0)

If s is an RWMState, proposal samples may be drawn by calling draw!(s, r), and adaptation is performed with adapt!(r, s, α).

source

This is the Robust Adaptive Metropolis (RAM) of Vihola, 2010. In this algorithm, the shape $S_{k}$ is adapted directly by Cholesky rank-1 updates:

\[S_k S_k^T = S_{k-1} \bigg( I + \gamma_k (\alpha_k - \alpha_*) \frac{U_k U_k^T}{\| U_k \|^2} \bigg) S_{k-1}^T,\]

where $\alpha_k$ is the acceptance probability of the move $X_{k-1} \to Y_k$, $U_k$ is the 'prototype proposal' used when forming the proposal $Y_k$, and $\alpha_*\in(0,1)$ is the desired acceptance rate, defaulting to $0.234$.
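The update amounts to a single Cholesky rank-1 update or downdate, depending on the sign of $\gamma_k(\alpha_k - \alpha_*)$. A self-contained sketch (the function name ram_update! is illustrative, not the package's internal):

```julia
using LinearAlgebra

# One RAM shape update: S_k S_kᵀ = S_{k-1}(I + η uuᵀ/‖u‖²)S_{k-1}ᵀ
# with η = γ(α - α_*), done as a rank-1 (down)date of the factorisation.
# The downdate branch assumes the result stays positive definite,
# which holds when the step size γ is small enough.
function ram_update!(C::Cholesky, u::Vector, γ::Real, α::Real, αstar::Real=0.234)
    η = γ * (α - αstar)
    v = (sqrt(abs(η)) / norm(u)) .* (C.L * u)   # √|η| S_{k-1} u / ‖u‖
    η >= 0 ? lowrankupdate!(C, v) : lowrankdowndate!(C, v)
    return C
end
```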

In the case of a finite-variance elliptically symmetric target distribution (and under technical conditions), the shape parameter $S_k$ converges to a Cholesky factor of a matrix proportional to the target covariance. In general, however, the limiting shapes of the RAM and the AM can differ.

The RAM appears to have an empirical advantage over the AM in certain cases, such as higher dimensions, but the RAM is not directly applicable with, for instance, Pseudo-marginal algorithms.

Adaptive scaling Metropolis

AdaptiveMCMC.AdaptiveScalingMetropolisType
r = AdaptiveScalingMetropolis(x0; kwargs)
r = AdaptiveScalingMetropolis(x0, [acc_target, [scale, [step]]])
r = AdaptiveScalingMetropolis(acc_target, scale, step)

Constructor for AdaptiveScalingMetropolis state.

Arguments

  • x0: The initial state vector (not used).

Keyword arguments

  • acc_target: Desired mean accept rate; default 0.44 for univariate and 0.234 for multivariate.
  • scale: Initial scaling; default 1.0.
  • step: Step size object; default PolynomialStepSize(0.66).

If s is an RWMState, proposal samples may be drawn by calling draw!(s, r), and adaptation is performed with adapt!(r, s, α).

source

Adaptive scaling Metropolis (ASM) implements a version of adaptive scaling first suggested in the preprint of Andrieu and Robert, 2001; the version implemented in AdaptiveMCMC.jl is due to Andrieu and Thoms, 2008 and Atchadé and Fort, 2010.

In the ASM, the adaptation parameter $S_k$ is a scalar, and it has the following dynamic:

\[\log S_k = \log S_{k-1} + \gamma_k (\alpha_k - \alpha_*).\]

The update has similarities with RAM, which may be regarded as a multivariate generalisation of this rule.
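In code, the rule is a one-liner on the log scale; a sketch (asm_update is an illustrative name, not the package's API):

```julia
# One ASM update: log S_k = log S_{k-1} + γ (α - α_*), i.e. the scale
# inflates when the acceptance probability α exceeds the target α_*
# and shrinks otherwise.
asm_update(S::Real, γ::Real, α::Real, αstar::Real=0.234) = S * exp(γ * (α - αstar))
```

Iterating this rule with diminishing step sizes $\gamma_k$ drives the average acceptance rate towards $\alpha_*$.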

Adaptive scaling within adaptive Metropolis

AdaptiveMCMC.AdaptiveScalingWithinAdaptiveMetropolisType
r = AdaptiveScalingWithinAdaptiveMetropolis(x0; kwargs)
r = AdaptiveScalingWithinAdaptiveMetropolis(x0, [acc_target, 
      [scale, [stepAM, [stepASM]]]])

Constructor for AdaptiveScalingWithinAdaptiveMetropolis state.

Arguments

  • x0: The initial state vector.

Keyword arguments

  • acc_target: Desired mean accept rate; default 0.234.
  • scale: Initial scaling; default 2.38/sqrt(d) where d is the dimension.
  • stepAM: Step size object for covariance adaptation; default PolynomialStepSize(0.66).
  • stepASM: Step size object for scaling adaptation; default PolynomialStepSize(0.66).

If s is RWMState, then proposal samples may be drawn calling draw!(s, r) and adaptation is performed with adapt!(r, s, α) or adapt_rb!(r, s, α).

source

The adaptive scaling within adaptive Metropolis combines the AM and ASM, as suggested (at least) in Andrieu and Thoms, 2008. That is:

  • The scaling $\theta_k$ is updated as in the ASM
  • The covariance $\Sigma_k$ is updated as in AM, with Cholesky factor $L_k$
  • The two are combined into the proposal shape: $S_k = \theta_k L_k$
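Combined, a proposal increment is drawn as follows (a sketch with illustrative names):

```julia
using LinearAlgebra

# ASWAM proposal increment: S_k U = θ_k (L_k U), where the scalar θ_k
# follows the ASM scaling recursion and L_k the AM Cholesky recursion.
aswam_increment(θ::Real, L::LowerTriangular, u::Vector) = θ .* (L * u)
```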

This algorithm is often more robust than the AM, but it is often outperformed by the simpler RAM.

'Rao-Blackwellised' update

The covariance states of AdaptiveMetropolis and AdaptiveScalingWithinAdaptiveMetropolis also support the alternative adaptation suggested in Andrieu and Thoms, 2008. The alternative update call is the following:

AdaptiveMCMC.adapt_rb!Function

adapt_rb!(s, r, α)

Rao-Blackwellised adaptation step of Adaptive Metropolis as suggested by Andrieu & Thoms (Statist. Comput. 2008).

Arguments:

  • s: AdaptiveMetropolis object
  • r: RWMState object
  • α: Acceptance rate

NB: This function should be called before calling accept!(r).

source

This update is not based on the new accepted state, but on a convex combination of the current and proposed states, weighted by the acceptance probability:

\[\begin{aligned} \mu_k &= (1-\gamma_k)\mu_{k-1} + \gamma_k \big[(1-\alpha_k) X_{k-1} + \alpha_k Y_k\big], \\ \Sigma_k &= (1-\gamma_k)\Sigma_{k-1} + \gamma_k \big[(1-\alpha_k)(X_{k-1} - \mu_{k-1})(X_{k-1} - \mu_{k-1})^T + \alpha_k (Y_{k} - \mu_{k-1})(Y_{k} - \mu_{k-1})^T\big]. \end{aligned}\]

This update is 'Rao-Blackwellised' in the sense that it may be regarded as a conditional expectation of the AM adaptation rule, integrating out the accept/reject decision. The Rao-Blackwellised rule can slightly stabilise the covariance adaptation.
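As with the standard AM update, this recursion can be carried on the Cholesky factor with two rank-1 updates per iteration; a self-contained sketch (am_update_rb! is an illustrative name, not the package's internal):

```julia
using LinearAlgebra

# One Rao-Blackwellised AM update: mean/covariance recursion using both
# the current state x and the proposal y, weighted by the acceptance
# probability α (called before the accept/reject decision).
function am_update_rb!(C::Cholesky, μ::Vector, x::Vector, y::Vector, α::Real, γ::Real)
    vx = sqrt(γ * (1 - α)) .* (x .- μ)   # weight (1-α) on current state
    vy = sqrt(γ * α) .* (y .- μ)         # weight α on proposed state
    μ .= (1 - γ) .* μ .+ γ .* ((1 - α) .* x .+ α .* y)
    lmul!(sqrt(1 - γ), C.factors)        # factor of (1-γ)Σ_{k-1}
    lowrankupdate!(C, vx)
    lowrankupdate!(C, vy)
    return C
end
```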