Log transformation in Stan
In the previous installment of this series I discussed the ubiquitous logarithm transformation, and how to do the Jacobian adjustment in the context of the classic Metropolis algorithm. Nowadays, however, all the cool kids are using Stan (https://mc-stan.org/) for drawing their MCMC samples. A common question on the Stan message board is whether we still have to care about such adjustments. We'll explore this issue by, again, sampling from the Gamma(3,1) distribution.
We will define two Stan files. The first file contains three models, one of which is selected by supplying an integer as data; the fourth model lives in a separate file.
- The first model uses no transformation at all, so the positivity constraint on theta is ignored.
- In the second model the samples are (assumed to be) log transformed, but there's no Jacobian adjustment.
- The third model is the same as the second, but now with the Jacobian adjustment.
- The fourth model uses Stan's built-in lower-bound constraint.
Enough with the blabber; here is the code for the models. Note that, as before, only the inverse transform appears in the code:
data {
  int s;
}
parameters {
  real theta;
}
model {
  if (s == 0) {
    // Model 1: no transformation; will result in divergences
    target += gamma_lpdf(theta | 3, 1);
  } else if (s == 1) {
    // Model 2: transformation but no Jacobian adjustment
    target += gamma_lpdf(exp(theta) | 3, 1);
  } else if (s == 2) {
    // Model 3: transformation and Jacobian adjustment
    target += gamma_lpdf(exp(theta) | 3, 1);
    // log |d/dtheta exp(theta)| = log(exp(theta)) = theta
    target += theta;
  }
}
// Model 4: transformation and Jacobian adjustment under the hood
data {
}
parameters {
  real<lower = 0> theta;
}
model {
  target += gamma_lpdf(theta | 3, 1);
}
When running the first model (R code is here), Stan will complain about divergences: the sampler is not happy about the hard support boundary at zero.
Warning messages: 1: There were 241 divergent transitions after warmup. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup to find out why this is a problem and how to eliminate them. 2: Examine the pairs() plot to diagnose sampling problems
Despite this problem, the samples approximate the target distribution well (see the figure below). A simple remedy for the divergences is to log transform the parameter theta. Or, in other words, we again sort of pretend that the sampler works on the log scale, and we use the inverse of that transform in our model.
The second model implements the log transformation, but there's no Jacobian adjustment. The samples from this model do not approximate the Gamma(3,1) distribution.
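To see what the unadjusted model actually targets, we can normalize its density numerically. Here is a minimal Python sketch of that check (the grid bounds and spacing are arbitrary choices, and this is not part of the post's original R code):

```python
import numpy as np
from scipy import stats

# grid over theta, i.e. the log scale
theta = np.linspace(-10.0, 5.0, 20001)
dtheta = theta[1] - theta[0]

# Model 2's unnormalized density: Gamma(3,1) pdf at exp(theta), no Jacobian
unnorm = stats.gamma.pdf(np.exp(theta), a=3)
dens = unnorm / (unnorm.sum() * dtheta)

# mean of exp(theta) under this (wrong) distribution
mean_x = np.sum(np.exp(theta) * dens) * dtheta
print(mean_x)  # about 2, not the Gamma(3,1) mean of 3
```

So the back-transformed samples from model 2 concentrate around the wrong mean, which is exactly the mismatch visible in the histograms.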
In the third model the Jacobian is taken into account. Samples from this model do approximate the Gamma(3,1) distribution.
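The same log-scale-plus-Jacobian density can also be checked outside Stan with a small random-walk Metropolis sampler, as in the previous installment. A Python sketch (illustrative only; the seed, step size, and warmup length are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(theta):
    # Gamma(3,1) log-density at exp(theta), up to a constant,
    # plus the log Jacobian theta: 2*theta - exp(theta) + theta
    return 3.0 * theta - np.exp(theta)

theta = 0.0
draws = []
for _ in range(50_000):
    prop = theta + rng.normal(scale=1.0)  # arbitrary step size
    if np.log(rng.uniform()) < log_target(prop) - log_target(theta):
        theta = prop
    draws.append(theta)

x = np.exp(np.array(draws[5_000:]))  # back-transform, drop warmup
print(x.mean())  # close to 3, the Gamma(3,1) mean
```

Because the sampler moves on the whole real line, no proposal can ever step outside the support, which is why the divergences disappear.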
In the fourth model (which is in a separate file) we use Stan's built-in constraint for θ. Now the transformation and Jacobian adjustment are done under the hood.
Note that since Stan operates on the log posterior, we also have to use the log of the Jacobian: instead of multiplying the posterior density by the Jacobian, we add its logarithm to the log posterior.
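For the exp transform the log Jacobian is particularly simple: with x = exp(theta), log |dx/dtheta| = log(exp(theta)) = theta. A quick numerical sanity check in Python (the test point theta = 0.7 is an arbitrary choice):

```python
import numpy as np
from scipy import stats

theta = 0.7  # arbitrary test point

# log-density of Gamma(3,1) at exp(theta), plus the log Jacobian theta
adjusted = stats.gamma.logpdf(np.exp(theta), a=3) + theta

# compare with the density of log(X), X ~ Gamma(3,1), obtained by a
# finite difference of its CDF: P(log X <= t) = P(X <= exp(t))
h = 1e-5
cdf = lambda t: stats.gamma.cdf(np.exp(t), a=3)
numeric = np.log((cdf(theta + h) - cdf(theta - h)) / (2 * h))

print(abs(adjusted - numeric))  # agrees to several decimal places
```

This is exactly the `target += theta;` line in model 3.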
Here are histograms of the samples against the target distribution for each of the models:
It's clear that in the model where the samples are log transformed, the Jacobian adjustment is needed. Again, if you use Stan's built-in constraints, all of this is done under the hood for you.