Bayesian model fails to estimate

I have a Bayesian model that fails to converge.

Here is the model: a 1-compartment model with a proportional error term, LogNormal priors, and a LogNormal likelihood:

@model begin
    @param begin
        tvcl ~ LogNormal(log(1.1), 0.25)
        tvvc ~ LogNormal(log(70), 0.25)
        σ ~ truncated(Cauchy(0, 5), 0, Inf)
        C ~ LKJCholesky(2, 1.0)
        ω ~ Constrained(
            MvNormal(zeros(2), Diagonal(0.4^2 * ones(2)));
            lower=zeros(2),
            upper=fill(Inf, 2),
            init=ones(2)
        )
    end

    @random begin
        η ~ MvNormal(Diagonal(ω) * C * Diagonal(ω))
    end

    @pre begin
        # PK parameters
        CL = tvcl * exp(η[1])
        Vc = tvvc * exp(η[2])
    end

    @dynamics begin
        Central' = -CL / Vc * Central
    end

    @derived begin
        cp := @. Central / Vc
        dv ~ @. LogNormal(log(cp), cp * σ)
    end
end

This is my current fit using BayesMCMC():

Chains MCMC chain (1000×6×4 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 411.02 seconds
Compute duration  = 799.76 seconds
parameters        = tvcl, tvvc, σ, C₂,₁, ω₁, ω₂

Summary Statistics
  parameters      mean       std   naive_se      mcse       ess                  rhat  ⋯
      Symbol   Float64   Float64    Float64   Float64   Float64               Float64  ⋯

        tvcl    0.9790    0.0180     0.0003    0.0023    8.0160    5873614936937.1377  ⋯
        tvvc   10.1971    0.1832     0.0029    0.0230    8.0160   98493110057788.4062  ⋯
           σ    0.3003    0.0007     0.0000    0.0001    8.0160    6062788037191.0049  ⋯
        C₂,₁   -0.0012    0.0014     0.0000    0.0002    8.0160                   Inf  ⋯
          ω₁    0.0901    0.0003     0.0000    0.0000    8.0160    6510201446863.5693  ⋯
          ω₂    0.0898    0.0008     0.0000    0.0001    8.0160   19313243996740.9648  ⋯

Take a look at the rhats; they are huge.


We have to understand that the Bayesian sampler Pumas uses, while state of the art, is not immune to high-curvature posterior topologies.
This means that even a simple 1-compartment model can be a challenge to estimate.
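
Those huge rhat values mean the four chains never mixed; each one got stuck in its own region of the posterior. For intuition, rhat compares the between-chain variance with the within-chain variance and is close to 1.0 when the chains agree. Here is a minimal sketch of the basic (non-split) rhat computation in plain Julia, just for illustration (this is not the implementation Pumas uses):

# Basic (non-split) rhat for one parameter, for intuition only.
# `draws` is an n-by-m matrix: n iterations per chain, m chains.
using Statistics

function basic_rhat(draws::AbstractMatrix)
    n, m = size(draws)
    chain_means = vec(mean(draws; dims = 1))   # one mean per chain
    B = n * var(chain_means)                   # between-chain variance
    W = mean(var(draws; dims = 1))             # average within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return sqrt(var_hat / W)                   # ≈ 1 when the chains agree
end

# Two chains stuck in different places give a huge rhat:
basic_rhat([randn(1000) randn(1000) .+ 10])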

What you are doing is what we call “centered parameterization” (CP).
The name is not great, but it means that one parameter depends on the value of another parameter.
Take a look at your η:

    @random begin
        η ~ MvNormal(Diagonal(ω) * C * Diagonal(ω))
    end

They depend on both ω and C.

What we can do is remove those dependencies from the η and then apply a transformation afterwards.
This is what we call “non-centered parameterization” (NCP).
Again, the naming is not the best, but it means that no parameter depends on another parameter (at least in terms of sampling and priors).
Here is an example for your ηs:

    @random begin
        ηstd ~ MvNormal(I(2))
    end

    @pre begin
        # compute the η from the ηstd
        # using lower Cholesky triangular matrix
        η = Diagonal(ω) * (getchol(C).L * ηstd)
    end

Here you can see that ηstd in the @random block has a prior: an MvNormal with mean zero and the identity matrix I(2) as the covariance matrix.
In the @pre block we compute η back from ηstd, ω, and C.

The two parameterizations are equivalent, but for the Bayesian sampler they generate very different posterior topologies.
The NCP is much “friendlier” than the CP.
Hence, the NCP model converges and runs much faster than the CP one.
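
To see the equivalence concretely, here is a small standalone check in plain Julia (outside Pumas; ω, C, and the sample size are made-up values) showing that the transformed NCP draws recover the covariance Diagonal(ω) * C * Diagonal(ω) used by the CP prior:

using LinearAlgebra, Statistics, Random

Random.seed!(1)
ω = [0.3, 0.5]                     # hypothetical standard deviations
C = [1.0 0.6; 0.6 1.0]             # hypothetical correlation matrix
L = cholesky(C).L                  # lower Cholesky factor of C

ηstd = randn(2, 100_000)           # standard normal draws, as in @random
η    = Diagonal(ω) * (L * ηstd)    # the transformation done in @pre

cov(η; dims = 2)                   # ≈ Diagonal(ω) * C * Diagonal(ω)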

You can find more in this tutorial: https://tutorials.pumas.ai/html/bayesian/02-bayes_random_effects.html

Here is your full model using NCP:

@model begin
    @param begin
        tvcl ~ LogNormal(log(1.1), 0.25)
        tvvc ~ LogNormal(log(70), 0.25)
        σ ~ truncated(Cauchy(0, 5), 0, Inf)
        C ~ LKJCholesky(2, 1.0)
        ω ~ Constrained(
            MvNormal(zeros(2), Diagonal(0.4^2 * ones(2)));
            lower=zeros(2),
            upper=fill(Inf, 2),
            init=ones(2)
        )
    end

    @random begin
        ηstd ~ MvNormal(I(2))
    end

    @pre begin
        # compute the η from the ηstd
        # using lower Cholesky triangular matrix
        η = Diagonal(ω) * (getchol(C).L * ηstd)

        # PK parameters
        CL = tvcl * exp(η[1])
        Vc = tvvc * exp(η[2])
    end

    @dynamics begin
        Central' = -CL / Vc * Central
    end

    @derived begin
        cp := @. Central / Vc
        dv ~ @. LogNormal(log(cp), cp * σ)
    end
end
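
If it helps, here is a sketch of how fitting the NCP model could look. This is an assumption-heavy sketch: ncp_model and pop are placeholder names for your model and Population, and the BayesMCMC keyword names (nsamples, nadapts, nchains) should be checked against ?BayesMCMC in your Pumas version:

# Sketch only: `ncp_model` and `pop` are placeholders for your model and data.
# Verify the BayesMCMC keyword names against the documentation for your version.
ncp_fit = fit(
    ncp_model,
    pop,
    init_params(ncp_model),
    BayesMCMC(; nsamples = 1_000, nadapts = 500, nchains = 4),
)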