Post-processing steps for full Bayesian model

I am using the following code from the Pumas tutorials website to fit a Bayesian model for theophylline.

Code
# CSV/DataFrames are needed for reading the data; LinearAlgebra provides Diagonal and diagm
using Pumas, MCMCChains, CSV, DataFrames, LinearAlgebra

theopmodel_bayes = @model begin
  @param begin
    # Mode at [2.0, 0.2, 0.8, 2.0]
    θ ~ Constrained(
      MvNormal(
        [2.0, 0.2, 0.8, 2.0],
        Diagonal(ones(4))
      ),
      lower = zeros(4),
      upper = fill(10.0, 4),
      init  = [2.0, 0.2, 0.8, 2.0])

    # Mode at diagm(fill(0.2, 3))
    Ω ~ InverseWishart(6, diagm(fill(0.2, 3)) .* (6 + 3 + 1))

    # Mean at 0.5 and positive density at 0.0
    σ ~ Gamma(1.0, 0.5)
  end

  @random begin
    η ~ MvNormal(Ω)
  end

  @pre begin
    Ka = (SEX == 1 ? θ[1] : θ[4]) * exp(η[1])
    CL = θ[2] * (WT / 70)         * exp(η[2])
    Vc = θ[3]                     * exp(η[3])
  end

  @covariates SEX WT

  @dynamics Depots1Central1

  @derived begin
    # The conditional mean
    μ := @. Central / Vc
    # Additive error model
    dv ~ @. Normal(μ, σ)
  end
end

df = CSV.read("/mnt/data/code/tutorial/THEOPP.csv", DataFrame;
              missingstring = ["."])
data = read_pumas(df, covariates = [:SEX,:WT])

param = init_params(theopmodel_bayes)

result = fit(theopmodel_bayes, data, param, Pumas.BayesMCMC();
              nsamples=2000, nadapts=1000)

I have the following questions regarding post-processing and qualification of the model:

  1. Is there a direct way to get predicted concentrations so that I can qualify the model using individual fits? This would be the equivalent of running inspect(fpm) in the maximum likelihood estimation (MLE) approach.
  2. For covariate inclusion, I would like to test the Bayes factor approach. Is there a way to calculate this factor in Pumas from two models (one with the covariate effect and one without)? This would be the equivalent of the log-likelihood ratio in the MLE approach.
  3. How can I generate the mode and 95% credible intervals for (a) individual parameters and (b) observations?
  4. What information does the result.chains[chain_num][iteration_num] object contain? In this case, it seems to be a vector of 47 numbers, but I do not know what those numbers represent (my rough attempt at a parameter count is sketched after this list).
  5. I keep seeing the following warning while fitting: "Warning: the current proposal will be rejected due to numerical error(s)." When I ran a fit with nsamples=500 and nadapts=250, I saw this warning approximately 100 times, right up to the end of sampling.
  6. One last general question: are there simple tutorials on NUTS sampling and MCMC (or HMC) for people without a background in Bayesian population PK?
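
Regarding question 4, my rough guess (which may well be wrong) is that the 47 values are the flattened population parameters followed by the per-subject random effects. The theophylline data set here has 12 subjects, so the count would work out as:

# My guess at the layout of the 47-element vector (not confirmed):
# θ has 4 elements, Ω is a symmetric 3×3 matrix (6 unique entries), σ is a
# scalar, and each of the 12 subjects contributes a 3-element η vector.
nθ = 4
nΩ = 3 * (3 + 1) ÷ 2   # unique entries of a symmetric 3×3 matrix
nσ = 1
nη = 12 * 3            # 12 subjects × 3 random effects each
nθ + nΩ + nσ + nη      # 4 + 6 + 1 + 36 = 47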

Hi Rahul -

We have a major Bayesian release in the works in which all of the points above are addressed via an easy-to-use interface. Regarding understanding the NUTS sampler, we are beginning to offer some courses that you could take advantage of. Hopefully just a few more weeks.

Vijay


Thank you, Vijay. Could you let me know about the warning issue mentioned in question #5?

cc @mohamed82008, who should be able to explain this better

Hi Rahul,

Thanks for all your questions. As Vijay said, a full Bayesian workflow in Pumas will be available in a few months. It will have features to help you with all the points you raised. Regarding the warnings: these basically say that there were numerical errors when evaluating the model. The most common cause is some standard deviation or covariance parameter going to 0 or exploding to Inf. There are two main causes of these numerical errors:

  • Model mis-specification, e.g. wrong data generating process or bad priors.
  • The inference algorithm is not able to find the regions of the posterior that have high probability.

If these warnings only show up at the beginning of sampling, it's usually nothing to worry about, since the first part of the NUTS algorithm involves hyper-parameter fine-tuning and often significant exploration of the parameter space, which can lead to unreasonable parameter values being tried, especially early on. However, as the algorithm adapts to the model, these warnings should mostly disappear. If they persist until the end, then either the adaptation phase was not successful or the model is mis-specified. Unfortunately, it's difficult to know which of the two is the case when this happens, but you can try the following:

  • Increase the number of adaptation and sampling steps (see the sketch after this list). If this helps, then it's possible more adaptation steps were needed to reach good hyper-parameters.
  • Use fewer data points. If this helps, then it's possible your model has non-identifiability issues or a wrong data generating process.
  • Use a different prior. If this helps, then it's possible your model had a misleading prior, or the prior was too strong around some unreasonable parameter values.
  • Simplify the model and try synthetic data first. This can help you diagnose issues with the model before dealing with issues in the data and with model-data mis-specifications.
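
For the first two suggestions, reusing your fit call from the top of the thread, the changes are along these lines (the numbers are arbitrary and only meant as an illustration):

# Longer adaptation phase and more total samples (illustrative numbers only).
result_long = fit(theopmodel_bayes, data, param, Pumas.BayesMCMC();
                  nsamples = 4000, nadapts = 2000)

# Fitting a subset of the subjects first; a Population indexes like a
# vector of subjects, so this uses only the first 6 subjects.
result_subset = fit(theopmodel_bayes, data[1:6], param, Pumas.BayesMCMC();
                    nsamples = 2000, nadapts = 1000)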

These are some general suggestions that might help you diagnose your model and the MCMC sampling. They also apply when the convergence diagnostics are not good enough. We will have detailed documentation of the various diagnostics you can use to assess MCMC convergence in the big Bayesian release. So stay tuned!
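
In the meantime, since you are already loading MCMCChains, you can compute some standard diagnostics yourself by packing the raw draws into a Chains object. A minimal sketch, assuming result.chains has one entry per chain and each entry is a vector of flattened parameter vectors (the 47-element vectors from your question 4):

using MCMCChains

# Pack the draws into an iterations × parameters × chains array.
# Adjust the indexing if your result object is laid out differently.
nchains = length(result.chains)
niter   = length(result.chains[1])
npars   = length(result.chains[1][1])

arr = Array{Float64}(undef, niter, npars, nchains)
for c in 1:nchains, i in 1:niter
    arr[i, :, c] = result.chains[c][i]
end

chn = Chains(arr)   # parameter names are auto-generated

summarystats(chn)   # means, standard errors, effective sample sizes
quantile(chn)       # posterior quantiles, including 2.5% and 97.5%
gelmandiag(chn)     # Gelman-Rubin diagnostic (needs at least 2 chains)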

Thank you so much for the detailed response @mohamed82008. I will evaluate every option and see which works best to find an adequate posterior. I will reach back out in case I face any issues.

A follow-up question on the burn-in phase: is nadapts a part of nsamples? What I mean is: if nadapts=100 and nsamples=200, which of the following is true?

  1. the burn-in is 100 and the sampling is 200
  2. the burn-in is 100 and the sampling is 100

It's the latter: nadapts is part of nsamples, so option 2 is correct.
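
Applied to the call at the top of the thread, that means:

# nadapts is counted inside nsamples, so this call runs 2000 total iterations,
# of which the first 1000 are adaptation and the remaining 1000 are
# post-adaptation samples.
result = fit(theopmodel_bayes, data, param, Pumas.BayesMCMC();
             nsamples = 2000, nadapts = 1000)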
