Using a Log-Normal distribution can be useful when the dependent variable (e.g., concentration) is positive and its dispersion varies over time through its dependence on the mean. For drug concentration data, this would imply a “wider” distribution for higher mean concentration and more “narrow” distribution as the drug is cleared and the concentrations become lower and approaches zero.
In software that only supports Normal
dependent variables, or that only provides alternative specifications through advanced interfaces, it is tradition to log-transform the the data, and model the transformed variable using a Normal
distribution. This is also possible in Pumas, and would result in a @derived
block that looks something like
@derived begin
log_conc_pk := @. log(abs(Central/Vc))
log_dv ~ @. Normal(conc_pk, σ_pk)
end
We use the abs
(absolute value) function here, because the solution can become ever so slightly negative due to numerical noise present in computer calculations.
In the context of PK/PD modeling, users are sometimes confused about which values should be transformed using the natural logarithm and which shouldn’t when using either a
- log-transform the data then use
Normal
versus
- use level data and use
LogNormal
approaches offered in Pumas. Notice, that for estimation purposes these two approaches give the exact same results. However, using the LogNormal
distribution itself instead of transforming the data first can be useful as any subsequent simulations result in trajectories that are actually in levels rather than in logs. Luckily, it is not hard to change the specification into a LogNormal
distribution, as you just need to use the original “level” data and change Normal
to LogNormal
in your derived block
@derived begin
log_conc_pk := @. log(abs(Central/Vc)
dv ~ @. LogNormal(log_conc_pk, σ_pk)
end
This is because the LogNormal
is parameterized by its mean and standard deviation of the variable in the log-domain. This also means, that if estimates were taken from a paper or piece of code from another software that uses the “log-transform data approach”, then the exact same estimates can be used in the Pumas code using a LogNormal
distribution. One detail is that if the software where the original analysis was done reports variances (say, NONMEM) then you have to use the square root of that estimate when specifying the distribution in Pumas because the input has to be the standard deviation. The other detail is that any log-transformations have to be undone by using the exp
function, say
df.dv_level = @. exp(df.dv)
where dv
is the log-transformed observation column.
Example
Let’s look at a model for an orally administered that only includes IIV variable on Clearance. We’ve use a closed form solution (Depots1Central1) to represent the gut (Depot) and systemic circulation (Central) compartments.
model = @model begin
@param begin
tvvc ∈ RealDomain(lower = 0)
tvcl ∈ RealDomain(lower = 0)
tvka ∈ RealDomain(lower = 0)
ω ∈ RealDomain(lower = 1e-5)
σ_pk ∈ RealDomain(lower = 1e-5)
end
@random begin
η ~ Normal(0, ω)
end
@pre begin
Ka = tvka
CL = tvcl*exp(η)
Vc = tvvc
end
@dynamics Depots1Central1
@derived begin
μ := @. log(abs(Central / Vc))
conc_pk ~ @. LogNormal(μ, σ_pk)
end
end
Before we continue we notice that the mean parameter in the LogNormal
specification is the log
of the modeled concentration and on the left hand side we have conc_pk
in levels. So the log-mean parameterizes a distribution that models and samples level observations.
From this model we can simulate a population
# Specify doses
doses = DosageRegimen(400, addl = 3, ii = 12)
# Specify fixed effects
param = (tvvc = 2.1, tvcl = 1.1, tvka = 0.9, ω = 0.1, σ_pk = 0.2)
# Create subjects with events and simulate into them to get SimulatedObservations
raw_population = [Subject(id=string(id), events = doses) for id = 1:90]
sim_population = simobs(model, raw_population, param; obstimes = 0.1:0.5:72)
# Convert to normal Subjects
population = Subject.(sim_population)
to get a plot like below
Again, notice that we simulate observations at their actual levels. It is also important to notice that the obstimes do not start at time
0.0
. At time 0.0
the Central
compartment is also 0.0
and this means that we cannot take the log
of the concentrations at time 0.0
. Let us try to fit this model
julia> ft = fit(model, population, param, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter Function value Gradient norm
0 -7.906322e+03 2.487626e+02
* time: 7.104873657226562e-5
1 -7.907277e+03 9.662535e+01
* time: 0.2010788917541504
2 -7.907427e+03 1.111693e+02
* time: 0.3763298988342285
3 -7.907673e+03 9.637800e+01
* time: 0.5127780437469482
4 -7.907788e+03 6.010314e+01
* time: 0.6470799446105957
5 -7.908177e+03 7.424839e+00
* time: 1.7827699184417725
6 -7.908209e+03 3.186270e+00
* time: 1.8105559349060059
7 -7.908209e+03 2.523274e-01
* time: 1.8254759311676025
8 -7.908209e+03 9.261201e-03
* time: 1.850268840789795
9 -7.908209e+03 9.301419e-03
* time: 1.873647928237915
10 -7.908209e+03 9.301441e-03
* time: 1.9534409046173096
11 -7.908209e+03 9.301441e-03
* time: 2.019981861114502
12 -7.908209e+03 1.012473e-02
* time: 2.164860963821411
13 -7.908209e+03 1.012473e-02
* time: 2.319835901260376
14 -7.908209e+03 1.012473e-02
* time: 3.3832459449768066
FittedPumasModel
Successful minimization: true
Likelihood approximation: FOCE
Log-likelihood value: 7908.2093
Number of subjects: 90
Number of parameters: Fixed Optimized
0 5
Observation records: Active Missing
conc_pk: 12960 0
Total: 12960 0
-------------------
Estimate
-------------------
tvvc 2.0971
tvcl 1.1145
tvka 0.9029
ω 0.093888
σ_pk 0.20071
-------------------
If we had been given this model, we could instead have fit it to the log-transformed dependent variables. First, let us make that transformation to get a new population
pop_df = DataFrame(population)
pop_df.log_conc_pk = log.(pop_df.conc_pk)
population_log = read_pumas(pop_df; observations = [:log_conc_pk])
Now population_log
is a population that is identical to the original data but with log-transformated observations. Let us copy the model from above and change the distribution to Normal
instead of LogNormal
and we now refer to the log
-transformed variable on the left-hand side, not the original level. Otherwise, the model is actually identical.
model_log = @model begin
@param begin
tvvc ∈ RealDomain(lower = 0)
tvcl ∈ RealDomain(lower = 0)
tvka ∈ RealDomain(lower = 0)
ω ∈ RealDomain(lower = 1e-5)
σ_pk ∈ RealDomain(lower = 1e-5)
end
@random begin
η ~ Normal(0, ω)
end
@pre begin
Ka = tvka
CL = tvcl*exp(η)
Vc = tvvc
end
@dynamics Depots1Central1
@derived begin
μ := @. log(abs(Central ./ Vc))
log_conc_pk ~ @. Normal(μ, σ_pk)
end
end
This model wil fit and simulate log-level dependent variables. We can try to fit this model to the log-transformed data from before
julia> ft = fit(model, population, param, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter Function value Gradient norm
0 -7.906322e+03 2.487626e+02
* time: 7.104873657226562e-5
1 -7.907277e+03 9.662535e+01
* time: 0.2010788917541504
2 -7.907427e+03 1.111693e+02
* time: 0.3763298988342285
3 -7.907673e+03 9.637800e+01
* time: 0.5127780437469482
4 -7.907788e+03 6.010314e+01
* time: 0.6470799446105957
5 -7.908177e+03 7.424839e+00
* time: 1.7827699184417725
6 -7.908209e+03 3.186270e+00
* time: 1.8105559349060059
7 -7.908209e+03 2.523274e-01
* time: 1.8254759311676025
8 -7.908209e+03 9.261201e-03
* time: 1.850268840789795
9 -7.908209e+03 9.301419e-03
* time: 1.873647928237915
10 -7.908209e+03 9.301441e-03
* time: 1.9534409046173096
11 -7.908209e+03 9.301441e-03
* time: 2.019981861114502
12 -7.908209e+03 1.012473e-02
* time: 2.164860963821411
13 -7.908209e+03 1.012473e-02
* time: 2.319835901260376
14 -7.908209e+03 1.012473e-02
* time: 3.3832459449768066
FittedPumasModel
Successful minimization: true
Likelihood approximation: FOCE
Log-likelihood value: 7908.2093
Number of subjects: 90
Number of parameters: Fixed Optimized
0 5
Observation records: Active Missing
conc_pk: 12960 0
Total: 12960 0
-------------------
Estimate
-------------------
tvvc 2.0971
tvcl 1.1145
tvka 0.9029
ω 0.093888
σ_pk 0.20071
-------------------
Then we can compare our estimates
julia> using PumasUtilities
julia> compare_estimates(;ft, ft_log)
5×3 DataFrame
Row │ parameter ft ft_log
│ String Float64? Float64?
─────┼─────────────────────────────────
1 │ tvvc 2.09708 2.09708
2 │ tvcl 1.11449 1.11449
3 │ tvka 0.902904 0.902904
4 │ ω 0.0938882 0.0938882
5 │ σ_pk 0.200707 0.200707
We notice that the estimates are the same, because the two models are equivalent. This property is also explained here PumasModel-Error models · Pumas.
One thing that might be worth mentioning is, that in the two distributions our mean and standard deviation inputs refer to the “Normal” distribution that the log-transformed variable follows when the original variable is LogNormal
. This means that when writing the LogNormal
version the actual mean and standard deviation of the dependent variable are different from μ
and σ_pk
and they depend on both parameters. For small σ_pk
mean is actually approximately μ
and the standard deviation is approximately μ*σ_pk
. In other words, for small σ_pk
, the LogNormal
is actually quite close the the proportional error model.
Bonus model
Another useful model is the Gamma
distribution model. The model is largely the same as the LogNormal
model. This means that again no data points should be transformed, we handle everything at the distribution level.
model_gamma = @model begin
@param begin
tvvc ∈ RealDomain(lower = 0)
tvcl ∈ RealDomain(lower = 0)
tvka ∈ RealDomain(lower = 0)
ω ∈ RealDomain(lower = 1e-5)
σ_pk ∈ RealDomain(lower = 1e-5)
end
@random begin
η ~ Normal(0, ω)
end
@pre begin
Ka = tvka
CL = tvcl*exp(η)
Vc = tvvc
end
@dynamics Depots1Central1
@derived begin
μ := @. abs(Central / Vc)
conc_pk ~ @. Gamma(1 / σ_pk^2, μ * σ_pk^2)
end
end
ft_gamma = fit(model_gamma, population, param, FOCE())
The specific parameterization is also shown and explained here PumasModel-Error models · Pumas . The output of the fit shows that once again we are able to recover similar estimates of the system parameters
julia> compare_estimates(;ft, ft_log, ft_gamma)
5×4 DataFrame
Row │ parameter ft ft_log ft_gamma
│ String Float64? Float64? Float64?
─────┼────────────────────────────────────────────
1 │ tvvc 2.09708 2.09708 2.05531
2 │ tvcl 1.11449 1.11449 1.09224
3 │ tvka 0.902904 0.902904 0.903248
4 │ ω 0.0938882 0.0938882 0.0939244
5 │ σ_pk 0.200707 0.200707 0.2001