Modeling Log-Normal or positive dependent variables

patrick · September 18, 2023, 12:23pm

Using a Log-Normal distribution can be useful when the dependent variable (e.g., concentration) is positive and its dispersion varies over time through its dependence on the mean. For drug concentration data, this would imply a “wider” distribution for higher mean concentration and more “narrow” distribution as the drug is cleared and the concentrations become lower and approaches zero.

In software that only supports Normal dependent variables, or that only provides alternative specifications through advanced interfaces, it is tradition to log-transform the the data, and model the transformed variable using a Normal distribution. This is also possible in Pumas, and would result in a @derived block that looks something like

@derived begin
    log_conc_pk := @. log(abs(Central/Vc))
    log_dv ~ @. Normal(conc_pk, σ_pk)
end

We use the abs (absolute value) function here, because the solution can become ever so slightly negative due to numerical noise present in computer calculations.

In the context of PK/PD modeling, users are sometimes confused about which values should be transformed using the natural logarithm and which shouldn’t when using either a

log-transform the data then use Normal

versus

use level data and use LogNormal

approaches offered in Pumas. Notice, that for estimation purposes these two approaches give the exact same results. However, using the LogNormal distribution itself instead of transforming the data first can be useful as any subsequent simulations result in trajectories that are actually in levels rather than in logs. Luckily, it is not hard to change the specification into a LogNormal distribution, as you just need to use the original “level” data and change Normal to LogNormal in your derived block

@derived begin
    log_conc_pk := @. log(abs(Central/Vc)
    dv ~ @. LogNormal(log_conc_pk, σ_pk)
end

This is because the LogNormal is parameterized by its mean and standard deviation of the variable in the log-domain. This also means, that if estimates were taken from a paper or piece of code from another software that uses the “log-transform data approach”, then the exact same estimates can be used in the Pumas code using a LogNormal distribution. One detail is that if the software where the original analysis was done reports variances (say, NONMEM) then you have to use the square root of that estimate when specifying the distribution in Pumas because the input has to be the standard deviation. The other detail is that any log-transformations have to be undone by using the exp function, say

    df.dv_level = @. exp(df.dv)

where dv is the log-transformed observation column.

Example

Let’s look at a model for an orally administered that only includes IIV variable on Clearance. We’ve use a closed form solution (Depots1Central1) to represent the gut (Depot) and systemic circulation (Central) compartments.

model = @model begin
    @param begin
        tvvc ∈ RealDomain(lower = 0)
        tvcl ∈ RealDomain(lower = 0)
        tvka ∈ RealDomain(lower = 0)
        ω ∈ RealDomain(lower = 1e-5)
        σ_pk ∈ RealDomain(lower = 1e-5)
    end
    @random begin
        η ~ Normal(0, ω)
    end
    @pre begin
        Ka = tvka
        CL = tvcl*exp(η)
        Vc = tvvc
    end
    @dynamics Depots1Central1
    @derived begin
        μ := @. log(abs(Central / Vc))
        conc_pk ~ @. LogNormal(μ, σ_pk)
    end
end

Before we continue we notice that the mean parameter in the LogNormal specification is the log of the modeled concentration and on the left hand side we have conc_pk in levels. So the log-mean parameterizes a distribution that models and samples level observations.

From this model we can simulate a population

# Specify doses
doses = DosageRegimen(400, addl = 3, ii = 12)
# Specify fixed effects
param = (tvvc = 2.1, tvcl = 1.1, tvka = 0.9, ω = 0.1, σ_pk = 0.2)
# Create subjects with events and simulate into them to get SimulatedObservations
raw_population = [Subject(id=string(id), events = doses) for id = 1:90]
sim_population = simobs(model, raw_population, param; obstimes = 0.1:0.5:72)
# Convert to normal Subjects
population = Subject.(sim_population)

to get a plot like below

Again, notice that we simulate observations at their actual levels. It is also important to notice that the obstimes do not start at time 0.0. At time 0.0 the Central compartment is also 0.0 and this means that we cannot take the log of the concentrations at time 0.0. Let us try to fit this model

julia> ft = fit(model, population, param, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0    -7.906322e+03     2.487626e+02
 * time: 7.104873657226562e-5
     1    -7.907277e+03     9.662535e+01
 * time: 0.2010788917541504
     2    -7.907427e+03     1.111693e+02
 * time: 0.3763298988342285
     3    -7.907673e+03     9.637800e+01
 * time: 0.5127780437469482
     4    -7.907788e+03     6.010314e+01
 * time: 0.6470799446105957
     5    -7.908177e+03     7.424839e+00
 * time: 1.7827699184417725
     6    -7.908209e+03     3.186270e+00
 * time: 1.8105559349060059
     7    -7.908209e+03     2.523274e-01
 * time: 1.8254759311676025
     8    -7.908209e+03     9.261201e-03
 * time: 1.850268840789795
     9    -7.908209e+03     9.301419e-03
 * time: 1.873647928237915
    10    -7.908209e+03     9.301441e-03
 * time: 1.9534409046173096
    11    -7.908209e+03     9.301441e-03
 * time: 2.019981861114502
    12    -7.908209e+03     1.012473e-02
 * time: 2.164860963821411
    13    -7.908209e+03     1.012473e-02
 * time: 2.319835901260376
    14    -7.908209e+03     1.012473e-02
 * time: 3.3832459449768066
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Log-likelihood value:                    7908.2093
Number of subjects:                             90
Number of parameters:         Fixed      Optimized
                                  0              5
Observation records:         Active        Missing
    conc_pk:                  12960              0
    Total:                    12960              0

-------------------
         Estimate
-------------------
tvvc      2.0971
tvcl      1.1145
tvka      0.9029
ω         0.093888
σ_pk      0.20071
-------------------

If we had been given this model, we could instead have fit it to the log-transformed dependent variables. First, let us make that transformation to get a new population

pop_df = DataFrame(population)
pop_df.log_conc_pk = log.(pop_df.conc_pk)
population_log = read_pumas(pop_df; observations = [:log_conc_pk])

Now population_log is a population that is identical to the original data but with log-transformated observations. Let us copy the model from above and change the distribution to Normal instead of LogNormal and we now refer to the log-transformed variable on the left-hand side, not the original level. Otherwise, the model is actually identical.

model_log = @model begin
    @param begin
        tvvc ∈ RealDomain(lower = 0)
        tvcl ∈ RealDomain(lower = 0)
        tvka ∈ RealDomain(lower = 0)
        ω ∈ RealDomain(lower = 1e-5)
        σ_pk ∈ RealDomain(lower = 1e-5)
    end
    @random begin
        η ~ Normal(0, ω)
    end
    @pre begin
        Ka = tvka
        CL = tvcl*exp(η)
        Vc = tvvc
    end
    @dynamics Depots1Central1
    @derived begin
        μ := @. log(abs(Central ./ Vc))
        log_conc_pk ~ @. Normal(μ, σ_pk)
    end
end

This model wil fit and simulate log-level dependent variables. We can try to fit this model to the log-transformed data from before

julia> ft = fit(model, population, param, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0    -7.906322e+03     2.487626e+02
 * time: 7.104873657226562e-5
     1    -7.907277e+03     9.662535e+01
 * time: 0.2010788917541504
     2    -7.907427e+03     1.111693e+02
 * time: 0.3763298988342285
     3    -7.907673e+03     9.637800e+01
 * time: 0.5127780437469482
     4    -7.907788e+03     6.010314e+01
 * time: 0.6470799446105957
     5    -7.908177e+03     7.424839e+00
 * time: 1.7827699184417725
     6    -7.908209e+03     3.186270e+00
 * time: 1.8105559349060059
     7    -7.908209e+03     2.523274e-01
 * time: 1.8254759311676025
     8    -7.908209e+03     9.261201e-03
 * time: 1.850268840789795
     9    -7.908209e+03     9.301419e-03
 * time: 1.873647928237915
    10    -7.908209e+03     9.301441e-03
 * time: 1.9534409046173096
    11    -7.908209e+03     9.301441e-03
 * time: 2.019981861114502
    12    -7.908209e+03     1.012473e-02
 * time: 2.164860963821411
    13    -7.908209e+03     1.012473e-02
 * time: 2.319835901260376
    14    -7.908209e+03     1.012473e-02
 * time: 3.3832459449768066
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Log-likelihood value:                    7908.2093
Number of subjects:                             90
Number of parameters:         Fixed      Optimized
                                  0              5
Observation records:         Active        Missing
    conc_pk:                  12960              0
    Total:                    12960              0

-------------------
         Estimate
-------------------
tvvc      2.0971
tvcl      1.1145
tvka      0.9029
ω         0.093888
σ_pk      0.20071
-------------------

Then we can compare our estimates

julia> using PumasUtilities
julia> compare_estimates(;ft, ft_log)
5×3 DataFrame
 Row │ parameter  ft         ft_log    
     │ String     Float64?   Float64?  
─────┼─────────────────────────────────
   1 │ tvvc       2.09708    2.09708
   2 │ tvcl       1.11449    1.11449
   3 │ tvka       0.902904   0.902904
   4 │ ω          0.0938882  0.0938882
   5 │ σ_pk       0.200707   0.200707

We notice that the estimates are the same, because the two models are equivalent. This property is also explained here PumasModel-Error models · Pumas.

One thing that might be worth mentioning is, that in the two distributions our mean and standard deviation inputs refer to the “Normal” distribution that the log-transformed variable follows when the original variable is LogNormal. This means that when writing the LogNormal version the actual mean and standard deviation of the dependent variable are different from μ and σ_pk and they depend on both parameters. For small σ_pk mean is actually approximately μ and the standard deviation is approximately μ*σ_pk. In other words, for small σ_pk, the LogNormal is actually quite close the the proportional error model.

Bonus model

Another useful model is the Gamma distribution model. The model is largely the same as the LogNormal model. This means that again no data points should be transformed, we handle everything at the distribution level.

model_gamma = @model begin
    @param begin
        tvvc ∈ RealDomain(lower = 0)
        tvcl ∈ RealDomain(lower = 0)
        tvka ∈ RealDomain(lower = 0)
        ω ∈ RealDomain(lower = 1e-5)
        σ_pk ∈ RealDomain(lower = 1e-5)
    end
    @random begin
        η ~ Normal(0, ω)
    end
    @pre begin
        Ka = tvka
        CL = tvcl*exp(η)
        Vc = tvvc
    end
    @dynamics Depots1Central1
    @derived begin
        μ := @. abs(Central / Vc)
        conc_pk ~ @. Gamma(1 / σ_pk^2, μ * σ_pk^2)
    end
end
ft_gamma = fit(model_gamma, population, param, FOCE())

The specific parameterization is also shown and explained here PumasModel-Error models · Pumas . The output of the fit shows that once again we are able to recover similar estimates of the system parameters

julia> compare_estimates(;ft, ft_log, ft_gamma)
5×4 DataFrame
 Row │ parameter  ft         ft_log     ft_gamma  
     │ String     Float64?   Float64?   Float64?  
─────┼────────────────────────────────────────────
   1 │ tvvc       2.09708    2.09708    2.05531
   2 │ tvcl       1.11449    1.11449    1.09224
   3 │ tvka       0.902904   0.902904   0.903248
   4 │ ω          0.0938882  0.0938882  0.0939244
   5 │ σ_pk       0.200707   0.200707   0.2001

Topic		Replies	Views
Discrepancies in PRED values between Lognormal and Gaussian PK models Simulation	2	43	April 4, 2025
Logit Normal distribution for PK parameters maximum-likelihood	6	424	May 12, 2023
Help with Fitting Estimation	54	2499	March 5, 2020
Can you compare two models with the same dataset but use DV vs LNDV? Concepts	3	50	March 8, 2025
How to define error How-to	23	3012	April 3, 2024

Modeling Log-Normal or positive dependent variables

Example

Bonus model

Related topics