# Modeling Log-Normal or positive dependent variables

Using a Log-Normal distribution can be useful when the dependent variable (e.g., concentration) is positive and its dispersion varies over time through its dependence on the mean. For drug concentration data, this would imply a “wider” distribution for higher mean concentration and more “narrow” distribution as the drug is cleared and the concentrations become lower and approaches zero.

In software that only supports `Normal` dependent variables, or that only provides alternative specifications through advanced interfaces, it is tradition to log-transform the the data, and model the transformed variable using a `Normal` distribution. This is also possible in Pumas, and would result in a `@derived` block that looks something like

``````@derived begin
log_conc_pk := @. log(abs(Central/Vc))
log_dv ~ @. Normal(conc_pk, σ_pk)
end
``````

We use the `abs` (absolute value) function here, because the solution can become ever so slightly negative due to numerical noise present in computer calculations.

In the context of PK/PD modeling, users are sometimes confused about which values should be transformed using the natural logarithm and which shouldn’t when using either a

• log-transform the data then use `Normal`

versus

• use level data and use `LogNormal`

approaches offered in Pumas. Notice, that for estimation purposes these two approaches give the exact same results. However, using the `LogNormal` distribution itself instead of transforming the data first can be useful as any subsequent simulations result in trajectories that are actually in levels rather than in logs. Luckily, it is not hard to change the specification into a `LogNormal` distribution, as you just need to use the original “level” data and change `Normal` to `LogNormal` in your derived block

``````@derived begin
log_conc_pk := @. log(abs(Central/Vc)
dv ~ @. LogNormal(log_conc_pk, σ_pk)
end
``````

This is because the `LogNormal` is parameterized by its mean and standard deviation of the variable in the log-domain. This also means, that if estimates were taken from a paper or piece of code from another software that uses the “log-transform data approach”, then the exact same estimates can be used in the Pumas code using a `LogNormal` distribution. One detail is that if the software where the original analysis was done reports variances (say, NONMEM) then you have to use the square root of that estimate when specifying the distribution in Pumas because the input has to be the standard deviation. The other detail is that any log-transformations have to be undone by using the `exp` function, say

``````    df.dv_level = @. exp(df.dv)
``````

where `dv` is the log-transformed observation column.

# Example

Let’s look at a model for an orally administered that only includes IIV variable on Clearance. We’ve use a closed form solution (Depots1Central1) to represent the gut (Depot) and systemic circulation (Central) compartments.

``````model = @model begin
@param begin
tvvc ∈ RealDomain(lower = 0)
tvcl ∈ RealDomain(lower = 0)
tvka ∈ RealDomain(lower = 0)
ω ∈ RealDomain(lower = 1e-5)
σ_pk ∈ RealDomain(lower = 1e-5)
end
@random begin
η ~ Normal(0, ω)
end
@pre begin
Ka = tvka
CL = tvcl*exp(η)
Vc = tvvc
end
@dynamics Depots1Central1
@derived begin
μ := @. log(abs(Central / Vc))
conc_pk ~ @. LogNormal(μ, σ_pk)
end
end
``````

Before we continue we notice that the mean parameter in the `LogNormal` specification is the `log` of the modeled concentration and on the left hand side we have `conc_pk` in levels. So the log-mean parameterizes a distribution that models and samples level observations.

From this model we can simulate a population

``````# Specify doses
doses = DosageRegimen(400, addl = 3, ii = 12)
# Specify fixed effects
param = (tvvc = 2.1, tvcl = 1.1, tvka = 0.9, ω = 0.1, σ_pk = 0.2)
# Create subjects with events and simulate into them to get SimulatedObservations
raw_population = [Subject(id=string(id), events = doses) for id = 1:90]
sim_population = simobs(model, raw_population, param; obstimes = 0.1:0.5:72)
# Convert to normal Subjects
population = Subject.(sim_population)
``````

to get a plot like below

Again, notice that we simulate observations at their actual levels. It is also important to notice that the obstimes do not start at time `0.0`. At time `0.0` the `Central` compartment is also `0.0` and this means that we cannot take the `log` of the concentrations at time `0.0`. Let us try to fit this model

``````julia> ft = fit(model, population, param, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
0    -7.906322e+03     2.487626e+02
* time: 7.104873657226562e-5
1    -7.907277e+03     9.662535e+01
* time: 0.2010788917541504
2    -7.907427e+03     1.111693e+02
* time: 0.3763298988342285
3    -7.907673e+03     9.637800e+01
* time: 0.5127780437469482
4    -7.907788e+03     6.010314e+01
* time: 0.6470799446105957
5    -7.908177e+03     7.424839e+00
* time: 1.7827699184417725
6    -7.908209e+03     3.186270e+00
* time: 1.8105559349060059
7    -7.908209e+03     2.523274e-01
* time: 1.8254759311676025
8    -7.908209e+03     9.261201e-03
* time: 1.850268840789795
9    -7.908209e+03     9.301419e-03
* time: 1.873647928237915
10    -7.908209e+03     9.301441e-03
* time: 1.9534409046173096
11    -7.908209e+03     9.301441e-03
* time: 2.019981861114502
12    -7.908209e+03     1.012473e-02
* time: 2.164860963821411
13    -7.908209e+03     1.012473e-02
* time: 2.319835901260376
14    -7.908209e+03     1.012473e-02
* time: 3.3832459449768066
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Log-likelihood value:                    7908.2093
Number of subjects:                             90
Number of parameters:         Fixed      Optimized
0              5
Observation records:         Active        Missing
conc_pk:                  12960              0
Total:                    12960              0

-------------------
Estimate
-------------------
tvvc      2.0971
tvcl      1.1145
tvka      0.9029
ω         0.093888
σ_pk      0.20071
-------------------
``````

If we had been given this model, we could instead have fit it to the log-transformed dependent variables. First, let us make that transformation to get a new population

``````pop_df = DataFrame(population)
pop_df.log_conc_pk = log.(pop_df.conc_pk)
population_log = read_pumas(pop_df; observations = [:log_conc_pk])
``````

Now `population_log` is a population that is identical to the original data but with log-transformated observations. Let us copy the model from above and change the distribution to `Normal` instead of `LogNormal` and we now refer to the `log`-transformed variable on the left-hand side, not the original level. Otherwise, the model is actually identical.

``````model_log = @model begin
@param begin
tvvc ∈ RealDomain(lower = 0)
tvcl ∈ RealDomain(lower = 0)
tvka ∈ RealDomain(lower = 0)
ω ∈ RealDomain(lower = 1e-5)
σ_pk ∈ RealDomain(lower = 1e-5)
end
@random begin
η ~ Normal(0, ω)
end
@pre begin
Ka = tvka
CL = tvcl*exp(η)
Vc = tvvc
end
@dynamics Depots1Central1
@derived begin
μ := @. log(abs(Central ./ Vc))
log_conc_pk ~ @. Normal(μ, σ_pk)
end
end
``````

This model wil fit and simulate log-level dependent variables. We can try to fit this model to the log-transformed data from before

``````julia> ft = fit(model, population, param, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
0    -7.906322e+03     2.487626e+02
* time: 7.104873657226562e-5
1    -7.907277e+03     9.662535e+01
* time: 0.2010788917541504
2    -7.907427e+03     1.111693e+02
* time: 0.3763298988342285
3    -7.907673e+03     9.637800e+01
* time: 0.5127780437469482
4    -7.907788e+03     6.010314e+01
* time: 0.6470799446105957
5    -7.908177e+03     7.424839e+00
* time: 1.7827699184417725
6    -7.908209e+03     3.186270e+00
* time: 1.8105559349060059
7    -7.908209e+03     2.523274e-01
* time: 1.8254759311676025
8    -7.908209e+03     9.261201e-03
* time: 1.850268840789795
9    -7.908209e+03     9.301419e-03
* time: 1.873647928237915
10    -7.908209e+03     9.301441e-03
* time: 1.9534409046173096
11    -7.908209e+03     9.301441e-03
* time: 2.019981861114502
12    -7.908209e+03     1.012473e-02
* time: 2.164860963821411
13    -7.908209e+03     1.012473e-02
* time: 2.319835901260376
14    -7.908209e+03     1.012473e-02
* time: 3.3832459449768066
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Log-likelihood value:                    7908.2093
Number of subjects:                             90
Number of parameters:         Fixed      Optimized
0              5
Observation records:         Active        Missing
conc_pk:                  12960              0
Total:                    12960              0

-------------------
Estimate
-------------------
tvvc      2.0971
tvcl      1.1145
tvka      0.9029
ω         0.093888
σ_pk      0.20071
-------------------
``````

Then we can compare our estimates

``````julia> using PumasUtilities
julia> compare_estimates(;ft, ft_log)
5×3 DataFrame
Row │ parameter  ft         ft_log
│ String     Float64?   Float64?
─────┼─────────────────────────────────
1 │ tvvc       2.09708    2.09708
2 │ tvcl       1.11449    1.11449
3 │ tvka       0.902904   0.902904
4 │ ω          0.0938882  0.0938882
5 │ σ_pk       0.200707   0.200707
``````

We notice that the estimates are the same, because the two models are equivalent. This property is also explained here PumasModel-Error models · Pumas.

One thing that might be worth mentioning is, that in the two distributions our mean and standard deviation inputs refer to the “Normal” distribution that the log-transformed variable follows when the original variable is `LogNormal`. This means that when writing the `LogNormal` version the actual mean and standard deviation of the dependent variable are different from `μ` and `σ_pk` and they depend on both parameters. For small `σ_pk` mean is actually approximately `μ` and the standard deviation is approximately `μ*σ_pk`. In other words, for small `σ_pk`, the `LogNormal` is actually quite close the the proportional error model.

# Bonus model

Another useful model is the `Gamma` distribution model. The model is largely the same as the `LogNormal` model. This means that again no data points should be transformed, we handle everything at the distribution level.

``````model_gamma = @model begin
@param begin
tvvc ∈ RealDomain(lower = 0)
tvcl ∈ RealDomain(lower = 0)
tvka ∈ RealDomain(lower = 0)
ω ∈ RealDomain(lower = 1e-5)
σ_pk ∈ RealDomain(lower = 1e-5)
end
@random begin
η ~ Normal(0, ω)
end
@pre begin
Ka = tvka
CL = tvcl*exp(η)
Vc = tvvc
end
@dynamics Depots1Central1
@derived begin
μ := @. abs(Central / Vc)
conc_pk ~ @. Gamma(1 / σ_pk^2, μ * σ_pk^2)
end
end
ft_gamma = fit(model_gamma, population, param, FOCE())
``````

The specific parameterization is also shown and explained here PumasModel-Error models · Pumas . The output of the fit shows that once again we are able to recover similar estimates of the system parameters

``````julia> compare_estimates(;ft, ft_log, ft_gamma)
5×4 DataFrame
Row │ parameter  ft         ft_log     ft_gamma
│ String     Float64?   Float64?   Float64?
─────┼────────────────────────────────────────────
1 │ tvvc       2.09708    2.09708    2.05531
2 │ tvcl       1.11449    1.11449    1.09224
3 │ tvka       0.902904   0.902904   0.903248
4 │ ω          0.0938882  0.0938882  0.0939244
5 │ σ_pk       0.200707   0.200707   0.2001
``````
1 Like