# Simulation time

Hi folks -

I was wondering what we can do to increase simulation speed. It's taking a ridiculous amount of time to simulate 1000 subjects.

``````
Sub = map(i -> Subject(; id=i, events=event, covariates=(dose=dose, wt=70, sex=1,)), 1:1000)
__sim = simobs(model, Sub, final_estimates,
               simulate_error=false, obstimes=0:0.01:288)
``````

This is the code.

``````
model = @model begin
    @param begin
        θ ∈ VectorDomain(7, lower=zeros(7))
        Ω ∈ PDiagDomain(5)
        σ ∈ RealDomain(; lower=0.001)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @covariates dose wt

    @pre begin
        # NOTE: θ/η indices assumed by position (subscripts were lost in the post)
        CLi = exp(log(θ[1]) + 0.75 * log(wt / 70) + η[1])
        Vi  = exp(log(θ[2]) + log(wt / 70) + η[2])
        kai = exp(log(θ[3]) + η[3])
        Qi  = exp(log(θ[4]) + 0.75 * log(wt / 70))
        Vpi = exp(log(θ[5]) + log(wt / 70) + η[4])
    end

    @dosecontrol begin
        f50i = exp(log(θ[7]) + η[5])
        fmax = exp(log(θ[6]))
        bio  = dose < 50 ? 1 : 1 - fmax * (dose - 50) / (f50i + (dose - 50))

        bioav = (y1=bio, y2=1, y3=1,)
    end

    @dynamics begin
        y1' = -kai * y1
        y2' = kai * y1 - (CLi / Vi) * y2 + (Qi / Vpi) * y3 - (Qi / Vi) * y2
        y3' = -(Qi / Vpi) * y3 + (Qi / Vi) * y2
        y4' = y2 / Vi
    end

    @derived begin
        auc = @. 1_000 * y4
        cp := @. 1_000 * y2 / Vi
        dv ~ @. Normal(cp, cp * σ)
    end
end
``````

The final estimates:

``````
(θ = [0.04389707139276508, 7.642248641569024, 0.5063665507090237, 55.583019021299606, 0.7764954718388033, 0.7811025178500393, 6.995697749715079],
 Ω = Diagonal(ones(5) * 0.15),
 σ = 0.01)
``````

With a few subjects it's fast, and the estimation is fast, but simulating this many subjects takes more than 30 minutes. Any ideas? I think the issue is coming from `DataFrame`.

@mjaber by default `simobs` is not parallelized. Can you just add

``````
simobs(...., ensemblealg = EnsembleThreads())
``````

If you want this to be reproducible, you can also pass a random seed within the function. Please check `? simobs` for more information. Let us know if that helps. In the meantime, we will test it at our end.
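A minimal sketch of the combined suggestion, assuming the `model`, `Sub`, and `final_estimates` objects defined above (the exact keyword names follow the usage shown elsewhere in this thread; check `? simobs` for your Pumas version):

```julia
using Random

sims = simobs(model, Sub, final_estimates;
              simulate_error = false,
              obstimes = 0:0.01:288,
              ensemblealg = EnsembleThreads(),  # thread-parallel across subjects
              rng = Random.seed!(1234))         # fixed seed for reproducibility
```

Note that threading only helps if Julia was started with multiple threads (e.g. `julia -t auto`).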

Thanks Vijay for your suggestion. I had already tried that, and nothing much was gained. After doing some testing, it looks like the `DataFrame` conversion of the `simobs` output is what takes so much time. I think that's where attention should be.

I'm not sure which `DataFrame` call you are referring to; there is none in your code. Also, can you please share your `events` setup for `Sub`? I just assumed a 100 mg dose using `events = DosageRegimen(100)` and a covariate value of `dose = 100`, and I get the following results:

``````
julia> @time sims = simobs(model, Sub, final_estimates,
       simulate_error=false, obstimes=0:0.01:288)
135.055649 seconds (965.08 M allocations: 77.680 GiB, 7.06% gc time, 5.82% compilation time)
Simulated population (Vector{<:Subject})
  Simulated subjects: 1000
  Simulated variables: auc, dv
``````

Oops, my bad. Here you go:

``````
dose = 450
Sub = map(i -> Subject(; id=i, events=event, covariates=(dose=dose, wt=70,)), 1:1000)
@time __sim = simobs(model, Sub, final_estimates,
                     simulate_error=false, obstimes=0:0.01:288)
@time SIM_450 = DataFrame(__sim)
``````
1. The slowdown is not in the simulation itself, which takes less than 3 minutes (single-threaded).
2. Your simulation is generating about 28,800 datapoints per subject on the `0:0.01:288` grid, and with 1000 subjects that is almost 29 million data points that have to be converted to a DataFrame.
3. A DataFrame of that size will obviously take time to build, but my question to you is whether you really need that fine a time grid. Also, what is the purpose of the simulation, and what do you want to do as post-processing?
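The size of the grid can be checked directly in plain Julia (no Pumas needed):

```julia
npoints = length(0:0.01:288)   # observations per subject on the proposed grid
nsubj   = 1000
total   = npoints * nsubj      # rows the DataFrame conversion must materialize
# npoints == 28801, total == 28_801_000
```

Coarsening the grid to `0:0.5:288` (577 points per subject) cuts the row count by a factor of about 50.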

Thanks, Vijay. The aim is to simulate and get some endpoints, including parameters such as cmax, tmax, and AUC over different (partial) intervals, and to assess concentrations at specific time points. The workflow is to simulate at this scale and then calculate some statistics. Does that make sense?

You can use `NCA` within a model to extract them easily. Since you set `simulate_error` to `false`, you can replace the `@derived` block with an `@observed` block and then do the simulation. You can also directly convert a simulation to an `NCAPopulation` and perform NCA on it. I have a fully reproducible example on just 10 subjects. Go through it and let me know if you have any questions.

``````
using Pumas
using PumasUtilities
using Random

dose = 450
event = DosageRegimen(dose, time=0, ii=24, addl=1, route=NCA.EV)
Sub = map(i -> Subject(; id=i, events=event, covariates=(dose=dose, wt=70,)), 1:10)
# Sub = map(i -> Subject(; id=i, events=DosageRegimen(100), covariates=(dose=100, wt=70, sex=1,)), 1:1000)

# integrated NCA within the model
model = @model begin
    @param begin
        θ ∈ VectorDomain(7, lower=zeros(7))
        Ω ∈ PDiagDomain(5)
        σ ∈ RealDomain(; lower=0.001)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @covariates dose wt

    @pre begin
        # NOTE: θ/η indices assumed by position (subscripts were lost in the post)
        CLi = exp(log(θ[1]) + 0.75 * log(wt / 70) + η[1])
        Vi  = exp(log(θ[2]) + log(wt / 70) + η[2])
        kai = exp(log(θ[3]) + η[3])
        Qi  = exp(log(θ[4]) + 0.75 * log(wt / 70))
        Vpi = exp(log(θ[5]) + log(wt / 70) + η[4])
    end

    @dosecontrol begin
        f50i = exp(log(θ[7]) + η[5])
        fmax = exp(log(θ[6]))
        bio  = dose < 50 ? 1 : 1 - fmax * (dose - 50) / (f50i + (dose - 50))

        bioav = (y1=bio, y2=1, y3=1,)
    end

    @dynamics begin
        y1' = -kai * y1
        y2' = kai * y1 - (CLi / Vi) * y2 + (Qi / Vpi) * y3 - (Qi / Vi) * y2
        y3' = -(Qi / Vpi) * y3 + (Qi / Vi) * y2
        # y4' = y2 / Vi
    end

    @observed begin
        # auc = @. 1_000 * y4
        cp := @. 1_000 * y2 / Vi
        simnca := @nca cp
        cmax = NCA.cmax(simnca)
        auc = NCA.auc(simnca)
        auc04 = NCA.auc(simnca, interval=(0, 4))
        # dv ~ @. Normal(cp, cp * σ)
    end
end

final_estimates = (θ=[0.04389707139276508, 7.642248641569024, 0.5063665507090237, 55.583019021299606, 0.7764954718388033, 0.7811025178500393, 6.995697749715079],
                   Ω=Diagonal(ones(5) * 0.15),
                   σ=0.01)

@time sims = simobs(model, Sub, final_estimates,
                    simulate_error=false, obstimes=0:0.5:288, rng = Random.seed!(1234))
simdf = DataFrame(sims, include_events=false, include_covariates=false)

## converting simulations directly to NCAPopulation
model = @model begin
    @param begin
        θ ∈ VectorDomain(7, lower=zeros(7))
        Ω ∈ PDiagDomain(5)
        σ ∈ RealDomain(; lower=0.001)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @covariates dose wt

    @pre begin
        CLi = exp(log(θ[1]) + 0.75 * log(wt / 70) + η[1])
        Vi  = exp(log(θ[2]) + log(wt / 70) + η[2])
        kai = exp(log(θ[3]) + η[3])
        Qi  = exp(log(θ[4]) + 0.75 * log(wt / 70))
        Vpi = exp(log(θ[5]) + log(wt / 70) + η[4])
    end

    @dosecontrol begin
        f50i = exp(log(θ[7]) + η[5])
        fmax = exp(log(θ[6]))
        bio  = dose < 50 ? 1 : 1 - fmax * (dose - 50) / (f50i + (dose - 50))

        bioav = (y1=bio, y2=1, y3=1,)
    end

    @dynamics begin
        y1' = -kai * y1
        y2' = kai * y1 - (CLi / Vi) * y2 + (Qi / Vpi) * y3 - (Qi / Vi) * y2
        y3' = -(Qi / Vpi) * y3 + (Qi / Vi) * y2
    end

    @observed begin
        dv = @. 1_000 * y2 / Vi
    end
end

final_estimates = (θ=[0.04389707139276508, 7.642248641569024, 0.5063665507090237, 55.583019021299606, 0.7764954718388033, 0.7811025178500393, 6.995697749715079],
                   Ω=Diagonal(ones(5) * 0.15),
                   σ=0.01)

@time sims = simobs(model, Sub, final_estimates,
                    simulate_error=false, obstimes=0:0.5:288, rng = Random.seed!(1234))
ncapop = NCAPopulation(Subject.(sims))
NCA.cmax(ncapop)
NCA.auc(ncapop)
NCA.auc(ncapop, interval=(0, 4))
``````

Note that the two methods give identical results, reported per dose per subject. So you don't really have to convert these into `DataFrame`s, which is always expensive.


This is amazing. I will go through your code and let you know if I have any questions/comments. Appreciate the response, Vijay!

@vijay I have had a similar experience in the past: converting simulations to a DataFrame is very slow. In some cases I had to convert a simulation to a DataFrame for the sake of post-processing. My dataset had approximately 6000 subjects, and the conversion took approximately 10 minutes.

For the case study you shared here, I used your code to simulate 5000 subjects instead of 10. Converting with `DataFrame(sims)` took over a minute, whereas converting only the required columns to a DataFrame myself took about 1 second.

``````
julia> Sub = map(i -> Subject(; id=i, events=event, covariates=(dose=dose, wt=70,)), 1:5000)
Population
  Subjects: 5000
  Covariates: dose, wt
  Observations:

julia> @time sims = simobs(model, Sub, final_estimates,
       simulate_error=false, obstimes=0:0.5:288, rng = Random.seed!(1234))
 12.803235 seconds (87.68 M allocations: 7.893 GiB, 13.31% gc time)
Simulated population (Vector{<:Subject})
  Simulated subjects: 5000
  Simulated variables: cmax, auc, auc04

julia> @time simdf = DataFrame(sims, include_events=false, include_covariates=false)
 75.476109 seconds (173.89 M allocations: 221.364 GiB, 6.89% gc time)
2885000×19 DataFrame
 Row │ id       time     cmax      auc        auc04     evid    η_1         η_2        η_3         η_4        η_5        y1             y2        y3         CLi        Vi    ⋯
     │ String?  Float64  Float64?  Float64?   Float64?  Int64?  Float64?    Float64?   Float64?    Float64?   Float64?   Float64?       Float64?  Float64?   Float64?   Float ⋯
─────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            0.0   9743.63  1.37404e6   25207.5       0  -0.139322    0.421074  -0.162506    0.278433  0.162761   154.772          0.0      0.0       0.0381881  11.64 ⋯
   2 │ 1            0.5   9743.63  1.37404e6

julia> @time sims_df = vcat(map(sims) do z
           obs = DataFrame(id = z.subject.id, cmax = z.observations.cmax, auc = z.observations.auc, auc04 = z.observations.auc04)
           covs = DataFrame([(z.subject.covariates.u..., id = z.subject.id)])
           eves = vcat(DataFrame.(z.subject.events)...)
           obs_eves = hcat(obs, eves)
           subj_sims = outerjoin(covs, obs_eves, on=[:id])
       end...)
  0.966107 seconds (5.02 M allocations: 327.896 MiB, 30.04% compilation time)
``````
1 Like

One additional tip that is not the bottleneck here but might be good to know in the future, especially if you are doing heavy `map` work (such as running tons of simulations):

You can use the `ThreadsX.map` instead of `map`. It will run a multithreaded parallel `map`.
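A minimal sketch, assuming the `ThreadsX` package is installed and Julia was started with multiple threads (e.g. `julia -t auto`); the `per_subject` function is a placeholder for whatever per-subject work you are mapping:

```julia
using ThreadsX  # threaded drop-in replacements for Base functions

# placeholder for expensive per-subject work
per_subject(i) = sum(sqrt, 1:50_000) + i

serial   = map(per_subject, 1:1_000)
threaded = ThreadsX.map(per_subject, 1:1_000)  # same result, computed in parallel

serial == threaded  # → true
```

Since `ThreadsX.map` preserves the order of the input collection, it is a direct swap for `map` whenever the mapped function has no side effects.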


This is really helpful folks! Appreciate that.

Sorry for being late to the party, but if any of you could try to replicate this on v2.4 (the latest version of Pumas), I would appreciate it. On an internal project we found a similar bottleneck and realized that we could improve the `DataFrame` constructor significantly, so it should now run much faster.
