# Simulation time

Hi folks -

I was wondering what we can do to increase simulation speed. It's taking a ridiculous amount of time to simulate 1000 subjects.

``````
Sub = map(i -> Subject(; id=i, events=event, covariates=(dose=dose, wt=70, sex=1,)), 1:1000)
__sim = simobs(model, Sub, final_estimates,
               simulate_error=false, obstimes=0:0.01:288)
``````

This is the code.

``````
model = @model begin
    @param begin
        θ ∈ VectorDomain(7, lower=zeros(7))
        Ω ∈ PDiagDomain(5)
        σ ∈ RealDomain(; lower=0.001)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @covariates dose wt

    @pre begin
        # NOTE: θ/η indices assumed by position (subscripts were lost in the post)
        CLi = exp(log(θ[1]) + 0.75 * log(wt / 70) + η[1])
        Vi  = exp(log(θ[2]) + log(wt / 70) + η[2])
        kai = exp(log(θ[3]) + η[3])
        Qi  = exp(log(θ[4]) + 0.75 * log(wt / 70))
        Vpi = exp(log(θ[5]) + log(wt / 70) + η[4])
    end

    @dosecontrol begin
        f50i = exp(log(θ[7]) + η[5])
        fmax = exp(log(θ[6]))
        bio  = dose < 50 ? 1 : 1 - fmax * (dose - 50) / (f50i + (dose - 50))

        bioav = (y1=bio, y2=1, y3=1,)
    end

    @dynamics begin
        y1' = -kai * y1
        y2' = kai * y1 - (CLi / Vi) * y2 + (Qi / Vpi) * y3 - (Qi / Vi) * y2
        y3' = -(Qi / Vpi) * y3 + (Qi / Vi) * y2
        y4' = y2 / Vi
    end

    @derived begin
        auc = @. 1_000 * y4
        cp := @. 1_000 * y2 / Vi
        dv ~ @. Normal(cp, cp * σ)
    end
end
``````

The final estimates:

``````
(θ = [0.04389707139276508, 7.642248641569024, 0.5063665507090237, 55.583019021299606, 0.7764954718388033, 0.7811025178500393, 6.995697749715079],
 Ω = Diagonal(ones(5) * 0.15),
 σ = 0.01)
``````

With a few subjects it's fast, and the estimation is fast, but simulating this many subjects takes more than 30 minutes. Any ideas? I think the issue is coming from `DataFrame`.

@mjaber by default `simobs` is not parallelized. Can you just add

``````
simobs(...., ensemblealg = EnsembleThreads())
``````

If you want this to be reproducible, you can also pass a random seed within the function. Please check `? simobs` for more information. Let us know if that helps. In the meantime, we will test it at our end.
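A minimal sketch of the combined suggestion, assuming the `model`, `Sub`, and `final_estimates` objects defined above (the exact keyword names follow the usage shown elsewhere in this thread; check `? simobs` for your Pumas version):

```julia
using Random

sims = simobs(model, Sub, final_estimates;
              simulate_error = false,
              obstimes = 0:0.01:288,
              ensemblealg = EnsembleThreads(),  # thread-parallel across subjects
              rng = Random.seed!(1234))         # fixed seed for reproducibility
```

Note that threading only helps if Julia was started with multiple threads (e.g. `julia -t auto`).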

Thanks Vijay for your suggestion. I had already tried that, and nothing much was gained. After doing some testing, it looks like the `DataFrame` conversion of the `simobs` output is what takes so much time. I think that's where attention should be.

I'm not sure which `DataFrame` call you are referring to; there is none in your code. Also, can you please share your `events` setup for `Sub`? I just assumed a 100 mg dose using `events = DosageRegimen(100)` and a covariate value of `dose = 100`, and I get the following results:

``````
julia> @time sims = simobs(model, Sub, final_estimates,
       simulate_error=false, obstimes=0:0.01:288)
135.055649 seconds (965.08 M allocations: 77.680 GiB, 7.06% gc time, 5.82% compilation time)
Simulated population (Vector{<:Subject})
  Simulated subjects: 1000
  Simulated variables: auc, dv
``````

Oops, my bad. Here you go:

``````
dose = 450
Sub = map(i -> Subject(; id=i, events=event, covariates=(dose=dose, wt=70,)), 1:1000)
@time __sim = simobs(model, Sub, final_estimates,
                     simulate_error=false, obstimes=0:0.01:288)
@time SIM_450 = DataFrame(__sim)
``````
1. The slowdown is not in the simulation itself, which takes less than 3 minutes (single-threaded).
2. Your simulation is generating about 28,800 datapoints per subject on the `0:0.01:288` grid, and with 1000 subjects that is almost 29 million data points that have to be converted to a DataFrame.
3. A DataFrame of that size will obviously take time to build, but my question to you is whether you really need that fine a time grid. Also, what is the purpose of the simulation, and what do you want to do as post-processing?
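The size of the grid can be checked directly in plain Julia (no Pumas needed):

```julia
npoints = length(0:0.01:288)   # observations per subject on the proposed grid
nsubj   = 1000
total   = npoints * nsubj      # rows the DataFrame conversion must materialize
# npoints == 28801, total == 28_801_000
```

Coarsening the grid to `0:0.5:288` (577 points per subject) cuts the row count by a factor of about 50.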

Thanks, Vijay. The aim is to simulate and get some endpoints, including parameters such as cmax, tmax, and AUC over different (partial) intervals, and to assess concentrations at specific time points. The workflow is to simulate at this scale and then calculate some statistics. Does that make sense?

You can use `NCA` within a model to extract them easily. Since you set `simulate_error` to `false`, you can replace the `@derived` block with an `@observed` block and then do the simulation. You can also directly convert a simulation to an `NCAPopulation` and perform NCA on it. I have a fully reproducible example on just 10 subjects. Go through it and let me know if you have any questions.

``````
using Pumas
using PumasUtilities
using Random

dose = 450
event = DosageRegimen(dose, time=0, ii=24, addl=1, route=NCA.EV)
Sub = map(i -> Subject(; id=i, events=event, covariates=(dose=dose, wt=70,)), 1:10)
# Sub = map(i -> Subject(; id=i, events=DosageRegimen(100), covariates=(dose=100, wt=70, sex=1,)), 1:1000)

# integrated NCA within the model
model = @model begin
    @param begin
        θ ∈ VectorDomain(7, lower=zeros(7))
        Ω ∈ PDiagDomain(5)
        σ ∈ RealDomain(; lower=0.001)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @covariates dose wt

    @pre begin
        # NOTE: θ/η indices assumed by position (subscripts were lost in the post)
        CLi = exp(log(θ[1]) + 0.75 * log(wt / 70) + η[1])
        Vi  = exp(log(θ[2]) + log(wt / 70) + η[2])
        kai = exp(log(θ[3]) + η[3])
        Qi  = exp(log(θ[4]) + 0.75 * log(wt / 70))
        Vpi = exp(log(θ[5]) + log(wt / 70) + η[4])
    end

    @dosecontrol begin
        f50i = exp(log(θ[7]) + η[5])
        fmax = exp(log(θ[6]))
        bio  = dose < 50 ? 1 : 1 - fmax * (dose - 50) / (f50i + (dose - 50))

        bioav = (y1=bio, y2=1, y3=1,)
    end

    @dynamics begin
        y1' = -kai * y1
        y2' = kai * y1 - (CLi / Vi) * y2 + (Qi / Vpi) * y3 - (Qi / Vi) * y2
        y3' = -(Qi / Vpi) * y3 + (Qi / Vi) * y2
        # y4' = y2 / Vi
    end

    @observed begin
        # auc = @. 1_000 * y4
        cp := @. 1_000 * y2 / Vi
        simnca := @nca cp
        cmax = NCA.cmax(simnca)
        auc = NCA.auc(simnca)
        auc04 = NCA.auc(simnca, interval=(0, 4))
        # dv ~ @. Normal(cp, cp * σ)
    end
end

final_estimates = (θ=[0.04389707139276508, 7.642248641569024, 0.5063665507090237, 55.583019021299606, 0.7764954718388033, 0.7811025178500393, 6.995697749715079],
                   Ω=Diagonal(ones(5) * 0.15),
                   σ=0.01)

@time sims = simobs(model, Sub, final_estimates,
                    simulate_error=false, obstimes=0:0.5:288, rng = Random.seed!(1234))
simdf = DataFrame(sims, include_events=false, include_covariates=false)

## converting simulations directly to NCAPopulation
model = @model begin
    @param begin
        θ ∈ VectorDomain(7, lower=zeros(7))
        Ω ∈ PDiagDomain(5)
        σ ∈ RealDomain(; lower=0.001)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @covariates dose wt

    @pre begin
        CLi = exp(log(θ[1]) + 0.75 * log(wt / 70) + η[1])
        Vi  = exp(log(θ[2]) + log(wt / 70) + η[2])
        kai = exp(log(θ[3]) + η[3])
        Qi  = exp(log(θ[4]) + 0.75 * log(wt / 70))
        Vpi = exp(log(θ[5]) + log(wt / 70) + η[4])
    end

    @dosecontrol begin
        f50i = exp(log(θ[7]) + η[5])
        fmax = exp(log(θ[6]))
        bio  = dose < 50 ? 1 : 1 - fmax * (dose - 50) / (f50i + (dose - 50))

        bioav = (y1=bio, y2=1, y3=1,)
    end

    @dynamics begin
        y1' = -kai * y1
        y2' = kai * y1 - (CLi / Vi) * y2 + (Qi / Vpi) * y3 - (Qi / Vi) * y2
        y3' = -(Qi / Vpi) * y3 + (Qi / Vi) * y2
    end

    @observed begin
        dv = @. 1_000 * y2 / Vi
    end
end

final_estimates = (θ=[0.04389707139276508, 7.642248641569024, 0.5063665507090237, 55.583019021299606, 0.7764954718388033, 0.7811025178500393, 6.995697749715079],
                   Ω=Diagonal(ones(5) * 0.15),
                   σ=0.01)

@time sims = simobs(model, Sub, final_estimates,
                    simulate_error=false, obstimes=0:0.5:288, rng = Random.seed!(1234))
ncapop = NCAPopulation(Subject.(sims))
NCA.cmax(ncapop)
NCA.auc(ncapop)
NCA.auc(ncapop, interval=(0, 4))
``````

Note that the two methods give identical results, reported per dose per subject. So you don't really have to convert these into `DataFrame`s, which is always expensive.


This is amazing. I will go through your code and let you know if I have any questions/comments. Appreciate the response, Vijay!

@vijay I have had a similar experience in the past: converting simulations to a DataFrame is very slow. In some cases I had to convert a simulation to a DataFrame for the sake of post-processing. My dataset had approximately 6000 subjects, and the conversion took approximately 10 minutes.

For the case study you shared here, I used your code to simulate 5000 subjects instead of 10. Converting with `DataFrame(sims)` took over a minute, whereas converting only the required columns to a DataFrame myself took about 1 second.

``````
julia> Sub = map(i -> Subject(; id=i, events=event, covariates=(dose=dose, wt=70,)), 1:5000)
Population
  Subjects: 5000
  Covariates: dose, wt
  Observations:

julia> @time sims = simobs(model, Sub, final_estimates,
       simulate_error=false, obstimes=0:0.5:288, rng = Random.seed!(1234))
 12.803235 seconds (87.68 M allocations: 7.893 GiB, 13.31% gc time)
Simulated population (Vector{<:Subject})
  Simulated subjects: 5000
  Simulated variables: cmax, auc, auc04

julia> @time simdf = DataFrame(sims, include_events=false, include_covariates=false)
 75.476109 seconds (173.89 M allocations: 221.364 GiB, 6.89% gc time)
2885000×19 DataFrame
 Row │ id       time     cmax      auc        auc04     evid    η_1         η_2        η_3         η_4        η_5        y1             y2        y3         CLi        Vi    ⋯
     │ String?  Float64  Float64?  Float64?   Float64?  Int64?  Float64?    Float64?   Float64?    Float64?   Float64?   Float64?       Float64?  Float64?   Float64?   Float ⋯
─────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            0.0   9743.63  1.37404e6   25207.5       0  -0.139322    0.421074  -0.162506    0.278433  0.162761   154.772          0.0      0.0       0.0381881  11.64 ⋯
   2 │ 1            0.5   9743.63  1.37404e6

julia> @time sims_df = vcat(map(sims) do z
           obs = DataFrame(id = z.subject.id, cmax = z.observations.cmax, auc = z.observations.auc, auc04 = z.observations.auc04)
           covs = DataFrame([(z.subject.covariates.u..., id = z.subject.id)])
           eves = vcat(DataFrame.(z.subject.events)...)
           obs_eves = hcat(obs, eves)
           subj_sims = outerjoin(covs, obs_eves, on=[:id])
       end...)
  0.966107 seconds (5.02 M allocations: 327.896 MiB, 30.04% compilation time)
``````
1 Like

One additional tip that is not the bottleneck here but might be good to know in the future, especially if you are doing heavy `map` work (such as running tons of simulations):

You can use the `ThreadsX.map` instead of `map`. It will run a multithreaded parallel `map`.
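A minimal sketch, assuming the `ThreadsX` package is installed and Julia was started with multiple threads (e.g. `julia -t auto`); the `per_subject` function is a placeholder for whatever per-subject work you are mapping:

```julia
using ThreadsX  # threaded drop-in replacements for Base functions

# placeholder for expensive per-subject work
per_subject(i) = sum(sqrt, 1:50_000) + i

serial   = map(per_subject, 1:1_000)
threaded = ThreadsX.map(per_subject, 1:1_000)  # same result, computed in parallel

serial == threaded  # → true
```

Since `ThreadsX.map` preserves the order of the input collection, it is a direct swap for `map` whenever the mapped function has no side effects.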


This is really helpful folks! Appreciate that.

Sorry for being late to the party, but if any of you could try to replicate this on v2.4 (the latest version of Pumas), I would appreciate it. On an internal project we found a similar bottleneck and realized that we could improve the `DataFrame` constructor significantly, so it should now run much faster.
