Speeding up NCA on Simulations

I am performing NCA on simulations (1000 replicates) with varying numbers of subjects (7 NSubj values) and sampling scenarios (8). It runs successfully on 2 replicates, but at 100 replicates it fails to complete within 6 hours.

Is there a way to speed up the following code?

```julia
using CSV, DataFrames
# read_nca and the NCA module are assumed to be loaded already, as in the original script.
# NSubjs (the NSubj values) and samp_names (the sampling-scenario labels) are defined elsewhere in the script.

cum_sim_df = CSV.read("./modeling/Depo-SubQ Provera 104/abs1st_PK_sims_SD_adjust_params.csv", DataFrame, missingstrings = [""])

# One NCA run per (rep, NSubj, samp) combination; only 2 reps while testing.
cum_sim_NCAresults = map(Iterators.product(1:2, 1:length(NSubjs), 1:length(samp_names))) do i
    println(i[1])
    # Select the rows for this combination.
    i_sim_df = filter([:rep, :NSubj, :samp] => (x, y, z) -> x == i[1] && y == NSubjs[i[2]] && z == samp_names[i[3]], cum_sim_df)

    i_sim_df[!, :route] .= "ev"

    i_sim_NCA = read_nca(i_sim_df,
                        id              =   :id,
                        time            =   :time,
                        observations    =   :dv,
                        amt             =   :amt,
                        # every subject has the same grouping within one read_nca call; kept to retain the labels
                        group           =   [:rep, :NSubj, :samp],
                        route           =   :route)

    i_sim_tmax          =   NCA.tmax(i_sim_NCA)
    i_sim_cmax          =   NCA.cmax(i_sim_NCA)
    i_sim_auc           =   NCA.auc(i_sim_NCA)
    i_sim_auc_t         =   NCA.auc(i_sim_NCA, interval=(0, 90))
    # Pair each subject with its own tmax to compute the partial AUCs.
    subj_tmax           =   zip(i_sim_NCA, NCA.tmax.(i_sim_NCA))
    i_sim_auc_0_tmax    =   map(p -> NCA.auc(p[1], interval=(0, p[2])), subj_tmax)
    i_sim_auc_tmax_t    =   map(p -> NCA.auc(p[1], interval=(p[2], 90)), subj_tmax)

    temp_df             =   DataFrame(auc_0_tmax = i_sim_auc_0_tmax, auc_tmax_t = i_sim_auc_tmax_t)
    i_sim_NCAresults_df =   hcat(i_sim_tmax, i_sim_cmax, i_sim_auc, i_sim_auc_t, temp_df, makeunique=true)

    return i_sim_NCAresults_df
end

cum_sim_NCAresults_df   =   vcat(cum_sim_NCAresults...)
```

Hi Donald,

Without knowing how your dataset is structured it is hard to say, but my first suggestion would be to break the function into smaller parts and then compose them, like this:

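```julia
# nca_prep, rnca and compute_results are placeholder names for functions you would write yourself.
function do_nca(sim)
    ncadf = nca_prep(sim)         # select and prepare the data for one simulation
    nca = rnca(ncadf)             # wrap the read_nca call
    res = compute_results(nca)    # the NCA metrics you need
    return res
end
```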

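The biggest slowdown in your code above is probably the filter statement: for every (rep, NSubj, samp) combination it scans the entire cum_sim_df, so with 100 reps that is 100 × 7 × 8 = 5,600 full passes over the table. Having a separate nca_prep function makes that step easier to measure and optimize on its own. You may also want to think through whether there are alternate ways of passing the data in.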
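A minimal sketch of one such alternative, assuming the column names from the question (:rep, :NSubj, :samp): group the table once with DataFrames' `groupby`, then look each combination up by key. The existing product loop can stay, but it no longer rescans the whole table. The helper name `select_combination` and the toy data are hypothetical.

```julia
using DataFrames

# Hypothetical helper: fetch one (rep, NSubj, samp) combination from a pre-grouped table.
# Indexing a GroupedDataFrame with a NamedTuple key is a lookup, not a scan of every row.
function select_combination(gdf::GroupedDataFrame, rep, nsubj, samp)
    sub = DataFrame(gdf[(rep = rep, NSubj = nsubj, samp = samp)])  # copy the SubDataFrame
    sub[!, :route] .= "ev"                                          # safe: sub is a fresh copy
    return sub
end

# Toy stand-in for cum_sim_df (hypothetical values).
sim = DataFrame(rep   = repeat(1:4, inner = 12),
                NSubj = repeat([10, 20, 30], 16),
                samp  = repeat(["a", "b"], 24),
                id    = 1:48,
                dv    = rand(48))

gdf = groupby(sim, [:rep, :NSubj, :samp])   # built once, outside the loop

sub = select_combination(gdf, 2, 20, "b")
```

Inside the question's `map(...) do i` loop, the `filter` line would become a call like `select_combination(gdf, i[1], NSubjs[i[2]], samp_names[i[3]])`.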

rnca would just call the read_nca function to get the data ready, and compute_results would do all of your specific NCA-related computations.
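A sketch of how the three pieces might look for this question, assuming each call receives the rows of a single (rep, NSubj, samp) combination (for example a SubDataFrame from `groupby`). The read_nca keywords and NCA metrics are copied from the question; the function names are the placeholders from the outline above. Only `nca_prep` runs without the NCA package loaded.

```julia
using DataFrames
# rnca and compute_results call read_nca / NCA.* exactly as the question does; the NCA package
# must be loaded before they are *called*. Defining them here does not require it.

# Prepare one combination: copy it so the full table is untouched, then add the route column.
function nca_prep(sim::AbstractDataFrame)
    ncadf = DataFrame(sim)
    ncadf[!, :route] .= "ev"
    return ncadf
end

# Build the NCA population with the same keywords used in the question.
function rnca(ncadf::DataFrame)
    return read_nca(ncadf,
                    id           = :id,
                    time         = :time,
                    observations = :dv,
                    amt          = :amt,
                    group        = [:rep, :NSubj, :samp],
                    route        = :route)
end

# The NCA metrics from the question, collected into one DataFrame.
function compute_results(nca)
    subj_tmax = collect(zip(nca, NCA.tmax.(nca)))   # (subject, tmax) pairs
    partial = DataFrame(auc_0_tmax = [NCA.auc(s, interval = (0, t)) for (s, t) in subj_tmax],
                        auc_tmax_t = [NCA.auc(s, interval = (t, 90)) for (s, t) in subj_tmax])
    return hcat(NCA.tmax(nca), NCA.cmax(nca), NCA.auc(nca),
                NCA.auc(nca, interval = (0, 90)), partial, makeunique = true)
end

do_nca(sim) = compute_results(rnca(nca_prep(sim)))
```

Keeping the stages separate means each one can be timed or profiled on a single combination before running the full set.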

We can help more if you share some example data or a mock-up of the structure you are passing in.

Vijay

It would be useful to know where the time is spent. If you define a normal (named) function instead of using the do syntax, which generates an anonymous closure, then you can profile that function on a single subject; see the FlameGraphs.jl documentation. Alternatively, you might be able to just use TimerOutputs.jl (KristofferC/TimerOutputs.jl), which prints formatted output of timed sections in Julia.
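A minimal sketch of the TimerOutputs approach, using toy data and a hypothetical named function `process_one` in place of the do-block. Only the filtering and preparation steps are timed so the example runs without NCA; the read_nca and metric calls would be wrapped the same way.

```julia
using DataFrames, TimerOutputs

const to = TimerOutput()

# A named function (rather than an anonymous do-block) can be timed and profiled on its own.
function process_one(df, rep, nsubj, samp)
    # @timeit returns the value of the timed expression, so it can sit on the right of `=`.
    sub = @timeit to "filter" filter([:rep, :NSubj, :samp] =>
                                     (x, y, z) -> x == rep && y == nsubj && z == samp, df)
    @timeit to "prep" begin
        sub[!, :route] .= "ev"
    end
    # The NCA stages would be timed the same way, e.g.
    # nca = @timeit to "read_nca" read_nca(sub, ...)
    return sub
end

# Toy stand-in for cum_sim_df (hypothetical values).
df = DataFrame(rep   = repeat(1:20, inner = 60),
               NSubj = repeat([10, 20, 30], 400),
               samp  = repeat(["a", "b"], 600),
               dv    = rand(1200))

for rep in 1:20, nsubj in (10, 20, 30), samp in ("a", "b")
    process_one(df, rep, nsubj, samp)
end

show(to)   # table of call counts, time and allocations per labelled section
```

The printed table shows directly which stage dominates, which is more reliable than guessing.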

By the way, isn’t the filtering essentially a groupby operation? If so, using groupby might also be more efficient.
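A minimal sketch of the groupby version, assuming the question's column names. `analyze_group` is a hypothetical stand-in for the per-combination NCA work (in the real code it would prepare the group, call read_nca and compute the metrics); here it returns a simple summary so the example runs on its own. `reduce(vcat, ...)` is used instead of splatting thousands of DataFrames into `vcat`.

```julia
using DataFrames

# Hypothetical stand-in for the NCA step. In real use it would call read_nca on the group
# (converting with DataFrame(sdf) first if a plain DataFrame is needed) and compute tmax, cmax, AUCs.
function analyze_group(sdf::AbstractDataFrame)
    return DataFrame(rep = first(sdf.rep), NSubj = first(sdf.NSubj), samp = first(sdf.samp),
                     nobs = nrow(sdf), maxdv = maximum(sdf.dv))
end

function run_all(df::AbstractDataFrame, analyze)
    gdf = groupby(df, [:rep, :NSubj, :samp])   # one pass over the table builds every group
    results = [analyze(sdf) for sdf in gdf]    # each sdf is a SubDataFrame view, not a copy
    return reduce(vcat, results)               # efficient concatenation of many DataFrames
end

# Toy stand-in for cum_sim_df (hypothetical values).
sim = DataFrame(rep   = repeat(1:20, inner = 60),
                NSubj = repeat([10, 20, 30], 400),
                samp  = repeat(["a", "b"], 600),
                dv    = rand(1200))
sim[!, :route] .= "ev"    # add constant columns once, before grouping

res = run_all(sim, analyze_group)
```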