Hi there, is there a simple one-line way to output multiple summary statistics by group, each as its own column, in Pumas?
For example, I have the following code and output:
```julia
using Distributions
using Random
using Statistics
using DataFramesMeta

subjs = [randstring("abc123") for i in 1:30]
z = DataFrame(
    CL = rand(LogNormal(2, 1), 90),
    trialID = vcat(fill("trial1", 30), fill("trial2", 30), fill("trial3", 30)),
    subjID = vcat(fill(subjs, 3)...))

@chain z begin
    groupby([:trialID])
    @combine :X = vcat(quantile(:CL, [0.1, 0.25, 0.5, 0.75, 0.9]), mean(:CL))
end
```
Current output:
```
18×2 DataFrame
 Row │ trialID  X
     │ String   Float64
─────┼───────────────────
   1 │ trial1    2.315
   2 │ trial1    3.18237
   3 │ trial1    6.43796
  ⋮  │    ⋮        ⋮
  17 │ trial3   17.3299
  18 │ trial3    9.49089
                13 rows omitted
```
Expected output:
```
3×7 DataFrame
 Row │ trialID  X1       X2       X3       X4       X5       X6
     │ String   Float64  Float64  Float64  Float64  Float64  Float64
─────┼────────────────────────────────────────────────────────────────
   1 │ trial1   2.315    3.18237  6.43796  11.0749  12.9025   8.76228
   2 │ trial2   1.39964  2.39561  7.55012  15.5998  24.8026  11.8699
   3 │ trial3   1.7409   2.49885  6.60066  12.9614  17.3299   9.49089
```
I can get each quantile and the mean as a separate column by writing one line per statistic (code below), but I am looking for a less redundant way of doing it.
```julia
@chain z begin
    groupby([:trialID])
    @combine begin
        :X1 = quantile(:CL, 0.1)
        :X2 = quantile(:CL, 0.25)
        :X3 = quantile(:CL, 0.5)
        :X4 = quantile(:CL, 0.75)
        :X5 = quantile(:CL, 0.9)
        :X6 = mean(:CL)
    end
end
```
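One possible less redundant approach, sketched under the assumption that calling DataFrames.jl's `combine` directly (DataFramesMeta re-exports DataFrames and `@chain`) is acceptable: have a helper return a single `NamedTuple` per group and use `AsTable` as the destination, so each field becomes its own column instead of its own row. The names `PROBS` and `cl_summary` are hypothetical, introduced only for this sketch; this is not a Pumas-specific API.

```julia
using Distributions
using Random
using Statistics
using DataFramesMeta  # re-exports DataFrames (DataFrame, combine, AsTable) and @chain

# Same simulated data as in the question
subjs = [randstring("abc123") for i in 1:30]
z = DataFrame(
    CL = rand(LogNormal(2, 1), 90),
    trialID = vcat(fill("trial1", 30), fill("trial2", 30), fill("trial3", 30)),
    subjID = vcat(fill(subjs, 3)...),
)

# Hypothetical: the quantile levels to report
const PROBS = [0.1, 0.25, 0.5, 0.75, 0.9]

# Hypothetical helper: the quantiles followed by the mean, packed into one
# NamedTuple with fields X1, X2, ..., so a group yields one row, not six
function cl_summary(x)
    vals = vcat(quantile(x, PROBS), mean(x))
    names = Tuple(Symbol.("X", 1:length(vals)))
    return NamedTuple{names}(Tuple(vals))
end

# `=> AsTable` spreads the NamedTuple fields into separate columns
result = @chain z begin
    groupby(:trialID)
    combine(:CL => cl_summary => AsTable)
end
```

With this design, editing `PROBS` adds or removes quantile columns without touching the `combine` call; the mean always ends up in the last `X` column.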