Hello,
I have a DataFrame and I want to take the average of specific sections of the columns.
I could use df = mean.(eachcol(_)) to average each column in full, but I want to average only those values in a column that share the same sigma value.
Here is what I want the final DataFrame to look like:
The relevant part of the documentation is the Split-apply-combine section of the DataFrames.jl manual. You are looking for a groupby operation together with combine, i.e. something like:
julia> using DataFrames, Statistics  # mean comes from Statistics
julia> df = crossjoin(DataFrame(σ=0.1:0.1:0.3), DataFrame(x1=randn(5), x2=rand(5)))
15×3 DataFrame
 Row │ σ        x1          x2
     │ Float64  Float64     Float64
─────┼────────────────────────────────
   1 │     0.1  -0.0615925  0.86128
   2 │     0.1  -1.99798    0.424245
   3 │     0.1  -0.733117   0.0715064
   4 │     0.1   0.0684695  0.135994
   5 │     0.1   0.638693   0.113608
   6 │     0.2  -0.0615925  0.86128
   7 │     0.2  -1.99798    0.424245
   8 │     0.2  -0.733117   0.0715064
   9 │     0.2   0.0684695  0.135994
  10 │     0.2   0.638693   0.113608
  11 │     0.3  -0.0615925  0.86128
  12 │     0.3  -1.99798    0.424245
  13 │     0.3  -0.733117   0.0715064
  14 │     0.3   0.0684695  0.135994
  15 │     0.3   0.638693   0.113608
julia> combine(groupby(df, "σ"), Not("σ") .=> mean)
3×3 DataFrame
 Row │ σ        x1_mean    x2_mean
     │ Float64  Float64    Float64
─────┼──────────────────────────────
   1 │     0.1  -0.417105  0.321327
   2 │     0.2  -0.417105  0.321327
   3 │     0.3  -0.417105  0.321327
or, selecting the non-grouping columns with valuecols:
julia> gdf = groupby(df, "σ");
julia> combine(gdf, valuecols(gdf) .=> mean)
3×3 DataFrame
 Row │ σ        x1_mean    x2_mean
     │ Float64  Float64    Float64
─────┼──────────────────────────────
   1 │     0.1  -0.417105  0.321327
   2 │     0.2  -0.417105  0.321327
   3 │     0.3  -0.417105  0.321327
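If you would rather keep the original column names, or only average some of the columns, the same groupby/combine pattern can be adapted. The following is a minimal sketch, assuming a recent DataFrames.jl release where combine accepts the renamecols keyword. The table values and the output name x1_avg are made up for illustration and are not from the data above.

```julia
using DataFrames, Statistics

# Small deterministic table (illustrative values) with two σ groups.
df = DataFrame(σ  = [0.1, 0.1, 0.2, 0.2],
               x1 = [1.0, 3.0, 10.0, 20.0],
               x2 = [0.0, 2.0, 4.0, 8.0])

gdf = groupby(df, :σ)

# renamecols=false keeps the names x1 and x2 instead of x1_mean and x2_mean.
means = combine(gdf, valuecols(gdf) .=> mean; renamecols = false)

# Average a single column and choose the output column name explicitly.
x1_only = combine(gdf, :x1 => mean => :x1_avg)
```

Here means has one row per distinct σ, in order of first appearance, and x1_only contains only σ and x1_avg.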