As per the title, say I've run
res = cluster_subjects(pop_df, :dv, 2);
and see something like this:
julia> res.assignments
10×4 DataFrame
 Row │ subject  cluster  cost      cluster_center
     │ String   Int64    Float64   Int64
─────┼─────────────────────────────────────────────
   1 │ c1403          1   5.25774               7
   2 │ c1406          1   32.7788               7
   3 │ c1409          1    4.9941               7
   4 │ c1411          1   13.8512               7
   5 │ c1413          2       0.0               5
   6 │ c15023         1   6.81329               7
   7 │ c15025         1       0.0               7
   8 │ c15027         2  0.345323               5
   9 │ c15029         1   4.26129               7
  10 │ c15031         1   5.39411               7
What distance function does DTW use to calculate the cost between a subject and the medoid? I want to run some analyses on top of this to validate the clustering (silhouette values, etc.), and for consistency I would like to use the same distance function in DynamicAxisWarping.dtw that was used in fitting. Are there any plans to allow a user-defined DTW cost function in the future?
While using the cluster_subjects functionality, another question came up (I might add more if more things pop up; if you'd rather I create a new post, please let me know). Why does medoids(ClusterResults) return an unordered array of medoids? That is, the first entry isn't necessarily the medoid of the first cluster, which is a bit confusing and not what I would have expected.
julia> res
DeepPumas.ClusterResults(10×4 DataFrame
 Row │ subject  cluster  cost       cluster_center
     │ String   Int64    Float64    Int64
─────┼─────────────────────────────────────────────
   1 │ c1403          2    1.44789              10
   2 │ c1406          2    6.48461              10
   3 │ c1409          2   0.910235              10
   4 │ c1411          1        0.0               4
   5 │ c1413          3        0.0               5
   6 │ c15023         1    1.53086               4
   7 │ c15025         2    2.51494              10
   8 │ c15027         3   0.171713               5
   9 │ c15029         2  0.0955006              10
  10 │ c15031         2        0.0              10, 2, true)
julia> medoids(res)
3-element Vector{Int64}:
10
4
5
Hi, Domas.
Answering both questions:
- The default distance is SqEuclidean. You can change it with the dist kwarg.
- medoids(ClusterResults) calls unique on clusterResults.assignments.cluster_center, so the vector is ordered by the first appearance of each cluster center in that DataFrame column. This can be verified by comparing the cluster_center column with the medoids(res) output shown in the question.
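To make the first point concrete, here is a minimal sketch of a DTW cost built on squared-Euclidean pointwise differences, written in plain Julia. It is an illustration under assumptions, not the DeepPumas or DynamicAxisWarping code: the function name dtw_sqeuclidean is hypothetical, the series are univariate, and there is no windowing or transport-cost adjustment, so the values need not match the cost column exactly.

```julia
# Illustrative DTW with a squared-Euclidean pointwise cost.
# Hypothetical helper; NOT the DeepPumas or DynamicAxisWarping implementation.
function dtw_sqeuclidean(a::AbstractVector{<:Real}, b::AbstractVector{<:Real})
    n, m = length(a), length(b)
    # D[i + 1, j + 1] is the cheapest cumulative cost of aligning a[1:i] with b[1:j]
    D = fill(Inf, n + 1, m + 1)
    D[1, 1] = 0.0
    for i in 1:n, j in 1:m
        step = (a[i] - b[j])^2  # squared Euclidean distance between two scalars
        # Extend the cheapest of the three allowed predecessor alignments
        D[i + 1, j + 1] = step + min(D[i, j], D[i, j + 1], D[i + 1, j])
    end
    return D[n + 1, m + 1]
end

a = [0.0, 1.0, 2.0]
b = [0.0, 1.0, 1.0, 2.0]       # same shape as `a`, with one sample repeated

dtw_sqeuclidean(a, a)          # 0.0: a series matches itself
dtw_sqeuclidean(a, b)          # 0.0: warping absorbs the repeated sample
dtw_sqeuclidean([0.0], [2.0])  # 4.0 = (0 - 2)^2
```

A series has zero cost against itself, which agrees with the cost of 0.0 shown for the medoid rows in the question.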
Hope this helps.
Lucas.
Hi Lucas,
Many thanks for the reply!
- Ah, the dist kwarg didn't seem to be listed in the docs, many thanks for clarifying!
- My question was less about what it does and more about why it does that. I found it somewhat confusing and counterintuitive that the medoids are not ordered by cluster number (with the medoid of the first cluster as the first element of the returned medoids).
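In the meantime, one way to get the medoids ordered by cluster number is to build the mapping from the assignments table directly. A minimal sketch in plain Julia, where the two vectors are copied from the three-cluster output above as stand-ins for res.assignments.cluster and res.assignments.cluster_center, and the names medoid_of and ordered_medoids are only illustrative:

```julia
# Stand-ins for res.assignments.cluster and res.assignments.cluster_center,
# copied from the three-cluster output above.
cluster        = [2, 2, 2, 1, 3, 1, 2, 3, 2, 2]
cluster_center = [10, 10, 10, 4, 5, 4, 10, 5, 10, 10]

# Order of first appearance, which is what the reply describes medoids(res) returning
first_seen = unique(cluster_center)   # [10, 4, 5]

# Map each cluster number to its medoid, then read the medoids off in cluster order
medoid_of = Dict(zip(cluster, cluster_center))
ordered_medoids = [medoid_of[k] for k in sort(collect(keys(medoid_of)))]   # [4, 10, 5]
```

With this, ordered_medoids[k] is the medoid row of cluster k, regardless of the order in which the subjects appear.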