What distance function is used in `DeepPumas.cluster_subjects` DTW?

As per the title: say I’ve run

res = cluster_subjects(pop_df, :dv, 2);

and see something like

julia> res.assignments
10×4 DataFrame
 Row │ subject  cluster  cost       cluster_center 
     │ String   Int64    Float64    Int64
─────┼─────────────────────────────────────────────
   1 │ c1403          1   5.25774                7
   2 │ c1406          1  32.7788                 7
   3 │ c1409          1   4.9941                 7
   4 │ c1411          1  13.8512                 7
   5 │ c1413          2   0.0                    5
   6 │ c15023         1   6.81329                7
   7 │ c15025         1   0.0                    7
   8 │ c15027         2   0.345323               5
   9 │ c15029         1   4.26129                7
  10 │ c15031         1   5.39411                7

What distance function does DTW use to calculate the cost between a subject and its medoid? I want to run some analyses on top of this to validate the clustering (silhouette values, etc.), and for consistency I would like to use the same distance function in DynamicAxisWarping.dtw that was used in fitting. Are there any plans to allow a user-defined DTW cost function in the future?

While using cluster_subjects, another question came up (I might add more if other things pop up; if you’d rather I create a new post, please let me know). Why does medoids(ClusterResults) return the medoids in an order that does not follow the cluster numbers? The first entry isn’t necessarily the medoid of the first cluster, which is a bit confusing and not what I would have expected.

julia> res
DeepPumas.ClusterResults(10×4 DataFrame
 Row │ subject  cluster  cost       cluster_center 
     │ String   Int64    Float64    Int64
─────┼─────────────────────────────────────────────
   1 │ c1403          2  1.44789                10
   2 │ c1406          2  6.48461                10
   3 │ c1409          2  0.910235               10
   4 │ c1411          1  0.0                     4
   5 │ c1413          3  0.0                     5
   6 │ c15023         1  1.53086                 4
   7 │ c15025         2  2.51494                10
   8 │ c15027         3  0.171713                5
   9 │ c15029         2  0.0955006              10
  10 │ c15031         2  0.0                    10, 2, true)

julia> medoids(res)
3-element Vector{Int64}:
 10
  4
  5

Hi, Domas.

Answering both questions:

  1. The default distance is SqEuclidean. You can change it with the dist kwarg.
  2. medoids(ClusterResults) calls unique on clusterResults.assignments.cluster_center, so the vector is ordered by first appearance of each cluster center in that dataframe’s column. You can verify this by comparing the cluster_center column with the medoids(res) output shown in the question.
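
To make point 1 a bit more concrete, here is a minimal sketch of what a DTW cost with a squared-Euclidean pointwise distance looks like. It is hand-rolled plain Julia for illustration only, not the DeepPumas or DynamicAxisWarping code, and the name dtw_sqeuclidean is made up here. Details such as windowing or normalization inside cluster_subjects are not covered and may differ.

```julia
# Illustrative DTW with a squared-Euclidean local cost.
# Hypothetical helper: NOT the implementation used by cluster_subjects.
function dtw_sqeuclidean(a::AbstractVector{<:Real}, b::AbstractVector{<:Real})
    n, m = length(a), length(b)
    # D[i + 1, j + 1] is the cheapest cost of aligning a[1:i] with b[1:j].
    D = fill(Inf, n + 1, m + 1)
    D[1, 1] = 0.0
    for i in 1:n, j in 1:m
        step_cost = (a[i] - b[j])^2  # squared-Euclidean distance between two points
        D[i + 1, j + 1] = step_cost + min(D[i, j], D[i, j + 1], D[i + 1, j])
    end
    return D[n + 1, m + 1]
end

x = [0.0, 1.0, 2.0, 1.0]
z = [0.0, 2.0, 2.0, 1.0]

cost_self = dtw_sqeuclidean(x, x)   # a series compared with itself costs 0.0
cost_other = dtw_sqeuclidean(x, z)
```

A series compared with itself has zero cost, which matches the 0.0 entries in the cost column for the subjects that are their own cluster_center.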

Hope this helps.

Lucas.

Hi Lucas,

Many thanks for the reply!

  1. Ah, the dist kwarg didn’t seem to be listed in the docs. Many thanks for clarifying!
  2. My question was less about what it does and more about why it does that. I found it confusing and counterintuitive that the medoids are not ordered by cluster number (with the medoid of the first cluster as the first element of the returned vector).
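
For anyone who needs the medoids in cluster order in the meantime, a possible workaround is to rebuild the vector from the cluster and cluster_center columns. The sketch below uses plain vectors copied from the three-cluster output above as a stand-in for res.assignments.cluster and res.assignments.cluster_center; the variable names are only for illustration.

```julia
# Stand-ins for res.assignments.cluster and res.assignments.cluster_center,
# copied from the three-cluster output shown earlier in the thread.
cluster        = [2, 2, 2, 1, 3, 1, 2, 3, 2, 2]
cluster_center = [10, 10, 10, 4, 5, 4, 10, 5, 10, 10]

# Map each cluster number to its medoid (every row of a cluster carries the same center).
medoid_of = Dict(zip(cluster, cluster_center))

# Medoids ordered by cluster number: entry k is the medoid of cluster k.
ordered_medoids = [medoid_of[k] for k in sort(collect(keys(medoid_of)))]
```

With this ordering, ordered_medoids[k] is the row index of the medoid of cluster k, whereas unique(cluster_center) gives them in order of first appearance.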