CSV.read times with multithreading

Running Julia from a terminal:

julia> using CSV, DataFrames
julia> Threads.nthreads()
2
julia> CSV.write("test.csv", DataFrame(rand(10_000, 30), :auto));
julia> @time CSV.read("test.csv", DataFrame);
  0.414442 seconds (1.25 M allocations: 87.230 MiB, 1.07% gc time, 164.29% compilation time)
julia> @time CSV.read("test.csv", DataFrame);
  0.009756 seconds (12.57 k allocations: 2.717 MiB)

and with julia --threads=auto

julia> Threads.nthreads()
8
julia> using CSV, DataFrames
julia> @time CSV.read("test.csv", DataFrame);
  0.464540 seconds (1.40 M allocations: 97.160 MiB, 4.46% gc time, 690.06% compilation time)
julia> @time CSV.read("test.csv", DataFrame);
  0.005446 seconds (16.48 k allocations: 2.821 MiB)

In VSCode with Pumas-2.4.1.app (Pumas for Desktop on an academic license)

julia> using CSV, DataFrames
julia> Threads.nthreads()
2
julia> CSV.write("test.csv", DataFrame(rand(10_000, 30), :auto));
julia> @time CSV.read("test.csv", DataFrame);
  2.442001 seconds (2.85 M allocations: 188.613 MiB, 1.72% gc time, 186.92% compilation time: 10% of which was recompilation)
julia> @time CSV.read("test.csv", DataFrame);
  0.019704 seconds (12.75 k allocations: 2.719 MiB)

And when I set "julia.NumThreads": 8 in settings.json, @time CSV.read("test.csv", DataFrame); appears to hang: I have been waiting for several minutes and nothing happens.

Am I doing something wrong, or is there another reason for the difference in read times in the Pumas environment?

@pascalschulthess can you give us more information about the system you are running?
Which Pumas version, OS, standalone Julia version, and so on?
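Something along these lines would collect most of that in one go. It is only a minimal sketch using Julia's standard libraries; run it once in the standalone REPL and once in the Pumas REPL and compare the output:

```julia
using InteractiveUtils  # provides versioninfo(); loaded by default in the REPL
using Pkg

versioninfo()                              # Julia version, OS, CPU, thread settings
println("Threads: ", Threads.nthreads())   # number of threads Julia was started with
println("Arch:    ", Sys.ARCH)             # architecture this Julia binary was built for
Pkg.status()                               # package versions in the active environment
```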

I cannot reproduce that locally.

Pumas:

julia> using CSV, DataFrames

julia> CSV.write("test.csv", DataFrame(rand(10_000, 30), :auto));

julia> @time CSV.read("test.csv", DataFrame);
  0.559291 seconds (1.33 M allocations: 94.718 MiB, 3.04% gc time, 372.51% compilation time: 24% of which was recompilation)

julia> @time CSV.read("test.csv", DataFrame);
  0.008426 seconds (14.14 k allocations: 2.720 MiB)

Julia (standalone):

julia> using CSV, DataFrames

julia> CSV.write("test.csv", DataFrame(rand(10_000, 30), :auto));

julia> @time CSV.read("test.csv", DataFrame);
  0.400417 seconds (1.23 M allocations: 85.394 MiB, 1.42% gc time, 327.69% compilation time)

julia> @time CSV.read("test.csv", DataFrame);
  0.009028 seconds (14.08 k allocations: 2.716 MiB)
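
To check whether multithreaded parsing is involved at all, one option is to compare the default read against a single-task read. This is only a sketch: it relies on the ntasks keyword of CSV.read, and the variable names df_single and df_default are made up for illustration. Whether CSV.jl actually splits a file of this size across tasks is an assumption I have not verified.

```julia
using CSV, DataFrames

CSV.write("test.csv", DataFrame(rand(10_000, 30), :auto));

# Warm-up calls so the timings below do not include compilation.
CSV.read("test.csv", DataFrame; ntasks=1);
CSV.read("test.csv", DataFrame);

# ntasks=1 asks CSV.jl to parse the file on a single task.
@time df_single = CSV.read("test.csv", DataFrame; ntasks=1);

# Default: CSV.jl decides how many parsing tasks to use.
@time df_default = CSV.read("test.csv", DataFrame);
```

If the single-task read returns promptly with 8 threads while the default read does not, that would point at the threaded code path rather than at CSV parsing itself.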

I’m on a Mac M1 Pro running macOS 13.5.2, Julia 1.9.3, Pumas 2.4.1, VSCode 1.82.2, and Julia for VSCode 1.51.2.

I think that Pumas Desktop runs under Rosetta rather than natively on Apple Silicon.

Are you running the Apple Silicon build of Julia?
Can you check whether Pumas runs under Rosetta?

That would explain the performance difference…
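
A quick way to check from inside each REPL is to look at Sys.ARCH, which reports the architecture the running Julia binary was built for. The helper name describe_arch below is made up for illustration; this is a sketch, not an official Pumas diagnostic.

```julia
# On an Apple Silicon Mac, :aarch64 means a native build, while :x86_64 means
# an Intel build, which can only run there through Rosetta 2.
function describe_arch()
    if Sys.isapple() && Sys.ARCH === :x86_64
        return "x86_64 build (runs under Rosetta 2 on an Apple Silicon Mac)"
    elseif Sys.ARCH === :aarch64
        return "native aarch64 build"
    else
        return "$(Sys.ARCH) build"
    end
end

println(describe_arch())
```

If the standalone REPL reports aarch64 and the Pumas REPL reports x86_64, the two sessions are not running the same kind of binary.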

Yes. That’s the case. Cool. How do I change that?

Unfortunately, it is currently not possible to run PumasDesktop on Apple Silicon without Rosetta emulation.

That’s indeed unfortunate.