CSV.read times with multithreading

pascalschulthess · September 18, 2023, 4:11pm

Running julia from a terminal:

julia> using CSV, DataFrames
julia> Threads.nthreads()
2
julia> CSV.write("test.csv", DataFrame(rand(10_000, 30), :auto));
julia> @time CSV.read("test.csv", DataFrame);
  0.414442 seconds (1.25 M allocations: 87.230 MiB, 1.07% gc time, 164.29% compilation time)
julia> @time CSV.read("test.csv", DataFrame);
  0.009756 seconds (12.57 k allocations: 2.717 MiB)

and with julia --threads=auto

julia> Threads.nthreads()
8
julia> using CSV, DataFrames
julia> @time CSV.read("test.csv", DataFrame);
  0.464540 seconds (1.40 M allocations: 97.160 MiB, 4.46% gc time, 690.06% compilation time)
julia> @time CSV.read("test.csv", DataFrame);
  0.005446 seconds (16.48 k allocations: 2.821 MiB)

In VSCode with Pumas-2.4.1.app (Pumas for Desktop on an academic license)

julia> using CSV, DataFrames
julia> Threads.nthreads()
2
julia> CSV.write("test.csv", DataFrame(rand(10_000, 30), :auto));
julia> @time CSV.read("test.csv", DataFrame);
  2.442001 seconds (2.85 M allocations: 188.613 MiB, 1.72% gc time, 186.92% compilation time: 10% of which was recompilation)
julia> @time CSV.read("test.csv", DataFrame);
  0.019704 seconds (12.75 k allocations: 2.719 MiB)

However, when I set "julia.NumThreads": 8 in settings.json, @time CSV.read("test.csv", DataFrame); appears to hang: I have been waiting several minutes and nothing has happened.

Am I doing something wrong, or is there another reason for the difference in read times in the Pumas environment?
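
One way to narrow this down is to check whether the slowdown is tied to CSV.jl's concurrent parsing. The sketch below is only a diagnostic idea, not a confirmed fix: it assumes a CSV.jl version that accepts the `ntasks` keyword, and the temporary path and the variable names are made up for illustration. It warms up each call first so that compilation time is excluded, then times a single-task read against the default.

```julia
using CSV, DataFrames

# Write the same kind of test file as above into a temporary directory.
path = joinpath(mktempdir(), "test.csv")
CSV.write(path, DataFrame(rand(10_000, 30), :auto))

# Warm-up reads: the first call of each variant includes compilation.
df_single = CSV.read(path, DataFrame; ntasks = 1)
df_default = CSV.read(path, DataFrame)

# Timed reads: single parsing task versus the task count CSV.jl chooses itself.
t_single = @elapsed CSV.read(path, DataFrame; ntasks = 1)
t_default = @elapsed CSV.read(path, DataFrame)

println("threads: ", Threads.nthreads())
println("ntasks = 1: ", t_single, " s")
println("default:    ", t_default, " s")
```

If the single-task read returns promptly while the default read stalls with 8 threads, that points at the multithreaded code path in this environment rather than at the file or at CSV.jl in general.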

storopoli · September 18, 2023, 6:26pm

@pascalschulthess can you give us more information on the system that you are running?
What Pumas version, OS, Julia (standalone) version and so on?

I cannot reproduce that locally.

Pumas:

julia> using CSV, DataFrames

julia> CSV.write("test.csv", DataFrame(rand(10_000, 30), :auto));

julia> @time CSV.read("test.csv", DataFrame);
  0.559291 seconds (1.33 M allocations: 94.718 MiB, 3.04% gc time, 372.51% compilation time: 24% of which was recompilation)

julia> @time CSV.read("test.csv", DataFrame);
  0.008426 seconds (14.14 k allocations: 2.720 MiB)

Julia (standalone):

julia> using CSV, DataFrames

julia> CSV.write("test.csv", DataFrame(rand(10_000, 30), :auto));

julia> @time CSV.read("test.csv", DataFrame);
  0.400417 seconds (1.23 M allocations: 85.394 MiB, 1.42% gc time, 327.69% compilation time)

julia> @time CSV.read("test.csv", DataFrame);
  0.009028 seconds (14.08 k allocations: 2.716 MiB)

pascalschulthess · September 18, 2023, 6:39pm

I’m on a MacBook with an M1 Pro running macOS 13.5.2, Julia 1.9.3, Pumas 2.4.1, VSCode 1.82.2, and Julia for VSCode 1.51.2.

storopoli · September 18, 2023, 8:10pm

I think that Pumas Desktop is running under Rosetta rather than natively on Apple Silicon.

Is your standalone Julia the native Apple Silicon build?
Can you check whether Pumas runs under Rosetta?

That would explain the performance difference…
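
A quick way to check this from inside each Julia session is to look at the architecture the running binary was built for. This is a minimal sketch: `Sys.ARCH` and `Sys.MACHINE` are part of Base, while the helper name `is_native_arm` is made up here for illustration.

```julia
# Sys.ARCH reports the architecture this Julia binary was built for.
# A native Apple Silicon build reports :aarch64; an Intel (x86_64) build,
# which macOS runs through Rosetta on an M1, reports :x86_64.
is_native_arm(arch::Symbol = Sys.ARCH) = arch === :aarch64

println("Sys.ARCH    = ", Sys.ARCH)
println("Sys.MACHINE = ", Sys.MACHINE)
println("native ARM build? ", is_native_arm())
```

Running it once in the terminal REPL and once in the Pumas REPL in VSCode shows whether the two sessions use different builds. `versioninfo()` prints the same platform information along with the Julia version.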

pascalschulthess · September 18, 2023, 8:14pm

Yes. That’s the case. Cool. How do I change that?

storopoli · September 18, 2023, 8:39pm

Unfortunately, it is currently not possible to run Pumas Desktop on Apple Silicon without Rosetta emulation.

pascalschulthess · September 19, 2023, 7:39am

That’s indeed unfortunate.
