Is parallel processing available for NLME parameter estimation in Pumas and for simulation from Pumas models? If so, how does it work for threading and for distributed processing?
For both Pumas Desktop and Pumas in the cloud (using JuliaHub), multithreading is on by default, and you don't need to do anything for `fit` and `infer(..., Bootstrap(...))`.
However, for `vpc` and `simobs`, the default behavior is single-threaded in order to guarantee absolute reproducibility. That means, if you want maximum reproducibility for these, you need to run them single-threaded (the default behavior).
Side note: this is because pseudo-random number generators (PRNGs) do not behave reproducibly in multithreaded operations, an issue that plagues all programming languages.
I've included links above to the docstrings, which show that the default value of the `ensemblealg` argument is `EnsembleThreads()` for functions that are multi-threaded by default, and `EnsembleSerial()` for those that are not.
If you want a function that accepts `ensemblealg` to run multi-threaded, just call it with `ensemblealg = EnsembleThreads()`.
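For example, here is a minimal sketch; `model`, `pop`, and `params` are placeholder names standing in for your Pumas model, `Population`, and parameter values:

```julia
using Pumas

# Placeholders: `model` is a Pumas model, `pop` a Population,
# `params` a NamedTuple of parameter values.

# Default for simobs: single-threaded, fully reproducible.
sims = simobs(model, pop, params; ensemblealg = EnsembleSerial())

# Opt in to multithreading for speed, at the cost of run-to-run
# reproducibility of the random draws.
sims_mt = simobs(model, pop, params; ensemblealg = EnsembleThreads())
```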
Distributed, also known as multi-process, parallelism is available in Pumas in the Cloud through the JuliaHub VSCode extension. I will defer to the JuliaHub Distributed Jobs documentation for more details. In Pumas, you would pass `ensemblealg = EnsembleDistributed()` to use this JuliaHub feature.
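As a rough sketch of what this looks like on a local machine (on JuliaHub the worker processes are provisioned for you; `model`, `pop`, and `params` are again placeholders):

```julia
using Distributed
addprocs(4)              # four local worker processes; JuliaHub manages this for you
@everywhere using Pumas  # every worker needs the code it will run

sims = simobs(model, pop, params; ensemblealg = EnsembleDistributed())
```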
@storopoli thank you. Multiple follow-up questions:

- Could you please explain the difference between `threaded` and `distributed`?
- How should I interpret a thread if I have an 8-core laptop?
- Can a user mix Distributed and Threaded?
Both are parallel, and both "spawn processes".
However, Threaded means that there is a master process that controls all the other threads by forking them, which is accomplished by the underlying machine's operating system. Hence, it is confined to that machine, and it can only use the memory and resources allocated to the "parent thread" on that machine.
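As a plain-Julia illustration (not Pumas-specific): all threads belong to one process and share its memory, so they can write into the same array directly.

```julia
using Base.Threads

results = zeros(Int, nthreads())

# Each thread fills its own slot of an array that lives in the
# shared memory of the single parent process.
@threads for i in 1:nthreads()
    results[i] = threadid()
end
```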
Whereas Distributed means the process follows a different routine: it sends an asynchronous request to another process (which could be located on the same machine or anywhere else in the world) and, since this is asynchronous, it returns a sort of promise, also called a `Future` in Julia.
That process is not tied to (i.e., not forked from) the "parent thread" and does not have the memory and resource restrictions of the "parent thread".
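A small sketch of that promise mechanism, using Julia's Distributed standard library:

```julia
using Distributed
addprocs(2)   # two worker processes on the local machine

# @spawnat returns immediately with a Future; the computation runs
# asynchronously on whichever worker the scheduler picks.
fut = @spawnat :any sum(rand(10^6))

fetch(fut)    # block until the worker finishes and return the result
```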
If this is dispatched to the same machine, the behavior is similar to threaded, but perhaps with a small overhead from managing the workers without forking them (forking is quite efficient in most operating systems).
Distributed enables us to use threads on other machines.
So we can have a parallel computation running on 1,001 threads, where one of them is one of the 8 available threads on our "8-core laptop", and the other thousand are on someone else's computers.
Note: most laptops have hyper-threading capabilities, whereby some fancy scheduling lets each core simulate the behavior of 2 threads. This means an 8-core laptop could have 16 threads, 2 per core. There is also a newer trend in which laptops have both "efficiency" and "power" cores. Efficiency cores run at a lower frequency (i.e., they consume less energy) and do not have hyper-threading. Power cores run at a higher frequency (consume more energy) and do have hyper-threading, but are only used when necessary. This makes the laptop very efficient for easy tasks like browsing the internet, yet powerful enough for big number crunching when needed.
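You can check what your Julia session sees:

```julia
# Logical CPU threads the operating system reports; on an 8-core
# hyper-threaded laptop this is typically 16.
Sys.CPU_THREADS

# Threads available to this Julia session, controlled by the
# --threads flag or the JULIA_NUM_THREADS environment variable.
Threads.nthreads()
```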