For the past 3-4 days, whenever I launch an instance, I get stuck on the page "We’re spinning up a machine for you. This page will automatically [reload] once it is ready." for hours, and the instance never actually spins up. The instance did spin up once during this period, but in the middle of the run it disconnected and went back to the spinning-up page. Is there a way around this issue? Thank you in advance!
Brooke, thanks for the report. The issue has been fixed. Can you please try again?
Thank you very much for the help!
The new instance did spin up relatively quickly and ran for about 10 minutes, then it crashed again and went to the connecting screen. I tried to stop the instance and launch a new one, and it has been stuck on the connecting screen for ~30 minutes.
Hi @Blangev1, could you please try launching a Pumas job again and let us know?
I did launch a new instance, and it spun up and even had a successful run. When I went to re-run the same file, it disconnected, the same issue occurred, and instances once again no longer spin up.
Here are a few details on the run in case they are in any way helpful:
- I am trying to run/launch the 8x64 instance.
- There is no error code. It displays the same message it would if the internet connection dropped or the session timed out (the reconnect or reload window message).
- If you attempt to reload, it brings you to the spinning-up "connect to Julia" page.
- It does not seem to be a memory issue, as it reaches a max of ~27 GiB for the full run (and it disconnected in the second run around the 23 GiB mark).
- It is failing at ~80% CPU utilization
- In the past this same file has run multiple times on more than one instance simultaneously.
Just following up, as I am still experiencing the same issue.
The only additional detail is that today I tried to launch 2 instances and they both crashed simultaneously.
We have received the following reply from the JuliaHub engineers:
From [this post] we know that the user is running a Julia script that presumably takes a lot of CPU resources and/or memory. Moments after running the script, the user notices that the Pumas IDE is not responding. The user then launches another Pumas IDE, but this one gets stuck in the “Submitted” state.
We were unable to reproduce this issue on our test cluster, but based on our understanding of the system we have come up with the following diagnosis.
All apps (PumasIDE, RStudio, etc.) that are run by a user are launched on an instance called a Cloudstation. Each user is dynamically assigned only one Cloudstation instance. Each app can scale within this instance, potentially taking up all the CPU/memory/disk resources. In [this post] the high CPU usage of the Pumas Julia script causes the code-server process, which runs in the same container, to become unresponsive, which results in a 502 response. The 502 response shows up as the waiting page in our system. When the user launches another PumasIDE, there are not enough free resources in the existing Cloudstation instance, and since we can have only one Cloudstation per user, the new PumasIDE remains in the “Submitted” state.
We recommend the following:
- Change Cloudstation preference to a larger instance with more CPU cores
- When PumasIDE is not responsive, it is advisable to wait or stop that IDE before starting a new one
- It is advisable to run Pumas scripts as batch jobs. Batch jobs will run on dedicated instances and so will not affect the user's Pumas IDE
- “Failed” Pumas IDEs will be cleaned up by the system so refreshing the page will not help. A Pumas IDE can become “Failed” if you tried to use more memory than what was available on the system, for example. You can check for the “Failed” state in the jobs list on JuliaHub.
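The diagnosis above can be illustrated with a toy model. This is only a sketch of the described behavior, not JuliaHub's actual scheduler: the `Cloudstation` class, its methods, and all resource numbers are hypothetical, it assumes "8x64" means 8 cores and 64 GiB, and it simplifies "apps scale within the instance" to fixed reservations.

```python
from dataclasses import dataclass, field


@dataclass
class Cloudstation:
    """Toy model of the single instance shared by all of one user's apps."""
    cpu_cores: int
    memory_gib: int
    apps: dict = field(default_factory=dict)  # app name -> (cores, GiB) in use

    def free(self):
        """Return (free cores, free GiB) on this instance."""
        used_cpu = sum(cores for cores, _ in self.apps.values())
        used_mem = sum(mem for _, mem in self.apps.values())
        return self.cpu_cores - used_cpu, self.memory_gib - used_mem

    def launch(self, name, cores, mem_gib):
        """Start an app if it fits; otherwise it waits in 'Submitted'."""
        free_cpu, free_mem = self.free()
        if cores <= free_cpu and mem_gib <= free_mem:
            self.apps[name] = (cores, mem_gib)
            return "Running"
        # Only one Cloudstation per user, so there is nowhere else to place it.
        return "Submitted"

    def stop(self, name):
        """Stop an app and release its resources."""
        self.apps.pop(name, None)


# Hypothetical numbers: an 8-core, 64 GiB instance with one CPU-heavy IDE.
station = Cloudstation(cpu_cores=8, memory_gib=64)
first = station.launch("ide-1", cores=7, mem_gib=27)
second = station.launch("ide-2", cores=2, mem_gib=4)  # only 1 core is free
station.stop("ide-1")
third = station.launch("ide-2", cores=2, mem_gib=4)   # fits after stopping ide-1
print(first, second, third)
```

In this model the second IDE stays in "Submitted" even though plenty of memory is free, because CPU is the exhausted resource. Stopping the unresponsive IDE first, or choosing a larger instance, lets the new one start.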