VASP Job Failure with Soft Lockup on HPC Nodes | Vasp5.4.4


n_sukumar1
Newbie
Posts: 1
Joined: Sat May 14, 2022 7:11 am

#1 Post by n_sukumar1 » Tue Nov 12, 2024 9:02 am

We are experiencing persistent issues with VASP 5.4.4 compiled with Intel oneAPI on our HPC system. Compute nodes frequently go down, causing jobs to fail or terminate unexpectedly. Below is a typical error message from the kernel logs:

[1822049.022978] watchdog: BUG: soft lockup - CPU#101 stuck for 22s! [vasp_std:1145679]

Has anyone encountered similar issues with VASP (particularly version 5.4.4) on systems with high core counts?
Could this problem be related to how Intel oneAPI handles MPI or thread scheduling?
What are the recommended compilation flags or settings for stability and performance on systems with high CPU core counts?

Troubleshooting Steps Taken:

We've tried modifying runtime parameters to reduce CPU load but still encounter the issue sporadically.
Standard diagnostic tools have not indicated memory leaks, but we do observe spikes in CPU utilization.

Seeking Input:

Any insights on how to adjust the VASP configuration, compilation options, or node settings to avoid soft lockups would be greatly appreciated. Suggestions for kernel tuning or Intel oneAPI settings that could mitigate this would also be helpful.
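
For anyone trying to narrow down a problem like this, it can help to tally how often the watchdog message appears and which CPUs and processes it names. The following is only a minimal Python sketch: the regular expression is modelled on the single log line quoted above, and the function names (`parse_soft_lockups`, `summarize`) are hypothetical helpers, not part of VASP or any kernel tooling. The `watchdog: ` prefix is treated as optional, on the assumption that some kernels print the message without it.

```python
import re
import sys
from collections import Counter

# Modelled on the line quoted in this thread:
# [1822049.022978] watchdog: BUG: soft lockup - CPU#101 stuck for 22s! [vasp_std:1145679]
# Assumption: the "watchdog: " prefix may be absent on some kernels.
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?:watchdog: )?BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<proc>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(lines):
    """Return one dict per soft-lockup line found in an iterable of log lines."""
    events = []
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            events.append({
                "timestamp": float(m["ts"]),      # seconds since boot
                "cpu": int(m["cpu"]),
                "stuck_seconds": int(m["secs"]),
                "process": m["proc"],
                "pid": int(m["pid"]),
            })
    return events


def summarize(events):
    """Count soft-lockup events per CPU and per process name."""
    return {
        "by_cpu": Counter(e["cpu"] for e in events),
        "by_process": Counter(e["process"] for e in events),
    }


if __name__ == "__main__":
    # Reads kernel log text from standard input.
    summary = summarize(parse_soft_lockups(sys.stdin))
    print("events per CPU:", dict(summary["by_cpu"]))
    print("events per process:", dict(summary["by_process"]))
```

Saved under a name of your choice, the script could be fed kernel log text on standard input, for example by piping the output of `dmesg` into it. If the counts cluster on a few CPUs or only ever name `vasp_std`, that is worth including in a report.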


henrique_miranda
Global Moderator
Posts: 503
Joined: Mon Nov 04, 2019 12:41 pm

Re: VASP Job Failure with Soft Lockup on HPC Nodes | Vasp5.4.4

#2 Post by henrique_miranda » Tue Nov 12, 2024 3:41 pm

Firstly, thank you for your report!

I have never seen this issue and, unfortunately, it is hard for us to reproduce it.
I have a few suggestions for you to try:

  1. Compile with a different toolchain: perhaps a previous version of the Intel compiler, or even the GNU compiler.
  2. Try compiling with another MPI implementation: Open MPI, for example.
  3. Compile your own version of ScaLAPACK and link VASP to it.
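
When working through suggestions like these, it is useful to confirm which MPI and ScaLAPACK libraries a rebuilt binary actually picks up at run time. Below is a small Python sketch that parses the output of the standard Linux `ldd` tool. The helper names and the keyword list are assumptions chosen for illustration, and statically linked libraries will not appear in `ldd` output at all.

```python
import re
import subprocess
import sys

# A typical `ldd` line looks like:
#   libfoo.so.1 => /path/to/libfoo.so.1 (0x00007f...)
# Unresolved libraries are reported as "libfoo.so.1 => not found".
LDD_LINE = re.compile(r"^\s*(?P<name>\S+)\s+=>\s+(?P<path>\S+)")

# Assumed substrings of interest; adjust for your own toolchain.
KEYWORDS = ("mpi", "scalapack", "mkl")


def linked_libraries(ldd_output):
    """Map each shared-library name to its resolved path (None if not found)."""
    libs = {}
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if m:
            path = m["path"]
            libs[m["name"]] = None if path == "not" else path
    return libs


def relevant(libs):
    """Keep only libraries whose name contains one of KEYWORDS."""
    return {
        name: path
        for name, path in libs.items()
        if any(key in name.lower() for key in KEYWORDS)
    }


if __name__ == "__main__":
    # Usage: pass the path to the executable (e.g. your vasp_std) as the
    # first command-line argument.
    result = subprocess.run(
        ["ldd", sys.argv[1]], capture_output=True, text=True, check=True
    )
    for name, path in sorted(relevant(linked_libraries(result.stdout)).items()):
        print(f"{name} -> {path or 'NOT FOUND'}")
```

Running this against the old and new binaries and comparing the output shows whether a rebuild really changed the MPI or ScaLAPACK it links to.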
