[6.4.0] Invalid Memory Crash at High k Point Sampling

marvin_poul
Newbie
Posts: 4
Joined: Tue Sep 07, 2021 8:14 am

#1 Post by marvin_poul » Wed Oct 09, 2024 8:53 am

Dear Forum,

I'm experiencing a lot of crashes when running small but fairly low-symmetry structures with dense k-point sampling (KSPACING=0.05) and a high plane-wave cutoff (ENCUT=750). The structures are originally derived from symmetric crystals, but have a lot of random strain applied to them.
Within ~30s, and before k-point generation starts, the jobs crash with the following error

munmap_chunk(): invalid pointer

before SLURM cancels the job and VASP prints a stack trace. I suspect it's a memory issue, because the same structures run successfully with larger KSPACING. It is my understanding that reducing KPAR and increasing NCORE should reduce the memory load, so I ran two trials: one with 32 cores, KPAR=2 and NCORE=4, and one with 128 cores, KPAR=1 and NCORE=32. The same error appears in both. I allocated 2 GB of RAM per core via SLURM. I've attached both runs; the stdout and stderr of each run are in the files error.out and error.msg.
Our hardware consists of nodes with two AMD EPYC 9754 128-core processors and 768 GB RAM, so the calculations should run within a single processor on the same node.

The present calculations use ADDGRID, but I've observed the same problem in older runs without it as well. I suppose LREAL should not make a difference, as the structures contain only 2 atoms.
Are there any other options to reduce memory load that I could try?
If it really does turn out to be an insufficient memory issue, it would be very helpful if VASP could print a better error message before quitting.

Because the crash occurs before the k-point and plane-wave info is printed to OUTCAR, I'm not sure about the exact memory requirements, but as a quick check I ran the same structure and same settings with a different DFT code (SPHInX https://sxrepo.mpie.de/) and those runs finish without problems. So I think we can rule out hardware limitations.
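
For a rough idea of the mesh size before VASP prints anything, the grid implied by KSPACING can be estimated by hand. The sketch below assumes the rule documented for KSPACING as I understand it, N_i = max(1, ceil(|b_i| / KSPACING)) with the 2π factor included in |b_i|. The cubic 3.0 Å cell is a placeholder, not the attached structure, and the function name is made up for illustration.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def kmesh_from_kspacing(lattice, kspacing):
    """Estimate the k-mesh for a cell given as three row vectors in Angstrom.

    Assumed rule: N_i = max(1, ceil(|b_i| / kspacing)), where |b_i| is the
    length of the reciprocal lattice vector including the factor 2*pi.
    """
    a1, a2, a3 = lattice
    volume = abs(dot(a1, cross(a2, a3)))
    # Unnormalised reciprocal vectors; the 2*pi/volume factor is applied below.
    raw = (cross(a2, a3), cross(a3, a1), cross(a1, a2))
    mesh = []
    for b in raw:
        length = 2.0 * math.pi * math.sqrt(dot(b, b)) / volume
        mesh.append(max(1, math.ceil(length / kspacing)))
    return tuple(mesh)


# Placeholder cell: simple cubic, a = 3.0 Angstrom (hypothetical values).
lattice = ((3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 3.0))
mesh = kmesh_from_kspacing(lattice, 0.05)
print(mesh, "->", mesh[0] * mesh[1] * mesh[2], "k-points before symmetry reduction")
```

For this placeholder cell the estimate is a 42x42x42 grid, i.e. 74088 points before any symmetry reduction.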

Let me know if you need more information and thanks for the help already,
Marvin

EDIT: I saw this post forum/viewtopic.php?t=18812 that sounds similar, but I've checked that on our machines

Code:

ulimit -s unlimited
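
Since limits in a login shell and inside a batch job can differ, one way to double-check is to query the stack limit from within the job itself. This is a minimal sketch using Python's standard `resource` module (Unix only); the helper names are made up for illustration.

```python
import resource


def format_limit(value):
    """Render an rlimit value the way `ulimit -s` would (kB or 'unlimited')."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit reports bytes, while `ulimit -s` reports kB.
    return f"{value // 1024} kB"


def stack_limits():
    """Return the (soft, hard) stack size limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return format_limit(soft), format_limit(hard)


soft, hard = stack_limits()
print(f"stack limit: soft={soft}, hard={hard}")
```

Running this as the first step of the batch script shows the limits the job actually sees, which may not match an interactive session.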

fabien_tran1
Global Moderator
Posts: 406
Joined: Mon Sep 13, 2021 11:02 am

#2 Post by fabien_tran1 » Thu Oct 10, 2024 7:37 am

Hi,

I could reproduce the crash that you reported. However, it is not due to a memory problem but to a bug in the tetrahedron method. We will get back to you as soon as we have a fix.


#3 Post by marvin_poul » Thu Oct 10, 2024 8:56 am

Hi Fabien,

thanks for your quick reply! Indeed, without tetrahedron smearing the calculation starts to run, though with KSPACING=0.05 I then run into the NKPT>NKDIM problem. Larger KSPACING values seem to work (at least the actual calculation starts; it hasn't finished yet).

Best,
Marvin


#4 Post by fabien_tran1 » Thu Oct 10, 2024 2:59 pm

A simple fix for the tetrahedron method is, in mkpoints.F (at line 1925), to replace

Code:

               NSIZE=SIZE(IDTMP,2)*(2-REAL((I3-1)*NKX*NKY+(I2-1)*I1,KIND=q)/REAL(NKX*NKY*NKZ,KIND=q))+1000

by

Code:

               NSIZE=NPX*NPY*NPZ*6

I have quickly tested this fix. Please monitor the results to check that they are OK.

Besides, from your previous message I understood that with the default method you have another problem. Is that right? I tried KSPACING=0.04, but did not run into any problems. Could you be more specific?


#5 Post by marvin_poul » Fri Oct 11, 2024 11:45 am

Sorry for being unspecific. I was referring to this error, which I get for the posted structure with ISMEAR=1 and KSPACING=0.05:

VERY BAD NEWS! internal error in subroutine IBZKPT:
NKPT>NKDIM 21002

My understanding was that NKDIM refers to a hardcoded limit that can be changed by recompiling VASP. Are there any adverse side effects to increasing the limit? In particular, is the memory for the k-points statically allocated at that size, or is only the amount actually needed in a given calculation allocated? (Sorry if this goes off topic a bit.)
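
For context on how the k-point count can get this large: if the random strain leaves no symmetry beyond k ≡ -k, roughly half of a Gamma-centered mesh remains irreducible. The sketch below counts the irreducible points under exactly that assumption. It is not VASP's IBZKPT routine, the function name is made up, and the mesh values are placeholders rather than the mesh of the posted structure.

```python
def count_irreducible_inversion_only(mesh):
    """Count k-points of a Gamma-centered mesh that are inequivalent under k -> -k.

    Assumption: no other symmetry operation is present. A point maps onto
    itself when every fractional index m_i satisfies 2*m_i = 0 (mod N_i),
    i.e. m_i = 0, or m_i = N_i/2 when N_i is even.
    """
    total = 1
    self_inverse = 1
    for n in mesh:
        total *= n
        self_inverse *= 2 if n % 2 == 0 else 1
    # Self-inverse points count once; all others pair up as (k, -k).
    return (total + self_inverse) // 2


# Placeholder mesh (hypothetical values).
print(count_irreducible_inversion_only((42, 42, 42)))  # -> 37048
```

With so little symmetry, the irreducible count scales almost directly with the full mesh, so halving KSPACING increases it by roughly a factor of eight.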

I'll also report back once I've had a chance to recompile with the fix you suggested.

Best,
Marvin

