Continuing a HSE06 band structure calculation in time constrained HPC facility

hatedark1
Newbie
Posts: 24
Joined: Fri Mar 24, 2023 1:19 pm

#1 Post by hatedark1 » Fri Jul 28, 2023 5:30 pm

Dear fellows,

I am facing a time constraint when calculating the band structure of a material using the HSE06 functional on an HPC facility. The time limit per job is 7 days, and I hit it on my first attempt: the job was killed at electronic step 11 (see the last lines of stdout below).

[...]
the WAVECAR file was read successfully
initial charge from wavefunction
entering main loop
N E dE d eps ncg rms ort
gam= 0.000 g(H,U,f)= 0.798E+00 0.177E+00 0.236E-17 ort(H,U,f) = 0.000E+00 0.000E+00 0.000E+00
SDA: 1 -0.221003576234E+03 -0.22100E+03 -0.39012E+00 15232 0.975E+00 0.000E+00
gam= 0.382 g(H,U,f)= 0.220E+00 0.768E-01 0.810E-60 ort(H,U,f) = 0.304E+00 0.802E-01 0.573E-60
DMP: 2 -0.221282101018E+03 -0.27852E+00 -0.17737E+00 15232 0.297E+00 0.384E+00
gam= 0.382 g(H,U,f)= 0.600E-01 0.428E-01 0.676-168 ort(H,U,f) = 0.653E-01 0.649E-01 0.908-168
DMP: 3 -0.221403766823E+03 -0.12167E+00 -0.60993E-01 15232 0.103E+00 0.130E+00
gam= 0.382 g(H,U,f)= 0.320E-01 0.243E-01 0.338E-32 ort(H,U,f) = 0.128E-01 0.497E-01 0.700E-32
DMP: 4 -0.221450059456E+03 -0.46293E-01 -0.32079E-01 15232 0.563E-01 0.626E-01
gam= 0.382 g(H,U,f)= 0.141E-01 0.135E-01 0.458E-37 ort(H,U,f) = 0.140E-01 0.334E-01 0.127E-36
DMP: 5 -0.221477077149E+03 -0.27018E-01 -0.18294E-01 15232 0.277E-01 0.473E-01
gam= 0.382 g(H,U,f)= 0.568E-02 0.779E-02 0.000E+00 ort(H,U,f) = 0.735E-02 0.211E-01 0.000E+00
DMP: 6 -0.221492571805E+03 -0.15495E-01 -0.97377E-02 15232 0.135E-01 0.285E-01
gam= 0.382 g(H,U,f)= 0.284E-02 0.464E-02 0.188-124 ort(H,U,f) = 0.295E-02 0.133E-01 0.732-124
DMP: 7 -0.221500960146E+03 -0.83883E-02 -0.54818E-02 15232 0.748E-02 0.163E-01
gam= 0.382 g(H,U,f)= 0.138E-02 0.280E-02 0.116E-43 ort(H,U,f) = 0.140E-02 0.848E-02 0.508E-43
DMP: 8 -0.221505772571E+03 -0.48124E-02 -0.31794E-02 15232 0.417E-02 0.988E-02
gam= 0.382 g(H,U,f)= 0.569E-03 0.166E-02 0.324E-15 ort(H,U,f) = 0.653E-03 0.527E-02 0.151E-14
DMP: 9 -0.221508582782E+03 -0.28102E-02 -0.17965E-02 15232 0.223E-02 0.592E-02
gam= 0.382 g(H,U,f)= 0.200E-03 0.937E-03 0.682E-71 ort(H,U,f) = 0.237E-03 0.310E-02 0.315E-70
DMP: 10 -0.221510165123E+03 -0.15823E-02 -0.96542E-03 15232 0.114E-02 0.334E-02
gam= 0.382 g(H,U,f)= 0.675E-04 0.491E-03 0.302E-23 ort(H,U,f) = 0.656E-04 0.169E-02 0.136E-22
DMP: 11 -0.221511007259E+03 -0.84214E-03 -0.49241E-03 15232 0.559E-03 0.176E-02
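For anyone who wants to track how far such a run has come, the electronic-step lines in the output above can be read with a few lines of Python. This is only a minimal sketch: the function name `parse_electronic_steps` is made up for illustration, and it assumes the whitespace-separated layout seen in this thread (tag, step number, total energy, dE), recognising only the tags that actually appear here (SDA, DMP, DAV).

```python
def parse_electronic_steps(log_text):
    """Return (tag, step, energy, dE) tuples for each electronic-step line.

    Assumes the layout shown in this thread: 'DMP: 11 <E> <dE> ...'.
    Lines such as 'gam= ...' are skipped.
    """
    known_tags = ("SDA:", "DMP:", "DAV:")  # only the tags seen in this thread
    steps = []
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] not in known_tags:
            continue
        try:
            step = int(fields[1])
            energy = float(fields[2])
            d_energy = float(fields[3])
        except ValueError:
            continue  # malformed line, ignore it
        steps.append((fields[0].rstrip(":"), step, energy, d_energy))
    return steps


sample = """\
DMP: 10 -0.221510165123E+03 -0.15823E-02 -0.96542E-03 15232 0.114E-02 0.334E-02
gam= 0.382 g(H,U,f)= 0.675E-04 0.491E-03 0.302E-23 ort(H,U,f) = 0.656E-04 0.169E-02 0.136E-22
DMP: 11 -0.221511007259E+03 -0.84214E-03 -0.49241E-03 15232 0.559E-03 0.176E-02
"""

for tag, step, energy, d_energy in parse_electronic_steps(sample):
    print(f"{tag} step {step}: E = {energy:.6f} eV, dE = {d_energy:.2e} eV")
```

Comparing the last dE with the EDIFF set in the INCAR gives a rough idea of how many more steps a run needs before the time limit.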
I restarted the calculation last week, planning to stop it with a STOPCAR file a little more than a day before the time limit and to continue it later. Yesterday I created a STOPCAR file with the following content
LABORT = .TRUE.
to stop the calculation at the next electronic step. The calculation stopped today and the stdout file shows
[...]
reading WAVECAR
the WAVECAR file was read successfully
initial charge from wavefunction
entering main loop
N E dE d eps ncg rms ort
gam= 0.000 g(H,U,f)= 0.278E+03 0.996E+02 0.116E-46 ort(H,U,f) = 0.000E+00 0.000E+00 0.000E+00
SDA: 1 -0.470604565658E+02 -0.47060E+02 -0.15093E+03 15232 0.377E+03 0.000E+00
gam= 0.382 g(H,U,f)= 0.833E+02 0.278E+02 0.531-183 ort(H,U,f) = 0.134E+03 0.307E+02-0.105-182
DMP: 2 -0.155432281255E+03 -0.10837E+03 -0.69587E+02 15232 0.111E+03 0.165E+03
gam= 0.382 g(H,U,f)= 0.181E+02 0.164E+02 0.282-122 ort(H,U,f) = 0.380E+02 0.187E+02-0.251-122
DMP: 3 -0.201416887500E+03 -0.45985E+02 -0.22501E+02 15232 0.346E+02 0.568E+02
gam= 0.382 g(H,U,f)= 0.610E+01 0.385E+01 0.516-139 ort(H,U,f) = 0.474E+00 0.100E+02-0.225-139
DMP: 4 -0.214800266613E+03 -0.13383E+02 -0.55850E+01 15232 0.995E+01 0.105E+02
gam= 0.382 g(H,U,f)= 0.399E+01 0.176E+01 0.141E-55 ort(H,U,f) =-0.130E+01 0.369E+01 0.561E-56
DMP: 5 -0.218088999089E+03 -0.32887E+01 -0.26631E+01 15232 0.574E+01 0.239E+01
gam= 0.382 g(H,U,f)= 0.176E+01 0.672E+00 0.942E-18 ort(H,U,f) = 0.473E+00 0.192E+01 0.160E-17
DMP: 6 -0.219898347762E+03 -0.18093E+01 -0.13392E+01 15232 0.244E+01 0.239E+01
gam= 0.382 g(H,U,f)= 0.693E+00 0.306E+00 0.259E-12 ort(H,U,f) = 0.410E+00 0.866E+00 0.724E-12
DMP: 7 -0.220823103308E+03 -0.92476E+00 -0.59470E+00 15232 0.999E+00 0.128E+01
gam= 0.382 g(H,U,f)= 0.230E+00 0.132E+00 0.947E-22 ort(H,U,f) = 0.219E+00 0.405E+00 0.215E-21
DMP: 8 -0.221245480715E+03 -0.42238E+00 -0.24024E+00 15232 0.362E+00 0.625E+00
gam= 0.382 g(H,U,f)= 0.614E-01 0.583E-01 0.296E-15 ort(H,U,f) = 0.559E-01 0.176E+00 0.727E-15
DMP: 9 -0.221411860667E+03 -0.16638E+00 -0.83269E-01 15232 0.120E+00 0.232E+00
hard stop encountered! aborting job ...
soft stop encountered! aborting job ...
1 F= -.22141186E+03 E0= -.22141186E+03 d E =-.289231E-22
Start KPOINTS_OPT (optional k-point list driver)
k-point batch [1-119\150]
N E dE ncg
DAV: 1 0.207055104358E+05 -0.30297E+06 45696
However, the WAVECAR and CHG* files needed to continue the calculation were not generated. Did I do something wrong?

Best regards,
Lira.
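The STOPCAR approach described in the post above can be scripted so that the file is written automatically some time before the queue limit. The sketch below is only an illustration under stated assumptions: the helper names (`write_stopcar`, `seconds_until_stop`, `stop_before_limit`) are hypothetical, and the script is assumed to run alongside VASP in the same job. The two tags are the ones VASP reads from STOPCAR: `LABORT = .TRUE.` stops at the next electronic step, `LSTOP = .TRUE.` at the next ionic step. Whether the WAVECAR is actually written after such a stop in a KPOINTS_OPT run is the open question of this thread, and the sketch does not settle it.

```python
import time
from pathlib import Path


def write_stopcar(run_dir, electronic=True):
    """Write a STOPCAR into run_dir and return its path.

    electronic=True  -> LABORT (stop at the next electronic step)
    electronic=False -> LSTOP  (stop at the next ionic step)
    """
    tag = "LABORT" if electronic else "LSTOP"
    path = Path(run_dir) / "STOPCAR"
    path.write_text(f"{tag} = .TRUE.\n")
    return path


def seconds_until_stop(walltime_limit_s, safety_margin_s, elapsed_s=0.0):
    """Seconds to wait before writing STOPCAR; never negative."""
    return max(0.0, walltime_limit_s - safety_margin_s - elapsed_s)


def stop_before_limit(run_dir, walltime_limit_s, safety_margin_s, elapsed_s=0.0):
    """Sleep until the safety margin is reached, then write STOPCAR."""
    time.sleep(seconds_until_stop(walltime_limit_s, safety_margin_s, elapsed_s))
    return write_stopcar(run_dir, electronic=True)
```

For the case in this thread, a call such as `stop_before_limit(".", 7 * 86400, 30 * 3600)` would write the file 30 hours before a 7-day limit. The margin has to be longer than one electronic step, since VASP only checks for the file between steps.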

pedro_melo
Global Moderator
Posts: 133
Joined: Thu Nov 03, 2022 1:03 pm

#2 Post by pedro_melo » Mon Jul 31, 2023 6:56 pm

Dear Lira,

It seems strange that your job took 7 days to compute a band structure. Could you provide me with the input files (INCAR, POTCAR, POSCAR, KPOINTS) that you are using?

Kind regards,
Pedro Melo

#3 Post by hatedark1 » Thu Aug 03, 2023 2:37 pm

Dear Pedro Melo,

The requested files are attached to this message. Thanks for looking into it. The system is an ABC-stacked bulk material with 21 atoms in the unit cell, which is probably why the hybrid band structure calculation takes so long. When I did the same calculation for a monolayer of this material, it took 11 days on my local cluster, which is older than the one I'm using now.

However, I still don't know what went wrong when I tried to stop and continue the calculation.

Best regards,
Lira.

#4 Post by hatedark1 » Tue Aug 15, 2023 5:43 pm

Dear fellows,

Is anyone able to assist me regarding the continuation of the calculations? Does the continuation of a calculation that has been stopped at a certain electronic step work?

Best regards,
Lira.

alex
Hero Member
Posts: 586
Joined: Tue Nov 16, 2004 2:21 pm
License Nr.: 5-67
Location: Germany

#5 Post by alex » Wed Aug 16, 2023 6:35 am

Dear Lira,

a) Your job script suggests that you are on AMD 128-core CPUs, using just one of them but all of its cores. Due to the very(!) limited memory bandwidth of this CPU, this is usually a very bad idea. For plain DFT I normally use just half of the cores. Assuming HSE takes a bit more memory, you are probably better off with fewer, but that is for you to figure out.
b) Your smearing of sigma=0.01 eV is even below room temperature. Is this what you want? It will affect convergence drastically (in a bad way if it stays like that).
c) You are starting off with about 120 k-points. Your cell has small lattice constants a and b, ok. However, what does plain DFT say? Is the system metallic, so that you really need a dense mesh?

Happy crunching

alex

#6 Post by hatedark1 » Wed Aug 30, 2023 9:46 pm

Dear Alex,

Thanks for the reply.
alex wrote: "Your job script suggests that you are on AMD 128-core CPUs, using just one of them but all of its cores. Due to the very(!) limited memory bandwidth of this CPU, this is usually a very bad idea. For plain DFT I normally use just half of the cores. Assuming HSE takes a bit more memory, you are probably better off with fewer, but that is for you to figure out."
I'll take your advice regarding the number of CPU cores, thank you.
alex wrote: "Your smearing of sigma=0.01 is even below room temperature. Is this what you want?"
Could you please elaborate on this?
alex wrote: "It will affect convergence drastically (in a bad way if it stays like that)."
I assume I should increase sigma, then? What value do you suggest?
alex wrote: "You are starting off with about 120 k-points. Your cell has small lattice constants a and b, ok. However, what does plain DFT say? Is it metallic so you really need a dense mesh?"
PBE results show the system is a semiconductor. I'll try to reduce the number of k-points.

Best regards,
Lira.

#7 Post by alex » Fri Sep 01, 2023 12:57 pm

Hello Lira,
alex wrote: "Your smearing of sigma=0.01 is even below room temperature. Is this what you want?"
hatedark1 wrote: "Could you please elaborate on this?"
Smearing puts temperature into the electrons, and room temperature corresponds to about 0.026 eV of energy (k_B T at 300 K). Normally, convergence is better with higher smearing, but too much smearing can give unphysical results. So with your semiconductor you should be safe with about 0.1 to 0.2 eV of smearing and a far less dense k-point mesh for starters.

However, I don't know how this would help with your band structure simulation.

Good luck!

alex
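The relation between smearing width and electronic temperature used in the reply above is simply sigma = k_B T. A minimal sketch of the arithmetic follows. The function names are made up for illustration, and reading SIGMA as a temperature strictly applies to Fermi-Dirac smearing; which ISMEAR the original INCAR used is not visible in this thread, so treat the numbers as an order-of-magnitude guide.

```python
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def smearing_to_temperature(sigma_ev):
    """Electronic temperature in K corresponding to a smearing width in eV."""
    return sigma_ev / K_B_EV_PER_K


def temperature_to_smearing(temperature_k):
    """Smearing width in eV corresponding to a temperature in K."""
    return K_B_EV_PER_K * temperature_k


# Values discussed in this thread: the original 0.01 eV and the suggested 0.1-0.2 eV.
for sigma in (0.01, 0.1, 0.2):
    print(f"SIGMA = {sigma:.2f} eV -> T = {smearing_to_temperature(sigma):.0f} K")

print(f"T = 300 K -> k_B T = {temperature_to_smearing(300.0):.4f} eV")
```

On this reading, sigma = 0.01 eV corresponds to roughly 116 K, well below room temperature, while 0.1 to 0.2 eV corresponds to well over 1000 K. This is why such large values are best used to get a calculation running and then checked against a smaller sigma.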

#8 Post by hatedark1 » Thu Sep 07, 2023 1:45 pm

Thank you for the help Alex.

I was able to complete the calculation using your suggestions. I'll now try to improve the precision by adjusting the parameters as far as my time constraint permits.

Best regards,
Lira.

#9 Post by alex » Wed Oct 04, 2023 2:03 pm

You are welcome, Lira.

The overall convergence of, e.g., the total energy is only one side of the coin; the convergence of the quantity you actually want might be reached with computationally much(!) cheaper settings!

Good luck!

alex
