machine learning MD caused the silicon structure to fall apart

Posted: Sun Jan 26, 2025 2:07 pm
by yina_huang

Dear VASP group,

I am testing machine learning MD with silicon and have run into some strange behavior.

The POSCAR I use is

Si
1.0000000000000000
5.4688911249173726 0.0000000000000000 0.0000000000000000
0.0000000000000000 5.4688911249173726 0.0000000000000000
0.0000000000000000 0.0000000000000000 5.4688911249173726
Si
8
Direct
0.5000000000000000 0.0000000000000000 0.0000000000000000
0.2500000000000000 0.2500000000000000 0.7500000000000000
0.5000000000000000 0.5000000000000000 0.5000000000000000
0.2500000000000000 0.7500000000000000 0.2500000000000000
0.0000000000000000 0.0000000000000000 0.5000000000000000
0.7500000000000000 0.2500000000000000 0.2500000000000000
0.0000000000000000 0.5000000000000000 0.0000000000000000
0.7500000000000000 0.7500000000000000 0.7500000000000000

The INCAR I use is

system=INCAR_MD

kpar=16
ncore=1

# accuracy test for below 4 parameters should be done
kspacing=0.1667
ismear=1
sigma=0.2
encut=320

nsw=1000
potim=2

tebeg=0
teend=300

# ====== no modification below ======
ibrion=0

algo=fast

ediff=1E-6

nelm=1000
prec=accurate
lasph=T
lmaxmix=4 # only need to change to 6 for f electron
gga_compat=F

lwave=.FALSE.

lreal=auto

isif=3
isym=0
mdalgo=3
LANGEVIN_GAMMA = 5*100
LANGEVIN_GAMMA_L = 5

ml_lmlff=.true.
ml_mode=train

After running VASP, I found only 5 steps in OSZICAR; the calculation then stopped without any error. Why is that?

If I change tebeg to tebeg=1, VASP finishes 1000 ionic steps. But steps 10 to 1000 are all MLFF steps with no DFT calculation, which is apparently not right, and the CONTCAR shows that the silicon structure has already fallen apart.

If I set tebeg=300 and teend=300, VASP does only 1 DFT step and the remaining steps are all MLFF steps. The CONTCAR again shows that the silicon structure has fallen apart.

If I delete ml_lmlff and ml_mode from the INCAR and run normal MD, even tebeg=0, teend=300 finishes without any problem and the CONTCAR is just fine.
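
One way to put a number on "fallen apart" versus "just fine" is to look at nearest-neighbour distances. The following is only a minimal sketch, not a VASP tool: it assumes a cubic cell and Direct coordinates as in the POSCAR above, and the function name nearest_neighbour_distances is made up for illustration. For the ideal starting structure every atom should have a neighbour at a*sqrt(3)/4, about 2.37 Angstrom.

```python
import math

# Lattice constant (Angstrom) and Direct coordinates copied from the POSCAR above.
A_SI = 5.4688911249173726
IDEAL_SI8 = [
    (0.50, 0.00, 0.00), (0.25, 0.25, 0.75),
    (0.50, 0.50, 0.50), (0.25, 0.75, 0.25),
    (0.00, 0.00, 0.50), (0.75, 0.25, 0.25),
    (0.00, 0.50, 0.00), (0.75, 0.75, 0.75),
]


def nearest_neighbour_distances(a, frac_coords):
    """Nearest-neighbour distance of each atom in a cubic cell of edge a.

    Uses the minimum image convention; valid for cubic cells only.
    """
    result = []
    for i, ri in enumerate(frac_coords):
        best = math.inf
        for j, rj in enumerate(frac_coords):
            if i == j:
                continue
            # Wrap each fractional difference into [-0.5, 0.5].
            d = [(x - y) - round(x - y) for x, y in zip(ri, rj)]
            best = min(best, a * math.sqrt(sum(c * c for c in d)))
        result.append(best)
    return result


if __name__ == "__main__":
    for k, dist in enumerate(nearest_neighbour_distances(A_SI, IDEAL_SI8), 1):
        print(f"atom {k}: nearest neighbour at {dist:.3f} Angstrom")
```

Replacing the coordinates with those from a CONTCAR (and the lattice constant, if the cell changed) gives a quick check: distances far from the ideal value would indicate a broken structure.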

So I am wondering what is wrong with my INCAR for machine learning MD. Is it a bug or something?

I shared my calculation folders https://drive.google.com/file/d/1hEBbba ... sp=sharing

Thank you so much for helping.


Re: machine learning MD caused the silicon structure to fall apart

Posted: Tue Jan 28, 2025 1:09 am
by yina_huang

Dear VASP,

I also tried the NVT ensemble just now and encountered the same problem.

system=INCAR_MD

kpar=16
ncore=1

# accuracy test for below 4 parameters should be done
kspacing=0.1667
ismear=1
sigma=0.2
encut=320

nsw=1000
potim=1

tebeg=300
teend=300

# ====== no modification below ======
ibrion=0

algo=fast

ediff=1E-6

nelm=1000
prec=normal
lasph=T
lmaxmix=4 # only need to change to 6 for f electron
gga_compat=F

lwave=.FALSE.

lreal=auto

isif=2
isym=0
mdalgo=2
smass=1

ml_lmlff=.true.
ml_mode=train

All 1000 steps (nsw) finished, but the silicon structure fell apart like this:

[Image: the silicon structure after the run]

I also attached calculation folder https://drive.google.com/file/d/1fpVH6m ... drive_link

What is wrong? Could you please help me? It is urgent for my project. Thank you so much!


Re: machine learning MD caused the silicon structure to fall apart

Posted: Tue Jan 28, 2025 9:24 am
by ahampel

Hi,

thank you for reaching out to us on the official VASP forum and sorry for the delay.

I needed some time to try your calculation. I am not an expert in the MLFF in VASP, but I contacted one of my team members to get more insight. There seem to be a few problems here:
1) TEBEG=0 will not work with the Langevin thermostat. The initial velocities are drawn from a Maxwell-Boltzmann distribution, and for T=0 they are simply wrong. Unfortunately, the code does not currently warn about this. I fixed this by setting TEBEG=50 (you can try lower values as well).
2) Your langevin_gamma parameters damp the dynamics too strongly, so the MLFF reaches its threshold too quickly and no recalculation with full DFT is triggered. I changed these parameters to:

LANGEVIN_GAMMA = 10
LANGEVIN_GAMMA_L = 3

3) Your cell is relatively small, which brings some problems of its own. The default learning threshold ml_ctifor is a bit too large. If you set ml_ctifor=1E-5, the code will actually start to learn. You can check this with grep REGRF ML_LOGFILE, which shows whether there are actual fitting steps for the force field. In your case there are none, and thus the force field is very poor. https://www.vasp.at/wiki/index.php/ML_CTIFOR
4) ML_RCUT1 is too large for the small cell you have (https://www.vasp.at/wiki/index.php/ML_RCUT1). The default is 8 Angstrom, which is larger than your cell, so the atoms see their images in the neighbouring cells. Try running a supercell, or play with the parameter to improve the force field.
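
The size argument in point 4 can be checked with a little arithmetic. The sketch below is not part of VASP; the helper name min_repeats and the optional factor of two (a stricter minimum-image criterion) are assumptions for illustration. It uses the lattice constant from the POSCAR above and the 8 Angstrom cutoff quoted in point 4.

```python
import math


def min_repeats(cell_length, cutoff, factor=1.0):
    """Smallest integer n such that n * cell_length >= factor * cutoff."""
    return max(1, math.ceil(factor * cutoff / cell_length))


A_SI = 5.4688911249173726  # cubic lattice constant from the POSCAR, Angstrom
RCUT = 8.0                 # cutoff quoted in point 4, Angstrom
ATOMS_PER_CELL = 8

# factor=1.0: an atom no longer sees its own periodic image within the cutoff.
# factor=2.0: stricter choice (assumption), box edge at least twice the cutoff.
for factor in (1.0, 2.0):
    n = min_repeats(A_SI, RCUT, factor)
    print(f"factor {factor}: {n}x{n}x{n} supercell, "
          f"{ATOMS_PER_CELL * n ** 3} atoms, edge {n * A_SI:.2f} Angstrom")
```

Under these assumptions a 2x2x2 supercell (64 atoms, edge about 10.9 Angstrom) is the smallest box whose edge exceeds the cutoff, and a 3x3x3 supercell (216 atoms, edge about 16.4 Angstrom) satisfies the stricter criterion.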

system=INCAR_MD

kpar=4

kspacing=0.1667
ismear=1
sigma=0.05
encut=320

nsw=1000
potim=2

tebeg=50
teend=300

algo=fast
ediff=1E-7

nelm=1000
prec=accurate
lasph=T
gga_compat=F

lreal=false

ibrion=0
isif=3
isym=0
mdalgo=3
LANGEVIN_GAMMA = 10
LANGEVIN_GAMMA_L = 3

! machine learning
ML_LMLFF  = T
ML_MODE = train
!ML_WTSIF  = 2
ml_ctifor=1E-5
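
The grep REGRF ML_LOGFILE check from point 3 can also be scripted. This is a rough sketch under one assumption about the file layout: data lines start with a tag such as REGRF in the first column, and header or comment lines start with "#". The function name count_tags is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def count_tags(logfile="ML_LOGFILE"):
    """Count data lines per leading tag, skipping lines that start with '#'."""
    counts = Counter()
    for line in Path(logfile).read_text().splitlines():
        fields = line.split()
        # Assumption: the first whitespace-separated field is the tag.
        if fields and not fields[0].startswith("#"):
            counts[fields[0]] += 1
    return counts


if __name__ == "__main__" and Path("ML_LOGFILE").exists():
    n_fits = count_tags()["REGRF"]
    print(f"REGRF lines found: {n_fits}")
    if n_fits == 0:
        print("No fitting steps logged; the force field was never refitted.")
```

A count of zero after a long run corresponds to the situation described in point 3, where the force field is never actually fitted.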

Best,
Alex


Re: machine learning MD caused the silicon structure to fall apart

Posted: Wed Jan 29, 2025 1:34 pm
by yina_huang

Dear ahampel,

Thank you very much for your patient reply. I tried your suggestions, and it seems to me that only one parameter, ml_ctifor, is truly crucial. It seems that even if I modify the other parameters as you suggested, it doesn't work unless I reduce ml_ctifor. According to the VASP documentation, ml_ctifor is automatically adjusted during machine learning. So, can I understand this to mean that for any structure undergoing machine learning MD, we can set a much smaller initial ml_ctifor value than the default of 0.002 to ensure that the machine learning proceeds correctly?

Another question is, if I want to create a machine learning force field for a temperature range of 5 to 300K, which of the following approaches do you recommend:

1. tebeg=5, teend=300, nsw=100000 (do this simulation once)
2. tebeg=300, teend=5, nsw=100000 (do this simulation once). However, I've noticed that for the Si8 structure I'm testing, tebeg=300, teend=5 yields one third fewer ML_ABN structures than tebeg=5, teend=300. I'm not sure why this is the case. Does this mean that heating up results in a better force field than cooling down?
3. tebeg=5, teend=300, nsw=1000 done 100 times consecutively. In this approach, each simulation will have different initial velocities. Could this approach explore the phase space more completely and lead to a more comprehensive force field?
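
To compare the three approaches in numbers, here is a small sketch of the temperature schedules. It assumes that the thermostat set point is ramped linearly from tebeg to teend over nsw steps, and it assumes potim=2 fs as in the INCAR files above (the time step for these long runs is not stated). The function names are made up for illustration.

```python
def target_temperature(step, nsw, tebeg, teend):
    """Set-point temperature at an ionic step, assuming a linear ramp."""
    return tebeg + (teend - tebeg) * step / nsw


def ramp_rate_k_per_ps(nsw, tebeg, teend, potim_fs):
    """Average heating (+) or cooling (-) rate in K/ps over one run."""
    total_time_ps = nsw * potim_fs / 1000.0
    return (teend - tebeg) / total_time_ps


POTIM_FS = 2.0  # assumption: same time step as in the INCAR files above

approaches = {
    "1: one heating run": dict(nsw=100000, tebeg=5, teend=300),
    "2: one cooling run": dict(nsw=100000, tebeg=300, teend=5),
    "3: each of 100 short runs": dict(nsw=1000, tebeg=5, teend=300),
}

for label, p in approaches.items():
    rate = ramp_rate_k_per_ps(potim_fs=POTIM_FS, **p)
    midpoint = target_temperature(p["nsw"] // 2, **p)
    print(f"{label}: {rate:+.3f} K/ps, set point at mid-run {midpoint:.1f} K")
```

Under these assumptions approach 1 heats at about 1.5 K/ps, while each short run in approach 3 heats 100 times faster, at about 147.5 K/ps, so every segment spends only about 2 ps sweeping the whole temperature range.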

best regards


Re: machine learning MD caused the silicon structure to fall apart

Posted: Wed Jan 29, 2025 8:54 pm
by ahampel

Cooling down during an MD run, or even a machine learning run, is not a very good idea. It will cause a lot of trouble. It should only be done very carefully, and only if necessary for observing some transitions in full MD, certainly not when learning a force field.

Doing the calculation 100 times with 1000 steps or directly with 100000 steps will more or less give the same result. I would say doing one long run is preferable since it only creates one MLFF file that is easy to handle. Otherwise you should not see much of a difference.

If you have not read it yet, the best-practices how-to page on our VASP wiki is also a good guide: https://www.vasp.at/wiki/index.php/Best ... rce_fields

Best,
Alex