Problems running VASP in parallel mode with 5 nodes / 64 cores per node / 250 GB RAM per node

5-1051
Newbie
Posts: 7
Joined: Mon May 27, 2013 10:26 am

#1 Post by 5-1051 » Mon May 27, 2013 11:20 am

Hi,

I'm running VASP 5.3 without problems on a "vacuum + monolayer + substrate" system of about 1500 atoms, using 4 nodes with 64 cores and 250 GB of RAM per node (ARCH: lx26-amd64; Mellanox InfiniBand QDR at 40 Gb/s is used for parallel communication and file system access). In that setup the job uses almost 200 GB per node during execution. So far everything seems fine, but I am close to the memory limit, and I will need to increase the number of atoms.

It would therefore be useful to add a fifth node to my calculations to reduce the memory load per node. However, I cannot get a stable run with 5 nodes. I have tried different combinations of the parameters KPAR, NCORE or NPAR, NSIM and LPLANE, but nothing seems to work. The execution always breaks during the first iteration at EDDAV, after POTLOK and SETDIJ.
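As a rough illustration of what a fifth node could buy, the sketch below rescales the figures from the post. It assumes the total memory footprint stays constant and is spread evenly over all nodes, which is an idealization: any data that is replicated on every rank would not shrink. The function name `mem_per_node` is made up for this example.

```shell
#!/bin/bash
# Idealized estimate (assumption: total memory is constant and splits evenly
# across nodes; replicated per-rank data would make the real gain smaller).
mem_per_node() {
    local used_per_node_gb=$1
    local old_nodes=$2
    local new_nodes=$3
    # Integer arithmetic: total footprint divided by the new node count.
    echo $(( used_per_node_gb * old_nodes / new_nodes ))
}

# Figures from the post: about 200 GB per node on 4 nodes, moving to 5 nodes.
mem_per_node 200 4 5   # prints 160
```

Under these assumptions the load would drop from about 200 GB to about 160 GB per node, which is the headroom the extra node is meant to provide.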

Am I hitting a limit of VASP 5.3 in handling RAM, or is it a limitation of my cluster?

If VASP 5.3 can manage the memory regardless of the number of nodes, can anyone help me configure VASP to run on 5 nodes? Or should I use an even number of nodes instead?

My script for running VASP is:

#!/bin/bash
#
#$ -cwd
#$ -o job.out -j
#$ -pe mp64 256
## Create rank file
./mkrnkfile.sh
mpirun -np 128 --rankfile rank.$JOB_ID --bind-to-core vasp.
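The script above requests 256 slots (`-pe mp64 256`) but starts 128 MPI ranks, which on 4 nodes would correspond to 32 ranks per node. That placement is an assumption here, since the contents of `mkrnkfile.sh` are not shown. When moving to 5 nodes, both numbers have to change together; the hypothetical helper below only checks that a given rank count divides evenly over the nodes.

```shell
#!/bin/bash
# Hypothetical helper: ranks per node for a total rank count and a node count.
# Prints the per-node count, or returns 1 if the ranks do not divide evenly.
ranks_per_node() {
    local total_ranks=$1
    local nodes=$2
    if [ $(( total_ranks % nodes )) -ne 0 ]; then
        echo "uneven: $total_ranks ranks on $nodes nodes" >&2
        return 1
    fi
    echo $(( total_ranks / nodes ))
}

ranks_per_node 128 4          # 4-node case from the script: prints 32
ranks_per_node 128 5 || true  # 128 ranks do not divide evenly over 5 nodes
ranks_per_node 160 5          # 32 ranks per node on 5 nodes: prints 32
```

If the 32-ranks-per-node layout is kept, a 5-node job would need something like 320 slots and `-np 160`, assuming the `mp64` parallel environment hands out whole 64-slot nodes; the rank file would have to be regenerated to match.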

and my INCAR is:

ISTART = 0; ICHARG = 2
GGA = PE
PREC = High
AMIN = 0.01
general:
SYSTEM = (110)system vacuum
LWAVE = .FALSE.
LCHARG = .FALSE.
LREAL = Auto
ISMEAR = 1; SIGMA = 0.2
ALGO = Fast
NGX = 194; NGY = 316; NGZ = 382

linux:
LSCALAPACK = .TRUE.
NCORE = 32
KPAR = 1
LSCALU = .FALSE.
LPLANE = .TRUE.
NSIM = 1
LREAL = Auto
no magnetic:
ISPIN = 1
dynamics:
NSW = 0
IBRION = 0

I'm only using 3 irreducible k-points.

Thank you for your attention.
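For the INCAR above, the relation between the parallelization tags can be checked before submitting. In VASP, the number of ranks must be divisible by KPAR, and NPAR follows from NPAR = ranks / KPAR / NCORE. The sketch below applies that relation; the function name `vasp_npar` and the 160-rank case are assumptions made for illustration, not settings taken from the thread.

```shell
#!/bin/bash
# Sketch: derive NPAR from the total MPI ranks, KPAR and NCORE using
# NPAR = ranks / KPAR / NCORE. Returns 1 when a division is not exact.
vasp_npar() {
    local ranks=$1
    local kpar=$2
    local ncore=$3
    if [ $(( ranks % kpar )) -ne 0 ]; then
        echo "ranks ($ranks) not divisible by KPAR ($kpar)" >&2
        return 1
    fi
    local per_group=$(( ranks / kpar ))
    if [ $(( per_group % ncore )) -ne 0 ]; then
        echo "ranks per k-point group ($per_group) not divisible by NCORE ($ncore)" >&2
        return 1
    fi
    echo $(( per_group / ncore ))
}

vasp_npar 128 1 32          # 4-node run from the post: prints 4
vasp_npar 160 1 32          # hypothetical 5-node run with 32 ranks per node: prints 5
vasp_npar 160 3 32 || true  # KPAR = 3 does not divide 160 ranks
```

With `NCORE = 32` and `KPAR = 1`, both 128 and 160 ranks give a whole number for NPAR, so this arithmetic alone would not explain a failure on 5 nodes.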

alex
Hero Member
Posts: 586
Joined: Tue Nov 16, 2004 2:21 pm
License Nr.: 5-67
Location: Germany

#2 Post by alex » Mon May 27, 2013 3:50 pm

Hi,

Is there any error message?

Cheers,

alex

#3 Post by 5-1051 » Mon May 27, 2013 7:55 pm

Not really.

I found some errors in the InfiniBand log.

It seems to be waiting for something indefinitely.
[Edited Tue May 28 2013, 08:01AM]

#4 Post by alex » Tue May 28, 2013 8:35 am

Please check whether you can log in from the executing host (normally the first in the run list) to the others without giving a password. Since you are using 256 cores, it looks like you have at least four physical machines ...
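One way to run this check for all nodes at once is sketched below. It relies on the OpenSSH option `BatchMode=yes`, which makes `ssh` fail instead of prompting for a password; the function name `check_passwordless` is made up for this example, and the host names you pass in are whatever your cluster uses.

```shell
#!/bin/bash
# Sketch: report which hosts accept a non-interactive SSH login.
# BatchMode=yes disables password prompts, so a host that would ask for a
# password is reported as FAILED instead of hanging.
check_passwordless() {
    local host
    local status=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null >/dev/null 2>&1; then
            echo "$host ok"
        else
            echo "$host FAILED"
            status=1
        fi
    done
    # Non-zero exit status if at least one host could not be reached.
    return "$status"
}
```

Run it from the executing host with the other machines as arguments, for example `check_passwordless node02 node03 node04` (hypothetical host names). A FAILED line can also mean the host is unreachable within the 5-second timeout, not only that a password is required.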

Hth

alex

#5 Post by 5-1051 » Thu May 30, 2013 3:53 pm

Hi,

Many thanks! I will, but ...

I tried repeating my last successful calculation with 4 nodes and 1500 atoms, and now even that does not run. There must be something wrong with my cluster. Anyway, thank you for your comments.

Cheers,
Cesar