Very strange problem when running parallel vasp5

stshcs
Newbie
Posts: 8
Joined: Fri Jun 05, 2009 7:08 am

#1 Post by stshcs » Wed Mar 23, 2011 3:28 am

I used the latest vasp5.2.11 to calculate a B4C crystal.
First I submitted a job on 4 nodes, each with 2 CPUs. That means the job runs 8 parallel processes, and everything works normally.

Then I submitted another VASP job, also on 4 nodes. This time the job was not accepted. The error message was:
vasp5.2.11: error while loading shared libraries: libmkl_lapack.so: cannot open shared object file: No such file or directory

It's a very strange problem. I compiled vasp5.2.11 with Intel Fortran/C 9.1 (EM64T) and MKL 9.1, and I also added the library path to LD_LIBRARY_PATH.
================== Here is the intel fortran information========
Intel(R) Fortran Compiler for Intel(R) EM64T-based applications, Version 9.1 Build 20071016 Package ID: l_fc_c_9.1.052
Copyright (C) 1985-2007 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY
==============end of intel fortran information==========
I want to know whether the "NON-COMMERCIAL" Intel Fortran and MKL license limits the number of concurrent threads.

The following are the options used when compiling parallel vasp5.2.11:
BLAS= -L/opt/intel/mkl/9.1.023/lib/em64t -lmkl_em64t -lguide -lpthread

LAPACK= -L/opt/intel/mkl/9.1.023/lib/em64t -lmkl_lapack

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o $(LAPACK) \
$(BLAS)

LINK = -L/opt/intel/fce/9.1.052/lib/ -lsvml

~:(

alex
Hero Member
Posts: 586
Joined: Tue Nov 16, 2004 2:21 pm
License Nr.: 5-67
Location: Germany

#2 Post by alex » Wed Mar 23, 2011 8:05 am

It looks like your nodes have different .profile or .cshrc files. The error arises from the variable LD_LIBRARY_PATH.
Figure out which node vasp is started on, check whether LD_LIBRARY_PATH is set there so that the MKL libraries can be found, and retry.
Or: use NFS to have the same $HOME everywhere (much preferred).
Note the spelling: LD_LIBRARY_PATH, with two '_'.
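
The check suggested here can be sketched as a small shell function. This is only an illustration: `find_in_path` is a hypothetical helper name, and the library name is the one from the error message in the first post. It walks a colon-separated search path and reports the first directory that contains the library.

```shell
#!/bin/sh
# Sketch: report whether a shared library is reachable through a
# colon-separated search path such as LD_LIBRARY_PATH.
# find_in_path is a hypothetical helper, not part of VASP or MKL.
find_in_path() {
    lib=$1
    path_list=$2
    old_ifs=$IFS
    IFS=:
    # The list is split on ':' once, when the loop starts.
    for dir in $path_list; do
        IFS=$old_ifs
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir/$lib"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Run this on every node of the job and compare the output.
if found=$(find_in_path libmkl_lapack.so "${LD_LIBRARY_PATH:-}"); then
    echo "$(hostname): found $found"
else
    echo "$(hostname): libmkl_lapack.so is not on LD_LIBRARY_PATH"
fi
```

If one node prints the "not on LD_LIBRARY_PATH" line while the others find the library, that node's environment differs from the rest.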

Hth

alex

stshcs
Newbie
Posts: 8
Joined: Fri Jun 05, 2009 7:08 am

#3 Post by stshcs » Wed Mar 23, 2011 9:53 am

Thanks, alex. That is not the case here. Our cluster uses NFS, so every node has the same profile when a user logs in.

alex
Hero Member
Posts: 586
Joined: Tue Nov 16, 2004 2:21 pm
License Nr.: 5-67
Location: Germany

#4 Post by alex » Thu Mar 24, 2011 8:22 am

Hm, maybe your MKL isn't installed where it is looked for. Submit a job like the one that did not work and run
ldd path_to_vasp.exe
and check whether all libraries are found.
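
To make the ldd output easier to scan, the unresolved entries can be filtered out. A minimal sketch, assuming the usual glibc ldd output format in which a missing library is printed as `name => not found`; `missing_libs` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the shared libraries that ldd cannot resolve.
# missing_libs is a hypothetical helper; it reads ldd output on stdin.
missing_libs() {
    # Unresolved entries look like: "libfoo.so => not found"
    awk '/=> not found/ { print $1 }'
}

# Usage (path_to_vasp.exe is the placeholder from the post above):
#   ldd path_to_vasp.exe | missing_libs
```

Empty output means every library was resolved on the node where the command ran, so the check has to run inside the failing job, not on the login node.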

Hth

alex

stshcs
Newbie
Posts: 8
Joined: Fri Jun 05, 2009 7:08 am

#5 Post by stshcs » Tue Mar 29, 2011 1:32 am

Thanks, alex. I have found the problem. It comes from LD_LIBRARY_PATH, as you said. The /etc/profile on some nodes is different, so those nodes do not set the right environment.
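
One way to avoid depending on each node's /etc/profile is to set the variable in the job script itself. The following is only a sketch, not what was done on this cluster: `prepend_lib_dir` is a hypothetical helper, the MKL directory is the one from the makefile options in the first post, and whether the exported value reaches the remote MPI processes depends on the launcher and the batch system.

```shell
#!/bin/sh
# Sketch: make the job script set the library path itself, so the run does
# not depend on each node's /etc/profile.
# prepend_lib_dir is a hypothetical helper.
prepend_lib_dir() {
    dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*)
            # Directory is already on the path; leave it unchanged.
            ;;
        *)
            LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            ;;
    esac
    export LD_LIBRARY_PATH
}

# MKL directory taken from the makefile options in the first post.
prepend_lib_dir /opt/intel/mkl/9.1.023/lib/em64t
```

Calling the function twice with the same directory does not add a duplicate entry.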