VASP-6.4.3 NCCL

Posted: Tue Feb 25, 2025 1:15 am
by vladimir.ladygin

Dear VASP Developers,

I ran into the internal error below, which asks me to submit a bug report, when trying to run an NCCL setup with one core per GPU on NERSC Perlmutter. This is just an ordinary relaxation calculation.

"""internal error in: mpi.F at line: 903

M_init_nccl: Error in ncclCommInitRank

If you are not a developer, you should not encounter this problem.
Please submit a bug report.

"""

Kind Regards,
Vladimir


Re: VASP-6.4.3 NCCL

Posted: Tue Feb 25, 2025 9:48 am
by ferenc_karsai

Thanks for the report, I will try to reproduce the error on our machines.


Re: VASP-6.4.3 NCCL

Posted: Wed Feb 26, 2025 12:30 pm
by ferenc_karsai

I talked to a colleague, and he has seen a similar bug reported before by another user on the forum.

Here is the original post:
https://www.vasp.at/forum/viewtopic.php?t=19822

For the moment, the workaround is not to use NCCL: compile without -DUSENCCL in makefile.include.
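
If you prefer not to edit makefile.include by hand, the flag removal could be scripted roughly as follows. This is only a minimal sketch: the helper name strip_nccl_flag is hypothetical, and it assumes -DUSENCCL appears as a separate whitespace-delimited token in your makefile.include. Check the result by eye, then rebuild VASP from a clean build.

```python
import re

# Matches -DUSENCCL only as a whole whitespace-delimited token, so any longer
# flag that merely starts with the same characters is left untouched.
NCCL_FLAG = re.compile(r"(?<!\S)-DUSENCCL(?!\S)[ \t]*")


def strip_nccl_flag(makefile_text: str) -> str:
    """Return the makefile text with the -DUSENCCL preprocessor flag removed.

    Hypothetical helper for illustration; it is not part of VASP.
    """
    return NCCL_FLAG.sub("", makefile_text)


# Hypothetical excerpt; the CPP_OPTIONS block in a real makefile.include
# contains more flags and will look different.
sample = (
    "CPP_OPTIONS = -DMPI \\\n"
    "              -D_OPENACC \\\n"
    "              -DUSENCCL\n"
)
print(strip_nccl_flag(sample))
```

To apply it to a real file, read makefile.include into a string, pass it through strip_nccl_flag, and write the result back after keeping a backup copy of the original.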