Artima.com provides this forum to the IEEE Computer Society Task Force on Cluster Computing (TFCC) to promote discussion of articles appearing in its TFCC Newsletter and related topics. The TFCC is an international organization that promotes cluster computing research and education and supports the development of technical standards in the cluster computing area.
I ran my MPI code, and it worked well with a small number of processors (e.g., fewer than 16). However, when I used 32 processors, the code hung. After I killed the jobs, the following message appeared:
p19_23230: (359.613715) net_send: could not write to fd=6, errno = 14
p19_23230: p4_error: net_send write: -1
This happened with the Intel Fortran compiler and the MPICH library. However, when I used MPI/Pro with Absoft (or GNU g77), the code worked very well on 32 processors. Could anybody offer suggestions about this problem?