Build and Install MPI, parallel HDF5, and h5py from Source on Linux

On occasion, I have to set up a machine or VM to use parallel h5py, sometimes with a particular version or snapshot of HDF5. Unfortunately, package managers always seem to make a mess of the MPI-HDF5-Python trifecta, even for more vanilla installs, so I’ve put this blog post together to remind myself of the steps I need to take to get everything working.

Preparation

Obviously, you’ll need a machine that has Python and can compile C code. I’m using 64-bit Ubuntu 16.04 LTS, starting with a fresh and fully patched/upgraded install. Ubuntu can compile C code and comes with Python 2.7 and 3.5 out of the box (I’m only concerned with 2.7). The only things I’ve installed are vim, Synaptic, and the VMware tools.

I’m assuming that you know how to build software from source and just need help getting everything to click together. If enough people grump that this is not a reasonable assumption, I’ll update the instructions to be more explicit.

Figure Out Where To Put Everything

You’ll want to figure out where to put MPICH and HDF5. I put anything I build in my home directory to keep it from causing system issues and to avoid sudo. Do whatever you want but decide now so you can set prefixes when building. I install h5py to the system so I don’t need to pick a spot for that.
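One way to pin the decision down is to record the two prefixes in shell variables before building anything. The directory names below are hypothetical placeholders, not a required layout; any location you can write to will do.

```shell
# Hypothetical install locations under the home directory (assumed names).
export MPICH_PREFIX="$HOME/opt/mpich"
export HDF5_PREFIX="$HOME/opt/hdf5"

# Print them so they can be pasted into the --prefix options later.
echo "MPICH prefix: $MPICH_PREFIX"
echo "HDF5 prefix:  $HDF5_PREFIX"
```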

MPICH

Get the source from http://www.mpich.org/downloads/. I used MPICH 3.2. You want the full MPICH distribution, not the hydra one. Unpack it.

Run the configure script with the following options:

--enable-romio
--enable-shared
--with-device=ch3:sock
--disable-fortran
--prefix=/path/to/install/location

If you use Fortran or C++, pass --enable-fortran=all in place of --disable-fortran, and add --enable-cxx.

make / make check / make install
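Put together, the configure and make steps above might look like the sketch below. The function name build_mpich and its two arguments are placeholders of my own; the configure options are the ones listed above.

```shell
# Sketch: build MPICH from an unpacked source tree.
# build_mpich is a hypothetical helper, not part of MPICH.
#   $1 = unpacked source directory, $2 = install prefix
build_mpich() {
    src_dir="$1"
    prefix="$2"
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$src_dir" || exit 1
        ./configure \
            --enable-romio \
            --enable-shared \
            --with-device=ch3:sock \
            --disable-fortran \
            --prefix="$prefix" || exit 1
        # Stop at the first step that fails.
        make && make check && make install
    )
}
```

A call would then look like build_mpich "$HOME/src/mpich-3.2" "$HOME/opt/mpich", where both paths are made-up examples.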

If you need to, make symlinks to the programs in the install location’s bin directory. On Linux, you can use cp -rs to make symlinks to all the files in a directory. On Ubuntu, if a bin directory exists in your home directory, it is added to $PATH automatically the next time you log in.
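As a sketch of the symlink step, assuming GNU cp and a hypothetical helper called link_bins: cp -s only creates links outside the current directory when the source path is absolute, so the helper resolves the prefix first.

```shell
# Sketch: symlink every program in <prefix>/bin into ~/bin.
# link_bins is a hypothetical helper; $1 is the install prefix.
link_bins() {
    # Resolve to an absolute path: GNU cp -s refuses relative sources
    # unless the links are created in the current directory.
    prefix="$(cd "$1" && pwd)" || return 1
    mkdir -p "$HOME/bin"
    cp -rs "$prefix"/bin/* "$HOME/bin/"
}
```

For example, link_bins "$HOME/opt/mpich" would link mpicc, mpiexec and the rest into ~/bin, assuming that is where MPICH was installed.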

Make sure you can use mpicc to build and run a simple parallel program. You can find such a program (and an MPI tutorial) here.
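In case the linked program is not to hand, here is a minimal sketch of one, written out from the shell so it can be pasted in one go. The file name mpitest.c matches the build command that follows; the program body is an assumption about what such a test looks like, chosen so that it prints lines in the format shown further down.

```shell
# Write a small MPI "hello world" to mpitest.c (assumed test program).
cat > mpitest.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int size, rank, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Get_processor_name(name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           name, rank, size);

    MPI_Finalize();
    return 0;
}
EOF
```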

Build it with:

$ mpicc mpitest.c

Run it with:

$ mpiexec -n 5 ./a.out

And you should see:

Hello world from processor ubuntu, rank 3 out of 5 processors
Hello world from processor ubuntu, rank 0 out of 5 processors
Hello world from processor ubuntu, rank 4 out of 5 processors
Hello world from processor ubuntu, rank 1 out of 5 processors
Hello world from processor ubuntu, rank 2 out of 5 processors

mpi4py

If you try to install this (python-mpi4py) via the package manager, it will try to install openmpi, which isn’t what you want. Instead, install pip (python-pip) via apt-get or Synaptic and then run:

$ pip install mpi4py

This will use your existing MPICH installation. I did this as myself, not root, and it succeeded. If you do it as root, you’ll need to ensure that your MPI location is on root’s $PATH or pip will complain that it can’t compile MPI code.

Make sure it works with a simple program (from here):

#hello.py
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
print("hello world from process %d" % rank)

Run it:

$ mpiexec -n 5 python hello.py

Output:

hello world from process 3
hello world from process 1
hello world from process 4
hello world from process 0
hello world from process 2

pHDF5

Get the source from https://www.hdfgroup.org/HDF5/. I used HDF5 1.10.0-patch1 since that’s the latest at time of writing. Unpack it.

Run the configure script with --enable-parallel and --prefix=/path/to/install/location. The h5py docs say you need --enable-shared, but on Linux this is the default so it’s not needed.

make / make check (can take a while…) / make install / make check-install
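The configure, build, test and install steps as a sketch, with a hypothetical helper named build_phdf5. Setting CC=mpicc is an assumption on my part rather than something stated above: HDF5’s parallel build notes recommend pointing CC at the MPI compiler wrapper, and it requires mpicc to be on $PATH.

```shell
# Sketch: build parallel HDF5 from an unpacked source tree.
# build_phdf5 is a hypothetical helper, not part of HDF5.
#   $1 = unpacked source directory, $2 = install prefix
build_phdf5() {
    src_dir="$1"
    prefix="$2"
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$src_dir" || exit 1
        # CC=mpicc is an assumption: it makes configure use the MPI wrapper.
        CC=mpicc ./configure --enable-parallel --prefix="$prefix" || exit 1
        # Stop at the first step that fails; make check is the slow one.
        make && make check && make install
    )
}
```

A call would look like build_phdf5 "$HOME/src/hdf5-1.10.0-patch1" "$HOME/opt/hdf5", with both paths being made-up examples.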

If you need to, make symlinks to the programs in the install location’s bin directory.

Make sure you can use h5pcc to build and run a simple parallel HDF5 program. You can find examples in the parallel HDF5 tutorials located here.

Build it with:

$ h5pcc Hyperslab_by_row.c

Run it with:

$ mpiexec -n 4 ./a.out

The test programs generally don’t print to stdout, so use h5dump or h5ls to inspect the contents:

$ h5dump SDS_row.h5
HDF5 "SDS_row.h5" {
  GROUP "/" {
    DATASET "IntArray" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 8, 5 ) / ( 8, 5 ) }
      DATA {
        (0,0): 10, 10, 10, 10, 10,
        (1,0): 10, 10, 10, 10, 10,
        (2,0): 11, 11, 11, 11, 11,
        (3,0): 11, 11, 11, 11, 11,
        (4,0): 12, 12, 12, 12, 12,
        (5,0): 12, 12, 12, 12, 12,
        (6,0): 13, 13, 13, 13, 13,
        (7,0): 13, 13, 13, 13, 13
      }
    }
  }
}

h5py

First off, you’ll need numpy, which I install via pip.

$ pip install numpy

Cython is also needed, but I didn’t install it separately and the h5py configure seems to have pulled it down for me.

Get the source from https://pypi.python.org/pypi/h5py. I used 2.6.0 since that’s the latest version. Unpack it.

Then configure h5py:

$ export CC=mpicc
$ python setup.py configure --mpi --hdf5=/path/to/hdf5

For me, this looks like it hangs during the configure but if you leave it alone it will complete. I think it’s when it’s getting Cython.

Then build:

$ python setup.py build

And install:

$ sudo python setup.py install

I have no idea how to install Python libraries for general use without using sudo.

To test, get a sample parallel h5py program from here.
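If the linked sample is unavailable, the sketch below writes a small stand-in to testh5py.py. It follows the pattern in h5py’s parallel documentation (the mpio driver plus an MPI communicator). The dataset name, shape and output file name are assumptions chosen to match the h5dump output shown further down.

```shell
# Write a small parallel h5py test to testh5py.py (assumed stand-in program).
cat > testh5py.py <<'EOF'
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD
rank = comm.rank  # this process's rank, 0..3 when run with -n 4

# Open one shared file from every rank using the MPI-IO driver.
f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=comm)

# Dataset creation is collective: every rank must make the same call.
dset = f.create_dataset('test', (4,), dtype='i')

# Each rank writes its own rank number into its own slot.
dset[rank] = rank

f.close()
EOF
```

The dataset has four slots, so this stand-in should be run with exactly four ranks. If opening the file fails, h5py.get_config().mpi reports whether h5py was built with MPI support.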

$ mpiexec -n 4 python testh5py.py

And check the output file:

$ h5dump parallel_test.hdf5
HDF5 "parallel_test.hdf5" {
  GROUP "/" {
    DATASET "test" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 4 ) / ( 4 ) }
      DATA {
        (0): 0, 1, 2, 3
      }
    }
  }
}

Everything works. Hooray.

Conclusion

Now you should have a working version of h5py that is parallel-aware and uses your preferred software version at each level of the stack. The downside is that the package manager won’t be aware of your changes, so be careful when installing new packages.

If I’m doing something incorrectly or have suggestions for improvements, please let me know in the comments.
