Example of using MPI in Python with mpi4py (and SLURM!)


MPI is a complicated beast, conjuring thoughts of grey-beards sitting in front of amber screens, typing Fortran code that runs in distant, dark computing halls.

But it's not like that! MPI is something that anyone can use! If you have code that would normally fork() (or clone()) itself, but your single machine does not have enough CPU cores to meet your needs, MPI is a way to run the same code across multiple machines, without needing to think about setting up or coordinating IPC.

The code and docs in this repository are targeted at Python developers, and will take you from zero to running a simple MPI Python script!

Start with an Ubuntu Bionic (18.04) system, and run these commands to install software and run this code!

    git clone https://github.com/akkornel/mpi4py.git
    sudo -s
    apt install python3-mpi4py
    echo 'btl_base_warn_component_unused = 0' > /etc/openmpi/openmpi-mca-params.conf
    exit
    cd mpi4py
    mpirun -n 20 python3 mpi4.py

See it in action: (asciicast recording)

The Python code is simple: It has one 'controller', and one or more 'workers':

  1. The controller generates a random number, and broadcasts it to everyone.

  2. Each worker has a unique ID (its "rank"), and the hostname of the machine where it is running. The worker receives the random number, adds its rank, and sends the result (the new number) and its hostname back to the controller.

  3. The controller "gathers" all of the results (which come to it sorted by rank), and displays them. The output includes an "OK" if the math (random number + rank) is correct.

The Python code is simple: it does not have to deal with launching workers, and it does not have to set up communications between them (inter-process communications, or IPC).
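
For reference, here is a minimal, hypothetical sketch of that controller/worker pattern using mpi4py's bcast and gather calls. It is illustrative only; the actual mpi4.py in this repository may be structured differently.

    # Illustrative sketch of the controller/worker pattern described above.
    # The real mpi4.py in this repository may differ.
    import random
    import socket

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        # Controller: pick a random number and broadcast it to every rank.
        number = random.randint(0, 2**31 - 1)
    else:
        number = None
    number = comm.bcast(number, root=0)

    if rank == 0:
        result = None  # The controller contributes no work of its own.
    else:
        # Worker: add our rank and report back along with our hostname.
        result = (number + rank, socket.gethostname())

    # Gather every rank's result at the controller, ordered by rank.
    results = comm.gather(result, root=0)
    if rank == 0:
        print('Controller @ MPI Rank 0: Input {0}'.format(number))
        for worker_rank, (output, host) in enumerate(results[1:], start=1):
            status = 'OK' if output == number + worker_rank else 'BAD'
            print('Worker at MPI Rank {0}: Output {1} is {2} (from {3})'.format(
                worker_rank, output, status, host))

Run it with, for example, mpirun -n 5 python3 sketch.py (the file name is made up for this example).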

The above example showed 20 copies of mpi4.py, all running on the local system. But, if you have two identical machines, you can spread the workers out across all hosts, without any changes:

(asciicast recording)

If you have access to a compute environment running SLURM, you can let SLURM do all of the work for you:

(asciicast recording)

Read on for information on the technologies that make all this work, and how to try it for yourself!

Behind the Scenes

Behind the above example, there are many moving pieces. Here is an explanation of each one.

MPI

MPI is an API specification for inter-process communication. The specification defines bindings for both C/C++ and Fortran, which are used as the basis for other languages (in our case, Python). MPI provides for many IPC needs, including…

  • Bi-directional communication, both blocking and non-blocking (see the sketch after this list)

  • Scatter-gather and broadcast communication

  • Shared access to files, and to pools ("windows") of memory
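
As an illustration of the first item in the list above, here is a hypothetical mpi4py snippet showing a blocking send/recv pair alongside a non-blocking isend/irecv pair. The file name p2p_demo.py and the payload are made up for this example.

    # Hypothetical example: blocking and non-blocking point-to-point calls.
    # Run with, for example: mpirun -n 2 python3 p2p_demo.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        # Blocking send of a (pickled) Python object to rank 1.
        comm.send({'payload': 42}, dest=1, tag=11)
        # Non-blocking receive of the reply; irecv() returns a Request.
        request = comm.irecv(source=1, tag=22)
        reply = request.wait()
        print('Rank 0 received reply: {0}'.format(reply))
    elif rank == 1:
        data = comm.recv(source=0, tag=11)
        # Non-blocking send of the reply; wait() for completion before exiting.
        comm.isend(data['payload'] + 1, dest=0, tag=22).wait()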

The latest published version of the standard is MPI-3.1, published June 4, 2015. MPI specification versions have an "MPI-" prefix to distinguish them from implementation versions. Specifications are developed by the MPI Forum.

MPI is interesting in that it only specifies an API; it is up to other groups to provide the implementations of the API. In fact, MPI does not specify things like encodings, or wire formats, or even transports. The implementation has full rein to use (what it feels is) the most efficient encoding, and to support whatever transport it wants. Most MPI implementations support communications over Infiniband (using native Infiniband verbs) or TCP (typically over Ethernet). Newer implementations also support native communication over modern 'converged' high-performance platforms, like RoCE.

Because each MPI implementation has control over how the MPI API is implemented, a given "world" of programs (or, more accurately, a world of copies of a single program) must all use the same MPI implementation.

OpenMPI

OpenMPI is an implementation of the MPI API, providing libraries and headers for C, C++, Fortran, and Java; and supporting interfaces from TCP to Infiniband to RoCE. OpenMPI is fully open-source, with contributors from academic and research fields (IU, UT, HPCC) to commercial entities (Cisco, Mellanox, and nVidia). It is licensed under the 3-clause BSD license. Your Linux distribution probably has it packaged already.

NOTE: When running OpenMPI for the first time, you may get a warning which says "A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces". That is OpenMPI complaining that it is not able to find anything other than a basic Ethernet interface.

To suppress the warning about no high-performance network interfaces, edit the file at path /etc/openmpi/openmpi-mca-params.conf, and insert the line btl_base_warn_component_unused = 0.

Python

As this is a Python demonstration, you will need Python! MPI has been around for a long time, and so many versions of Python are supported. The code in this repo works with Python 2.7, and also Python 3.5 and later.

The Python code in this repository includes type annotations. To run in Python 2.7, you will need to install the typing package (it comes as part of the standard library in 3.5 and later).
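
For example, comment-style annotations like the hypothetical function below work on both Python 2.7 (with the backported typing package installed) and Python 3; the annotation style used by the code in this repository may differ.

    # Hypothetical illustration; the repository's own annotation style may differ.
    import socket

    from typing import Tuple  # stdlib in 3.5+; 'pip install typing' on 2.7


    def add_rank(value, rank):
        # type: (int, int) -> Tuple[int, str]
        """Add a worker's rank to a broadcast value and note the hostname."""
        return (value + rank, socket.gethostname())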

mpi4py

There are no MPI implementations written specifically for Python. Instead, the mpi4py package provides a Python wrapper around your choice of MPI implementation. In addition, mpi4py provides two ways to interface with MPI: You can provide binary data (as a bytes string or a bytearray); or you can provide a Python object, which mpi4py will serialize for you (using pickle). This adds a massive convenience for Python programmers, because you do not have to deal with serialization (other than ensuring your Python objects are pickleable).
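
In mpi4py, these two styles correspond to the uppercase (buffer-based) and lowercase (pickle-based) method names. The snippet below is a hypothetical illustration of the difference, not code from this repository; the file name two_styles.py is made up.

    # Hypothetical sketch: pickle-based (lowercase) vs. buffer-based (uppercase).
    # Run with, for example: mpirun -n 2 python3 two_styles.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        # send(): any picklable Python object; mpi4py serializes it for you.
        comm.send({'answer': 42, 'source': 'controller'}, dest=1, tag=1)
        # Send(): a raw byte buffer; you are responsible for the format.
        comm.Send([bytearray(b'hello'), MPI.BYTE], dest=1, tag=2)
    elif rank == 1:
        obj = comm.recv(source=0, tag=1)   # unpickled back into a dict
        buf = bytearray(5)                 # the receiver must size the buffer
        comm.Recv([buf, MPI.BYTE], source=0, tag=2)
        print(obj, bytes(buf))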

WARNING: Although mpi4py will handle serialization for you, you are still responsible for checking what data are being received. It is your code running at the other end, but you may not be sure exactly what state the other copy is in right now.
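
A hypothetical defensive check might look like the helper below; the expected (number, hostname) tuple shape is chosen only to match the example described above, and may not match the real mpi4.py.

    # Hypothetical helper: sanity-check a pickled message before trusting it.
    from mpi4py import MPI


    def receive_checked_result(comm):
        """Receive one (number, hostname) tuple and validate its shape."""
        msg = comm.recv(source=MPI.ANY_SOURCE, tag=1)
        if not (isinstance(msg, tuple) and len(msg) == 2
                and isinstance(msg[0], int) and isinstance(msg[1], str)):
            raise ValueError('unexpected message format: %r' % (msg,))
        return msg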

NOTE: If you plan on building mpi4py yourself, you should be aware that MPI was created at a time when pkg-config did not exist. Instead of storing things like compiler flags in a .pc file, it provides a set of wrapper scripts (mpicc, and the like), which you should call instead of calling the normal compiler programs. The commands are built as part of a normal OpenMPI installation, and mpi4py's setup.py script knows to use them, and so pip install --user mpi4py should work.

Tested Configurations

The code in this repository was tested by the author on the following OS, OpenMPI, and mpi4py configurations, on an x64 platform:

  • Ubuntu 16.04.6

    OpenMPI 3.0.0 (built from upstream) was used in all tests.

    mpi4py version 3.0.2 (built from upstream) was used in all tests. The Ubuntu-supplied version (1.10.2-8ubuntu1) was tested, but reliably crashed with a buffer overflow.

    The above software was tested in the following Python versions:

    • Python 2.7 (2.7.12-1~16.04)

    • Python 3.5 (3.5.1-3)

    • Python 3.6.8 from upstream

    • Python 3.7.3 from upstream

  • Ubuntu 18.04.2

    OpenMPI 2.1.1 (2.1.1-8) was used in all tests.

    mpi4py 2.0.0 (2.0.0-3) was used in all tests.

    The above software was tested in the following Python versions:

    • Python 2.7 (2.7.15~rc1-1).

    • Python 3.6 (3.6.7-1~18.04).

Running

You can run this demonstration code multiple ways. This documentation explains how to run directly (with mpirun). This works automatically on one machine, and can be configured to run on multiple machines. Information is also provided for people with access to a SLURM environment.

mpirun

The easiest way to run your code is with the mpirun command. This command, run in a shell, will launch multiple copies of your code, and set up communications between them.

Here is an example of running mpirun on a single machine, to launch five copies of the script:

    akkornel@machine1:~/mpi4py$ mpirun -n 5 python3 mpi4.py
    Controller @ MPI Rank   0:  Input 773507788
       Worker at MPI Rank   1: Output 773507789 is OK (from machine1)
       Worker at MPI Rank   2: Output 773507790 is OK (from machine1)
       Worker at MPI Rank   3: Output 773507791 is OK (from machine1)
       Worker at MPI Rank   4: Output 773507792 is OK (from machine1)

The -n (or -np) option specifies the number of MPI tasks to run.

Note above how all of the MPI tasks ran on the same machine. If you want to run your code across multiple machines (which is probably the reason why you came here!), there are some steps you'll need to take:

  • All machines should be running the same architecture, and the same versions of OS, MPI implementation, Python, and mpi4py.

    Minor variations in the version numbers may be tolerable (for example, one machine running Ubuntu 18.04.1 and one machine running Ubuntu 18.04.2), but you should examine what is different between versions. For example, if one machine is running a newer mpi4py, and that version fixed a problem that applies to your code, you should make sure mpi4py is at least that version across all machines.

  • There should be no firewall block: Each host needs to be able to communicate with each other host, on any TCP or UDP port. You can limit the ports used, but that is out of the scope of this quick-start.

  • SSH should be configured so that you can get an SSH prompt on each machine without needing to enter a password or answer a prompt. In other words, you'll need public-key auth configured.

  • The code should either live on shared storage, or should be identical on all nodes.

    mpirun will not copy your code to each machine, so it needs to be present at the same path on all nodes. If you do not have shared storage available, one way to ensure this is to keep all of your code in a version-controlled repository, and check the commit ID (or revision number, etc.) before running.

Assuming all of those conditions are met, you just need to tell mpirun what hosts to use, like so:

    akkornel@machine1:~/mpi4py$ cat host_list
    machine1
    machine2
    akkornel@machine1:~/mpi4py$ mpirun -n 5 --hostfile host_list python3 mpi4.py
    Controller @ MPI Rank   0:  Input 1421091723
       Worker at MPI Rank   1: Output 1421091724 is OK (from machine1)
       Worker at MPI Rank   2: Output 1421091725 is OK (from machine1)
       Worker at MPI Rank   3: Output 1421091726 is OK (from machine2)
       Worker at MPI Rank   4: Output 1421091727 is OK (from machine2)

OpenMPI will spread out the MPI tasks evenly across nodes. If you would like to limit how many tasks a host may run, add slots=X to the file, like so:

    machine1 slots=2
    machine2 slots=4

In the above example, machine1 has two cores (or, one hyper-threaded core) and machine2 has four cores.

NOTE: If you specify a slot count on any machine, mpirun will fail if you ask for more slots than what is available.

And that's it! You should now be up and running with MPI across multiple hosts.

SLURM

Many compute environments, especially in HPC, use the SLURM job scheduler. SLURM is able to communicate node and slot information to programs that use MPI, and in some MPI implementations SLURM is able to do all of the communications setup on its own.

With SLURM, there are two ways of launching our MPI job. The first is to use srun, launching the job in a synchronous fashion (that was shown in the example at the top of this page). The second is to use sbatch, providing a batch script that will be run asynchronously.

Regardless of the method of execution (that is, regardless of the SLURM command we run), we will need to tell SLURM what resources we will require:

  • Time. Our job takes less than five minutes to complete.

  • Number of Tasks. This is the equivalent of the -n option for mpirun. We need to specify how many MPI tasks to run.

  • CPUs/cores. Our code is single-threaded, so this is "1 per task".

  • Memory. We are running a single Python interpreter in each task, but let's overestimate and say "512 MB per core". Note that SLURM specifies memory either per core or per node, not per task.

Note that in our request we never specified how many machines to run on. As long as all machines have access to the same storage (which is normal in a compute environment), we do not care exactly where our program runs. All of our tasks may run on a single machine, or they may be spread out across multiple machines.

Note also that SLURM has support for many different MPI implementations, but in some cases, special care may need to be taken. For OpenMPI, nothing special is required: SLURM supports OpenMPI natively, and OpenMPI supports SLURM by default. More information on other MPI implementations is available from SLURM. To see which MPIs are supported by your SLURM environment, run the command srun --mpi list. You can check the default by looking at the environment's slurm.conf file (which is normally located in /etc on each machine). The setting is named "MpiDefault".

In the examples which follow, we will always be requesting at least two nodes.

Synchronous SLURM jobs with srun

Running "synchronously" means you submit the work to SLURM, and you wait forresults to come back. Any messages output by your code (for example, viastandard output or standard error) will be displayed in the terminal.

To run synchronously, we use the srun command, like so:

    akkornel@rice15:~/mpi4py$ module load openmpi/3.0.0
    akkornel@rice15:~/mpi4py$ srun --time 0:05:00 --ntasks 10 --cpus-per-task 1 --mem-per-cpu 512M --nodes 2-10 python3 mpi4.py
    Controller @ MPI Rank   0:  Input 3609809428
       Worker at MPI Rank   1: Output 3609809429 is OK (from wheat09)
       Worker at MPI Rank   2: Output 3609809430 is OK (from wheat10)
       Worker at MPI Rank   3: Output 3609809431 is OK (from wheat11)
       Worker at MPI Rank   4: Output 3609809432 is OK (from wheat12)
       Worker at MPI Rank   5: Output 3609809433 is OK (from wheat13)
       Worker at MPI Rank   6: Output 3609809434 is OK (from wheat14)
       Worker at MPI Rank   7: Output 3609809435 is OK (from wheat15)
       Worker at MPI Rank   8: Output 3609809436 is OK (from wheat16)
       Worker at MPI Rank   9: Output 3609809437 is OK (from wheat17)

In the above example, OpenMPI is provided via an Lmod module, instead of a system package. We have already built mpi4py using that version of OpenMPI. So, before running our code, we first load the same OpenMPI module used to build mpi4py.

(We don't need to run module load on the remote system, because Lmod modules do their work by manipulating environment variables, which SLURM copies over to the compute nodes.)

Once that is complete, we use the srun command to run our job. SLURM finds enough nodes to run our job, runs it, and prints the contents of standard output and standard error.

Note that the above example assumes OpenMPI is configured as SLURM's default MPI. If not, you can provide --mpi=openmpi to the srun command, to force it to use native OpenMPI.

SLURM batch jobs with sbatch

When running as a batch job, most of the resource options should be specified in the script that is provided to sbatch. That script should include (via #SBATCH special comments) all of the resource-setting options that you previously included in the srun command line, except for --ntasks (and possibly also --time).

Since the code supports any number of workers (as is typical for an MPI program), you should provide --ntasks on the sbatch command-line, so that you can adjust it to fit your current needs.

This example uses the sbatch script included in this repository. Ten workers are requested.

    akkornel@rice15:~/mpi4py$ sbatch --ntasks 10 mpi4.py.sbatch
    Submitted batch job 1491736
    akkornel@rice15:~/mpi4py$ sleep 10
    akkornel@rice15:~/mpi4py$ squeue -j 1491736
                 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    akkornel@rice15:~/mpi4py$ cat slurm-1491736.out
    Controller @ MPI Rank   0:  Input 3761677288
       Worker at MPI Rank   1: Output 3761677289 is OK (from wheat08)
       Worker at MPI Rank   2: Output 3761677290 is OK (from wheat08)
       Worker at MPI Rank   3: Output 3761677291 is OK (from wheat08)
       Worker at MPI Rank   4: Output 3761677292 is OK (from wheat08)
       Worker at MPI Rank   5: Output 3761677293 is OK (from wheat08)
       Worker at MPI Rank   6: Output 3761677294 is OK (from wheat08)
       Worker at MPI Rank   7: Output 3761677295 is OK (from wheat08)
       Worker at MPI Rank   8: Output 3761677296 is OK (from wheat08)
       Worker at MPI Rank   9: Output 3761677297 is OK (from wheat08)

When SLURM launches the job, the batch script is executed on one of the nodes allocated to the job. The batch script is responsible for setting up the environment (in this case, loading the OpenMPI module), and then executing srun. Note that in the batch script, srun does not have any command-line options (except for --mpi); srun is able to recognize that it is running inside a batch job, and will automatically scale itself to use all of the resources available.

That's it! Without having to change any Python code, your work is now being run via a job scheduler, with all of the resource scheduling and IPC facilities managed for you.

Where to go next?

This was only a dip of the toe into the world of MPI. To continue on, you should first access the mpi4py documentation. As most MPI publications provide their code in C, the mpi4py docs will guide you on how to convert that C code into Python calls. You can also use the built-in help system, like so:

    akkornel@rice13:~$ python3.5
    Python 3.5.2 (default, Nov 12 2018, 13:43:14)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from mpi4py import MPI
    >>> mpi_comm = MPI.COMM_WORLD
    >>> help(mpi_comm.Get_rank)
    Help on built-in function Get_rank:

    Get_rank(...) method of mpi4py.MPI.Intracomm instance
        Comm.Get_rank(self)
        Return the rank of this process in a communicator

The mpi4py docs also have a tutorial, which you should read!

Once that is done, the next thing to do is to think of a problem, and to try solving it using MPI! Focus on a problem that can be parallelized easily.

The Lawrence Livermore National Lab has a tutorial on MPI, featuring some exercises that you can try. For example, try going to the tutorial and skipping ahead to the Point to Point Communication Routines section, which guides you through one of the ways of calculating the digits of Pi.
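
If you would like to start with something even simpler, here is a hypothetical, self-contained sketch (not the LLNL exercise itself) that estimates Pi by Monte Carlo sampling, with every rank sampling independently and a reduce at the end. The file name mc_pi.py is made up for this example.

    # Hypothetical sketch: estimate Pi by Monte Carlo sampling across ranks.
    # Run with, for example: mpirun -n 4 python3 mc_pi.py
    import random

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    samples_per_rank = 100000
    inside = sum(
        1
        for _ in range(samples_per_rank)
        if random.random() ** 2 + random.random() ** 2 <= 1.0
    )

    # Sum the per-rank counts at rank 0.
    total_inside = comm.reduce(inside, op=MPI.SUM, root=0)
    if rank == 0:
        pi_estimate = 4.0 * total_inside / (samples_per_rank * size)
        print('Pi is roughly {0}'.format(pi_estimate))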

Other tutorials include Rajeev Thakur's Introduction to MPI slide deck, which has been condensed by Brad Chamberlain of the University of Washington (for a course there); and the MPI for Dummies slide deck, created for the PPoPP conference.

If you are interested in spreading your MPI code out across multiple machines, you might want to set up a SLURM environment. First, get some shared storage by setting up an NFS server (in Ubuntu, or you could use Amazon EFS or Google Cloud Filestore). Then, check out the SLURM Quick Start. On RPM-based systems, the quick-start walks you through creating RPMs; on Debian-based systems, the packages are already available under the name "slurm-llnl".

Once you are done with all that, start looking at nearby universities and research institutions. They may have a job opening for you!

Copyright & License

The contents of this repository are Copyright © 2019 the Board of Trustees of the Leland Stanford Junior University.

The code and docs (with the exception of the .gitignore file) are licensed under the GNU General Public License, version 3.0.
