Introduction
Over the last several years, high-performance computing has become an affordable resource for many more researchers in the scientific community than ever before. The combination of quality open source software and commodity hardware has strongly influenced the now widespread popularity of Beowulf-class clusters and clusters of workstations.
Among many parallel computational models, message passing has proven to be an effective one. This paradigm is especially suited for (but not limited to) distributed memory architectures and is used in today’s most demanding scientific and engineering applications related to modeling, simulation, design, and signal processing. However, portable message-passing parallel programming used to be a nightmare because of the many incompatible options developers faced. Fortunately, this situation changed for good after the MPI Forum released its standard specification.
High-performance computing is traditionally associated with software development using compiled languages. However, in typical application programs, only a small part of the code is time-critical enough to require the efficiency of compiled languages. The rest of the code is generally related to memory management, error handling, input/output, and user interaction, and those are usually the most error-prone and time-consuming lines of code to write and debug in the whole development process. Interpreted high-level languages can be really advantageous for these kinds of tasks.
For implementing general-purpose numerical computations, MATLAB [1] is the dominant interpreted programming language. On the open source side, Octave and Scilab are well-known, freely distributed software packages providing compatibility with the MATLAB language. In this work, we present MPI for Python, a new package enabling applications to exploit multiple processors using the standard MPI “look and feel” in Python scripts.
[1] MATLAB is a registered trademark of The MathWorks, Inc.
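As a rough illustration of what the standard MPI “look and feel” in a Python script can look like, the following minimal sketch queries the process rank and the communicator size and then sums the ranks across all processes. It assumes the package is importable as `mpi4py` and that the script is started by an MPI launcher such as `mpiexec`; the helper name `summarize` and the file name `hello_mpi.py` are hypothetical and chosen only for this example.

```python
# Minimal sketch, not taken from the original text.
# Typical launch (hello_mpi.py is a hypothetical file name):
#     mpiexec -n 4 python hello_mpi.py


def summarize(comm):
    """Return (rank, size, sum of all ranks) for the given communicator."""
    rank = comm.Get_rank()  # index of this process within the communicator
    size = comm.Get_size()  # total number of processes in the communicator
    # Collective reduction over Python objects; the default operation is a sum.
    total = comm.allreduce(rank)
    return rank, size, total


if __name__ == "__main__":
    try:
        # mpi4py is an external package; the guard only lets this file
        # run without error on machines where it is not installed.
        from mpi4py import MPI
    except ImportError:
        MPI = None

    if MPI is not None:
        rank, size, total = summarize(MPI.COMM_WORLD)
        print(f"rank {rank} of {size}: sum of ranks = {total}")
```

With four processes, every rank should report the same sum (0 + 1 + 2 + 3), since `allreduce` delivers the reduced value to all participants. Keeping the logic in a function that receives the communicator is a design choice made for this sketch, so the same code can be reused with a communicator other than `MPI.COMM_WORLD`.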
What is MPI?
MPI, [mpi-using] [mpi-ref] the Message Passing Interface, is a standardized and portable message-passing system designed to function on a wide variety of parallel computers. The standard defines the syntax and semantics of library routines and allows users to write portable programs in the main scientific programming languages (Fortran, C, or C++).
Since its release, the MPI specification [mpi-std1] [mpi-std2] has become the leading standard for message-passing libraries for parallel computers. Implementations are available from vendors of high-performance computers and from well-known open source projects like MPICH [mpi-mpich] and Open MPI [mpi-openmpi].
What is Python?
Python is a modern, easy-to-learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming with dynamic typing and dynamic binding. It supports modules and packages, which encourages program modularity and code reuse. Python’s elegant syntax, together with its interpreted nature, makes it an ideal language for scripting and rapid application development in many areas on most platforms.
The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed. It is easily extended with new functions and data types implemented in C or C++. Python is also suitable as an extension language for customizable applications.
Python is an ideal candidate for writing the higher-level parts of large-scale scientific applications [Hinsen97] and driving simulations in parallel architectures [Beazley97] like clusters of PCs or SMPs. Python codes are quickly developed, easily maintained, and can achieve a high degree of integration with other libraries written in compiled languages.
Related Projects
As this work started and evolved, some ideas were borrowed from well-known MPI and Python related open source projects from the Internet.
It has no relation with Python, but is an excellent object-oriented approach to MPI.
It is a C++ class library specification layered on top of the C bindings that encapsulates MPI into a functional class hierarchy.
It provides a flexible and intuitive interface by adding some abstractions, like Ports and Messages, which enrich and simplify the syntax.
Its interface is rather minimal. There is no support for communicators or process topologies.
It does not require the Python interpreter to be modified or recompiled, but does not permit interactive parallel runs.
General (picklable) Python objects of any type can be communicated. There is good support for numeric arrays; practically full MPI bandwidth can be achieved.
It rebuilds the Python interpreter, providing a built-in module for message passing. It does permit interactive parallel runs, which are useful for learning and debugging.
It provides an interface suitable for basic parallel programming. There is no full support for defining new communicators or process topologies.
General (picklable) Python objects can be sent between processors. There is native support for numeric arrays.
It provides a collection of Python modules that are useful for scientific computing.
There is an interface to MPI and BSP (Bulk Synchronous Parallel programming).
The interface is simple but incomplete and does not resemble the MPI specification. There is support for numeric arrays.
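Several of the descriptions above mention communicating general (picklable) Python objects. As a rough illustration of what that means, independent of any of those packages, the sketch below uses the standard `pickle` module to turn an object into a byte string and back again, which is the kind of byte stream a message-passing layer could transmit. The `message` dictionary is made up for the example.

```python
import pickle

# Hypothetical payload; any picklable object would do.
message = {"step": 3, "values": [1.0, 2.5, 4.0], "label": "partial result"}

# Sender side: serialize the object into a bytes buffer.
payload = pickle.dumps(message, protocol=pickle.HIGHEST_PROTOCOL)

# Receiver side: rebuild an equal but independent object from the bytes.
received = pickle.loads(payload)

print(type(payload).__name__, received == message, received is message)
```

The script prints `bytes True False`: the receiver obtains a copy that compares equal to the original rather than a reference to the same object. Numeric arrays are usually better served by a dedicated buffer-based path, which is presumably why the projects above call out array support separately.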
Additionally, we would like to mention some available tools for scientific computing and software development with Python.
NumPy is a package that provides array manipulation and computational capabilities similar to those found in IDL, MATLAB, or Octave. Using NumPy, it is possible to write many efficient numerical data processing applications directly in Python without using any C, C++, or Fortran code.
SciPy is an open source library of scientific tools for Python, gathering a variety of high-level science and engineering modules together as a single package. It includes modules for graphics and plotting, optimization, integration, special functions, signal and image processing, genetic algorithms, ODE solvers, and others.
Cython is a language that makes writing C extensions for the Python language as easy as Python itself. The Cython language is very close to the Python language, but Cython additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code. This makes Cython the ideal language for wrapping external C libraries, and for fast C modules that speed up the execution of Python code.
SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages like Perl, Tcl/Tk, Ruby, and Python. Passing header files to SWIG is the simplest approach to interfacing C/C++ libraries from a Python module.
[mpi-std1] MPI Forum. MPI: A Message Passing Interface Standard. International Journal of Supercomputer Applications, volume 8, number 3-4, pages 159-416, 1994.
[mpi-std2] MPI Forum. MPI: A Message Passing Interface Standard. High Performance Computing Applications, volume 12, number 1-2, pages 1-299, 1998.
[mpi-using] William Gropp, Ewing Lusk, and Anthony Skjellum. Using MPI: portable parallel programming with the message-passing interface. MIT Press, 1994.
[mpi-ref] Mark Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra. MPI - The Complete Reference, volume 1, The MPI Core. MIT Press, 2nd edition, 1998.
[mpi-mpich] W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, 22(6):789-828, September 1996.
[mpi-openmpi] Edgar Gabriel, Graham E. Fagg, George Bosilca, Thara Angskun, Jack J. Dongarra, Jeffrey M. Squyres, Vishal Sahay, Prabhanjan Kambadur, Brian Barrett, Andrew Lumsdaine, Ralph H. Castain, David J. Daniel, Richard L. Graham, and Timothy S. Woodall. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation. In Proceedings, 11th European PVM/MPI Users’ Group Meeting, Budapest, Hungary, September 2004.
[Hinsen97] Konrad Hinsen. The Molecular Modelling Toolkit: a case study of a large scientific application in Python. In Proceedings of the 6th International Python Conference, pages 29-35, San Jose, Ca., October 1997.
[Beazley97] David M. Beazley and Peter S. Lomdahl. Feeding a large-scale physics application to Python. In Proceedings of the 6th International Python Conference, pages 21-29, San Jose, Ca., October 1997.