The main goal of the MPI standard is to define a common interface
for writing message-passing applications. The standard is intended to
be practical, portable, efficient, and flexible. The following is the
complete list of goals of MPI [18], as stated by the MPI Forum.
- Design an application programming interface (not necessarily for
compilers or a system implementation library).
- Allow efficient communication: avoid memory-to-memory copying,
allow overlap of computation and communication, and allow offload to
a communication co-processor, where available.
- Allow for implementations that can be used in a heterogeneous
environment.
- Allow convenient C and Fortran 77 bindings for the interface.
- Assume a reliable communication interface: the user need not
cope with communication failures. Such failures are dealt with by
the underlying communication subsystem.
- Define an interface that is not too different from current
practice, such as PVM, NX, Express, p4, etc., and provide
extensions that allow greater flexibility.
- Define an interface that can be implemented on many vendors'
platforms, with no significant changes in the underlying
communication and system software.
- Semantics of the interface should be language independent.
- The interface should be designed to allow for thread-safety.
The standard covers point-to-point communications, collective
operations, process groups, communication contexts, process
topologies, bindings for Fortran 77 and C, environmental management
and inquiry, and a profiling interface.
The main functionality of MPI, point-to-point and collective
communication, is generally executed within process groups.
A group is an ordered set of processes; each process in
the group is assigned a unique rank from 0, 1, ..., p-1,
where p is the number of processes in the group.
A context is a system-defined object that uniquely
identifies a communicator. A message sent in one context cannot be
received in any other context. Thus, the communication context is the
fundamental mechanism for isolating messages in distinct libraries
and the user program from one another.
The process group is high-level, that is, it is visible to users in
MPI. The communication context, by contrast, is low-level and not
visible. MPI combines the concepts of the process group and the
communication context into a communicator. A communicator is a data
object that specifies the scope of a communication. MPI provides an
initial communicator, MPI_COMM_WORLD, which is predefined and consists
of all the processes running when program execution begins.
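As a rough illustration of how a communicator pairs a group with a context, the following sketch models message matching in plain Python. It is not MPI code: the Communicator class, make_communicator, send, recv, and the mailbox list are hypothetical names invented for this example, and a real MPI implementation manages contexts internally.

```python
from dataclasses import dataclass
from itertools import count

_next_context = count()  # hypothetical source of system-assigned context ids


@dataclass(frozen=True)
class Communicator:
    group: tuple   # ordered process names; a process's rank is its index
    context: int   # hidden from the user in real MPI

    def rank_of(self, process):
        return self.group.index(process)


def make_communicator(group):
    """Pair a group with a fresh context that no other communicator shares."""
    return Communicator(tuple(group), next(_next_context))


mailbox = []  # pending messages: (context, source, dest, tag, payload)


def send(comm, source, dest, tag, payload):
    mailbox.append((comm.context, source, dest, tag, payload))


def recv(comm, source, dest, tag):
    """Return the first pending message that matches, or None if none does."""
    for i, (ctx, src, dst, t, payload) in enumerate(mailbox):
        if (ctx, src, dst, t) == (comm.context, source, dest, tag):
            del mailbox[i]
            return payload
    return None


# Two communicators over the same group, e.g. one for the user program
# and one for a library. Only their contexts differ.
processes = ["p0", "p1", "p2"]
world = make_communicator(processes)
library = make_communicator(processes)

send(library, source=0, dest=1, tag=7, payload="library data")
seen_by_user = recv(world, source=0, dest=1, tag=7)       # no match
seen_by_library = recv(library, source=0, dest=1, tag=7)  # match
```

Even though the source, destination, and tag are identical, the receive posted on `world` finds nothing, because the message carries the context of `library`. This is the isolation property described above.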
Point-to-point communication is the basic concept of the MPI
standard and underlies the send and receive operations for typed
data with an associated message tag. Using point-to-point
communication, a message can be passed to another process with an
explicit message tag and an implicit communication context. Each
process can execute its own code in MIMD style, sequential or
multi-threaded. MPI is designed to be thread-safe by avoiding global
state.
MPI supports blocking send and receive primitives. Blocking means
that the sender's buffer can be reused as soon as the send primitive
returns, and that the receiver's buffer holds the complete message
once the receive primitive returns. MPI has one blocking receive
primitive, MPI_RECV, but four blocking send primitives, one for each
of four communication modes:
MPI_SEND (Standard mode), MPI_SSEND (Synchronous mode),
MPI_RSEND (Ready mode), and MPI_BSEND (Buffered mode).
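The four modes differ in what a send call needs before it can return. The sketch below encodes that as a small decision function in plain Python. It is a simplified model, not MPI code: the function name, its string labels, and the two boolean inputs are hypothetical, and the standard-mode branch assumes an implementation that buffers whenever it has room, a choice the MPI standard leaves to the implementation.

```python
def blocking_send_outcome(mode, receive_posted, buffer_has_room):
    """Model what a blocking send does in a given mode and situation.

    Returns "returns" (the call can complete), "waits" (the call blocks
    until a matching receive is posted), or "erroneous" (the program
    has violated the mode's precondition).
    """
    if mode == "standard":
        # Assumption: the implementation buffers whenever it has room.
        return "returns" if (receive_posted or buffer_has_room) else "waits"
    if mode == "synchronous":
        # Completes only once a matching receive has been posted.
        return "returns" if receive_posted else "waits"
    if mode == "ready":
        # May only be started if the matching receive is already posted.
        return "returns" if receive_posted else "erroneous"
    if mode == "buffered":
        # Completes independently of the receiver, but needs buffer space.
        return "returns" if buffer_has_room else "erroneous"
    raise ValueError(f"unknown mode: {mode}")


# Situation: no receive has been posted yet, but buffer space is available.
for mode in ("standard", "synchronous", "ready", "buffered"):
    print(mode, blocking_send_outcome(mode, receive_posted=False,
                                      buffer_has_room=True))
```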
Moreover, MPI supports non-blocking send and receive primitives,
including MPI_ISEND and MPI_IRECV, where the message buffer must not
be used until the communication has been completed by a wait
operation. A call to a non-blocking send or receive simply posts the
communication operation; it is then up to the user program to
explicitly complete the communication at some later point. Thus, any
non-blocking operation needs a minimum of two function calls: one to
start the operation and another to complete it.
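The two-call pattern can be sketched in plain Python, with futures standing in for MPI request handles. This is a model of the control flow only, not MPI code: the isend, irecv, and wait helpers, the channel queue, and the thread pool are all hypothetical stand-ins chosen for this example.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

channel = queue.Queue()                   # hypothetical link between two processes
pool = ThreadPoolExecutor(max_workers=2)  # background threads stand in for the MPI library


def isend(buf):
    """Post a send and return at once; buf is read later, in the background."""
    return pool.submit(lambda: channel.put(list(buf)))


def irecv():
    """Post a receive and return at once with a request handle."""
    return pool.submit(channel.get)


def wait(request):
    """Block until the posted operation has completed; return its result."""
    return request.result()


data = [1, 2, 3]
recv_request = irecv()           # call 1 of 2: start the receive
send_request = isend(data)       # call 1 of 2: start the send
local_work = sum(range(1000))    # computation overlapped with the transfer
wait(send_request)               # call 2 of 2: only now may data be modified
data[0] = 99
received = wait(recv_request)    # call 2 of 2: the message is complete
pool.shutdown()
```

Modifying `data` before `wait(send_request)` would race with the background copy, which is the hazard the buffer-reuse rule above guards against.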
A collective communication is a communication pattern
that involves all the processes in a communicator. Consequently, a
collective communication is usually associated with more than two
processes. A collective function behaves as if it involved a
synchronization of the whole group. MPI's collective communication
functions include MPI_BCAST, MPI_GATHER, MPI_SCATTER,
MPI_ALLGATHER, and MPI_ALLTOALL.
Figure 2.6: Collective communications with 6 processes.
Figure 2.6 illustrates the above collective communication functions
with 6 processes.
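The data movement performed by these five functions can be modeled with plain Python lists, one list per rank. The sketch below is not MPI code: the function names only echo the MPI names, each function takes and returns the buffers of all ranks at once, and three ranks are used instead of six to keep the example short.

```python
def bcast(bufs, root):
    """Every rank ends up with a copy of the root's buffer."""
    return [list(bufs[root]) for _ in bufs]


def gather(bufs, root):
    """The root receives all buffers, concatenated in rank order."""
    joined = [x for buf in bufs for x in buf]
    return [joined if rank == root else [] for rank in range(len(bufs))]


def scatter(bufs, root):
    """The root's buffer is cut into equal chunks; chunk i goes to rank i."""
    p = len(bufs)
    n = len(bufs[root]) // p
    return [bufs[root][rank * n:(rank + 1) * n] for rank in range(p)]


def allgather(bufs):
    """Like gather, but every rank receives the concatenated result."""
    joined = [x for buf in bufs for x in buf]
    return [list(joined) for _ in bufs]


def alltoall(bufs):
    """Rank r sends its j-th chunk to rank j: a transpose of the chunks."""
    p = len(bufs)
    n = len(bufs[0]) // p
    return [[x for src in range(p) for x in bufs[src][rank * n:(rank + 1) * n]]
            for rank in range(p)]


after_bcast = bcast([["A"], [], []], root=0)
after_gather = gather([["A0"], ["A1"], ["A2"]], root=0)
after_scatter = scatter([["A0", "A1", "A2"], [], []], root=0)
after_allgather = allgather([["A"], ["B"], ["C"]])
after_alltoall = alltoall([["A0", "A1", "A2"],
                           ["B0", "B1", "B2"],
                           ["C0", "C1", "C2"]])
```

In this model, scatter is the inverse of gather, and alltoall amounts to every rank performing its own scatter.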
A data type is taken as an argument by all MPI communication
functions. In the simple case, the data type is a primitive type such
as an integer or floating-point number. The argument can also be a
user-defined data type, which makes MPI communication powerful: with
user-defined data types, MPI provides for the communication of
complicated data structures such as array sections.
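To make the idea of an array section concrete, the sketch below models a strided layout in plain Python. The three layout parameters (count, blocklength, stride) mirror those of MPI_Type_vector, an MPI datatype constructor that the text above does not name. The vector_indices and pack helpers are hypothetical and only imitate how such a type selects elements from a buffer.

```python
def vector_indices(count, blocklength, stride):
    """Offsets selected by `count` blocks of `blocklength` elements,
    where consecutive blocks start `stride` elements apart."""
    return [block * stride + i
            for block in range(count)
            for i in range(blocklength)]


def pack(buffer, start, indices):
    """Collect the selected elements, beginning at offset `start`."""
    return [buffer[start + i] for i in indices]


rows, cols = 4, 4
matrix = list(range(rows * cols))   # row-major: element (r, c) is at r * cols + c

# One column of the matrix: 4 blocks of 1 element, one row apart.
column_type = vector_indices(count=rows, blocklength=1, stride=cols)
column_1 = pack(matrix, start=1, indices=column_type)

# A 2x2 sub-block covering rows 1-2 and columns 1-2.
block_type = vector_indices(count=2, blocklength=2, stride=cols)
sub_block = pack(matrix, start=1 * cols + 1, indices=block_type)
```

The same layout description is reused for any column simply by changing `start`, which is the convenience a user-defined data type offers: the section is described once and then passed to communication calls like a primitive type.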
Bryan Carpenter
2004-06-09