Many computer scientists and computing professionals would probably
agree that within a short time, less than a decade, every
PC will have multiple processors rather than a single processor.
This implies that parallel computing plays a critically
important role not only in scientific computing but also in modern
computer technology.
There are many places where parallel computing can be
successfully applied: supercomputer simulations in government labs,
scaling Google's core technology powered by the world's largest
commercial Linux cluster (more than 10,000 servers), dedicated
clusters of commodity computers in a Pixar RenderFarm for animations
[43,39], or collecting the spare
cycles of many available processors over the Internet. Some of the
applications involve large-scale ``task farming,'' which is applicable
when the task can be completely divided into a large number of
independent computational parts. The other popular form of massive
parallelism is ``data parallelism.'' The term data parallelism
applies to the situation where a task involves some large
data structures, typically arrays, that are split across nodes. Each
node performs similar computations on a different part of the data
structure. For data parallel computation to work well, the volume of
communicated values must be small
compared with the volume of locally computed results. Nearly all
successful applications of massive parallelism can be classified as
either task farming or data parallel.
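
The two styles can be contrasted in a minimal Python sketch. The function names (`simulate`, `farm`, `split`, `data_parallel_sum_of_squares`) and the toy workloads are hypothetical, chosen only for illustration. Threads stand in for nodes to keep the example self-contained, so it shows the structure of each style rather than any real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Hypothetical independent task: iterate a small linear congruential map."""
    x = seed
    for _ in range(1000):
        x = (1103515245 * x + 12345) % 2**31
    return x


def farm(seeds, workers=4):
    """Task farming: independent tasks are handed to whichever worker is free."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))


def split(data, parts):
    """Split a list into `parts` contiguous blocks of nearly equal size."""
    base, extra = divmod(len(data), parts)
    blocks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def local_sum_of_squares(block):
    """The same computation, applied by each node to its own block."""
    return sum(v * v for v in block)


def data_parallel_sum_of_squares(data, nodes=4):
    """Data parallelism: one array split across nodes, similar work on each part."""
    blocks = split(data, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partial = list(pool.map(local_sum_of_squares, blocks))
    # Only one number per node is communicated, which is small compared
    # with the amount of local computation on each block.
    return sum(partial)
```

In the data parallel half, each node returns a single partial sum, which is the favorable case where communicated volume is small relative to local work.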
For task farming, the level of parallelism is usually
coarse-grained. This sort of parallel programming is naturally
implementable in the framework of conventional sequential programming
languages: a function or a method can represent an abstract task, and a
library can provide an abstract, problem-independent infrastructure for
communication and load-balancing. For data parallelism, we meet a
slightly different situation. While it is possible to code data
parallel programs for modern parallel computers in ordinary sequential
languages supported by communication libraries, there is a long and
successful history of special languages for data parallel
computing. Why is data parallel programming special?
Historically, the motivation for developing data parallel
languages was closely tied to Single Instruction Multiple Data
(SIMD) computer architectures. In a SIMD computer, a single control
unit dispatches instructions to a large number of compute nodes, and
each node executes the same instruction on its own local data.
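
As a rough illustration of this execution model, the toy simulator below plays the role of the control unit. The class `ToySIMD` and its `dispatch` method are hypothetical names, and a Python list stands in for the local memories of the nodes. It is a sketch of the lockstep idea only, not a model of any real SIMD machine.

```python
import operator


class ToySIMD:
    """Toy model: one control unit, many nodes, one local value per node."""

    def __init__(self, local_data):
        self.nodes = list(local_data)

    def dispatch(self, instruction, operand):
        # The control unit issues a single instruction; every node applies
        # it to its own local value in the same step.
        self.nodes = [instruction(value, operand) for value in self.nodes]


machine = ToySIMD([1, 2, 3, 4])
machine.dispatch(operator.mul, 2)   # every node doubles its value
machine.dispatch(operator.add, 1)   # every node then adds one
print(machine.nodes)                # prints [3, 5, 7, 9]
```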
The early data parallel languages developed for machines such as
the Illiac IV and the ICL DAP were well suited to efficient
programming by scientific programmers who would not use a parallel
assembly language. They introduced a new programming language
concept, distributed or parallel arrays, with different
operations from those allowed on sequential arrays.
In the 1980s and 1990s microprocessors rapidly became more powerful,
more available, and cheaper. Building SIMD computers with specialized
computing nodes gradually became less economical than using
general-purpose microprocessors at every node. Eventually SIMD computers were
replaced almost completely by Multiple Instruction Multiple Data
(MIMD) parallel computer architectures. In MIMD computers, the
processors are autonomous: each processor is a full-fledged CPU with
both a control unit and an ALU. Thus each processor is capable of
executing its own program at its own pace, asynchronously with respect to
the other processors.
This asynchronous operation makes MIMD computers extremely
flexible. For instance, they are well suited to task farming
parallelism, which is hardly practical on SIMD computers at all. But
this asynchrony also makes general programming of MIMD computers hard. In
SIMD computer programming, synchronization is not an issue since every
aggregate step is synchronized by the control unit. In
contrast, MIMD computer programming requires concurrent programming
expertise and hard work. A major issue in MIMD computer programming is
the explicit use of synchronization primitives to control
nondeterministic behavior. Nondeterminism appears almost inevitably in
a system that has multiple independent threads of control, for
example through race conditions. Careless synchronization in turn leads to a
new problem: deadlock.
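
Both hazards can be illustrated with Python threads standing in for independent processors. In this sketch, `locked_count`, `transfer`, and `opposing_transfers` are hypothetical names. The first function uses a lock to make a shared update deterministic. The second avoids deadlock by one common discipline, always acquiring locks in a fixed global order; other disciplines exist.

```python
import threading


def locked_count(num_threads=4, increments=1000):
    """Several threads increment one shared counter under a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # Without the lock, the read-modify-write below could interleave
            # between threads and updates could be lost (a race condition).
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def transfer(balances, locks, src, dst, amount):
    """Move `amount` between two distinct accounts.

    Locks are always taken in index order. If each thread instead locked
    its own `src` first, two opposing transfers could each hold one lock
    and wait forever for the other (a deadlock).
    """
    first, second = sorted((src, dst))
    with locks[first]:
        with locks[second]:
            balances[src] -= amount
            balances[dst] += amount


def opposing_transfers(rounds=50):
    """Run two threads that transfer in opposite directions."""
    balances = [100, 100]
    locks = [threading.Lock(), threading.Lock()]

    def forward():
        for _ in range(rounds):
            transfer(balances, locks, 0, 1, 1)

    def backward():
        for _ in range(rounds):
            transfer(balances, locks, 1, 0, 2)

    threads = [threading.Thread(target=forward),
               threading.Thread(target=backward)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balances
```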
Soon a general approach was proposed for writing data parallel
programs for MIMD computers, with some similarities to programming
SIMD computers. It is known as Single Program Multiple Data (SPMD)
programming or the Loosely Synchronous Model. In SPMD, each node
executes essentially the same thing at essentially the same time, but
without the single central control unit of a SIMD
computer. Each node has its own local copy of the control variables,
and updates them in an identical way across MIMD nodes. Moreover,
each node generally communicates with others in well-defined
collective phases. These data communications implicitly or explicitly
synchronize the nodes. This aggregate synchronization
is easier to deal with than the complex synchronization problems of
general concurrent programming. It was natural to think that capturing
the SPMD model for programming MIMD computers in data parallel
languages should be feasible and perhaps not too difficult, as with
the successful SIMD languages. Many distinct research prototype
languages experimented with this, with some success. The High
Performance Fortran (HPF) standard was finally born in the 1990s.
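
The SPMD pattern described above can be sketched in plain Python, with threads standing in for MIMD nodes and a barrier marking the collective phases. The names `spmd_run` and `program` are hypothetical, and the shared list is an assumption that replaces real message passing. This illustrates the loosely synchronous structure only; it is not HPF and not any particular communication library.

```python
import threading


def spmd_run(blocks, steps=3):
    """Run the same program on every node, one block of data per node."""
    n = len(blocks)
    barrier = threading.Barrier(n)
    partial = [0] * n        # stand-in for values exchanged between nodes
    results = [None] * n

    def program(rank):
        local = list(blocks[rank])          # this node's part of the array
        total = 0
        # `step` is a control variable: each node keeps its own copy and
        # updates it identically, with no central control unit.
        for step in range(steps):
            local = [v + 1 for v in local]  # purely local computation
            partial[rank] = sum(local)
            barrier.wait()                  # collective phase: all values published
            total = sum(partial)            # every node computes the same global sum
            barrier.wait()                  # nobody overwrites before all have read
        results[rank] = total

    threads = [threading.Thread(target=program, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(spmd_run([[1, 2], [3, 4]], steps=3))  # prints [22, 22]
```

The only synchronization is the pair of barriers around the collective phase, which is the aggregate synchronization the text contrasts with general concurrent programming.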