The ALIGN directive aligns arrays to templates. As an example, consider the core code of an LU decomposition subroutine:
    01  REAL A (N, N)
    02  INTEGER N, R, R1
    03  REAL, DIMENSION (N) :: L_COL, U_ROW
    04
    05  DO R = 1, N - 1
    06     R1 = R + 1
    07     L_COL (R : ) = A (R : , R)
    08     A (R, R1 : ) = A (R, R1 : ) / L_COL (R)
    09     U_ROW (R1 : ) = A (R, R1 : )
    10     FORALL (I = R1 : N, J = R1 : N) &
    11  &     A (I, J) = A (I, J) - L_COL (I) * U_ROW (J)
    12  ENDDO
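To make the arithmetic of lines 05-12 concrete, here is a serial Python sketch of the same loop, using 0-based indices and plain lists. It models only the numerics, not the HPF data mapping. Reading the result as a Crout-style factorization (L keeps the pivots on its diagonal, U has a unit diagonal) is our interpretation of the code. The helper names `lu_in_place`, `split_lu` and `matmul` are hypothetical. Like the original, the sketch does no pivoting, so it assumes nonzero pivots.

```python
def lu_in_place(a):
    """Serial version of lines 05-12 with 0-based indices; overwrites a."""
    n = len(a)
    l_col = [0.0] * n
    u_row = [0.0] * n
    for r in range(n - 1):                 # DO R = 1, N - 1
        r1 = r + 1
        for i in range(r, n):              # line 07: copy column R
            l_col[i] = a[i][r]
        for j in range(r1, n):             # line 08: divide row R by the pivot
            a[r][j] = a[r][j] / l_col[r]
        for j in range(r1, n):             # line 09: copy the scaled row R
            u_row[j] = a[r][j]
        for i in range(r1, n):             # lines 10-11: FORALL rank-1 update
            for j in range(r1, n):
                a[i][j] = a[i][j] - l_col[i] * u_row[j]
    return a


def split_lu(a):
    """L = lower triangle with diagonal; U = strict upper part plus unit diagonal."""
    n = len(a)
    lower = [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    upper = [[1.0 if i == j else (a[i][j] if j > i else 0.0) for j in range(n)]
             for i in range(n)]
    return lower, upper


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


original = [[4.0, 3.0, 2.0], [8.0, 10.0, 7.0], [4.0, 9.0, 12.0]]
factored = lu_in_place([row[:] for row in original])
lower, upper = split_lu(factored)
print(factored)                  # [[4.0, 0.75, 0.5], [8.0, 4.0, 0.75], [4.0, 6.0, 5.5]]
print(matmul(lower, upper) == original)
```

The test matrix was chosen so every intermediate value is exactly representable, which lets the product be compared with `==`. For general input, compare within a tolerance instead.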
We first declare a template T with the same shape as the matrix:

    !HPF$ TEMPLATE T (N, N)
The array A, which holds the matrix, matches this template exactly. To align A with T we need an ALIGN directive such as

    !HPF$ ALIGN A (I, J) WITH T (I, J)
The main computation of the DO-loop in our example is the following statement, which is line 11 of the program:

    A (I, J) = A (I, J) - L_COL (I) * U_ROW (J)
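To make this statement communication-free, we want copies of L_COL (I) and U_ROW (J) to be allocated wherever A (I, J) is allocated. Replicated alignment to the template T achieves this:

    !HPF$ ALIGN L_COL (I) WITH T (I, *)
    !HPF$ ALIGN U_ROW (J) WITH T (*, J)

The asterisk replicates the array along that template dimension: each L_COL (I) is copied to every element of row I of T, and each U_ROW (J) to every element of column J.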
With these alignments, no communication is needed in the FORALL construct, because all operands of each elemental assignment are allocated on the same processor. Do the other statements require communication?
Line 8 is equivalent to

    FORALL (J = R1 : N) A (R, J) = A (R, J) / L_COL (R)

Since a copy of L_COL (R) is available on every processor that holds an element A (R, J), this statement requires no communication.
The other two array assignments, however, do need communication. For instance, the assignment to L_COL on line 7 is equivalent to

    FORALL (I = R : N) L_COL (I) = A (I, R)

L_COL (I) is replicated in the J direction, whereas A (I, R) is allocated only on the processor that holds template element T (I, R). The required communication is therefore a broadcast of each A (I, R) to every processor that holds a copy of L_COL (I). The compiler inserts these communications automatically.
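The locality argument above can be checked mechanically. The Python sketch below is a toy model, not HPF compiler logic. It assumes a hypothetical CYCLIC distribution of T over a 2 x 3 processor grid, and all helper names are hypothetical. Each array element is mapped to the set of processors holding a copy, following the ALIGN directives. An elemental assignment is then communication-free exactly when every processor holding a copy of the left-hand side also holds a copy of each right-hand operand.

```python
# Hypothetical model: T(N, N) distributed CYCLIC in both dimensions
# over a P x Q processor grid; indices are 1-based as in Fortran.
N, P, Q = 6, 2, 3


def owner(ti, tj):
    """Grid coordinates of the processor that owns template element T(ti, tj)."""
    return ((ti - 1) % P, (tj - 1) % Q)


def holders(ref):
    """Processors holding a copy of an array element, following the ALIGN directives."""
    name, idx = ref
    if name == "A":                    # ALIGN A(I, J) WITH T(I, J)
        i, j = idx
        return {owner(i, j)}
    if name == "L_COL":                # ALIGN L_COL(I) WITH T(I, *)
        (i,) = idx
        return {owner(i, j) for j in range(1, N + 1)}
    if name == "U_ROW":                # ALIGN U_ROW(J) WITH T(*, J)
        (j,) = idx
        return {owner(i, j) for i in range(1, N + 1)}
    raise ValueError(name)


def needs_communication(assignments):
    """True if a processor storing a copy of some LHS lacks a copy of an RHS operand."""
    return any(not holders(lhs) <= holders(rhs)
               for lhs, operands in assignments for rhs in operands)


def statements(r):
    """Elemental assignments of lines 7, 8, 9 and 11 in iteration R = r."""
    r1 = r + 1
    return {
        7: [(("L_COL", (i,)), [("A", (i, r))]) for i in range(r, N + 1)],
        8: [(("A", (r, j)), [("A", (r, j)), ("L_COL", (r,))])
            for j in range(r1, N + 1)],
        9: [(("U_ROW", (j,)), [("A", (r, j))]) for j in range(r1, N + 1)],
        11: [(("A", (i, j)), [("A", (i, j)), ("L_COL", (i,)), ("U_ROW", (j,))])
             for i in range(r1, N + 1) for j in range(r1, N + 1)],
    }


for line, assigns in statements(2).items():
    print("line", line, "needs communication:", needs_communication(assigns))
```

Lines 8 and 11 come out local for any distribution of T, because locality follows from the alignments alone. Lines 7 and 9 need communication whenever the grid has more than one processor along the replicated dimension.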
The next step is to distribute the template (the arrays are already aligned to it). A BLOCK distribution is not a good choice for this algorithm, because successive iterations work on a shrinking region of the template. With a BLOCK distribution, the processors that own the leading rows and columns become idle in later iterations. A CYCLIC distribution achieves better load balancing.
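The load-balancing argument can be illustrated by counting, at iteration R, how many elements of the trailing block A (R1 : N, R1 : N) each processor updates on line 11. The Python sketch below assumes a square 2 x 2 processor grid and N = 8. It compares a BLOCK mapping (chunks of ceil(N / P) consecutive indices, as in HPF's BLOCK) with a CYCLIC mapping, which in HPF could be requested with a directive such as `!HPF$ DISTRIBUTE T (CYCLIC, CYCLIC)`. The helper names are hypothetical.

```python
import math


def block_owner(t, n, p):
    """BLOCK: consecutive chunks of ceil(n / p) indices (1-based t)."""
    return (t - 1) // math.ceil(n / p)


def cyclic_owner(t, n, p):
    """CYCLIC: indices dealt out round-robin (1-based t); n is unused."""
    return (t - 1) % p


def update_work(n, p, r, owner):
    """Elements of A(R1:N, R1:N) each processor of a p x p grid updates at step r."""
    work = {(a, b): 0 for a in range(p) for b in range(p)}
    for i in range(r + 1, n + 1):
        for j in range(r + 1, n + 1):
            work[(owner(i, n, p), owner(j, n, p))] += 1
    return work


N, P = 8, 2
for r in (1, 4, 6):
    blk = sorted(update_work(N, P, r, block_owner).values())
    cyc = sorted(update_work(N, P, r, cyclic_owner).values())
    print(f"R = {r}: BLOCK {blk}  CYCLIC {cyc}")
```

At R = 1 both mappings split the work as [9, 12, 12, 16]. At R = 4 and R = 6, BLOCK leaves three of the four processors with no update work ([0, 0, 0, 16] and [0, 0, 0, 4]), while CYCLIC spreads it evenly ([4, 4, 4, 4] and [1, 1, 1, 1]).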
In the example above we illustrated simple alignment (an "identity mapping" of array to template) as well as replicated alignment. What do more general alignments look like?
One possibility is to align an array with a template in transposed order:

    DIMENSION B (N, N)
    !HPF$ ALIGN B (I, J) WITH T (J, I)
This aligns B with T in transposed order (B (1, 2) is aligned to T (2, 1), and so on). More generally, a subscript of an align target (that is, the template) can be a linear expression in one of the alignment dummies. For example,
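    DIMENSION C (N / 2, N / 2)
    !HPF$ ALIGN C (I, J) WITH T (N / 2 + I, 2 * J)

Here C (I, J) is aligned to T (N / 2 + I, 2 * J). For even N, C therefore occupies rows N / 2 + 1 through N of T, in the even-numbered columns only.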
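One way to read these directives is as functions from array subscripts to template subscripts. The Python sketch below assumes N = 8 (an arbitrary choice) and hypothetical function names. It writes the transposed alignment of B and the linear alignment of C in that form, then lists which rows and columns of T the array C actually touches. Python's `//` matches Fortran integer division here because N is positive.

```python
# Alignment functions from array subscripts to template subscripts
# (1-based, as in the directives above); N = 8 is an arbitrary choice.
N = 8


def align_b(i, j):
    """ALIGN B(I, J) WITH T(J, I): transposed alignment."""
    return (j, i)


def align_c(i, j):
    """ALIGN C(I, J) WITH T(N / 2 + I, 2 * J)."""
    return (N // 2 + i, 2 * j)


c_targets = [align_c(i, j)
             for i in range(1, N // 2 + 1) for j in range(1, N // 2 + 1)]
print(align_b(1, 2))                          # (2, 1)
print(sorted({ti for ti, _ in c_targets}))    # [5, 6, 7, 8]
print(sorted({tj for _, tj in c_targets}))    # [2, 4, 6, 8]
```

Both maps are one-to-one, so each element of B or C lands on a distinct template element.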
A dimension of the array can also be collapsed:

    DIMENSION D (N, N, N)
    !HPF$ ALIGN D (I, J, K) WITH T (I, J)

The align target, T, does not depend on K. For fixed I and J, every element D (I, J, K) is mapped to the same template element, T (I, J).
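Collapsed alignment is the mirror image of replication: replication maps one array element to many template elements, while collapsing maps many array elements to one. The Python sketch below assumes N = 4 and a hypothetical function `align_d`. It groups the elements of D by the template element they are aligned to, which shows that each T (I, J) receives the whole K-column D (I, J, 1 : N).

```python
from collections import defaultdict

N = 4  # arbitrary small size for illustration


def align_d(i, j, k):
    """ALIGN D(I, J, K) WITH T(I, J): the K dimension is collapsed."""
    return (i, j)


# Group the elements of D by the template element they are mapped to.
sharing = defaultdict(list)
for i in range(1, N + 1):
    for j in range(1, N + 1):
        for k in range(1, N + 1):
            sharing[align_d(i, j, k)].append((i, j, k))

print(len(sharing))        # 16
print(sharing[(2, 3)])     # [(2, 3, 1), (2, 3, 2), (2, 3, 3), (2, 3, 4)]
```

Since the whole K-column sits on one template element, it stays on a single processor whatever distribution is later chosen for T.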
In this section we have covered HPF's processor arrangements, distributed arrays, and data alignment, which we will largely adopt in the HPspmd programming model presented in chapter 4.