next up previous contents
Next: Parallel classification Up: Common Runtime Support for Previous: Java compiler   Contents

Florida University Report

Florida became involved in the project after Sanjay Ranka, one of the original co-PIs, moved there from Syracuse.

We evaluated the High-Performance Fortran (HPF) language as a candidate for implementing scientific, engineering and computer science software on parallel machines [73]. We reviewed major HPF language features and discusses general algorithmic issues common to broad classes of SPMD applications. We developed concrete examples on how they may be programmed in HPF, or have provided code fragments demonstrating the pertinent language features. These fragments were representative of applications in magnetic spin models, computational fluid dynamics, financial simulations, image processing, particle dynamics, and the solution of matrix equations. We found that High Performance Fortran language is suitable for these applications. The codes are part of the NPAC HPF Applications suite, and the further expanding and benchmarking them is an ongoing process.

Parallelization of applications requires each processor to access all the nonlocal data required for its local computation. For a number of applications these accesses can be described more formally by using Random Access Reads (RAR) and Random Access Writes (RAW). This formalism covers the basic runtime support required for the parallelization of the assignment statement, forall statement/construct, redistribution statement and several computational intrinsics in HPF for arbitrary distributions (regular as well as irregular). The theoretical work was developed a few years ago and was recently revised and accepted in JPDC [74,75].

We extended the theoretical work and developed practical algorithms for combining gather/scatter functions. These algorithms involved several novel ideas for local and global combining of accesses at runtime [7]. We performed extensive benchmarking on parallel machines and demonstrated that the algorithms developed were practical. They were robust in the presence of hotspots for data access. The software developed was integrated as part of the PCRC runtime library and required generalizations for arbitrary distributions, arbitrary number of dimensions and arbitrary number of processors.

Vector reduction and prefix operations can be used effectively in many applications with unstructured data accesses such as Pack/Unpack, Array Reduction/Prefix Functions, and Array Combining Scatter Functions, which are defined in Fortran 90 and in High Performance Fortran. They can also be used in applications with structured accesses such as HPF array reduction and array prefix/suffix functions. We have developed a scalable and work optimal algorithms for this operation [6]. This algorithm is shown to be practical and shows a factor of 3-5 improvement for large number of processors.

PACK/UNPACK are Fortran 90/HPF array construction functions which derive new arrays from existing arrays. We developed new algorithms for performing these operations on coarse-grained parallel machines [8]. Our algorithms are relatively architecture independent and can be applied to arrays of arbitrary dimensions with arbitrary distribution along every dimension.

Integer sorting is a subclass of the sorting problem where the elements have integer values and the largest element is polynomially bounded in the number of elements to be sorted. It is useful for applications in which the size of the maximum value of element to be sorted is bounded. We developed a new algorithm, distributed radix-sort, for integer sorting [4]. The structure of our algorithm is similar to radix sort except that it requires less number of communication phases. Our experimental results on the CM-5 show that the algorithm is better than previous algorithms designed for sorting arbitrary radix integers. These algorithm can be utilized to support Sort routines (Grade_up and Grade_down) in the HPF runtime library.



Subsections
next up previous contents
Next: Parallel classification Up: Common Runtime Support for Previous: Java compiler   Contents
Bryan Carpenter 2002-07-12