Discussion
In this chapter, we benchmarked the HPJava language on scientific and
engineering applications, using a Linux machine (Red Hat 7.3) with a
single processor (Pentium IV 1.5 GHz CPU with 512 MB memory and
256 KB cache).
The main purpose of benchmarking on this single-processor Linux
machine is to verify whether the HPJava system and its optimization
strategies produce efficient node code. Without confidence that the
node code is efficient, there is no prospect of high performance for
the HPJava system on parallel machines.
Moreover, through these benchmarks, we studied the behaviour of the
overall construct and of the subscript expression for a multiarray
element access in HPJava programs, and measured the effect of the
optimization strategies on the HPJava system.
Unlike direct matrix multiplication and the Q3 index, the overall
constructs in the Laplace equation using red-black relaxation and in
the 3D diffusion equation do not use the default index triplet (e.g.
overall(x = i for :)). If the index triplet depends on variables, then
one of the control variables, the one associated with localBlock(), is
not loop invariant. This means that it cannot be hoisted outside the
outermost overall construct. To eliminate this problem in a common
case, we adopted a loop unrolling optimization in HPJOPT2. Moreover,
all dimensions of the multiarrays created in these PDE examples are
distributed. In contrast, some of the dimensions in direct matrix
multiplication and the Q3 index are sequential. The translation scheme
for the subscript expression of a distributed dimension is more
complicated than that of a sequential dimension.
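
The interaction between a variable-dependent index triplet, hoisting,
and loop unrolling can be illustrated with a small sketch in plain
Java. It is not HPJava translator output: the class UnrollHoistSketch,
the helper block() (a hypothetical stand-in for a per-loop setup call
such as localBlock()), and the red-black style lower bound
1 + (i + parity) % 2 are all assumptions made for illustration.

```java
// Illustrative sketch only: plain Java, not HPJava translator output.
class UnrollHoistSketch {
    static int blockCalls = 0; // counts how often the stand-in helper runs

    // Hypothetical stand-in for a per-loop setup call such as localBlock():
    // returns {first, last, step} for the triplet lo : hi : 2.
    static int[] block(int lo, int hi) {
        blockCalls++;
        return new int[] { lo, hi, 2 };
    }

    // The inner lower bound depends on the outer index i, so the setup
    // call is not loop invariant and stays inside the outer loop.
    // Assumes a square array and parity equal to 0 or 1.
    static long sweepNaive(int[][] a, int parity) {
        int n = a.length;
        long sum = 0;
        for (int i = 1; i < n - 1; i++) {
            int[] b = block(1 + (i + parity) % 2, n - 2);
            for (int j = b[0]; j <= b[1]; j += b[2]) sum += a[i][j];
        }
        return sum;
    }

    // Outer loop unrolled by two: only two distinct inner triplets occur,
    // so both setup calls become loop invariant and can be hoisted.
    static long sweepUnrolled(int[][] a, int parity) {
        int n = a.length;
        long sum = 0;
        int[] bOdd = block(1 + (1 + parity) % 2, n - 2);  // rows i = 1, 3, 5, ...
        int[] bEven = block(1 + (2 + parity) % 2, n - 2); // rows i = 2, 4, 6, ...
        int i = 1;
        for (; i + 1 < n - 1; i += 2) {
            for (int j = bOdd[0]; j <= bOdd[1]; j += bOdd[2]) sum += a[i][j];
            for (int j = bEven[0]; j <= bEven[1]; j += bEven[2]) sum += a[i + 1][j];
        }
        if (i < n - 1) { // leftover odd row when the interior row count is odd
            for (int j = bOdd[0]; j <= bOdd[1]; j += bOdd[2]) sum += a[i][j];
        }
        return sum;
    }
}
```

Both methods visit the same elements; the unrolled version calls the
setup helper twice in total instead of once per outer iteration, which
is the kind of saving that hoisting is meant to recover.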
Table 6.2: Speedup of each application over naive translation and
sequential Java after applying HPJOPT2.
|                                | Direct Matrix Multiplication | Laplace equation red-black relaxation | 3D Diffusion | Q3   |
| HPJOPT2 over naive translation | 150%                         | 361%                                  | 200%         | 115% |
| HPJOPT2 over sequential Java   | 122%                         | 94%                                   | 161%         | 138% |
As Table 6.2 shows, the HPJOPT2 optimization gives its largest
performance gains for scientific and engineering applications with
large problem sizes and more distributed dimensions. This indicates
that the HPJava system can produce efficient node code, and the
potential performance of HPJava on multi-processors looks very
promising.
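
To make the percentages in Table 6.2 easier to read, the following
sketch converts each one into a relative runtime. It assumes that a
speedup is defined as 100 * (baseline time / HPJOPT2 time), so that
values above 100% mean HPJOPT2 is faster; this definition, the class
SpeedupSketch, and the helper relativeRuntime() are assumptions for
illustration, not taken from the benchmark code.

```java
// Illustrative sketch only; the speedup definition is an assumption.
class SpeedupSketch {
    // Assumed definition: speedup% = 100 * baselineTime / optimizedTime.
    // Returns optimizedTime / baselineTime; values below 1.0 mean faster.
    static double relativeRuntime(double speedupPercent) {
        return 100.0 / speedupPercent;
    }

    public static void main(String[] args) {
        String[] apps = { "Direct matrix multiplication",
                          "Laplace red-black relaxation",
                          "3D diffusion",
                          "Q3" };
        double[] overNaive = { 150, 361, 200, 115 }; // Table 6.2, first row
        double[] overJava = { 122, 94, 161, 138 };   // Table 6.2, second row
        for (int k = 0; k < apps.length; k++) {
            System.out.printf("%-30s %.2f of naive time, %.2f of sequential Java time%n",
                    apps[k], relativeRuntime(overNaive[k]), relativeRuntime(overJava[k]));
        }
    }
}
```

Under this reading, the 94% entry for the Laplace equation is the one
case where the optimized HPJava code remains slightly slower than
sequential Java, while every entry in the first row is an improvement
over naive translation.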
Bryan Carpenter
2004-06-09