Q3 - Local Dependence Index
Figure 7.7: Q3 on shared memory machine
Figure 7.7 shows the performance of Q3 on the shared memory machine.
As before, we first compare sequential Java with sequential C on this
machine: Java reaches 55% of the C performance.
Table 7.12 shows the speedup of the HPJava naive translation over the
sequential Java and C programs. It also shows the speedup of HPJOPT2
over sequential Java and over the naive translation.
Table 7.12: Speedup of the naive translation over sequential Java and C programs for Q3 on the shared memory machine.
| Number of Processors           | 1    | 2    | 3     | 4     | 5     | 6     | 7     | 8     |
| Naive translation over Java    | 1.47 | 3.85 | 7.08  | 9.50  | 11.53 | 13.71 | 16.18 | 18.02 |
| Naive translation over C       | 0.84 | 2.20 | 4.04  | 5.42  | 6.58  | 7.83  | 9.23  | 10.29 |
| HPJOPT2 over Java              | 1.85 | 5.39 | 12.23 | 16.30 | 19.89 | 24.81 | 27.77 | 32.00 |
| HPJOPT2 over Naive translation | 1.26 | 1.40 | 1.73  | 1.72  | 1.73  | 1.81  | 1.72  | 1.76  |
With 8 processors, the speedup of the naive translation is up to 1802%
over sequential Java and up to 1029% over sequential C. The speedup of
HPJOPT2 with 8 processors is up to 3200% over sequential Java. The
speedup of HPJOPT2 over the naive translation is up to 181%. Recall
that Q3 performed poorly compared to the other applications on the
Linux machine. As expected, with multiple processors the performance
of Q3 is excellent even without any optimizations. This illustrates
that the performance of HPJava can be outstanding for applications
with large problem sizes.
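The percentages quoted above are the speedup factors of Table 7.12 multiplied by 100, and the rows of the table are related to each other by simple ratios. The following Python sketch is not part of the original benchmark code; it only copies the table values and checks those relations. All variable and function names are illustrative.

```python
# Speedup factors copied from Table 7.12 (Q3, shared memory machine),
# for 1 to 8 processors. Names are illustrative, not taken from HPJava.
naive_over_java = [1.47, 3.85, 7.08, 9.50, 11.53, 13.71, 16.18, 18.02]
naive_over_c = [0.84, 2.20, 4.04, 5.42, 6.58, 7.83, 9.23, 10.29]
hpjopt2_over_java = [1.85, 5.39, 12.23, 16.30, 19.89, 24.81, 27.77, 32.00]
hpjopt2_over_naive = [1.26, 1.40, 1.73, 1.72, 1.73, 1.81, 1.72, 1.76]


def as_percent(factor):
    """Express a speedup factor as the percentage used in the text."""
    return round(factor * 100)


# Largest entry of each row, as a percentage.
print(as_percent(max(naive_over_java)))     # 1802
print(as_percent(max(hpjopt2_over_java)))   # 3200
print(as_percent(max(hpjopt2_over_naive)))  # 181

# The "HPJOPT2 over naive" row is, up to rounding, the ratio of the
# "HPJOPT2 over Java" row to the "naive over Java" row.
derived = [h / n for h, n in zip(hpjopt2_over_java, naive_over_java)]

# Speed of sequential Java relative to sequential C, implied by the
# two "naive translation" rows (same numerator, different baseline).
java_vs_c = [c / j for c, j in zip(naive_over_c, naive_over_java)]
```

In this sketch, `derived` agrees with the last row of the table to within rounding, and every entry of `java_vs_c` is about 0.57, which is close to the 55% figure quoted for Java relative to C.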
Table 7.13 shows the speedup of the naive translation and of HPJOPT2
on each number of processors, relative to their performance on one
processor.
Table 7.13: Speedup of the naive translation and HPJOPT2 for each number of processors over the performance with one processor for Q3 on the shared memory machine.
| Number of Processors | 2    | 3    | 4    | 5     | 6     | 7     | 8     |
| Naive translation    | 2.62 | 4.81 | 6.46 | 7.84  | 9.32  | 11.00 | 12.26 |
| HPJOPT2              | 2.91 | 6.61 | 8.80 | 10.75 | 13.40 | 15.01 | 17.29 |
Using 8 processors on the shared memory machine, the naive translation
achieves a speedup of up to 1226% over its one-processor run, and
HPJOPT2 achieves up to 1729%. Unlike traditional benchmark programs,
Q3 gives a tremendous speedup with a moderate number (8) of processors.
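One way to read these figures is through parallel efficiency, the speedup divided by the number of processors. Efficiency is not a term used in this section; it is a standard measure introduced here only for illustration. The Python sketch below, with illustrative names, computes it from the Table 7.13 values and also checks that Table 7.13 follows from the "over Java" rows of Table 7.12 when each row is divided by its one-processor entry.

```python
# Speedups over one processor, copied from Table 7.13 (2 to 8 processors).
processors = [2, 3, 4, 5, 6, 7, 8]
naive = [2.62, 4.81, 6.46, 7.84, 9.32, 11.00, 12.26]
hpjopt2 = [2.91, 6.61, 8.80, 10.75, 13.40, 15.01, 17.29]

# "Over Java" rows of Table 7.12 (1 to 8 processors).
naive_over_java = [1.47, 3.85, 7.08, 9.50, 11.53, 13.71, 16.18, 18.02]
hpjopt2_over_java = [1.85, 5.39, 12.23, 16.30, 19.89, 24.81, 27.77, 32.00]


def efficiency(speedup, p):
    """Parallel efficiency: speedup per processor (1.0 means linear)."""
    return speedup / p


def rescale(row):
    """Divide a row by its one-processor entry, dropping that entry."""
    return [x / row[0] for x in row[1:]]


naive_eff = [efficiency(s, p) for s, p in zip(naive, processors)]
hpjopt2_eff = [efficiency(s, p) for s, p in zip(hpjopt2, processors)]

for p, en, eh in zip(processors, naive_eff, hpjopt2_eff):
    print(f"{p} processors: naive {en:.2f}, HPJOPT2 {eh:.2f}")
```

With the table values, every efficiency is above 1.0, so the measured speedup exceeds the processor count at each point, consistent with the "tremendous speedup" noted above. The rescaled rows of Table 7.12 match Table 7.13 to within the rounding of the published figures.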
Bryan Carpenter
2004-06-09