How to do (parallel) performance analysis
- The first step is to verify that your code is correct
- construct a test that convinces you.
- E.g., if you compute something known by other means (like "pi"), compare with it
- print the result of the test with enough comments that you will remember
what it was a year later
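
A minimal sketch of such a test, assuming the quantity being computed is pi. The midpoint-rule integration, the names `estimate_pi` and `check_pi`, and the tolerance are illustrative assumptions, not part of the notes; substitute your own computation and reference value. The report line states what was tested, against what, and how closely it agreed, so it still makes sense a year later.

```python
import math

def estimate_pi(n):
    """Midpoint-rule approximation of the integral of 4/(1+x^2) over [0, 1]."""
    h = 1.0 / n
    return h * sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(n))

def check_pi(n, tol):
    """Compare the computed value with math.pi and build a self-describing report."""
    approx = estimate_pi(n)
    err = abs(approx - math.pi)
    passed = err < tol
    report = (
        f"correctness test: midpoint rule for pi, n={n}, "
        f"computed={approx:.12f}, reference=math.pi, "
        f"abs error={err:.3e}, tolerance={tol:.1e}: "
        f"{'PASS' if passed else 'FAIL'}"
    )
    return passed, report

# n and tol are hypothetical choices for this illustration.
ok, report = check_pi(n=100_000, tol=1e-8)
print(report)
```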
- The second step is to time your (sequential) code
- pick a timer with sufficient resolution
- decide which segments of the code you want to time separately
- repeat short segments, or choose a large enough data size,
to allow your timer to resolve them
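
The timing steps above can be sketched as follows, assuming Python's `time.perf_counter` as the timer. The helpers `timer_resolution` and `time_segment`, the `work` function, and the 0.05 s threshold are hypothetical names and values chosen for illustration; the threshold only needs to be much larger than the measured resolution.

```python
import time

def timer_resolution(clock=time.perf_counter, samples=1000):
    """Smallest nonzero difference seen between consecutive clock readings."""
    smallest = float("inf")
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        while t1 == t0:          # spin until the clock visibly advances
            t1 = clock()
        smallest = min(smallest, t1 - t0)
    return smallest

def time_segment(func, min_total=0.05, clock=time.perf_counter):
    """Repeat func until the total run is at least min_total seconds.

    Returns (seconds per call, number of repeats used).
    """
    repeats = 1
    while True:
        t0 = clock()
        for _ in range(repeats):
            func()
        elapsed = clock() - t0
        if elapsed >= min_total:
            return elapsed / repeats, repeats
        repeats *= 2             # too short to resolve: double the repeat count

def work():
    """Stand-in for a short code segment to be timed (hypothetical)."""
    sum(i * i for i in range(1000))

res = timer_resolution()
per_call, repeats = time_segment(work)
print(f"timer: time.perf_counter, measured resolution ~ {res:.2e} s")
print(f"segment 'work': {per_call:.3e} s per call (average of {repeats} repeats)")
```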
- The final step is to time your parallel code and plot a performance
prediction graph
- A trick of the trade: synchronize all processors (e.g., with a barrier)
before you begin timing, so that all clocks start from the same point.
- use log-log plots
- keep the gathering and printing of timing data outside the timed
sections
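
A sketch of the synchronize-then-time pattern, with Python threads standing in for processors. This is an assumption made only to keep the example self-contained: Python threads illustrate the structure but will not show real parallel speedup. In an MPI code the corresponding calls would be `MPI_Barrier` and `MPI_Wtime`. The names `run_timed` and `work` are hypothetical. Each worker stores its elapsed time in its own slot, and all printing happens after the timed sections have finished.

```python
import threading
import time

def run_timed(num_workers, work):
    """Time work(rank) on every worker, starting all clocks after a barrier."""
    barrier = threading.Barrier(num_workers)
    elapsed = [0.0] * num_workers        # one slot per worker, printed later

    def worker(rank):
        barrier.wait()                   # synchronize before timing begins
        t0 = time.perf_counter()
        work(rank)
        elapsed[rank] = time.perf_counter() - t0   # store only, no printing here

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return elapsed

def work(rank):
    """Stand-in for the parallel section being timed (hypothetical)."""
    sum(i * i for i in range(50_000))

times = run_timed(4, work)

# Reporting is done outside the timed section, for every worker.
for rank, t in enumerate(times):
    print(f"worker {rank}: {t:.6f} s (timer: time.perf_counter)")
print(f"max over workers: {max(times):.6f} s")
```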
- Helpful hints
- label all your graphs (both axes, with units, and a title)
so you will remember what it was a year later
- make your "output" a complete report (give the name of the
timer used)
- Communication
- At least initially, you should always submit the complete
code listing
- At least initially, you should always print out timing (and
other data) from all processors to verify what is going on
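
One reason log-log plots are useful: a power law such as T(p) = T1/p becomes a straight line, and its slope can be read off directly (ideal scaling gives a slope of -1). The sketch below fits that slope by least squares. The timing values are synthetic, generated from the ideal model purely for illustration, not measurements; `loglog_slope` is a hypothetical helper name.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

procs = [1, 2, 4, 8, 16]
t1 = 8.0                                  # synthetic one-processor time, in seconds
ideal_times = [t1 / p for p in procs]     # synthetic data from the ideal model

slope = loglog_slope(procs, ideal_times)
print("processors | time (s) | speedup | efficiency")
for p, t in zip(procs, ideal_times):
    speedup = t1 / t
    print(f"{p:10d} | {t:8.3f} | {speedup:7.2f} | {speedup / p:10.2f}")
print(f"log-log slope of time vs. processors: {slope:.3f}")
```

With measured times in place of `ideal_times`, a slope noticeably shallower than -1 indicates that the code is scaling worse than the ideal.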