refactor(bench): parallelize benchmark report builder#1990

Merged
numinnex merged 7 commits into apache:master from blake-hu:parallelize-benchmark-report-builder
Jul 10, 2025
Conversation

@blake-hu
Contributor

Description

Currently, the build() method of BenchmarkReportBuilder is single-threaded. This PR parallelizes it in three places:

  1. In build(), spawn a new thread for every call to from_individual_metrics or from_producers_and_consumers_statistics.
  2. In calculating aggregated time series, spawn one thread for each time series (MB, msg, latency).
  3. In TimeSeriesCalculator, use rayon parallel iterators.
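The thread-per-task pattern in items 1 and 2 can be sketched as below. This is a minimal illustration using `std::thread::scope`, not the code from this PR: `GroupSummary`, `summarize`, and `build_report` are hypothetical stand-ins for the real report types and for `from_individual_metrics`, and the `(messages, latency_ms)` sample tuples are an assumed input shape. The rayon change in item 3 is not shown.

```rust
use std::thread;

/// Hypothetical stand-in for one aggregated group in the report.
#[derive(Debug, Clone, PartialEq)]
struct GroupSummary {
    label: &'static str,
    total_messages: u64,
    avg_latency_ms: f64,
}

/// Hypothetical stand-in for `from_individual_metrics`: reduces one slice of
/// (messages, latency_ms) samples into a summary.
fn summarize(label: &'static str, samples: &[(u64, f64)]) -> GroupSummary {
    let total_messages: u64 = samples.iter().map(|(m, _)| m).sum();
    let avg_latency_ms = if samples.is_empty() {
        0.0
    } else {
        samples.iter().map(|(_, l)| l).sum::<f64>() / samples.len() as f64
    };
    GroupSummary {
        label,
        total_messages,
        avg_latency_ms,
    }
}

/// Spawns one scoped thread per independent summary and joins them in a
/// fixed order, so the output order does not depend on thread scheduling.
fn build_report(producers: &[(u64, f64)], consumers: &[(u64, f64)]) -> Vec<GroupSummary> {
    thread::scope(|s| {
        let p = s.spawn(|| summarize("producers", producers));
        let c = s.spawn(|| summarize("consumers", consumers));
        vec![
            p.join().expect("producer summary thread panicked"),
            c.join().expect("consumer summary thread panicked"),
        ]
    })
}
```

Scoped threads let each task borrow its input slice directly, which avoids cloning the metrics or wrapping them in `Arc`. Whether the PR uses scoped or plain `std::thread::spawn` is not stated in the description.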

(I also tried implementing multithreading and par_iters in other places, but those led to slower performance, likely due to threading overhead. Those are not included in the PR.)

Measuring speedup

I measured the average runtime of build() with 200 producers and 200 consumers on a Ryzen 9 7950X with 124 GB RAM. build() shows a 3.7x to 11x speedup, and the speedup grows with message load.

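| Message count | Single-threaded runtime (ms) | Multi-threaded runtime (ms) | Speedup |
| ------------- | ---------------------------- | --------------------------- | ------- |
| 8M messages | 155 | 42 | 3.7x |
| 16M messages | 497 | 74 | 6.7x |
| 32M messages | 1791 | 162 | 11.0x |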
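The speedup column is the ratio of single-threaded to multi-threaded runtime. A small sketch of that arithmetic, using the runtimes reported above (the function names `speedup` and `table_speedups` are illustrative and not part of the PR):

```rust
/// Speedup is single-threaded runtime divided by multi-threaded runtime.
fn speedup(single_ms: f64, multi_ms: f64) -> f64 {
    single_ms / multi_ms
}

/// Recomputes the speedup for each (message count in millions,
/// single-threaded ms, multi-threaded ms) row of the table above.
fn table_speedups() -> Vec<(u32, f64)> {
    let rows = [(8, 155.0, 42.0), (16, 497.0, 74.0), (32, 1791.0, 162.0)];
    rows.iter()
        .map(|&(millions, single, multi)| (millions, speedup(single, multi)))
        .collect()
}
```

Because the reported runtimes are already rounded to whole milliseconds, the recomputed ratios agree with the table only to within rounding.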

Closes #1976

@blake-hu blake-hu changed the title Parallelize benchmark report builder using std::thread and rayon refactor(bench): parallelize benchmark report builder Jul 10, 2025
@numinnex
Contributor

LGTM, Good job!

@numinnex numinnex requested a review from spetz July 10, 2025 07:03
@spetz
Contributor

spetz commented Jul 10, 2025

Looks solid, thanks!

@numinnex numinnex merged commit 177daea into apache:master Jul 10, 2025
52 of 60 checks passed
hageshiame pushed a commit to hageshiame/iggy that referenced this pull request Nov 7, 2025
Development

Successfully merging this pull request may close these issues.

Improve benchmark by parallelizing BenchmarkReportBuilder.

3 participants