
feat: improve integer parsing performance #156695

Draft: gilescope wants to merge 3 commits into rust-lang:main from gilescope:giles-int-parse-perf

@gilescope (Contributor) commented on May 18, 2026

Const functions have improved significantly. Rather than relying on the rough approximation in `can_not_overflow` that was introduced earlier (#95399), we can now compute exactly how many digits are guaranteed to fit, so every input that can safely take the fast path does.
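To make "exact" concrete, here is an illustrative sketch (not the PR's actual code) of computing the bound in a `const fn`: the largest digit count `n` such that every `n`-digit string in `radix` fits in the type. `max_safe_digits` and `old_heuristic` are hypothetical names; `old_heuristic` paraphrases the earlier check from #95399 as it is understood here (fast path only for radix ≤ 16 and at most two digits per byte of the type, one fewer for signed types).

```rust
/// Hypothetical helper: largest `n` such that every `n`-digit string in
/// `radix` fits in an integer of `bits` bits (one bit reserved for the sign
/// when `signed`). Uses the positive maximum, so it is conservative for negatives.
const fn max_safe_digits(bits: u32, signed: bool, radix: u32) -> u32 {
    let value_bits = bits - signed as u32;
    // Largest positive value, 2^value_bits - 1 (avoids shifting by 128).
    let max: u128 = if value_bits >= 128 { u128::MAX } else { (1u128 << value_bits) - 1 };
    let r = radix as u128;
    let mut n = 0;
    let mut largest: u128 = 0; // largest n-digit value, i.e. radix^n - 1
    loop {
        // Largest (n+1)-digit value: largest * radix + (radix - 1).
        let next = match largest.checked_mul(r) {
            Some(v) => match v.checked_add(r - 1) {
                Some(v) => v,
                None => return n,
            },
            None => return n,
        };
        if next > max {
            return n;
        }
        largest = next;
        n += 1;
    }
}

/// Paraphrase of the earlier approximation (assumed shape): radix <= 16 only,
/// two digits per byte of the type, one fewer for signed types.
const fn old_heuristic(bits: u32, signed: bool, radix: u32) -> u32 {
    if radix <= 16 { bits / 4 - signed as u32 } else { 0 }
}

// Both are evaluated at compile time.
const U32_DEC_EXACT: u32 = max_safe_digits(32, false, 10); // 9
const U32_DEC_OLD: u32 = old_heuristic(32, false, 10); // 8
```

For `u32` in radix 10 the exact bound is 9 digits rather than 8, and for radix 36 it allows a 6-digit fast path where the old check allowed none.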

There is also a slight specialisation for u128 that improves its parsing performance specifically, by reducing the number of u128 multiplications performed when there are fewer digits.
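The exact u128 change is not reproduced here. As a hedged illustration of the general idea (fewer u128 multiplications for shorter inputs), this sketch accumulates up to 19 decimal digits at a time in a `u64`, which cannot overflow, and performs a single checked `u128` multiply-add per chunk. `parse_u128_dec` is a hypothetical name; it handles unsigned decimal input only and returns `Option` rather than `ParseIntError`.

```rust
/// Hypothetical sketch: parse unsigned decimal ASCII into `u128`, doing the
/// per-digit work in `u64` and one `u128` multiply-add per 19-digit chunk.
fn parse_u128_dec(src: &[u8]) -> Option<u128> {
    if src.is_empty() {
        return None;
    }
    // Any 19-digit decimal number is below 10^19 <= u64::MAX, so a chunk of
    // up to 19 digits can be accumulated in u64 without overflow checks.
    const CHUNK: usize = 19;
    let mut acc: u128 = 0;
    for chunk in src.chunks(CHUNK) {
        let mut part: u64 = 0;
        for &b in chunk {
            let d = b.wrapping_sub(b'0');
            if d > 9 {
                return None; // not an ASCII decimal digit
            }
            part = part * 10 + d as u64;
        }
        // Shift the running value left by chunk.len() decimal digits; this is
        // the only place u128 can overflow, so it uses checked arithmetic.
        let scale = 10u128.pow(chunk.len() as u32);
        acc = acc.checked_mul(scale)?.checked_add(part as u128)?;
    }
    Some(acc)
}
```

Inputs of up to 19 digits then cost one u128 multiply-add in total; whether this matches the PR's specialisation is an assumption.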

(I've done additional fuzz testing on both implementations to check that there are no regressions. Full disclosure: this PR was designed by Claude and me, using /remote-control, while I was having a bath yesterday.)

Benchmarks Arm64

Apple M3 Max, `./x bench library/coretests --test-args=from_str`, compared against main. The u128/i128 and long-string benches are new in this PR.

38 wins (≥5% faster), 1 regression (≥5% slower), 16 within ±5%. Best −55%, worst +13%.

  Selected wins (≥15%):

  bench_i8_from_str_radix_2          27525 → 12384   −55.0%
  bench_i64_from_str_radix_2         22842 → 11993   −47.5%
  bench_i16_from_str_radix_2         22292 → 11857   −46.8%
  bench_i128_from_str_radix_36       45014 → 27791   −38.3%
  bench_i16_from_str                 28609 → 17713   −38.1%
  bench_i32_from_str_radix_2         19408 → 12095   −37.7%
  bench_i16_from_str_radix_10        28303 → 18160   −35.8%
  bench_u64_from_str_radix_36        27969 → 19613   −29.9%
  bench_i128_from_str_radix_10_long 159379 → 113716  −28.7%    ← long-string
  bench_i128_from_str_radix_2        20049 → 14421   −28.1%
  bench_i64_from_str_radix_10        25465 → 18990   −25.4%
  bench_u128_from_str_radix_10_long 124595 → 102374  −17.8%    ← long-string
  bench_i128_from_str_radix_16       31002 → 25582   −17.5%
  bench_i64_from_str_radix_10_long   97396 → 81480   −16.3%    ← long-string
  bench_u128_from_str_radix_36       30580 → 25816   −15.6%

Sole regression:

  bench_i32_from_str_radix_36        23811 → 26923   +13.1%

The regression is structural rather than noise: it consistently shows up for the i32 / radix 36 combination.
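The percentages above are (after − before) / before × 100, and the win/regression buckets use the ±5% thresholds from the summary line. A small sketch reproducing that arithmetic (function and type names are illustrative only):

```rust
/// Percentage change from `before` to `after`; negative means faster.
fn pct_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Win,        // at least 5% faster
    Regression, // at least 5% slower
    Neutral,    // within ±5%
}

/// Bucket a before/after pair using the ±5% thresholds.
fn classify(before: f64, after: f64) -> Verdict {
    let pct = pct_change(before, after);
    if pct <= -5.0 {
        Verdict::Win
    } else if pct >= 5.0 {
        Verdict::Regression
    } else {
        Verdict::Neutral
    }
}
```

For example, `bench_i8_from_str_radix_2` (27525 → 12384) gives about −55.0% and is a win, while `bench_i32_from_str_radix_36` (23811 → 26923) gives about +13.1% and is the one regression.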

Benchmarks x86

On x86_64 (AMD Ryzen 9 5950X), all 55 benches stayed within ±1.5%.
x86 codegen already looks about as close to optimal as it can be.

I've included results above for some additional `_long` benchmarks that exercise the longer path more. I could add them to the PR, but there is a risk of having too many benchmarks.

That said, if you prefer a minimal change in this PR and just want most of the performance wins, it seems that changing `#[inline]` to `#[inline(always)]` on `from_ascii_radix` captures most of them in a one-line change:

```rust
#[inline(always)]
pub const fn from_ascii_radix(src: &[u8], radix: u32) -> Result<$int_ty, ParseIntError> {
    // ... body unchanged ...
```
  ┌────────────────────┬──────┬─────────────┬──────────────┬───────────────┐
  │       Option       │ Wins │ Regressions │ i32 radix 36 │  u16 radix 2  │
  ├────────────────────┼──────┼─────────────┼──────────────┼───────────────┤
  │ Just inline-always │ 22   │ 2           │ +17.2%       │ +9.8%         │
  ├────────────────────┼──────┼─────────────┼──────────────┼───────────────┤
  │ Full PR            │ 38   │ 1           │ +13.1%       │ +1.1% (noise) │
  └────────────────────┴──────┴─────────────┴──────────────┴───────────────┘

I'll leave the choice up to you. I'd guess evaluating the const function has some compile-time cost, but there is an elegance in having as many inputs as possible hit the unchecked path.
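A minimal, hypothetical sketch of the fast-path/unchecked-path split under discussion, specialised to `u32` with no sign handling. When the input has no more digits than the exact safe bound for `radix`, it accumulates with plain arithmetic; otherwise every step is overflow-checked. `safe_digits_u32` and `parse_u32_radix` are illustrative names rather than core's API, and the real `from_ascii_radix` returns `ParseIntError` instead of `Option`. The `#[inline(always)]` mirrors the one-line option: with a constant `radix` at the call site, it presumably lets the radix-dependent bound fold to a constant.

```rust
/// Illustrative: largest n such that every n-digit string in `radix` fits in u32.
const fn safe_digits_u32(radix: u32) -> usize {
    let r = radix as u64;
    let mut n = 0;
    let mut largest: u64 = 0; // radix^n - 1; stays far below u64::MAX here
    while largest * r + (r - 1) <= u32::MAX as u64 {
        largest = largest * r + (r - 1);
        n += 1;
    }
    n
}

/// Illustrative parser: unsigned only; None on empty input, bad digit, or overflow.
#[inline(always)]
const fn parse_u32_radix(src: &[u8], radix: u32) -> Option<u32> {
    assert!(radix >= 2 && radix <= 36, "radix must be in 2..=36");
    if src.is_empty() {
        return None;
    }
    let mut acc: u32 = 0;
    let mut i = 0;
    if src.len() <= safe_digits_u32(radix) {
        // Fast path: this many digits cannot overflow u32.
        while i < src.len() {
            let d = match (src[i] as char).to_digit(radix) {
                Some(d) => d,
                None => return None,
            };
            acc = acc * radix + d;
            i += 1;
        }
    } else {
        // Slow path: every step is overflow-checked.
        while i < src.len() {
            let d = match (src[i] as char).to_digit(radix) {
                Some(d) => d,
                None => return None,
            };
            acc = match acc.checked_mul(radix) {
                Some(v) => v,
                None => return None,
            };
            acc = match acc.checked_add(d) {
                Some(v) => v,
                None => return None,
            };
            i += 1;
        }
    }
    Some(acc)
}
```

Because `parse_u32_radix` is a `const fn`, it can also be used in constant initialisers, e.g. `const N: Option<u32> = parse_u32_radix(b"123", 10);`.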

Signed-off-by: Giles Cope <gilescope@gmail.com>
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 18, 2026

```rust
/// ```
#[unstable(feature = "int_from_ascii", issue = "134821")]
#[inline]
// `inline(always)` so the body's `radix`-dependent `ilog` bound
```

@ds84182 commented on May 18, 2026

`bits` and `is_signed_ty` are both compile-time constants, and `radix` can only take a small, fixed set of values (2..=36), so there is a better solution than forcing this to be inlined.
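One way to read this suggestion, sketched here as an assumption rather than a worked-out proposal: since the width and signedness are fixed per integer type and `radix` is confined to 2..=36, the safe digit counts can be computed once into a `const` table indexed by radix, so the runtime bound is a single array load that does not depend on inlining. `build_table`, `SAFE_DIGITS_I32`, and `fits_fast_path_i32` are hypothetical names.

```rust
/// Hypothetical: for each radix in 2..=36, the largest n such that every
/// n-digit string fits in an integer with `value_bits` magnitude bits.
const fn build_table(value_bits: u32) -> [u8; 37] {
    // Keeps the u128 arithmetic below well away from overflow.
    assert!(value_bits <= 64);
    let max: u128 = (1u128 << value_bits) - 1;
    let mut table = [0u8; 37];
    let mut radix: usize = 2;
    while radix <= 36 {
        let r = radix as u128;
        let mut largest: u128 = 0; // radix^n - 1
        let mut n = 0u8;
        while largest * r + (r - 1) <= max {
            largest = largest * r + (r - 1);
            n += 1;
        }
        table[radix] = n;
        radix += 1;
    }
    table
}

/// Table for i32 (31 magnitude bits); indices 0 and 1 are unused.
const SAFE_DIGITS_I32: [u8; 37] = build_table(31);

/// The runtime check is one indexed load from a compile-time constant.
/// Panics if `radix > 36`, like an out-of-range radix would elsewhere.
fn fits_fast_path_i32(digits: &[u8], radix: u32) -> bool {
    digits.len() <= SAFE_DIGITS_I32[radix as usize] as usize
}
```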

@gilescope (Contributor, Author) replied:

I tried that, but the performance was worse. We could apply `inline(always)` only on arm64, or we could close this and accept that arm64 LLVM codegen will improve at some point.
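If the arm64-only option were chosen, `cfg_attr` keyed on `target_arch` is one way to express it: aarch64 gets `#[inline(always)]` while every other target keeps the existing `#[inline]`. Shown here on a hypothetical stand-in function, `digit_value`, rather than the real `from_ascii_radix` signature:

```rust
// On aarch64, always inline.
#[cfg_attr(target_arch = "aarch64", inline(always))]
// On every other target, keep the plain `#[inline]` hint.
#[cfg_attr(not(target_arch = "aarch64"), inline)]
pub const fn digit_value(b: u8, radix: u32) -> Option<u32> {
    // `char::to_digit` panics if `radix` is outside 2..=36.
    (b as char).to_digit(radix)
}
```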
