fmt: drop `#[inline(never)]` from `pad_integral`'s `write_prefix` helper by gilescope · Pull Request #156822 · rust-lang/rust

gilescope · 2026-05-22T10:14:36Z

Summary

Formatter::pad_integral calls a nested write_prefix helper at three sites. The helper was marked #[inline(never)] in commit ed2157a (Feb 2019) explicitly "for smaller code size", replacing an earlier closure that was duplicated across the four match arms.

Seven years on, that rationale is inverted. Modern LLVM inlines the trivial body and constant-folds the common (sign = None, prefix = None) case (positive non-alternate Display) into a no-op. With the attribute in place, every pad_integral call pays for an unnecessary call/ret + parameter setup to a function that, in the hot path, does nothing.

This PR removes the single #[inline(never)] line.
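
As a rough illustration of the shape involved, a simplified stand-in for the helper might look like the sketch below. It is not the libstd source: the `out: &mut dyn Write` parameter and the `write_with_prefix` caller are hypothetical substitutes for the formatter's internals, used only so the snippet compiles on its own.

```rust
use std::fmt::{self, Write};

/// Simplified stand-in for the nested helper: writes the sign if there is
/// one, then the prefix if one was requested. `out` is a hypothetical
/// substitute for the formatter's internal buffer.
#[inline(never)] // the kind of attribute this PR removes
fn write_prefix(out: &mut dyn Write, sign: Option<char>, prefix: Option<&str>) -> fmt::Result {
    if let Some(c) = sign {
        out.write_char(c)?;
    }
    if let Some(p) = prefix {
        out.write_str(p)?;
    }
    Ok(())
}

/// Hypothetical caller standing in for one call site: emits sign and prefix,
/// then the already-formatted digits.
fn write_with_prefix(
    sign: Option<char>,
    prefix: Option<&str>,
    digits: &str,
) -> Result<String, fmt::Error> {
    let mut out = String::new();
    write_prefix(&mut out, sign, prefix)?;
    out.write_str(digits)?;
    Ok(out)
}
```

With `sign = None` and `prefix = None` both `if let` arms are skipped, so the helper does no work; with the attribute in place the caller still has to make the call to find that out.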

Bench results

./x bench library/coretests on aarch64-apple-darwin, median of 3 runs:

bench	V0 (current)	V1 (this PR)	Δ
`fmt::write_i32_hex`	54.97 ns	38.31 ns	−30.3 %
`fmt::write_i64_oct`	67.18 ns	48.45 ns	−27.9 %
`fmt::write_i16_hex`	50.23 ns	37.39 ns	−25.6 %
`fmt::write_i64_bin`	85.43 ns	65.77 ns	−23.0 %
`fmt::write_i8_bin`	52.75 ns	43.13 ns	−18.2 %
`fmt::write_i64_42`	9.15 ns	8.31 ns	−9.2 %
`fmt::write_f64_42`	9.42 ns	8.57 ns	−9.0 %
`fmt::write_i64_million`	12.45 ns	11.82 ns	−5.1 %
`fmt::write_u64_max`	36.87 ns	37.30 ns	+1.2 % (noise)
`fmt::write_f64_pi`	30.21 ns	30.25 ns	+0.1 %

Wins are largest on the radix benches (bin/hex/oct/exp), which call pad_integral four times per iteration with a mix of positive, negative, and alternate cases, so every call saves the wasted helper invocation. No bench regressed beyond the noise band.
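
For readers less familiar with the formatting flags, the sign and prefix combinations mentioned above can be reproduced through the public `format!` API. This is only an illustrative sketch: the function name `sign_prefix_cases` is hypothetical, and it does not reproduce the actual bench bodies.

```rust
/// Formats a few integers through the public API, covering the
/// sign/prefix combinations that the helper has to handle.
fn sign_prefix_cases() -> Vec<String> {
    vec![
        format!("{}", 42),     // no sign, no prefix (the common case)
        format!("{:+}", 42),   // explicit plus sign requested
        format!("{}", -42),    // negative value, minus sign
        format!("{:#x}", 255), // alternate form adds the "0x" prefix
        format!("{:#b}", 5),   // alternate form adds the "0b" prefix
        format!("{:#o}", 8),   // alternate form adds the "0o" prefix
    ]
}
```

Only the first case corresponds to `(sign = None, prefix = None)`; the others need at least one write before the digits.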

Size

libstd dylib on darwin:

Metric	V0	V1	Δ
`pad_integral` symbol	940 B	1 116 B	+176 B (helper inlined at 3 sites)
`write_prefix` symbol	124 B	0 B (inlined away)	−124 B
Net functional code	1 064 B	1 116 B	+52 B
libstd `__TEXT` section	835 584 B	835 584 B	0 (same page)
libstd dylib file	1 300 864 B	1 300 752 B	−112 B

The dylib is actually slightly smaller: dropping the symbol-table entry for the standalone write_prefix more than offsets the 52 bytes of duplicated body.

Why this is safe to land

write_prefix is a nested fn, not generic. There's no per-T monomorphization to multiply across downstream binaries: the size change is local to libstd and the per-call overhead removal is local to pad_integral. (We separately checked the per-type pattern by patching nightly's libstd and rebuilding rust-analyzer; for generic candidates like Arc::drop_slow, removing #[inline(never)] grows downstream binaries. This one doesn't have that risk.)
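
The distinction between generic and non-generic helpers can be sketched as follows. Both functions are hypothetical examples, not code from libstd: the generic one is compiled once per concrete `T` it is used with, while the non-generic one has a single body no matter how many callers exist.

```rust
use std::fmt::Debug;

/// Generic (hypothetical): the compiler emits a separate copy for every
/// concrete `T`, so inlining decisions are multiplied per instantiation.
#[inline(never)]
fn debug_len<T: Debug>(value: &T) -> usize {
    format!("{value:?}").len()
}

/// Non-generic (hypothetical): exactly one body exists, so any size effect
/// of inlining it is bounded by the number of call sites.
#[inline(never)]
fn sign_len(sign: Option<char>) -> usize {
    sign.map_or(0, char::len_utf8)
}

/// Uses `debug_len` at two different types (two instantiations) and
/// `sign_len` once.
fn example() -> (usize, usize, usize) {
    (debug_len(&7u8), debug_len(&"hi"), sign_len(Some('-')))
}
```

Here `debug_len` is instantiated for both `u8` and `&str`, which is the multiplication effect that a non-generic nested fn like write_prefix avoids.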

Correctness

68 + 43 fmt:: tests pass on the patched libstd.
No source semantics change — only the inlining attribute.

What V1 does structurally

The disassembled pad_integral body in V1 absorbs the helper's three branches inline:

0x758d8  cbz w25, 0x758e0   ; is_nonneg=false ? -> handle sign
0x758dc  cbz w28, 0x758f0   ; sign_plus=false ? -> skip sign write
0x758e0  ldr x8, [x19,#0x20] ; load write_char vtable
0x758e8  blr x8             ; call write_char (only when sign present)
0x758f0  cbz x22, 0x75968   ; prefix=None ? -> skip prefix write
…                            ; call write_str (only when prefix present)

For the common positive non-alternate Display (sign = None, prefix = None), control flow takes two cbz branches and never executes a single instruction from the (former) write_prefix body.

Test plan

./x bench library/coretests shows the wins above
./x test library/coretests fmt:: passes (68 + 43 tests)
CI green

`Formatter::pad_integral` calls a nested `write_prefix` helper at three sites. The helper was a closure until Feb 2019, when commit ed2157a ("De-duplicate write_prefix lambda in pad_integral. For smaller code size.") converted it to a `#[inline(never)]` nested fn to share the duplicated body across the call sites.

Seven years later that rationale is inverted: modern LLVM happily inlines the trivial body and constant-folds the common `(sign=None, prefix=None)` case (positive non-alternate `Display`) into a no-op. The explicit `#[inline(never)]` was forcing an unnecessary call/ret + parameter setup on every `Formatter::pad_integral` invocation.

Measurements (aarch64-apple-darwin, stage-1 libstd, median of 3 runs):

bench	V0 (current)	V1 (no attr)	delta
fmt::write_i32_hex	54.97 ns	38.31 ns	-30.3 %
fmt::write_i64_oct	67.18 ns	48.45 ns	-27.9 %
fmt::write_i16_hex	50.23 ns	37.39 ns	-25.6 %
fmt::write_i64_bin	85.43 ns	65.77 ns	-23.0 %
fmt::write_i8_bin	52.75 ns	43.13 ns	-18.2 %
fmt::write_i64_42	9.15 ns	8.31 ns	-9.2 %
fmt::write_f64_42	9.42 ns	8.57 ns	-9.0 %
fmt::write_i64_million	12.45 ns	11.82 ns	-5.1 %
fmt::write_u64_max	36.87 ns	37.30 ns	+1.2 % (noise)
fmt::write_f64_pi	30.21 ns	30.25 ns	+0.1 %

Wins are largest on radix benches (bin/hex/oct/exp) which call `pad_integral` four times per iteration mixing positive/negative/alternate cases - every call saves the wasted helper invocation.

Size, libstd dylib on darwin:

pad_integral	940 B -> 1116 B (+176 B inlined at 3 sites)
write_prefix standalone	124 B -> 0 B (-124 B, inlined away)
Net functional code	1064 B -> 1116 B (+52 B)
libstd __TEXT	835 584 B -> 835 584 B (0; same page)
libstd dylib file	1 300 864 B -> 1 300 752 B (-112 B, fewer symbols)

The dylib is slightly smaller with the attribute removed because the symbol-table entry for the standalone write_prefix is gone, more than offsetting the +52 B of duplicated body across the three call sites.

write_prefix is a nested fn, not generic. There is no per-T monomorphization to multiply across downstream binaries - the savings are local to libstd.

68 + 43 fmt:: tests pass on the patched libstd.

rustbot added the S-waiting-on-author and T-libs labels May 22, 2026

ajasad25 approved these changes May 22, 2026

gilescope wants to merge 1 commit into rust-lang:main from gilescope:giles-fmt-pad-integral-drop-write-prefix-inline-never

3 participants
