
Redesigned Components iterator to use front and back indexing instead of mutating and subslicing path field #156496

Open
asder8215 wants to merge 6 commits into rust-lang:main from asder8215:components_rewrite

Conversation

@asder8215
Contributor

@asder8215 asder8215 commented May 12, 2026

This PR entirely changes how Components<'_> is implemented. Currently, the Components<'_> iterator 'consumes' components by mutating its path field into a subslice that holds the remaining unconsumed path components (the consumed component is what Components::next or Components::next_back returns). This PR instead keeps the path field unmodified and uses front and back indices to separate consumed from unconsumed components.
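A minimal sketch of the front/back index idea, not the PR's code: the type `IndexedParts`, its fields, and the `remaining` method are hypothetical names, and it only splits on `/` without any prefix, root, `.` or `..` handling. It illustrates that the borrowed `path` is never re-sliced; only two indices move.

```rust
/// Hypothetical, simplified stand-in for an index-based component iterator.
/// `path` is never re-sliced; only `front` and `back` move.
#[derive(Clone, Debug)]
pub struct IndexedParts<'a> {
    path: &'a str,
    front: usize, // start of the unconsumed region
    back: usize,  // end (exclusive) of the unconsumed region
}

impl<'a> IndexedParts<'a> {
    pub fn new(path: &'a str) -> Self {
        IndexedParts { path, front: 0, back: path.len() }
    }

    /// The unconsumed region, borrowed for the full `'a` lifetime.
    pub fn remaining(&self) -> &'a str {
        &self.path[self.front..self.back]
    }
}

impl<'a> Iterator for IndexedParts<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        while self.front < self.back {
            let rest = &self.path[self.front..self.back];
            let end = rest.find('/').unwrap_or(rest.len());
            let part = &rest[..end];
            // Step past the component and one separator, if present.
            self.front += end + usize::from(end < rest.len());
            if !part.is_empty() {
                return Some(part);
            }
        }
        None
    }
}

impl<'a> DoubleEndedIterator for IndexedParts<'a> {
    fn next_back(&mut self) -> Option<&'a str> {
        while self.front < self.back {
            let rest = &self.path[self.front..self.back];
            let start = rest.rfind('/').map_or(0, |i| i + 1);
            let part = &rest[start..];
            // Step before the component and one separator, if present.
            self.back = self.front + start.saturating_sub(1);
            if !part.is_empty() {
                return Some(part);
            }
        }
        None
    }
}
```

Because `remaining` only slices between the two indices, it needs neither a clone of the iterator nor any mutation.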

This PR benefits implementations like Components::as_path, which is used in multiple areas of the standard library. Previously, the Components<'_> iterator had to be cloned inside the function to present the unconsumed path, because the original consuming behavior of Components<'_> on path would not allow the &'a Path returned from Components::as_path to last after a Components::next or Components::next_back call. Since the current Components iterator is 64 bytes in size, calling Components::as_path after each Components::next/Components::next_back means cloning 64 bytes again and again, which is unfortunate, especially if each path component is only a few bytes (e.g., "foo/bar/baz").
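For reference, a small usage sketch of the pattern described here, using only the stable `Path::components` and `Components::as_path` APIs. The helper name `rest_after` is hypothetical.

```rust
use std::path::Path;

/// Skips `n` components from the front and returns the unconsumed remainder.
/// The returned `&Path` borrows from `path`, not from the local iterator.
pub fn rest_after(path: &Path, n: usize) -> &Path {
    let mut comps = path.components();
    for _ in 0..n {
        comps.next();
    }
    comps.as_path()
}
```

Calling `rest_after(Path::new("/tmp/foo/bar.txt"), 2)` yields `foo/bar.txt`, the same result as the example in the standard library documentation for `as_path`.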

On the point of size, the indexing strategy also shrinks Components<'_> from 64 bytes to 40 bytes, since a large chunk of Components<'_> was taken up by the Option<Prefix> field (which takes up 40 bytes). Instead of holding a prefix field in Components<'_>, we can encode the length of the Prefix in the front index and use another enum, FirstComponent, to record whether the first component of the given path is a Prefix (or something else). If it is a Prefix, we can call parse_prefix on the subslice self.path[..self.front], since the front index encodes the Prefix length.
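The sizes quoted here can be checked locally with `std::mem::size_of`. This is only a measurement sketch: the helper name `layout_report` is hypothetical, and the exact numbers depend on the target and on which standard library build is in use, so none are hard-coded.

```rust
use std::mem::size_of;
use std::path::{Components, Prefix};

/// Returns (size of `Option<Prefix>`, size of `Components`, size of `usize`)
/// in bytes for the current target and standard library.
pub fn layout_report() -> (usize, usize, usize) {
    (
        size_of::<Option<Prefix<'static>>>(),
        size_of::<Components<'static>>(),
        size_of::<usize>(),
    )
}
```

Comparing the first value with the third shows how much room a stored prefix takes relative to a single index.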

Due to not having the prefix Option<Prefix> field inside Components<'_> anymore, all the prefix functions in Components<'_> have been removed in favor of calling parse_prefix, Prefix::is_verbatim, Prefix::is_drive, etc.

I'm curious whether this redesign of Components<'_> improves Path equality, which @clarfonthey pointed out as slow in #154521 (not to be confused with Path ordering as mentioned in the issue, since that uses Components::compare_components while the example code shows equality). Update: I have since benchmarked this, and this implementation currently improves Path equality because Components::next_back runs faster than in the current implementation that mutates path into a subslice. However, Path ordering runs slightly slower. You can check the benchmark code I used here, and play around with the number of bytes in a component, the number of components, etc.

Right now, when I tested it locally on my PC (Fedora), it passed all the standard library tests and rust-analyzer didn't crash on me (I had a few crash reports from rust-analyzer early on while I was experimenting with Components<'_>, related to threads using Path::components, but that's all resolved now). I have not tested this on Windows yet, and I would probably need someone to help me test on that platform, as my Windows VM is not working properly enough to run the standard library test suite.

There is a lot going on here, and there may be better approaches, or ways I could improve this implementation or write the code more neatly. I am open to any advice or feedback on this approach.

Update: I got to test some things with Prefixes manually on my Windows VM, and encoding the prefix component length into the Components<'_> front field seems to work out nicely. I've also accounted for a root directory being able to exist after a Prefix component, e.g. "\\?\checkout\src\tools" has the following components: PrefixVerbatim -> RootDir -> Normal -> Normal -> None (I learnt this from a failure in the miri tests; it is nice to see this Components<'_> implementation working on the Windows tests in CI).
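To inspect a component sequence like the one listed here, a helper along these lines can be used. The function name `component_kinds` is hypothetical; it relies only on the stable `Component` enum. Note that prefixes are only parsed on Windows, so the verbatim example needs a Windows target to produce a leading `Prefix`.

```rust
use std::path::{Component, Path};

/// Maps each component of `path` to the name of its `Component` variant.
pub fn component_kinds(path: &str) -> Vec<&'static str> {
    Path::new(path)
        .components()
        .map(|c| match c {
            Component::Prefix(_) => "Prefix",
            Component::RootDir => "RootDir",
            Component::CurDir => "CurDir",
            Component::ParentDir => "ParentDir",
            Component::Normal(_) => "Normal",
        })
        .collect()
}
```

Per the description above, `component_kinds(r"\\?\checkout\src\tools")` is expected to give Prefix, RootDir, Normal, Normal on Windows.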

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 12, 2026
@rustbot
Collaborator

rustbot commented May 12, 2026

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @ChrisDenton, libs
  • @ChrisDenton, libs expanded to 8 candidates

@asder8215 asder8215 force-pushed the components_rewrite branch from 1627e2f to 33e69e1 Compare May 12, 2026 09:09
@rustbot
Collaborator

rustbot commented May 12, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@asder8215 asder8215 force-pushed the components_rewrite branch from 33e69e1 to ed9d33d Compare May 12, 2026 17:05
@asder8215 asder8215 force-pushed the components_rewrite branch from ed9d33d to 0b0f84c Compare May 12, 2026 17:19
… of mutating and subslicing path field; as a result, Components iterator memory size goes from 64 bytes to 40 bytes and as_path does not use cloning at all
@asder8215 asder8215 force-pushed the components_rewrite branch from 0b0f84c to 8ed33ea Compare May 12, 2026 22:05
@asder8215 asder8215 force-pushed the components_rewrite branch from 2151b8f to 83cdbed Compare May 13, 2026 22:21
…ity, added safety comments, and check for root dir after Prefix component (e.g., '\\?\checkout\src\tools' should produce Prefix, RootDir, Normal, Normal, None, ...) in Components::parse_single_component
@asder8215 asder8215 force-pushed the components_rewrite branch from 83cdbed to 3921fff Compare May 15, 2026 00:30
@asder8215 asder8215 marked this pull request as draft May 16, 2026 12:22
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 16, 2026
asder8215 added 2 commits May 16, 2026 19:56
…here to use iter().position()/.iter().rposition(), refactored code in compare_components, and removed stale comments
@asder8215 asder8215 force-pushed the components_rewrite branch from 0a25dda to 92e0132 Compare May 17, 2026 16:09
@asder8215
Contributor Author

asder8215 commented May 17, 2026

New benchmarking results. You can see what the benchmark code looks like here and run it yourself to see if there are any differences in measurements on your end:

This is the measurement of the current implementation of Components<'_> (without black box):

Std Components (No BB)  time:   [21.546 µs 21.800 µs 22.096 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Std Components Next (No BB)
                        time:   [20.434 µs 20.482 µs 20.538 µs]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

Std Components Next Back (No BB)
                        time:   [38.367 µs 38.757 µs 39.199 µs]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Std Path Iter (No BB)   time:   [21.547 µs 21.730 µs 21.921 µs]

Std As Path Iter (No BB)
                        time:   [87.680 µs 88.439 µs 89.231 µs]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

Std Eq Comps (No BB)    time:   [591.21 ns 593.35 ns 595.82 ns]
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe

Std Uneq Comps (No BB)  time:   [60.953 ns 61.419 ns 61.911 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Std Uneq 2 Comps (No BB)
                        time:   [75.454 µs 75.734 µs 76.027 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Std Compare Comps (No BB)
                        time:   [46.182 µs 46.621 µs 47.192 µs]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Std Compare Uneq Comps (No BB)
                        time:   [46.679 µs 46.980 µs 47.291 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

Std Compare Uneq 2 Comps (No BB)
                        time:   [41.480 ns 41.827 ns 42.160 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

This is the measurement of the new implementation of Components<'_> I'm working on (without black box):

Components Rewrite (No BB)
                        time:   [24.982 µs 25.267 µs 25.570 µs]

Components Next Rewrite (No BB)
                        time:   [24.388 µs 24.655 µs 24.937 µs]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

Components Next Back Rewrite (No BB)
                        time:   [18.184 µs 18.567 µs 19.034 µs]
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

Path Iter Rewrite (No BB)
                        time:   [23.485 µs 23.659 µs 23.829 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

As Path Iter Rewrite (No BB)
                        time:   [22.936 µs 23.066 µs 23.208 µs]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

Eq Comps Rewrite (No BB)
                        time:   [605.12 ns 608.83 ns 612.98 ns]
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  3 (3.00%) high severe

Uneq Comps Rewrite (No BB)
                        time:   [31.799 ns 32.108 ns 32.433 ns]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

Uneq Comps 2 Rewrite (No BB)
                        time:   [47.091 µs 48.186 µs 49.085 µs]

Compare Comps Rewrite (No BB)
                        time:   [50.234 µs 50.725 µs 51.254 µs]
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) high mild
  1 (1.00%) high severe

Compare Uneq Comps Rewrite (No BB)
                        time:   [49.262 µs 49.631 µs 50.067 µs]
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) high mild
  12 (12.00%) high severe

Compare Uneq Comps 2 Rewrite (No BB)
                        time:   [43.397 ns 43.767 ns 44.171 ns]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

Edit: Updated Components::as_path to match on Option<FirstComponent>/self.first_comp instead of using if let Some(_) = self.first_comp and matching on that; the benchmarks for this PR's Components<'_> have been updated as a result. Everything else is unaffected by this change.

@asder8215
Contributor Author

asder8215 commented May 17, 2026

Here are the benchmark results with black box:

From current Components<'_> implementation:

Std Components          time:   [20.947 µs 21.010 µs 21.084 µs]
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

Std Components Next     time:   [20.967 µs 20.993 µs 21.021 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Std Components Next Back
                        time:   [35.715 µs 35.802 µs 35.925 µs]
Found 20 outliers among 100 measurements (20.00%)
  6 (6.00%) high mild
  14 (14.00%) high severe

Std Path Iter           time:   [20.883 µs 20.992 µs 21.152 µs]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

Std As Path Iter        time:   [80.673 µs 80.935 µs 81.261 µs]
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

Std Eq Comps            time:   [589.43 ns 593.36 ns 597.88 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high severe

Std Uneq Comps          time:   [63.919 ns 64.262 ns 64.765 ns]
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

Std Uneq 2 Comps        time:   [75.284 µs 75.939 µs 76.599 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

From this Components<'_> implementation PR:

Components Rewrite      time:   [24.190 µs 24.425 µs 24.687 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Components Next Rewrite time:   [24.230 µs 24.550 µs 24.889 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Components Next Back Rewrite
                        time:   [17.339 µs 17.488 µs 17.655 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Path Iter Rewrite       time:   [23.845 µs 23.996 µs 24.154 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

As Path Iter Rewrite    time:   [22.431 µs 22.676 µs 23.010 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Eq Comps Rewrite        time:   [586.16 ns 588.10 ns 590.14 ns]

Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

Uneq Comps Rewrite      time:   [31.733 ns 32.023 ns 32.378 ns]

Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

Uneq 2 Comps Rewrite    time:   [36.318 µs 36.574 µs 36.913 µs]
Found 23 outliers among 100 measurements (23.00%)
  23 (23.00%) high severe

Edit: Updated Components::as_path to match on Option<FirstComponent>/self.first_comp instead of using if let Some(_) = self.first_comp and matching on that; the benchmarks for this PR's Components<'_> have been updated as a result. Everything else is unaffected by this change.

Edit 2: Removed the Path ordering benchmark here since it was incorrect; see below for the corrected path ordering benchmarks.
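The linked benchmark code is not reproduced in this thread. As a rough, std-only stand-in for what a "with black box" measurement of Components::next_back might look like, assuming the black box in question is `std::hint::black_box`: the function name `time_next_back` is hypothetical, and a plain `Instant` timer is far cruder than a statistical benchmark harness.

```rust
use std::hint::black_box;
use std::path::Path;
use std::time::{Duration, Instant};

/// Walks `path` from the back `iters` times.
/// Returns (components seen in the last walk, total elapsed time).
pub fn time_next_back(path: &Path, iters: u32) -> (usize, Duration) {
    let mut seen = 0;
    let start = Instant::now();
    for _ in 0..iters {
        // Hide the input from the optimizer so the walk is not precomputed.
        let mut comps = black_box(path).components();
        seen = 0;
        while let Some(c) = comps.next_back() {
            // Keep each result alive so the loop is not optimized away.
            black_box(c);
            seen += 1;
        }
    }
    (seen, start.elapsed())
}
```

Dropping the two `black_box` calls gives the corresponding "No BB" variant, where the optimizer is free to simplify more of the loop.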

@asder8215 asder8215 force-pushed the components_rewrite branch from 92e0132 to 574d7f2 Compare May 17, 2026 18:41
@asder8215 asder8215 marked this pull request as ready for review May 17, 2026 18:59
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 17, 2026
@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 17, 2026
@asder8215
Contributor Author

asder8215 commented May 17, 2026

I'm confident this code works (it passed CI in the previous run; the amended commit I just pushed doesn't change any logic, it only makes the code more idiomatic).

In my opinion, the logic in this code should be more readable than how Components<'_> is currently implemented. From benchmarking, we can see that this PR's implementation of Components<'_> is faster for Components::next_back, Components::as_path, and the cases where Components equality falls back to Components::next_back (when they are unequal, or when equality can't be determined unless one or both Components<'_> are normalized). The trade-off is a slight reduction in performance for Components::next and, as a result, for Components<'_> comparison, but I would accept that slight reduction to make path equality faster.

@asder8215
Contributor Author

asder8215 commented May 19, 2026

@rustbot label +I-libs-nominated

Since Components<'_> is used by many other methods, I think the libs team may need to discuss whether the re-implementation of Components<'_> here is okay/valid to replace the current implementation (and the trade-off between a faster Components::next_back and a slight reduction in performance in Components::next). I wasn't sure whether this should be labeled I-libs-api-nominated, since it pertains to an existing stable feature rather than a new one.

@rustbot rustbot added the I-libs-nominated Nominated for discussion during a libs team meeting. label May 19, 2026
…omponents::normalize_back instead, refactored Components::as_path code
@asder8215 asder8215 force-pushed the components_rewrite branch from 574d7f2 to 1d45aef Compare May 19, 2026 18:51
@clarfonthey
Contributor

Have been meaning to take a closer look at this implementation; will try to give it a read over the weekend. Would like to see to what extent this changes performance for comparisons.

@asder8215
Contributor Author

Would like to see to what extent this changes performance for comparisons.

By comparisons, are you referring to Path equality or Path ordering?

I was a bit confused by the issue because you mentioned std::path::compare_components as the reason for your slowdown, but that's used for path ordering, while path equality uses the PartialEq trait impl for Components<'_>. The benchmarking I've done indicates that this impl of Components<'_> would improve path equality performance but cause a slight reduction in path ordering performance.
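A short illustration of the two operations being distinguished here, using only stable `Path` APIs. The helper names `same_path` and `path_order` are hypothetical. Both operations work on normalized components, so redundant separators and interior `.` segments do not affect the result.

```rust
use std::cmp::Ordering;
use std::path::Path;

/// Path equality: goes through `PartialEq`.
pub fn same_path(a: &str, b: &str) -> bool {
    Path::new(a) == Path::new(b)
}

/// Path ordering: goes through `Ord`.
pub fn path_order(a: &str, b: &str) -> Ordering {
    Path::new(a).cmp(Path::new(b))
}
```

A `BTreeMap` keyed by paths exercises the second function's code path on every lookup, while a `==` check or a `HashMap` probe exercises the first.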

@clarfonthey
Contributor

I was thinking about both, but I was particularly using paths in a BTreeMap, which did require ordering.

@asder8215
Contributor Author

I was thinking about both, but I was particularly using paths in a BTreeMap, which did require ordering.

Got you. Also, I think I could optimize compare_components a bit more if I do the TODO I left in a comment: check the characters at which the two paths first differ, and so long as neither of them is '/' or '.' (which require normalization or component checks to produce an accurate Ordering result), return Ordering::Greater or Ordering::Less directly. This would make comparing components at the same directory level faster.
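A rough sketch of that shortcut, written against the public API rather than the internals of compare_components. Assumptions: Unix-style paths without prefixes, a hypothetical function name `compare_with_shortcut`, and `\` treated conservatively as a byte that forces the slow path. Whenever the shortcut does not apply, it falls back to the ordinary `Path` ordering.

```rust
use std::cmp::Ordering;
use std::path::Path;

/// Bytes that may need component-level handling; the shortcut is skipped
/// when either differing byte is one of them. `\` is a conservative extra.
fn needs_component_walk(b: u8) -> bool {
    matches!(b, b'/' | b'.' | b'\\')
}

/// Orders two paths, deciding by the first differing byte when that is safe.
pub fn compare_with_shortcut(a: &Path, b: &Path) -> Ordering {
    let (ab, bb) = (
        a.as_os_str().as_encoded_bytes(),
        b.as_os_str().as_encoded_bytes(),
    );
    if let Some(i) = ab.iter().zip(bb).position(|(x, y)| x != y) {
        // Both paths are byte-identical before `i`, so if the differing bytes
        // are ordinary component bytes, they decide the component ordering.
        if !needs_component_walk(ab[i]) && !needs_component_walk(bb[i]) {
            return ab[i].cmp(&bb[i]);
        }
    }
    // Separator, dot, or one path is a byte prefix of the other: full compare.
    a.cmp(b)
}
```

For `foo/bar` versus `foo/baz` the shortcut returns at the `r`/`z` difference without walking any components.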

@clarfonthey
Contributor

Yeah, I suspect that probably the best solution would be to do something similar to what Python does and offer some sort of PosixPath / WindowsPath types instead of them all being supported under Path, but that seems a little ahead of the game here.

I'll take any wins if the code ends up working better.

@asder8215
Contributor Author

asder8215 commented May 21, 2026

I realized my benchmarking for path ordering is incorrect; I thought the cmp call would use the PartialOrd impl of Components<'_>, but it uses Iterator::cmp (I forgot about that; I'll update it soon to use either > or <). That being said, I think I have an idea for preserving some of the previous code in Components::compare_components, which should bring the performance to the same or a similar level.
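The pitfall described here can be reproduced with stable APIs. In this sketch (all three function names are hypothetical), method syntax on an owned `Components` value picks `Iterator::cmp`, because by-value receivers are tried before auto-referenced ones, while the explicit `Ord::cmp` call and the `<` operator go through the comparison impls on `Components` itself.

```rust
use std::cmp::Ordering;
use std::path::Path;

/// Method syntax on an owned `Components` resolves to `Iterator::cmp`,
/// which walks both iterators with `next()`.
pub fn order_via_iterator(a: &Path, b: &Path) -> Ordering {
    a.components().cmp(b.components())
}

/// Calling through `Ord` explicitly reaches `Components`' own ordering impl.
pub fn order_via_ord(a: &Path, b: &Path) -> Ordering {
    Ord::cmp(&a.components(), &b.components())
}

/// Comparison operators use the `PartialOrd` impl of `Components`.
pub fn less_than(a: &Path, b: &Path) -> bool {
    a.components() < b.components()
}
```

All three agree on the result; they differ only in which code path a benchmark ends up measuring.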

@asder8215
Contributor Author

asder8215 commented May 21, 2026

New benchmarks for path ordering comparisons (BB abbrev for Black Box):

Compare Comps Rewrite   
                        time:   [13.882 µs 13.942 µs 14.037 µs]
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

Compare Uneq Comps Rewrite
                        time:   [14.475 µs 14.641 µs 14.831 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

Compare Uneq 2 Comps Rewrite
                        time:   [41.087 ns 41.521 ns 41.973 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Compare Comps Rewrite (No BB)
                        time:   [14.077 µs 14.152 µs 14.238 µs]
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

Compare Uneq Comps Rewrite (No BB)
                        time:   [14.023 µs 14.032 µs 14.042 µs]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

Compare Uneq Comps 2 Rewrite (No BB)
                        time:   [39.542 ns 39.735 ns 39.950 ns]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

Std Compare Comps       
                        time:   [13.667 µs 13.690 µs 13.716 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Std Compare Uneq Comps  
                        time:   [13.694 µs 13.709 µs 13.726 µs]
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

Std Compare Uneq 2 Comps
                        time:   [40.555 ns 40.650 ns 40.758 ns]
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

Std Compare Comps (No BB)
                        time:   [13.738 µs 13.779 µs 13.827 µs]
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

Std Compare Uneq Comps (No BB)
                        time:   [14.011 µs 14.134 µs 14.255 µs]
Found 11 outliers among 100 measurements (11.00%)
  10 (10.00%) high mild
  1 (1.00%) high severe

Std Compare Uneq 2 Comps (No BB)
                        time:   [41.148 ns 41.277 ns 41.430 ns]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

Performance is nearly the same when reusing what the current implementation of Components<'_> does (tweaked to use the front index from Components<'_>).

Edit: Had to correct the fast path None match condition (it should match on the back field, since that encodes the length of the path we've subsliced). I noticed that the compiler couldn't optimize the fast path well if I used left.back/right.back in both None matches, but it could if I used variables containing left.back/right.back in one of the None matches. I updated the benchmarks for the compare cases as a result (the others are unaffected by this change because they don't rely on comparison operators like <, >).

…e in previous implementation, but making it work with Components<'_> front index
@asder8215 asder8215 force-pushed the components_rewrite branch from cb82f61 to 1a25002 Compare May 23, 2026 15:34
Comment thread library/std/src/path.rs
// causes this function to run slower than using a variable that stores
// the `left.back` and `right.back` information (which `back` field
// encodes the length of the `Components<'_>` unconsumed path)
let left_back = left.back;
Contributor Author

@asder8215 asder8215 May 23, 2026

@clarfonthey I had to update the None matching condition here, because the lengths that need to be compared between the left and right Components<'_> are actually the back fields, not path.len() (otherwise the fast path could make a mistake on an existing Components<'_> that had already called Components::next_back). I updated the benchmarking code to reflect this change as well.

However, I noticed a strange thing while benchmarking in that if I do:

None if left.back == right.back => { ... },
None => left.back.min(right.back),

This runs two times slower than storing left.back and right.back in separate variables and using those in the default None arm. Alternatively, if I use the left_back and right_back variables in both None arms, it also causes a 2x performance degradation. Does this performance degradation occur on your end if you use left.back and right.back in both None arms (or left_back and right_back in both)? If so, do you happen to know why this occurs?

I have the godbolt link here, but I couldn't figure out what changed.
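For readers without the diff open, the two shapes being compared look roughly like this. It is a hypothetical reduction: the `Cursor` struct and both function names are invented for illustration, and the sketch only shows that the two forms are semantically identical. It does not claim to reproduce the 2x difference, which depends on the surrounding code in compare_components.

```rust
/// Hypothetical stand-in for the two fields involved; not the PR's struct.
pub struct Cursor<'a> {
    pub path: &'a [u8],
    pub back: usize, // length of the still-unconsumed region of `path`
}

/// Form A: both `None` arms read `left.back` / `right.back` directly.
/// Returns `None` when the unconsumed regions are identical.
pub fn first_difference_fields(left: &Cursor<'_>, right: &Cursor<'_>) -> Option<usize> {
    let (l, r) = (&left.path[..left.back], &right.path[..right.back]);
    match l.iter().zip(r).position(|(a, b)| a != b) {
        None if left.back == right.back => None,
        None => Some(left.back.min(right.back)),
        Some(diff) => Some(diff),
    }
}

/// Form B: the guard reads the fields, the fallback arm uses hoisted locals.
pub fn first_difference_locals(left: &Cursor<'_>, right: &Cursor<'_>) -> Option<usize> {
    let left_back = left.back;
    let right_back = right.back;
    let (l, r) = (&left.path[..left_back], &right.path[..right_back]);
    match l.iter().zip(r).position(|(a, b)| a != b) {
        None if left.back == right.back => None,
        None => Some(left_back.min(right_back)),
        Some(diff) => Some(diff),
    }
}
```

Any performance gap between two such forms has to come from how the compiler lowers them, since they compute the same value for every input.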

Contributor

So this appears way more pronounced if you look at the generated MIR (post-opt):

For the "faster" case:
scope 1 {
    debug left_back => _7;
    let _8: usize;
    scope 2 {
        debug right_back => _8;
        let _9: usize;
        let _24: usize;
        scope 3 {
            debug first_difference => _9;
            scope 5 {
                debug previous_sep => _32;
                let _32: usize;
            }
        }
        scope 4 {
            debug diff => _24;
        }
    }
}
For the "slower" case:
scope 1 {
    debug first_difference => _17;
    scope 3 {
        debug previous_sep => _22;
        let _22: usize;
        scope 35 (inlined #[track_caller] core::slice::index::<impl Index<RangeTo<usize>> for [u8]>::index) {
            debug self => _31;
            debug ((index: RangeTo<usize>).0: usize) => _17;
            scope 36 (inlined #[track_caller] <RangeTo<usize> as SliceIndex<[u8]>>::index) {
                debug ((self: RangeTo<usize>).0: usize) => _17;
                debug slice => _31;
                scope 37 (inlined #[track_caller] <std::ops::Range<usize> as SliceIndex<[u8]>>::index) {
                    debug ((self: std::ops::Range<usize>).0: usize) => const 0_usize;
                    debug ((self: std::ops::Range<usize>).1: usize) => _17;
                    debug slice => _31;
                    debug new_len => _17;
                    let mut _62: bool;
                    let mut _63: usize;
                    let _64: *const [u8];
                    let mut _65: *const [u8];
                    let mut _66: !;
                    let mut _67: usize;
                    scope 38 (inlined core::num::<impl usize>::checked_sub) {
                        debug self => _17;
                        debug rhs => const 0_usize;
                        let mut _68: bool;
                    }
                    scope 39 (inlined core::slice::index::get_offset_len_noubcheck::<u8>) {
                        debug ptr => _31;
                        debug offset => const 0_usize;
                        debug len => _17;
                        let mut _69: *const u8;
                        scope 40 {
                            scope 41 {
                            }
                        }
                    }
                }
            }
        }
        scope 42 (inlined core::slice::<impl [u8]>::iter) {
            debug self => _64;
            scope 43 (inlined std::slice::Iter::<'_, u8>::new) {
                debug slice => _64;
                let mut _71: std::ptr::NonNull<[u8]>;
                let mut _73: *mut u8;
                let mut _74: *mut u8;
                scope 44 {
                    debug len => _17;
                    let _70: std::ptr::NonNull<u8>;
                    scope 45 {
                        debug ptr => _70;
                        let _72: *const u8;
                        scope 46 {
                            debug end_or_len => _72;
                        }
                        scope 50 (inlined std::ptr::without_provenance::<u8>) {
                            debug addr => _17;
                            scope 51 (inlined without_provenance_mut::<u8>) {
                            }
                        }
                        scope 52 (inlined NonNull::<u8>::as_ptr) {
                            debug self => _70;
                        }
                        scope 53 (inlined #[track_caller] std::ptr::mut_ptr::<impl *mut u8>::add) {
                            debug self => _74;
                            debug count => _17;
                        }
                    }
                    scope 47 (inlined NonNull::<[u8]>::from_ref) {
                        debug r => _64;
                        let mut _75: *const [u8];
                    }
                    scope 48 (inlined NonNull::<[u8]>::cast::<u8>) {
                        debug self => _71;
                        let mut _76: *mut u8;
                        let mut _77: *mut [u8];
                        scope 49 (inlined NonNull::<[u8]>::as_ptr) {
                        }
                    }
                }
            }
        }
    }
}

I have a feeling that this might have something to do with how match guards are generated, although this is genuinely very weird. Will bring it up with some compiler folks on Zulip and see if they have any insights.

Contributor

Also, the Zulip thread in case you want to participate: #t-compiler/performance > Bindings change dramatically affecting generated MIR

For now, I would say to obviously go with whichever one gets better optimised, but it would be really interesting to figure out why this is being compiled so differently.

Contributor Author

The slower case looks so horrendous; it really is strange to see how reusing the Components<'_> iterator field leads to this mess of generated code when it's just due to how you extract the back indices of each Components<'_> (via variables vs from the struct directly). I'm curious what goes on in match guard code generation.

I'll definitely keep my eyes on the Zulip thread, appreciate you linking it here!

@clarfonthey
Copy link
Copy Markdown
Contributor

Might as well:

r? @clarfonthey

For now since I have more or less agreed to review this. Will hand over to someone else if there are any additional things that need to be resolved that I can't/shouldn't handle.

Comment thread library/std/src/path.rs
fn prefix_verbatim(&self) -> bool {
if !HAS_PREFIXES {
return false;
fn consume_first_component(&mut self, dir_front: bool) -> Option<Component<'a>> {
Contributor

@clarfonthey clarfonthey May 24, 2026

So, specifically since the first component appears to already be eagerly evaluated, I do wonder if this method is really necessary or if we should simply make the first component store Option<Component<'a>> directly. It does feel like a bit of extra work that could be cut out for simplicity, but I might be misreading.

Contributor

Note that my general opinion on eager evaluation for this kind of iterator is that if eager evaluation dramatically simplifies a majority of the cases that use this code, we can afford a little bit of eager evaluation as a treat, even if there are a few cases where something might be done that is ultimately discarded later.

Contributor Author

@asder8215 asder8215 May 25, 2026

We could also inline this function into the Components::next and Components::next_back cases, which would also get rid of the dir_front argument. I think I originally factored it out to share the code in a neater way, although it is not exactly shared, since certain match conditions behave differently depending on whether dir_front is true or false.

I'll change this to put the code directly inside Components::next and Components::next_back, then benchmark again to see if that affects anything.

Contributor Author

@asder8215 asder8215 May 25, 2026

With storing an Option<Component<'a>>, my concern was the increase in the size of Components<'_> and whether that would be worth it. I'm pretty sure one of the Component enum variants holds a Prefix, and that enum takes up 40 bytes. I also wanted to reduce the size of Components<'_> with this front and back index approach, since I know cloning occurs in Components<'_> comparison (for equality or ordering).

I think that's one of the benefits I was aiming for with this front and back index approach: it compresses the size penalty we pay for storing a Prefix enum in Components<'_>. Why use a Prefix enum that takes up 40 bytes when a usize can serve as an index marking where the Prefix ends?
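The sizes mentioned here can be checked directly with `std::mem::size_of`. The following is a minimal sketch; the helper name `sizes` is made up for illustration, and the numbers it reports depend on the target and toolchain (the 40-byte figure above is what is reported in this thread, presumably on a 64-bit target).

```rust
use std::mem::size_of;
use std::path::{Component, Components, Prefix, PrefixComponent};

/// Hypothetical helper: byte sizes of the types under discussion,
/// as laid out by whatever compiler and target this is built with.
fn sizes() -> [(&'static str, usize); 5] {
    [
        ("usize", size_of::<usize>()),
        ("Prefix", size_of::<Prefix<'static>>()),
        ("PrefixComponent", size_of::<PrefixComponent<'static>>()),
        ("Component", size_of::<Component<'static>>()),
        ("Components", size_of::<Components<'static>>()),
    ]
}
```

Printing each pair from `sizes()` before and after a change to `Components<'_>` gives a quick way to see what a given field layout costs.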

The other thing is that while FirstComponent::Absolute and FirstComponent::Prefix are already evaluated via has_root or parse_prefix, the first component of a relative path is not eagerly evaluated. I wasn't too worried about FirstComponent::Absolute, though it does suck to see FirstComponent::Prefix get evaluated again (especially if you have a Prefix::Verbatim and it's a pretty big component).

I'm okay with storing an Option<Component<'a>> here, but do we find the increase in the size of the Components<'_> iterator acceptable?

Contributor

I actually hadn't fully taken in how big the Prefix enum is, but… yeah.

That said, given the complexity of the operation, I still think that it might be better to still do everything ahead of time, minus maybe parsing a Prefix. This basically means that I think it might be better to effectively convert the iterator into a monomorphised version of Chain<MaybePrefix, Back>, where even if MaybePrefix does some extra parsing at runtime to fully expand the prefix, the iterator starts out in a form where you have definitively separated out the prefix and don't need any special casing besides either (doing whatever logic is required to output the prefix) or (doing whatever logic is required for everything else). Right now, because the front index is doing double-duty for both keeping track of the location of the prefix, and keeping track of the iteration position in the rest of the path, it might be better to instead duplicate the extra index if needed in the Option being stored just to simplify the logic.
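One possible reading of the `Chain<MaybePrefix, Back>` shape, sketched over plain `&str` paths rather than the real `Path` machinery. Everything in it is an assumption made for illustration: the function name `parts`, the toy rule that a "prefix" is whatever precedes the first `:`, and the use of `std::iter::Chain` directly instead of a hand-monomorphised equivalent.

```rust
/// Toy sketch of the `Chain<MaybePrefix, Back>` shape over plain `&str` paths.
/// Hypothetical throughout: the "prefix" is whatever precedes the first ':'
/// (inclusive), which is NOT how std parses Windows prefixes.
fn parts<'a>(path: &'a str) -> impl DoubleEndedIterator<Item = &'a str> {
    // Separate the prefix from the rest exactly once, up front.
    let (prefix, rest) = match path.find(':') {
        Some(i) => (Some(&path[..=i]), &path[i + 1..]),
        None => (None, path),
    };
    // "MaybePrefix" is an `Option` iterator yielding zero or one item;
    // "Back" is everything after the prefix, with no prefix special-casing.
    prefix
        .into_iter()
        .chain(rest.split('/').filter(|s| !s.is_empty()))
}
```

With this shape, forward and backward iteration both fall out of `Chain`: going backwards drains the non-prefix part first and only then yields the prefix, so neither half needs to know about the other.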

Also, as far as size goes… unless it actively messes with codegen, I would say we should basically be assuming in all cases that people will be storing a Path, not a Components iterator, at least modulo any iterator adapters. If we still end up in the case where the iteration can't be inlined, it makes sense to optimise for this size, but hopefully these changes fix that.

Contributor Author

@asder8215 asder8215 May 25, 2026

> That said, given the complexity of the operation, I still think that it might be better to still do everything ahead of time, minus maybe parsing a Prefix.

I agree with you there.

> This basically means that I think it might be better to effectively convert the iterator into a monomorphised version of Chain<MaybePrefix, Back>, where even if MaybePrefix does some extra parsing at runtime to fully expand the prefix, the iterator starts out in a form where you have definitively separated out the prefix and don't need any special casing besides either.

I feel like I might need a bit of clarity on what you mean here.

However, I was thinking about something a bit simpler. Going back to your point on storing an Option<Component<'a>> in Components<'_>: instead of an Option<Component<'a>>, we could add the PrefixComponent<'a> from Component<'a> as a field of FirstComponent::Prefix. What's eagerly evaluated when creating a Components<'_> iterator is the Prefix and whether we have an absolute path or not (the latter should be trivial to compute, and both are trivial on Unix platforms since Prefix doesn't exist there). The first component of a relative path is not eagerly evaluated, and I don't think we need to do that if it's not necessary.

The benefit of adding PrefixComponent<'a> to FirstComponent::Prefix instead of using Option<Component<'a>> is that the FirstComponent enum will be smaller, as it doesn't add the Normal(&'a OsStr) variant (and, a lesser issue, all the other variants) that Component<'a> has (note: the size argument isn't true, it's just less convoluted). This would mitigate the re-parsing of the Prefix in this function (and elsewhere), as we can just move it out of the Option<FirstComponent> holding FirstComponent::Prefix and into Component::Prefix.
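A rough sketch of what this could look like. Only `FirstComponent::Prefix` and `FirstComponent::Absolute` are named in this thread; the `Relative` variant and the `first_component` helper are hypothetical names used for illustration, and the real enum in the PR may be shaped differently.

```rust
use std::path::{Component, PrefixComponent};

/// Hypothetical shape of `FirstComponent` once it carries the parsed prefix.
/// `Relative` is an assumed variant name, not taken from the PR.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum FirstComponent<'a> {
    /// The prefix was parsed once at construction time and is stored here.
    Prefix(PrefixComponent<'a>),
    /// The path starts with a root separator.
    Absolute,
    /// Nothing was evaluated eagerly; the first component is parsed on demand.
    Relative,
}

/// Hypothetical helper: turn the stored first component into the `Component`
/// to yield, without parsing the prefix a second time.
fn first_component<'a>(first: FirstComponent<'a>) -> Option<Component<'a>> {
    match first {
        FirstComponent::Prefix(p) => Some(Component::Prefix(p)),
        FirstComponent::Absolute => Some(Component::RootDir),
        FirstComponent::Relative => None, // caller falls back to lazy parsing
    }
}
```

The point of the sketch is only that the `Prefix` arm becomes a move of an already-parsed value; whether the extra bytes are worth it is the open question in this thread.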

> Right now, because the front index is doing double-duty for both keeping track of the location of the prefix, and keeping track of the iteration position in the rest of the path, it might be better to instead duplicate the extra index if needed in the Option being stored just to simplify the logic.

I think the front index is fine doing double-duty. By my previous suggestion, after parsing the Prefix and storing it inside FirstComponent::Prefix, we can still have the front index start at the length of the Prefix for the next component(s) it needs to parse.
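To make the double-duty idea concrete, here is a toy front-indexed iterator over `/`-separated `&str` segments. It is a simplified stand-in, not the PR's code: the type `Segments`, the `prefix_len` parameter, and the `remaining` method are all hypothetical, and `back` stays fixed because backward iteration is left out to keep the sketch short.

```rust
/// Hypothetical toy iterator: the `path` slice is never re-sliced; only the
/// `front` index moves. `front` starts at the prefix length.
struct Segments<'a> {
    path: &'a str,
    front: usize, // start of the unconsumed region
    back: usize,  // end of the unconsumed region (fixed in this sketch)
}

impl<'a> Segments<'a> {
    fn new(path: &'a str, prefix_len: usize) -> Self {
        Segments { path, front: prefix_len, back: path.len() }
    }

    /// Unconsumed part of the path, borrowed for the full `'a` lifetime,
    /// so it stays valid across later `next` calls without cloning `self`.
    fn remaining(&self) -> &'a str {
        &self.path[self.front..self.back]
    }
}

impl<'a> Iterator for Segments<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        // Skip separators at the front of the unconsumed region.
        while self.front < self.back && self.path.as_bytes()[self.front] == b'/' {
            self.front += 1;
        }
        if self.front == self.back {
            return None;
        }
        // Take everything up to the next separator (or the end).
        let rest = &self.path[self.front..self.back];
        let len = rest.find('/').unwrap_or(rest.len());
        self.front += len;
        Some(&rest[..len])
    }
}
```

Because `remaining` borrows from `path` rather than from the iterator, a caller can hold its result while continuing to call `next`, which is the property `Components::as_path` benefits from in the description of this PR.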

@asder8215
Contributor Author

@clarfonthey I tried incorporating FirstComponent::Prefix(PrefixComponent) into the benchmark code first. Every measurement seems to run fine, except that the Path ordering measurements run into the 2x performance degradation. Could you verify whether that's what you see on your end?

I didn't see much of a difference between the MIR with and without storing a PrefixComponent. The only difference I see is in these lines:

With PrefixComponent:

let mut _33: std::option::Option<FirstComponent<'_>>;
...
let mut _36: std::option::Option<FirstComponent<'_>>;

Without PrefixComponent:

let mut _33: std::option::Option<FirstComponent>;
...
let mut _36: std::option::Option<FirstComponent>;

Does this have to do with the size of the Components<'_> struct being 96 bytes when storing a PrefixComponent (up from the original 40 bytes)?

Godbolt links attached:

Also note: I'm running on Fedora Linux. I have not benchmarked the code on Windows.

Labels

I-libs-nominated Nominated for discussion during a libs team meeting. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.


5 participants