Redesigned Components iterator to use front and back indexing instead mutating and subslicing path field#156496
Redesigned Components iterator to use front and back indexing instead mutating and subslicing path field#156496asder8215 wants to merge 6 commits into
Components iterator to use front and back indexing instead mutating and subslicing path field#156496Conversation
|
rustbot has assigned @Mark-Simulacrum. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
1627e2f to
33e69e1
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
This comment has been minimized.
This comment has been minimized.
33e69e1 to
ed9d33d
Compare
This comment has been minimized.
This comment has been minimized.
ed9d33d to
0b0f84c
Compare
This comment has been minimized.
This comment has been minimized.
… of mutating and subslicing path field; as a result, Components iterator memory size goes from 64 bytes to 40 bytes and as_path does not use cloning at all
0b0f84c to
8ed33ea
Compare
This comment was marked as outdated.
This comment was marked as outdated.
2151b8f to
83cdbed
Compare
This comment was marked as outdated.
This comment was marked as outdated.
…ity, added safety comments, and check for root dir after Prefix component (e.g., '\\?\checkout\src\tools' should produce Prefix, RootDir, Normal, Normal, None, ...) in Components::parse_single_component
83cdbed to
3921fff
Compare
…ng iter().position()/rposition()
…here to use iter().position()/.iter().rposition(), refactored code in compare_components, and removed stale comments
0a25dda to
92e0132
Compare
|
New benchmarking results. You can see what the benchmark code looks like here and run it yourself to see if there are any difference in measurements on your end: This is the measurement of the current implementation of This is the measurement of the new implementation of Edit: Updated |
|
Here are the benchmark results with black box: From current From this Edit: Updated Edit 2: Took off Path ordering benchmark here since it was incorrect see below to see corrected path ordering benchmarks. |
92e0132 to
574d7f2
Compare
|
I'm confident this code works (passed CI in previous run, the current amended commit change I made doesn't change logic, but makes the code written in a more idiomatic way). In my opinion, the logic in this code should look more readable than how |
|
@rustbot label +I-libs-nominated Since |
…omponents::normalize_back instead, refactored Components::as_path code
574d7f2 to
1d45aef
Compare
|
Have been meaning to take a closer look at this implementation; will try to give it a read over the weekend. Would like to see to what extent this changes performance for comparisons. |
By comparisons, are you referring to Path equality or Path ordering? I was a bit confused from the issue because you mentioned |
|
I was thinking about both, but I was particularly using paths in a |
Got you. Also, I think I could optimize |
|
Yeah, I suspect that probably the best solution would be to do something similar to what Python does and offer some sort of I'll take any wins if the code ends up working better. |
|
I realized my benchmarking for path ordering is incorrect; I thought the |
|
New benchmarks for path ordering comparisons (BB abbrev for Black Box): Performance is nearly the same using what the current implementation of Edit: Had to correct the fast path |
f119f48 to
cb82f61
Compare
…e in previous implementation, but making it work with Components<'_> front index
cb82f61 to
1a25002
Compare
| // causes this function to run slower than using a variable that stores | ||
| // the `left.back` and `right.back` information (which `back` field | ||
| // encodes the length of the `Components<'_>` unconsumed path) | ||
| let left_back = left.back; |
There was a problem hiding this comment.
@clarfonthey I had to make an update to the None matching condition code here because the length that needed to be compared between the left and right Components<'_> are actually the back field not path.len() (could possibly make a mistake in the fast path on an existing Components<'_> that used Components::next_back). I updated the benchmarking code to reflect this change as well
However, I noticed a strange thing while benchmarking in that if I do:
None if left.back == right.back => { ... },
None => left.back.min(right.back),This runs two times slower than me storing left.back and right.back in separate variables and using that in the default None condition. Alternatively, I use left_back and right_back variables in both None matching conditions, it also causes a 2x performance degradation. Does this performance degradation occur on your end if you use left.back and right.back in both None match (or left_back and right_back)? If so, do you happen to know why this occurs?
I have the godbolt link here, but I couldn't figure out what changed.
There was a problem hiding this comment.
So this appears way more pronounced if you look at the generated MIR (post-opt):
For the "faster" case:
scope 1 {
debug left_back => _7;
let _8: usize;
scope 2 {
debug right_back => _8;
let _9: usize;
let _24: usize;
scope 3 {
debug first_difference => _9;
scope 5 {
debug previous_sep => _32;
let _32: usize;
}
}
scope 4 {
debug diff => _24;
}
}
}For the "slower" case:
scope 1 {
debug first_difference => _17;
scope 3 {
debug previous_sep => _22;
let _22: usize;
scope 35 (inlined #[track_caller] core::slice::index::<impl Index<RangeTo<usize>> for [u8]>::index) {
debug self => _31;
debug ((index: RangeTo<usize>).0: usize) => _17;
scope 36 (inlined #[track_caller] <RangeTo<usize> as SliceIndex<[u8]>>::index) {
debug ((self: RangeTo<usize>).0: usize) => _17;
debug slice => _31;
scope 37 (inlined #[track_caller] <std::ops::Range<usize> as SliceIndex<[u8]>>::index) {
debug ((self: std::ops::Range<usize>).0: usize) => const 0_usize;
debug ((self: std::ops::Range<usize>).1: usize) => _17;
debug slice => _31;
debug new_len => _17;
let mut _62: bool;
let mut _63: usize;
let _64: *const [u8];
let mut _65: *const [u8];
let mut _66: !;
let mut _67: usize;
scope 38 (inlined core::num::<impl usize>::checked_sub) {
debug self => _17;
debug rhs => const 0_usize;
let mut _68: bool;
}
scope 39 (inlined core::slice::index::get_offset_len_noubcheck::<u8>) {
debug ptr => _31;
debug offset => const 0_usize;
debug len => _17;
let mut _69: *const u8;
scope 40 {
scope 41 {
}
}
}
}
}
}
scope 42 (inlined core::slice::<impl [u8]>::iter) {
debug self => _64;
scope 43 (inlined std::slice::Iter::<'_, u8>::new) {
debug slice => _64;
let mut _71: std::ptr::NonNull<[u8]>;
let mut _73: *mut u8;
let mut _74: *mut u8;
scope 44 {
debug len => _17;
let _70: std::ptr::NonNull<u8>;
scope 45 {
debug ptr => _70;
let _72: *const u8;
scope 46 {
debug end_or_len => _72;
}
scope 50 (inlined std::ptr::without_provenance::<u8>) {
debug addr => _17;
scope 51 (inlined without_provenance_mut::<u8>) {
}
}
scope 52 (inlined NonNull::<u8>::as_ptr) {
debug self => _70;
}
scope 53 (inlined #[track_caller] std::ptr::mut_ptr::<impl *mut u8>::add) {
debug self => _74;
debug count => _17;
}
}
scope 47 (inlined NonNull::<[u8]>::from_ref) {
debug r => _64;
let mut _75: *const [u8];
}
scope 48 (inlined NonNull::<[u8]>::cast::<u8>) {
debug self => _71;
let mut _76: *mut u8;
let mut _77: *mut [u8];
scope 49 (inlined NonNull::<[u8]>::as_ptr) {
}
}
}
}
}
}
}I have a feeling that this might have something to do with how match guards are generated, although this is genuinely very weird. Will bring up to some compiler folks on Zulip and see if they have any insights.
There was a problem hiding this comment.
Also, the Zulip thread in case you want to participate: #t-compiler/performance > Bindings change dramatically affecting generated MIR
For now, I would say to obviously go with whichever one gets better optimised, but it would be really interesting to figure out why this is being compiled so differently.
There was a problem hiding this comment.
The slower case looks so horrendous; it really is strange to see how reusing the Components<'_> iterator field leads to this mess of generated code when it's just due to how you extract the back indices of each Components<'_> (via variables vs from the struct directly). I'm curious what goes on in match guard code generation.
I'll definitely keep my eyes on the Zulip thread, appreciate you linking it here!
|
Might as well: r? @clarfonthey For now since I have more or less agreed to review this. Will hand over to someone else if there are any additional things that need to be resolved that I can't/shouldn't handle. |
| fn prefix_verbatim(&self) -> bool { | ||
| if !HAS_PREFIXES { | ||
| return false; | ||
| fn consume_first_component(&mut self, dir_front: bool) -> Option<Component<'a>> { |
There was a problem hiding this comment.
So, specifically since the first component appears to already be eagerly evaluated, I do wonder if this method is really necessary or if we should simply make the first component store Option<Component<'a>> directly. It does feel like a bit of extra work that could be cut out for simplicity, but I might be misreading.
There was a problem hiding this comment.
Note that my general opinion on eager evaluation for this kind of iterator is that if eager evaluation dramatically simplifies a majority of the cases that use this code, we can afford a little bit of eager evaluation as a treat, even if there are a few cases where something might be done that is ultimately discarded later.
There was a problem hiding this comment.
We could also evaluate this function inside the Components::next and Components::next_back cases (it'll also get rid of the dir_front argument). I think I just did it earlier to write it somewhat of shared code in a neater way (although, they are not exactly shared since certain match conditions have different effects whether dir_front is true of false).
I'll change this and put the the code directly inside Components::next and Components::next_back and benchmark again to see if that affects anything.
There was a problem hiding this comment.
With storing an Option<Component<'a>>, my concern was on increasing the memory size of Components<'_> and whether that would be worth it. I'm pretty sure one of the Component enum member takes in a Prefix, which that enum takes up 40 bytes. I also wanted to reduce the size of Components<'_> with this front and back index approach since I know cloning occurs in Components<'_> comparison (for equality or ordering).
I think that's one of the benefits I was trying to make with this front and back index approach. That this approach compresses the size penalty we take with storing Prefix enum into Components<'_> (it was like why use a Prefix enum that takes up 40 bytes, when we could use a usize that serves as an index marker on where our Prefix length ends?).
The other thing is that while FirstComponent::Absolute and FirstComponent::Prefix is evaluated already via has_root or parse_prefix, the relative path first component is not eagerly evaluated. I wasn't too worried about FirstComponent::Absolute, though it does suck to see FirstComponent::Prefix get evaluated again (especially if you have PrefixVerbatim and it's a pretty big component).
I'm okay with storing a Option<Component<'a>> here, but do we find the increase on Components<'_> iterator size acceptable?
There was a problem hiding this comment.
I actually hadn't fully taken in how big the Prefix enum is, but… yeah.
That said, given the complexity of the operation, I still think that it might be better to still do everything ahead of time, minus maybe parsing a Prefix. This basically means that I think it might be better to effectively convert the iterator into a monomorphised version of Chain<MaybePrefix, Back>, where even if MaybePrefix does some extra parsing at runtime to fully expand the prefix, the iterator starts out in a form where you have definitively separated out the prefix and don't need any special casing besides either (doing whatever logic is required to output the prefix) or (doing whatever logic is required for everything else). Right now, because the front index is doing double-duty for both keeping track of the location of the prefix, and keeping track of the iteration position in the rest of the path, it might be better to instead duplicate the extra index if needed in the Option being stored just to simplify the logic.
Also, as far as size goes… unless it actively messes with codegen, I would say we should basically be assuming in all cases that people will be storing a Path, not a Components iterator, at least modulo any iterator adapters. If we still end up in the case where the iteration can't be inlined, it makes sense to optimise for this size, but hopefully these changes fix that.
There was a problem hiding this comment.
That said, given the complexity of the operation, I still think that it might be better to still do everything ahead of time, minus maybe parsing a
Prefix.
I agree with you there.
This basically means that I think it might be better to effectively convert the iterator into a monomorphised version of
Chain<MaybePrefix, Back>, where even ifMaybePrefixdoes some extra parsing at runtime to fully expand the prefix, the iterator starts out in a form where you have definitively separated out the prefix and don't need any special casing besides either.
I feel like I might need a bit of clarity on what you mean here.
However, I was thinking about something a bit simpler. I was thinking back to your point on storing an Option<Component<'a> into Components<'_>, and instead of an Option<Component<'a>, I was thinking we can add the PrefixComponent<'a> from Component<'a> as field of FirstComponent::Prefix. What's eagerly evaluated when creating a Components<'_> iterator is Prefix and whether we have an absolute path or not (which the latter should be trivial to compute, and in both cases, this would be trivial to compute on unix platforms since Prefix doesn't exist). The first component of a relative path is not eagerly evaluated and I don't think we need to do that if it's not necessary.
The benefit of just adding PrefixComponent<'a> into FirstComponent::Prefix instead of using Option<Component<'a>> is just that the size of the FirstComponent enum will be smaller as it doesn't add the Normal(&'a OsStr) enum member (and, a lesser issue, all the other enum members) that Component<'a> has (note: size argument isn't true, it's just less convoluted). This would mitigate the issue of re-parsing the Prefix occurring in this function (and elsewhere) as we can just take and move it out from Option<FirstComponent::Prefix> into Component::Prefix.
Right now, because the front index is doing double-duty for both keeping track of the location of the prefix, and keeping track of the iteration position in the rest of the path, it might be better to instead duplicate the extra index if needed in the
Optionbeing stored just to simplify the logic.
I think the front index is fine doing double-duty. By my previous suggestion, after parsing the Prefix and storing it inside FirstComponent::Prefix, we can still have the front index start at the length of the Prefix for the next component(s) it needs to parse.
|
@clarfonthey I tried out incorporating the I didn't see much of difference between the MIR code without storing With Without Does this have to do with how the size of the Godbolt links attached: Also note: I'm running on Fedora Linux. I have not benchmarked the code on Windows. |

View all comments
This PR entirely changes how
Components<'_>is implemented. Currently, theComponents<'_>iterator 'consumes' components through mutating its path field to a subslice that presents the left over unconsumed path components (this consumed path component is what's returned inComponents::nextorComponents::next_back). However, this PR keeps the path field alive/unmodified and uses front and back indexing strategy to extract consumed/unconsumed components.This PR benefits implementations like
Components::as_path, which is pretty used is multiple areas of the standard library. Previously,Components<'_>iterator was required to clone inside the function to present the unconsumed path because our originalComponent<'_>consuming behavior on path will not allow the returned&'a PathfromComponents::as_pathto last after aComponents::nextorComponents::next_backcall. Due to the current implementation ofComponentsiterator has a size of 64 bytes, if you're usingComponents::as_pathafter eachComponents::next/Components::next_back, then it's pretty unfortunate to be cloning 64 bytes again and again, especially if each of your path components are a few bytes (e.g., "foo/bar/baz").On the point of size, with the indexing strategy, this PR has further optimized the size of
Components<'_>from 64 bytes -> 40 bytes since a large chunk of theComponents<'_>was taken up by theOption<Prefix>(this takes up 40 bytes). Instead of holding a prefix field inComponents<'_>, we can encode the length of thePrefixwithin ourfrontfield index and use another enum calledFirstComponentto check whether our first component of the given path isPrefix(or something else). If it's aPrefix, we can useparse_prefixon the subsliceself.path[..self.front]since we know our front index encodes thePrefixlength.Due to not having the prefix
Option<Prefix>field insideComponents<'_>anymore, all the prefix functions inComponents<'_>have been removed in favor of callingparse_prefix,Prefix::is_verbatim,Prefix::is_drive, etc.I'm curious if this redesign of
Components<'_>improves Path equality as pointed out by @clarfonthey in #154521 with Path equality (not to be confused with Path ordering as mentioned in the issue, since that usesComponents:::compare_componentsand the example code shows equality) being slow.I haven't benchmarked this though.I have benchmarked the result and I can say that currently this implementation improves Path equality due toComponents::next_backrunning faster with this implementation than the current mutating path with a subslice implementation. However, Path ordering runs slightly slower. You can check the benchmark code I used here, and play around with the number of bytes in a component, the number of components, etc..Right now, when I tested it locally on my PC (Fedora OS), it passed all the standard library tests and rust analyzer didn't crash on me (had a few crash reports coming from rust analyzer early on when I messed around with
Components<'_>dealing something with threads usingPath::components, but now that's all resolved).I have not tested this on Windows yet, and I would probably need someone to help me test on this platform as my Windows VM is not working properly to run the standard library test suite.There's a lot of things being done here, and possibly there may be better approaches or ways I could improve this implementation or write the code in a neater way here. I am open to any advice or feedback on this approach.
Update: I got to testing some things out with Prefixes on my Windows VM manually, so the prefix component index encoded into the
Components<'_>front field seems to work out nicely. I've also accounted for root directory being able to exists after a Prefix component like "\?\checkout\src\tools" having the following components:PrefixVerbatim->RootDir->Normal->Normal->None(learnt this from the fail that occurred in miri tests, which is nice to see thisComponents<'_>implementation works on the Windows tests in CI).