[Spark] Fix timezone conversion for non-UTC timestamp partition values #5599
Merged
OussamaSaoudi merged 3 commits on Dec 2, 2025
lzlfred approved these changes on Dec 2, 2025
OussamaSaoudi approved these changes on Dec 2, 2025
harperjiang pushed a commit to harperjiang/delta that referenced this pull request on Dec 8, 2025
huashi-st pushed a commit to huashi-st/delta that referenced this pull request on Apr 24, 2026

Which Delta project/connector is this regarding?
Spark
Description
This change fixes the incorrect conversion of timestamp partition values produced in Spark environments with non-UTC session timezones. Currently, we do not determine the timezone from the Spark session config and instead use the JVM timezone. This PR switches the conversion to use the Spark session config timezone.
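To make the effect concrete, here is a minimal sketch of how the same zone-less timestamp maps to different UTC instants depending on which timezone is used to interpret it. It is an illustration only, not the Delta code: the names `SESSION_TZ`, `JVM_TZ`, and `to_utc` are hypothetical, and fixed UTC offsets stand in for real zone IDs.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset zones standing in for real zone IDs.
SESSION_TZ = timezone(timedelta(hours=-8))  # the zone the user configured
JVM_TZ = timezone.utc                       # the process-wide default zone


def to_utc(wall_clock: str, tz: timezone) -> datetime:
    """Interpret a zone-less timestamp string in `tz` and return the UTC instant."""
    local = datetime.strptime(wall_clock, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


value = "2025-01-01 12:00:00"
with_session_tz = to_utc(value, SESSION_TZ)  # 2025-01-01 20:00:00+00:00
with_jvm_tz = to_utc(value, JVM_TZ)          # 2025-01-01 12:00:00+00:00
```

In this sketch the two results differ by the full 8-hour offset, which is the kind of discrepancy a user with a non-UTC session timezone would observe in the partition value.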
This was missed in the tests because they use a utility, `withTimeZone`, which sets the JVM timezone, so various cases were passing. These tests needed to be updated to set the Spark SQL session timezone (as a user would do) to hit the issue.
This change also handles timestamp precisions across the full range from 1 to 6, rather than failing when the precision is not 1 or 6. The timestamp partition value is still formatted at microsecond precision as per the protocol, but the code is now robust to the wider variety of timestamp precisions it should be able to handle.
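The precision handling can be sketched as padding the fractional-second part to six digits. This is a hedged illustration, not the PR's implementation: `normalize_fraction` is a hypothetical helper, and treating a missing fractional part as zero microseconds is an assumption made for the example.

```python
def normalize_fraction(ts: str) -> str:
    """Pad a 1 to 6 digit fractional-second part to 6 digits (microseconds)."""
    head, sep, frac = ts.partition(".")
    if not sep:
        # Assumption for this sketch: no fractional part means zero microseconds.
        return head + ".000000"
    if not (1 <= len(frac) <= 6 and frac.isdigit()):
        raise ValueError(f"unsupported fractional seconds: {ts!r}")
    # Right-pad with zeros: ".5" is half a second, i.e. ".500000".
    return f"{head}.{frac.ljust(6, '0')}"


print(normalize_fraction("2025-01-01 12:00:00.5"))       # 2025-01-01 12:00:00.500000
print(normalize_fraction("2025-01-01 12:00:00.123"))     # 2025-01-01 12:00:00.123000
print(normalize_fraction("2025-01-01 12:00:00.123456"))  # 2025-01-01 12:00:00.123456
```

Padding on the right rather than the left matters here: the digits after the decimal point are a fraction of a second, so `.5` must become `.500000`, not `.000005`.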
How was this patch tested?
Updated the existing unit tests to properly set the Spark SQL session config timezone. Also added new tests for multiple levels of precision and for some "short string" based timezones.
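Why the earlier tests could pass despite the bug can be sketched with a toy model. Everything here is hypothetical (`Env`, `buggy_convert`, `fixed_convert`), fixed offsets stand in for real zones, and the fallback of an unset session timezone to the process default is an assumption of the sketch.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
UTC_MINUS_8 = timezone(timedelta(hours=-8))  # hypothetical fixed-offset zone


class Env:
    """Toy runtime with a process-wide zone and a separate session zone."""

    def __init__(self, jvm_tz=UTC, session_tz=None):
        self.jvm_tz = jvm_tz
        # Assumption: an unset session zone falls back to the process default.
        self.session_tz = session_tz if session_tz is not None else jvm_tz


def _to_utc(wall_clock, tz):
    local = datetime.strptime(wall_clock, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return local.astimezone(UTC)


def buggy_convert(env, wall_clock):
    return _to_utc(wall_clock, env.jvm_tz)      # ignores the session zone


def fixed_convert(env, wall_clock):
    return _to_utc(wall_clock, env.session_tz)  # honors the session zone


VALUE = "2025-01-01 12:00:00"
EXPECTED = datetime(2025, 1, 1, 20, 0, tzinfo=UTC)  # 12:00 at UTC-8

old_style = Env(jvm_tz=UTC_MINUS_8)      # test changes only the process zone
new_style = Env(session_tz=UTC_MINUS_8)  # test changes the session zone

results = {
    "buggy/old": buggy_convert(old_style, VALUE) == EXPECTED,  # True: bug is masked
    "buggy/new": buggy_convert(new_style, VALUE) == EXPECTED,  # False: bug is exposed
    "fixed/old": fixed_convert(old_style, VALUE) == EXPECTED,  # True
    "fixed/new": fixed_convert(new_style, VALUE) == EXPECTED,  # True
}
```

Only the test that sets the session zone separately from the process zone distinguishes the buggy conversion from the fixed one, which is the reason for updating the existing tests.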
Does this PR introduce any user-facing changes?
Fixes the incorrect timestamp partition values that users would previously see.