
[Spark] Fix timezone conversion for non-UTC timestamp partition values#5599

Merged
OussamaSaoudi merged 3 commits into
delta-io:masterfrom
amogh-jahagirdar:fix-incorrect-timezone-conversion
Dec 2, 2025

Conversation

Collaborator

@amogh-jahagirdar amogh-jahagirdar commented Nov 28, 2025

Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

Description

This change fixes the incorrect conversion of timestamp partition values produced in Spark environments with a non-UTC session timezone. Previously, the conversion did not read the timezone from the Spark session config and used the JVM timezone instead. This PR updates the conversion to use the Spark session config timezone.
The tests missed this because they use a utility, withTimeZone, that sets the JVM timezone, so various cases were passing. They needed to set the Spark SQL session timezone (as a user would do) to hit the issue.
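To illustrate why the choice of zone matters, here is a minimal sketch using plain `java.time`, not the actual Delta code. The class and method names are hypothetical, the zones are fixed stand-ins for "JVM zone" and "session zone", and the ISO-8601-style UTC output with six fractional digits is an assumed target format for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Delta: shows how the zone used to
// interpret a wall-clock timestamp changes the resulting UTC value.
final class PartitionTimestampZones {
    private static final DateTimeFormatter INPUT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");
    private static final DateTimeFormatter UTC_OUTPUT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSSSS'Z'");

    // Interpret wallClock in the given zone, then render the same instant in UTC.
    public static String toUtcPartitionValue(String wallClock, ZoneId zone) {
        return LocalDateTime.parse(wallClock, INPUT)
            .atZone(zone)
            .withZoneSameInstant(ZoneOffset.UTC)
            .format(UTC_OUTPUT);
    }

    public static void main(String[] args) {
        String value = "2024-06-15 12:00:00";
        // Stand-in for a JVM default zone of UTC (assumption for the demo).
        ZoneId jvmZone = ZoneId.of("UTC");
        // Stand-in for a session timezone configured by the user (assumption).
        ZoneId sessionZone = ZoneId.of("America/Los_Angeles");

        System.out.println("using JVM zone:     " + toUtcPartitionValue(value, jvmZone));
        System.out.println("using session zone: " + toUtcPartitionValue(value, sessionZone));
    }
}
```

When the two zones differ, the two results differ by the zone offset, which is the kind of mismatch the description refers to. A test that only changes the JVM default zone makes both zones agree, so it cannot expose the problem.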

This change also handles timestamp precisions across the full range from 1 to 6, rather than failing if the precision is not 1 or 6. The timestamp partition value is still formatted at microsecond precision, as per the protocol, but the code is now robust to a wider variety of timestamp precisions that we should be able to handle rather than fail on.
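As a rough sketch of what accepting several precisions and emitting microseconds can look like, the following uses `java.time` only. It assumes "precision" means the number of fractional-second digits in the input string. The class name, method name, and input/output patterns are illustrative assumptions, not the code in this PR.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical helper, not part of Delta: parses 0 to 6 fractional-second
// digits and always writes exactly 6 (microsecond precision).
final class TimestampPrecision {
    // Accepts an optional fraction of up to 6 digits after the seconds.
    private static final DateTimeFormatter FLEXIBLE_INPUT =
        new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 6, true)
            .toFormatter();

    // Always writes six fractional digits, padding with zeros.
    private static final DateTimeFormatter MICROS_OUTPUT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSSSS'Z'");

    public static String normalize(String timestamp, ZoneId zone) {
        return LocalDateTime.parse(timestamp, FLEXIBLE_INPUT)
            .atZone(zone)
            .withZoneSameInstant(ZoneOffset.UTC)
            .format(MICROS_OUTPUT);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("UTC");
        String[] inputs = {
            "2024-01-01 00:00:00",
            "2024-01-01 00:00:00.1",
            "2024-01-01 00:00:00.123",
            "2024-01-01 00:00:00.123456"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + normalize(input, zone));
        }
    }
}
```

Parsing with a variable-width fraction and formatting with a fixed-width one keeps the output shape stable regardless of how many digits the input carried.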

How was this patch tested?

Updated the existing unit tests to set the Spark SQL session config timezone. Also added new tests for multiple levels of precision and for some "short string" based timezones.
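For context on "short string" timezones, the sketch below shows how the JDK itself treats a short ID such as `PST`. It is only an illustration of `java.time` behavior. Whether the code in this PR resolves short IDs the same way is an assumption, and the class and method names are hypothetical.

```java
import java.time.DateTimeException;
import java.time.ZoneId;

// Hypothetical helper, not part of Delta: resolves region IDs and
// short IDs through the JDK's built-in alias map.
final class ShortZoneIds {
    // ZoneId.SHORT_IDS maps short names such as "PST" to region IDs;
    // IDs that are not in the map are looked up unchanged.
    public static ZoneId resolve(String id) {
        return ZoneId.of(id, ZoneId.SHORT_IDS);
    }

    // Reports whether the single-argument ZoneId.of accepts the ID as-is.
    public static boolean isPlainIdAccepted(String id) {
        try {
            ZoneId.of(id);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("PST resolves to: " + resolve("PST").getId());
        System.out.println("PST accepted without alias map: " + isPlainIdAccepted("PST"));
    }
}
```

Short IDs are a useful test input because they only resolve when the alias map is supplied, so they exercise a different code path than full region names.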

Does this PR introduce any user-facing changes?

Fixes incorrect timestamp partition values that users could previously see.

@amogh-jahagirdar amogh-jahagirdar changed the title Delta: Fix timezone conversion for non-UTC timestamp partition values Spark: Fix timezone conversion for non-UTC timestamp partition values Nov 28, 2025
@amogh-jahagirdar amogh-jahagirdar force-pushed the fix-incorrect-timezone-conversion branch from 33c519e to eef9f69 Compare November 28, 2025 19:09
@amogh-jahagirdar amogh-jahagirdar changed the title Spark: Fix timezone conversion for non-UTC timestamp partition values [Spark] Fix timezone conversion for non-UTC timestamp partition values Dec 1, 2025
@amogh-jahagirdar amogh-jahagirdar force-pushed the fix-incorrect-timezone-conversion branch 3 times, most recently from de60fcf to 6bd7470 Compare December 1, 2025 20:02
@amogh-jahagirdar amogh-jahagirdar force-pushed the fix-incorrect-timezone-conversion branch from 6bd7470 to 6ed9026 Compare December 1, 2025 20:06
@OussamaSaoudi OussamaSaoudi merged commit 7b69041 into delta-io:master Dec 2, 2025
20 checks passed
harperjiang pushed a commit to harperjiang/delta that referenced this pull request Dec 8, 2025
delta-io#5599)

huashi-st pushed a commit to huashi-st/delta that referenced this pull request Apr 24, 2026
delta-io#5599)

3 participants