close
Skip to content

Allow writes to CDF tables for add-only, remove-only, and non-data-ch…#5567

Merged
scottsand-db merged 48 commits into
delta-io:masterfrom
nubank:relax_cdf_write_condition
Mar 24, 2026
Merged

Allow writes to CDF tables for add-only, remove-only, and non-data-ch…#5567
scottsand-db merged 48 commits into
delta-io:masterfrom
nubank:relax_cdf_write_condition

Conversation

@giovanni-sorice
Copy link
Copy Markdown
Contributor

@giovanni-sorice giovanni-sorice commented Nov 24, 2025

Co-authored-by: Coauthor One andreAmorimF
Co-authored-by: Coauthor Two aryadneguardieiro
Co-authored-by: Coauthor Three yurimantchev-sudo

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Allow writes to CDF tables for add-only, remove-only, and non-data-change transactions.
More context: https://delta-users.slack.com/archives/C04TRPG3LHZ/p1763402127485709?thread_ts=1763400653.848629&cid=C04TRPG3LHZ

How was this patch tested?

Does this PR introduce any user-facing changes?

…ange transactions

Co-authored-by: Coauthor One andreAmorimF
Co-authored-by: Coauthor Two aryadneguardieiro
Co-authored-by: Coauthor Three yurimantchev-sudo
@giovanni-sorice giovanni-sorice marked this pull request as draft November 25, 2025 16:26
assert(TableConfig.CHANGE_DATA_FEED_ENABLED.fromMetadata(snapshot.getMetadata))
}
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get test cases with removes?

  • Remove only succeeds
  • Add + remove with data change = false succeeds
  • Add + remove fails

// Track CDF validation state: whether we've seen adds and removes
// with enableChangeDataFeed=true
// This is wrapped in an array to allow modification from within the lambda
final boolean[] hasAddWithDataChange = {false};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think an AtomicBoolean would be a bit cleaner?

// Track removes with enableChangeDataFeed=true for table creation validation
if (isCdfEnabled && removeFile.getDataChange()) {
hasRemoveWithDataChange[0] = true;
// Check if we've already seen adds with enableChangeDataFeed=true
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of with enableChangeDataFeed=true you mean with dataChange = true?

if (isAppendOnlyTable && removeFile.getDataChange()) {
throw DeltaErrors.cannotModifyAppendOnlyTable(dataPath.toString());
}
// Track removes with enableChangeDataFeed=true for table creation validation
Copy link
Copy Markdown
Collaborator

@scottsand-db scottsand-db Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand: What here is specific to table creation?

Next, what logic here is doing an check for is-this-table-creation?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we based ourselves in this kernel-rs modification and there it doesn't seem we have a check specific to table creation 🤔 . Did we miss something maybe?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not clear what table creation has to do with here? Your comment says // Track removes with dataChange=true for table creation validation

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, the comment is misleading. Removing references to table creation.

@andreAmorimF andreAmorimF force-pushed the relax_cdf_write_condition branch from 17e41f4 to 12da593 Compare December 2, 2025 12:23
// Kernel now supports writes to CDF-enabled tables with restrictions.
// Validation is performed at commit time to check for unsupported mixed operations
// (add+remove with dataChange=true) which would require CDC file generation.
return !TableConfig.ICEBERG_WRITER_COMPAT_V1_ENABLED.fromMetadata(metadata)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kernel iceberg doesn't supprot CDF

boolean isTableFeatureOverrideKey =
key.startsWith(TableFeatures.SET_TABLE_FEATURE_SUPPORTED_PREFIX);
boolean isTableConfigKey = key.startsWith("delta.");
boolean isTableConstraintKey = key.startsWith("delta.constraints.");
Copy link
Copy Markdown
Contributor

@andreAmorimF andreAmorimF Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delta.constraints.* should be a known feature. This was needed since all tests that was previously checking for changeDataFeed as unsupported feature, were changed to check against constraints.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought about this deeply yet but I don't think this is the right way to handle this. I think ideally we don't need to add this code at all, but if we do it should definitely be in TableConfig not adhoc here. I can look at this closer when I get back from PTO.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a larger API change and we should think about this carefully. This may be the correct decision but we should figure out why. Because the tests are failing isn't a good enough reason to make this change.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @scottsand-db this is along the lines of allowing delta properties discussion we were having

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @allisonport-db, please revert this. This PR is meant for CDF, not constraints.

@OussamaSaoudi
Copy link
Copy Markdown
Collaborator

OussamaSaoudi commented Jan 30, 2026

Hey @andreAmorimF, we're getting close! It seems like the failing test is coming from

  testIncompatibleUnsupportedTableFeature(
    "changeDataFeed",
    tablePropertiesToEnable = Map(TableConfig.CHANGE_DATA_FEED_ENABLED.getKey -> "true"))

I think this should become

  testIncompatibleUnsupportedTableFeature(
    "changeDataFeed",
    tablePropertiesToEnable = Map(TableConfig.CHANGE_DATA_FEED_ENABLED.getKey -> "true"))
    expectedErrorMessage = "Table features [changeDataFeed] are incompatible with icebergWriterCompatV1"))

this is in the IcebergWriterCompatV1Suite.scala ^

@OussamaSaoudi OussamaSaoudi marked this pull request as ready for review February 4, 2026 00:39
andreAmorimF and others added 4 commits February 4, 2026 15:10
…/DeltaTableWritesSuite.scala

Co-authored-by: OussamaSaoudi <45303303+OussamaSaoudi@users.noreply.github.com>
if (isAppendOnlyTable && removeFile.getDataChange()) {
throw DeltaErrors.cannotModifyAppendOnlyTable(dataPath.toString());
}
// Track removes with dataChange=true for table creation validation
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you help me understand how this if statement applies to table creation? sorry, I don't see the connection

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from what I understood, these checks run in all commit actions (including table creation). The issue here is that the comment is too specific to table creation, do I understand well?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the problem and you are okay with it, I can remove these confusing comments since we already have some explanation of the tracking mechanism on these lines

}
if (!action.isNullAt(ADD_FILE_ORDINAL)) {
AddFile addFile = new AddFile(action.getStruct(ADD_FILE_ORDINAL));
// Track adds with dataChange=true for table creation validation
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here -- what does this have to do with table creation?

For example, I don't see a check elsewhere in the code of: if hasAddWithDataChange and isCreateTable then do X

Copy link
Copy Markdown
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just a minor comment on your comment that I don't quite follow. Once you respond/fix, will merge! Thanks!

Copy link
Copy Markdown
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@scottsand-db
Copy link
Copy Markdown
Collaborator

Can you please rebase against master?

@scottsand-db scottsand-db merged commit cca7c70 into delta-io:master Mar 24, 2026
33 checks passed
huashi-st pushed a commit to huashi-st/delta that referenced this pull request Apr 24, 2026
delta-io#5567)

Co-authored-by: Coauthor One andreAmorimF
Co-authored-by: Coauthor Two aryadneguardieiro
Co-authored-by: Coauthor Three yurimantchev-sudo

<!--
Thanks for sending a pull request!  Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]
Your PR title ...'.
  3. Be sure to keep the PR description updated to reflect all changes.
  4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a
faster review.
6. If applicable, include the corresponding issue number in the PR title
and link it in the body.
-->

#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description
Allow writes to CDF tables for add-only, remove-only, and
non-data-change transactions.
More context:
https://delta-users.slack.com/archives/C04TRPG3LHZ/p1763402127485709?thread_ts=1763400653.848629&cid=C04TRPG3LHZ
<!--
- Describe what this PR changes.
- Describe why we need the change.
 
If this PR resolves an issue be sure to include "Resolves #XXX" to
correctly link and close the issue upon merge.
-->

## How was this patch tested?

<!--
If tests were added, say they were added here. Please make sure to test
the changes thoroughly including negative and positive cases if
possible.
If the changes were tested in any way other than unit tests, please
clarify how you tested step by step (ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future).
If the changes were not tested, please explain why.
-->

## Does this PR introduce _any_ user-facing changes?

<!--
If yes, please clarify the previous behavior and the change this PR
proposes - provide the console output, description and/or an example to
show the behavior difference if possible.
If possible, please also clarify if this is a user-facing change
compared to the released Delta Lake versions or within the unreleased
branches such as master.
If no, write 'No'.
-->

---------

Co-authored-by: André Fonseca <1077309+andreAmorimF@users.noreply.github.com>
Co-authored-by: OussamaSaoudi <45303303+OussamaSaoudi@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants