Allow writes to CDF tables for add-only, remove-only, and non-data-ch…#5567
Conversation
…ange transactions Co-authored-by: Coauthor One andreAmorimF Co-authored-by: Coauthor Two aryadneguardieiro Co-authored-by: Coauthor Three yurimantchev-sudo
| assert(TableConfig.CHANGE_DATA_FEED_ENABLED.fromMetadata(snapshot.getMetadata)) | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Could we get test cases with removes?
- Remove only succeeds
- Add + remove with data change = false succeeds
- Add + remove fails
| // Track CDF validation state: whether we've seen adds and removes | ||
| // with enableChangeDataFeed=true | ||
| // This is wrapped in an array to allow modification from within the lambda | ||
| final boolean[] hasAddWithDataChange = {false}; |
There was a problem hiding this comment.
Nice! I think an AtomicBoolean would be a bit cleaner?
| // Track removes with enableChangeDataFeed=true for table creation validation | ||
| if (isCdfEnabled && removeFile.getDataChange()) { | ||
| hasRemoveWithDataChange[0] = true; | ||
| // Check if we've already seen adds with enableChangeDataFeed=true |
There was a problem hiding this comment.
I think instead of with enableChangeDataFeed=true you mean with dataChange = true?
| if (isAppendOnlyTable && removeFile.getDataChange()) { | ||
| throw DeltaErrors.cannotModifyAppendOnlyTable(dataPath.toString()); | ||
| } | ||
| // Track removes with enableChangeDataFeed=true for table creation validation |
There was a problem hiding this comment.
Can you help me understand: What here is specific to table creation?
Next, what logic here is doing an check for is-this-table-creation?
There was a problem hiding this comment.
Actually, we based ourselves in this kernel-rs modification and there it doesn't seem we have a check specific to table creation 🤔 . Did we miss something maybe?
There was a problem hiding this comment.
Sorry, I'm not clear what table creation has to do with here? Your comment says // Track removes with dataChange=true for table creation validation
There was a problem hiding this comment.
True, the comment is misleading. Removing references to table creation.
17e41f4 to
12da593
Compare
| // Kernel now supports writes to CDF-enabled tables with restrictions. | ||
| // Validation is performed at commit time to check for unsupported mixed operations | ||
| // (add+remove with dataChange=true) which would require CDC file generation. | ||
| return !TableConfig.ICEBERG_WRITER_COMPAT_V1_ENABLED.fromMetadata(metadata) |
There was a problem hiding this comment.
kernel iceberg doesn't supprot CDF
| boolean isTableFeatureOverrideKey = | ||
| key.startsWith(TableFeatures.SET_TABLE_FEATURE_SUPPORTED_PREFIX); | ||
| boolean isTableConfigKey = key.startsWith("delta."); | ||
| boolean isTableConstraintKey = key.startsWith("delta.constraints."); |
There was a problem hiding this comment.
delta.constraints.* should be a known feature. This was needed since all tests that was previously checking for changeDataFeed as unsupported feature, were changed to check against constraints.
There was a problem hiding this comment.
I haven't thought about this deeply yet but I don't think this is the right way to handle this. I think ideally we don't need to add this code at all, but if we do it should definitely be in TableConfig not adhoc here. I can look at this closer when I get back from PTO.
There was a problem hiding this comment.
This is a larger API change and we should think about this carefully. This may be the correct decision but we should figure out why. Because the tests are failing isn't a good enough reason to make this change.
There was a problem hiding this comment.
cc @scottsand-db this is along the lines of allowing delta properties discussion we were having
There was a problem hiding this comment.
Agreed with @allisonport-db, please revert this. This PR is meant for CDF, not constraints.
|
Hey @andreAmorimF, we're getting close! It seems like the failing test is coming from testIncompatibleUnsupportedTableFeature(
"changeDataFeed",
tablePropertiesToEnable = Map(TableConfig.CHANGE_DATA_FEED_ENABLED.getKey -> "true"))I think this should become testIncompatibleUnsupportedTableFeature(
"changeDataFeed",
tablePropertiesToEnable = Map(TableConfig.CHANGE_DATA_FEED_ENABLED.getKey -> "true"))
expectedErrorMessage = "Table features [changeDataFeed] are incompatible with icebergWriterCompatV1"))this is in the IcebergWriterCompatV1Suite.scala ^ |
…ormat into relax_cdf_write_condition
…ormat into relax_cdf_write_condition
…/DeltaTableWritesSuite.scala Co-authored-by: OussamaSaoudi <45303303+OussamaSaoudi@users.noreply.github.com>
| if (isAppendOnlyTable && removeFile.getDataChange()) { | ||
| throw DeltaErrors.cannotModifyAppendOnlyTable(dataPath.toString()); | ||
| } | ||
| // Track removes with dataChange=true for table creation validation |
There was a problem hiding this comment.
can you help me understand how this if statement applies to table creation? sorry, I don't see the connection
There was a problem hiding this comment.
from what I understood, these checks run in all commit actions (including table creation). The issue here is that the comment is too specific to table creation, do I understand well?
There was a problem hiding this comment.
If this is the problem and you are okay with it, I can remove these confusing comments since we already have some explanation of the tracking mechanism on these lines
| } | ||
| if (!action.isNullAt(ADD_FILE_ORDINAL)) { | ||
| AddFile addFile = new AddFile(action.getStruct(ADD_FILE_ORDINAL)); | ||
| // Track adds with dataChange=true for table creation validation |
There was a problem hiding this comment.
Ditto here -- what does this have to do with table creation?
For example, I don't see a check elsewhere in the code of: if hasAddWithDataChange and isCreateTable then do X
scottsand-db
left a comment
There was a problem hiding this comment.
Looks great! Just a minor comment on your comment that I don't quite follow. Once you respond/fix, will merge! Thanks!
|
Can you please rebase against master? |
delta-io#5567) Co-authored-by: Coauthor One andreAmorimF Co-authored-by: Coauthor Two aryadneguardieiro Co-authored-by: Coauthor Three yurimantchev-sudo <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> #### Which Delta project/connector is this regarding? <!-- Please add the component selected below to the beginning of the pull request title For example: [Spark] Title of my pull request --> - [ ] Spark - [ ] Standalone - [ ] Flink - [x] Kernel - [ ] Other (fill in here) ## Description Allow writes to CDF tables for add-only, remove-only, and non-data-change transactions. More context: https://delta-users.slack.com/archives/C04TRPG3LHZ/p1763402127485709?thread_ts=1763400653.848629&cid=C04TRPG3LHZ <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> ## How was this patch tested? <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> ## Does this PR introduce _any_ user-facing changes? <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If possible, please also clarify if this is a user-facing change compared to the released Delta Lake versions or within the unreleased branches such as master. If no, write 'No'. --> --------- Co-authored-by: André Fonseca <1077309+andreAmorimF@users.noreply.github.com> Co-authored-by: OussamaSaoudi <45303303+OussamaSaoudi@users.noreply.github.com>

Co-authored-by: Coauthor One andreAmorimF
Co-authored-by: Coauthor Two aryadneguardieiro
Co-authored-by: Coauthor Three yurimantchev-sudo
Which Delta project/connector is this regarding?
Description
Allow writes to CDF tables for add-only, remove-only, and non-data-change transactions.
More context: https://delta-users.slack.com/archives/C04TRPG3LHZ/p1763402127485709?thread_ts=1763400653.848629&cid=C04TRPG3LHZ
How was this patch tested?
Does this PR introduce any user-facing changes?