
Spark Iceberg manifest reports wrong Parquet file sizes #1980

@dmgcodevil

Description


We are using Spark with Iceberg, and some Iceberg manifest files report the wrong data file (Parquet) size: the reported size is about 2x the actual Parquet file size. The issue was found while investigating a Presto Iceberg issue (iss6369).
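To confirm the symptom for a given data file, one could compare the size the manifest records for it (the file_size_in_bytes field) with the size of the file on disk. The sketch below is a hypothetical helper, not part of Iceberg; reading the reported value out of the manifest is not shown, so it is passed in as a plain number.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not an Iceberg API): compares the size a manifest
 * reports for a data file with that file's actual size on disk.
 */
public class SizeCheck {

    /** Returns reportedSize / actual size; roughly 2.0 for the files described in this issue. */
    static double sizeRatio(long reportedSize, Path dataFile) throws IOException {
        long actual = Files.size(dataFile);
        if (actual == 0) {
            throw new IllegalArgumentException("data file is empty: " + dataFile);
        }
        return (double) reportedSize / actual;
    }

    public static void main(String[] args) throws IOException {
        // A 1024-byte temp file stands in for a real Parquet data file.
        Path tmp = Files.createTempFile("data", ".parquet");
        Files.write(tmp, new byte[1024]);
        System.out.println(sizeRatio(2048, tmp)); // 2.0
        Files.delete(tmp);
    }
}
```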

The problem might be in the ParquetWriter#length() method:

return writer.getPos() + (writeStore.isColumnFlushNeeded() ? writeStore.getBufferedSize() : 0);
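A minimal model of the expression above, with the writer state reduced to plain parameters. The class, method, and parameter names are illustrative stand-ins, not the actual Iceberg or parquet-mr signatures. It shows that the estimate is the bytes already written plus, only when a column flush is pending, the bytes still buffered in memory.

```java
/**
 * Hypothetical model of the size estimate quoted in the issue:
 *   length = writerPos + (columnFlushNeeded ? bufferedSize : 0)
 * Names are illustrative stand-ins, not Iceberg or parquet-mr APIs.
 */
public class LengthEstimate {

    /** Mirrors the shape of the expression in ParquetWriter#length(). */
    static long estimatedLength(long writerPos, boolean columnFlushNeeded, long bufferedSize) {
        // Bytes already written to the output stream, plus the buffered bytes
        // only when the write store reports that a column flush is pending.
        return writerPos + (columnFlushNeeded ? bufferedSize : 0L);
    }

    public static void main(String[] args) {
        // 10 MB already written, 4 MB still buffered in memory.
        System.out.println(estimatedLength(10_000_000L, true, 4_000_000L));  // 14000000
        System.out.println(estimatedLength(10_000_000L, false, 4_000_000L)); // 10000000
    }
}
```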

Maybe that is why the Parquet file size recorded in the manifest is larger than the actual file size on disk.
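One mechanism that would produce a roughly 2x overshoot, offered only as a hypothesis and not confirmed by this issue: if length() is read after close(), and the buffered-size counter and flush flag are not reset when the buffered data is written out, the same bytes are counted once in getPos() and again as buffered. The toy writer below is hypothetical and only imitates the shape of the formula; it is not Iceberg code.

```java
/**
 * Toy model (hypothetical, not Iceberg code) of a length estimate of the form
 * pos + buffered overshooting the final file size when the buffer counter is
 * not reset after the buffered data is flushed on close().
 */
public class StaleBufferDemo {

    static final class ToyWriter {
        private long pos;                  // bytes written to the "file"
        private long buffered;             // bytes held in memory, not yet written
        private boolean flushNeeded;       // stand-in for isColumnFlushNeeded()
        private final boolean resetOnClose;

        ToyWriter(boolean resetOnClose) { this.resetOnClose = resetOnClose; }

        void write(long bytes) {
            buffered += bytes;
            flushNeeded = true;
        }

        void close() {
            pos += buffered;               // flush the buffered data to the file
            if (resetOnClose) {            // the buggy variant skips this reset
                buffered = 0;
                flushNeeded = false;
            }
        }

        long length() { return pos + (flushNeeded ? buffered : 0L); }

        long actualFileSize() { return pos; }
    }

    public static void main(String[] args) {
        ToyWriter stale = new ToyWriter(false);
        stale.write(5_000_000L);
        stale.close();
        System.out.println(stale.length() + " vs " + stale.actualFileSize()); // 10000000 vs 5000000

        ToyWriter reset = new ToyWriter(true);
        reset.write(5_000_000L);
        reset.close();
        System.out.println(reset.length() + " vs " + reset.actualFileSize()); // 5000000 vs 5000000
    }
}
```

With a single buffered chunk, the stale variant reports exactly twice the real size, which matches the shape of the symptom; whether the real writer behaves this way would need to be checked against the Iceberg and parquet-mr sources.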
