Support data skipping based on record index for flink reader #17519

@cshuo

Description

Feature Description

What the feature achieves:
Support data skipping based on the record-level index for the Flink reader.

Why this feature is needed:

The Flink reader currently supports the following data-skipping optimizations:

  1. partition pruning based on partition stats
  2. bucket pruning for the bucket index
  3. file slice pruning based on column stats

To further improve query performance, we can also support file slice pruning based on the record-level index.
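As a rough illustration of the idea, the sketch below prunes candidate file slices using a record-key lookup. It is not Hudi's or Flink's actual API: the class `RecordIndexPruningSketch`, the method `pruneByRecordKeys`, and the in-memory `Map` standing in for the record-level index are all hypothetical, and it assumes the query carries an equality or `IN` predicate on the record key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Hudi code base.
class RecordIndexPruningSketch {

    /**
     * Keeps only the candidate file groups that the (assumed) record-level
     * index says contain at least one of the queried record keys.
     *
     * @param candidateFileIds file group ids left after earlier pruning steps
     * @param recordIndex      stand-in for the record-level index: record key -> file group id
     * @param queriedKeys      record keys extracted from the query predicate
     */
    static List<String> pruneByRecordKeys(List<String> candidateFileIds,
                                          Map<String, String> recordIndex,
                                          Set<String> queriedKeys) {
        // Resolve each queried key to the file group that holds it, if any.
        Set<String> matchingFileIds = new HashSet<>();
        for (String key : queriedKeys) {
            String fileId = recordIndex.get(key);
            if (fileId != null) {
                matchingFileIds.add(fileId);
            }
        }
        // Preserve the candidates' order while dropping non-matching file groups.
        List<String> kept = new ArrayList<>();
        for (String fileId : candidateFileIds) {
            if (matchingFileIds.contains(fileId)) {
                kept.add(fileId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> recordIndex = new HashMap<>();
        recordIndex.put("k1", "fg-1");
        recordIndex.put("k2", "fg-2");
        recordIndex.put("k3", "fg-2");
        recordIndex.put("k4", "fg-3");

        List<String> candidates = Arrays.asList("fg-1", "fg-2", "fg-3");
        Set<String> queriedKeys = new HashSet<>(Arrays.asList("k2", "k3"));

        // Both keys live in fg-2, so the other two file groups are skipped.
        System.out.println(pruneByRecordKeys(candidates, recordIndex, queriedKeys)); // prints [fg-2]
    }
}
```

In this sketch a key that is missing from the index matches no file group, so a lookup for only unknown keys prunes everything. A query without a record-key predicate should bypass this step entirely rather than pass an empty key set.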

User Experience

How users will use this feature:

  • Configuration changes needed
  • API changes
  • Usage examples

Hudi RFC Requirements

RFC PR link: (if applicable)

Why RFC is/isn't needed:

  • Does this change public interfaces/APIs? (Yes/No)
  • Does this change storage format? (Yes/No)
  • Justification:

Metadata

    Labels

    type:feature (New features and enhancements)