Support data skipping based on record index for flink reader #17519

Open

Feature Description

What the feature achieves:
Support data skipping based on the record index in the Flink reader.

Why this feature is needed:

The Flink reader currently supports the following data skipping optimizations:

partition pruning based on partition stats
bucket pruning for the bucket index
file slice pruning based on column stats

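To further improve query performance, the reader could also prune file slices based on the record-level index. For example, a query that filters on the record key could look up the matching keys in the index and read only the file slices that contain them.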
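The pruning idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hudi's implementation: `RecordIndexPruningSketch`, `FileSlice`, and `prune` are hypothetical names, and the record index is modeled as a plain in-memory map from record key to file ID rather than a metadata table lookup.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types are Hudi or Flink classes.
final class RecordIndexPruningSketch {

    /** Minimal stand-in for a file slice: a partition path plus a file ID. */
    static final class FileSlice {
        final String partition;
        final String fileId;

        FileSlice(String partition, String fileId) {
            this.partition = partition;
            this.fileId = fileId;
        }
    }

    /**
     * Keeps only the file slices that the record index maps the queried keys to.
     * Assumption: recordIndex maps each record key to exactly one file ID.
     * When the query has no record key filter, all candidates are returned.
     */
    static List<FileSlice> prune(List<FileSlice> candidates,
                                 Set<String> queriedKeys,
                                 Map<String, String> recordIndex) {
        if (queriedKeys == null || queriedKeys.isEmpty()) {
            return candidates; // no key filter, so nothing can be skipped
        }
        // Collect the file IDs that hold at least one queried key.
        Set<String> matchingFileIds = new HashSet<>();
        for (String key : queriedKeys) {
            String fileId = recordIndex.get(key);
            if (fileId != null) {
                matchingFileIds.add(fileId);
            }
        }
        // Drop every candidate slice that holds none of the queried keys.
        List<FileSlice> kept = new ArrayList<>();
        for (FileSlice slice : candidates) {
            if (matchingFileIds.contains(slice.fileId)) {
                kept.add(slice);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> index = Map.of("key-1", "file-a", "key-2", "file-b");
        List<FileSlice> all = List.of(
            new FileSlice("2024/01/01", "file-a"),
            new FileSlice("2024/01/01", "file-b"),
            new FileSlice("2024/01/02", "file-c"));
        for (FileSlice slice : prune(all, Set.of("key-2"), index)) {
            System.out.println(slice.partition + " " + slice.fileId);
        }
    }
}
```

In this sketch a key that is missing from the index selects no file slice. A real reader would presumably need a fallback for data the index does not cover yet, and would apply this step after the existing partition, bucket, and column stats pruning.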

User Experience

How users will use this feature:

Configuration changes needed
API changes
Usage examples

Hudi RFC Requirements

RFC PR link: (if applicable)

Why RFC is/isn't needed:

Does this change public interfaces/APIs? (Yes/No)
Does this change storage format? (Yes/No)
Justification:
