Why doesn't LayoutLMv3 use image data when fine-tuning on XFUND? #1731

Open (opened on Oct 10, 2025)


Platform:
Python version: 3.11
PyTorch version (GPU?): 2.8
transformers: 4.56.2

Here, the "images" key is removed from each feature in the data collator:

    import torch

    def __call__(self, features):
        # Labels may be stored under either "label" or "labels".
        label_name = "label" if "label" in features[0].keys() else "labels"
        labels = [feature[label_name] for feature in features] if label_name in features[0].keys() else None

        images = None
        if "images" in features[0]:
            # pop() removes "images" from each feature dict; the arrays are stacked into one tensor.
            images = torch.stack([torch.tensor(d.pop("images")) for d in features])
            # Visual sequence length: (W / 16) patches per side, squared (square image assumed), plus 1 extra token.
            IMAGE_LEN = int(images.shape[-1] / 16) * int(images.shape[-1] / 16) + 1

        batch = self.tokenizer.pad(
            features,
            padding=self.padding,
            max_length=self.max_length,
            pad_to_multiple_of=self.pad_to_multiple_of,
            # Conversion to tensors will fail if we have labels as they are not of the same length yet.
            return_tensors="pt" if labels is None else None,
        )
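The quoted snippet ends at the padding call, so it does not show what happens to `images` afterwards. Note, though, that `d.pop("images")` does not discard the pixel data: the popped arrays are collected into the local `images` tensor before `tokenizer.pad` runs, presumably so the padding step only sees token-level fields. A collator that feeds the image to the model would then put that tensor back into the batch and extend the attention mask by `IMAGE_LEN` visual positions (for a 224×224 input, 224 / 16 = 14, so IMAGE_LEN = 14 × 14 + 1 = 197). The sketch below illustrates that pop / pad / re-attach pattern under stated assumptions: it uses plain Python lists instead of tensors, and `toy_pad`, `image_token_count`, `collate`, and `blank_image` are hypothetical helpers, not functions from transformers or from the repository this issue refers to.

```python
from typing import Any, Dict, List


def toy_pad(features: List[Dict[str, Any]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    # Hypothetical stand-in for tokenizer.pad: right-pads "input_ids" to the
    # longest sequence in the batch and builds the matching text attention mask.
    max_len = max(len(f["input_ids"]) for f in features)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_id] * n_pad)
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
    return batch


def image_token_count(width: int, patch_size: int = 16) -> int:
    # Same arithmetic as IMAGE_LEN: (W // 16) patches per side, squared
    # (square image assumed), plus one extra visual token.
    per_side = width // patch_size
    return per_side * per_side + 1


def collate(features: List[Dict[str, Any]]) -> Dict[str, Any]:
    images = None
    if "images" in features[0]:
        # pop() takes the image out of each feature dict, but the data is kept
        # in this local list so it can be re-attached after padding.
        images = [f.pop("images") for f in features]

    batch: Dict[str, Any] = toy_pad(features)

    if images is not None:
        batch["images"] = images
        width = len(images[0][0][0])  # images are nested lists shaped C x H x W
        n_visual = image_token_count(width)
        # Visual positions are always attended to: append ones to each text mask.
        batch["attention_mask"] = [m + [1] * n_visual for m in batch["attention_mask"]]
    return batch


def blank_image(size: int, channels: int = 3) -> List[List[List[float]]]:
    # Hypothetical all-zero C x H x W image, used only for the demo below.
    return [[[0.0] * size for _ in range(size)] for _ in range(channels)]


if __name__ == "__main__":
    feats = [
        {"input_ids": [101, 7, 8, 102], "images": blank_image(32)},
        {"input_ids": [101, 9, 102], "images": blank_image(32)},
    ]
    out = collate(feats)
    print(out["input_ids"])       # [[101, 7, 8, 102], [101, 9, 102, 0]]
    print(out["attention_mask"])  # [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 0, 1, 1, 1, 1, 1]]
    print(image_token_count(224)) # 197
```

With 32×32 demo images each row gets 2 × 2 + 1 = 5 visual mask entries appended after its padded text mask. Keeping the pixels out of the padding call means the padder only has to handle variable-length token fields, while the fixed-size images are batched separately.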
