Conversation

aeeeeeep commented Dec 8, 2025

Problem Description

In distributed training runs launched with torchrun and a custom coverage plugin, coverage combine can receive an empty other_data. Specifically, when the tracer, arc, and line_bits tables in a coverage database contain zero records, combining fails with this error:

Conflicting file tracer name for 'xxx': '' vs 'coverage_plugin'

Configuration Context

[run]
parallel = True
plugins =
    coverage_plugin
...

Root Cause

  • Distributed Process Behavior: Multiple processes launched by torchrun may generate coverage data with different plugin configurations
  • Empty Coverage Data: Some processes produce coverage databases with no actual measurements (the tracer, arc, and line_bits tables are all empty)
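
To make the failure mode concrete, here is a toy model of a tracer-name merge. It is only an illustration under stated assumptions, not the coverage.py implementation: merge_tracers and both dictionaries are hypothetical names, and an empty string is assumed to stand for "no plugin tracer recorded for this file".

```python
# Toy model only: NOT the coverage.py implementation.
def merge_tracers(ours, theirs):
    """Merge two {filename: tracer_name} maps, refusing mismatches.

    Assumption: an empty string means "no plugin tracer recorded".
    """
    merged = dict(ours)
    for path, their_tracer in theirs.items():
        our_tracer = merged.get(path)
        if our_tracer is not None and our_tracer != their_tracer:
            raise ValueError(
                f"Conflicting file tracer name for {path!r}: "
                f"{our_tracer!r} vs {their_tracer!r}"
            )
        merged[path] = their_tracer
    return merged


# Hypothetical data: one process knows the file but recorded no tracer,
# another process measured the same file through the plugin.
empty_process = {"xxx": ""}
plugin_process = {"xxx": "coverage_plugin"}
```

In this model, merging plugin_process into empty_process raises an error with the same shape as the message above, while merging into a map that does not mention the file at all succeeds.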

Solution

Added a pre-check in CoverageData.update() that skips empty coverage data: it queries the record counts of the tracer, arc, and line_bits tables and skips the update when all three tables contain zero records.
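
The idea of the pre-check can be sketched with the standard sqlite3 module. This is a minimal sketch, not the code in this pull request: has_measurements and make_db are hypothetical helpers, and the column layouts are simplified stand-ins rather than the real coverage.py schema. Only the three table names come from the description above.

```python
import sqlite3

# Table names are taken from the description; everything else is assumed.
TABLES = ("tracer", "arc", "line_bits")


def has_measurements(conn):
    """Return True if any of the three tables holds at least one row."""
    for table in TABLES:
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if count:
            return True
    return False


def make_db(tracer_rows=()):
    """Build an in-memory database with simplified stand-in tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tracer (file_id INTEGER, tracer TEXT)")
    conn.execute("CREATE TABLE arc (file_id INTEGER, fromno INTEGER, tono INTEGER)")
    conn.execute("CREATE TABLE line_bits (file_id INTEGER, numbits BLOB)")
    conn.executemany("INSERT INTO tracer VALUES (?, ?)", tracer_rows)
    return conn


empty_db = make_db()
measured_db = make_db([(1, "coverage_plugin")])
```

An update routine built on this sketch would return early when has_measurements reports False for the incoming data, and proceed with the normal merge otherwise.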

aeeeeeep force-pushed the fix_coverage_combine branch from ae1caf8 to 2e1a23d on December 8, 2025 13:51

nedbat (Member) commented Dec 8, 2025

My reaction is, "I guess it's fine to skip empty data files" but also, "it seems wrong that the data file is empty." Can you give me a way to reproduce this? You mentioned custom plugins, are those coverage plugins? Can you share them?

aeeeeeep (Author) commented Dec 8, 2025

Please give me some time; this scenario is quite complicated.

nedbat added the "question" (Further information is requested) label on Dec 9, 2025