
Fix: --include-pattern incorrectly filters directories leading to empty output #278


Closed
wants to merge 1 commit into from


@ccwq ccwq commented Jun 13, 2025

Problem Description:

When the --include-pattern (or -i) argument is used to select file types for inclusion, for example gitingest . -i "*.py, *.js", the resulting text digest is empty even when the target directory contains matching .py or .js files.

Root Cause Analysis:

The root cause is in the _process_node function in src/gitingest/ingestion.py: the filtering logic for query.include_patterns was applied to every sub_path, including directories.

The _should_include function (in src/gitingest/utils/ingestion_utils.py) matches paths against file extension or file name patterns (e.g., *.py). Directory names (e.g., src/, my_project/) generally do not match these file-specific patterns.

Consequently, when _process_node traverses the file system, it skips any subdirectory that does not match an include pattern and never descends into it. Every file inside a skipped directory is omitted from the digest, including .py or .js files that do match the patterns.
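The mismatch can be reproduced in isolation. The snippet below is a minimal sketch that assumes the patterns are compared to relative paths with fnmatch-style globbing; matches_any is a hypothetical stand-in, not the actual _should_include implementation.

```python
from fnmatch import fnmatch

# Patterns as they would arrive from: gitingest . -i "*.py, *.js"
include_patterns = {"*.py", "*.js"}


def matches_any(rel_path: str, patterns: set[str]) -> bool:
    """Hypothetical stand-in for _should_include: glob-match a relative path."""
    return any(fnmatch(rel_path, pattern) for pattern in patterns)


# A directory name has no file extension, so it matches neither pattern.
dir_matches = matches_any("src", include_patterns)

# A file inside that directory does match, but it is never reached
# if the directory itself was already skipped.
file_matches = matches_any("src/app.py", include_patterns)
```

Here dir_matches is False while file_matches is True, which is the situation that led to the empty digest.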

Proposed Solution:

To address this, _process_node in src/gitingest/ingestion.py now applies the _should_include check only to files (sub_path.is_file()).

Directories (sub_path.is_dir()) continue to be traversed recursively as long as they are not excluded by query.ignore_patterns (which already combines the default ignore rules and user-defined -e patterns, with any overlaps with -i removed). A directory that does not itself match a file pattern is therefore still searched for files that match include_patterns.

Summary of Code Changes:

The call to _should_include in _process_node has been changed from:

        if query.include_patterns and not _should_include(sub_path, query.local_path, query.include_patterns):
            continue

to:

        # If include patterns are specified, apply them only to files.
        # Directories, if not excluded, should always be traversed as they may contain files to be included.
        if query.include_patterns and sub_path.is_file():
            if not _should_include(sub_path, query.local_path, query.include_patterns):
                continue

Expected Outcome:

With this fix, --include-pattern filters files without blocking directory traversal, so specifying file types for inclusion yields a digest that contains the matching files.

Thank you for your review and support!

@filipchristiansen
Contributor

PR #259 should already fix this. Please check out the corresponding branch (or main once it has been merged) and confirm that it does. If not, please reopen this PR. Thanks!
