Fix: --include-pattern incorrectly filters directories leading to empty output #278
Problem Description:
When users pass `--include-pattern` (or `-i`) to select file types for inclusion, for example `gitingest . -i "*.py, *.js"`, the resulting text digest is empty even if the target directory contains matching `.py` or `.js` files.

Root Cause Analysis:
The root cause is in the `_process_node` function in `src/gitingest/ingestion.py`. The filtering logic for `query.include_patterns` was applied to every `sub_path`, including directories.

The `_should_include` function (in `src/gitingest/utils/ingestion_utils.py`) matches on file extensions or file name patterns (e.g., `*.py`). Directories themselves (e.g., `src/`, `my_project/`) cannot match these file-specific patterns.

Consequently, when `_process_node` traverses the file system, it skips any subdirectory that does not match an `include_pattern`. The traversal never descends into those directories, so every file inside them is omitted from the digest, including `.py` or `.js` files that should have been included.

Proposed Solution:
This PR modifies the `_process_node` function in `src/gitingest/ingestion.py` so that the `_should_include` logic is applied only to files (`sub_path.is_file()`).

Directories (`sub_path.is_dir()`) continue to be traversed recursively as long as they are not explicitly excluded by `query.ignore_patterns` (which already incorporates the default ignore rules and user-defined `-e` patterns, with overlaps with `-i` removed). This ensures that even if a directory itself does not match a file pattern, it is still traversed to discover files within it that match the `include_patterns` criteria.

Summary of Code Changes:
The call to `_should_include` in `_process_node` is now made only when `sub_path.is_file()` is true; directories are no longer checked against the include patterns.
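To illustrate the behavior described in this PR, here is a minimal, self-contained sketch that reproduces the empty-output symptom and the files-only filtering on a temporary directory tree. It is not the actual gitingest code: `matches_any`, `collect`, and `demo` are hypothetical names, the standard-library `fnmatch` is assumed as a stand-in for `_should_include`, and ignore patterns are omitted for brevity.

```python
from __future__ import annotations

import fnmatch
import tempfile
from pathlib import Path


def matches_any(path: Path, patterns: set[str]) -> bool:
    """Hypothetical stand-in for `_should_include`: glob-match the entry name."""
    return any(fnmatch.fnmatch(path.name, pattern) for pattern in patterns)


def collect(root: Path, include: set[str], files_only: bool) -> list[str]:
    """Walk `root` and return the relative paths of included files.

    files_only=False mimics the bug: the include filter also hits directories.
    files_only=True mimics the fix: the include filter is applied to files only.
    """
    found: list[str] = []
    for sub_path in sorted(root.iterdir()):
        if include:
            filter_applies = sub_path.is_file() if files_only else True
            if filter_applies and not matches_any(sub_path, include):
                continue  # skipped by the include filter
        if sub_path.is_dir():
            # Recurse and prefix child paths with this directory's name.
            children = collect(sub_path, include, files_only)
            found.extend(f"{sub_path.name}/{child}" for child in children)
        elif sub_path.is_file():
            found.append(sub_path.name)
    return found


def demo() -> tuple[list[str], list[str]]:
    """Build a small tree and return (buggy_result, fixed_result)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "web").mkdir()
        (root / "src" / "main.py").write_text("print('hi')\n")
        (root / "src" / "notes.txt").write_text("notes\n")
        (root / "web" / "app.js").write_text("console.log('hi');\n")
        include = {"*.py", "*.js"}
        return collect(root, include, False), collect(root, include, True)


if __name__ == "__main__":
    buggy, fixed = demo()
    print("filter on all entries:", buggy)
    print("filter on files only: ", fixed)
```

In this sketch, filtering every entry drops `src` and `web` before the walk can reach the files inside them, while filtering files only lets the walk descend and still excludes `notes.txt`.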
Expected Outcome:
With this fix, the `--include-pattern` argument filters file content without impeding directory traversal. Users can specify the file types to include and receive the matching files in the output.

Thank you for your review and support!