feat(cli): add --include-gitignored flag to exclude files listed in .gitignore #253

Merged
merged 11 commits into from
Jun 25, 2025

Conversation

ArmanJR
Contributor

@ArmanJR ArmanJR commented Apr 5, 2025

This pull request introduces a new CLI flag (--use-gitignore) that enhances Gitingest by automatically loading and applying ignore patterns from all .gitignore files found in the target repository or directory. When enabled, files and directories matching any pattern specified in any .gitignore are excluded from the generated text digest.

Key Changes:

  • CLI Update:

    • Modified src/gitingest/cli.py to add a new option --use-gitignore that accepts a boolean value.
    • Updated the main() function to pass the new flag to the asynchronous ingestion entry point.
  • Ingestion Entry Point:

    • Updated src/gitingest/entrypoint.py to include a new parameter use_gitignore.
    • Integrated a call to the new helper function load_gitignore_patterns() (from src/gitingest/utils/ignore_patterns.py) to update the query’s ignore patterns with all patterns extracted from .gitignore files.
  • Gitignore Loader:

    • Implemented load_gitignore_patterns() in src/gitingest/utils/ignore_patterns.py, which recursively searches for .gitignore files starting from the repository root and aggregates their ignore patterns.
    • Added comprehensive docstrings to the loader function to adhere to coding style guidelines.
  • Testing:

    • Created new tests in tests/test_gitignore_feature.py to verify that:
      • With --use-gitignore enabled, files matching .gitignore patterns are excluded from the digest.
      • Without the flag, all files are included.
    • Fixed linting and formatting issues in tests and source files.

This feature provides a seamless way to respect repository-level ignore rules, ensuring that the generated digest is more relevant for ingestion by large language models. It improves usability by reducing the need for manual pattern exclusions and aligns the tool’s behavior more closely with Git’s own ignore logic.
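The loader described under "Gitignore Loader" could look roughly like the sketch below. It is a hypothetical re-creation using only the standard library, not the code in this PR: the function name mirrors `load_gitignore_patterns()`, but the signature, the return type, and the way nested patterns are prefixed with their directory are assumptions made for illustration.

```python
from pathlib import Path


def load_gitignore_patterns(root: Path) -> set[str]:
    """Collect ignore patterns from every .gitignore under ``root``.

    Illustrative sketch only; the helper in this PR may behave differently.
    Negated patterns ("!foo") in nested files are not treated specially here.
    """
    patterns: set[str] = set()
    for gitignore in root.rglob(".gitignore"):
        # Directory holding this .gitignore, relative to the root.
        rel_dir = gitignore.parent.relative_to(root)
        for raw in gitignore.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            # Scope patterns from nested files to their own subtree.
            if rel_dir != Path("."):
                line = f"{rel_dir.as_posix()}/{line.lstrip('/')}"
            patterns.add(line)
    return patterns
```

With a root `.gitignore` containing `*.log` and a `sub/.gitignore` containing `build/`, this sketch would yield `*.log` and `sub/build/`.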

@cyclotruc
Member

@ArmanJR Thanks for your contribution. I've looked at the code and it looks OK, though I still need to run some tests myself.

To merge this, I think we would need to reflect these changes in the front-end so that gitingest keeps feature parity no matter where it is accessed from.

Here's a rough sketch of what it could look like:
[image]

Do you think you can handle this or do you want us to help you with that?

@ArmanJR
Contributor Author

ArmanJR commented Apr 10, 2025

I'll try :)

@neerax

neerax commented Apr 15, 2025

Very useful feature!
I’d suggest enabling it by default and renaming the flag to something like --no-gitignore to ensure a safer default behavior.
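A minimal sketch of that opt-out shape, assuming `argparse` purely to keep the example self-contained (the project's `cli.py` may use a different CLI library). The program name, the `source` argument, and the `use_gitignore` destination are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where .gitignore is respected unless the user opts out."""
    parser = argparse.ArgumentParser(prog="gitingest-sketch")
    parser.add_argument("source", nargs="?", default=".")
    # store_false makes the destination default to True, so the safe
    # behaviour applies when the flag is absent.
    parser.add_argument(
        "--no-gitignore",
        dest="use_gitignore",
        action="store_false",
        help="include files that .gitignore would normally exclude",
    )
    return parser
```

Parsing an empty argument list gives `use_gitignore=True`; passing `--no-gitignore` flips it to `False`.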

@ArmanJR
Contributor Author

ArmanJR commented Apr 18, 2025

@cyclotruc I believe a checkbox in the UI would be redundant, since files listed in .gitignore are already absent from a GitHub repo. The intent behind my PR is to make it possible to ignore those files when running gitingest on a local repo via the CLI.

@ArmanJR
Contributor Author

ArmanJR commented May 22, 2025

@cyclotruc bump!

@cyclotruc
Member

Hi, sorry for the delay. I've been busy but will come back to this soon. Thanks again for your patience.

@zazencodes

This will be a great feature. I expected this behaviour by default and was surprised to find some credentials in digest.txt after running gitingest on a local project.

@ArmanJR
Contributor Author

ArmanJR commented May 22, 2025

Also, there is another issue when calling gitingest ./ locally. When you run it a second time, it includes the previous digest.txt in the new digest file:

~/code/Go/sandbox ···········································································································  10:39:58 AM
❯ echo "boz hi" > main.go
~/code/Go/sandbox ···········································································································  10:40:18 AM
❯ cat main.go
boz hi
~/code/Go/sandbox ···········································································································  10:40:21 AM
❯ gitingest ./
Analysis complete! Output written to: digest.txt

Summary:
Repository: ./
Files analyzed: 1

Estimated tokens: 29
❯ cat digest.txt
Directory structure:
└── .//
    └── main.go

================================================
File: /main.go
================================================
boz hi

~/code/Go/sandbox ···········································································································  10:40:32 AM
❯ gitingest ./
Analysis complete! Output written to: digest.txt

Summary:
Repository: ./
Files analyzed: 2

Estimated tokens: 73
~/code/Go/sandbox ···········································································································  10:40:37 AM
❯ cat digest.txt
Directory structure:
└── .//
    ├── digest.txt
    └── main.go

================================================
File: /digest.txt
================================================
Directory structure:
└── .//
    └── main.go

================================================
File: /main.go
================================================
boz hi




================================================
File: /main.go
================================================
boz hi

I believe since users usually don't double-check the content of digest.txt, it's better to ignore digest.txt by default.
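One way to do that is to skip the output file while walking the directory. The sketch below is a hypothetical illustration, not gitingest's code: `collect_files` and its `output_name` parameter are made-up names, and it only skips a digest located directly in the root.

```python
from pathlib import Path


def collect_files(root: Path, output_name: str = "digest.txt") -> list[Path]:
    """Return files under ``root``, skipping a previous digest in the root."""
    output_path = (root / output_name).resolve()
    files: list[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Leave out the digest written by an earlier run.
        if path.resolve() == output_path:
            continue
        files.append(path.relative_to(root))
    return files
```

In the sandbox above, the second run would then see only main.go.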

@filipchristiansen
Contributor

@ArmanJR The tests are failing. Can you have a look at it?

@cyclotruc
Member

@ArmanJR Thank you for the contribution.
We're interested in this feature. Would you like to continue working on it, or should we take it from here?
Happy to help if you want to finish it.

@filipchristiansen filipchristiansen changed the title Add Flag to Automatically Exclude .gitignore feat(cli): add --use-gitignore flag to exclude files listed in .gitignore Jun 23, 2025
@ArmanJR
Contributor Author

ArmanJR commented Jun 23, 2025

Of course, I'd be happy to help. Is there anything else that should be implemented?

@filipchristiansen
Contributor

filipchristiansen commented Jun 23, 2025

Suggestion on flag semantics & naming

  1. Respect .gitignore by default.
    Most developer-facing CLIs (ripgrep, fd, etc.) do this because it’s safer (no secrets or build artefacts leak).

  2. Invert the flag so users opt in when they genuinely want the extra noise. Two workable spellings:
    --no-gitignore (mirrors ripgrep --no-ignore)
    --include-gitignored (reads like “please pull in the files that are normally ignored”).

  3. Whatever name we choose, the default should be True so existing scripts keep working and only the rare cases need the override:

    # normal – git-ignored files skipped
    gitingest
    
    # exceptional – include everything
    gitingest --no-gitignore

  4. Implementation note: using the pathspec library would give us full Git-wildmatch coverage (negations, **, order-aware precedence) practically for free. We might also want to reimplement the _should_include and _should_exclude functions, which currently use fnmatch, on top of pathspec.
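
To illustrate what "negations" and "order-aware precedence" mean in point 4, here is a simplified stand-in built on the standard library's `fnmatch`. It is not pathspec's API and not this project's code: `is_ignored` is a hypothetical helper, and it covers only negation and ordering, not `**` or directory anchoring.

```python
from fnmatch import fnmatch


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Apply patterns in order; the last matching pattern decides."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatch(path, glob):
            # A later "!pattern" re-includes a path ignored earlier.
            ignored = not negated
    return ignored


# Example rules: ignore every .log file except keep.log.
patterns = ["*.log", "!keep.log"]
```

A flat `any(fnmatch(path, p) for p in patterns)` check would still treat keep.log as ignored, because `*.log` matches it and the `!keep.log` rule never gets a say.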

@ArmanJR ArmanJR marked this pull request as ready for review June 25, 2025 00:58
@cyclotruc cyclotruc changed the title feat(cli): add --use-gitignore flag to exclude files listed in .gitignore feat(cli): add --include-gitignored flag to exclude files listed in .gitignore Jun 25, 2025
Member

@cyclotruc cyclotruc left a comment

This is wonderful!
I just tried it and it works like a charm,
thanks a lot @ArmanJR and @filipchristiansen, merging

@cyclotruc cyclotruc merged commit ba701a8 into coderamp-labs:main Jun 25, 2025
18 checks passed
@filipchristiansen
Contributor

filipchristiansen commented Jun 25, 2025

Thanks a lot @ArmanJR!

Just a follow-up question: what was the reason for the minimum version in the pathspec>=0.12.1 dependency?

@ArmanJR
Contributor Author

ArmanJR commented Jun 25, 2025

@filipchristiansen You mean why 0.12.1? I think 0.12.0 had a bug

Development

Successfully merging this pull request may close these issues.

feat: Provide exclude pattern presets (and optionally inherit from repo's .gitignore)
feat: ignore .gitignored files when running locally
5 participants