feat(cli):Add support for .gitingest file processing in query ingestion #191

Merged: 7 commits, Feb 19, 2025

Conversation

AbhiRam162105
Contributor

Feature Documentation: .gitingest Ignore Patterns

This pull request changes src/gitingest/query_ingestion.py so that query ingestion applies ignore patterns read from a .gitingest file. The main changes are importing the tomllib module and adding a new function that handles the .gitingest file.

Overview

This feature adds support for a .gitingest configuration file in the project root directory. The file uses TOML to define ignore patterns, so users can exclude specific files or directories from query ingestion.

Purpose

The primary goal of this feature is to provide users with a simple and effective way to define ignore patterns for files and directories that should not be ingested by the query ingestion process. This reduces unnecessary processing, improves performance, and allows for better control over the indexing of repository contents.

File Format

The .gitingest file is written in TOML format and consists of a [config] section where users specify ignore patterns as a list.

Example .gitingest File:

[config]
ignore_patterns = ["README.md", "tests/", "*.log", "docs/*.md"]

Explanation of Fields:

  • ignore_patterns: A list of file paths, directory paths, or glob patterns that define which files should be excluded from ingestion.
    • Specific file names (e.g., README.md) will be ignored.
    • Directory names (e.g., tests/) will cause all files inside the directory to be ignored.
    • Wildcard patterns (e.g., *.log) allow for flexible filtering of file types.
    • Path-based patterns (e.g., docs/*.md) allow filtering within specific subdirectories.
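The four pattern kinds above could be matched roughly as follows. This is a sketch under stated assumptions, not the PR's actual matching logic: is_ignored is a hypothetical name, paths are taken relative to the project root with forward slashes, and patterns without a slash are compared against the file name only.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Return True if rel_path matches any ignore pattern (hypothetical helper)."""
    path = PurePosixPath(rel_path)
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern such as "tests/": ignore everything below it.
            dir_parts = PurePosixPath(pattern).parts
            if len(path.parts) > len(dir_parts) and path.parts[: len(dir_parts)] == dir_parts:
                return True
        elif "/" in pattern:
            # Path-based pattern such as "docs/*.md": match the full relative path.
            # Note: fnmatch's "*" also matches "/", so nested files match too.
            if fnmatchcase(path.as_posix(), pattern):
                return True
        elif fnmatchcase(path.name, pattern):
            # Bare name or wildcard such as "README.md" or "*.log":
            # match against the file name in any directory (an assumption).
            return True
    return False
```

fnmatchcase is used instead of fnmatch so that matching stays case-sensitive on every platform.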

Future Enhancements

  • Support for negation rules (e.g., re-including a specific file that an ignore pattern would otherwise exclude).
  • Support for inclusion rules (e.g., exclude everything except a specific file type).
  • Allow .gitingest to be placed in subdirectories for more granular control.
  • Tests covering .gitingest file handling.

With .gitingest support, users can control directly which files the query ingestion system in gitingest processes.

@AbhiRam162105
Contributor Author

@cyclotruc this is a fairly bare-bones implementation. I'll look into writing tests for this feature and adding options to include files.

Until then can you merge this :)

@cyclotruc cyclotruc merged commit f90595d into coderamp-labs:main Feb 19, 2025
18 checks passed
filipchristiansen added a commit that referenced this pull request Mar 13, 2025
…on (#191)


Co-authored-by: Romain Courtois <romain@coderamp.io>
Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
filipchristiansen added a commit that referenced this pull request Mar 13, 2025
…on (#191)

Co-authored-by: Romain Courtois <romain@coderamp.io>
Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
Signed-off-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
3 participants