
Refactor/ingestion #209


Merged: 21 commits into main from refactor/ingestion on Mar 4, 2025

@cyclotruc (Member) commented Mar 3, 2025

Gitingest Code Refactoring and Structure Improvements

This PR performs a comprehensive refactoring of the Gitingest codebase to improve maintainability, organization, and performance:

Key Changes

  • Reorganized the codebase into a more modular structure:

    • Split monolithic query_ingestion.py into smaller dedicated modules
    • Created a new utils package for shared functionality
    • Added a new filesystem_schema.py module built on data classes
  • Improved code architecture:

    • Implemented proper OOP patterns with dedicated classes
    • Established clear module boundaries and responsibilities
    • Renamed modules and functions for better clarity:
      • query_parser.py → query_parsing.py
      • repository_clone.py → cloning.py
      • run_ingest_query → ingest_query
  • Enhanced functionality:

    • Added blob-specific cloning support
    • Fixed file encoding detection with better fallbacks
    • Improved detection of text vs. binary files
  • Quality improvements:

    • Standardized error handling across modules
    • Fixed variable naming conventions (e.g., e → exc)
    • Added proper return type hints and docstrings
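The "blob-specific cloning" item can be illustrated with a short sketch of how a single-file URL might be turned into a narrower clone. It assumes GitHub-style URLs of the form https://github.com/OWNER/REPO/blob/BRANCH/PATH, branch names without slashes, and a partial clone plus cone-mode sparse checkout (git 2.25 or later) so that only the file's directory is checked out. RemoteTarget, parse_remote_url and build_clone_commands are hypothetical names, not the actual cloning.py API, and the commands are only built here, never executed.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class RemoteTarget:
    """Hypothetical parsed form of a repository URL."""

    repo_url: str          # e.g. "https://github.com/owner/repo"
    kind: Optional[str]    # "blob" (single file), "tree" (directory) or None (whole repo)
    branch: Optional[str]  # assumption: branch names contain no "/"
    subpath: str           # path inside the repository, "" for the root


def parse_remote_url(url: str) -> RemoteTarget:
    """Split a GitHub-style URL into repository URL, ref kind, branch and subpath."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url!r}")
    owner = parts[0]
    repo = parts[1][:-4] if parts[1].endswith(".git") else parts[1]
    repo_url = f"{parsed.scheme}://{parsed.netloc}/{owner}/{repo}"
    if len(parts) >= 4 and parts[2] in ("blob", "tree"):
        return RemoteTarget(repo_url, parts[2], parts[3], "/".join(parts[4:]))
    return RemoteTarget(repo_url, None, None, "")


def build_clone_commands(target: RemoteTarget, local_path: str) -> List[List[str]]:
    """Return the git commands needed for the target (not executed here)."""
    clone = ["git", "clone", "--single-branch", "--depth=1"]
    if target.branch:
        clone += ["--branch", target.branch]

    # Whole repository, or a tree URL pointing at the root: plain shallow clone.
    if target.kind is None or (target.kind == "tree" and not target.subpath):
        return [clone + [target.repo_url, local_path]]

    # --filter=blob:none defers downloading file contents; --sparse starts with
    # only top-level files checked out.
    commands = [clone + ["--filter=blob:none", "--sparse", target.repo_url, local_path]]

    # Cone-mode sparse checkout selects directories, so a blob needs its parent.
    if target.kind == "blob":
        directory = str(PurePosixPath(target.subpath).parent)
    else:
        directory = target.subpath
    if directory != ".":  # "." means the file is top-level and already present
        commands.append(["git", "-C", local_path, "sparse-checkout", "set", directory])
    return commands
```

For example, parse_remote_url("https://github.com/owner/repo/blob/main/src/app.py") yields a blob target whose commands clone sparsely and then check out only src/. Fetching the parent directory rather than the lone file is a trade-off of cone mode; non-cone patterns could narrow it further.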

Smaller Changes

  • Added digest.txt to the default .gitignore
  • Renamed OUTPUT_FILE_PATH to OUTPUT_FILE_NAME for clarity
  • Fixed some Node/JavaScript references in comments
  • Improved error messages and warnings
  • Added chardet to dependencies
  • Bumped version from 0.1.3 to 0.1.4
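The encoding-fallback and text-vs-binary items, together with the new chardet dependency, might fit together roughly as follows. This is a minimal sketch, not gitingest's implementation: it assumes a NUL-byte heuristic for binary detection and a UTF-8 → chardet → Latin-1 fallback chain. SAMPLE_SIZE, looks_binary, detect_encoding, decode_text and read_text_file are hypothetical names, and chardet is treated as optional so the sketch also runs without it.

```python
import codecs
from pathlib import Path
from typing import Optional

try:
    import chardet  # the PR adds chardet as a dependency; optional in this sketch
except ImportError:
    chardet = None

SAMPLE_SIZE = 8192  # hypothetical number of leading bytes inspected per file


def looks_binary(sample: bytes) -> bool:
    """NUL-byte heuristic: NULs rarely appear in UTF-8 or single-byte text.

    Known limitation: UTF-16/UTF-32 text contains NULs and is misclassified.
    """
    return b"\x00" in sample


def detect_encoding(sample: bytes) -> str:
    """Fallback chain: UTF-8 first, then chardet's guess, then Latin-1."""
    try:
        # final=False tolerates a multi-byte character cut off at the sample end.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        pass
    if chardet is not None:
        guess = chardet.detect(sample).get("encoding")
        if guess:
            try:
                sample.decode(guess)
                return guess
            except (LookupError, UnicodeDecodeError):
                pass  # unknown codec name or a wrong guess: keep falling back
    # Latin-1 maps every byte value to a character, so decoding cannot fail.
    return "latin-1"


def decode_text(raw: bytes) -> Optional[str]:
    """Return decoded text, or None if the content looks binary."""
    sample = raw[:SAMPLE_SIZE]
    if looks_binary(sample):
        return None
    # errors="replace" guards against later bytes the chosen encoding cannot decode.
    return raw.decode(detect_encoding(sample), errors="replace")


def read_text_file(path: Path) -> Optional[str]:
    """Read a file from disk and decode it with decode_text."""
    return decode_text(path.read_bytes())
```

Trying UTF-8 first keeps the common case fast and deterministic; chardet is consulted only when strict UTF-8 fails, and Latin-1 is a last resort that always decodes, though possibly with the wrong characters.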

Thanks @filipchristiansen

@cyclotruc merged commit d6cb920 into main on Mar 4, 2025
18 checks passed
@atyrode deleted the refactor/ingestion branch on March 4, 2025 at 18:05
@filipchristiansen added a commit that referenced this pull request on Mar 13, 2025
Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
@filipchristiansen added a commit that referenced this pull request on Mar 13, 2025
Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
Signed-off-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
2 participants