Source: [Heart Disease Dataset](https://www.kaggle.com/datasets/redwankarimsony/heart-disease-data)
Description: The dataset includes various clinical features such as age, cholesterol levels, blood pressure, and more, aimed at determining the presence of heart disease.
Handled missing values with methods such as an iterative imputer or constant-value filling, encoded categorical variables, and scaled features.
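A minimal sketch of how these preprocessing steps might be combined with scikit-learn. The toy data, the column names (`age`, `chol`, `cp`), and the choice of `StandardScaler` and `OneHotEncoder` are assumptions for illustration, not necessarily what the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must come before IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with made-up values; column names are illustrative only.
df = pd.DataFrame({
    "age": [63, 41, np.nan, 57, 50],
    "chol": [233.0, np.nan, 250.0, 192.0, 210.0],
    "cp": ["typical", "atypical", np.nan, "typical", "non-anginal"],
})
numeric_cols = ["age", "chol"]
categorical_cols = ["cp"]

# Numeric columns: model-based imputation, then standardization.
numeric_pipe = Pipeline([
    ("impute", IterativeImputer(random_state=0)),
    ("scale", StandardScaler()),
])
# Categorical columns: fill with a constant label, then one-hot encode.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [("num", numeric_pipe, numeric_cols), ("cat", categorical_pipe, categorical_cols)],
    sparse_threshold=0,  # always return a dense array
)
X = preprocess.fit_transform(df)
print(X.shape)
```

Wrapping the steps in a `ColumnTransformer` keeps imputation and scaling statistics tied to the training data, which helps avoid leakage when the same object is later applied to a test split.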
Applied Recursive Feature Elimination (RFE) and Chi-Square tests to identify significant predictors, and plotted the importance of each feature.
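The two selection methods could look roughly like this in scikit-learn. The synthetic data, the logistic regression base estimator for RFE, and the choice of keeping 5 features are all assumptions made for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the heart disease features.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

# chi2 requires non-negative inputs, so rescale every feature to [0, 1].
X_pos = MinMaxScaler().fit_transform(X)

# RFE: repeatedly fit the estimator and drop the weakest feature.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X_pos, y)
rfe_mask = rfe.support_

# Chi-square: score each feature independently against the target.
kbest = SelectKBest(chi2, k=5).fit(X_pos, y)
chi2_mask = kbest.get_support()

print("RFE keeps:", rfe_mask.nonzero()[0])
print("chi2 keeps:", chi2_mask.nonzero()[0])
```

The chi-square test is intended for counts or categorical indicators, so on continuous clinical measurements its scores are best read as a rough ranking.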
Used PCA to reduce dimensionality while preserving most of the variance for the unsupervised machine learning models.
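As an illustration of the PCA step, the sketch below keeps enough components to explain 95% of the variance. The 95% threshold and the synthetic data are assumptions; the project may use a fixed number of components instead.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed features.
X, _ = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

# PCA is scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components whose cumulative
# explained variance reaches that fraction.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

print(X_reduced.shape, round(explained, 3))
```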
Implementation of multiple classifiers including Logistic Regression, Random Forest, Decision Tree, and Support Vector Machine (SVM).
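A sketch of training the four classifiers side by side. It runs on synthetic data with default or assumed hyperparameters, so the printed accuracies will not match the results reported in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart disease data.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with a scaler.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the held-out split
    print(f"{name}: {scores[name]:.2f}")
```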
Applied K-Means and hierarchical clustering to group similar samples.
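The two clustering approaches might be sketched as follows. The blob data, the choice of two clusters, and Ward linkage are assumptions for the example.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic points standing in for the (PCA-reduced) features.
X, _ = make_blobs(n_samples=200, centers=2, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# K-Means: partition by distance to iteratively updated centroids.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)

# Hierarchical (agglomerative): merge the closest clusters bottom-up.
hier_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X_scaled)

print(kmeans_labels[:10], hier_labels[:10])
```

Cluster label numbers are arbitrary, so the two methods can agree on the grouping while assigning swapped labels.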
Used GridSearch and RandomizedSearch to find the best parameters for the models.
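Assuming the searches refer to scikit-learn's `GridSearchCV` and `RandomizedSearchCV`, a tuning run could look like the sketch below. The random forest estimator, the small parameter grid, and the 3-fold cross-validation are illustrative choices only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in for the training data.
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=0)

# Hypothetical search space; the real grids are not given in this README.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, 5, None]}

# Grid search evaluates every combination (6 here).
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy")
grid.fit(X, y)

# Randomized search evaluates only n_iter sampled combinations.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=4,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
rand.fit(X, y)

print("grid:", grid.best_params_, round(grid.best_score_, 3))
print("random:", rand.best_params_, round(rand.best_score_, 3))
```

Randomized search trades exhaustiveness for speed, which tends to matter more as the search space grows.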
- Logistic Regression: 82% Accuracy
- Decision Tree: 80% Accuracy
- Random Forest: 84% Accuracy
- SVM: 84% Accuracy
Utilization of metrics like ROC curves and AUC scores to assess model performance.
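For reference, ROC points and the AUC score can be computed along these lines. The logistic regression model and synthetic data are assumptions; any classifier that exposes predicted probabilities or decision scores would work the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart disease data.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ROC analysis needs continuous scores, not hard class predictions.
probs = clf.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, probs)
auc = roc_auc_score(y_test, probs)
print(f"AUC: {auc:.3f}")
```

The `fpr` and `tpr` arrays can be passed directly to a plotting library to draw the curve.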
Saving trained models using joblib for future inference.
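A minimal sketch of the save and reload round trip with joblib. The file name `model.joblib` and the temporary directory are placeholders, not the paths used in this repository.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a small model on synthetic data.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Placeholder location; the repository may store models elsewhere.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)

# Later, in an inference script: reload and predict.
loaded = joblib.load(path)
same = bool((loaded.predict(X) == model.predict(X)).all())
print(same)
```

Models saved this way should be loaded with the same scikit-learn version that produced them, since the serialized format is not guaranteed to be stable across versions.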
- Clone the repository:
git clone https://github.com/yahia997/SprintUp-AI-ML-project.git
cd SprintUp-AI-ML-project
- Install dependencies:
pip install -r requirements.txt