Questions tagged [model-evaluations]
This tag is for questions about how to evaluate a model's performance, not only with standard metrics but also in the context of real-world applications. What makes a model good can depend on many factors, all of which must be weighed to build genuinely useful data science applications.
369 questions
5 votes, 2 answers, 65 views
What if my model performs well even on a small training set?
I was working on a dataset that is available on Kaggle. At first, I split my data with a train-test ratio of 90:10. Then I fit 24 different models (6 different regressors with 4 different ...
2 votes, 0 answers, 29 views
How can I evaluate an LLM’s reliability for use in high-stakes, risk-sensitive decision support?
I’m working with a large language model that has been configured to behave conservatively in high-stakes contexts:
it refuses unsafe or ambiguous user requests,
prioritizes client welfare over ...
0 votes, 0 answers, 18 views
Reporting results with a somewhat high standard deviation within nested CV
I'm working on a binary classification problem to identify struggling students. My dataset contains 10 features and 200 samples, and I use nested CV. The distribution of the target variable is 58%/...
4 votes, 1 answer, 44 views
How Do You Balance Feature Search Strategy and HP Optimization Cost?
What I’m trying to figure out
I'm working on a machine learning project and would love to hear your thoughts on two things:
A. How to prioritize feature exploration
B. Whether to fix hyperparameters (...
2 votes, 1 answer, 70 views
In production, how do you evaluate the quality of the response generated by a RAG system?
I am working on a use case where I need to get the right answer and send it to the user. I have been struggling for some time to find a reliable metric that tells me when an answer is correct.
The ...
6 votes, 2 answers, 217 views
Normalization strategy after combining train and validation sets for final training
I'm working on a classification task using PyTorch and Optuna. I originally split my dataset into three parts: training, validation, and test. I fit a MinMaxScaler only on the training set and applied ...
4 votes, 3 answers, 110 views
Can cross validation for tuning and LOO for evaluation on the exact same dataset cause bias?
I read two articles by the same author in which he uses the whole dataset for hyperparameter optimisation with CV and then evaluates the model with the best hyperparameters using leave-one-out on the ...
2 votes, 0 answers, 71 views
How to evaluate a new policy given a historical dataset?
Suppose I have a dataset where, for each observation, we observe the loan's interest rate and whether the customer defaulted (i.e., failed to repay the loan). The interest rate is determined by a ...
2 votes, 0 answers, 34 views
Evaluation of token importance attribution based on human rationales
I am working on evaluating an explainability method for a text classification model that predicts whether a given text sequence contains hate speech or not.
The method outputs token-level importance ...
2 votes, 0 answers, 143 views
Evaluating model performance when used in targeting decisions
I have a logistic regression model, the output of which is used to make decisions.
I am testing an improved version of this model. In testing, it has substantially better log loss than the old model.
When ...
3 votes, 1 answer, 48 views
Evaluation of model on imperfect validation set
I would like help with evaluating my classification model. It is a typical model that, for each input, produces a vector of floats representing label probabilities, and I classify the ...
0 votes, 0 answers, 44 views
Data Exploration - Uneven Sampling Frequency
I apologize in advance for the noob question; this is the first ML project I have attempted, although I have some stats background. I am in the data exploration phase for a project, where I am ...
0 votes, 0 answers, 34 views
Getting low accuracy while using QSVM
I am trying to predict the weather using QSVM. The dataset I am using can be seen here:
Dataset: https://www.kaggle.com/datasets/muthuj7/weather-dataset
I am using ZZFeatureMap and Linear Quantum Kernel. ...
5 votes, 3 answers, 140 views
Same validation curves for training and test dataset
I am learning machine learning by myself. I am applying logistic regression to the Weather Forecast dataset from Kaggle, Weather_data. The goal is to predict Rain according to the given features and the ...
1 vote, 0 answers, 62 views
Is overfitting always bad?
I have trained my model for the first time and run inference on random images. When I tried a random image with a camera position similar to that of my dataset, it detected the river well. But when it’s ...