Questions tagged [model-evaluations]

This tag is for questions about how to evaluate a model's performance, not only with standard metrics but also in the context of real use cases. What counts as a good model can depend on many factors, all of which must be taken into account to build genuinely useful data science applications.

369 questions


5 votes

2 answers

65 views

What if my model performs well even on small training set?

I was working on a dataset that is available on Kaggle. At first, I split my data with a train-test ratio of 90:10. Then I fit 24 different models (6 different regressors with 4 different ...

cross-validation
model-evaluations
rmse
test

ArshakParsa

51

asked Nov 18 at 11:27

2 votes

0 answers

29 views

How can I evaluate an LLM’s reliability for use in high-stakes, risk-sensitive decision support?

I’m working with a large language model that has been configured to behave conservatively in high-stakes contexts: it refuses unsafe or ambiguous user requests, prioritizes client welfare over ...

machine-learning
model-evaluations
llm
explainable-ai

Rex H

21

asked Nov 14 at 12:51

0 votes

0 answers

18 views

Reporting results with a somewhat high standard deviation in nested CV

I'm working on a binary classification problem to identify struggling students. My dataset contains 10 features and 200 samples. I implemented nested CV; the distribution of the target variable is 58%/...

machine-learning
classification
cross-validation
model-evaluations
performance

Youness Belhaj

1

asked Oct 22 at 2:13

4 votes

1 answer

44 views

How Do You Balance Feature Search Strategy and HP Optimization Cost?

What I’m trying to figure out: I'm working on a machine learning project and would love to hear your thoughts on two things: (A) how to prioritize feature exploration, and (B) whether to fix hyperparameters (...

machine-learning
feature-selection
feature-engineering
model-evaluations
hyperparameter-tuning

Ten

41

asked Oct 20 at 11:49

2 votes

1 answer

70 views

In production, how do you evaluate the quality of the response generated by a RAG system?

I am working on a use case where I need to get the right answer and send it to the user. I have been struggling for some time to find a reliable metric that tells me when an answer is correct. The ...

model-evaluations
generative-models
information-retrieval
rag

Espoir Murhabazi

231

asked Oct 13 at 15:57

6 votes

2 answers

217 views

Normalization strategy after combining train and validation sets for final training

I'm working on a classification task using PyTorch and Optuna. I originally split my dataset into three parts: training, validation, and test. I fit a MinMaxScaler only on the training set and applied ...

data
model-evaluations
normalization
data-leakage

Antonio Rossi

331

asked Jul 12 at 8:09

4 votes

3 answers

110 views

Can cross validation for tuning and LOO for evaluation on the exact same dataset cause bias?

I read two articles by the same guy where he uses the whole dataset for hyperparameter optimisation with CV and then evaluates the model with the best hyperparameters using leave-one-out on the ...

cross-validation
overfitting
model-evaluations
hyperparameter-tuning
bias

Lisana Daniel

55

asked May 14 at 17:55

2 votes

0 answers

71 views

How to evaluate a new policy given a historical dataset?

Suppose I have a dataset where, for each observation, we observe the loan's interest rate and whether the customer defaulted (i.e., failed to repay the loan). The interest rate is determined by a ...

classification
model-evaluations

Aaron

231

asked Apr 15 at 19:26

2 votes

0 answers

34 views

Evaluation of token importance attribution based on human rationales

I am working on evaluating an explainability method for a text classification model that predicts whether a given text sequence contains hate speech or not. The method outputs token-level importance ...

model-evaluations
language-model
explainable-ai
feature-importances

Marc

21

asked Apr 8 at 17:18

2 votes

0 answers

143 views

Evaluating model performance when used in targeting decisions

I have a logistic regression model, the output of which is used to make decisions. I am testing an improved version of this model. In testing, it has substantially improved logloss vs old model. When ...

xgboost
logistic-regression
model-evaluations

user179361

29

asked Mar 24 at 16:34

3 votes

1 answer

48 views

Evaluation of model on imperfect validation set

I would like help with the evaluation of my classification model. It is a typical model that, for each input, produces a vector of floats that represent label probabilities, and I classify the ...

model-evaluations

Keeehi

31

asked Mar 4 at 10:47

0 votes

0 answers

44 views

Data Exploration - Uneven Sampling Frequency

I apologize in advance for the noob question; this is the first ML project that I have attempted, although I have some stats background. I am in the data exploration phase for a project, where I am ...

time-series
regression
data-cleaning
model-evaluations

therinoa

1

asked Feb 26 at 21:07

0 votes

0 answers

34 views

Getting low accuracy while using QSVM

I am trying to predict weather using QSVM. The dataset I am using can be seen here: https://www.kaggle.com/datasets/muthuj7/weather-dataset I am using ZZFeatureMap and a linear quantum kernel. ...

machine-learning
machine-learning-model
svm
accuracy
model-evaluations

ahmad javaid

1

asked Feb 21 at 12:50

5 votes

3 answers

140 views

Same validation curves for training and test dataset

I am learning machine learning by myself. I am applying logistic regression to the Weather Forecast dataset from Kaggle (Weather_data). The goal is to predict Rain according to the given features and the ...

logistic-regression
training
model-evaluations

noreli

51

asked Jan 11 at 21:12

1 vote

0 answers

62 views

Is overfitting always bad?

I have trained my model for the first time and run inference on random images. When I tried a random image with a camera position similar to my dataset's, it detected the river well. But when it’s ...

training
transformer
model-evaluations
image-segmentation

Dean Debrio

11

asked Dec 30, 2024 at 3:50

Site design / logo © 2025 Stack Exchange Inc; user contributions licensed under CC BY-SA . rev 2025.12.11.37917