Questions tagged [loss-functions]
A function used to quantify the difference between observed data and the values predicted by a model. Minimizing a loss function is one way to estimate the model's parameters.
1,191 questions
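The tag description above can be made concrete with a minimal sketch (plain Python, made-up data): the "model" is a single constant μ, the loss is mean squared error, and gradient descent on the loss recovers the sample mean, which is the MSE minimizer.

```python
# Minimal sketch: estimating a parameter by minimizing a loss function.
# Model: a single constant mu. Loss: MSE(mu) = mean((x - mu)^2).
# Gradient descent on the loss converges to the sample mean.
data = [2.0, 4.0, 6.0, 8.0]

mu = 0.0   # initial parameter guess
lr = 0.1   # learning rate
for _ in range(200):
    # d/dmu of mean((x - mu)^2) is mean(2 * (mu - x))
    grad = sum(2 * (mu - x) for x in data) / len(data)
    mu -= lr * grad

print(round(mu, 4))  # converges to the sample mean, 5.0
```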
7 votes · 1 answer · 171 views
Why do “good” loss functions in ML need both Lipschitz continuity and smoothness?
I’m trying to understand the common assumptions in machine-learning optimization theory, where a “well-behaved” loss function is often required to be both L-Lipschitz and β-smooth (i.e., have β-...
0 votes · 0 answers · 16 views
Plotting Training VS Testing Curve
I am using the gradient boosting regressor from scikit-learn with squared error as the loss function. Then I want to plot the training-set vs. test-set curve. Based on what I read, it is used to see the ...
0 votes · 0 answers · 42 views
Multiplying probabilities of weights in Bayesian neural networks to formulate a prior
A key element in Bayesian neural networks is finding the probability of a set of weights, so that it can be applied in Bayes' rule.
I cannot think of many ways of doing this, for P(w) (also sometimes ...
2 votes · 1 answer · 125 views
A question about minimizing $l_2$ norm with regularization
PREMISES: this question likely arises from my very basic knowledge of the field. Please be very detailed in the answer, even if some facts seem trivial. Also, sorry for my poor English.
...
0 votes · 0 answers · 70 views
Why is my loss curve so steep at the beginning?
For different models with the same batch size, the starting loss and the loss after the steep part are very similar; is that normal?
With bigger batch sizes the axis gets rescaled, but the graph still has the same ...
4 votes · 1 answer · 316 views
A detail on how MSE loss works in PyTorch
Given two tensors $x$ and $y$, both of shape $(N,n)$ ($N$ being the number of samples and $n$ the number of dimensions of each sample), the MSE loss is (as I understand it):
$$
\mathrm{MSE}(x,y)=...
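As context for the entry above, a small sketch of the detail usually at issue: with inputs of shape $(N,n)$, PyTorch's `nn.MSELoss` with the default `reduction='mean'` averages the squared differences over all $N \cdot n$ elements, not just over the $N$ samples. The pure-Python sketch below (toy values for illustration) implements that definition without requiring PyTorch.

```python
# Elementwise MSE with 'mean' reduction over ALL elements, matching
# PyTorch's nn.MSELoss(reduction='mean') convention: divide by N * n.
def mse(x, y):
    """x, y: lists of N samples, each a list of n values."""
    diffs = [(xi - yi) ** 2
             for row_x, row_y in zip(x, y)
             for xi, yi in zip(row_x, row_y)]
    return sum(diffs) / len(diffs)  # divides by N * n, not by N

x = [[1.0, 2.0], [3.0, 4.0]]  # N=2 samples, n=2 dimensions
y = [[1.0, 0.0], [3.0, 2.0]]
print(mse(x, y))  # (0 + 4 + 0 + 4) / 4 = 2.0
```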
0 votes · 0 answers · 55 views
MSE Loss: Which target representation allows better focus on minority class learning?
Given these two target representations for the same underlying data:
Target A : Minority class samples (Cluster 5) isolated in distribution tail, majority class samples (Clusters 3+6) shifted toward ...
5 votes · 1 answer · 172 views
Distribution based loss for regression with unbounded data
Currently I am dealing with time-series data concerning the power consumption of machines. Therefore, all target variables technically range from zero to infinity ($y \in [0, \infty)$). The data ...
1 vote · 0 answers · 93 views
KL divergence and deep learning paradigm
My question is regarding the paradigm of deep learning: I do not understand where the cost functions come from. For example, for a classification task, are we treating the encoder as the expected value of ...
4 votes · 2 answers · 511 views
Proper loss functions in machine learning
Many textbooks on the theory of machine learning state that statistical decision theory provides the basis for comparing ML algorithms.
In statistical decision theory, decision rules are compared ...
0 votes · 0 answers · 80 views
What is a suitable loss function for predicting cos(φ) and sin(φ) of circular data using a CNN?
I want to predict an angular parameter ($\phi$) from some signal using a CNN. Due to the architecture of my code, the regression is done on the two targets ($\cos\phi$, $\sin\phi$).
I created a model ...
8 votes · 5 answers · 1k views
Have we been using the wrong objective function when training logistic regression?
The standard objective function when training a logistic regression model is:
Minimize Negative Log Likelihood
This form makes it easier to optimize, but it is mathematically equivalent to the more ...
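The objective named in the entry above can be written out directly. A minimal sketch (1-D toy data, made-up weights) of the negative log-likelihood for a logistic model, i.e. the sum of $-\log p(y_i \mid x_i)$ over the training set:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neg_log_likelihood(w, b, xs, ys):
    """Standard logistic-regression objective: sum of -log p(y_i | x_i)."""
    nll = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)  # predicted P(y = 1 | x)
        nll -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return nll

# Toy 1-D data (hypothetical values for illustration)
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
print(neg_log_likelihood(1.0, 0.0, xs, ys))  # ~0.8804
```

Because the log is monotone, minimizing this quantity is the same as maximizing the product of the per-sample likelihoods; the log form is preferred because sums are numerically stabler and easier to differentiate than products.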
4 votes · 1 answer · 141 views
Loss function that is minimized by an HPD interval with specific coverage?
This answer describes two loss functions for Bayesian credible intervals, each of which is minimized by a particular kind of interval. I am curious whether there exists a loss function on credible ...
1 vote · 0 answers · 69 views
Order sensitivity of scoring rules
This is from another question here.
The theorem below is from Lambert's paper on forecasting (Elicitation and Evaluation of Statistical Forecasts):
$\textbf{Proposition}\quad 1:$ Let $(\Theta = \{\...
1 vote · 0 answers · 29 views
Choice of estimator that minimizes expected loss [duplicate]
Let us say we have an i.i.d. sample of data from a random variable $X$. Suppose an agent must guess the value $x$ of $X$ that will be generated next. The guess is $\hat x$. They will make an error $e:=...