Questions tagged [kullback-leibler]
An asymmetric measure of distance (or dissimilarity) between probability distributions. It might be interpreted as the expected value of the log likelihood ratio under the alternative hypothesis.
542 questions
3 votes · 2 answers · 142 views
Why is KL Divergence notation $D_{KL}(P^* \parallel \hat{Q})$ "reversed" compared to Euclidean distance/difference $d(A,B)$ and subtraction $A-B$?
I'm having a fundamental disconnect between my intuition for KL divergence and the standard notation $D_{KL}(P \parallel Q)$. My intuition, which I believe is correct, is based on "excess ...
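For reference, the standard identity behind the notation (not part of the excerpt) reads the first argument as the distribution the expectation is taken under:
$$
D_{KL}(P \parallel Q) = \mathbb{E}_{x \sim P}\left[\log \frac{P(x)}{Q(x)}\right],
$$
so $P$ plays the role of the reference ("true") distribution and $Q$ the approximation, which is why the arguments are not interchangeable the way they are for a symmetric distance $d(A,B)$.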
0 votes · 0 answers · 61 views
Is the minimum error probability in binary hypothesis testing monotonically decreasing in the KL divergence?
For the following binary hypothesis testing problem
$$
\begin{aligned}
H_0: \boldsymbol{y} \sim f(\boldsymbol{y} | H_0)\\
H_1: \boldsymbol{y} \sim f(\boldsymbol{y} | H_1)
\end{aligned}
$$
where $\...
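For context, a standard pair of facts relating the two quantities (a sketch assuming equal priors, not taken from the excerpt): the minimum error probability is determined by the total variation distance, and Pinsker's inequality bounds total variation by KL divergence,
$$
P_e^{\min} = \tfrac{1}{2}\bigl(1 - \mathrm{TV}(f_0, f_1)\bigr),
\qquad
\mathrm{TV}(f_0, f_1) \le \sqrt{\tfrac{1}{2}\, D_{KL}(f_0 \parallel f_1)},
$$
where $f_j = f(\boldsymbol{y} \mid H_j)$. This yields a bound in terms of KL divergence, but not, by itself, monotonicity.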
13 votes · 1 answer · 576 views
Is Berk (1966)'s main theorem standard in Statistics/Probability? Is there a name for it?
Not a technical question, more of a curiosity from someone outside of Statistics/Probability.
The paper from Berk (1966), "Limiting Behavior of Posterior Distributions when the Model is Incorrect" ...
4 votes · 1 answer · 125 views
Is the i.i.d. assumption necessary for MLE?
I learnt from the MIT OCW course 18.650 that we need i.i.d. samples to derive the MLE from the KL divergence. But in the GLM framework the catch is that when we model the mean of the selected distribution, basically ...
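For context, the i.i.d. derivation the course presumably refers to is the standard one (a sketch): with samples $x_1,\dots,x_n \overset{\text{iid}}{\sim} P^*$, the law of large numbers turns the average log-likelihood into an expectation, which differs from $-D_{KL}(P^* \parallel P_\theta)$ only by a constant in $\theta$:
$$
\frac{1}{n}\sum_{i=1}^n \log p_\theta(x_i)
\;\xrightarrow{\;n\to\infty\;}\;
\mathbb{E}_{P^*}\bigl[\log p_\theta(X)\bigr]
= -H(P^*) - D_{KL}(P^* \parallel P_\theta),
$$
so maximizing the likelihood asymptotically minimizes the KL divergence.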
1 vote · 0 answers · 93 views
KL divergence and deep learning paradigm
My question is about the deep learning paradigm: I do not understand where the cost functions come from. For example, for a classification task, are we treating the encoder as the expected value of ...
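One standard identity that connects the usual classification loss to KL divergence (a sketch, not specific to any particular encoder set-up): for a fixed target distribution $p$ over classes and model output $q_\theta$,
$$
H(p, q_\theta) = H(p) + D_{KL}(p \parallel q_\theta),
$$
so minimizing the cross-entropy cost in $\theta$ is the same as minimizing the KL divergence, because $H(p)$ does not depend on $\theta$.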
4 votes · 0 answers · 120 views
Is there an exact decomposition of KL divergence into marginal mismatches and higher-order dependencies?
Exact hierarchical decomposition of KL divergence into marginals and higher‑order interactions
In the standard set‑up, you compare a joint distribution
$$
P(X_1,\dots,X_k)
$$
to an independent ...
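One exact identity in this direction (a sketch, assuming the reference $Q = \prod_i Q_i$ is a product distribution): the KL divergence splits into a marginal-mismatch term and a pure dependence term,
$$
D_{KL}\!\left(P \,\middle\|\, \prod_{i=1}^k Q_i\right)
= \sum_{i=1}^k D_{KL}(P_i \parallel Q_i)
\;+\; D_{KL}\!\left(P \,\middle\|\, \prod_{i=1}^k P_i\right),
$$
where $P_i$ are the marginals of $P$ and the last term is the total correlation (multi-information) of $P$; decomposing that term further into pairwise and higher-order interactions is the harder part of the question.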
0 votes · 0 answers · 22 views
Mutual information (also known as Kullback–Leibler divergence). Is that true? [duplicate]
I came across an article that stated the following:
However, from this discussion, mutual information is not equivalent to Kullback–Leibler divergence. I assume only one interpretation can be correct ...
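For reference, the standard relationship is that mutual information is one particular KL divergence, namely between the joint distribution and the product of the marginals, rather than KL divergence in general:
$$
I(X;Y) = D_{KL}\bigl(P_{XY} \parallel P_X \otimes P_Y\bigr).
$$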
3 votes · 1 answer · 182 views
Estimate Kullback-Leibler divergence with Monte Carlo when one of the distributions is simple
I'm interested in estimating $D_\mathrm{KL}(q \parallel p) = \int q(x) \log \frac{q(x)}{p(x)}\,\mathrm dx$, where $p$ is a multivariate Gaussian and $q$ is an implicit distribution parameterized by a ...
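A minimal Monte Carlo sketch of the basic estimator (assuming, unlike the implicit $q$ in the question, that both log-densities can be evaluated pointwise; the distributions and names here are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# p: the "simple" multivariate Gaussian
p = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))
# q: another Gaussian standing in for the implicit distribution
q = multivariate_normal(mean=np.array([1.0, 0.0]), cov=0.5 * np.eye(2))

# Sample from q and average the log-density ratio log q(x) - log p(x)
x = q.rvs(size=100_000, random_state=rng)
kl_estimate = np.mean(q.logpdf(x) - p.logpdf(x))
print(kl_estimate)  # unbiased Monte Carlo estimate of D_KL(q || p)
```

When $q$ is implicit, the $\log q$ term is exactly what cannot be evaluated, which is why the question is non-trivial.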
0 votes · 0 answers · 91 views
How to calculate the KL divergence between two multivariate complex Gaussian distributions?
I am reading a paper "Complex-Valued Variational Autoencoder: A Novel Deep Generative Model for Direct Representation of Complex Spectra"
In this paper, the author calculates the KL ...
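For context, the real-valued multivariate Gaussian case has the well-known closed form below; the circularly-symmetric complex case used in such papers is analogous (conjugate transposes, and no factor $\tfrac12$), though the excerpt does not say which convention the authors adopt:
$$
D_{KL}\bigl(\mathcal N(\mu_0,\Sigma_0) \,\|\, \mathcal N(\mu_1,\Sigma_1)\bigr)
= \tfrac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right)
+ (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0)
- k + \ln\frac{\det\Sigma_1}{\det\Sigma_0}\right].
$$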
1 vote · 1 answer · 135 views
How do I compute the KL divergence between my MCMC samples and the target distribution?
Assume $(E,\mathcal E,\lambda)$ is a $\sigma$-finite measure space and $\nu$ is a probability measure on $(E,\mathcal E)$ with $\nu\ll\lambda$. Furthermore, assume that $\mu=\sum_{i=0}^{n-1}\delta_{...
1 vote · 1 answer · 165 views
How exactly is empirical KL divergence defined, and how is it calculated?
Suppose that we have two independent identically distributed samples. The first sample looks like $x_1 , \ldots, x_n$ with $x_i \in \mathbb{R}^d$ for every $i$. The second sample looks like $y_1, \...
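A naive plug-in sketch for the one-dimensional case: bin both samples on a common grid and take the discrete KL of the two histograms (the function name and bin choice are illustrative; consistent estimation in higher dimensions typically relies on k-nearest-neighbour estimators instead):

```python
import numpy as np

def empirical_kl_1d(x, y, bins=50, eps=1e-12):
    # Common binning for both samples
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0                      # 0 * log(0/q) = 0 by convention
    return np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps)))

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=5000)
y = rng.normal(0.5, 1.0, size=5000)
print(empirical_kl_1d(x, y))          # true KL here is 0.5**2 / 2 = 0.125
```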
0 votes · 0 answers · 63 views
Quantification of KL divergence error for approximating a distribution over discrete random variables
I would like to know the following, which has been stated in some literature but never explicitly proved:
Consider a setup consisting of a binary vector of random variables of length n say $\vec{v}=(...
1 vote · 1 answer · 394 views
Generalised Jensen-Shannon divergence - What is a small JSD?
I am comparing the similarity between multiple distributions based on the output of different machine-learning models. I am applying the generalised JS divergence (wiki):
$$
JSD_{\pi_1,...,\pi_n}(p_1,....
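A small sketch of the usual definition, $\mathrm{JSD}_{\pi}(p_1,\dots,p_n) = H\!\left(\sum_i \pi_i p_i\right) - \sum_i \pi_i H(p_i)$, for discrete distributions (base-2 entropy, so the value is in bits and bounded above by $H(\pi) \le \log_2 n$, which gives one natural scale for judging "small"):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))        # base 2: result in bits

def generalised_jsd(ps, weights):
    ps = np.asarray(ps, dtype=float)
    w = np.asarray(weights, dtype=float)
    mixture = w @ ps                      # weighted mixture distribution
    return entropy(mixture) - np.sum(w * np.array([entropy(p) for p in ps]))

ps = [np.array([0.1, 0.9]), np.array([0.2, 0.8]), np.array([0.3, 0.7])]
print(generalised_jsd(ps, [1/3, 1/3, 1/3]))   # 0 only if all p_i are identical
```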
1 vote · 0 answers · 133 views
Proof of asymmetry of relative entropy (KL-divergence) $D(p∥q) \neq D(q∥p)$ [duplicate]
Unlike a real distance measure, relative entropy is not symmetric in the
sense that $D(p(x)∥q(x)) \neq D(q(x)∥p(x))$. It turns out that many information measures can be expressed by relative entropies....
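A concrete numerical check of the asymmetry (a minimal sketch with two Bernoulli-type distributions; the values are illustrative):

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))   # assumes p, q strictly positive

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
print(kl(p, q), kl(q, p))              # ~0.511 vs ~0.368: not equal
```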
1 vote · 0 answers · 29 views
Why are we using KL divergence over cross entropy? [duplicate]
I read this question
Why do we use Kullback-Leibler divergence rather than cross entropy in the t-SNE objective function?
and I cannot fully understand the answer.
If we're using KL divergence for the ...
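For reference, the identity usually invoked in those answers (assuming $p$ is fixed, as the input similarities are in t-SNE) is
$$
H(p, q) = H(p) + D_{KL}(p \parallel q),
$$
so the two objectives differ only by the constant $H(p)$ and share the same minimizer; the choice between them is then largely about interpretation (KL is zero at a perfect match), not optimization.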