$\DeclareMathOperator{\p}{P}$ $\DeclareMathOperator{\P}{P}$ $\DeclareMathOperator{\c}{^C}$ $\DeclareMathOperator{\or}{ or}$ $\DeclareMathOperator{\and}{ and}$ $\DeclareMathOperator{\var}{Var}$ $\DeclareMathOperator{\Var}{Var}$ $\DeclareMathOperator{\Std}{Std}$ $\DeclareMathOperator{\E}{E}$ $\DeclareMathOperator{\std}{Std}$ $\DeclareMathOperator{\Ber}{Bern}$ $\DeclareMathOperator{\Bin}{Bin}$ $\DeclareMathOperator{\Poi}{Poi}$ $\DeclareMathOperator{\Uni}{Uni}$ $\DeclareMathOperator{\Geo}{Geo}$ $\DeclareMathOperator{\NegBin}{NegBin}$ $\DeclareMathOperator{\Beta}{Beta}$ $\DeclareMathOperator{\Exp}{Exp}$ $\DeclareMathOperator{\N}{N}$ $\DeclareMathOperator{\R}{\mathbb{R}}$ $\DeclareMathOperator*{\argmax}{arg\,max}$ $\newcommand{\d}{\, d}$

Distance Between Distributions


Here is an important question! How can you quantify how different two distributions are? In other words, if I have two Probability Mass Functions, can I calculate a number that says how much they diverge from one another? Here are three reasonable ways to quantify distance between distributions:

Total Variation Distance

Loop over all possible values, sum up the absolute differences in probability, and divide the total by two.

Definition: Total Variation Distance

Let $X$ and $Y$ be discrete random variables

\begin{align*} \text{TV}(X, Y) = \frac{1}{2} \sum_{i} |\P(X = i) - \P(Y = i)| \end{align*}
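As a sketch of how you might compute this in Python (the dictionary representation and the name total_variation are illustrative choices, not part of any library):

def total_variation(pmf_x, pmf_y):
    """
    Total variation distance between two discrete PMFs, each
    represented as a dict mapping value -> probability.
    """
    support = set(pmf_x) | set(pmf_y)
    total_diff = 0
    for i in support:
        # a value missing from a dict has probability 0
        total_diff += abs(pmf_x.get(i, 0) - pmf_y.get(i, 0))
    return total_diff / 2

# for example, a fair coin vs a biased coin:
# total_variation({0: 0.5, 1: 0.5}, {0: 0.7, 1: 0.3}) returns 0.2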

Earth Mover's Distance

Imagine one distribution is a lump of dirt. How much work would it take to make it look just like the other? Also called the Wasserstein metric.

This value is very meaningful, but it doesn't have a closed-form equation. Instead, it is computed by solving a linear program. If the range of values that the random variables can take on is of size $n$, the linear program has a run time of $O(n^3 \cdot \log n)$, which is very slow.
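That said, for one-dimensional random variables scipy can compute the distance for you via stats.wasserstein_distance. Here is a minimal sketch, where the two PMFs are made up for illustration:

from scipy import stats

# hypothetical PMFs over the values 0, 1, 2, 3
values = [0, 1, 2, 3]
pmf_x = [0.1, 0.4, 0.4, 0.1]
pmf_y = [0.4, 0.1, 0.1, 0.4]

# pass the support as the values and the PMFs as the weights
emd = stats.wasserstein_distance(values, values, u_weights=pmf_x, v_weights=pmf_y)
print(emd)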

Kullback-Leibler Divergence

Let $X$ and $Y$ be discrete random variables. Calculate the expected "excess surprise" from using $Y$ as your Probability Mass Function instead of $X$, when the actual Probability Mass Function is $X$.

Definition: Kullback-Leibler Divergence

Let $X$ and $Y$ be discrete random variables

\begin{align*} \text{KL}(X, Y)  &= \sum_{x \in X} \text{ExcessSurprise}(x) \cdot P(X=x) \\ &= \sum_{x \in X} \Big[\text{Surprise}_Y(x) - \text{Surprise}_X(x)\Big] \cdot P(X=x) \\ &= \sum_{x \in X} \Big[\log_2\frac{1}{P(Y=x)} - \log_2 \frac{1}{P(X=x)}\Big] \cdot P(X=x) \\ &= \sum_{x \in X} \Big[\log_2 P(X=x) - \log_2 P(Y=x)\Big] \cdot P(X=x) \\ &= \sum_{x \in X}  \log_2 \frac{P(X=x)}{P(Y=x)} \cdot P(X=x) \\ \end{align*}

Here is example code for calculating how different an observed PMF of hurricanes per year is from a predicted Poisson distribution:

from scipy import stats
import math

def kl_divergence(predicted_lambda, observed_pmf):
    """
    We predicted that the number of hurricanes would be
    X ~ Poisson(predicted_lambda) and observed a real world
    number of hurricanes Y ~ observed_pmf. Assumes observed_pmf
    is nonzero for every value in the loop below.
    """
    X = stats.poisson(predicted_lambda)
    divergence = 0
    # loop over all the values of hurricanes
    for i in range(0, 40):
        pr_X_i = X.pmf(i)
        pr_Y_i = observed_pmf[i]
        # surprise is measured in bits, so use log base 2 to
        # match the definition of KL divergence above
        excess_surprise_i = math.log2(pr_X_i / pr_Y_i)
        divergence += excess_surprise_i * pr_X_i
    return divergence
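
Here is a hypothetical usage sketch. The observed PMF below is made up for illustration; note that it must assign nonzero probability to every value in the loop above, otherwise the code would divide by zero (the divergence would be infinite):

# hypothetical observed distribution: mostly 2 to 5 hurricanes per year,
# with a small baseline probability everywhere else in the range
observed_pmf = [0.01] * 40
observed_pmf[2] += 2.0
observed_pmf[3] += 3.0
observed_pmf[4] += 2.5
observed_pmf[5] += 1.5
total = sum(observed_pmf)
observed_pmf = [p / total for p in observed_pmf]  # normalize so it sums to 1

print(kl_divergence(predicted_lambda=3.5, observed_pmf=observed_pmf))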