$\DeclareMathOperator{\p}{P}$ $\DeclareMathOperator{\P}{P}$ $\DeclareMathOperator{\c}{^C}$ $\DeclareMathOperator{\or}{ or}$ $\DeclareMathOperator{\and}{ and}$ $\DeclareMathOperator{\var}{Var}$ $\DeclareMathOperator{\Var}{Var}$ $\DeclareMathOperator{\Std}{Std}$ $\DeclareMathOperator{\E}{E}$ $\DeclareMathOperator{\std}{Std}$ $\DeclareMathOperator{\Ber}{Bern}$ $\DeclareMathOperator{\Bin}{Bin}$ $\DeclareMathOperator{\Poi}{Poi}$ $\DeclareMathOperator{\Uni}{Uni}$ $\DeclareMathOperator{\Geo}{Geo}$ $\DeclareMathOperator{\NegBin}{NegBin}$ $\DeclareMathOperator{\Beta}{Beta}$ $\DeclareMathOperator{\Exp}{Exp}$ $\DeclareMathOperator{\N}{N}$ $\DeclareMathOperator{\R}{\mathbb{R}}$ $\DeclareMathOperator*{\argmax}{arg\,max}$ $\newcommand{\d}{\, d}$

Normal Distribution


The single most important random variable type is the Normal (aka Gaussian) random variable, parametrized by a mean ($\mu$) and variance ($\sigma^2$), or sometimes equivalently written as mean and variance ($\sigma^2$). If $X$ is a normal variable we write $X \sim N(\mu, \sigma^2)$. The normal is important for many reasons: it is generated from the summation of independent random variables and as a result it occurs often in nature. Many things in the world are not distributed normally but data scientists and computer scientists model them as Normal distributions anyways. Why? Because it is the most entropic (conservative) modelling decision that we can make for a random variable while still matching a particular expectation (average value) and variance (spread).

The Probability Density Function (PDF) for a Normal $X \sim N(\mu, \sigma^2)$ is: \begin{align*} f_X(x) = \frac{1}{\sigma \sqrt{2\pi} } e ^{\frac{-(x-\mu)^2}{2\sigma^2}} \end{align*} Notice the $x$ in the exponent of the PDF function. When $x$ is equal to the mean ($\mu$) then e is raised to the power of $0$ and the PDF is maximized.

By definition a Normal has $\E[X] = \mu$ and $\var(X) = \sigma^2$.

There is no closed form for the integral of the Normal PDF, and as such there is no closed form CDF. However we can use a transformation of any normal to a normal with a precomputed CDF. The result of this mathematical gymnastics is that the CDF for a Normal $X \sim N(\mu, \sigma^2)$ is: \begin{align*} F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right) \end{align*}

Where $\Phi$ is a precomputed function that represents that CDF of the Standard Normal.

Normal (aka Gaussian) Random Variable

Notation: $X \sim \N(\mu, \sigma^2)$
Description: A common, naturally occuring distribution.
Parameters: $\mu \in \mathbb{R}$, the mean.
$\sigma^2 \in \mathbb{R}$, the variance.
Support: $x \in \mathbb{R}$
PDF equation: $$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\Big(\frac{x-\mu}{\sigma}\Big)^2}$$
CDF equation: $$\begin{align} F(x) &= \phi(\frac{x-\mu}{\sigma}) && \text{Where $\phi$ is the CDF of the standard normal} \end{align}$$
Expectation: $\E[X] = \mu$
Variance: $\var(X) = \sigma^2$
PDF graph:
Parameter $\mu$:
Parameter $\sigma$:

Linear Transform

If $X$ is a Normal such that $X \sim N(\mu, \sigma^2)$ and $Y$ is a linear transform of $X$ such that $Y = aX + b$ then $Y$ is also a Normal where: \begin{align*} Y \sim N(a\mu + b, a^2\sigma^2) \end{align*}

Projection to Standard Normal

For any Normal $X$ we can find a linear transform from $X$ to the standard normal $Z \sim N(0, 1)$. Note that $Z$ is the typical notation choice for the standard normal. For any normal, if you subtract the mean ($\mu$) of the normal and divide by the standard deviation ($\sigma$) the result is always the standard normal. We can prove this mathematically. Let $W = \frac{X - \mu}{\sigma}$: \begin{align*} W &= \frac{X -\mu}{\sigma} && \text{ Transform X: Subtract by $\mu$ and diving by $\sigma$} \\ & = \frac{1}{\sigma}X - \frac{\mu}{\sigma} && \text{ Use algebra to rewrite the equation}\\ & = aX + b && \text{ Linear transform where $a = \frac{1}{\mu}$, $b = - \frac{\mu}{\sigma}$ }\\ &\sim N(a\mu + b, a^2\sigma^2) && \text{ The linear transform of a Normal is another Normal}\\ &\sim N(\frac{\mu}{\sigma} - \frac{\mu}{\sigma}, \frac{\sigma^2}{\sigma^2}) && \text{ Substituting values in for $a$ and $b$}\\ &\sim N(0, 1) && \text{ The standard normal} \end{align*}

Using this transform we can express $F_X(x)$, the CDF of $X$, in terms of the known CDF of $Z$, $F_Z(x)$. Since the CDF of $Z$ is so common it gets its own Greek symbol: $\Phi(x)$ \begin{align*} F_X(x) &= P(X \leq x) \\ &= P\left(\frac{X - \mu}{\sigma} \leq \frac{x - \mu}{\sigma}\right) \\ &= P\left(Z \leq \frac{x - \mu}{\sigma}\right)\\ &= \Phi\left(\frac{x - \mu}{\sigma}\right) \end{align*}

The values of $\Phi(x)$ can be looked up in a table. Every modern programming language also has the ability to calculate the CDF of a normal random variable!

Example: Let $X\sim \mathcal{N}(3, 16)$, what is $P(X > 0)$? \begin{align*} P(X > 0) &= P\left(\frac{X-3}{4} > \frac{0-3}{4}\right) = P\left(Z > -\frac{3}{4}\right) = 1 - P\left(Z \leq -\frac{3}{4}\right)\\ &= 1 - \Phi(-\frac{3}{4}) = 1 - (1- \Phi(\frac{3}{4})) = \Phi(\frac{3}{4}) = 0.7734 \end{align*} What is $P(2 < X < 5)$? \begin{align*} P(2 < X < 5) &= P\left(\frac{2 - 3}{4} < \frac{X-3}{4} < \frac{5-3}{4}\right) = P\left(-\frac{1}{4} < Z < \frac{2}{4}\right)\\ &= \Phi(\frac{2}{4})-\Phi(-\frac{1}{4}) = \Phi(\frac{1}{2})-(1 - \Phi(\frac{1}{4})) = 0.2902 \end{align*}

Example: You send voltage of 2 or -2 on a wire to denote 1 or 0. Let $X$ = voltage sent and let $R$ = voltage received. $R = X + Y$, where $Y \sim \mathcal{N}(0, 1)$ is noise. When decoding, if $R \geq 0.5$ we interpret the voltage as 1, else 0. What is $P(\text{error after decoding}|\text{original bit} = 1)$? \begin{align*} P(X + Y < 0.5) &= P(2 + Y < 0.5) \\ &= P(Y < -1.5) \\ &= \Phi(-1.5) \\ &\approx 0.0668 \end{align*}

Example: The 67% rule of a normal within one standard deviation. What is the probability that a normal variable $X \sim \N(\mu, \sigma)$ has a value within one standard deviation of its mean?

$$ \begin{align} \p(\text{Within one }\sigma \text{ of } \mu) &= \p(\mu - \sigma < X < \mu + \sigma) \\ &= \p(X < \mu + \sigma) - \p(X < \mu - \sigma) && \text{Prob of a range} \\ &= \Phi\Big(\frac{(\mu + \sigma)-\mu}{\sigma}\Big) - \Phi\Big(\frac{(\mu - \sigma)-\mu}{\sigma}\Big) && \text{CDF of Normal} \\ &= \Phi\Big(\frac{\sigma}{\sigma}\Big) - \Phi\Big(\frac{- \sigma}{\sigma}\Big) && \text{Cancel $\mu$s} \\ &= \Phi(1) - \Phi(-1) && \text{Cancel $\sigma$s} \\ &\approx 0.8413 - 0.1587 \approx 0.683 && \text{Plug into $\Phi$} \end{align} $$ We made no assumption about the value of $\mu$ or the value of $\sigma$ so this will apply to every single normal random variable. Since it uses the Normal CDF this doesn't apply to other types of random variables.