$\DeclareMathOperator{\p}{P}$ $\DeclareMathOperator{\P}{P}$ $\DeclareMathOperator{\c}{^C}$ $\DeclareMathOperator{\or}{ or}$ $\DeclareMathOperator{\and}{ and}$ $\DeclareMathOperator{\var}{Var}$ $\DeclareMathOperator{\Var}{Var}$ $\DeclareMathOperator{\Std}{Std}$ $\DeclareMathOperator{\E}{E}$ $\DeclareMathOperator{\std}{Std}$ $\DeclareMathOperator{\Ber}{Bern}$ $\DeclareMathOperator{\Bin}{Bin}$ $\DeclareMathOperator{\Poi}{Poi}$ $\DeclareMathOperator{\Uni}{Uni}$ $\DeclareMathOperator{\Geo}{Geo}$ $\DeclareMathOperator{\NegBin}{NegBin}$ $\DeclareMathOperator{\Beta}{Beta}$ $\DeclareMathOperator{\Exp}{Exp}$ $\DeclareMathOperator{\N}{N}$ $\DeclareMathOperator{\R}{\mathbb{R}}$ $\DeclareMathOperator*{\argmax}{arg\,max}$ $\newcommand{\d}{\, d}$

Multivariate Gaussian


By Lisa Yan and Chris Piech

We often work with multiple jointly Normal (Gaussian) random variables, whose joint distribution is known as the Multivariate Normal (Gaussian) distribution. Here, we cover the two-dimensional case, called the Bivariate Normal distribution. $X_1$ and $X_2$ follow a bivariate normal distribution if their joint PDF is \[ f(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} e^{-\dfrac{1}{2(1-\rho^2)} \left( \dfrac{(x_1 - \mu_1)^2}{\sigma_1^2} - \dfrac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1 \sigma_2} + \dfrac{(x_2 - \mu_2)^2}{\sigma_2^2} \right)}\]
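As a quick sanity check, here is a minimal Python sketch (not part of the original text) that evaluates this joint PDF directly from the formula. The parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def bivariate_normal_pdf(x1, x2, mu1, mu2, sigma1, sigma2, rho):
    # The quadratic form inside the exponent, term by term.
    z = ((x1 - mu1)**2 / sigma1**2
         - 2 * rho * (x1 - mu1) * (x2 - mu2) / (sigma1 * sigma2)
         + (x2 - mu2)**2 / sigma2**2)
    # The normalizing constant out front.
    norm_const = 2 * np.pi * sigma1 * sigma2 * np.sqrt(1 - rho**2)
    return np.exp(-z / (2 * (1 - rho**2))) / norm_const

# Hypothetical parameter values, chosen only for illustration.
print(bivariate_normal_pdf(0.5, 1.5, mu1=0.0, mu2=1.0,
                           sigma1=1.0, sigma2=2.0, rho=0.6))
```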

This is a parametric probabilistic model. In fact, a bivariate normal distribution has five parameters: $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho$. We often write the distribution of the vector $\mathbf{X} = (X_1, X_2)$ as $\mathbf{X}\sim \mathcal{N}(\vec{\mu}, \vec{\Sigma})$, where $\vec{\mu} = (\mu_1, \mu_2)$ is the mean vector and $\vec{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{bmatrix}$ is the covariance matrix.
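Here is a short sketch of the vector/matrix parameterization, using the same hypothetical parameter values as above and SciPy's `multivariate_normal`. It should print the same density as the hand-computed version.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2, sigma1, sigma2, rho = 0.0, 1.0, 1.0, 2.0, 0.6  # hypothetical values

mu = np.array([mu1, mu2])                      # mean vector
Sigma = np.array([
    [sigma1**2,             rho * sigma1 * sigma2],
    [rho * sigma1 * sigma2, sigma2**2            ],
])                                             # covariance matrix

X = multivariate_normal(mean=mu, cov=Sigma)    # frozen bivariate normal
print(X.pdf([0.5, 1.5]))   # matches the formula-based computation above
```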

Note that $\rho$ is the correlation between $X_1$ and $X_2$, and $\sigma_1, \sigma_2 > 0$. We defer to Ross Chapter 6, Example 5d, for the full proof, but it can be shown that the marginal distributions of $X_1$ and $X_2$ are $X_1\sim\mathcal{N}(\mu_1, \sigma_1^2)$ and $X_2\sim\mathcal{N}(\mu_2, \sigma_2^2)$, respectively.
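One way to see the marginal result empirically (a sketch, again with assumed parameter values): draw many samples of $(X_1, X_2)$ and look at the first coordinate alone. Its sample mean and standard deviation should be close to $\mu_1$ and $\sigma_1$.

```python
import numpy as np

mu = np.array([0.0, 1.0])            # hypothetical mu_1, mu_2
Sigma = np.array([[1.0, 1.2],
                  [1.2, 4.0]])       # sigma_1 = 1, sigma_2 = 2, rho = 0.6
rng = np.random.default_rng(109)
samples = rng.multivariate_normal(mu, Sigma, size=100_000)

x1 = samples[:, 0]            # look at X_1 alone, ignoring X_2
print(x1.mean(), x1.std())    # should be close to mu_1 = 0 and sigma_1 = 1
```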

Independent Normal RVs

Let $\mathbf{X}=(X_1, X_2)\sim\mathcal{N}(\vec{\mu}, \vec{\Sigma})$, where $\vec{\mu} = (\mu_1, \mu_2)$ and $\vec{\Sigma} = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}$, a diagonal covariance matrix.

Noting that the correlation between $X_1$ and $X_2$ is $\rho = 0$: \begin{align*} f(x_1, x_2) &= \frac{1}{2\pi \sigma_1 \sigma_2 } e^{-\dfrac{1}{2} \left( \dfrac{(x_1 - \mu_1)^2}{\sigma_1^2} + \dfrac{(x_2 - \mu_2)^2}{\sigma_2^2} \right)} \\ &= \frac{1}{\sigma_1 \sqrt{2\pi}} e^{-(x_1 - \mu_1)^2/(2 \sigma_1^2)} \cdot \frac{1}{\sigma_2 \sqrt{2\pi}} e^{-(x_2 - \mu_2)^2/(2 \sigma_2^2)}\\ &= f(x_1) \cdot f(x_2) \end{align*}
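A small numerical check of this factorization (a sketch with assumed parameter values): with $\rho = 0$, the joint density at a point should equal the product of the two marginal densities there. Both print statements below should give the same number.

```python
from scipy.stats import multivariate_normal, norm

mu1, mu2, sigma1, sigma2 = 0.0, 1.0, 1.0, 2.0   # hypothetical values, rho = 0

joint = multivariate_normal(mean=[mu1, mu2],
                            cov=[[sigma1**2, 0.0],
                                 [0.0, sigma2**2]])

x1, x2 = 0.5, 1.5
print(joint.pdf([x1, x2]))                                    # joint density
print(norm.pdf(x1, mu1, sigma1) * norm.pdf(x2, mu2, sigma2))  # product of marginals
```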

Since the joint PDF factorizes into the product of the marginal PDFs, if $\text{Cov}(X_1, X_2) = 0$ for bivariate normal RVs, then $X_1$ and $X_2$ are independent. Wild! (In general, zero covariance does not imply independence, so this is a special property of the normal.) More generally, you can model a collection of jointly continuous random variables $\vec{X} = (X_1, \dots, X_n)$ as independent normals with means $\vec{\mu} = (\mu_1, \dots, \mu_n)$ and standard deviations $\vec{\sigma} = (\sigma_1, \dots, \sigma_n)$. If you do so, the joint PDF is: \begin{align*} f(\vec{x}) &= \prod_{i=1}^n f(x_i) \\ &= \prod_{i=1}^n \frac{1}{\sigma_i \sqrt{2\pi} } e ^{-(x_i-\mu_i)^2/(2\sigma_i^2)} \end{align*} and the joint CDF is \begin{align*} F(\vec{x}) &= \prod_{i=1}^n F(x_i) \\ &= \prod_{i=1}^n \Phi\left(\frac{x_i-\mu_i}{\sigma_i}\right) \end{align*}
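Here is a brief sketch of these product forms, with hypothetical means and standard deviations. Each coordinate is handled by SciPy's one-dimensional `norm`, and the products are taken across coordinates.

```python
import numpy as np
from scipy.stats import norm

mu = np.array([0.0, 1.0, -2.0])     # hypothetical means mu_1 ... mu_n
sigma = np.array([1.0, 2.0, 0.5])   # hypothetical std devs sigma_1 ... sigma_n
x = np.array([0.3, 1.2, -1.9])      # point at which to evaluate

pdf = np.prod(norm.pdf(x, mu, sigma))        # f(x) = prod_i f(x_i)
cdf = np.prod(norm.cdf((x - mu) / sigma))    # F(x) = prod_i Phi((x_i - mu_i) / sigma_i)
print(pdf, cdf)
```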