$\DeclareMathOperator{\p}{P}$ $\DeclareMathOperator{\P}{P}$ $\DeclareMathOperator{\c}{^C}$ $\DeclareMathOperator{\or}{ or}$ $\DeclareMathOperator{\and}{ and}$ $\DeclareMathOperator{\var}{Var}$ $\DeclareMathOperator{\Var}{Var}$ $\DeclareMathOperator{\Std}{Std}$ $\DeclareMathOperator{\E}{E}$ $\DeclareMathOperator{\std}{Std}$ $\DeclareMathOperator{\Ber}{Bern}$ $\DeclareMathOperator{\Bin}{Bin}$ $\DeclareMathOperator{\Poi}{Poi}$ $\DeclareMathOperator{\Uni}{Uni}$ $\DeclareMathOperator{\Geo}{Geo}$ $\DeclareMathOperator{\NegBin}{NegBin}$ $\DeclareMathOperator{\Beta}{Beta}$ $\DeclareMathOperator{\Exp}{Exp}$ $\DeclareMathOperator{\N}{N}$ $\DeclareMathOperator{\R}{\mathbb{R}}$ $\DeclareMathOperator*{\argmax}{arg\,max}$ $\newcommand{\d}{\, d}$

Bayesian Viral Load Test


Question from Fall 2022 Stanford Midterm

We are going to build a Bayesian Viral Load Test which updates a belief distribution regarding a patient's viral load. Though viral load is continuous, in our test we represent it by discretizing the quantity into whole numbers between 0 and 99, inclusive. The units of viral load are the number of viral instances per million samples.

If a person has a viral load of 9 (in other words, 9 viruses out of every 1 million samples) what is the probability that a random sample from the person is a virus?

$$\frac{9}{1,000,000}$$

We test 100,000 samples from one person for the virus. If the person's true viral load is 9, what is the probability that exactly 1 of our 100,000 samples is a virus? Use a computationally efficient approximation to compute your answer. Your approximation should respect that there is 0 probability of getting negative virus samples.

Let's define a random variable $X$, the number of samples that are viral given the true viral load is 9. The question is asking for $P(X = 1)$. We can think about this as a binomial process, where the number of trials $n$ is the number of samples and the probability $p$ is the probability that a sample is viral. \[ n = 100,000, p = \frac{9}{1,000,000} \] Notice that $n$ is very small and $p$ is very large, so we can use the Poisson approximation to approximate our answer. We find $\lambda = np = 100,000 \cdot 9/1,000,000 = 0.9$, so $X \sim$\Poi$(\lambda = 0.9)$. The last step is to use the PMF of the Poisson distribution. \[ P(X = 1) = \frac{(0.9)^1 e^{-0.9}}{1!} \]

Based on what we know about a patient (their symptoms and personal history) we have encoded a prior belief in a list prior where prior[i] is the probability that the viral load equals i. prior is of length 100 and has keys 0 through 99.

Write an equation for the updated probability that the true viral load is $i$ given that we observe a count of 1 virus sample out of 100,000 tested. Recall that $0 \leq i \leq 99$. You may use approximations.

We want to find \begin{equation} P(\text{viral load} = i | \text{observed count of} \frac{1}{100000}) \end{equation} We can apply Bayes Rule to get \begin{equation}\label{eq:bayes} = \frac{P(\text{observed count of} \frac{1}{100000} | \text{viral load} = i)P(\text{viral load} = i)}{P(\text{observed count of} \frac{1}{100000})} \end{equation} We know that we can define a random variable $X \sim \text{observed count out of 100,000} | \text{viral load} = i$, and we can model $X$ as a Poisson approximation to a binomial with $n = 100000$ and $p = \frac{i}{1000000}$, with \[ \lambda = np = 100000 \cdot \frac{i}{1000000} = \frac{i}{10} \] So $X$ can be written as \[ X \sim Poi(\lambda = \frac{i}{10}) \] Now we can rewrite our Bayes Rule equation as \[ = \frac{P(X = 1)P(\text{viral load} = i)}{P(\text{observed count of} \frac{1}{100000})} \] We can now use the Poisson PMF and our given \texttt{prior} to get: \begin{equation} = \frac{\frac{\frac{i}{10}e^{\frac{-i}{10}}}{1!}\cdot \texttt{prior[i]}}{P(\text{observed count of} \frac{1}{100000})} \end{equation} We now need to expand our denominator. We can use the General Law of Total Probability to expand \[ P(\text{observed count of} \frac{1}{100000}) = \sum_{j = 0}^{99} P(\text{observed count of} \frac{1}{100000} | \text{viral load} = i)P(\text{viral load} = i) \] We can rewrite this as \[ = \sum_{j = 0}^{99} \frac{\frac{j}{10}e^{\frac{-j}{10}}}{1!}\cdot \texttt{prior[j]} \] \[ = \sum_{j = 0}^{99} \frac{j}{10}e^{\frac{-j}{10}}\cdot \texttt{prior[j]} \] And finally, we can plug this in to get \[ \boxed{% \frac{ \frac{i}{10} e^{\frac{-i}{10}} \cdot \texttt{prior[i]}}{\sum_{j = 0}^{99} \frac{j}{10}e^{\frac{-j}{10}}\cdot \texttt{prior[j]}}% }. \]