Unveiling the Reliability of COVID-19 Test Results: A Probabilistic Approach using Bayes' Rule and Python Simulation

“Are COVID-19 tests accurate? Let’s ask Thomas Bayes”, Nir Regev
For more content check out the Circuit of Knowledge.

Introduction:

In the midst of the COVID-19 pandemic, accurate testing plays a pivotal role in controlling the spread of the virus and making informed decisions. However, interpreting test results is not always straightforward, as it depends on various factors such as the prevalence of the disease in the population and the characteristics of the test itself. In this blog post, we will delve into a probabilistic approach to analyze the reliability of COVID-19 test results using Bayes' Rule and complement our analysis with a Python simulation.

Problem Statement:

Consider a city experiencing a new outbreak of COVID-19. The health department estimates that, at this moment, 1% of the city's population is infected with the virus. A pharmaceutical company has developed a new COVID-19 test with a sensitivity (true positive rate) of 95% and a specificity (true negative rate) of 98%. Sensitivity is the probability that the test correctly identifies an infected person as positive, while specificity is the probability that the test correctly identifies a non-infected person as negative.

Our objectives are to determine:

  1. The probability that a person who tests positive is actually infected with the virus.
  2. The probability that a person who tests negative is actually not infected with the virus.

Solution:

To tackle this problem, we will employ Bayes' Rule, a fundamental principle in probability theory that allows us to update our beliefs based on new evidence. Let's define the following events:

  • A: The event that a person is infected with COVID-19.
  • A': The event that a person is not infected with COVID-19.
  • B: The event that a person tests positive for COVID-19.
  • B': The event that a person tests negative for COVID-19.

Given:

  • P(A) = 0.01 (prevalence, the prior probability of being infected)
  • P(A') = 0.99 (probability of not being infected)
  • P(B|A) = 0.95 (sensitivity, the probability of testing positive given that the person is infected)
  • P(B'|A') = 0.98 (specificity, the probability of testing negative given that the person is not infected)

Using Bayes' Rule, we aim to calculate:

  1. P(A|B), the probability that a person is actually infected given they tested positive.
  2. P(A'|B'), the probability that a person is actually not infected given they tested negative.

Detailed mathematical derivation:

  1. Calculate P(A|B) using Bayes' Rule:

P(A|B) = \frac{P(B|A)P(A)}{P(B)},

where P(B), the probability of testing positive, can be found using the Law of Total Probability:

P(B) = P(B|A)P(A) + P(B|A')P(A'),

where P(B|A') is the probability of testing positive when not infected, which can be calculated as 1 - P(B'|A') = 1 - 0.98 = 0.02. Thus,

P(B) = 0.95 \cdot 0.01 + 0.02 \cdot 0.99 = 0.0293.

The probability that a person is actually infected given they tested positive is therefore

P(A|B) = \frac{0.95 \cdot 0.01}{0.0293} \approx 0.324,

approximately 32.4%.

  2. Similarly, we can find P(A'|B') using Bayes' Rule:

P(A'|B') = \frac{P(B'|A')P(A')}{P(B')},

where P(B'), the probability of testing negative, is:

P(B') = P(B'|A)P(A) + P(B'|A')P(A').

Given P(B'|A) = 1 - P(B|A) = 1 - 0.95 = 0.05, we get

P(B') = 0.05 \cdot 0.01 + 0.98 \cdot 0.99 = 0.9707.

Thus, the probability that a person is actually not infected given they tested negative is

P(A'|B') = \frac{0.98 \cdot 0.99}{0.9707} \approx 0.9995,

approximately a 99.95% chance.
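
As a quick sanity check, the same arithmetic can be carried out directly in Python. This is a minimal sketch that simply re-evaluates the formulas above from the given prevalence, sensitivity, and specificity:

# Analytical check of the Bayes' Rule results derived above.
prevalence = 0.01      # P(A)
sensitivity = 0.95     # P(B|A)
specificity = 0.98     # P(B'|A')

# Law of Total Probability
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)   # P(B)
p_negative = (1 - sensitivity) * prevalence + specificity * (1 - prevalence)   # P(B')

# Bayes' Rule
p_infected_given_positive = sensitivity * prevalence / p_positive              # P(A|B)
p_not_infected_given_negative = specificity * (1 - prevalence) / p_negative    # P(A'|B')

print("P(A|B)  :", round(p_infected_given_positive, 4))      # ~0.3242
print("P(A'|B'):", round(p_not_infected_given_negative, 4))  # ~0.9995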

Summary:

These results demonstrate the importance of considering both the prevalence of the disease in the population and the characteristics of the test (such as its sensitivity and specificity) when interpreting test results. Despite the test's high sensitivity and specificity, the relatively low prevalence of the disease leads to a situation where a significant portion of positive test results could be false positives. This underscores the necessity of careful analysis in medical testing and decision-making.
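
To make the prevalence effect concrete, here is a short illustrative sketch. It keeps the same sensitivity and specificity as above and sweeps a few hypothetical prevalence values (the values other than 1% are assumptions chosen only for illustration), showing how the positive predictive value P(A|B) changes:

# Illustrative sketch: P(A|B) as a function of prevalence for a fixed test.
sensitivity = 0.95
specificity = 0.98

for prevalence in (0.001, 0.01, 0.05, 0.20):  # hypothetical prevalence values
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / p_positive   # P(A|B) at this prevalence
    print(f"prevalence = {prevalence:5.1%} -> P(A|B) = {ppv:.3f}")

With these numbers, P(A|B) rises from under 5% at a 0.1% prevalence to over 90% at a 20% prevalence, which is exactly the effect described above: the same test produces far more trustworthy positive results once the disease is common.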

Python Monte-Carlo Simulation:

To complement our analytical solution and validate the results, let's perform a Monte-Carlo simulation using Python. The simulation code is as follows:

# -*- coding: utf-8 -*-
"""
Created on Mon Mar 18 13:56:30 2024

@author: Nir Regev
"""

import numpy as np

# Number of simulations: This represents how many individuals we simulate to mimic a large experiment.
# By increasing this number, we make our simulation closer to real-life scenarios.
n_simulations = 100000

# Prevalence of the disease in the population: This is the proportion of individuals who are actually infected with the disease.
# In this case, we're assuming a prevalence rate of 1%, as mentioned in the problem statement.
prevalence = 0.01

# Sensitivity (True positive rate): The probability that the test correctly identifies an infected person as positive.
# A sensitivity of 95% means the test will correctly identify 95% of infected individuals as having the disease.
sensitivity = 0.95

# Specificity (True negative rate): The probability that the test correctly identifies a non-infected person as negative.
# A specificity of 98% means the test will correctly identify 98% of non-infected individuals as not having the disease.
specificity = 0.98

# Simulate 'infected' array: This randomly assigns a true/false (infected/not infected) status to each individual,
# based on the prevalence rate. It uses a uniform distribution to decide if each simulated individual is infected.
infected = np.random.rand(n_simulations) < prevalence

# Simulate test outcomes: This step simulates the outcome of the COVID-19 test for each individual.
# It uses the sensitivity and specificity values to determine the outcome based on whether the individual is actually infected.
test_positive = np.random.rand(n_simulations) < (infected * sensitivity + (1 - infected) * (1 - specificity))

# Calculate the overall probability of testing positive (P(B)).
# This is done by finding the mean (average) of the 'test_positive' array,
# which represents the proportion of individuals who tested positive in the simulation.
P_B = test_positive.mean()

# Calculate the probability of being infected given a positive test result (P(A|B)).
# First, we find the proportion of individuals who are both infected and tested positive.
# Then, we divide this by P(B), the overall probability of testing positive.
P_A_given_B = (test_positive & infected).mean() / P_B

# For those who tested negative: We invert the 'test_positive' array to get 'test_negative'.
test_negative = ~test_positive

# Calculate the overall probability of testing negative (P(B')).
P_B_complement = test_negative.mean()

# Calculate the probability of not being infected given a negative test result (P(A'|B')).
# We find the proportion of individuals who are not infected and tested negative,
# then divide this by P(B'), the overall probability of testing negative.
P_A_complement_given_B_complement = ((~infected) & test_negative).mean() / P_B_complement

# Finally, we print out the calculated probabilities which tell us how reliable positive and negative test results are,
# according to our simulation based on the given sensitivity, specificity, and disease prevalence.
print("Probability of being infected given a positive test (P(A|B)):", P_A_given_B)
print("Probability of not being infected given a negative test (P(A'|B')):", P_A_complement_given_B_complement)

The simulation generates a large number of individuals (100,000 in this case) and assigns them a true infection status based on the prevalence rate. It then simulates the test outcomes based on the sensitivity and specificity of the test. Finally, it calculates the probabilities of interest using the simulated data.

Results and Interpretation:

Running the Python simulation yields the following results:

Probability of being infected given a positive test (P(A|B)): 0.33158813263525305
Probability of not being infected given a negative test (P(A'|B')): 0.9994949494949495

These results align closely with the analytical values of approximately 0.324 and 0.9995; the small remaining gap in P(A|B) is within the run-to-run variability of the simulation.
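
To see why a gap of this size is expected, one can attach a rough uncertainty to the simulated estimate. The snippet below is a hedged sketch, not part of the original script: it uses the standard binomial-proportion approximation for the standard error and is meant to be appended to the simulation script above, reusing its variables (NumPy is already imported there):

# Rough standard error of the simulated P(A|B) estimate (binomial approximation).
# Reuses 'test_positive' and 'P_A_given_B' from the simulation script above.
n_positive = test_positive.sum()   # number of simulated positive tests
se = np.sqrt(P_A_given_B * (1 - P_A_given_B) / n_positive)
print("Approximate 1-sigma uncertainty of P(A|B):", se)

With 100,000 simulated individuals and P(B) of about 0.03, there are roughly 3,000 positive tests, giving an uncertainty of a little under 0.01, so a simulated value of about 0.332 against an analytical 0.324 is within the expected Monte-Carlo noise.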

Conclusion:

In this blog post, we explored a probabilistic approach to analyze the reliability of COVID-19 test results using Bayes' Rule. We defined the problem statement, provided a detailed mathematical derivation, and complemented our analysis with a Monte-Carlo Python simulation. The results obtained from both the analytical solution and the simulation emphasize the importance of considering the prevalence of the disease and the characteristics of the test when interpreting test results.

Probabilistic reasoning, particularly Bayes' Rule, provides a powerful framework for making informed decisions under uncertainty.

I hope this blog post has provided valuable insights into the reliability of COVID-19 test results and the application of probabilistic reasoning in real-world scenarios. Stay tuned for more in-depth technical discussions on related topics! PS: A video lecture explaining this example can be watched here: