{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using Bayes' Rule in Real Life"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Simplified Bayes' Rule**\n",
    "$$ P(A|B) \\propto P(B|A) \\cdot P(A) $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's say that I want to know if I have COVID. The tests do not actually tell us if we COVID or not. These tests have sensitivities and specificities which are conditional probabilities. Because we use these to update our beliefs, we call them likelihoods. The most accepted COVID tests have a specificity >97% and a sensitivity >80%. (I'll put a link here.)\n",
    "\n",
    "We are going to go through a real-life example involving an individual named Alex and COVID tests. The events where Alex is or is not infected by COVID-19 will be denoted as $COVID+$ and $COVID-$, respectively. The events where Alex tests positive or negative for COVID will be denoted as $TEST+$ and $TEST-$, respectively. Specificity and sensitivity represent the true negative rate and true positive rate, respectively. This means specificity = $P(TEST-|COVID-)$ and sensitivity = $P(TEST+|COVID+)$. In other words, specificity is the chance of receiving a negative result if the true state of nature is negative, and sensitivity is the chance of receiving a positive result if the true state of nature is positive."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is our prior belief?\n",
    "\n",
    "Let's say that Alex was exposed to COVID yesterday and their friend started displaying symptoms today. Upon hearing this from their friend, Alex will automatically begin forming a prior belief about how likely it was they contracted COVID from their friend. This will depend on their proximity to each other the day before, the length of exposure, immunizations, and trust in their immune system and information about COVID-19. For simplicity, let's say that after considering these factors, Alex believes he has about 50/50 chance of having contracted COVID and decides that is enough to get tested."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "prior = .5 # Prior belief of having contracted COVID-19"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is our likelihood?\n",
    "\n",
    "Alex gets a PCR test done which will have the specificity and sensitivity defined before. These will become part of Bayes' formula as soon as Alex receives a test result. Let us investigate the more likely case, where Alex's test result is negative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "sens = .8\n",
    "spec = .97"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the sensitivity and specifity, we can build a table that represents the joint probability mass function of these two events. Because both random variables are discrete, we will have a 2x2 table. We will put the true states of nature in the columns and the test results in the rows. We can find the probability of each combination of events happening like so:\n",
    "\n",
    "$$ P(\\text{TEST- & COVID-}) = P(COVID-) \\cdot P(TEST-|COVID-) $$\n",
    "$$ P(\\text{TEST+ & COVID-}) = P(COVID-) \\cdot P(TEST+|COVID-) $$\n",
    "$$ P(\\text{TEST- & COVID+}) = P(COVID+) \\cdot P(TEST-|COVID+) $$\n",
    "$$ P(\\text{TEST+ & COVID+}) = P(COVID+) \\cdot P(TEST+|COVID+) $$\n",
    "\n",
    "Percent chance of true negatives and true positives are located in the top-left and bottom-right positions, respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "table = np.zeros((2,2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "table[0,0] = (1-prior) * spec\n",
    "table[1,0] = (1-prior) * (1 - spec)\n",
    "table[0,1] = prior * (1-sens)\n",
    "table[1,1] = prior * sens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>COVID-</th>\n",
       "      <th>COVID+</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>TEST-</th>\n",
       "      <td>0.485</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>TEST+</th>\n",
       "      <td>0.015</td>\n",
       "      <td>0.4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       COVID-  COVID+\n",
       "TEST-   0.485     0.1\n",
       "TEST+   0.015     0.4"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(table, index=['TEST-', 'TEST+'], columns=['COVID-', 'COVID+'])\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is the posterior?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we can see the probability of each combination of events and we know that Alex's test came back negative, we only need to consider the probabilities in the first row. Then the probabilities .485 and .1 become .83 and .17 so that they sum to 1. And after receiving this one test, Alex can conclude that they have only a .17, or 17 percent, chance of having contracted COVID-19."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.8290598290598291\n",
      "0.17094017094017097\n"
     ]
    }
   ],
   "source": [
    "print(.485  / (.485 + .1))\n",
    "print(.1  / (.485 + .1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "posterior = table[0,1] / table[0,:].sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.17094017094017092"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "posterior"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What to do with Bayes' Rule?\n",
    "\n",
    "Using this example, any individual might be able to decide whether it is safe to visit relatives or whether they should social distance or even quarantine. However, for a business owner with employees, it would enable them to decide whether it is too risky to allow a sick employee come in to work. In addition, the business owner should keep in mind the effectiveness of a test, namely the specificity and sensitivity and also consider selecting tests that favor one over the other. In this example, for instance, the tests have a higher specificity so that Alex's chances of receiving a false positive on the test are close to 0 (0.015), but some employers might prefer more sensitive tests (and less specific) so that false positives are more common. The costs of false positives versus false negatives (and even probabilities of each event) will depend on the individual's utility function and sometimes decision makers have control over which attributes they get to see in a test or experiment. We will discuss this in the next section."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}