An Introduction to Decision Theory¶

This chapter of the course will present a practical example (math and code included) and introduce the elements of decision theory. These four/five (TODO) elements are all necessary to practice decision theory. Most should be familiar with situations where the correct choice is not known or obvious. The rational decision-maker has a number of actions they can take in a situation, and depending on what is really going on (the state of nature), the decision-maker experiences a certain amount of gain or loss. Then they may perform experiments to further inform their decisions and devise a strategy that is optimal for them.

States of Nature, Experiment, Actions, and Loss¶

Imagine an employee who has worked at their firm under the same manager long enough to know about how long certain kinds of tasks should take and what kind of temperament their manager has. Also imagine that the firm has a rather opaque structure so that the only hint as to task’s importance is given by their manager’s interpretation of their bosses’ instructions. They have learned a happy manager equals less stress at work.

At the beginning of the week, the manager comes to their desk and gives the employee a new task and asks how long it will take to accomplish. The employee knows they can finish by Thursday working at a normal pace, but first asks in return how urgent the task is, knowing that their manager is a little high-strung and sometimes exaggerates deadlines. With that information, the employee then gives an estimate based on how long it should take while reassuring their manager and trying not to overpromise. Whatever estimate they give, they will work towards earnestly.

We will fill in details, but this summary already includes all the elements an individual needs in order to apply decision theory.

State of nature. In this case, it is “What is the actual expectation from higher management?”. The true state of nature is rarely, if ever, known at the time that a decision is made.
Experiment. Before giving a time estimate, the employee asks about the urgency level or deadline of the assignment. The employee is doing their best to gather information about the state of nature. Is this a top priority or can it wait? Will the manager stay on top of them until it is done or will they forget about it until the deadline?
Actions. The employee has the option to give a few different time estimates. They can give an estimate that aligns with the time needed to complete the task, or an estimate that is shorter, or an estimate that is longer.
Loss or utility. In this situation, we can think of the amount of stress induced in the employee as an example of a loss function. The employee’s goal is to minimize the amount of stress (or loss) they experience during the week after examining all the different possible outcomes. (Loss and utility are discussed after the review of basic probability, but for now know that they are unit-less. You can think of loss and utility as costs and benefits.)

Making a Decision¶

Know Your Loss¶

Taking this information, the employee can construct a table with the loss they would observe during the week based on the true state of nature and the time estimate they give to their manager. In this example, the employee prefers to give longer estimates for unimportant tasks so by the time their manager asks about them, they have finished or nearly finished the task already. For urgent tasks, they give shorter time estimates to signal to their manager that they understand the need for urgency and to try to finish more quickly so they can have some respite at the end of the week. For more-or-less urgent tasks, they are more indifferent to the time estimate they give, but feel that the safest choice is to go with the normal time to completion.

Table 1.1: Loss Given State of Nature and Action Taken¶

		Time Estimate
Urgency Level	Wednesday	Thursday	Friday
Not urgent	3	2	1
Urgent	5	3	4
Critical	6	7	9

[1]:

import numpy as np
loss = np.array([
    [3, 2, 1],
    [5, 3, 4],
    [6, 7, 9]
])

Conduct an Experiment¶

After deciding on the cost of each possible outcome, the employee attempts to reduce the amount of risk they take on by collecting information on the true state of nature. Since their manager is high-strung, they are more likely to overestimate the urgency of a task. If the task is not urgent, there is still a 50 percent probability that they will say it is urgent instead. If the task is urgent, there is a 50 percent probability that they will instead say it is critical. For simplicity, we will say that the manager never mistakes a critical task for something less urgent.

Table 1.2: Conditional Probability of Experiment Outcome Given State of Nature¶

		Manager’s Answer
Urgency Level	Not urgent	Urgent	Critical
Not urgent	1/2	1/2	0
Urgent	0	1/2	1/2
Critical	0	0	1

Notice that each row is a conditional distribution that sums to 1, the probability of a family answering a certain way given their state of nature. This is also called a “frequency of response” table

[2]:

# We call this the "frequency of response table"
# I have included a few comments to connect these table with notation
# used in the probability review. Here `x` is the outcome of the
# experiment and `theta` is the state of nature.
# P(x|theta)
x_given_theta = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])

Strategies, Average Losses, and Expected Loss¶

The employee now has acquired information so they can determine their best action. The employee may select any action, but through experience and preference eventually they will likely settle on a specific strategy, or a set of actions mapped to the manager’s response.

In this case, there are 3 possible outcomes to the experiment and 3 possible actions to take for each outcome which leads to 9 total possible strategies. The set of strategies will be notated as \(S\). If an action is notated as \(a_i\), the a set will be written as \(s = (a_i, a_j, a_k)\). For example, \(a_0\) represents the action of telling the manager to expect the task to be completed by Wednesday and \(s_1 = (a_0, a_1, a_2)\) is notation for Strategy 1 which is:

When the manager responds with “not urgent”, give an estimate of Wednesday.
When the manager responds with “urgent”, give an estimate of Thursday.
When the manager responds with “critical”, give an estimate of Friday.

In contrast, \(s_2 = (a_2, a_1, a_0)\) tells us that Strategy 2 is:

When the manager responds with “not urgent”, give an estimate of Friday.
When the manager responds with “urgent”, give an estimate of Thursday.
When the manager responds with “critical”, give an estimate of Wednesday.

We will also include a Strategy 3 \(s_3 = (2,2,1)\) and leave the reader to interpret the notation as practice.

[3]:

strat1 = np.array([0, 1, 2]) # Strategy 1 where the values represent the indices of actions
strat2 = np.array([2, 1, 0]) # Strategy 2 ...
strat3 = np.array([2, 2, 1]) # Strategy 3 ...

Now we that we have rules for acting on results from an experiment, how should we decide which strategy is best?

A start is to compute the average loss that the employee faces from each state of nature for each of the strategies. For each state of nature and strategy, we find from Table 1.2 the probability of taking an action. (Note that we may know the true state of nature in our contrived example, but the employee will not know this until after the end of the week, if they ever find out.) For example, if the state of nature is a critical task, the manager is certain to express that the task is critical. Then if the employee applies Strategy 1, we know that they will tell the manager it will be done by Friday. Therefore, the average loss when applying Strategy 1 when the task is critical is \(100\% \cdot 9 = 9\). If the state of nature is a not-urgent task, then we are not certain how urgent the manager will say the task is. Half the time they will say it is urgent and half the time they will say it is not. So if the employee applies Strategy 1 in this case, they will say that the task will be done by Wednesday half the time and the other half of the time they will say that the task will be done by Thursday. The average loss of this situation is then \(50\% \cdot 3 + 50\% \cdot 2 = 2.5\).

Doing this for each state of nature and strategy gives us Table 1.3:

		Strategies
Urgency Level	Strategy 1	Strategy 2	Strategy 3
Not urgent	2.5	1.5	1
Urgent	3.5	4	3.5
Critical	9	6	7

[12]:

avg_losses = []
strats = [strat1, strat2, strat3]
# Apply strategies one at a time
# NOTE: using loops with numpy arrays is very inefficient
for strat in strats:
    # e.g. when strat == strat3 == np.array([2, 2, 1]), avg_loss looks like this.
    # .sum(axis=1) flattens all columns into one, making it a one-dimensional array
    #     which numpy treats as rows
    # ([[0.5, 0.5, 0.0]    [[1, 1, 2] )                 [[0.5, 0.5, 0.0]
    # ( [0.0, 0.5, 0.5]  *  [4, 4, 3] )             ==   [0.0, 2.0, 1.5]              == [1.0, 3.5, 7.0]
    # ( [0.0, 0.0, 1.0]]    [9, 9, 7]]).sum(axis=1)      [0.0, 0.0, 7.0]].sum(axis=1)
    avg_loss = (x_given_theta * loss[:,strat]).sum(axis=1)
    avg_losses.append(avg_loss)
avg_losses = np.array(avg_losses).T
avg_losses

[12]:

array([[2.5, 1.5, 1. ],
       [3.5, 4. , 3.5],
       [9. , 6. , 7. ]])

Very quickly, we can see that Strategy 1 generates as much or more loss for the employee than Strategy 3 no matter the state of nature. Therefore, we can say that “\(s_3\) dominates \(s_1\)” and discard \(s_1\).

Now let’s compare Strategies 2 and 3.

	Strategies
Urgency Level	Strategy 2	Strategy 3
Not urgent	1.5	1
Urgent	4	3.5
Critical	6	7

How can we choose between these? This question is rather difficult if the employee may not know the average frequency with which they are assigned tasks with a certain urgency. For instance, if the manager trusts them and often assigns them the most urgent tasks, Strategy 2 might be preferable since the loss for critical tasks is reduced using that strategy. As you can see, the frequency (or probability) of the true state of nature will have an impact on our expected loss. That’s right. The expected loss is the weighted average of the weighted averages we obtained The employee might receive more non-urgent tasks while they build experience and observe expected losses like these,

\[E_{loss}(s_2) = 1.5 \cdot 1/2 + 4 \cdot 1/4 + 6 \cdot 1/4 = 3.250\]

\[E_{loss}(s_3) = 1 \cdot 1/2 + 3.5 \cdot 1/4 + 7 \cdot 1/4 = 3.125\]

so that the expected loss of Strategy 3 is less than the expected loss of Strategy 2. But maybe as the employee gains more experience, they will be handed more urgent and critical tasks, like so,

\[E_{loss}(s_2) = 1.5 \cdot 1/5 + 4 \cdot 2/5 + 6 \cdot 2/5 = 4.300\]

\[E_{loss}(s_3) = 1 \cdot 1/5 + 3.5 \cdot 2/5 + 7 \cdot 2/5 = 4.400\]

and Strategy 2 would be more attractive.

[15]:

freqs1 = np.array([0.5, 0.25, 0.25])
print(f"Strategy 2 Expected Loss: {(avg_losses[:,1] * freqs1).sum() : .3f}")
print(f"Strategy 3 Expected Loss: {(avg_losses[:,2] * freqs1).sum() : .3f}")

Strategy 2 Expected Loss:  3.250
Strategy 3 Expected Loss:  3.125

[17]:

freqs2 = np.array([.2, .4, .4])
print(f"Strategy 2 Expected Loss: {(avg_losses[:,1] * freqs2).sum() : .2f}")
print(f"Strategy 3 Expected Loss: {(avg_losses[:,2] * freqs2).sum() : .2f}")

Strategy 2 Expected Loss:  4.30
Strategy 3 Expected Loss:  4.40

Thus, the strategy that the employee ends up selecting may depend on the frequency of their customers’ states of nature. However, in the event that they do not wish to go to all that trouble of calculating expected loss, they may also select a strategy by the minimax method. The idea of minimax is to minimize the greatest possible loss. Here, the most loss that can come from Strategy 2 is 6 and the most that can come from Strategy 3 is 7, so the minimax method would prefer Strategy 2 over Strategy 3.