Bias and Noise: Daniel Kahneman on Errors in Decision-Making
Right now, many people are concerned about systematic biases in human decisions. If we care about improving decisions made by humans or by algorithms, how should we think about the sources of error?
Speaking at the Kahneman-Treisman Center for Behavioral Science and Public Policy today is Daniel Kahneman, the Nobel Prize-winning psychologist and one of the center’s two namesakes. I was lucky enough to get a ticket to this packed event and liveblog Danny’s talk.
Center Director Eldar Shafir starts out by telling us about the impact that Anne Treisman and Danny Kahneman had on the Wilson School when they arrived. Danny was the first psychologist at Princeton’s policy school. Anne was the first psychologist to win the Golden Brain Award, and Danny was the first psychologist to receive the Nobel Prize. Today is Danny’s first visit to the center since it was founded. Next, Eldar introduces Betsy Paluck, the center’s deputy director.
Betsy talks about Danny’s legendary research style: his dedication to precision, his willingness to wait for the right answer, and his commitment to talking ideas through. His collaboration with Amos Tversky produced a series of articles that overturned how we think about decision-making, a story recently told in Michael Lewis’s book The Undoing Project.
Understanding Error in Decision-Making
Danny starts out by talking about his consulting work and what it has taught him about error. Over the past few years, he’s been consulting with many large organizations. “I’ve been observing more folly than I expected,” he says. “People often say that the private sector is better than government, and if government is worse than what I’ve seen, then we’re really in trouble.”
Think of what comes to your mind when you hear “measurement error,” says Danny. Now think of what comes to mind when you hear “judgment error.” We tend to think of measurement error as random, while our main association with judgment error is bias. This is unfortunate, says Danny: we put too much emphasis on bias and not enough on random noise, even though noise is easier to measure and easier to control than bias. To think about statistical bias, think about your bathroom scale. Very few bathroom scales are unbiased; most consistently read a little high or a little low. An inexpensive scale is also likely to show different weights depending on where you stand. If you step on it several times, you’ll get different readings: that’s noise.
Noise, Danny tells us, is like arrows that miss the mark randomly, while bias is like arrows that miss the mark consistently. Bias and noise are independent and shouldn’t be confused; a measurement can be both noisy and biased. People frequently think that noise cancels out, because random errors average to zero. This isn’t true: in the standard statistical measure of error, mean squared error, noise does not cancel. Bias and noise are additive: overall error is the squared bias plus the squared noise.
One important advantage of studying noise is that you can measure it without knowing where the target is. If you remove the target after the arrows have landed, you have no idea whether the shots were accurate, but you can still see how widely they are scattered.
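To make the scale example concrete, here is a small Python sketch of our own (not from the talk). The five readings and the true weight of 70 kg are made up. Noise is computed from the readings alone, with no target; bias and total error need the true weight. The final check shows the additivity Danny describes, under the usual assumption that error is measured as mean squared error (MSE).

```python
import math
from statistics import fmean, pstdev

# Hypothetical repeated readings (kg): one person, one cheap bathroom scale.
readings = [71.0, 72.5, 70.5, 73.0, 71.5]

# Noise: scatter of the readings around their own average.
# Computing it requires no knowledge of the true weight.
noise = pstdev(readings)

# Bias and total error do require the (assumed) true weight.
true_weight = 70.0
errors = [r - true_weight for r in readings]
bias = fmean(errors)                # the consistent, same-direction miss
mse = fmean(e * e for e in errors)  # overall error: mean squared error

print(f"bias={bias:.2f} kg  noise={noise:.3f} kg  mse={mse:.2f}")

# Noise does not cancel out of the overall error; the two parts add.
assert math.isclose(mse, bias ** 2 + noise ** 2)
```

With these numbers the bias is 1.7 kg, the noise variance is 0.86, and the MSE is 3.75 = 1.7² + 0.86. Removing the bias entirely would still leave 0.86 of error from noise alone.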
Noise comes in two forms, within subjects and between subjects, says Danny. In one study, radiologists shown the same x-ray twice gave different diagnoses 20% of the time. In another study, wine tasters, who are supposed to be experts, rarely agreed with themselves when rating the same wine twice. We also have between-subject noise: auditors, product reviewers, and supervisors tend to differ from one another quite a lot. Psychologists have known about differences between people (noise) for a long time, but we tend to think more about bias within people.
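Both kinds of noise can be counted with simple disagreement rates. The sketch below is our own illustration with made-up labels (the function names are hypothetical): within-subject noise compares one rater with themselves on the same cases, and between-subject noise averages the disagreement over every pair of raters.

```python
from itertools import combinations

def disagreement_rate(first, second):
    """Share of cases on which two label lists for the same cases differ."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty label lists")
    return sum(a != b for a, b in zip(first, second)) / len(first)

def mean_pairwise_disagreement(raters):
    """Between-subject noise: average disagreement over every pair of raters."""
    pairs = list(combinations(raters, 2))
    return sum(disagreement_rate(a, b) for a, b in pairs) / len(pairs)

# Within-subject noise: one hypothetical radiologist reads the same ten x-rays twice.
first_read = ["normal", "abnormal", "normal", "normal", "abnormal",
              "normal", "normal", "abnormal", "normal", "normal"]
second_read = ["normal", "normal", "normal", "normal", "abnormal",
               "normal", "abnormal", "abnormal", "normal", "normal"]
print("within-subject:", disagreement_rate(first_read, second_read))

# Between-subject noise: three hypothetical auditors judge the same five files.
auditors = [
    ["pass", "fail", "pass", "pass", "fail"],
    ["pass", "pass", "pass", "fail", "fail"],
    ["fail", "fail", "pass", "pass", "fail"],
]
print("between-subject:", mean_pairwise_disagreement(auditors))
```

Note that neither measure needs to know which diagnosis or audit verdict was actually correct.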
Improving on Human Decisions By Reducing Noise
Next Danny tells us about a 1955 study by Paul Meehl, who compared the accuracy of professional judgments with the accuracy of very simple statistical models. Meehl found that very simple rules tend to do better than individuals. Nearly 250 studies have since looked at this issue. In 50% of them, the algorithm is clearly better, and in 50% they’re tied. A tie counts as a win for the algorithm, because the algorithm is far cheaper to use than expert judgment.
Why do algorithms outperform people at decision-making? Because algorithms are noise-free: give one the same input twice and you get the same output. Danny describes building a statistical model that predicts what professional appraisers will say. This model of the appraisers turns out to predict the actual outcome more accurately than the appraisers themselves. There’s only one way this could happen: people are very noisy, and the model, which is less noisy, can be more accurate than they are.
Danny emphasizes one point: where there is judgment there is noise, and there’s usually more of it than you think. Where does noise come from? Neurons, perception, internal judgment and more. If you ask a person to make the same judgment several times, they will give different answers, and if you average their answers, the result is more accurate. Why? At different moments, people pay attention to different things. Individuals also differ from one another, sometimes in systematic ways, he says. From an organization’s point of view, differences between people are noise.
Imagine you have two underwriters in two offices setting the premium for the same risk. If there’s noise, the premium will depend on which underwriter happens to be on duty that morning. The noise may come from each individual’s bias, but from the point of view of the organization, it is noise.
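Why does averaging help? A minimal sketch, assuming each of a person’s judgments equals the true value plus a fixed personal bias plus fresh, independent Gaussian noise (all numbers hypothetical). Under that assumption the expected squared error of an average of k judgments is bias² + noise_sd²/k: averaging shrinks the noise term but never touches the bias.

```python
import random
from statistics import fmean

def mse_of_averaged_judgment(k, true_value=100.0, bias=2.0, noise_sd=10.0,
                             trials=5000, seed=0):
    """Mean squared error when one judge's k judgments of a case are averaged.

    Toy assumption: each judgment = true value + fixed bias + independent noise.
    """
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(trials):
        judgments = [true_value + bias + rng.gauss(0, noise_sd) for _ in range(k)]
        squared_errors.append((fmean(judgments) - true_value) ** 2)
    return fmean(squared_errors)

for k in (1, 4, 16):
    print(f"k={k:2d}  mse={mse_of_averaged_judgment(k):.1f}")
```

With these parameters the expected errors are about 104, 29 and 10 for k = 1, 4 and 16, and they can never fall below the bias term of 4. Real repeated judgments by one person are correlated rather than independent, so the gain in practice will be smaller than this sketch suggests.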
Measuring Noise in Organizational Decisions
How can we measure noise? Kahneman tells us about an experiment conducted in an insurance company. In this study, Danny looked at claims adjusters’ estimates of how much to reserve for an insurance claim. The team also looked at underwriters, giving six realistic insurance cases to 50 to 60 of them, who had half a day to evaluate the cases. They then calculated a statistic to estimate the noise. When Danny explained it to executives, he defined the statistic as follows: take two underwriters, compute the average of their assessments, compute the difference between their assessments, and divide the difference by the average. Do this for every pair of underwriters. What would we expect the typical value to be across all pairs? Executives expected about 10%. Instead, the experiment found about 50%. In practical terms, if one underwriter sets the premium at 700 units and another at 1250, the gap of 550 units is more than half of their 975-unit average. A gap of about 50% was typical: in half of the pairs, the disagreement was even larger.
Executives found this level of noise shocking, even intolerable. If underwriters are that noisy, it undermines the point of the exercise. The most disappointing result was that experience did not matter: people with more than five years on the job were just as noisy as novices.
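The pairwise statistic can be written out in a few lines of Python. This is our own sketch, and the function names are hypothetical. We summarize a case with the median over all pairs, since the talk says half of the pairs disagreed by more than the typical figure; swapping in the mean would be an equally simple choice. The five premiums are made up.

```python
from itertools import combinations
from statistics import median

def pair_noise(a, b):
    """Gap between two assessments as a fraction of their average."""
    return abs(a - b) / ((a + b) / 2)

def noise_index(assessments):
    """Median pairwise noise over every pair of assessors for one case."""
    return median(pair_noise(a, b) for a, b in combinations(assessments, 2))

# The pair from the talk: a 550-unit gap on a 975-unit average, about 0.56.
print(round(pair_noise(700, 1250), 2))

# Hypothetical premiums that five underwriters set for the same case.
premiums = [700, 1250, 900, 1000, 820]
print(round(noise_index(premiums), 2))
```

Like the scale example, this requires no “correct” premium: it only asks how far apart interchangeable colleagues are.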
Why Organizations Fail to Notice Errors From Noise
Noise is an invisible problem: nobody had guessed that it could occur. How could an organization not know that it has a noise problem? Imagine you’re an underwriter and you see a case. You have a good idea of what the premium ought to be. You respect your colleagues, so you expect them to make the same judgment you do. You don’t imagine that your neighbor, the person at the next table, could give a completely different judgment. People occasionally discover disagreements, but always as isolated cases. In general, we don’t tend to think about plausible alternatives to the judgments we make. We live in different realities, but we don’t realize how different those realities are.
When your judgment is noisy, it is determined largely by chance factors, and you don’t know what those factors are. You think you’re judging based on substance, but in fact, that’s just not the case.
Preventing Errors from Noise in Decision-Making
What can we do about noise? This is fairly easy to answer, says Kahneman: try to avoid relying on people, and use an algorithm, which is noise-free. Creating an algorithm that does better than human judges is a very simple exercise, he says, citing Paul Meehl. In fact, a multiple regression model used to make judgments does hardly any better than a simple formula that gives +1 to things you think are good and -1 to things you think are bad, says Danny.
In many situations you can’t use an algorithm, for either objective or organizational reasons. Employees hate being replaced by algorithms, which makes the switch extremely difficult. What can you do instead? There is room for human judgment as an input to the algorithm. If you can afford it, you don’t want people to make the final decision, but they can provide very useful input to algorithms. And when algorithms are impossible, you can simulate them, Danny says.
Consider, for example, the problem of selecting personnel, says Danny. We have reached the point where we cannot improve the process further than is currently done. First, don’t try to form a general impression of the candidate; break the assessment into separate areas and evaluate each one independently of the others. Nearly 60 years ago, Danny was tasked with setting up an interview system for the Israeli army. He defined six traits, had interviewers rate recruits on each trait, and averaged the results. The interviewers rebelled, saying he was turning them into robots. So Danny made a big concession. He told them: do it my way first, assessing each area one at a time. When you’ve completed the job, close your eyes and make a judgment, asking “how good a soldier will this person be?” He intended this to please and placate them. But that follow-up intuitive judgment, made after assessing the components independently, is quite valid, says Danny. Structured interviews like these are better than unstructured interviews, he tells us.
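The +1/-1 rule is often called a unit-weighted (or “improper”) linear model. Here is a minimal Python sketch of one, assuming you first standardize each cue so that cues measured on different scales count equally; without that step a cue in large units (such as a test score) would swamp one in small units. The cue names, directions and candidates are all hypothetical, and no regression is fit: the only human input is which direction is “good.”

```python
from statistics import fmean, pstdev

def zscores(values):
    """Standardize a list so cues on different scales count equally."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]

def unit_weighted_scores(cases, signs):
    """Score each case as the sum of its standardized cues times +1 or -1."""
    standardized = {cue: zscores([case[cue] for case in cases]) for cue in signs}
    return [sum(sign * standardized[cue][i] for cue, sign in signs.items())
            for i in range(len(cases))]

# Hypothetical cues: +1 = "more is good", -1 = "more is bad".
signs = {"experience_years": +1, "test_score": +1, "missed_deadlines": -1}
candidates = [
    {"experience_years": 2, "test_score": 60, "missed_deadlines": 3},
    {"experience_years": 8, "test_score": 85, "missed_deadlines": 0},
    {"experience_years": 5, "test_score": 70, "missed_deadlines": 1},
]

scores = unit_weighted_scores(candidates, signs)
best = max(range(len(candidates)), key=scores.__getitem__)
print([round(s, 2) for s in scores], "best candidate index:", best)
```

Because the formula has no fitted weights, it gives the same answer for the same inputs every time, which is the noise-free property Danny is after.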
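The interview procedure can be sketched as a small Python class. This is our own illustration, not Danny’s actual protocol: the six trait names and the 1-to-5 scale are assumptions (the talk says only that there were six traits). The code enforces the order he insisted on: rate each trait separately, average the ratings, and only then record the close-your-eyes global judgment. How the two numbers should be combined is left open, since the talk does not say.

```python
from dataclasses import dataclass, field
from statistics import fmean
from typing import Dict, Optional

# Hypothetical trait names; the talk says only that there were six traits.
TRAITS = ("responsibility", "sociability", "punctuality",
          "independence", "stamina", "composure")

@dataclass
class StructuredInterview:
    ratings: Dict[str, int] = field(default_factory=dict)  # trait -> 1..5 (assumed)
    final_intuition: Optional[int] = None

    def rate(self, trait, score):
        """Score one trait at a time, independently of the others."""
        if trait not in TRAITS:
            raise KeyError(trait)
        if not 1 <= score <= 5:
            raise ValueError("scores run from 1 to 5")
        self.ratings[trait] = score

    def structured_score(self):
        """Average of the six trait ratings; refuses an incomplete interview."""
        missing = [t for t in TRAITS if t not in self.ratings]
        if missing:
            raise ValueError(f"unrated traits: {missing}")
        return fmean(self.ratings[t] for t in TRAITS)

    def close_eyes_and_judge(self, score):
        """Record the global intuitive rating, only after every trait is scored."""
        self.structured_score()  # raises ValueError if any trait is unrated
        if not 1 <= score <= 5:
            raise ValueError("scores run from 1 to 5")
        self.final_intuition = score

interview = StructuredInterview()
for trait, score in zip(TRAITS, (4, 3, 5, 4, 2, 4)):
    interview.rate(trait, score)
interview.close_eyes_and_judge(4)
print(round(interview.structured_score(), 2), interview.final_intuition)
```

The design choice worth noting is that the intuitive judgment is still allowed, but only after the structured ratings exist, so the overall impression is informed by the component assessments instead of replacing them.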
We can do something about noise, says Danny. By structuring the judgment task, we can improve the quality of the judgment by directing people to the same facts in the same way. Other, more expensive approaches involve using multiple raters.
The Bias Bias: Why We Emphasize Bias Over Noise
While noise is the big problem, we tend to have a bias bias. In the last fifty years, he says, there has been an explosion of research on cognitive biases. There have been thousands of articles and dozens of books on the topic. Why might this be? Cognitive biases are a byproduct of how intuitive thinking works. These errors are not motivated errors. If you trace how intuitive thinking works, you are bound to find biases. People sometimes accuse Kahneman of arguing that people are irrational; this isn’t true, he says. Errors are rare, he says. They’re theoretically important for understanding human decision-making, and they add up in large organizations, but you shouldn’t think that people are making errors all the time.
While psychologists sometimes see biases that are motivated, economists think of errors as random. It was useful for economic theories to allow that agents sometimes make mistakes without questioning the basic rational-agent model. Richard Thaler wrote a book called Misbehaving, his intellectual autobiography. According to Thaler, the idea that people make predictable errors changed his life; he spent 30 years writing a string of papers that led to his Nobel Prize. But while we know a lot about predictable errors, we have found very few ways to systematically reduce them.
It’s a false hope that becoming more aware of your errors will make you a better decision-maker, says Danny. There has been no breakthrough in efforts to reduce bias. Furthermore, all the work on biases has distracted from noise, which we know we can reduce. Kahneman’s recent research on noise has led him to question the work he’s done over the years and to consider the value of focusing on noise instead.