Two cab companies serve the city: the Green company operates 85% of the cabs and the Blue company operates 15% of the cabs. One of the cabs is involved in a hit-and-run accident, and a witness identifies the hit-and-run cab as a Blue cab. When the court tests the reliability of the witness under circumstances similar to those on the night of the accident, he correctly identifies the colour of a cab 80% of the time and misidentifies it the other 20% of the time. What is the probability that the cab involved in the accident was Blue, as stated by the witness?
If your answer was 80%, you are in the majority.
The 80% answer shows our tendency to focus on the most recent piece of evidence and ignore what came before. If we are simply told that a cab was involved in a hit-and-run accident, and are not given the information about the witness, then the majority of us will correctly estimate the probability of it being a Blue cab as 15%. Given new evidence (the 80% reliable witness), we throw away the first estimate and base our answer solely on the reliability of the witness. We do this to simplify the calculation, but in this case it leads to the wrong answer.
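So let's do the calculation. There are four possible scenarios. The cab could either be Green (85%) or Blue (15%). The witness could have identified it correctly (80%) or incorrectly (20%). Let's work out the chances of each of these scenarios. A Green cab correctly identified as Green: 85% x 80% = 68%. A Green cab misidentified as Blue: 85% x 20% = 17%. A Blue cab correctly identified as Blue: 15% x 80% = 12%. A Blue cab misidentified as Green: 15% x 20% = 3%. Together these add up to 100%.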
In this case, we know that the witness said it was a Blue cab, so we only need to consider those cases where the cab was identified as Blue. That means it was either a misidentified Green (17%) or a correctly identified Blue (12%). So the chance that it was actually Blue is the chance of it being correctly identified as Blue (12%) over the chance that it was identified as Blue, whichever colour it actually was (12% + 17%, or 29%). That means that the chance of it being Blue, after being identified as Blue, is 12/29, or about 41%. The chance that it was actually Green is the remaining 59%.
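The same arithmetic can be written as a few lines of Python. This is only a sketch: the function name posterior_blue and its parameter names are made up for illustration, and it assumes the witness is equally reliable for both colours, as the court's test suggests.

```python
def posterior_blue(prior_blue, reliability):
    """Chance the cab is Blue, given that the witness says Blue.

    Assumes (for illustration) that the witness is equally reliable
    whether the cab is really Blue or really Green.
    """
    true_blue = prior_blue * reliability                # Blue, correctly called Blue
    false_blue = (1 - prior_blue) * (1 - reliability)   # Green, wrongly called Blue
    return true_blue / (true_blue + false_blue)


p = posterior_blue(prior_blue=0.15, reliability=0.80)
print(f"{p:.1%}")  # 41.4%
```

The numerator is the 12% term and the denominator is the 29% total, so the result is the same 12/29 worked out above.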
But if the witness is 80% reliable, how can he be so likely to be wrong? The catch is that Green cabs are so much more numerous that even a small chance of misidentifying one produces more false Blue reports (17%) than there are correct Blue reports (12%).
Basically, with a compound probability like this you have to be careful to check out the contribution of both the correct (correctly identified Blue) and the incorrect (misidentified Green) terms. Otherwise, you may miss a large contribution which works against your intuition.
If we consider what happens as the Blue cab company shrinks and its cabs are pulled off the street, we can see a disturbing trend. When only 10% of the taxis are Blue, the chance that a cab identified as Blue is really a misidentified Green gets higher. The chance of a correctly identified Blue is 10% x 80% = 8%, and the chance of a misidentified Green is 90% x 20% = 18%, so the chance of a cab identified as Blue actually being Blue is only 8/26, or 31%.
If only 5% of the cabs in the city are Blue, the figures are 5% x 80% = 4% and 95% x 20% = 19%, so the chance drops to 4/23, or 17%. In other words, if only 5% of the cabs are Blue and our 80% reliable witness identifies a Blue cab in an accident, there is only a 17% chance that he's actually right. Our 80% reliable witness is almost 5 times more likely to be wrong than right!
Interestingly, even after the Blue cab company has gone out of business, our 80% reliable witness will still identify Blue cabs in 20% of the accidents, even though there are no Blue cabs left!
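To watch the trend as the Blue fleet shrinks, the calculation can be repeated for each share of Blue cabs. Again this is a rough sketch: blue_report is a hypothetical helper, and the witness is assumed to be 80% reliable for either colour.

```python
def blue_report(prior_blue, reliability=0.80):
    """Return (chance the witness says Blue, chance a 'Blue' report is right).

    Assumes (for illustration) the same reliability for both colours.
    """
    true_blue = prior_blue * reliability                # Blue, correctly called Blue
    false_blue = (1 - prior_blue) * (1 - reliability)   # Green, wrongly called Blue
    says_blue = true_blue + false_blue
    return says_blue, true_blue / says_blue


# Shrink the Blue fleet from 15% of the cabs down to none at all.
for share in (0.15, 0.10, 0.05, 0.0):
    says_blue, actually_blue = blue_report(share)
    print(f"Blue share {share:.0%}: witness says Blue {says_blue:.0%}, "
          f"and is right {actually_blue:.0%} of those times")
```

The last line of output is the out-of-business case: the witness still reports Blue in 20% of accidents, and is right in none of them.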
As another way of looking at our original scenario (85% Green and 15% Blue cabs, and an 80% reliable witness), let's look at the total chances of the witness identifying Green or Blue cabs (this time without actually knowing which kind of cab was involved). From our calculations above, there were two cases in which the witness would identify the cab as Blue (Blue correctly at 12% and Green incorrectly at 17%), giving a total chance of 29%. Also, there were two cases in which the witness would identify the cab as Green (Green correctly at 68% and Blue incorrectly at 3%), giving a total chance of 71%.
So, if all the cabs are equally likely to get into an accident and our witnesses are 80% reliable, the Blue cabs will seem to be responsible for 29% of the accidents, even though they make up only 15% of the cabs. Cab for cab, that is 29/15 against 71/85, so with equally good drivers a Blue cab will be seen to be over twice as likely to cause an accident as a Green cab.
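The "over twice as likely" figure can be checked by dividing each company's apparent share of accidents by its share of the cabs. This is a minimal sketch with made-up variable names, assuming every cab is equally likely to have an accident and every witness is 80% reliable for either colour.

```python
blue_share, reliability = 0.15, 0.80
green_share = 1 - blue_share

# How often witnesses report each colour, whatever the cab really was.
says_blue = blue_share * reliability + green_share * (1 - reliability)
says_green = green_share * reliability + blue_share * (1 - reliability)

# Apparent accidents per cab: reported share divided by fleet share.
blue_rate = says_blue / blue_share
green_rate = says_green / green_share
ratio = blue_rate / green_rate

print(f"Reported Blue {says_blue:.0%}, reported Green {says_green:.0%}")
print(f"A Blue cab appears {ratio:.2f} times as accident-prone as a Green cab")
```

With these numbers the ratio comes out at about 2.31, even though the true accident rate per cab is identical for both companies.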
However, since all the cabs are equally likely to get into an accident, the Mathemagician is free to choose whichever company he likes for other reasons. Hey, it's a nice day, so the Mathemagician decides to walk into town.