Hypothesis Testing with One Sample
45 Outcomes and the Type I and Type II Errors
When you perform a hypothesis test, there are four possible outcomes depending on the actual truth (or falseness) of the null hypothesis H_{0} and the decision to reject or not. The outcomes are summarized in the following table:
Statistical Decision | H_{0} is actually true | H_{0} is actually false |
---|---|---|
Cannot reject H_{0} | Correct outcome | Type II error |
Cannot accept H_{0} | Type I error | Correct outcome |
The four possible outcomes in the table are:
- The decision is cannot reject H_{0} when H_{0} is true (correct decision).
- The decision is cannot accept H_{0} when H_{0} is true (incorrect decision known as a Type I error). This case is described as “rejecting a good null”. As we will see later, it is this type of error that we will guard against by setting the probability of making such an error. The goal is to NOT take an action that is an error.
- The decision is cannot reject H_{0} when, in fact, H_{0} is false (incorrect decision known as a Type II error). This is called “accepting a false null”. In this situation you have allowed the status quo to remain in force when it should be overturned. As we will see, the null hypothesis has the advantage in competition with the alternative.
- The decision is cannot accept H_{0} when H_{0} is false (correct decision).
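The four outcomes above can be written as a small lookup. This is only an illustrative sketch: the function name `classify_outcome` and its two boolean arguments are hypothetical, chosen here to mirror the rows and columns of the table.

```python
def classify_outcome(h0_is_true: bool, reject_h0: bool) -> str:
    """Map (actual truth of H0, decision) to one of the four outcomes."""
    if h0_is_true and reject_h0:
        # Rejecting a good null.
        return "Type I error"
    if not h0_is_true and not reject_h0:
        # Failing to reject a false null.
        return "Type II error"
    # Either H0 is true and kept, or H0 is false and rejected.
    return "Correct outcome"


# Print all four combinations, in the same order as the list above.
for h0_is_true, reject_h0 in [(True, False), (True, True), (False, False), (False, True)]:
    print(f"H0 true: {h0_is_true!s:5}  reject H0: {reject_h0!s:5}  ->  "
          f"{classify_outcome(h0_is_true, reject_h0)}")
```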
Each of the errors occurs with a particular probability. The Greek letters α and β represent the probabilities.
α = probability of a Type I error = P(Type I error) = probability of rejecting the null hypothesis when the null hypothesis is true: rejecting a good null.
β = probability of a Type II error = P(Type II error) = probability of not rejecting the null hypothesis when the null hypothesis is false. (1 − β) is called the Power of the Test.
α and β should be as small as possible because they are probabilities of errors.
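One way to see that each error occurs with its own probability is to simulate many repeated tests. The sketch below is a rough illustration under assumed values that do not come from the text: a two-tailed z-test of H_{0}: µ = 100 with known σ = 15, samples of size n = 36, α = 0.05, and a single alternative in which the true mean is 105. The function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def rejects_h0(sample, mu0, sigma, alpha):
    """Two-tailed z-test with known sigma: True if H0: mu = mu0 is rejected."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit


def rejection_rate(true_mu, mu0=100.0, sigma=15.0, n=36, alpha=0.05,
                   trials=4000, seed=1):
    """Fraction of simulated samples (drawn with mean true_mu) that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        rejections += rejects_h0(sample, mu0, sigma, alpha)
    return rejections / trials


# H0 is true here, so every rejection is a Type I error.
type_i_rate = rejection_rate(true_mu=100.0)
# H0 is false here (assumed true mean 105), so every non-rejection is a Type II error.
type_ii_rate = 1 - rejection_rate(true_mu=105.0)

print(f"Estimated P(Type I error):  {type_i_rate:.3f}")
print(f"Estimated P(Type II error): {type_ii_rate:.3f}")
print(f"Estimated power (1 - beta): {1 - type_ii_rate:.3f}")
```

The first estimate should land close to the chosen α. The second depends entirely on which alternative is assumed to be true, so a different assumed mean would give a different β.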
Statistics allows us to set the probability that we are making a Type I error. The probability of making a Type I error is α. Recall that the confidence intervals in the last unit were set by choosing a value called Z_{α} (or t_{α}) and the alpha value determined the confidence level of the estimate because it was the probability of the interval failing to capture the true mean (or proportion parameter p). This alpha and that one are the same.
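The link between α and the confidence level can be tabulated directly. This is a minimal sketch that assumes a two-sided interval, so that α is split evenly between the two tails and the critical value is the standard normal quantile at 1 − α/2. The helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(alpha: float) -> float:
    """Two-sided standard normal critical value for a given alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    confidence = 1 - alpha
    print(f"alpha = {alpha:.2f}  confidence level = {confidence:.0%}  "
          f"critical z = {z_critical(alpha):.3f}")
```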
The easiest way to see the relationship between the alpha error and the level of confidence is with the following figure.
In the center of (Figure) is a normally distributed sampling distribution marked H_{0}. This is a sampling distribution of the sample mean, x̄, and by the Central Limit Theorem it is normally distributed. The distribution in the center is marked H_{0} and represents the distribution for the null hypothesis H_{0}: µ = 100. This is the value that is being tested. The formal statements of the null and alternative hypotheses are listed below the figure.
The distributions on either side of the H_{0} distribution represent distributions that would be true if H_{0} is false, under the alternative hypothesis listed as H_{a}. We do not know which is true, and will never know. There are, in fact, an infinite number of distributions from which the data could have been drawn if H_{a} is true, but only two of them are on (Figure) representing all of the others.
To test a hypothesis we take a sample from the population and determine if it could have come from the hypothesized distribution with an acceptable level of significance. This level of significance is the alpha error and is marked on (Figure) as the shaded areas in each tail of the H_{0} distribution. (Each area is actually α/2 because the distribution is symmetrical and the alternative hypothesis allows for the possibility for the value to be either greater than or less than the hypothesized value–called a two-tailed test).
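The shaded tails can be located numerically once a standard error is known. The sketch below keeps the hypothesized value µ = 100 from the text, but the population standard deviation (σ = 15), sample size (n = 36), and α = 0.05 are assumptions made only for illustration, as are the function names.

```python
from statistics import NormalDist


def two_tailed_bounds(mu0, sigma, n, alpha):
    """Boundaries of the two-tailed rejection region for the sample mean."""
    standard_error = sigma / n ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)  # each tail holds alpha / 2
    return mu0 - z * standard_error, mu0 + z * standard_error


def in_rejection_region(sample_mean, lower, upper):
    """True if the sample mean falls in either shaded tail."""
    return sample_mean < lower or sample_mean > upper


lower, upper = two_tailed_bounds(mu0=100.0, sigma=15.0, n=36, alpha=0.05)
print(f"Do not reject H0 when the sample mean is between {lower:.2f} and {upper:.2f}")

for sample_mean in (98.0, 106.0):  # hypothetical sample means
    decision = "reject H0" if in_rejection_region(sample_mean, lower, upper) else "cannot reject H0"
    print(f"sample mean = {sample_mean}: {decision}")
```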
If the sample mean is in the tail of the distribution of H_{0}, we conclude that the probability that it could have come from the H_{0} distribution is less than alpha. We consequently state, “the null hypothesis cannot be accepted with (α) level of significance”. The truth may be that this did come from the H_{0} distribution, but from out in the tail. If this is so then we have falsely rejected a true null hypothesis and have made a Type I error. What statistics has done is provide an estimate about what we know, and what we control, and that is the probability of us being wrong, α.
We can also see in (Figure) that the sample mean could really be from an H_{a} distribution, but within the boundary set by the alpha level. Such a case is marked on (Figure). There is a probability that such a sample mean actually came from H_{a} but shows up in the range of H_{0} between the two tails. This probability is the beta error, the probability of accepting a false null.
Our problem is that we can only set the alpha error because there are an infinite number of alternative distributions from which the mean could have come that are not equal to H_{0}. As a result, the statistician places the burden of proof on the alternative hypothesis. That is, we will not reject a null hypothesis unless there is a greater than 90, or 95, or even 99 percent probability that the null is false: the burden of proof lies with the alternative hypothesis. This is why we called this the tyranny of the status quo earlier.
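To illustrate why β cannot be fixed in advance, the sketch below computes β for several hypothetical alternative means. It reuses the hypothesized value µ = 100 from the text, while σ = 15, n = 36, α = 0.05, and the list of alternative means are all assumptions made for illustration. The function name `beta_for_alternative` is hypothetical.

```python
from statistics import NormalDist


def beta_for_alternative(mu_a, mu0=100.0, sigma=15.0, n=36, alpha=0.05):
    """P(sample mean lands between the two tails of H0 | true mean is mu_a)."""
    standard_error = sigma / n ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z * standard_error, mu0 + z * standard_error
    # Sampling distribution of the mean if the alternative mu_a were true.
    alternative = NormalDist(mu=mu_a, sigma=standard_error)
    return alternative.cdf(upper) - alternative.cdf(lower)


for mu_a in (101.0, 103.0, 105.0, 110.0):  # hypothetical alternative means
    beta = beta_for_alternative(mu_a)
    print(f"true mean = {mu_a:5.1f}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```

Every alternative mean gives a different β, and β shrinks as the true mean moves farther from the hypothesized value. Only α is a single number chosen before the data are seen.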
By way of example, the American judicial system begins with the concept that a defendant is “presumed innocent”. This is the status quo and is the null hypothesis. The judge will tell the jury that they cannot find the defendant guilty unless the evidence indicates guilt beyond a “reasonable doubt”, which is usually defined in criminal cases as 95% certainty of guilt. If the jury cannot accept the null, innocent, then action will be taken, jail time. The burden of proof always lies with the alternative hypothesis. (In civil cases, the jury needs only to be more than 50% certain of wrongdoing to find culpability, called “a preponderance of the evidence”).
The example above was for a test of a mean, but the same logic applies to tests of hypotheses for all statistical parameters one may wish to test.
The following are examples of Type I and Type II errors.
Suppose the null hypothesis, H_{0}, is: Frank’s rock climbing equipment is safe.
Type I error: Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe.
Type II error: Frank thinks that his rock climbing equipment may be safe when, in fact, it is not safe.
α = probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe. β = probability that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.
Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his rock climbing equipment is safe, he will go ahead and use it.)
This is a situation described as “accepting a false null”.
Suppose the null hypothesis, H_{0}, is: The victim of an automobile accident is alive when he arrives at the emergency room of a hospital. This is the status quo and requires no action if it is true. If the null hypothesis cannot be accepted then action is required and the hospital will begin appropriate procedures.
Type I error: The emergency crew thinks that the victim is dead when, in fact, the victim is alive. Type II error: The emergency crew does not know if the victim is alive when, in fact, the victim is dead.
α = probability that the emergency crew thinks the victim is dead when, in fact, he is really alive = P(Type I error). β = probability that the emergency crew does not know if the victim is alive when, in fact, the victim is dead = P(Type II error).
The error with the greater consequence is the Type I error. (If the emergency crew thinks the victim is dead, they will not treat him.)
Suppose the null hypothesis, H_{0}, is: a patient is not sick. Which type of error has the greater consequence, Type I or Type II?
The error with the greater consequence is the Type II error: the patient will be thought well when, in fact, he is sick, so he will not get treatment.
It’s a Boy Genetic Labs claims to be able to increase the likelihood that a pregnancy will result in a boy being born. Statisticians want to test the claim. Suppose that the null hypothesis, H_{0}, is: It’s a Boy Genetic Labs has no effect on gender outcome. The status quo is that the claim is false. The burden of proof always falls to the person making the claim, in this case the Genetics Lab.
Type I error: This results when a true null hypothesis is rejected. In the context of this scenario, we would state that we believe that It’s a Boy Genetic Labs influences the gender outcome, when in fact it has no effect. The probability of this error occurring is denoted by the Greek letter alpha, α.
Type II error: This results when we fail to reject a false null hypothesis. In context, we would state that It’s a Boy Genetic Labs does not influence the gender outcome of a pregnancy when, in fact, it does. The probability of this error occurring is denoted by the Greek letter beta, β.
The error of greater consequence would be the Type I error since couples would use the It’s a Boy Genetic Labs product in hopes of increasing the chances of having a boy.
“Red tide” is a bloom of poison-producing algae–a few different species of a class of plankton called dinoflagellates. When the weather and water conditions cause these blooms, shellfish such as clams living in the area develop dangerous levels of a paralysis-inducing toxin. In Massachusetts, the Division of Marine Fisheries (DMF) monitors levels of the toxin in shellfish by regular sampling of shellfish along the coastline. If the mean level of toxin in clams exceeds 800 μg (micrograms) of toxin per kg of clam meat in any area, clam harvesting is banned there until the bloom is over and levels of toxin in clams subside. Describe both a Type I and a Type II error in this context, and state which error has the greater consequence.
In this scenario, an appropriate null hypothesis would be H_{0}: the mean level of toxins is at most 800 μg, H_{0}: μ_{0} ≤ 800 μg.
Type I error: The DMF believes that toxin levels are still too high when, in fact, toxin levels are at most 800 μg. The DMF continues the harvesting ban.
Type II error: The DMF believes that toxin levels are within acceptable levels (are at most 800 μg) when, in fact, toxin levels are still too high (more than 800 μg). The DMF lifts the harvesting ban. This error could be the most serious. If the ban is lifted and clams are still toxic, consumers could possibly eat tainted food.
In summary, the more dangerous error would be to commit a Type II error, because this error involves the availability of tainted clams for consumption.
A certain experimental drug claims a cure rate of at least 75% for males with prostate cancer. Describe both the Type I and Type II errors in context. Which error is the more serious?
Type I: A cancer patient believes the cure rate for the drug is less than 75% when it actually is at least 75%.
Type II: A cancer patient believes the experimental drug has at least a 75% cure rate when it has a cure rate that is less than 75%.
In this scenario, the Type II error contains the more severe consequence. If a patient believes the drug works at least 75% of the time, this most likely will influence the patient’s (and doctor’s) choice about whether to use the drug as a treatment option.
Chapter Review
In every hypothesis test, the outcomes are dependent on a correct interpretation of the data. Incorrect calculations or misunderstood summary statistics can yield errors that affect the results. A Type I error occurs when a true null hypothesis is rejected. A Type II error occurs when a false null hypothesis is not rejected.
The probabilities of these errors are denoted by the Greek letters α and β, for a Type I and a Type II error respectively. The power of the test, 1 – β, quantifies the likelihood that a test will yield the correct result of a true alternative hypothesis being accepted. A high power is desirable.
The mean price of mid-sized cars in a region is $32,000. A test is conducted to see if the claim is true. State the Type I and Type II errors in complete sentences.
Type I: The mean price of mid-sized cars is $32,000, but we conclude that it is not $32,000.
Type II: The mean price of mid-sized cars is not $32,000, but we conclude that it is $32,000.
A sleeping bag is tested to withstand temperatures of –15 °F. You think the bag cannot stand temperatures that low. State the Type I and Type II errors in complete sentences.
For Exercise 9.12, what are α and β in words?
α = the probability that you think the bag cannot withstand -15 degrees F, when in fact it can
β = the probability that you think the bag can withstand -15 degrees F, when in fact it cannot
In words, describe 1 – β for Exercise 9.12.
A group of doctors is deciding whether or not to perform an operation. Suppose the null hypothesis, H_{0}, is: the surgical procedure will go well. State the Type I and Type II errors in complete sentences.
Type I: The procedure will go well, but the doctors think it will not.
Type II: The procedure will not go well, but the doctors think it will.
A group of doctors is deciding whether or not to perform an operation. Suppose the null hypothesis, H_{0}, is: the surgical procedure will go well. Which is the error with the greater consequence?
The power of a test is 0.981. What is the probability of a Type II error?
0.019
A group of divers is exploring an old sunken ship. Suppose the null hypothesis, H_{0}, is: the sunken ship does not contain buried treasure. State the Type I and Type II errors in complete sentences.
A microbiologist is testing a water sample for E-coli. Suppose the null hypothesis, H_{0}, is: the sample does not contain E-coli. The probability that the sample does not contain E-coli, but the microbiologist thinks it does is 0.012. The probability that the sample does contain E-coli, but the microbiologist thinks it does not is 0.002. What is the power of this test?
0.998
A microbiologist is testing a water sample for E-coli. Suppose the null hypothesis, H_{0}, is: the sample contains E-coli. Which is the error with the greater consequence?
Homework
State the Type I and Type II errors in complete sentences given the following statements.
- The mean number of years Americans work before retiring is 34.
- At most 60% of Americans vote in presidential elections.
- The mean starting salary for San Jose State University graduates is at least $100,000 per year.
- Twenty-nine percent of high school seniors get drunk each month.
- Fewer than 5% of adults ride the bus to work in Los Angeles.
- The mean number of cars a person owns in his or her lifetime is not more than ten.
- About half of Americans prefer to live away from cities, given the choice.
- Europeans have a mean paid vacation each year of six weeks.
- The chance of developing breast cancer is under 11% for women.
- Private universities' mean tuition cost is more than $20,000 per year.
- Type I error: We conclude that the mean is not 34 years, when it really is 34 years. Type II error: We conclude that the mean is 34 years, when in fact it really is not 34 years.
- Type I error: We conclude that more than 60% of Americans vote in presidential elections, when the actual percentage is at most 60%. Type II error: We conclude that at most 60% of Americans vote in presidential elections when, in fact, more than 60% do.
- Type I error: We conclude that the mean starting salary is less than $100,000, when it really is at least $100,000. Type II error: We conclude that the mean starting salary is at least $100,000 when, in fact, it is less than $100,000.
- Type I error: We conclude that the proportion of high school seniors who get drunk each month is not 29%, when it really is 29%. Type II error: We conclude that the proportion of high school seniors who get drunk each month is 29% when, in fact, it is not 29%.
- Type I error: We conclude that fewer than 5% of adults ride the bus to work in Los Angeles, when the percentage that do is really 5% or more. Type II error: We conclude that 5% or more adults ride the bus to work in Los Angeles when, in fact, fewer than 5% do.
- Type I error: We conclude that the mean number of cars a person owns in his or her lifetime is more than 10, when in reality it is not more than 10. Type II error: We conclude that the mean number of cars a person owns in his or her lifetime is not more than 10 when, in fact, it is more than 10.
- Type I error: We conclude that the proportion of Americans who prefer to live away from cities is not about half, though the actual proportion is about half. Type II error: We conclude that the proportion of Americans who prefer to live away from cities is half when, in fact, it is not half.
- Type I error: We conclude that the duration of paid vacations each year for Europeans is not six weeks, when in fact it is six weeks. Type II error: We conclude that the duration of paid vacations each year for Europeans is six weeks when, in fact, it is not.
- Type I error: We conclude that the proportion is less than 11%, when it is really at least 11%. Type II error: We conclude that the proportion of women who develop breast cancer is at least 11%, when in fact it is less than 11%.
- Type I error: We conclude that the average tuition cost at private universities is more than $20,000, though in reality it is at most $20,000. Type II error: We conclude that the average tuition cost at private universities is at most $20,000 when, in fact, it is more than $20,000.
For statements a-j in Exercise 9.109, answer the following in complete sentences.
- State a consequence of committing a Type I error.
- State a consequence of committing a Type II error.
When a new drug is created, the pharmaceutical company must subject it to testing before receiving the necessary permission from the Food and Drug Administration (FDA) to market the drug. Suppose the null hypothesis is “the drug is unsafe.” What is the Type II Error?
- To conclude the drug is safe when, in fact, it is unsafe.
- Not to conclude the drug is safe when, in fact, it is safe.
- To conclude the drug is safe when, in fact, it is safe.
- Not to conclude the drug is unsafe when, in fact, it is unsafe.
b
A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 of them attended the midnight showing. The Type I error is to conclude that the percent of EVC students who attended is ________.
- at least 20%, when in fact, it is less than 20%.
- 20%, when in fact, it is 20%.
- less than 20%, when in fact, it is at least 20%.
- less than 20%, when in fact, it is less than 20%.
It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than seven hours of sleep per night, on average. A survey of 22 LTCC Intermediate Algebra students generated a mean of 7.24 hours with a standard deviation of 1.93 hours. At a level of significance of 5%, do LTCC Intermediate Algebra students get less than seven hours of sleep per night, on average?
The Type II error is not to reject that the mean number of hours of sleep LTCC students get per night is at least seven when, in fact, the mean number of hours
- is more than seven hours.
- is at most seven hours.
- is at least seven hours.
- is less than seven hours.
d
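For the sleep question above, the test statistic can be computed from the reported summary figures (n = 22, sample mean 7.24 hours, sample standard deviation 1.93 hours). This is a minimal sketch that assumes a one-sample t statistic with H_{0}: μ ≥ 7 and H_{a}: μ < 7. The function name `t_statistic` is hypothetical, and the critical value is not computed here.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, sample_sd, n):
    """One-sample t statistic: (sample mean - hypothesized mean) / standard error."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


t = t_statistic(sample_mean=7.24, mu0=7.0, sample_sd=1.93, n=22)
print(f"t = {t:.3f} with {22 - 1} degrees of freedom")

# H_a: mu < 7 puts the rejection region in the lower tail, so the critical
# value at the 5% level is negative. A positive t cannot fall in that region.
print("cannot reject H0" if t > 0 else "compare t with the lower-tail critical value")
```

Because the sample mean is above seven hours, the statistic is positive and the null hypothesis is not rejected at the 5% level. The only error that this decision could be is a Type II error.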
Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test. The Type I error is:
- to conclude that the current mean hours per week is higher than 4.5, when in fact, it is higher
- to conclude that the current mean hours per week is higher than 4.5, when in fact, it is the same
- to conclude that the mean hours per week currently is 4.5, when in fact, it is higher
- to conclude that the mean hours per week currently is no higher than 4.5, when in fact, it is not higher
Glossary
- Type I Error
- The decision is to reject the null hypothesis when, in fact, the null hypothesis is true.
- Type II Error
- The decision is not to reject the null hypothesis when, in fact, the null hypothesis is false.