Welcome to an introduction to hypothesis testing. Hypothesis testing is a statistical method used to make decisions about populations based on sample data. The process involves comparing two competing hypotheses: the null hypothesis, which represents the status quo or no effect, and the alternative hypothesis, which represents the claim or effect being investigated. In this normal distribution curve, we can see the critical regions in red, which help us determine when to reject the null hypothesis.
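To make this concrete, here is how the two competing hypotheses and the rejection rule can be written for a generic two-sided test about a population mean (mu_0, sigma, and alpha are placeholder symbols here, not values taken from the curve on screen):

```latex
H_0:\ \mu = \mu_0 \quad \text{(status quo: no effect)}
\qquad
H_1:\ \mu \neq \mu_0 \quad \text{(the claim being investigated)}

\text{Reject } H_0 \text{ when } |Z| > z_{\alpha/2},
\qquad Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```

Under this formulation, the two red tails of the curve are exactly the regions where the absolute value of Z exceeds the critical value.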
Let's explore the steps involved in hypothesis testing. First, we state the null and alternative hypotheses. Then, we choose a significance level, typically 0.05 or 0.01, which represents our tolerance for Type I error. Next, we select an appropriate test statistic based on our data and assumptions. After calculating the test statistic from our sample data, we determine either the p-value or the critical region. In this graph, the red areas are the critical regions: if our test statistic falls in one of them, we reject the null hypothesis; otherwise, we fail to reject it. Finally, we interpret the result in the context of our original problem.
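To see these steps end to end, here is a minimal Python sketch using SciPy, with made-up sample data and a one-sample, two-sided t-test as the running example (the heights, the hypothesized mean of 170 cm, and alpha = 0.05 are illustrative assumptions, not values from the graph):

```python
import numpy as np
from scipy import stats

# Hypothetical sample data (illustrative only): heights in cm
sample = np.array([168.2, 171.5, 169.9, 173.1, 170.4, 167.8, 172.0, 169.3])

# Step 1: state the hypotheses
#   H0: the population mean height equals 170 cm
#   H1: the population mean height differs from 170 cm (two-sided)
mu_0 = 170.0

# Step 2: choose a significance level (tolerance for Type I error)
alpha = 0.05

# Steps 3-4: pick and compute the test statistic -- here a one-sample
# t-test, since the population standard deviation is unknown
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)

# Step 5: decide via the p-value (equivalently, compare |t| to the
# two-sided critical value with n - 1 degrees of freedom)
critical_value = stats.t.ppf(1 - alpha / 2, df=len(sample) - 1)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, critical value = ±{critical_value:.3f}")

# Step 6: interpret the result in context
if p_value <= alpha:
    print("Reject H0: the mean height appears to differ from 170 cm.")
else:
    print("Fail to reject H0: insufficient evidence of a difference.")
```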
Now let's explore the different types of hypothesis tests. One-sample tests compare a sample mean to a known population mean. For example, we might test whether the average height of students at a school differs from the national average. Two-sample tests compare means from two different samples or populations. For instance, we could test whether there's a difference in test scores between two teaching methods. In our graph, we can see two distributions representing different groups, with their means shown by the dashed lines; the difference between these means is what we're testing. Finally, paired tests compare two measurements taken on the same subjects, such as scores before and after a treatment, or measurements on matched pairs. An example would be testing whether a training program improved employee performance by comparing each employee's score before and after the program.
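These three types map naturally onto three SciPy functions. The sketch below uses randomly generated data purely for illustration; the group names, sample sizes, and effect sizes are assumptions, not real results:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # reproducible fake data for illustration

# One-sample test: do student heights differ from a national average of 170 cm?
heights = rng.normal(loc=171, scale=6, size=40)
print(stats.ttest_1samp(heights, popmean=170))

# Two-sample (independent) test: do two teaching methods yield different scores?
scores_a = rng.normal(loc=75, scale=10, size=35)
scores_b = rng.normal(loc=80, scale=10, size=35)
print(stats.ttest_ind(scores_a, scores_b))

# Paired test: did a training program change employee performance?
before = rng.normal(loc=60, scale=8, size=25)
after = before + rng.normal(loc=3, scale=5, size=25)  # same employees, measured twice
print(stats.ttest_rel(before, after))
```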
Let's discuss p-values and how they're used in decision making. A p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. In our example, the observed test statistic is shown by the red dashed line, and the p-value is the red shaded area to the right of this line. The decision rule is straightforward: if the p-value is less than or equal to our significance level alpha, we reject the null hypothesis; if it's greater than alpha, we fail to reject it. In this case, our p-value is smaller than our significance level of 0.05 (the corresponding critical value is marked by the green line), so we reject the null hypothesis. It's important to understand that a small p-value indicates strong evidence against the null hypothesis, while a large p-value suggests insufficient evidence against it. However, the p-value is NOT the probability that the null hypothesis is true - this is a common misconception.
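Here is a short sketch of that decision rule in code; the observed statistic of 2.1 is a hypothetical number, and a right-tailed z-test is assumed to match the shaded area described above:

```python
from scipy import stats

alpha = 0.05        # significance level
z_observed = 2.1    # hypothetical observed test statistic (right-tailed test)

# p-value: probability, under H0, of a statistic at least as extreme as the one
# observed -- the area under the standard normal curve to the right of z_observed
p_value = stats.norm.sf(z_observed)

# Equivalent view: compare the statistic to the critical value z_alpha
critical_value = stats.norm.ppf(1 - alpha)

print(f"p-value = {p_value:.4f}, critical value = {critical_value:.3f}")
if p_value <= alpha:  # same decision as checking z_observed > critical_value
    print("Reject H0")
else:
    print("Fail to reject H0")
```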
Let's discuss the types of errors in hypothesis testing. A Type I error, also known as a false positive, occurs when we reject the null hypothesis when it is actually true. The probability of a Type I error is alpha, our significance level. An example would be convicting an innocent person in a trial. A Type II error, or false negative, occurs when we fail to reject the null hypothesis when it is actually false. The probability of a Type II error is beta. An example would be acquitting a guilty person. In our decision table, you can see how these errors relate to the reality of the null hypothesis and our decision. Statistical power is the probability of correctly rejecting the null hypothesis when it is false, calculated as 1 minus beta. Power increases with sample size, meaning larger samples give us a better chance of detecting real effects. Understanding these error types helps researchers balance the risks in their hypothesis testing decisions.
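A small simulation makes these probabilities tangible. The sketch below repeatedly draws samples and counts how often a one-sample t-test rejects H0: when H0 is true, the rejection rate estimates alpha (the Type I error rate); when H0 is false, it estimates power. The true mean of 0.3 and the sample sizes are arbitrary choices for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims = 0.05, 5000

def rejection_rate(true_mean, n, mu_0=0.0, sd=1.0):
    """Fraction of simulated samples for which H0: mu = mu_0 is rejected."""
    rejections = 0
    for _ in range(n_sims):
        sample = rng.normal(loc=true_mean, scale=sd, size=n)
        _, p = stats.ttest_1samp(sample, popmean=mu_0)
        rejections += (p <= alpha)
    return rejections / n_sims

# H0 true: the rejection rate estimates the Type I error rate (close to alpha)
print("Type I error rate:", rejection_rate(true_mean=0.0, n=30))

# H0 false (true mean 0.3): the rejection rate estimates power, 1 - beta.
# Larger samples give higher power for the same effect size.
for n in (20, 50, 100):
    print(f"power with n = {n}:", rejection_rate(true_mean=0.3, n=n))
```

Notice how the estimated power climbs toward 1 as the sample size grows, which is exactly the point about larger samples giving a better chance of detecting real effects.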