Standard deviation is a fundamental statistical measure that tells us how spread out data points are from the average or mean. When we have a low standard deviation, the data points cluster tightly around the mean, like the blue dots shown here. When we have a high standard deviation, the data points are spread out widely, like the red dots. This measure helps us understand the variability in our dataset.
The standard deviation formula calculates the square root of the average of the squared differences from the mean. For a population, we use sigma equals the square root of the sum of x i minus mu, squared, divided by N, where mu is the population mean and N is the number of values. The sample formula uses s, measures each difference from the sample mean, and divides by n minus 1 instead of n. This visualization shows how we measure the distance of each data point from the mean, square those distances, average them, and take the square root.
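Written out in symbols, the two formulas described above might look like the following. The summation limits and the symbol \bar{x} for the sample mean are notation assumed here; the narration does not name them.

```latex
\[
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\]
```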
Let's work through a step-by-step example with the data set 2, 4, 6, 8, treating these four values as the entire population. First, we find the mean by adding all values and dividing by the count: 20 divided by 4 gives us 5. Next, we calculate each deviation from the mean and square it: 2 minus 5 is negative 3, which squared equals 9; 4 minus 5 is negative 1, which squared equals 1; 6 minus 5 is 1, which squared equals 1; and 8 minus 5 is 3, which squared equals 9. Then we find the variance by averaging these squared deviations: 9 plus 1 plus 1 plus 9, divided by 4, equals 5. Finally, we take the square root of the variance to get our standard deviation of approximately 2.24.
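The same steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic above, using the standard library; the variable names are chosen here for readability, and the sample figures (dividing by n minus 1) are included purely for comparison.

```python
import math
import statistics

data = [2, 4, 6, 8]

# Step 1: the mean is the sum of the values divided by the count.
mean = sum(data) / len(data)

# Step 2: square each value's deviation from the mean.
squared_devs = [(x - mean) ** 2 for x in data]

# Step 3: population variance averages the squared deviations over N.
population_variance = sum(squared_devs) / len(data)

# Step 4: the standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)

# For comparison, the sample version divides by n - 1 instead of n.
sample_variance = sum(squared_devs) / (len(data) - 1)
sample_sd = math.sqrt(sample_variance)

print("mean:", mean)
print("squared deviations:", squared_devs)
print("population variance:", population_variance)
print("population standard deviation:", round(population_sd, 2))
print("sample standard deviation:", round(sample_sd, 2))

# Cross-check against the standard library's own implementations.
assert math.isclose(population_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```

For this data set the population standard deviation rounds to 2.24, matching the walkthrough, while the sample version comes out larger at about 2.58 because it divides by 3 rather than 4.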
In a normal distribution, data follows the 68-95-99.7 rule, also known as the empirical rule. This states that approximately 68 percent of all data falls within one standard deviation of the mean, about 95 percent falls within two standard deviations, and about 99.7 percent falls within three standard deviations. This rule is extremely useful for understanding data distribution and identifying outliers. Because only about 0.3 percent of values lie more than three standard deviations from the mean, any such data point is considered quite unusual.
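As a rough check of these percentages, the sketch below computes the share of a normal distribution that lies within one, two, and three standard deviations of the mean. It assumes Python 3.8 or later for `statistics.NormalDist`; the helper name `coverage_within` is made up for this illustration.

```python
from statistics import NormalDist

# A standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.2%}")
```

The printed values are 68.27%, 95.45%, and 99.73%, which the rule rounds to 68, 95, and 99.7.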
To summarize what we have learned about standard deviation: It is a fundamental measure that quantifies how spread out data points are from the mean. A low standard deviation indicates that data is clustered tightly around the average, while a high standard deviation shows that data is more scattered. The formula squares the deviations so that positive and negative differences do not cancel each other out. In normal distributions, the 68-95-99.7 rule helps us understand data patterns. Standard deviation is an essential tool used across statistics, quality control, scientific research, and many other fields.