Linear regression is a fundamental statistical method used to model the relationship between variables. It finds the best-fit line through data points, allowing us to understand patterns and make predictions. The basic equation is y equals mx plus b, where y is the predicted value, m is the slope, x is the input variable, and b is the y-intercept.
The linear regression equation is y equals beta zero plus beta one x plus epsilon. Beta zero is the y-intercept, beta one is the slope, x is the input variable, and epsilon represents the error term or residuals. These residuals are the vertical distances between actual data points and the predicted line. Our goal is to find the optimal values of beta zero and beta one that minimize these prediction errors.
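The residuals described above can be sketched in a few lines of Python. The data points and the candidate line here are illustrative assumptions, not values from the example later in this lesson:

```python
# Residuals for a candidate line y = beta0 + beta1 * x.
# Data points and coefficients are illustrative assumptions.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 4.2, 4.8, 6.0]

beta0, beta1 = 1.0, 1.0  # candidate intercept and slope

# Vertical distance between each actual y and the line's prediction
predictions = [beta0 + beta1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predictions)]

# The quantity least squares will minimize: sum of squared residuals
sse = sum(r ** 2 for r in residuals)
```

Squaring the residuals keeps positive and negative errors from canceling out and penalizes large misses more heavily.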
The least squares method finds the best-fit line by minimizing the sum of squared residuals. The optimal slope beta one is calculated as the covariance of x and y divided by the variance of x. The optimal intercept beta zero is the mean of y minus beta one times the mean of x. Watch as different candidate lines are tested and the search converges to the minimum sum of squared errors.
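These closed-form formulas can be translated directly into Python. The sample data below is an illustrative assumption; the deviation sums play the role of covariance and variance (the shared 1/n factor cancels in the ratio):

```python
# Least-squares coefficients from the covariance/variance formulas.
# Sample data is an illustrative assumption.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# beta1 = covariance(x, y) / variance(x); the 1/n factors cancel,
# so sums of deviations suffice.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
beta1 = s_xy / s_xx
beta0 = y_bar - beta1 * x_bar
```

No iterative search is actually needed for simple linear regression: these two formulas land on the minimum in one step.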
Let's work through a complete example with five data points. First, we calculate the means: x-bar equals 3 and y-bar equals 4. Next, we compute deviations from the means and their products. Using our formulas, beta one equals 10 divided by 10, which is 1.0, and beta zero equals 4 minus 1.0 times 3, which is 1.0. Our final regression equation is y equals 1.0 plus 1.0 times x.
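The arithmetic above can be checked in code. The narration gives the sums but not the raw points, so the five points below are an assumption chosen to be consistent with x-bar equal to 3, y-bar equal to 4, and both deviation sums equal to 10:

```python
# Reproducing the worked example. The raw points are an assumption
# consistent with the stated means and deviation sums.
points = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
xs = [p[0] for p in points]
ys = [p[1] for p in points]

x_bar = sum(xs) / len(xs)   # 3.0
y_bar = sum(ys) / len(ys)   # 4.0

# Deviations from the means, their products, and squared x-deviations
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)  # 10.0
s_xx = sum((x - x_bar) ** 2 for x in xs)                  # 10.0

beta1 = s_xy / s_xx              # 10 / 10 = 1.0
beta0 = y_bar - beta1 * x_bar    # 4 - 1.0 * 3 = 1.0
```

With these points the final equation y equals 1.0 plus 1.0 times x passes through every point exactly, so all residuals are zero.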
Model evaluation assesses how well our regression fits the data. R-squared measures the proportion of variance explained by the model, ranging from 0 to 1. An R-squared of 0.95 means 95% of the variance is explained. Residual analysis checks for patterns in the errors. Good models show randomly scattered residuals around zero, indicating our assumptions are met.
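R-squared can be computed as one minus the ratio of unexplained variation to total variation. The data and fitted coefficients below are illustrative assumptions, not the worked example's values:

```python
# R-squared = 1 - SSE / SST. Data and coefficients are
# illustrative assumptions.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 4.2, 4.8, 6.0]
beta0, beta1 = 1.0, 1.0  # assume this line was already fit

preds = [beta0 + beta1 * x for x in xs]
residuals = [y - p for y, p in zip(ys, preds)]
y_bar = sum(ys) / len(ys)

sse = sum(r ** 2 for r in residuals)      # unexplained variation
sst = sum((y - y_bar) ** 2 for y in ys)   # total variation
r_squared = 1 - sse / sst
```

An r_squared near 1 means the line accounts for almost all of the spread in y; plotting the residuals against x is the quickest way to spot the patterned errors that signal a violated assumption.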