Regression
Regression involves describing a relationship between variables using an equation. Note that variables which appear to be related may not actually be related. A spurious correlation can occur when two variables are both linked to a third factor but not directly to each other, or the correlation may simply be coincidental.
The "line of best fit" that can be drawn on a scatter diagram is known as the regression line. It can be found by inspection, where the line is estimated and drawn by eye. The "least squares regression line" is calculated rather than estimated, so it is more accurate. It makes the total of the squared vertical differences between the line and the plotted points as small as possible. The least squares regression line has the linear model:
y = a + bx
"x" and "y" are the independant and dependant variables on the x-axis and y-axis. "a" and "b" are constants with "a" representing the y-axis intercept and "b" being the gradient or slope of the line. After being plotted, regression lines can be used to forecast or estimate values for "y" given a value for "x".
The differences between the plotted points and the regression line are known as residuals. A residual is the observed "y" value of a point minus the "y" value given by the regression line at the same "x". A point below the regression line therefore has a negative residual and a point above it has a positive residual. For example, on a scatter graph, if the regression line passes through y=10 at x=5 and a plotted point occurs at x=5 and y=8, the residual of this point is 8 - 10 = -2.
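The residual example above can be checked with a short Python sketch. The line y = 0 + 2x is a hypothetical choice made only because it passes through y=10 at x=5, and the function names and the second data point are likewise made up for illustration.

```python
def residuals(xs, ys, a, b):
    """Residual of each point: observed y minus the y given by y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_of_squared_residuals(xs, ys, a, b):
    """The total that the least squares regression line makes as small as possible."""
    return sum(r * r for r in residuals(xs, ys, a, b))


# The example from the text: the line gives y = 10 at x = 5, the point is (5, 8)
example = residuals([5], [8], a=0, b=2)
print(example)   # [-2]

# A second, hypothetical point above the line at (5, 13) has a positive residual
both = residuals([5, 5], [8, 13], a=0, b=2)
total = sum_of_squared_residuals([5, 5], [8, 13], a=0, b=2)
print(both, total)   # [-2, 3] 13
```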
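Multiple regression takes several independent variables into consideration. This is useful when a variable is affected by many different factors. The equation for the estimated multiple regression line is:
y = a + b_1 x_1 + b_2 x_2 + ... + b_n x_n
Here x_1 to x_n are the independent variables, "a" is the intercept, and b_1 to b_n are the coefficients that give the effect of each independent variable on "y".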
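As a rough sketch of how multiple regression coefficients might be estimated, the Python below assumes the usual form with an intercept "a" and one coefficient per independent variable, and fits them by least squares using the normal equations. This is one common approach and not necessarily the method intended by these notes. The function names and the data are made up for illustration.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * unknowns = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Augmented matrix so that row operations apply to both sides
    aug = [list(row) + [rhs] for row, rhs in zip(matrix, vector)]
    for col in range(n):
        # Move the row with the largest entry in this column into the pivot position
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution, starting from the last unknown
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def multiple_regression(rows, ys):
    """Return [a, b1, b2, ...] for y = a + b1*x1 + b2*x2 + ... by least squares.

    rows holds one list of independent-variable values per observation.
    """
    # Prepend 1 to each row so the intercept is treated as a coefficient
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    # Normal equations: (X^T X) * coefficients = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2, for illustration only
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

a, b1, b2 = multiple_regression(rows, ys)   # close to 1, 2 and 3
estimate = a + b1 * 6 + b2 * 2              # close to 19
print(a, b1, b2, estimate)
```

The sketch does not handle independent variables that are exact linear combinations of each other, which would make the system unsolvable. In practice a numerical library would normally be used instead of hand-written elimination.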