Lecture by Professor Omar Hasan Kasule Sr. for Year 2 Semester 2 PPSD Session on Wednesday 28th February 2007
1.0 REGRESSION TO THE MEAN
The term regression was introduced by Sir Francis Galton (1822-1911). He noticed that for any measurement there is regression to the mean. He reached this conclusion after his classical study of the heights of fathers and their sons.
The phenomenon of regression to the mean may also be one of the basic laws of nature: extreme values tend to be followed by values closer to the average. Thus very tall fathers have sons who are, on average, not as tall. Very short fathers have sons who are, on average, not as short. Regression to the mean is necessary for balance and equilibrium of biological phenomena since it keeps variation from getting 'out of control'.
A similar phenomenon is seen in social change. Very rich fathers tend to have children who achieve less and may even squander all their inheritance. The sons of the poor struggle to escape poverty and end up doing better than their fathers. If this regression to the mean did not happen, advantages of wealth would be transmitted from generation to generation creating a very unjust society with very few super-rich and many paupers.
2.0 INDEPENDENT AND DEPENDENT VARIABLES
Both correlation and regression address the relation between 2 variables. The scatter-gram is basic to both. In correlation both x and y are random. In regression x is the independent variable (usually treated as fixed rather than random), whereas y is the dependent variable, being determined by x. Regression models the mean of the outcome variable at each value of x. The independent variable can be continuous or categorical. The dependent variable can be continuous or binary.
3.0 THE SIMPLE LINEAR REGRESSION EQUATION
The mathematical model of simple linear regression is shown in the regression equation/regression function/regression line: y = a + bx where ‘y’ is the dependent/response variable, ‘a’ is the intercept, ‘b’ is the slope/regression coefficient, and ‘x’ is the independent/predictor variable. Both ‘a’ and ‘b’ are in a strict sense regression coefficients but the term is usually reserved for ‘b’ only.
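The lecture does not say how ‘a’ and ‘b’ are obtained. A minimal sketch, assuming the usual least-squares estimates, is given below. The function name fit_line and the data values are hypothetical and serve only as illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope / regression coefficient
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical data, not taken from the lecture
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")  # prints: y = 2.20 + 0.60x
```

With these values the slope is 0.6, meaning ‘y’ rises by 0.6 units on average for each unit increase in ‘x’.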
4.0 HYPOTHESIS TESTING
The t test can be used to test the significance of the regression coefficient.
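The lecture does not state the test statistic. Under the usual assumptions of simple linear regression, and with the null hypothesis that the slope is zero, the standard form is shown below, where n is the number of observations and SE(b) is the standard error of ‘b’. These symbols are introduced here for illustration only.

```latex
t = \frac{b}{\mathrm{SE}(b)}, \qquad
\mathrm{SE}(b) = \sqrt{\frac{\sum_{i=1}^{n} (y_i - a - b x_i)^2 / (n - 2)}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```

The statistic is compared with the t distribution on n - 2 degrees of freedom.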
5.0 USES OF THE REGRESSION EQUATION
The regression equation is used for 2 main purposes: (a) testing for association between ‘x’ and ‘y’ and (b) predicting ‘y’ from ‘x’.
The regression coefficient ‘b’ is used to determine if ‘x’ is associated with ‘y’. By doing a t test of the null hypothesis that ‘b’ is zero, we can derive a p-value. If p < 0.05 we conclude that there is significant association. If p ≥ 0.05 we conclude that there is no significant association.
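The decision rule can be sketched in code. The example below assumes least-squares estimates and uses hypothetical data. Because the Python standard library has no t distribution, it compares the t statistic with the two-sided 5% critical value from a standard t table (3.182 for 3 degrees of freedom) instead of computing the p-value itself. Exceeding the critical value is equivalent to p < 0.05. The function name slope_t_statistic is an assumption made for this illustration.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing whether the slope b differs from zero."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares about the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b / se_b, n - 2

# Hypothetical data, not taken from the lecture
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
t, df = slope_t_statistic(xs, ys)

T_CRIT_DF3 = 3.182  # two-sided 5% critical value of t for 3 df (t table)
significant = abs(t) > T_CRIT_DF3
print(f"t = {t:.2f} on {df} df; significant at 0.05: {significant}")
# prints: t = 2.12 on 3 df; significant at 0.05: False
```

Here the slope is positive but, with only 5 observations, the association is not significant at the 0.05 level.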
Once the regression equation is constructed, we can predict ‘y’ by putting any selected value of ‘x’ in the equation.
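Prediction amounts to substituting ‘x’ into y = a + bx. A minimal sketch follows. The coefficient values are hypothetical, chosen only for illustration, and predict is an assumed helper name.

```python
def predict(a, b, x):
    """Predicted y for a given x from the line y = a + b*x."""
    return a + b * x

# Hypothetical coefficients, not taken from the lecture
a, b = 2.2, 0.6

for x in (2, 6):
    print(x, round(predict(a, b, x), 2))
# prints:
# 2 3.4
# 6 5.8
```

Predictions are most trustworthy for values of ‘x’ within the range of the data used to construct the equation.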