By Professor Omar Hasan Kasule Sr.
Learning Objectives:
Interpretation of a scatter-gram
Definition of simple linear correlation
Interpretation of the Pearson linear correlation coefficient
Key Words and Terms:
Correlation, correlation coefficient
Correlation, linear correlation
Correlation, negative correlation
Correlation, perfect correlation
Correlation, positive correlation
Dependent Variable
Independent Variable
Relation, linear relation
Relation, non-linear relation
Scatter-gram
Scatter-plot matrix
UNIT SYNOPSIS
DESCRIPTION
Correlation analysis is used as a preliminary data analysis before more sophisticated methods are applied. Correlation describes the relation between two random variables (a bivariate relation) measured on the same person or object, with no prior evidence of inter-dependence. Correlation indicates only association; the association is not necessarily causal. The objectives of correlation analysis are describing the relation between x and y, predicting y if x is known, predicting x if y is known, studying trends, and studying the effect of a third factor on the relation between x and y.
The first step in correlation analysis is to inspect a scatter plot of the data to obtain a visual impression of the data layout and to identify outliers. The next step is to compute Pearson’s coefficient of correlation (the product-moment correlation), r, which is the commonest statistic for linear correlation. Its formula is complicated, but it is computed easily by modern computers. It is essentially a measure of how closely the data points scatter about a straight line.
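The computation of r can be sketched in a few lines of Python. This is a minimal illustration of the standard product-moment definition, not a replacement for a statistics package; the function name pearson_r and the height and weight values are hypothetical, chosen only for the example.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements on five people (illustrative values only)
heights_cm = [150, 160, 165, 172, 180]
weights_kg = [52, 58, 63, 70, 77]
r = pearson_r(heights_cm, weights_kg)
print(round(r, 3))
```

The numerator measures how x and y vary together, and the denominator rescales it by the spread of each variable, which is why r always lies between -1 and 1.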
PEARSON'S CORRELATION COEFFICIENT, r
Inspecting a scatter-gram helps interpret the coefficient. The coefficient is not interpretable for small samples. Values of 0.25 - 0.50 indicate a fair degree of association. Values of 0.50 - 0.75 indicate a moderate to good relation. Values above 0.75 indicate a good to excellent relation. In perfect positive correlation, r = 1. In perfect negative correlation, r = -1. A value of r = 0 indicates either no correlation or that the two variables are related in a non-linear way; when there is no correlation, the scatter-plot is circular. The linear correlation coefficient is not used when the relation is non-linear, outliers exist, the observations are clustered in 2 or 4 groups, or one of the variables is fixed in advance.
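The limiting cases of r described above can be checked numerically. The sketch below uses made-up data: two exact straight lines and one exact curve (y equal to x squared, over x values symmetric about zero). The helper pearson_r is a hypothetical name for a plain implementation of the product-moment formula.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only: x values symmetric about zero
x = [-3, -2, -1, 0, 1, 2, 3]
y_rising = [2 * v + 1 for v in x]    # exact straight line, positive slope
y_falling = [-2 * v + 1 for v in x]  # exact straight line, negative slope
y_curved = [v * v for v in x]        # exact but non-linear relation

r_rising = pearson_r(x, y_rising)    # perfect positive correlation
r_falling = pearson_r(x, y_falling)  # perfect negative correlation
r_curved = pearson_r(x, y_curved)    # zero despite a perfect relation

print(r_rising, r_falling, r_curved)
```

The curved case shows why the scatter-gram must be inspected first: y is completely determined by x, yet the linear coefficient is zero because the rising and falling halves of the curve cancel out.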
NON-PARAMETRIC CORRELATION ANALYSIS
The Spearman rank correlation coefficient, which is computed from the ranks of the observations rather than from their raw values, is used for small data sets for which the Pearson linear correlation coefficient would be invalid.
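As an illustration, the Spearman coefficient can be obtained by ranking each variable and then applying the Pearson formula to the ranks. The sketch below assumes tied values share the average of their ranks, which is the usual convention. The function names and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Illustrative data only: y always increases with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(round(rho, 3), round(r, 3))
```

Because only the ordering of the values matters, the rank coefficient reaches 1 for any relation that always increases, while Pearson's r on the raw values falls short of 1 when the increase is not linear.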