Synopsis of a lecture by Professor Omar Hasan Kasule Sr. for the MPH class at Universiti Malaya on Friday 17th November 2006
1.0 OVERVIEW OF PARAMETRIC ANALYSIS
Inference on continuous numeric data is based on the comparison of sample means. Three test statistics are commonly used: the z, t, and F statistics. The z statistic is used for large samples. The t and F statistics are used for small or moderate samples. The z and t statistics are used to compare 2 samples. The F statistic is used to compare 3 or more samples.
The Student's t-test is the most commonly used test for inference on continuous numerical data. It is defined for both independent and paired samples. It is robust and can give valid results even if the assumptions of normal distribution and equal variance are not perfectly fulfilled. It is used routinely for sample sizes below 60, and for larger samples if the population standard deviation is not known. For large samples the t distribution approaches the standard normal distribution, so testing based on the z statistic and testing based on the t statistic give practically the same results.
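As an illustration of the paired case mentioned above, the following is a minimal sketch of the paired t statistic, computed as the mean of the within-pair differences divided by its standard error. The function name paired_t_statistic and the blood pressure readings are hypothetical and are not taken from the lecture. The critical value 2.776 is the tabulated two-sided 5% value of the t distribution with 4 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for a paired-sample t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work on the within-pair differences, not on the two samples separately.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD with n - 1 in the denominator
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical systolic blood pressure readings for 5 subjects (illustration only).
before = [120, 132, 128, 141, 135]
after = [115, 130, 121, 136, 131]

t, df = paired_t_statistic(before, after)
T_CRIT_DF4 = 2.776  # tabulated two-sided 5% critical value for 4 degrees of freedom
print(f"t = {t:.3f}, df = {df}, significant at 5%: {abs(t) > T_CRIT_DF4}")
```

The sketch compares the statistic with a tabulated critical value instead of computing a p-value because the Python standard library does not provide the t distribution.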
The F-test is a generalized test used for inference on 3 or more sample means in a procedure called analysis of variance (ANOVA). Assumptions of independent observations, normal distribution, and equal variances in the samples compared are necessary for the validity of all 3 test statistics. If variances are not equal, the data can be transformed, or harmonic or weighted means may be used. If sample sizes are not equal, equality can be achieved by randomly discarding some observations.
The first step is to ascertain that the data follow an approximately Gaussian distribution, that the variances are approximately equal, and that the sample size is adequate. The formulas for the z, t, and F statistics vary depending on whether the samples are paired or unpaired. They also vary depending on whether the samples have equal or different numbers of observations.
2.0 PARAMETRIC ANALYSIS FOR 2 SAMPLE MEANS
Simple testing procedures assume single-factor analysis, approximately normal distribution, equal variances, and equal numbers in each sample. Both the z and the t test statistics can be used with either the p-value approach or the confidence interval approach.
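The two approaches can be sketched for 2 independent samples with equal variances as follows. This is a minimal illustration and not the lecturer's own worked example: the function name pooled_t_test and the data are hypothetical, and the critical value 2.306 is the tabulated two-sided 5% value of the t distribution with 8 degrees of freedom.

```python
import math
import statistics


def pooled_t_test(x, y, t_crit):
    """Equal-variance two-sample t statistic and confidence interval.

    t_crit is the tabulated critical value for len(x) + len(y) - 2
    degrees of freedom at the chosen confidence level.
    """
    n1, n2 = len(x), len(y)
    diff = statistics.mean(x) - statistics.mean(y)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = diff / se
    ci = (diff - t_crit * se, diff + t_crit * se)
    return t, df, ci


# Hypothetical measurements in two independent groups (illustration only).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.5, 4.8, 4.1, 4.4]

T_CRIT_DF8 = 2.306  # tabulated two-sided 5% critical value for 8 degrees of freedom
t, df, ci = pooled_t_test(group_a, group_b, T_CRIT_DF8)
print(f"t = {t:.3f} on {df} df, 95% CI for the difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The two approaches agree by construction: the 95% confidence interval excludes zero exactly when the absolute value of t exceeds the critical value.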
3.0 PARAMETRIC ANALYSIS FOR 3 OR MORE SAMPLE MEANS
For the F-test only the p-value approach can be used. The confidence interval approach is inapplicable because the F statistic tests all the means jointly and does not estimate a single difference. One-way ANOVA compares the means of a variable such as height or weight across 3 or more samples defined by one factor. The F test and 1-way ANOVA are 2 names for the same procedure. ANOVA has become less popular because modern regression packages can do everything that it does. ANOVA can discover an omnibus association. However, carrying out several pair-wise t tests to discover the specific sources of the omnibus association can lead to the problem of multiple comparisons, in which some pair-wise associations may appear significant by chance alone. Multi-way ANOVA is used to study several factors simultaneously, and multivariate analysis of variance (MANOVA) extends the analysis to several outcome variables. Such analyses are used for randomized block, factorial, Latin square, nested, and cross-over designs.
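A minimal sketch of the one-way ANOVA F statistic follows. It is the ratio of the between-sample mean square to the within-sample mean square. The function names and the data are hypothetical. The value 5.14 is the tabulated 5% critical value of the F distribution with 2 and 6 degrees of freedom. The second function illustrates the multiple comparisons problem under the simplifying assumption that the pair-wise tests are independent.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-sample sum of squares: spread of the sample means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-sample sum of squares: spread of observations around their own sample mean.
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups)
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def familywise_error(alpha, m):
    """Chance of at least one false positive in m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


# Hypothetical data for 3 samples of 3 observations each (illustration only).
samples = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f, df_b, df_w = one_way_anova_f(samples)
F_CRIT_2_6 = 5.14  # tabulated 5% critical value for (2, 6) degrees of freedom
print(f"F = {f:.2f} on ({df_b}, {df_w}) df, significant at 5%: {f > F_CRIT_2_6}")

# Three samples give 3 pair-wise t tests; each is run at the 5% level.
fwe = familywise_error(0.05, 3)
print(f"family-wise error rate for 3 tests: {fwe:.4f}")
```

With 3 pair-wise tests each run at the 5% level, the chance of at least one spurious finding is about 14% under the independence assumption, which is why the omnibus F test is carried out first.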
4.0 OVERVIEW OF NON-PARAMETRIC ANALYSIS FOR CONTINUOUS DATA
Non-parametric methods were first introduced as rough, quick and dirty methods and became popular because they are unconstrained by normality assumptions. They are about 95% as efficient as the more complicated and involved parametric methods. They are simple, easy to understand, and easy to use. They can be used for non-Gaussian data or data whose distribution is unknown. They work well for small data sets but not for large data sets. They also cannot be used with complicated experimental designs. Generally, non-parametric methods are used where parametric methods are not suitable. Such situations occur when a test for normality shows that the data are not normally distributed, when the assumptions of the central limit theorem do not hold, and when the distribution of the parent population is not known. Virtually every parametric test has a non-parametric equivalent.
5.0 PROCEDURES OF NON-PARAMETRIC ANALYSIS
Specialized computer programs can carry out all the non-parametric tests. The sign test, the signed rank test, and the rank sum test are based on the median. The sign test is used for analysis of 1 sample median. The signed rank test is used for 2 paired sample medians. The rank sum test is used for 2 independent sample medians. The Kruskal-Wallis test is a 1-way test for 3 or more independent sample medians. The Friedman test is a 2-way test for 3 or more related (matched) sample medians. Note that the Mann-Whitney test gives results equivalent to those of the rank sum test. The Kendall test usually leads to conclusions similar to those of the Spearman correlation coefficient.
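Two of these procedures are simple enough to sketch by hand. The code below is a minimal illustration with hypothetical function names and data, not a replacement for a statistical package. The sign test counts observations above and below a hypothesized median and refers the count to the binomial distribution with probability 0.5. The rank sum function ranks the pooled observations, giving tied values their average rank, and sums the ranks of the first sample. It also returns the Mann-Whitney U statistic, which is the same rank sum minus n1(n1 + 1)/2.

```python
import math


def sign_test(values, hypothesized_median):
    """Exact two-sided sign test for one sample median.

    Returns (number above the median, number of non-tied values, p-value).
    """
    # Observations equal to the hypothesized median carry no sign and are dropped.
    diffs = [v - hypothesized_median for v in values if v != hypothesized_median]
    n = len(diffs)
    n_pos = sum(1 for d in diffs if d > 0)
    k = min(n_pos, n - n_pos)
    # Binomial(n, 0.5) probability of a count as extreme as k in one tail.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n, min(1.0, 2 * tail)


def rank_sum(x, y):
    """Return (W, U): rank sum of sample x and the matching Mann-Whitney U."""
    combined = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # Positions i+1 .. j hold the same value, so assign their average rank.
        ranks[combined[i]] = (i + 1 + j) / 2
        i = j
    w = sum(ranks[v] for v in x)
    u = w - len(x) * (len(x) + 1) / 2
    return w, u


# Hypothetical data (illustration only).
n_pos, n, p = sign_test([12, 15, 11, 18, 14, 17, 16, 19, 13, 20], 12)
print(f"sign test: {n_pos} of {n} above the hypothesized median, p = {p:.4f}")

w, u = rank_sum([1.1, 2.3, 3.0], [2.9, 4.2, 5.5, 6.1])
print(f"rank sum W = {w}, Mann-Whitney U = {u}")
```

The rank sum function returns only the test statistics. Converting W or U into a p-value requires exact tables or a normal approximation, which the sketch leaves to a statistical package.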