070117L - DISCRETE DATA ANALYSIS

Lecture for Year 2 Sem 2 PPSD session on Wednesday 17th January 2007 by Professor Omar Hasan Kasule Sr.


1.0 INTRODUCTION
Inference on discrete data uses either an approximate method (the chi-square test), which is suited to large samples, or an exact method (Fisher's exact test), which is suited to small samples. Approximate methods are accurate for large samples and inaccurate for small samples. Nothing prevents exact methods from being used for large samples. Both the chi-square and the exact methods yield a p-value, which is used to draw conclusions about the null hypothesis.

2.0 THE SIMPLE CHI-SQUARE PROCEDURE
The first steps in the analysis are to check that the observations are independent, that each observation falls into exactly one cell of the table, and that the sample size is adequate. The chi-square test applies to counts of categorical data and does not require normally distributed data or equal variances. If the sample size is too small, so that the expected cell frequencies are low (a common rule of thumb is that they should be at least 5), the chi-square approximation will not be valid.

The data are laid out in contingency tables and inspected manually before statistical tests are applied. The Pearson chi-square is computed from the observed and expected frequencies of each cell in the contingency table and is in essence a measure of how far the observed frequencies deviate from those expected under the null hypothesis. It can be used to test 2 or more proportions. Large contingency tables are better partitioned or collapsed before applying the chi-square test.
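The computation from observed and expected frequencies can be sketched as follows. This is a minimal illustration, not a full implementation: the function names and the 2 x 2 counts are hypothetical, the expected frequency of each cell is taken as (row total x column total) / grand total, and the p-value helper covers only the 1 degree of freedom case.

```python
import math

def pearson_chi_square(table):
    """Return (statistic, degrees of freedom, expected frequencies)
    for an r x c contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under the null hypothesis of no association
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

def p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

# Hypothetical counts: rows are two groups, columns are outcome yes / no
observed = [[30, 20],
            [20, 30]]
chi2, df, expected = pearson_chi_square(observed)
p = p_value_df1(chi2)
print(chi2, df, round(p, 4))
```

All expected counts in this made-up table are 25, so the rule of thumb about small expected frequencies is satisfied.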

3.0 THE STRATIFIED CHI-SQUARE PROCEDURE
The Mantel-Haenszel chi-square is used to test 2 proportions in stratified data. It is used, for example, to test the relation between exercise and cardiac health when the data are grouped (stratified) by gender.
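A minimal sketch of the Mantel-Haenszel statistic for a set of 2 x 2 tables, one per stratum, is given below. It assumes the usual form without continuity correction: the observed count in the first cell is compared with its expectation summed over strata, and the squared difference is divided by the summed hypergeometric variance. The function name and the counts for the two strata are hypothetical.

```python
def mantel_haenszel_chi_square(strata):
    """strata: list of 2 x 2 tables [[a, b], [c, d]], one per stratum.
    Returns the Mantel-Haenszel chi-square (1 degree of freedom),
    computed without continuity correction."""
    sum_observed = 0.0
    sum_expected = 0.0
    sum_variance = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_observed += a
        sum_expected += row1 * col1 / n
        # Variance of the first cell with all margins held fixed
        sum_variance += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return (sum_observed - sum_expected) ** 2 / sum_variance

# Hypothetical data: rows are exercise yes / no, columns are healthy / not healthy
males = [[10, 5], [5, 10]]
females = [[10, 5], [5, 10]]
mh = mantel_haenszel_chi_square([males, females])
print(round(mh, 4))
```

The resulting statistic is referred to the chi-square distribution with 1 degree of freedom.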

4.0 THE MATCHED CHI-SQUARE
The McNemar chi-square is used for pair-matched data. An example is testing whether exercise improves cardiac health by comparing cardiac performance in the same individuals before and after exercise.
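For paired before-and-after data the test uses only the discordant pairs, that is, the individuals whose status changed. A minimal sketch without continuity correction follows; the function name and the counts are hypothetical.

```python
import math

def mcnemar_chi_square(b, c):
    """b: pairs that changed in one direction (e.g. poor before, good after).
    c: pairs that changed in the other direction.
    Returns (statistic, p-value) with 1 degree of freedom,
    computed without continuity correction."""
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: 15 subjects improved after exercise, 5 got worse
stat, p_mcnemar = mcnemar_chi_square(15, 5)
print(stat, round(p_mcnemar, 4))
```

Pairs with the same status before and after do not enter the statistic at all.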

5.0 EXACT ANALYSIS OF PROPORTIONS
Exact methods are used instead of the chi-square test for small samples (fewer than 20 observations). They can be used in 2 x 2, 2 x k, and r x c contingency tables. They involve direct computation of the p-value using factorials and probability. The p-value is computed as the probability of results as extreme as, or more extreme than, the observed data.
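The direct computation for a 2 x 2 table can be sketched as follows. With the row and column totals held fixed, each possible table has a hypergeometric probability built from binomial coefficients (which are ratios of factorials). The two-sided rule used here, summing every table that is no more probable than the observed one, is one common convention and is an assumption of this sketch; the function name and counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Probability that the first cell equals k, with margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance guards against floating-point ties
    return sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )

# Hypothetical small sample of 8 subjects
p_exact = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_exact, 4))
```

The function relies on `math.comb`, which is available from Python 3.8 onward.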