
0900L - STUDY INTERPRETATION


Copyright by Professor Omar Hasan Kasule Sr.


MODULE OUTLINE

6.1 MEASURES OF ASSOCIATION and EFFECT
6.1.1 General Concepts
6.1.2 Tests of Association
6.1.3 Measures of Effect
6.1.4 Validity and Precision
6.1.5 Meta Analysis

6.2 SOURCES AND TREATMENT OF BIAS
6.2.1 Misclassification Bias
6.2.2 Selection Bias
6.2.3 Confounding Bias
6.2.4 Mis-Specification Bias
6.2.5 Survey Error and Sampling Bias

6.3 HEALTH STATUS
6.3.1 Hospital Information Systems
6.3.2 Public Health Information System

6.3.3 Disease Registries with Cancer as an Example

6.3.4 Vital Health Statistics Interpretation
6.3.5 Demographic Analysis

6.4 HEALTH SERVICES
6.4.1 Health Economics 
6.4.2 Health Policy
6.4.3 Health Planning
6.4.4 Health Care Financing
6.4.5 Health Care Delivery

6.5 READING AND WRITING SCIENTIFIC LITERATURE
6.5.1 Literature Search
6.5.2 Critical Reading of a Journal Article
6.5.3 Abuse or Misuse of Statistics
6.5.4 Scientific Writing
6.5.5 Scientific Publishing


UNIT 6.1

MEASURES OF ASSOCIATION and EFFECT


Learning Objectives:
·        Tests/measures of association for continuous data: t test, F test, regression
·        Tests/measures of association for discrete data: chi-square
·        Measures of effect: rate ratio, risk ratio, odds ratio, attributable rate
·        Validity and precision of effect measures
·        Meta analysis


Key Words and Terms:
·        Association, measures/tests of association
·        Chi-square, Mantel-Haenszel chi-square
·        Chi-square, Pearson chi-square
·        Effect modification
·        Effect, measures of effect
·        Interaction
·        Measures of trend
·        Meta-analysis
·        Odds ratio
·        Odds ratio, Mantel-Haenszel odds ratio
·        Precision
·        Rate difference
·        Risk ratio
·        Validity


UNIT OUTLINE

6.1.1 GENERAL CONCEPTS
A. Analytic Epidemiology
B. Hypothesis Testing
C. Preliminaries to Data Analysis
D. Procedures Used:

6.1.2 TESTS OF ASSOCIATION
A. Tests of Association on Means: 
B. Tests of Association on Proportions: Single 2x2 Contingency Table
C. Tests of Association on Proportions: Single 2 x k Contingency Table
D. Tests of Association on Proportions: Stratified 2 X 2 Contingency Tables
E. Properties of the Chi-Square Statistic:

6.1.3 MEASURES OF EFFECT
A. Comparison of Proportions in Contingency Table
B. Measures of Excessive Risk:
C. Regression Effect Estimates
D. Properties of the Odds Ratio:
E. Interaction and Effect Modification

6.1.4 VALIDITY and PRECISION
A. Validity
B. Internal Validity
C. External Validity
D. Precision

6.1.5 META ANALYSIS
A. Definition and Historical Background
B. Advantages of Meta Analysis
C. Steps in Meta Analysis
D. Difficulties of Meta Analysis
E. Reading Results of Meta Analysis


6.1.1 GENERAL CONCEPTS
A. ANALYTIC EPIDEMIOLOGY
Analytic epidemiology is very important in public health because of its major role in the planning and evaluation of public health interventions. Intervention is the backbone of public health. Data analysis must be taken very seriously because of its involvement in practical field decisions and programs that affect the individual, the community, and the eco-system. Wrong analysis will lead to wrong conclusions that will have deleterious effects.

B. HYPOTHESIS TESTING
Tests for association, effect, or trend involve constructing hypotheses and testing them. Hypothesis testing is involved in all 3 major types of study design: cross-sectional, case-control, and follow-up. The discussion below is for a case-control study comparing proportions. Similar formulations can be made for other types of studies and other measures such as means. A decision must also be made whether a 2-tail or a 1-tail test is being used. The 2-sided test covers the joint testing of two inequalities between proportions, p1 > p2 and p2 > p1. The 1-sided test covers the testing of only one inequality, p1 > p2 or p2 > p1. The 2-sided test is preferentially used because it is more conservative. The null hypothesis for a 2-sided test states that there is no association between the exposure and the disease outcome, which implies an odds ratio of unity, OR = 1.0. The alternative hypothesis for a 2-sided test states that there is an association between the exposure and the disease outcome; the association may be positive with OR > 1.0 or negative with OR < 1.0. The null hypothesis for a 1-sided test states that there is either a negative association or no association between the exposure and the disease; OR < 1.0 or OR = 1.0. The alternative hypothesis for a 1-sided test states that there is a positive association between the exposure and the disease outcome; OR > 1.0.
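The difference between the 1-sided and 2-sided formulations can be illustrated numerically. The sketch below is not part of the original notes: it assumes a large-sample z test for two proportions, uses hypothetical counts, and the helper names `two_proportion_z` and `normal_upper_tail` are chosen only for illustration.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical data: 30 of 100 cases exposed, 18 of 100 controls exposed.
z = two_proportion_z(30, 100, 18, 100)

p_one_sided = normal_upper_tail(z)           # alternative: p1 > p2 only
p_two_sided = 2 * normal_upper_tail(abs(z))  # alternative: p1 > p2 or p2 > p1

print(f"z = {z:.3f}, 1-sided p = {p_one_sided:.4f}, 2-sided p = {p_two_sided:.4f}")
```

When the observed difference lies in the hypothesized direction, the 2-sided p-value is twice the 1-sided one, which is why the 2-sided test is described as the more conservative choice.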

C. PRELIMINARIES TO DATA ANALYSIS
Simple manual inspection of the data is needed before applying any statistical tests. Indiscriminate application of tests to data leads to wrong or misleading conclusions. Acquiring familiarity with the data by simple manual inspection can help identify outliers, assess the normality of the data distribution, and identify commonsense relationships among variables that could alert the investigator to errors in computer analysis. It is also most important that the data model be selected properly to suit the data at hand. The data models for continuous data can be straight line regression, non-linear regression, or trend models. The trends may be parallel or non-parallel. Special models are used for repeated (paired) observations. More data models are used in the analysis of categorical data. The maximum likelihood model assumes a binomial distribution and derives the maximum likelihood estimate (MLE), which is the value of the parameter that maximizes the likelihood function of the observed data. The logistic model allows use of proportions to compare two groups. The chi-square is used to compare 2 proportions where no raw data is available. If there are more than 2 outcome categories, cells of the table can be collapsed to produce a 2 x 2 table. If this is not possible, the log-linear model is used.
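The idea of a maximum likelihood estimate can be made concrete with a small numerical sketch. It is an illustration only, assuming a single binomial sample with hypothetical counts and a simple grid search instead of the calculus-based solution.

```python
import math

def binomial_log_likelihood(p, x, n):
    """Log of the binomial probability of x events in n subjects."""
    return (math.log(math.comb(n, x))
            + x * math.log(p)
            + (n - x) * math.log(1 - p))

# Hypothetical data: 12 exposed subjects out of 40.
x, n = 12, 40

# Evaluate the log-likelihood on a grid of candidate values of p
# and keep the candidate that makes the observed data most probable.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: binomial_log_likelihood(p, x, n))

print(f"grid-search MLE = {p_hat:.3f}, sample proportion = {x / n:.3f}")
```

For the binomial model the grid search lands on the sample proportion x/n, which is the known closed-form MLE.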

D. PROCEDURES USED:
Two procedures are employed in analytic epidemiology. The test for association is done first. The assessment of the effect measures is done after finding an association. Effect measures are useless in situations in which tests for association are negative. The tests for association commonly employed are: t-test, chi-square, the linear correlation coefficient, and the linear regression coefficient. The effect measures commonly employed are: Odds Ratio, Risk Ratio, Rate difference. Measures of trend can discover relationships that are not picked up by association and effect measures.
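As a rough illustration of the effect measures named above, the sketch below computes them from a single 2 x 2 table. The cell layout (a and b are diseased, c and d are non-diseased, a and c are exposed) and the counts are assumptions made for the example, and the difference is computed on risks since person-time is not available in such a table.

```python
def effect_measures(a, b, c, d):
    """Effect measures from a 2 x 2 table.

    Assumed layout: a = exposed diseased, b = unexposed diseased,
                    c = exposed non-diseased, d = unexposed non-diseased.
    """
    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    return {
        "odds_ratio": (a * d) / (b * c),
        "risk_ratio": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
    }

# Hypothetical follow-up data: 100 exposed and 100 unexposed subjects.
measures = effect_measures(a=40, b=20, c=60, d=80)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The risk ratio and risk difference are only interpretable when the table comes from a design that samples by exposure, such as a follow-up study; the odds ratio can be computed in all three designs.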

6.1.2 TESTS OF ASSOCIATION
A. TESTS OF ASSOCIATION ON MEANS: 
The tests described below are used for continuous measurement data. Their details are described in elementary books of statistics. The Student t-test is used for two independent sample means. The Student paired t-test is used for two paired sample means. Analysis of variance, ANOVA (F test), is used for more than 2 sample means. Multi-way (factorial) analysis of variance is used to test for more than one factor; multivariate analysis of variance, MANOVA, is used when there is more than one outcome variable. Linear regression is used in conjunction with the t test for data that requires modeling. Dummy variables in the regression model can be used to control for confounding factors like age and sex.
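A minimal sketch of the Student t statistic for two independent sample means follows. It assumes equal variances in the two groups (the pooled-variance form) and uses made-up measurements; the p-value would then be read from a t table with the returned degrees of freedom.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Student t statistic and degrees of freedom for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2

# Hypothetical measurements in two groups.
t_stat, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```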

B. TESTS OF ASSOCIATION ON PROPORTIONS: Single 2x2 Contingency Table
The tests for association described below can be applied to discrete data generated in 4 types of study design: cross-sectional, case-control, follow-up, and clinical trials. For two independent proportions, the chi-square test for independent samples is used for large samples and Fisher's exact test is used for small samples. For two paired proportions, the McNemar chi-square test for paired samples is used for adequate samples and the Fisher exact test is used for small samples.
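The two less familiar tests named above can be sketched with the standard library alone. This is an illustration under stated assumptions: the 2 x 2 layout (a, b diseased; a, c exposed), the "sum of tables no more probable than the observed one" rule for the 2-sided Fisher p-value, the uncorrected McNemar statistic, and all counts are choices made for the example.

```python
import math

def hypergeom_pmf(a, n1, n0, m1):
    """Probability of a exposed diseased subjects with all margins fixed."""
    return math.comb(n1, a) * math.comb(n0, m1 - a) / math.comb(n1 + n0, m1)

def fisher_exact_two_sided(a, b, c, d):
    """2-sided Fisher exact p-value for a 2 x 2 table."""
    n1, n0, m1 = a + c, b + d, a + b
    lowest, highest = max(0, m1 - n0), min(n1, m1)
    p_observed = hypergeom_pmf(a, n1, n0, m1)
    probabilities = [hypergeom_pmf(k, n1, n0, m1)
                     for k in range(lowest, highest + 1)]
    # Sum the tables that are no more probable than the observed table.
    return sum(p for p in probabilities if p <= p_observed * (1 + 1e-9))

def mcnemar_chi2(only_first, only_second):
    """McNemar chi-square (1 df) from the two discordant pair counts."""
    return (only_first - only_second) ** 2 / (only_first + only_second)

# Hypothetical small table and hypothetical discordant pairs.
p_fisher = fisher_exact_two_sided(a=3, b=1, c=1, d=3)
chi2_paired = mcnemar_chi2(15, 5)
print(f"Fisher 2-sided p = {p_fisher:.4f}, McNemar chi-square = {chi2_paired:.2f}")
```

The McNemar statistic uses only the discordant pairs, because concordant pairs carry no information about a difference between the paired proportions.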

C. TESTS OF ASSOCIATION ON PROPORTIONS: Single 2 x k Contingency Table:
The table could be ordered qualitatively or quantitatively. The global chi-square test is used to determine if there is any association in the table. More specific associations can then be studied by partitioning the table and obtaining partial chi-squares. The results from the partition analyses can be used to decide how to collapse some cells into one another for further analysis. In the end the aim should be to collapse the complex table into a 2 x 2 contingency table and then apply the methods described above. The M-H chi-square test for linear trend could alternatively be used in a 2 x k table. If the data are scanty, with small cell counts that make the chi-square test invalid, an exact test can be employed. With extremely sparse data some form of modeling will yield better results.
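A chi-square test for linear trend in a 2 x k table can be sketched as follows. The formula used is the Mantel extension form (1 degree of freedom) as commonly given in epidemiology texts; the notes do not spell out a formula, so this form, the equally spaced scores, and the counts are assumptions for illustration.

```python
import math

def chi2_trend(cases, totals, scores):
    """Mantel extension chi-square for linear trend (1 df) in a 2 x k table."""
    n = sum(totals)
    m1 = sum(cases)   # all diseased subjects
    m0 = n - m1       # all non-diseased subjects
    sum_nx = sum(t * x for t, x in zip(totals, scores))
    sum_nx2 = sum(t * x * x for t, x in zip(totals, scores))
    observed = sum(a * x for a, x in zip(cases, scores))
    expected = m1 * sum_nx / n
    variance = m1 * m0 * (n * sum_nx2 - sum_nx ** 2) / (n ** 2 * (n - 1))
    return (observed - expected) ** 2 / variance

# Hypothetical data: three ordered exposure levels scored 0, 1, 2.
trend = chi2_trend(cases=[10, 20, 30], totals=[100, 100, 100], scores=[0, 1, 2])
p_trend = math.erfc(math.sqrt(trend / 2))  # upper tail of chi-square with 1 df

# With only two exposure levels the statistic reduces to the 2 x 2 M-H chi-square.
two_level = chi2_trend(cases=[20, 40], totals=[100, 100], scores=[0, 1])
print(f"trend chi-square = {trend:.3f}, p = {p_trend:.5f}")
```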

D. TESTS OF ASSOCIATION ON PROPORTIONS: Stratified 2x2 Contingency Tables
A stratified design gives rise to several 2 x 2 tables, one table for each stratum. The Mantel-Haenszel chi-square statistic is used. It is a weighted average of separate chi-squares across the strata. In order for this test to be valid, the chi-squares of the separate tables must be homogeneous across all strata. There are special tests of homogeneity that must be applied before the Mantel-Haenszel test is applied. The homogeneity test essentially indicates whether the separate chi-square test statistics are of the same order of magnitude and can therefore be combined in the M-H procedure. The M-H statistic is based on the hypergeometric distribution and follows a chi-square distribution with 1 degree of freedom. The following are wrong methods of combining data from several tables: summing chi-squares across tables, computing the chi-square of the combined (total) table, and computing the chi-square from the sum of 'O' and 'E' across groups. The M-H procedure breaks down when there are too many strata, and multiple logistic regression has to be used in such cases.
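The stratified computation can be sketched as below. The code uses the usual textbook form of the Mantel-Haenszel statistic, which pools the observed 'a' cells, their hypergeometric expectations, and their variances over the strata, together with the Mantel-Haenszel odds ratio. The layout (a, b diseased; a, c exposed), the function name, and the counts are assumptions for illustration, and no homogeneity test is included.

```python
import math

def mantel_haenszel(strata):
    """M-H chi-square (1 df) and M-H odds ratio.

    strata: list of (a, b, c, d) tuples, one per 2 x 2 table.
    """
    sum_a = sum_expected = sum_variance = 0.0
    or_numerator = or_denominator = 0.0
    for a, b, c, d in strata:
        n1, n0 = a + c, b + d   # exposed and unexposed totals
        m1, m0 = a + b, c + d   # diseased and non-diseased totals
        n = n1 + n0
        sum_a += a
        sum_expected += n1 * m1 / n
        sum_variance += n1 * n0 * m1 * m0 / (n ** 2 * (n - 1))
        or_numerator += a * d / n
        or_denominator += b * c / n
    chi2 = (sum_a - sum_expected) ** 2 / sum_variance
    return chi2, or_numerator / or_denominator

# Hypothetical data: two strata, for example two age groups.
chi2_mh, or_mh = mantel_haenszel([(10, 5, 40, 45), (30, 15, 20, 35)])
p_mh = math.erfc(math.sqrt(chi2_mh / 2))  # upper tail of chi-square with 1 df
print(f"M-H chi-square = {chi2_mh:.3f}, p = {p_mh:.5f}, M-H OR = {or_mh:.2f}")
```

The summary odds ratio falls between the stratum-specific odds ratios (2.25 and 3.5 in this example), as expected of a weighted average.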

E. PROPERTIES OF THE CHI-SQUARE STATISTIC:
TWO TYPES OF CHI SQUARE
There are two types of chi square: the Pearson and the Mantel Haenszel chi squares. They are defined as shown in the table below:

                Exposure +      Exposure -      Total
Disease +       a               b               m1
Disease -       c               d               m0
Total           n1              n0              n

The Pearson chi-square, $\chi^2_P$, is defined as a summation over all cells of the table: $\chi^2_P = \sum (O - E)^2 / E = \dfrac{n(ad - bc)^2}{n_1 n_0 m_1 m_0}$. The Mantel-Haenszel chi-square, $\chi^2_{MH}$, is defined based on the 'a' cell only: $\chi^2_{MH} = \dfrac{\{O(a) - E(a)\}^2}{\mathrm{Var}(a)} = \dfrac{(n - 1)(ad - bc)^2}{n_1 n_0 m_1 m_0}$. The difference between $\chi^2_P$ and $\chi^2_{MH}$ is negligible when n is moderately large. Both statistics are reasonable approximations when cell frequencies exceed 5. $\chi^2_{MH}$ is preferred for stratified analysis and for computation of test-based confidence intervals.
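The relation between the two statistics can be checked numerically. The sketch below is illustrative only: it uses the shortcut formula for the Pearson statistic and the 'a'-cell definition for the Mantel-Haenszel statistic, with hypothetical counts in the cell layout of the table above.

```python
def pearson_chi2(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table via the shortcut formula."""
    n1, n0, m1, m0 = a + c, b + d, a + b, c + d
    n = n1 + n0
    return n * (a * d - b * c) ** 2 / (n1 * n0 * m1 * m0)

def mh_chi2(a, b, c, d):
    """Mantel-Haenszel chi-square from the 'a' cell: (O - E)^2 / Var."""
    n1, n0, m1, m0 = a + c, b + d, a + b, c + d
    n = n1 + n0
    expected_a = n1 * m1 / n
    variance_a = n1 * n0 * m1 * m0 / (n ** 2 * (n - 1))
    return (a - expected_a) ** 2 / variance_a

# Hypothetical table with n = 200.
table = (40, 20, 60, 80)
chi2_p = pearson_chi2(*table)
chi2_mh = mh_chi2(*table)
n_total = sum(table)

print(f"Pearson = {chi2_p:.4f}, M-H = {chi2_mh:.4f}, "
      f"ratio = {chi2_mh / chi2_p:.4f}, (n-1)/n = {(n_total - 1) / n_total:.4f}")
```

The two statistics differ only by the factor (n - 1)/n, which is why the difference becomes negligible as n grows.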

PROPERTIES OF THE CHI SQUARE
The chi-square statistic has 2 components. The total chi-square is the sum of the chi-square due to homogeneity and the chi-square due to association. Four assumptions must be fulfilled for the chi-square test to give valid results: the sample size must be big enough, the data must have been obtained by random sampling, observations must be independent of one another, and the cell counts must be approximately normally distributed. Validity of the statistic is affected by the overall sample size and also by the cell numbers. According to Cochran, the statistic is valid if at least 80% of cells have an expected count of 5 or more and every cell has an expected count of at least 1.0. If the observations are not independent of one another, as in paired or matched studies, the McNemar chi-square test is used instead of the usual Pearson chi-square test. The chi-square approximation works best when the cell counts are approximately Gaussian. The chi-square is a continuous distribution used for discrete data. This discrepancy calls for a continuity correction, whose use is not unanimously agreed among statisticians. The Yates correction is the one commonly applied. In the special case when degrees of freedom = 1, it gives the corrected formula $\chi^2 = \sum (|O - E| - 0.5)^2 / E$. The shape of the chi-square distribution varies with the degrees of freedom. The chi-square statistic cannot be negative. It is zero if the expected is equal to the observed in every cell. A distinction must be made between the significance and the strength of an association. The chi-square statistic is a measure of the significance of an association and not of its degree. The coefficient of correlation, phi, measures the degree of association.
The phi coefficient varies between case-control, follow-up, and cross-sectional studies, which makes it less useful than the odds ratio, which is invariant across case-control, follow-up, and cross-sectional studies. The phi coefficient adjusts a computed chi-square for sample size: $\phi = \sqrt{\chi^2 / n}$, where n is the total number of observations. Phi is considered the correlation coefficient for data in 2 x 2 tables. Cramer's V is the equivalent of the phi coefficient in the r x c table.
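The continuity correction and the phi coefficient can be illustrated with a short sketch. It assumes the 2 x 2 layout used earlier (a, b diseased; a, c exposed), a correction of 0.5 as in the Yates formula, hypothetical counts, and the exact relation between the 1-degree-of-freedom chi-square tail and the complementary error function.

```python
import math

def expected_counts(a, b, c, d):
    """Expected cell counts under independence, in the order a, b, c, d."""
    n1, n0, m1, m0 = a + c, b + d, a + b, c + d
    n = n1 + n0
    return [n1 * m1 / n, n0 * m1 / n, n1 * m0 / n, n0 * m0 / n]

def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for a 2 x 2 table, optionally Yates-corrected."""
    correction = 0.5 if yates else 0.0
    cells = zip([a, b, c, d], expected_counts(a, b, c, d))
    return sum(max(abs(o - e) - correction, 0.0) ** 2 / e for o, e in cells)

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))

def phi_coefficient(chi2, n):
    """Strength of association in a 2 x 2 table."""
    return math.sqrt(chi2 / n)

table = (40, 20, 60, 80)  # hypothetical counts
plain = chi2_2x2(*table)
corrected = chi2_2x2(*table, yates=True)
phi = phi_coefficient(plain, sum(table))

print(f"uncorrected: {plain:.3f} (p = {p_value_1df(plain):.4f})")
print(f"Yates:       {corrected:.3f} (p = {p_value_1df(corrected):.4f})")
print(f"phi = {phi:.3f}")
```

The corrected statistic is always the smaller of the two, so the correction makes the test more conservative; phi stays modest here even though the association is statistically significant, which illustrates the distinction between significance and strength.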

USES OF THE CHI SQUARE
The chi-square statistic is very versatile and is widely used and misused. It is used to test for independence of 2 variables by comparing observed values with what would be expected under the null hypothesis assumptions. It is used to test for goodness of fit by comparing observed values of a distribution with those expected under the binomial, Poisson, or normal distributions. It is also used to test for homogeneity among stratified 2 x 2 tables. It is also used to test for trend in 2 x k and r x c tables. The p-value obtained is only approximate.
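The goodness-of-fit use can be sketched with a small binomial example. The data (number of exposed members in 100 hypothetical two-person households), the fixed probability of 0.5, and the helper name are assumptions for illustration. The closed-form tail probability used here holds only for 2 degrees of freedom.

```python
import math

def binomial_expected(n_trials, p, n_units):
    """Expected frequencies of 0..n_trials events under a binomial model."""
    return [n_units * math.comb(n_trials, k) * p ** k * (1 - p) ** (n_trials - k)
            for k in range(n_trials + 1)]

# Observed households with 0, 1, or 2 exposed members (hypothetical).
observed = [20, 55, 25]
expected = binomial_expected(n_trials=2, p=0.5, n_units=sum(observed))

# Goodness-of-fit chi-square: sum of (O - E)^2 / E over the categories.
chi2_fit = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df_fit = len(observed) - 1  # p was fixed in advance, so only 1 df is lost

# For exactly 2 degrees of freedom the upper tail is exp(-chi2 / 2).
p_fit = math.exp(-chi2_fit / 2)
print(f"chi-square = {chi2_fit:.2f} on {df_fit} df, p = {p_fit:.3f}")
```

If p were estimated from the same data instead of being fixed, one further degree of freedom would be lost.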
