0900L - MODULE 5.0 STUDY DESIGN AND ANALYSIS

Copyright by Omar Hasan Kasule Sr.


MODULE OUTLINE

5.1 FIELD EPIDEMIOLOGY
5.1.1 Sample Size Determination
5.1.2 Sources of Secondary Data
5.1.3 Primary Data Collection by Questionnaire
5.1.4 Physical Primary Data Collection
5.1.5 Data Management and Data Analysis

5.2 CROSS-SECTIONAL DESIGN
5.2.1 Definition
5.2.2 Design and Data Collection
5.2.3 Statistical Parameters
5.2.4 Ecologic Design
5.2.5 Health Surveys

5.3 CASE-CONTROL DESIGN
5.3.1 Basics
5.3.2 Design and Data Collection of Case-Base Studies
5.3.3 Statistical Parameters
5.3.4 Strengths and Weaknesses
5.3.5 Sample Size Computation

5.4 FOLLOW-UP DESIGN
5.4.1 Definition
5.4.2 Design and Data Collection
5.4.3 Statistical Parameters
5.4.4 Strengths and Weaknesses
5.4.5 Sample Size Computation

5.5 RANDOMIZED DESIGN: COMMUNITY TRIALS
5.5.1 Overview
5.5.2 Design of a Community Intervention Study
5.5.3 Community Trials: Strengths and Weaknesses
5.5.4 Procedure of the Community Trial
5.5.5 Data Interpretation

5.6 RANDOMIZED DESIGN: CLINICAL TRIALS
5.6.1 Study Design for Phase 3 Randomized Clinical Trials
5.6.2 Data Collection
5.6.3 Analysis and Interpretation


UNIT 5.1
FIELD EPIDEMIOLOGY

·    Sample determination
·    Data collection
·    Data Management


Key Words and Terms:


·    Analysis, bivariate analysis
·    Analysis, multivariate analysis
·    Analysis, simple analysis
·    Analysis, stratified analysis
·    Analysis, univariate analysis
·    Analytic models, likelihood model
·    Analytic models, probability model
·    Analytic models, regression model
·    Data coding
·    Data compression
·    Data editing
·    Data encryption
·    Data entry
·    Data interpretation
·    Data modeling
·    Data processing
·    Data reduction
·    Data replication
·    Data transformation
·    Data value
·    Data, data summary
·    Data, grouped
·    Data, primary data collection
·    Data, secondary data collection
·    Database design
·    Database management system
·    Estimation
·    Inference
·    Questionnaire, face-to-face
·    Questionnaire, mail
·    Questionnaire, telephone
·    Questionnaire, computer
·    Sample, sample size
·    Study power
·    Test for association
·    Test for effect
·    Test for interaction
·    Test for effect modification
·    Test for trend



UNIT OUTLINE

5.1.1 SAMPLE SIZE DETERMINATION
A. Introduction
B. Sample Size for Estimation Of Population Parameters
C. Sample Size for Inference On Sample Means
D. Sample Size for Inference On 2 Sample Proportions
E. Sample Size for Experimental Studies

5.1.2 SOURCES OF SECONDARY DATA
A. General Population and Household Census
B. Vital Statistics
C. Routinely-Collected Data
D. Epidemiological Studies
E. Special Surveys:

5.1.3 PRIMARY DATA COLLECTION BY QUESTIONNAIRE
A. Questionnaire Design
B. Preparation for Data Collection:
C. Questionnaire Administration by Face-To-Face Interview
D. Questionnaire Administration by Telephone
E. Questionnaire Administration by Mail
F. Computer-Administered Questionnaire:

5.1.4 PHYSICAL PRIMARY DATA COLLECTION
A. Clinical Examination
B. Psychological/Psychiatric Examination
C. Environmental or Occupational Exposure
D. Biological Measurements
E. Experiments

5.1.5 DATA MANAGEMENT AND DATA ANALYSIS
A. Data Management
B. Preliminaries to Data Analysis
C. Discrete Data Analysis: Unstratified Analysis
D. Discrete Data: Stratified Analysis
E. Multivariate Models
F. Polytomous Exposures and Outcomes


5.1.1 SAMPLE SIZE DETERMINATION
A. INTRODUCTION
Samples are selected to collect data that answer specific questions. At the conceptual level, sample selection is a tool to study the heterogeneity of the population. If a population is perfectly homogeneous, a sample of 1 person, however selected, is sufficient to study that population. If a population consists of several perfectly homogeneous subgroups, selecting one element from each subgroup provides a sample that sufficiently describes the population. Similarly, if the groups resemble one another, a sample of one group with all its elements is sufficient to represent the population.
The size of the sample needed depends on the nature of the question or hypothesis being tested. The following are considerations in the determination of the sample size: the budget available for the study, the time within which results are needed, minimization of sampling error, and achieving pre-specified parameters of precision. The most important consideration is the precision of the estimates. If the sample size is too small, the study will not have sufficient power to answer the question under consideration accurately. If the sample size is bigger than necessary, resources are wasted because information is collected from more persons than are needed.

Power is the ability to detect a difference when one exists. Power is determined by the significance level, the magnitude of the difference, and the sample size. Power = 1 – β = Pr (rejecting H0 when H0 is false) = Pr (true positive). The larger the sample size, the narrower the confidence interval and the more powerful the study. The higher the confidence level, the wider the confidence interval. Both power and sample size can be computed or looked up in appropriate tables. Beyond an optimal sample size, the increase in power does not justify the cost of a larger sample, so the desired power has to be balanced against the cost associated with large studies.
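The dependence of power on sample size, difference, and significance level can be sketched numerically. The snippet below is a minimal illustration, assuming a two-sided z-test comparing the means of two equal-sized independent groups with a known common standard deviation. The function name and the example numbers are hypothetical and do not come from the text.

```python
from statistics import NormalDist

def power_two_means(n_per_group, diff, sd, alpha=0.05):
    """Approximate power of a two-sided z-test for two independent means.

    Assumes equal group sizes and a known common standard deviation;
    the negligible probability of rejecting in the wrong tail is ignored.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, 1.96 for alpha = 0.05
    se = sd * (2 / n_per_group) ** 0.5              # standard error of the difference
    return NormalDist().cdf(diff / se - z_alpha)

# Power grows with sample size for a fixed difference of 0.5 standard deviations.
for n in (16, 32, 64, 128):
    print(n, round(power_two_means(n, diff=0.5, sd=1.0), 3))
```

With a difference of half a standard deviation, about 64 persons per group bring the power to roughly 80% under these assumptions.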

There are procedures and formulas for computing sample sizes. There are special computer programs such as EPI-INFO that can be used to compute sample sizes.

B. SAMPLE SIZE FOR ESTIMATION OF POPULATION PARAMETERS
SIMPLE RANDOM SAMPLES
If it is desired to estimate the mean with accuracy such that the lower bound is $\mu - c$ and the upper bound is $\mu + c$ with probability $1-\alpha$, the sample size is given by the formula $n = N\sigma^2 / \{(N-1)D^2 + \sigma^2\}$ where $D = c/Z_{\alpha/2}$ and $Z_{\alpha/2} = 1.96$ for 95% confidence. We can estimate the $100(1-\alpha)\%$ confidence interval for the mean estimated from a simple random sample as $\bar{x} \pm Z_{\alpha/2}\{\mathrm{Var}(\bar{x})\}^{1/2}$ where $\mathrm{Var}(\bar{x}) = \{s^2/n\}\{(N-n)/N\}$. When the population is large relative to the sample, a simpler formula gives the sample size as $n = Z_{\alpha/2}^2\,\sigma^2/d^2$ where $Z_{\alpha/2} = 1.96$, $\sigma$ = population standard deviation, and $d$ = minimum detectable difference (the accuracy $c$ above).
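As a worked illustration of the two formulas for a mean, the sketch below computes the sample size with and without the finite population term. It assumes the formula is read as n = Nσ²/{(N−1)D² + σ²} with D = c/Z; the function name and the numbers (σ = 10, c = 2, N = 1000) are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_mean(sd, c, N=None, confidence=0.95):
    """Sample size to estimate a mean to within +/- c.

    sd is the (assumed known) population standard deviation.
    If the population size N is given, the finite population formula is used;
    otherwise the simpler large-population formula n = z^2 sd^2 / c^2 applies.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    D = c / z
    if N is None:
        return math.ceil(sd**2 / D**2)
    return math.ceil(N * sd**2 / ((N - 1) * D**2 + sd**2))

print(n_for_mean(sd=10, c=2))           # large population
print(n_for_mean(sd=10, c=2, N=1000))   # population of 1000
```

The finite population version always gives a sample size no larger than the simple version, and the two converge as N grows.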

If a population proportion, p, is to be determined with accuracy such that it ranges from the lower bound of $p-c$ to the upper bound of $p+c$, the sample size required is given by the formula $n = N\sigma^2 / \{(N-1)D^2 + \sigma^2\}$ with $\sigma^2 = p(1-p)$ and $D = c/Z_{\alpha/2}$, where $Z_{\alpha/2} = 1.96$ for 95% confidence. This formula can be rewritten as $n = Np(1-p) / \{(N-1)D^2 + p(1-p)\}$ or, in terms of $c$, as $n = Np(1-p) / \{(N-1)c^2/Z_{\alpha/2}^2 + p(1-p)\}$. A simpler formula for sample size is $n = \{Z_{\alpha/2}^2/d^2\}\,p(1-p)$, where $d$ is the required accuracy. We can estimate the $100(1-\alpha)\%$ confidence interval for the proportion computed from the sample as $\hat{p} \pm Z_{\alpha/2}\{\mathrm{Var}(\hat{p})\}^{1/2}$ where $\mathrm{Var}(\hat{p}) = \{\hat{p}(1-\hat{p})/(n-1)\}\{(N-n)/N\}$.
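The same calculation for a proportion can be sketched as follows. This is an illustrative example that assumes σ² = p(1−p) and D = c/Z in the finite population formula; the function name and the inputs (p = 0.5, c = 0.05, N = 2000) are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_proportion(p, c, N=None, confidence=0.95):
    """Sample size to estimate a proportion p to within +/- c.

    p is the anticipated proportion (0.5 gives the most conservative size).
    If N is given, the finite population formula is used.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    D = c / z
    pq = p * (1 - p)
    if N is None:
        return math.ceil(pq / D**2)
    return math.ceil(N * pq / ((N - 1) * D**2 + pq))

print(n_for_proportion(0.5, 0.05))           # large population
print(n_for_proportion(0.5, 0.05, N=2000))   # population of 2000
```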

STRATIFIED RANDOM SAMPLE
The sample size needed to determine the average with accuracy of $\pm c$ and $100(1-\alpha)\%$ confidence is given by the expression $n = \{\sum_i N_i^2\sigma_i^2/\nu_i\} / \{N^2(c/Z_{\alpha/2})^2 + \sum_i N_i\sigma_i^2\}$ where $\nu_i$ is the fraction of the sample allocated to stratum $i$, so that $n_i = n\nu_i$. The sample size needed to determine the proportion with accuracy of $\pm c$ is given by the expression $n = \{\sum_i N_i^2 p_i(1-p_i)/\nu_i\} / \{N^2(c/Z_{\alpha/2})^2 + \sum_i N_i p_i(1-p_i)\}$ where $n_i = n\nu_i$. The unbiased estimator of the population average is the sum over the strata $\sum_i w_i\bar{x}_i$, where $w_i = N_i/N$ is the stratum weight. The unbiased estimator of the variance of the average is the sum over the strata $\sum_i w_i^2\,(s_i^2/n_i)\,(N_i - n_i)/(N_i - 1)$. The unbiased estimator of the population proportion is the sum over the strata $\sum_i w_i p_i$. The unbiased estimator of the variance of the proportion is the sum over the strata $\sum_i w_i^2\,\{p_i(1-p_i)/n_i\}\,(N_i - n_i)/(N_i - 1)$.
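A small numerical sketch may help with the stratified formula for the mean. It assumes the expression is read as n = Σ(Nᵢ²σᵢ²/νᵢ) / {N²(c/Z)² + ΣNᵢσᵢ²}, where νᵢ is the allocation fraction of stratum i, and it uses proportional allocation (νᵢ = Nᵢ/N). The stratum sizes and variances are invented for illustration.

```python
import math
from statistics import NormalDist

def stratified_n_for_mean(N_strata, var_strata, alloc, c, confidence=0.95):
    """Total sample size for estimating a mean from a stratified random sample.

    N_strata   : population size of each stratum
    var_strata : (assumed known) variance within each stratum
    alloc      : fraction of the sample allocated to each stratum (sums to 1)
    c          : required accuracy (+/- c)
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    N = sum(N_strata)
    numerator = sum(Ni**2 * vi / ai for Ni, vi, ai in zip(N_strata, var_strata, alloc))
    denominator = N**2 * (c / z) ** 2 + sum(Ni * vi for Ni, vi in zip(N_strata, var_strata))
    return math.ceil(numerator / denominator)

sizes = [155, 62, 93]              # hypothetical stratum sizes
variances = [25.0, 225.0, 100.0]   # hypothetical stratum variances
alloc = [Ni / sum(sizes) for Ni in sizes]   # proportional allocation

n = stratified_n_for_mean(sizes, variances, alloc, c=2.0)
per_stratum = [math.ceil(n * a) for a in alloc]   # n_i = n * nu_i, rounded up
print(n, per_stratum)
```

Other allocation schemes (for example equal allocation) only change the `alloc` list; the formula itself is unchanged.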

MULTI-STAGE RANDOM SAMPLE
In 2-stage sampling, the sample average is given by the expression $(M/m)\sum_i w_i\bar{x}_i$. The variance is given by $(M/N)^2\,(s_0^2/m)\,(M-m)/(M-1) + (M/m)\sum_i w_i^2\,(s_i^2/n_i)\,(N_i - n_i)/(N_i - 1)$.

The sample proportion is given by the expression $(M/m)\sum_i w_i p_i$. The variance of the proportion is given by the expression $(M/N)^2\,(s_0^2/m)\,(M-m)/(M-1) + (M/m)\sum_i w_i^2\,\{p_i(1-p_i)/n_i\}\,(N_i - n_i)/(N_i - 1)$ where M = number of groups in the population, m = number of groups selected in the first stage, N = number of elements in the population, $N_i$ = number of elements in the ith group, $n_i$ = number of elements selected from the ith group, $\bar{x}_i$ = sample mean from the ith group, and $p_i$ = sample proportion from the ith group.

CLUSTER SAMPLE

C. SAMPLE SIZE FOR INFERENCE ON SAMPLE MEANS
SIMPLE RANDOM SAMPLES
The sample size needed to compare averages of measurements of two independent groups is given by the formula $n_1 = (1 + 1/r)(Z_{\alpha/2} + Z_\beta)^2\,\sigma_d^2 / (\mu_2 - \mu_1)^2$ where $r = n_2/n_1$ (the number in group 2 divided by the number in group 1), $Z_{\alpha/2} = 1.96$ for 95% confidence, $\sigma_d$ = the standard deviation, $\mu_1$ = average of group 1, and $\mu_2$ = average of group 2. Solving the same formula for power gives $Z_\beta = (d/\sigma_d)\{n_1 r/(r+1)\}^{1/2} - Z_{\alpha/2}$ or, in simplified form, $Z_\beta = [\{d - d^*\}/\mathrm{se}(d)] - Z_{\alpha/2}$, where $d$ = the magnitude of the difference one wishes to detect (the non-null value of the difference) and $d^*$ = the value of the difference under the null hypothesis (usually 0).

The sample size needed to compare averages of measurements of two matched groups in a matched study is given by the formula $n = (Z_{\alpha/2} + Z_\beta)^2\,\sigma_d^2 / (\mu_2 - \mu_1)^2$ where $Z_{\alpha/2} = 1.96$ for 95% confidence, $\sigma_d$ = standard deviation of the differences, $\mu_1$ = average of group 1, and $\mu_2$ = average of group 2. If the correlation coefficient $\rho$ between measurements in the two groups is known, the variance of the differences is $2(1-\rho)\sigma^2$, where $\sigma$ is the standard deviation of the individual measurements, and the formula above becomes $n = 2(1-\rho)(Z_{\alpha/2} + Z_\beta)^2\,\sigma^2 / (\mu_2 - \mu_1)^2$.
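The two formulas for comparing means can be sketched in code. The example assumes 80% power and a two-sided 5% significance level by default. The function names and the inputs (standard deviation 10, difference 5) are hypothetical.

```python
import math
from statistics import NormalDist

def _z_sum_sq(alpha, power):
    """(Z_alpha/2 + Z_beta)^2 for a two-sided test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) ** 2

def n_two_independent_means(sd, diff, r=1.0, alpha=0.05, power=0.80):
    """Size of group 1 when comparing two independent means; r = n2 / n1."""
    return math.ceil((1 + 1 / r) * _z_sum_sq(alpha, power) * sd**2 / diff**2)

def n_matched_means(sd_diff, diff, alpha=0.05, power=0.80):
    """Number of matched pairs; sd_diff is the standard deviation of the differences."""
    return math.ceil(_z_sum_sq(alpha, power) * sd_diff**2 / diff**2)

print(n_two_independent_means(sd=10, diff=5))   # per group, equal group sizes
print(n_matched_means(sd_diff=10, diff=5))      # matched pairs
```

Raising r above 1 (more subjects in group 2) reduces the size needed in group 1, although the total sample grows.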

The values of $Z_{\alpha/2}$ usually used for various levels of significance are as follows: for α = 0.001 $Z_{\alpha/2}$ = 3.291, for α = 0.005 $Z_{\alpha/2}$ = 2.807, for α = 0.01 $Z_{\alpha/2}$ = 2.576, for α = 0.02 $Z_{\alpha/2}$ = 2.326, for α = 0.05 $Z_{\alpha/2}$ = 1.96, for α = 0.10 $Z_{\alpha/2}$ = 1.645 (Jennifer L. Kelsey et al., Methods in Observational Epidemiology, 2nd edition, OUP, New York and Oxford, 1996).
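These tabulated values can be reproduced from the standard normal distribution rather than looked up. The short sketch below uses Python's standard library; the function name is an illustrative choice.

```python
from statistics import NormalDist

def z_two_sided(alpha):
    """Standard normal deviate that leaves alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.001, 0.005, 0.01, 0.02, 0.05, 0.10):
    print(f"alpha = {alpha}: Z = {z_two_sided(alpha):.3f}")
```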

STRATIFIED RANDOM SAMPLE

MULTI-STAGE RANDOM SAMPLE

CLUSTER SAMPLE

D. SAMPLE SIZE FOR INFERENCE ON 2 SAMPLE PROPORTIONS
SIMPLE RANDOM SAMPLES
The sample size needed to compare percentages (proportions) in two independent samples is given by $n_1 = [Z_{\alpha/2}\{(1+1/r)\,\bar{p}(1-\bar{p})\}^{1/2} + Z_\beta\{p_1(1-p_1) + p_2(1-p_2)/r\}^{1/2}]^2 / (p_1 - p_2)^2$ where $Z_{\alpha/2} = 1.96$ for 95% confidence, $Z_\beta = [\{n_1 d^2 r\} / \{(r+1)\,\bar{p}(1-\bar{p})\}]^{1/2} - Z_{\alpha/2}$ with $d = p_1 - p_2$, $\bar{p} = (p_1 + p_2)/2$ or the weighted $\bar{p} = \{p_1 + r\,p_2\} / \{1 + r\}$, $p_1$ = proportion in group 1, $p_2$ = proportion in group 2, $r = n_2/n_1$, $n_2$ = number in group 2, and $n_1$ = number in group 1.

In the case of an unmatched case-control study, the formula is modified to become $n_1 = (Z_{\alpha/2} + Z_\beta)^2\,[1/\{p_1(1-p_1)\} + 1/\{r\,p_2(1-p_2)\}] / \{\ln(OR)\}^2$ where $Z_{\alpha/2} = 1.96$ for 95% confidence, $Z_\beta$ = the standard normal deviate corresponding to the desired power, $p_1$ = proportion in group 1, $p_2$ = proportion in group 2, $r = n_2/n_1$, $n_2$ = number in group 2, $n_1$ = number in group 1, and OR = expected odds ratio.
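The comparison of two independent proportions can be illustrated with a short calculation. The sketch assumes the formula is the usual normal-approximation form, with the pooled proportion under the null hypothesis and the separate proportions under the alternative, and no continuity correction. The function name and the proportions (0.30 versus 0.20) are hypothetical.

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, p2, r=1.0, alpha=0.05, power=0.80):
    """Size of group 1 when comparing two independent proportions; r = n2 / n1."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + r * p2) / (1 + r)   # weighted average proportion
    term_null = z_a * math.sqrt((1 + 1 / r) * p_bar * (1 - p_bar))
    term_alt = z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2) / r)
    return math.ceil((term_null + term_alt) ** 2 / (p1 - p2) ** 2)

print(n_two_proportions(0.30, 0.20))          # equal group sizes
print(n_two_proportions(0.30, 0.20, r=2.0))   # twice as many in group 2
```

Small differences between proportions drive the sample size up quickly because the difference enters the denominator squared.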

The relation between $p_1$ and $p_0$ (the proportion in the reference group, written $p_2$ above) differs according to the effect measure being used. If the effect measure is the odds ratio (OR), the relation is $p_1 = \{p_0\,OR\} / \{1 + p_0(OR - 1)\}$. If the effect measure is the risk ratio (RR), the relation is $p_1 = p_0\,RR$.
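These two conversions are simple enough to check numerically. The snippet below is an illustrative sketch with hypothetical function names and a reference proportion of 0.20.

```python
def p1_from_odds_ratio(p0, odds_ratio):
    """Proportion in the index group implied by p0 and an odds ratio."""
    return p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))

def p1_from_risk_ratio(p0, risk_ratio):
    """Proportion in the index group implied by p0 and a risk ratio."""
    return p0 * risk_ratio

p0 = 0.20
p1_or = p1_from_odds_ratio(p0, 2.0)
p1_rr = p1_from_risk_ratio(p0, 2.0)

# Round trip: recover the odds ratio from the two proportions.
recovered_or = (p1_or / (1 - p1_or)) / (p0 / (1 - p0))
print(p1_or, p1_rr, recovered_or)
```

For the same numerical value, an odds ratio implies a smaller $p_1$ than a risk ratio whenever the outcome is not rare.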

The values of $Z_{\alpha/2}$ usually used for various levels of significance are as follows: for α = 0.001 $Z_{\alpha/2}$ = 3.291, for α = 0.005 $Z_{\alpha/2}$ = 2.807, for α = 0.01 $Z_{\alpha/2}$ = 2.576, for α = 0.02 $Z_{\alpha/2}$ = 2.326, for α = 0.05 $Z_{\alpha/2}$ = 1.96, for α = 0.10 $Z_{\alpha/2}$ = 1.645 (Jennifer L. Kelsey et al., Methods in Observational Epidemiology, 2nd edition, OUP, New York and Oxford, 1996).

STRATIFIED RANDOM SAMPLE

MULTI-STAGE RANDOM SAMPLE

CLUSTER SAMPLE

E. SAMPLE SIZE FOR EXPERIMENTAL STUDIES
CLINICAL TRIALS
Formulas for suitable sample sizes for clinical trials are complicated. Recourse is often made to rule-of-thumb estimates such as the following. The 50:50 rule of thumb for counted outcomes (discrete events) says that for an 80% chance of detecting a 50% relative reduction in event rate, at least 50 events are needed in the control group. The rule of thumb for measured outcomes states that the sample size in each group is approximated by $16\,(s/d)^2$ where s = the standard deviation of individual measurements in each group and d = minimum difference in average measurement that needs to be detected.
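The rule of thumb for measured outcomes can be compared with the formula for two independent means. The sketch below assumes 80% power and a two-sided alpha of 0.05, under which the multiplier 2(Z_{α/2} + Z_β)² is close to 16. The function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def rule_of_thumb_n(sd, diff):
    """Approximate size of each group by the 16 (s/d)^2 rule of thumb."""
    return math.ceil(16 * (sd / diff) ** 2)

def exact_multiplier(alpha=0.05, power=0.80):
    """The multiplier 2 (Z_alpha/2 + Z_beta)^2 that the constant 16 approximates."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return 2 * (z_a + z_b) ** 2

print(rule_of_thumb_n(sd=10, diff=5))   # rule-of-thumb size per group
print(round(exact_multiplier(), 2))     # multiplier behind the rule
```

Because 16 slightly exceeds the exact multiplier, the rule of thumb errs on the side of a marginally larger sample.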

In a clinical trial comparing outcome as proportions in 2 groups, we use the formula for comparison of 2 proportions that has been discussed above. The experimenter has to state the following: the alpha level, the study power, and the outcome difference that should be detected by the study. Alpha is usually set at 0.05 or 5%. Study power is usually set at 80% (a beta level of 0.2).

In a clinical trial comparing outcome as means in 2 groups, we use the formula for comparison of 2 means as discussed before. 

LABORATORY EXPERIMENTS
