
200211P - CHOOSING STUDY SUBJECTS: SPECIFICATIONS, SAMPLING, AND RECRUITMENT


Presented at the Clinical Investigator Program (CIP) Seminars 2020 held at College of Medicine, King Saud University, Riyadh on 11 February 2020 by Professor Omar Hasan Kasule Sr. MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Member of the Institutional Review Board, King Fahad Medical City


QUESTIONS and CONCLUSIONS

Substantive question,

Statistical question,

Data collection and analysis,

Statistical conclusion,

Substantive conclusion.


The SCIENTIFIC METHOD and HYPOTHESES

Research without a hypothesis is ‘fishing’.

A hypothesis is a sophisticated guess based on logic or prior knowledge.

Null hypothesis.

Alternative hypothesis.

Prior and posterior probabilities. 


STUDY VARIABLES

Hypothesis tells us what variables to look for.

Variables tell us what population to study.

Three types of variables: independent, dependent, and confounding.

Variables can be qualitative or quantitative (discrete and continuous).

The main objective of scientific research is to establish causality.

Humans manipulate causal relations for good or for bad. 


CONCEPT of POPULATION

The word population in statistical usage does not refer to human populations only. It is defined as a set of objects with a common observable characteristic or attribute. 

Population is used in a general sense to refer to any collection of people or objects of interest being studied.

The term population can be used to refer to observations or events. For example, repeated measurements of the same person constitute a population of observations. 

The individual objects, states, or events that are members of the population are called elements.

We talk of population elements to refer to members of the population and sample elements to refer to members of the sample.

Each element usually has more than one attribute that is measured or counted during the study.


TYPES of POPULATIONS: STUDY and TARGET POPULATIONS - 1

There are two types of populations: the study population and the target population.

The sample is selected from the study population (population of interest). 

The study population is definable in an exact way and is part of the target population. 

The study population, also called the represented population, is the one to which generalizations from the sample can be legitimately made.


TYPES of POPULATIONS: STUDY and TARGET POPULATIONS - 2

The study population and the target population differ because of inadequate sampling or because some units of the target population are inaccessible for sampling.

The relation between the study and target populations need not be defined as a probability. It is not a condition that the study population is representative of the target population.

The target population need not be explicitly stated. It is understood in general terms. It may even be completely theoretical. 

Generalizations are ultimately made to the target population but there is a leap of faith involved from the study to the target population. 

The population studied may be finite or infinite. A list is a finite population whose elements are numbered. 


TYPES of POPULATIONS: STUDY and TARGET POPULATIONS - 3

Statistical inference is valid strictly only for the study population. We are sure that the sample is representative of the study population because it was selected by probability. 

The findings of the sample are internally valid as far as the study population is concerned. 

It is, however, not valid to generalize the findings from this one sample directly to the target population.

Several valid studies of samples are needed to make inferences on the target population.

It is, therefore, true to say that ultimately and in the long-run statistical inferences are about the target population. 


CONCEPT of A SAMPLE

A sample is a subset of the population selected to obtain information on the population.

The term sample can have two meanings: the intended sample and the achieved sample. The intended sample is the one that an investigator plans to select from a given population. Practical limitations usually prevent all the intended elements from being selected, so the achieved sample is smaller than the intended sample.


CONCEPT of A SAMPLE, con’t…

A sample should be a representative subset of the population. Failure to appreciate this can lead to disasters.

In 1936 the Literary Digest sent out 10 million ballots, of which 2.3 million were returned, and predicted that Alfred M. Landon would win the US Presidency; instead, Franklin Roosevelt won with about 61% of the popular vote.

The pollsters' mistake was to draw their sample largely from telephone directories. The sample therefore represented the higher socio-economic classes rather than the whole US voting population, because in those days telephone ownership was largely limited to the well-off.


SAMPLING PLAN and SAMPLING DESIGN

The term sampling plan is used to refer to the whole process of selecting a sample.

The term sampling design is used to refer to both the sampling plan and the estimation methods.

Survey design covers questionnaire design, methods of interview, recruitment and training of interviewers, data analysis and management, and assessment of precision and accuracy.


SAMPLING FRAME

Sampling starts by defining a sampling frame (list of individuals to be sampled).

Then specific methods are used to select the sample from the defined population.

The sampling units are the people or objects to be sampled. A sampling frame can be looked at as the enumeration of the population by sampling units.

Sampling can be of three broad types: convenience, quota, and random sampling.

The main problem in sampling is obtaining the sampling frame. Once the sampling frame is assembled, sampling is fairly easy.


The RATIONALE for STUDYING SAMPLES 

Data can be collected from the whole population as in the census (all surveyed) or from a sample survey (including some members of the population). Most biostatistics is the study of samples and not target or study populations. There are only a few exceptional cases when the whole population is studied.

Samples and not whole populations are studied for three main reasons. 

o Study of populations is costly and logistically difficult. More manpower is needed and more time is spent in carrying out studies of populations. 

o Due to logistic considerations, it is easier to be more accurate when studying a small sample than when studying the whole population. 

o Some populations are hypothetical. It is not possible to identify or enumerate all their members. There is no way of studying them except by sampling. 


OBJECTIVES of SAMPLING: DESCRIPTIVE & ANALYTIC

Estimation of population parameters (descriptive) e.g. incidence, prevalence.

Estimation of total population (descriptive): it is possible, by using regression techniques, to estimate the total population from the observed characteristics of the sample.

Inference on population from probability samples (analytic): causal relations. 


PROBABILITY and SAMPLING

Sampling is the connecting bridge between probability theory and statistics. The assumption of random sampling is necessary for probability computations to be valid.

Data from a sample help us understand the underlying probabilistic events in the population. The sampling distribution of a statistic is itself a probability distribution, described by a probability mass function (discrete case) or a probability density function (continuous case).

Probability theory provides the basis for making inferences from sample data to the population.

Probability theory also enables the assessment of precision and avoidance of bias in sample selection.

All the above is true if the sample is selected according to the laws of probability i.e. the sample is a valid representative of the population.

A probability or random sample is needed to infer correctly about the population. Other types of samples are non-representative and will give biased information about the underlying population.


DETERMINISTIC vs PROBABILITY MODELS

The deterministic model requires all physical factors to be under the investigator's control in order to reach a true, unbiased conclusion about a given outcome, which is logistically difficult.

The probability model is simpler and more convenient: a random selection is made, and the distribution of the unknown physical factors in the sample is assumed to mirror that in the population.

Even if the representation of the population is not perfect we are at least assured that investigator bias did not influence the sample structure.


The RANDOM (Probability) SAMPLE 

A random sample is the best and most scientific of all samples. In the simplest case every element has the same inclusion probability; in other words, every element has an equal chance of being selected, because selection is purely by chance.

In a self-weighting sample, inclusion probabilities are the same for all elements. In some situations, sample elements are deliberately selected with unequal inclusion probabilities in order to obtain more precise estimates.
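When inclusion probabilities differ, each sampled element has to be weighted to avoid bias. The slides do not name an estimator; a standard choice, swapped in here as an illustration, is the Horvitz-Thompson estimator of a population total, which weights each value by the reciprocal of its inclusion probability. The function name and all numbers below are hypothetical.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total from a sample in which unit i was selected
    with known inclusion probability pi_i: each value is weighted by 1 / pi_i."""
    if len(values) != len(inclusion_probs):
        raise ValueError("need one inclusion probability per sampled value")
    if any(not 0 < p <= 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must be in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))

# Self-weighting design (hypothetical): n = 5 drawn from N = 500, so every
# pi = 0.01 and the estimate is simply N times the sample mean.
equal = horvitz_thompson_total([3, 5, 4, 6, 2], [0.01] * 5)

# Unequal design (hypothetical): one small group over-sampled with pi = 0.5,
# the others selected with pi = 0.1; weighting by 1 / pi corrects for this.
unequal = horvitz_thompson_total([2, 2, 8], [0.1, 0.1, 0.5])
print(equal, unequal)
```

In the self-weighting case all weights are equal, which is why such samples can be analysed without explicit weights.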


The RANDOM (Probability) SAMPLE, con’t…

The selection probability can be kept constant throughout the process of sample selection or can be allowed to vary according to pre-determined criteria.

Random selection does not guarantee 100% representativeness, especially for small samples. As a rough rule of thumb, samples of more than about 60 elements are usually reasonably representative.

A sample may initially be selected as random and representative, but by the time the data are collected it may no longer be random or representative because of differential non-response.


SAMPLING WITH / WITHOUT REPLACEMENT

Sampling with replacement is described by the binomial random variable, and sampling without replacement by the hypergeometric random variable.

Both types of sampling satisfy conditions of random sampling.

In practice most sampling is without replacement. There is practically no difference between the two types of sampling if the sample is small compared to the study population.
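The claim that the two schemes barely differ when the sample is small relative to the population can be checked numerically. This sketch compares the binomial and hypergeometric probabilities of finding k "successes" (here, hypothetical smokers) in a sample of 10; the populations are invented for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): n draws WITH replacement, each a 'success' with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeometric_pmf(k, n, N, K):
    """P(X = k): n draws WITHOUT replacement from N units, K of which are 'successes'."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

n = 10
# Large hypothetical study population: 2,000 smokers among 10,000 people.
for k in range(4):
    print(k, round(binomial_pmf(k, n, 0.2), 4), round(hypergeometric_pmf(k, n, 10_000, 2_000), 4))

# Small hypothetical population: 4 smokers among 20 people (sample is half of it).
print(round(binomial_pmf(2, n, 0.2), 4), round(hypergeometric_pmf(2, n, 20, 4), 4))
```

With 10,000 people the two columns agree closely; with only 20 people the probabilities diverge, because each draw without replacement noticeably changes the composition of what remains.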


TYPES of SAMPLING PLANS

Simple random sampling is used when the population is approximately homogeneous. Its estimators are unbiased, i.e. on average they neither overestimate nor underestimate the population parameters.

Stratified random sampling is used when the population can be divided into groups that are each approximately homogeneous.

Cluster sampling is used when the population can be divided into similar groups that need not be internally homogeneous.


SIMPLE RANDOM SAMPLING

This is the simplest type of random sampling. In simple random sampling, any sample of size n has an equal chance of being selected from the population.

In simple random sampling, all units in the population have an equal chance of being selected into the sample.

Simple random sampling eliminates personal bias because, unlike in convenience or quota sampling, the researcher has no way of pre-determining that a particular member of the population will be included in the sample.
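A minimal sketch of drawing a simple random sample from a sampling frame, assuming the frame is available as a Python list of hypothetical ID strings. `random.Random.sample` selects without replacement, so every subset of size n is equally likely; the seed is only there to make the draw reproducible and auditable.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame; every subset of size n
    is equally likely, so the investigator cannot steer who is included."""
    rng = random.Random(seed)       # seed only to make the draw reproducible
    return rng.sample(frame, n)     # sampling without replacement

frame = [f"ID{i:03d}" for i in range(1, 501)]   # hypothetical frame of 500 IDs
chosen = simple_random_sample(frame, 20, seed=2020)
print(chosen[:5])
```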


SIMPLE RANDOM, con’t…

Much of statistics is concerned with estimating the magnitude of the sampling error. The sampling error of the mean, of proportions, and of the variance can be computed if the underlying sampling was simple random. The magnitude of the sampling error measures the precision of the estimates, and knowledge of this precision is necessary to interpret inferential findings.

The accuracy of estimators of the simple random sample can be expressed as a function of sample size, population size, and probability characteristics. 
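As one concrete case of precision depending on sample size and population size, the estimated standard error of a sample mean under simple random sampling without replacement is sqrt((1 - n/N) s²/n), where (1 - n/N) is the finite population correction. The function name and data below are illustrative assumptions.

```python
import math
import statistics

def se_mean_srs(sample_values, population_size):
    """Estimated standard error of the sample mean under simple random sampling
    without replacement: sqrt((1 - n/N) * s^2 / n), where (1 - n/N) is the
    finite population correction."""
    n = len(sample_values)
    s2 = statistics.variance(sample_values)      # sample variance, divisor n - 1
    fpc = 1 - n / population_size
    return math.sqrt(fpc * s2 / n)

values = [4, 6, 8, 10, 12]                         # hypothetical measurements
print(se_mean_srs(values, population_size=50))     # small population: correction matters
print(se_mean_srs(values, population_size=10**6))  # huge population: close to s / sqrt(n)
```

When the sample is the whole population (n = N) the standard error is zero, which matches the idea that a census has no sampling error.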


STRATIFIED RANDOM SAMPLING

In this type of sampling the whole population is divided into groups called strata. It forces the investigator to select some elements from each of the strata thus achieving some sort of balance for the whole sample. 

A pre-determined proportion or fraction of each stratum is randomly selected into the sample. Selection is carried out separately in each stratum using random selection.

The sampling fraction from each stratum may be the same or may vary from stratum to stratum. The variation of sampling fractions enables deliberate over-sampling or under-sampling of some strata. 


STRATIFIED RANDOM SAMPLING, con’t…

The strata may be defined qualitatively or quantitatively. Commonly used strata are SES (low / middle / high), sex (male / female), age (young / old), race or ethnic group, occupational groups, and geographical units.

A stratified sample usually has lower variance, and is therefore more precise, than a simple random sample of the same size. The gain comes from the strata being more homogeneous than the whole population.
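A minimal sketch of stratified random sampling, assuming proportional allocation (each stratum's share of the sample equals its share of the population) and separate random selection within each stratum. The estimator weights each stratum mean by N_h / N, which also works when some strata are deliberately over- or under-sampled. Stratum names, sizes, and helper names are hypothetical.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Split total sample size n across strata in proportion to stratum size,
    using largest-remainder rounding so the parts add up to exactly n."""
    N = sum(stratum_sizes.values())
    exact = {h: n * size / N for h, size in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}
    shortfall = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:shortfall]:
        alloc[h] += 1
    return alloc

def stratified_sample(frame_by_stratum, alloc, seed=None):
    """Draw a separate simple random sample of alloc[h] units within each stratum h."""
    rng = random.Random(seed)
    return {h: rng.sample(units, alloc[h]) for h, units in frame_by_stratum.items()}

def stratified_mean(stratum_sizes, stratum_means):
    """Population mean estimate: stratum sample means weighted by N_h / N.
    Valid for proportional or deliberately unequal sampling fractions."""
    N = sum(stratum_sizes.values())
    return sum(stratum_sizes[h] / N * stratum_means[h] for h in stratum_sizes)

# Hypothetical frame of 1,000 people stratified by SES.
frame = {"low": list(range(600)), "middle": list(range(600, 900)), "high": list(range(900, 1000))}
sizes = {h: len(units) for h, units in frame.items()}
alloc = proportional_allocation(sizes, 50)     # {'low': 30, 'middle': 15, 'high': 5}
sample = stratified_sample(frame, alloc, seed=1)
```

To over-sample a small stratum, pass a hand-chosen `alloc` (for example more units from "high") to `stratified_sample`; `stratified_mean` still weights by population share.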


SYSTEMATIC RANDOM SAMPLING

This type of sampling is used when there is an ordered list, i.e. the population is arranged in some definite and known order. The decision is then made to include every nth unit in the sample, where n (the sampling interval) may be any number. The first unit is selected at random from among the first n units, and selection then proceeds according to the pre-defined pattern.

Systematic sampling is less efficient and less accurate than simple random sampling if the sampling interval coincides with a periodic pattern of natural variation in the list.

It can be as efficient as, or more efficient than, simple random sampling if the sampling interval does not match any periodicity in the list.

This type of sampling is invalid if the list has a natural order that repeats exactly every n elements, where n is the sampling interval.


SYSTEMATIC RANDOM SAMPLING, con’t…

The random starting point for systematic sampling can be chosen using tables of random numbers, which are widely available.

Modern computers can be programmed to select a sample of any size following a defined systematic pattern. 

Systematic sampling has the advantage that it is quick and is easy to use. The disadvantage of systematic sampling is that it requires assembling a complete sampling frame. 
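A short sketch of programming a systematic sample: a random start within the first interval, then every interval-th unit. The hypothetical register below repeats the weekday every 7 entries, to show how an interval that matches the list's periodicity yields a biased sample while a different interval does not. The function and parameter names are assumptions.

```python
import random

def systematic_sample(frame, interval, seed=None):
    """Take every `interval`-th unit of an ordered list, starting from a randomly
    chosen position within the first interval (random start, fixed pattern)."""
    if not 1 <= interval <= len(frame):
        raise ValueError("interval must be between 1 and the size of the frame")
    rng = random.Random(seed)
    start = rng.randrange(interval)     # random start in 0 .. interval - 1
    return frame[start::interval]

# Hypothetical register ordered by date: the weekday repeats every 7 entries.
register = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"] * 10
print(set(systematic_sample(register, 7, seed=3)))   # interval matches the cycle: one weekday only
print(set(systematic_sample(register, 5, seed=3)))   # interval does not match: all weekdays appear
```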


MULTISTAGE SAMPLING - 1

This is a random sample selected in 2 or more stages. The sample selected at the second stage is a sub-sample of that selected at the first stage. 

An example of a 5-stage multi-stage sampling may involve the following administrative units in descending order: city, neighborhood, block, household, and individual. 

This is done for example when a random sample is selected from each of the 2 gender categories, male and female. Then random samples are selected from each age category of each gender category. 

If households are sampled first, they are the primary sampling units (PSUs). Household members then selected randomly from each household are the secondary sampling units (SSUs).

We can talk of the first stage inclusion probability and the conditional inclusion probability at the second stage. 
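A sketch of a two-stage sample, assuming simple random selection at both stages: m of M households first, then up to k members within each selected household. An element's overall inclusion probability is the first-stage probability times the conditional second-stage probability, so members of large households end up with smaller probabilities. Household data and names are hypothetical.

```python
import random

def two_stage_sample(psus, m, k, seed=None):
    """Stage 1: simple random sample of m primary sampling units (e.g. households).
    Stage 2: simple random sample of up to k secondary units (members) within each.
    Returns (member, overall inclusion probability) pairs."""
    rng = random.Random(seed)
    first_stage = rng.sample(sorted(psus), m)
    p_first = m / len(psus)                       # first-stage inclusion probability
    selected = []
    for psu in first_stage:
        members = psus[psu]
        take = min(k, len(members))
        p_second = take / len(members)            # conditional second-stage probability
        for person in rng.sample(members, take):
            selected.append((person, p_first * p_second))
    return selected

# Hypothetical households and their members.
households = {"H1": ["a", "b", "c", "d"], "H2": ["e", "f"], "H3": ["g"], "H4": ["h", "i", "j"]}
for person, prob in two_stage_sample(households, m=2, k=1, seed=7):
    print(person, prob)
```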


MULTISTAGE SAMPLING - 2

The resulting multi-stage sample has the advantage of being balanced with respect to gender, age, or household characteristics.

Multi-stage sampling produces less precise estimates of population parameters than simple random sampling of the same size.

Multi-stage sampling saves time and money, which makes it cheaper than simple random sampling.


MULTI-STAGE SAMPLING - 2, con’t…

Multi-stage sampling’s convenience is that it does not require prior enumeration of the entire sampling frame before the start of the sampling process.

Multistage sampling is especially convenient when the complete sampling frame is not known.

Multi-stage sampling has the great advantage of ensuring a balanced representation of the groups that may not occur with simple random sampling.

It is possible to have a sampling scheme that combines stratified with 2-stage sampling.


CLUSTER SAMPLING - 1

Cluster sampling is easy and cheap but less precise. Instead of individuals, groups of individuals (clusters) are used as sampling units. For example, households may be sampled instead of individuals.

The clusters may be natural or artificial. Clusters are normally selected as natural sub-groupings of the population.

A random sample of clusters is selected and all elements of the cluster are included in the study sample.
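A minimal sketch of one-stage cluster sampling as described above: clusters are chosen by simple random sampling and every element of each chosen cluster enters the study sample. The household data and function name are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Select m whole clusters by simple random sampling; every element of each
    selected cluster enters the study sample."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {c: list(clusters[c]) for c in chosen}

# Hypothetical households as clusters of individuals.
households = {"H1": ["a", "b", "c"], "H2": ["d", "e"], "H3": ["f"], "H4": ["g", "h", "i", "j"]}
study_sample = cluster_sample(households, m=2, seed=11)
print(study_sample)
```

Only a list of clusters is needed, not a list of individuals, which is the practical advantage the slides emphasize.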


CLUSTER SAMPLING - 1, con’t…

Cluster sampling can be viewed as a form of a simple random sampling of clusters and not individual sampling units.

Cluster sampling can also be looked at as a form of 2-stage sampling in which all elements of the groups drawn in the first stage are included in the study sample.

Cluster sampling proceeds by selecting geographical units like districts or zip codes. Then a house is selected at random in each unit. A cluster of a given size is then formed around the index house.


CLUSTER SAMPLING - 2

Sophisticated methods for this selection have been developed.

A researcher may walk in a straight line in a pre-determined direction while counting until a pre-determined number of houses is counted. These houses together with the index house will then constitute the cluster.

Similar clusters are formed in the other zip codes and members of the households are interviewed as study subjects.

Cluster sampling has several advantages: (a) no need to have a complete sampling frame for the whole population. (b) It is easy, quick, and cheap. (c) Clusters can be selected from the more accessible areas. 


CLUSTER SAMPLING - 2, con’t…

Cluster sampling has some disadvantages. (a) It is not strictly random, since units within a cluster are chosen by proximity rather than by chance. (b) It is less precise than a simple random sample because units selected within each cluster resemble one another. A cluster sample therefore contains more similarity than there is in the actual population.

Cluster sampling is used in studies of immunization coverage and in emergency situations.

The sample size for cluster sampling is computed as for a simple random sample and then multiplied by a design factor (also called the design effect) to account for clustering. The design factor is obtained from previous studies.
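The inflation step can be sketched as follows. Multiplying by the design effect and rounding up follows the slide; the optional formula for estimating the design effect, deff = 1 + (m - 1)ρ with m the average cluster size and ρ the intracluster correlation, is Kish's standard approximation and is added here as an assumption, for use when no value from a previous study is available. Inputs are hypothetical.

```python
import math

def design_effect(avg_cluster_size, icc):
    """Kish's approximation: deff = 1 + (m - 1) * rho, where m is the average
    number of subjects per cluster and rho the intracluster correlation."""
    return 1 + (avg_cluster_size - 1) * icc

def cluster_sample_size(n_srs, deff):
    """Inflate the simple-random-sample size by the design effect and round up."""
    return math.ceil(n_srs * deff)

# Hypothetical inputs: n_srs = 385 from a simple-random-sample formula,
# design effect of 2 taken from a previous survey.
print(cluster_sample_size(385, 2.0))
```

The approximation makes the trade-off visible: larger clusters or more similar members (higher ρ) raise the design effect and therefore the required sample size.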


CONVENIENCE SAMPLING

Convenience or casual sampling is subjective.

It is according to the whims of the investigator. 

There is no particular concern for objectivity or representativeness. It is purely subjective.


QUOTA SAMPLING

A quota sample is a representative sample in the sense that it is deliberately chosen to have the characteristics of the population. 

The number to be selected from each category is fixed in advance.

Each interviewer is given instructions about certain characteristics such as age, sex, SES and is asked to select fixed numbers for each category corresponding to the category's proportion in the population.

This method is systematic at the level of the investigator but very subjective at the level of the interviewer.


QUOTA SAMPLING, con’t…

Bias is likely in quota sampling. The method cannot ensure that the sample is representative of the population.

It is also not possible to think of all the relevant categories and their classifications in advance to enable correct categorization and determination of the proportions to be selected from each category. It is too expensive to carry out a preliminary study for the sole purpose of determining the categories.


RANDOMIZATION

Randomization is used in experimental studies, e.g. clinical trials.

Randomization is not random sampling.

In randomization you start with one group and randomly divide it up into two or more groups that are compared.

You must choose your sample before randomization.
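To make the difference from random sampling concrete, this sketch starts from an already-enrolled group and randomly divides it into arms: the list is shuffled and subjects are dealt to the arms in turn, giving groups of (near-)equal size. Subject IDs and arm names are hypothetical.

```python
import random

def randomize(subjects, arms=("treatment", "control"), seed=None):
    """Randomly divide an already-selected sample into arms of (near-)equal size:
    shuffle the list, then deal subjects to the arms in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {arm: shuffled[i::len(arms)] for i, arm in enumerate(arms)}

enrolled = [f"S{i:02d}" for i in range(1, 21)]    # hypothetical enrolled sample of 20
groups = randomize(enrolled, seed=5)
print(groups)
```

Nothing here decides who is in the study; that was fixed when `enrolled` was chosen.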


EPIDEMIOLOGICAL SAMPLING

Epidemiological samples involve random sampling of human populations.

There are basically three types of sampling schemes: 

o cross-sectional, 

o case-control, 

o follow-up or cohort.


RECRUITMENT for CROSS-SECTIONAL  STUDIES

Sampling methods can be simple random sampling, cluster sampling, systematic sampling, and multi-stage sampling. The sample size is determined using specific formulas.

Cases are identified from clinical examinations, interviews, or clinical records.

Data is collected by clinical examination, questionnaires, personal interview, and review of clinical records.
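The slides do not give the sample size formulas. One widely used formula for estimating a prevalence p to within an absolute margin d is n = z²p(1 - p)/d², optionally reduced with a finite population correction for small study populations; it is shown here as an assumed example, with hypothetical inputs.

```python
import math
from statistics import NormalDist

def sample_size_prevalence(p, d, confidence=0.95, population_size=None):
    """Subjects needed to estimate a prevalence p to within +/- d (absolute):
    n0 = z^2 p (1 - p) / d^2, with an optional finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z**2 * p * (1 - p) / d**2
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

print(sample_size_prevalence(0.5, 0.05))                        # conservative p = 0.5
print(sample_size_prevalence(0.2, 0.05))                        # expected prevalence 20%
print(sample_size_prevalence(0.5, 0.05, population_size=1000))  # small study population
```

Using p = 0.5 gives the largest n for a given d, so it is a safe choice when no prior estimate of prevalence exists. A cluster design would then inflate this figure by the design effect.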


RECRUITMENT for CASE CONTROL STUDIES

The source population for cases and controls must be the same. 

Cases are sourced from clinical records, hospital discharge records, disease registries, data from surveillance programs, employment records, and death certificates.

Cases are either all cases of a disease or a sample thereof.

Incident (new) cases are preferred to prevalent cases.

Controls must come from the same population base as the cases and must resemble the cases in everything except having the disease being studied.


RECRUITMENT for CASE CONTROL STUDIES, con’t…

Information comparability between the case series and the control series must be assured.

Hospital, community, neighborhood, friend, dead, and relative controls are used.

There is little gain in efficiency beyond a 1:2 case-control ratio unless control data is obtained at no cost.
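The diminishing return from extra controls can be illustrated with a commonly quoted approximation (assumed here, not taken from the slides): the efficiency of a design with r controls per case, relative to one with unlimited controls, is about r/(r + 1).

```python
def relative_efficiency(r):
    """Approximate efficiency of a design with r controls per case, relative to
    one with unlimited controls: r / (r + 1)."""
    return r / (r + 1)

for r in range(1, 6):
    gain = relative_efficiency(r) - relative_efficiency(r - 1)
    print(f"1:{r}  efficiency {relative_efficiency(r):.2f}  gain from last control {gain:.2f}")
```

Each additional control per case adds less than the one before, which is why extra controls are mainly worthwhile when their data are cheap.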


RECRUITMENT for FOLLOW UP STUDIES

A sample is taken from the exposed and another sample is taken from the unexposed. Both the exposed and unexposed samples are followed for the appearance of the disease.

The study cohort is drawn from special exposure groups, such as factory workers, or from groups offering special resources, such as health insurance subscribers. The exposed and unexposed groups should otherwise be comparable.


RECRUITMENT for RANDOMIZED CLINICAL TRIALS

The study protocol describes the objectives, background, sample, treatments, data collection and analysis, informed consent, regulatory requirements, and drug ordering.

Trials may be single-center or multicenter, single-stage or multi-stage, factorial, or crossover. You start with defining a study sample.

The aim of randomization in controlled clinical trials is to make sure that there is no selection bias and that the two series are as alike as possible by randomly balancing confounding factors.


RECRUITMENT for RANDOMIZED CLINICAL TRIALS, con’t…

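Equal allocation in randomization is usually the most efficient design for a given total sample size.

Methods of randomization include alternate cases (strictly a quasi-random method, since the next allocation can be predicted) and sealed, serially numbered envelopes.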

Stratified randomization is akin to the block design of experimental studies.

Randomization may fail to balance the groups when samples are small, and it does not always ensure correct conclusions.
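One common way to implement stratified randomization, assumed here rather than taken from the slides, is permuted-block randomization within each stratum: each block contains every arm equally often in a random order, so the groups stay balanced within strata even in small trials. The block size, arm labels, and strata below are hypothetical; in practice the resulting list would be concealed, for example in sealed, serially numbered envelopes.

```python
import random

def permuted_block_sequence(n_blocks, block_size=4, arms=("A", "B"), seed=None):
    """Allocation list made of randomly permuted blocks. Each block contains every
    arm equally often, so group sizes are balanced after each complete block."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence

def stratified_block_randomization(strata, n_blocks, block_size=4, seed=None):
    """One independent permuted-block list per stratum (e.g. per centre or sex)."""
    rng = random.Random(seed)
    return {s: permuted_block_sequence(n_blocks, block_size, seed=rng.randrange(2**32))
            for s in strata}

lists = stratified_block_randomization(["male", "female"], n_blocks=5, seed=2020)
print(lists["male"][:8])
```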


INCLUSION and EXCLUSION CRITERIA

These criteria define the target population narrowly, essentially to focus on subjects who will have the variables related to the hypothesis.

Demographic criteria such as age and gender can be used liberally, but other criteria must be used with care because they may be confounders and can lead to selection bias.

For cases, disease criteria are needed to limit the study to the diagnosis of interest.