By Professor Omar Hasan Kasule Sr., MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Consultant on Research, Almaarefa University
Empirical scientific research consists of observing and
measuring objects and events in order to reach generalizable knowledge.
Generalization would be easy if every object and event could be observed
or measured, but this is logistically impossible. For example, to reach
generalizable knowledge on the distribution of blood pressure in the Saudi
population, we would have to locate and measure each one of the over 40 million
citizens and residents, which is an impossible task. Even if, for argument’s sake,
we could reach all 40+ million, it would be impossible to recruit enough
researchers, travel to all the cities, valleys and mountains of the country, and
measure blood pressure in the same standard way. Data collected on so many people
would contain many mistakes and would be of low quality.
We therefore have no choice but to study a small group of people, let us say
5000, in the hope that data can be collected correctly from them and then
generalized to the whole country.
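The idea that a carefully measured sample can stand in for the whole population can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population below is synthetic, its size (200,000) is chosen to keep the run fast, and the blood pressure mean of 120 and standard deviation of 15 are invented for illustration, not Saudi data. The function names are hypothetical.

```python
import random
import statistics

def simulate_population(size, seed=0):
    """Build a synthetic list of blood pressure readings.

    The mean (120) and standard deviation (15) are assumed values
    used only for illustration; they are not real survey figures.
    """
    rng = random.Random(seed)
    return [rng.gauss(120, 15) for _ in range(size)]

def sample_mean(population, n, seed=1):
    """Draw n members at random without replacement and average them."""
    rng = random.Random(seed)
    return statistics.fmean(rng.sample(population, n))

population = simulate_population(200_000)
true_mean = statistics.fmean(population)    # known only because the data is simulated
estimate = sample_mean(population, 5000)    # what a study of 5000 people would report

print(f"population mean: {true_mean:.1f}, sample mean: {estimate:.1f}")
```

In a real study the population mean is unknown; the simulation only makes it visible so that the sample estimate can be compared against it.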
If we choose the sample from the whole population
randomly, i.e., so that each member has an equal chance of being in the sample, we say
that we have a representative or scientific sample. All researchers, therefore, aim
at random samples. Statistical formulas were developed for random samples and work
most efficiently with them. Selecting a random sample is not easy in
practice. We may use a computer to generate a sample from a database of the
population, if we have one. We may also choose a stratified random sample, which
means selecting a random sample separately in each group, let us say males and
females, and then combining them. We may also select the sample in stages
(multi-stage); for example, we pick 10,000 and then randomly pick 500 from them.
There are situations in which we cannot select individuals and we resort to selecting
groups or clusters. For example, we may select 10 houses at random and form a
cluster of 5 neighboring houses around each index house. Our sample will then
be the inhabitants of the index and surrounding houses.
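The four designs described above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a survey tool: the population of 20,000 people, the street of 1,000 houses, the equal allocation of 250 per stratum, and all function names are assumptions made for the example. "Neighboring" houses are taken to be those closest in position along a single street, which is one simple way to define a cluster.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Sample separately within each stratum, then combine."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    combined = []
    for key in sorted(strata):
        combined.extend(rng.sample(strata[key], n_per_stratum[key]))
    return combined

def multistage_sample(population, first_stage_n, second_stage_n, rng):
    """Pick a first-stage group, then sample randomly from within it."""
    first_stage = rng.sample(population, first_stage_n)
    return rng.sample(first_stage, second_stage_n)

def cluster_sample(house_ids, n_index, n_neighbors, rng):
    """Pick index houses at random and add the nearest houses around each.

    Houses are assumed to lie in order along one street, so distance is
    the difference in list position. Overlapping clusters are merged.
    """
    positions = range(len(house_ids))
    chosen = set()
    for index_pos in rng.sample(positions, n_index):
        nearest = sorted(positions, key=lambda p: abs(p - index_pos))
        # The first entry is the index house itself, followed by its neighbors.
        chosen.update(house_ids[p] for p in nearest[: n_neighbors + 1])
    return sorted(chosen)

rng = random.Random(42)  # fixed seed so the example is reproducible

# Hypothetical population: (id, sex) pairs.
people = [(i, "male" if i % 2 == 0 else "female") for i in range(20_000)]
houses = list(range(1000))

srs = simple_random_sample(people, 500, rng)
strat = stratified_sample(people, lambda p: p[1], {"male": 250, "female": 250}, rng)
multi = multistage_sample(people, 10_000, 500, rng)
cluster = cluster_sample(houses, 10, 5, rng)

print(len(srs), len(strat), len(multi), len(cluster))
```

The cluster sample returns houses rather than people; in a field study every inhabitant of the selected houses would then be measured.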
In many situations it is not possible to choose a
random sample, and we resort to the less accurate non-random or non-scientific
sampling. We shall discuss these non-random samples in the next article.