Presentation at a Training Program on Biostatistics for physician managers working in Public Health Administration, Qassim Province, on May 1, 2013, by Professor Omar Hasan Kasule
Sr MB ChB (MUK), MPH (Harvard), DrPH (Harvard). EM: omarkasule@yahoo.com
Constants and variables
A constant has only one unvarying value under all circumstances, for example π (pi) and c, the speed of light. A random variable can be qualitative (descriptive, with no intrinsic numerical value) or quantitative (with an intrinsic numerical value). A quantitative random variable results when numerical values are assigned to the results of measurement or counting. It is called a discrete random variable if the assignment is based on counting, and a continuous random variable if the assignment is based on measurement. A continuous random variable can take fractional and decimal values; a discrete random variable can take only whole numbers. The choice of statistical analysis technique depends on the type of variable.
Qualitative random variables
Qualitative variables (nominal, ordinal, and ranked) are attribute or categorical variables with no intrinsic numerical value. Nominal variables have no ordering (for example, blood group), ordinal variables have an ordering (for example, mild, moderate, severe), and ranked variables have their observations arrayed in ascending or descending order of magnitude.
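To make the three qualitative types concrete, here is a minimal Python sketch. The function name `rank` and the example data are hypothetical. Ties are assigned the average of the ranks they occupy, which is one common convention in rank-based methods; other tie rules exist.

```python
# Hypothetical illustration of nominal, ordinal, and ranked variables.

def rank(values):
    """Return ranks (1 = smallest) for a list of numbers, averaging ties."""
    # Visit observation positions in ascending order of their values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j across the run of observations tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) hold ranks i+1..j+1; tied values share their mean.
        avg = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Nominal: categories with no order; only membership matters.
blood_groups = {"A", "B", "AB", "O"}

# Ordinal: categories with a meaningful order but no numerical distances.
severity_levels = ["mild", "moderate", "severe"]

print("O" in blood_groups)                                                # True
print(severity_levels.index("severe") > severity_levels.index("mild"))    # True
print(rank([72, 65, 80, 65, 90]))                                         # [3.0, 1.5, 4.0, 1.5, 5.0]
```

The ranked variable keeps only the order of the observations. Their original magnitudes are discarded.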
Quantitative (numerical) discrete random variables
The discrete random variables include the Bernoulli, the binomial, the multinomial, the negative binomial, the Poisson, the geometric, the hypergeometric, and the uniform. You are not expected to remember the definitions below, but reading through them will make the names of the variables encountered in the medical literature seem familiar.
The Bernoulli is the number of successes (0 or 1) in a single trial with only 2 possible outcomes. The binomial is the number of successes in a fixed number n of independent trials, each with a dichotomous outcome and the same probability of success. The multinomial gives the number of outcomes falling into each category in several independent trials, with each trial having more than 2 possible outcomes. The negative binomial is the total number of repeated trials needed until a given number of successes is achieved. The Poisson is the number of events occurring in a fixed interval of time or space, for which no upper limit can be assigned a priori. The geometric is the number of trials until the first success is achieved. The hypergeometric is the number of items of a given type in a sample drawn without replacement from a finite population, for example the number of males in a sample of n persons selected from a population of N. The uniform takes each of its possible values with the same probability, for example the face shown by a fair die.
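The definitions above can be written as probability mass functions. The following is a minimal sketch using only the Python standard library. The function and parameter names are hypothetical. The negative binomial is parameterized as the trial on which the r-th success occurs, to match the definition above.

```python
# Hypothetical probability mass functions for the discrete variables above.
import math


def bernoulli_pmf(x, p):
    """P(X = x) for a single trial with success probability p; x is 0 or 1."""
    return p if x == 1 else 1 - p


def binomial_pmf(k, n, p):
    """P(k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def multinomial_pmf(counts, probs):
    """P(counts[i] outcomes in category i) over sum(counts) independent trials."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)  # exact: the multinomial coefficient is an integer
    prob = 1.0
    for c, q in zip(counts, probs):
        prob *= q**c
    return coef * prob


def negative_binomial_pmf(n, r, p):
    """P(the r-th success occurs on trial n), for n >= r."""
    return math.comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)


def geometric_pmf(n, p):
    """P(the first success occurs on trial n), for n >= 1."""
    return (1 - p) ** (n - 1) * p


def poisson_pmf(k, lam):
    """P(k events in an interval when the average number per interval is lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def hypergeometric_pmf(k, N, K, n):
    """P(k items of a given type in a sample of n drawn without replacement
    from a population of N that contains K such items)."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


def discrete_uniform_pmf(x, values):
    """P(X = x) when every value in `values` is equally likely."""
    return 1 / len(values) if x in values else 0.0


print(binomial_pmf(2, 4, 0.5))               # 0.375
print(round(poisson_pmf(0, 2.0), 4))         # 0.1353
print(hypergeometric_pmf(2, 10, 4, 3))       # 0.3: 2 males among 3 chosen from 10 persons, 4 of them male
print(discrete_uniform_pmf(3, range(1, 7)))  # one face of a fair die
```

In practice a statistics library such as SciPy provides these functions. The hand-written versions are included only so that each formula is visible.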
Quantitative (numerical) continuous random variables
The continuous random variables can be natural, such as the normal, the exponential, and the uniform, or artificial, such as the chi-square, t, and F variables. The normal describes many measurements on a continuous numerical scale, such as height and weight. The exponential is the time until the first occurrence of the event of interest. The uniform represents a measurement that is equally likely to take any value within a given interval.
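The three natural continuous variables can be sketched as density or distribution functions in Python. The function names are hypothetical. The exponential is parameterized by an assumed constant event rate (events per unit time).

```python
# Hypothetical functions for three natural continuous distributions.
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def exponential_cdf(t, rate):
    """P(T <= t): probability that the first event occurs by time t."""
    return 1 - math.exp(-rate * t) if t >= 0 else 0.0


def uniform_pdf(x, a, b):
    """Density of a uniform variable on [a, b]: constant inside, zero outside."""
    return 1 / (b - a) if a <= x <= b else 0.0


print(round(normal_pdf(0, 0, 1), 4))       # 0.3989, peak of the standard normal
print(round(exponential_cdf(1, 0.5), 4))   # 0.3935
print(uniform_pdf(2, 0, 4))                # 0.25
```

For a continuous variable the probability of any single exact value is zero. Probabilities come from areas under the density over an interval, which is what `exponential_cdf` returns for the interval from 0 to t.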
A continuous random variable can be measured on either an interval or a ratio scale. The classic examples of interval-scale measurements are calendar dates and thermometer temperature (in °C or °F); most other measurements are on the ratio scale. The interval scale has the following properties: the difference between 2 readings has a meaning, a given difference between 2 readings has the same magnitude at all parts of the scale, the ratio of 2 readings has no meaning, zero is arbitrary with no biological meaning, and both negative and positive values are allowed. The ratio scale has the following properties: zero has a biological significance, values cannot be negative, the difference between 2 readings has a meaning, the ratio of 2 readings has a meaning and can be interpreted, and intervals between 2 readings have the same meaning at different parts of the scale.
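The difference between the two scales shows up when the same two temperatures are expressed on the interval scale (Celsius, arbitrary zero) and on a ratio scale (Kelvin, absolute zero). This minimal Python sketch uses hypothetical variable names.

```python
# Hypothetical comparison of an interval scale (Celsius) with a ratio scale (Kelvin).

def celsius_to_kelvin(c):
    """Kelvin = Celsius + 273.15; this moves the arbitrary zero to absolute zero."""
    return c + 273.15


t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on both scales and agree: a 10-degree gap.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are meaningful only on the ratio scale.
ratio_c = t2_c / t1_c                                         # 2.0, yet 20 °C is not "twice as hot"
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)   # about 1.035

print(diff_c, round(diff_k, 10), ratio_c, round(ratio_k, 3))
```

The Celsius ratio of 2.0 comes from where the zero happens to sit. The Kelvin ratio of about 1.035 compares the two amounts of heat directly.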
Random variables: properties and mathematical operations
A random variable has 6 properties, described below. You are not expected to remember these at this stage, but familiarity with the terminology will make reading the scientific literature much easier.
The expectation (mean) of a random variable is the central value around which it varies; it is the average of its possible values weighted by their probabilities. The variation of the random variable around its expectation is measured by its variance. Covariance measures how two random variables vary together. Correlation measures the strength of the linear relation between two random variables. Skewness measures the asymmetry of the distribution of the random variable about its center. Kurtosis is traditionally described as measuring the peakedness of the distribution around its expectation; it also reflects how heavy the tails of the distribution are.
Statistical distributions are mathematical functions of random variables, usually represented graphically. Each random variable mentioned above has a corresponding statistical distribution that specifies all possible values of the variable with their corresponding probabilities. Each statistical distribution is associated with specific statistical analytic techniques.
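A discrete distribution can be written out in full as a table of every possible value and its probability. This minimal sketch does that for the number of successes in n = 3 independent trials with success probability 1/2, which is a binomial variable. The function name `binomial_distribution` is hypothetical, and exact fractions are used so the probabilities are easy to read.

```python
# Hypothetical construction of a full distribution table (value -> probability).
from fractions import Fraction
from math import comb


def binomial_distribution(n, p):
    """Map each possible count 0..n to its exact binomial probability."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


dist = binomial_distribution(3, Fraction(1, 2))
for value, prob in dist.items():
    print(value, prob)         # 0 1/8, 1 3/8, 2 3/8, 3 1/8
print(sum(dist.values()))      # 1: the probabilities over all possible values sum to one
```

Plotting these four probabilities as bars against the values 0 to 3 gives the usual graphical representation of this distribution.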