Presented in Module I: Clinical Epidemiology of the Clinical Research Coordinator Course, held on 5-9 January 2020 at the Faculty of Medicine, King Fahad Medical City, Riyadh, by Professor Omar Hasan Kasule Sr., MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Chairman of the KFMC IRB
LEARNING OBJECTIVES:
• Use of averages to summarize data
• Definition, properties, advantages, and disadvantages of various types of averages
• Relations among the various averages
• Choice of average to use
KEYWORDS AND TERMS:
• Mean, Arithmetic mean
• Mean, geometric mean
• Mean, harmonic mean
• Median
• Mode
CONCEPT OF AVERAGES:
• Biological phenomena vary around the average. The average represents what is normal by being the point of equilibrium. The average is a representative summary of the data using one value.
• Three averages are commonly used: the mean, the mode, and the median.
• There are 3 types of mean: the arithmetic mean, the geometric mean, and the harmonic mean. The most widely used is the arithmetic mean.
• The arithmetic mean is considered the most useful measure of central tendency in data analysis.
• The geometric and harmonic means are not usually used in public health.
• The median is gaining popularity. It is the basis of some non-parametric tests as will be discussed later.
• The mode has very little public health importance.
ARITHMETIC MEAN: Types
• The arithmetic mean is the sum of the observations' values divided by the total number of observations and reflects the impact of all observations.
• The robust arithmetic mean (also called the trimmed mean) is the mean of the remaining observations after a fixed percentage of the smallest and largest observations is eliminated.
• The mid-range is the arithmetic mean of the values of the smallest and the largest observations.
• The weighted arithmetic mean is used when there is a need to place extra emphasis on some values by using different weights.
• The indexed arithmetic mean is stated with reference to an index mean. The consumer price index (CPI) is an example of an indexed mean.
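The first four types above can be illustrated with a minimal Python sketch. The data (hospital stays in days, and two ward means weighted by ward size) are hypothetical, and the helper names `trimmed_mean`, `mid_range`, and `weighted_mean` are introduced here only for illustration.

```python
from statistics import mean

def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of observations from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of observations dropped per tail
    return mean(ordered[k:len(ordered) - k])

def mid_range(values):
    """Arithmetic mean of the smallest and largest observations."""
    return (min(values) + max(values)) / 2

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical lengths of hospital stay in days, with one extreme value
stays = [2, 3, 3, 4, 4, 5, 5, 6, 7, 41]

print(mean(stays))                # 8
print(trimmed_mean(stays, 0.10))  # 4.625 (drops 2 and 41)
print(mid_range(stays))           # 21.5

# Hypothetical ward means of 4.0 and 6.0 days, weighted by 30 and 10 patients
print(weighted_mean([4.0, 6.0], [30, 10]))  # 4.5
```

Note how the single extreme stay of 41 days pulls the ordinary mean and especially the mid-range upward, while the trimmed mean stays close to the bulk of the data.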
ARITHMETIC MEAN: Properties
• The arithmetic mean has 4 properties under the central limit theorem (CLT) assumptions.
• The sample mean is an unbiased estimator of the population mean.
• The mean of all sample means is the population mean.
• The variance of the sample means is smaller than the population variance; for samples of size n it equals the population variance divided by n.
• The distribution of sample means tends to the normal as the sample size increases regardless of the shape of the underlying population distribution.
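These properties can be checked informally by simulation. The sketch below is only an illustration under stated assumptions: the population is a hypothetical right-skewed (exponential) one, and the sample size of 25, the number of repeated samples, and the random seed are arbitrary choices.

```python
import random
from statistics import mean, pvariance

random.seed(2020)  # arbitrary seed so the run is reproducible

# Hypothetical right-skewed population: exponential with mean of about 10
population = [random.expovariate(1 / 10) for _ in range(20_000)]

n = 25  # size of each sample (assumed for illustration)
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

pop_mean = mean(population)
pop_var = pvariance(population)

# The mean of the sample means should be close to the population mean
print(round(pop_mean, 2), round(mean(sample_means), 2))

# The variance of the sample means should be close to pop_var / n,
# which is far smaller than the population variance itself
print(round(pop_var / n, 2), round(pvariance(sample_means), 2))
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped distribution even though the population itself is strongly skewed.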
ARITHMETIC MEAN: Advantages
• Best single summary statistic.
• Rigorous mathematical definition.
• Further mathematical manipulation.
• Stability with regard to sampling error.
ARITHMETIC MEAN: Disadvantages
• It is affected by extreme values.
• It is more sensitive to extremely large values than the geometric and harmonic means.
OTHER TYPES OF MEAN: Geometric mean
• The geometric mean (GM) is defined as the nth root of the product of n positive observations. It is never greater than the arithmetic mean of the same data, and the two are equal only when all observations are identical.
• It is used if the observations vary by a constant proportion, such as in serological and microbiological assays, to summarize divergent tendencies of very skewed data.
• It exaggerates the impact of small values while it diminishes the impact of big values.
• Its disadvantages are that it is cumbersome to compute and it is not intuitive.
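As a small worked example, the geometric mean can be computed either directly as the nth root of the product or, equivalently, as the antilog of the mean of the logarithms. The titer values below are hypothetical doubling dilutions chosen only for illustration.

```python
import math
from statistics import geometric_mean, mean

# Hypothetical reciprocal antibody titers from doubling dilutions
titers = [10, 20, 40, 80, 160]

# Definition: nth root of the product of the n observations
gm_root = math.prod(titers) ** (1 / len(titers))

# Equivalent form: antilog of the arithmetic mean of the logarithms
gm_log = math.exp(mean([math.log(t) for t in titers]))

print(round(gm_root, 6))                 # 40.0
print(round(gm_log, 6))                  # 40.0
print(round(geometric_mean(titers), 6))  # 40.0
print(mean(titers))                      # 62, the arithmetic mean is larger
```

The log form is usually preferred in practice because the raw product can become very large for long series.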
OTHER TYPES OF MEAN: Harmonic mean
• The harmonic mean (HM) is defined as the reciprocal of the arithmetic mean of the reciprocals of a series of values.
• It is used in economics and business and not in public health. Its computation is cumbersome and it is not intuitive.
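A minimal sketch of the computation, using the standard definition of the harmonic mean as n divided by the sum of the reciprocals. The two rates are hypothetical values chosen so the arithmetic is easy to follow.

```python
from statistics import harmonic_mean, mean

# Hypothetical rates (for example, units processed per hour by two lines)
rates = [40, 60]

# n divided by the sum of reciprocals: 2 / (1/40 + 1/60) = 48
hm = len(rates) / sum(1 / r for r in rates)

print(round(hm, 6))          # 48.0
print(harmonic_mean(rates))  # same value from the standard library
print(mean(rates))           # 50, the arithmetic mean is larger
```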
MODE:
• The mode is the value of the most frequent observation.
• It is rarely used in science and has few useful mathematical properties.
• It is intuitive, easy to compute, and is the only average suitable for nominal data.
• It is useless for small samples because it is unstable due to sampling fluctuation.
• It cannot be manipulated mathematically.
• It is not necessarily unique: one data set can have more than one mode.
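The last two points can be illustrated with nominal data. In this sketch the blood group list is hypothetical; `statistics.multimode` returns every value tied for the highest frequency, in order of first appearance.

```python
from statistics import multimode

# Hypothetical nominal data: blood groups of eight patients
blood_groups = ["O", "A", "O", "B", "A", "AB", "O", "A"]

# "O" and "A" each occur three times, so this data set has two modes
print(multimode(blood_groups))  # ['O', 'A']

# A data set with a single most frequent value has one mode
print(multimode([1, 2, 2, 3]))  # [2]
```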
MEDIAN:
• The median is the value of the middle observation in a series ordered by magnitude. When the number of observations is even, it is the mean of the two middle observations.
• It is intuitive and is best used for erratically spaced or heavily skewed data.
• The median can be computed even if the extreme values are unknown in open-ended distributions. It is less stable under sampling fluctuation than the arithmetic mean.
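The sketch below illustrates why the median suits skewed or open-ended data. The values are hypothetical, and `math.inf` is used here only as a stand-in for a largest observation known to lie beyond the measured range.

```python
import math
from statistics import median

known = [3, 5, 6, 8, 30]             # all values known
extreme = [3, 5, 6, 8, 300]          # same data with a far larger maximum
open_ended = [3, 5, 6, 8, math.inf]  # maximum known only to exceed the scale

for data in (known, extreme, open_ended):
    # The median stays at 6 in every case, while the arithmetic mean
    # moves from 10.4 to 64.4 and cannot be computed as a finite number
    # for the open-ended series
    print(median(data), sum(data) / len(data))
```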
INTERRELATIONS AMONG AVERAGES:
• Mean = mode = median for symmetrical unimodal data.
• Mean > median for right-skewed data.
• Mean < median for left-skewed data.
• For moderately skewed unimodal data, approximately: mode - median = 2(median - mean).
• The mean with the standard deviation is best used to summarize symmetrical data.
• The median with inter-quartile ranges is best used to summarize skewed data.
• For some data sets it is best to show all the 3 types of averages.
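The ordering of the three averages can be seen on small hypothetical data sets. This is only a sketch: the numbers are invented, and the left-skewed series is simply the mirror image of the right-skewed one.

```python
from statistics import mean, median, mode

symmetrical = [1, 2, 3, 3, 3, 4, 5]
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 20]
left_skewed = [-x for x in right_skewed]  # mirror image of the series above

for name, data in [("symmetrical", symmetrical),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, mean(data), median(data), mode(data))

# symmetrical:  mean = median = mode = 3
# right-skewed: mean (5) > median (3) > mode (2)
# left-skewed:  mean (-5) < median (-3) < mode (-2)
```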
MATHEMATICAL OPERATIONS ON AVERAGES:
• The following rules govern operations with constants on the arithmetic mean, the median, and the mode.
• If a constant is added to each observation, the same constant is added to the average.
• If a constant is subtracted from each observation, the same constant is subtracted from the average.
• If each observation is multiplied by a constant, the average is multiplied by the same constant. If each observation is divided by a non-zero constant, the average is divided by the same constant.
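A unit conversion is a convenient check of these rules, since converting Celsius to Fahrenheit multiplies each observation by a constant and then adds a constant. The temperatures below are hypothetical.

```python
from statistics import mean, median

# Hypothetical body temperatures in degrees Celsius
temps_c = [36.5, 37.0, 37.5, 38.0, 39.0]

# Convert every observation: multiply by 9/5, then add 32
temps_f = [c * 9 / 5 + 32 for c in temps_c]

# Averaging the converted data gives the same result (up to rounding)
# as converting the average of the original data
print(round(mean(temps_f), 2), round(mean(temps_c) * 9 / 5 + 32, 2))
print(round(median(temps_f), 2), round(median(temps_c) * 9 / 5 + 32, 2))
```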