0508P - CONTINUOUS DATA SUMMARY 1: MEASURES OF CENTRAL TENDENCY

By Professor Omar Hasan Kasule Sr.


Learning Objectives:
·        Use of averages in data summary
·        Definition, properties, advantages, and disadvantages of various types of averages
·        Relations among the various averages
·        Choice of average to use

Key Words and Terms:
·        Arithmetic mean, indexed mean
·        Arithmetic mean, robust mean
·        Arithmetic mean, the midrange
·        Arithmetic mean, weighted mean
·        Mean, arithmetic mean
·        Mean, geometric mean
·        Mean, harmonic mean
·        Median
·        Mode

Unit Outline
CONCEPT OF AVERAGES
A. Biological Basis
B. Theoretical Basis
C. Purpose
D. Common Measures

MEANS
A. Arithmetic Mean
B. Geometric Mean
C. Harmonic Mean
D. Weighted Mean
E. Indexed Mean

MODE
A. Definition
B. Properties
C. Advantages
D. Disadvantages

MEDIAN
A. Definition
B. Mathematical Properties
C. Advantages
D. Disadvantages


UNIT SYNOPSIS
CONCEPT OF AVERAGES
Biological phenomena vary around the average. The average represents what is normal by being the point of equilibrium. The average is a representative summary of the data using one value. Three averages are commonly used: the mean, the mode, and the median. There are 3 types of means: the arithmetic mean, the geometric mean, and the harmonic mean. The most popular is the arithmetic mean. The arithmetic mean is considered the most useful measure of central tendency in data analysis. The geometric and harmonic means are not usually used in public health. The median is gaining popularity. It is the basis of some non-parametric tests as will be discussed later. The mode has very little public health importance.

MEANS
The arithmetic mean is the sum of the observations' values divided by the total number of observations and reflects the impact of all observations. The robust arithmetic mean is the mean of the remaining observations when a fixed percentage of the smallest and largest observations are eliminated. The mid-range is the arithmetic mean of the values of the smallest and the largest observations. The weighted arithmetic mean is used when there is a need to place extra emphasis on some values by using different weights. The indexed arithmetic mean is stated with reference to an index mean. The consumer price index (CPI) is an example of an indexed mean. The arithmetic mean has 4 properties under the central limit theorem (CLT) assumptions: the sample mean is an unbiased estimator of the population mean, the mean of all sample means is the population mean, the variance of the sample means is smaller than the population variance, and the distribution of sample means tends to the normal as the sample size increases regardless of the shape of the underlying population distribution.
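The definitions above can be illustrated with a minimal Python sketch. The data values, the 10% trimming proportion, and the helper names trimmed_mean, midrange, and weighted_mean are assumptions made for illustration only; they do not come from the unit itself.

```python
from statistics import mean

def trimmed_mean(values, proportion):
    """Robust mean: drop `proportion` of the observations from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # observations dropped per tail
    return mean(ordered[k:len(ordered) - k])

def midrange(values):
    """Arithmetic mean of the smallest and the largest observation."""
    return (min(values) + max(values)) / 2

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical observations; 45 is a deliberately extreme value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 45]

arithmetic = mean(data)               # 100 / 10 = 10
trimmed = trimmed_mean(data, 0.10)    # drops 2 and 45: 53 / 8 = 6.625
mid = midrange(data)                  # (2 + 45) / 2 = 23.5

# Hypothetical scores where the third carries double weight.
weighted = weighted_mean([70, 80, 90], [1, 1, 2])   # 330 / 4 = 82.5

print(arithmetic, trimmed, mid, weighted)
```

In this example the single extreme value pulls the arithmetic mean up to 10 and the mid-range up to 23.5, while the trimmed mean stays close to the bulk of the data.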

The arithmetic mean enjoys 4 desirable statistical advantages: best single summary statistic, rigorous mathematical definition, further mathematical manipulation, and stability with regard to sampling error. Its disadvantage is that it is affected by extreme values. It is more sensitive to extreme values than the median or the mode. The geometric mean (GM) is defined as the nth root of the product of n positive observations and is never greater than the arithmetic mean for the same data; the two are equal only when all observations are identical. It is used if the observations vary by a constant proportion, such as in serological and microbiological assays, to summarize divergent tendencies of highly skewed data. Relative to the arithmetic mean, it exaggerates the impact of small values while it diminishes the impact of big values. Its disadvantages are that it is cumbersome to compute and it is not intuitive. The harmonic mean (HM) is defined as the reciprocal of the arithmetic mean of the reciprocals of a series of values. It is used in economics and business and not in public health. Its computation is cumbersome and it is not intuitive.
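As a rough sketch of the geometric and harmonic means, the following Python example applies both definitions to a hypothetical series of antibody titres from doubling dilutions. The titre values and the helper function names are illustrative assumptions, not data from this unit.

```python
import math
from statistics import mean

def geometric_mean(values):
    """nth root of the product of n positive observations."""
    return math.prod(values) ** (1 / len(values))

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

# Hypothetical titres that change by a constant proportion (doubling).
titres = [10, 20, 40, 80, 160]

am = mean(titres)             # 310 / 5 = 62
gm = geometric_mean(titres)   # approximately 40
hm = harmonic_mean(titres)    # 800 / 31, approximately 25.8

print(am, gm, hm)
```

For positive data that are not all equal, the three means are ordered as HM < GM < AM, which the example reproduces. The standard library also offers statistics.geometric_mean and statistics.harmonic_mean (Python 3.8 or later) for the same calculations.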

MODE
The mode is the value of the most frequent observation. It is rarely used in science and its mathematical properties have not been explored. It is intuitive, easy to compute, and is the only average suitable for nominal data. It is useless for small samples because it is unstable due to sampling fluctuation. It cannot be manipulated mathematically. It is not a unique average; one data set can have more than 1 mode.
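A short Python sketch can show the mode on nominal data and a data set with more than one mode. The blood-group and count values are hypothetical, and statistics.multimode requires Python 3.8 or later.

```python
from statistics import multimode

# Hypothetical nominal data: the mode is the only usable average here.
blood_groups = ["A", "O", "B", "O", "AB", "A", "O"]
nominal_modes = multimode(blood_groups)   # "O" occurs 3 times

# Hypothetical numeric data in which two values tie for most frequent.
counts = [1, 2, 2, 3, 3, 4]
tied_modes = multimode(counts)            # both 2 and 3 occur twice

print(nominal_modes, tied_modes)
```

multimode returns a list so that tied modes are reported together instead of one being picked arbitrarily.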
 
MEDIAN
The median is the value of the middle observation in a series ordered by magnitude; when the number of observations is even, it is the arithmetic mean of the two middle values. It is intuitive and is best used for erratically spaced or heavily skewed data. The median can be computed even if the extreme values are unknown in open-ended distributions. It is less stable to sampling fluctuation than the arithmetic mean.
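The behaviour of the median with skewed and open-ended data can be sketched in Python as follows. The values are hypothetical, and representing an unknown upper extreme with float("inf") is an assumption made purely for this illustration.

```python
from statistics import mean, median

# Hypothetical heavily skewed data: one very large observation.
skewed = [20, 22, 25, 27, 30, 250]
skewed_median = median(skewed)   # mean of the two middle values: (25 + 27) / 2 = 26
skewed_mean = mean(skewed)       # 374 / 6, pulled upward by 250

# Hypothetical open-ended series: the largest value is known only to
# exceed the others, so it is represented here as infinity (an assumption).
open_ended = [3, 5, 8, 12, float("inf")]
open_median = median(open_ended)   # the middle observation, 8
open_mean = mean(open_ended)       # infinite, so not a usable summary

print(skewed_median, skewed_mean, open_median, open_mean)
```

The median depends only on the order of the observations around the middle, which is why it remains defined when the extreme value is unknown while the arithmetic mean does not.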