200107P - DATA SUMMARY AND PRESENTATION 4: CONTINUOUS DATA SUMMARY A MEASURES OF DISPERSION/VARIATION

Presentation to Module I: Clinical Epidemiology at a Clinical Research Coordinator Course held on 5-9 January 2020 at the Faculty of Medicine, King Fahad Medical City, Riyadh, by Professor Omar Hasan Kasule Sr. MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Chairman of the KFMC IRB


LEARNING OBJECTIVES:

Definition, properties, advantages, and disadvantages of common measures of variation: variance, standard deviation, and z-score

Definition and use of quartiles and percentiles

Relation among percentile, standard deviation, and area under a normal curve


KEYWORDS AND TERMS:

Analysis of Variance

Range

Standard deviation 

Variance

Z-Score / Standard Score 


INTRODUCTION

Variations are biological, measurement, or temporal.

Time series analysis relates biological to temporal variation.

Analysis of variance (ANOVA) relates biological variation (inter- or between-subject) to measurement variation (intra- or within-subject).

Biological variation is more common than measurement variation.

Temporal variation is measured in calendar time or in chronological time. 


MEASURES OF VARIATION: Types

Measures of variation can be classified as absolute (range, inter-quartile range, mean deviation, variance, standard deviation, quantiles) or relative (coefficient of variation and standardized z-score).

Some measures are based on the mean (mean deviation, the variance, the standard deviation, z score, the t score, the stanine, and the coefficient of variation) whereas others are based on quantiles (quartiles, deciles, and percentiles).


MEASURES OF VARIATION BASED ON THE MEAN: Overview

Mean deviation is the arithmetic mean of absolute differences of each observation from the mean. It is simple to compute but is rarely used because it is not intuitive and allows no further mathematical manipulation. 

The variance is the sum of the squared deviations of each observation from the mean divided by n (when the data are the whole population) or by n-1 (when the data are a sample; the difference is negligible for large samples). It can be manipulated mathematically but is not intuitive because it is expressed in square units.

The standard deviation, the commonest measure of variation, is the square root of the variance. It is intuitive because it is in the original units of measurement and not in square units. The standard deviation, s, describes the spread of individual observations, whereas the standard error of the mean, SE, describes the spread of sample means; the SE is smaller than s.

The relation between the standard deviation, s, and the standard error, SE, is given by the expression SE = s / √n where n = sample size.
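The mean-based measures described above can be illustrated with a short sketch. It uses only Python's standard `statistics` module; the blood pressure readings are hypothetical values chosen for illustration, and the standard error is computed with the usual definition, SD divided by the square root of n.

```python
import math
import statistics

# Hypothetical sample: five systolic blood pressure readings in mmHg
# (illustrative values only, not taken from the lecture).
data = [118, 122, 125, 130, 135]

n = len(data)
mean = statistics.fmean(data)

# Mean deviation: arithmetic mean of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / n

# Variance: sum of squared deviations divided by n or by n - 1.
variance_n = statistics.pvariance(data)         # divisor n
variance_n_minus_1 = statistics.variance(data)  # divisor n - 1

# Standard deviation: square root of the (n - 1) variance, in mmHg.
sd = statistics.stdev(data)

# Standard error of the mean: SD divided by the square root of n.
se = sd / math.sqrt(n)

print(f"mean = {mean:.1f}, mean deviation = {mean_deviation:.1f}")
print(f"variance (n) = {variance_n:.1f}, variance (n-1) = {variance_n_minus_1:.1f}")
print(f"SD = {sd:.2f}, SE = {se:.2f}")
```

With only five observations the two variance formulas differ noticeably; the gap shrinks as n grows, which is why the choice of divisor matters mainly for small samples.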


MEASURES OF VARIATION BASED ON THE MEAN: The Standard Deviation

For normally distributed data, the percentage of observations covered by mean +/- 1 SD is about 68%, mean +/- 2 SD is about 95%, and mean +/- 4 SD is virtually 100%.

Advantages of the standard deviation: it is resistant to sampling variation, it can be manipulated mathematically, and together with the mean it fully describes a normal curve. 

Disadvantages of the standard deviation: it is affected by extreme values.

The standardized z-score expresses the distance of an observation from the mean in SD units: z = (observation - mean) / SD.
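The coverage figures and the z-score can be checked with a minimal sketch, assuming normally distributed data. The function names `z_scores` and `normal_coverage` and the sample values are hypothetical, introduced only for this illustration; `NormalDist` is from Python's standard library.

```python
from statistics import NormalDist, fmean, stdev

def z_scores(values):
    """Distance of each observation from the sample mean, in SD units."""
    m = fmean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]

def normal_coverage(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    std_normal = NormalDist()  # mean 0, SD 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Hypothetical readings (illustrative values only).
data = [118, 122, 125, 130, 135]
z = z_scores(data)
print([round(v, 2) for v in z])

for k in (1, 2, 3, 4):
    print(f"mean +/- {k} SD covers {normal_coverage(k) * 100:.2f}%")
```

Because z-scores are unit-free, they allow observations measured on different scales to be compared directly.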


MEASURES OF VARIATION BASED ON THE MEAN: The Coefficient of Variation

The coefficient of variation (CV) is the ratio of the standard deviation to the arithmetic mean usually expressed as a percentage. 

CV is used to compare variations among samples with different units of measurement and from different populations. 
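As a rough illustration of why the CV is useful across different units, the sketch below compares hypothetical height and weight samples. The function name `coefficient_of_variation` and all data values are assumptions made for this example.

```python
from statistics import fmean, stdev

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the arithmetic mean."""
    return stdev(values) / fmean(values) * 100

# Hypothetical samples in different units (illustrative values only).
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 62, 70, 78, 85]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"CV of height = {cv_height:.2f}%")
print(f"CV of weight = {cv_weight:.2f}%")
```

The standard deviations themselves (in cm and kg) cannot be compared, but the two percentages can: in this made-up sample, weight varies more relative to its mean than height does.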


MEASURES OF VARIATION BASED ON QUANTILES: Overview

Quantiles (quartiles, deciles, and percentiles) are measures of variation based on a division of a set of observations (arranged in order by size) into equal intervals and stating the value of observation at the end of the given interval. Quantiles have an intuitive appeal.

Quartiles divide the ordered observations into 4 equal intervals, deciles into 10, and percentiles into 100. The inter-quartile range, Q3 - Q1, and the semi-interquartile range, 1/2 (Q3 - Q1), have the advantages of being simple, intuitive, related to the median, and less sensitive to extreme values.

Quartiles have the disadvantages of being unstable for small samples and not allowing further mathematical manipulation.

Deciles are rarely used.
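The quartile-based measures can be sketched as follows. The data are hypothetical, and the example assumes the "inclusive" interpolation method of Python's `statistics.quantiles`; other software may use a different convention and return slightly different quartiles, especially for small samples.

```python
from statistics import median, quantiles

# Hypothetical observations (illustrative values only).
data = [2, 4, 4, 5, 7, 8, 9, 10, 12]

# n=4 asks for the three cut points that split the data into quartiles.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1        # inter-quartile range
semi_iqr = iqr / 2   # semi-interquartile range

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"IQR = {iqr}, semi-IQR = {semi_iqr}")
```

Passing `n=10` or `n=100` to the same function gives the decile or percentile cut points.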


MEASURES OF VARIATION BASED ON QUANTILES: Percentiles

Percentiles, also called centile scores, are a form of cumulative frequency and can be read off a cumulative frequency curve. They are direct and very intelligible.

The 2.5th percentile corresponds to mean - 2SD.

The 16th percentile corresponds to mean - 1SD.

The 50th percentile corresponds to mean + 0 SD. 

The 84th percentile corresponds to mean + 1SD.

The 97.5th percentile corresponds to mean + 2SD. 
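The correspondences listed above are rounded values for a normal curve. A minimal sketch, assuming a normal distribution and using the standard library's `NormalDist`, shows the unrounded figures; the function name `percentile_at` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, SD 1

def percentile_at(z):
    """Percentile (cumulative area, in %) at mean + z SD on a normal curve."""
    return std_normal.cdf(z) * 100

for z in (-2, -1, 0, 1, 2):
    print(f"mean {z:+d} SD -> percentile {percentile_at(z):.2f}")

# Going the other way: the z-score that cuts off exactly the lowest 2.5%.
z_at_2_5 = std_normal.inv_cdf(0.025)
print(f"2.5th percentile -> mean {z_at_2_5:.2f} SD")
```

The exact cut-off for the 2.5th percentile is about 1.96 SD below the mean, which is commonly rounded to 2 SD.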


OTHER MEASURES OF VARIATION: The Range

The full range is based on extreme values. It is defined by giving the minimum and maximum values or by giving the difference between the maximum and the minimum values.

The modified range is determined after eliminating the top 10% and bottom 10% of observations.

Advantages of the range: it is a simple measure, intuitive, easy to compute, and useful for preliminary or rough work.

Disadvantages of the range: it is affected by extreme values, it is sensitive to sampling fluctuations, and it does not allow further mathematical manipulation.
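The effect of an extreme value on the full range, and how the modified range dampens it, can be seen in a short sketch. The function names `full_range` and `modified_range`, the `trim_percent` parameter, and the data are assumptions made for this illustration.

```python
def full_range(values):
    """Return (minimum, maximum, maximum - minimum)."""
    lo, hi = min(values), max(values)
    return lo, hi, hi - lo

def modified_range(values, trim_percent=10):
    """Range after dropping the top and bottom trim_percent of observations."""
    ordered = sorted(values)
    k = len(ordered) * trim_percent // 100  # observations to drop at each end
    trimmed = ordered[k:len(ordered) - k]
    return trimmed[0], trimmed[-1], trimmed[-1] - trimmed[0]

# Hypothetical observations with one extreme value (illustrative only).
data = [3, 5, 6, 7, 7, 8, 9, 10, 11, 40]

print("full range:    ", full_range(data))
print("modified range:", modified_range(data))
```

With ten observations, trimming 10% at each end removes one value from each tail, so the single outlier no longer determines the result.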