170717P - PRINCIPLES OF EPIDEMIOLOGY HEALTH RESEARCH COURSE: THE NORMAL CURVE AND PROBABILITY

Presentation at a Course on Principles of Epidemiology Health Research, Faculty of Medicine, King Fahad Medical City, October 11-12, 2017, by Professor Omar Hasan Kasule Sr., MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Chairman of the Institutional Review Board / Research Ethics Committee at King Fahad Medical City, Riyadh.


LECTURE 5: THE NORMAL CURVE AND PROBABILITY


INTRODUCTION:

Abraham de Moivre first described the formula for the normal curve in 1733. In the 19th century, Pierre-Simon Laplace and Carl Friedrich Gauss, each working independently, rediscovered the normal curve.

Around 1835, Adolphe Quetelet first used the normal curve as an approximation to the histogram.

The normal curve may be one of the unifying principles of nature, reflecting sunan Allah. It fits so many natural data distributions that it is very useful in statistics.

The normal curve can also be used for data that is not initially normally distributed. Such data can often be made approximately normal by a suitable mathematical transformation.

The binomial, the Poisson, the t, and the chi-square distributions all approach the normal distribution as the sample size becomes large.
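This convergence can be seen numerically. The sketch below, with illustrative parameter values, compares an exact binomial probability against its normal approximation (with continuity correction):

```python
# Sketch: normal approximation to the binomial (parameter values assumed for illustration).
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5   # binomial parameters
k = 55            # we want P(X <= 55)

# Exact binomial probability, summing the probability mass function
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Normal approximation: mean np, SD sqrt(np(1-p)), with continuity correction
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k + 0.5)

print(round(exact, 4), round(approx, 4))  # the two values agree closely for large n
```

For smaller n (say n = 10) the two values diverge noticeably, which is why the approximation is reserved for large samples.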


PROPERTIES & CHARACTERISTICS OF THE NORMAL CURVE:

The normal curve is described fully by its mean and its standard deviation. A standardized normal curve has mean = 0 and standard deviation = 1. 

Two curves may have the same mean but different standard deviations. Two curves may have different means but the same standard deviation. 

The normal curve is perfectly symmetrical about the mean. 

Although continuous, it models discrete data well for large sample sizes. It is asymptotic, i.e. it approaches the x-axis but never touches it.


[Figure: normal curve showing mean and standard deviation]


USE OF THE NORMAL CURVE FOR NON-NORMAL DATA:

Before using the normal curve to model a data set, the normality of the data should be tested.

These tests include: inspecting the histogram for a bell shape, checking for a straight line on normal probability paper, and formal tests of normality in statistical software (e.g. the Shapiro-Wilk test).

If the data is not normal, it can often be normalized by a logarithmic, power, or reciprocal transformation. (The Z-score transformation standardizes the scale but does not change the shape of the distribution.)
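The logarithmic transformation is the most common of these. A minimal sketch, using synthetic right-skewed (log-normal) data, shows how the log transform removes the skew:

```python
# Sketch: normalizing right-skewed data with a log transformation (synthetic data assumed).
import math
import random
from statistics import mean, stdev

def skewness(xs):
    """Sample skewness: average cubed deviation in SD units (0 for symmetric data)."""
    m, s = mean(xs), stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

random.seed(42)
# Log-normal data: exponentiating normal values gives a strongly right-skewed sample
data = [math.exp(random.gauss(0, 1)) for _ in range(5000)]
logged = [math.log(x) for x in data]

print(round(skewness(data), 2), round(skewness(logged), 2))
# original data: large positive skew; logged data: skewness near zero
```

Reciprocal and power transformations are applied in the same way; the choice depends on the direction and severity of the skew.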


THE Z-SCORE and the AREA UNDER THE CURVE:

The Z score is the deviation of a measurement from the mean, measured in SD units: z = (x − μ) / σ. The standard normal variable z has mean 0 and variance 1, written z ~ N(0, 1).

Z scores are used to compare different data sets, to determine a cut-off or critical value, and to replace the original variable in analysis.
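A short sketch of the comparison use, with assumed scores: two test results on different scales become directly comparable once converted to Z scores.

```python
# Sketch: using Z scores to compare results from two different tests (numbers assumed).
def z_score(x, mu, sd):
    """Deviation of x from the mean mu, expressed in standard-deviation units."""
    return (x - mu) / sd

# A student scores 75 on test A (mean 70, SD 10) and 80 on test B (mean 78, SD 4)
z_a = z_score(75, 70, 10)
z_b = z_score(80, 78, 4)

print(z_a, z_b)  # 0.5 0.5 -- both scores are half a standard deviation above the mean
```

Although the raw scores differ, the Z scores reveal that the student performed equally well, relative to the group, on both tests.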

The area under the curve represents relative frequency, or probability. Mean ± 1 SD covers about 68% of observations; mean ± 2 SD covers about 95%; mean ± 3 SD covers about 99.7%.

The area under the curve between mean − 1.96 SD and mean + 1.96 SD is 0.95; this is the probability that underlies the 95% confidence interval (CI).
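These coverage figures can be checked directly from the standard normal cumulative distribution function:

```python
# Sketch: verifying the 68 / 95 / 99.7 rule with the standard normal CDF.
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
for k in (1, 2, 3):
    # Area between mean - k*SD and mean + k*SD
    coverage = z.cdf(k) - z.cdf(-k)
    print(k, round(coverage * 100, 1))
# prints: 1 68.3 / 2 95.4 / 3 99.7
```

Running the same calculation for k = 1.96 gives 95.0%, which is why 1.96 is the multiplier used for the 95% confidence interval.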


[Figure: normal curve showing z scores]


ESTIMATION:

There are 3 types of estimates: the point estimate, the pooled estimate, and the interval estimate.

A point estimate, being a single value, carries no measure of its own uncertainty and may be in error.

Pooled estimation is a weighted combination of parameters from more than one population or sample.
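A minimal sketch of pooling, with assumed sample sizes and means: each sample's estimate is weighted by its sample size.

```python
# Sketch: a pooled (sample-size-weighted) estimate of a mean from two samples (numbers assumed).
n1, mean1 = 40, 120.0   # sample 1: size and mean
n2, mean2 = 60, 130.0   # sample 2: size and mean

# Weight each sample mean by its sample size, then divide by the total size
pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
print(pooled_mean)  # 126.0 -- closer to sample 2's mean, which carries more weight
```

The same weighting logic applies to pooling proportions or variances across samples.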

In interval estimation, the confidence interval is stated as a lower confidence limit and an upper confidence limit, customarily at a confidence level of 95%.

In a common-sense way, the 95% confidence interval (CI) means that we are 95% sure that the true value of the parameter lies within the interval.
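The steps above can be sketched for a sample mean, using the 1.96 multiplier from the normal curve and an assumed small data set:

```python
# Sketch: a 95% confidence interval for a mean, as mean +/- 1.96 * SE (data assumed).
from math import sqrt
from statistics import mean, stdev

sample = [118, 122, 125, 119, 130, 124, 121, 127, 123, 126]  # assumed measurements
m = mean(sample)
se = stdev(sample) / sqrt(len(sample))   # standard error of the mean

lower, upper = m - 1.96 * se, m + 1.96 * se
print(round(lower, 1), round(upper, 1))  # 121.2 125.8
```

For small samples the 1.96 multiplier from the normal curve is usually replaced by the corresponding t-distribution value, which widens the interval.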


VALIDITY vs. PRECISION:

Validity tells us how well an instrument measures what it is supposed to measure.

The mean is a measure of validity (parameter of location).

The standard deviation is a measure of precision (spread).

Validity and precision are both desirable but may not always be achieved simultaneously. A valid measurement may not be precise. A precise measurement may not be valid.
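The contrast can be made concrete with a small simulation (instrument behavior assumed): one instrument is precise but biased, the other valid but imprecise.

```python
# Sketch: precise-but-invalid vs valid-but-imprecise measurement (simulated instruments assumed).
import random
from statistics import mean, stdev

random.seed(0)
true_value = 100.0

# Instrument A: invalid (reads ~5 units high) but precise (small SD)
a = [random.gauss(true_value + 5, 0.5) for _ in range(1000)]
# Instrument B: valid (unbiased) but imprecise (large SD)
b = [random.gauss(true_value, 10) for _ in range(1000)]

print(round(mean(a) - true_value, 1), round(stdev(a), 1))  # large bias, small spread
print(round(mean(b) - true_value, 1), round(stdev(b), 1))  # small bias, large spread
```

The mean exposes the validity problem in instrument A, while the standard deviation exposes the precision problem in instrument B, matching the roles assigned to them above.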