0508P - INTRODUCTION TO BIO-STATISTICS

By Professor Omar Hasan Kasule Sr.

Learning Objectives

·        Definition, scope, and role of bio-statistics in medicine
·        Definition of descriptive and inferential biostatistics
·        Difference between substantive and statistical conclusions
·        Uses and limitations of bio-statistics

 

Key Words and Terms



·        Biomathematics
·        Biometry
·        Biostatistics
·        Mathematical Computing
·        Numerical Analysis
·        Numerical Data
·        Statistical Conclusion
·        Statistical Methods
·        Statistical questions
·        Statistics And Decisions
·        Statistics, analytic statistics
·        Statistics, applied statistics
·        Statistics, descriptive statistics
·        Statistics, health Statistics
·        Statistics, inferential statistics
·        Statistics, mathematical statistics
·        Statistics, medical Statistics
·        Statistics, theoretical statistics
·        Substantive Conclusion
·        Substantive Question



Unit Outline

BIOSTATISTICS AS A DISCIPLINE
A. Statistics
B. Biostatistics
C. Importance of Biostatistics
D. Scope of Biostatistics
E. Rationale of Learning Biostatistics

HISTORY OF BIOSTATISTICS
A. Ancient Times
B. Era of Vital Records
C. Population Studies
D. Era of Descriptive Statistics
E. Era of Analytic Statistics

LIMITATIONS OF BIOSTATISTICS
A. Statistical vs Substantive
B. Analysis vs Interpretation
C. Misuse of Statistics
D. Misuse of the Computer


UNIT SYNOPSIS

BIOSTATISTICS AS A DISCIPLINE
The term statistics can convey three meanings. Applied statistics is the set of techniques for articulating, summarizing, analyzing, and interpreting numerical information. Theoretical statistics deals with probability. Statistics (in the plural) are indices or summary measures derived from data. Biostatistics is a branch of applied statistics that deals with the management and analysis of numerical data on people, health, disease, medical treatments, and procedures. It includes vital statistics, public health statistics, and demography. Biostatistics is divided into two branches: descriptive and analytic. Descriptive statistics deals with the collection, organization, presentation, and summarization of data. Analytic statistics deals with drawing logical and objective conclusions about a sample or a population. Biostatistics provides the tools for summarizing and digesting large amounts of numerical laboratory and clinical data, and for critically reading and understanding the scientific literature.
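The difference between the descriptive and analytic branches can be illustrated with a minimal Python sketch. The blood pressure readings are hypothetical and are not taken from any study. The confidence interval uses a normal approximation chosen for simplicity; with so few observations a t-based interval would be wider and more appropriate.

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg); not real patient data.
readings = [118, 122, 130, 125, 135, 128, 120, 132]

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(readings)
sample_median = statistics.median(readings)
sample_sd = statistics.stdev(readings)  # sample standard deviation (n - 1)

# Analytic statistics: use the sample to estimate the population mean.
# Assumption: normal approximation; a t-based interval suits small samples better.
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
standard_error = sample_sd / math.sqrt(len(readings))
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"mean={sample_mean}, median={sample_median}, sd={sample_sd:.2f}")
print(f"approximate 95% CI for the population mean: {ci_low:.1f} to {ci_high:.1f}")
```

The first three quantities only describe the eight readings. The interval is a statement about the population from which the readings are assumed to have been drawn at random.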

HISTORY OF BIOSTATISTICS
Statistics has grown through successive eras: the era of censuses, the era of vital statistics, the era of descriptive statistics, the era of analytic statistics, and the era of probability statistics. Ancient civilizations counted their populations for taxation and military purposes. Complete censuses were first carried out in Sweden in 1749, the US in 1790, Spain in 1798, England & Wales in 1801, and Canada in 1871. John Graunt is considered the founder of vital statistics. He analyzed London mortality data and also laid the foundations of the science of demography. William Farr started the modern procedures of vital statistics registration. Pierre Charles Alexandre Louis (1787-1872) introduced the numerical method of describing medical facts quantitatively.

The 19th and early 20th centuries witnessed many theoretical developments. Karl Pearson (1857-1936) introduced the mode, mean deviation, coefficient of variation, moments, measures of symmetry and kurtosis, the chi-square, the symbol of the null hypothesis (H0), type I and type II errors, homoscedasticity and heteroscedasticity, and the concept of partial correlation. Sir Ronald Aylmer Fisher (1890-1962) introduced variance, methods for small samples, factorial designs, the null hypothesis, random allocation, ANOVA, ANCOVA, the relation between regression and ANOVA, and testing the significance of the regression coefficient. Karl Pearson and R.A. Fisher developed contingency table analysis using the chi-square test. Adolphe Quetelet developed vital statistics in its modern form and introduced the concept of the mean. Carl Friedrich Gauss (1777-1855) introduced the median and re-discovered the normal distribution, which had been discovered independently before him by Pierre Simon, Marquis de Laplace (1749-1827) and, in 1733, by Abraham de Moivre (1667-1754). Sir Francis Galton used the term ‘normal’ to refer to the curve, applied statistical techniques to natural phenomena, and described correlation and regression. W.F. Sheppard introduced the standard normal curve in 1899. C. Kramp published the first table of the area under the curve in 1799. J. Neyman developed the concept of confidence intervals in 1934. Charles Spearman (1863-1945) and Maurice George Kendall (1907-1983) introduced non-parametric tests.

The bulk of statistical theory is probability theory, since modern inferential statistics depends on it. Christiaan Huygens (1629-1695) was the first to publish on probability and games. Modern probability theory owes a lot to the pioneers: Blaise Pascal (1623-1662), Pierre de Fermat (1601-1665), Jacques Bernoulli (1654-1705), Nicolas Bernoulli (1687-1759), Abraham de Moivre (1667-1754), Pierre Rémond de Montmort (1678-1719), and Pierre Simon, Marquis de Laplace (1749-1827).

LIMITATIONS OF BIOSTATISTICS
An investigator starts with a substantive question that is then formulated as a statistical question. Data are collected and analyzed to reach a statistical conclusion. The statistical conclusion is combined with other knowledge to reach a substantive conclusion.
Statistics has several limitations. It gives statistical and not substantive answers. The statistical conclusion refers to groups and not to individuals. It summarizes data but does not interpret them.

Statistics can be misused by selective presentation of desired results. Computation is not an end in itself. It is a tool that can be used well or misused. A human must have a clear idea of what is required of the computer and must instruct it accordingly. The human must also be able to interpret the output from the computer intelligently. All who tinker with computers must remember the adage ‘rubbish in/rubbish out’.
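The adage can be illustrated with a short Python sketch. The ages and the missing-value code 999 are hypothetical and chosen only for illustration. The computer will happily average whatever it is given unless the human tells it which entries are not real measurements.

```python
import statistics

MISSING_CODE = 999  # hypothetical code meaning "age not recorded"

# Hypothetical ages of students; two entries carry the missing-value code.
ages = [23, 25, 999, 24, 26, 999, 22]

# Rubbish in: the missing-value codes are averaged as if they were real ages.
naive_mean = statistics.mean(ages)

# The human instructs the computer to exclude the codes before averaging.
valid_ages = [age for age in ages if age != MISSING_CODE]
cleaned_mean = statistics.mean(valid_ages)

print(f"naive mean: {naive_mean:.1f}")
print(f"cleaned mean: {cleaned_mean:.1f}")
```

Both numbers are computed correctly, but only the second is a meaningful average age. Recognizing that the first is absurd is a matter of human interpretation, not computation.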


EXERCISES: INTERNET RESEARCH
Use the internet to find more information on the following topics:
1.       History of population censuses
2.      Brief biographies and contributions of the following pioneers of bio-statistics: Fisher, John Graunt, William Farr, Quetelet, Laplace, Bernoulli.
3.      Review the abstract and methodology sections of one journal article in a current issue of any medical journal and draw a table showing the frequency with which the following statistical terms are used: t-test, chi-square test, linear regression, logistic regression, analysis of variance, and p-value.
4.      List 2-3 statistical packages or programs that are used for (a) data management (b) data analysis (c) both data management and data analysis
5.      Find out and describe the following information about a computer: random access memory, hard disk memory, speed, type of chip used, byte, bit, local area network, server

EXERCISES: PREPARING A QUESTIONNAIRE FOR DATA COLLECTION
Prepare a mailed questionnaire and collect the following data on members of the class:
·        Identifying information: ID (not real), gender, year of study
·        Sociodemographic information: home address (urban/rural), region of origin (East Coast, West Coast & Central, North, South, Other), primary school (religious, private, public)
·        Family information: number of siblings, is the paternal grandfather living now (yes/no) and, if dead, age at death
·        Wearing glasses for refractive errors (yes/no), age at which glasses were first prescribed, does the father wear glasses (yes/no), does the mother wear glasses (yes/no), does any sibling wear glasses (yes/no)
·        Color preference (choose one color only)
·        Desire to specialize or work as a general practitioner (yes/no)
·        Ideal age for marriage
·        Desired number of children