Background reading by Professor Omar Hasan Kasule Sr. for the July 19-23 sessions of the course 'Essentials of Epidemiology in Public Health' at the Department of Social and Preventive Medicine, University of Malaya
1.1 INTRODUCTION TO GENERAL EPIDEMIOLOGY
1.1.1 DEFINITION, SCOPE, and CLASSIFICATION
Epidemiology is the study of the distribution and determinants of both disease and injury. Two triads are involved in epidemiology: (a) the agent, host, and environment triad and (b) the time, place, and person triad. The primary goals of epidemiology are prevention, control, and, in rare instances, eradication of disease and injury. Epidemiology started as a study of epidemics and extended to cover infectious disease and later non-infectious diseases. It has now become a methodological discipline that is used to study disease and non-disease phenomena.
Qualitative epidemiology deals with qualitative descriptions. Quantitative epidemiology deals with numerical descriptions. Observational epidemiology is based on observation of human phenomena. Experimental epidemiology involves assessment of the effects of intervention against a disease phenomenon. Theoretical epidemiology deals with mathematical and methodological issues. Descriptive epidemiology describes the patterns of disease occurrence in terms of place, time and person. Analytic epidemiology seeks to discover the underlying causes of diseases.
Public-health epidemiology deals with preventive medicine. Clinical epidemiology deals with diagnosis, management, and prognosis of disease. Hospital epidemiology deals with nosocomial infections and other aspects of hospital operations that can be studied using epidemiological methodology. Drug or pharmaco-epidemiology studies phenomena of adverse reactions and side-effects of drugs. Genetic epidemiology studies the patterns of inheritance of disease from parents and how genetic and environmental factors interact in the final pathway of disease causation. Molecular epidemiology deals with phenomena at the molecular level. Occupational epidemiology studies diseases due to exposure to hazardous material or working conditions in the work-place. Environmental epidemiology studies the impact of air, water, and soil pollution on health. The supporting disciplines of epidemiology are clinical sciences, demographical sciences, data and information sciences, behavioral sciences, and environmental sciences.
1.1.2 IMPORTANCE and PIONEERS OF EPIDEMIOLOGY
Epidemiology is used in clinical medicine, public health, and actuarial sciences. The major activities of an epidemiologist are: study design including selection of the study sample, data collection, data analysis, data interpretation, and initiation of action programs to prevent disease and promote health. Professional practice and careers in epidemiology are in government (Ministry of Health), universities, hospitals, the private sector (drug manufacturers), and research institutes. Famous epidemiologists contributed to the early growth of the discipline. Hippocrates made the first recorded epidemiological observations by describing the relation of disease to climate and geography. John Snow (1813-1858) recognized the importance of field epidemiology in his study of the London cholera and its relation to water pollution. William Budd (1811-1880) described the spread of typhoid due to ingestion of infected material from patients. William Farr realized that cycles of epidemics could be described mathematically. Major Greenwood (1880-1949), who was chief of epidemiology and vital statistics at the London School of Hygiene and Tropical Medicine, worked on models of epidemics.
1.1.3 EPIDEMIOLOGIC METHODOLOGY:
1.1.4 HISTORICAL EVOLUTION OF EPIDEMIOLOGIC KNOWLEDGE
Five stages can be identified in the evolution of epidemiological knowledge: the ancient period up to 1500, the post renaissance period 1500-1750, the sanitary period 1750-1870, the infectious disease period 1870-1945, and the modern epidemiology period starting in 1945 (also considered the chronic disease period).
In the ancient period, inter-personal disease transmission, connection between diseases and the environment, quarantine and isolation were known. In 400 BC Hippocrates suggested the relation between disease on one side and lifestyle and environmental factors on the other side.
The post renaissance period witnessed rapid growth of knowledge of pathology, and of the transmission as well as control of disease. In the 1620s Bacon and others developed inductive logic that provided a philosophical basis for epidemiology. Girolamo Fracastoro (1478-1553) suggested that disease spread by direct contact and by small living particles. In 1683 Van Leeuwenhoek saw microorganisms under the microscope. In 1662 Captain John Graunt analyzed births and deaths and described disease in populations quantitatively with significant epidemiological observations and determinations. In 1747 James Lind discovered the prevention of scurvy by conducting one of the first experimental trials on humans. In 1798 Edward Jenner discovered vaccination. Ramazzini wrote on occupational health in 1700. Percival Pott (1713-1788) associated scrotal cancer with chimney soot.
In the sanitary period concern was about environmental correlates of disease; quarantine and isolation were used for disease control.
During the infectious disease period, the microbial basis of disease became firmly established when Louis Pasteur (1822-1895) and Robert Koch (1843-1900) developed the germ theory through experimentation. Dr Robert Koch, the father of bacteriology, identified the causative organisms of anthrax (1876), tuberculosis (1882), and cholera (1883). He developed Koch’s postulates, which were criteria for determining an infectious etiology of disease. In 1847 Ignaz Philipp Semmelweis suggested hand-washing to avoid obstetric infection. John Snow described the association between cholera and contaminated water by forming and testing a series of hypotheses, thus being a pioneer of analytic epidemiology. William Budd in 1857-73 concluded that typhoid was contagious. In 1839 William Farr started the discipline of vital statistics as a system of regular collection and interpretation of data and set up a system for routine summaries of causes of death. Joseph Lister introduced antiseptic surgery in 1865. Manson-Bahr, Bruce-Chwatt and others studied the transmission of the mosquito-borne infections malaria and yellow fever.
Towards the end of the infectious disease period, there were developments in knowledge of non-infectious disease and statistical methodology. Non-infectious diseases (nutritional, occupational, psychiatric, and environmental) were identified and were studied. In 1905 beriberi was found associated with eating milled rice. In 1920 Joseph Goldberger published a descriptive field study relating pellagra to diets high in cereal & canned foods and free of fresh animal products. Elmer McCollum a Professor at Johns Hopkins since 1918 discovered vitamin-deficiency diseases. Statistical theory and practice developed rapidly towards the close of the 19th century to keep up with developments in basic research and public health all of which required statistical analysis.
The period of modern epidemiology starting in 1945 is the chronic disease epoch. By 1945 there was convergence of the non-mechanistic concepts of disease (environmental, social, and behavioral basis of disease) and the mechanistic concepts of disease (molecular, biological, agent-host interaction). Health was defined in a broad sense as physical, mental, psychological, and spiritual well-being. Scientists recognized the multi-causal nature of disease (genetic, psycho-social, physiological, and metabolic). The period witnessed a demographic transition (ageing populations) as well as an epidemiologic transition (change from communicable to non-communicable diseases). It also witnessed major studies that helped redefine the direction of epidemiology and public health. In 1949 the Framingham Heart Study was begun as the first cohort study of the causative factors of cardiovascular disease. In 1950 Doll and Hill, Levin et al., Schreck et al., and Wynder and Graham published the first case control studies of smoking and lung cancer. In 1954 the field trials of the Salk polio vaccine were the largest formal human experiment. In 1971-1972 the North Karelia Project and the Stanford Three Community studies were launched as the first community-based cardiovascular disease prevention programs. Further methodological developments were witnessed in this period. In 1960 MacMahon published the first epidemiology textbook with systematic treatment of study design. In 1959 Mantel and Haenszel developed statistical procedures for case control studies. In the 1970s logistic regression and log-linear regression were developed as new multivariate analytic methods. From the 1970s to the present there have been new developments in computer hardware and software. Since the 1990s molecular techniques have been applied to the study of large populations.
1.1.5 ETHICO-LEGAL ISSUES IN EPIDEMIOLOGY
A study involving humans must get approval from a recognized body. For approval the study must fulfill certain criteria. It must be scientifically valid. It is unethical to waste resources (time and money) on a study that will give invalid conclusions. In 1992 the Council for International Organizations of Medical Sciences (CIOMS) published ‘Guidelines for Ethical Review of Epidemiological Studies’. Among ethical considerations are: individual vs. community rights, benefits vs. risks, informed consent, privacy and confidentiality, and conflict of interest.
Study interpretation and communication of findings to the public pose problems. Risk reports that are not yet confirmed are picked up by the media and create unnecessary public concern. Study findings affect policy. Epidemiologists must know how to communicate risk to the public. It is an ethical obligation to report research findings to subjects so that they may take measures to lessen risk. Epidemiological evidence is different from legal evidence. Epidemiological evidence may not be accepted in a court of law because it has few certainties; it is concerned with populations whereas legal evidence pertains to individuals.
1.2 INTRODUCTION TO CLINICAL EPIDEMIOLOGY
1.2.1 DEFINITIONS, SCOPE, and ROLES
Clinical epidemiology is defined as the study of the outcome of disease and the factors that affect the variation in outcome. It applies epidemiological methodology to patient care. The ‘exposures’ are therapy and the etiological factor. The ‘outcomes’ are disease progression, disease complications, and mortality. The scope of clinical epidemiology covers definition of abnormality, diagnosis (symptoms, signs, and diagnostic procedures), frequency, risk, prognosis, treatment, and prevention. It also involves study of the natural history of disease; study of the sensitivity, specificity, and predictive value of diagnostic and screening tests; and study of therapy using randomized clinical trials. It also studies treatment efficacy and effectiveness. It is used in the investigation of disease etiology, identification of risks, identifying syndromes, classifying diseases, differential diagnosis, and planning and follow-up of treatment. The conceptual basis and methodology of clinical epidemiology are the same as those of general population-based epidemiology.
1.2.2 HISTORICAL EVOLUTION OF CLINICAL MEDICINE
In pre-historic times magic and superstition were mixed with medical care and were closely related to the prevailing belief systems. Mesopotamian medicine was magico-religious with priests serving also as physicians. Divination was used beside treatment with vegetables, animals, and minerals. Early Egyptian medicine was mystical and priestly. Pills, potions, suppositories, purgatives, enemas, emetics, inhalants, and ointments were used. Egyptians knew circumcision, plaster for closing wounds, cautery for hemostasis, and incision & drainage for abscesses. Chinese medicine can be traced back to Fu Hsi in about 3322 BC. Traditional Indian medicine knew the diseases of tuberculosis, cancer, diabetes mellitus, leprosy, and smallpox. Diagnoses were made by listening to the breath sounds; observing the color of the eyes, the tongue, and the skin; feeling the pulse; and tasting the sweetness of urine. Stress was put on diet, hygiene, and mental preparation. Herbs were used. Rauwolfia serpentina and opium were used as drugs. Indians knew excision, suturing, drainage, cauterization, laparotomy, removal of bladder stones, repair of fistulae, cesarean section, and cataract removal. Ancient Indian medicine is what has grown into the Ayurvedic medicine of India today.
Greek medicine benefited by learning from Asia Minor, Mesopotamia, and Egypt. It was closely related to religions and the temples. The main figures of Greco-Roman medicine were Hippocrates, Galen, and Aristotle. Normal physiology, disease, and treatment were based on the concepts of the 4 humors (blood, phlegm, yellow bile, and black bile), the 4 elements (earth, air, fire, and water), and the 4 qualities (hot vs. cold and wet vs. dry). Rest and diet were used in treatment. With the death of Galen in 199 CE Greco-Roman medicine entered its dark ages. Aulus Cornelius Celsus formulated the 4 cardinal signs of inflammation: pain, redness, heat, and swelling. Romans made practical but no theoretical contributions to the development of medicine.
Muslim medicine started with translations. Nestorians had translated Greek medical works into Syriac. Muslims translated these Greek works into Arabic. They in addition made their own observations and discoveries. Abubakar al Razi described measles and wrote al Hawi. Ibn Sina wrote al Qanun fi al Tibb. Zahrawi wrote on surgery.
European medicine in the Middle Ages was in general decay. The Christian Church contributed to lack of scientific growth. Superstition became widespread. Muslim medicine was transferred to Europe through Andalusia (modern Spain) and Sicily in Italy, creating the medieval medical reawakening (800-1500 CE). As a result of contacts with Muslims, the Salerno medical school emerged in Southern Italy in the 9th century CE. Muslim medical writings were translated into Latin by Constantine Africanus (1010-1087 CE) at Salerno. Muslim works were expanded and annotated by Europeans and were taught at universities in Bologna and elsewhere. The Renaissance (1500-1700 CE) was a period of the rise of anatomical knowledge. Besides translations of Muslim writings, Europeans undertook human dissections. Following the Renaissance rapid developments were made in physiology, microbiology, pathology, and pharmacology by use of empirical research. In clinical sciences, internal medicine lagged behind surgery. In 1636 CE clinical teaching of medicine started at Leyden. Schools of medicine developed in the 18th century in London, Vienna, Paris, Edinburgh, and Dublin. The industrial revolution witnessed building of new hospitals. Modern medicine (20th century CE onwards) has three distinguishing characteristics. It is evidence-based medicine. It uses advanced technological interventions. It uses human experimentation extensively.
1.2.3 NATURAL HISTORY OF ILLNESS
Natural history is the course of disease in patients who are not receiving any therapy or intervention. Knowledge of the natural history is necessary for planning rational treatment strategies. The stages of disease on the basis of clinical manifestation are: essentially normal (low risk), establishment of the disease-causing agent, appearance of signs, appearance of symptoms, disability, and death.
1.2.4 CLINICAL EPIDEMIOLOGY IN DIAGNOSIS
Disease is an anatomical, biochemical, physiologic or psychological derangement. Clinical diagnosis is an effort to recognize the class or group to which a person's illness belongs. Epidemiology as the study of the distribution and determinants of disease provides background information needed in clinical diagnosis. Statistical abnormality, used to define disease, is defined as deviation beyond 2 standard deviations from the mean. The most often used strategy in clinical diagnosis is the hypothetico-deductive one, in which a hypothesis is formed from early clues and then history, clinical examination, and diagnostic tests are undertaken to confirm or reject the hypothesis. Epidemiological knowledge provides prior probabilities for clinical decision making. The clinician combines his empirical findings with the prior probabilities to reach a diagnosis. This may be informal or formal using Bayesian techniques. In formal clinical decision making, the problem is defined. Alternative actions and possible outcomes are determined. Probabilities are determined on the decision tree and the value of the outcome is computed.
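The formal Bayesian combination of a prior probability with a test finding can be sketched as follows. This is a minimal illustration, not a clinical tool: the function name posterior_probability and the example figures (a 10% prior, a test with 90% sensitivity and 80% specificity) are assumptions made for the example and do not come from the text.

```python
def posterior_probability(prior: float, sensitivity: float, specificity: float,
                          test_positive: bool = True) -> float:
    """Probability of disease after one test result, by Bayes' theorem.

    prior       -- probability of disease before the test (e.g. prevalence)
    sensitivity -- P(test positive | disease)
    specificity -- P(test negative | no disease)
    """
    for p in (prior, sensitivity, specificity):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    if test_positive:
        p_result_if_diseased = sensitivity
        p_result_if_healthy = 1.0 - specificity
    else:
        p_result_if_diseased = 1.0 - sensitivity
        p_result_if_healthy = specificity
    numerator = prior * p_result_if_diseased
    denominator = numerator + (1.0 - prior) * p_result_if_healthy
    if denominator == 0.0:
        raise ValueError("this test result is impossible under the inputs")
    return numerator / denominator


# Hypothetical figures: 10% prior, 90% sensitivity, 80% specificity.
after_positive = posterior_probability(0.10, 0.90, 0.80, test_positive=True)
after_negative = posterior_probability(0.10, 0.90, 0.80, test_positive=False)
print(f"after a positive test: {after_positive:.3f}")
print(f"after a negative test: {after_negative:.3f}")
```

With these assumed figures a positive result raises the probability of disease from 10% to about one in three, which shows how strongly the conclusion depends on the prior supplied by epidemiological knowledge.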
Diagnostic tests are used for assessing severity, predicting prognosis, estimating likely response to treatment, and determining the actual response to treatment. The precision of each test, its sensitivity, and its specificity must be taken into consideration in interpreting its findings. Diagnostic procedures can be evaluated by computing their predictive value. Epidemiological parameters are used to choose a diagnostic procedure. Diagnostic tests are also useful in predicting illness outcome. Randomized controlled trials, follow up, and case control study designs can be used to assess the role of diagnostic tests in predicting outcome. A hospital stores a lot of clinical data about patients. This data may be shared with other hospitals using local area networks (LAN). Databases have been developed with AI capabilities and they can provide much support to the physician who is trying to diagnose a disease. This is done by comparison of the patient's data with several profiles stored in the database.
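Sensitivity, specificity, and predictive values are usually computed from a two-by-two table that compares the test result with a reference standard. The sketch below assumes such a table is available; the class name TwoByTwoTable, the function name diagnostic_measures, and the counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwoTable:
    """Counts from comparing a test against a reference standard."""
    true_positives: int
    false_positives: int
    false_negatives: int
    true_negatives: int


def diagnostic_measures(table: TwoByTwoTable) -> dict:
    """Return sensitivity, specificity, and predictive values as fractions."""
    tp, fp = table.true_positives, table.false_positives
    fn, tn = table.false_negatives, table.true_negatives
    if min(tp + fn, fp + tn, tp + fp, fn + tn) == 0:
        raise ValueError("each row and column of the table must be non-empty")
    return {
        "sensitivity": tp / (tp + fn),                # diseased who test positive
        "specificity": tn / (fp + tn),                # healthy who test negative
        "positive_predictive_value": tp / (tp + fp),  # positives who are diseased
        "negative_predictive_value": tn / (fn + tn),  # negatives who are healthy
    }


# Hypothetical counts: 100 diseased and 900 healthy people tested.
example = TwoByTwoTable(true_positives=90, false_positives=180,
                        false_negatives=10, true_negatives=720)
measures = diagnostic_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the disease is in the group tested.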
1.2.5 CLINICAL EPIDEMIOLOGY IN TREATMENT and PROGNOSIS
The objective of treatment must be identified: cure vs. palliation. The specific treatment modalities to be used must then be selected. Treatment targets must then be decided: dose, frequency, start, and end. Treatment decisions are based on clinical experience or medical literature. Formal decision analysis techniques using prior probabilities from clinical epidemiological studies can be used. Decision trees are used, with the probabilities on their branches taken from empirical data. Epidemiological measures of treatment are efficacy, effectiveness, safety, incidence of side effects, incidence of treatment failure, compliance, and functional status. The clinical database can be used to predict patient compliance. Functional status is measured using restricted activity days, work-loss days, and bed-disability days. Clinical practice guidelines have been developed for many conditions. They are based on results of clinical trials and epidemiologic studies. The guidelines can be evaluated using specific criteria. Both randomized and non-randomized clinical trials are used in studying treatment efficacy. Therapeutic safety can be measured using case control studies, follow up studies, and case reports. Two main issues arise: characterization of patients who receive a particular treatment and ascertainment of unintended effects. Prognosis can be based on clinical experience or expert opinion and review of literature. Prognosis can also be assessed by comparing the patient's profile to the information in the clinical database.
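A simple decision tree of the kind mentioned above can be evaluated by weighting the value of each outcome by its probability and comparing the alternative actions. The sketch below assumes one decision with two actions; the action names, probabilities, and outcome values (on a 0 to 1 scale) are invented for illustration and would in practice come from empirical data.

```python
from typing import Dict, List, Tuple

# Each action leads to a list of (probability, value) outcome pairs.
DecisionTree = Dict[str, List[Tuple[float, float]]]


def expected_value(outcomes: List[Tuple[float, float]]) -> float:
    """Probability-weighted average value of one action's outcomes."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities for an action must sum to 1")
    return sum(p * value for p, value in outcomes)


def best_action(tree: DecisionTree) -> Tuple[str, float]:
    """Return the action with the highest expected value and that value."""
    values = {action: expected_value(outcomes) for action, outcomes in tree.items()}
    choice = max(values, key=values.get)
    return choice, values[choice]


# Hypothetical tree: probabilities and values are assumptions.
tree: DecisionTree = {
    "surgery": [(0.80, 1.0), (0.15, 0.6), (0.05, 0.0)],
    "drug therapy": [(0.60, 0.9), (0.40, 0.5)],
}
choice, value = best_action(tree)
print(f"preferred action: {choice} (expected value {value:.2f})")
```

The choice is only as good as the probabilities and values entered, which is why the text stresses taking them from empirical studies.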
1.2.6 CLINICAL TRIALS ON HUMANS
Therapeutic clinical trials are controlled experiments to compare the effectiveness of different treatments by random allocation of study participants to treatment and control groups, observing the outcome of interest, and at the same time studying time-varying potential confounding variables. Trials can also be designed to be intervention or preventive trials. They may be accrual or non-accrual. Unadjusted censoring causes bias. Random allocation prevents selection bias. Double blinding prevents observer bias. The primary objective of drug clinical trials is efficacy and the secondary objective is assessing adverse drug reactions (ADR). Complete randomization is simple but requires a large sample size. Stratified randomization balances prognostic factors. The trial can use historical, concurrent, self, untreated, placebo, negative, or positive controls. Clinical trials are preceded by screening in vivo in animals and in vitro in human tissues. Phase 1 trials study maximum tolerated doses, drug administration schedules, drug toxicity, and evidence of anti-tumor activity. Phase 2 studies assess therapeutic activity of a drug in advanced disease. Phase 3 trials compare a drug to a placebo or a new drug to an existing drug. Comparability is assured by randomization and equal handling of the 2 groups. Phase 4 studies involve post-marketing surveillance by collecting data on short term and long term effects. Clinical trials on humans have several ethico-legal considerations. Search for better treatment justifies clinical trials.
The ethical issues of trials are withholding a potentially beneficial treatment from the controls, unknown risks of new agents, lack of informed consent or consent under stress, trials if an effective treatment exists, trial when one treatment is known to be better, testing with no evidence of usefulness, unscientific research, violation of the normal doctor-patient relation, randomization when there is prior knowledge that one treatment is the better one, and failure to stop the study when harmful/beneficial effects appear.
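The difference between complete and stratified randomization described in this section can be sketched in code. The stratified version below uses permuted blocks within each stratum, which is one common way of doing it and is an assumption here, not a method stated in the text; the function names, the block size of 4, and the participant list are also hypothetical.

```python
import random
from collections import defaultdict

ARMS = ("treatment", "control")


def complete_randomization(participant_ids, rng):
    """Assign each participant independently; arm sizes may be unequal."""
    return {pid: rng.choice(ARMS) for pid in participant_ids}


def stratified_block_randomization(participants, rng, block_size=4):
    """Assign within each stratum using shuffled blocks of equal arm counts.

    participants -- iterable of (participant_id, stratum) pairs
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    by_stratum = defaultdict(list)
    for pid, stratum in participants:
        by_stratum[stratum].append(pid)
    allocation = {}
    for ids in by_stratum.values():
        for start in range(0, len(ids), block_size):
            block = list(ARMS) * (block_size // 2)
            rng.shuffle(block)  # random order inside the block
            for pid, arm in zip(ids[start:start + block_size], block):
                allocation[pid] = arm
    return allocation


# Hypothetical trial: 16 participants in two prognostic strata.
rng = random.Random(2024)  # fixed seed so the sketch is repeatable
participants = [(i, "early stage" if i < 8 else "advanced stage") for i in range(16)]
allocation = stratified_block_randomization(participants, rng)
for stratum in ("early stage", "advanced stage"):
    treated = sum(1 for pid, s in participants
                  if s == stratum and allocation[pid] == "treatment")
    print(f"{stratum}: {treated} treatment, {8 - treated} control")
```

Because every complete block holds equal numbers of each arm, the two groups stay balanced on the stratifying factor even in a small trial, whereas complete randomization relies on a large sample to achieve balance.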
1.3 INTRODUCTION TO PUBLIC and COMMUNITY HEALTH
1.3.1 DEFINITION OF PUBLIC HEALTH
Public health is the sum of all official (government) efforts to promote, protect, and maintain health. It is investigation, promotion, and evaluation of optimal health services for communities. Public health has 2 main paradigms: disease prevention & health promotion. Public health developed as a reaction to bad health and social conditions and can therefore be looked at as a reform movement. The scope of public health covers health problems and disease determinants. It faces many challenges because of its wide scope: demographic change, globalization, human will and behavior, scarcity of resources, distributive justice & equity, and ethico-legal issues. The essential public health functions are: prevention of disease and injuries; protection against environmental hazards; promotion of healthy behavior; assurance of quality and accessibility of health services; and provision of personal and community health services. Public health uses the scientific approach to solve problems. However, its interventions are often tentative and do not wait for acquisition of perfect information.
1.3.2 DISCIPLINES THAT ASSIST PUBLIC HEALTH
Quantitative disciplines that contribute to public health are epidemiology, biostatistics, and operations research. Economic disciplines deal with resources. Other disciplines that make contributions are sociology, social policy, communication, and management sciences.
1.3.3 PUBLIC HEALTH PROGRAMS and STRATEGIES
The main programs of public health are health policy formulation, disease prevention and health promotion, medical and social services, and environmental protection. The main strategies are surveillance, intervention, and evaluation. Economic interventions have a public health impact.
1.3.4 HISTORY OF PUBLIC HEALTH
In the UK government interest in public health was complacent and ad hoc. The cholera epidemics of 1831-32 and 1865-66 as well as the 1842 Chadwick report on sanitary conditions and disease led to an awakening. The Public Health Act was passed in 1848. Legislation in 1872 and 1875 established a sanitary authority in every district. Housing laws were passed in the 1870s. In 1872 local authorities were required to appoint a medical officer of health (MOH) for sanitation and disease control. In 1939 local authorities were permitted to provide a wider range of services including maternal and child health (MCH). The National Health Service (NHS), established in 1948, became the main provider of services and in 1968 started providing primary health care through community physicians. In the US public health started with port health and quarantines. Higher disease than combat mortality in the American Civil War led to an awakening. State and national health boards were set up by 1879. In 1912 the role of the United States Public Health Service (USPHS) was expanded to include investigation of disease and sanitation. In 1912 USPHS started helping states develop public health departments. The 1935 Social Security Act provided funds to states through USPHS for public health. After World War II funding of public health programs was seen as part of the defense policy. The political atmosphere of the 1950s did not support public health but during the Great Society (1960-1980) funding for public health increased and the Medicaid and Medicare bills were passed in 1965. In the health promotion period (1980-1990) health promotion and disease prevention were recognized as priorities and the role of life-style change was emphasized. The major community health problems of the 1990s were: rising health care costs and barriers to access, environmental concerns, life style diseases (cancer, stroke, and injuries), communicable diseases (HIV, Lyme disease), and abuse of alcohol and drugs.
Community health involves both private and public efforts of individuals, groups, and organizations to promote, protect, and preserve the health of those in the community. It involves community development, community organization, community participation, and community diagnosis. Community health is affected by physical factors (geography, the environment, community size, industrial development), socio-cultural factors (beliefs, traditions, prejudices, economic status, politics, religion, and social norms), individual behavior, and community organization. Whereas public health is government-driven, community health is community-driven. Communities both in pre-history and the historical era undertook measures to protect health. Before the 1980s emphasis was on public health. After that the importance of community health and community participation was recognized.
1.4 INTRODUCTION TO BIO-STATISTICS
1.4.1 BIOSTATISTICS AS A DISCIPLINE
The term statistics can be used to convey three meanings. Applied statistics is defined as techniques of articulating, summarizing, analyzing, and interpreting numerical information. Theoretical statistics deals with probability. Statistics are indices or summary statistics derived from data. Bio-statistics is a branch of applied statistics that is management and analysis of numerical data on people, health, disease, medical treatments and procedures. It includes vital statistics, public health statistics, and demography. Biostatistics is divided into 2 branches: descriptive and analytic. Descriptive statistics deals with collection, organization, presentation, and summarization of data. Analytic statistics deals with drawing logical and objective conclusions about a sample or a population. Biostatistics provides the tools for the summary and digestion of a lot of numerical laboratory and clinical data including critical reading and understanding of scientific literature.
1.4.2 HISTORY OF BIOSTATISTICS
Statistics has grown through successive eras: era of censuses, era of vital statistics, era of descriptive statistics, era of analytic statistics, and era of probability statistics. Ancient civilizations counted their populations for taxation and military purposes. Complete censuses were first carried out in Sweden in 1749, the US in 1790, Spain in 1798, England & Wales in 1801, and Canada in 1871. John Graunt is considered the founder of vital statistics. He analyzed London mortality data and also laid the foundations of the science of demography. William Farr started the modern procedures of vital statistics registration. Pierre Charles Alexandre Louis (1787-1872) introduced the numerical method in describing medical facts quantitatively.
The 19th and early 20th centuries witnessed many theoretical developments. Karl Pearson (1857-1936) introduced the mode, mean deviation, coefficient of variation, moments, measures of symmetry and kurtosis, the chi-square, symbol of the null hypothesis (H0), type I and type II errors, homoscedasticity and heteroscedasticity, and the concept of partial correlation. Sir Ronald Aylmer Fisher (1890-1962) introduced variance, methods for small samples, factorial designs, the null hypothesis, random allocation, ANOVA, ANCOVA, relation between regression and ANOVA, and testing significance of the regression coefficient. Karl Pearson and RA Fisher developed contingency table analysis using the chi-square test. Adolphe Quetelet developed vital statistics in its modern form and introduced the concept of the mean. C.F. Gauss (1777-1855) introduced the median and re-discovered the normal distribution, which had independently been discovered earlier by Pierre Simon Marquis de Laplace (1749-1827) and, in 1733, by Abraham de Moivre (1667-1754). Sir Francis Galton used the term ‘normal’ to refer to the curve, applied statistical techniques to natural phenomena, and described correlation and regression. W.F. Sheppard introduced the standard normal curve in 1899. C Kramp published the first table of the area under the curve in 1799. J Neyman developed the concept of confidence intervals in 1934. Charles Spearman (1863-1945) and Maurice George Kendall (1907-1983) introduced non-parametric tests. The bulk of statistical theory is probability theory since modern inferential statistics depends on probability theory. Christian Huygens (1629-1695) was the first one to publish on probability and games. Modern probability theory owes a lot to the pioneers: Blaise Pascal (1623-1662), Pierre de Fermat (1601-1665), Jacques Bernoulli (1654-1705), Nicolas Bernoulli (1687-1759), Abraham de Moivre (1667-1754), Pierre Rémond de Montmort (1678-1719), and Pierre Simon Marquis de Laplace (1749-1827).
1.4.3 LIMITATIONS OF BIOSTATISTICS
An investigator starts with a substantive question that is formulated as a statistical question. Data is then collected and is analyzed to reach a statistical conclusion. The statistical conclusion is used with other knowledge to reach a substantive conclusion.
Statistics has several limitations. It gives statistical and not substantive answers. The statistical conclusion refers to groups and not individuals. It only summarizes but does not interpret data.
Statistics can be misused by selective presentation of desired results. Computation is not an end in itself. It is a tool that can be used well or can be mis-used. A human must have a clear idea of what is required of the computer and must instruct it accordingly. The human must also be able to intelligently interpret the output from the computer. All who tinker with computers must remember the adage ‘rubbish in/rubbish out’.
1.4.5 CAREER OPPORTUNITIES IN BIOSTATISTICS
Biostatistics finds practical applications in quantitative research, administration, and decision-making. Statisticians work in universities, the public sector, and the private sector.
1.5 INTRODUCTION TO COMPUTING
1.5.1 INFORMATION REVOLUTION AND THE COMPUTER AGE:
Data is used for operational, managerial and planning functions. Use of computers is facilitated by routine computerization of operational data. The invention of the computer enabling humans to handle large amounts of data has created an information revolution. New computers can manage (collection & storage) and analyze large amounts of data, a feat that was unthinkable a few years ago. The growth of computational techniques has enabled deeper and more sophisticated analyses. Availability of high speed and efficient computing has encouraged growth in statistical methodology and more sophisticated statistical analysis. This has in turn called for developments in statistical theory that is later translated into newer and more powerful analysis programs.
1.5.2 INFORMATION SYSTEM
An information system has 5 components: people, procedures, software, hardware, and data. Database management systems (DBMS) create, modify, and access data. Data elements are arrayed to make a record or an observation. Files are made up of several observations. Several files make a database. A data dictionary describes the structure of the data. A relational database is in the form of a table with rows and columns in the form of one-to-one. A hierarchical database is several layers of information in the form of one-to-many. A network database is a many-to-many architecture. A program is a series of instructions that the computer executes. Computer languages are at various levels of sophistication. Machine language is in binary code; assembly language uses symbolic mnemonics that correspond closely to it. High level procedural languages are BASIC, Pascal, C, COBOL, and FORTRAN. Problem oriented languages are query languages used in searching databases. Natural language is usual human language that the computer cannot use directly. Ethical issues of privacy, accuracy, data ownership, data access, and security arise due to the large amount of personal information now kept on computers.
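The hierarchy described above (data elements make a record, records make a file, files make a database, and a data dictionary describes the structure) can be sketched with ordinary Python containers. The element names, the sample patients, and the function validate_record are hypothetical and serve only to illustrate the idea; a real database management system does far more.

```python
# Data dictionary: the name and expected type of every data element.
data_dictionary = {
    "patient_id": int,
    "age_years": int,
    "diagnosis": str,
}


def validate_record(record, dictionary):
    """Return a list of ways in which a record departs from the dictionary."""
    problems = []
    for name, expected_type in dictionary.items():
        if name not in record:
            problems.append(f"missing element: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")
    for name in record:
        if name not in dictionary:
            problems.append(f"undocumented element: {name}")
    return problems


# A file is a collection of records (observations).
patients_file = [
    {"patient_id": 1, "age_years": 54, "diagnosis": "typhoid"},
    {"patient_id": 2, "age_years": "unknown", "diagnosis": "cholera"},
]

# A database is a collection of files.
database = {"patients": patients_file}

for record in database["patients"]:
    issues = validate_record(record, data_dictionary)
    print(record["patient_id"], issues if issues else "conforms")
```

Checking each record against the data dictionary at entry is one simple way of addressing the accuracy concern raised at the end of the paragraph.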
1.5.3 HISTORY OF COMPUTING
The Chinese developed the abacus for making arithmetic computations easier. In the 1880s Herman Hollerith, at the suggestion of John Shaw Billings, developed the punched card for processing US census results. The Electronic Numerical Integrator and Computer (ENIAC) was completed in 1945 for ballistic calculations. Digital Equipment Corporation introduced the PDP-8 minicomputer in 1965, an early step toward small, affordable computers. The development of personal computers was the real computer revolution because it led to widespread availability of computers in homes and offices. This increased access of ordinary people to computing.
1.5.4 COMPUTER HARDWARE DEVELOPMENTS:
Computers can be microcomputers, minicomputers, mainframe computers, and supercomputers. Computer hardware consists of input devices, a central processor, output devices, and communication devices such as the modem. The central processing unit (CPU) is the calculating part of the computer. Data is stored as binary units (bits), each either 0 or 1. Eight bits make a byte. A KB is 1024 bytes, an MB is 10^6 bytes, a GB is 1 billion bytes, and a TB is 1 trillion bytes. Data in a computer is stored as files (sequential or random access) that are grouped in directories. RAM keeps data during the processing stage. Long-term memory is the hard disk, floppy disk, or CD-ROM. The modem is used to transfer data from one point to another. Communication channels can be telephone lines, co-axial cables, fiber optic cables, and microwaves. Microwaves operate over short distances and require satellites as relay stations. Computers may be interconnected in a local area network, LAN; a metro area network, MAN; or a wide area network, WAN. Ergonomic designs are used to avoid health problems due to working at computer workstations for long hours.
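The storage units above can be turned into a small conversion sketch. It uses the sizes exactly as given in the text (1024 bytes for a KB, powers of ten for the larger units); the names BYTES_PER_UNIT, bits_to_bytes, and to_unit are assumptions made for the example.

```python
BITS_PER_BYTE = 8

# Unit sizes in bytes, as stated in the text.
BYTES_PER_UNIT = {
    "KB": 1024,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
}


def bits_to_bytes(n_bits: int) -> int:
    """Convert a whole number of bits to bytes (eight bits per byte)."""
    if n_bits % BITS_PER_BYTE != 0:
        raise ValueError("number of bits is not a whole number of bytes")
    return n_bits // BITS_PER_BYTE


def to_unit(n_bytes: int, unit: str) -> float:
    """Express a size in bytes in the requested unit."""
    return n_bytes / BYTES_PER_UNIT[unit]


print(bits_to_bytes(8192), "bytes")          # 8192 bits
print(to_unit(1024, "KB"), "KB")             # one kilobyte
print(to_unit(5 * 10**9, "GB"), "GB")        # five billion bytes
```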
1.5.5 COMPUTER SOFTWARE:
Software consists of the operating system and the application programs. A statistical package is a collection of programs. There are three types of software: operating systems such as Windows, general-purpose programs such as word processors and statistical packages, and futuristic programs. The most popular statistical packages are BMDP, SPSS, Minitab, Censtat, SAS, and GLIM. Epi Info and EGRET are specific for epidemiology. Specialized programs may be graphics, communication, multimedia, and futuristic programs. Artificial intelligence is an attempt to simulate human thought and actions and is used in robotics, expert systems, and virtual reality. Knowledge-based or expert systems are programs that incorporate human thought or problem-solving processes. Virtual reality, also called artificial reality or virtual environment, is used in entertainment and in simulators that train aircraft pilots.