Presented at a Training Workshop on Research Methodology held at the Faculty of Medicine, King Fahad Medical City, on 18th May 2010 by Dr Omar Hasan Kasule Sr., MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Professor of Epidemiology and Bioethics, omarkasule@yahoo.com.
An epidemiological investigation proceeds in several stages. It starts by identifying the problem, i.e. a decision has to be made that a public health or medical problem exists. This is followed by a description of the extent and distribution of the problem. Hypotheses are then formulated about the causes of the problem using the procedures of the scientific method. Appropriate studies are designed to test the hypotheses. Epidemiological information is sourced from existing data or from new studies (observational or experimental). An epidemiologic study involves data collection, data analysis, and data interpretation. Biostatistics is the technology of the scientific method that enables sophisticated data analysis and interpretation.
3.0 RESEARCH QUESTIONS & CONCLUSIONS: STATISTICAL vs SUBSTANTIVE:
An investigator starts with a substantive question. This is reformulated as a statistical question. Data are then collected and analyzed to answer the statistical question. The answer to the statistical question is the statistical conclusion. The investigator uses the statistical conclusion, together with other available knowledge, to reach a substantive conclusion. Statistics therefore gives statistical and not substantive answers.
A substantive question is the subject matter stated in ordinary language. Technical terminology may or may not be used. The less technical the formulation, the better, so that statisticians who are not specialists in the subject matter can understand it. Care must be taken to make sure that accuracy and exactness are not sacrificed for the sake of simplification.
A statistical question is the substantive question restated in statistical language. Since the language of statistics is mathematical, the statistical question is stated in terms of numbers, parameters, relations of equality, and relations of inequality.
A statistical conclusion is the result of mathematical manipulation of parameters or data. Statistical conclusions are made about groups and not individuals. Any inference to the individual is to a hypothetical individual. In other words the statistical conclusion is depersonalized.
A substantive conclusion is the translation of the statistical conclusion back to normal language to answer the substantive question that was posed at the start.
A hypothesis is a statement of belief in something. Unlike other types of beliefs, scientific beliefs are subject to experimental verification. Two hypotheses are always stated for proper scientific investigation: the null and the alternative hypotheses.
The null hypothesis, H0, states that there is no difference between the two comparison groups and that any apparent difference seen is due to sampling error.
The alternative hypothesis, HA, disagrees with the null hypothesis and states that there is a real difference not explained by sampling error. H0 and HA are complementary and exhaustive in that between them they cover all the possibilities. HA may be vague, e.g. it may state only that a difference exists. When H0 is rejected, HA is not thereby proved; we only fail to reject it.
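As an illustration, for a comparison of two group means the pair of hypotheses could be written as below. The symbols \mu_1 and \mu_2 (the population means of the two comparison groups) are notation assumed here and do not come from the workshop text.

```latex
H_0 : \mu_1 - \mu_2 = 0
\qquad \text{versus} \qquad
H_A : \mu_1 - \mu_2 \neq 0
```

Every possible value of the difference falls under exactly one of the two statements, which is what makes them complementary and exhaustive.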
The aim of hypothesis testing is to reach a conclusion about H0. The conclusion takes the form of rejecting or not rejecting the hypothesis. If H0 is rejected, HA becomes the new working hypothesis. A hypothesis cannot be proved; the test only gives an objective measure of how compatible the observed data are with it.
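The reject / do not reject decision can be sketched with a small exact permutation test. This is only one of several possible tests and is not a method prescribed by the workshop; the blood pressure readings, the function name `exact_permutation_p_value`, and the 0.05 significance level are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
treated = [118, 121, 119, 117, 120, 116]
control = [128, 131, 127, 130, 129, 126]


def mean(values):
    return sum(values) / len(values)


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Under H0 the group labels are arbitrary, so every way of picking
    # n_a of the pooled values as "group A" is equally likely.
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Count relabelings at least as extreme as the observed split.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


ALPHA = 0.05  # conventional significance level, assumed here
p_value = exact_permutation_p_value(treated, control)
decision = "reject H0" if p_value < ALPHA else "do not reject H0"
print(f"p = {p_value:.4f}: {decision}")
```

With these invented readings only 2 of the 924 possible relabelings are as extreme as the observed one, so H0 is rejected. The output is a statement about the two groups, not about any individual patient.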
Implications of a statistically significant result
· H0 is false
· H0 is rejected
· Observations are not compatible with H0
· Observations are not due to sampling variation
· Observations reflect a real/true biological phenomenon
Implications of a result that is not statistically significant
· H0 is not false (we do not say true)
· H0 is not rejected
· Observations are compatible with H0
· Observations are due to sampling variation or random errors of measurement.
· Observations are artefactual or apparent and not real biological phenomena
Statistical and practical significance
A statistically significant result may have no clinical or practical importance. This may be because (a) other factors are involved that were not studied, or (b) the measurements are not valid. Conversely, a clinically important difference may fail to reach statistical significance for two main reasons: (a) a small sample size, or (b) measurements that are not discriminating enough.
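The effect of sample size on this distinction can be sketched numerically. The example below is a rough illustration only: it assumes equal group sizes, a common known standard deviation, and a normal (z) approximation, and the function name `two_sided_p_value` and all trial figures are invented.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(mean_difference, sd, n_per_group):
    """Approximate two-sided p-value for a difference between two group means.

    Assumes equal group sizes, a common known standard deviation, and a
    normal (z) approximation; a t-test would be more appropriate for
    small samples.
    """
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical blood pressure trials (mmHg); all numbers are invented.
# A clinically trivial difference measured in a very large study.
p_trivial = two_sided_p_value(mean_difference=0.5, sd=10, n_per_group=100_000)
# A clinically important difference measured in a very small study.
p_important = two_sided_p_value(mean_difference=8, sd=10, n_per_group=5)

print(f"0.5 mmHg difference, n = 100000 per group: p = {p_trivial:.3g}")
print(f"8 mmHg difference, n = 5 per group: p = {p_important:.3g}")
```

Under these assumptions the 0.5 mmHg difference is statistically significant although it is unlikely to matter clinically, while the 8 mmHg difference gives a p-value of roughly 0.2 and so does not reach significance at the 0.05 level.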