Presentation prepared for the event on Thursday, 2-3 May 2025 by Prof Omar Hasan Kasule.
- Definition of biostatistics
- Substantive and statistical questions and answers
SESSION 2: POPULATIONS, SAMPLES, and DATA SOURCES
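- Populations (target and study)
- Samples (random, convenience, stratified, systematic)
- Sample size: why and how? Sample size calculation formulas require inputs from the researcher (for example, the expected proportion or effect size, the confidence level, and the acceptable margin of error)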
- Sources of data: primary (interview, measurement) and secondary (census, routine records)
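The sample size and sampling bullets above can be illustrated with a minimal sketch. It assumes the common formula for estimating a single proportion, n = z^2 * p(1-p) / d^2, which is only one of several sample size formulas. The function names, the numbered sampling frame of 1000 subjects, and the input values are hypothetical and chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def sample_size_for_proportion(expected_p, margin, confidence=0.95):
    """Sample size to estimate a proportion: n = z^2 * p * (1 - p) / d^2, rounded up.

    expected_p, margin and confidence are the inputs the researcher must supply.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * expected_p * (1 - expected_p) / margin ** 2)


def simple_random_sample(frame, n, seed=0):
    """Every subject in the frame has the same chance of selection."""
    return random.Random(seed).sample(frame, n)


def systematic_sample(frame, n, start=0):
    """Take every k-th subject, where k = frame size // n."""
    step = len(frame) // n
    return frame[start::step][:n]


# Hypothetical inputs: expected proportion 50%, margin of error 5%.
n_needed = sample_size_for_proportion(expected_p=0.5, margin=0.05)

# Hypothetical sampling frame of 1000 numbered subjects.
frame = list(range(1, 1001))
srs = simple_random_sample(frame, 10)
systematic = systematic_sample(frame, 10)

print("required sample size:", n_needed)
print("simple random sample:", sorted(srs))
print("systematic sample:", systematic)
```

Changing any researcher input changes the answer: a smaller margin of error or a higher confidence level increases the required sample size.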
SESSION 3: STUDY DESIGN
- Observational studies: 3 types (cross-sectional, case-control, follow-up/cohort), advantages and disadvantages
- Experimental studies: clinical and community randomized studies
SESSION 4: DESCRIPTIVE STATISTICS FOR CONTINUOUS DATA
- Measures of central location (averages): mean, median, mode
- Measures of dispersion: range, min-max, variance, standard deviation
- Charts: bar diagrams (bar chart and histogram), pie chart, stem-and-leaf plot
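As a small illustration of the measures listed above, the following sketch uses Python's standard `statistics` module. The data values are made up for the example. Note that `variance` and `stdev` compute the sample versions (dividing by n-1).

```python
import statistics

# Hypothetical continuous measurements (illustration only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    # Measures of central location
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    # Measures of dispersion
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),
    "sample_variance": statistics.variance(values),  # divides by n - 1
    "sample_sd": statistics.stdev(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```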
SESSION 5: DESCRIPTIVE STATISTICS FOR DISCRETE DATA
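- Frequency counts, relative frequencies, and percentages (relative frequency x 100)
- Proportion p: the variance of a single binary (0/1) observation is p(1-p), and the variance of a sample proportion based on n observations is p(1-p)/n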
- Rates, hazards, and ratios
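The frequency and proportion bullets above can be sketched as follows. The responses are invented for illustration; the variable names are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical discrete data: 100 yes/no responses.
responses = ["yes"] * 30 + ["no"] * 70

counts = Counter(responses)                      # frequency counts
n = len(responses)
relative = {k: c / n for k, c in counts.items()}  # relative frequencies
percent = {k: 100 * f for k, f in relative.items()}

p = relative["yes"]                  # proportion of "yes"
variance_single = p * (1 - p)        # variance of one 0/1 observation
standard_error = sqrt(variance_single / n)  # SE of the sample proportion

print("counts:", dict(counts))
print("percentages:", percent)
print(f"p = {p:.2f}, p(1-p) = {variance_single:.2f}, SE = {standard_error:.4f}")
```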
SESSION 6: PROBABILITY, HYPOTHESES, VARIABLES
- Probability: frequentist definition, events (mutually exclusive and independent)
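- Hypotheses: null (no difference) and alternative. A p-value < 0.05 is conventionally taken as a statistically significant association, meaning the observed relationship is unlikely to be due to chance alone. A p-value of 0.05 or more means no statistically significant association was found.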
- Variables: continuous and discrete (categorical)
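The frequentist definition and the two kinds of events mentioned above can be checked by counting outcomes. The sketch below uses fair dice as an assumed example: for mutually exclusive events the probabilities add, and for independent events they multiply. The helper `prob` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)  # outcomes of one fair die


def prob(event, space):
    """Probability as favourable outcomes / all equally likely outcomes."""
    space = list(space)
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))


# Mutually exclusive events on one die: "1 or 2" and "5 or 6" cannot both occur.
p_low = prob(lambda x: x <= 2, die)
p_high = prob(lambda x: x >= 5, die)
p_low_or_high = prob(lambda x: x <= 2 or x >= 5, die)

# Independent events on two dice: first die even, second die above 4.
two_dice = list(product(die, die))
p_first_even = prob(lambda o: o[0] % 2 == 0, two_dice)
p_second_gt4 = prob(lambda o: o[1] > 4, two_dice)
p_both = prob(lambda o: o[0] % 2 == 0 and o[1] > 4, two_dice)

print("P(A or B) =", p_low_or_high, "= P(A) + P(B) =", p_low + p_high)
print("P(A and B) =", p_both, "= P(A) * P(B) =", p_first_even * p_second_gt4)
```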
SESSION 7-8: INFERENCE ON CONTINUOUS VARIABLES
- t test (compares 2 groups); F test or ANOVA (compares 3 or more groups)
- Correlation between 2 continuous variables: Pearson vs Spearman coefficients
- The correlation coefficient r ranges from -1 to +1.
- Rule of thumb for the absolute value of r: 0-0.25 little or no relationship, 0.25-0.50 fair degree of relationship, 0.50-0.75 moderate to good relationship, above 0.75 very good relationship.
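A minimal sketch of two of the calculations above, using only the standard library. It assumes the pooled-variance (equal variances) form of the two-sample t test and computes the test statistic only; turning it into a p-value needs the t distribution, which a statistics package would supply. The data and function names are hypothetical, and assigning the boundary values 0.25, 0.50, and 0.75 to the lower category is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


def pearson_r(x, y):
    """Pearson correlation coefficient from sums of squares and products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strength_label(r):
    """Rule-of-thumb label; boundary values go to the lower category (assumption)."""
    size = abs(r)
    if size <= 0.25:
        return "little or no relationship"
    if size <= 0.50:
        return "fair degree of relationship"
    if size <= 0.75:
        return "moderate to good relationship"
    return "very good relationship"


# Hypothetical data for illustration.
t_stat, df = pooled_t_statistic([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

print(f"t = {t_stat:.2f} on {df} degrees of freedom")
print(f"r = {r:.3f}: {strength_label(r)}")
```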
SESSION 9-10: INFERENCE ON DISCRETE VARIABLES
- Pearson chi-square works best for larger samples and can compare 2 or more groups
- Fisher's exact test works for small samples and can require a lot of computing power for larger tables
- In both cases, we use the p-value to test the hypothesis
- Chi-square and Fisher are tests of association. The odds ratio is a measure of effect; it tells us how strong the association is
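The three quantities above can be sketched for a single 2x2 table. This is a minimal standard-library version under stated assumptions: the chi-square test is uncorrected (no continuity correction) with 1 degree of freedom, and the Fisher p-value is two-sided, summing all tables no more probable than the observed one. The cell counts and function names are hypothetical.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value from the hypergeometric distribution."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(k):
        # Number of tables with k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(n - row1, col1 - k)

    lowest = max(0, col1 - (n - row1))
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed)
    return extreme / comb(n, col1)


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc."""
    return (a * d) / (b * c)


# Hypothetical table: rows = exposed / unexposed, columns = cases / non-cases.
a, b, c, d = 20, 10, 10, 20
chi2, p_chi2 = chi_square_2x2(a, b, c, d)
p_fisher = fisher_exact_2x2(a, b, c, d)
effect = odds_ratio(a, b, c, d)

print(f"chi-square = {chi2:.3f}, p = {p_chi2:.4f}")
print(f"Fisher exact p = {p_fisher:.4f}")
print(f"odds ratio = {effect:.1f}")
```

The p-values answer whether an association exists; the odds ratio says how strong it is.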
SESSION 11: PARAMETRIC and NON-PARAMETRIC CORRELATION
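- Pearson (parametric) is for larger, approximately normally distributed samples and measures linear association
- Spearman (non-parametric, based on ranks) is for smaller samples or data that are not normally distributed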
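The difference between the two coefficients can be seen in a short sketch: Spearman's coefficient is Pearson's coefficient computed on ranks. The data are invented (a perfectly increasing but non-linear relationship), and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient = Pearson coefficient of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because ranks ignore the size of the jumps in y, the outlying last value lowers Pearson's r but leaves Spearman's rho at 1.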
SESSION 12: MULTIVARIATE REGRESSION ANALYSIS
- Linear regression has a continuous dependent/response variable
- Logistic regression has a discrete dependent variable (usually 2 or dichotomous)
- Multiple regression can predict better because it uses more independent/predictor variables
- The exponentiated regression coefficient of logistic regression, exp(b), can be interpreted as an odds ratio
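Two of the points above can be sketched without a statistics package. The first part fits a simple (one-predictor) linear regression by least squares. The second part uses the fact that, for a logistic regression with a single binary predictor, the fitted coefficient equals the log of the 2x2 cross-product ratio, so exponentiating it returns the odds ratio. All data values and names are hypothetical.

```python
from math import exp, log
from statistics import mean


def simple_linear_regression(x, y):
    """Least-squares slope and intercept for a continuous response y."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical continuous response: y rises by 2 units per unit of x.
slope, intercept = simple_linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(f"linear model: y = {intercept:.1f} + {slope:.1f} * x")

# Hypothetical 2x2 table for a dichotomous response:
# exposed: 20 cases, 10 non-cases; unexposed: 10 cases, 20 non-cases.
a, b, c, d = 20, 10, 10, 20

# Logistic coefficient for the binary exposure = log odds ratio.
coefficient = log((a / b) / (c / d))
odds_ratio = exp(coefficient)
print(f"logistic coefficient b = {coefficient:.3f}, exp(b) = odds ratio = {odds_ratio:.1f}")
```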
SESSION 13: REGRESSION MODELS
- Fitting models: forward, backward, and stepwise.
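- P-values guide the choice of the best-fitting model: at each step they indicate whether adding or removing a variable significantly improves the fit
- Validation of models by testing them on another sample or on a held-out part of the same sample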
- Dealing with missing data: drop subjects, drop variables, impute
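The three ways of dealing with missing data, and validation on part of the sample, can be sketched with plain Python records. This is a simplified illustration: imputation here means replacing a missing value with the mean of the observed values, which is only one of several imputation methods. The records, the 40% threshold, the 25% validation fraction, and the function names are all hypothetical.

```python
import random
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "sbp": 120, "bmi": 22.0},
    {"age": 51, "sbp": None, "bmi": 27.5},
    {"age": 46, "sbp": 135, "bmi": None},
    {"age": 29, "sbp": 118, "bmi": None},
]


def drop_subjects(rows):
    """Keep only subjects with no missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def drop_variables(rows, max_missing_fraction=0.4):
    """Remove variables missing in more than the given fraction of subjects."""
    keep = [
        name for name in rows[0]
        if sum(row[name] is None for row in rows) / len(rows) <= max_missing_fraction
    ]
    return [{name: row[name] for name in keep} for row in rows]


def impute_mean(rows):
    """Replace each missing value with the mean of the observed values."""
    fill = {
        name: mean(row[name] for row in rows if row[name] is not None)
        for name in rows[0]
    }
    return [
        {name: fill[name] if value is None else value for name, value in row.items()}
        for row in rows
    ]


def split_sample(rows, validation_fraction=0.25, seed=0):
    """Randomly set aside part of the sample for validating a fitted model."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * validation_fraction)
    return shuffled[cut:], shuffled[:cut]  # (training part, validation part)


complete_cases = drop_subjects(records)
fewer_variables = drop_variables(records)
imputed = impute_mean(records)
training, validation = split_sample(imputed)

print("complete cases:", len(complete_cases))
print("variables kept:", list(fewer_variables[0]))
print("imputed bmi for subject 3:", imputed[2]["bmi"])
print("training / validation sizes:", len(training), len(validation))
```

Dropping subjects keeps only 1 of the 4 records here, which shows how quickly that option can shrink a sample.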