Integrated Medical Education Resources: 250401 REVIEW OF THE BIOSTATISTICS COURSE

Presentation prepared for the event on Thursday, 2-3 May 2025 by Prof Omar Hasan Kasule.

SESSION 1: INTRODUCTION

SESSION 2: POPULATIONS, SAMPLES, and DATA SOURCES

Populations (target and study)
Samples (random, convenience, stratified, systematic).
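Sample size: why and how? Sample size calculation formulas require inputs from the researcher, such as the expected proportion or effect size, the significance level, and the desired precision or power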
Sources of data: primary (interview, measurement) and secondary (census, routine records)
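
The point about researcher inputs in sample size calculation can be illustrated with a minimal sketch. It assumes the common formula for estimating a single proportion, n = z^2 * p * (1 - p) / d^2, which is only one of several sample size formulas and is not necessarily the one used in the course. The function name and the input values (expected proportion 0.5, margin of error 0.05, z = 1.96 for 95% confidence) are hypothetical.

```python
import math


def sample_size_for_proportion(expected_p, margin_of_error, z=1.96):
    """Sample size needed to estimate one proportion.

    Uses n = z^2 * p * (1 - p) / d^2 (an assumed, commonly used formula).
    expected_p      : proportion the researcher expects (researcher input)
    margin_of_error : acceptable half-width d of the confidence interval
    z               : normal deviate for the confidence level (1.96 for 95%)
    """
    n = (z ** 2) * expected_p * (1 - expected_p) / (margin_of_error ** 2)
    # Always round up: a fraction of a participant cannot be recruited.
    return math.ceil(n)


# Hypothetical inputs chosen by the researcher.
print(sample_size_for_proportion(expected_p=0.5, margin_of_error=0.05))  # 385
print(sample_size_for_proportion(expected_p=0.2, margin_of_error=0.05))  # 246
```

Changing any input changes the answer, which is why the formula cannot be applied without a judgment from the researcher.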

SESSION 3: STUDY DESIGN

Observational studies: 3 types (cross-sectional, case-control, follow-up/cohort), advantages and disadvantages
Experimental studies: clinical and community randomized studies

SESSION 4: DESCRIPTIVE STATISTICS FOR CONTINUOUS DATA

SESSION 5: DESCRIPTIVE STATISTICS FOR DISCRETE DATA

SESSION 6: PROBABILITY, HYPOTHESES, VARIABLES

Probability: frequentist definition, events (mutually exclusive and independent)
Hypotheses: null (no difference) and alternative. P value <0.05: statistically significant association, unlikely to be due to chance alone. P value >0.05: no statistically significant association.
Variables: continuous and discrete (categorical)
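
The difference between mutually exclusive and independent events can be checked with a small sketch. It assumes one roll of a fair six-sided die; the events and the helper name prob are hypothetical illustrations, not course material.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an assumed example).
outcomes = set(range(1, 7))


def prob(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    return Fraction(len(event & outcomes), len(outcomes))


even = {2, 4, 6}
odd = {1, 3, 5}
at_most_two = {1, 2}

# Mutually exclusive events cannot occur together, so P(A and B) = 0
# and P(A or B) = P(A) + P(B).
print(prob(even & odd))                             # 0
print(prob(even | odd) == prob(even) + prob(odd))   # True

# Independent events satisfy P(A and B) = P(A) * P(B).
print(prob(even & at_most_two))                     # 1/6
print(prob(even) * prob(at_most_two))               # 1/6
```

Note that "even" and "odd" are mutually exclusive but not independent, while "even" and "at most two" are independent but not mutually exclusive.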

SESSIONS 7-8: INFERENCE ON CONTINUOUS VARIABLES

T test (compares 2 groups), F test or ANOVA (compares 3 or more groups)
Correlation between 2 continuous variables: Pearson vs Spearman coefficients
The correlation coefficient r ranges from -1 to +1; the sign shows the direction of the relationship.
Absolute value of r: 0-0.25 little or no relationship, 0.25-0.50 fair degree of relationship, 0.50-0.75 moderate to good relationship, above 0.75 very good relationship.
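
The contrast between the Pearson and Spearman coefficients can be sketched in plain Python. The data are hypothetical, the function names are illustrative, and the handling of the exact boundary values (0.25, 0.50, 0.75) in describe_strength is an assumption, since the bands above share their end points.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def describe_strength(r):
    """Label |r| using the bands listed above (boundary handling assumed)."""
    size = abs(r)
    if size < 0.25:
        return "little or no relationship"
    if size < 0.50:
        return "fair degree of relationship"
    if size <= 0.75:
        return "moderate to good relationship"
    return "very good relationship"


# Hypothetical data: y rises with x, but along a curve rather than a line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson_r(x, y), 3), describe_strength(pearson_r(x, y)))
print(round(spearman_rho(x, y), 3), describe_strength(spearman_rho(x, y)))
```

Spearman works on ranks, so it reaches 1 for any strictly increasing relationship, whereas Pearson reaches 1 only when the relationship is exactly linear.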

SESSIONS 9-10: INFERENCE FOR DISCRETE DATA

Pearson Chi-square works best for larger samples and compares 2 or more groups
Fisher's exact test works for small samples; its exact calculation becomes computationally demanding as the sample grows
In both cases, we use the p-value to test the hypothesis
Chi-square and Fisher are tests of association. The odds ratio is a measure of effect; it tells us how strong the association is
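
The three quantities named above can be computed for a single 2x2 table as a minimal sketch. The counts are hypothetical, the cell layout (a, b in the first row; c, d in the second) is an assumption, no continuity correction is applied, and the two-sided Fisher p-value uses the common rule of summing all tables no more probable than the observed one.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value from the hypergeometric distribution."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Probability of x in the top-left cell with all margins held fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is as probable as, or less probable than, the observed one.
    return sum(
        p
        for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Hypothetical counts: exposed (20 cases, 10 non-cases), unexposed (10, 20).
table = (20, 10, 10, 20)
chi2 = chi_square_2x2(*table)
print("chi-square:", round(chi2, 3))
print("chi-square p-value:", chi_square_p_value_1df(chi2))
print("Fisher p-value:", fisher_exact_2x2(*table))
print("odds ratio:", odds_ratio(*table))  # 4.0
```

The two p-values answer whether an association exists; the odds ratio of 4.0 describes how strong it is.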

SESSION 11: PARAMETRIC and NON-PARAMETRIC CORRELATION

SESSION 12: MULTIVARIATE REGRESSION ANALYSIS

Linear regression has a continuous dependent/response variable
Logistic regression has a discrete dependent variable (usually dichotomous, i.e. 2 categories)
Multiple regression often predicts better because it uses more than one independent/predictor variable
The exponentiated regression coefficient of logistic regression, exp(b), can be interpreted as an odds ratio
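
The link between a logistic regression coefficient and the odds ratio can be shown with a small sketch. It assumes the simplest case, a single dichotomous predictor, where the fitted intercept and slope can be written directly from the 2x2 counts without an iterative fit. The counts and names are hypothetical.

```python
import math


def log_odds(events, non_events):
    """Natural log of the odds events / non_events."""
    return math.log(events / non_events)


# Hypothetical counts for a dichotomous exposure and a dichotomous outcome.
exposed_cases, exposed_noncases = 20, 10
unexposed_cases, unexposed_noncases = 10, 20

# With one binary predictor x (0 = unexposed, 1 = exposed), the model is
# log odds of outcome = intercept + slope * x
intercept = log_odds(unexposed_cases, unexposed_noncases)
slope = log_odds(exposed_cases, exposed_noncases) - intercept

# The slope is on the log-odds scale; exponentiating it gives the odds ratio.
odds_ratio = math.exp(slope)


def predicted_probability(x):
    """Probability of the outcome predicted by the logistic model."""
    return 1 / (1 + math.exp(-(intercept + slope * x)))


print("slope (log odds ratio):", round(slope, 4))
print("odds ratio:", round(odds_ratio, 4))
print("P(outcome | exposed):", round(predicted_probability(1), 4))
print("P(outcome | unexposed):", round(predicted_probability(0), 4))
```

The odds ratio obtained here matches the cross-product ratio of the same table, (20 * 20) / (10 * 10) = 4.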

SESSION 13: REGRESSION MODELS