250401 REVIEW OF THE BIOSTATISTICS COURSE

Presentation prepared for the event on Thursday, 2-3 May 2025 by Prof Omar Hasan Kasule.

SESSION 1: INTRODUCTION

  • Definition of biostatistics
  • Substantive and statistical questions and answers
 
SESSION 2: POPULATIONS, SAMPLES, and DATA SOURCES
  • Populations (target and study)
  • Samples (random, convenience, stratified, systematic)
  • Sample size: why and how? Sample size calculation formulas require inputs from the researcher (for example, the expected proportion and the acceptable margin of error)
  • Sources of data: primary (interview, measurement) and secondary (census, routine records)
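
One common sample size formula, for estimating a single proportion, is n = z² p(1-p) / d². The sketch below is only an illustration of how the researcher's inputs drive the result; the function name and the input values (expected proportion 0.5, margin of error 0.05, z = 1.96 for 95% confidence) are assumptions, not figures from the course.

```python
import math

def sample_size_for_proportion(p, margin, z=1.96):
    """Minimum n to estimate a proportion p within +/- margin.

    z = 1.96 corresponds to 95% confidence (an assumed default).
    """
    n = (z ** 2) * p * (1 - p) / (margin ** 2)
    return math.ceil(n)  # round up: a fraction of a subject cannot be sampled

# Hypothetical researcher inputs: expected proportion 0.5, margin of error 0.05
n_needed = sample_size_for_proportion(p=0.5, margin=0.05)
print(n_needed)
```

Using p = 0.5 gives the largest possible n for a given margin, so it is a cautious choice when the researcher has no prior estimate.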

SESSION 3: STUDY DESIGN
  • Observational studies: 3 types (cross-sectional, case-control, follow-up/cohort), advantages and disadvantages
  • Experimental studies: clinical and community randomized studies

SESSION 4: DESCRIPTIVE STATISTICS FOR CONTINUOUS DATA
  • Measures of central location (averages): mean, median, mode
  • Measures of dispersion: range, min-max, variance, standard deviation
  • Charts: bar diagrams (bar chart and histogram), pie chart, stem and leaf
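
The measures of central location and dispersion listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up values; the data are hypothetical.

```python
import statistics

# Hypothetical continuous measurements
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),
    # Sample variance and standard deviation (divisor n - 1)
    "variance": statistics.variance(values),
    "sd": statistics.stdev(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

`statistics.variance` and `statistics.stdev` use the sample (n - 1) divisor; `statistics.pvariance` and `statistics.pstdev` give the population versions.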

SESSION 5: DESCRIPTIVE STATISTICS FOR DISCRETE DATA
  • Frequency count, relative frequency (proportion), and percentage (%)
  • Proportion, p: the variance of a binary (0/1) outcome is p(1-p); the variance of a sample proportion estimated from n subjects is p(1-p)/n
  • Rates, hazards, and ratios
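
As an illustration of frequency counts, relative frequencies, and the variance of a proportion, here is a short sketch. The blood group data are hypothetical.

```python
from collections import Counter

# Hypothetical categorical observations
blood_groups = ["A", "O", "B", "O", "A", "O", "AB", "O", "A", "O"]

counts = Counter(blood_groups)            # frequency count per category
n = len(blood_groups)
relative = {g: c / n for g, c in counts.items()}   # relative frequency
percent = {g: 100 * f for g, f in relative.items()}

p = relative["O"]                         # proportion with group O
variance_binary = p * (1 - p)             # variance of a single 0/1 outcome
variance_of_estimate = p * (1 - p) / n    # variance of the sample proportion

print(counts)
print(percent)
print(p, variance_binary, variance_of_estimate)
```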

SESSION 6: PROBABILITY, HYPOTHESES, VARIABLES
  • Probability: frequentist definition, events (mutually exclusive and independent)
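  • Hypotheses: null (no difference) and alternative. By convention, a P value < 0.05 is called statistically significant: an association this strong would be unlikely to arise by chance alone if the null hypothesis were true. A P value ≥ 0.05 means no statistically significant association was found; it does not prove that no association exists.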
  • Variables: continuous and discrete (categorical)
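
The difference between mutually exclusive and independent events can be checked on a small example. The sketch below assumes one roll of a fair six-sided die; the events are chosen for illustration only.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed example)

def prob(event):
    """Probability of an event as favourable outcomes / all outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

even = {2, 4, 6}
odd = {1, 3, 5}
at_most_four = {1, 2, 3, 4}

# Mutually exclusive: the events cannot occur together, so P(A or B) = P(A) + P(B)
mutually_exclusive = len(even & odd) == 0
additive = prob(even | odd) == prob(even) + prob(odd)

# Independent: P(A and B) = P(A) * P(B)
independent = prob(even & at_most_four) == prob(even) * prob(at_most_four)

print(mutually_exclusive, additive, independent)
```

Note that "even" and "odd" are mutually exclusive but not independent: knowing one occurred tells you the other did not.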

SESSION 7-8: INFERENCE ON CONTINUOUS VARIABLES
  • T test (compare 2 groups), F test OR ANOVA (compare 3 or more groups)
  • Correlation between 2 continuous variables: Pearson vs Spearman coefficients
  • The correlation coefficient, r, ranges from -1 to +1.
  • Rule of thumb for the absolute value of r: 0-0.25 little or no relationship, 0.25-0.50 fair degree of relationship, 0.50-0.75 moderate to good relationship, above 0.75 very good relationship.
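
The two-group comparison in this session can be sketched by computing the t statistic by hand. This is a minimal sketch of the pooled-variance (Student) two-sample t statistic, assuming equal variances in the two groups; the data and function name are hypothetical, and the P value would still be read from a t table or statistical software.

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for a two-sample pooled-variance t test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Hypothetical measurements in two groups
treated = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]

t_value, df = pooled_t_statistic(treated, control)
print(t_value, df)
```

With 8 degrees of freedom the two-sided 5% critical value is about 2.306, so a t statistic smaller than that in absolute value would not be significant at the 0.05 level.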

SESSION 9-10: INFERENCE FOR DISCRETE DATA

  • Pearson Chi-square works best for larger samples and can compare 2 or more groups
  • Fisher's exact test works for small samples; for large samples or large tables it requires a lot of computing power
  • In both cases, we use the p-value to test the hypothesis
  • Chi-square and Fisher are tests of association. The odds ratio is a measure of effect; it tells us how strong the association is
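
For a 2x2 table, the Pearson chi-square statistic and the odds ratio can both be computed directly from the four cell counts. The sketch below uses no continuity correction; the table layout (a, b in the exposed row; c, d in the unexposed row) and the counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Hypothetical counts:      cases  non-cases
# exposed                     30      20
# unexposed                   10      40
chi2 = chi_square_2x2(30, 20, 10, 40)
or_estimate = odds_ratio(30, 20, 10, 40)
print(chi2, or_estimate)
```

With 1 degree of freedom, a chi-square value above about 3.84 corresponds to P < 0.05. The test says whether an association exists; the odds ratio says how strong it is.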

SESSION 11: PARAMETRIC and NON-PARAMETRIC CORRELATION
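  • Pearson (parametric) is for continuous, approximately normally distributed data with a linear relationship, typically from larger samples
  • Spearman (non-parametric, rank-based) is for smaller samples and for data that are not normally distributed or are ordinal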
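
Spearman's coefficient can be understood as Pearson's coefficient applied to ranks. The sketch below shows this on hypothetical data with a monotonic but non-linear relationship; the simple ranking helper assumes there are no tied values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman coefficient computed as Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(r, rho)
```

Here Spearman's coefficient is exactly 1 because the ordering is perfectly preserved, while Pearson's r is below 1 because the relationship is not linear.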

SESSION 12: MULTIVARIATE REGRESSION ANALYSIS
  • Linear regression has a continuous dependent/response variable
  • Logistic regression has a discrete dependent variable (usually 2 or dichotomous)
  • Multiple regression can predict better because it uses more independent/predictor variables
  • The exponentiated regression coefficient of logistic regression, exp(b), can be interpreted as an odds ratio
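
A logistic regression coefficient is on the log-odds scale, so it is exponentiated to obtain an odds ratio. The sketch below assumes a coefficient and standard error as they might be read from software output; both numbers and the function name are hypothetical.

```python
import math

def coefficient_to_odds_ratio(beta, standard_error, z=1.96):
    """Convert a logistic coefficient to an odds ratio with an approximate 95% CI."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return odds_ratio, lower, upper

# Hypothetical fitted values for one predictor
beta = 0.693
standard_error = 0.25

or_value, ci_lower, ci_upper = coefficient_to_odds_ratio(beta, standard_error)
print(round(or_value, 2), round(ci_lower, 2), round(ci_upper, 2))
```

A coefficient of 0 corresponds to an odds ratio of 1 (no association), so a confidence interval that excludes 1 matches a coefficient that differs significantly from 0.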

SESSION 13: REGRESSION MODELS
  • Fitting models: forward, backward, and stepwise
  • P-values of the candidate variables guide which variables enter or leave the model when searching for the best-fitting model
  • Validation of models by trying on another sample or part of the sample
  • Dealing with missing data: drop subjects, drop variables, impute
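
Two of the missing-data options above, dropping subjects and imputing, can be sketched as follows. Mean imputation is used here only as the simplest example of imputation; the records and field names are hypothetical.

```python
import statistics

# Hypothetical records; None marks a missing value
records = [
    {"age": 34, "sbp": 120},
    {"age": 51, "sbp": None},
    {"age": None, "sbp": 135},
    {"age": 45, "sbp": 128},
]

# Option 1: drop subjects with any missing value (complete-case analysis)
complete_cases = [r for r in records if None not in r.values()]

# Option 2: impute, here by replacing a missing value with the observed mean
def impute_mean(rows, key):
    observed = [r[key] for r in rows if r[key] is not None]
    fill = statistics.mean(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

imputed = impute_mean(impute_mean(records, "age"), "sbp")

print(len(complete_cases))
print(imputed)
```

Dropping subjects shrinks the sample, while mean imputation keeps every subject but understates the variability of the imputed variable.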