Presented in the Biostatistics module of the Clinical Research Coordinators Course on June 23, 2020, 11.00-12.00, by Professor Omar Hasan Kasule, MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Professor of Epidemiology and Bioethics, King Fahad Medical City
DATA GROUPING
• Data grouping summarizes data but leads to loss of information due to grouping errors.
• A suitable number of classes is usually 10-20.
• The bigger the class interval, the bigger the grouping error.
• Classes should be mutually exclusive, of equal width, and cover all the data.
• Class limits can be true (exact) limits, e.g. 19.5-29.5, or approximate (stated) limits, e.g. 20-29.
• Approximate limits are easier to tabulate.
• Data can be dichotomous (2 groups), trichotomous (3 groups) or polychotomous (>3 groups).
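The grouping rules above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes numeric data and half-open classes [start, start + width), with the lower limit included and the upper limit excluded, which keeps the classes mutually exclusive. The function names and the ages are hypothetical. Besides the frequency, relative frequency % and cumulative frequency columns of a frequency table, it estimates the mean from class midpoints, so comparing it with the exact mean shows the grouping error.

```python
from collections import Counter


def frequency_table(values, lower, width, n_classes):
    """Group values into equal-width classes [start, start + width).

    Returns one row per class: (start, end, frequency, relative %,
    cumulative frequency, cumulative %).
    """
    upper = lower + width * n_classes
    # The classes must cover all the data: reject values outside [lower, upper).
    if any(v < lower or v >= upper for v in values):
        raise ValueError("classes do not cover all the data")
    # Half-open intervals put every value in exactly one class.
    counts = Counter(int((v - lower) // width) for v in values)
    n = len(values)
    rows, cumulative = [], 0
    for i in range(n_classes):
        start = lower + i * width
        freq = counts[i]  # Counter returns 0 for an empty class
        cumulative += freq
        rows.append((start, start + width, freq, 100 * freq / n,
                     cumulative, 100 * cumulative / n))
    return rows


def grouped_mean(rows):
    """Estimate the mean by placing every value at its class midpoint."""
    n = sum(row[2] for row in rows)
    return sum((start + end) / 2 * freq for start, end, freq, *_ in rows) / n


ages = [21, 23, 25, 28, 31, 34, 35, 38, 42, 47]  # hypothetical ages in years
table = frequency_table(ages, lower=20, width=10, n_classes=3)
for start, end, freq, rel, cum, cum_pct in table:
    print(f"{start}-<{end}: f={freq}, {rel:.0f}%, cum f={cum}, cum {cum_pct:.0f}%")
print("exact mean:", sum(ages) / len(ages))
print("grouped mean:", grouped_mean(table))
```

With these ten ages the exact mean is 32.4 years and the grouped mean is 33.0 years; the 0.6-year difference is grouping error introduced by replacing each value with its class midpoint.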
DATA TABULATION
• Tabulation summarizes data in logical groupings for easy visual inspection.
• A table can show the cell frequency (cell number); the cell number as a percentage of the overall total (cell %), of its row total (row %), and of its column total (column %); the cumulative frequency and cumulative frequency %; and the relative (proportional) frequency and relative frequency %.
• Ideal tables are simple, easy to read, correctly scaled, titled, labeled, and self-explanatory, and they include marginal and overall totals.
• The most common table is the 2 x 2 contingency table. Other configurations are the 2 x k table and the r x c table.
[2 x 2 table], [r x c table]
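The cell %, row % and column % described above can all be computed from the cell counts and the marginal totals. A minimal sketch, assuming the table is stored as a list of rows with no empty row or column (a zero marginal total would cause a division by zero). The function name and the counts for GLASSES (yes, no) by GENDER (male, female) are hypothetical.

```python
def describe_table(counts):
    """Return marginal totals and cell, row and column percentages of an r x c table.

    counts is a list of rows; counts[i][j] is the frequency in row i, column j.
    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    total = sum(row_totals)
    cell_pct = [[100 * n / total for n in row] for row in counts]
    row_pct = [[100 * n / rt for n in row] for row, rt in zip(counts, row_totals)]
    col_pct = [[100 * n / ct for n, ct in zip(row, col_totals)] for row in counts]
    return row_totals, col_totals, total, cell_pct, row_pct, col_pct


# Hypothetical 2 x 2 table: rows = glasses (yes, no), columns = gender (male, female)
counts = [[6, 9],
          [4, 11]]
row_totals, col_totals, total, cell_pct, row_pct, col_pct = describe_table(counts)

# Print each cell as "count (row %)", with marginal and overall totals.
print(f"{'':>4}  {'male':>11}  {'female':>11}  {'Total':>5}")
for label, row, rt, rp in zip(["yes", "no"], counts, row_totals, row_pct):
    cells = "  ".join(f"{n:>3} ({p:4.1f}%)" for n, p in zip(row, rp))
    print(f"{label:>4}  {cells}  {rt:>5}")
print(f"{'All':>4}  " + "  ".join(f"{c:>11}" for c in col_totals) + f"  {total:>5}")
```

The same function works unchanged for a 2 x k or r x c table, since it only iterates over whatever rows and columns it is given.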
DATA DIAGRAMS: 1 or more quantitative variables
• Diagrams present data visually. An ideal diagram is self-explanatory, simple, not crowded, of appropriate size, and emphasizes data and not graphics.
• Bar diagrams (bar chart, histogram): the area of each bar represents the frequency
• Line graph
• Stem-and-leaf plot: shows the frequencies while keeping the actual data values
• Pie chart
• Pictogram
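Unlike a bar chart, a stem-and-leaf plot keeps the actual values: each leaf is one observation, so the number of leaves on a stem is that stem's frequency. A minimal text-only sketch, assuming non-negative integer data split into a tens stem and a units leaf; the function name and the ages are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems (gaps) stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_digits = "".join(str(d) for d in leaves[stem])
        lines.append(f"{stem:>3} | {leaf_digits}".rstrip())
    return lines


ages = [21, 23, 25, 28, 31, 34, 35, 38, 42, 47]  # hypothetical ages in years
print("\n".join(stem_and_leaf(ages)))
```

For these ages the plot has three stems: `2 | 1358`, `3 | 1458` and `4 | 27`, so each stem's frequency (4, 4, 2) can be read off while every original age remains recoverable.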
SHAPES OF DISTRIBUTIONS
• Unimodal: one peak
• Bimodal: 2 peaks
• Bell-shaped or Normal
• Positive skew
• Negative skew
• Leptokurtosis
• Platykurtosis
• S curve (ogive)
• Reverse J (exponential)
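Skewness and kurtosis can be quantified with moment coefficients. A minimal sketch, assuming the simple (uncorrected) moment estimators g1 = m3 / m2^1.5 and g2 = m4 / m2^2 - 3, where mk is the k-th central moment. Positive g1 indicates positive skew, negative g1 negative skew; g2 > 0 indicates leptokurtosis and g2 < 0 platykurtosis, with the normal distribution at g2 = 0. Statistical packages often apply small-sample corrections, so their values may differ slightly. The function names and the data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Moment coefficient of skewness g1 (> 0: positive skew, < 0: negative skew)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Excess kurtosis g2 (> 0: leptokurtic, < 0: platykurtic, 0 for the normal)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric_flat = [1, 2, 3, 4, 5]           # symmetric, flatter than the normal
right_skewed = [1, 1, 1, 2, 2, 3, 10]      # long tail to the right
peaked = [0] * 8 + [-10, 10]               # sharp peak with heavy tails

print(round(skewness(symmetric_flat), 2), round(excess_kurtosis(symmetric_flat), 2))
print(skewness(right_skewed) > 0)
print(excess_kurtosis(peaked))
```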
MISLEADING DIAGRAMS
• Poor labeling
• Inappropriate scaling
• Omitting the zero origin
• Presence of outliers
• Presence of high leverage points
• Widening and narrowing the scales produce different impressions of the data.
• Double vertical scales can misleadingly be used to show spurious associations.
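The distortion from omitting the zero origin is simple arithmetic: a bar's drawn height is its value minus the value at which the axis starts. A hypothetical sketch comparing two values, 50 and 55; the function name is an illustrative assumption.

```python
def apparent_ratio(smaller, larger, axis_start=0):
    """Ratio of the drawn bar heights when the vertical axis starts at axis_start."""
    if axis_start >= smaller:
        raise ValueError("the axis must start below both values")
    return (larger - axis_start) / (smaller - axis_start)


print(apparent_ratio(50, 55))                 # zero origin: the true ratio
print(apparent_ratio(50, 55, axis_start=45))  # truncated axis
```

With a zero origin the second bar is 1.1 times as tall as the first; starting the axis at 45 makes it look twice as tall, although the data are unchanged.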
ASSIGNMENTS
CLASSROOM DATA
We are going to generate a data set about ourselves called the classroom data.
The class leader will prepare a template named CLASS DATA as follows. Open Microsoft Excel on your computer and select the blank workbook in the top left-hand corner. A sheet of columns and rows will open. Type the following variable names in row 1, from left to right:
- AGE (in years)
- GENDER (male, female)
- REGION of birth (East, North, West, Central, South)
- ORDER in the family (first, second, third, higher)
- WEIGHT (in kilograms)
- HEIGHT (in centimeters)
- GLASSES, i.e. wearing glasses (yes, no)
- COLOR preference (blue, red, green, yellow)
- number of BROTHERS
- number of SISTERS
- type of primary SCHOOL (private, public)
- type of UNIVERSITY (public, private)
The class leader will send the template to the class WhatsApp group. Each member will type information in one row. For confidentiality, the information can be your own or that of someone you know who is not in the class. Each class member will have a copy of the data set, which we shall use in classroom exercises.
CLASSROOM EXERCISES ON DATA PRESENTATION AS FIGURES
- Draw a bar chart of GENDER
- Draw a histogram of AGE
- Draw pie charts of COLOR and ORDER
CLASSROOM EXERCISES ON DATA PRESENTATION AS TABLES
- Draw a table of GENDER by AGE (group AGE into classes first)
- Draw a table of GLASSES by ORDER
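The two table exercises can be checked with a short cross-tabulation sketch. It assumes the CLASS DATA rows have been exported to Python dictionaries keyed by the variable names (for example from a CSV copy of the sheet); the records, the helper names and the 10-year age classes are hypothetical. Because AGE is quantitative, it is grouped into classes with approximate (stated) limits such as 20-29 before being cross-tabulated with GENDER.

```python
from collections import Counter


def age_group(age, width=10):
    """Class with stated limits, e.g. 24 -> '20-29' for width 10 (integer ages)."""
    start = age // width * width
    return f"{start}-{start + width - 1}"


def crosstab(records, row_var, col_var, row_levels, col_levels):
    """Count each (row category, column category) pair in the given level order.

    Categories not listed in row_levels or col_levels are left out of the table.
    """
    counts = Counter((r[row_var], r[col_var]) for r in records)
    return [[counts[(a, b)] for b in col_levels] for a in row_levels]


def print_crosstab(table, row_levels, col_levels):
    """Print the table with row totals, column totals and the overall total."""
    print(f"{'':>8}" + "".join(f"{c:>8}" for c in col_levels) + f"{'Total':>8}")
    for level, row in zip(row_levels, table):
        print(f"{level:>8}" + "".join(f"{n:>8}" for n in row) + f"{sum(row):>8}")
    col_totals = [sum(col) for col in zip(*table)]
    print(f"{'Total':>8}" + "".join(f"{n:>8}" for n in col_totals) + f"{sum(col_totals):>8}")


# Hypothetical rows from the CLASS DATA sheet (illustrative values only)
records = [
    {"GENDER": "male", "AGE": 24, "GLASSES": "yes", "ORDER": "first"},
    {"GENDER": "female", "AGE": 31, "GLASSES": "no", "ORDER": "second"},
    {"GENDER": "female", "AGE": 27, "GLASSES": "yes", "ORDER": "first"},
    {"GENDER": "male", "AGE": 36, "GLASSES": "no", "ORDER": "higher"},
    {"GENDER": "female", "AGE": 22, "GLASSES": "no", "ORDER": "third"},
]

order_levels = ["first", "second", "third", "higher"]
glasses_by_order = crosstab(records, "GLASSES", "ORDER", ["yes", "no"], order_levels)
print_crosstab(glasses_by_order, ["yes", "no"], order_levels)

# AGE is quantitative, so group it into classes before cross-tabulating.
for r in records:
    r["AGE_GROUP"] = age_group(r["AGE"])
age_levels = ["20-29", "30-39"]
gender_by_age = crosstab(records, "GENDER", "AGE_GROUP", ["male", "female"], age_levels)
print_crosstab(gender_by_age, ["male", "female"], age_levels)
```

Passing the category levels explicitly keeps ORDER in its natural sequence (first, second, third, higher) rather than alphabetical order.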