Integrated Medical Education Resources: 200623P - DATA TABULATION AND DATA DIAGRAMS USING EXCEL

Presented in the Biostatistics module of the Clinical Research Coordinators Course on June 23, 2020, 11:00-12:00, by Professor Omar Hasan Kasule, MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Professor of Epidemiology and Bioethics, King Fahad Medical City.

DATA GROUPING

• Data grouping summarizes data but leads to loss of information due to grouping errors.

• The suitable number of classes is 10-20.

• The bigger the class interval, the bigger the grouping error.

• Classes should be mutually exclusive, of equal width, and cover all the data.

• The upper and lower class limits can be true or approximate.

• The approximate limits are easier to tabulate.

• Data can be dichotomous (2 groups), trichotomous (3 groups) or polychotomous (>3 groups).
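
The grouping rules above can be sketched in a few lines of Python. The course itself uses Excel, so this is only an illustration. The function name `group_into_classes`, the class width, and the ages are hypothetical, and the example uses just 5 classes to stay short.

```python
from collections import Counter

def group_into_classes(values, lower, width, n_classes):
    """Count how many values fall into each of n_classes equal-width classes.

    Class i covers lower + i*width <= value < lower + (i+1)*width, so the
    classes are mutually exclusive and of equal width.
    """
    upper = lower + width * n_classes
    counts = Counter()
    for v in values:
        # The classes must cover all the data, so reject anything outside them.
        if not (lower <= v < upper):
            raise ValueError(f"{v} is outside the classes")
        counts[int((v - lower) // width)] += 1
    return [((lower + i * width, lower + (i + 1) * width), counts[i])
            for i in range(n_classes)]

# Hypothetical ages, made up for illustration only.
ages = [21, 23, 24, 27, 29, 30, 31, 35, 38, 44]
table = group_into_classes(ages, lower=20, width=5, n_classes=5)
for (lo, hi), n in table:
    print(f"{lo}-<{hi}: {n}")
```

After grouping, only the class counts remain and the individual ages are no longer visible. This is the loss of information referred to above, and it grows as the class interval widens.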

DATA TABULATION

• Tabulation summarizes data in logical groupings for easy visual inspection.

• A table shows cell frequency (cell number), cell number as a percentage of the overall total (cell %), cell number as a row percentage (row %), cell number as a column percentage (column %), cumulative frequency, cumulative frequency %, relative (proportional) frequency, and relative frequency %.

• Ideal tables are simple, easy to read, correctly scaled, titled, labeled, self-explanatory, with marginal and overall totals.

• The commonest table is the 2 x 2 contingency table. Other configurations are the 2 x k table and the r x c table.

[2 x 2 table], [r x c table]
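
As a rough illustration of cell %, row %, and column %, the Python sketch below computes all three for a table of counts. The function name `table_percentages` and the counts are hypothetical. It assumes no row or column total is zero.

```python
def table_percentages(cells):
    """Return overall (cell), row, and column percentages for a table of counts."""
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]
    grand_total = sum(row_totals)
    cell_pct = [[100 * n / grand_total for n in row] for row in cells]
    row_pct = [[100 * n / row_totals[i] for n in row]
               for i, row in enumerate(cells)]
    col_pct = [[100 * n / col_totals[j] for j, n in enumerate(row)]
               for row in cells]
    return cell_pct, row_pct, col_pct

# Hypothetical 2 x 2 counts, made up for illustration only.
counts = [[10, 30],
          [20, 40]]
cell_pct, row_pct, col_pct = table_percentages(counts)
print("cell %:  ", cell_pct)
print("row %:   ", row_pct)
print("column %:", col_pct)
```

The same function works for a 2 x k or r x c table, because it does not assume a fixed number of rows or columns.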

DATA DIAGRAMS: 1 or more quantitative variables

• Diagrams present data visually. An ideal diagram is self-explanatory, simple, not crowded, of appropriate size, and emphasizes data and not graphics.

• Bar diagrams (bar chart, histogram): the area of each bar represents the frequency

• Line graph

• Stem-and-leaf plot: shows the actual frequencies

• Pie chart

• Pictogram
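
Of the diagrams listed above, the stem-and-leaf plot is the simplest to build by hand. The Python sketch below is one possible construction for non-negative whole numbers, where the stem is the tens part and the leaf is the last digit. The function name `stem_and_leaf` and the ages are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build a stem-and-leaf display for non-negative integers.

    The stem is the value without its last digit; the leaf is the last digit.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical ages, made up for illustration only.
ages = [21, 23, 24, 27, 29, 30, 31, 35, 38, 44]
display = stem_and_leaf(ages)
print("\n".join(display))
```

The length of each row gives the frequency for that stem, while the leaves keep the original values readable.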

SHAPES OF DISTRIBUTIONS

• Unimodal: one peak

• Bimodal: 2 peaks

• Bell-shaped or Normal

• Positive skew

• Negative skew

• Leptokurtosis

• Platykurtosis

• S curve (ogive)

• Reverse J (exponential)
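
Skew and kurtosis can also be described numerically. The Python sketch below uses the common moment-based definitions, which are an assumption here because the notes do not give formulas. Skewness above 0 suggests a positive skew and below 0 a negative skew. Excess kurtosis above 0 suggests leptokurtosis and below 0 platykurtosis. The function name `shape_measures` and the data are hypothetical.

```python
def shape_measures(values):
    """Return (skewness, excess kurtosis) from the central moments of the data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # the Normal distribution gives 0
    return skewness, excess_kurtosis

# Hypothetical data sets, made up for illustration only.
symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 1, 2, 2, 3, 10]

print(shape_measures(symmetric))
print(shape_measures(long_right_tail))
```

These measures need some spread in the data: if every value is identical, the variance is zero and the division fails.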

MISLEADING DIAGRAMS

• Poor labeling

• Inappropriate scaling

• Omitting the zero origin

• Presence of outliers

• Presence of high leverage points

• Widening and narrowing the scales produce different impressions of the data.

• Double vertical scales can misleadingly be used to show spurious associations.
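
The effect of omitting the zero origin can be shown with simple arithmetic. The Python sketch below compares how tall two bars look relative to each other when the vertical axis starts at zero and when it starts just below the smaller value. The function name `apparent_ratio` and the numbers are hypothetical.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights for values a and b.

    A bar is drawn from axis_start up to its value, so its visible height
    is the value minus axis_start.
    """
    return (b - axis_start) / (a - axis_start)

# Hypothetical values, made up for illustration only.
honest = apparent_ratio(50, 55)                    # axis starts at zero
truncated = apparent_ratio(50, 55, axis_start=45)  # axis starts at 45

print(f"zero origin: second bar looks {honest:.1f} times as tall")
print(f"axis from 45: second bar looks {truncated:.1f} times as tall")
```

With the axis starting at 45, a 10% difference is drawn as a bar twice as tall, which is why the starting point of the scale should always be checked.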

ASSIGNMENTS

CLASSROOM DATA

We are going to generate a data set about ourselves called the classroom data.

The class leader will prepare the template named CLASS DATA as follows. Click on the Microsoft Excel icon on your computer. You will see a blank Workbook in the top left-hand corner. Click on the workbook and you will see a table with columns and rows. Type the following variable names in row 1, from left to right: AGE (in years), GENDER (male, female), REGION of birth (East, North, West, Central, South), ORDER in the family (first, second, third, higher), WEIGHT (in kilograms), HEIGHT (in centimeters), GLASSES (wearing glasses: yes, no), COLOR preference (blue, red, green, yellow), number of BROTHERS, number of SISTERS, type of primary SCHOOL (private, public), type of UNIVERSITY (public, private).

The class leader will send the template to the class WhatsApp group. Each member will have a row in which to type information. For confidentiality, this information could be your own or that of someone you know who is not in the class. Each class member shall have a copy of the data set. We shall use it in classroom exercises.
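
For anyone who prefers to prepare or check the template outside Excel, the Python sketch below builds the same header row as comma-separated text and checks a row against the categories listed above. It is an illustration only. The names `COLUMNS`, `ALLOWED`, `make_template`, and `invalid_fields` are hypothetical, the short column name GLASSES is assumed from the exercises, and the sample row is made up.

```python
import csv
import io

# Variable names in the order given for row 1 of the template.
COLUMNS = ["AGE", "GENDER", "REGION", "ORDER", "WEIGHT", "HEIGHT",
           "GLASSES", "COLOR", "BROTHERS", "SISTERS", "SCHOOL", "UNIVERSITY"]

# Permitted categories for some of the categorical variables.
ALLOWED = {
    "GENDER": {"male", "female"},
    "ORDER": {"first", "second", "third", "higher"},
    "GLASSES": {"yes", "no"},
    "COLOR": {"blue", "red", "green", "yellow"},
    "SCHOOL": {"private", "public"},
    "UNIVERSITY": {"public", "private"},
}

def make_template():
    """Return the header row as CSV text, which Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(COLUMNS)
    return buffer.getvalue()

def invalid_fields(row):
    """List the categorical variables whose value is not an allowed category."""
    return [name for name, allowed in ALLOWED.items()
            if row.get(name) not in allowed]

# Hypothetical row, made up for illustration only.
sample = {"GENDER": "Male", "ORDER": "first", "GLASSES": "no",
          "COLOR": "blue", "SCHOOL": "public", "UNIVERSITY": "public"}

print(make_template())
print(invalid_fields(sample))  # "Male" is flagged because of its capital letter
```

Checking entries this way catches inconsistent spelling or capitalization, which would otherwise be counted as separate categories in later tables and charts.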

CLASSROOM EXERCISES ON DATA PRESENTATION AS FIGURES

Draw a bar chart of GENDER
Draw a histogram of AGE
Draw pie charts of COLOR and ORDER
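
Before drawing a bar chart or pie chart, it helps to count how often each category occurs. The Python sketch below is one way to do this outside Excel. The function name `frequency_table` and the COLOR values are hypothetical and made up for illustration only.

```python
from collections import Counter

def frequency_table(values):
    """Map each category to (count, percentage of total), most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, 100 * n / total)
            for category, n in counts.most_common()}

# Hypothetical COLOR column, made up for illustration only.
colors = ["blue", "red", "blue", "green", "blue", "red", "yellow", "blue"]
color_table = frequency_table(colors)

# A text bar chart: one '#' per observation.
for category, (n, pct) in color_table.items():
    print(f"{category:<7}{'#' * n} {n} ({pct:.1f}%)")
```

The counts give the bar heights for a bar chart, and the percentages give the slice sizes for a pie chart.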

CLASSROOM EXERCISES ON DATA PRESENTATION AS TABLES

Draw a table of GENDER by AGE
Draw a table of GLASSES by ORDER
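
A table of one variable by another is a cross-tabulation. The Python sketch below counts each combination of two variables. The function name `cross_tabulate` and the records are hypothetical. For GENDER by AGE, the ages would first need to be grouped into classes.

```python
from collections import Counter

def cross_tabulate(records, row_var, col_var):
    """Count records for each combination of row_var and col_var."""
    cells = Counter((r[row_var], r[col_var]) for r in records)
    row_levels = sorted({row for row, _ in cells})
    col_levels = sorted({col for _, col in cells})
    # Counter returns 0 for combinations that never occur.
    return {row: {col: cells[(row, col)] for col in col_levels}
            for row in row_levels}

# Hypothetical records, made up for illustration only.
records = [
    {"GLASSES": "yes", "ORDER": "first"},
    {"GLASSES": "no", "ORDER": "first"},
    {"GLASSES": "no", "ORDER": "second"},
    {"GLASSES": "yes", "ORDER": "first"},
    {"GLASSES": "no", "ORDER": "third"},
]
glasses_by_order = cross_tabulate(records, "GLASSES", "ORDER")
for level, row in glasses_by_order.items():
    print(level, row)
```

In Excel, the equivalent table is usually produced with a PivotTable, with one variable placed in the rows and the other in the columns.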