200623P - DATA TABULATION AND DATA DIAGRAMS USING EXCEL

Presented in the Biostatistics module of the Clinical Research Coordinators Course on June 23, 2020, 11.00-12.00, by Professor Omar Hasan Kasule MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Professor of Epidemiology and Bioethics, King Fahad Medical City


DATA GROUPING

Data grouping summarizes data but leads to loss of information due to grouping errors. 

The suitable number of classes is 10-20. 

The bigger the class interval, the bigger the grouping error. 

Classes should be mutually exclusive, of equal width, and cover all the data. 

The upper and lower class limits can be true or approximate. 

The approximate limits are easier to tabulate. 

Data can be dichotomous (2 groups), trichotomous (3 groups) or polychotomous (>3 groups). 
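
The grouping rules above (mutually exclusive classes of equal width that cover all the data) can be sketched in a few lines of Python. This is only an illustration: the function name `group_into_classes`, the class limits, and the age values are hypothetical and not part of the course material, and 5 classes are used only because the sample is tiny.

```python
def group_into_classes(values, low, width, n_classes):
    """Count values in n_classes equal-width classes [lower, upper)."""
    counts = [0] * n_classes
    for v in values:
        index = (v - low) // width  # which class the value falls into
        if not 0 <= index < n_classes:
            # Classes must cover all the data.
            raise ValueError(f"{v} is outside the class limits")
        counts[index] += 1
    limits = [(low + i * width, low + (i + 1) * width) for i in range(n_classes)]
    return limits, counts


# Hypothetical ages in years (not real classroom data).
ages = [21, 22, 22, 23, 25, 27, 28, 30, 31, 35, 38, 40]
limits, counts = group_into_classes(ages, low=20, width=5, n_classes=5)

for (lower, upper), count in zip(limits, counts):
    print(f"{lower}-{upper}: {count}")
```

Each value lands in exactly one class because the upper limit of a class is excluded and becomes the lower limit of the next one. Once grouped, the individual ages are no longer recoverable from the counts, which is the loss of information mentioned above.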


DATA TABULATION

Tabulation summarizes data in logical groupings for easy visual inspection. 

A table shows cell frequency (cell number), cell number as a percentage of the overall total (cell %), cell number as a row percentage (row%), cell number as a column percentage (column %), cumulative frequency, cumulative frequency%, relative (proportional) frequency, and relative frequency %. 

Ideal tables are simple, easy to read, correctly scaled, titled, labeled, self-explanatory, with marginal and overall totals. 

The commonest table is the 2 x 2 contingency table. Other configurations are the 2 x k table and the r x c table.

[2 x 2 table], [r x c table]
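
The difference between cell %, row %, and column % is easiest to see on a small 2 x 2 table. The sketch below is an illustration only: the records are hypothetical (gender by wearing glasses), and the helper name `percentages` is an assumption, not something from the lecture.

```python
from collections import Counter

# Hypothetical (gender, glasses) records, not real classroom data.
records = [
    ("male", "yes"), ("male", "no"), ("male", "no"), ("male", "no"),
    ("female", "yes"), ("female", "yes"), ("female", "no"), ("female", "no"),
]

cell = Counter(records)                      # cell frequencies
row_total = Counter(g for g, _ in records)   # marginal totals by gender
col_total = Counter(s for _, s in records)   # marginal totals by glasses
n = len(records)                             # overall total


def percentages(row, col):
    """Return the cell count with its cell %, row % and column %."""
    count = cell[(row, col)]
    return {
        "count": count,
        "cell_pct": 100 * count / n,
        "row_pct": 100 * count / row_total[row],
        "col_pct": 100 * count / col_total[col],
    }


print(percentages("male", "no"))
```

The same cell count of 3 gives three different percentages depending on the denominator: the overall total, the row total, or the column total. A table should therefore always state which one it reports.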


DATA DIAGRAMS: 1 or more quantitative variables

Diagrams present data visually. An ideal diagram is self-explanatory, simple, not crowded, of appropriate size, and emphasizes data and not graphics. 

Bar diagrams (bar chart, histogram): in a bar chart the height of each bar represents the frequency; in a histogram the area of each bar represents the frequency

Line graph

Stem and leaf shows actual frequencies

Pie chart

Pictogram
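
Of the diagrams listed above, the stem and leaf plot is simple enough to build by hand or in a few lines of code. The sketch below assumes non-negative whole numbers with the tens digit as the stem; the function name `stem_and_leaf` and the weights are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens) to its sorted list of leaves (units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


# Hypothetical weights in kilograms (not real classroom data).
weights = [52, 58, 61, 63, 63, 67, 70, 74, 75, 81]
plot = stem_and_leaf(weights)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

The number of leaves on each row gives the frequency for that stem, and the original values can still be read back from the plot.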


SHAPES OF DISTRIBUTIONS

Unimodal: one peak

Bimodal: 2 peaks

Bell-shaped or Normal

Positive skew

Negative skew

Leptokurtosis

Platykurtosis

S curve (ogive)

Reverse J (exponential)
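
Skew and kurtosis can also be expressed as numbers. The sketch below uses the moment-based (population) definitions, which are one common convention; spreadsheet functions may use sample-adjusted formulas and give somewhat different values. The function name `shape_measures` and the data are hypothetical.

```python
def shape_measures(values):
    """Return (skewness, excess kurtosis) from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a Normal distribution
    return skewness, excess_kurtosis


# Hypothetical data sets chosen to show each shape.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]
left_tailed = [-x for x in right_tailed]

sym_skew, sym_kurt = shape_measures(symmetric)
pos_skew, _ = shape_measures(right_tailed)
neg_skew, _ = shape_measures(left_tailed)
print(sym_skew, sym_kurt, pos_skew, neg_skew)
```

Under this convention a long right tail gives positive skewness and a long left tail gives negative skewness. Excess kurtosis above 0 indicates a leptokurtic (peaked, heavy-tailed) shape and below 0 a platykurtic (flat) shape.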


MISLEADING DIAGRAMS

Poor labeling

Inappropriate scaling

Omitting the zero origin

Presence of outliers

Presence of high leverage points

Widening and narrowing the scales produce different impressions of the data. 

Double vertical scales can misleadingly be used to show spurious associations. 
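
The effect of omitting the zero origin can be shown with simple arithmetic. The sketch below compares the ratio of two drawn bar heights when the vertical axis starts at zero and when it starts higher; the function name `apparent_ratio` and the numbers are hypothetical.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn heights of bars b and a for a given axis start."""
    if a <= axis_start or b <= axis_start:
        raise ValueError("both values must lie above the axis start")
    return (b - axis_start) / (a - axis_start)


# Two hypothetical frequencies that differ by 10%.
true_ratio = apparent_ratio(50, 55)                      # axis starts at 0
truncated_ratio = apparent_ratio(50, 55, axis_start=45)  # axis starts at 45

print(true_ratio, truncated_ratio)
```

With the axis starting at 45, the second bar is drawn twice as tall as the first although the underlying values differ by only 10%.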

 

ASSIGNMENTS

CLASSROOM DATA

We are going to generate a data set about ourselves called the classroom data. 


The class leader will prepare the template named CLASS DATA as follows. Click on the Microsoft Excel icon on your computer. You will see a blank Workbook in the top left-hand corner. Click on the workbook and you will see a table with columns and rows. Type the following variable names on row 1, from left to right: AGE (in years), GENDER (male, female), REGION of birth (East, North, West, Central), ORDER in the family (first, second, third, higher), WEIGHT (in kilograms), HEIGHT (in centimeters), wearing GLASSES (yes, no), COLOR preference (blue, red, green, yellow), number of BROTHERS, number of SISTERS, type of primary SCHOOL (private, public), type of UNIVERSITY (public, private).


The class leader will send the template to the class WhatsApp group. Each member will have a row in which to type information. For confidentiality, this information could be yours or someone you know who is not in the class. Each class member shall have a copy of the data set. We shall use it in classroom exercises.


CLASSROOM EXERCISES ON DATA PRESENTATION AS FIGURES

  • Draw a bar chart of GENDER
  • Draw a histogram of AGE
  • Draw a pie chart of COLOR and a pie chart of ORDER
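
Before drawing any of these figures in Excel, it helps to know what numbers the chart should show. The sketch below counts the categories of one variable and converts the counts into percentages and pie slice angles. The COLOR values are hypothetical, not the real classroom data, and the Python code is only a way to check the arithmetic.

```python
from collections import Counter

# Hypothetical COLOR column from the class template.
colors = ["blue", "red", "blue", "green", "yellow", "blue", "red", "green"]

counts = Counter(colors)  # bar heights for a bar chart
total = sum(counts.values())

# Each pie slice is the category's share of the full 360 degrees.
slices = {
    color: {"percent": 100 * n / total, "angle": 360 * n / total}
    for color, n in counts.items()
}

for color, n in counts.most_common():
    s = slices[color]
    print(f"{color}: n={n}, {s['percent']}%, {s['angle']} degrees")
```

The counts are the bar heights for a bar chart, and the percentages should add up to 100 for a pie chart.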


CLASSROOM EXERCISES ON DATA PRESENTATION AS TABLES

  • Draw a table of GENDER by AGE
  • Draw a table of GLASSES by ORDER