Presented at the Scientific Writing Workshop held at the Kulliyah of Medicine, International Islamic University, Kuantan, Pahang, Malaysia on 29th-30th March 2008 by Professor Omar Hasan Kasule Sr., MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Professor of Epidemiology and Islamic Medicine, Institute of Medicine, University of Brunei, and Visiting Professor of Epidemiology, University of Malaya.
1.0 DATA GROUPING
1.1 Data classes
The initial step in preparing a table is data grouping. The objective of grouping is to summarize data for presentation (parsimony) while preserving a complete picture. Some information is inevitably lost by grouping. A suitable number of classes is usually 10-20. Using too few classes masks details of the distribution; using too many defeats the objective of parsimony. The desirable characteristics of classes are: mutual exclusiveness, equal width of class intervals, and coverage of all the data. Class limits (also called class boundaries) are of 2 types: true and tabulated. True class limits are more accurate and may be decimalized. They are, however, difficult to tabulate. True class limits should conform to the accuracy of the data (number of decimal places and rounding off). The tabulated class limits are usually whole numbers and are an approximation. We sometimes talk about the true upper class limit (UCL) and the true lower class limit (LCL). The class mid-points are used in drawing line graphs.
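The desirable characteristics of classes can be illustrated with a minimal sketch. The function name group_into_classes, the half-open intervals [lower, upper) used to keep classes mutually exclusive, and the age values are all assumptions made for illustration; only 4 classes are used here to keep the output short.

```python
def group_into_classes(values, lower, width, n_classes):
    """Count values in equal-width, mutually exclusive classes [lo, hi)."""
    classes = []
    for i in range(n_classes):
        lo = lower + i * width
        hi = lo + width
        # A value equal to hi falls in the next class, so no value is counted twice.
        count = sum(1 for v in values if lo <= v < hi)
        classes.append(
            {"lower": lo, "upper": hi, "midpoint": (lo + hi) / 2, "count": count}
        )
    return classes


# Hypothetical ages, invented for this example.
ages = [21, 25, 29, 30, 34, 38, 41, 45, 47, 52]
table = group_into_classes(ages, lower=20, width=10, n_classes=4)

for row in table:
    print(f"{row['lower']}-{row['upper']}  mid-point {row['midpoint']}  n = {row['count']}")

# Coverage check: every observation must fall in exactly one class.
covered = sum(row["count"] for row in table) == len(ages)
```

If covered is False, the chosen classes do not span the whole range of the data and the lower limit, width, or number of classes should be revised.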
1.2 Dichotomy/trichotomy
Data grouping is sometimes achieved by dividing the data into 2 groups (dichotomy), 3 groups (trichotomy), or many groups (multichotomy).
1.3 Grouping errors
Grouping error is defined as information loss due to grouping. Grouped data gives less detail than ungrouped data. The bigger the class interval, the bigger the grouping error. The parsimony advantage of data grouping must be weighed against the extent of grouping error. Computations on grouped data are usually based on the class mid-point. Grouping error becomes serious when the distribution of scores about the mid-point is not uniform.
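A small sketch can make grouping error concrete by comparing the mean of the raw scores with the mean computed from class mid-points. The scores, the helper name grouped_mean, and the class widths are assumptions chosen for illustration; the result holds for this made-up data and is not a general proof.

```python
from statistics import mean

# Hypothetical scores, invented for this example.
scores = [10, 11, 12, 13, 19, 20, 21, 22, 28, 29]


def grouped_mean(values, lower, width):
    """Mean computed after replacing each value by the mid-point of its class."""
    midpoints = [
        lower + ((v - lower) // width) * width + width / 2 for v in values
    ]
    return mean(midpoints)


true_mean = mean(scores)                               # mean of ungrouped data
narrow_mean = grouped_mean(scores, lower=10, width=5)   # narrow class interval
wide_mean = grouped_mean(scores, lower=10, width=20)    # wide class interval

narrow_error = abs(narrow_mean - true_mean)
wide_error = abs(wide_mean - true_mean)
print(true_mean, narrow_mean, wide_mean)
```

With these values the wider class interval gives the larger discrepancy from the true mean, because the scores within the single wide class are not spread evenly about its mid-point.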
2.0 DATA TABULATION
2.1 Objective of data tabulation
The objective of tabulation is to present and summarize a large amount of data in logical groupings, often for 2 or more variables. It allows visual inspection of the data.
2.2 Type of information presented in tables
A table can show the following summaries of the data: cell frequency (cell number), cell number as a percentage of the overall total, cell number as a row percentage, cell number as a column percentage, cumulative frequency, cumulative frequency %, relative (proportional) frequency, and relative frequency %.
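The summaries listed above can be computed as in the following sketch. The class frequencies and the 2 x 2 cell counts are hypothetical numbers, and the variable names are chosen only for this illustration.

```python
from itertools import accumulate

# Hypothetical class frequencies for a one-variable frequency table.
counts = [3, 3, 3, 1]
total = sum(counts)

relative = [c / total for c in counts]                   # relative frequency
relative_pct = [100 * c / total for c in counts]         # relative frequency %
cumulative = list(accumulate(counts))                    # cumulative frequency
cumulative_pct = [100 * c / total for c in cumulative]   # cumulative frequency %

# Hypothetical cell counts for a two-variable table (rows x columns).
cells = [[20, 30],
         [10, 40]]
grand_total = sum(sum(row) for row in cells)
row_totals = [sum(row) for row in cells]
col_totals = [sum(col) for col in zip(*cells)]

# Each cell as a percentage of the overall total, of its row, and of its column.
overall_pct = [[100 * x / grand_total for x in row] for row in cells]
row_pct = [[100 * x / rt for x in row] for row, rt in zip(cells, row_totals)]
col_pct = [[100 * x / ct for x, ct in zip(row, col_totals)] for row in cells]

print(cumulative, cumulative_pct)
print(row_pct)
```

Row percentages sum to 100 across each row and column percentages sum to 100 down each column, which is a quick check that the right denominator was used.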
2.3 Characteristics of an ideal table
An ideal table is simple, easy to read, and correctly scaled. The layout of the table should make it easy to read and understand the numerical information. The table must be able to stand on its own, i.e. be understandable without reference to the text. The table must have a title/heading that indicates its contents. Labeling must be complete and accurate: title, rows & columns, marginal & grand totals, as well as units of measurement. The field labels are in the margins of the table. Numerical data is in the cells that are in the body of the table. Footnotes may be used to explain the table.
2.4 Configurations of tables
A contingency table can be presented in several configurations. The most common is the 2 x 2 contingency table (2 rows and 2 columns). Other configurations are the 2 x k table (2 rows and k columns) and the r x c table (r rows and c columns).
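As a rough sketch, a contingency table with its marginal and grand totals can be built from paired observations as follows. The records, the level names, and the function name contingency_table are hypothetical; passing more row or column levels to the same function would give a 2 x k or r x c table.

```python
from collections import Counter

# Hypothetical (exposure, outcome) records, invented for this example.
records = [
    ("exposed", "ill"), ("exposed", "ill"), ("exposed", "well"),
    ("unexposed", "ill"), ("unexposed", "well"), ("unexposed", "well"),
]


def contingency_table(pairs, row_levels, col_levels):
    """Cross-tabulate (row, column) pairs into a list of rows of cell counts."""
    counts = Counter(pairs)
    # Counter returns 0 for combinations that never occur in the data.
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


table = contingency_table(records, ["exposed", "unexposed"], ["ill", "well"])

row_totals = [sum(row) for row in table]         # marginal totals for rows
col_totals = [sum(col) for col in zip(*table)]   # marginal totals for columns
grand_total = sum(row_totals)

print(table, row_totals, col_totals, grand_total)
```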