Integrated Medical Education Resources: 130501P - OVERVIEW OF DATA MANAGEMENT AND DATA ANALYSIS

Presentation at a Training Program on Biostatistics for physician managers working in Public Health Administration, Qasim Province on May 1, 2013 by Professor Omar Hasan Kasule Sr MB ChB (MUK), MPH (Harvard), DrPH (Harvard). EM: omarkasule@yahoo.com

Data entry

Self-coding or pre-coded questionnaires are preferable. Data is input as text, multiple-choice, numeric, date and time, or yes/no responses. In double-entry techniques, two data entry clerks enter the same data and the computer flags the items on which they differ. Data in the computer can also be checked manually against the original questionnaire. Interactive data entry enables immediate detection and correction of logical and entry errors.
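The double-entry check can be sketched in a few lines of Python. This is only an illustration under assumed conditions: each clerk's work is held as a dictionary keyed by record number, and the names compare_double_entry, clerk_1, and clerk_2 and the sample values are hypothetical, not part of the original presentation.

```python
def compare_double_entry(entry_a, entry_b):
    """Return (record_id, field, value_a, value_b) for every item on which
    the two data entry clerks differ."""
    discrepancies = []
    for record_id in sorted(set(entry_a) | set(entry_b)):
        rec_a = entry_a.get(record_id, {})
        rec_b = entry_b.get(record_id, {})
        for field in sorted(set(rec_a) | set(rec_b)):
            value_a = rec_a.get(field)
            value_b = rec_b.get(field)
            if value_a != value_b:
                discrepancies.append((record_id, field, value_a, value_b))
    return discrepancies


# Hypothetical data: the same two questionnaires keyed in by two clerks.
clerk_1 = {1: {"age": 34, "sex": "F"}, 2: {"age": 51, "sex": "M"}}
clerk_2 = {1: {"age": 34, "sex": "F"}, 2: {"age": 15, "sex": "M"}}

mismatches = compare_double_entry(clerk_1, clerk_2)
print(mismatches)
```

Each flagged item would then be resolved by going back to the original questionnaire.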

Data editing

Data editing is the process of identifying and correcting data collection and data entry errors, such as invalid or inconsistent values. The data is 'cleaned' using logical, statistical, range, and consistency checks. All values should be recorded at the same level of precision (number of decimal places) to make computations consistent and decrease rounding-off errors. The kappa statistic is used to measure inter-rater agreement.
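As a rough illustration of these checks, the sketch below applies a range check, a code check, and a consistency check to one record, and computes Cohen's kappa for two raters. The field names, the 0 to 120 age limits, and the function names find_edit_errors and cohen_kappa are assumptions made for the example only.

```python
def find_edit_errors(record):
    """Return the problems found by simple range and consistency checks."""
    problems = []
    age = record.get("age")
    if age is None or not (0 <= age <= 120):  # range check (assumed limits)
        problems.append("age out of range")
    if record.get("sex") not in {"M", "F"}:  # valid-code check
        problems.append("invalid sex code")
    if record.get("sex") == "M" and record.get("pregnant") == "yes":
        problems.append("inconsistent: male recorded as pregnant")
    return problems


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters classifying the same subjects.
    Undefined (division by zero) when expected agreement equals 1."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical record and ratings.
errors = find_edit_errors({"age": 150, "sex": "M", "pregnant": "yes"})
rater_a = ["yes", "yes", "no", "no"]
rater_b = ["yes", "no", "no", "no"]
kappa = cohen_kappa(rater_a, rater_b)
print(errors, kappa)
```

Kappa compares the observed agreement with the agreement expected by chance, so a value of 0 means chance-level agreement and 1 means perfect agreement.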

Data validation

Data is validated and its consistency is tested. The main data problems are missing data, coding and entry errors, inconsistencies, irregular patterns, digit preference, outliers, rounding-off and significant-figure problems, questions with multiple valid responses, and record duplication.
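Three of the problems listed above (missing data, record duplication, and digit preference) lend themselves to simple counts. The following is a minimal sketch, assuming records are dictionaries with a hypothetical "id" field and a systolic blood pressure field "sbp"; all names and values are illustrative.

```python
from collections import Counter


def count_missing(records, fields):
    """Number of records with a missing (None or empty) value, per field."""
    return {f: sum(1 for r in records if r.get(f) in (None, "")) for f in fields}


def find_duplicate_ids(records, id_field="id"):
    """Identifiers that appear on more than one record."""
    counts = Counter(r[id_field] for r in records)
    return sorted(i for i, c in counts.items() if c > 1)


def terminal_digit_counts(values):
    """Frequency of each final digit; a heavy excess of 0s and 5s
    suggests digit preference."""
    return Counter(int(v) % 10 for v in values)


# Hypothetical records.
records = [
    {"id": 1, "sbp": 120},
    {"id": 2, "sbp": None},
    {"id": 2, "sbp": 130},
    {"id": 3, "sbp": 140},
    {"id": 4, "sbp": 135},
]

missing = count_missing(records, ["sbp"])
duplicates = find_duplicate_ids(records)
digits = terminal_digit_counts(r["sbp"] for r in records if r["sbp"] is not None)
print(missing, duplicates, digits)
```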

Data transformation

Data transformation is the process of creating new derived variables preliminary to analysis. It includes mathematical operations such as division, multiplication, addition, or subtraction, and mathematical transformations such as logarithmic, trigonometric, power, and z-transformations.
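A short sketch of such derived variables follows. Body mass index is used here only as a familiar example of a variable created by division and a power; the sample weights and heights are invented for illustration.

```python
import math
import statistics

# Hypothetical raw measurements.
weights_kg = [60.0, 72.0, 85.0]
heights_m = [1.60, 1.75, 1.80]

# Derived variable from arithmetic operations: BMI = weight / height squared.
bmi = [w / h**2 for w, h in zip(weights_kg, heights_m)]

# Logarithmic transformation (natural log), often used for skewed data.
log_bmi = [math.log(b) for b in bmi]

# z-transformation: subtract the mean and divide by the sample standard deviation.
mean_bmi = statistics.mean(bmi)
sd_bmi = statistics.stdev(bmi)
z_bmi = [(b - mean_bmi) / sd_bmi for b in bmi]

print(bmi, log_bmi, z_bmi)
```

After the z-transformation the variable has mean 0 and standard deviation 1, which puts variables measured in different units on a common scale.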

Preliminary data analysis

Data analysis consists of data summarization, estimation, and interpretation. Simple manual inspection of the data is needed before statistical procedures are applied. Preliminary examination consists of looking at tables and graphics. Descriptive statistics are used to detect errors, ascertain the normality of the data, and determine the size of cells. Missing values may be imputed or incomplete observations may be eliminated.
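The preliminary steps can be illustrated with Python's standard library. The sketch below assumes a small list of ages with one missing value (None); it shows both options mentioned above, eliminating the incomplete observation and imputing it. Mean imputation is chosen here only because it is the simplest method to show, not because the presentation specifies it.

```python
import statistics
from collections import Counter

# Hypothetical data: ages with one missing value, and a categorical variable.
ages = [23, 35, None, 41, 29, 35]
sex = ["F", "M", "F", "F", "M", "F"]

# Option 1: eliminate incomplete observations (complete-case analysis).
complete = [a for a in ages if a is not None]

summary = {
    "n": len(complete),
    "missing": len(ages) - len(complete),
    "mean": statistics.mean(complete),
    "median": statistics.median(complete),
    "sd": statistics.stdev(complete),
    "min": min(complete),  # extreme values help to spot entry errors
    "max": max(complete),
}

# Option 2: impute the missing value (here with the mean of observed values).
imputed = [a if a is not None else summary["mean"] for a in ages]

# Cell sizes for a categorical variable.
cell_sizes = Counter(sex)

print(summary, imputed, cell_sizes)
```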

Tests for association and effect

Tests for association, effect, or trend involve construction and testing of hypotheses. Association is assessed using the t test, the chi-square test, the linear correlation coefficient, and the logistic regression coefficient.
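As one example of a test for association, the Pearson chi-square statistic for a 2x2 table can be computed by hand. This is a sketch only: it omits the continuity correction, assumes all expected counts are non-zero, and obtains the p-value from the identity that a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The function name chi_square_2x2 and the counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table
    given as [[a, b], [c, d]]. Returns (statistic, p_value) with 1 df."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 df: P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are exposed / unexposed, columns are diseased / not.
table = [[20, 80], [10, 90]]
chi2, p_value = chi_square_2x2(table)
print(chi2, p_value)
```

With small expected cell counts an exact test would normally be preferred over this large-sample approximation.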

The common effect measures are the odds ratio, risk ratio, and rate difference. Measures of trend can discover relationships that are not picked up by association and effect measures. Probability, likelihood, and regression models are used in analysis.
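The three effect measures can be computed directly from counts, as sketched below. The 2x2 layout (a and b for exposed with and without disease, c and d for unexposed), the person-time figures, and the function names are assumptions introduced for the example.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: a, b = exposed with / without disease;
    c, d = unexposed with / without disease."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def rate_difference(events_exposed, time_exposed, events_unexposed, time_unexposed):
    """Difference between two incidence rates (events per unit of person-time)."""
    return events_exposed / time_exposed - events_unexposed / time_unexposed


# Hypothetical counts and person-years.
or_value = odds_ratio(20, 80, 10, 90)
rr_value = risk_ratio(20, 80, 10, 90)
rd_value = rate_difference(20, 1000, 10, 1000)
print(or_value, rr_value, rd_value)
```

The two ratio measures are relative (a value of 1 means no effect), whereas the rate difference is absolute (a value of 0 means no effect).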

Analytic procedures and computer programs vary for continuous and discrete data, for person-time and count data, for simple and stratified analysis, for univariate, bivariate and multivariate analysis, and for polychotomous outcome variables. Procedures are different for large samples and small samples.