Presentation at a Training
Program on Biostatistics for physician managers working in Public Health
Administration, Qasim Province on May 1, 2013 by Professor Omar Hasan Kasule Sr
MB ChB (MUK), MPH (Harvard), DrPH (Harvard). EM: omarkasule@yahoo.com
Data entry
Self-coding or pre-coded questionnaires are preferable. Data is input as text,
multiple choice, numeric, date and time, and yes/no responses. In double-entry
techniques, two data entry clerks enter the same data and the computer flags
the items on which they differ. Data in the computer can also be checked
manually against the original questionnaire. Interactive data entry enables
immediate detection and correction of logical and entry errors.
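The double-entry check described above can be sketched in a few lines of Python. This is only an illustration: the function name `compare_double_entry`, the field names, and the sample records are hypothetical and do not come from any particular data entry package.

```python
def compare_double_entry(entry_a, entry_b):
    """List every item on which two independent entries of the same data differ.

    Each argument maps a questionnaire number to a dict of field -> value.
    Returns tuples of (questionnaire number, field, value from A, value from B).
    """
    discrepancies = []
    for record_id in sorted(set(entry_a) | set(entry_b)):
        rec_a = entry_a.get(record_id, {})
        rec_b = entry_b.get(record_id, {})
        for field in sorted(set(rec_a) | set(rec_b)):
            value_a = rec_a.get(field)
            value_b = rec_b.get(field)
            if value_a != value_b:
                discrepancies.append((record_id, field, value_a, value_b))
    return discrepancies


# Hypothetical data: the same two questionnaires keyed in by two clerks
clerk_1 = {
    101: {"age": 34, "sex": "F", "smoker": "no"},
    102: {"age": 51, "sex": "M", "smoker": "yes"},
}
clerk_2 = {
    101: {"age": 34, "sex": "F", "smoker": "no"},
    102: {"age": 15, "sex": "M", "smoker": "yes"},  # digits transposed
}

for item in compare_double_entry(clerk_1, clerk_2):
    print(item)  # (102, 'age', 51, 15)
```

Each flagged item would then be resolved by going back to the original questionnaire.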
Data editing
Data editing is the process of correcting data collection and data entry
errors, such as invalid or inconsistent values. The data is 'cleaned' using
logical, statistical, range, and consistency checks. All values should be at
the same level of precision (number of decimal places) to make computations
consistent and decrease rounding-off errors. The kappa statistic is used to
measure inter-rater agreement.
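As a rough illustration of range and consistency checks, and of the kappa statistic, the sketch below uses plain Python. The field names, the allowed age range of 0 to 120, and the sample ratings are assumptions made for the example. The kappa shown is Cohen's kappa for two raters, computed as (observed agreement - expected agreement) / (1 - expected agreement).

```python
def range_and_consistency_errors(record):
    """Return a list of problems found in one record (hypothetical fields)."""
    errors = []
    age = record.get("age")
    # Range check: the 0-120 limit is an assumption for this example
    if age is None or not (0 <= age <= 120):
        errors.append("age out of range")
    # Validity check on a coded variable
    if record.get("sex") not in ("M", "F"):
        errors.append("invalid sex code")
    # Consistency check between two variables
    if record.get("sex") == "M" and record.get("pregnant") == "yes":
        errors.append("male recorded as pregnant")
    return errors


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who classified the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    categories = set(ratings_a) | set(ratings_b)
    # Agreement expected by chance, from each rater's marginal proportions
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)


print(range_and_consistency_errors({"age": 150, "sex": "M", "pregnant": "yes"}))

# Hypothetical ratings of ten subjects by two raters
rater_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
rater_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
print(round(cohen_kappa(rater_1, rater_2), 3))  # 0.583
```

In this example the raters agree on 8 of 10 subjects (0.80), chance agreement is 0.52, and kappa is therefore 0.28 / 0.48.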
Data validation
Data is validated and its consistency is tested. The main data problems are
missing data, coding and entry errors, inconsistencies, irregular patterns,
digit preference, outliers, rounding-off / significant figures, questions with
multiple valid responses, and record duplication.
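Three of the problems listed above (missing data, record duplication, and digit preference) can be screened for with a short Python sketch. The record layout, the field names, and the treatment of an empty string as missing are assumptions made for illustration.

```python
from collections import Counter


def count_missing(records, fields):
    """Count records in which each field is missing (None or empty string)."""
    return {f: sum(1 for r in records if r.get(f) in (None, "")) for f in fields}


def duplicate_ids(records, key="id"):
    """Return identifiers that appear on more than one record."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, c in counts.items() if c > 1)


def terminal_digit_counts(values):
    """Tabulate the last digit of whole-number readings.

    A heavy excess of 0s and 5s suggests digit preference by the observer.
    """
    return Counter(int(v) % 10 for v in values)


# Hypothetical records: systolic blood pressure (mmHg) and weight (kg)
records = [
    {"id": 1, "sbp": 120, "weight": 70.5},
    {"id": 2, "sbp": 130, "weight": None},
    {"id": 3, "sbp": 140, "weight": 82.0},
    {"id": 3, "sbp": 140, "weight": 82.0},  # same record entered twice
    {"id": 4, "sbp": 135, "weight": ""},
    {"id": 5, "sbp": None, "weight": 64.2},
]

print(count_missing(records, ["sbp", "weight"]))
print(duplicate_ids(records))
sbp_values = [r["sbp"] for r in records if r["sbp"] is not None]
print(terminal_digit_counts(sbp_values))
```

Here every blood pressure reading ends in 0 or 5, which is the pattern digit preference produces when readings are rounded by the observer.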
Data transformation
Data transformation is the process of creating new derived variables
preliminary to analysis. It includes mathematical operations such as division,
multiplication, addition, or subtraction, and mathematical transformations such
as logarithmic, trigonometric, power, and z-transformations.
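A minimal Python sketch of three such transformations follows. Body mass index is used here as an assumed example of a variable derived by division, and the z-transformation is taken as (value - mean) / standard deviation using the sample standard deviation. The sample values are hypothetical.

```python
import math
import statistics


def body_mass_index(weight_kg, height_m):
    """Derived variable built by a mathematical operation (division)."""
    return weight_kg / height_m ** 2


def log10_transform(values):
    """Logarithmic transformation; every value must be greater than zero."""
    return [math.log10(v) for v in values]


def z_scores(values):
    """z-transformation: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mean) / sd for v in values]


print(round(body_mass_index(70, 1.75), 1))  # 22.9
print(log10_transform([1, 10, 100]))
print(z_scores([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```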
Preliminary data analysis
Data analysis consists of data summarization, estimation, and interpretation.
Simple manual inspection of the data is needed before statistical procedures
are applied. Preliminary examination consists of looking at tables and
graphics. Descriptive statistics are used to detect errors, ascertain the
normality of the data, and determine the size of cells. Missing values may be
imputed, or incomplete observations may be eliminated.
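The two ways of handling missing values mentioned above, along with a basic descriptive summary, might look like the following in Python. Mean imputation is only one of several imputation methods and is chosen here for brevity. The hemoglobin values and field names are hypothetical.

```python
import statistics


def describe(values):
    """Basic descriptive statistics, ignoring missing values (None)."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(observed),
        "missing": len(values) - len(observed),
        "mean": statistics.mean(observed),
        "median": statistics.median(observed),
        "sd": statistics.stdev(observed),
        "min": min(observed),
        "max": max(observed),
    }


def impute_mean(values):
    """Replace each missing value with the mean of the observed values."""
    mean = statistics.mean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]


def drop_incomplete(records, fields):
    """Eliminate observations with a missing value in any listed field."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


# Hypothetical hemoglobin readings (g/dL) with two missing values
hb = [12.0, 13.0, None, 14.0, 11.0, None]
summary = describe(hb)
print(summary["n"], summary["missing"], summary["mean"])  # 4 2 12.5
print(impute_mean(hb))

patients = [{"id": 1, "hb": 12.0}, {"id": 2, "hb": None}, {"id": 3, "hb": 14.0}]
print(drop_incomplete(patients, ["hb"]))
```

Inspecting the minimum and maximum in such a summary is a quick way to spot impossible values before any formal analysis.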
Tests for association and effect
Tests for association, effect, or trend involve construction and testing of
hypotheses. The tests for association are the t, chi-square, linear
correlation, and logistic regression tests or coefficients.
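As one worked example of a test for association, the Pearson chi-square statistic for a 2x2 table can be computed by hand in Python. The sketch assumes no continuity correction and uses the fact that, with 1 degree of freedom, the p-value equals erfc(sqrt(chi-square / 2)). The cell labels a, b, c, d and the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

                    outcome+   outcome-
        exposed        a          b
        unexposed      c          d

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical table: 20 of 100 exposed and 10 of 100 unexposed have the outcome
chi2, p_value = chi_square_2x2(20, 80, 10, 90)
print(round(chi2, 2))  # 3.92
print(p_value < 0.05)  # True
```

The null hypothesis being tested is that exposure and outcome are independent. With small expected cell counts an exact test would normally be preferred over this large-sample statistic.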
The common effect measures are the odds ratio, risk ratio, and rate difference.
Measures of trend can discover relationships that are not picked up by
association and effect measures. Probability, likelihood, and regression models
are used in analysis.
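The three effect measures named above can be illustrated with a short Python sketch. The odds ratio and risk ratio are computed from a 2x2 table of counts, and the rate difference from cases and person-time. All counts are hypothetical, and confidence intervals are left out to keep the example short.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: (a * d) / (b * c).

    a, b = exposed with and without the outcome
    c, d = unexposed with and without the outcome
    """
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def rate_difference(cases_exposed, time_exposed, cases_unexposed, time_unexposed):
    """Incidence rate in the exposed minus incidence rate in the unexposed.

    Person-time must be in the same units in both groups.
    """
    return cases_exposed / time_exposed - cases_unexposed / time_unexposed


# Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed develop disease
print(odds_ratio(20, 80, 10, 90))  # 2.25
print(round(risk_ratio(20, 80, 10, 90), 2))  # 2.0

# Hypothetical person-time data: 30 cases in 1000 person-years vs 10 in 1000
print(round(rate_difference(30, 1000, 10, 1000), 3))  # 0.02
```

A ratio of 1, or a difference of 0, corresponds to no effect of the exposure.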
Analytic procedures and computer programs vary for continuous and discrete
data, for person-time and count data, for simple and stratified analysis, for
univariate, bivariate and multivariate analysis, and for polychotomous outcome
variables. Procedures are different for large samples and small samples.