Workshop presented at the Research Data Base Management Workshop organized by the Directorate of Research, Ministry of Health, on 26th March by
Professor Omar Hasan Kasule Sr MB ChB (MUK), MPH (Harvard), DrPH (Harvard) EM; omarkasule@yahoo.com
Key words: Data coding, Data compression, Data encryption, Data mining, Data modeling, Data processing, Data protection, Data recovery, Data reduction, Data replication, Data retrieval, Data storage, Data structures, Data value
Data
· Data gives rise to information that in turn gives rise to knowledge.
· Knowledge leads to understanding. Understanding leads to wisdom.
· Data may be univariate if it has only one variable. It may be bivariate if it has two variables, allowing correlation. It may be multivariate with several variables, allowing more sophisticated analyses.
· Files may be sequential files, indexed files, tree-structured files, and clustered files.
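As an illustration of what bivariate data allows, the following minimal sketch computes a Pearson correlation coefficient between two variables. The function name `pearson_r` and the sample values are assumptions made for this example; the values are invented and are not real patient data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustrative values (two variables measured on five subjects).
age_years = [25, 35, 45, 55, 65]
systolic_bp = [118, 121, 127, 135, 140]

r = pearson_r(age_years, systolic_bp)
print(f"r = {r:.3f}")
```

A univariate data set would supply only one of the two lists, so no correlation could be computed; a multivariate data set would supply more than two.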
Data model
· A data model is a graphic representation of the data, either as diagrams or charts.
· The data model reflects the essential features of an organization.
· The purpose of a data model is to facilitate communication between the analyst and the user.
· The data model also helps create a logical discipline in database design.
Data storage
· A document is stored data in any form: paper, book, letter, message, image, e-mail, voice, and sound.
· Some documents are ephemeral but can still be retrieved for the brief time that they exist and are recoverable.
· Data is physically stored as bytes. A byte has 8 bits and can therefore represent 2^8 = 256 distinct values, enough for 256 characters.
· ASCII is a character encoding that uses only 128 codes (95 printable character codes and 33 control codes). ANSI is an 8-bit extension of ASCII used by Microsoft.
· Different languages use different numbers of codes; for example, Greek uses 219 characters, Cyrillic uses 259 characters, Arabic uses 196 characters, and Chinese uses 65,536 characters.
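The byte and ASCII arithmetic above can be checked with a short sketch. UTF-8 is not mentioned in the text; it is used here only as an assumed example of an encoding that needs more than one byte per character for non-Latin scripts. The sample strings are illustrative.

```python
# One byte has 8 bits, so it can take 2**8 distinct values.
values_per_byte = 2 ** 8
print(values_per_byte)

# ASCII covers codes 0-127. Split them into printable and control codes.
printable = [chr(code) for code in range(128) if chr(code).isprintable()]
control = [code for code in range(128) if not chr(code).isprintable()]
print(len(printable), "printable codes,", len(control), "control codes")

# A Latin-script word fits in ASCII at one byte per character.
text_latin = "data"
ascii_bytes = text_latin.encode("ascii")

# An Arabic word (written with escapes to keep this file ASCII-only)
# cannot be encoded in ASCII; in UTF-8 each of its letters takes two bytes.
text_arabic = "\u0628\u064a\u0627\u0646\u0627\u062a"
utf8_bytes = text_arabic.encode("utf-8")

print(len(text_latin), "characters ->", len(ascii_bytes), "bytes (ASCII)")
print(len(text_arabic), "characters ->", len(utf8_bytes), "bytes (UTF-8)")
```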
Operations on data
· Data compression facilitates data storage and data retrieval: the data occupies less space, and the search is carried out in a smaller space. Character, image, and sound data can all be compressed; however, compression may involve loss of some data.
· Data may be formatted, as in the tables of several types of databases (relational, hierarchical, and network), or unformatted, such as images, sound, or electronic monitoring output in the hospital.
· Formatted documents are easier to retrieve.
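A minimal sketch of compression, using Python's standard `zlib` module as an assumed example of a lossless method (the text does not name a specific algorithm). The record text is hypothetical and deliberately repetitive, since repetition is what makes data compress well.

```python
import zlib

# Hypothetical repetitive character data, encoded as bytes.
original = ("patient_id,visit_date,diagnosis\n" * 200).encode("ascii")

# Compress at the highest compression level, then restore.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

print(len(original), "bytes before compression")
print(len(compressed), "bytes after compression")
print("restored exactly:", restored == original)
```

Because this method is lossless, the restored data is identical to the original. Lossy methods, as used by some image and sound formats, trade that exactness for smaller size.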
Data retrieval 1: by use of queries
· Document surrogates used in data retrieval are: identifiers, abstracts, extracts, reviews, indexes, and queries.
· Queries are short documents used to retrieve larger documents by matching, mapping, or use of Boolean logic (and, or, but, not, except, etc.).
· Queries may be in natural or probabilistic language. Fuzzy queries are deliberately not rigid, to increase the probability of retrieval.
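Boolean retrieval can be sketched with a small word index and set operations. The document texts and the helper names `build_index` and `lookup` are hypothetical, chosen for this example only.

```python
# Hypothetical mini-collection: document id -> text.
documents = {
    1: "chronic pain and sleep disorders in adults",
    2: "sleep disorders in children",
    3: "pain management after surgery",
    4: "nutrition and exercise in adults",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


index = build_index(documents)


def lookup(term):
    """Return the ids of documents containing the term (empty set if none)."""
    return index.get(term.lower(), set())


pain_and_sleep = lookup("pain") & lookup("sleep")  # AND: both terms present
pain_or_sleep = lookup("pain") | lookup("sleep")   # OR: either term present
sleep_not_pain = lookup("sleep") - lookup("pain")  # NOT: first term without second

print(sorted(pain_and_sleep), sorted(pain_or_sleep), sorted(sleep_not_pain))
```

The short query stands in for the larger documents: only the ids are manipulated until the matching documents are fetched.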
Data retrieval 2: other forms of data retrieval
· Term extraction: based on low frequency of important terms
· Term association: based on terms that normally occur together
· Lexical measures: using specialized formulas
· Trigger phrases: such as "figure", "table", "conclusion"
· Synonyms: same meaning
· Antonyms: opposite meaning
· Homographs: same spelling but different meaning
· Homophones: same sound but different spelling
· Stemming algorithms help in retrieval by removing the ends of words, leaving only the roots
· Specialized mathematical techniques are used to assess the effectiveness of data retrieval.
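Two of the ideas above can be sketched briefly. The stemmer below is a deliberately naive suffix stripper, not a published stemming algorithm, and its suffix list is an assumption. The text does not name which effectiveness measures it has in mind; precision and recall are shown here as two commonly used ones.

```python
# Assumed suffix list, longest first so that longer endings are tried first.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "ed", "es", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping a root of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Different word forms collapse to the same root, so one query matches them all.
stems = {naive_stem(w) for w in ["retrieving", "retrieved", "retrieves"]}
print(stems)

# Hypothetical search result: 4 documents retrieved, 3 documents truly relevant.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(precision, recall)
```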
Data warehousing
· Data warehousing is a method of extracting data from various sources and storing it as historical and integrated data for use in decision-support systems.
· Metadata is the term used for the definition of data stored in the data warehouse (i.e. data about data).
Data mining
· Data mining is the discovery part of knowledge discovery in databases (KDD), involving knowledge engineering, classification, and problem solving.
· KDD starts with selection, cleaning, enrichment, and coding.
· The products of data mining are recognized patterns. These patterns are then applied to new situations for prediction and profiling.
· Artificial intelligence (AI), based on machine learning, imbues computers with some creativity and decision-making capabilities using specific algorithms.
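The KDD steps named above can be sketched on a toy data set. All records, field names, and the simple "rate per group" pattern are hypothetical; the enrichment step is omitted, and real data mining would use far larger data and more careful methods.

```python
from collections import defaultdict

# Hypothetical raw records; values are invented for illustration only.
raw_records = [
    {"id": 1, "smoker": "yes", "cough": "yes", "clinic": "A"},
    {"id": 2, "smoker": "yes", "cough": "yes", "clinic": "B"},
    {"id": 3, "smoker": "no", "cough": "no", "clinic": "A"},
    {"id": 4, "smoker": "no", "cough": None, "clinic": "B"},
    {"id": 5, "smoker": "yes", "cough": "no", "clinic": "A"},
    {"id": 6, "smoker": "no", "cough": "no", "clinic": "B"},
]

# Selection: keep only the variables of interest.
selected = [{"smoker": r["smoker"], "cough": r["cough"]} for r in raw_records]

# Cleaning: drop records with missing values.
cleaned = [r for r in selected if None not in r.values()]

# Coding: turn yes/no text into 1/0.
CODES = {"yes": 1, "no": 0}
coded = [{key: CODES[value] for key, value in r.items()} for r in cleaned]

# Discovery: the pattern here is the cough rate within each smoker group.
counts = defaultdict(lambda: [0, 0])  # smoker code -> [cough cases, total]
for r in coded:
    counts[r["smoker"]][0] += r["cough"]
    counts[r["smoker"]][1] += 1
cough_rate = {group: cases / total for group, (cases, total) in counts.items()}


def predict_cough(smoker_code):
    """Apply the discovered pattern to a new case: predict the majority outcome."""
    return cough_rate[smoker_code] >= 0.5


print(cough_rate)
print(predict_cough(1), predict_cough(0))
```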
Data replication
· Data replication is a copy management service that involves copying the data and also managing the copies. It ensures that all parts of the organization have access to updated data. It is also an insurance against data loss in case of computer crashes because there will be an alternative data source.
· Databases must be designed and configured to facilitate replication. The replication infrastructure must be in place from the start. Care must be taken to make sure that replicated data is consistent and in synchrony with the master copy.
· The process of replication may inadvertently create redundancy in the system. In synchronous data replication there is no latency in data consistency. All replicas of the data are the same because of immediate updating.
· In asynchronous data replication the updating is not immediate and consistency is loose. Asynchronous replication is easier and cheaper.
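The difference between synchronous and asynchronous replication can be sketched with in-memory dictionaries standing in for databases. The class names `Master` and `Replica`, the `flush` method, and the sample key are hypothetical; a real replication service would also handle networks, failures, and conflicts.

```python
class Replica:
    """A copy of the data held elsewhere in the organization."""

    def __init__(self):
        self.data = {}


class Master:
    """The master copy, which pushes updates to its replicas."""

    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = []  # updates not yet sent (asynchronous mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: every replica is updated as part of the write.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Asynchronous: the update is queued and sent later.
            self.pending.append((key, value))

    def flush(self):
        """Send queued updates; a real system would do this in the background."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


sync_replica = Replica()
sync_master = Master([sync_replica], synchronous=True)
sync_master.write("bed_count", 120)
sync_consistent = sync_replica.data == sync_master.data

async_replica = Replica()
async_master = Master([async_replica], synchronous=False)
async_master.write("bed_count", 120)
stale_before_flush = async_replica.data != async_master.data
async_master.flush()
consistent_after_flush = async_replica.data == async_master.data

print(sync_consistent, stale_before_flush, consistent_after_flush)
```

The asynchronous replica is briefly out of date between the write and the flush, which is the loose consistency described above.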
How to use PubMed 1
· Search by author, journal name, key words/terms/concepts, or specific citation. Employ Boolean relations.
How to use PubMed 2
· Example of search by key terms: 1. Question: What role does pain have in sleep disorders? 2. Key concepts: pain, sleep disorders
· Example of search by author: 1. Enter last name plus initials without punctuation 2. Watson JD
· Example of search by author and subject: 1. Citations to articles written by Bonnie W. Ramsey about gene therapy for cystic fibrosis 2. Enter cystic fibrosis gene therapy ramsey bw
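The three example searches can also be written as search links. This sketch only builds the links and does not contact PubMed. The web address `https://pubmed.ncbi.nlm.nih.gov/` with a `term` parameter is an assumption about the public PubMed site, not something given in the workshop notes, and joining the two key concepts with AND is one possible way to employ a Boolean relation.

```python
from urllib.parse import urlencode

# Assumed address of the public PubMed search page.
PUBMED_SEARCH_URL = "https://pubmed.ncbi.nlm.nih.gov/"


def pubmed_url(term):
    """Build a PubMed search link for the given search terms."""
    return PUBMED_SEARCH_URL + "?" + urlencode({"term": term})


# Search by key terms: the two key concepts joined by a Boolean AND.
key_terms_url = pubmed_url("pain AND sleep disorders")

# Search by author: last name plus initials, without punctuation.
author_url = pubmed_url("Watson JD")

# Search by author and subject, entered as in the example above.
author_subject_url = pubmed_url("cystic fibrosis gene therapy ramsey bw")

for url in (key_terms_url, author_url, author_subject_url):
    print(url)
```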
How to search the Cochrane database
· To go to the Cochrane Library: Click http://www.thecochranelibrary.com
· Retrieve reviews from the Cochrane Database of Systematic Reviews: (a) endocrine and metabolic – diabetic foot (b) cancer – lung cancer chemotherapy
· Go to the Cochrane Central Register of Controlled Trials (CENTRAL) and retrieve any trial on (a) nasopharyngeal carcinoma (b) colon cancer (c) cervical cancer
EXERCISE #1: USE OF PUBMED
Health questionnaire databases (payment)
· Look for health questionnaire databases
· Log in and look at a demo