Workshop
presented at the Research Data Base Management Workshop organized by the
Directorate of Research, Ministry of Health, on 26th March by
Professor Omar Hasan Kasule Sr MB ChB (MUK), MPH (Harvard), DrPH (Harvard) EM; omarkasule@yahoo.com
1.0 INFORMATION SYSTEMS
1.1 Information revolution and the computer age:
1.2 Components of the Information system
1.3 Computer hardware developments:
1.4 Computer software:
1.1 Information revolution and the computer age:
·
Data is used for operational, managerial and
planning functions.
·
Use of computers is facilitated by routine
computerization of operational data. The invention of the computer enabling
humans to handle large amounts of data has created an information revolution.
·
New computers can manage (collection &
storage) and analyze large amounts of data, a feat that was unthinkable a few
years ago. The growth of computational techniques has enabled deeper and more
sophisticated analyses.
·
Availability of high speed and efficient
computing has encouraged growth in statistical methodology and more
sophisticated statistical analysis.
·
This has in turn called for developments in
statistical theory that are later translated into newer and more powerful
analysis programs.
1.2 Components of the Information system
·
An information system has 5 components: people,
procedures, software, hardware, and data. Database management systems (DBMS)
create, modify, and access data.
·
Data elements are arrayed to make a record or
an observation. Files are made up of several observations. Several files make a
database. A data dictionary describes the structure of the data. A relational
database stores data in tables of rows and columns that are linked to one
another by keys. A hierarchical database is several layers of information in the
form of one-to-many. A network database is a many-to-many architecture.
·
A program is a series of instructions that the
computer executes. Computer languages are at various levels of sophistication.
Machine language is binary code, and assembly language is a symbolic notation
that maps closely onto it. High-level procedural languages are BASIC, Pascal, C,
COBOL, and FORTRAN. Problem-oriented languages are query languages used in
searching databases. Natural language is usual human language that the computer
cannot use directly.
·
Ethical issues of privacy, accuracy, data
ownership, data access, and security arise due to the large amount of personal
information now kept on computers.
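The relationship between data elements, records, files, a database, and a data dictionary described above can be sketched in Python. This is only an illustration: the `PatientRecord` fields and the file names `patients` and `visits` are hypothetical and do not come from the workshop material.

```python
from dataclasses import dataclass, fields


@dataclass
class PatientRecord:
    """One record (observation) made up of data elements."""
    patient_id: int
    age: int
    sex: str


# A file is made up of several records of the same kind.
patients_file = [
    PatientRecord(patient_id=1, age=34, sex="F"),
    PatientRecord(patient_id=2, age=51, sex="M"),
]

# Several files make a database (the second file is left empty here).
database = {"patients": patients_file, "visits": []}

# A data dictionary describes the structure of the data:
# here, the name and type of each data element in a record.
data_dictionary = {f.name: f.type.__name__ for f in fields(PatientRecord)}
print(data_dictionary)
```

In a real system the DBMS keeps the data dictionary itself; the dictionary built here only mimics that idea.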
1.3 Computer hardware developments:
·
Computers can be microcomputers, minicomputers,
mainframe computers, and supercomputers. Computer hardware consists of input
devices, a central processor, output devices, and communication devices such as
the modem. The central processing unit (CPU) is the calculating part of the
computer.
·
Data is stored as binary units (bits), each either 0
or 1. Eight bits make a byte. A KB is 1024 bytes, an MB is about 10^6 bytes,
a GB is about 1 billion bytes, and a TB is about 1 trillion bytes. Data in a
computer is stored as files (sequential or random access) that are grouped in
directories.
·
RAM keeps data during the processing stage.
Long-term memory is the hard disk, floppy disk, or CD-ROM. The modem is used to
transfer data from one point to another.
·
Communication channels can be telephone lines,
co-axial cables, fiber optic cables, and microwaves. Microwaves travel in
straight lines over short distances and require relay stations, such as
satellites, to cover longer distances.
·
Computers may be interconnected in a local area
network, LAN; a metro area network, MAN; or a wide area network, WAN. Ergonomic
designs are used to avoid health problems due to working at computer
workstations for long hours.
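The storage units listed above (bits, bytes, KB, MB) can be used to estimate how much space a data file needs. The following is a minimal sketch; the record count and record size are made-up figures for a hypothetical study file.

```python
BITS_PER_BYTE = 8
KB = 1024        # bytes in a KB, as defined above
MB = 10**6       # bytes in an MB, as defined above


def storage_bytes(n_records: int, bytes_per_record: int) -> int:
    """Total bytes needed to store fixed-length records."""
    return n_records * bytes_per_record


# Hypothetical file: 50,000 records of 200 bytes each.
size = storage_bytes(50_000, 200)

print(size * BITS_PER_BYTE, "bits")
print(size / KB, "KB")
print(size / MB, "MB")
```

Real files also carry overhead such as indexes and file-system metadata, so the true size on disk would normally be somewhat larger.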
1.4 Computer software:
·
Software consists of the operating system and
the application programs. A statistical package is a collection of programs.
There are three types of software: operating systems such as Windows,
general-purpose programs such as word processors and statistical packages, and
futuristic programs.
·
The most popular statistical packages are BMDP,
SPSS, Minitab, Censtat, SAS, and GLIM. Epi Info and EGRET are specific to
epidemiology.
·
Specialized programs may be graphics,
communication, multimedia, and futuristic programs.
·
Artificial intelligence is an attempt to
simulate human thought and actions; it is used in robotics, expert systems, and
virtual reality.
·
Knowledge-based or expert systems are programs
that incorporate the human thought or problem-solving processes.
·
Virtual reality, also called artificial reality
or virtual environment, is used in entertainment and simulators that train
aircraft pilots.
2.0 TERMINOLOGY ABOUT DATABASES
2.1 Database system
2.2 Types of database models
2.3 Descriptors of databases 1
2.4 Descriptors of databases 2
2.1 Database system
·
Database system refers collectively to the
database model, database management system, and database.
·
A database model is a type of data model that determines the logical structure
of a database and fundamentally determines in which manner data can be
stored, organized, and manipulated. The most popular example of a database
model is the relational model, which uses a table-based format.
·
A database is an organized collection of data.
·
A database management system (DBMS) is a software system designed to allow the
definition, creation, querying, update, and administration of databases, e.g.
Oracle.
2.2 Types of database models
·
Hierarchical database model: tree-like; each child has only one parent, but a
parent can have several children (one-to-many). The model is not very flexible.
It can be in the form of tables with pointers to another table (Figure 1).
·
Relational model: more recent than the hierarchical and network models, and the
most popular. Consider the problem of a man who has 4 wives, each of whom has
5 children: we want to show the husband-wife, father-son, and mother-son
relations. In one table this leads to repeated entries for the same person.
An ID acts as a key that links the records.
·
Others: the graph model; the multivalue model, which allows depth in one
variable; and the object-oriented model, which integrates the database and an
object-based programming language.
·
The hierarchical and network models are called navigational databases.
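The family example above can be sketched with Python's built-in `sqlite3` module to show how an ID acts as a key. The table names (`person`, `marriage`, `parentage`) and the placeholder names are assumptions made for illustration; they are not part of the workshop material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,   -- the ID that acts as a key
    name TEXT NOT NULL
);
CREATE TABLE marriage (
    husband_id INTEGER REFERENCES person(id),
    wife_id    INTEGER REFERENCES person(id)
);
CREATE TABLE parentage (
    child_id  INTEGER REFERENCES person(id),
    father_id INTEGER REFERENCES person(id),
    mother_id INTEGER REFERENCES person(id)
);
""")

# One husband, 4 wives, 5 children per wife; each person is stored once.
conn.execute("INSERT INTO person VALUES (1, 'Husband')")
next_id = 2
for w in range(1, 5):
    wife_id = next_id
    conn.execute("INSERT INTO person VALUES (?, ?)", (wife_id, f"Wife {w}"))
    conn.execute("INSERT INTO marriage VALUES (1, ?)", (wife_id,))
    next_id += 1
    for c in range(1, 6):
        conn.execute("INSERT INTO person VALUES (?, ?)",
                     (next_id, f"Child {w}.{c}"))
        conn.execute("INSERT INTO parentage VALUES (?, 1, ?)",
                     (next_id, wife_id))
        next_id += 1

# Join on the ID key to count the children of each mother.
children_per_mother = conn.execute("""
    SELECT m.name, COUNT(*)
    FROM parentage p JOIN person m ON m.id = p.mother_id
    GROUP BY m.id, m.name
    ORDER BY m.id
""").fetchall()
n_people = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(children_per_mother, n_people)
```

Each person appears in exactly one row of `person`; the relations are expressed only through IDs, which avoids repeating a person's details in every relationship.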
Figure 1: Hierarchical database model (one to many)
(to provide image)
Figure 2: Network database model (many to many)
(to provide image)
Figure 3: Relational database model
ID | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 | Var 7 | Var 8 | Var 9
Observation – file – database – database system
2.3 Descriptors of databases 1
·
Data warehouses archive data from operational databases; managers use them for
data mining and knowledge discovery.
·
An end-user database is developed for a specific end user.
2.4 Descriptors of databases 2
·
A hypertext or hypermedia database has data linked to an object, e.g. the World Wide Web.
·
An in-memory database uses main memory, with a backup in storage memory; it is
suitable where rapid communication is needed.
·
Real-time databases process
transactions fast enough for the result to come back and be acted on right
away.
3.0 DATABASE MANAGEMENT SYSTEM
3.1 Functions of DBMS:
3.2 Data views provided by DBMS
3.3 Stages of database design
3.4 Data base - others
3.1 Functions of DBMS:
·
Data definition. Defining new data structures for a database, removing data
structures from the database, modifying the structure of existing data.
·
Data update. Inserting, modifying, and deleting data.
·
Data retrieval. Obtaining information either for end-user queries and reports
or for processing by applications.
·
Data administration. Registering and monitoring users, enforcing data security,
monitoring performance, maintaining data integrity, dealing with concurrency
control, and recovering information if the system fails.
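The four functions above can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch: the `patient` table and its contents are hypothetical, and SQLite covers only part of data administration (it has no user registration, for instance), so only integrity and recovery are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create a structure, then modify it.
conn.execute(
    "CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("ALTER TABLE patient ADD COLUMN ward TEXT")

# Data update: insert, modify, and delete data.
conn.executemany(
    "INSERT INTO patient (name, age, ward) VALUES (?, ?, ?)",
    [("A", 30, "Medical"), ("B", 45, "Surgical"), ("C", 60, "Medical")])
conn.execute("UPDATE patient SET ward = 'Surgical' WHERE name = 'A'")
conn.execute("DELETE FROM patient WHERE name = 'C'")
conn.commit()

# Data retrieval: answer a query.
surgical = conn.execute(
    "SELECT name FROM patient WHERE ward = 'Surgical' ORDER BY name").fetchall()

# Data administration (integrity and recovery): a transaction that violates
# the primary key is rejected and rolled back, leaving the data unchanged.
try:
    with conn:
        conn.execute("INSERT INTO patient (id, name) VALUES (1, 'duplicate')")
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]
print(surgical, count)
```

A server DBMS such as Oracle adds the remaining administrative functions, for example user accounts, privileges, and concurrency control across many simultaneous users.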
3.2 Data views provided by DBMS
·
The external level defines how each group of end-users sees
the organization of data in the database. A single database can have any number
of views at the external level.
·
The conceptual level unifies the various external views into
a coherent global view. It provides the synthesis of all the external views. It
is out of the scope of the various database end-users, and is rather of
interest to database application developers and database administrators.
·
The internal level (or physical level) is the
internal organization of data inside a DBMS.
It is concerned with cost, performance, scalability and other operational
matters. It deals with storage layout of the data, using storage structures
such as indexes to enhance performance. Occasionally it stores data of
individual views (materialized views), computed from generic data, if
performance justification exists for such redundancy. It balances all the
external views' performance requirements, possibly conflicting, in an attempt
to optimize overall performance across all activities.
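The three levels can be loosely illustrated with Python's built-in `sqlite3` module: a base table stands in for the conceptual level, SQL views for the external level, and an index for the internal level. The table, the two user groups, and all names below are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: one global table that holds all the data.
CREATE TABLE visit (
    id           INTEGER PRIMARY KEY,
    patient_name TEXT,
    diagnosis    TEXT,
    charge       REAL
);
INSERT INTO visit (patient_name, diagnosis, charge) VALUES
    ('A', 'malaria', 50.0),
    ('B', 'asthma',  80.0),
    ('A', 'asthma',  70.0);

-- External level: each (hypothetical) user group sees its own view.
CREATE VIEW clinician_view AS
    SELECT patient_name, diagnosis FROM visit;
CREATE VIEW finance_view AS
    SELECT patient_name, SUM(charge) AS total
    FROM visit GROUP BY patient_name;

-- Internal level: a storage structure added purely for performance.
CREATE INDEX idx_visit_diagnosis ON visit (diagnosis);
""")

cursor = conn.execute("SELECT * FROM clinician_view")
clinician_columns = [d[0] for d in cursor.description]
finance_rows = conn.execute(
    "SELECT * FROM finance_view ORDER BY patient_name").fetchall()
print(clinician_columns, finance_rows)
```

The clinician view never exposes charges and the finance view never exposes diagnoses, while the index changes only how fast queries run, not what any user sees.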
3.3 Stages of database design
·
Produce a conceptual data model that reflects the structure of the information
to be held in the database, using flow charts.
·
Translate the conceptual model into a schema expressed in terms of a
specific database model (logical design).
·
Carry out the physical database design, which addresses performance
requirements and access control.
3.4 Data base - others
·
Database storage: binary encoding.
·
Database security: access control; privileges, e.g. administrator privilege;
encryption; and forensic audit, i.e. keeping a record of those who access the
data.
·
Database migration: (a) portability from one DBMS to another for economic
reasons, ease, or special need; (b) the migration must maintain all features of
the data.
·
Database building, maintaining, and tuning: design the database; use a DBMS to
build the database and create the user interface; initialize the database, i.e.
put in the data; and maintain the database through changes and tuning for
better performance.
4.0 MEDICAL DATA BASES
4.1 Medical database
4.2 Cochrane Library
4.3 Other medical databases: ClinicalKey by Elsevier, EBSCO, EMBASE
4.4 Others: questionnaires, images, pictures, sounds, videos etc
4.1 MEDLINE
·
Contains medical journal articles: medicine, nursing, pharmacy, dentistry, veterinary medicine, health care, biology, and biochemistry.
4.2 Cochrane Library
·
Is a collection of databases in medicine and other healthcare specialties provided by the Cochrane Collaboration and other organizations.
·
The Cochrane Collaboration is an
international network of more than 28,000 dedicated people from over
100 countries who prepare, update, and promote the accessibility of (a) Cochrane
Reviews, over 5,000 so far, published online in the Cochrane Database of
Systematic Reviews, part of The Cochrane Library; and (b) the largest
collection of records of randomized controlled trials in the world, called
CENTRAL, published as part of The Cochrane Library.
·
At its core is the collection of Cochrane
Reviews, a database of systematic reviews and meta-analyses which summarize and
interpret the results of medical research.