Workshop
presented at the Research Data Base Management Workshop organized by the
Directorate of Research, Ministry of Health, on 26th March by
Professor Omar Hasan Kasule Sr. MB ChB (MUK), MPH (Harvard), DrPH (Harvard), EM; omarkasule@yahoo.com
1.0 INFORMATION SYSTEMS
1.1 Information revolution and the computer age:
Data is used for operational, managerial, and planning functions. Use of
computers has been facilitated by the routine computerization of operational
data. The invention of the computer, which enables humans to handle large
amounts of data, has created an information revolution. Modern computers can
manage (collect and store) and analyze amounts of data that were unthinkable a
few years ago. The growth of computational techniques has enabled deeper and
more sophisticated analyses. The availability of fast, efficient computing has
encouraged growth in statistical methodology and more sophisticated statistical
analysis. This has in turn called for developments in statistical theory that
are later translated into newer and more powerful analysis programs.
1.2 Components of the information system
An information system has 5 components: people, procedures, software, hardware,
and data. Database management systems (DBMS) create, modify, and access data.
Data elements are grouped to make a record or an observation. Files are made up
of several records. Several files make a database. A data dictionary describes
the structure of the data. A relational database stores data as tables of rows
and columns, with one record per row. A hierarchical database arranges
information in several layers with one-to-many (parent-child) links. A network
database allows many-to-many links between records. A program is a series of
instructions that the computer executes. Computer languages are at various
levels of sophistication. Machine language is binary code; assembly language
expresses the same instructions with symbolic mnemonics. High-level procedural
languages include BASIC, Pascal, C, COBOL, and FORTRAN. Problem-oriented
languages include the query languages used to search databases. Natural
language is ordinary human language, which the computer cannot use directly.
Ethical issues of privacy, accuracy, data ownership, data access, and security
arise because of the large amount of personal information now kept on computers.
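The data hierarchy just described (data elements grouped into records, records into files, files into a database, with a data dictionary describing the structure) can be illustrated with a short Python sketch. The field names, the `patients` file, and the `validate` helper are hypothetical; real DBMSs store these structures very differently, so treat this only as a model of the idea.

```python
from dataclasses import dataclass, field

# Hypothetical data dictionary: describes each data element (field) by name and type.
DATA_DICTIONARY = {
    "patient_id": int,
    "name": str,
    "age": int,
}

def validate(record: dict) -> bool:
    """Check that a record has exactly the fields in the data dictionary, with matching types."""
    if set(record) != set(DATA_DICTIONARY):
        return False
    return all(isinstance(record[name], typ) for name, typ in DATA_DICTIONARY.items())

@dataclass
class DataFile:
    """A file is a collection of records (observations)."""
    name: str
    records: list = field(default_factory=list)

    def add(self, record: dict) -> None:
        # Reject records that do not follow the structure in the data dictionary.
        if not validate(record):
            raise ValueError(f"record does not match data dictionary: {record}")
        self.records.append(record)

@dataclass
class Database:
    """A database is a collection of files, looked up by file name."""
    files: dict = field(default_factory=dict)

    def add_file(self, data_file: DataFile) -> None:
        self.files[data_file.name] = data_file

# Data elements form records, records form a file, files form a database.
patients = DataFile("patients")
patients.add({"patient_id": 1, "name": "Amina", "age": 34})
patients.add({"patient_id": 2, "name": "Yusuf", "age": 51})
db = Database()
db.add_file(patients)
```

The data dictionary is kept separate from the records on purpose: the structure is described once and every record is checked against it.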
1.3 Computer hardware developments:
Computers can be microcomputers, minicomputers, mainframe computers, or
supercomputers. Computer hardware consists of input devices, a central
processor, output devices, and communication devices such as the modem. The
central processing unit (CPU) is the calculating part of the computer. Data is
stored as binary units (bits), each either 0 or 1. Eight bits make a byte. A KB
is 1,024 bytes, an MB is about 10^6 (one million) bytes, a GB is about 1 billion
bytes, and a TB is about 1 trillion bytes. Data in a computer is stored as files
(sequential or random access) that are grouped in directories. RAM holds data
during processing. Long-term storage is the hard disk, floppy disk, or CD-ROM.
The modem is used to transfer data from one point to another. Communication
channels can be telephone lines, coaxial cables, fiber optic cables, and
microwaves. Microwave links are line-of-sight and cover short distances, so
long-distance transmission needs relay stations such as towers or satellites.
Computers may be interconnected in a local area network (LAN), a metropolitan
area network (MAN), or a wide area network (WAN). Ergonomic designs are used to
avoid health problems caused by working at computer workstations for long hours.
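The storage units above mix two conventions: the binary one (1 KB = 1,024 bytes) and the decimal one (1 MB = 10^6 bytes). The Python sketch below computes both, assuming only the prefixes B, KB, MB, GB, and TB, and shows how a small integer looks as bits grouped into bytes. The helper names are illustrative.

```python
# Storage units: 8 bits make a byte; each prefix step is 2**10 = 1024 (binary)
# or 10**3 = 1000 (decimal).
BITS_PER_BYTE = 8
UNITS = ["B", "KB", "MB", "GB", "TB"]

def binary_size(unit: str) -> int:
    """Bytes in a unit using the binary convention (1 KB = 1024 bytes)."""
    return 1024 ** UNITS.index(unit)

def decimal_size(unit: str) -> int:
    """Bytes in a unit using the decimal convention (1 KB = 1000 bytes)."""
    return 1000 ** UNITS.index(unit)

def to_bits(value: int) -> str:
    """Binary representation of a non-negative integer, padded to whole bytes."""
    n_bytes = max(1, (value.bit_length() + 7) // 8)
    return format(value, f"0{n_bytes * BITS_PER_BYTE}b")

for unit in UNITS[1:]:
    b, d = binary_size(unit), decimal_size(unit)
    print(f"1 {unit}: {b:,} bytes (binary) vs {d:,} bytes (decimal)")

print(to_bits(65))  # the number 65 stored in one byte
```

The gap between the two conventions grows with each prefix, which is why a "1 TB" disk can appear smaller when reported in binary units.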
1.4 Computer software:
Software consists of the operating system and the application programs. A
statistical package is a collection of programs. There are three types of
software: operating systems such as Windows, general-purpose programs such as
word processors and statistical packages, and futuristic programs. The most
popular statistical packages are BMDP, SPSS, Minitab, Genstat, SAS, and GLIM.
Epi Info and EGRET are specific to epidemiology. Specialized programs may be
graphics, communication, multimedia, or futuristic programs. Artificial
intelligence attempts to simulate human thought and action; it is used in
robotics, expert systems, and virtual reality. Knowledge-based or expert systems
are programs that incorporate human reasoning or problem-solving processes.
Virtual reality, also called artificial reality or virtual environment, is used
in entertainment and in simulators that train aircraft pilots.
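To make the idea of an expert system concrete, here is a toy rule-based (forward-chaining) sketch in Python: known facts are matched against if-then rules until no new conclusion can be drawn. The rules and fact names are invented for illustration and are not clinical guidance; real expert systems use much larger knowledge bases and often handle uncertainty.

```python
# Hypothetical rule base: each rule is (set of required facts, conclusion).
RULES = [
    ({"fever", "cough"}, "suspect_respiratory_infection"),
    ({"suspect_respiratory_infection", "shortness_of_breath"}, "refer_for_chest_xray"),
]

def forward_chain(facts: set, rules: list) -> set:
    """Fire every rule whose conditions are all known, repeating until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

derived = forward_chain({"fever", "cough", "shortness_of_breath"}, RULES)
print(sorted(derived))
```

The second rule can only fire after the first has added its conclusion, which is why the loop repeats until a full pass derives nothing new.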
2.0 TERMINOLOGY ABOUT DATABASES
2.1 Database system
·
Database system refers collectively to the database model, the database
management system, and the database itself.
·
A database model is a type of data model that determines the logical structure of a database and how data can be stored, organized, and manipulated.
The most popular example of a database model is the relational model, which uses a table-based format.
·
A database is an organized collection of data.
·
A database management system (DBMS) is a software system designed to allow
the definition, creation, querying, update, and administration of databases,
e.g. Oracle.
2.2 Types of database models
·
Hierarchical database model: tree-like; each child has only one parent, but a
parent can have several children (one-to-many). The model is not very flexible.
It can be implemented as tables with pointers to another table (fig #1).
·
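Relational model: newer than the hierarchical and network models, and the most
popular today. Example problem: a man has 4 wives and each wife has 5 children,
and we want to show the husband-wife, father-child, and mother-child relations.
In a single table the same person would be repeated in many rows; in the
relational model each person is stored once and an ID acts as the key linking
related tables.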
·
Others: the graph model; the multivalue model, which allows one variable to hold
several values; and the object-oriented model, which integrates the database
with an object-oriented programming language.
·
The hierarchical and network models are called navigational databases.
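The polygamous-family example can be sketched as a relational schema using Python's built-in `sqlite3` module with an in-memory database. Table and column names (`person`, `marriage`, `birth`) are assumptions chosen for illustration. Each person is stored once with an ID as the key, and the husband-wife, father-child, and mother-child relations are recovered by joins rather than by repeating people in one table.

```python
import sqlite3

# Illustrative schema: one table of people, two tables of relations keyed by person_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,  -- the ID acts as the key
    name      TEXT NOT NULL,
    sex       TEXT
);
CREATE TABLE marriage (
    husband_id INTEGER REFERENCES person(person_id),
    wife_id    INTEGER REFERENCES person(person_id),
    PRIMARY KEY (husband_id, wife_id)
);
CREATE TABLE birth (
    child_id  INTEGER PRIMARY KEY REFERENCES person(person_id),
    father_id INTEGER REFERENCES person(person_id),
    mother_id INTEGER REFERENCES person(person_id)
);
""")

# One husband (ID 1), four wives (IDs 2-5), five children per wife (IDs 6-25).
conn.execute("INSERT INTO person VALUES (1, 'Husband', 'M')")
for w in range(4):
    wife_id = 2 + w
    conn.execute("INSERT INTO person VALUES (?, ?, 'F')", (wife_id, f"Wife {w + 1}"))
    conn.execute("INSERT INTO marriage VALUES (1, ?)", (wife_id,))
    for c in range(5):
        child_id = 6 + w * 5 + c
        conn.execute("INSERT INTO person VALUES (?, ?, NULL)",
                     (child_id, f"Child {w + 1}.{c + 1}"))
        conn.execute("INSERT INTO birth VALUES (?, 1, ?)", (child_id, wife_id))

# Mother-child relation recovered by a join on the key.
children_per_mother = conn.execute("""
    SELECT m.name, COUNT(*) FROM birth b
    JOIN person m ON m.person_id = b.mother_id
    GROUP BY m.name ORDER BY m.name
""").fetchall()
print(children_per_mother)
```

The husband appears once in `person` even though he is linked to 4 wives and 20 children; only the small key columns are repeated.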
2.3 Descriptors of databases
·
Data warehouses archive data from operational databases; managers use them for
data mining and knowledge discovery.
·
End-user databases are developed for a specific end user.
·
A hypertext or hypermedia database has data linked to other objects, e.g. the
World Wide Web.
·
An in-memory database uses main memory, with a backup in storage memory; it is
suitable where rapid response is needed.
·
Real-time databases process transactions fast enough for the
result to come back and be acted on right away.
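An in-memory database with a backup in storage memory can be sketched with SQLite: the `:memory:` database lives in RAM, and `Connection.backup` copies it to a file on disk. The table, its contents, and the temporary file location are illustrative assumptions.

```python
import os
import sqlite3
import tempfile

# In-memory database: all data lives in main memory (RAM) for fast access.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE vitals (patient_id INTEGER, pulse INTEGER)")
mem.executemany("INSERT INTO vitals VALUES (?, ?)", [(1, 72), (2, 88)])
mem.commit()

# Backup to storage memory (a file on disk) so the data survive if the process stops.
backup_path = os.path.join(tempfile.mkdtemp(), "vitals_backup.db")
disk = sqlite3.connect(backup_path)
mem.backup(disk)  # copies every page of the in-memory database into the file
disk.close()

# Reopen the file to confirm that the backup holds the same rows.
restored = sqlite3.connect(backup_path)
rows = restored.execute("SELECT * FROM vitals ORDER BY patient_id").fetchall()
restored.close()
print(rows)
```

How often to run the backup is a trade-off: frequent copies lose less data on failure but take time away from the fast in-memory work.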
Functions of a DBMS:
·
Data definition. Defining new data structures
for a database, removing data structures from the database, modifying the
structure of existing data.
·
Data Update. Inserting, modifying, and
deleting data.
·
Data Retrieval. Obtaining information either
for end-user queries and reports or for processing by applications.
·
Data Administration. Registering and
monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information if the system fails.
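The first three DBMS functions map directly onto SQL statements. The sketch below uses SQLite through Python's `sqlite3` module; the `visit` table and its rows are hypothetical. SQLite has no user accounts, so for data administration it only shows an integrity check (`PRAGMA integrity_check`); registering users and granting privileges depend on the particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create, modify, and remove structures.
conn.execute("CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, patient TEXT, ward TEXT)")
conn.execute("ALTER TABLE visit ADD COLUMN diagnosis TEXT")
conn.execute("CREATE TABLE scratch (x INTEGER)")
conn.execute("DROP TABLE scratch")

# Data update: insert, modify, and delete rows.
conn.executemany(
    "INSERT INTO visit (patient, ward, diagnosis) VALUES (?, ?, ?)",
    [("Ali", "A", "malaria"), ("Siti", "B", "asthma"), ("Lim", "A", None)],
)
conn.execute("UPDATE visit SET diagnosis = 'dengue' WHERE patient = 'Lim'")
conn.execute("DELETE FROM visit WHERE patient = 'Siti'")
conn.commit()

# Data retrieval: answer an end-user query.
ward_a = conn.execute(
    "SELECT patient, diagnosis FROM visit WHERE ward = 'A' ORDER BY visit_id"
).fetchall()

# Data administration (partial): verify the integrity of the stored data.
integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]
print(ward_a, integrity)
```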
Data views provided by a DBMS:
·
The external level
defines how each group of end-users sees the organization of data in the
database. A single database can have any number of views at the external level.
·
The conceptual level
unifies the various external views into a coherent global view. It provides the
synthesis of all the external views. It is hidden from end users and is of
interest mainly to database application developers and database administrators.
·
The internal level
(or physical level) is the internal organization of data inside a DBMS. It is
concerned with cost, performance, scalability, and other operational matters. It
deals with the storage layout of the data, using storage structures such as
indexes to enhance performance. Occasionally it stores the data of individual
views (materialized views), computed from generic data, if performance justifies
such redundancy. It balances the performance requirements of all the external
views, which may conflict, in an attempt to optimize overall performance across
all activities.
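The three levels can be illustrated in SQLite: a base table for the conceptual level, views for the external level, and an index for the internal level. SQLite does not support materialized views, so the sketch emulates one by storing a view's result in a table. All names are hypothetical, and query-plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the unified global structure (a base table).
conn.execute("""CREATE TABLE admission (
    adm_id INTEGER PRIMARY KEY, patient TEXT, ward TEXT, cost REAL)""")
conn.executemany("INSERT INTO admission (patient, ward, cost) VALUES (?, ?, ?)",
                 [("Ali", "A", 120.0), ("Siti", "B", 300.0), ("Lim", "A", 80.0)])

# External level: views tailored to different groups of end users.
conn.execute("CREATE VIEW nurse_view AS SELECT patient, ward FROM admission")
conn.execute("""CREATE VIEW finance_view AS
    SELECT ward, SUM(cost) AS total FROM admission GROUP BY ward""")

# Internal level: an index changes the access path, not what users see.
conn.execute("CREATE INDEX idx_admission_ward ON admission(ward)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT patient FROM admission WHERE ward = 'A'").fetchall()

# Emulated materialized view: the view's result stored redundantly as a table.
conn.execute("CREATE TABLE finance_summary AS SELECT * FROM finance_view")
totals = conn.execute("SELECT ward, total FROM finance_summary ORDER BY ward").fetchall()
print(totals)
```

The stored summary does not update itself when `admission` changes; that redundancy is the cost the text mentions, accepted only when performance justifies it.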
Stages of database design
·
Conceptual design: a conceptual data model that reflects the structure of the
information to be held in the database, drawn using flow charts.
·
Logical design: translate the conceptual model into a schema expressed in the
database model (e.g. relational) of the chosen DBMS.
·
Physical database design: addresses performance requirements and access
control.
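The three design stages can be sketched as a pipeline in Python: a DBMS-independent conceptual model held in a dictionary, a hypothetical `to_schema` helper that translates it into relational `CREATE TABLE` statements, and a physical-design step that adds an index. The model format and naming scheme are assumptions for illustration, not a standard notation.

```python
import sqlite3

# Stage 1 (conceptual): entities, attributes, and relationships, independent of any DBMS.
# Hypothetical format: each entity lists its attributes and the entities it refers to.
conceptual_model = {
    "patient": {"attributes": ["name", "dob"], "relates_to": []},
    "visit": {"attributes": ["visit_date", "complaint"], "relates_to": ["patient"]},
}

def to_schema(model: dict) -> list:
    """Stage 2 (logical): translate the conceptual model into relational DDL statements."""
    statements = []
    for entity, spec in model.items():
        cols = [f"{entity}_id INTEGER PRIMARY KEY"]
        cols += [f"{attr} TEXT" for attr in spec["attributes"]]
        cols += [f"{parent}_id INTEGER REFERENCES {parent}({parent}_id)"
                 for parent in spec["relates_to"]]
        statements.append(f"CREATE TABLE {entity} ({', '.join(cols)})")
    return statements

# Stage 3 (physical): performance choices, here an index on the foreign key.
physical_design = ["CREATE INDEX idx_visit_patient ON visit(patient_id)"]

conn = sqlite3.connect(":memory:")
for stmt in to_schema(conceptual_model) + physical_design:
    conn.execute(stmt)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

Keeping the conceptual model separate means the same description could be translated for a different DBMS by swapping only the `to_schema` step.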
Database storage: binary encoding
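As a small illustration of binary encoding, Python's `struct` module can pack a record into a fixed-length sequence of bytes and unpack it again. The record layout (a 4-byte patient ID, a 1-byte age, and a 4-byte weight) is a hypothetical example; each DBMS uses its own on-disk format.

```python
import struct

# Hypothetical fixed-length layout, little-endian with no padding:
# patient_id = 4-byte unsigned int, age = 1-byte unsigned int, weight = 4-byte float.
RECORD_FORMAT = "<IBf"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 4 + 1 + 4 = 9 bytes

def encode(patient_id: int, age: int, weight: float) -> bytes:
    """Pack one record into its binary representation."""
    return struct.pack(RECORD_FORMAT, patient_id, age, weight)

def decode(raw: bytes) -> tuple:
    """Unpack the binary representation back into field values."""
    return struct.unpack(RECORD_FORMAT, raw)

raw = encode(1001, 34, 62.5)
print(raw.hex(" "))   # the stored bytes, shown in hexadecimal
print(decode(raw))    # (1001, 34, 62.5)
```

Fixed-length records make it easy to jump straight to record n at byte offset n × 9, which is the basis of random access.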
Database security:
·
access control
·
privileges, e.g. administrator privileges
·
encryption
·
forensic audit: keep a record of who accessed the data
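Access control, privileges, and a forensic audit trail can be sketched in a few lines of Python. The roles, privilege sets, and `access` function are hypothetical; a production DBMS enforces these internally (e.g. with SQL `GRANT` and `REVOKE`), and encryption is left out because it should rely on a vetted cryptographic library rather than hand-written code.

```python
from datetime import datetime, timezone

# Hypothetical privilege table: operations each role may perform.
PRIVILEGES = {
    "administrator": {"read", "write", "grant"},
    "clinician": {"read", "write"},
    "researcher": {"read"},
}

AUDIT_LOG = []  # forensic audit trail: every access attempt is recorded

def access(user: str, role: str, operation: str, table: str) -> bool:
    """Allow the operation only if the role holds that privilege; log the attempt either way."""
    allowed = operation in PRIVILEGES.get(role, set())
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "operation": operation,
        "table": table,
        "allowed": allowed,
    })
    return allowed

access("dr_ali", "clinician", "write", "visit")
access("student1", "researcher", "write", "visit")
denied = [entry["user"] for entry in AUDIT_LOG if not entry["allowed"]]
print(denied)
```

Denied attempts are logged as well as successful ones, since failed access is often the most useful evidence in an audit.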
Database migration
·
portability: moving a database from one DBMS to another for economic reasons,
convenience, or a special need.
·
The migration must preserve all features of the data.
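A migration should be verified, not just performed. The sketch below exports a SQLite database as a portable SQL script with `Connection.iterdump`, replays it into a second database, and compares schema and rows. Both ends are SQLite only to keep the example self-contained; migrating between different DBMSs usually also requires translating SQL dialects and data types. The `patient` table is illustrative.

```python
import sqlite3

# Source database (stands in for the old DBMS).
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO patient VALUES (1, 'Amina'), (2, 'Yusuf');
""")

# Export schema and data as a portable SQL script.
script = "\n".join(source.iterdump())

# Target database (stands in for the new DBMS): replay the script.
target = sqlite3.connect(":memory:")
target.executescript(script)

def snapshot(conn):
    """Schema and rows, used to check that nothing was lost in the migration."""
    schema = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' ORDER BY name").fetchall()
    rows = conn.execute("SELECT * FROM patient ORDER BY patient_id").fetchall()
    return schema, rows

migrated_ok = snapshot(source) == snapshot(target)
print(migrated_ok)
```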
Database building, maintaining, and tuning:
·
Design
·
Use a DBMS to build the database and create the user
interface
·
Database initialization, i.e. loading the initial data
·
Database maintenance: changes and tuning for better performance
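Initialization and tuning can be sketched with SQLite: bulk-load rows in one transaction, then compare the query plan before and after adding an index and running `ANALYZE`. The `lab_result` table and its synthetic values are assumptions for illustration; exact plan wording varies between SQLite versions, so the sketch only checks whether the index name appears.

```python
import sqlite3

# Build: create the designed structure (table name is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE lab_result (
    id INTEGER PRIMARY KEY, patient_id INTEGER, test TEXT, value REAL)""")

# Initialization: bulk-load synthetic data in a single transaction.
rows = [(i % 500, "glucose", 4.0 + (i % 30) / 10) for i in range(10_000)]
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO lab_result (patient_id, test, value) VALUES (?, ?, ?)", rows)

query = "SELECT AVG(value) FROM lab_result WHERE patient_id = ?"

def plan(sql):
    """Return SQLite's query-plan description for a statement."""
    return " ".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)))

before = plan(query)  # no suitable index yet: full table scan

# Tuning: index the lookup column and refresh the planner's statistics.
conn.execute("CREATE INDEX idx_lab_patient ON lab_result(patient_id)")
conn.execute("ANALYZE")
after = plan(query)   # expected to search using idx_lab_patient
print(before, "|", after)
```

Indexes speed up reads but slow down inserts slightly, which is why tuning is done against the queries the database actually serves.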
Medical databases
·
MEDLINE (Medical Literature Analysis and Retrieval System Online)
contains journal articles in medicine, nursing, pharmacy, dentistry, veterinary
medicine, health care, biology, and biochemistry. Compiled by the United
States National Library of Medicine (NLM), MEDLINE is freely available on the Internet and searchable via PubMed.
·
The Cochrane
Library (named after Archie Cochrane) is a
collection of databases in medicine and
other healthcare
specialties provided by the Cochrane
Collaboration and other organizations.
At its core is the collection of Cochrane Reviews, a database of systematic reviews and meta-analyses which
summarize and interpret the results of medical research.
·
Other databases: ClinicalKey by Elsevier, EBSCO, EMBASE, …
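MEDLINE can also be searched programmatically through PubMed. The sketch below assumes NCBI's E-utilities ESearch endpoint with its `db`, `term`, `retmax`, and `retmode` parameters, which returns matching PubMed IDs; the helper names are hypothetical. Network access is needed for `pubmed_search`, and NCBI's usage policy (rate limits, optional API key) applies.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for PubMed that asks for JSON output."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"

def pubmed_search(term: str, retmax: int = 20) -> list:
    """Return PubMed IDs (PMIDs) matching the query; requires network access."""
    with urlopen(pubmed_search_url(term, retmax), timeout=30) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]

# Only builds the URL; call pubmed_search(...) to run a live query.
print(pubmed_search_url("hospital database management", retmax=5))
```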