
130326P - INTRODUCTION TO RESEARCH DATA BASE MANAGEMENT (Word Slides)




Workshop presented at the Research Data Base Management Workshop organized by the Directorate of Research, Ministry of Health, on 26th March by Professor Omar Hasan Kasule Sr MB ChB (MUK), MPH (Harvard), DrPH (Harvard) EM; omarkasule@yahoo.com


1.0 INFORMATION SYSTEMS
1.1 Information revolution and the computer age:
1.2 Components of the Information system
1.3 Computer hardware developments:
1.4 Computer software:

1.1 Information revolution and the computer age:
·         Data is used for operational, managerial and planning functions.
·         Use of computers is facilitated by routine computerization of operational data. The invention of the computer, which enables humans to handle large amounts of data, has created an information revolution.
·         New computers can manage (collection & storage) and analyze large amounts of data, a feat that was unthinkable a few years ago. The growth of computational techniques has enabled deeper and more sophisticated analyses.
·         Availability of high speed and efficient computing has encouraged growth in statistical methodology and more sophisticated statistical analysis.
·         This has in turn called for developments in statistical theory that are later translated into newer and more powerful analysis programs.

1.2 Components of the Information system
·         An information system has 5 components: people, procedures, software, hardware, and data. Database management systems (DBMS) create, modify, and access data.
·         Data elements are arrayed to make a record or an observation. Files are made up of several observations. Several files make a database. A data dictionary describes the structure of the data. A relational database stores data in tables of rows and columns that are linked to one another through keys. A hierarchical database holds several layers of information in a one-to-many arrangement. A network database has a many-to-many architecture.
·         A program is a series of instructions that the computer executes. Computer languages are at various levels of sophistication. Machine language is binary code; assembly language uses symbolic mnemonics that map directly to machine instructions. High-level procedural languages include BASIC, Pascal, C, COBOL, and FORTRAN. Problem-oriented languages include the query languages used in searching databases. Natural language is ordinary human language, which the computer cannot use directly.
·         Ethical issues of privacy, accuracy, data ownership, data access, and security arise due to the large amount of personal information now kept on computers.
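The hierarchy described above (data element, record, file, database, data dictionary) can be sketched with plain Python structures. This is only an illustration: the field names and values are hypothetical and do not come from the workshop.

```python
# Hypothetical example data; all names and values are illustrative only.

# A data dictionary describes the structure of the data.
data_dictionary = {
    "patient_id": "integer, unique identifier",
    "age_years": "integer",
    "sex": "text, 'M' or 'F'",
}

# A record (observation) is an array of data elements.
record_1 = {"patient_id": 1, "age_years": 34, "sex": "F"}
record_2 = {"patient_id": 2, "age_years": 51, "sex": "M"}

# A file is made up of several records.
patients_file = [record_1, record_2]

# Several files make a database.
visits_file = [{"patient_id": 1, "visit_date": "2013-03-26"}]
database = {"patients": patients_file, "visits": visits_file}


def conforms(record, dictionary):
    """Return True if the record has exactly the fields the dictionary lists."""
    return set(record) == set(dictionary)


# Check every patient record against the data dictionary.
all_valid = all(conforms(r, data_dictionary) for r in patients_file)
print(all_valid)
```

A real DBMS stores the data dictionary itself and enforces it on every insert; here the check is done by hand to make the idea visible.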

1.3 Computer hardware developments:
·         Computers can be microcomputers, minicomputers, mainframe computers, and supercomputers. Computer hardware consists of input devices, a central processor, output devices, and communication devices such as the modem. The central processing unit (CPU) is the calculating part of the computer.
·         Data is stored as binary units (bits), each either 0 or 1. Eight bits make a byte. A KB is 1024 bytes, an MB is about 10^6 bytes, a GB is about 1 billion bytes, and a TB is about 1 trillion bytes. Data in a computer is stored as files (sequential or random access) that are grouped in directories.
·         RAM keeps data during the processing stage. Long-term memory is the hard disk, floppy disk, or CD-ROM. The modem is used to transfer data from one point to another.
·         Communication channels can be telephone lines, co-axial cables, fiber optic cables, and microwaves. Microwaves travel in straight lines over short distances and require relay stations or satellites to cover longer ones.
·         Computers may be interconnected in a local area network, LAN; a metropolitan area network, MAN; or a wide area network, WAN. Ergonomic designs are used to avoid health problems due to working at computer workstations for long hours.
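The storage units mentioned above can be made concrete with a little arithmetic. The sketch below takes a KB as 1024 bytes and the larger units as round powers of ten (MB as 10^6 bytes, GB as 1 billion, TB as 1 trillion); the function names are hypothetical.

```python
BITS_PER_BYTE = 8

# Unit sizes in bytes: KB is binary (1024), the others are round
# powers of ten. Mixing the two conventions is an assumption of this sketch.
KB = 1024
MB = 10**6
GB = 10**9
TB = 10**12


def bytes_to_bits(n_bytes):
    """Convert a byte count to a bit count."""
    return n_bytes * BITS_PER_BYTE


def describe(n_bytes):
    """Express a byte count in the largest unit that fits."""
    for name, size in (("TB", TB), ("GB", GB), ("MB", MB), ("KB", KB)):
        if n_bytes >= size:
            return f"{n_bytes / size:.2f} {name}"
    return f"{n_bytes} bytes"


print(bytes_to_bits(1))      # one byte is eight bits
print(describe(2048))        # two KB
print(describe(5 * 10**6))   # five MB
```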

1.4 Computer software:
·         Software consists of the operating system and the application programs. A statistical package is a collection of programs. There are three types of software: operating systems such as Windows, general-purpose programs such as word processors and statistical packages, and futuristic programs.
·         The most popular statistical packages are BMDP, SPSS, Minitab, Genstat, SAS, and GLIM. Epi Info and EGRET are specific for epidemiology.
·         Specialized programs may be graphics, communication, multimedia, and futuristic programs.
·         Artificial intelligence is an attempt to simulate human thought and actions; it is used in robotics, expert systems, and virtual reality.
·         Knowledge-based or expert systems are programs that incorporate the human thought or problem-solving processes.
·         Virtual reality, also called artificial reality or virtual environment, is used in entertainment and simulators that train aircraft pilots.

2.0 TERMINOLOGY ABOUT DATA BASES
2.1 Database system
2.2 Types of database models
2.3 Descriptors of databases 1
2.4 Descriptors of databases 2

2.1 Database system
·         A database system refers collectively to the database model, the database management system, and the database.
·         A database model is a type of data model that determines the logical structure of a database and fundamentally determines in which manner data can be stored, organized, and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.
·         A database is an organized collection of data.
·         A database management system (DBMS) is a software system designed to allow the definition, creation, querying, update, and administration of databases, e.g. Oracle.

2.2 Types of database models
·         Hierarchical database model: tree-like; each child has only one parent, but a parent can have several children (one-to-many). The model is not very flexible. It can be in the form of tables with pointers to another table (Figure 1).
·         Network model: allows many-to-many relations as well as one-to-many.
·         Relational model: the most recent and most popular. Consider a problem case: if a man has 4 wives and each wife has 5 children, we want to show in one table the husband-wife, father-son, and mother-son relations. Repeated measures of the same person are a similar case. An ID acts as a key that links the records.
·         Others: the graph model; the multivalue model, which allows depth in one variable; and the object-oriented model, which integrates the database with an object-based programming language.
·         The hierarchical and network models are called navigational databases.
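The family example for the relational model can be sketched with Python's built-in sqlite3 module. This is one possible layout, not the one presented at the workshop: the table and column names are hypothetical, and only two wives with one child each are entered to keep the sketch short. Every person gets an ID, and the relations are expressed by referring to those IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: every person has an ID that acts as the key.
conn.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE marriage (
    husband_id INTEGER REFERENCES person(id),
    wife_id    INTEGER REFERENCES person(id)
);
CREATE TABLE parentage (
    child_id  INTEGER REFERENCES person(id),
    father_id INTEGER REFERENCES person(id),
    mother_id INTEGER REFERENCES person(id)
);
""")

conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, "Husband"), (2, "Wife A"), (3, "Wife B"),
     (4, "Child A1"), (5, "Child B1")],
)
conn.executemany("INSERT INTO marriage VALUES (?, ?)", [(1, 2), (1, 3)])
conn.executemany("INSERT INTO parentage VALUES (?, ?, ?)",
                 [(4, 1, 2), (5, 1, 3)])

# Join on the IDs to show child, father, and mother in one result table.
rows = conn.execute("""
SELECT c.name, f.name, m.name
FROM parentage AS p
JOIN person AS c ON c.id = p.child_id
JOIN person AS f ON f.id = p.father_id
JOIN person AS m ON m.id = p.mother_id
ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

The husband appears once in the person table however many wives or children are recorded, which is the flexibility the hierarchical model lacks.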


Figure 1: Hierarchical database model: (one to many)


(to provide image)


Figure 2: Network database model (many to many)

(to provide image)


Figure 3: Relational database model

ID | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 | Var 7 | Var 8 | Var 9

Observation – file – data bases – data base system


2.3 Descriptors of databases  1
·         An active database is one that is actively collecting data, e.g. security monitoring or statistics.
·         Data warehouses archive data from operational databases; managers use them for data mining and knowledge discovery.
·         A distributed database is one found on more than one computer.
·         A document-oriented database is a collection of documents.
·         An embedded database system is a database embedded within a software application.
·         An end-user database is developed for a specific end-user.
·         A federated database system is several distinct databases handled by one federated DBMS.

2.4 Descriptors of databases  2
·         A hypertext or hypermedia database has data linked to an object, e.g. the World Wide Web.
·         An in-memory database uses main memory, with a backup in storage memory; it is suitable where rapid communication is needed.
·         A knowledge base is used for
·         A mobile database can be carried on or synchronized from a mobile computing device
·         Operational databases consist of routine data generated by normal organizational activities.
·         Probabilistic databases employ fuzzy logic to draw inferences from imprecise data.
·         Real-time databases process transactions fast enough for the result to come back and be acted on right away.
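The in-memory database described above, held in main memory with a backup in storage, can be sketched with Python's built-in sqlite3 module. The table name, values, and file name are hypothetical; `Connection.backup` requires Python 3.7 or later.

```python
import os
import sqlite3
import tempfile

# The working database lives entirely in main memory.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, value REAL)")
mem.executemany("INSERT INTO reading (value) VALUES (?)", [(36.6,), (37.1,)])
mem.commit()

# Copy the in-memory database to a file on disk as a backup.
backup_path = os.path.join(tempfile.mkdtemp(), "backup.db")
disk = sqlite3.connect(backup_path)
mem.backup(disk)
disk.close()
mem.close()

# Reopen the file to confirm that the data survived outside main memory.
restored = sqlite3.connect(backup_path)
n_rows = restored.execute("SELECT COUNT(*) FROM reading").fetchone()[0]
restored.close()
print(n_rows)
```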

3.0 DATABASE MANAGEMENT SYSTEM
3.1 Functions of DBMS:
3.2 Data views provided by DBMS
3.3 Stages of database design
3.4 Data base - others

3.1 Functions of DBMS:
·         Data definition. Defining new data structures for a database, removing data structures from the database, modifying the structure of existing data.
·         Data Update. Inserting, modifying, and deleting data.
·         Data Retrieval. Obtaining information either for end-user queries and reports or for processing by applications.
·         Data Administration. Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information if the system fails
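The four functions listed above can be illustrated with SQLite through Python's built-in sqlite3 module. This is a minimal sketch: the table and its contents are hypothetical. SQLite has no user accounts, so the administration step shows only an integrity check, not user registration or security.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create a structure, then modify it.
conn.execute(
    "CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
conn.execute("ALTER TABLE patient ADD COLUMN sex TEXT")

# Data update: insert, modify, and delete data.
conn.execute("INSERT INTO patient (name, age, sex) VALUES (?, ?, ?)",
             ("A", 30, "F"))
conn.execute("INSERT INTO patient (name, age, sex) VALUES (?, ?, ?)",
             ("B", 45, "M"))
conn.execute("UPDATE patient SET age = 31 WHERE name = 'A'")
conn.execute("DELETE FROM patient WHERE name = 'B'")
conn.commit()

# Data retrieval: query the data for a report.
rows = conn.execute("SELECT name, age, sex FROM patient").fetchall()
print(rows)

# Data administration (partial): ask the DBMS to verify data integrity.
status = conn.execute("PRAGMA integrity_check").fetchone()[0]
print(status)
```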

3.2 Data views provided by DBMS
·         The external level defines how each group of end-users sees the organization of data in the database. A single database can have any number of views at the external level.
·         The conceptual level unifies the various external views into a coherent global view. It provides the synthesis of all the external views. It is out of the scope of the various database end-users, and is rather of interest to database application developers and database administrators.
·         The internal level (or physical level) is the internal organization of data inside a DBMS. It is concerned with cost, performance, scalability and other operational matters. It deals with storage layout of the data, using storage structures such as indexes to enhance performance. Occasionally it stores data of individual views (materialized views), computed from generic data, if performance justification exists for such redundancy. It balances all the external views' performance requirements, possibly conflicting, in an attempt to optimize overall performance across all activities.
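The three levels can be sketched with SQLite through Python's built-in sqlite3 module. In this hypothetical example the base table stands for the conceptual level, two views stand for the external level of two user groups, and an index is a storage structure at the internal level. All names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: one coherent global table.
CREATE TABLE patient (
    id        INTEGER PRIMARY KEY,
    name      TEXT,
    diagnosis TEXT,
    bill      REAL
);

-- External level: each group of end-users sees its own view.
CREATE VIEW clinician_view AS SELECT id, name, diagnosis FROM patient;
CREATE VIEW finance_view   AS SELECT id, bill FROM patient;

-- Internal level: an index to speed up searches by diagnosis.
CREATE INDEX idx_patient_diagnosis ON patient (diagnosis);

INSERT INTO patient VALUES (1, 'A', 'asthma', 120.0);
""")

# Clinicians see names and diagnoses but not billing data.
cursor = conn.execute("SELECT * FROM clinician_view")
clinician_cols = [d[0] for d in cursor.description]
print(clinician_cols)

# Finance staff see billing data but not diagnoses.
finance_rows = conn.execute("SELECT * FROM finance_view").fetchall()
print(finance_rows)
```

Dropping or adding the index changes performance only; neither view nor the queries against it need to change, which is the point of separating the levels.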

3.3 Stages of database design
·         Conceptual design: produce a conceptual data model that reflects the structure of the information to be held in the database, using flow charts.
·         Logical design: translate the conceptual model into a schema expressed in terms of the database model supported by the chosen DBMS.
·         Physical design: address performance requirements and access control.

3.4 Data base - others
·         Database storage: binary encoding.
·         Database security: access control; privileges, e.g. administrator privilege; encryption; forensic audit, i.e. keeping a record of those who access the database.
·         Database migration: (a) moving from one DBMS to another for economic reasons, ease of use, or a special need; this requires portability. (b) The migration must maintain all features of the data.
·         Database building, maintaining, and tuning: design the database; use a DBMS to build it and create the user interface; initialize it, i.e. load the data; then maintain it through changes and tuning for better performance.
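The migration requirement above, that the data must arrive intact, can be sketched with Python's built-in sqlite3 module. Note the assumption: both ends here are SQLite databases, so this only illustrates the dump, reload, and verify pattern. Moving between two different DBMS products usually needs extra translation of data types and SQL dialect. The table and its rows are hypothetical.

```python
import sqlite3

# Source database with some hypothetical records.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE trial (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
)
source.executemany("INSERT INTO trial (title) VALUES (?)",
                   [("Study A",), ("Study B",)])
source.commit()

# Export the schema and the data as a portable SQL script.
dump_sql = "\n".join(source.iterdump())

# Load the script into a fresh target database.
target = sqlite3.connect(":memory:")
target.executescript(dump_sql)

# Verify that the migrated data matches the source.
query = "SELECT id, title FROM trial ORDER BY id"
source_rows = source.execute(query).fetchall()
target_rows = target.execute(query).fetchall()
migrated_ok = source_rows == target_rows
print(migrated_ok)
```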

4.0 MEDICAL DATA BASES
4.1 MEDLINE
4.2 Cochrane Library
4.3 Other medical databases: ClinicalKey by Elsevier, EBSCO, EMBASE
4.4 Others: questionnaires, images, pictures, sounds, videos etc


4.1 MEDLINE
·         Contains medical journal articles: medicine, nursing, pharmacy, dentistry, veterinary medicine, health care, biology, and biochemistry.
·         Compiled by the United States National Library of Medicine (NLM).
·         MEDLINE is freely available on the Internet and searchable via PubMed.

4.2 Cochrane Library
·         Named after Archie Cochrane
·         Is a collection of databases in medicine and other healthcare specialties provided by the Cochrane Collaboration and other organizations.
·         The Cochrane Collaboration is an international network of more than 28,000 dedicated people from over 100 countries. It (a) prepares, updates, and promotes the accessibility of Cochrane Reviews, over 5,000 so far, published online in the Cochrane Database of Systematic Reviews, part of The Cochrane Library; and (b) prepares the largest collection of records of randomized controlled trials in the world, called CENTRAL, also published as part of The Cochrane Library.
·         At its core is the collection of Cochrane Reviews, a database of systematic reviews and meta-analyses which summarize and interpret the results of medical research.