Presented at a workshop on evidence-based decision making organized by the Ministry of Health, Kingdom of Saudi Arabia, Riyadh, 24-26 April 2010, by Professor Omar Hasan Kasule, MB ChB (MUK), MPH (Harvard), DrPH (Harvard), Professor of Epidemiology and Bioethics, Faculty of Medicine, King Fahd Medical College
1.0 INTRODUCTION
Data gives rise to information, which in turn gives rise to knowledge. Knowledge leads to understanding. Understanding leads to wisdom. A document is stored data in any form: paper, book, letter, message, image, e-mail, voice, and sound. Some documents are ephemeral but can still be retrieved during the brief time that they exist. Data for public health decisions is
2.0 DATA SOURCES
Documents of medical importance are usually journal articles, books, technical reports, or theses. The sources of on-line documents are MEDLINE/PubMed, on-line journals, on-line books, on-line technical reports, and on-line theses and dissertations.
3.0 DATA RETRIEVAL
Retrieval technology for formatted character documents is now quite sophisticated. It uses matching, mapping, or Boolean logic (AND, OR, NOT). In matching, the most common form of retrieval, the important or significant terms or expressions are determined first, and the query is then matched against the documents to find the one being sought. The search can be limited by subject matter, language, type of publication, and year of publication.
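The Boolean operators and search limits described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four documents, their years and languages, and the helper names build_index, term, and limit are all invented for the example and do not describe how any particular database works.

```python
# Hypothetical mini-collection; texts, years, and languages are invented.
DOCUMENTS = {
    1: {"text": "malaria vaccine trial in children", "year": 2008, "language": "en"},
    2: {"text": "malaria vector control with bed nets", "year": 2005, "language": "en"},
    3: {"text": "diabetes screening in adults", "year": 2009, "language": "en"},
    4: {"text": "vaccine safety monitoring in adults", "year": 2009, "language": "fr"},
}


def build_index(documents):
    """Map each term to the set of document ids whose text contains it."""
    index = {}
    for doc_id, doc in documents.items():
        for word in doc["text"].lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCUMENTS)


def term(word):
    """Return the ids of documents that contain the given word."""
    return INDEX.get(word.lower(), set())


def limit(doc_ids, language=None, min_year=None):
    """Restrict a result set by language and/or earliest year of publication."""
    result = set()
    for doc_id in doc_ids:
        doc = DOCUMENTS[doc_id]
        if language is not None and doc["language"] != language:
            continue
        if min_year is not None and doc["year"] < min_year:
            continue
        result.add(doc_id)
    return result


# Boolean logic expressed as set operations on the matching document ids.
malaria_and_vaccine = term("malaria") & term("vaccine")   # AND
malaria_or_vaccine = term("malaria") | term("vaccine")    # OR
vaccine_not_malaria = term("vaccine") - term("malaria")   # NOT

# Limiting a search by language.
adults_in_english = limit(term("adults"), language="en")
```

Representing each term as a set of document ids makes AND, OR, and NOT map directly onto set intersection, union, and difference.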
Document surrogates used in data retrieval are: identifiers, abstracts, extracts, reviews, indexes, and queries. Queries are short documents used to retrieve larger documents by matching, mapping, or Boolean logic (AND, OR, NOT). Queries may be in natural or probabilistic language. Fuzzy queries are deliberately not rigid, to increase the probability of retrieval.
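A fuzzy query can be illustrated with approximate string matching. The sketch below uses get_close_matches from Python's standard difflib module as one possible way to tolerate a misspelled query term; the vocabulary, the cutoff value of 0.8, and the function name fuzzy_match are assumptions chosen for the example.

```python
from difflib import get_close_matches

# Hypothetical index vocabulary (invented for illustration).
VOCABULARY = ["diabetes", "dialysis", "malaria", "hypertension"]


def fuzzy_match(query_term, vocabulary, cutoff=0.8):
    """Return vocabulary terms that are similar to the query term.

    A lower cutoff makes the query less rigid: more terms match,
    at the cost of more irrelevant matches.
    """
    return get_close_matches(query_term.lower(), vocabulary, n=3, cutoff=cutoff)


# A misspelled query still finds the intended index term.
matches = fuzzy_match("diabetis", VOCABULARY)
```

Lowering the cutoff increases the probability of retrieving something, which is the trade-off a fuzzy query deliberately accepts.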
Other aids to data retrieval are term extraction (based on low frequency of important terms), term association (based on terms that normally occur together), lexical measures (using specialized formulas), trigger phrases (like figure, table, conclusion), and the handling of synonyms (same meaning), antonyms (opposite meaning), homographs (same spelling but different meaning), and homophones (same sound but different spelling). Stemming algorithms help in retrieval by removing the endings of words, leaving only the roots. Specialized mathematical techniques are used to assess the effectiveness of data retrieval.
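Stemming and the assessment of retrieval effectiveness can be sketched together. The stemmer below is a deliberately naive suffix stripper, not a published algorithm, and precision and recall are used here as two common effectiveness measures; the text does not name specific measures, so that choice is an assumption. The documents and the relevance judgments are invented.

```python
# Suffixes are checked in this order, so longer endings are tried first.
SUFFIXES = ("ings", "ing", "ed", "es", "s")

# Hypothetical documents and relevance judgments (invented for illustration).
DOCUMENTS = {
    1: "screening for diabetes",
    2: "patients screened for hypertension",
    3: "malaria treatment",
    4: "window screens reduce mosquito entry",
}
RELEVANT = {1, 2}  # documents a reader would judge relevant to "screening"


def stem(word):
    """Strip one common ending, keeping a root of at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def retrieve(query, use_stemming):
    """Return ids of documents containing the query word (or its stem)."""
    normalize = stem if use_stemming else str.lower
    target = normalize(query)
    return {
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if target in {normalize(word) for word in text.split()}
    }


def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


exact = retrieve("screening", use_stemming=False)
stemmed = retrieve("screening", use_stemming=True)
```

In this toy collection, stemming retrieves every relevant document but also brings in an irrelevant one (document 4), so recall rises while precision falls.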
4.0 DATA WAREHOUSING
Data warehousing is a method of extracting data from various sources and storing it as historical, integrated data for use in decision-support systems. Metadata is the term used for the definition of data stored in the data warehouse (i.e. data about data). A data model is a graphic representation of the data as either diagrams or charts. The data model reflects the essential features of an organization. The purpose of a data model is to facilitate communication between the analyst and the user. It also helps create a logical discipline in database design.
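As a minimal sketch of these ideas, the example below extracts records from two hypothetical source systems, stores them in one integrated table stamped with a load date (so history is preserved), and keeps a small metadata table describing each column. It uses Python's standard sqlite3 module with an in-memory database; the table names, column names, and figures are all invented for illustration.

```python
import sqlite3

# Two hypothetical source systems (invented records: region, month, count).
CLINIC_VISITS = [("Region A", "2010-01", 120), ("Region B", "2010-01", 95)]
LAB_REPORTS = [("Region A", "2010-01", 40), ("Region B", "2010-01", 31)]

# Metadata: data about the data held in the warehouse.
COLUMN_DESCRIPTIONS = [
    ("source", "System the record was extracted from"),
    ("region", "Health region reporting the events"),
    ("month", "Reporting month, formatted YYYY-MM"),
    ("n_events", "Number of events reported"),
    ("load_date", "Date the record was loaded into the warehouse"),
]


def build_warehouse(load_date):
    """Extract from both sources and store integrated, dated records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_events "
        "(source TEXT, region TEXT, month TEXT, n_events INTEGER, load_date TEXT)"
    )
    conn.execute("CREATE TABLE metadata (column_name TEXT, description TEXT)")
    conn.executemany("INSERT INTO metadata VALUES (?, ?)", COLUMN_DESCRIPTIONS)
    for source, rows in (("clinic", CLINIC_VISITS), ("lab", LAB_REPORTS)):
        conn.executemany(
            "INSERT INTO fact_events VALUES (?, ?, ?, ?, ?)",
            [(source, region, month, n, load_date) for region, month, n in rows],
        )
    conn.commit()
    return conn


def totals_by_region(conn):
    """A decision-support style query across the integrated sources."""
    cursor = conn.execute(
        "SELECT region, SUM(n_events) FROM fact_events "
        "GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


warehouse = build_warehouse("2010-04-24")
totals = totals_by_region(warehouse)
```

Because every row carries its source and load date, later loads can be added without overwriting earlier ones, which is what makes the store historical as well as integrated.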
5.0 DATA MINING
Data mining is the discovery part of knowledge discovery in data (KDD) and involves knowledge engineering, classification, and problem solving. KDD starts with selection, cleaning, enrichment, and coding of the data. The products of data mining are recognized patterns. These patterns are then applied to new situations for prediction and profiling. Artificial intelligence (AI), based on machine learning, gives computers some creativity and decision-making capabilities using specific algorithms.
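The KDD steps named above (selection, cleaning, enrichment, coding, then pattern discovery and prediction) can be sketched as a toy pipeline. Everything here is an assumption made for illustration: the records, the district lookup, the age cut-off of 60, and the very simple "most frequent outcome per profile" rule, which stands in for a real mining algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical raw records (invented for illustration).
RAW_RECORDS = [
    {"id": 1, "age": 67, "smoker": "yes", "district": "D1", "outcome": "hypertension"},
    {"id": 2, "age": 71, "smoker": "yes", "district": "D2", "outcome": "hypertension"},
    {"id": 3, "age": 34, "smoker": "no", "district": "D1", "outcome": "healthy"},
    {"id": 4, "age": None, "smoker": "no", "district": "D2", "outcome": "healthy"},
    {"id": 5, "age": 29, "smoker": "no", "district": "D2", "outcome": "healthy"},
    {"id": 6, "age": 65, "smoker": "yes", "district": "D1", "outcome": "hypertension"},
]
# Hypothetical external lookup used for enrichment.
DISTRICT_SETTING = {"D1": "urban", "D2": "rural"}
FIELDS = ("age", "smoker", "district", "outcome")


def profile_of(record):
    """Enrichment and coding: add the setting, then code the record as a profile."""
    setting = DISTRICT_SETTING.get(record["district"], "unknown")  # enrichment
    age_group = "60+" if record["age"] >= 60 else "<60"            # coding
    return (age_group, record["smoker"], setting)


def prepare(records):
    """Selection, cleaning, enrichment, and coding of the raw records."""
    prepared = []
    for record in records:
        selected = {field: record.get(field) for field in FIELDS}  # selection
        if any(value is None for value in selected.values()):      # cleaning
            continue
        prepared.append((profile_of(selected), selected["outcome"]))
    return prepared


def mine(prepared):
    """Discover a pattern: the most frequent outcome for each profile."""
    outcomes = defaultdict(Counter)
    for profile, outcome in prepared:
        outcomes[profile][outcome] += 1
    return {p: counts.most_common(1)[0][0] for p, counts in outcomes.items()}


def predict(patterns, new_record):
    """Apply the discovered patterns to a new case; None if the profile is unseen."""
    return patterns.get(profile_of(new_record))


patterns = mine(prepare(RAW_RECORDS))
prediction = predict(patterns, {"age": 70, "smoker": "yes", "district": "D1"})
```

The record with a missing age is dropped during cleaning, and a new case is predicted only when its coded profile matches one seen during mining.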