Slide1DATABASES AND DATA WAREHOUSES A Gold Mine of Information
Today, Organizations Need...Today, Organizations Need... lInformation to compete effectively lInformation just to stay alive in the information age lInformation organized in such a way that you can easily and quickly get to it lInformation-processing tools that help you work with information Introduction
YOUR FOCUS IN THIS CHAPTERYOUR FOCUS IN THIS CHAPTER lThe Difference Between Logical and Physical Views of Information lDatabases and Database Management Systems lHow You Can Develop Database Applications lData Warehouses and Data Mining Tools Introduction
THREE THINGS ORGANIZATIONS DO WITH INFORMATIONTHREE THINGS ORGANIZATIONS DO WITH INFORMATION 1.Process information in the form of transactions 2.Use information to make a decision 3.Manage information while it抯 used Information Revisited
PROCESSING INFORMATION IN THE FORM OF TRANSACTIONSPROCESSING INFORMATION IN THE FORM OF TRANSACTIONS lSuch as payroll processing, order processing, and handling your registration requests for classes. lThis is called ONLINE TRANSACTION PROCESSING (OLTP) - the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information. lOperational databases support OLTP. Information Revisited
USING INFORMATION TO MAKE A DECISIONUSING INFORMATION TO MAKE A DECISION lFor answering such questions as, 揌ow many senior-level marketing majors have not taken statistics? lThis is called ONLINE ANALYTICAL PROCESSING (OLAP) - the manipulation of information to support decision making. lData warehouses support OLAP. Information Revisited
MANAGING INFORMATION WHILE IT扴 USEDMANAGING INFORMATION WHILE IT扴 USED lDetermining who can view or use information lSpecifying how to back up information lIdentifying what storage technologies to use Information Revisited Most importantly, managing information includes organizing it so that people can logically use it without having to know anything about its physical structure. The difference between logical and physical is key.
Slide9lIn managing information, physical deals with the structure of information as it resides on various storage media. lLogical deals with how knowledge workers view their information needs, and includes such terms as: –CHARACTER - our smallest unit of information. –FIELD - group of related characters. –RECORD - group of related fields. –FILE - group of related records. –DATABASE - group of logically associated files. –DATA WAREHOUSE - information from many databases. Information Revisited
DATABASEDATABASE A database is actually composed of two parts: 1. the information itself –the files that are logically associated 2. the logical structure of the information –called the data dictionary Databases a collection of information that you organize and access according to the logical structure of that information.
A Database Is a Collection of InformationA Database Is a Collection of Information lMost databases contain two or more files with related information. lThe Inventory database (Figure 4.4, page 125) contains two files - Part and Facility. lThese two files are logically related because parts are stored in facilities and because you would use both of these files to manage your inventory. Databases
A Database Contains a Logical StructureA Database Contains a Logical Structure lYou organize and access a database by its logical structure, not its physical position. lDATA DICTIONARY - contains the logical structure of information in a database. lThe data dictionary contains the logical properties that describe information in a database. lSee Figure 4.5 (page 126) for the data dictionary of the Percentage Markup field in the Inventory database. Databases
A Database Has Logical Ties Among the InformationA Database Has Logical Ties Among the Information lA PRIMARY KEY is a field in a database file that uniquely describes each record. lA FOREIGN KEY is a primary key of one file that also appears in another file. So, foreign keys specify how files are logically related. lFor example, the Part and Facility files are logically related. So, in Figure 4.4 you can see that Facility Number (the primary key for the Facility file) exists in the Part file (where it抯 a foreign key). Databases
A Database Contains Built-in Integrity ConstraintsA Database Contains Built-in Integrity Constraints lAn INTEGRITY CONSTRAINT is a rule that helps assure the quality of the information in a database. lA registration database at your school includes integrity constraints concerning prerequisites for certain classes. lOur Inventory database includes an integrity constraint that says a part in the Part file cannot be assigned to a facility that does not exist in the Facility file. Databases
DATABASE MANAGEMENT SYSTEM (DBMS)DATABASE MANAGEMENT SYSTEM (DBMS) A DBMS contains 5 software components: 1. DBMS engine 2. Data definition subsystem 3. Data manipulation subsystem 4. Application generation subsystem 5. Data administration subsystem Database Management Systems the software you use to specify the logical organization for a database and access it.
DBMS ENGINEDBMS ENGINE Recall that: lPHYSICAL VIEW deals with how information is physically arranged, stored, and accessed on some type of secondary storage device. lLOGICAL VIEW focuses on how you need to arrange and access information to meet your particular business needs. DBMSs accepts logical requests from the various other DBMS subsystems, converts them to their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device.
DATA DEFINITION SUBSYSTEMDATA DEFINITION SUBSYSTEM lYou use this subsystem to define the information logical structure when you first create a database. lOnce you抳e created a database, you use this subsystem to define new fields, delete fields, or change field properties. lFigure 4.5 (page 126) contains this subsystem screen for the Part file. DBMSs helps you create and maintain the data dictionary and define the structure of the files in a database.
DATA MANIPULATION SUBSYSTEMDATA MANIPULATION SUBSYSTEM lThis subsystem is most often the primary interface between you as a user and the information contained in a database. lTools in this subsystem include views, report generators, query-by-example tools, and structured query language. DBMSs helps you add, change, and delete information in a database and mine it for valuable information.
DATA MANIPULATION TOOLSDATA MANIPULATION TOOLS lVIEW - allows you to see the content of a database file, make whatever changes you want, perform simple sorting, and query to find the location of specific information. See Figure 4.7 page 129. lREPORT GENERATOR - helps you quickly define formats of reports and what information you want to see in a report. See Figures 4.8 and 4.9 page 130. DBMSs
DATA MANIPULATION TOOLSDATA MANIPULATION TOOLS lQUERY-BY-EXAMPLE (QBE) TOOL - helps you graphically design the answer to a question. Figure 4.10 (page 130) shows the QBE for displaying the names and phone numbers of facility managers in charge of parts that cost more than $10. lSTRUCTURED QUERY LANGUAGE (SQL) - a standardized fourth-generation language found in most database environments. SQL is the same as QBE, except that you perform a query by creating a statement instead of pointing, clicking, dragging. DBMSs
APPLICATION GENERATION SUBSYSTEMAPPLICATION GENERATION SUBSYSTEM lTools for creating data entry screens (See Figure 4.12 page 131 for an example) lProgramming languages specific to a particular DBMS lInterfaces to commonly used programming languages that are independent of any DBMS. DBMSs contains facilities to help you develop transaction-intensive applications. This subsystem includes:
DATA ADMINISTRATION SUBSYSTEMDATA ADMINISTRATION SUBSYSTEM lBackup and recovery lSecurity management lQuery optimization lReorganization lConcurrency control lChange management DBMSs helps you manage the overall database environment by providing facilities for:
THE RELATIONAL DATABASE MODELTHE RELATIONAL DATABASE MODEL lThis is the most popular model. lEach table is called a RELATION. lA relation contains information about a particular ENTITY CLASS (a concept - people, places, or things - about which you wish to store information and that you can identify with a unique key). Database Models a database model that uses a series of two- dimensional tables or files to store information.
Slide24lFigure 4.14 (page 136) shows a relational database for a video rental store. lThe entity classes are Customer, Video, Video Rental, and Distributor. lNotice how these tables are related to each other through the use of foreign keys. lIn the Video Rental relation, you抣l find a primary key that uses more than one one field to create a unique description. This is called a COMPOSITE PRIMARY KEY. lA primary key that uses only one field is called an ATOMIC PRIMARY KEY. Database Models
THE OBJECT-ORIENTED (O-O) DATABASE MODELTHE OBJECT-ORIENTED (O-O) DATABASE MODEL lAn OBJECT-ORIENTED DATABASE MANAGEMENT SYSTEM (O-O DBMS) is the DBMS software that allows you to develop and work with an O-O database. Database Models a database model that brings together, stores, and allows you to work with both information and procedures that act on the information.
Slide26lThis model takes advantage of the concept of an OBJECT - a software module containing information that describes an entity class along with a list of procedures that can act on the information describing the entity class. lFigure 4.15 (page 138) shows the same video rental store using the O-O database model. lNotice that the objects (entity classes) - which include Customer, Video Rental, Video, and Distributor - contain both information and procedures for working with that information. lSee Appendix C for more on objects. Database Models
DEVELOPING YOUR OWN DATABASEDEVELOPING YOUR OWN DATABASE lBeing able to develop your own database is a part of knowledge worker computing. lBuilding a database for your personal needs includes the following 4 steps: 1. Defining entity classes and primary keys 2. Defining relationships among entity classes 3. Defining information (fields) for each relation 4. Using a data definition language to create the database lFollow along as we build the database to support the report in Figure 4.16 on page 140. Developing Databases
Slide28lYou own a small business and are interested in tracking employees by the department in which they work, job assignment, and the number of hours assigned. lEach of your employees can be assigned to only one department, but a department may have many employees (a department, however, may not have any employees assigned to it). Each employee can be assigned to any number of jobs and a job can have many employees assigned to it, but it抯 not necessary that any employees be assigned to a certain job.
#1 - DEFINING ENTITY CLASSES AND PRIMARY KEYS#1 - DEFINING ENTITY CLASSES AND PRIMARY KEYS lFrom the report in Figure 4.16, you can identify the entity classes as Employee, Department, and Job. lNow, for each entity class, you must define a primary key that provides a unique description. These include: •Employee entity class - Emp ID (e.g., 2345 for Smith) •Department entity class - Dept (e.g., 15) •Job entity class - Job (e.g., 14 for Acct) Developing Databases
#2 - DEFINING RELATIONSHIPS AMONG ENTITY CLASSES#2 - DEFINING RELATIONSHIPS AMONG ENTITY CLASSES lFor this step, use an ENTITY-RELATIONSHIP (E-R) DIAGRAM, a graphical method of representing entity classes and their relationships. lSee Figure 4.17 (page 140) for the initial E-R diagram of our database and a listing of E-R diagram symbols. Developing Databases
Slide31lAn Employee must be assigned to a Department. lAn Employee cannot be assigned to more than one Department. lA Department may have many Employees assigned to it. lA Department is not required to have any Employees assigned to it. Developing Databases EMPLOYEE DEPARTMENT M:1
Slide32lAfter building the initial E-R diagram, you must follow the process of normalization. lNORMALIZATION is a process of assuring that a relational database structure can be implemented as a series of two-dimensional tables. lNormalization includes the following 3 steps: 1.Eliminate repeating groups or M:M relationships 2.Assure that each field in a relation depends only on the primary key of that relation 3.Remove all derived fields from the relations. Developing Databases
Slide33lThe first rule of normalization states that no M:M relationships can exist. lThere is an M:M between Employee and Job. lYou eliminate this by creating an INTERSECTION RELATION - a relation you create to eliminate a repeating group. lAn intersection relation will have a composite primary key that consists of the primary key fields from the two intersecting relations. lIn Figure 4.18 (page 142), we created an intersection relation called Employee-Job to eliminate the M:M relationship. Developing Databases
#3 - DEFINING INFORMATION (FIELDS) FOR EACH RELATION#3 - DEFINING INFORMATION (FIELDS) FOR EACH RELATION lIn this step, you follow rules #2 and #3 of normalization. lYour goal here is two-fold: 1.Make sure that the information in each relation is indeed in the correct relation 2.Make sure that the information cannot be derived from other information. Developing Databases
Slide35lTo determine if information is in the correct relation, ask: 揇oes this piece of information depend only on the primary key for this relation? lIf the answer is yes, the information is in the correct relation. lIn the Employee relation (Figure 4.20 page 144), we currently store Dept Sup. Does Dept Sup depend on Emp ID? lThe answer is no - Dept Sup depends on Dept, so it should be in the Department relation. Developing Databases
Slide36lDerived information - information that can be mathematically determined from other information - should not be stored in your database. lFor example, # Emp is a field in the Department relation. lHowever, we can simply count the number of occurrences of each Dept in the Employee relation and determine the number of employees. lSo, we remove # Emp from the database. Developing Databases
#4 - USING A DATA DEFINITION LANGUAGE TO CREATE THE DATABASE#4 - USING A DATA DEFINITION LANGUAGE TO CREATE THE DATABASE lThe final step is to actually create the relations you identified in steps 1-3. lYou do this with a data definition language. lThis step includes: –Developing a data dictionary –Defining the various relations –Defining primary keys and relationships Developing Databases
DATA WAREHOUSEDATA WAREHOUSE lare a logical extension of databases lsupport OLAP lare among the newest and hottest buzz words and concepts in the IT field. Data Warehouses a logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks. Data warehouses...
DATA WAREHOUSE FEATURESDATA WAREHOUSE FEATURES lData warehouses combine information from different databases –Making them a true repository of all an organization抯 information lData warehouses are multi-dimensional –As opposed to 2 dimensions in the relational model –Often called hypercubes (See Figure 4.23 page 148) lData warehouses support decision making –While databases support OLTP, data warehouses support OLAP Data Warehouses
DATA MINING TOOLSDATA MINING TOOLS lQUERY-AND-REPORTING TOOLS - QBE tools, SQL, and report generators. lINTELLIGENT AGENTS - various artificial intelligence tools that form the basis for 搃nformation discovery?in OLAP. lMULTIDIMENSIONAL ANALYSIS (MDA) TOOLS - slice-and-dice techniques that allow you to view multidimensional information from different perspectives. Data Warehouses the software tools you use to query information in a data warehouse.
IMPORTANT CONSIDERATIONS IN USING A DATA WAREHOUSEIMPORTANT CONSIDERATIONS IN USING A DATA WAREHOUSE lDo you need a data warehouse? lDo you already have a data warehouse? lWho will the users be? lHow up-to-date must the information be? lWhat data mining tools do you need? Data Warehouses
Slide44lHow will changes in technology affect organizing and managing information lWhat types of database model and databases are most appropriate
MANAGING THE INFORMATION RESOURCEMANAGING THE INFORMATION RESOURCE lHow will changes in technology affect organizing and managing information? lWhat types of database models and databases are most appropriate? lWho should oversee the organization抯 information? Managing Information
OVERSEEING YOUR ORGANIZATION扴 INFORMATIONOVERSEEING YOUR ORGANIZATION扴 INFORMATION lCHIEF INFORMATION OFFICER (CIO) is the IT manager who directs all IT systems and personnel while communicating directly with the highest levels of the organization. lDATA ADMINISTRATION plans for, oversees the development of, and monitors the information resource. lDATABASE ADMINISTRATION is responsible for the more technical and operational aspects of managing information in databases. Managing Information
MANAGING THE INFORMATION RESOURCEMANAGING THE INFORMATION RESOURCE lIs information ownership a consideration? lWhat are the ethics involved in organizing and managing information? lHow should databases and database applications be developed and maintained? Managing Information
TO SUMMARIZETO SUMMARIZE lHow we view information: –The physical view of information deals with how information is physically arranged, stored, and accessed on some type of secondary storage device. –The logical view of information focuses on how you need to arrange and access information to meet your particular business needs. lA database is a collection of information that you organize and access according to the logical structure of that information. lThe data dictionary contains the logical structure of information in a database.
TO SUMMARIZETO SUMMARIZE lA database management system is the software you use to specify the logical organization for a database and access it. lPopular database models include the relational model and the object-oriented model. lThe four steps of developing a personal database application include: 1. Define entity classes and primary keys 2. Define relationships among entity classes 3. Define information (fields) for each relation 4. Use a data definition language to create the database
TO SUMMARIZETO SUMMARIZE lData warehouses are a logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks. lData mining tools - the software tools you use to query information in a data warehouse - include query-and-reporting tools, intelligent agents, and multidimensional analysis (MDA) tools.