CSUSM MIS 304 - Business Uses of Data Mining and Data Warehousing - D2966301

School name California State University San Marcos

Course MIS 304 - Principles of Management Information Systems

Pages 23

Business Uses of Data Mining and Data Warehousing
MIS 304 Section 04 CRN-41595
Group 4: Kirk Bishop, Joe Draskovich, Amber Hottenroth, Brandon Lee, Stephen Pesavento
Submitted: 12/9/2008

Introduction- Data mining is the process of collecting and sorting through massive amounts of data to find useful information that a business can act on. The following pages explain data mining and data warehousing and expand on the many uses of both. The paper also presents the history of data mining, describes several of the most popular software products used to collect and sort data, and addresses current issues and trends in data mining. It closes with a conclusion and analysis of data mining.

Objectives- The primary objective of this paper is to explore the uses of data mining and data warehousing in business applications. In exploring these two related topics we define data mining and data warehousing, weigh their advantages and disadvantages, and show how data is collected. We also explain why data mining is useful and necessary for business and describe the processes by which a business converts raw data into useful information. In all, data mining and data warehousing will be defined, their advantages and disadvantages explained, and a broad overview given of how data mining is used in business applications.

Scope of Project- Data mining is the practice of searching through large amounts of computerized data to find useful patterns or trends (Merriam-Webster). A consortium of data mining practitioners developed the Cross-Industry Standard Process for Data Mining (CRISP-DM). The group comprised representatives from DaimlerChrysler, SPSS Inc., and NCR Corporation. Their model consists of six major phases of data mining.
The first of these phases is understanding the business. The researcher undertaking this step needs to focus only on the salient components of the project.

Understanding the data is the second crucial step in the process. The researcher should ask the following questions about the data gathered: What is the nature of the data? What is the quality of the data? Is there missing or miscoded data? Does the data relate directly to the goals set in the prior step? To mine data effectively, the data must be sufficient. If data is missing, of low quality, or inaccurate, the process will not reveal or predict patterns accurately. In this step the researcher needs to either accept or reject the data as a whole before moving on to data preparation.

In the data preparation step the researcher has the opportunity to discard data that will not be used. The analyst also needs to decide which method to use for handling missing data. In this step the data must also be organized so that the computer can analyze it as efficiently as possible.

The fourth step is modeling the data that has been gathered and organized. Many techniques are available for this step. The best method depends on the application, the data, and the amount of work the analyst wants to put in. A good model clearly summarizes patterns and relationships.

Evaluation is the next phase in CRISP-DM. Here the analyst determines whether the model is sufficient. Sufficiency is measured against a technical standard as well as by suitability to the business problem. Technical aspects of the project that are found to need enhancement are improved in this step. To determine whether the collected data and model meet organizational requirements, an analyst may use graphical diagnostics.

The last stage is deployment.
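
The six phases described above can be written down as an ordered list with two checkpoints. This is only a minimal sketch: the names `CrispDmPhase` and `next_phase` are hypothetical, and the rule that a failed check keeps the project in its current phase is an assumption made for illustration, not something CRISP-DM prescribes.

```python
from enum import IntEnum


class CrispDmPhase(IntEnum):
    """The six CRISP-DM phases, numbered in the order the paper lists them."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current, data_accepted=True, model_sufficient=True):
    """Return the phase to work on next, or None after deployment.

    Assumption for illustration: if the data is rejected or the model is
    judged insufficient, the project stays in the current phase.
    """
    if current is CrispDmPhase.DATA_UNDERSTANDING and not data_accepted:
        return current  # data not yet accepted as a whole
    if current is CrispDmPhase.EVALUATION and not model_sufficient:
        return current  # model still needs enhancement
    if current is CrispDmPhase.DEPLOYMENT:
        return None  # last stage, nothing follows
    return CrispDmPhase(current + 1)


if __name__ == "__main__":
    phase = CrispDmPhase.BUSINESS_UNDERSTANDING
    while phase is not None:
        print(phase.value, phase.name)
        phase = next_phase(phase)
```

Run as a script, the loop prints the phases in order. Passing `model_sufficient=False` at the evaluation phase returns the evaluation phase again, which mirrors the point that enhancement happens before deployment.
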
At deployment, research findings and models should accurately guide business decisions if data mining is to be useful to management. Models can be implemented, via computer programs, for product customization and personalization, customer acquisition and retention, credit scoring, and pricing, among many other applications.

For data mining to succeed, information must first be collected. This is a crucial aspect of data mining and can prove to be a huge undertaking. Fortunately, many companies have been gathering this data since the birth of data mining techniques. An analyst can gather data from internal resources as well as from secondary information accrued by others. When the proper amount of data is collected, CRISP-DM goes into effect. (Data and Text Mining, Miller)

An enormous database of information gives rise to a system highly susceptible to noisy, missing, and inconsistent data. Data mining techniques are not confined to analytics; they are also used to preprocess the data in order to create a more fluent, efficient system. These processes are called data cleaning, data integration, data transformation, and data reduction.

Data cleaning helps to fill in missing values, smooth noisy data, identify and remove outliers (data so skewed that it falls outside the expected boundaries), and resolve inconsistencies. Missing values can be filled in manually, but this approach is very time consuming. A global constant (e.g., 'unknown') can be used to fill in the missing data, but this usually makes the data misleading, because the mining system may find a spurious pattern among all the records that share the constant. A global constant is therefore not recommended even though it is a simple fix. Another fix is to take the average of the observed values for an attribute and replace all of its missing values with that average.

Data integration is the process of taking data from multiple sources and creating a single database with a standard protocol.
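
The two automatic fixes for missing values described above, a global constant and the attribute average, can be compared in a short sketch. The function names and the sample figures are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean


def fill_with_constant(values, constant="unknown"):
    """Replace each missing value (None) with one global constant."""
    return [constant if v is None else v for v in values]


def fill_with_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot compute a mean: every value is missing")
    average = mean(observed)
    return [average if v is None else v for v in values]


# Hypothetical monthly spend figures with two missing entries.
monthly_spend = [120.0, None, 80.0, None, 100.0]

constant_filled = fill_with_constant(monthly_spend)
mean_filled = fill_with_mean(monthly_spend)

if __name__ == "__main__":
    print(constant_filled)
    print(mean_filled)
```

After the constant fill, both missing records carry the identical value 'unknown', which is exactly the shared feature a mining algorithm could mistake for a pattern. The mean fill keeps the column numeric, although it also makes the filled records look more typical than they may really be.
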
Entity identification problems can occur during data integration, since different databases can use different references and categorizations for the same data entity. For example, customer_ID in one database and customer number in another can refer to the same entity, but the match may not be detected during integration.

Transformation puts data into a format that is more advantageous to the user. Data transformation can include aggregation. For example: taking monthly sales and
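
The entity identification problem described above (customer_ID versus customer number) could be handled with a lookup that maps each source's column name onto one canonical name before the records are combined. This is a rough sketch only: the alias table, the function names, and the sample records are all hypothetical, and real integration tools offer far richer matching.

```python
# Hypothetical alias table: normalized source column name -> canonical name.
CANONICAL_COLUMNS = {
    "customer id": "customer_id",
    "customer number": "customer_id",
}


def canonical_name(column):
    """Map a source column name to its canonical name, if an alias is known."""
    # Normalize case, underscores, and extra spaces before the lookup.
    key = " ".join(column.replace("_", " ").lower().split())
    return CANONICAL_COLUMNS.get(key, column)


def integrate(*sources):
    """Combine records from several sources under canonical column names."""
    combined = []
    for records in sources:
        for record in records:
            combined.append({canonical_name(k): v for k, v in record.items()})
    return combined


# Two hypothetical sources that name the same entity differently.
billing = [{"customer_ID": 17, "balance": 250.0}]
support = [{"Customer Number": 17, "tickets": 3}]

merged = integrate(billing, support)

if __name__ == "__main__":
    for row in merged:
        print(row)
```

Without the alias table the two records would keep separate key columns and the shared customer would go undetected. Columns with no known alias pass through unchanged.
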
