Data mining is the process of extracting useful information from a large set of raw data, or of discovering patterns in large data sets. Data mining is also known as knowledge discovery in data (KDD).
• Automatic summarization of data
• Extracting useful information
• Discovering patterns in raw data
• Relational marketing
• Fraud detection
• Risk evaluation
• Text mining
• Web mining
Data gathering and integration: Once the objectives have been defined, data gathering begins. Because the data comes from different sources, it may require integration. Data integration is the process of combining all the gathered data into a single view.
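To make the idea of a "single view" concrete, here is a minimal sketch in Python. The two sources, the field names and the `integrate` helper are all hypothetical and only serve as an illustration. It assumes every record carries a shared `customer_id` key.

```python
# Hypothetical records gathered from two different sources.
crm_records = [
    {"customer_id": 1, "name": "Ana", "city": "Lisbon"},
    {"customer_id": 2, "name": "Ben", "city": "Oslo"},
]
billing_records = [
    {"customer_id": 1, "total_spent": 120.0},
    {"customer_id": 3, "total_spent": 45.5},
]


def integrate(*sources, key="customer_id"):
    """Combine records from several sources into one view, keyed by `key`."""
    merged = {}
    for source in sources:
        for record in source:
            # Fields from later sources are added to (or overwrite) earlier ones.
            merged.setdefault(record[key], {}).update(record)
    # Return one combined record per key, in key order.
    return [merged[k] for k in sorted(merged)]


single_view = integrate(crm_records, billing_records)
for row in single_view:
    print(row)
```

Customers that appear in only one source end up with missing fields, which is exactly the kind of gap the next phase is meant to surface.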
Exploratory analysis: This is the third phase of the data mining process. In this phase, the integrated data is investigated and its main characteristics are summarized. This helps to identify errors and understand patterns in the data before any assumptions are made.
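As a rough illustration of summarizing the main characteristics of one attribute, the sketch below uses only Python's standard `statistics` module. The `ages` column and the `summarize` helper are made up for this example.

```python
import statistics

# Hypothetical integrated column; None marks a missing value.
ages = [34, 29, 41, None, 38, 250, 31]


def summarize(values):
    """Return simple summary statistics, ignoring missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


summary = summarize(ages)
print(summary)
```

Here the maximum of 250 and the large gap between mean and median point to a probable data entry error, and the missing count shows how incomplete the column is.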
Attribute selection: In this phase, the attributes of the integrated and summarized data are selected. Attributes that are of little use are removed to clean up the dataset. Moreover, new attributes derived from the original ones are added where required.
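The two steps described here, removing attributes of little use and adding derived ones, could look roughly like the sketch below. It assumes that "of little use" means an attribute with the same value in every row, which is only one possible criterion. The dataset and the `debt_ratio` attribute are hypothetical.

```python
# Hypothetical dataset: "country" is identical in every row.
rows = [
    {"country": "NO", "income": 52000, "debt": 13000},
    {"country": "NO", "income": 61000, "debt": 30500},
    {"country": "NO", "income": 45000, "debt": 9000},
]


def drop_constant_attributes(rows):
    """Remove attributes that take a single value across all rows."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]


def add_debt_ratio(rows):
    """Add a derived attribute computed from two original attributes."""
    return [{**r, "debt_ratio": r["debt"] / r["income"]} for r in rows]


selected = add_debt_ratio(drop_constant_attributes(rows))
for row in selected:
    print(row)
```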
Model development and validation: Once a high-quality dataset with the newly added attributes is obtained, models are developed. In this phase the data is split into two subsets: training and testing.
The training set, which is relatively small, is used to identify the learning model, and the testing set is used to assess the accuracy of the model generated from the training set.
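The split between training and testing can be sketched as follows. This is a toy setup, not a recommended model: the data is synthetic, the "model" is a single threshold placed halfway between the two class means, and the 30% training fraction is an assumption chosen to match the "relatively small" training set mentioned above.

```python
import random

# Synthetic, hypothetical data: (feature value, class label).
random.seed(0)
data = [(random.gauss(2.0, 1.0), 0) for _ in range(100)]
data += [(random.gauss(6.0, 1.0), 1) for _ in range(100)]


def split(data, train_fraction, seed=0):
    """Shuffle a copy of the data and cut it into training and testing sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(train):
    """Learn a threshold halfway between the two class means."""
    class0 = [x for x, label in train if label == 0]
    class1 = [x for x, label in train if label == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2


def accuracy(threshold, test):
    """Fraction of test records whose predicted class matches the label."""
    correct = sum((x > threshold) == (label == 1) for x, label in test)
    return correct / len(test)


train, test = split(data, train_fraction=0.3)
threshold = fit_threshold(train)
test_accuracy = accuracy(threshold, test)
print(f"threshold={threshold:.2f} accuracy={test_accuracy:.2%}")
```

The key point is that `accuracy` only ever sees records that `fit_threshold` did not use, so the score reflects how the model behaves on unseen data.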
Prediction and interpretation: This is the final phase of the data mining process, where the developed models are implemented and used to achieve the goals.
The data mining process includes feedback cycles, represented by the dotted arrows in the figure, which indicate a return to a previous phase depending on the outcome of a subsequent phase.
• Efficiency of data mining algorithms
• Relational and complex types of data
• Poor data quality
• Presentation and visualization of mined data
• Interactive mining of knowledge
Hope you found it informative :)