Data Refining is a process that refines disparate data within a common context to increase the awareness and understanding of the data, remove data variability and redundancy, and develop an integrated data resource. Disparate data are the raw material and an integrated data resource is the final product.
The data refining process may consist of many different subprocesses, depending on the database or data warehousing implementation. It is one of the most important aspects of data warehousing, because unrefined data can severely distort the final statistical output of the data warehouse, which is then used by a company’s business intelligence.
In a data warehouse, there is a collective process called Extract, Transform, and Load (ETL). Data extraction is the process of gathering data from various data sources. The data is then transformed to fit business needs. Finally, once the data conforms to the business rules and the data architecture framework, it is loaded into the data warehouse.
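The three ETL steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two in-memory sources, the business rules (names are trimmed and title-cased, amounts must be numeric, ids must be unique), and the `sales` table. A real warehouse would read from external systems and apply its own rules.

```python
import sqlite3

# Hypothetical raw records from two disparate sources (illustrative data only).
SOURCE_A = [{"id": "1", "name": " alice ", "amount": "10.50"}]
SOURCE_B = [
    {"id": "2", "name": "BOB", "amount": "n/a"},       # breaks the numeric rule
    {"id": "1", "name": "Alice", "amount": "10.50"},   # duplicate of id 1
    {"id": "3", "name": "carol", "amount": "7"},
]

def extract():
    """Gather rows from every source into one list."""
    return SOURCE_A + SOURCE_B

def transform(rows):
    """Apply the assumed business rules and return clean tuples."""
    seen_ids = set()
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount is not numeric
        record_id = int(row["id"])
        if record_id in seen_ids:
            continue  # drop redundant records
        seen_ids.add(record_id)
        clean.append((record_id, row["name"].strip().title(), amount))
    return clean

def load(rows, conn):
    """Write the refined rows into the (assumed) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
loaded = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```

Of the four extracted rows, only two survive the transform step: the non-numeric amount and the duplicate id are filtered out before anything reaches the warehouse table.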
Data refining does not apply to just one aspect of a data warehouse implementation. It applies to many stages, from planning and data modeling to the final integration of systems in the data warehouse and the functioning of the entire business intelligence system.
Beginning with data modeling, data refining occurs during conceptual schema development, when the semantics of the organization are described. All abstract entity classes and relationships are identified, and care is taken that the entities are based on real-life events and activities of the company. Here, data refining works by eliminating things that are not of interest. The same holds during logical schema development, where the tables and columns, XML tags, and object-oriented classes are described, and data refining makes sure that the structures that hold the data are well defined.
An Entity-Relationship Model (ERM) is a data modeling technique that defines a representation of structured data, and data refining is very important to it. It is used at the stage of information system design where, during requirements analysis, models describe the information needs or the type of information to be stored in a database. Refining makes sure that data are not redundant and that relationship integrity is maintained, so that insert, delete, and update operations can be managed easily without broken integrity degrading the final data quality. In this respect, data is refined by making sure that all relationships between entities and their corresponding attributes are consistent and accurate.
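One common way to keep relationship integrity intact during inserts and deletes is a foreign key constraint. The sketch below uses Python's built-in `sqlite3` module; the `customer` and `purchase` tables are hypothetical and only stand in for two related entities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE purchase (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(id),
           total REAL NOT NULL)"""
)
conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.execute("INSERT INTO purchase VALUES (1, 1, 25.0)")

def try_sql(statement):
    """Run a statement and report whether the integrity rules allowed it."""
    try:
        conn.execute(statement)
        return True
    except sqlite3.IntegrityError:
        return False

# A purchase that points at a customer who does not exist is rejected.
orphan_insert_ok = try_sql("INSERT INTO purchase VALUES (2, 99, 5.0)")
# A customer who still has purchases cannot be deleted.
parent_delete_ok = try_sql("DELETE FROM customer WHERE id = 1")
print(orphan_insert_ok, parent_delete_ok)
```

Both statements are refused by the database itself, so the relationship between the two entities cannot be broken by an application that forgets to check.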
Data refining also takes place during database normalization, a technique used in designing relational database tables so that duplication of information is minimized. As a result, the database is safeguarded against certain types of logical inconsistency.
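As a small illustration of how normalization removes duplication, the sketch below splits a flat list of order records into a customer table and an order table. The records and field names are hypothetical, and the sketch assumes, as a simplification, that a customer name uniquely identifies a customer.

```python
# Flat records: each order repeats the customer's city (hypothetical data).
flat_orders = [
    {"order_id": 1, "customer": "Alice", "city": "Oslo", "item": "pen"},
    {"order_id": 2, "customer": "Alice", "city": "Oslo", "item": "ink"},
    {"order_id": 3, "customer": "Bob", "city": "Rome", "item": "pad"},
]

def normalize(rows):
    """Split flat rows into a customers table and an orders table."""
    customers = {}  # customer name -> {"customer_id", "city"}
    orders = []
    for row in rows:
        name = row["customer"]
        if name not in customers:
            # Store each customer's details exactly once.
            customers[name] = {
                "customer_id": len(customers) + 1,
                "city": row["city"],
            }
        orders.append(
            {
                "order_id": row["order_id"],
                "customer_id": customers[name]["customer_id"],
                "item": row["item"],
            }
        )
    return customers, orders

customers, orders = normalize(flat_orders)
print(customers)
print(orders)
```

After the split, a customer's city is stored in one place only, so updating it cannot leave two orders disagreeing about where the same customer lives.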
In data mining, there is a process called data massaging. This process is used to extract value from the numbers, statistics, and information found within a database and to predict what a customer will do next. Data mining works in several stages: collecting the data, refining the data, and taking the final action. Data collection may involve gathering information from website databases and logs. Data refining involves comparing user profiles with recorded behavior and dividing the users into groups so that behaviors can be predicted. The final action is the appropriate response taken by the data mining process or the data source, such as answering a question on the fly or sending targeted online advertisements to a browser or any other software application being used.
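The refining stage described above, grouping users by profile and using recorded behavior to predict what a group will do, might be sketched as follows. The profiles, visit log, and the "most frequent category" prediction rule are all assumptions chosen for brevity; real data mining systems use far richer models.

```python
from collections import Counter, defaultdict

# Hypothetical collected data: a profile segment per user and a visit log.
profiles = {"u1": "student", "u2": "student", "u3": "retiree"}
visits = [
    ("u1", "books"),
    ("u2", "books"),
    ("u2", "games"),
    ("u3", "travel"),
    ("u1", "books"),
]

def behavior_by_group(profiles, visits):
    """Count visited categories per profile group."""
    counts = defaultdict(Counter)
    for user, category in visits:
        counts[profiles[user]][category] += 1
    return counts

def predict_interest(group, counts):
    """Return the group's most frequent category, or None if the group is unseen."""
    if group not in counts:
        return None
    return counts[group].most_common(1)[0][0]

counts = behavior_by_group(profiles, visits)
print(predict_interest("student", counts))
print(predict_interest("retiree", counts))
```

The predicted category could then drive the final action, for example choosing which advertisement to show to a user who belongs to that group.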