Disparate data are heterogeneous data. They are dissimilar to one another and cannot be easily integrated with an organization's database management system. They differ in one or more aspects of an information system.
Disparate data may be characterized by these basic problems:
1. In an organization implementing a database system, there is no single, complete, integrated inventory of all its data.
2. The real substance, meaning and content of all the data within the organizational data resource is not readily known or well defined.
3. There is very high data redundancy all over the organization.
4. There is very high variability in data formats and contents.
A data warehouse is a prime example of a place where disparate data come together. The goal of a data warehouse is to bring together data from a wide variety of existing databases, including data marts and other data warehouses, so that it can support management and reporting needs.
In reality, databases and other data sources are not all implemented in the same way. Some databases may be managed by, say, Microsoft SQL Server, while others may be managed by Oracle or MySQL.
While the underlying technology of these commercial relational database management systems may be basically the same, their final data outputs can differ because each has its own specific and often proprietary formatting. So when each of these relational database management systems sends its data to the data warehouse, the result may be disparate data converging at the warehouse.
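As a minimal sketch of what such formatting differences can look like in practice, the Python snippet below converts date strings that arrive in several layouts into a single ISO 8601 form. The three layouts listed are hypothetical examples chosen for illustration; they are not claimed to be the default output of any particular database product.

```python
from datetime import datetime

# Hypothetical layouts that different source systems might emit.
# These are illustrative assumptions, not the defaults of any real product.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def normalize_date(raw: str) -> str:
    """Return the date in ISO 8601 form (YYYY-MM-DD), trying each known layout in turn."""
    text = raw.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # Not this layout; try the next one.
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Note that trying formats in order cannot resolve truly ambiguous values on its own; a real pipeline would more likely select the layout according to which source the row came from.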
Different data sources may also be implemented on different platforms. Relational databases are not the only sources of data feeding a data warehouse. There may be other data, such as the output of programs running on computer servers. Different data sources may also be powered by different operating systems. Some may run on Unix or one of the many distributions of Linux. Some may be on macOS, others on Windows, and others on still other platforms.
Still another cause of disparate data is that requirements and available data differ across the stages of the lifecycle: less data may be available at the start and more at the end. Different users may also have different data needs, for example suppliers versus customers, operators versus planners, or commercial versus government users.
In a data warehouse, there is a process known as ETL, which stands for extract, transform and load. The transform step is the one that takes care of managing disparate data.
Data transformation is very important and needs to be executed with precision. Data identification for the transformation process begins early, before the data is extracted from the various data sources on their different platforms. During this stage, the system identifies the data needed at the target location, such as an operational data store or a data warehouse, and the source data needed to produce the target data.
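The identification step can be pictured as a mapping from each target field to the source fields that feed it. The following Python sketch is only an illustration: the system, table, and column names (crm, billing, client_no, and so on) are invented for the example and do not come from any real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceField:
    """Location of one source column (all names here are hypothetical)."""
    system: str
    table: str
    column: str


# Target field -> source fields needed to produce it (illustrative only).
TARGET_TO_SOURCES = {
    "customer_id": [
        SourceField("crm", "clients", "client_no"),
        SourceField("billing", "accounts", "cust_id"),
    ],
    "customer_name": [SourceField("crm", "clients", "full_name")],
}


def sources_needed(target_fields):
    """Return the set of (system, table) pairs that must be extracted."""
    needed = set()
    for field in target_fields:
        for src in TARGET_TO_SOURCES[field]:
            needed.add((src.system, src.table))
    return needed
```

Keeping this mapping explicit makes it possible to work out which sources must be read for a given set of target fields before any extraction starts.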
Once everything is in place and the data has been identified, data extraction takes place: the desired data is taken from the data sources and placed in a data depot for refining. The data depot is a working place or staging area where disparate data can be refined before being loaded into the database.
The process of data extraction technically includes any conversion between database management systems. Data refining is the actual work of transforming disparate data before they are finally integrated into the data warehouse under a common data architecture. When disparate data are transformed into the data defined by that architecture, real integration begins.
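The extract, refine, and load flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the two sources (crm and billing), their field names, and the common target shape are all hypothetical, and plain in-memory lists and dictionaries stand in for the staging area and the warehouse.

```python
def extract(sources):
    """Copy raw rows from each source into a staging list, tagged with their origin."""
    staging = []
    for name, rows in sources.items():
        for row in rows:
            staging.append({"source": name, "raw": row})
    return staging


def refine(staged):
    """Transform staged rows into one common shape: customer_id (int) and name (str)."""
    refined = []
    for item in staged:
        raw = item["raw"]
        if item["source"] == "crm":  # hypothetical source layout
            cid, name = raw["client_no"], raw["full_name"]
        elif item["source"] == "billing":  # hypothetical source layout
            cid, name = raw["cust_id"], raw["NAME"]
        else:
            raise ValueError(f"no refining rule for source {item['source']!r}")
        refined.append({"customer_id": int(cid), "name": name.strip().title()})
    return refined


def load(refined, warehouse):
    """Insert refined rows keyed by customer_id; the first row seen for a key is kept."""
    for row in refined:
        warehouse.setdefault(row["customer_id"], row)
    return warehouse
```

A possible usage, with made-up rows that describe the same customer in two different ways:

```python
sources = {
    "crm": [{"client_no": "7", "full_name": " ada lovelace "}],
    "billing": [{"cust_id": 7, "NAME": "ADA LOVELACE"}],
}
warehouse = load(refine(extract(sources)), {})
```

Here the two redundant, differently formatted records collapse into a single warehouse entry once both have been refined into the common shape.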