An enterprise data management system that consists of data stores and a data warehouse may have several data sources. A primary data source is the first site at which the original data are stored after they originate.
Consider the data warehouse, whose database is the repository of all of the company’s historical data; the data warehouse is, in effect, the corporate memory. Alongside it is Online Analytical Processing (OLAP), which analyzes that data so the results can serve as a basis for sound corporate decisions. Then there is Online Transaction Processing (OLTP), which handles online, real-time transactions such as those at an automated teller machine or a retail point of sale. In short, an enterprise data management system handles a very high volume of data every minute, throughout the year, for as long as the business operates.
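To make the contrast between the two workloads concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for both. This is an assumption for illustration only: production systems typically run OLTP and OLAP on separate, specialized platforms. The table `sales` and the functions `make_db`, `record_sale`, and `revenue_by_store` are hypothetical names.

```python
import sqlite3


def make_db():
    """Create a hypothetical in-memory sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, day TEXT, amount REAL)")
    return conn


def record_sale(conn, store, day, amount):
    """OLTP-style work: one small write per purchase, committed as its own transaction."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", (store, day, amount))


def revenue_by_store(conn):
    """OLAP-style work: a read-only aggregation over the accumulated history."""
    return conn.execute(
        "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    record_sale(conn, "A", "2024-01-01", 10.0)
    record_sale(conn, "A", "2024-01-02", 5.5)
    record_sale(conn, "B", "2024-01-01", 3.0)
    print(revenue_by_store(conn))
```

Each `record_sale` call mirrors a single ATM or point-of-sale event, while `revenue_by_store` asks a question about the whole history, which is the kind of query OLAP exists to answer.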
An enterprise data management information system has a data store: a dynamic place that receives disparate data from different data sources and platforms. Here the data pass through a series of activities called ETL (extract, transform, load), which converts the disparate data into a unified format before further processing.
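The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular product: the two source extracts, their field names (`SiteID`, `site`, `st`, and so on), and the unified `sites` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw extracts from two sources that use different formats
# and different field names for the same kind of record.
CSV_SOURCE = "SiteID,SiteName,State\n101,Old Mill,OH\n102,River Bend,PA\n"
JSON_SOURCE = '[{"id": "201", "site": "North Yard", "st": "oh"}]'


def extract():
    """Extract: read raw rows from each source in its native format."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: map both shapes onto one unified schema (site_id, name, state, source)."""
    unified = []
    for row in csv_rows:
        unified.append((int(row["SiteID"]), row["SiteName"], row["State"].upper(), "csv_source"))
    for row in json_rows:
        unified.append((int(row["id"]), row["site"], row["st"].upper(), "json_source"))
    return unified


def load(rows, conn):
    """Load: write the unified rows into a single warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sites "
        "(site_id INTEGER PRIMARY KEY, name TEXT, state TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO sites VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    for record in conn.execute("SELECT * FROM sites ORDER BY site_id"):
        print(record)
```

Keeping a `source` column on every loaded row is one simple way to preserve where each record came from, which matters for the question of primary data sources discussed below.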
The data that periodically arrive at the data store come from the data sources.
For instance, consider the United States Environmental Protection Agency (EPA), which operates the Envirofacts Data Warehouse; the databases that feed it are examples of data sources to which the idea of a primary data source applies. The agency is large and naturally deals with large volumes of data, so its data handling is divided among many individual EPA databases, each administered by a program system office. Sometimes industry is required to report information to the state in which it operates; at other times the information is collected at the federal level.
The data sources of the Envirofacts Data Warehouse provide information that makes it easy to trace where the data originated. Some of these data sources are:
Superfund Data Source – This data source covers Superfund sites, which are uncontrolled hazardous waste sites designated by the federal government for cleanup. Information about these sites is stored in the Comprehensive Environmental Response, Compensation, and Liability Information System (CERCLIS), which has been integrated into Envirofacts.
Safe Drinking Water Information Data Source – This database stores information related to drinking water programs.
Master Chemical Integrator Data Source – This database integrates the various chemical identifiers used in four program system components.
Other data sources of the Envirofacts Data Warehouse include Hazardous Waste Data, Toxics Release Inventory, Facility Registry System, Water Discharge Permits, Drinking Water Microbial and Disinfection Byproduct Information, and the National Drinking Water Contaminant Occurrence Database.
All these data sources contribute seemingly unrelated data, which may arrive in disparate file formats, from different geographical locations, and from different state and federal offices within the United States. The shared data finally converge in a central data warehouse, which manages them so that they become more meaningful and can be redistributed or shared with anyone who needs them.
Each of these data sources may or may not act as a primary data source. For example, if the data in the Safe Drinking Water Information Data Source actually originate from yet another source, then the Safe Drinking Water Information Data Source is not a primary data source. If the data really come from the department's own raw activity, where the original paperwork took place, then that department may be a primary data source.
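The test described above, whether a source received its data from somewhere else, can be expressed as a lineage lookup. The sketch below is a hypothetical model, not how Envirofacts actually records provenance: each source names the upstream source it copies from, or `None` if it records the data where they originate. The source names (including `state_field_office`) and the functions `is_primary` and `trace_to_primary` are illustrative assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical lineage map: each data source names the upstream source it
# copies its data from, or None if the data originate there.
LINEAGE: Dict[str, Optional[str]] = {
    "state_field_office": None,                     # raw activity recorded here first
    "safe_drinking_water": "state_field_office",    # receives data from another source
    "envirofacts_warehouse": "safe_drinking_water",
}


def is_primary(source: str, lineage: Dict[str, Optional[str]]) -> bool:
    """A source is primary if it has no upstream source for its data."""
    return lineage[source] is None


def trace_to_primary(source: str, lineage: Dict[str, Optional[str]]) -> List[str]:
    """Follow upstream links until reaching a source with no upstream; return the path."""
    path = [source]
    seen = {source}
    while (upstream := lineage[path[-1]]) is not None:
        if upstream in seen:
            raise ValueError(f"cycle detected at {upstream!r}")
        path.append(upstream)
        seen.add(upstream)
    return path


if __name__ == "__main__":
    print(is_primary("safe_drinking_water", LINEAGE))
    print(trace_to_primary("envirofacts_warehouse", LINEAGE))
```

Under these assumptions, the Safe Drinking Water source is not primary because it names an upstream source, while tracing from the warehouse walks back through it to the office where the data were first recorded.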