Data Warehouse Infrastructure basically supports a data warehousing enviroment with the help of a combination of technologies. In its most general definition, a data warehouse is large repository of all sorts of data the implementing organization would in need in the present and in the future. But the real of data warehouse and its functions and features may very depending upon the need of the organization and what it can afford.
So, the overall design and methodology of data warehouse will be depending on the data life cycle management policy of the organization. The general life data life cycle starts with pre-data warehouse, data cleansing, data repository, and front-end analytics.
The pre-data warehouse is like stage or area where the designers need to determine which data contains business value for insertion. Some of the infrastructure found in this area includes online transaction processing (OLTP) database which stored operational data.
These OLTP databases may be residing in some transactional business software solutions such as Supply Chain Management (SCM), Point of Sale, Customer Serving Software and Enterprise Resource Planning (ERP) and management software. OLTP databases need to have very fast transactional speeds and up to the point accuracy.
Metadata computer application servers also can be found within this area. Metadata, which means data about data in computer speak, make sure that data which into the data lifecycle process are accurate and clean. It also makes sure that they are well defined because metadata can help speed up searches in the future.
During the data cleansing, data undergoes a collective process referred to as ETL which stands for extract, transform, and load. Data are extracted from outside sources like those mentioned in the pre-warehouse. Since these data may come in different formats from disparate data, they will be transformed to fit the business needs and requirements before they are loaded into the data warehouse.
Tools at this phase include software applications created with almost any programming language. These tools could be very complex and many companies prefer to buy then instead of having in house programmers. One of the requirements of a good ETL tool is that it could efficiently communicate with the many different relational databases. It should also be able to read various file formats from different computer platforms.
At the data repository phase, data are stored in corresponding databases. This is also the phase where active data of high business value to an organization are given priority and special treatment. Data repositories may be implemented as data mart of operational data store (ODS).
A data mart is smaller that a data warehouse and is more specific as in it is built on a departmental level instead of company wide level. An ODS are sort of resting place for data and they hold recent data before they are migrated to the data warehouses. Whether a data warehouse implements both or not, the tools in this stage are all related to databases and database computer servers.
The front-end analysis may be considered the last and most critical stage of the data warehouse cycle. This is the stage where data consumers will interact with the data warehouse to get the information they need. Some of the tools used in this area are data mining applications which are used to discover meaningful patterns from a chaotic system or repository.
Another tools is the Online Analytical Processing (OLAP) will used in analyzing historical data of the organizations and slice the required business information. Some of the other tools are generic reporting or data visualization tools so that end users can see the information in visually appealing layouts.