In a real world scenario, a data warehouse is implemented with different kinds of data stores supporting the whole system as it handles high level volume of data. These three kinds of data stores are Historical Data Store, Operational Data Store, Analytical Data Store.
All three kinds of data stores, although they server different purposes, are all basically databases. As such they all obey a general standard of database structuring, rules and constraints.
In general, it is recommended that databases should be normalized. This means that it should follow the at least the three normalization forms which are designed to eliminate redundant data. By eliminating redundant data, the database will generally execute a lot faster because the data to be inserted will into nice laid columns in one place only that if several descriptions or attributes of an entity are also inserted, they are neatly laid without causing redundancy because a relationships is well defined.
But there are certain cases however that having redundant data as effect of a database denormalization will give the database a better performance.
For instance, when implementing a database which has more data retrieval, as in the case of a website which only gets information less frequently but there are more views of data from internet users, it is wise to have denormalized table.
As a general rule, when updates are optimized at the expense of retrieval, the by all means, the database should not be denormalized. But when retrieval should be optimized at the expense of updates, the by all means, denormalization should be employed.
Going to be the data stores, the historical data store is a place for holding cleansed and integrated historical data. Since the data acts like archives and therefore retrieval only happens less frequently, the data store should be fully normalized. This means it has to be normalized up the third normal form.
The operational data store is designed for holding current data which is used for the day to day operation. This means that accesses and insertions happen every minute due to the progressive activities of business transactions.
Master and transactional data are mish mashed together and the operational data store need to efficiently facilitate the generation of operational reports. Hence, the operational data store should be implemented as a denormalized data store.
The analytical data store is designed to store both current and historical information about the business organization and is implemented with a dimensional structure for the facilitation of the generation of analytical reports. In this case, denormalization should be employed as the current information is also fast getting updated.
The decision whether or not to denormalize a data store should not be taken lightly as this involves administrative dedication. This dedication should be manifested in the form of documenting business processes and rules in order to ensure that data are valid, data migration is scheduled and data consumers kept update about the state of the data stores. If denormalized data exists for a certain application, the whole system should be reviewed periodically and progressively.
As a general practice, the periodic test whether the extra cost which is related to processing with a normalized database justifies the positive effect of denormalization. This cost should be measured in terms of Input / Output and CPU processing time saved and complexity of the updating programming minimized.
Data stores are such an integral part of the data warehouse and when not optimized, it could significantly minimize the entire system’s efficiency.