Data denormalization is part of the process in which a database's internal (physical) schema is developed from its conceptual schema.
Although it works by adding redundant data, denormalization is actually a technique for optimizing a relational database's performance. It is most often needed on database management systems whose relational model is poorly implemented: a true relational DBMS would allow a fully normalized database at the logical level while providing physical storage designed to operate at very high performance.
Database normalization is essential when designing relational database tables: it prevents duplication of data and guards the database against logical inconsistencies. Denormalization, although the term suggests the opposite, actually complements normalization in the database optimization process.
A normalized database commonly stores different but related data in separate logical tables, called relations. In large data warehouses, some of these relations are physically stored in separate disk files, so a query that gathers information from several relations can be slow, and slower still when many relations must be joined.
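As a concrete sketch, consider a small hypothetical order-processing schema (the table and column names here are illustrative, not drawn from any particular system). Answering a routine reporting question requires joining three relations:

    -- Hypothetical normalized schema: each fact is stored exactly once.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );

    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        order_date  DATE    NOT NULL
    );

    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL REFERENCES orders (order_id),
        product_id INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        unit_price NUMERIC(10, 2) NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );

    -- A routine report must assemble each row from three relations,
    -- which may physically reside in separate disk files.
    SELECT c.name, o.order_date, SUM(i.quantity * i.unit_price) AS order_total
    FROM customers c
    JOIN orders o      ON o.customer_id = c.customer_id
    JOIN order_items i ON i.order_id = o.order_id
    GROUP BY c.name, o.order_date;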
One way to overcome this problem is to keep the logical design normalized while allowing the database management system to store additional redundant data on disk so that query response can be optimized. The DBMS is then responsible for keeping the redundant replicas consistent at all times. Some SQL products call this feature indexed views (for example, Microsoft SQL Server); others call it materialized views (for example, Oracle and PostgreSQL). Here a view is the information laid out in a format convenient for querying, and the index ensures that queries against the view are optimized.
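For instance, in PostgreSQL the expensive join above can be precomputed and indexed as a materialized view. This is a sketch reusing the hypothetical schema; note that in PostgreSQL the redundant copy is refreshed explicitly, whereas indexed views in SQL Server are maintained automatically:

    -- Materialize the three-way join as a redundant, queryable copy.
    CREATE MATERIALIZED VIEW order_totals AS
    SELECT c.customer_id, c.name, o.order_id, o.order_date,
           SUM(i.quantity * i.unit_price) AS order_total
    FROM customers c
    JOIN orders o      ON o.customer_id = c.customer_id
    JOIN order_items i ON i.order_id = o.order_id
    GROUP BY c.customer_id, c.name, o.order_id, o.order_date;

    -- Index the view so queries against it are optimized.
    CREATE INDEX order_totals_customer_idx ON order_totals (customer_id);

    -- Bring the redundant copy back in sync with the base tables.
    REFRESH MATERIALIZED VIEW order_totals;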
Denormalization is an important aspect of data modeling, the process of creating and exploring data-oriented structures drawn from the real-life activities of an organization.
Data models generally fall into three categories: the conceptual data model, used to explore domain concepts with project stakeholders; the logical data model, used to explore the relationships among those concepts; and the physical data model, used to design the internal schema of the database, with a focus on table columns and the relationships between tables.
Denormalization is a substantial part of the physical data modeling process. The rules of normalization aim to minimize redundant data, not to improve the performance of data access, so denormalizing selected parts of the schema can improve database access speed.
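As an illustrative sketch (PostgreSQL syntax, continuing the hypothetical schema above), a frequently read customer name can be copied into the orders table so that the common read path no longer needs a join:

    -- Deliberately redundant column: the name also lives in customers.
    ALTER TABLE orders ADD COLUMN customer_name TEXT;

    -- Populate the redundant copy from the authoritative table.
    UPDATE orders o
    SET customer_name = c.name
    FROM customers c
    WHERE c.customer_id = o.customer_id;

    -- The frequent query now touches a single relation.
    SELECT order_id, order_date
    FROM orders
    WHERE customer_name = 'Ada Lovelace';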
Carefully denormalizing the logical design can also improve query response, but it comes at a cost. The database designer becomes responsible for ensuring that the denormalized database does not become inconsistent, typically by creating database rules, called constraints, that specify how redundant copies of data are kept synchronized. The real cost is the increased logical complexity of the database design and of these additional constraints. The key to denormalizing logical data is exerting extreme care, because the constraints add overhead to updates, inserts, and deletes, which can make the denormalized design perform worse than its fully normalized counterpart.
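Most SQL dialects cannot express such a cross-table synchronization rule as a declarative constraint, so a trigger is a common substitute. The following PostgreSQL-flavored sketch continues the hypothetical schema; every customer rename now pays for an extra write to orders, which is precisely the update overhead described above:

    -- Propagate a renamed customer into the redundant copy in orders.
    CREATE FUNCTION sync_customer_name() RETURNS trigger AS $$
    BEGIN
        UPDATE orders
        SET customer_name = NEW.name
        WHERE customer_id = NEW.customer_id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER customer_name_sync
    AFTER UPDATE OF name ON customers
    FOR EACH ROW
    EXECUTE FUNCTION sync_customer_name();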
It should be noted that a denormalized data model is not the same as an unnormalized data model, which is one that has never been normalized at all. Denormalization should be performed only after a satisfactory level of normalization has been reached and after any rules and constraints required to prevent anomalies have been created in the overall design.