Database reverse engineering is an important process in improving the understanding of data semantics. Many aspects of the database evolution especially those that pertain to old and legacy databases where the semantics of their data have been lost through the years need a database reverse engineering process to understood in detail.
Today, there are many processes currently undertaken to re-engineer legacy systems or the federation of distributed databases. There have been many works done wherein a conceptual schema that is often based on an extension of the Entity-Relationship (ER.) model is being derived from a hierarchical database, a network database, or a relational database.
Reverse data denormalization is just one of the broad aspects of a database reverse engineering which has two major steps. The first step involves eliciting the data semantics from the existing system. In this step, the various sources of information can be relevant for tackling this task, e.g., the physical schema, the database extension, the application programs, but especially expert users, are being elicited.
The second involves expressing the extracted semantics with a high level data model. This task consists in a schema translation activity and gives rise to several difficulties since the concepts of the original model do not overlap those of the target model.
Most of the methods used in database reverse engineering within the context of relational database mainly focus on the work done with schema translation since they assume that the constraints such as functional dependencies or foreign keys are
available at the beginning of the process. But then in order to be more realistic, those strong assumptions may not in apply in all cases as there are also old versions of database management systems which do not support such declaration.
There have been many recent reverse data denormalization works that are independently proposing for the alleviation of the assumptions aforementioned. In a Third Normal Form schema, they key idea would be to fetch the needed information from the data manipulation statements which are embedded in certain application programs but there may be no need for constraining the relational schema with a consistent naming key attribute.
The Third Normal Form requirement has remained to be one of the major limits for the current methods in database reverse engineering. As it has been shown, during a database design process, the relational schemas are sometimes either directly produced in the First Normal Form or in the Second Normal Form or denormalized at the end of the database design process.
In some cases, the denormalization occurs during the implementation of the physical database or during the maintenance phase when the attributes are added to the database.
Data denormalization is a process wherein an attempt is made to optimize a database performance by adding redundant data despite the fact that it is generally recognized that all relational database design should be based on a normalized logical data model since normalization proposes a way to develop an optimum design from a logical perspective.
There are cases when it is really necessary to add redundant data because current database management systems that implement the relational model are doing the implementation poorly.
A common approach to data denormalization is to denormalize the logical data design and when done with care, this process can achieve significant improvement in query response.
On the other hand, reverse data denormalization is mostly used for understand the structure of database especially when there is a need to fix or troubleshoot a very complex problem.