Data restructuring is the process of reshaping source data into the structure required by the target during data transformation. It is an integral part of data warehousing. Large data warehouses are typically run with a common set of processes called Extract, Transform, and Load (ETL).
The general flow of ETL involves extracting data from outside sources, transforming it according to business rules and requirements so that it fits the business needs, and finally loading it into the data warehouse.
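The three ETL stages can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the inline CSV stands in for an outside source, the "business rules" are simple normalizations of names, amounts, and currency codes, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export with inconsistent formatting.
RAW_CSV = """customer,amount,currency
 alice ,10.50,usd
BOB,3,USD
"""


def extract(text):
    """Read rows from the source as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Apply illustrative business rules: tidy names, cast amounts, upper-case codes."""
    return [
        (row["customer"].strip().title(), float(row["amount"]), row["currency"].upper())
        for row in rows
    ]


def load(conn, records):
    """Insert the transformed records into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, transform(extract(RAW_CSV)))
etl_rows = conn.execute(
    "SELECT customer, amount, currency FROM sales ORDER BY customer"
).fetchall()
print(etl_rows)
```

Keeping each stage in its own function mirrors the order described above: the data is reshaped before it is loaded, so the target table only ever receives records in its own format.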
If one looks closely at the process, the data restructuring part comes before the loading. This ordering is necessary. For one, in a data warehouse environment, high volumes of data usually arrive at very short intervals. In most cases, the data comes from disparate sources: the servers that supply the data may run on different software platforms, so the data may arrive in different formats, or the sources may be based on different data architectures that are not compatible with the data architecture of the receiving data warehouse.
Because the data comes from different sources, it needs to be restructured so that it complies with all the business rules as well as the overall data architecture of the data warehouse. Data restructuring makes the data structures more sensible to the database behind the data warehouse.
Data structure analysis includes making sure that all the components of a data structure are closely related, that closely related data are not kept in separate structures, and that the best type of data structure is being used. Data may be a lot easier to manage and understand when it is held in a representation that abstracts its relevant similarities.
Often, in data warehouses, data restructuring involves changing some aspects of the way in which the database is logically or physically arranged. There are many reasons to perform data restructuring. For instance, it may be done to improve a database's performance and storage utilization, or to make an application more useful for supporting decision making or data processing.
There are generally four types of data restructuring operations, namely:
- Trimming
- Flattening
- Stretching
- Grafting
In trimming, the extracted data from the input is placed in the output without any change in the hierarchical relationships, but with some unwanted components of the data removed.
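As a minimal sketch of trimming, assume the hierarchy is held as nested Python dictionaries and lists. The `trim` function and the sample `order` record are hypothetical names chosen for illustration.

```python
def trim(node, unwanted):
    """Return a copy of a nested structure without the keys listed in `unwanted`.

    The nesting of the remaining components is left exactly as it was.
    """
    if isinstance(node, dict):
        return {k: trim(v, unwanted) for k, v in node.items() if k not in unwanted}
    if isinstance(node, list):
        return [trim(item, unwanted) for item in node]
    return node  # a basic value: nothing to trim


order = {
    "order_id": 1,
    "internal_note": "rush",
    "customer": {"name": "Alice", "fax": None},
    "items": [{"sku": "A1", "qty": 2, "internal_note": "fragile"}],
}

trimmed = trim(order, {"internal_note", "fax"})
print(trimmed)
```

The output has the same levels as the input (order, customer, items); only the unwanted components are gone.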
In flattening, the operation produces a flat form from a branch of the input structure by extracting all the information at the level of the values of the basic attributes of that branch.
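One way to picture flattening, under the assumption that a branch is a nested dictionary, is to emit one flat record per lowest-level value and repeat the parent attributes on each record. The `flatten` function and the department data below are hypothetical.

```python
def flatten(department):
    """Return one flat record per (employee, skill) pair in a department branch."""
    return [
        {"dept": department["dept"], "name": emp["name"], "skill": skill}
        for emp in department["employees"]
        for skill in emp["skills"]
    ]


dept = {
    "dept": "Sales",
    "employees": [
        {"name": "Ann", "skills": ["SQL", "Excel"]},
        {"name": "Raj", "skills": ["SQL"]},
    ],
}

flat = flatten(dept)
for record in flat:
    print(record)
```

Each resulting record holds only basic attribute values, so it can be stored as a single row of a table.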
The stretching operation can produce an output data structure that has more hierarchical levels than the input.
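A rough sketch of stretching, assuming it means adding hierarchical levels to the input: flat records are grouped by one attribute, which becomes a new parent level. The `stretch` function, the `members` key, and the sample data are hypothetical.

```python
def stretch(records, key):
    """Group flat records by `key`, adding one level to the hierarchy."""
    groups = {}
    for record in records:
        # Everything except the grouping attribute moves down one level.
        child = {k: v for k, v in record.items() if k != key}
        groups.setdefault(record[key], []).append(child)
    return [{key: value, "members": members} for value, members in groups.items()]


flat_sales = [
    {"region": "East", "city": "Boston", "sales": 10},
    {"region": "East", "city": "New York", "sales": 20},
    {"region": "West", "city": "Seattle", "sales": 5},
]

stretched = stretch(flat_sales, "region")
print(stretched)
```

The input has a single level of records; the output has a region level with city records nested beneath it.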
Finally, a grafting operation involves combining two hierarchies horizontally to form a wider hierarchy by matching common values.
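Grafting resembles a join. The sketch below assumes each hierarchy is a list of nested records that share a common key; `graft`, `emp_id`, and the sample data are hypothetical, and records without a match are simply dropped, which is only one possible policy.

```python
def graft(left, right, key):
    """Combine records from two hierarchies whose `key` values match."""
    right_by_key = {rec[key]: rec for rec in right}
    return [
        {**rec, **right_by_key[rec[key]]}  # the widened record keeps both sides
        for rec in left
        if rec[key] in right_by_key
    ]


assignments = [
    {"emp_id": 1, "name": "Ann", "projects": ["P1", "P2"]},
    {"emp_id": 2, "name": "Raj", "projects": []},
]
payroll = [
    {"emp_id": 1, "salary": {"base": 5000, "bonus": 500}},
]

grafted = graft(assignments, payroll, "emp_id")
print(grafted)
```

The result is wider rather than deeper: the matched record carries both the `projects` branch and the `salary` branch side by side.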
One of the most important roles that data restructuring plays is in the field of information processing applications. When data is extracted from the data sources and new fields are created and placed in the output, the data structure of the resulting output sometimes does not resemble that of the input.
Sometimes, query facilities that are designed for simple retrievals are not adequate to handle many real-world scenarios, so some programming may be required. But programming may not be for everyone, even for database administrators. Making the most of data restructuring may help eliminate some of the need for programming. With properly restructured data in a relational database, simple queries may be enough even to retrieve relatively complex and aggregated data structures.
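As an illustration of that last point, suppose the data has already been restructured into a single flat table. The table name `sales`, its columns, and the figures are assumptions made for this sketch, which uses an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Boston", 10.0),
        ("East", "New York", 20.0),
        ("West", "Seattle", 5.0),
    ],
)

# One plain query returns an aggregated view; no procedural code is needed.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Had the same data stayed in nested, source-specific structures, producing the per-region totals would likely have required custom code to walk each hierarchy first.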