Redundant data, as the name suggests, is duplicated data: the same data belonging to a single organization is stored at multiple data sites.
Dealing with redundant data costs a company considerable time, money, and effort. Because redundant copies are often unknown to the organization, they can creep into the system and produce unwanted and unexpected results, such as slowing down the entire system, producing inaccurate output, and seriously damaging data integrity. Redundant data also puts information quality at risk if the different databases are not updated concurrently.
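As a minimal illustration of the concurrent-update problem, the sketch below models two data sites as plain Python dictionaries. The site names and the helpers `update_address` and `find_inconsistencies` are hypothetical. An address is changed at one site only, and a simple comparison then shows that the two copies have diverged.

```python
# Two hypothetical "data sites", each holding its own copy of the same customer record.
billing_site = {"cust-1": {"name": "Ada", "address": "12 Elm St"}}
shipping_site = {"cust-1": {"name": "Ada", "address": "12 Elm St"}}


def update_address(site, customer_id, new_address):
    """Update the address at ONE site only; nothing synchronizes the other copy."""
    site[customer_id]["address"] = new_address


def find_inconsistencies(site_a, site_b):
    """Return the ids of customers whose copies differ between the two sites."""
    shared_ids = site_a.keys() & site_b.keys()
    return sorted(cid for cid in shared_ids if site_a[cid] != site_b[cid])


update_address(billing_site, "cust-1", "98 Oak Ave")
print(find_inconsistencies(billing_site, shipping_site))  # ['cust-1']
```

Real systems would need replication or a synchronization job to close this gap, which is part of the cost described next.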
Data redundancy is costly to address because it requires additional storage, synchronization between databases, and design work to reconcile the different representations of the same data.
The problems associated with redundant data can be addressed by data normalization. Normalized tables generally contain no redundant data because each attribute, apart from the keys that link tables together, appears in only one table. Normalized tables also do not store derived data; instead, such values are computed when needed from existing attributes, using an expression over those attributes.
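A minimal sketch of these ideas using Python's built-in `sqlite3` module with an in-memory database follows. The schema (`customers`, `order_lines`) and the helper `customer_totals` are hypothetical. The customer name is stored only in `customers`, no line total is stored anywhere, and the totals are derived at query time from `quantity * unit_price` using a join and `SUM`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL            -- stored here and nowhere else
);
CREATE TABLE order_lines (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    quantity    INTEGER NOT NULL,
    unit_price  REAL    NOT NULL  -- no stored line total: it is derived
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO order_lines VALUES (1, 1, 2, 5.0), (2, 1, 1, 3.0), (3, 2, 4, 2.5);
""")


def customer_totals(conn):
    """Compute each customer's total from existing attributes via a join and an aggregate."""
    return conn.execute("""
        SELECT c.name, SUM(o.quantity * o.unit_price) AS total
        FROM customers AS c
        JOIN order_lines AS o ON o.customer_id = c.id
        GROUP BY c.id
        ORDER BY c.name
    """).fetchall()


print(customer_totals(conn))  # [('Ada', 13.0), ('Grace', 10.0)]
```

Because the total is never stored, it can never disagree with the quantities and prices it is computed from.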
Normalized tables can also greatly reduce the amount of disk space an implementation uses, and they make updates simple, since each fact is changed in only one place. The drawback is that queries may be forced to use joins and aggregate functions, which can be time-consuming to process. An alternative to full normalization is to add columns that deliberately hold redundant data, provided the trade-offs involved are fully understood.
A correctly designed data model avoids data redundancy by storing each attribute only in the table for the entity it describes. When that attribute is needed from a different perspective, a join can retrieve it, although the join takes time. If the join significantly degrades performance, it can be eliminated by duplicating the joined data in another table.
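The following is a hedged sketch of that kind of deliberate duplication, again with `sqlite3` and a hypothetical schema. The `orders` table keeps a redundant `customer_name` column so that reads need no join, and a hypothetical trigger, `sync_customer_name`, propagates name changes so the copy cannot silently drift. This is one common way to manage the trade-off, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id            INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES customers(id),
    customer_name TEXT NOT NULL   -- redundant copy of customers.name
);

-- Whenever a customer's name changes, rewrite every duplicated copy.
CREATE TRIGGER sync_customer_name
AFTER UPDATE OF name ON customers
BEGIN
    UPDATE orders SET customer_name = NEW.name WHERE customer_id = NEW.id;
END;

INSERT INTO customers VALUES (1, 'Ada');
INSERT INTO orders VALUES (10, 1, 'Ada'), (11, 1, 'Ada');
""")

# Rename the customer once; the trigger updates the redundant column in orders.
conn.execute("UPDATE customers SET name = 'Ada Lovelace' WHERE id = 1")

# Reading the name from orders no longer requires a join.
names = [row[0] for row in conn.execute("SELECT customer_name FROM orders ORDER BY id")]
print(names)  # ['Ada Lovelace', 'Ada Lovelace']
```

Note that inserts into `orders` must still supply the correct name, and every update now does extra work; that is the price paid for faster reads.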
Despite all the negative effects and impressions associated with redundant data, it can also have a positive impact. Redundant data can be useful, and may even be required, to meet service-level goals for performance, availability, and accessibility.
The different representations of the same data kept by data warehouses, operational data stores, and business intelligence systems have shown that redundant data can be essential for providing new information. The important thing to know is that, when it is managed well, redundant data can benefit the entire information system.
In fact, data redundancy is also a standard computing term for a property of computer storage in which several disks, typically arranged in a RAID array, provide fault tolerance: when some of the disks fail, all or part of the data stored on the array can be recovered. The usual cost of this kind of redundancy is reduced usable disk capacity, because implementations must store either a full duplicate of the data set (mirroring) or an error-correcting code, such as parity, on the array.
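To make the error-correcting alternative concrete, here is a minimal sketch of single XOR parity, the idea behind schemes such as RAID 4 and RAID 5. The disk contents and the helper `xor_blocks` are hypothetical, and a real array works on fixed-size stripes in hardware or the kernel rather than on Python byte strings. In this example, one disk in four holds parity instead of data, which is the capacity cost mentioned above.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one block of a stripe.
data_disks = [b"RAID", b"demo", b"1234"]
parity = xor_blocks(data_disks)  # stored on a fourth disk

# Simulate losing disk 1, then rebuild it from the surviving disks plus parity.
failed = 1
survivors = [blk for i, blk in enumerate(data_disks) if i != failed]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt)  # b'demo'
```

Recovery works because XOR-ing all remaining blocks with the parity cancels every surviving block and leaves only the missing one. A single parity block can rebuild only one failed disk per stripe.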
Many hardware products on the market today are designed specifically for handling redundant data. Redundant data storage hardware can reduce system downtime by eliminating single points of failure. Some storage arrays provide redundant power supplies, cooling units, drives, data paths, and controllers, while servers attached to redundant storage can have multiple connections to it, providing path failover.