From the perspective of computer science in general and telecommunications in particular, data integrity refers to the wholeness or completeness of data during transfer, storage and retrieval. It also refers to the preservation of data so that, whatever process it undergoes, it remains fit for its intended purpose. In other words, data integrity is the assurance that data will remain correct, consistent and accessible.
Data stored in a database is of little use until an application reads it. To use the data, the application must access it, so the data has to travel from storage to wherever it is processed. Data is said to have integrity if, as it travels, it remains faithful to its source.
Think of a piece of furniture, say, a chair. You can move a chair from one place to another, but you will soon discover that at some point in its travels something gets broken.
Today, data traveling and getting broken along the way is a frequent and unavoidable circumstance. In fact, today’s internet protocols break data into packets for transmission. The point of data integrity is that when data is split up or damaged, it can be reassembled or recovered.
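As a minimal sketch of that idea, the Python snippet below splits a message into fixed-size packets, reassembles them, and compares SHA-256 digests to check that nothing changed on the way. The packet size, the sample message and the helper names (`split_into_packets`, `digest`) are assumptions made for illustration only; real protocols use their own checksums and retransmission rules.

```python
import hashlib

def split_into_packets(data, size):
    """Cut a byte string into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def digest(data):
    """Return the SHA-256 fingerprint of a byte string as hex text."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical message and packet size, chosen only for the example.
message = b"data integrity means the data arrives unchanged"
expected = digest(message)

# Sender side: break the message into packets.
packets = split_into_packets(message, 8)

# Receiver side: reassemble and compare fingerprints.
reassembled = b"".join(packets)
intact = digest(reassembled) == expected

# Simulate damage in transit: flip one bit in the second packet.
damaged = list(packets)
damaged[1] = bytes([damaged[1][0] ^ 0x01]) + damaged[1][1:]
corruption_detected = digest(b"".join(damaged)) != expected
```

A digest like this only detects damage. Recovering the data then depends on something else, such as asking the sender to transmit the affected packet again.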
There are many ways in which data integrity can be compromised. Human error in data entry is one of the most common causes. Another is an unstable communications medium during data transmission. Software bugs and viruses can also compromise data integrity, as can hardware malfunctions such as disk crashes.
In relational databases, which are key technologies behind data warehouses, data integrity concerns the correctness, validity and accuracy of the data within the database. One of the most common types of database integrity is referential integrity, which prevents errors in the relationship between a foreign key and the primary key it references. A typical problem is an orphaned child record: its parent record was deleted, and because the keys in the relationship were not properly defined, nothing prevented the child from being left behind.
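The orphan problem can be illustrated with a small sketch that uses Python's built-in `sqlite3` module. The `customer` and `orders` tables and the `try_sql` helper are hypothetical names invented for this example; other database systems use slightly different syntax but follow the same principle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table: each customer has a primary key.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Child table: each order must point at an existing customer.
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    )
""")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")

def try_sql(statement):
    """Run a statement and report whether the database accepted it."""
    try:
        conn.execute(statement)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# An order for a customer that does not exist would be an orphan from the start.
orphan_insert = try_sql("INSERT INTO orders (id, customer_id) VALUES (11, 99)")
# Deleting a customer who still has orders would orphan those orders.
parent_delete = try_sql("DELETE FROM customer WHERE id = 1")

print(orphan_insert, parent_delete)
```

With the foreign key declared, both statements are refused, so no child record can exist without its parent.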
Data integrity in a relational database can be achieved through careful database planning and design. A database designer or developer should use integrity constraints to enforce all business rules that are closely associated with the database. This ensures that end users cannot enter invalid information and that data consumers cannot alter data without the right privileges. And when someone with the appropriate privileges deletes or alters data, the relationships defined through keys are maintained so that no record is left orphaned.
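The following sketch, again using Python's `sqlite3` module, shows how such constraints might look. The `product` and `stock_entry` tables and the rule that a quantity must be positive are assumptions invented for illustration, not rules from any particular system. Roles and privileges are not shown, because SQLite has no user accounts; server-based systems handle those separately.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
db.execute("PRAGMA foreign_keys = ON")

db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
db.execute("""
    CREATE TABLE stock_entry (
        id INTEGER PRIMARY KEY,
        -- Deleting a product also deletes its stock entries.
        product_id INTEGER NOT NULL REFERENCES product(id) ON DELETE CASCADE,
        -- Hypothetical business rule: a quantity must be positive.
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    )
""")
db.execute("INSERT INTO product (id, name) VALUES (1, 'chair')")
db.execute("INSERT INTO stock_entry (product_id, quantity) VALUES (1, 5)")

# Invalid input is refused by the database itself.
try:
    db.execute("INSERT INTO stock_entry (product_id, quantity) VALUES (1, -3)")
    negative_quantity = "accepted"
except sqlite3.IntegrityError:
    negative_quantity = "rejected"

# A legitimate delete of the parent removes its children, leaving no orphans.
db.execute("DELETE FROM product WHERE id = 1")
remaining_entries = db.execute("SELECT COUNT(*) FROM stock_entry").fetchone()[0]

print(negative_quantity, remaining_entries)
```

Here `ON DELETE CASCADE` is one possible design choice. A designer who would rather block the deletion than remove the child records can leave the clause out, in which case the database refuses to delete a parent that still has children.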
To ensure data integrity, most of today’s high-powered relational database management systems (RDBMS) offer server-enforced data integrity. Many RDBMS professionals prefer that the RDBMS be the ultimate authority on accepting or rejecting data entering the database. These professionals hold that the RDBMS can be more precise in making decisions about the data it stores.
In general, threats to data integrity can be minimized by taking regular data backups; controlling access to data by defining roles and privileges and installing security tools; designing user interfaces that warn about or prevent invalid input; and using the error detection and correction tools that data transmission requires.
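As a toy illustration of error correction, the Python sketch below uses a repetition code: every bit is sent three times and the receiver takes a majority vote within each group. This scheme is chosen here only because it is easy to follow; it is an assumption of this example, and practical systems rely on far more efficient codes. The function names `encode` and `decode` are likewise hypothetical.

```python
def encode(bits, copies=3):
    """Repeat every bit `copies` times."""
    return [b for b in bits for _ in range(copies)]

def decode(received, copies=3):
    """Recover each bit by majority vote over its group of copies."""
    decoded = []
    for i in range(0, len(received), copies):
        group = received[i:i + copies]
        decoded.append(1 if sum(group) > copies // 2 else 0)
    return decoded

original = [1, 0, 1, 1]
sent = encode(original)

# Simulate a noisy channel: flip one bit inside the second group.
noisy = list(sent)
noisy[4] ^= 1

recovered = decode(noisy)
print(recovered == original)
```

With three copies per bit, one flipped bit in a group is outvoted by the other two. Two flips in the same group would produce a wrong result, which is the limit of this simple scheme.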