Change Data Capture refers to the process of capturing changes which are made to a production data source. Change Data Capture is typically performed by reading the source of database management software logs. Some of the features of Change Data Capture are
- It consolidates units of works
- Ensures that data is synchronized with the original source
- Reduces data volume in a data warehousing environment.
In a data warehousing environment, some events oftentimes required that relational data be extracted and transported from one or more sources of databases and then loaded into the data warehouses for processing and analysis. Change Data Capture immediately identifies the event and processes only the data which has not changed. It will not change the entire table but makes the changed data available for whatever use.
If Change Data Capture was not implemented, extracting business data from any database would be an extremely difficult and cumbersome process. This will involve moving the entire contests of the tables into flat files and load the files into the data warehouse. This is not just cumbersome but also expensive.
Change Data Capture is not dependent on any intermediate flat files to temporarily contain data outside the relational database. Changed data resulting from INSERT, UPDATE, and DELETE operations are captured and then stored in a database object which is called change table. The changed data will then be made available to any applications which will need them in a controlled manner.
Some of the terminologies describing Change Data components include the following:
Source System – This refers to the production of database containing the source table where Change Data Capture will capture the changes.
Source Table – Is the table in the database which contains data the user will want to capture. Any changes made to the source table will be instantly reflected in the change table.
Change Set – This term refers to the collection of change tables.
Change Table – This is the database table which contains the changed data which results from DML statements made to a single source table. This table can consist of the change data itself and the system metadata. The Change Data is stored in a database table while the system metadata is needed for maintaining the change table.
Change table need to be managed so that its size will not grow without limit. This is done by managing the data in change tables and automatically purging change data which are no longer needed. A procedure can be automatically set to be called periodically to remove data from the change table which are no longer required.
Security is imposed in the Change Data process by having data subscribers, any user application which will want to get data, register with the database management system. They will then specify their interest from one or more source to tables and the database manager or administrator will give the subscribers their desired permissions, privileges or access.
The Change Data Capture environment is very dynamic. The data publisher can add and remove change tables at whim at any time. Depending on the database application, subscribers may not get explicit notification when the publisher makes changed to a table but views can be used to check by the subscriber. There are many more mechanism employed so that subscribers can always adjust to changes in the database where subscription is active.
Change Data is an indispensable feature of any relational database management system (RDBMS) especially those being used in large data warehouses. They make sure servicing of data is fast and efficient, changes are monitored for easy troubleshooting and analysis and referential integrity is always maintained between tables.