Jan 27, 2024

CDC in SQL Server

A brief description of SQL Server's CDC-

Change Data Capture, or CDC for short, was initially introduced in the SQL Server 2008 edition as a useful feature to track and record changes made to SQL Server database tables without requiring extra programming work. Prior to SQL Server 2016, a SQL Server database could only have Change Data Capture enabled with the SQL Server Enterprise edition; however, as of SQL Server 2016, this is no longer necessary.

 

A mirrored table with the same column structure as the source tables and extra columns to capture the description of these changes is created by Change Data Capture, which keeps track of the INSERT, UPDATE, and DELETE actions on the database table. For each INSERT statement, SQL Server creates one record that displays the values that are inserted; for each DELETE statement, it creates one record that displays the data that is deleted; and for each UPDATE statement, it creates two records—one that displays the data that was there prior to the change and the other that displays the data that was there after it was made.

 

Design and configuration-

 

A SQL Server instance must have a SQL Server Agent active in order to use Change Data Capture. A SQL Server database table that has this feature enabled will have two SQL Server Agent jobs created for it. The first job's task is to update database change tables with the updated data, and the second job's task is to tidy up the change tables by removing records that are older than the three-day configurable retention period. As the data source changes, Change Data Capture is dependent on the SQL Server Transaction Log. A modification will be recorded in the Transaction Log file as soon as it is made.

 

Change Data Capture as a Solution for Audits-

Without the ability to track SELECT statements, Change Data Capture can be used as an asynchronous SQL Server Audit solution to monitor and audit DML changes made to the table, such as INSERT, UPDATE, and DELETE operations.

Change Data Capture is an effective SQL Server audit solution because it is simple to set up with a few T-SQL statements, offers historical data about values prior to the change process, and gives comprehensive information on the data modification procedure.

Let's see how we can audit the SQL Server DML modifications using Change Data Capture.

 

Enabling DAC-

Step 1-

Open the Management Studio for Microsoft SQL Server and make a new database and table.

 

Create Database CDC1

use CDC1

 

CREATE TABLE CDC_TAB

(

 ID INT,

 Name VARCHAR (50),

 Place VARCHAR (50),

 Salary INT,

)







Step 2-

After creating the previously mentioned database and table, use the following Insert SQL Statements to add some data to the CDC_TAB table.




Step 3-

CDC use in SQL Server

In SQL Server, CDC must be enabled in two steps. They are listed in the following order:

CDC must first be enabled at the database level.

Next, we must specify which tables require CDC activation.

Enable CDC on Database Level: 

To enable CDC on the database level, run the following stored procedure in the system.

 





Step 4-

Enable CDC at Table Level

Using the CDC1 database, we generated the CDC_TAB table. Let's now activate CDC for this Employee table. As demonstrated below, we must run the system stored procedure sp_cdc_enable_table in order to enable CDC at the table level. Here, we are giving the stored procedure access to the schema, table name, and roll.

 

EXEC sys.sp_cdc_enable_table

@source_schema = 'dbo',

@source_name = 'CDC_TAB',

@role_name = null




Step 5-

The SQL Server Agent service must be running for the CDC to operate properly.




Step 6-

The tables that truly record insert, update, and delete action for a specific table are called _CT tables. For instance, every Insert, Update, and Delete operations made on the CDC_TAB table will be recorded in the dbo_CDC_TAB_CT table. A table named dbo_EMP_CT will be created to record all DML operations made on the EMP table if you have another table named EMP and you enable CDC for that table.




Step 7-

For instance, updating data in the CDC_TAB table

UPDATE CDC_TAB SET Name = 'Ayush' WHERE ID = 7




 

UPDATE CDC_TAB SET Name = 'Abhishek' WHERE ID = 6




Step 7-

It will be recorded in the system table.



Step 8-

Deleting from the CDC_TAB table, for instance

To observe what occurs in the dbo_CDC_TAB_CT table, let's remove one record from the CDC_TAB. In order to remove the CDC_TAB with ID number three, please run the SQL statement below.

DELETE FROM CDC_TAB WHERE ID = 7




Please review the dbo_CDC_TAB_CT table as indicated below after deleting the CDC_TAB. It creates a single record for the removed CDC_TAB, as you can see.




Step 9-

Example: Inserting into the CDC_TAB table.

Let's add one record to the CDC_TAB

 

INSERT INTO CDC_TAB VALUES (8, 'Yash', 'meerut', 900000)





Step 10 –

Identify which row is utilized for what type of activity.

The operation column can take one of four values (1, 2, 3, or 4), and we'll look at what each means.

Delete Statement = 1

Insert Statement = 2

Value before update statement = 3

Value after update statement = 4




Step 11-

To Disabled CDC Database Level and table level

 

EXECUTE sys.sp_cdc_disable_db;



EXEC sys.sp_cdc_disable_table

    @source_schema = 'dbo',

    @source_name   = 'CDC_TAB',

    @capture_instance = 'all';




Advantages and Disadvantages-

 

Advantages of CDC in SQL Server-

 

Efficient Change Tracking-

CDC provides an efficient mechanism for tracking changes to data without the need for manual triggers or logging mechanisms.

It captures changes at the database level, making it suitable for scenarios where detailed change history is required.

 

Minimized Performance Impact-

CDC is designed to have minimal impact on the performance of the database system.

It uses log-based change tracking, ensuring that changes are captured without affecting the operational performance of the database.

 

Support for ETL Processes-

CDC is often used in conjunction with Extract, Transform, Load (ETL) processes to identify and replicate changes to data between systems.

ETL processes can use CDC to efficiently identify and process only the changed data.

 

Built-in Change Tables-

CDC automatically creates change tables to store the historical changes, making it easy to query and analyze the historical data.

 

Integration with Replication-

CDC can be integrated with transactional replication to efficiently replicate changes to other databases or servers.

 

Disadvantages of CDC in SQL Server-

 

Configuration Complexity-

Setting up and configuring CDC can be complex, especially for users who are not familiar with the feature.

It requires specific permissions and steps to enable CDC on databases and tables.

 

Maintenance Overhead-

Change tables can grow in size over time, leading to increased storage requirements.

Regular maintenance is required to manage and clean up change tables, which may add administrative overhead.

 

Limited to Certain Editions-

CDC is not available in all editions of SQL Server. For example, it's not available in the Express edition, which may limit its use in certain environments.

 

Lack of Granular Control-

CDC operates at the database and table level, which means that it may capture more data than necessary in certain scenarios.

There is limited control over which specific columns or types of changes to capture.

 

Increased Storage Requirements-

The change tables can consume additional storage space, especially in scenarios where there are frequent and significant changes to the data.

 

In summary, CDC is a powerful feature for tracking changes in SQL Server, especially in scenarios where a comprehensive change history is required. However, its implementation should be carefully considered based on the specific requirements and constraints of the environment.

 

That covers everything regarding the CDC.

No comments:

Post a Comment

If you have any doubt or question, please contact us.