A brief description of SQL Server's CDC-
Change Data Capture,
or CDC for short, was initially introduced in the SQL Server 2008 edition as a
useful feature to track and record changes made to SQL Server database tables
without requiring extra programming work. Prior to SQL Server 2016, a SQL
Server database could only have Change Data Capture enabled with the SQL Server
Enterprise edition; however, as of SQL Server 2016, this is no longer
necessary.
A mirrored table
with the same column structure as the source tables and extra columns to
capture the description of these changes is created by Change Data Capture,
which keeps track of the INSERT, UPDATE, and DELETE actions on the database
table. For each INSERT statement, SQL Server creates one record that displays
the values that are inserted; for each DELETE statement, it creates one record
that displays the data that is deleted; and for each UPDATE statement, it
creates two records—one that displays the data that was there prior to the
change and the other that displays the data that was there after it was made.
Design and configuration-
A
SQL Server instance must have a SQL Server Agent active in order to use Change
Data Capture. A SQL Server database table that has this feature enabled will
have two SQL Server Agent jobs created for it. The first job's task is to
update database change tables with the updated data, and the second job's task
is to tidy up the change tables by removing records that are older than the
three-day configurable retention period. As the data source changes, Change
Data Capture is dependent on the SQL Server Transaction Log. A modification
will be recorded in the Transaction Log file as soon as it is made.
Change Data Capture as a Solution for Audits-
Without
the ability to track SELECT statements, Change Data Capture can be used as an
asynchronous SQL Server Audit solution to monitor and audit DML changes made to
the table, such as INSERT, UPDATE, and DELETE operations.
Change
Data Capture is an effective SQL Server audit solution because it is simple to
set up with a few T-SQL statements, offers historical data about values prior
to the change process, and gives comprehensive information on the data
modification procedure.
Let's see how we can audit the SQL Server DML modifications
using Change Data Capture.
Enabling DAC-
Step 1-
Open the Management Studio for Microsoft SQL Server and make a
new database and
Create Database CDC1
use CDC1
CREATE TABLE CDC_TAB
(
ID INT,
Name VARCHAR (50),
Place VARCHAR (50),
Salary INT,
)
Step 2-
After creating the
previously mentioned database and table, use the following Insert SQL
Statements to add some data to the CDC_TAB table.
Step 3-
CDC use in SQL
Server
In SQL Server, CDC
must be enabled in two steps. They are listed in the following order:
CDC must first be
enabled at the database level.
Next, we must
specify which tables require CDC activation.
Enable CDC on
Database Level:
To enable CDC on the
database level, run the following stored procedure in the system.
Step 4-
Enable CDC at Table
Level
Using the CDC1
database, we generated the CDC_TAB table. Let's now activate CDC for this
Employee table. As demonstrated below, we must run the system stored procedure
sp_cdc_enable_table in order to enable CDC at the table level. Here, we are
giving the stored procedure access to the schema, table name, and roll.
EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'CDC_TAB',
@role_name = null
Step 5-
The SQL Server Agent
service must be running for the CDC to operate
Step 6-
The tables that
truly record insert, update, and delete action for a specific table are called
_CT tables. For instance, every Insert, Update, and Delete operations made on
the CDC_TAB table will be recorded in the dbo_CDC_TAB_CT table. A table named
dbo_EMP_CT will be created to record all DML operations made on the EMP table
if you have another table named EMP and you enable CDC for that table.
Step 7-
For instance,
updating data in the CDC_TAB table
UPDATE CDC_TAB SET Name = 'Ayush' WHERE ID = 7
UPDATE CDC_TAB SET Name = 'Abhishek' WHERE ID = 6
Step 7-
It will be recorded
in the system table.
Step 8-
Deleting from the CDC_TAB
table, for instance
To observe what
occurs in the dbo_CDC_TAB_CT table, let's remove one record from the CDC_TAB.
In order to remove the CDC_TAB with ID number three, please run the SQL
statement below.
DELETE FROM CDC_TAB WHERE ID = 7
Please review the
dbo_CDC_TAB_CT table as indicated below after deleting the CDC_TAB. It creates
a single record for the removed CDC_TAB, as you can see.
Step 9-
Example: Inserting
into the CDC_TAB table.
Let's add one record
to the CDC_TAB
INSERT INTO CDC_TAB VALUES (8, 'Yash', 'meerut', 900000)
Step 10 –
Identify which row
is utilized for what type of activity.
The operation column
can take one of four values (1, 2, 3, or 4), and we'll look at what each means.
Delete Statement = 1
Insert Statement = 2
Value before update statement = 3
Value after update statement = 4
Step 11-
To Disabled CDC Database Level and
table level
EXECUTE sys.sp_cdc_disable_db;
EXEC sys.sp_cdc_disable_table
@source_schema = 'dbo',
@source_name = 'CDC_TAB',
@capture_instance = 'all';
Advantages and Disadvantages-
Advantages of CDC in SQL Server-
Efficient Change Tracking-
CDC provides an efficient mechanism for tracking changes to
data without the need for manual triggers or logging mechanisms.
It captures changes at the database level, making it
suitable for scenarios where detailed change history is required.
Minimized Performance Impact-
CDC is designed to have minimal impact on the performance of the
database system.
It uses log-based change tracking, ensuring that changes are
captured without affecting the operational performance of the database.
Support for ETL Processes-
CDC is often used in conjunction with Extract, Transform,
Load (ETL) processes to identify and replicate changes to data between systems.
ETL processes can use CDC to efficiently identify and
process only the changed data.
Built-in Change Tables-
CDC automatically creates change tables to store the
historical changes, making it easy to query and analyze the historical data.
Integration with Replication-
CDC can be integrated with transactional replication to
efficiently replicate changes to other databases or servers.
Disadvantages of CDC in SQL Server-
Configuration Complexity-
Setting up and configuring CDC can be complex, especially
for users who are not familiar with the feature.
It requires specific permissions and steps to enable CDC on
databases and tables.
Maintenance Overhead-
Change tables can grow in size over time, leading to
increased storage requirements.
Regular maintenance is required to manage and clean up
change tables, which may add administrative overhead.
Limited to Certain Editions-
CDC is not available in all editions of SQL Server. For
example, it's not available in the Express edition, which may limit its use in
certain environments.
Lack of Granular Control-
CDC operates at the database and table level, which means
that it may capture more data than necessary in certain scenarios.
There is limited control over which specific columns or
types of changes to capture.
Increased Storage Requirements-
The change tables can consume additional storage space,
especially in scenarios where there are frequent and significant changes to the
data.
In summary, CDC is a powerful feature for tracking changes
in SQL Server, especially in scenarios where a comprehensive change history is
required. However, its implementation should be carefully considered based on
the specific requirements and constraints of the environment.
That covers everything regarding the CDC.
No comments:
Post a Comment
If you have any doubt or question, please contact us.