Its all about Database: MySQL Architecture

Showing posts with label MySQL Architecture. Show all posts

Oct 26, 2021

InnoDB Multi-Versioning

InnoDB is a multi-versioned storage engine: it keeps information about old versions of changed rows, to support transactional features such as concurrency and rollback. This information is stored in the tablespace in a data structure called a rollback segment (after an analogous data structure in Oracle). InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.

Internally, InnoDB adds three fields to each row stored in the database. A 6-byte DB_TRX_ID field indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted. Each row also contains a 7-byte DB_ROLL_PTR field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated. A 6-byte DB_ROW_ID field contains a row ID that increases monotonically as new rows are inserted. If InnoDB generates a clustered index automatically, the index contains row ID values. Otherwise, the DB_ROW_ID column does not appear in any index.

Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are needed only in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are used also in consistent reads, but they can be discarded only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could need the information in the update undo log to build an earlier version of a database row.

Commit your transactions regularly, including those transactions that issue only consistent reads. Otherwise, InnoDB cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up your tablespace.

The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.

In the InnoDB multi-versioning scheme, a row is not physically removed from the database immediately when you delete it with an SQL statement. InnoDB only physically removes the corresponding row and its index records when it discards the update undo log record written for the deletion. This removal operation is called a purge, and it is quite fast, usually taking the same order of time as the SQL statement that did the deletion.

If you insert and delete rows in smallish batches at about the same rate in the table, the purge thread can start to lag behind and the table can grow bigger and bigger because of all the “dead” rows, making everything disk-bound and very slow. In such a case, throttle new row operations, and allocate more resources to the purge thread by tuning the innodb_max_purge_lag system variable. See Section 15.13, “InnoDB Startup Options and System Variables” for more information.

Multi-Versioning and Secondary Indexes

InnoDB multiversion concurrency control (MVCC) treats secondary indexes differently than clustered indexes. Records in a clustered index are updated in-place, and their hidden system columns point undo log entries from which earlier versions of records can be reconstructed. Unlike clustered index records, secondary index records do not contain hidden system columns nor are they updated in-place.

When a secondary index column is updated, old secondary index records are delete-marked, new records are inserted, and delete-marked records are eventually purged. When a secondary index record is delete-marked or the secondary index page is updated by a newer transaction, InnoDB looks up the database record in the clustered index. In the clustered index, the record's DB_TRX_ID is checked, and the correct version of the record is retrieved from the undo log if the record was modified after the reading transaction was initiated.

If a secondary index record is marked for deletion or the secondary index page is updated by a newer transaction, the covering index technique is not used. Instead of returning values from the index structure, InnoDB looks up the record in the clustered index.

However, if the index condition pushdown (ICP) optimization is enabled, and parts of the WHERE condition can be evaluated using only fields from the index, the MySQL server still pushes this part of the WHERE condition down to the storage engine where it is evaluated using the index. If no matching records are found, the clustered index lookup is avoided. If matching records are found, even among delete-marked records, InnoDB looks up the record in the clustered index.

Best Practices for InnoDB Tables

Specifying a primary key for every table using the most frequently queried column or columns, or an auto-increment value if there is no obvious primary key.
Using joins wherever data is pulled from multiple tables based on identical ID values from those tables. For fast join performance, define foreign keys on the join columns, and declare those columns with the same data type in each table. Adding foreign keys ensures that referenced columns are indexed, which can improve performance. Foreign keys also propagate deletes or updates to all affected tables, and prevent insertion of data in a child table if the corresponding IDs are not present in the parent table.
Turning off autocommit. Committing hundreds of times a second puts a cap on performance (limited by the write speed of your storage device).
Grouping sets of related DML operations into transactions, by bracketing them with START TRANSACTION and COMMIT statements. While you don't want to commit too often, you also don't want to issue huge batches of INSERT, UPDATE, or DELETE statements that run for hours without committing.
Not using LOCK TABLES statements. InnoDB can handle multiple sessions all reading and writing to the same table at once, without sacrificing reliability or high performance. To get exclusive write access to a set of rows, use the SELECT ... FOR UPDATE syntax to lock just the rows you intend to update.
Enabling the innodb_file_per_table option or using general tablespaces to put the data and indexes for tables into separate files, instead of the system tablespace.
The innodb_file_per_table option is enabled by default.
Evaluating whether your data and access patterns benefit from the InnoDB table or page compression features. You can compress InnoDB tables without sacrificing read/write capability.
Running your server with the option --sql_mode=NO_ENGINE_SUBSTITUTION to prevent tables being created with a different storage engine if there is an issue with the engine specified in the ENGINE= clause of CREATE TABLE.

InnoDB Engine

InnoDB is a general-purpose storage engine that balances high reliability and high performance. In MySQL 8.0, InnoDB is the default MySQL storage engine. Unless you have configured a different default storage engine, issuing a CREATE TABLE statement without an ENGINE= clause creates an InnoDB table.

Key Advantages of InnoDB

Its DML operations follow the ACID model, with transactions featuring commit, rollback, and crash-recovery capabilities to protect user data. See Section 15.2, “InnoDB and the ACID Model” for more information.
Row-level locking and Oracle-style consistent reads increase multi-user concurrency and performance. See Section 15.7, “InnoDB Locking and Transaction Model” for more information.
InnoDB tables arrange your data on disk to optimize queries based on primary keys. Each InnoDB table has a primary key index called the clustered index that organizes the data to minimize I/O for primary key lookups. See Section 15.6.2.1, “Clustered and Secondary Indexes” for more information.
To maintain data integrity, InnoDB supports FOREIGN KEY constraints. With foreign keys, inserts, updates, and deletes are checked to ensure they do not result in inconsistencies across different tables. See Section 15.6.1.5, “InnoDB and FOREIGN KEY Constraints” for more information.

Transactional Storage of Dictionary Data

The data dictionary schema stores dictionary data in transactional (InnoDB) tables. Data dictionary tables are located in the mysql database together with non-data dictionary system tables.

Data dictionary tables are created in a single InnoDB tablespace named mysql.ibd, which resides in the MySQL data directory. The mysql.ibd tablespace file must reside in the MySQL data directory and its name cannot be modified or used by another tablespace.

Dictionary data is protected by the same commit, rollback, and crash-recovery capabilities that protect user data that is stored in InnoDB tables.

Removal of File-based Metadata Storage

In previous MySQL releases, dictionary data was partially stored in metadata files. Issues with file-based metadata storage included expensive file scans, susceptibility to file system-related bugs, complex code for handling of replication and crash recovery failure states, and a lack of extensibility that made it difficult to add metadata for new features and relational objects.

The metadata files listed below are removed from MySQL. Unless otherwise noted, data previously stored in metadata files is now stored in data dictionary tables.

.frm files: Table metadata files. With the removal of .frm files:
- The 64KB table definition size limit imposed by the .frm file structure is removed.
- The INFORMATION_SCHEMA.TABLES VERSION column reports a hardcoded value of 10, which is the last .frm file version used in MySQL 5.7.
.par files: Partition definition files. InnoDB stopped using partition definition files in MySQL 5.7 with the introduction of native partitioning support for InnoDB tables.
.TRN files: Trigger namespace files.
.TRG files: Trigger parameter files.
.isl files: InnoDB Symbolic Link files containing the location of file-per-table tablespace files created outside of the data directory.
db.opt files: Database configuration files. These files, one per database directory, contained database default character set attributes.

Serialized Dictionary Information (SDI)

In addition to storing metadata about database objects in the data dictionary, MySQL stores it in serialized form. This data is referred to as serialized dictionary information (SDI). InnoDB stores SDI data within its tablespace files. NDBCLUSTER stores SDI data in the NDB dictionary. Other storage engines store SDI data in .sdi files that are created for a given table in the table's database directory. SDI data is generated in a compact JSON format.

Serialized dictionary information (SDI) is present in all InnoDB tablespace files except for temporary tablespace and undo tablespace files. SDI records in an InnoDB tablespace file only describe table and tablespace objects contained within the tablespace.

SDI data is updated by DDL operations on a table or CHECK TABLE FOR UPGRADE. SDI data is not updated when the MySQL server is upgraded to a new release or version.

The presence of SDI data provides metadata redundancy. For example, if the data dictionary becomes unavailable, object metadata can be extracted directly from InnoDB tablespace files using the ibd2sdi tool.

For InnoDB, an SDI record requires a single index page, which is 16KB in size by default. However, SDI data is compressed to reduce the storage footprint.

For partitioned InnoDB tables comprised of multiple tablespaces, SDI data is stored in the tablespace file of the first partition.

The MySQL server uses an internal API that is accessed during DDL operations to create and maintain SDI records.

The IMPORT TABLE statement imports MyISAM tables based on information contained in .sdi files.

Sep 22, 2021

Table Level Replication in MySQL on Windows OS

Hello guys,

Hope you are going great.

In this article, we are going to talk about configuring MySQL replication between two servers. Here there will be one master and one slave server. It’s simple setup there are basically 6 major steps which are below…

Step 1 -->

============================================================================

/*Take dump of the tables which we are planning to replicate*/

mysqldump -u mysqlbk -p --master-data=2 -f -vv DB_Name Table_1 Table_2 >D:\DB_Name_tablesdump.sql

Step 2 -->

===============================================================================

/*Once backup starts, please note-down master position from the dump file...*/

-- Position to start replication or point-in-time recovery from

-- CHANGE MASTER TO MASTER_LOG_FILE='CLTSQLPW01-bin.000040', MASTER_LOG_POS=146261;

Step 3 -->

===============================================================================

Restore ON SLAVE SERVER

Copy the dump file on slave and restore the dump. */

mysql -u mysqlbk -p DB_Name -vv < D:\DB_Name_tablesdump.sql

Step 4

===============================================================================

/* Slave user creation. Create the user on master server */

mysql> create user 'clt_exslave'@'10.10.19.13' IDENTIFIED by "**********";

mysql> Grant replication slave on DB_Name.* to 'clt_exslave'@'10.10.19.13';

mysql> flush PRIVILEGES;

Step 5 -->

===============================================================================

/*Change my.cnf in SLAVE server with the listed tables.*/

#Start tables for replication

replicate-do-table = DB_Name.Table_1

replicate-do-table = DB_Name.Table_2

--Note for sample we have copied only two table here.

default-authentication-plugin=mysql_native_password

server-id=100 /*make sure server id must be different on both the servers.*/

server-id=101 /*make sure server id must be different on both the servers.*/

Restart MySQL on SLAVE Server.

Step 6 -->

===============================================================================

/*Run below command on SLAVE server*/

mysql> CHANGE REPLICATION SOURCE TO SOURCE_HOST='10.10.19.5',SOURCE_USER='clt_exslave',SOURCE_PASSWORD='********',SOURCE_LOG_FILE='CLTSQLPW01-bin.000040',SOURCE_LOG_POS=146261,SOURCE_PORT=3306;

mysql> START REPLICA;

mysql> SHOW REPLICA STATUS\G

Its done!!!!!

===============================================================================

Sep 7, 2021

InnoDB Architecture

Buffer Pool

In MySQL, buffer pool is an area in main memory where InnoDB caches table and index data as it is accessed. The buffer pool permits frequently used data to be accessed directly from memory, which speeds up processing. On dedicated servers, up to 80% of physical memory is often assigned to the buffer pool.

For efficiency MySQL use Least Recently Used (LRU) algorithm in order to manage new pages.

Buffer Pool LRU Algorithm

The buffer pool is managed as a list using a variation of the LRU algorithm. When room is needed to add a new page to the buffer pool, the least recently used page is evicted and a new page is added to the middle of the list. This midpoint insertion strategy treats the list as two sub-lists:

At the head, a sub-list of new (“young”) pages that were accessed recently
At the tail, a sub-list of old pages that were accessed less recently

The algorithm keeps frequently used pages in the new sub-list. The old sub-list contains less frequently used pages; these pages are candidates for eviction.

By default, the algorithm operates as follows:

3/8 (37%) of the buffer pool is devoted to the old sub-list.
The midpoint of the list is the boundary where the tail of the new sub-list meets the head of the old sub-list.
When InnoDB reads a page into the buffer pool, it initially inserts it at the midpoint (the head of the old sub-list). A page can be read because it is required for a user-initiated operation such as an SQL query, or as part of a read-ahead operation performed automatically by InnoDB.
Accessing a page in the old sub-list makes it “young”, moving it to the head of the new sub-list.
As the database operates, pages in the buffer pool that are not accessed “age” by moving toward the tail of the list.

Monitoring the Buffer Pool Using the InnoDB Standard Monitor

InnoDB Standard Monitor output, which can be accessed using SHOW ENGINE INNODB STATUS, provides metrics regarding operation of the buffer pool. Buffer pool metrics are located in the BUFFER POOL AND MEMORY section of InnoDB Standard Monitor output:

---------------------- BUFFER POOL AND MEMORY ---------------------- Total large memory allocated 2198863872 Dictionary memory allocated 776332 Buffer pool size 131072 Free buffers 124908 Database pages 5720 Old database pages 2071 Modified db pages 910 Pending reads 0 Pending writes: LRU 0, flush list 0, single page 0 Pages made young 4, not young 0 0.10 youngs/s, 0.00 non-youngs/s Pages read 197, created 5523, written 5060 0.00 reads/s, 190.89 creates/s, 244.94 writes/s Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000 Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s LRU len: 5720, unzip_LRU len: 0 I/O sum[0]:cur[0], unzip sum[0]:cur[0]

InnoDB Buffer Pool Metrics