Essential Knowledge of MySQL Database
How is a query processed?
Connector: Long connection, Short connection, Reset Connection, etc.
Cache: Optional for the query. However, the cache is not useful because the data might be updated frequently. It is now not supported in MySQL8.0
Analyzer: lexical analysis. Syntax analysis. Including checking whether a querying column exists or not.
Optimizer: Decide using which index. Start by reading which table while executing join
.
Executor: Authentication. Maintaining binglog. Interacting with Storage Engine.
Storage Engine:
- InnoDB
- MyISAM: doesn’t support Txn, row lock, redolog-based crash-safe
- Memory
How is an update operation executed?
Executor: binlog
Storage Engine: redolog, crash-safe
The two-phase committed of update: prepare redolog → update binlog → commit redolog
Process of restoring a temporary DB after misoperations such as accidental deletion: getting started from the recent backup DB → relay the binlog from the backup time to the time of accidental deletion → If the operations after the accidental deletion involves business, it needs to be reconciled with the business to recover.
Backup Period: daily, weekly, and monthly, etc
Determining which kind of Backup frequency to use is a trade-off between The Frequency of Backup and Recovery Time Objective(RTO).
The Isolation of Transaction
The property of Transaction: ACID (Atomicity, Consistency, Isolation, Durability)
Possible Pitfalls/Hazards: Dirty Read, Non-repeatable Read, Phantom Read
Isolation Level: RU, Read Uncommitted < RC, Read Committed < RR, Repeatable Read < SR, Serizable Read
Avoid the Long Txn: set the autocommit
as 1
to enable implicitly starting the Txn while executing a single command. For example, the start and end of a select
operation also means the start and end of the corresponding Txn.
Detect the Long Txn: select * from information_schema.innodb_trx where TIME_TO_SEC(timediff(now(),trx_started))>60
Under the hood:
undo log
, read-view(读视图), MVCC(Multiple Version Concurrent Control)- Scenarios:
- Repeatable Read: In a Txn, operations can read the data committed before the creation of
read-view
of this Txn. - Read Committed: In a Txn, operations can read the committed update before it.
- Special Case: Current Read
- All of the
DML
will read the current value first and then update/insert the value. - The same thing goes for
select
operation if it isselect .. lock in share mode
orselect ... for update
. They are acquiringread lock
orwrite lock
respectively.
- All of the
- Repeatable Read: In a Txn, operations can read the data committed before the creation of
- Principle:
- In Repeatable Read,
- The creation time of the
read-view
depends on the start time of Txn- Txns start with
start transaction with consistent snapshot
is regarded asstarted
immediately. - Or An operation is executed after the beginning of a Txn.
- Txns start with
snapshot
/read-view
is implemented based on theundo log
- Every Txn will maintain
- an array of txn_id, including all of the uncommitted txn_id.
- The lower bound of txn_id and the upper bound of txn_id determine the
read-view
range.
- Every Txn will maintain
- The txn_id is lesser than the lower bound of the current txn_id: already committed before the current Txn, therefore they are visible
- The txn_id higher than the upper bound of the current txn_id: is not committed, so they are not visible
- The txn_id in the range of [lower_bound, upper_bound]:
- If the txn_id is not in the array: it has been committed and is visible.
- If the txn_id is in the array: it has not been committed yet, non-visible.
- Every Txn will maintain
- The creation time of the
- In Repeatable Read,
- Scenarios:
- The time to delete the
undo log
: depends on the earliestread-view
, so try to avoid the long Txn
A previous real deadlock case in my work.
- Two Txn are trying to update the same order concurrently
- They almost get the
read lock
byselect * from order_fee_tab where order_id = 1
at the beginning at the same time. - There are lots of calculation/upstream requests which take time.
- The first Txn tries to execute
update order_fee_tab set fee_amount = amount_2 where order_id = 1
the value will be blocked because it is trying to wait until another Txn is still holding theread lock
to get thewrite lock
. - The second Txn tries to execute
update order_fee_tab set fee_amount = amount_3 where order_id = 1
the value will cause a deadlock because it is also trying to wait until another Txn is still holding theread lock
to get thewrite lock
.
More: The deadlock happens more frequently when the
DML
isinsert into order_fee_tab (order_id, fee_amount) values (2, 2)
because thewrite lock
of insert is more strict, including the gap lock, and next-key lock.
Index
Index: used to expedite the query process
Index struct: Hash Table, Array, Binary Balance Tree, B+ Tree
Hash Table:
- Pros: Easy to query and add Key-Value Data
- Use Linked List to solve hash coalition problem
- Cons: Hard to query a range of data, need to scan all data
- Applicable Scenario: 只有等值查询
Array:
- Pros: Easy to query data(binary query O(log(N))), easy to query a range of data
- Cons: Hard to insert/update data, the time cost is high to relocate/rearrange the data
- Applicable Scenario: Static Data Storage
Binary Balance Tree:
- Pros: Easy to query data(binary query O(log(N))), easy to query a range of data. The time complexity of an update/insert is O(log(N)) to keep the binary tree balanced.
- Cons: The time of query depends on the layers of the tree, normally 1M nodes of data will need a tree with 20 layers.
B+ Tree:
- Terms:
- N-Node Tree; If the N is 1200, there will be only 4 layers of tree to store 1.7B data.
1200^3
- Primary Key Index(Clustered Index)
- Key: Primary Key
- Value: the full columns
- Non-Primary Key Index(aka Normal Index, Non-Clustered Index)
- Key: Non-Primary Key
- Value: Primary Key → Lookup/Ref(回表)
- N-Node Tree; If the N is 1200, there will be only 4 layers of tree to store 1.7B data.
- Main Mechanisms:
- Multiple-Column Indexes
- leftmost prefix principle(最左匹配)
- MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on.
- [MySQL8.0 Manul](https://dev.mysql.com/doc/refman/8.0/en/multiple-column-indexes.html#:~:text=For example%2C if you have a three-column index on (col1%2C col2%2C col3)%2C you have indexed search capabilities on (col1)%2C (col1%2C col2)%2C and (col1%2C col2%2C col3).) For example, if you have a three-column index on
(col1, col2, col3)
, you have indexed search capabilities on(col1)
,(col1, col2)
, and(col1, col2, col3)
.
- Index condition pushdown(索引下推) is introduced in MySQL 5.6, in order to reduce the lookup times by checking the value of the rest existing indexes in the tree.
- Best Practises:
- Use auto-increment for the primary index to avoid rearranging nodes(merge, split)
- Don’t use an unordered/big integer as the Primary Key to avoid the abuse of capacity for building a Non-Primary Index
- Better to use
- Reset the Primary Key Index Tree
- It will reset the Non-Primary Key Index Tree as well
- Reset the Normal Index Tree
- May impact the query request traffic
Others: Skip Table; LSM; etc
Lock
Global Lock
FTWRL(Flush Table with Read Lock).
- It will block all of the operations such as DML, DDL, and COMMIT of TCL. The table is only readable when the FTWRL is on.
- Scenario: Logical Backup of the Whole DB.
mysqldump -single-transaction
in Innodb engine- With
-single-transaction
, the backup can be done in a repeatable txn, in which a view is supported by MVCC of InnoDB. - Best choice to backup DB, because it will not affect all of the other operations.
- With
Tips: There is another command that can be used to backup DB:
set global readonly=true
- It will set the DB to readable only globally.
- The
FTWRL
is even better than this because- This
readonly
env variable is used to judge whether a DB is master/slave DB. - Disaster/Exception Recovery:
FTWRL
can be released if the client crashes but this command cannot.
- This
Table Lock
- Lock and Unlock Commands such as
lock tables t1 read, t2 write
- It is not recommended since InnoDB supports the Row Lock.
- MDL(Meta Data Lock)
- MDL-Read
- This lock will be used in reading.
- The lock will not be released before the Txn commit
- MDL-Write
- It will be captured when there is a DDL request
- If there is an MDL-Write in the MDL lock waiting queue, all of the rest MDL lock requests will be blocked.
- Best Practise:
- If there is a long txn with MDL-Read → a DDL is executed, which means the MDL-Write is in the waiting queue, the subsequent operations will be all blocked until the long Txn is completed.
- Therefore, to avoid the long blocking of operations, we can
- Avoid conducting long txn
- Use
wait n
params in DDL, in order to give up the waiting DML-Write. Can retry until it succeeds.
- MDL-Read
Row Lock
- Two-Phase Lock Protocol
- Try to acquire the lock when need it.
- Release lock when the Txn is completed or the sentence is completed.
- Deadlock and Deadlock Detection
- Strategy to process Deadlock
- Wait until timeout. This can be set by
innodb_lock_wait_timeout=Xs
- Detect the deadlock proactively and roll back one of the Txn in the deadlock queue.
innodb_deadlock_detect=on
- Wait until timeout. This can be set by
- Avoid Deadlock
- Ensure the deadlock doesn’t happen.
- Reduce the concurrency, that is, to control the QPS of update/insert in which acquire the same lock.
- Analyze the biz operations
- Introduce rate-limiter in proxy of different layer
- Sharing the original data into different rows/data
- Strategy to process Deadlock
Others
Common Conceptions
These SQL commands are mainly categorized into five categories link
- DDL – Data Definition Language
CREATE, ALTER, TRUNCATE, RENAME, DROP
- DML – Data Manipulation Language
INSERT, UPDATE, DELETE, CALL, EXPLAIN CALL, LOCK
- DQL – Data Query Language
SELECT
- DCL – Data Control Language
GRANT, REVOKE
- TCL – Transaction Control Language
COMMIT, SAVEPOINT, ROLLBACK, SET TRANSACTION, SET CONSTRAINT
Lock
- Read Lock, also known as Shared Lock or S Lock, is not conflicted with Read Lock
- Write Lock, also known as Exclusive Lock or X Lock, is an exclusive lock. It will block all of the other requests that use locks.
- DDL – Data Definition Language