For business development, the database is probably the storage component that you can’t get away from. our usual development process just stay on the surface to understand it, such as writing a query sql to get specific data to the front-end on the line, but an in-depth understanding of the principles behind it is also critical.
So now start with me to see a query SQL behind the specific process! I’ll start with a simple query SQL, assuming that a table named example has been created in advance with only one ID field.
select * from example where ID=10;
When we enter this SQL in the MYSQL client you can see the results of the query it returns, MySQL actually does a lot of internal work to deal with it. So let’s break it down and see what “parts” are involved in the process.
Broadly speaking, MySQL can be divided into two parts: the Server tier and the Storage Engine tier.
The Server layer includes connectors, query caches, analyzers, optimizers, executors, etc. It covers most of MySQL’s core service functionality, as well as all of the built-in functions (such as date, time, math, and cryptographic functions, etc.), and all of the functionality that spans the storage engine is implemented at this layer, such as stored procedures, triggers, views, and so on.
And the storage engine layer is responsible for data storage and extraction. Its architectural pattern is plug-in and supports multiple storage engines such as InnoDB, MyISAM, Memory and so on. The most commonly used storage engine is InnoDB, which has been the default storage engine since MySQL version 5.5.5.
That is to say, when you execute create table to build a table, if you do not specify the engine type, the default is to use InnoDB. however, you can also choose another engine by specifying the type of storage engine, for example, use engine=memory in the create table statement to specify that you use the memory engine to create the table. Different storage engines have different ways of accessing table data and support different features.
It is easy to see from the figure that different storage engines share a Server layer, that is, the part from the connector to the executor. You can first have an impression of the name of each component, and then I will take you through the entire execution process in conjunction with the SQL statement mentioned at the beginning, and look at the role of each component in turn.
connectors
As a first step, you will first connect to this database, and this time you will be received by the connector. The connector is responsible for establishing a connection with the client, obtaining permissions, maintaining and managing the connection. The connection command is usually written like this:
mysql -h$ip -P$port -u$user -p
After you’ve entered the command, you’ll need to enter your password inside the interactive dialog. While it is possible to write your password on the command line directly after the -p, this may cause your password to be compromised. If you are connected to a production server, it is highly recommended that you do not do this.
The mysql connection command is a client-side utility used to establish a connection with the server. After the classic TCP handshake, the connector will start to authenticate you, using the username and password you entered.
- If the username or password is incorrect, you will receive an “Access denied for user” error and the client program will end.
- If the username and password are authenticated, the connector will look up the permissions you have in the permissions table. After that, the permission judgment logic inside this connection will depend on the permissions read at this point.
This means that after a user successfully establishes a connection, even if you make changes to that user’s permissions with the administrator account, it will not affect the permissions of the already existing connection. After the modification is complete, only newly created connections will use the new permission settings.
After the connection is complete, if you don’t follow up, the connection is in an idle state and you can show processlist
You can see it in the command. The image in the text is the result of show processlist, where the Command column shows the line “Sleep”, which means that there is now an idle connection in the system.
If the client doesn’t move for too long, the connector will automatically disconnect it. This time is determined by the parameter wait_timeout
The default value is 8 hours. If the client sends a request after the connection has been disconnected, it will receive an error alert: Lost connection to MySQL server during query. if you want to continue, you need to reconnect and execute the request again. In the database, a long connection means that after a successful connection, if the client continues to make requests, it will always use the same connection. A short connection is a connection that is disconnected after a few queries and re-established for the next query. The process of establishing a connection is usually complicated, so I suggest you minimize the action of establishing a connection, that is, try to use a long connection.
However, after using all long connections, you may find that there are times when MySQL’s memory usage goes up particularly fast. This is because the memory temporarily used by MySQL during execution is managed inside the connection object. This is because the memory that MySQL uses temporarily during execution is managed inside the connection object. These resources are only released when the connection is disconnected. So if the long connection accumulates, it may lead to too much memory consumption and be forced to kill by the system (OOM), from the phenomenon that MySQL restarts abnormally.
How to solve this problem? You can consider the following two options.
- Periodically disconnect long connections. After a period of use, or after a large memory-consuming query is judged to have been executed by the program, disconnect, and reconnect if you want to query afterward.
- If you’re using MySQL 5.7 or newer, you can do this after each relatively large operation by running the
mysql_reset_connection
to reinitialize the connection resources. This process does not require reconnecting and redoing privilege authentication, but it does restore the connection to the state it was in when it was first created.
query cache
Once the connection is established, you can execute the select statement. The execution logic comes to the second step: the query cache. when MySQL gets a query request, it first goes to the query cache to see if the statement has been executed before. the key is the query statement and its result may be cached directly in memory as a key-value pair. Previously executed statements and their results may be cached in memory as key-value pairs, with the key being the query statement and the value being the result of the query.
If your query can find the key directly in this cache, then the value is returned directly to the client. If the statement is not in the query cache, it continues to a later stage of execution. When execution is complete, the result is stored in the query cache. As you can see, if the query hits the cache, MySQL can return the result directly without performing the complex operations that follow, which can be very efficient.
But most of the time I would recommend that you don’t use query caching, why? Because query caching tends to do more harm than good.
The query cache expires so frequently that whenever there is an update to a table, all the query caches on that table are emptied. So it’s very likely that you’ve gone to the trouble of storing the results, and before you can use them, they’ll all be emptied by an update. For databases with high update pressure, the hit rate of the query cache will be very low. Unless your business just has a static table that only gets updated once in a very long time. For example, a system configuration table, then the queries on this table are only suitable for query caching.
The good news is that MySQL also provides this “on-demand” approach. You can set the parameter query_cache_type
set to DEMAND
THIS DOES NOT USE THE QUERY CACHE FOR ANY OF THE DEFAULT SQL STATEMENTS. FOR STATEMENTS THAT YOU ARE SURE YOU WANT TO USE THE QUERY CACHE, YOU CAN USE THE SQL_CACHE
Specify it explicitly, like the following statement:
select SQL_CACHE * from T where ID=10;
It should be noted that MySQL version 8.0 directly removed the whole piece of query caching functionality, which means that it is completely absent from 8.0 onwards.
analyzer
If you don’t hit the query cache, it’s time to start actually executing the statement. First, MySQL needs to know what you’re trying to do, so it needs to parse the SQL statement. The parser does a “lexical analysis” first. You’re typing a SQL statement that consists of strings and spaces, and MySQL needs to recognize what the strings are and what they represent. MySQL recognizes that this is a query statement from the keyword you typed, “select”. It also needs to recognize the string “T” as “table name T” and the string “ID” as “column ID”. “. After these identifications, it is necessary to do “syntactic analysis”. According to the result of lexical analysis, the syntax analyzer will be based on the syntax rules, to determine whether the SQL statement you entered to meet the MySQL syntax. If your statement is not correct, you will receive “You have an error in your SQL syntax” error alert, such as the following statement select less than the beginning of the letter “s”.
Usually grammatical errors indicate the first place where the error occurs, so you want to focus on what comes immediately after “use near”.
optimizer
After going through the parser, MySQL knows what you’re going to do. Before it starts executing, it has to go through the optimizer. The optimizer decides which index to use when there are multiple indexes inside a table, or the order in which tables are joined when there are multiple joins in a statement. For example, if you execute a statement such as the following, this statement performs a join of two tables:
select * from t1 join t2 using(ID) where t1.c=10 and t2.d=20;
- You can first take the ID value of the record c=10 from table t1, then relate it to table t2 based on the ID value, and then determine whether the value of d in t2 is equal to 20.
- You can also take out the ID value of the record with d=20 from table t2, and then associate it to t1 according to the ID value, and then determine whether the value of c in t1 is equal to 10.
The logical result of these two execution methods is the same, but the efficiency of execution will be different, and the role of the optimizer is to decide which option to choose to use. Once the optimizer phase is complete, the execution scheme for this statement is determined, and then the executor phase is entered.
actuators
MySQL knows what you want to do through the parser, knows what to do through the optimizer, and then enters the executor phase and starts executing the statement. When you start executing, you have to determine whether you have permission to execute queries on this table T. If not, it will return an error of no permission, as follows (in the engineering implementation, if you hit the query cache, you will do the permission verification when the query cache returns the results. The query also calls precheck to verify permissions before the optimizer).
mysql> select * from T where ID=10;
ERROR 1142 (42000): SELECT command denied to user 'b'@'localhost' for table 'T'
If it has permission, it opens the table to continue execution. When the table is opened, the executor goes on to use the interfaces provided by this engine, based on the table’s engine definition.
For example, in our example table t, the id field is not indexed, so the execution flow of the executor looks like this:
- The InnoDB engine interface is called to take the first row of the table and determine if the ID value is 10, if not then it is skipped, if it is then the row exists in the result set;
- The engine interface is called to fetch the “next row”, and the same judgment logic is repeated until the last row of the table is fetched.
- The executor returns to the client as a result set the set of records consisting of all the rows that satisfy the conditions of the above traversal process. At this point, the execution of this statement is complete.
For indexed tables, the execution logic is similar. The first call is to “fetch the first row that satisfies the condition” interface, and then loop to fetch “the next row that satisfies the condition” interface, which are already defined in the engine.
You’ll see in the database’s slow query log arows_examined
field that indicates how many rows were scanned during the execution of this statement. This value is what is accumulated each time the executor calls the engine to fetch rows of data.
In some scenarios, the executor is called once, while inside the engine multiple lines are scanned, so the number of lines scanned by the engine is similar to the number of lines scanned by the rows_examined
It’s not exactly the same.