Other things being equal, the level of comfort a developer has with the underlying database defines their seniority level. Little database and little coding experience = junior developer; little database and good coding experience = mid-level developer; good database and good coding experience = senior developer. It’s a harsh reality that even devs with 6-8 years under their belt struggle to explain the intricacies of the query optimizer and prefer to look heavenwards when asked about database tuning. Why? Surprisingly, the reason isn’t laziness (although in some part it is). The point is that databases are a force of their own to contend with. Even traditionally, when there were only the relational types of databases to deal with, mastering them was a miracle and career path in its own; these days, we have so many types of databases that it’s just impossible to expect a single, mortal soul to master everything. That said, there’s a good chance that you’re still happy with relational databases or are part of a team that has a product running on a relational database satisfactorily for a long, long time. And in nine cases out of ten, you’re on MySQL (or MariaDB). For these cases, diving just a little deeper under the hood yields massive benefits in boosting application performance and is every bit worth learning. Curious? Let’s dive in! Not curious? Well, dive in anyway, because your career depends on it! 😛
Optimize the MySQL query cache
Almost all optimization in the field of computers comes down to caching. On one end, the CPU maintains several levels of cache to speed up its computations, and on the other, web apps make aggressive use of caching solutions like Redis to server precomputed results to users rather than hitting the database every time. But hey, even the poor MySQL database has its own query cache! That is, every time you query something, and the data is still stale, MySQL will serve up these cached results rather than running the query again, making the app ridiculously faster. You can check whether you have query cache available (note, available, not enabled) in your database by running this query in the database console: So, you can see that I’m running MariaDB and that I have query caching available to be turned on. It’s extremely unlikely that you’d have it turned off if you’re using a standard MySQL install. Now let’s see if I have the query cache actually turned on: Yes, I do. But in case you don’t, you can turn it on by saying: Interestingly, this variable also accepts a third value that denotes “on-demand,” meaning MySQL will cache only those queries we tell it to, but we won’t go into that here. With this, you have query caching on and have taken the first step towards a more robust MySQL setup! I say the first step because while turning it on is a major improvement, we do need to tune query caching to suit our setup. So let’s learn to do that. The other variable of interest here is query_cache_size, whose function is self-explanatory: So, I have a query cache of around 16 MB size. Note that even if query caching is turned on, but this size is zero, caching is effectively off. That’s why checking only one variable isn’t enough. Now, you should set a query cache size, but how much should it be? First off, please note that the query caching feature will itself need 4 KB to store its metadata, so whatever you select should be above that. Let’s say you set the query cache size to be 500 KB: MariaDB [(none)]> SET GLOBAL query_cache_size = 500000; Is doing this much enough? Well, no, because how the query engine will actually end up performing depends on a couple more things:
First off, the query_cache_size variable must be large enough to hold your the result of your queries. If it’s too small, nothing will be cached.Secondly, if query_cache_size is set to too high a number, there will be two types of problems: 1) The engine will have to do extra work storing and locating query results in this massive memory area. 2) If most of the queries result in much smaller sizes, the cache will get fragmented, and the benefits of using a cache will get lost.
How do you know the cache is getting fragmented? Check the total number of blocks in the cache like this: If the number is very high, the cache is fragmented and needs to be flushed. So, to avoid these problems, make sure that the size of query_cache_size is chosen wisely. If you feel frustrated that I didn’t leave you with a concrete number here, I’m afraid that’s the way things are once you move past development and step into engineering. You must look into the app you’re running and see what the query sizes for the important query results are and then set this number. And even then you might end up making a mistake. 🙂
Threading, thread pools, waiting, and timeouts
This is probably the most interesting part of how MySQL works and getting it right means making your app several times faster! Threading MySQL is a multi-threaded server. That means, every time there’s a new connection to the MySQL server, it opens a new thread with the connection data and passes off a handle to it to the client (Read this article if you’re wondering what is a thread). The client then sends all queries over this thread and receives results. This leads us to ask a natural question: how many threads can MySQL spin up? The answer lies in the next section. Thread pool No program in a computer system can open as many threads as it wants. The reason is twofold: 1) Threads cost memory (RAM), and the operating system just won’t allow you to go berserk and eat up all of it. 2) Managing, say, a million threads is a massive task on its own, and if the MySQL server could create that many threads, it would die trying to deal with the overhead. To avoid these problems, MySQL comes with a thread pool — a fixed number of threads that are part of a pool at the start. New connection requests cause MySQL to pick up one of these threads and return the connection data, and if all the threads are used up, new connections are naturally refused. Let’s see how large the thread pool is: So, my machine allows a maximum of four connections at the same time. It’s interesting to note that the number 4 comes from the fact that I have a four-core processor, which means my computer can run only 4 parallel tasks at a time (I’m talking of truly parallel tasks here, not concurrent ones). Ideally, this is the limit that should drive the value of thread_pool_size, but on beefier machines increasing it does benefit to a point. If you want not to make all new connections wait and are okay to take some performance hit (again, this is an area that you can judge best based on your app’s performance under load), bumping it up to 8 might be a good idea. However, setting it beyond 16 is a terrible idea unless you have a 32-core machine, as performance degrades significantly. The rabbit hole of thread pools in MySQL goes deep, but if you’re interested, here’s a more detailed discussion.
Waiting and timeouts
Once a thread has been created and attached to a client, it’d be a waste of resources if the client sent no queries for the next few seconds (or minutes). As a result, MySQL terminates a connection after a period of inactivity. This is controlled by the wait_timeout variable: The resulting value is in seconds. So yes, by default MySQL is set to wait 8+ hours before it severs the cord! This may be good if you have long-running queries and actually want to wait for them (but even then, eight hours is absurd!) but terrible in most cases. When a query is run, this value is set to 0 (meaning forever), but generally, this should be set to a very low value (5 seconds, for example, or maybe even less) to free up the connection for other processes.
Tuning temporary tables
Let’s begin with what are temporary tables in MySQL. Suppose we have a MySQL that structurally looks like this: TABLE A UNION (TABLE B INNER JOIN C). That is, we’re interested in joining tables B and C, and then performing a union of the result with table A. Now, MySQL would first proceed to join the tables B and C, but before it can perform a union, it needs to store this data somewhere. This is where temporary tables come in — MySQL uses them to store data at intermediate stages in complex queries temporarily, and once the query is over, this temporary table gets discarded. Now the question is: why should we bother with all this? Simply because the temporary table, just a query result, is data that is being used by MySQL in computation, and the speed of its access (among other limitations) will determine how fast the query gets executed. For instance, storing the temporary table in RAM will be several times faster than storing it on the disk. There are two variables that control this behavior: The first one, max_heap_table_size, tells us how much RAM can be used up by a MySQL table (“heap” here refers here to the data structure used in RAM allocation and management — read more here), while the second one, tmp_table_size, shows what the maximum size of the temporary table is. In my case, both are set to 16 MB, though the point I’m trying to make it that increasing only tmp_table_size will not work as overall, MySQL would still be limited by max_table_heap_size. Now comes the point: if the temporary tables being created are larger than the limit allowed by these variables, MySQL would be forced to write them to the hard disk, resulting in extremely poor performance. Our job now is simple: do our best to guess the most accurate data size for temporary tables and tweak these variables to that limit. However, I’d like to caution against absurdity: setting this limit to 16 GB (assuming you have this much RAM) when most of your temporary tables are less than 24 MB in size is foolishness — you’re simply wasting RAM that could’ve been used by other queries or parts of the system (cache, for example). Conclusion It’s not possible to cover all the system variables in one article, or even all the important ones in one article when the MySQL documentation itself spans several thousand words. While we covered some universal variables here, I’d encourage you to look into the system variables for the engine you’re using (InnoDB or MyISAM). My most desirable outcome for writing this article is for you to take away three things: What’s your favorite system variable for tuning? 🙂