One hundred and fifty to one
I’m in the rather unfortunate position of having to migrate a client’s accumulated data from an old version of MySQL to the latest, and from old hardware to new, in a hosted environment where my only access is via a SQL client, with minimal down time.
Let’s ignore stored procedures, views, triggers etc. Those can be reproduced easy enough. It’s the tables containing millions of rows that will eat up the most time[1]. If you do a cursory check of the manuals and guides you’ll find that the normal approach is to use mysqldump to create a (massive, compressed) SQL file filled with table creation and row insertion operations. You then pump that dump file through a MySQL connection to rebuild the entire database in another server.
To kick off the exercise, I used mysqldump to generate the compressed dump file. It took about as much time as it took me to make and drink a cup of coffee. Checking the head of the resulting dump I could see the usual preamble, a few embedded comments intended to be used by MySQL as hints to adjust some settings during processing, a table CREATE and then line-after-line of INSERT statements. It was near the end of my working day, and I had fired up a test box to receive the dumped data, so I kicked off a restore via a single DB connection and locked my PC for the evening.
It was still running when I logged in the following morning. In fact, it took until lunchtime to complete.
If this were the actual production migration, we would be in serious trouble. Timings like this would likely see the client opt for a migration during anticipated non-usage, such as New Year’s Eve! Not only would that put off the process for several months, it would ruin what should be a day off. This migration has to get down to just a few hours so that it can happen in the very early hours of the morning, in the next few weeks, when an outage of up to 4 hours could be accommodated.
There were several factors taken into consideration to determined the ultimate solution:
- There would be no other users connected to the old and new database servers during migration.
- The target server’s hardware can handle up to four concurrent threads of execution without breaking a sweat.
- Table indices are not needed during table restoration. (They can be added later.)
- There are several big tables, some with only a few columns, some with many.
Here is what I did:
- The dumped SQL is piped through a custom process that splits the dump into four.
- Each split replicates the header and footer from the original dump, but only includes a subset of the tables.
- The subsets are non-overlapping and chosen so that the total number of rows are spread evenly across the splits. (There’s some adjustment here to account for the fact that some table inserts are more intensive than others because of the width of the tables and other factors. It took a bit of benchmarking to work out the optimum.)
- The table creation is adapted to defer indexing and other changes appropriate for the migration[2].
- I’m using the “extended insert” option of mysqldump so that each INSERT statement contains data for multiple rows. These optimized inserts can be up to 67 million characters (max_allowed_packet) for my target server, in theory[3].
- The four partial dumps are restored in parallel to the target server. Were it not for the numerous restrictions, some of this could be achieved via MySQL shell’s parallel import. As that’s not available, the bespoke approach is needed.
The result:
The restore took just a few minutes. Yes, minutes. In fact, the performance improvement was 150:1
It looks like when the migration happens for real, I might just have time for two coffees :)
[1] Actually, no, we also have to consider the complete audit of the existing application software to ensure that all the old SQL usage was compatible with the new v8, and setting up a replica environment to run the entire suite of unit tests with the new database, while watching for slow queries and other gotchas. That took two weeks, but as a separate activity that has no impact on the expected migration down time.
[2] Interestingly, MySQL 8 has deprecated utf8 in favour of utf8mb3 and will automatically set this charset during table creation when you have specified utf8. So, while you would need to adjust the CREATE statements to change PKs and indices, you can leave the charsets alone.
[3] Using LOAD DATA would be even faster, were it not for the fact that I can’t access the underlying file system of the target server.
Categorised as: Technology