I must admit MySQL replication is something I've never felt too comfortable with - in most of my positions, I've had the luxury of working with a full time DBA who would handle all database related work. In my current workplace we have three major pairs of database machines, and have been going through upgrading them all to Percona MySQL 5.5. As you'd expect data integrity is of the highest importance, so discovering this Percona Table Checksum tool is a real life saver, providing an amazing tool for verifying and fixing any drift or problems with MySQL slaves.
I can't take any credit for these instructions or the trial and error in assembling them, as they were penned by my workmate, the awesome Trystan Leftwich - these are his notes for use at our place, with some additional clarifications from myself from working through them.
First things first, grab the Percona Toolkit and install.
Now on the master DB do the following:
create database BLAH;
This will be the database you store your checksums, so something like pt_checksums will do.
Now on the master as the mysql user, run
pt-table-checksum --create-replicate --replicate [db_name].[table_name] --databases [comma_separated_list_of_databases_you_want_to_check] --empty-replicate-table --chunk-size=5000 localhost
Where [db_name].[table_name] is the database you created before, and a table name you will be able to remember.
EG pt_checksums.myimportanttables_checksums
(If you get a “can not connect to host: blah, this is ok, ignore)
Now, when this is complete, go to the slave DB. (ensure replication is up to date - if you have errors, just skip them to get it up to date)
Then run the following
connect [db_you_created_above];
select * from [table_name_you_created_above] where this_crc != master_crc;
If this returns an empty set, Then your DB is in sync - go straight to Go, collect $200.
If not you will have to try and sync it -
Create a user with the following permissions (pretty much everything) (Also it may not need all of these, but couldn't find what exactly it needed)
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, SUPER, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT
You can create with:
create user 'pt_checksum_maint'@'%' identified by 'blah';
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, SUPER, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT on *.* to 'pt_checksum_maint'@'%';
Then, still on the master, run the following command
pt-table-sync --execute --replicate [db_name].[table_name] master_db_ip/hostname --user user_you_created_above --ask-pass --no-foreign-key-checks
(At first I assumed this would be run on the slave to fix it up, however the man page for pt-table-checksum explains:
it always makes the changes on the replication master, never the replication slave directly. This is in general the only safe way to bring a replica back in sync with its master; changes to the replica are usually the source of the problems in the first place. However, the changes it makes on the master should be no-op changes that set the data to their current values, and actually affect only the replica.
)
Once this table sync has been run, re-run the pt-table-checksum command, then verify your results on the slave - should be good .