Let's say we have a connection called A to the database. Connection A tries to update a row. But, it receives a lock wait timeout error. That does not mean that connection A did anything wrong. It means that another connection, call it B, is also updating a row that connection A wants to update. But, connection B has an open transaction that has not been committed yet. So, MySQL won't let you update that row from connection A. Make sense?
The first mistake people may make is looking at the code that throws the error to find a solution. It is hardly ever the code that throws the error that is the problem. In our case, it was code that was doing a simple insert into a table. I had a look at our processing logs around the time that the errors were thrown and I found a job that was running during that time. I then looked for code in that job that updates the table that was locked. This was where the problem lied.
So, why does this happen? Well, there can be very legitimate reasons. There can also be very careless reasons. The genesis of this blog post was some code that appeared to be legitimate at first, but upon further inspection was careless. This is basically what the code did.
- Start Transaction on database1
- Clear out some old data from the table
- Select a bunch of data from database2.table
- Loop in PHP, updating each row in its own query to update one column
- Select a bunch of data from database2.other_table
- Loop in PHP, updating each row in its own query to update another column
- Commit database1
This code ran in about 20 minutes on the data set we had. It kept a transaction open the whole time. It appeared legit at first because you can't join the data as there are sums and counts going on that have a one to many relationship which would cause some duplication of the sums and counts. It also looks legit because you are having to pull data from one database into another. However, there is a solution for this. We need to stop pulling all this data into PHP land and let it stay on the server where it lives. So, I changed it to this.
- Create temp table on database2 to hold mydata
- Select data from database2.table into my temp table
- Select data from database2.other_table into my temp table
- Move my temp table using extended inserts via PHP from database2 to database1
- Start Transaction on database1
- Clear out some old data from the table
- Do a multi-table bulk update of my real table using the temp table
- Commit database1
This runs in 3 minutes and only requires a 90 second transaction lock. Our lock wait timeout on this server is 50 seconds though. However, we have a 3 time retry rule for any lock wait timeout in our DB code. So, this should allow for our current workload to be processed without any data loss.
So, why did this help so much? We are not moving data from MySQL to PHP over and over. This applies to any language, not just PHP. The extended inserts for moving the temp table from one db to another really help. That is the fastest part of the whole thing. It moves about 2 million records from one to the other in about 1.5 seconds.
So, if you see a lock wait timeout, don't think you should sleep longer between retries. And don't dissect the code that is throwing the error. You have to dig in and find what else is running when it happens. Good luck.
Bonus: If you have memory issues in your application code, these techniques can help with those too.