Why the change
With normal MySQL, configuration can make big performance differences. With cluster, it can make the cluster shut down. We woke up one morning to errors about REDO logs being overloaded. It had been overloaded for about 8 hours. We had made some changes the day before, but they all worked fine on our test cluster. So, we shut down the processes that were new and even shut off all other processes that were loading data into the servers. 4 hours later, the simplest insert would still complain about the REDO logs. The only thing that cleared it up was a full rolling restart of the storage nodes. That took 5 hours. Luckily, we were still operating with a single server as a backup that had the same data stored in InnoDB tables. We switched to it before we started the rolling restart. So, after all that was done, we really became worried about the ups and downs of cluster. We started thinking that maybe cluster was not ready for us and the use we had in mind.
We had tried replication years ago with not a lot of success. We found we had to baby sit it weekly as queries would fail on the slave that did not fail on the master or indexes would get corrupted on the slave. Just annoying things that would take up time. But, that was MySQL 4.0 and we were trying to replicate our main database. Faced with massive down time with cluster, an annoyance every few weeks seems like an acceptable risk.
The new setup
So, with the new structure, we have two masters and two slaves. We had already written code that would synchronously write data to two (or more) MySQL servers using transactions to ensure the data was written to all the hosts. Its what was used to write the data to the cluster and InnoDB servers already. So, now, it just writes data to the two masters using InnoDB and then the two slaves read from them. So far, there have been no outages whatsoever.
Scaling: Cluster vs. Replication
One of the big things we liked about MySQL Cluster was that it scaled very well. As connecitons increased, query speed and response time did not degrade at the same rate as connections. InnoDB on the other hand linearly degrades. Basically, an InnoDB server with 128 concurent connections returns data twice as slow as it does with 64 connections. Cluster on the other hand would only see about a 10% reduction in performance. But, with InnoDB and replication, we can more quickly add new nodes and scale out in an emergency. Once the hardware is in place, we could have new nodes online in about 30 minutes. If we hit a wall with cluster, it would take us a day or two to bring the server online after the hardware was in place. Adding storage nodes would mean a full dump and reload of the database. So, the solution we have now will not scale as well as cluster with the same equipment, but it will be easier to scale out once we need to.