Managing two data centers
Here is the problem. No one in our company has experience with this, and there do not seem to be any resources on the internet that talk about it. Our problems are not so much with managing the data between the two data centers. The problem is failover and how to deal with one data center being out. Here are some of the ideas that have been thrown at the wall.
Round Robin DNS
This was the first idea. It seems simple enough. We have two data centers. We publish a DNS record for each data center and traffic goes to each one. The problem here is that it is, well, random: if one data center is down, some share of visitors will still be sent to it.
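To make the failover problem with round robin DNS concrete, here is a minimal sketch. It assumes a resolver that rotates strictly through two records with no health checking. The addresses are hypothetical (taken from the documentation ranges), and real resolvers and client caches behave less predictably than this.

```python
from itertools import cycle

# Hypothetical addresses, one per data center (documentation ranges).
RECORDS = ["192.0.2.10", "198.51.100.10"]


def round_robin(records):
    """Return an endless iterator that rotates through the records in order."""
    return cycle(records)


def failed_requests(records, down, n):
    """Count how many of n requests are handed an address that is down."""
    answers = round_robin(records)
    return sum(1 for _ in range(n) if next(answers) in down)


if __name__ == "__main__":
    # With one of two data centers offline, half the requests still go to it.
    print(failed_requests(RECORDS, down={"198.51.100.10"}, n=100))
```

Nothing in plain round robin removes the dead address from rotation, which is why it does not solve failover on its own.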
Global Traffic Management
There are devices that "balance" traffic across multiple locations. But, I am unsure how those deal with an outage at one of the locations. It seems like there is still a single point of failure.
BGP Routing
This is the biggest mystery to me. I know what it is. I know what it means. I have no idea how to deploy this type of solution. I understand that you can "move" your IP addresses with routing changes. But, that means running routers. Where are these routers? Does this happen at some provider? Is there a provider that handles this? Does that mean that all of our data centers have to be with one provider? One more peace-of-mind benefit, I think, is that we would not be tied to just one vendor. So, if one vendor had major issues or there were legal troubles (we lived through the dot-com boom and bust), we would have security in knowing we had other equipment that was not affected.
Is there something else? Are we being way paranoid? Maybe it is not cost effective in the end. I/we have no idea really. Anyone out there that has knowledge on this subject?
Scott Larson Says:
First off, anycast is really not a good idea if you're going to be relying on sessions or some other transaction that needs to bounce off the same servers until it completes. It is entirely possible that anycast could shift someone to a new set of machines in the middle of the process, thus breaking stuff. Of course, if you don't do that, then you could look at putting it to use this way.
If you're concerned about things like proximity to servers or data center load balancing, then GeoDNS or hardware solutions like F5's BigIP or the offerings of Coyote Point are something to look at. BigIP's 3DNS is the one I looked into the most myself. They're designed to be deployed in hot/cold pairs at each data center, and then they talk to each other to keep up to date on service metrics. If a data center goes offline, they're supposed to handle taking it out of the loop until things are back to normal.
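The "taking it out of the loop" idea can be sketched roughly as follows. This is not how any particular product works. It is a minimal illustration that assumes a plain TCP connect is an acceptable health check, and the data center names and addresses are hypothetical.

```python
import socket

# Hypothetical data centers and their public addresses (documentation ranges).
DATA_CENTERS = {"east": "192.0.2.10", "west": "198.51.100.10"}


def tcp_check(address, port=80, timeout=2.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_answers(data_centers, check=tcp_check):
    """Return the addresses of data centers that pass the health check.

    If every check fails, return all addresses instead of an empty answer,
    because the checker itself may be the thing that is broken.
    """
    healthy = [addr for addr in data_centers.values() if check(addr)]
    return healthy or list(data_centers.values())
```

A DNS server built on this would hand out only the addresses returned by `healthy_answers`, with a short TTL so clients notice the change quickly. The checker has to run somewhere too, which is presumably why these devices are deployed in pairs at each site.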
Another thing to keep in mind is database synchronization. Keeping db's in different parts of the country up to date with the same information is sketchy, because if the link between the locations breaks then your db's are out of whack. It may be that you can get away with some lagging of data replication and if that's the case I envy you.
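Whether you can "get away with some lagging" comes down to comparing the replica's lag against whatever staleness the application tolerates. A minimal sketch of that comparison, assuming the replica exposes the timestamp of the last change it applied (the function names and the threshold are hypothetical):

```python
from datetime import datetime, timedelta, timezone


def replication_lag(last_applied, now=None):
    """Return how far the replica is behind, never less than zero."""
    now = now or datetime.now(timezone.utc)
    return max(now - last_applied, timedelta(0))


def safe_to_serve(last_applied, max_lag, now=None):
    """Return True if the replica's lag is within the tolerated staleness."""
    return replication_lag(last_applied, now) <= max_lag
```

For example, a replica that last applied a change five seconds ago passes a 30 second tolerance, while one that is five minutes behind does not. If the link between the locations breaks, the lag simply keeps growing until the check fails.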
I've been dealing with these sorts of issues myself lately and the challenge is simultaneously fun and exhausting.