Late last year we were told that our current colocation facility could not provide us with additional resources (power and cooling), so we began looking for a new location to host the Plaxo application. This was a large undertaking for the Operations team: it required precise timing and a thorough understanding of how each component of the Plaxo back-end would respond to the move.
After nearly 4 months of searching for a new location, negotiating the contract details, procuring new hardware (racks, power management, etc.), securing a solid data migration specialist, and painstakingly selecting which systems would be moved when, we were ready to start phase 1 of 2.
Then an interesting issue came up: we were moving some database servers, web servers, and various other systems, but how would they communicate on the back-end with the infrastructure that stayed behind? Layer42 (one of our NSPs) came to the rescue with an amazing service, proving their flexibility and commitment to the customer. They set us up with a literal cross connect between the facilities and even set up some VLANs on that link, allowing the moved systems to communicate across it as if they were still on the same LAN. Sweet.
We designed the Plaxo back-end infrastructure not only to scale horizontally, but also to always keep online replicas of critical data on physically separate systems. This early decision enabled us to move approximately 50% of the site (100+ systems) without customers ever noticing that we were running in a degraded state. After the first move was completed, we migrated all the database and caching services from the old location to the new one by simply promoting the secondary copy of each database to be the primary, and vice versa. This was done over 2 weeks, and laid the groundwork for phase 2.
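To make the role swap concrete, here is a toy Python model of it, assuming each database pair has exactly one primary and one secondary living in different facilities. `DbPair`, `migrate_primaries`, the pair names, and the `"old"`/`"new"` site labels are all hypothetical; the sketch only tracks which copy is the primary and does not issue any real MySQL replication commands.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DbPair:
    """Hypothetical model of one primary/secondary database pair."""
    name: str
    primary_site: str
    secondary_site: str

    def promote_secondary(self) -> None:
        # Swap roles: the replica becomes the primary, and the old
        # primary becomes the replica ("and vice versa").
        self.primary_site, self.secondary_site = (
            self.secondary_site,
            self.primary_site,
        )


def migrate_primaries(pairs: List[DbPair], old_site: str, new_site: str) -> List[str]:
    """Promote every pair whose primary is still at old_site and whose
    secondary already lives at new_site. Returns the promoted pair names."""
    promoted = []
    for pair in pairs:
        if pair.primary_site == old_site and pair.secondary_site == new_site:
            pair.promote_secondary()
            promoted.append(pair.name)
    return promoted


if __name__ == "__main__":
    pairs = [
        DbPair("users", primary_site="old", secondary_site="new"),
        DbPair("contacts", primary_site="old", secondary_site="new"),
        DbPair("sessions", primary_site="new", secondary_site="old"),
    ]
    print(migrate_primaries(pairs, "old", "new"))  # ['users', 'contacts']
    # Every pair now serves from the new site, with its replica still
    # available at the old site until that hardware is moved.
    print(all(p.primary_site == "new" for p in pairs))  # True
```

Because a copy of every pair already sits in each facility, the swap never leaves a pair without a live replica, which is what keeps the degraded state invisible to users.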
Well, phase 2 was recently completed, and I'm pleased to say it went quite well. We did not receive a single support request asking why the site was offline or why someone's data was unavailable.
-- Ethan Erchinger
Operations Manager
When we started Plaxo, we were running on a handful of Linux boxes. In fact, I think we had as many development machines as production servers. The initial rollout was 2 database servers running MySQL, each with a couple of IDE drives of around 80 GB apiece. That was about three and a half years ago.
Once we started growing, we moved to Dell servers with Dell's external storage, and that was, ummm... interesting. Let's just say that Dell should stick to what they are good at: making desktops. So we moved on.
Now we've grown to over 200 servers, including over 25 DB pairs (primary/secondary for redundancy), each capable of storing between 600 GB and 1 TB. We're still running MySQL on InnoDB, but not much else is the same. We've installed a SAN environment built around storage from Pillar Data Systems, and a whole slew of servers from Penguin Computing running very nice AMD Opteron chips. I won't go into more specifics about what we have in place, but I will say that it takes 6 full data center racks just to hold the storage. And no, we aren't being paid by Pillar or Penguin to say this. :-)
Here's a glimpse of some of the storage:

Yeah, we know it's not a huge environment, and there are certainly larger ones in almost any data center, but we thought our customers might find it interesting to see what it really takes to run this service. Maybe it'll entice a few of you to show your appreciation and become premium users, hint hint.
Some of you have been watching the OPS Blog, which we keep updated with any issues the service is experiencing. It's not a perfect service, but we try darn hard to provide a fast, reliable and secure experience, and we hope you can bear with us when we do have little issues.
Finally, we're hiring in the OPS department, so if any of you talented Operations Engineer/Unix Admin types are looking for a challenge, get in touch! See ya next time.
We have a minor glitch in our servers that is affecting some users when they try to log in to Plaxo Online. Our customers told us that the problem wasn't so much that the service was down; it was that there was no channel through which we could communicate service-level issues to them.
So rather than pretend it's not happening (as some services are prone to do ;) ), we listened to our customers and created the Plaxo OPS blog. If you're a Plaxo user and want to keep up to date on what's going on with our service, just subscribe to that blog or check it every once in a while. We think this is a win-win situation; let us know what you think.
