Joe Stump - Scaling Digg and Other Web Applications
[Live from Web 2.0 Expo, 9/16 - 9/19. Follow along with the other Expo talks in RSS.]
Joe Stump is currently the Lead Architect for Digg where he spends his time partitioning data, creating internal services, and ensuring the code frameworks are in working order.
Digg by the numbers: 30,000,000 Ron Paul fans. 13,000 requests a second, bunches of servers.
“Web 2.0 sucks (for scaling).” Web 1.0 was easy: there was a land rush to just get content online.
With Web 2.0, somebody had the bright idea of turning content over to the users. The problem is people like creating a lot of shit. Web 1.0 was easy to scale because I only needed to worry about a couple hundred thousand records. Now we’ve got a lot more to worry about. Another thing I hate is AJAX, which makes interacting with websites really easy. It gives users the ability to create shit even faster.
Making your PHP code 300% faster doesn’t matter; that’s not where your bottlenecks are. “PHP Doesn’t Scale” – Cal Henderson. PHP doesn’t scale, Java doesn’t scale, Ruby doesn’t scale – languages don’t scale. When you’re worrying about scale and storing 4 billion kitten photos, the language you program it in probably doesn’t matter.
What’s scaling? Scaling is specialization. As you get bigger and as you grow the solutions being sold to you by vendors won’t cut it. You have to cut your database into different pieces and make it very specialized and specific to your needs. We’re going to talk about some of the techniques we use at Digg. Scaling is also about severe hair loss. I’m not joking. I’m going bald. It’s tough. It’s not easy. You can’t do it alone.
People often confuse scaling out with scaling up. At some point you can’t scale up anymore: you can’t just keep buying more expensive machines. Everyone is scaling out right now with lots of crappy boxes. We expect them to fail.
Your mom lied; don’t share. Decentralize, expect failures and just add boxes. Amazon is one of the best at this.
CAP Theorem says you can only pick two of the following three: strong Consistency, high Availability, Partition tolerance.
What are my options? Denormalize, eventually consistent, parallel, asynchronous, specialize.
Denormalization is necessary in partitioned solutions and it’s becoming a huge problem for Digg. If you’re not using queues and messaging systems you’re going to want to look into Gearman and DJabberd. You wonder why things are going slow and then you realize you’re doing 5 synchronous trips to the database. You’ve got to make these calls asynchronous, with either HTTP calls or Gearman. One thing Digg is big on is running the numbers before you try to fix a problem. Run the numbers to make sure things actually will work. We’ll discuss a case of this.
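To make the "stop doing synchronous trips" point concrete, here is a minimal sketch of the idea in Python. It is not Digg's code and it does not use Gearman: an in-process `queue.Queue` and one worker thread stand in for the job server, and the job names (`record_digg`, `update_counts`, `notify_followers`) and the `handle_request` function are made up for illustration.

```python
import queue
import threading

jobs = queue.Queue()            # stand-in for a job server such as Gearman
results = []                    # the worker records finished jobs here
results_lock = threading.Lock()


def worker():
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        name, payload = job
        with results_lock:
            # Pretend this is the slow database write.
            results.append((name, payload))
        jobs.task_done()


def handle_request(user, story_id):
    """Enqueue the slow work and return to the user right away."""
    for name in ("record_digg", "update_counts", "notify_followers"):
        jobs.put((name, {"user": user, "story": story_id}))
    return "ok"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

status = handle_request("alice", 42)   # returns without waiting on the work

jobs.put(None)   # tell the worker to stop once the queue drains
jobs.join()      # wait here only so the example can inspect the results
thread.join()
```

The request handler only pays for three cheap `put` calls; the expensive work happens later on the worker. With a real job server the worker would live on another box.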
Tools: memcached; MogileFS (“OMG Files!”), which Digg uses for icons and photos; Gearman, which is a massively distributed fork; and the new favorite toy, MemcacheDB: “Will be the biggest new kid on the block in scaling.” Initial tests on a laptop yielded 15,000 writes a second. The developer behind it took Berkeley DB and memcached and brought them together.
Caching techniques: cache forever and explicitly expire, and have a chain of responsibility. We had a generic expiration time on all objects at Digg. The problem is we have a lot of users, and a lot of those users are inactive. The Chain-of-Responsibility pattern creates a chain of stores: PHP globals, APC, memcached, MySQL. You first hit globals; if it has the value you get it straight back, and if not you go to the next link in the chain, and so on. Used at Facebook and Digg. If you’re caching fairly static content you can get away with a file-based cache; if it’s something requested a bunch, go with memcached; if it’s something like a topic in Digg, we use APC.
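A rough sketch of that chain, assuming each layer is just a dictionary. The `Layer` class, its method names and the `user:1` key are invented for illustration; real APC, memcached and MySQL clients would sit behind the same interface. It also shows "cache forever and explicitly expire": nothing has a timeout, and a write has to call `expire` itself.

```python
class Layer:
    """One link in the chain: answer from local storage or ask the next link."""

    def __init__(self, name, next_layer=None, is_source=False):
        self.name = name
        self.store = {}
        self.next_layer = next_layer
        self.is_source = is_source   # True for the database at the end of the chain
        self.hits = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        if self.next_layer is None:
            return None
        value = self.next_layer.get(key)
        if value is not None:
            self.store[key] = value  # backfill so the next lookup stops here
        return value

    def expire(self, key):
        """Explicitly drop a key from every cache, but never from the source."""
        if not self.is_source:
            self.store.pop(key, None)
        if self.next_layer is not None:
            self.next_layer.expire(key)


# Lookup order: globals -> apc -> memcache -> mysql
mysql = Layer("mysql", is_source=True)
memcache = Layer("memcache", mysql)
apc = Layer("apc", memcache)
request_globals = Layer("globals", apc)

mysql.store["user:1"] = "joe"
first = request_globals.get("user:1")    # falls through to mysql, backfills caches
second = request_globals.get("user:1")   # answered by globals

mysql.store["user:1"] = "joseph"         # the record changes...
request_globals.expire("user:1")         # ...so expire it explicitly
third = request_globals.get("user:1")    # falls through to mysql again
```

Only the first and third lookups reach the database; everything in between is served from the closest layer that has the value.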
Partition your data horizontally (rows a-f on one machine) and vertically (some columns on one table, some on another table). Go horizontal when you have so much data you need to spread it across a lot of servers. Vertical partitioning: instead of altering tables, add a new table and put the new columns in it; this avoids downtime. Abstract your data access so that the partitioning details are hidden from the user.
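As a sketch of horizontal partitioning behind an abstraction, assume usernames are split by first letter and each "server" is just a dictionary. The `PartitionedUserStore` class and the letter ranges are hypothetical; the point is that callers only see `put` and `get` and never learn which shard holds a row.

```python
class PartitionedUserStore:
    """Hides which shard a row lives on behind put/get."""

    def __init__(self, ranges):
        # ranges: one (first_letter, last_letter) pair per shard
        self.ranges = ranges
        self.shards = [dict() for _ in ranges]   # each dict stands in for a server

    def _shard_index(self, username):
        first = username[0].lower()
        for index, (low, high) in enumerate(self.ranges):
            if low <= first <= high:
                return index
        raise KeyError(f"no shard covers {username!r}")

    def put(self, username, row):
        self.shards[self._shard_index(username)][username] = row

    def get(self, username):
        return self.shards[self._shard_index(username)].get(username)


store = PartitionedUserStore([("a", "f"), ("g", "m"), ("n", "z")])
store.put("alice", {"diggs": 12})
store.put("joe", {"diggs": 340})
store.put("zed", {"diggs": 1})

row = store.get("joe")   # the caller never mentions a shard
```

Range-by-letter is the scheme the notes mention; a hash of the key is a common alternative when the letters are unevenly populated, and swapping it in would only change `_shard_index`.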
Green badges at Digg are the bane of Joe’s existence. It’s a similar problem to what Twitter and Digg have: you take a message from one place and drop it into a bunch of other buckets. Kevin Rose has 40,000 followers. You can’t drop something into 40,000 buckets synchronously. There are 300,000 to 320,000 diggs a day. If the average person has 100 followers that’s 300,000,000 Diggs a day. The most active Diggers are the most followed Diggers, so the idea of averages skews way out. “Not going to be 300 queries per second, 3,000 queries per second. 7 GB of storage per day. 5 TB of data across 50 to 60 servers so MySQL wasn’t going to work for us. That’s where MemcacheDB comes in.” The recommendation engine is a custom graph database from the R&D department and is eventually consistent. It’s an example of the problems you run into at really big scale on a social website.
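In the spirit of "run the numbers", here is a back-of-the-envelope version of that fan-out math. The function names are made up, and the 1,000-follower case is my own assumption, chosen because it reproduces the 300,000,000 figure. Taken literally, 300,000 diggs times 100 followers is 30,000,000 writes a day, or roughly 300 a second. Ten times that gives 300,000,000 a day and roughly 3,000 a second, which seems to be the jump the quote is describing once the averages skew.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def fanout_writes_per_day(diggs_per_day, followers_per_digger):
    """Each digg is copied into one bucket per follower."""
    return diggs_per_day * followers_per_digger


def per_second(per_day):
    return per_day / SECONDS_PER_DAY


# 100 followers on average, as stated in the talk notes.
average_case = fanout_writes_per_day(300_000, 100)

# Assumed: the active diggers are followed about 10x more than the average.
skewed_case = fanout_writes_per_day(300_000, 1_000)

average_rate = per_second(average_case)   # about 347 writes per second
skewed_rate = per_second(skewed_case)     # about 3,472 writes per second
```

Either way the writes are spread over the day here; real traffic peaks, so the worst second will be well above these averages.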