Sandy Jen - Scaling Synchronous Web Apps: Lessons Learned from Meebo

[Live from Web 2.0 Expo, 9/16 - 9/19. Follow along with the other Expo talks in RSS.]

Sandy is a co-founder of meebo in Mountain View. She majored in Computer Science at Stanford. Sandy is the 'Server Chick'.

Things to keep in mind about scalability: what works for someone else won’t necessarily work for you. You know the most about your stuff. Don’t hire consultants. You built your app. Don’t get married to a technology but don’t be a total flirt. It is a very high cost to rip out your guts and start over again. Remember that this is supposed to be fun. You’re building a product. You always have a customer who will be happy that you’re building this thing for them.

Synchronous web applications are very different from asynchronous web applications. Traditionally “async” implies “more complex”. On the web it’s the opposite, because the browser is meant to be async. If you’re going to build a synchronous app you’re probably taking something that used to be on the desktop to the web. That’s where all the scaling problems come in. Meebo is a good example of synchronous on the web; so are Gmail and many of the games out there.

Doing synchronous on the web is like trying to fill a square hole with a round peg. The hole is that there are a lot of platforms (OSes, browsers, etc.). When we test a release we test on all the OSes and browsers, including Safari and now Chrome. Network connections are not going to be 100% stable, and there are still people using dial-up. Only being allowed 2 open HTTP requests at a time imposes a serious constraint. It’s hard to measure how successful a synchronous app is based on traditional page view metrics. Alexa doesn’t pick up how much traffic Meebo actually gets in relation to other sites that are measured by page views.

The peg, the thing you’re shoving into the hole, is the need for instantaneous data transfer. Long polling is challenging because you’re using resources on both the client and the server, and it makes the browser do more work. The user experience needs to be seamless and feel fast, light, and better than the desktop equivalent.
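To make the long polling idea concrete, here is a minimal sketch of the server-side half: a request is held open until data arrives or a timeout passes, and then the client reconnects. It is an in-process model using threads, not Meebo’s implementation; the class name `LongPollChannel` and its methods are made up for illustration.

```python
import threading


class LongPollChannel:
    """Holds one client's request open until data arrives or a timeout passes.

    Hypothetical illustration only; not taken from the talk.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = []

    def publish(self, message):
        # Called by the server side when new data is ready for the client.
        with self._cond:
            self._pending.append(message)
            self._cond.notify_all()

    def poll(self, timeout=30.0):
        # Called once per client request: block until there is something to
        # send, or return an empty list so the client can simply reconnect.
        with self._cond:
            self._cond.wait_for(lambda: bool(self._pending), timeout=timeout)
            messages, self._pending = self._pending, []
            return messages
```

Each waiting `poll` ties up a thread here, which is the server-side resource cost the notes mention; a real server would more likely use an event loop than a thread per held connection.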

What is synchronous? What has to be synchronous? What doesn’t have to be synchronous? The more you try to dump into synchronous the more trouble you’re going to have trying to scale it. Sometimes you cheat in order to create the seamless user experience.

Find the right holes for your pegs: don’t underestimate server-side architecture! The type of app determines the type of synchronous scaling. Bottlenecks can be anywhere: memory, CPU, bandwidth, storage, disk I/O. Based on the type of app you’re building they will be in different places. You won’t know where all the bottlenecks are until you let it loose. With Meebo it went from one to another to another. We solved the memory problem a while ago and it’s come back since then.

Things that people say are great synchronous helpers: long polling (Comet) gives you open connections without having to poll every 5 seconds. Meebo started with Apache and it wasn’t good for us; lighttpd was a much better fit. When it comes to compiled vs. interpreted languages it goes either way. We use C but it’s kind of a bitch to hire for because there aren’t many people doing it any more. Databases can be really expensive or really cheap. Start simple, and get more complicated only when you need to. Memcache is great, we’ll talk about it more later. Load balancers, finally, are just really expensive and you have to buy them in pairs.

Simple is better unless you’re rich. First question: what am I using it for? Am I using memcache because everyone else is? We tried it at Meebo but it turns out most of our data wasn’t cacheable. What am I gaining? Scalability at the cost of maintainability? Can I use DNS round-robin instead of load balancers? FastCGI vs. web modules vs. PHP? When we first started we didn’t want to reinvent the wheel. We started out really simple with CGI written in PHP. We wound up just writing a module directly into the web server and that’s what we’re still doing today. Start out with something simple, see if it works, evolve. Do I need to save state? Is it persistent? Can I store it in a cookie? Meebo didn’t have user accounts for a year. Launching feature-light is not a bad thing.
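As one way to picture “can I store it in a cookie?”, the sketch below keeps a small piece of state on the client and signs it so the server can detect tampering without a database lookup. The talk does not say how Meebo did this; `SECRET_KEY`, `encode_state`, and `decode_state` are hypothetical names, and the state is signed, not encrypted, so the user can still read it.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key for illustration; a real key must stay out of source control.
SECRET_KEY = b"replace-me"


def encode_state(state):
    """Serialize a small dict into a signed cookie value."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def decode_state(cookie_value):
    """Return the stored dict, or None if the signature does not match."""
    payload, _, signature = cookie_value.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

This only suits state that is small and that you don’t mind the user seeing; anything larger or secret still needs server-side storage.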

There’s a constant tug of war between the front-end and the back-end. Whose bug is it anyway? You have to figure out where the workload makes sense. The browser can be really slow. Most of Meebo’s users use IE. Say you make a web request and pass a lot of data down to the client to process: it can really bog down the user experience. Pick one, release it, and ask if it’s slower or faster than the last release. Your users know more about your product than you do. Listen. Efficiency with data transfer: when we first started out I picked variable names with single letters to save bandwidth. Once we started hiring it was confusing.
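One way to keep the bandwidth savings of single-letter names without confusing new hires is to keep descriptive names in the code and translate to short keys only at the wire. This is a sketch of that idea, not what Meebo did; the field names and the `to_wire` / `from_wire` helpers are made up for illustration.

```python
import json

# Hypothetical field names; the talk does not list the real ones.
WIRE_KEYS = {"sender": "s", "recipient": "r", "message": "m", "timestamp": "t"}
READABLE_KEYS = {short: long for long, short in WIRE_KEYS.items()}


def to_wire(record):
    """Swap descriptive keys for single letters and drop JSON whitespace."""
    compact = {WIRE_KEYS[key]: value for key, value in record.items()}
    return json.dumps(compact, separators=(",", ":"))


def from_wire(text):
    """Restore the descriptive keys on the receiving side."""
    return {READABLE_KEYS[key]: value for key, value in json.loads(text).items()}
```

The mapping table then serves as the documentation the single-letter names lacked.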

You must find a balance between good enough and perfect. Perfection is enough simplicity in the system to allow for adaptation. Users don’t care how clever you are, they just want their product to work. Long polling isn’t perfect; browsers have quirks. Sometimes perfect is not good enough (look at Ruby!). Release enough and things will asymptotically approach perfection. Don’t be afraid to try things.

Think ahead but don’t think ahead too much. A great example of this is security. You can spend a long time trying to fix security holes but if your product never ships, who cares? Over-designed code is hard to roll back from. Hacky code can work and not be so bad. When you first build you won’t know where you’re going to need to scale, so over-thinking the problem is a waste. It’s all about balance.

Nothing simulates real life. Have contingency plans on both front-end and back-end. Don’t build flood gates, build dams: One time we rolled out a feature that took a huge amount of bandwidth and we were able to switch it off. When you roll out features be very transparent with your users and say “hey try this out, let us know what you think”, they’ll get a lot less upset when you have to roll it back.
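The “switch it off” story above suggests building an off switch into each new feature before it ships. Below is a minimal sketch of such a kill switch, assuming an in-memory table of flags; the `FeatureSwitches` class, `handle_request` helper, and feature names are hypothetical, and the talk does not describe how Meebo’s switch worked.

```python
class FeatureSwitches:
    """In-memory on/off switches for features (hypothetical illustration)."""

    def __init__(self, defaults):
        self._enabled = dict(defaults)

    def is_enabled(self, name):
        # Unknown features are treated as off, so a typo fails safe.
        return self._enabled.get(name, False)

    def set_enabled(self, name, enabled):
        self._enabled[name] = enabled


def handle_request(switches, feature, handler, fallback):
    """Run the new code path only while its switch is on."""
    if switches.is_enabled(feature):
        return handler()
    return fallback()
```

In practice the flags would live somewhere every server can read at runtime, so that turning a feature off does not require a redeploy.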

Be a user of your own product. Don’t be afraid to break your own product. Stay in the loop of your community and stay in touch with the pulse. What is your Firefox/IE breakdown? 70% of Meebo’s users use IE. What do we use when we use Meebo? IE.

It’s ok to be “Big Brother” in the sense of being aware of what’s going on. Monitor key areas but don’t go overboard on monitoring, or you’ll learn to ignore your alerts. Ignoring what your systems’ feedback mechanisms are telling you is dangerous. Monitoring is being aware of how healthy your system is at any given point. Can I log in? What is our downtime percentage?
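The two monitoring questions above (“Can I log in?” and “What is our downtime percentage?”) can be tied together: run a login probe on a schedule, record whether each attempt succeeded, and compute the numbers from that history. This is a sketch under assumptions; the function names and the three-failures alert threshold are illustrative choices, not figures from the talk.

```python
def downtime_percentage(check_results):
    """Percentage of failed probes; each entry is True when a probe succeeded."""
    results = list(check_results)
    if not results:
        raise ValueError("no checks recorded")
    failures = sum(1 for ok in results if not ok)
    return 100.0 * failures / len(results)


def should_alert(check_results, consecutive_failures=3):
    """Alert only after several failures in a row, to keep alerts meaningful."""
    recent = list(check_results)[-consecutive_failures:]
    return len(recent) == consecutive_failures and not any(recent)
```

Requiring consecutive failures is one simple way to avoid the alert fatigue mentioned above: a single blip is logged and counted toward downtime but does not page anyone.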

Final thoughts: there are no magic solutions to scalability. It’s important for you to know your system like the back of your hand. Correlate effects to the changes you’ve made in your systems. Do not lose sight of your goal: why are you scaling? Finally, remember, everyone scales differently!

[ Follow the Feed for notes on talks from other web leaders & innovators at the Web 2.0 Expo in New York going on this week. ]