Joshua Schachter - Lessons Learned in Scaling and Building Social Systems

[Live notes and transcription from Web 2.0 Expo in NY - 1:20PM-2:10PM EST 9/17/08]

Joshua Schachter is the creator of del.icio.us, creator of geoURL and co-creator of Memepool.

Built delicious in 2003, sold it to Yahoo! in 2005, and left Yahoo! just a few months ago. 4+ million users. Hundreds of millions of URLs indexed. He was there through all of it and is going to be talking about the things he learned and "screwed up at everything there is to screw up at."

There are three kinds of scale: technological, social, and personal. We're going to briefly go through the technical stuff. Partitioning / sharding, as discussed in Cal's talk in way more detail, is important. Delicious was originally a single database and stayed that way until its relaunch. Big pains and expensive hardware. Caching and memcached are looking good. Replicas give you significant performance if the data is in the order you need it, so replicate the data in that ordering. Having their primary keys ordered correctly led to a 60x performance improvement. Autoincrement will hurt you later: it guarantees your data is on disk in the wrong order, with all sorts of pains. Put a proxy in front. Sloppiness. A common access pattern is to have a screen, make a modification, requery the data, and then resend the results. We're building a lot of these systems to be synchronous, which is a huge performance hit. What you need is a queue, not a database, for processing messages asynchronously. Decouple interactive performance from the rest of the system. Huge win for Delicious. Now that we've got the technical stuff out of the way...
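The queue point from the technical section can be sketched roughly as follows. This is a minimal in-process illustration, not how Delicious was built: `save_bookmark`, `index_worker`, `bookmarks`, and `tag_index` are hypothetical names, and a real system would use a separate queueing service rather than Python's in-memory `queue.Queue`.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the datastore and the work queue.
bookmarks = {}   # user -> list of saved URLs (what the page shows right away)
tag_index = {}   # tag -> set of URLs (slower, derived data)
jobs = queue.Queue()


def save_bookmark(user, url, tags):
    """Interactive path: record the save, enqueue the slow work, return at once."""
    bookmarks.setdefault(user, []).append(url)
    jobs.put((url, tags))
    return "saved"


def index_worker():
    """Background path: drain the queue and update the derived tag index."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        url, tags = item
        for tag in tags:
            tag_index.setdefault(tag, set()).add(url)
        jobs.task_done()


# The worker runs independently of the request that enqueued the job.
worker = threading.Thread(target=index_worker, daemon=True)
worker.start()
```

The user-facing call returns as soon as the job is enqueued; the tag index catches up a moment later, which is the decoupling of interactive performance from the rest of the system that the talk describes.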

Social scaling. Different features at different scales. In early systems there's almost no one there so they need to find each other, see what others are doing, and you want to push everyone together into the same room so that there's enough ability for them to spark a conversation. In very sparse systems someone will say "Hi" and no one will say "Hello" back to them. As a system grows a lot of features are in place to mitigate traffic. People can't find each other. Once you get a million users it's too big to be a community. In Delicious the features we added were tags and networks.

If you think about it like a market you want to minimize barriers to entry and minimize transaction costs. When Facebook launched apps it was very easy to get one in and promote it, even sketchily. Now it's much harder to promote and to grow with the same virality.

There are three reasons a social application has value: utility, network effect, and revenue. When you build any given feature you have to speak to all three of these. In delicious the initial sell was that there is utility in it for me. You need utility to bring in your first users. Eventually the network effect comes into play and the value becomes the other people in the system. Finally you want to make some money and the system has to be paid for. Do it in this order. You have to be careful when you structure things for revenue. If you start extracting cash before it's become a reasonable size you're short-circuiting the process and going to piss off users who aren't deriving enough value to stick around.

Any more utility added to an application makes it less viral. Do you want it to be big? Do you want it to be good? You need a product to be self-marketing. When you have to log in to see anything at all you move on to the next thing. The transaction cost of signing up and handing over an e-mail address is incredibly high. Have the app exhibit as much functionality as you possibly can. Once they get to a place that actually requires a log-in then kick them to the registration. Keep it minimal.

Initial marketing vs. actual functionality. It was initially marketed as a way of keeping track of bookmarks. It was actually a people powered search engine. You don't need to make the initial sell on the actual functionality. You have to scratch a specific itch but it doesn't have to be the ultimate problem you are solving.

Provide ways for people to be continually brought into the system. Allow people to market themselves. For example in a blog you can show a widget with the number of saves someone performed on a post. You can make users feel really good about themselves and about their content and data by exposing them some usage information.

One huge driver of traffic for delicious was RSS. Half the traffic came from RSS. Important way to get people connected to your system and be reminded that they're the kind of person that can evangelize your product. Always be there.

Figure out the drivers for infection. For delicious it was the Firefox plugin. It fixed the gap left by Firefox's lackluster bookmarking. Are you an Outlook plugin? An iPhone app? Facebook app? Figure out your vectors of infection. Use another system as a carrier but not completely underneath their umbrella.

People get hung up on the tiniest things. There was a checkbox for private. They worked very hard to come up with the way of phrasing private - chose "do not share". Vaguely negative option. Want users to think "no, no, no I'm a nice person I'm going to share."

Don't go too far and expose too much information. A long time ago you could see how many friends you had and how many followers you had and compare who it was that was following but not a friend. The system allowed me to get angry at two people. People got freaked out by people 'follow'ing them on delicious.

Wanted delicious to be a harmonious system. That's why there were no conversations. Didn't want people to come in and have religious wars. You have to be willing to deal with abuse, spam, porn, etc. In his last year this got really bad.

Lengthen or destroy feedback loops. If someone gets ejected or banned they will be back, and you have taught them what was wrong. If you can't cut this loop, don't ban users. Maybe they get a lot of errors. In delicious we let them use the system but other people couldn't see their links. Don't let them bootstrap you because it will hurt you.
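The "let them use the system but other people couldn't see their links" approach can be sketched as a visibility filter. This is an assumption-laden illustration only: the `Link` type, `visible_links`, and the `hidden_users` set are hypothetical, and the notes do not say how Delicious implemented it.

```python
from dataclasses import dataclass


@dataclass
class Link:
    owner: str
    url: str


def visible_links(links, viewer, hidden_users):
    """Return the links a viewer may see.

    Links from flagged (hidden) users are shown only to their owner, so the
    flagged user gets no signal that anything has changed.
    """
    return [
        link for link in links
        if link.owner not in hidden_users or link.owner == viewer
    ]
```

Because the flagged user still sees their own links, there is no ban message to learn from, which is the point about lengthening the feedback loop.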

Pretty urls are important. It's prime advertising space. People will copy paste and link.

Delicious was one of the first sites to have a public API. This was always a huge driver of traffic and interest. Delicious ran on one machine in the first year and two machines in the second year and it was built by one guy (Josh) in his spare time. A lot of developers see an itch and they scratch it. The goal with the API was to build functionality on this site without having to reinvent it. People had built Firefox APIs on their own before he was able to release a public one himself. People filled their own gaps.

Scaling Yourself

When you build stuff, the first thing you do is the wrong thing. Don't spend a huge amount of time polishing what you're building. It's wrong. (Strong echoes with what Fried was saying.) Be very careful with your ideas and write them down; it's important for patent work and so that you can go back. Keep notes you can go back and look at.

Listen to your users (this is the opposite of scaling). Up until a year before he left he read every single incoming customer request and customer question. Listen to your users. Delicious was a conceptually difficult system for some folks. It's not just the loud users who blog about how you suck; it's the ones making solid points. You can't always take what they are saying at face value and respond directly. Users love it when you respond to them directly. Every week tally up the requests you get into bins and figure out what you can do to fix the biggest problems now.
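The weekly tally could be as simple as the sketch below. The bin names and the `(bin, text)` tuple shape are made up for illustration; the talk only says to sort requests into bins and look at the biggest ones.

```python
from collections import Counter


def tally_requests(requests):
    """Count requests per bin and return (bin, count) pairs, largest bin first.

    `requests` is an iterable of (bin_name, request_text) pairs, where someone
    has already assigned each incoming request to a bin by hand.
    """
    counts = Counter(bin_name for bin_name, _text in requests)
    return counts.most_common()


# Hypothetical week of requests, already sorted into bins.
week = [
    ("import", "Can't import my browser bookmarks"),
    ("login", "Forgot my password"),
    ("import", "Import fails on large files"),
    ("import", "Import drops my tags"),
    ("login", "Reset e-mail never arrives"),
    ("tags", "How do I rename a tag?"),
]
```

Calling `tally_requests(week)` puts the `import` bin first, which is the one to look at this week.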

Understand your users' motivations for using the system, whether they're there for utility or to meet people. Also figure out your own intentions: building something good or something that makes money. Think about where you're going at any point in time.

User testing is very difficult to do in a structured way. Your understanding is very biased having built the thing. If you sit users down and ask them to read the task and then act it out then they do it. In the real world they skip the text and fumble. Figure out how to allow them to act as they naturally would. Go to a Starbucks and offer to buy people a coffee in trade for using your site. Have all staff be behind the glass wall and watch what's going on.

Measure and record stats about users and their interactions with your site. You need to know how people are using your site, how many people are using the site, how long they spend on task X, etc.
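As a rough sketch of the kind of numbers mentioned above, assuming you already log one `(user, task, seconds)` record per completed task (that record shape and the `summarize` helper are hypothetical, not something from the talk):

```python
from collections import defaultdict
from statistics import mean


def summarize(events):
    """Summarize usage from (user, task, seconds) records.

    Returns the number of distinct users and the average time spent per task.
    """
    users = set()
    durations = defaultdict(list)
    for user, task, seconds in events:
        users.add(user)
        durations[task].append(seconds)
    return {
        "users": len(users),
        "avg_seconds": {task: mean(times) for task, times in durations.items()},
    }
```

Even a summary this crude answers "how many people" and "how long on task X"; anything more detailed can be layered on once the raw events are being recorded.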

Shorten your feedback loops and make it so that you are always able to learn. Most essential thing you can do. Do something, learn from it, repeat and that's about it.

Remarks from questions:

- A quarter of all user passwords are one of: 12345, 123456, or username+sitedomain

- A big problem became figuring out how people could reclaim their bookmarks when their emails were no longer valid (i.e. a student graduating from school).

- Firefox and RSS were the two primary drivers.

- Took a year to get the first 1000 customers. Shrink the feedback loop. When you're the customer it's real easy to know what sucks and change it.

- Interesting note: Josh was a quant on Wall Street for a while.

- What are you doing now? "Umm XBox 360, playing video games." Haha, awesome.