Refactoring Cache Tables using Memoize

Have you ever written code that feels like this?

var byKey = function (key) {
	... do slow, expensive stuff ...
	return { value: “something expensive” };
};

And later optimized it to feel like this?

var lookup = {};
var byKey = function (key) {
    if ( lookup[key] === undefined ) {
        ... do slow, expensive stuff ...
        lookup[key] = { name: “something expensive” };
    }
    return lookup[key];
};

I certainly have. The lookup table is a trusty pattern for caching the results of expensive calls. With four additional lines of code you can dramatically improve the performance of an often called function. Sprinkle these lines around your most expensive, referentially transparent functions and you have a nicely optimized program, right?

With a functional language this pattern is considered harmful; an anti-pattern to refactor. It may be four lines of code, but it is redundant when used over and over. Your function’s intentions become conflated with performance concerns. It would be best to keep our intentions and optimizations separate.

Can we simply write “cache this function’s return values”? It would feel like this,

var byKey = cache( function(key) {
    ... do expensive stuff ...
    return { name: “something expensive” };
} );

That feels much better. Although, not knowing what the cache function actually does is a little off-putting. Let’s implement it.

If you are unfamiliar with anonymous functions and closures in JavaScript this is about to get slightly hairy. Understanding those concepts will make writing JavaScript much more interesting, and little blog posts like this more palatable. I suggest Crockford, but there are plenty of good on-line resources, too.

We need our cache function to take any function we throw at it and return a new function. The returned function must accomplish what our imperative lookup table code did: remember the arguments of previous invocations and cache their return values.

var cache = function( expensiveFn ) {
    var lookup = { };
    return function(key) {
        if(lookup[key] === undefined) {
            lookup[key] = expensiveFn(key);
        }
        return lookup[key];
    };
};

Let's walk through this. We have defined cache as a function that takes expensiveFn as a parameter. Within this function we setup the variable lookup to be our hash/cache table. Then we return a function that encapsulates the lookup table pattern used earlier. Do you see how the structure has been extracted and the state hidden within the closure?

In computer science this function is conventionally named memoize. It is often demonstrated using a recursive function like factorial. Factorial is a more beautiful application of memoization than basic key-value caching, but this is an anti-patterm you are more likely to find in real code.

An Exercise for the Reader

The most obvious shortcoming to cache, our implementation of memoize, is that it does not generalize to functions of more than one argument. If you have a few minutes, pop open your browser’s JavaScript console in Firebug or Webkit Inspector and try writing it. Hints: 1) The arguments keyword provides an array of all arguments a function is called with. 2) A function can be invoked with an array of arguments by using Function's apply method.

Refactor one-off caching via lookup tables with memoize

Plenty of libraries provide a generic memoize function, including underscore in JavaScript. The next time you catch yourself writing a naive in-memory cache lookup table around a function stop and consider memoizing it instead. Using memoize is faster to write and less bug-prone than a one-off cache, it separates intentional concerns from optimization concerns, and it keeps your code a little cleaner.