Goodbye, server configuration woes. Hello, Puppet.

Another dev is joining your web app project's team. How will you configure their system/VM to be ready for contributing? How will you keep your team's dev server configurations in sync? Or, the web app you are cranking on is ready to be set up in production. You have a new, clean server awaiting your command. How will you deploy?

The Marco Polo Method

SSH in to your new server. You know you need a web server and Ruby or PHP so you sudo apt-get install apache php, or whatever. You copy over an Apache configuration file and enable it as a site. Restart Apache. Marco: directory does not exist. Polo: mkdir /www/awesome-app. Restart Apache. Copy files in. Open up browser. Marco: function pdo mysql driver does not exist. Polo: sudo apt-get install php-pdo-mysql (or whatever). Refresh. Marco: cannot connect to localhost, connection refused. Whoops, forgot to install MySQL. Polo: sudo apt-get install mysql. Eventually, hopefully, it all works.

Deploy by error-Google-fix loop. Everyone has been down this road before. The Marco Polo setup is painful, error-prone, and time-consuming. It's a necessary evil when you are new to configuring systems or rarely have to do it. The skills you pick up here, though, are fundamental to moving on to shell scripts and beyond.

The Scripted Method

At some point it becomes painfully obvious that setting up a system is just running a series of shell commands. What better tool to run a series of commands than a shell script? None, right?

Scripting a complete system setup turns out to be painful, too. Scripts are fragile. All it takes is one remote server being down in the middle of a script and dependent steps will fail. So you try running the script again and the steps that worked are rerun. Source downloads again, recompiles happen, directories already exist, and so on. So you open up the script and either comment out the steps that worked or start wrapping steps in conditionals checking for criteria to be met.

Shell scripts are an imperative approach to mutating a system's state. Your commands describe how to reach a desired state, not what the desired state actually is. Because the script describes steps rather than the final state, its sophistication and robustness are a product of the time spent crafting it and of its author's ability. In practice, this often leads to scripted setups that are not idempotent; running a setup script multiple times doesn't generally work without thought and effort. This becomes more troublesome as system setups evolve over the lifetime of a product. Commit logs become riddled with comments like "must run setup script to run and pass tests". We must be able to abstract away these pains, right?

The Declarative Method

Our ideal solution, thus, is automated, declarative, and idempotent. We need a language for describing a system's desired state and a repeatable means for the system to reach that state. Enter Puppet (and other sysadmin automation technologies like Chef).

Puppet is just that: a language for describing a system's desired state and software that can, in essence, diff your system's current state with your desired state and determine the series of commands that need to be applied in order to reach that state. It was initially designed for sysadmins managing groups of servers. It's valuable for small custom development projects and single-server deploys, too. For the purposes of this article, we'll only demonstrate its single-machine facilities.

Puppet exists in most package manager repositories. On Ubuntu, for example, sudo apt-get install puppet will get you there.

Once Puppet is installed, all we need to do is set up a manifest file and run puppet apply. A .pp manifest file uses Puppet's domain-specific language for describing our system's desired state. Let's walk through a Puppet manifest for a simple nginx server that serves static HTML.

# demo.pp
package { "nginx":
    ensure => installed
}

service { "nginx":
    require => Package["nginx"],
    ensure => running,
    enable => true
}

Our demo.pp manifest file opens by declaring that the 'nginx' package should be installed and set up as a service. The service requires the package as a dependency and is ensured to be in a running state. It is also 'enabled' so that it will start automatically on system boot.
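Ordering can also be written outside the resources themselves. Here is a minimal sketch of the same two declarations using Puppet's chaining arrow, where -> means "apply the left side before the right". It is an alternative to the require line, not an addition to it.

```puppet
# Same desired state as demo.pp, with the dependency pulled out of the service block.
package { "nginx":
    ensure => installed
}

service { "nginx":
    ensure => running,
    enable => true
}

# Apply the package before the service (equivalent to require => Package["nginx"]).
Package["nginx"] -> Service["nginx"]
```

Which form to use is mostly a matter of taste; require keeps the dependency next to the resource that needs it, which reads well in a small manifest like this one.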

We can save this Puppet manifest, apply it by running sudo puppet apply demo.pp, and test the result with curl localhost. You should see a dump of nginx's default HTML message. Let's kill the default nginx site with our Puppet manifest.

# ...
# demo.pp - part 2
file { "/etc/nginx/sites-enabled/default":
    require => Package["nginx"],
    ensure  => absent,
    notify  => Service["nginx"]
}

The symbolic link enabling the nginx default site will be made absent by Puppet. If reaching this state requires action, as in the symlink being deleted, Puppet will notify the nginx service, which it then restarts so the new configuration takes effect. Run sudo puppet apply demo.pp && curl localhost and curl will fail to connect: nginx is no longer listening because no site is configured. Try running sudo puppet apply demo.pp once more. No actions are taken because the system is already in the desired state. Cool, huh?
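The same notification can be declared from the service's side. This is a sketch, assuming the resources shown so far: subscribe is the mirror image of notify, so this variant of the service block would replace the notify line on the file resource.

```puppet
# Variant of the nginx service block from demo.pp (it replaces that block; do not declare both).
service { "nginx":
    require   => Package["nginx"],
    ensure    => running,
    enable    => true,
    # Refresh the service whenever Puppet changes the default-site symlink.
    subscribe => File["/etc/nginx/sites-enabled/default"]
}
```

The rest of this walkthrough sticks with notify, since it keeps each file's effect on the service visible where the file is declared.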

Let's set up a new site with a static HTML index file. We'll throw the site in the /www directory.

# ...
# demo.pp - part 3
file { "/www":
    ensure => "directory"
}

file { "/www/index.html":
    require => File["/www"],
    ensure => "file",
    content => "<!DOCTYPE html>
        Hello, world.
        "
}

Run sudo puppet apply demo.pp && ls /www once more to see that our latest additions to the manifest file are setting up the /www directory and then, once it is ready, ensuring the file index.html is placed in the directory with specific contents. Our last step is setting up the nginx configuration for the site in our demo.pp manifest file.

# ...
# demo.pp - part 4
file { "/etc/nginx/sites-available/puppet-demo":
    # Assumed dependencies: nginx installed and the site root present.
    require => [
        Package["nginx"],
        File["/www"]
    ],
    ensure => "file",
    content =>
        "server {
            listen 80 default_server;
            server_name _;
            location / { root /www; }
        }",
    notify => Service["nginx"]
}

file { "/etc/nginx/sites-enabled/puppet-demo":
    require => File["/etc/nginx/sites-available/puppet-demo"],
    ensure => "link",
    target => "/etc/nginx/sites-available/puppet-demo",
    notify => Service["nginx"]
}

We've declared that an nginx configuration file for our site should be in sites-available and that a symbolic link pointing to that configuration file should be placed in sites-enabled to turn the site on. If Puppet changes either of these resources, it will restart nginx automatically so the new configuration is picked up. Run sudo puppet apply demo.pp && curl localhost and our wonderful hello world HTML file is displayed.

Let's have a little fun showing off the value of Puppet's declarative method. Try turning nginx off with sudo service nginx stop. Rerun sudo puppet apply demo.pp and Puppet reports that Service[nginx] changed from 'stopped' to 'running'. It compared the desired system state, where nginx should be running, with the current system state, and made only the change necessary. We could delete the /www directory, uninstall nginx, and so on, and a simple run of puppet apply will bring us back to our desired state. The same is true on a brand new server.

To keep this a gentle introduction to Puppet, I have avoided discussing the details of its language. It is robust. It has variables, units of abstraction and organization, the ability to specify file content with ERB templates, and a great deal more capability than what you have seen here. Putting an entire system definition in a single manifest file is a practice you will quickly outgrow; it is great for learning, though.
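To give a taste of those features, here is a small sketch that puts a variable, a defined type (one of Puppet's units of abstraction), and an inline ERB template together. The names $site_root, static_site, and greeting are made up for illustration, and the String parameter types assume Puppet 4 or later. It is meant as a standalone file rather than an addition to demo.pp, since both would declare /www.

```puppet
# sketch.pp - illustrative only; static_site, $site_root and $greeting are hypothetical names.

# A variable, assigned once and reused below.
$site_root = "/www"

# A defined type bundles related resources into one reusable unit.
define static_site (
    String $doc_root,
    String $greeting = "Hello, world."
) {
    file { $doc_root:
        ensure => "directory"
    }

    file { "${doc_root}/index.html":
        require => File[$doc_root],
        ensure  => "file",
        # inline_template renders an ERB string; @greeting is the parameter above.
        content => inline_template("<!DOCTYPE html>\n<p><%= @greeting %></p>\n")
    }
}

# Declare one site; a second site would be one more short block like this.
static_site { "puppet-demo":
    doc_root => $site_root
}
```

In a real project a definition like this would normally live in a module, with the template in its own .erb file, rather than in the top-level manifest.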

Stop doing the work. Start describing the target.

Systems can be set up using a variety of methods, from manually pounding away in a shell, to automating repeated installs with shell scripts, to declaring a desired system state in Puppet and letting it figure out what, if anything, needs to be done to get you there. Automating system setups with a declarative configuration tool like Puppet or Chef will change the way you approach both sysops/sysadmin work and development. With your Puppet manifests stored alongside your projects in source control, it is easier to evolve your system setup and your code without getting lost in configuration hell.

Puppet may have been designed for the sysops folks managing farms of servers, but it is also an incredibly valuable technology for smaller teams, single-server projects, and solo devs who want to abstract away the pains of maintaining system configurations.