Infrastructure at Nodejitsu (Part 1) -- Deploying Applications

As mentioned in our most recent blog post, we recently rolled out some major changes to our infrastructure. We made these changes to address a number of problems with our existing architecture.

We decided to take this opportunity to make this the first post in a series about how the infrastructure underlying the Nodejitsu platform works. Building a Platform-as-a-Service involves a lot of moving parts: provisioning, configuration management, logging, monitoring, building snapshots, and deploying them. It's a lot to keep track of! In this post we will discuss how we deploy applications at Nodejitsu. We will answer the question "how does my app get started?", explain how our new agentless architecture works, and describe what problems it solves compared to our previous implementation.

Still interested? Keep reading!

Understanding the problem

On the surface, downloading, installing, and executing application code seems relatively simple (see the sketch after this list):

  1. Fetch the code.
  2. Install it.
  3. Start the application.
  4. Send outbound metrics and logs for collection.
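
In rough node.js pseudocode, those four steps look something like the sketch below. This is only an illustration; the snapshot URL, paths, and collector address are placeholders, not our actual implementation.

// A minimal sketch of the four steps above. Placeholder URLs and paths only.
const { execSync, spawn } = require('child_process');
const net = require('net');

const appDir = '/opt/app';

// 1. Fetch the code: download and unpack a snapshot tarball.
execSync(`curl -sL https://example.com/snapshots/app.tgz | tar -xz -C ${appDir}`);

// 2. Install it.
execSync('npm install --production', { cwd: appDir, stdio: 'inherit' });

// 3. Start the application.
const app = spawn('node', ['server.js'], { cwd: appDir });

// 4. Send outbound metrics and logs for collection.
const collector = net.connect(1337, 'collector.example.com');
app.stdout.pipe(collector);
app.stderr.pipe(collector);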

But underneath the covers there are a number of subtle issues that make this problem harder than it looks.

What is haibu?

Our previous architecture was based on haibu - an application server. The deployment process was pretty straightforward: our build server put the snapshot in CloudFiles, haibu fetched it and started the application.

There were three problems with this approach. Let's examine them in more detail:

1. Long-running Agent

Having a long-running daemon is often a good thing (sshd, for example), but for a PaaS it can cause significant headaches:

  • Excess memory: Daemons are fine when you have plenty of available memory, but when you're running inside a 256MB Zone or VM every megabyte counts. haibu ended up using ~20MB of RAM on average, which was about 10% of the total.
  • Upgrades: Because the daemon runs the application, any upgrade to the daemon itself requires the application to be restarted. On the surface this seems OK, but over time it caused a lot of inconsistencies in the underlying drones.

2. Calculus of node versions

Since haibu itself is built on node, we often had to run multiple versions of node on the same machine: the version haibu used and the version used by the application. To support new node versions we had to upgrade haibu-carapace, our process jail. This meant that as node evolved and features were added and removed, we had to implement a number of shims (around the .fork() API, for example) to keep everything working.
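
To give a feel for the kind of shim involved (purely an illustration, not haibu-carapace's actual code), fork()'s execPath option is one way a parent running one version of node can start a child under a different node binary:

// Illustrative only -- not haibu-carapace's actual shim.
// child_process.fork() accepts an execPath option, so a parent process can
// spawn a child under a different node binary than its own.
const { fork } = require('child_process');

// Hypothetical location of the node version the application asked for.
const appNodeBinary = '/usr/local/node-versions/0.8.26/bin/node';

const child = fork('/opt/app/server.js', [], {
  execPath: appNodeBinary,  // run the child with the app's node version
  env: Object.assign({}, process.env, { NODE_ENV: 'production' })
});

child.on('exit', (code) => console.log('app exited with code', code));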

3. No metrics

We want our customers to have as much data about their application as possible. haibu didn't provide any way of easily plugging in a monitoring agent. If we wanted to continue using it, we'd have to add one more process to the existing stack (which meant yet more memory usage).

New architecture

Through our use of haibu we saw the deficiencies outlined above and sat down to carefully analyze what needed to change. We distilled our list of requirements to:

  • Backward compatible with running applications.
  • Low memory usage.
  • No daemons. All our processes should be as short-lived as possible.
  • Easy to upgrade.
  • Built-in monitoring, metrics, and logs.

The low memory requirement suggested using C for the parts of the stack running on the drone servers.

At a high level, the new implementation removes haibu entirely and replaces it with two much smaller components: forza and solenoid.

Monitoring applications with forza

forza is a C application monitor written using libuv, the library behind node.js core. It's essentially an interface for plugins to communicate with both metrics servers and the running process (in our case, the user's application).

One important thing to note about forza is that when its child process dies, it dies too. That makes it easier to upgrade and helps avoid significant memory leaks. forza comes with a few interesting plugins which send additional metrics about the application it monitors:

  • cpu - CPU load
  • mem - Used memory
  • start - plugin for starting applications
  • logs - log distribution
  • heartbeat - heartbeat
  • process - process-level events
  • uptime - process uptime

To get started with forza, first install it:

git clone https://github.com/opsmezzo/forza.git  
cd forza  
./configure --with-plugin cpu --with-plugin logs
make  

Now launch a new terminal window and run nc to see all of the data that forza will send from the application.

nc -k -l 1337  

Then start the application using forza:

./forza -h 127.0.0.1:1337 -- node <path-to-your-node-app>

When your app starts up, you should start seeing some logs and CPU statistics in a unified JSON format in the second terminal window.
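
If you'd rather consume that stream programmatically than watch it with nc, a few lines of node will do. This is a minimal sketch which assumes the messages arrive as newline-delimited JSON; the port simply matches the nc example above.

// A stand-in for `nc -k -l 1337`: accept connections from forza and print
// each message. Assumes newline-delimited JSON framing.
const net = require('net');

net.createServer((socket) => {
  let buffered = '';
  socket.on('data', (chunk) => {
    buffered += chunk.toString('utf8');
    const lines = buffered.split('\n');
    buffered = lines.pop();  // keep any partial line for the next chunk
    lines.filter((line) => line.trim()).forEach((line) => {
      try {
        console.log('metric:', JSON.parse(line));
      } catch (err) {
        console.log('raw:', line);  // not JSON? print it anyway
      }
    });
  });
}).listen(1337, () => console.log('listening on 1337'));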

Starting applications with solenoid

solenoid is an application starter. It performs several necessary tasks before starting an application:

  1. Fetches and unpacks the application snapshot using pkgcloud.
  2. Reads the package.json and determines what node version to run.
  3. Sets any environment variables passed in.
  4. Creates the solenoid user and restricts the file system.
  5. Starts forza with aeternum as the solenoid user.
  6. Exits.

Right now, solenoid is specific to our needs, but we decided to open it up so the community can get better insight into how our platform works and help us improve it.
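
To make that flow concrete, here is a rough, heavily simplified sketch of what a solenoid-style starter might look like. It is not solenoid's actual source: the credentials, container names, paths, node version scheme, and uid/gid values are placeholders, error handling is omitted, and the aeternum wrapper and file system restrictions are elided.

// A simplified, hypothetical illustration of a solenoid-style starter.
const fs = require('fs');
const path = require('path');
const { execSync, spawn } = require('child_process');
const pkgcloud = require('pkgcloud');

const appDir = '/opt/run/snapshot';

// 1. Fetch and unpack the application snapshot from object storage.
const storage = pkgcloud.storage.createClient({
  provider: 'rackspace',
  username: 'example-user',
  apiKey: 'example-key'
});

storage.download({ container: 'snapshots', remote: 'app.tgz' })
  .pipe(fs.createWriteStream('/tmp/app.tgz'))
  .on('finish', () => {
    execSync(`mkdir -p ${appDir} && tar -xzf /tmp/app.tgz -C ${appDir}`);

    // 2. Read package.json and decide which node binary to run.
    const pkg = JSON.parse(fs.readFileSync(path.join(appDir, 'package.json'), 'utf8'));
    const nodeBin = `/usr/local/node/${(pkg.engines && pkg.engines.node) || '0.10.x'}/bin/node`;

    // 3. Set any environment variables passed in.
    const env = Object.assign({}, process.env, { NODE_ENV: 'production' });

    // 4./5. Run the app under forza as an unprivileged user, detached so this
    // starter can exit (step 6). The real flow also wraps the process in
    // aeternum and locks down the file system, which is skipped here.
    spawn('forza', ['-h', '127.0.0.1:1337', '--', nodeBin, pkg.main || 'server.js'], {
      cwd: appDir,
      env,
      uid: 1001,  // placeholder uid/gid for the "solenoid" user
      gid: 1001,
      detached: true,
      stdio: 'ignore'
    }).unref();
  });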

solenoid is started by the Nodejitsu API over an SSH connection, using the ssh2 library. We then wait for forza to tell us whether the application started correctly, so that we can respond to our users through jitsu appropriately.
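
For the curious, driving a remote command over SSH with the ssh2 module looks roughly like this. It is a generic sketch rather than our API's actual code; the host, key path, and solenoid invocation are placeholders.

// Generic ssh2 usage sketch -- the host, key, and command are placeholders,
// not the Nodejitsu API's actual invocation of solenoid.
const fs = require('fs');
const { Client } = require('ssh2');

const startCommand = '/usr/local/bin/solenoid';  // placeholder; real arguments elided

const conn = new Client();

conn.on('ready', () => {
  conn.exec(startCommand, (err, stream) => {
    if (err) throw err;
    stream
      .on('data', (data) => process.stdout.write(data))
      .on('close', (code) => {
        console.log('remote command exited with code', code);
        conn.end();
      });
    stream.stderr.on('data', (data) => process.stderr.write(data));
  });
});

conn.connect({
  host: 'drone.example.com',
  port: 22,
  username: 'root',
  privateKey: fs.readFileSync('/path/to/deploy_key')
});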

Verifying our assumptions

The new infrastructure has been out for over three weeks. We have gotten lots of positive feedback, but we realize that it may have been rough around the edges for some of our users. We're still working on some edge cases, but we can honestly say that deployments are far more reliable than they were before.

Upgrading software on our servers has become a breeze: we can release new software to all of our servers in about an hour. We're extremely happy to finally ship something we've been working on for over six months.

Looking to learn more about the inner workings of Nodejitsu? Stay tuned; in future posts we will be digging into:

  1. Building application snapshots.
  2. Provisioning and configuring servers.
  3. Real-time logging and metrics infrastructure.
  4. Load balancing and network routing.
  5. Databases and backups.