Nodejitsu

Save time managing and deploying your node.js app. Code faster with jitsu and npm

Fault tolerant applications in nodejs

About the author

Name
Location
Worldwide
nodejitsu nodejitsu

As part of my ongoing quest to develop Skynet, I've been deep diving into distributed computing.

At Nodejitsu, our node.js hosting platform deals with 1000s of live servers spanned across multiple data-centers. As scale increases, minute statistical probabilities become very real problems. The network is unreliable, disks become unwritable, streams break, unexpected input is unexpected, and entire data-centers can go down.

Think of it like this:

If you have 1,000 servers each individually rated at 99.9% uptime, on average, one of those machines is always failing.

So we research...

Fault tolerant system

Fallacies of Distributed Computing

Properties of distributed systems

High availability Percentage calculation

and we research more...

Shared nothing architecture

Byzantine fault tolerance

Paxos

Eventually, we hit the The Halting Problem:

Given a description of a computer program, decide whether the program finishes running or continues to run forever. In 1936, Alan Turing proved that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist.

to summarize all of these links in the context of this blog post...

It's mathematically impossible your distributed application is not going to fuck up. Deal with it.


Basic strategies in building fault tolerant applications in node

  1. The operating system's RAM, processes, and file descriptors are relatively cheap. Use them as needed.
  2. Use JSON to pass data between nodes. ( see: nssocket or dnode ).
  3. Embrace crash-only design. Nodes should be able to disconnect, restart and reconnect as fast as possible with minimal impact.
  4. Never assume any one node process will never go down. Assume that any process can go down for any reason at any time.

The last point is particularly important. If your application is unable to recover from a "central" node going down, it should not be considered fault tolerant.

Hook.io

hook.io is a distributed input/output framework for node.js. It allows you to seamlessly link together several processes and start sending messages between them. hook.io also provides a rich network of stand-alone hook libraries for adding additional i/o sources. In a sense, hook.io can be described as a next generation enterprise service bus.

Basic Paxos

Paxos is a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.

As of v0.8.6, hook.io now has the ability to do a very basic consensus among all hooks to ensure recovery if any hooks die.

In previous versions of hook.io, if the "master" or "central" server hook went down, all other hooks would stop communicating.

Now, hook.io is able able to recover from the "central" server hook going down.

Previously, when the hub went missing, none of the spokes knew where to connect anymore.

Building fault-tolerant node.js application with hook.io

hook.io can be used programmatically or as stand-alone binary. For simplicity, we will assume in this example that you are using hookio as a binary application, and with all it's default settings.

For more examples of using hook.io programmatically check out the examples or test folders.

To install hook.io with npm

 npm install hook.io -g

Now, we are going to start up three instances of hookio in our terminal.

Terminal 1

 hookio --autoheal

Terminal 2

 hookio --autoheal

Terminal 3

 hookio --autoheal

Using hook.io's default auto-discovery functionality the first terminal will take the role of server, and the second and third terminals will connect as clients to the first terminal.

Now, kill the first hook terminal with CTRL-C. The second hook will take over the role of the server. The third hook will now know to connect to the second hook instead of the first.

It might not seem very impressive, but it demonstrates a basic example of how to build fault-tolerant multi-process distributed applications in node.js

Applications built with hook.io now have no central point of failure.

But wait, there's more!!!

Mdns

Multicast DNS (mdns) is a way of using DNS programming interfaces, packet formats and operating semantics on a small network where no DNS server is running. The mDNS protocol is used by Apple's Bonjour and Linux Avahi service discovery systems. mdns is an easy way to help networked devices find each other without any prior configuration.

As of v0.8.6, hook.io now has alpha support for zero configuration networking via mdns. This means that if you hookio -m on two machines on the same Local Area Network...they will find each other and instantly start communicating without any configuration!

Computer A

 hookio -m

Computer B

 hookio -m

Now these two computers ( connected over a LAN, with no central DNS server ) will automatically discovery each other and begin to transmit messages.

Think of the possibilities!

Conclusion

With the new mdns and paxos features, hook.io is getting better everyday. The current goal is to stabilize this new functionality in the 0.8.x branch across all platforms.

The next steps will be p2p hook trackers and hosted hook services...