Nodejitsu

Building an anonymous cloud database

As part of my ongoing quest to develop Skynet, I've been thinking a lot about the decentralized storage and distribution of small amounts of state in a peer-to-peer environment. With new privacy laws, the internet is fundamentally changing in a way we have not seen before. It's up to us, the developers, to make sure the internet remains a fair and open place.

In software application development we usually store application state in volatile and non-volatile random access memory. We also use network devices to communicate state between our application and a central provider or server. We store state in things like databases, the file system, and cloud hosting providers. This approach is recommended for most applications, but when we start to think about the implications of building large, distributed, and decentralized applications, we begin to rethink how these applications are going to store and maintain state.

In one way or another, almost anything can store state. Any device that can be written to and read from later can be used to store state.

Think of it like this:

There are hundreds of thousands of endpoints on the web right now that you can post some sort of data to, and read it back later. Why not make a database out of that?

Basic strategies in building a decentralized and anonymous database

  1. No central servers. A central server is a central point of failure.
  2. No central authorities. A central authority that controls the database is another (and much worse) central point of failure.
  3. Ability to encrypt and obscure data.
  4. Ability to run in any combination of local / cloud / hosted. The database should be able to run locally, on hosted servers, and in the cloud.
  5. Ability to easily share and replicate data. The data should be able to easily transfer and replicate between multiple systems and users.

The goals of building a decentralized and anonymous database are fairly straightforward. The data needs to have no central point of failure (technological or political). The data needs to be securable. The data needs to be accessible and shareable.

Introducing hnet

hnet is a decentralized and anonymous database built in node.js. It works by spreading small amounts of information across several nodes using a variety of non-traditional storage engines.

http://github.com/hookio/hnet/

What is an hnet node?

An hnet node is anything in which we can store a small amount of data and read it back later. The hnet client communicates with many hnet nodes in order to establish a dataset.

What is an hnet storage engine?

A storage engine can be considered a wrapper for any service on the web that can store data and read it back later. Storage engines are small and pluggable, so it should be very easy to author new engines. Some examples of engines are: gist, pastebin, imgur, google groups, reddit, twitter, irc, etc. The only real requirement for a storage engine is that it can store state; the rest is up to your imagination.
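
To make the "store state, read it back" requirement concrete, here is a minimal sketch of what an engine could look like. The function name createMemoryEngine, the save/load method names, and the memory:// URI scheme are assumptions made for illustration; hnet's actual engine interface may look different, and a real engine would talk to a web service rather than to a local Map.

```javascript
// Hypothetical engine shape; hnet's real engine API may differ.
// An engine only needs to store a small JSON value and read it back later.
function createMemoryEngine() {
  const store = new Map(); // uri -> serialized JSON
  let counter = 0;
  return {
    type: 'memory',
    // Store a JSON-serializable value and return the URI it can be read from.
    save(data) {
      const uri = 'memory://node/' + counter++;
      store.set(uri, JSON.stringify(data));
      return uri;
    },
    // Read back a previously stored value, or null if the node is unavailable.
    load(uri) {
      return store.has(uri) ? JSON.parse(store.get(uri)) : null;
    }
  };
}

const engine = createMemoryEngine();
const uri = engine.save([{ foo: 'bar', tar: 'val' }]);
console.log(uri, engine.load(uri));
```

Swapping the Map for calls to a paste service or an image host would give a different engine with the same two operations.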

Why hnet?

hnet was originally built as an easy way to provide distributed and semi-anonymous tables of server IPs and ports for hook.io hooks. hnet gives hook.io a way to auto-discover other hooks over a Wide Area Network. Note: hook.io already supports auto-discovery over the Local Area Network via mDNS.

How does hnet work?

Many dumb hnet nodes, one smart client.

In order to query hnet, we first connect to a few "top-level" nodes.

If you are not maintaining your own hnet, the hnet client will fall back to a few semi-moderated top-level nodes. If you don't feel comfortable having someone else moderate your data, it's trivial to set up your own piece of hnet.

You will notice that hnet uses IrisCouch for many of its top-level nodes and that most nodes eventually link back to http://hnet.iriscouch.com/public. IrisCouch provides free hosted CouchDB as a service. It allows you to sign up for a free CouchDB instance instantly and with minimal registration. IrisCouch was chosen both for its high quality of service and because the lead developer of the service, Jason Smith, is an active open-source and open-data advocate.

Note: Our node.js deployment tool jitsu also ships with a nifty command, jitsu databases create couch, which will instantly give you a CouchDB instance through IrisCouch.

The hnet protocol is JSON

The hnet protocol supports arbitrary JSON data and optional JSON-RPC commands.

Here is an example of what a JSON fragment returned from an hnet node might look like:

[
  { "foo": "bar", "tar": "val" },
  { "foo": "boo", "something": ["a","b","c"] },
  { "foo": "bar", "tar": "val" }
]

Optional JSON-RPC commands can be embedded

hnet optionally parses these JSON-RPC commands.

[
  { "foo": "boo", "something": ["a","b","c"] },
  { "method": "link", "params": [
      { "type": "couch", "uri": "http://hnet.iriscouch.com/public/0" }
    ]
  },
  { "foo": "bar", "tar": "val" }
]

If we look at the second array item in the previous JSON fragment, we see:

{ "method": "link", "params": [ { "type": "couch", "uri": "http://hnet.iriscouch.com/public/0" } ] }

The link method is particularly important.

It indicates that we should lazily link this document from a remote dataset into the current position in the item array.

This allows us to create large datasets from many small hnet nodes.
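
As a rough sketch of how a client could splice linked documents into the position of a link command, consider the function below. The names resolveLinks, isLink, and fetchNode are made up for this example and are not part of hnet. For simplicity the fetch is synchronous and eager, whereas hnet links lazily, and each node is followed only once, so circular links are simply skipped.

```javascript
// Sketch only: resolveLinks, isLink and fetchNode are illustrative names,
// not hnet's API. fetchNode(link) returns the array stored at link.uri,
// or null if the node is unavailable.
function isLink(item) {
  return item !== null && typeof item === 'object' &&
    item.method === 'link' && Array.isArray(item.params);
}

function resolveLinks(items, fetchNode, seen = new Set()) {
  const out = [];
  for (const item of items) {
    if (!isLink(item)) {
      out.push(item); // plain data stays where it is
      continue;
    }
    for (const link of item.params) {
      if (seen.has(link.uri)) continue; // already followed: skip circular links
      seen.add(link.uri);
      const linked = fetchNode(link);
      if (!linked) continue; // node unavailable
      // Put the linked documents where the link command was.
      out.push(...resolveLinks(linked, fetchNode, seen));
    }
  }
  return out;
}

const fragment = [
  { foo: 'boo', something: ['a', 'b', 'c'] },
  { method: 'link', params: [{ type: 'couch', uri: 'http://hnet.iriscouch.com/public/0' }] },
  { foo: 'bar', tar: 'val' }
];
// Stand-in for the network: the contents of the linked node are made up.
const fakeNodes = new Map([
  ['http://hnet.iriscouch.com/public/0', [{ linked: true }]]
]);
const merged = resolveLinks(fragment, (link) => fakeNodes.get(link.uri) || null);
console.log(merged);
```

The merged array keeps the order of the original fragment, with the linked documents sitting where the link command used to be.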

hnet client receives data from many nodes

After hnet is able to query a few top-level nodes, it begins to crawl several other linked hnet nodes and merges the data locally.

Saving data to hnet

hnet could be considered an append-only database. hnet is designed to create new nodes instead of attempting to edit existing nodes. Every time a new hnet node is created, it links back to at least two existing nodes.

Since every new node created links back to at least two other known hnet nodes, all newly created hnet nodes will eventually link back to the first nodes we started querying.
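
The append-only rule can be sketched as follows. The helper buildNode and its linkCount parameter are hypothetical, and choosing the most recently known nodes is just one possible selection policy; hnet itself may pick the nodes to link back to differently.

```javascript
// Hypothetical helper: build the contents of a new append-only node.
// knownNodes is a list of { type, uri } entries the client already knows about.
function buildNode(items, knownNodes, linkCount = 2) {
  if (knownNodes.length < linkCount) {
    throw new Error('need at least ' + linkCount + ' known nodes to link back to');
  }
  // Link back to the most recently discovered nodes; a real client might
  // choose differently (for example, at random).
  const params = knownNodes.slice(-linkCount).map(({ type, uri }) => ({ type, uri }));
  // Existing nodes are never edited: the new node carries the new items
  // plus a link command pointing at the older nodes.
  return [...items, { method: 'link', params }];
}

const known = [
  { type: 'couch', uri: 'http://hnet.iriscouch.com/public' },
  { type: 'couch', uri: 'http://hnet.iriscouch.com/public/0' }
];
const newNode = buildNode([{ foo: 'bar', tar: 'val' }], known);
console.log(JSON.stringify(newNode, null, 2));
```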

Establishing a TTL (Time To Live) for circular link resolution

By default, the hnet client will not attempt to resolve circular JSON-RPC links. You can enable circular linking by specifying the ttl parameter in the hnet constructor.

Ex:

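// Load the Hnet constructor from the library source.
var Hnet = require('../lib/hnet').Hnet;

// ttl is in milliseconds; setting it enables circular link resolution.
var hnet = new Hnet({
  ttl: 5000
});

// Begin loading data from hnet.
hnet.load();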

The ttl parameter indicates that the hnet client will resolve any circular links it encounters after a delay of 5,000 milliseconds. If a ttl is specified, the hnet client will continue to crawl hnet forever, or until the circular link is broken because specific hnet nodes have become unavailable.
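
One possible reading of this behavior, written as a small decision function, is shown below. This is an interpretation of the description above and not hnet's actual implementation: followLink, the visited set, and the injectable schedule argument are all assumptions made for the sketch. The example passes a recording scheduler instead of setTimeout so that it finishes immediately.

```javascript
// One possible reading of the ttl option, not hnet's actual implementation.
// crawl(uri) follows a link; schedule(fn, ms) defaults to setTimeout.
function followLink(uri, visited, options, crawl, schedule = setTimeout) {
  if (!visited.has(uri)) {
    visited.add(uri);
    crawl(uri); // first visit: follow the link right away
    return 'followed';
  }
  if (typeof options.ttl !== 'number') {
    return 'skipped'; // default: circular links are left unresolved
  }
  schedule(() => crawl(uri), options.ttl); // circular link: revisit after ttl ms
  return 'scheduled';
}

// Usage with a recording scheduler so the example finishes immediately.
const crawled = [];
const pending = [];
const recordTimer = (fn, ms) => pending.push({ fn, ms });
const visited = new Set();
const crawl = (uri) => crawled.push(uri);

console.log(followLink('node-a', visited, {}, crawl, recordTimer));            // first visit
console.log(followLink('node-a', visited, {}, crawl, recordTimer));            // circular, no ttl
console.log(followLink('node-a', visited, { ttl: 5000 }, crawl, recordTimer)); // circular, ttl set
```

Under this reading, a crawl with a ttl never terminates on its own, because every circular link schedules another visit.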

A quick note on Steganography and Cryptography

hnet exposes simple Hnet.get and Hnet.save methods. It's fairly trivial to perform whichever cryptography you want on the data before you save it and after you retrieve it. hnet also provides an optional interface for specifying unencrypted metadata alongside encrypted data. This approach allows you to use hnet with most existing cryptography standards.
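
As an illustration of encrypting before saving and decrypting after retrieval, the sketch below uses Node's built-in crypto module with AES-256-GCM. The helpers encryptRecord and decryptRecord and the record layout (meta, iv, tag, data) are assumptions for this example; the exact signatures of Hnet.save and Hnet.get are not shown here, so the hnet calls are left out. The metadata stays readable but is authenticated, so changing it makes decryption fail.

```javascript
const crypto = require('crypto');

// Encrypt a JSON value with AES-256-GCM. The metadata is stored unencrypted
// next to the ciphertext but is authenticated as additional data.
function encryptRecord(data, key, meta = {}) {
  const iv = crypto.randomBytes(12); // fresh nonce for every record
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  cipher.setAAD(Buffer.from(JSON.stringify(meta)));
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(data), 'utf8'),
    cipher.final()
  ]);
  return {
    meta,
    iv: iv.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'),
    data: ciphertext.toString('base64')
  };
}

// Reverse of encryptRecord; throws if the key, data, or metadata do not match.
function decryptRecord(record, key) {
  const decipher = crypto.createDecipheriv(
    'aes-256-gcm', key, Buffer.from(record.iv, 'base64'));
  decipher.setAAD(Buffer.from(JSON.stringify(record.meta)));
  decipher.setAuthTag(Buffer.from(record.tag, 'base64'));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(record.data, 'base64')),
    decipher.final()
  ]);
  return JSON.parse(plaintext.toString('utf8'));
}

const key = crypto.randomBytes(32); // in practice, a key shared out of band
const record = encryptRecord({ host: '10.0.0.1', port: 9000 }, key, { type: 'hook' });
// `record` is plain JSON and could be handed to a save method as-is.
console.log(record.meta, decryptRecord(record, key));
```

Because the record is an ordinary JSON object, it can travel through any storage engine that accepts JSON.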

Steganography is the art and science of writing hidden messages in such a way that only the sender and intended recipient are aware of the existence of the message. It can be considered a form of security through obscurity. Many hnet storage engines use some form of steganography. The advantage of using steganography in conjunction with cryptography is that messages do not attract attention to themselves. Cryptography can protect the contents of a message, but steganography can obscure the fact that a message even exists.