[ /js/graph ]

Of all the websites produced by the Hyperborian community, I think my favourite would have to be fc00.org. A big part of the reason why I like it is that it exposes all of its data as a JSON object. That object can be downloaded directly, and lots of additional data can be extracted from it.

FC00.org compiles data from many nodes to provide an accurate picture of how the network is structured at any given time. This is essential, since no single node can be expected to have knowledge of the entire network.

I <3 Graph Theory

If you've been on Hyperboria for a while, you might remember that for a while I was pulling a bunch of data from a bunch of sources, and trying to interpret what it all meant. I got bored/busy after a while, but every now and then I come back to my old hobby and start playing around with the data some more.

JSON stands for "Javascript Object Notation", and so it's reasonable to expect that Javascript would be suited to dealing with information packaged as JSON. Since the last time I tried dealing with this data, I've published an npm package which contains some of my favourite functions for data science. As a result, it's become even easier.

In this article, I'm going to document the process of interpreting FC00.org's data, using my eponymous npm package. To use my library, just run sudo npm ansuz -g. This will install the package ansuz globally on your system.

Once you have that, you can get follow along with the everything in this post.

Getting going

First you need to grab the data from FC00.org. I used wget for this task, but if you wanted to automate the process from within a NodeJS script, there are lots of packages for doing so, like request.

wget http://www.fc00.org/static/graph.json

The easiest way to load a JSON object in NodeJS is to use require. I like to take care of all require statements at the top of a script, so I'll load ansuzjs here too. ansuzjs is a wrapper around three libraries I commonly use:

  1. van.js (which contains a bunch of basic helper functions)
  2. gen.js (with code for producing lazy generators)
  3. morf.js (which I use for processing strings)
var G=require("./graph.json");
var ansuz=require("ansuz");
var van=ansuz.van;

Extracting information from the graph object.

Once you've loaded your libs, you can take a look at what graph.json contains.

console.log(Object.keys(G));

This line will print an array of the three attributes in the graph object:

  1. nodes
  2. edges
  3. created

These should be pretty straight-forward if you know a little bit about graph theory.

G.created simply says when this information was gathered. It contains a single integer value, the UTC code.

G.nodes is an array of node objects, which each contain the following keys:

  1. color
  2. label
  3. version
  4. y
  5. x
  6. id
  7. size

I will only be using two of these values: id and version.

G.edges is an array of edge objects, which each contain the following two keys:

  1. sourceID
  2. targetID

Simplifying this data

This is a really common way of describing a graph. There are individual points, and separate data indicating how those points relate. For my purposes, however, I would like to reformat the information so as to make certain data easier to access.

I love arrays, but objects make it much easier to access nodes by their ip, and so I'd like to convert these two arrays into a single object. This will be called U, for 'universe' since it will comprise our complete domain.

To make things easy, I'll still keep an array as well, called I, for 'index'.

var U={}; // the universe of nodes

G.nodes.map(function(n,i){ // map over all the nodes
  U[n.id]={ // populate the universe
    id:n.id
    ,p:[]
    ,v:G.nodes[i].version
  };
});

This code instantiates our universe object, and fills it with nodes, which each contain:

  1. a copy of their id (a string)
  2. an empty array (which will contain a list of peers)
  3. their version (a string)

Now we need to iterate over the array of edges, and populate each node's array of peers. This duplicates a lot of information (which is why it was in the other format to begin with), but makes it such that we can access each node's list of peers without having to look outside that particular node object.

G.edges.map(function(e){ // populate each node with edge data
  U[e.sourceID].p.push(e.targetID);
  U[e.targetID].p.push(e.sourceID);
});

Each edge object represents a link between two nodes, which you'll understand to be symmetrical, thus we push each node's id to the other node's peer list.

Having reduced the nodes and graph arrays to a single object, we can now precompute some basic information and clean up some redundancies. FC00.org collects data about the nodes it has seen, and which connections it knows about. Unfortunately, that information isn't always complete, and so it sometimes reports nodes which lack peers. I call these nodes islands.

These islands can cause problems when generating a distance matrix (which I'm going to do), so I'm going to remove them now.

var I=[];

Object.keys(U).map(function(node){ // precompute each node's number of peers
  U[node].n=U[node].p.length;
  if(U[node].n===0){ // nodes with no peers shouldn't be on the graph
    delete U[node];
  }else{
    I.push(U[node].id);
  }
});

var L=I.length;

In the code above, I instantiate my index array, I. Then for every node in the Universe object U, I create a new attribute n, which represents that node's number of peers. Nodes with 0 peers are islands, which I promply delete from the Universe object U. Nodes which have peers are pushed to the index array I. Once I has been fully populated, I take the length of it and assign it to a global variable L.

Generating an Adjacency Matrix

There are many different metrics we can use to evaluate the health of a network. Of the many ways we can represent this data, some of the more informative formats are the Adjacency Matrix and the Distance Matrix.

The generation of an adjacency matrix is made really easy by my function van.carteSquare, which applies a binary function to every possible combination of elements in an array.

var A=van.carteSquare(function(a,b){
  return (a===b||U[a].p.indexOf(b) !== -1)?1:0;
},I);

The above definition of adjacency is such that a node is considered adjacent to itself, or any node within its array of peers. This function returns an array of arrays, in which each element is either a zero or a one. This data gets assigned to A.

This data isn't actually that informative, since all the positive information, that is, information about which nodes are connected to which, was already contained in the nodes themselves. It just contains a bunch of negative information, regarding which nodes lack direct connections.

We will have to dig a bit deeper to expose some more valuable information.

Generating a Distance Matrix

This task is significantly more difficult than the previous one. This time, instead of generating an array, I'm going to generate an object. This object will contain nested objects, each of which will represent a node. Each such node will contain attributes indicating that node's distance from the specified node.

This is a fairly complicated function, but I've done my best to comment it well.

var buildDistanceMatrix=function(maxDepth){
  // maxDepth allows you to limit how far it will explore the network
  var D={}; // D is the distance matrix object
  var done=false; // are we done yet? NO. definitely not.

  // we start with an empty store of nodes, but we have an index
  I.map(function(i){// for each entry in the index
    D[i]={}; // initialize an object
    D[i][i]=0; // that entry has a distance of zero from itself
  });

  var depth=0; // let's start by looking for nodes at a distance of one hop

  while(done !== true){ // as long as we still have work to do...

    var completion=I.map(function(i){
      // for each iteration of the while loop, we want to loop over each node
      // and deal only with its peers which are `depth` hops away.
      // the easiest way to do that is to grab all the keys in the object
      // and filter out all the nodes which do not match that depth

      Object.keys(D[i]).filter(function(nodes){ 
          return (D[i][nodes]===depth); // get just the nodes at this depth
        })
        .map(function(node){ // for every such node, apply this function

          U[node].p.map(function(pp){ // find all its peers' peers
            if(!D[i][pp] && D[i][pp] !== 0){ // avoid overwriting existing records
              // this is how we retrieve the shortest path
              D[i][pp]=depth+1; // they are $depth away from this node
              D[pp][i]=depth+1; // and vice versa, bcuz distance is reflexive
            }// skip this pair if distance has already been computed
          }); // this mapping was purely for side effects...

        }); // finish mapping over each node
        return Object.keys(D[i]).length===L;
        // this returns whether each node's list of distances is equal in length
        // to the global value L (this is a boolean value)
      }); // assign the value of this huge expression to the 'completion'

    var incomplete=completion.indexOf(false); // check for incomplete nodes

    if((++depth > maxDepth)||(incomplete===-1)){
      // if you've exceeded the maximum specified depth
      // of if every pair of nodes has its distance computed
      done=true; // then you're done
    } // default action is to continue

  } // end of while loop, which means you're done
  return D; // return the Distance Matrix object
};

With that function defined, you can call it and assign the return value to an object for reference.

var D=buildDistanceMatrix(50)

You can now look up the distance between any two nodes quite easily.

console.log(D[nodeA][nodeB]);

Computing other statistics

van.js is composed mostly of functions for working with arrays. While it wasn't expressly designed for statistics, it was designed to process arrays really easily. Since it already included most of the requisite operations for computing a dataset's standard deviation, I decided to include the definition itself.

It will only work on a one dimensional array of numbers, though. Keep this in mind when using the van.stdDev function.

There are a number of functions which will transform other datatypes into one-dimensional arrays, or otherwise operate on the arrays themselves:

  1. van.flatten flattens an array of arrays into a single array.
  2. van.vals does the opposite of Object.keys, returning an array of all the component values of an object.
  3. van.sum sums all the values in a numerical array.

With those in mind, let's figure out some quick attributes of our graph:

var distances=van.flatten(
  van.vals(D) // get all the values in D
    .map(function(d){ // each value is another object
      return van.vals(d); // return the values in that object
    }) // now you have nested arrays instead of nested objects
  );// and you've flattened them into a single array

var deviation=van.stdDev(distances); 
// standard deviation of distances

var mean=van.sum(distances)/distances.length;
// one kind of average

var maximum=distances.reduce(function(a,b){return (a>b)?a:b;});
// the furthest any packet should have to travel in the network (ideally)