wiki · home


Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

Chord is a distributed lookup protocol for peer-to-peer systems. The protocol focus on providing a fast lookup for keys.

Some of Chord’s characteristics help simplifying peer-to-peers systems:

Applications can be built using a Chord library that provides two ways of interaction: the first way is by using Chord’s lookup(key) function that yields the IP address of the node responsible for the supplied key. The second way is by having the nodes notifying the application about changes in the set of keys the node is responsible for. Other features such as authentication, caching, replication, etc, are not managed by Chord and should be the application’s responsibility.

The Base Chord Protocol

The Chord protocol specifies how to find the locations of keys, how new nodes join the system, and how to recover from the failure (or planned departure) of existing nodes.

At its core, Chord provides a fast mapping from keys to the nodes responsible for them via usage of a hash function to do the mapping. It uses consistent hashing so when there is a failure with a node or new nodes join the system, only a portion of keys need to be remapped. To improve the scalability of consistent hashing it does not require that every node need to know about every other node.

Scalable Key Location

The nodes in the consistent hashing ring store pointers to their respective successors and predecessors. When the user queries for a given identifier these pointers can be simply followed until the correct node is found. In a system with a large number of nodes this traversal might not be attractive, and that is why Chord maintains additional routing information to speed up the process of finding a node.

In Chord, each node n maintains a routing table with at most m entries, where m is the number of bits in the key/node identifiers. This table is called finger table and the ith entry in the table at node n contains the identity of the first node s that succeeds n by at least 2i-1 on the identifier ring. So, s = successor(n + 2i-1), where 1 <= i <= m, and all arithmetic is modulo 2m.

With this scheme, each node will end up storing more information about those nodes that are more closer to it on the ring instead of nodes that are farther away. The lookup can then proceed by following these entries in the finger table where following each entry roughly corresponds of skipping half the nodes between n and the desired node s. The following figure illustrates these concepts:

To lookup the node for a given key, the application can contact any node in the ring which will then find the successor node of the range that includes the given key, which is the node responsible for that key. This process makes use of the finger table to avoid linearly following the successor pointers.

Node Joins

When a node n joins the ring, Chord must perform the following tasks to preserve the routing invariants and finger tables:

  1. Initialize the predecessor and fingers of n.
  2. Update the fingers and predecessors of existing nodes to reflect the addition of n.
  3. Notify the higher layer software so that it can transfer state (e.g. values) associated with keys that node n is now responsible for.

The new node can join the ring by knowing the identity of any existing Chord node n'. With the help of node n' node n will have its state initialized and itself added to the Chord ring.

In the process of joining the ring, the existing node n' helps the new node n by initializing the finger table for n. Then the finger tables of existing nodes need to be updated to reflect the new node n in the ring. And finally, the keys that should be now of responsibility of n need to be transferred to it. The actual meaning of “transferring the keys” is left for the application using Chord.

Concurrent Operations and Failures

The idea of the previous section only considered a single node joining the ring but what actually happens is multiple nodes joining concurrently and nodes that fail or depart voluntarily. Because of that, the ideas in the previous section need to be modified.

In the previous section, the algorithm tried to maintain correct finger tables for all nodes. With several nodes joining the ring it is difficult to maintain the same level of correctness. The idea is then to separate the correctness goal by having a different procedure that is periodically executed to fix the finger tables and attempt to give the nodes a more correct view of the ring1.

Another procedure that is also executed periodically is used to stabilize the ring, and it works by notifying possible successors of the existence of a new previous node. The following pseudocode illustrates such procedure:

// periodically verify n's immediate successor, and tell the successor about n.
n.stabilize()
    x = successor.predecessor
    if (x ∈ (n, successor))
        successor = x
    successor.notify(n)

// n' thinks it might be our predecessor
n.notify(n')
    if (predecessor is nil or n' ∈ (predecessor, n))
        predecessor = n'

To maintain the correct successor pointers when nodes fail or leave the ring, the nodes maintain a list of successors. A modified version of the stabilize function takes care of maintaining this list. So when the successor of node n fails, n replaces the successor with the next alive node from its successors list.

One caveat is that after a node failure, but before stabilization has completed, nodes might send requests through the failed node via a find_successor function that uses the finger table entries. In this case, the implementation could also use the successors list to try routing the request with a different but alive node.

References

Notes


  1. Not a full view of the ring because, as it was mentioned before, the amount of information required to maintain a full view might hurt the performance.↩︎