The paper analyzes requests made to Facebook’s TAO system to quantify how often data-consistency anomalies happen in practice. The system consists of a two-level cache and a sharded database with a single master per shard. The following figure illustrates a typical request:
Within each cluster, the system guarantees per-object sequential and read-after-write consistency. Across the entire system it provides eventual consistency.
The results show that only 0.0004% of the requests returned a different value than a linearizable system would have returned.¹
Two forms of analysis are performed: a principled one and a practical one. The principled analysis identifies when the results would differ from those of a stronger consistency model and is not performed in real time. The practical analysis runs in real time, but only over a subset of requests, and is used as a monitored metric to help detect bugs.
It’s interesting how they constructed their consistency checker for the principled analysis. They collect a bunch of information about the requests and then build a graph from them. Each read operation is a vertex, and edges are created connecting operations based on the times they completed. Edges are also created connecting each subsequent read operation to the write operation it read, and those edges are created in both directions. The trick is that each read operation is then merged with the write operation it read, and the write operation inherits its edges. If the final graph has a cycle, that is a sign of an anomaly; an acyclic graph means there is a total order over the operations.
The following figure illustrates a trace that shows an anomaly:
                ┌────┐          ┌────┐           ┌────┐
                │    │          │    │           │    │
before merge    │ w1 │─────────▶│ w2 │──────────▶│ r1 │
                │    │          │    │           │    │
                └────┘          └────┘           └────┘
                  │                                ▲
                  │                                │
                  └────────────────────────────────┘

                ┌────┐          ┌────┐
                │    │─────────▶│    │
after merge     │ w1 │          │ w2 │
                │    │◀─────────│    │
                └────┘          └────┘
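The merge-and-check procedure can be sketched in a few lines. This is my own minimal encoding, not the paper’s implementation: operations carry start/finish times, time edges go from an operation to every operation that starts after it finishes, each read is collapsed into the write it observed, and a cycle in the merged graph signals an anomaly.

```python
from collections import defaultdict

def check_trace(ops, reads_from):
    """ops: {op_id: (start, finish)}; reads_from: {read_id: write_id}.
    Returns True when the merged graph has a cycle (an anomaly)."""
    # Collapse every read into the write it observed.
    rep = {op: reads_from.get(op, op) for op in ops}

    # Time edges: a -> b whenever a finished before b started; merged
    # operations inherit each other's edges by sharing a representative.
    edges = defaultdict(set)
    for a, (_, fin_a) in ops.items():
        for b, (start_b, _) in ops.items():
            if a != b and fin_a < start_b and rep[a] != rep[b]:
                edges[rep[a]].add(rep[b])

    # Depth-first search for a cycle (grey = on the current DFS path).
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def has_cycle(u):
        color[u] = GREY
        for v in edges[u]:
            if color[v] == GREY or (color[v] == WHITE and has_cycle(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[v] == WHITE and has_cycle(v) for v in list(rep.values()))

# The anomalous trace above: w1 -> w2 -> r1 in time, but r1 reads w1,
# so the merge turns w2 -> r1 into w2 -> w1, closing a cycle with w1 -> w2.
ops = {"w1": (0, 1), "w2": (2, 3), "r1": (4, 5)}
anomaly = check_trace(ops, {"r1": "w1"})   # True: the graph has a cycle
clean = check_trace(ops, {"r1": "w2"})     # False: r1 read the latest write
```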
The other, weaker consistency models provided by TAO, like per-object sequential consistency and read-after-write consistency, are checked using the same graph.
Causal consistency can’t be checked with this model. The following example, where C0, C1, C2 represent different clusters, is valid under causal consistency:

C0: Wx1
C1: Rx1
C2: Rx0
The following example is not:
C0: Wx1
C1: Rx1 Wy2
C2: Ry2 Rx0
In the example above, if C2 saw the operation that changed y, it should also see all operations that might have influenced the write to y. The trace for the y object is Wy2 Ry2, which is valid and alone gives no evidence of an illegal operation. The trace for the x object is the same in both examples: Wx1 Rx1 Rx0. Since Facebook only traces a tiny random sample of keys, they might not capture both the x and y traces, and each trace alone is not enough to detect such violations, so they can’t perform a causal check.
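The blind spot is easy to see in data form. In this sketch (the encoding is mine, not the paper’s), the per-object x trace extracted from the invalid history is identical to the one from the valid history, so a checker that samples only the x key cannot distinguish them:

```python
# Each history is a list of (cluster, operation) pairs; an operation like
# "Wx1" is a write of value 1 to object x, "Rx0" a read of x returning 0.
valid_history = [("C0", "Wx1"), ("C1", "Rx1"), ("C2", "Rx0")]
invalid_history = [("C0", "Wx1"), ("C1", "Rx1"), ("C1", "Wy2"),
                   ("C2", "Ry2"), ("C2", "Rx0")]

def per_object_trace(history, obj):
    # Keep only operations on the given object (the object name is the
    # second character of the operation string, e.g. "Wx1"[1] == "x").
    return [op for _, op in history if op[1] == obj]

# The x trace is ['Wx1', 'Rx1', 'Rx0'] in both histories, so sampling
# only x reveals nothing about the causal violation through y.
same = per_object_trace(valid_history, "x") == per_object_trace(invalid_history, "x")
```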
Since the principled analysis is not a good fit for real-time monitoring, they came up with a way to monitor in real time how often anomalies happen, called Φ(P)-consistency. The definition is as follows: the Φ(P)-consistency of a set of replicas P is the frequency with which injected reads for the same data, sent to all replicas in P, receive the same response from each.
The injected reads are issued from the same server and the responses are compared once all of them have returned. Because there is no clock synchronization, versioning, or logging involved, the responses might be affected by the network and give false negatives (a read marked inconsistent when it should be consistent). For example, a server sends injected read operations to several replicas, but in one replica a write operation arrives and is executed just before the injected read is received. When that read returns, it will carry a different value than the reads that arrived before the write, so the injected read is marked inconsistent even though, at the time it was dispatched, the replicas had a consistent view of the object.
At first glance you might expect the TAO system to have higher rates of inconsistency, but the results show the opposite.² Another interesting aspect is the idea of a metric that monitors these anomalies in real time and hints at a possible bug recently put into production.