Distributed Transactions
Two-phase commit protocol
In the two-phase commit protocol (2PC) there is a Coordinator which as the name suggests takes care of coordinating the protocol with the steps below:
- Contact every participant telling them to get ready for the transaction.
- Gather the response from all participants and based on the response decide whether or not to commit. If all participants voted
YES
then commit, if not, abort.
┌────┐ ┌────┐
┌────────▶│ A │ ┌─────────│ A │
│ └────┘ │ └────┘
propose yes/no
│ │
┌───────────┐────────┘ ┌────┐ ┌───────────┐◀───────┘ ┌────┐
│Coordinator│─────propose─────▶│ B │ │Coordinator│◀────yes/no───────│ B │
└───────────┘────────┐ └────┘ └───────────┘◀───────┐ └────┘
│ │
propose yes/no
│ ┌────┐ │ ┌────┐
└────────▶│ C │ └─────────│ C │
└────┘ └────┘
┌────┐ ┌────┐
┌────────▶│ A │ ┌─────────│ A │
│ └────┘ │ └────┘
commit/abort ack
│ │
┌───────────┐────────┘ ┌────┐ ┌───────────┐◀───────┘ ┌────┐
│Coordinator│──commit/abort───▶│ B │ │Coordinator│◀──────ack────────│ B │
└───────────┘────────┐ └────┘ └───────────┘◀───────┐ └────┘
│ │
commit/abort ack
│ ┌────┐ │ ┌────┐
└────────▶│ C │ └─────────│ C │
└────┘ └────┘
Problems with 2PC
2PC is a blocking protocol which means that if some entity crashes the others will wait for some time until the crashed entity comes back.
Consider the case when the propose
messages have been sent but not all of them and the coordinator died during the process. Some participants received the proposal and are waiting for the next step, if the coordinator doesn’t come back or if it takes a long time to come up these participants will be blocked while other servers will be there thinking that everything is fine. If all participants received the proposal and are blocked after the coordinator crashed they can’t simply discard their locks on resources that were acquired after the propose
message because the coordinator could have come back saw their yes
and died again or is just taking too much time to reply with commit
.
One way to mitigate this problem is by having a secondary process that can take over the job of coordinating the protocol if the coordinator stops working. For that the participants must save what was their response to some round of agreement to remember what was their reply to the original coordinator. The coordinator could save the results of completed rounds on durable storage so when it comes back up it can inform the participants the result and they can take action cleaning up their logs.
Even with a standby coordinator there might be problems if one of the participants stops working. The new coordinator doesn’t know what the failed participant voted and cannot decide if the round should be commit or abort based on the responses from the surviving participants.
Three-phase commit protocol
The three-phase commit protocol (3PC) adds a new step to 2PC in order to not block during the process and in case an entity does not receive some message for a certain period of time it can take action without waiting forever. The steps for the coordinator are the following:
- Contact every participant telling them to get ready for the transaction.
- Gather the response from all participants and based on the response decide whether or not to prepare for commit.
- Send
commit/abort
based on the response from their participants.
┌────┐ ┌────┐
┌────────▶│ A │ ┌─────────│ A │
│ └────┘ │ └────┘
propose yes/no
│ │
┌───────────┐────────┘ ┌────┐ ┌───────────┐◀───────┘ ┌────┐
│Coordinator│─────propose─────▶│ B │ │Coordinator│◀────yes/no───────│ B │
└───────────┘────────┐ └────┘ └───────────┘◀───────┐ └────┘
│ │
propose yes/no
│ ┌────┐ │ ┌────┐
└────────▶│ C │ └─────────│ C │
└────┘ └────┘
┌────┐ ┌────┐
┌────────▶│ A │ ┌─────────│ A │
│ └────┘ │ └────┘
prepare ack
│ │
┌───────────┐────────┘ ┌────┐ ┌───────────┐◀───────┘ ┌────┐
│Coordinator│─────prepare─────▶│ B │ │Coordinator│◀──────ack────────│ B │
└───────────┘────────┐ └────┘ └───────────┘◀───────┐ └────┘
│ │
prepare ack
│ ┌────┐ │ ┌────┐
└────────▶│ C │ └─────────│ C │
└────┘ └────┘
┌────┐ ┌────┐
┌────────▶│ A │ ┌─────────│ A │
│ └────┘ │ └────┘
commit/abort ack
│ │
┌───────────┐────────┘ ┌────┐ ┌───────────┐◀───────┘ ┌────┐
│Coordinator│──commit/abort───▶│ B │ │Coordinator│◀──────ack────────│ B │
└───────────┘────────┐ └────┘ └───────────┘◀───────┐ └────┘
│ │
commit/abort ack
│ ┌────┐ │ ┌────┐
└────────▶│ C │ └─────────│ C │
└────┘ └────┘
In 3PC if there is a timeout when the coordinator is waiting for the yes/no
from the participants the coordinator decides to abort the transaction. If the timeout from a participant happens after the coordinator sent the prepare
messages it should commit
. In this case the participants know they received the prepare
and voted yes
so when a participant recovers it will know that the round should have committed.
If the coordinator dies after sending the prepare
message the standby coordinator can contact the participants and ask if they received the prepare
message and if it happened then it is safe to send commit
because the participants can only receive prepare
messages if all participants agreed to proceed. If some participant didn’t receive prepare
but others did it is also safe to commit
. Maybe the message got lost or didn’t arrive yet. In case all the participants didn’t receive the prepare
messages the new coordinator can safely abort
.
The new step helps the protocol by giving the participants the ability to decide to abort in case they don’t receive the prepare
message after a certain time instead of blocking indefinitely waiting for the coordinator instruction to commit
or abort
. This step also helps in case the participants fail after the prepare
messages were sent, assuming these nodes will recover, the protocol can commit
because the participants said they were prepared.
Problems with 3PC
In face of a network partition where all nodes at one side (side A) received the prepare
message and the nodes on the other side (side B) didn’t receive it the protocol will break since each side will take different actions: A will commit and B will abort.
References
- http://the-paper-trail.org/blog/consensus-protocols-two-phase-commit/
- http://the-paper-trail.org/blog/consensus-protocols-three-phase-commit/
- https://en.wikipedia.org/wiki/Three-phase_commit_protocol
- https://en.wikipedia.org/wiki/Two-phase_commit_protocol