Introduction
Time from conventional clocks are not always comparable due to clock drift, network delays. These timestamps are therefore not always comparable for the sake of ordering.
A logical clock allows comparable timestamps. However we don't want this logical clock to be centralized, as this become a single point of failure for the system and is not fault-tolerant.
We are interested in distributed logical clocks where each participant has its own clock that collaborates with others.
Concurrent events
Events occurring without the knowledge of others.
Total order
If all nodes have the same collection of events, they can come to the same conclusion. It also means that we can infer causality of events (i.e Compare events to know which one happend before.)
Lamport Clocks
- Timestamps are sequential associated with events
- Each node maintains their sequence starting at 0
- When an event is generated, the sequence is incremented and that number is associated with the event.
- When a node is made aware of an event. It compares it sequence with the number of the event and increases its own sequence if the number is greater.
Lamport clocks allows comparing events in an historical manner (happened before) after the fact
- Local events remain order, even when mixed with events from other node, since the sequence is monotonically increasing.
Drawbacks
- Total order of event is not deterministic. We cannot distinguish events with the same timestamp.
- Events cannot be ordered in real-time. A node may have generated event [5] and a later time receive a [4] timestamp. If preserving order for non-concurrent events is important, Lamport clocks is not suitable.
- Cannot tell if an event occured after another, if the events are concurrent.
Lamport Origin Clocks
The Lamport original clocks also embeds the node id in the timestamp to be used as a second sort key. This allows having a deterministic sort order, because we can now distinguish events with identical timestamps.
Vector Clock & Dotted Vector Clock
The space complexity of vector clock is O(n), where n is the number of nodes. The timestamp keeps track of the last known event ID for each node.
For example consider the timestamp [3,4,0] originating from node B:
- There are 3 nodes in the system
- The fourth events generated by node B
- it precedes events like [4,5,2], because [4,5,2] is aware of events implied in [3,4,0]
- It is concurrent with [0, 2, 2], because [2] on Node C happened without knowledge of [4] on node B
Comparison
- If all values are equal, timestamps are identical
- If some values are lower or equal and some are greater, than the events are concurrent.
- If one or more entries are lower or equal, but none are greater, The timestamp with the lower entries precedes the other one.
Applications
- Collaborative editing such as Google docs
- CRDTs (Conflict-free replicated data types)
- Multi-leader storage systems