Understand the difference between correlation and causation and how they are connected with the happened-before relation. Learn how vector clocks can be used in distributed systems to reliably detect both happened-before relationships and concurrent events. Get to know and practice the algorithm for exchanging vector clocks with messages, and implement a versioned key-value store.
Examples
Exercises
Context
The happened-before relation defines a partial ordering of events. The following diagram shows multiple events that happen at different nodes ($A, B, C$). Physical time runs from left to right, and each node has a slightly different local time (clock frequency); the timestamps are shown in gray:
Let’s define the notation for saying that event $a$ happened before event $b$:
$$ a \to b $$
Further, we note that:
It is also possible that two events, like $b$ and $d$, are completely unrelated: neither $b \to d$ nor $d \to b$ holds. Then we say the events are concurrent and write $b || d$. In the illustrated example, we know $c \to g$ and $e \to g$, but we cannot order $c$ and $e$ relative to each other, because $c || e$.
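The distinction between happened-before and concurrent events can be sketched in code with vector clocks. The following is a minimal sketch, not a definitive implementation: the function name `compare` and the timestamps assigned to $c$, $e$, and $g$ are assumptions made for illustration and are not taken from the diagram. It relies on the standard vector clock rule that $a \to b$ holds when every component of $a$'s clock is less than or equal to the corresponding component of $b$'s clock and the two clocks are not equal.

```python
from typing import Dict

# A vector clock maps a node name to the number of events counted for that node.
VectorClock = Dict[str, int]


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify the relation between two vector timestamps.

    Returns "before" if a -> b, "after" if b -> a, "equal" if both
    timestamps are identical, and "concurrent" if a || b.
    Nodes missing from a clock are treated as having counter 0.
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


# Hypothetical timestamps (assumed for this sketch, not read from the diagram):
# c occurs on node A, e occurs on node C, and g occurs on node B after
# B has received messages that carry the knowledge of both c and e.
c = {"A": 2, "B": 0, "C": 0}
e = {"A": 0, "B": 0, "C": 1}
g = {"A": 2, "B": 1, "C": 1}

relation_c_g = compare(c, g)  # c -> g
relation_e_g = compare(e, g)  # e -> g
relation_c_e = compare(c, e)  # c || e
```

With these assumed timestamps, `compare(c, g)` and `compare(e, g)` both yield `"before"`, while `compare(c, e)` yields `"concurrent"`: neither clock is component-wise less than or equal to the other, which mirrors the situation $c || e$ described above.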
What do you see in the following picture?
Correlation refers to a statistical relationship between two random variables. In the picture with the cat, the roof has collapsed (variable one: $e_1$) and the cat is sitting exactly at the spot where the roof is the lowest (variable two: $e_2$). These two events are certainly correlated.
Causality refers to the relationship between two events, the cause and the effect, where the effect is a result of the cause. It is important to distinguish between correlation and causality, as a correlation between two events does not necessarily imply a causal relationship. To establish causality, it is necessary to demonstrate that the first event is in fact responsible for the second event.
Let’s have a look at the happened-before relation in the picture with the cat: