Introduction

A key-value store is a type of database that implements a hash table abstraction: it stores mappings from keys to values. Popular key-value stores today such as Redis, memcached, and etcd are staples of production-level systems in industry today. These key-value stores are built in a distributed fashion, often with one or more leader and some number of followers. This design allows the database to scale past the capacity of just one node.

A key concept in distributed systems is fault tolerance: the property that system should be robust and reliable, able to continue operating despite the failure of some individual components. With distributed key-value stores, we must account for the possibility that some nodes will fail and crash over a long period of time due to common issues such as hardware failure or network partition. In the case of failures, we must ensure that we do not lose any data. To achieve this, distributed databases employ a strategy called replication, where the same key-value pair is stored on multiple followers instead of just one.

However, replication creates a problem for obtaining consensus - for example, a naive implementation supporting replication may end up in a state where one set of followers stores a value for some key which differs from that of another set of followers. In short, a database with consensus issues will lead to inconsistencies in the data. To solve this problem, we use a consensus protocol to make sure that all of our followers (or at least a majority), agree on the correct data. While there are many complex consensus protocols such as Paxos and Raft, our key-value store will use an older (and simpler!) protocol called Two-Phase Commit (2PC).

Last updated