Distributed databases

Distributed databases in a nutshell

Hey I just met you, the network’s laggy

A distributed database is a distributed system designed to provide read/write access to data, using the relational or other format.

Splitting the data among nodes

We will examine 3 topics in distributed databases:

  • Replication: Keep a copy of the same data on several different nodes.

  • Partitioning: Split the database into smaller subsets and distributed the partitions to different nodes.

  • Transactions: Mechanisms to ensure that data is kept consistent in the database

Q: Usually, distributed databases employ both replication and partitioning at the same time. Why?

Replication

MySQL async replication

Why replicate?

With replication, we keep identical copies of the data on different nodes.

D: Why do we need to replicate data?

  • To allow the system to work, even if parts of it are down
  • To have the data (geographically) close to the data clients
  • To increase read throughput, by allowing more machines to serve read-only requests

Replication Architectures

In a replicated system, we have two node roles:

  • Leaders or Masters: Nodes that accept writes from clients
  • Followers, Slaves or replicas: Nodes that provide read-only access to data

Depending on how replication is configured, we can see the following architectures

  • Single leader or master-slave: A single master accepts writes, which are distributed to slaves
  • Multi leader or master-master: Multiple masters accept writes, keep themselves in sync, then update slaves
  • Leaderless replication All nodes are peers in the replication network

How does replication work?

The general idea in replicated systems is that when a write occurs in node, this write is distributed to other nodes in either of the two following modes:

  • Synchronously: Writes need to be confirmed by a configurable number of slaves before the master reports success.

  • Asynchronously: The master reports success immediately after a write was committed to its own disk; slaves apply the changes in their own pace.

D: What are the advantages/disadvantages of each replication type?

Statement based replication in databases

The master ships all write statements to the slaves, e.g. all INSERT or UPDATE in tact. However, this is problematic:

UPDATE foo
SET updated_at=NOW()
WHERE id = 10

D: What is the issue with the code above in a replicated context?

Statement based replication is non-deterministic and mostly abandoned.

Write-ahead log based replication

Most databases write their data to data structures, such as \(B^+\)-trees. However, before actually modifying the data structure, they write the intended change to an append-only write-ahead log (WAL).

WAL-based replication writes all changes to the master WAL also to the slaves. The slaves apply the WAL entries to get a consistent data.

The main problem with WAL-based replication is that it is bound to the implementation of the underlying data structure; if this changes in the master, the slaves stop working.

Postgres and Oracle use WAL-based replication.

Logical-based replication

The database generates a stream of logical updates for each update to the WAL. Logical updates can be:

  • For new records, the values that where inserted
  • For deleted records, their unique id
  • For updated records, their id and the updated values

MongoDB and MySQL use this replication mechanism.

Process for creating a replica

  1. Take a consistent snapshot from the master
  2. Ship it to the replica
  3. Get an id to the state of the master’s replication log at the time the snapshot was created
  4. Initialize the replication function to the latest master id
  5. The replica must retrieve and apply the replication log until it catches up with the master

A real example: MySQL

Master

> SHOW MASTER STATUS;
+--------------------+----------+
| File               | Position |
+--------------------+----------+
| mariadb-bin.004252 | 30591477 |
+--------------------+----------+
1 row in set (0.00 sec)

Complications with async replication

  • Read-after-write: A client must always be able to read their own writes from any part of the system. Practically, the client can:

    • Read recent writes from the master
    • Read session data from the master
  • (non-) Monotonic reads: When reading from multiple replicas concurrently, a stale replica might not return records that the client read from an up to date one.

  • Causality violations: Violations of the happened-before property.

Async replication is fundamentally a consensus problem.

Multi-leader replication

The biggest problem with multi-leader replication are write conflicts. To demonstrate this, let’s think in terms of git:

                                # Clock
# User 1
git clone git://....            # t+1
git add foo.c                   # t+2
##hack hack hack
git commit -a -m 'Hacked v1'    # t+3
git push                        # t+5

# User 2
git clone git://....            # t+2
git add foo.c                   # t+5
##hack hack hack
git commit -m 'Hacked new file' # t+6
git push # fails                # t+7
git pull # CONFLICT             # t+7

If we replace user with master node, we have exactly the same problem

How to avoid write conflicts?

  • One master per session If session writes do not interfere (e.g., data are only stored per user), this will avoid issues altogether.

  • Converge to consistent state Apply a last-write-wins policy, ordering writes by timestamp (may loose data) or report conflict to the application and let it resolve it (same as git or Google docs)

  • Use version vectors Modelled after version clocks, they encode happens before relationships at an object level.

Partitioning