Distributed filesystems

What is a filesystem?

File systems determine how data is stored and retrieved. A file system keeps track of the following data items:

  • Files, where the data we want to store are.
  • Directories, which group files together
  • Metadata, such as file length, permissions and file types

The primary job of the filesystem is to make sure that the data is always accessible and in tackt. To maintain consistency, most modern filesystems use a log (remember databases!).

Common filesystems are EXT4, NTFS, APFS and ZFS

reading from a filesystem

Read system call routing in FreeBSD

reading from a file, block layer

Linux block-io layer

Distributed file systems

A distributed filesystem is a file system which is shared by being simultaneously mounted on multiple servers. Data in a distributed file system is partioned and replicated across the network. Reads and writes can occur on any node.

Q: Windows and MacOSX already have network filesystems. Why do we need extra ones?

Network filesystems, such as CIFS and NFS, are not distributed filesystems, because they rely on a centralized server to maintain consistency. The server becomes a SPOF and a performance bottleneck.

The Google filesystem

The Google Filesystem paper[1] kicked-off the Big Data revolution. Why did they need it though?

  • Hardware failures are common (commodity hardware)
  • Files are large (GB/TB) and their number is limited (millions, not billions)
  • Two main types of reads: large streaming reads and small random reads
  • Workloads with sequential writes that append data to files
  • Once written, files are seldom modified again
  • High sustained bandwidth trumps low latency

GFS architecture

GFS Architecture

GFS storage model

  • A single file can contain many objects (e.g. Web documents)

  • Files are divided into fixed size chunks (64MB) with unique identifiers, generated at insertion time. Files are replicated (\(\ge 3\)) across chunk servers.

  • Chunkservers store chunks on local disk as Linux files

    • Reading & writing of data specified by the tuple (chunk_handle, byte_range)
  • The Master maintains a mapping between file names and chunk locations.

    • GFS was a single-master system
  • No caching of (meta-)data allowed! (Why?)

GFS writes

Writes on a GFS filesystem

GFS operation

  • The master does not keep a persistent record of chunk locations, but instead queries the chunk servers at startup and then is updated by periodic polling.

  • GFS is a journaled filesystem: all operations are added to a log first, then applied. Periodically, log compaction creates checkpoints. The log is replicated across nodes.

  • If a node fails…:

    • If it is a master, external instrumentation is required to start it somewhere else, by rerunning the operation log
    • If it is a chunkserver, it just restarts
  • Chunkservers use checksums to detect data corruption

GFS stale chunkservers

GFS uses vector clocks to keep track of stale replicas.

  • Master maintains a chunk version number to distinguish up-to-date and stale replicas
  • Before an operation on a chunk, master ensures that version number is advanced
  • Stale replicas are removed in the regular garbage collection cycle

HDFS – The Hadoop FileSystem

HDFS started at Yahoo as an open source replica of the GFS paper, but since v2.0 it is different system.

GFS HDFS
Master NameNode
Chunkserver DataNode
operation log journal
chunk block
random file writes append-only
multiple writer/reader single writer, multiple readers
chunk: 64MB data, 32bit checksums 128MB data, separate metadata file

HDFS architecture

HDFS architecture diagram

The main difference with GFS is that HDFS is a user-space filesystem written in Java.

HDFS access session

# List directory
$ hadoop fs ls /
Found 7 items
drwxr-xr-x   - gousiosg sg          0 2017-10-04 08:23 /audioscrobbler
-rw-r--r--   3 gousiosg sg 1215803135 2017-10-04 08:25 /ghtorrent-logs.txt
-rw-r--r--   3 gousiosg sg   66773425 2017-10-04 08:23 /imdb.csv
-rw-r--r--   3 gousiosg sg     198376 2017-10-23 12:39 /important-repos.csv
-rw-r--r--   3 gousiosg sg     611300 2017-10-04 08:24 /odyssey.mb.txt
-rw-r--r--   3 gousiosg sg  388422973 2017-10-03 15:40 /pullreqs.csv

# Create a new file
$ dd if=/dev/zero of=foobar bs=1M count=100

# Upload file
$ hadoop fs -put foobar /
-rw-r--r--   3 gousiosg sg  104857600 2017-11-27 15:42 /foobar

HDFS looks like a UNIX filesystem, but does not offer the full set of operations.

For the course, we only need to know how to upload data to HDFS.

Content credits

Bibliography

[1]
S. Ghemawat, H. Gobioff, and S.-T. Leung, The google file system,” in Proceedings of the nineteenth ACM symposium on operating systems principles, 2003, pp. 29–43.
[2]
K. Shvachko, H. Kuang, S. Radia, and R. Chansler, The hadoop distributed file system,” in IEEE 26th symposium on mass storage systems and technologies (MSST), 2010, pp. 1–10.