Distributed filesystems

What is a filesystem?

File systems determine how data is stored and retrieved. A file system keeps track of the following data items:

Files, where the data we want to store are.
Directories, which group files together
Metadata, such as file length, permissions and file types

The primary job of the filesystem is to make sure that the data is always accessible and in tackt. To maintain consistency, most modern filesystems use a log (remember databases!).

Common filesystems are EXT4, NTFS, APFS and ZFS

`read`ing from a filesystem

Read system call routing in FreeBSD

`read`ing from a file, block layer

Linux block-io layer

Distributed file systems

A distributed filesystem is a file system which is shared by being simultaneously mounted on multiple servers. Data in a distributed file system is partioned and replicated accross the network. Reads and writes can occur on any node.

Q: Windows and MacOSX already have network filesystems. Why do we need extra ones?

Network filesystems, such as CIFS and NFS, are not distributed filesystems, because they rely on a centralized server to maintain consistency. The server becomes a SPOF and a performance bottleneck.

The Google filesystem

The Google Filesystem paper[1] kicked-off the Big Data revolution. Why did they need it though?

Hardware failures are common (commodity hardware)
Files are large (GB/TB) and their number is limited (millions, not billions)
Two main types of reads: large streaming reads and small random reads
Workloads with sequential writes that append data to files
Once written, files are seldom modified again
High sustained bandwidth trumps low latency

GFS architecture

GFS Architecture

GFS storage model

A single file can contain many objects (e.g. Web documents)
Files are divided into fixed size chunks (64MB) with unique identifiers, generated at insertion time. Files are replicated (\(\ge 3\)) across chunk servers.
Chunkservers store chunks on local disk as Linux files
- Reading & writing of data specified by the tuple (chunk_handle, byte_range)
The Master maintains a mapping between file names and chunk locations.
- GFS was a single-master system
No caching of (meta-)data allowed! (Why?)

GFS writes

Writes on a GFS filesystem

GFS operation

The master does not keep a persistent record of chunk locations, but instead queries the chunk servers at startup and then is updated by periodic polling.
GFS is a journaled filesystem: all operations are added to a log first, then applied. Periodically, log compaction creates checkpoints. The log is replicated across nodes.
If a node fails…:
- If it is a master, external instrumentation is required to start it somewhere else, by rerunning the operation log
- If it is a chunkserver, it just restarts
Chunkservers use checksums to detect data corruption

GFS stale chunkservers

GFS uses vector clocks to keep track of stale replicas.

Master maintains a chunk version number to distinguish up-to-date and stale replicas
Before an operation on a chunk, master ensures that version number is advanced
Stale replicas are removed in the regular garbage collection cycle

HDFS – The Hadoop FileSystem

HDFS started at Yahoo as an open source replica of the GFS paper, but since v2.0 it is different system.

GFS	HDFS
Master	NameNode
Chunkserver	DataNode
operation log	journal
chunk	block
random file writes	append-only
multiple writer/reader	single writer, multiple readers
chunk: 64MB data, 32bit checksums	128MB data, separate metadata file

HDFS architecture

HDFS architecture diagram

The main difference with GFS is that HDFS is a user-space filesystem written in Java.

HDFS access session

# List directory
$ hadoop fs ls /
Found 7 items
drwxr-xr-x   - gousiosg sg          0 2017-10-04 08:23 /audioscrobbler
-rw-r--r--   3 gousiosg sg 1215803135 2017-10-04 08:25 /ghtorrent-logs.txt
-rw-r--r--   3 gousiosg sg   66773425 2017-10-04 08:23 /imdb.csv
-rw-r--r--   3 gousiosg sg     198376 2017-10-23 12:39 /important-repos.csv
-rw-r--r--   3 gousiosg sg     611300 2017-10-04 08:24 /odyssey.mb.txt
-rw-r--r--   3 gousiosg sg  388422973 2017-10-03 15:40 /pullreqs.csv

# Create a new file
$ dd if=/dev/zero of=foobar bs=1M count=100

# Upload file
$ hadoop fs -put foobar /
-rw-r--r--   3 gousiosg sg  104857600 2017-11-27 15:42 /foobar

HDFS looks like a UNIX filesystem, but does not offer the full set of operations.

For the course, we only need to know how to upload data to HDFS.

Content credits

FreeBSD read syscall, by Diomidis Spinellis
Linux block layer diagram, by Thomas Krenn
GFS and HDFS content, by Claudia Hauff
GFS block diagrams, by Ghemawat et al [1]
HDFS block diagram, by

Bibliography

[1]

S. Ghemawat, H. Gobioff, and S.-T. Leung, “The google file system,” in Proceedings of the nineteenth ACM symposium on operating systems principles, 2003, pp. 29–43.

[2]

K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The hadoop distributed file system,” in IEEE 26th symposium on mass storage systems and technologies (MSST), 2010, pp. 1–10.

Distributed filesystems

Georgios Gousios

09 September 2021

Distributed filesystems

What is a filesystem?

`read`ing from a filesystem

`read`ing from a file, block layer

Distributed file systems

The Google filesystem

GFS architecture

GFS storage model

GFS writes

GFS operation

GFS stale chunkservers

HDFS – The Hadoop FileSystem

HDFS architecture

HDFS access session

Content credits

Bibliography

Copyright

Distributed filesystems

Georgios Gousios

09 September 2021

Distributed filesystems

What is a filesystem?

reading from a filesystem

reading from a file, block layer

Distributed file systems

The Google filesystem

GFS architecture

GFS storage model

GFS writes

GFS operation

GFS stale chunkservers

HDFS – The Hadoop FileSystem

HDFS architecture

HDFS access session

Content credits

Bibliography

Copyright

`read`ing from a filesystem

`read`ing from a file, block layer