MIT 6.824 Note - 01 Introduction

I have recently come to believe that distributed systems are cool. In my own research experience at Microsoft Research, I have also found ideas from other areas of computer science to be particularly helpful for my projects, so learning distributed systems from scratch might not be a bad idea. I am now following 6.824, the famous MIT course on distributed systems, and hope to learn as much as I can and apply it to my research. Here are my notes for lecture 1.

Distributed systems are important in modern software, and many big data systems now rely on distributed infrastructure. However, the first rule of thumb is to use a single computer for as long as that is enough to solve the problem. Distributed systems are considerably more complicated, so there should be a reason to build one. The first is performance: they can exploit parallelism across machines. The second is fault tolerance: if one computer fails, other machines can still produce the right result (as in data centers). Other reasons include physical constraints (imagine an international bank that operates on different continents) and security or isolation.

Challenges

Building distributed systems is hard:

  1. concurrency - many parts or computers operate concurrently
  2. partial failure - some computers fail, or the networks become unreliable
  3. performance - scalability is hard. Oftentimes, if we increase the number of computers by a factor of, say, 2, the throughput does not increase by exactly a factor of 2.
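
To make the scalability point concrete, here is a minimal sketch using Amdahl's law as the model. This is my own choice of illustration, not something taken from the lecture, and the function name `speedup` and the 10% serial fraction are made-up assumptions. The idea is that if some fraction of the work cannot be parallelized, doubling the number of machines gives less than double the throughput.

```python
def speedup(n_machines: int, serial_fraction: float) -> float:
    """Speedup over a single machine under Amdahl's law.

    serial_fraction is the (assumed) share of the work that cannot be
    spread across machines; the rest is assumed to parallelize perfectly.
    """
    if n_machines < 1:
        raise ValueError("n_machines must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_machines)


if __name__ == "__main__":
    # Hypothetical workload: 10% of the work is serial.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} machines -> {speedup(n, 0.1):.2f}x")
```

Under this model, going from 2 to 4 machines improves throughput by less than a factor of 2, and the gap widens as more machines are added. Real systems also lose throughput to coordination and network overhead, which this sketch ignores.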

Fault tolerance

Two criteria for measuring fault tolerance are availability and recoverability. Availability measures how well the system keeps operating under certain failures: higher availability means the system can operate normally even while failures are occurring. Recoverability measures how much effort is needed to put the system back into service once it does crash.
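
As a rough illustration of why replication helps availability, the sketch below estimates the chance that at least one replica is up. It rests on a strong assumption that I am adding myself: replicas fail independently, each with the same probability. Real failures are often correlated (shared power, shared network, the same bug), so treat this as an upper bound on intuition rather than a real measurement. The name `availability` and the numbers are hypothetical.

```python
def availability(p_fail: float, replicas: int) -> float:
    """Probability that at least one of `replicas` machines is up.

    Assumes (hypothetically) that each machine is down with probability
    p_fail and that failures are independent of each other.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is unavailable only if every replica is down at once.
    return 1.0 - p_fail ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} replica(s): {availability(0.1, n):.4f}")
```

This only speaks to availability. Recoverability is a separate question about what happens after a crash, and it is not modeled here.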

Consistency

Strong vs. weak consistency: strong consistency is usually what customers want, but it takes more specification and effort to provide and is therefore less efficient. Weak consistency models are being implemented in real-world systems, and both academia and industry are working on making weak consistency more useful.
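
The difference can be seen in a toy, single-process sketch. Everything here (`ReplicatedRegister`, `write_strong`, `write_weak`, `sync`) is a hypothetical name I made up for illustration, and the model is heavily simplified: a "strong" write updates every replica before returning, while a "weak" write updates one replica and lets the others catch up later, so a read in between can return a stale value.

```python
class ReplicatedRegister:
    """A single value stored on several in-memory 'replicas' (toy model)."""

    def __init__(self, n_replicas: int = 3):
        self.replicas = [None] * n_replicas
        # Updates that have been accepted but not yet applied everywhere.
        self.pending = []  # list of (replica_index, value)

    def write_strong(self, value):
        # Update every replica before returning: more work per write,
        # but any later read sees this value.
        self.pending.clear()  # older in-flight updates are superseded
        for i in range(len(self.replicas)):
            self.replicas[i] = value

    def write_weak(self, value):
        # Update only replica 0 now; the others are updated on sync().
        self.replicas[0] = value
        self.pending.extend((i, value) for i in range(1, len(self.replicas)))

    def sync(self):
        # Apply queued updates in order, so the latest write wins.
        for i, value in self.pending:
            self.replicas[i] = value
        self.pending.clear()

    def read(self, replica: int):
        return self.replicas[replica]


if __name__ == "__main__":
    reg = ReplicatedRegister()
    reg.write_strong("a")
    reg.write_weak("b")
    print(reg.read(0), reg.read(1))  # replica 1 has not caught up yet
    reg.sync()
    print(reg.read(0), reg.read(1))
```

The strong write does work proportional to the number of replicas before it returns, which hints at the efficiency cost; in a real system that cost is network round trips rather than a loop.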

MapReduce

(WIP, reading paper)

(updated on 05/11/2022)