Course: COS418
Instructor: Freedman
F 2016

Description of Course Goals and Curriculum

This course covers the basic design and implementation of distributed systems. In particular, we study the principles, techniques, and (often) trade-offs in building the kind of high-performance systems that supports some of the largest sites on the internet (e.g. Google, Facebook, Amazon). Topics include server design, network programming, naming, concurrency and locking, consistency models and techniques, security, and fault tolerance. Through the assignments, students will gain practical experience designing, implementing, and debugging real distributed systems.

The topics of the course include:

  • Fault Tolerant Systems
  • Primary Backup
  • 2-Phase Commit
  • Paxos
  • Raft
  • Byzantine Fault Tolerance

  • Scalability, Consistency, and Transactions
  • Distributed Hash Tables
  • Eventual Consistency
  • Strong vs. Causal Consistency
  • Concurrency Control, Locking, and Recovery
  • Spanner

  • “Boutique Topics”
  • Conflict Resolution (OT)
  • Blockchains
  • Distributed Mesh Wireless Networks
  • Content Delivery Networks

  • Data Processing Applications
  • Graph Processing
  • Chandy-Lamport Snapshotting
  • Stream Processing
  • Cluster Scheduling<
  • /ul>

    The class has one in-class midterm, five programming assignments, and one final exam.

    Learning From Classroom Instruction

    There are two 50 minute lectures (on Mondays and Wednesdays) and one 50 minute precept (on Friday) every week. There are many different ways to learn the material in the course. Professor Kyle Jamieson and Professor Mike Friedman both teach using PowerPoint presentations, and they generally do a good job of posting the slides the morning before lecture. Nearly all of the topics covered in this course have well-known and published papers that go more in-depth than the lecture slides. In precept, TAs usually review concepts from lecture and go over past assignments.

    Learning For and From Assignments

    The first homework assignment is 5% of each student’s grade, and subsequent assignments are each worth 10%. The first two assignments involve building a parallel implementation of the MapReduce algorithm. The third and fourth assignments give students the opportunity to build a working implementation of a distributed key-value storage algorithm known as Raft. The final assignment involves writing a wrapper on top of the Raft algorithm to build a fault-tolerant key-value storage service. All assignments are written using the Go programming language. No prior knowledge with Go is assumed, and the precepts do a good job of introducing the language. Since the assignments only cover a small subset of the topics of the class, most of the midterm and final exam focus on the other algorithms introduced in lecture.

    Some of the assignments from this class can be quite challenging (assignment 3 and especially assignment 4). For these assignments, students should expect to spend a significant amount of time debugging concurrent programs. The two exams in this course can also be more demanding than those from other COS courses because of the sheer number of topics covered. Each lecture goes over one (sometimes two) published papers, so students may find it difficult to keep up with the reading every week on top of the assignments. Personally, I found that studying the lecture slides (very) carefully will often suffice for the exams. Reading the papers is an excellent way to get more exposure to the field, but is not strictly necessary to do well in the course. Finally, office hours and Piazza are available to help students on assignments and exam prep.

    External Resources

    Most of the topics covered in this course are well-known systems that are deployed in today’s world. Therefore, finding additional explanations on a system or algorithm is often just a Google search away. Furthermore, the professors do a good job of posting links to relevant papers on the course syllabus. For students with no prior knowledge of Go, I would highly recommend skimming Brian Kernighan’s book (http://www.gopl.io/).

    What Students Should Know About This Course For Purposes Of Course Selection

    This course can be used to fulfill one Systems requirement for concentrators within COS. COS 217 is listed as a prerequisite for this course, and students should have also taken either COS 318 (Operating Systems) or COS 333 (Advanced Programming Techniques) before course. I found that COS 318 was useful in understanding the concepts in this course on a deeper level, but knowledge from COS 318 is certainly not necessary to do well in Distributed Systems.

    Overall, this is the only course in Princeton that really studies distributed systems, even though these systems power a large chunk of the modern internet. In addition to being a good introduction to distributed systems, this course also gives students the opportunity to learn and use Go under the context of concurrent programs, which was a main motivation behind the design of the language.

Distributed Systems

Add a Strategy or Tip