CMSC 33520: Data Intensive Computing Systems
Autumn 2021 MW 1330-1500, Location: TBD

Cloud computing has become the primary means of delivering large-scale computation and services. This class covers a wide range of cloud software elements that both underpin and shape the design of scalable internet data services. These software structures support "big data" applications, including analytical systems such as MapReduce and Spark. They also include the data-serving systems that enable big data and support these analytical systems, including parallel filesystems, databases, and NoSQL data stores such as Cassandra, Memcached, MongoDB, and more.

The success of cloud computing depends on efficient resource sharing and application isolation, typically supported by virtual machines (VMs) and containers and controlled by orchestration infrastructures such as VMware and Kubernetes. We will cover these models and the resource oversubscription approaches that sustain them, as well as the sustainability and carbon-emission challenges that limit them. Newer application architectures such as microservices and function-as-a-service (FaaS, or serverless) make it easier to evolve application designs. We will cover these models and the challenges they present for implementation.

Students will develop a broad familiarity with current challenges and the state of the art, including leading-edge research in the area. They will also gain hands-on experience with a range of systems. Together, these provide solid preparation for research in the area.


Syllabus

  • Cloud Resource Abstractions
    • Virtual Machines and Containers
    • Orchestration (Kubernetes, ...)
  • Resource Management
    • Real workloads
    • Oversubscription
    • Challenge: Variable capacity resources
  • Scalable Computing Systems
    • MapReduce: MapReduce, Hadoop, Spark
    • Stream processing: Flink, Storm
    • Function-as-a-Service (FaaS and Serverless)
  • Scalable Data Services
    • Datacenter: GFS, Bigtable, Memcached, Dynamo, ...
    • Distributed and Consistent Data: PNUTS, Cassandra, Spanner, ...
  • Sustainability Challenges for the Cloud
    • Growth of Carbon (scope 1, scope 3) and Jevons Paradox
    • Power grid 101 and Zero-carbon Cloud
    • RiPiT: Carbon-emission Information Services
    • Industry efforts (Google Carbon-aware, CA Kubernetes)
  • Acceleration: End of General-purpose?
    • Tensor Processing Unit, GPUs, and other ML Accelerators
    • Media Processing, ...
    • Attempts at broader generality: GPUs, FPGAs
Andrew A. Chien
Large-Scale Systems Group