CMSC 33001-1: Computer Architecture for Machine Learning
Spring 2019, TuTh 930-1050am, Ry 277

Traditional computer architectures evolved to serve procedural and object-oriented programs, emphasizing control flow, loop structures, and data structure access. Machine learning computations can have these structural properties, but deep neural networks (DNNs) have a distinct structure for both training and inference. Course coverage includes the basic requirements and typical workloads of machine learning, but the main focus is on understanding the diverse, emerging computer architectures designed for efficient execution of machine learning training and inference. We will study the advantages of each, as well as its technological and programming limitations. Coverage will include accelerators designed for use in cloud data centers, mobile smartphone clients, and internet of things (IoT) devices. While the majority of these systems are digital, we will also cover novel technology approaches, such as analog multiply-accumulate devices that have the potential for orders-of-magnitude improvements in inferences per watt.
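To make the structural contrast concrete, here is a minimal sketch (in Python with NumPy, using hypothetical layer sizes; illustrative only, not course material) of the forward pass of a fully connected DNN. The computation is a short chain of dense multiply-accumulate operations with essentially no data-dependent control flow, which is the regularity the accelerators covered in this course exploit.

# A minimal illustrative sketch (not course code): the forward pass of a
# fully connected DNN is dominated by dense multiply-accumulate operations
# rather than by data-dependent control flow.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlp_forward(x, weights, biases):
    """Each layer is one dense matrix-vector product plus a bias and a
    nonlinearity -- O(rows * cols) multiply-accumulates per layer."""
    a = x
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

# Hypothetical layer sizes, chosen only for illustration.
rng = np.random.default_rng(0)
sizes = [784, 256, 128, 10]
weights = [0.01 * rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
out = mlp_forward(rng.standard_normal(784), weights, biases)
print(out.shape)  # (10,)

Training has the same dense-linear-algebra core, with additional matrix products for the backward pass.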

Students will read, analyze, and discuss papers on extant, emerging, and research architectures designed for high efficiency and performance on machine learning applications.

Prereading for Lecture 1: Dean et al., "A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution," IEEE Micro, January 2019
  • Syllabus (5/23/19 revision)
    • Deep Learning and TensorFlow Basics
    • Density, Parallelism, and Values
    • Linear-algebra Processors and NeuFlow
    • Memory Hierarchies: DianNao
    • Memory Hierarchies: Eyeriss 1 and 2
    • Compression: Quantization and Pruning
    • Compression of Recurrent DNNs
    • More Complexity: Attention and Transformers
    • Industry Inference Products (TPU, Cambricon)
    • Industry Training: Graphcore and other press releases
    • Architecture: Predictive Activation, Temporal Similarity
    • Smart Training: Lottery Tickets and Hash-based
    • FPGA-based acceleration: BrainWave and FPGA Limitations
    • Architecture: Spatial Similarity: PRA and Diffy
    • Industry - GPUs (Volta) and Architecture: Neural Cache
    • Beyond Digital: Flash- and memristor-based inference
    • DNN Accelerators and Conventional Computer Architecture
  • Coursework
    • Assignment 1 - Understanding Machine Learning Workloads
    • Assignment 2 - Performance analysis of an ML application on an ML Architecture
    • Present two research papers (one class session) and write up lecture notes
    • Assignments - exploration of a single ML architecture across a broad variety of ML workloads
    • Students will complete either research papers or a project
Instructor: Andrew A. Chien