CMSC 33001-1: Computer Architecture for Machine Learning
Spring 2019, TuTh 930-1050am, Ry 277

Traditional computer architectures evolved to serve procedural and object-oriented programs, emphasizing control flow, loop structures, and data structure access. Machine learning computations can have these structural properties, but deep neural networks (DNNs) have a distinct structure for both training and inference. Course coverage includes the basic requirements and typical workloads of machine learning, but the main focus is on understanding the diverse, emerging computer architectures designed for efficient execution of machine learning training and inference. We will study the advantages of each, as well as its technological and programming limitations. Coverage will include accelerators designed for use in cloud data centers, mobile smartphone clients, and internet of things (IoT) devices. While the majority of these systems are digital, we will also cover novel technology approaches, such as analog multiply-accumulate devices that have the potential for orders-of-magnitude improvements in inferences per watt.
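To make the structural contrast concrete, here is a minimal sketch (in Python with NumPy, using hypothetical layer sizes; illustrative only, not course material) of the forward pass of a fully connected DNN. The computation is a short chain of dense multiply-accumulate operations with essentially no data-dependent control flow, which is the regularity the accelerators covered in this course exploit.

# A minimal illustrative sketch (not course code): the forward pass of a
# fully connected DNN is dominated by dense multiply-accumulate operations
# rather than by data-dependent control flow.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlp_forward(x, weights, biases):
    """Each layer is one dense matrix-vector product plus a bias and a
    nonlinearity -- O(rows * cols) multiply-accumulates per layer."""
    a = x
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

# Hypothetical layer sizes, chosen only for illustration.
rng = np.random.default_rng(0)
sizes = [784, 256, 128, 10]
weights = [0.01 * rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
out = mlp_forward(rng.standard_normal(784), weights, biases)
print(out.shape)  # (10,)

Training has the same dense-linear-algebra core, with additional matrix products for the backward pass.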

Students will read, analyze, and discuss papers on extant, emerging, and research architectures designed for high efficiency and performance on machine learning applications.

Prereading for Lecture 1: Dean et al., "A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution," IEEE Micro, January 2019
  • Syllabus (5/23/19 revision)
    • Deep Learning and TensorFlow Basics
    • Density, Parallelism, and Values
    • Linear-algebra Processors and NeuFlow
    • Memory Hierarchies: DianNao
    • Memory Hierarchies: Eyeriss 1 and 2
    • Compression: Quantization and Pruning
    • Compression of Recurrent DNNs
    • More Complexity: Attention and Transformers
    • Industry Inference Products (TPU, Cambricon)
    • Industry Training: Graphcore and other press releases
    • Architecture: Predictive Activation, Temporal Similarity
    • Smart Training: Lottery Tickets and Hash-based
    • FPGA-based acceleration: BrainWave and FPGA Limitations
    • Architecture: Spatial Similarity: PRA and Diffy
    • Industry - GPUs (Volta) and Architecture: Neural Cache
    • Beyond Digital: Flash- and memristor-based inference
    • DNN Accelerators and Conventional Computer Architecture
  • Coursework
    • Assignment 1 - Understanding Machine Learning Workloads
    • Assignment 2 - Performance analysis of an ML application on an ML Architecture
    • Present two research papers (one class session) and write up lecture notes
    • Assignments - exploration of a single ML architecture across a broad variety of ML workloads
    • Students will complete either research papers or a project
Instructor: Andrew A. Chien