Instructor: Andrew A. Chien
Meetings: TuTh, Ry 277
"Big Data" and Data Analytics have become hot topics as well as drivers of multi-billion dollar industries. We live in an era of unprecedented data collection from sources as diverse as e-commerce, the WWW, scientific instruments, wireless sensors, and a rich electronic, networked infrastructure. While cheap computing, sensors, storage, and pervasive networking make the collection of these exabytes of data possible, significant challenges exist in the analysis of "big data" to deliver internet-scale services, scientific insights, and of course commercial insights.
The course objective is to expose students to the technical challenges of data-intensive computing systems, including canonical driving problems, research systems, and emerging technologies. While other classes focus on analysis algorithms (or even the underlying statistical or machine learning methods), this class focuses on the computer systems and technology needed to achieve scalable and efficient data-intensive computing. Through intensive research paper reading, interactive discussions, presentations, and in-depth course projects, students will develop a broad familiarity with current challenges and the state of the art, including leading-edge research, and gain hands-on experience with a range of systems. Together these provide solid preparation for research in the area. Course topics include: parallel filesystems, SQL databases, NoSQL/MapReduce systems, graph-processing systems, and popular open-source infrastructures such as GraphLab, Giraph, Hadoop, VoltDB/HadoopDB, Cassandra, Memcached, MongoDB, and others.
Course activities will include:
- paper reading, presentation and discussion
- hands-on labs/projects with leading edge data-intensive computing systems
- invited speakers from leading companies and projects
Writeup Submission Link is HERE
Lecture Slides
Assignments