Large-Scale Systems Group (LSSG) @ University of Chicago
Data Intensive Computing Systems (2012) - Syllabus (instructor: Andrew A. Chien)
Columns: Week, Week of, Topics, Readings, Core Topics, Course Notes
1 26-Mar Big Data is HERE!; Big Data Application Archetypes; Major Infrastructures EMC's Digital Universe 2011, 2010 (www.emc.com/leadership/programs/digital-universe.htm); HP Data Dwarfs (www.hpl.hp.com/techreports/2010/HPL-2010-115.html)   Discuss Expectations, Projects, Coursework
2 2-Apr Task-Parallel (www.ci.uchicago.edu/swift/main); PageRank, MapReduce http://research.google.com/archive/mapreduce.html; Traffic Estimation Discuss papers, Project Types
3 6-Apr Storage, Traditional Filesystems, Databases Wilkes (AutoRAID), dl.acm.org/citation.cfm?id=225535.225539; Lustre/PVFS dl.acm.org/citation.cfm?id=1268379.1268407, dl.acm.org/citation.cfm?id=2063384.2063474   Model, Services, Guarantees, Consequences Discuss Data Intensive Computing Projects - "what makes a project data-intensive"
9-Apr Databases and CAP Theorem Architecture of a Database System http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf, CAP theorem http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf Discuss "Exemplar" Data-Intensive Computing Infrastructures
13-Apr CAP Theorem and More CAP Theorem: Growing Impact http://www.computer.org/cms/Computer.org/ComputingNow/homepage/2012/0312/W_CO_TheCAPTheoremsGrowingImpact.pdf, Pushing the CAP http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2012.37, Beyond CAP http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2012.33, CAP and Cloud in PNUTS http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2011.388 Propose a project, and an infrastructure; crawl assessment
4 April 16, April 20 Andrew away Break Work on Projects
5 23-Apr Key Value Stores   Bigtable, OSDI 2006
Dynamo, SOSP 2007
SILT, SOSP 2011
Scale, Reliability Project plan, walk demonstration Project Walk Assignment + Plan  
27-Apr NoSQL and SQL Comparison of approaches to large-scale analysis dl.acm.org/citation.cfm?id=1559845.1559865, SQL vs. NoSQL http://dl.acm.org/citation.cfm?doid=1721654.1721659, Stonebraker on NoSQL in Enterprise http://dl.acm.org/citation.cfm?doid=1978542.1978546, Stonebraker on Data Warehousing Data Management Paradigm
6 30-Apr NoSQL Databases Cassandra: Principles and Application...
http://dfeatherston.com/cassandra-adf-uiuc-su10.pdf
Hive – A Petabyte Scale Data Warehouse Using Hadoop,
http://i.stanford.edu/~ragho/hive-icde2010.pdf
Efficient Processing of Data Warehousing Queries in a Split Execution Environment, Abadi et al.
http://cs-www.cs.yale.edu/homes/dna/papers/split-execution-hadoopdb.pdf
Dynamic Schema, Storage - Sharding
4-May Big Data Computing Middleware MapReduce/Hadoop, GFS/HDFS http://research.google.com/archive/gfs.html, Incoop, http://dl.acm.org/citation.cfm?id=2038923, cielo, http://dl.acm.org/citation.cfm?id=1972470, presto?, SciHadoop Block data parallel, Object parallel, higher level interfaces
7 7-May Novel Storage Technologies I IBM Storage Class Memories Tutorial (FAST); Phase Change Memories (MKF)
Philip Wong, Phase Change Memory Survey
Flash, PCM, MRAM, … enterprise-class storage; stacked flash, ssd, HP nanostores, mram, pcm  
11-May Storage Technologies Integration Freitas, Storage-Class Memory: the next storage system technology, IBM Journal of Research and Development.
Micron's Hybrid Memory Cube (http://hotchips.org/uploads/hc23/HC23.18.3-memory-FPGA/HC23.18.320-HybridCube-Pawlowski-Micron.pdf)
Ranganathan, From Microprocessors to Nanostores: Rethinking Data-Centric Systems,
http://www.hpl.hp.com/news/2011_IEEEComputer_nanostores.pdf
Flash storage: http://www.fusionio.com/load/-media-/1qaz4e/docsLibrary/WP_-_Navy_-_SSC_Atlantic_-_Fusion-io_Testing.pdf
http://www.fusionio.com/load/-media-/1qaz4e/docsLibrary/FIO_SSD_Differentiator_Overview.pdf
8 14-May Systems and Software integration approaches David Roberts, Taeho Kgil, Trevor Mudge, "Integrating NAND flash devices onto servers", Commun. ACM
http://dl.acm.org/citation.cfm?doid=1498765.1498791
Badam and Pai, "SSDAlloc: Hybrid SSD/RAM Memory Management Made Easy",
http://www.usenix.org/event/nsdi11/tech/full_papers/Badam.pdf
Mohit Saxena, Michael M. Swift, Yiying Zhang, "FlashTier: a lightweight, consistent and durable storage cache", Proceedings of the 7th ACM European Conference on Computer Systems, http://dl.acm.org/citation.cfm?doid=2168836.2168863
Replacement, Integration, Combination
18-May Cleversafe Guest Lecture - Andrew Baptist "Architecture of the Cleversafe System and Current Research Challenges" Full project demonstration
9 21-May Projects "Check-in" Presentation Project Run Assignment
10 28-May Holiday Monday, Projects Friday Work on Projects
4-Jun Exam Week Final Project Presentations Final Project Presentation, Demo, and Report
Candidate infrastructures: Zarkov/MongoDB, Hadoop/HDFS, GraphLab
Candidate projects:
Implement a large-scale analysis (that someone else has done)
Implement a variation based on some new technology (e.g. SSD, replicated storage, remote disk)
Implement a new application (take sequential, make parallel; or take small, make large)
Course Work:
Paper discussion lead (2 classes)
Project Proposal, crawl demonstration
Project Detailed Plan, walk demonstration
Full project demonstration
Final Presentation and Writeup
3   MapReduce/Hadoop, GFS/HDFS http://research.google.com/archive/gfs.html, Block data parallel, Object parallel, higher level interfaces Propose a project, and an infrastructure; crawl demonstration
Incoop, http://dl.acm.org/citation.cfm?id=2038923, cielo, http://dl.acm.org/citation.cfm?id=1972470, presto?, SciHadoop
Cassandra, http://dfeatherston.com/cassandra-adf-uiuc-su10.pdf, HadoopDB, http://dl.acm.org/citation.cfm?id=1687731, Hive, http://infolab.stanford.edu/~ragho/hive-icde2010.pdf
Bigtable, http://dl.acm.org/citation.cfm?id=1365815.1365816, Dynamo, http://dl.acm.org/citation.cfm?id=1294261.1294281, memcached, http://memcached.org/about, SILT, http://dl.acm.org/citation.cfm?id=2043556.2043558
SSDs in enterprise systems, flash-in-array checkpointing (LLNL), SSDAlloc, FAWN/HP Moonshot, PCM in processor (find best), replacement?; RAMCloud



author: Andrew A. Chien, achien7242@gmail.com
updated: June 16, 2015, 05:39 AM
revision: 5
