|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Week
|
Week of
|
|
Topics
|
Readings
|
Core Topics
|
Course Notes
|
|
|
|
|
|
1
|
26-Mar
|
|
Big Data is HERE!; Big Data
Application Archetypes; Major Infrastructures
|
EMC's Digital Universe 2011, 2010
(www.emc.com/leadership/programs/digital-universe.htm); HP Data
Dwarfs (www.hpl.hp.com/techreports/2010/HPL-2010-115.html)
|
|
Discuss Expectations, Projects,
Coursework
|
|
|
2
|
2-Apr
|
|
|
Task-Parallel
(www.ci.uchicago.edu/swift/main); PageRank, MapReduce
http://research.google.com/archive/mapreduce.html; Traffic Estimation
|
|
Discuss papers, Project Types
|
|
|
3
|
6-Apr
|
|
Storage, Traditional
Filesystems, Databases
|
Wilkes (AutoRAID),
dl.acm.org/citation.cfm?id=225535.225539,
Lustre/PVFS
dl.acm.org/citation.cfm?id=1268379.1268407, dl.acm.org/citation.cfm?id=2063384.2063474
|
Model, Services, Guarantees,
Consequences
|
Discuss Data Intensive Computing
Projects - "what makes a project data-intensive"
|
|
|
|
9-Apr
|
|
Databases and CAP Theorem
|
Architecture of a Database
System http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf, CAP theorem
http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
|
|
Discuss "Exemplar"
Data-Intensive Computing Infrastructures
|
|
|
|
13-Apr
|
|
CAP Theorem and More
|
CAP Theorem: Growing Impact
http://www.computer.org/cms/Computer.org/ComputingNow/homepage/2012/0312/W_CO_TheCAPTheoremsGrowingImpact.pdf,
Pushing the CAP
http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2012.37, Beyond CAP
http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2012.33, CAP and Cloud
in PNUTS http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2011.388
|
|
|
Propose a project, and an
infrastructure; crawl assessment
|
|
|
4
|
April 16, April 20
|
Andrew away
|
Break
|
|
|
Work on Projects
|
|
|
5
|
23-Apr
|
|
Key Value Stores
|
BigTable, SOSP 2006
Dynamo, SOSP 2007
SILT, SOSP 2011
|
Scale, Reliability
|
Project plan, walk demonstration
|
Project Walk Assignment + Plan
|
|
|
|
|
|
27-Apr
|
|
NoSQL and SQL
|
Comparison of approaches to
large-scale analysis dl.acm.org/citation.cfm?id=1559845.1559865, SQL vs.
NoSQL http://dl.acm.org/citation.cfm?doid=1721654.1721659, Stonebraker on
NoSQL in Enterprise http://dl.acm.org/citation.cfm?doid=1978542.1978546, Stonebraker
on Data Warehousing
|
Data Management Paradigm
|
|
|
|
|
6
|
30-Apr
|
|
NoSQL Databases
|
Cassandra: Principles and
Application...
http://dfeatherston.com/cassandra-adf-uiuc-su10.pdf
Hive – A Petabyte Scale Data Warehouse Using Hadoop,
http://i.stanford.edu/~ragho/hive-icde2010.pdf
Efficient Processing of Data Warehousing
Queries in a Split Execution
Environment, Abadi et al.
http://cs-www.cs.yale.edu/homes/dna/papers/split-execution-hadoopdb.pdf
|
Dynamic Schema, Storage - Sharding
|
|
|
|
|
4-May
|
|
Big Data Computing Middleware
|
MapReduce/Hadoop, GFS/HDFS
http://research.google.com/archive/gfs.html, Incoop,
http://dl.acm.org/citation.cfm?id=2038923,
cielo http://dl.acm.org/citation.cfm?id=1972470, presto?, SciHadoop
|
|
Block data parallel, Object
parallel, higher level interfaces
|
|
|
7
|
7-May
|
|
Novel Storage Technologies I
|
IBM Storage Class Memories
Tutorial (FAST); Phase Change Memories (MKF);
Philip Wong, Phase Change Memory Survey
|
Flash, PCM, MRAM, …
|
|
|
Enterprise-class storage; stacked
flash, SSD, HP Nanostores, MRAM, PCM
|
|
|
|
11-May
|
|
Storage Technologies Integration
|
Freitas, Storage-Class Memory:
the next storage system technology, IBM Journal of Research and
Development.
Micron's Hybrid Memory Cube
(http://hotchips.org/uploads/hc23/HC23.18.3-memory-FPGA/HC23.18.320-HybridCube-Pawlowski-Micron.pdf)
Ranganathan, From Microprocessors to Nanostores: Rethinking
Data-Centric Systems,
http://www.hpl.hp.com/news/2011_IEEEComputer_nanostores.pdf
Flash storage:
http://www.fusionio.com/load/-media-/1qaz4e/docsLibrary/WP_-_Navy_-_SSC_Atlantic_-_Fusion-io_Testing.pdf
http://www.fusionio.com/load/-media-/1qaz4e/docsLibrary/FIO_SSD_Differentiator_Overview.pdf
|
|
|
|
|
|
|
8
|
14-May
|
|
Systems and Software integration
approaches
|
David Roberts, Taeho Kgil,
Trevor Mudge, "Integrating NAND flash devices onto servers",
Commun. ACM,
http://dl.acm.org/citation.cfm?doid=1498765.1498791
Badam and Pai, "SSDAlloc: Hybrid SSD/RAM Memory Management Made Easy",
http://www.usenix.org/event/nsdi11/tech/full_papers/Badam.pdf
Mohit Saxena, Michael M. Swift, Yiying Zhang, "FlashTier: a
lightweight, consistent and durable storage cache", Proceedings of the
7th ACM European Conference on Computer Systems,
http://dl.acm.org/citation.cfm?doid=2168836.2168863
|
Replacement, Integration,
Combination
|
|
|
|
|
18-May
|
|
Cleversafe Guest Lecture -
Andrew Baptist "Architecture of the Cleversafe System and Current
Research Challenges"
|
|
|
Full project demonstration
|
|
|
9
|
21-May
|
|
Projects "Check-in"
Presentation
|
|
|
|
Project Run Assignment
|
|
|
10
|
28-May
|
Holiday Monday, Projects Friday
|
|
|
|
Work on Projects
|
|
|
|
4-Jun
|
Exam Week
|
Final Project Presentations
|
|
|
|
Final Project Presentation, Demo,
and Report
|
|
|
|
|
|
|
|
|
|
|
Candidate infrastructures:
|
Zarkov/MongoDB,
Hadoop/HDFS, GraphLab
|
|
Course Work:
|
Paper discussion lead (2 classes)
|
|
|
|
|
|
|
|
Project Proposal, crawl
demonstration
|
|
|
|
Candidate projects:
|
Implement a large scale analysis
(that someone else has done)
|
|
|
Project Detailed Plan, Walk
demonstration
|
|
|
|
|
Implement a variation, based on
some new technology (e.g., SSD, replicated storage, remote disk)
|
|
|
Full project demonstration
|
|
|
|
|
Implement a new application
(take sequential, make parallel or take small, make large)
|
|
|
Final Presentation and Writeup
|
|
|
|
|
|
|
|
|
|
|
3
|
|
|
|
MapReduce/Hadoop, GFS/HDFS
http://research.google.com/archive/gfs.html
|
Block data parallel, Object
parallel, higher level interfaces
|
Propose a project, and an
infrastructure; crawl demonstration
|
|
|
|
|
|
|
Incoop,
http://dl.acm.org/citation.cfm?id=2038923,
cielo http://dl.acm.org/citation.cfm?id=1972470, presto?, SciHadoop
|
|
|
|
|
|
|
|
|
Cassandra
http://dfeatherston.com/cassandra-adf-uiuc-su10.pdf, HadoopDB,
http://dl.acm.org/citation.cfm?id=1687731, Hive,
http://infolab.stanford.edu/~ragho/hive-icde2010.pdf
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bigtable, http://dl.acm.org/citation.cfm?id=1365815.1365816,
Dynamo, http://dl.acm.org/citation.cfm?id=1294261.1294281, memcached,
http://memcached.org/about, SILT,
http://dl.acm.org/citation.cfm?id=2043556.2043558
|
|
|
|
|
|
|
|
SSDs in enterprise systems,
flash-in-array checkpointing (LLNL), SSDAlloc, FAWN/HP Moonshot, PCM in
processor (find best), replacement?; RAMCloud
|
|
|
|