FuzzyData: A Scalable Workload Generator for Testing Dataframe Workflow Systems

Dataframes have become a popular means to represent, transform and analyze data. This approach has gained traction and a large user base among data science practitioners, resulting in a new wave of systems that implement a dataframe API but allow for …

A Demonstration of RELIC: A System for REtrospective Lineage InferenCe of Data Workflows

The ad-hoc, heterogeneous process of modern data science typically involves loading, cleaning, and mutating dataset(s) into multiple versions recorded as artifacts operated on by various tools within a single data science workflow. Lineage …

Towards Understanding Data Analysis Workflows using a Large Notebook Corpus

The advent of big data analysis as a profession as well as a hobby has brought an increase in novel forms of data exploration and analysis, particularly ad-hoc analysis. Analysis of raw datasets using frameworks such as pandas and R has become very …

Safe Double Blind Studies as a Service

The emergence of IoT devices is revolutionizing various aspects of human life, including healthcare, where the use of such devices can potentially improve health outcomes for millions. However, the efficacy of treatments and protocols based on IoT …

A Cloud Computing Course: From Systems to Services

We have designed, developed and administered a course on cloud computing that was taught to over 700 students at our institution over two years. The goal of this project-based course is to provide students with foundational systems concepts as well …

Center-of-Gravity Reduce Task Scheduling to Lower MapReduce Network Traffic

MapReduce is by far one of the most successful realizations of large-scale data-intensive cloud computing platforms. MapReduce automatically parallelizes computation by running multiple map and/or reduce tasks over distributed data across multiple …

Votus: A flexible and scalable monitoring framework for virtualized clusters

Initial Findings for Provisioning Variation in Cloud Computing

Cloud computing offers a paradigm shift in management of computing resources for large-scale applications. Using the Infrastructure-as-a-service (IaaS) cloud computing model, users today can request dynamically provisioned, virtualized resources such …

A performance prediction model for the CUDA GPGPU platform

The significant growth in computational power of modern Graphics Processing Units (GPUs), coupled with the advent of general-purpose programming environments like NVIDIA's CUDA, has seen GPUs emerge as a very popular parallel computing platform. …

Fast and scalable list ranking on the GPU

General-purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community as it promises to offer the highest performance per dollar. GPUs have been used extensively on regular …