Dataframes have become a popular means to represent, transform and analyze data. This approach has gained traction and a large user base for data science practitioners - resulting in a new wave of systems that implement a dataframe API but allow for …
The ad-hoc, heterogeneous process of modern data science typically involves loading, cleaning, and mutating dataset(s) into multiple versions recorded as artifacts operated on by various tools within a single data science workflow. Lineage …
The advent of big data analysis as a profession as well as a hobby has brought an increase in novel forms of data exploration and analysis, particularly ad-hoc analysis. Analysis of raw datasets using frameworks such as pandas and R have become very …
The emergence of IoT devices is revolutionizing various aspects of human life, including healthcare, where the use of such devices can potentially improve health outcomes for millions. However, the efficacy of treatments and protocols based on IoT …
We have designed, developed and administered a course on cloud computing that was taught to over 700 students at our institution over two years. The goal of this project-based course is to provide students with foundational systems concepts as well …
MapReduce is by far one of the most successful realizations of large-scale data-intensive cloud computing platforms. MapReduce automatically parallelizes computation by running multiple map and/or reduce tasks over distributed data across multiple …
Cloud computing offers a paradigm shift in management of computing resources for large-scale applications. Using the Infrastructure-as-a-service (IaaS) cloud computing model, users today can request dynamically provisioned, virtualized resources such …
The significant growth in computational power of modern Graphics Processing Units (GPUs) coupled with the advent of general purpose programming environments like NVIDIA's CUDA, has seen GPUs emerging as a very popular parallel computing platform. …
General purpose programming on the graphics processing units (GPGPU) has received a lot of attention in the parallel computing community as it promises to offer the highest performance per dollar. The GPUs have been used extensively on regular …