A Demonstration of RELIC: A System for REtrospective Lineage InferenCe of Data Workflows

Abstract

The ad-hoc, heterogeneous process of modern data science typically involves loading, cleaning, and mutating dataset(s) into multiple versions recorded as artifacts operated on by various tools within a single data science workflow. Lineage information, including the source datasets, data transformation programs or scripts, or manual annotations, is rarely captured, making it difficult to infer the relationships between artifacts in a given workflow retrospectively. We demonstrate RELIC, a tool to retrospectively infer the lineage of data artifacts generated as a result of typical data science workflows, with an interactive demonstration that allows users to input artifact files and visualize the inferred lineage in a web-based setting.

Publication
Proc. VLDB Endow.