JZFS
A Git-like version control file system for data lineage and data collaboration
Description
JZFS is an advanced Data-Centric Version Control File System designed to enhance Responsible AI Engineering by improving key aspects like data versioning, provenance, and reproducibility.
In dynamic production environments where machine learning components are frequently updated, JZFS ensures that every new model release—whether...
Additional Details
Problem Statement
The scientific and AI/ML communities are grappling with significant challenges related to data integrity, reproducibility, and collaboration.
The reproducibility crisis in scientific research has undermined confidence in results, with a considerable number of studies failing replication attempts.
In AI/ML, frequent updates, experiments, and lack of disciplined versioning exacerbate these issues, leading to unreliable models and results. Current data management systems often fail to maintain immutable traceability links between raw data, processing steps, and final outcomes, making it difficult to reproduce and verify results.
Solution
JZFS is a Data-Centric Version Control File System designed to address these challenges by providing a comprehensive framework for data versioning, provenance, and reproducibility.
It leverages insights from software engineering and cryptographic technologies to ensure the end-to-end integrity of scientific data and AI/ML pipelines.
Filecoin supports this approach by making storage affordable, preventing vendor lock-in, and fostering a culture of openness. Storing research objects in a decentralized repository makes it possible to share an entire research object, including its data and code, while providing versioning and ensuring reproducibility.
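As a rough illustration of the immutable traceability links described above, the sketch below chains content-addressed records so that each processing step points at the hash of the step before it. This is not JZFS's actual data model or API: the names `digest`, `make_commit`, and `verify_chain`, and the record fields, are hypothetical and chosen only for this example.

```python
import hashlib
import json


def digest(payload: bytes) -> str:
    """Return the SHA-256 hex digest used here as a content address."""
    return hashlib.sha256(payload).hexdigest()


def make_commit(parent, data: bytes, step: str) -> dict:
    """Build a record linking a data snapshot to its parent record (hypothetical layout)."""
    body = {"parent": parent, "data_hash": digest(data), "step": step}
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    commit_id = digest(json.dumps(body, sort_keys=True).encode("utf-8"))
    return {"id": commit_id, **body}


def verify_chain(commits: list) -> bool:
    """Recompute every record id and check that each parent link matches."""
    previous_id = None
    for commit in commits:
        body = {key: commit[key] for key in ("parent", "data_hash", "step")}
        expected = digest(json.dumps(body, sort_keys=True).encode("utf-8"))
        if commit["id"] != expected or commit["parent"] != previous_id:
            return False
        previous_id = commit["id"]
    return True


raw = make_commit(None, b"sensor,value\na,1\n", "ingest raw data")
clean = make_commit(raw["id"], b"sensor,value\na,1.0\n", "normalize values")
history = [raw, clean]
print(verify_chain(history))
```

Because each id is derived from the record's content and its parent's id, editing any earlier step changes the hashes downstream, which is what makes the history tamper-evident.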
A dataset management component separates training data collection from training data consumption by defining strongly typed schemas for both. Data development and model algorithm development can then each iterate in their own loop, which speeds up project development.
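One way to picture this separation is with two typed schemas and a single conversion point between them. The sketch below is only an assumption about what such schemas could look like: `CollectedRecord`, `TrainingExample`, their fields, and the helper functions are hypothetical and are not taken from JZFS.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CollectedRecord:
    """Schema owned by the data collection side (hypothetical fields)."""
    source_id: str
    captured_at: str  # ISO 8601 timestamp kept as text in this sketch
    text: str


@dataclass(frozen=True)
class TrainingExample:
    """Schema owned by the model training side (hypothetical fields)."""
    text: str
    label: int


def validate(record) -> None:
    """Raise TypeError if a field value does not match its annotated type."""
    for field in fields(record):
        value = getattr(record, field.name)
        if not isinstance(value, field.type):
            raise TypeError(
                f"{field.name}: expected {field.type.__name__}, "
                f"got {type(value).__name__}"
            )


def to_training_example(record: CollectedRecord, label: int) -> TrainingExample:
    """The only place where the collection schema meets the training schema."""
    validate(record)
    example = TrainingExample(text=record.text.strip(), label=label)
    validate(example)
    return example


collected = CollectedRecord("sensor-1", "2024-01-01T00:00:00Z", "  hello  ")
example = to_training_example(collected, label=1)
print(example)
```

With this layout, the collection side can add or rename fields on its schema without touching training code, as long as the conversion function is updated in step.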