Branch Management for Data Lakes
lakeFS (by Treeverse) is an open-source platform that brings Git-like version control to data lakes, enabling data engineers to manage data with the same rigor as code. The Branches Management experience was redesigned end-to-end to better support data workflows. The aim was to make branching feel familiar while adapting it to the realities of large-scale data pipelines.
Unlike code, data workflows involve large, constantly changing datasets, where mistakes are costly and hard to trace across environments. This created several challenges:
- No clear way to manage parallel data work (experiments, fixes, pipelines)
- Limited visibility into branches and their state
- GitHub mental models don't fully translate to data workflows
- High risk when modifying production data
The goal was to create a "home-like" experience for developers while addressing data-specific nuances:
- Translating GitHub concepts into data-native workflows
- Designing for high-scale environments (many branches, large datasets)
- Reducing cognitive load in critical operations (merge, commit, delete)
- Supporting both exploration and action in a single surface
1. Unified Branches View
A centralized table surfaces all branches alongside their relevant context, including last update, commit state, and relationships between branches. The structure builds on familiar GitHub patterns while adapting them to scale and making each branch's state immediately understandable.
2. Status Visibility & Merge Readiness
Branch status appears directly in the table, with additional detail revealed on hover. This interaction gives immediate insight into test outcomes and merge readiness, showing which branches can be merged and which are blocked. Required checks are always listed first, making it clear when a failing check blocks a merge.
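The merge-readiness rule described above can be sketched as a small piece of display logic. This is a minimal illustration under stated assumptions, not lakeFS's implementation: the `Check` shape, the status values, and the functions `orderForDisplay` and `mergeReadiness` are hypothetical. It assumes a branch is mergeable only when every required check has passed (so a pending required check also blocks), and that optional checks are shown but never block.

```typescript
// Hypothetical shapes for illustration only; not lakeFS's actual data model or API.
type CheckStatus = "passed" | "failed" | "pending";

interface Check {
  name: string;
  required: boolean;
  status: CheckStatus;
}

interface MergeReadiness {
  mergeable: boolean;
  blockedBy: string[]; // names of required checks that have not passed yet
}

// Order checks for the hover detail: required checks first, then optional ones,
// keeping the original relative order within each group.
function orderForDisplay(checks: Check[]): Check[] {
  return [
    ...checks.filter((c) => c.required),
    ...checks.filter((c) => !c.required),
  ];
}

// Assumed rule: a branch is mergeable only when every required check has passed.
// Failed or pending required checks block; optional checks never block.
function mergeReadiness(checks: Check[]): MergeReadiness {
  const blockedBy = checks
    .filter((c) => c.required && c.status !== "passed")
    .map((c) => c.name);
  return { mergeable: blockedBy.length === 0, blockedBy };
}
```

For example, with a failed optional check `row-count-diff`, a passed required check `schema-validation`, and a failed required check `pii-scan` (all hypothetical names), `mergeReadiness` reports the branch as blocked by `pii-scan` only, and `orderForDisplay` lists the two required checks before the optional one. Keeping the blocking rule separate from the display order lets the table show every check while still making the blocking ones stand out.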
3. Branch Drill-Down to Data
Clicking a branch opens a deeper exploration of its contents, similar to navigating into a repository in GitHub. In GitHub, the question is "what's in the code?"; in lakeFS, it becomes "what's in the data?" Instead of code, the user is taken to an Objects view, which exposes the underlying data, its structure, and its hierarchy within the repository.
The new Branches experience improved clarity and confidence in managing data workflows, with users reporting higher satisfaction and smoother day-to-day operations.



