Towards a next generation object storage for Software Heritage
The mission of Software Heritage is to collect, preserve and share all the publicly available source code. With 10 billion source files from more than 150 million projects, the Software Heritage archive is the largest collection of source code ever created.
Building the Software Heritage infrastructure is no simple feat. We already described the challenge of connecting to a broad spectrum of code hosting and distribution platforms, listing their contents and loading the history of development from a variety of version control systems (and you can help us tackle this challenge through a dedicated grant program).
Today, we are delighted to announce a collaboration with Easter-eggs to address another key issue we face: building an object storage that can scale to tens of billions of source code files, most of which are quite small compared to the objects that object storage systems are usually designed to handle. In 2010, Facebook published a seminal article describing how they solved a similar problem. A decade later, it is still a challenge that projects such as SeaweedFS or Ambry try to address.
None of these approaches is a perfect fit for storing tens of billions of immutable objects, and Easter-eggs is bringing its expertise to bear to fill the gap.
Founded in 1997, Easter-eggs is a company well known for its commitment to working exclusively with Free Software, as well as for its democratic governance, in which the company belongs equally to all its workers.
“This project is perfectly in line with Easter-eggs’ values and with its professional mission of contributing to the development of Free Software,” says Pierre-Yves Dillard, Easter-eggs’ co-founder. “Easter-eggs is highly motivated to be part of and actively contribute to this effort and to advancing knowledge in this area.”
The details are in this issue and the design is being drafted here. As with everything we do at Software Heritage, this is an open collaboration and you are welcome to join the effort.