GitHub is partnering with the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, Arctic World Archive, Microsoft Research, the Bodleian Library, and Stanford Libraries to ensure the long-term preservation of the world's open source software. We will protect this priceless knowledge by storing multiple copies, on an ongoing basis, across various data formats and locations, including a very-long-term archive designed to last at least 1,000 years.
The Internet Archive is a well-known, widely beloved non-profit digital library which provides free public access to collections of digitized materials. In partnership with the GitHub Archive Program, the Internet Archive (IA) commenced its ongoing archive of GitHub public repositories on April 13, 2020. At present, IA is using a two-pronged approach. First, their well-known Wayback Machine is accessing and archiving raw GitHub data as WARCs, or Web ARChive files. Second, they have the goal of making entire archived GitHub repositories available via “git clone,” while also keeping repo comments, issues, and other metadata easily accessible on the web.
Software Heritage Foundation
Software Heritage is a non profit, multi-stakeholder initiative launched by Inria in collaboration with UNESCO with the goal to collect, preserve and share the source code of our software commons. They’ve already archived more than 130 million projects, with their full development history, and we are delighted that 100+ million of these are from GitHub. Thanks to the collaboration announced at GitHub Universe 2019, the archival engine is being improved with the goal to keep it up to speed with GitHub‘s growth, but if the project you are interested in, or its latest version, is not archived yet, you do not need to wait, it’s easy to trigger its archival right now at save.softwareheritage.org.
Project Silica: Microsoft Research
Project Silica is developing the first storage technology designed and built from the media up for cloud-scale storage of long-lived data. By leveraging recent discoveries in ultrafast laser optics, data is stored in quartz glass, through a process that permanently changes the physical structure of the glass material. Quartz glass is a durable storage media that offers unparalleled data lifetimes of upwards of tens of thousands of years. It is resilient to electromagnetic interference, water, and heat, making it the ideal storage medium for ensuring the world’s open source software is forever preserved for future generations. As a partner in the GitHub Archive Program, Project Silica is committed to driving storage innovation, and developing a storage technology that addresses the need for a sustainable and reliable storage technology for the world’s long-lived data. We’ve archived 6,000 of the world’s most popular repositories as a proof of concept for future archives.
Shannon Lee Dawdy
Archaeologist / Anthropologist / Historian
Executive Director, Long Now Foundation
Historian / Science Fiction Author
Archaeologist / Egyptologist / Director of the Antiquities Museum at the Library of Alexandria
Computational Astrophysicist / Security Engineer