Simple three-step process
With Software Heritage, you can easily archive your research software artifacts. It also allows you to include precise references to specific versions of the source code in your research articles, even down to individual fragments of specific source files. This in turn improves the reviewer experience (for example, artifact evaluation committees) and benefits all future readers, including yourself.
1. Prepare your public repository
Make sure your source code is hosted on a publicly accessible repository (GitHub, Bitbucket, a GitLab instance, an institutional software forge, etc.) using a version control system supported by Software Heritage (currently Subversion, Mercurial, and Git). Please follow best practices by including the following files at the top level of your source-code tree:
- README
contains a description of the software (name, purpose, pointers to the web site, documentation, development platform, contact and support information, …)
- AUTHORS
list of all the persons that need to be credited for the software; if you want to specify the roles of each person, we suggest using the taxonomy of contributors from Inria.
- LICENSE
Project license terms. For open source licenses, please use standard SPDX license names. For large software projects and developers, consider the REUSE process and tools.
- A codemeta.json
A linked data metadata file will help index your source code in the Software Heritage archive and provide an easy way to link to related research outputs. See the CodeMeta initiative for more information and our CodeMeta generator tool.
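To give an idea of what a codemeta.json file can contain, here is a minimal Python sketch that writes one out. The project name, description, repository URL, license and author are hypothetical placeholders, and only a handful of CodeMeta 2.0 terms are shown; the CodeMeta generator tool remains the easier route for a complete file.

```python
import json


def minimal_codemeta(name, description, repository_url, spdx_license_id, authors):
    """Return a minimal CodeMeta 2.0 description as a dictionary.

    `authors` is a list of (given name, family name) pairs.
    """
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "codeRepository": repository_url,
        # The license is referenced through its standard SPDX identifier.
        "license": f"https://spdx.org/licenses/{spdx_license_id}",
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


# Hypothetical project values, for illustration only.
metadata = minimal_codemeta(
    name="my-research-tool",
    description="Placeholder description of a hypothetical tool.",
    repository_url="https://example.org/me/my-research-tool",
    spdx_license_id="MIT",
    authors=[("Ada", "Example")],
)

# Save this text as codemeta.json at the top level of the source tree.
codemeta_text = json.dumps(metadata, indent=2)
print(codemeta_text)
```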
2. Save your code
Once your code repository has been properly prepared and updated, follow the next steps:
- Go to the Software Heritage Save Code Now page
- Pick your version control system in the drop-down list
- Enter the code repository URL (the clone/checkout URL as given by your development platform)
- Click ‘submit’
That’s it, you’re done!
There’s no need to create an account or provide personal information of any kind. If the URL you provided is correct, Software Heritage will archive your repository with its full development history shortly after. If your repository is hosted on one of the major forges we already work with, this process can take just a few hours; if you point to a location that’s new to us, it can take longer because it requires manual approval. You can also request archival programmatically, using the dedicated Software Heritage API entry point.
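For the programmatic route, the following Python sketch shows one way such a request could look. It assumes the save entry point has the form /api/1/origin/save/<visit type>/url/<origin URL>/ and that the visit types are named git, hg and svn; please check both against the Software Heritage API documentation before relying on them. The function names are ours, chosen for illustration.

```python
import json
import urllib.request

# Assumed API root and visit type names; verify them in the API documentation.
SWH_API_ROOT = "https://archive.softwareheritage.org/api/1"
SUPPORTED_VISIT_TYPES = ("git", "hg", "svn")


def save_request_url(visit_type, origin_url):
    """Build the URL of a save request for the given repository."""
    if visit_type not in SUPPORTED_VISIT_TYPES:
        raise ValueError(f"unsupported visit type: {visit_type!r}")
    # The origin URL is appended as-is, mirroring the clone/checkout URL.
    return f"{SWH_API_ROOT}/origin/save/{visit_type}/url/{origin_url}/"


def request_archival(visit_type, origin_url):
    """Send the save request with a POST and return the decoded JSON reply."""
    request = urllib.request.Request(
        save_request_url(visit_type, origin_url), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(save_request_url("git", "https://github.com/rdicosmo/parmap"))
```

Calling request_archival("git", "https://github.com/rdicosmo/parmap") would actually submit the request over the network, so it is left out of the example run.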
3. Reference your work
Once your source code has been archived, there are a number of ways to reference it.
Three common methods are:
- adding a link to the full repository archived in Software Heritage
- adding a link to a precise version of the software project
- adding a link to a precise version of a source code file, down to the level of the line of code
The full repository
The link to the full repository archived in Software Heritage (with all its development history) is obtained by prepending the prefix https://archive.softwareheritage.org/browse/origin/ to the URL you used to request its archival.
For example, if the repository you have saved is https://github.com/rdicosmo/parmap, then the link to the saved version in Software Heritage will be
https://archive.softwareheritage.org/browse/origin/https://github.com/rdicosmo/parmap/
By following this link, your readers can browse the content of your repository, delving into its development history and directory structure, down to each single file.
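If you need to produce such links for several repositories, the rule above is easy to script. Here is a small Python sketch; the function name is hypothetical, and the prefix and trailing slash simply follow the Parmap example.

```python
BROWSE_ORIGIN_PREFIX = "https://archive.softwareheritage.org/browse/origin/"


def archived_repository_link(origin_url):
    """Return the browse link for a repository archived in Software Heritage."""
    # Normalise to exactly one trailing slash, as in the Parmap example.
    return BROWSE_ORIGIN_PREFIX + origin_url.rstrip("/") + "/"


print(archived_repository_link("https://github.com/rdicosmo/parmap"))
# prints https://archive.softwareheritage.org/browse/origin/https://github.com/rdicosmo/parmap/
```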
Using Software Heritage intrinsic identifiers (SWHID)
Software Heritage provides a fully documented standard identifier schema, called SWHID, to equip any software artifact with intrinsic identifiers. To learn more about the properties that make SWHIDs the identifiers of choice for reproducibility, see this research article and learn more about who is adopting it in this blog post.
SWHIDs can be equipped with a rich set of qualifiers that make precise the context in which a given artifact is meant to be seen.
Here are a few examples of how to use them.
Specific version of the project
This SWHID identifies a precise version of the source code of Parmap:
swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;origin=https://github.com/rdicosmo/parmap
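SWHIDs can be turned into clickable URLs by prepending the prefix https://archive.softwareheritage.org/ to them.
For the identifier above, this gives https://archive.softwareheritage.org/swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;origin=https://github.com/rdicosmo/parmap, which leads directly to that specific version of the code in Software Heritage.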
A very simple way of getting the right SWHID is to browse your archived code in Software Heritage and navigate to the revision you’re interested in. Then click the vertical red permalinks tab, present on all pages of the archive, and select the revision identifier in the tab that opens.
Version one of the SWHIDs uses git-compatible hashes, so if you’re using Git as a version control system, you can create the right SWHID just by prepending swh:1:rev: to your commit hash.
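As a minimal sketch of that rule, the Python below builds a revision SWHID from a full Git commit hash (for instance the output of git rev-parse HEAD), optionally adds the origin qualifier, and turns the result into a clickable URL. The function names are hypothetical and the validation is deliberately simple.

```python
import re

ARCHIVE_PREFIX = "https://archive.softwareheritage.org/"


def revision_swhid(commit_hash, origin=None):
    """Build a revision SWHID from a full Git commit hash."""
    commit_hash = commit_hash.strip().lower()
    if not re.fullmatch(r"[0-9a-f]{40}", commit_hash):
        raise ValueError("expected a full 40-character hexadecimal commit hash")
    swhid = f"swh:1:rev:{commit_hash}"
    if origin is not None:
        # Optional qualifier recording where the code was archived from.
        swhid += f";origin={origin}"
    return swhid


def swhid_url(swhid):
    """Turn a SWHID into a clickable URL into the archive."""
    return ARCHIVE_PREFIX + swhid


parmap_revision = revision_swhid(
    "0064fbd0ad69de205ea6ec6999f3d3895e9442c2",
    origin="https://github.com/rdicosmo/parmap",
)
print(parmap_revision)
print(swhid_url(parmap_revision))
```

Abbreviated commit hashes are rejected on purpose: the identifier needs the full hash to be unambiguous.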
Code fragment
SWHIDs as supported by Software Heritage allow you to go further and pinpoint a given fragment of code inside a specific version of a file by using the lines= qualifier available for identifiers that point to files.
For example, the following SWHID, which showcases all the available qualifiers for content SWHIDs, points to the core mapping algorithm inside the Parmap source code as presented in a research article describing Parmap back in 2012:
swh:1:cnt:d5214ff9562a1fe78db51944506ba48c20de3379;origin=https://gitorious.org/parmap/parmap.git;visit=swh:1:snp:78209702559384ee1b5586df13eca84a5123aa82;anchor=swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;path=/parmap.ml;lines=101-143
Test it by prepending https://archive.softwareheritage.org/ to this SWHID and opening the resulting link: it takes you to the Software Heritage archive on a page showing the corresponding source code, with the relevant lines highlighted.
Here too, you can get the exact link by navigating in the archive to the code fragment you’re interested in: click on the line number of the first line of the fragment, shift-click on the last one, then open the permalinks tab.
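If you prefer to assemble a fully qualified content SWHID yourself, the Python sketch below joins the qualifiers in the same order as the Parmap example. The function name and parameters are ours, and the sketch makes one simplifying assumption: qualifier values are inserted unchanged, so it does not escape values that themselves contain special characters such as a semicolon.

```python
def content_swhid(content_hash, origin=None, visit=None, anchor=None,
                  path=None, lines=None):
    """Build a content SWHID, adding only the qualifiers that are given.

    `lines` is a (first, last) pair of line numbers.
    """
    qualifiers = [
        ("origin", origin),
        ("visit", visit),
        ("anchor", anchor),
        ("path", path),
    ]
    if lines is not None:
        first, last = lines
        qualifiers.append(("lines", f"{first}-{last}"))

    swhid = f"swh:1:cnt:{content_hash}"
    for key, value in qualifiers:
        if value is not None:
            swhid += f";{key}={value}"
    return swhid


parmap_fragment = content_swhid(
    "d5214ff9562a1fe78db51944506ba48c20de3379",
    origin="https://gitorious.org/parmap/parmap.git",
    visit="swh:1:snp:78209702559384ee1b5586df13eca84a5123aa82",
    anchor="swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2",
    path="/parmap.ml",
    lines=(101, 143),
)
print(parmap_fragment)
```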
Bibliography entries for software
Last but not least, take a look at biblatex-software, a bibliographic style that makes full use of SWHIDs. It is available for BibLaTeX users from CTAN; see the documentation there to learn more.