When it comes to supporting Open Science and adhering to the ‘Know Your Software’ (KYSW) principle, getting your source code properly archived and referenced makes a real difference.
This page offers a straightforward checklist to help you do just that with Software Heritage.
Step 1: Prepare your public repository
- Add a README file
- Add an AUTHORS file
- Add license information. There are two recommended ways:
  - a LICENSE file at the root of your project, or
  - a LICENSES directory containing all the licenses used in your project, and an SPDX-compliant copyright header in all your source code files (see the REUSE instructions for details and tools)
Optional: consider including a codemeta.json file for machine-readable metadata. The CodeMeta Generator can help you create one easily.
It’s now common practice to include Markdown versions of the README, but please keep the AUTHORS and LICENSE files as plain text.
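If you want to check a local checkout against this list before publishing, the minimal Python sketch below reports what is missing. It is not an official Software Heritage tool: the file names it looks for follow the checklist above, while the function name `check_repository` and the set of source-file extensions (`SOURCE_SUFFIXES`) are hypothetical choices for this example, and only the `SPDX-License-Identifier` tag is checked, not the full REUSE rules.

```python
from pathlib import Path
from typing import List

# Hypothetical set of extensions treated as "source code files" in this sketch.
SOURCE_SUFFIXES = (".py", ".c", ".h", ".ml", ".rs", ".java", ".js", ".ts")


def check_repository(root: str) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    repo = Path(root)
    problems: List[str] = []

    # A README in any format (README, README.md, README.rst, ...) is accepted.
    if not any(p.is_file() for p in repo.glob("README*")):
        problems.append("missing README")

    # AUTHORS should stay plain text, so only the bare file name is accepted.
    if not (repo / "AUTHORS").is_file():
        problems.append("missing AUTHORS (plain text)")

    has_license_file = (repo / "LICENSE").is_file()
    licenses_dir = repo / "LICENSES"
    if not has_license_file and not licenses_dir.is_dir():
        problems.append("missing LICENSE file or LICENSES directory")
    elif licenses_dir.is_dir():
        # REUSE-style layout: source files need an SPDX header.
        # This sketch only looks for the SPDX-License-Identifier tag.
        for path in sorted(repo.rglob("*")):
            if path.is_file() and path.suffix in SOURCE_SUFFIXES:
                head = path.read_text(encoding="utf-8", errors="replace")[:2048]
                if "SPDX-License-Identifier:" not in head:
                    problems.append(f"no SPDX header in {path.relative_to(repo).as_posix()}")

    # codemeta.json is optional, so it is reported as a hint only.
    if not (repo / "codemeta.json").is_file():
        problems.append("hint: no codemeta.json (optional)")
    return problems
```

For example, calling `check_repository(".")` at the root of your project returns an empty list when every check passes.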
Step 2: Save your code in Software Heritage
There are several options for archiving your code, depending on your workflow and where it’s hosted. We’ve summarized them below so you can pick what works best for you.
Manually through the UpdateSWH browser extension
For projects hosted on Bitbucket, GitHub, GitLab.com, or any other GitLab instance, archival is best triggered manually with a single click, using the dedicated UpdateSWH browser extension, available for Chrome and Firefox.
Manually through the Save Code Now feature
- Go to the Software Heritage Save Code Now page.
- Choose the appropriate version control system in the drop-down list. Currently, the supported types are:
  - git, for origins using Git
  - hg, for origins using Mercurial
  - svn, for origins using Subversion
  - cvs, for origins using CVS
  - bzr, for origins using Bazaar
  - tarball, for tarball origins (supported formats: .jar, .tar, .tar.bz2, .tar.gz, .tar.lz, .tar.xz, .tar.zst, .zip)
- Enter the code repository reference URL.
  N.B. This must be the URL that enables cloning or checking out your project from the repository. If you’re unsure, double-check by attempting to clone or check out your project into a temporary directory. Here are examples for git, svn and hg:
  - git clone <your url>
  - svn checkout <your url>
  - hg clone <your url>
- Click «submit».
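Two of the checks from the steps above (is this a tarball URL, and can the repository actually be cloned or checked out?) can be scripted before you submit. This is a hedged sketch, not part of Save Code Now: the helper names are hypothetical, the tarball suffixes mirror the list above, and `can_check_out` simply runs the `git`, `svn` or `hg` command-line client, which must be installed locally.

```python
import subprocess
import tempfile
from pathlib import Path

# Archive formats accepted for the "tarball" visit type (from the list above).
TARBALL_SUFFIXES = (".jar", ".tar", ".tar.bz2", ".tar.gz", ".tar.lz",
                    ".tar.xz", ".tar.zst", ".zip")

# Local command that fetches each VCS type into a target directory.
CHECKOUT_COMMANDS = {
    "git": ["git", "clone", "--quiet"],
    "svn": ["svn", "checkout", "--quiet"],
    "hg": ["hg", "clone", "--quiet"],
}


def is_tarball_url(url: str) -> bool:
    """True if the URL ends with one of the supported archive suffixes."""
    return url.lower().endswith(TARBALL_SUFFIXES)


def can_check_out(visit_type: str, url: str, timeout: float = 300.0) -> bool:
    """Clone or check out `url` into a throwaway directory; True on success."""
    if visit_type not in CHECKOUT_COMMANDS:
        raise ValueError(f"no local check implemented for {visit_type!r}")
    with tempfile.TemporaryDirectory() as tmp:
        target = str(Path(tmp) / "checkout")
        try:
            result = subprocess.run(
                CHECKOUT_COMMANDS[visit_type] + [url, target],
                stdin=subprocess.DEVNULL,  # no stdin: prompts cannot be answered
                capture_output=True,
                timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired):
            # The client is not installed, or the checkout took too long.
            return False
        return result.returncode == 0
```

For instance, `can_check_out("git", "https://github.com/user/project")` (a hypothetical URL) returns True only if the clone succeeds, which is a good sign that the same URL will work in the Save Code Now form.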
Automatically, via the API
You can trigger repository archival programmatically using the Software Heritage API, which means you can make it part of any development workflow. If your code lives on one of the popular hosting platforms, we recommend using the dedicated webhook endpoints of the API. See the blog post with detailed examples, and the documentation for the Bitbucket, Gitea, GitHub, GitLab (and all its instances) and SourceForge endpoints.
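As an illustration of the API route, here is a minimal Python sketch that sends a Save Code Now request using only the standard library. It assumes the endpoint `POST https://archive.softwareheritage.org/api/1/origin/save/<visit_type>/url/<origin_url>/` and an optional bearer token; the function names are hypothetical, and the API documentation remains the authoritative reference for paths, authentication and response fields.

```python
import json
import urllib.request
from typing import Optional

API_ROOT = "https://archive.softwareheritage.org/api/1"


def save_request_url(visit_type: str, origin_url: str) -> str:
    """Build the (assumed) Save Code Now endpoint for one origin."""
    # The origin URL is appended as-is, following the documented examples.
    return f"{API_ROOT}/origin/save/{visit_type}/url/{origin_url}/"


def request_save(visit_type: str, origin_url: str, token: Optional[str] = None) -> dict:
    """POST a save request and return the decoded JSON response."""
    req = urllib.request.Request(save_request_url(visit_type, origin_url), method="POST")
    if token:
        # Optional API token for authenticated requests.
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

A call such as `request_save("git", "https://github.com/user/project")` (a hypothetical repository) performs a network request and returns the response; fields like `save_request_status` and `save_task_status` should let you follow the request's progress.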
When configuring a webhook on GitHub, we recommend triggering archival on branch, tag, or release creation, rather than on every push.
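The recommended trigger policy can be written down as a small predicate over GitHub webhook deliveries. This is purely illustrative: on GitHub you express the policy by selecting the corresponding events in the webhook settings, not by running code, and `should_trigger_archival` is a hypothetical name. It relies on GitHub's `create` event, whose payload has `ref_type` set to `branch` or `tag`, and its `release` event, whose payload carries an `action`.

```python
def should_trigger_archival(event: str, payload: dict) -> bool:
    """Return True for branch/tag/release creation, False for plain pushes."""
    if event == "create":
        # "create" deliveries describe a new branch or a new tag.
        return payload.get("ref_type") in {"branch", "tag"}
    if event == "release":
        # React to new releases only, not to edits or deletions.
        return payload.get("action") in {"created", "published"}
    # "push" and every other event type are ignored.
    return False
```

Ignoring `push` keeps the number of archival requests proportional to meaningful milestones rather than to every commit.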
On GitHub you can also use a GitHub Action, which may be simpler to configure; keep in mind, though, that it uses far more resources behind the scenes than a webhook call, for the same end result.
Step 3: Reference your code
To properly reference your code, use the Software Hash IDentifier (SWHID), a universal identifier for software, now an ISO/IEC international standard.
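To make the idea of an intrinsic identifier concrete: for file contents (the `cnt` object type), the hash part of a SWHID is computed exactly like Git's blob hash, that is, a SHA-1 over the header `blob <size>` plus a NUL byte, followed by the raw bytes. The sketch below covers contents only; directories, revisions, releases and snapshots use other serialisations. The function name is hypothetical.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Core SWHID of a file content, computed like a Git blob hash."""
    # Header: the ASCII text "blob <size in bytes>" followed by a NUL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()
```

For any file, the 40 hexadecimal digits should match the output of `git hash-object <file>`, which is a convenient way to cross-check a content SWHID locally.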
Getting the SWHID for a full directory
Navigate to the desired directory in the archive (the browser extension can help you quickly find the archived repository). Then pull out the red ‘Permalinks’ tab and copy either the SWHID or the permalink to your clipboard for use in your documents.
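Before pasting an identifier into a paper, you may want to check that it was copied whole. This sketch validates the core part of a SWHID (everything before the first `;`) and builds a permalink from the full identifier. It assumes the archive resolves URLs of the form `https://archive.softwareheritage.org/<SWHID>`; the function name is hypothetical.

```python
import re

# Core SWHID: scheme version 1, one of five object types, 40 hex digits.
CORE_SWHID = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")
ARCHIVE = "https://archive.softwareheritage.org/"


def permalink(swhid: str) -> str:
    """Validate the core SWHID and return an (assumed) archive permalink."""
    core = swhid.split(";", 1)[0]  # qualifiers (origin=, path=, ...) follow the first ';'
    if not CORE_SWHID.fullmatch(core):
        raise ValueError(f"not a core SWHID: {core!r}")
    return ARCHIVE + swhid
```

A truncated copy, such as a hash with only 39 digits, is rejected with a `ValueError` instead of silently producing a broken link.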
Getting the SWHID for a code fragment
You can also get the SWHID of a single file, or of a code fragment inside a file. Start by navigating to the file. To select a specific code snippet, click the first line number, then Shift-click the last one. Then pull out the red ‘Permalinks’ tab and copy either the SWHID or the permalink.
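The SWHID you get for a fragment is a content SWHID followed by context qualifiers: the origin, the visit (a snapshot), the anchor (here a revision), the path of the file, and the selected lines. The hypothetical helper below assembles such an identifier in that order, as in reference [3]. It does not implement the escaping rules that the SWHID specification defines for special characters in qualifier values; it simply rejects values containing `;`.

```python
from typing import Optional, Tuple


def fragment_swhid(content_sha1: str, origin: str, visit: str, anchor: str,
                   path: str, lines: Optional[Tuple[int, int]] = None) -> str:
    """Assemble a content SWHID with context qualifiers (hypothetical helper)."""
    qualifiers = [("origin", origin), ("visit", visit), ("anchor", anchor), ("path", path)]
    if lines is not None:
        first, last = lines
        # A single line is written "N", a range "N-M".
        qualifiers.append(("lines", str(first) if first == last else f"{first}-{last}"))
    for name, value in qualifiers:
        if ";" in value:
            raise ValueError(f"qualifier {name} would need escaping: {value!r}")
    return f"swh:1:cnt:{content_sha1};" + ";".join(f"{n}={v}" for n, v in qualifiers)
```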
Citing software from the archive
If your project includes a codemeta.json or citation.cff file, you’ll see a special citation sidebar appear just below the permalinks. Clicking it reveals a tab with a fully formatted citation, ready for your publications. Currently, the citation is provided in BibTeX format, which is perfect for LaTeX users working with the biblatex-software package (described below).
For LaTeX users
If you use LaTeX for your documents, you’ll love the biblatex-software package: it makes producing clean bibliographic entries for software a breeze, and it has native SWHID support built right in.
biblatex-software is integrated into CTAN and TeX Live, and it works out of the box in Overleaf (here’s a template for the official ACM article style). As of April 2022, it’s also directly integrated into the ACM article style itself.
Below, you’ll find an example of what’s possible, extracted from the article “Archiving and referencing source code with Software Heritage”, ICMS 2020. Feel free to click the links to see it in action. For more details, watch the tutorial below, and check out the documentation to customize it.
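If you prefer to generate entries yourself rather than copy them from the archive’s citation tab, a small helper can emit biblatex-software entries. This is a sketch under assumptions: the entry type `softwareversion` and the field names (`author`, `title`, `version`, `date`, `institution`, `license`, `swhid`) follow the examples in the biblatex-software documentation as we understand them, so check the package documentation before relying on them. The helper name and citation key are hypothetical; the field values come from reference [2] below.

```python
from typing import Dict


def bib_entry(entry_type: str, key: str, fields: Dict[str, str]) -> str:
    """Format one BibTeX/biblatex entry; values are wrapped in braces, not escaped."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Values taken from reference [2]; the key "parmap-0.9.8" is a hypothetical choice.
PARMAP_098 = {
    "author": "Di Cosmo, Roberto and Danelutto, Marco",
    "title": "The Parmap library",
    "version": "0.9.8",
    "date": "2012",
    "institution": "University Paris Diderot and University of Pisa",
    "license": "LGPL-2.0",
    "swhid": "swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;"
             "origin=https://gitorious.org/parmap/parmap.git;"
             "visit=swh:1:snp:78209702559384ee1b5586df13eca84a5123aa82",
}
```

Calling `bib_entry("softwareversion", "parmap-0.9.8", PARMAP_098)` produces an entry you can paste into your .bib file and cite with biblatex-software.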
References
[1] [SW] Roberto Di Cosmo and Marco Danelutto, The Parmap library, 2012. University Paris Diderot and University of Pisa. LIC: LGPL-2.0. URL: https://rdicosmo.github.io/parmap/
[2] [SW Rel.] Roberto Di Cosmo and Marco Danelutto, The Parmap library version 0.9.8, 2012. University Paris Diderot and University of Pisa. LIC: LGPL-2.0. SWHID: <swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;origin=https://gitorious.org/parmap/parmap.git;visit=swh:1:snp:78209702559384ee1b5586df13eca84a5123aa82>
[3] [SW exc.] Roberto Di Cosmo and Marco Danelutto, “Core mapping routine”, from The Parmap library version 0.9.8, 2012. University Paris Diderot and University of Pisa. LIC: LGPL-2.0. SWHID: <swh:1:cnt:d5214ff9562a1fe78db51944506ba48c20de3379;origin=https://gitorious.org/parmap/parmap.git;visit=swh:1:snp:78209702559384ee1b5586df13eca84a5123aa82;anchor=swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;path=/parmap.ml;lines=101-143>