
How to archive and reference your code

When it comes to supporting Open Science and adhering to the ‘Know Your Software’ (KYSW) principle, getting your source code properly archived and referenced makes a real difference.

This page offers a straightforward checklist to help you do just that, smoothly, using Software Heritage.

Step 1: Prepare your public repository

  • Add a README file
  • Add an AUTHORS file
  • Add license information. There are two recommended ways:

Optional: Consider including a codemeta.json file for machine-readable metadata. The CodeMeta Generator can help you create one easily. It is now common practice to write the README in Markdown, but please keep the AUTHORS and LICENSE files as plain text.
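As an illustration of what such a file can contain, here is a minimal sketch that writes a codemeta.json from Python. The helper names (`build_codemeta`, `write_codemeta`) and the choice of fields are assumptions made for this example, not a complete CodeMeta profile; the CodeMeta Generator remains the easier route for real projects.

```python
import json
from pathlib import Path


def build_codemeta(name, description, authors, license_id, repository_url):
    """Return a minimal CodeMeta 2.0 record as a plain dict.

    `authors` is a list of (given name, family name) pairs and
    `license_id` is an SPDX identifier such as "LGPL-2.0-only".
    """
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "codeRepository": repository_url,
        "license": f"https://spdx.org/licenses/{license_id}",
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


def write_codemeta(record, directory="."):
    """Write the record to <directory>/codemeta.json and return the path."""
    path = Path(directory) / "codemeta.json"
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical project details, for illustration only.
    record = build_codemeta(
        name="demo-project",
        description="A placeholder description.",
        authors=[("Ada", "Lovelace")],
        license_id="MIT",
        repository_url="https://example.org/demo-project.git",
    )
    print(write_codemeta(record))
```

Run from the root of your repository, the script leaves a codemeta.json next to the README, AUTHORS and LICENSE files.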

Step 2: Save your code in Software Heritage

There are several options for archiving your code, depending on your workflow and where it’s hosted. We’ve summarized them below so you can pick what works best for you.

Manually through the UpdateSWH browser extension

For projects hosted on Bitbucket, GitHub, GitLab.com or any GitLab instance, the simplest way to trigger archival manually is the dedicated UpdateSWH browser extension, available for Chrome and Firefox. Once it is installed, a single click saves the repository.

Manually through the Save Code Now feature

  • Go to the Software Heritage Save Code Now page
  • Choose the appropriate version control system in the drop-down list
    • Currently, the supported types are:
      • git, for origins using Git
      • hg, for origins using Mercurial
      • svn, for origins using Subversion
      • cvs, for origins using CVS
      • bzr, for origins using Bazaar
      • tarball, for tarball origins (supported formats: .jar, .tar, .tar.bz2, .tar.gz, .tar.lz, .tar.xz, .tar.zst, .zip)

  • Enter the code repository reference URL
    N.B. This must be the URL that enables cloning or checking out your project from the repository. If you’re unsure, double-check by attempting to clone/checkout your project into a temporary directory.
  • Here are examples for git, svn and hg:
    • git clone <your url>
    • svn checkout <your url>
    • hg clone <your url>
  • Click «submit».
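The double-check suggested above (cloning or checking out into a temporary directory) can be scripted. The following is a minimal sketch covering only the three commands listed in the examples; the function names are hypothetical, and it assumes the git, svn or hg client is installed and on your PATH.

```python
import subprocess
import tempfile

# Commands taken from the examples above; other version control systems are not covered.
CHECKOUT_COMMANDS = {
    "git": ["git", "clone"],
    "svn": ["svn", "checkout"],
    "hg": ["hg", "clone"],
}


def checkout_command(visit_type, url, destination):
    """Build the command line that clones or checks out `url` into `destination`."""
    try:
        base = CHECKOUT_COMMANDS[visit_type]
    except KeyError:
        raise ValueError(f"no checkout command known for {visit_type!r}") from None
    return [*base, url, destination]


def can_check_out(visit_type, url):
    """Return True if the URL can be cloned or checked out into a scratch directory.

    The scratch directory is deleted afterwards. A FileNotFoundError is
    raised if the required client is not installed.
    """
    with tempfile.TemporaryDirectory() as scratch:
        command = checkout_command(visit_type, url, f"{scratch}/checkout")
        result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0
```

If `can_check_out("git", "<your url>")` returns False, the URL is probably a web page address rather than the clone URL expected by the form.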

Automatically, via the API

You can trigger repository archival programmatically through the Software Heritage API, which lets you make it part of any development workflow. If your code lives on one of the popular hosting platforms, we recommend using the dedicated API endpoints for their webhooks. For more, see the blog post with detailed examples and the documentation for the Bitbucket endpoint, Gitea endpoint, GitHub endpoint, GitLab endpoint (and all its instances), and SourceForge endpoint.
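For a plain API call outside of a webhook, a request could look like the sketch below. It assumes the Save Code Now endpoint has the form `/api/1/origin/save/<visit type>/url/<origin url>/` on archive.softwareheritage.org and that an optional bearer token can be sent for authenticated use; please check both against the current API documentation before relying on them. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed API root; verify against the Software Heritage API documentation.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def save_request_url(visit_type, origin_url):
    """Build the (assumed) Save Code Now endpoint URL for an origin."""
    return f"{API_ROOT}/origin/save/{visit_type}/url/{origin_url}/"


def request_archival(visit_type, origin_url, token=None):
    """POST a save request and return the decoded JSON response.

    `token` is an optional API token. Anonymous requests may be rate limited.
    """
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        save_request_url(visit_type, origin_url),
        data=b"",  # empty body, so that a Content-Length header is sent
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `request_archival("git", "<your url>")` could then be run from a release script, in the same spirit as the webhook recommendation: on releases rather than on every push.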

When configuring a webhook on GitHub, the recommended settings are to trigger archival on branch, tag, or release creation, rather than on every push.


On GitHub you can also use a GitHub Action, which may be simpler to configure. Keep in mind, however, that it uses considerably more resources behind the scenes than a webhook call, for the same end result.

Step 3: Reference your code

To properly reference your code, use the Software Hash IDentifier (SWHID), a universal identifier for software, now an ISO/IEC international standard.
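A SWHID is computed from the object it identifies, so anyone holding a copy can recompute and check it. As a small illustration, the sketch below computes the identifier of a single file's content, assuming the `swh:1:cnt:` scheme, which hashes the bytes the same way Git hashes a blob. The function name is hypothetical, and directories, revisions, releases and snapshots follow other rules that are not shown here.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file content, given its raw bytes.

    The digest is the SHA-1 of a Git-style blob header ("blob <length>"
    followed by a NUL byte) concatenated with the content itself.
    """
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    print(content_swhid(b"hello world\n"))
    # prints swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

In practice you rarely need to compute identifiers yourself: the archive displays them for you, as described in the following sections.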

Getting the SWHID for a full directory

Navigate to the directory you want to reference in the archive (the browser extension can help you quickly find the archived repository). Then pull out the red ‘Permalinks’ tab and copy the SWHID or the permalink to your clipboard for use in your documents.


Getting the SWHID for a code fragment

You can also get the SWHID of a file, or a code fragment inside a file. To do this, start by heading to the file. You can even select a specific code snippet if you like – just click the first line number, then Shift-click the last. After that, pull out the red ‘Permalinks’ tab and grab either the SWHID identifier or the permalink.
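The identifier you copy for a code fragment carries extra context after the core identifier, as `;key=value` qualifiers (origin, visit, anchor, path, lines). The sketch below splits such an identifier into its parts, using the one from reference [3] at the end of this page. The names `Swhid`, `parse_swhid` and `permalink` are hypothetical, and the parser is deliberately minimal: it does not validate qualifier names or decode percent-escapes.

```python
import re
from typing import NamedTuple

# Core identifier: scheme, version 1, object type, 40 hexadecimal digits.
CORE_PATTERN = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


class Swhid(NamedTuple):
    object_type: str
    object_id: str
    qualifiers: dict


def parse_swhid(text):
    """Split a SWHID into object type, object id and qualifiers."""
    core, *parts = text.strip().strip("<>").split(";")
    match = CORE_PATTERN.fullmatch(core)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {core!r}")
    qualifiers = dict(part.split("=", 1) for part in parts)
    return Swhid(match.group(1), match.group(2), qualifiers)


def permalink(text):
    """Return the archive URL that resolves the given SWHID."""
    return "https://archive.softwareheritage.org/" + text.strip().strip("<>")


if __name__ == "__main__":
    fragment = (
        "swh:1:cnt:d5214ff9562a1fe78db51944506ba48c20de3379;"
        "origin=https://gitorious.org/parmap/parmap.git;"
        "visit=swh:1:snp:78209702559384ee1b5586df13eca84a5123aa82;"
        "anchor=swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;"
        "path=/parmap.ml;lines=101-143"
    )
    parsed = parse_swhid(fragment)
    print(parsed.object_type, parsed.qualifiers["path"], parsed.qualifiers["lines"])
    # prints cnt /parmap.ml 101-143
```

Here the core identifier pins down the file content, while the `path` and `lines` qualifiers record where the selected fragment sits.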


Citing software from the archive

If your project includes a codemeta.json or citation.cff file, you’ll see a special citation sidebar appear just below the permalinks. Clicking this will reveal a tab with a fully formatted citation, ready for your publications. Currently, this is provided in BibTeX format, perfect for LaTeX users working with the biblatex-software package (described below). 

For LaTeX users

If you use LaTeX for your documents, then you’ll love the biblatex-software package: it makes producing clean bibliographic entries for software a breeze, and it has native SWHID support built right in.

biblatex-software is integrated into CTAN and TeXLive, and it works out-of-the-box in Overleaf (here’s a template for the official ACM article style). As of April 2022, it’s also directly integrated into the ACM article style itself.


Below, you’ll find an example of what’s possible, extracted from the article “Archiving and referencing source code with Software Heritage”, ICMS 2020. Feel free to click the links to see it in action. For more details, watch the tutorial below, and check out the documentation to customize it.


References

[1] [SW] Roberto Di Cosmo and Marco Danelutto, The Parmap library, 2012. University Paris Diderot and University of Pisa. LIC: LGPL-2.0. URL: https://rdicosmo.github.io/parmap/

[2] [SW Rel.] Roberto Di Cosmo and Marco Danelutto, The Parmap library version 0.9.8, 2012. University Paris Diderot and University of Pisa. LIC: LGPL-2.0. SWHID: <swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;origin=https://gitorious.org/parmap/parmap.git;visit=swh:1:snp:78209702559384ee1b5586df13eca84a5123aa82>

[3] [SW exc.] Roberto Di Cosmo and Marco Danelutto, “Core mapping routine”, from The Parmap library version 0.9.8, 2012. University Paris Diderot and University of Pisa. LIC: LGPL-2.0. SWHID: <swh:1:cnt:d5214ff9562a1fe78db51944506ba48c20de3379;origin=https://gitorious.org/parmap/parmap.git;visit=swh:1:snp:78209702559384ee1b5586df13eca84a5123aa82;anchor=swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;path=/parmap.ml;lines=101-143>