Publications

Acknowledging and Referencing Software Heritage

Below you will find a list of relevant scientific publications produced as part of our mission.

If your scientific work benefited from Software Heritage, we encourage you to acknowledge it in your publications. The preferred way to do so is to: (1) add a footnote to the title page of your papers like this: “This work was made possible by Software Heritage, the great library of source code: https://www.softwareheritage.org” and (2) cite at least one of the iPRES 2017 and CACM 2018 papers (from the list below) in the References section of your scientific publications.

Publication policy

We are committed to Open Access, and we strive to make all publications funded by or for Software Heritage openly available, if possible under a CC-BY-4.0 license. When needed, we make a copy of the (pre)publication available through links on this page.

Publications

2024

Ludovic Courtès, Timothy Sample, Simon Tournier, Stefano Zacchiroli

Source Code Archiving to the Rescue of Reproducible Deployment Proceedings Article

In: 2024 ACM Conference on Reproducibility and Replicability, 10 pages, ACM, 2024.

Tommaso Fontana, Sebastiano Vigna, Stefano Zacchiroli

WebGraph: The Next Generation (Is in Rust) Proceedings Article

In: Companion Proceedings of the ACM Web Conference 2024 (WWW '24 Companion), pp. 686-689, ACM, 2024.

Annalí Casanueva, Davide Rossi, Stefano Zacchiroli, Théo Zimmermann

The Impact of the COVID-19 Pandemic on Women’s Contribution to Public Code Journal Article

In: Empirical Software Engineering, 2024.

2023

Mathilde Fichen, Morane Gruenpeter, Jérémy Bobbio, Sabrina Granger, Roberto Di Cosmo, Jean-François Abramatic, Isabelle Astic, Emmanuelle Bermès, Camille Françoise, Claude Gomez, Wendy Hagenmaier, Grégory Miura, Carlo Montangero, Simon Phipps, Kenneth Seals-Nutt

SWHAP Workshop, September 14th and 15th, 2023 Proceedings

HAL, 2023.

Valentin Lorentz, Roberto Di Cosmo, Stefano Zacchiroli

The Popular Content Filenames Dataset: Deriving Most Likely Filenames from the Software Heritage Archive Unpublished

2023, (working paper or preprint).

Romain Lefeuvre, Jessie Galasso, Benoit Combemale, Houari Sahraoui, Stefano Zacchiroli

Fingerprinting and Building Large Reproducible Datasets Proceedings Article

In: 2023 ACM Conference on Reproducibility and Replicability, pp. 27-36, ACM, 2023.

Jesus M. Gonzalez-Barahona, Sergio Montes-Leon, Gregorio Robles, Stefano Zacchiroli

The Software Heritage License Dataset (2022 Edition) Journal Article

In: Empirical Software Engineering, 2023, ISSN: 1382-3256.

Roberto Di Cosmo, Stefano Zacchiroli

The Software Heritage Open Science Ecosystem Book Chapter

In: Mens, Tom; De Roover, Coen; Cleve, Anthony (Eds.): Software Ecosystems: Tooling and Analytics, pp. 33–61, Springer International Publishing, Cham, 2023, ISBN: 978-3-031-36060-2.

2022

Roberto Di Cosmo

Code Source Book Section

In: Dictionnaire du Numérique, February 2022.

Kevin Wellenzohn, Michael H. Böhlen, Sven Helmer, Antoine Pietri, Stefano Zacchiroli

Robust and Scalable Content-and-Structure Indexing Journal Article

In: The VLDB Journal, 2022, ISSN: 1066-8888.

Davide Rossi, Stefano Zacchiroli

Worldwide Gender Differences in Public Code Contributions (and How They Have Been Affected by the COVID-19 Pandemic) Proceedings Article

In: 44th International Conference on Software Engineering (ICSE 2022) – Software Engineering in Society (SEIS) Track, pp. 172-183, ACM, 2022.

Stefano Zacchiroli

A Large-scale Dataset of (Open Source) License Text Variants Proceedings Article

In: The 2022 Mining Software Repositories Conference (MSR 2022), pp. 757-761, ACM, 2022.

Davide Rossi, Stefano Zacchiroli

Geographic Diversity in Public Code Contributions: An Exploratory Large-Scale Study Over 50 Years Proceedings Article

In: The 2022 Mining Software Repositories Conference (MSR 2022), pp. 80-85, ACM, 2022.

Daniele Serafini, Stefano Zacchiroli

Efficient Prior Publication Identification for Open Source Code Proceedings Article

In: 18th International Conference on Open Source Systems (OSS 2022), ACM, 2022.

Roberto Di Cosmo

Should We Preserve the World's Software History, And Can We? Proceedings Article

In: Silvello, Gianmaria; Corcho, Óscar; Manghi, Paolo; Di Nunzio, Giorgio Maria; Golub, Koraljka; Ferro, Nicola; Poggi, Antonella (Eds.): Linking Theory and Practice of Digital Libraries – 26th International Conference on Theory and Practice of Digital Libraries, TPDL 2022, Padua, Italy, September 20-23, 2022, Proceedings, pp. 3–7, Springer, 2022.

Roberto Di Cosmo

Building the software pillar of Open Science Proceedings Article

In: Open Science European Conference (OSEC 2022), pp. 183–193, OpenEdition Press, 2022, ISBN: 9791036545627.

2021

Antoine Pietri

Organizing the graph of public software development for large-scale mining PhD Thesis

Université Paris Cité, 2021.

Morane Gruenpeter, Roberto Di Cosmo, Katherine Thornton, Kenneth Seals-Nutt, Carlo Montangero, Guido Scatena

Software Stories for landmark legacy code Technical Report

Inria, 2021.

Laura Bussi, Roberto Di Cosmo, Carlo Montangero, Guido Scatena

Preserving landmark legacy software with the Software Heritage Acquisition Process Proceedings Article

In: iPRES 2021 – 17th International Conference on Digital Preservation, Beijing, China, 2021.

Stefano Zacchiroli

Gender Differences in Public Code Contributions: a 50-year Perspective Journal Article

In: IEEE Software, 2021, ISSN: 0740-7459.

Thibault Allançon, Antoine Pietri, Stefano Zacchiroli

The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development Proceedings Article

In: ICSE 2021: The 43rd International Conference on Software Engineering, pp. 45-48, IEEE, 2021.

2020

Morane Gruenpeter, Roberto Di Cosmo, Alice Allen, Anita Bandrowski, Peter Chan, Martin Fenner, Leyla Garcia, Catherine M Jones, Daniel S Katz, John Kunze, Moritz Schubotz, Ilian T Todorov

Use cases and identifier schemes for persistent software source code identification Technical Report

2020, (Output from the Research Data Alliance/FORCE11 Software Source Code Identification Working group).

Morane Gruenpeter, Roberto Di Cosmo, Hylke Koers, Patricia Herterich, Rob Hooft, Jessica Parland-von Essen, Jonas Tana, Tero Aalto, Sarah Jones

M2.15 Assessment report on 'FAIRness of software' Miscellaneous

2020.

Roberto Di Cosmo

Archiving and Referencing Source Code with Software Heritage Proceedings Article

In: ICMS, pp. 362–373, Springer, 2020, ISBN: 978-3-030-52200-1.

Guillaume Rousseau, Roberto Di Cosmo, Stefano Zacchiroli

Software provenance tracking at the scale of public source code Journal Article

In: Empirical Software Engineering, pp. 1-30, 2020, ISSN: 1573-7616.

Antoine Pietri, Guillaume Rousseau, Stefano Zacchiroli

Determining the Intrinsic Structure of Public Software Development History Proceedings Article

In: MSR 2020: The 17th International Conference on Mining Software Repositories, pp. 602-605, IEEE, 2020.

Antoine Pietri, Guillaume Rousseau, Stefano Zacchiroli

Forking Without Clicking: on How to Identify Software Repository Forks Proceedings Article

In: MSR 2020: The 17th International Conference on Mining Software Repositories, pp. 277-287, IEEE, 2020.

Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli

The Software Heritage Graph Dataset: Large-scale Analysis of Public Software Development History Proceedings Article

In: MSR 2020: The 17th International Conference on Mining Software Repositories, pp. 1-5, IEEE, 2020.

Roberto Di Cosmo, Marco Danelutto

[Rp] Reproducing and replicating the OCamlP3l experiment Journal Article

In: ReScience C, vol. 6, no. 1, 2020.

Paolo Boldi, Antoine Pietri, Sebastiano Vigna, Stefano Zacchiroli

Ultra-Large-Scale Repository Analysis via Graph Compression Proceedings Article

In: SANER 2020: The 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, pp. 184-194, IEEE, 2020.

Pierre Alliez, Roberto Di Cosmo, Benjamin Guedj, Alain Girault, Mohand-Said Hacid, Arnaud Legrand, Nicolas Rougier

Attributing and Referencing (Research) Software: Best Practices and Outlook From Inria Journal Article

In: Computing in Science & Engineering, vol. 22, no. 1, pp. 39-52, 2020, ISSN: 1558-366X.

Roberto Di Cosmo, Jose Benito Gonzalez Lopez, Jean-François Abramatic, Kay Graf, Miguel Colom, Paolo Manghi, Melissa Harrison, Yannick Barborini, Ville Tenhunen, Michael Wagner, Wolfgang Dalitz, Jason Maassen, Carlos Martinez-Ortiz, Elisabetta Ronchieri, Sam Yates, Moritz Schubotz, Leonardo Candela, Martin Fenner, Eric Jeangirard

Scholarly Infrastructures for Research Software Book

European Commission, Directorate General for Research and Innovation, 2020, ISBN: 978-92-76-25568-0.

Roberto Di Cosmo

Announcing biblatex-software Journal Article

In: ACM SIGSOFT Software Engineering Notes, vol. 45, no. 4, pp. 22–23, 2020.

Roberto Di Cosmo, Morane Gruenpeter, Bruno Marmol, Alain Monteil, Laurent Romary, Jozefina Sadowska

Curated Archiving of Research Software Artifacts: Lessons Learned from the French Open Archive (HAL) Journal Article

In: International Journal of Digital Curation, vol. 15, no. 1, pp. 16, 2020.

Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli

Referencing Source Code Artifacts: a Separate Concern in Software Citation Journal Article

In: Computing in Science & Engineering, 2020, ISSN: 1521-9615.

2019

Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli

The Software Heritage Graph Dataset: Public software development under one roof Proceedings Article

In: Proceedings of the 16th International Conference on Mining Software Repositories, pp. 138-142, IEEE Press, 2019.

Mélanie Clément-Fontaine, Roberto Di Cosmo, Bastien Guerry, Patrick Moreau, François Pellegrini

Encouraging a wider usage of software derived from research Online

2019, (Position paper of the software working group of the French National Council for Open Science).

2018

Antoine Pietri, Stefano Zacchiroli

Towards Universal Software Evolution Analysis Proceedings Article

In: BENEVOL 2018: The 17th Belgium-Netherlands Software Evolution Workshop, pp. 6-10, 2018, ISSN: 1613-0073.

Jean-François Abramatic, Roberto Di Cosmo, Stefano Zacchiroli

Building the Universal Archive of Source Code Journal Article

In: Communications of the ACM, vol. 61, no. 10, pp. 29-31, 2018, ISSN: 0001-0782.

Roberto Di Cosmo, Morane Gruenpeter, Stefano Zacchiroli

Identifiers for Digital Objects: the Case of Software Source Code Preservation Proceedings Article

In: iPRES 2018 – 15th International Conference on Digital Preservation, 2018.

Yannick Barborini, Roberto Di Cosmo, Antoine R. Dumont, Morane Gruenpeter, Bruno P. Marmol, Alain Monteil, Jozefina Sadowska, Stefano Zacchiroli

The creation of a new type of scientific deposit: Software Miscellaneous

RDA Eleventh Plenary Meeting, Berlin, Germany, 2018, (poster).

Yannick Barborini, Roberto Di Cosmo, Antoine R. Dumont, Morane Gruenpeter, Bruno P. Marmol, Alain Monteil, Jozefina Sadowska, Stefano Zacchiroli

La création du nouveau type de dépôt scientifique – Le logiciel Miscellaneous

JSO 2018 – 7es journées Science Ouverte Couperin : 100 % open access : initiatives pour une transition réussie, 2018, (poster).

2017

Roberto Di Cosmo, Stefano Zacchiroli

Software Heritage: Why and How to Preserve Software Source Code Proceedings Article

In: iPRES 2017: 14th International Conference on Digital Preservation, Kyoto, Japan, 2017.
