March 21, 2024

The 2024 Software Heritage Community Workshop

On January 31st, 2024, members of the Software Heritage community came together at the Inria research center in Paris, France. This gathering, a prelude to the 2024 Symposium, was a chance for people from various backgrounds (developers, curators, librarians, open source advocates, and researchers from around the world) to share ideas and work together towards a common goal, like an orchestra in which every instrument contributes to the melody. In this blog post, we’ll give you a brief overview of what happened at the workshop and highlight some of the key insights that were shared. This was the third edition of the workshop and the first to offer a full day of sessions, including several in an unconference style.

 

Software Heritage community workshop | © Photo Personal archive

🤿 Dive directly into the pragmatic results of this year’s workshop!

Draw me a community: the community event lineup

When developers, curators, librarians, open-source supporters and researchers from all over the world meet, what do they talk about?

Well, they talk about software preservation and the many facets of this gigantic task. The goals discussed included opening the community to people with skills beyond computer science, establishing global recognition of the Software Heritage Archive as a common infrastructure, forming special interest groups with clear objectives and timelines, and steadily improving the archive’s outreach and usability for a worldwide audience.

The workshop was facilitated by Mélissa from La Dérivation, who expertly conducted the day’s activities, ensuring every voice was heard and every insight shared. Behind the scenes, Sabrina Granger, Software Heritage’s Open Science Community Manager, coordinated the ambassadors who contributed sessions and activities. The entire Software Heritage team took part in the workshop and welcomed the opportunity to connect with their community.

Beyond the infrastructure, a vibrant community

Software Heritage’s major asset, beyond the innovative and mutualised infrastructure, is its vibrant community, a diverse group dedicated to preserving and sharing software. Their collective efforts are crucial in maintaining software as an essential part of cultural and scientific heritage.

The main aims of the workshop were to care for the community and give it a place to collaborate, so that everyone keeps working together and stays engaged. It also sought to increase involvement and improve how the team communicates with partners and contributors. This two-way strategy was designed to support growth and to keep a close eye on the community’s progress and unity.

The menu of the community event

The morning parallel sessions included:

  • Discuss arguments to engage people outside the open software community with the advantages of openness: Led by Claudia Bauzer Medeiros, the session participants explored strategies to broaden participation in and support for Software Heritage. These strategies included identifying key influencers, such as ambassadors and leaders, and tailoring arguments to resonate with various groups, including universities and researchers. The session highlighted the power of storytelling to illustrate the benefits of openness and considered the needs and profiles of the target audience, from software creators to end users and data practitioners. The discussion concluded with the importance of preparing and disseminating engaging materials and training documentation to communicate the value of software archiving effectively.
  • Making interest groups a reality: Led by Nicolas Dandrimont, the participants tackled the challenge of formalizing interest groups within the Software Heritage community. They acknowledged existing communication obstacles, such as underutilized mailing lists and the need for better engagement platforms, and suggested that forums could help overcome these issues by fostering more dynamic interactions. They also emphasized the importance of in-person and time-boxed events, proposing goal-oriented gatherings, both online and offline, that fit the busy schedules of community members.
  • Metadata and CodeMeta: Led by Alain Monteil, this session covered enhancing software visibility, citation, evaluation, and discoverability, alongside discussions on tools for CodeMeta integration and source code extraction, their limitations, and the generation of interoperable schemes.
  • Hurdles of contributing code: Led by Pierre-Yves David, the participants addressed challenges faced by contributors, including managing cross-module changes, the necessity for comprehensive integration testing with continuous integration systems, creating detailed documentation for deployment, and improving interaction with the core team. Strategies such as the use of frequent pings to maintain communication were discussed to streamline the contribution process.
  • Relations between Software Heritage and HAL, the French national archive: Led by Bertrand Néron and Pierre Poulain, this session focused on how academic papers and software archiving interact. It showed how HAL supports academic visibility through search engines, Google Scholar, and its integration with the OpenAIRE graph, while also facilitating software citation. The discussion also highlighted how software can be archived in Software Heritage, with SWHIDs used to reference this software in HAL, emphasizing the practical steps to link software preservation with academic contributions.
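
To make the SWHIDs mentioned in the HAL session more concrete, here is a minimal sketch of how the identifier of a single file's content can be computed locally. It assumes the content-object form of the identifier (`swh:1:cnt:` followed by the Git-style blob hash of the bytes). The function name `content_swhid` is ours, for illustration only. Identifiers for directories, revisions, releases, and snapshots follow other rules that are not shown here.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file's content (illustrative helper, not an official API).

    The hash is the same one Git uses for blobs: SHA-1 over the header
    "blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


# The same bytes always give the same identifier, wherever the file is hosted.
print(content_swhid(b"hello world\n"))
# swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the identifier is derived from the content alone, a reference stored in a HAL record can be checked against the archived file without trusting any particular hosting platform.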

The afternoon parallel sessions, prepared by ambassadors and team members:

  • Developing and Enriching Software Heritage’s Wikipedia Pages: Led by Cécile Arènes and Océane Valencia, this session had participants work on improving the visibility and accuracy of Software Heritage’s coverage on Wikipedia. Their contributions to the platform help increase awareness and understanding of Software Heritage’s mission.
    👉 More details.
  • Defining Use Cases for Industry to Create Collaterals: Led by Agustín Benito Bethencourt, this workshop focused on identifying Software Heritage services and concepts appealing to the industry. Participants outlined key use cases that could help integrate Software Heritage more deeply into industrial practices, such as license compliance, certification, security (vulnerabilities), maintenance stakes, and software handover (supply chain and liability).
  • Using Software Heritage to Harvest Institution’s Entries: Led by Violaine Louvet, Elias Chetouane and Valentin Lorentz, who guided participants through the process of using the Software Heritage Archive to catalogue software developed by a given institution. This session highlighted the potential of Software Heritage as a tool for institutional archiving and discovery. The solutions discussed were: using the HAL API to find codes on HAL whose authors are affiliated with UGA (Université Grenoble Alpes); using the SWH API to identify codes from UGA’s institutional software forge; and exploring metadata from “codemeta.json” files via Software Heritage, although this currently requires assistance from Software Heritage staff for data updates. A more comprehensive approach was also suggested: retrieving README.md files from projects archived in Software Heritage and analysing them for text-based metadata with tools like SOMEF. The consensus was that no single method captures all of an institution’s codes. Participants recommended combining these strategies and continuing to promote best practices such as including “codemeta.json” in repositories, archiving code with Software Heritage, and using institutional forges for software development.
    👉 More details.
  • Software Heritage for Editorial Offices: Led by Pierre Poulain, this session discussed the unique benefits of Software Heritage for editorial offices, emphasizing its role in preserving and citing software in academic publishing. The participants crafted persuasive messages encouraging editorial boards to archive software and reference it with SWHIDs. The session tackled the common misconception that data and software are equivalent, underlining the unique features and requirements of software archiving. The aim was to promote a deeper understanding and wider adoption of SWHIDs in academic publishing, ensuring software’s role in research is accurately recognized and preserved.
    👉 More details.

  • Helpdesk: Supporting the Software Heritage Community: Led by Lunar, this session discussed the difficulties of supporting Software Heritage users, such as finding the appropriate channel for inquiries, underuse of the swh-users mailing list, and knowledge being spread across different platforms without a centralized system for user support. A key observation was that the documentation does not cater specifically enough to different user types, leading to confusion and scattered information. A proposed guiding principle was to treat every helpdesk email as a sign of a broader issue, whether a software bug, a UX/design flaw, or a gap in the documentation. However, this approach has limitations, especially for requests related to access management. The session concluded that while some queries relate directly to software or operational matters, others do not, underscoring the need for a structured support process and clearer guidance for users on where and how to seek help.
    👉 More details.
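
The harvesting session’s conclusion, that several partial methods should be combined, can be sketched in a few lines. The example below is only an illustration under stated assumptions. It takes “codemeta.json” documents that have already been retrieved, assumes authors carry a schema.org-style `affiliation` (either a string or an object with a `name`), and merges the repository URLs found by each method. The function names, the sample record, and “Example University” are all hypothetical, and no HAL or Software Heritage API call is shown.

```python
def _as_list(value):
    """Wrap a JSON value in a list unless it already is one (None becomes [])."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def affiliated_authors(codemeta: dict, institution: str) -> list[str]:
    """Names of authors whose affiliation mentions `institution` (case-insensitive)."""
    needle = institution.casefold()
    matches = []
    for author in _as_list(codemeta.get("author")):
        if not isinstance(author, dict):
            continue
        for affiliation in _as_list(author.get("affiliation")):
            # An affiliation may be a plain string or an Organization object.
            if isinstance(affiliation, dict):
                label = str(affiliation.get("name", ""))
            else:
                label = str(affiliation)
            if needle in label.casefold():
                parts = [author.get("givenName"), author.get("familyName")]
                full_name = " ".join(p for p in parts if p) or author.get("name", "")
                matches.append(full_name)
                break  # one matching affiliation is enough for this author
    return matches


def combine_sources(*url_collections) -> list[str]:
    """Union of the repository URLs found by each method, without duplicates."""
    combined = set()
    for urls in url_collections:
        combined.update(urls)
    return sorted(combined)


# Hypothetical codemeta.json content, for illustration only.
example = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-solver",
    "author": [
        {"@type": "Person", "givenName": "Ada", "familyName": "Example",
         "affiliation": {"@type": "Organization", "name": "Example University"}},
        {"@type": "Person", "givenName": "Bob", "familyName": "Sample",
         "affiliation": "Another Lab"},
    ],
}

print(affiliated_authors(example, "example university"))
# ['Ada Example']
```

In practice, each method (HAL search, forge listing, codemeta.json, README analysis) would produce its own list of URLs, and `combine_sources` would merge them into a single inventory for manual review.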

The second round of afternoon parallel sessions included:

  • Using Software Heritage for Research Purposes: Led by Romain Lefeuvre, this session addressed how researchers can use Software Heritage for various research objectives. It highlighted Software Heritage’s utility for big data analysis, machine learning training and testing, and software engineering studies, including archiving scientific artifacts and using SWHIDs for immutable citation. The discussion covered querying the archive’s graph metadata and file content for pattern extraction, and creating specialized datasets for domain-specific research. Challenges such as the scalability of code queries and the complexity of using the swh-graph APIs were acknowledged. Proposed solutions included automated deployment scripts for running swh-graph on external resources and a more abstract query language to simplify graph interactions.
  • Software Heritage Scanner Use Cases: Led by Pierre-Yves David, this session explored the versatile applications of the Software Heritage Scanner in research and beyond. Key discussions included its role in conducting audits, verifying code modifications, and enhancing software transparency by identifying open-source code that has been incorrectly sold as proprietary. Participants examined the scanner’s utility in generating software bills of materials, even for modified code, and in assessing software metrics to ensure the preservation of essential licensing information. The session also covered the scanner’s capabilities in long-term preservation checks, identifying outdated components, detecting inconsistently applied patches, and integrating with continuous integration/continuous deployment (CI/CD) processes.
  • Community Activities & Events: Led by Morane Gruenpeter, this session had participants explore a variety of ideas for increasing engagement and collaboration within the Software Heritage community. Discussions included the formation of Special Interest Groups on various topics. The participants proposed numerous activities to enhance community interaction, such as sprints focused on specific tasks, meetups to engage with other communities, and participation in scientific events to spread awareness of Software Heritage. They also highlighted the importance of informal events for gathering feedback, thematic cafes and apéros to encourage discussions among researchers and documentalists, and regular virtual meetings for strategic planning.

These sessions demonstrated the collaborative spirit of the Software Heritage community, and each one contributed practical outcomes.

Join us

As the day concluded, we reflected on the insights shared. The 2024 Software Heritage Community Workshop was more than just a meeting of minds; it was a testament to the strength and vibrancy of our community.

We invite you to join this ongoing symphony. Whether you’re a developer, researcher, student, or simply passionate about preserving our digital heritage, there’s a place for you here. Explore the ways to get involved through our community pages.

Together, we can ensure the legacy of software, in all its forms, is preserved for generations to come. The 2024 workshop may have ended, but our collective work continues. Join us, and be part of a movement that recognizes software as a cornerstone of our cultural and scientific heritage.

 
