August 30, 2024

New Features in the CodeMeta Generator

At Software Heritage, the universal source code archive, metadata is the software’s identity card. It provides information that can be used to identify, describe and curate software. Accurate metadata is crucial for helping users discover software projects among the millions we archive.

However, there are many different ways of describing software and capturing this metadata. To ensure uniformity and consistency in descriptive metadata, Software Heritage has adopted the CodeMeta format, as discussed in this article. The CodeMeta vocabulary is used when indexing metadata, and we recommend including a codemeta.json file in every research software repository to make its metadata machine-readable and easily discoverable.

We recognize the immense value of effectively describing software projects, and we are excited to share the latest developments in the CodeMeta vocabulary and the CodeMeta generator tool.

A recap: What is CodeMeta?

There are numerous metadata vocabularies for describing software projects. CodeMeta addresses this complexity by providing a standardized “Rosetta stone” for translating between different vocabularies. The vocabulary is an extension of Schema.org. CodeMeta allows software metadata to be represented in a consistent JSON format, known as codemeta.json.
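To make the format concrete, here is a minimal sketch of a codemeta.json file built in Python. The project name, author and repository URL are placeholders invented for illustration, and the set of properties shown is only a small sample of the vocabulary, not a required or complete list.

```python
import json

# Assumption: every value below is a placeholder, not a real project.
metadata = {
    "@context": "https://w3id.org/codemeta/3.0",
    "@type": "SoftwareSourceCode",
    "name": "example-tool",
    "description": "A hypothetical command-line tool used for illustration.",
    "codeRepository": "https://example.org/example-tool.git",
    "programmingLanguage": "Python",
    "license": "https://spdx.org/licenses/MIT",
    "author": [
        {"@type": "Person", "givenName": "Ada", "familyName": "Example"}
    ],
}

# Serialize to the JSON text that would be saved as codemeta.json.
codemeta_text = json.dumps(metadata, indent=2, ensure_ascii=False)
print(codemeta_text)
```

Saving the printed text as codemeta.json at the root of a repository is enough for tools that understand the vocabulary to read the description.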

From CodeMeta v2.0 to CodeMeta v3.0

The CodeMeta description format is constantly evolving to meet the needs of the research software ecosystem and of scholarly infrastructure users as closely as possible. Version 3.0 was published in 2023, adding the following new vocabulary elements:

  • review, which lets you attach review information to the software, using reviewAspect and reviewBody.
  • role, which, associated with an author or a contributor, lets you define the function (roleName) that this person has held, and for what period of time (startDate and endDate).
  • hasSourceCode, which links a software application to the source code it is built from.
  • isSourceCodeOf, which links source code to the software application built from it. This is the reverse property of hasSourceCode.
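The new terms listed above could appear in a description roughly as follows. This is an illustrative sketch only: all values are placeholders, and the exact nesting (in particular a Role entry that points back to a Person through an identifier) is an assumption that should be checked against the CodeMeta v3.0 documentation or against a file produced by the generator.

```python
import json

# Hypothetical fragment: values are placeholders and the nesting is an
# assumption, not the canonical CodeMeta v3.0 layout.
fragment = {
    "@context": "https://w3id.org/codemeta/3.0",
    "@type": "SoftwareSourceCode",
    "name": "example-tool",
    # review: what was assessed (reviewAspect) and the assessment (reviewBody).
    "review": {
        "@type": "Review",
        "reviewAspect": "Reproducibility",
        "reviewBody": "Builds and runs from a clean checkout.",
    },
    "author": [
        {
            "@type": "Person",
            "@id": "_:author_1",
            "givenName": "Ada",
            "familyName": "Example",
        },
        # role: the function held by the person above, and for how long.
        {
            "@type": "Role",
            "schema:author": "_:author_1",
            "roleName": "Maintainer",
            "startDate": "2021-01-01",
            "endDate": "2023-06-30",
        },
    ],
    # isSourceCodeOf: the application built from this source code.
    "isSourceCodeOf": {"@type": "SoftwareApplication", "name": "Example Tool"},
}

# Collect the role entries to show how they sit next to the persons.
roles = [entry for entry in fragment["author"] if entry["@type"] == "Role"]
print(json.dumps(roles, indent=2))
```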

Some properties were also renamed for clarity:

  • contIntegration became continuousIntegration
  • embargoDate became embargoEndDate

Just as there are translation files between different metadata description formats, there is a translation file from v2.0 to v3.0.
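For the two renames listed above, the upgrade can be sketched in a few lines of Python. This is not the official translation file: the function name upgrade_to_v3 is hypothetical, it only handles top-level properties, and it assumes that switching the @context to the v3.0 URL is the only other change needed.

```python
# Properties renamed between CodeMeta v2.0 and v3.0 (from the list above).
RENAMED_PROPERTIES = {
    "contIntegration": "continuousIntegration",
    "embargoDate": "embargoEndDate",
}

# Assumption: the v3.0 context URL.
V3_CONTEXT = "https://w3id.org/codemeta/3.0"


def upgrade_to_v3(doc: dict) -> dict:
    """Return a copy of a v2.0 document with top-level keys renamed for v3.0."""
    upgraded = {}
    for key, value in doc.items():
        # Use the new name when the key was renamed, else keep it unchanged.
        upgraded[RENAMED_PROPERTIES.get(key, key)] = value
    upgraded["@context"] = V3_CONTEXT
    return upgraded


old = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "name": "example-tool",
    "contIntegration": "https://example.org/ci",
    "embargoDate": "2025-01-01",
}
new = upgrade_to_v3(old)
print(sorted(new))
```

The input document is left untouched, so the v2.0 and v3.0 versions can be compared side by side.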

An example of a CodeMeta v3.0 file is available for the codemeta project itself.

Features of the CodeMeta generator

Software Heritage maintains the CodeMeta generator, a tool that helps users create a codemeta.json file. It consists of a simple form that users fill in to generate a valid file. This file can then be added at the root of the software code repository.

Additionally, the generator now supports creating codemeta.json files in both v2.0 and v3.0 formats, and the form has been redesigned to include new functionalities, such as the Review box and role management.

The form has been reorganized to include the new Review box.

The new Review box

You can now add roles to authors and contributors.

The new Role functionality

Values for the License(s) field are suggested and auto-completed from the SPDX license list.

The SPDX license list suggestions

That is not all: you can import an existing codemeta.json file (in v2.0 or v3.0). Paste the file into the codemeta.json text area, and the form is updated with the values it contains.
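When handling both versions in your own scripts, one simple way to tell them apart is to look at the @context value. The sketch below is an assumption about how this could be done, not a description of how the generator works: the function name detect_codemeta_version is hypothetical, and it relies on the context URLs ending in codemeta-2.0 and codemeta/3.0.

```python
import json


def detect_codemeta_version(text: str) -> str:
    """Guess the CodeMeta version of a codemeta.json text from its @context."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        return "unknown"
    context = doc.get("@context", "")
    # @context may be a single URL or a list of entries; handle both.
    contexts = context if isinstance(context, list) else [context]
    for entry in contexts:
        if not isinstance(entry, str):
            continue
        url = entry.rstrip("/")
        if url.endswith("codemeta/3.0"):
            return "3.0"
        if url.endswith("codemeta-2.0"):
            return "2.0"
    return "unknown"


sample = '{"@context": "https://w3id.org/codemeta/3.0", "name": "example-tool"}'
print(detect_codemeta_version(sample))
```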

Finally, as a cherry on top, you can download the file directly from the tool!

CodeMeta generator actions

Metadata for citation

With the latest advancements in software metadata, you will soon have the ability to cite source code directly from the Software Heritage archive in BibTeX format, making it easier to reference software in academic work.

Get Involved with CodeMeta!

The CodeMeta project is an evolving, community-driven initiative. We invite developers, researchers, and enthusiasts to join discussions, suggest changes, or contribute Pull Requests directly to the CodeMeta generator on GitHub. By collaborating, we can continue to improve the tool and advance software metadata practices worldwide.

Stay updated on the latest developments by following our contributions and announcements!
