What is the best way to identify software? Introducing SWHID

Introduction

This is the first article in a series exploring SWHID, an open standard for identifying software artifacts. The series will cover what SWHID is, how it works, and how it can be applied in practice.

This first article sets the context. It explains why precise identification of software artifacts is becoming a key requirement for organizations working with complex software supply chains. It also introduces the difference between two types of identifiers, intrinsic and extrinsic, as a foundation for understanding what SWHID is and why it matters.

Industry Maturity, Regulation, and the Need for Software Identifiers

The software industry is maturing. One clear signal is the speed at which regulations are appearing. For many years, software development moved faster than regulation. Today that is changing.

Software is no longer only a technical matter. It is now part of critical infrastructure, public services, and industrial systems. As a result, concepts such as liability, security, and privacy are no longer only technical concerns. They are also legal responsibilities.

One visible consequence of this change is the growing importance of Software Bills of Materials, or SBOMs. Their purpose is simple: they help clarify who produced which part of a software system. This need exists because modern software is rarely written by a single team. Most systems are assembled from components developed by many independent organizations, with open source software playing a central role. A single application can depend on hundreds or even thousands of components. As a result, organizations often run software that they did not write and do not fully control.

Managing this complexity requires visibility: organizations need to understand what code is present in their systems and where that code comes from. This is why traceability is becoming a key requirement. Regulations such as the European Cyber Resilience Act are accelerating this trend and pushing companies to document their software supply chains more carefully.

Traceability always starts with identification. Before you can track a component, you must be able to identify it unambiguously. In industries such as rail, aerospace, or manufacturing, this problem was solved long ago. Every component has a unique identifier that allows engineers and auditors to trace it across its full lifecycle. Software still lacks a widely adopted equivalent. We often identify software through project names, version numbers, or repository URLs, but these can change, collide, or disappear, so they are not always precise or reliable enough for traceability in large supply chains.

Two Conversations That Inspired This Article

In early March 2026, as part of my work as a Software Heritage (SwH) Ambassador, I had the opportunity to present the concept of SWHID to two different audiences. Both groups are working on topics related to software supply chains and regulatory compliance.

The first talk took place at a CRA-Mondays meeting, a series organized by the Eclipse Foundation’s Open Regulatory Compliance Working Group. The participants are mostly interested in understanding how the Cyber Resilience Act should be interpreted and applied to open source software and to companies that build products with it.

The second presentation was delivered to the OpenChain Telco Working Group. This group is responsible for the OpenChain SBOM Telco Guide. The guide defines how SBOMs should be structured in the telecommunications industry using the SPDX format. It has also been used as a reference by other industry groups working on similar problems.

Although the two audiences were different, they shared a common concern. Both groups are dealing with increasingly complex software supply chains, where documenting and managing software components is difficult. In these environments, precise identification of software artifacts is essential for compliance, traceability, and communication between organizations.

In both presentations, I started by explaining the basic ideas behind SWHID. After that, I discussed several practical use cases that could be relevant for each group. Very quickly, the discussions moved to a more fundamental question: how should we identify software artifacts in a reliable way?

To answer that question, we first need to understand the difference between two types of identifiers: intrinsic identifiers and extrinsic identifiers.

Defining Identifiers: Intrinsic vs. Extrinsic

The two types differ in where the identifier comes from: the object itself, or a party outside it.

An intrinsic identifier is a unique marker derived directly from the natural properties of an object. It is something the object is or possesses by its very nature, and it exists independently of any external system. Familiar examples include the chemical elements in the periodic table, DNA, or the growth rings of a tree. In the software world, cryptographic hashes and SWHIDs are examples of intrinsic identifiers.

Intrinsic identifiers are used in many engineering disciplines when long-term traceability is required. They are particularly valuable in distributed ecosystems where no single authority controls the system. Because they are derived from the artifact itself, independent systems can generate the same identifier for the same object. This means that different organizations and tools can refer to the same artifact without any prior coordination between them.

An extrinsic identifier, by contrast, is a unique label assigned to an object by an external authority or system. It is not a natural property of the object itself. Common examples include IP addresses, ISBNs, DOIs, UUIDs, PURLs, and CVEs.
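The contrast can be made concrete with a minimal Python sketch. It is only an illustration: SHA-256 stands in for "an identifier derived from the content", a random UUID stands in for "a label assigned from outside", and the function names `intrinsic_id` and `extrinsic_id` are hypothetical helpers, not part of any standard.

```python
import hashlib
import uuid


def intrinsic_id(content: bytes) -> str:
    """Derive the identifier from the bytes themselves (SHA-256, chosen for illustration)."""
    return hashlib.sha256(content).hexdigest()


def extrinsic_id(content: bytes) -> str:
    """Assign a label that does not depend on the content: a freshly generated UUID."""
    return str(uuid.uuid4())


artifact = b"print('hello')\n"

# Two parties that never coordinated compute the same intrinsic identifier.
print(intrinsic_id(artifact) == intrinsic_id(artifact))

# Two labelling systems handed the same artifact produce different extrinsic identifiers.
print(extrinsic_id(artifact) == extrinsic_id(artifact))
```

The first comparison holds for anyone who has the same bytes. The second does not, which is why extrinsic identifiers need a shared registry before two organizations can be sure they are talking about the same thing.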

What is SWHID?

SWHID stands for Software Hash IDentifier. It is a digital, persistent and standardised intrinsic identifier for software artifacts, and it works by calculating a cryptographic hash directly from the content of the artifact itself. This content-based approach has an important consequence: the same artifact will always produce the same identifier, regardless of where it is stored or who is looking at it.
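As a rough illustration of what "calculating a hash from the content" means, here is a minimal Python sketch for the simplest case, a single file. It assumes, based on my reading of the published specification, that the identifier for file content reuses the hash Git computes for a blob (SHA-1 over a short header plus the bytes). The helper name `content_swhid` is hypothetical, and the sketch does not cover directories, revisions, or any other object type.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Sketch of a content-level SWHID, assuming the Git blob hashing scheme."""
    # Git prefixes the raw bytes with "blob <length>\0" before hashing.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    # "swh:1:cnt:" = scheme, schema version 1, object type "content".
    return "swh:1:cnt:" + digest


# The empty file has a well-known Git blob hash, so the result is easy to check.
print(content_swhid(b""))  # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Nothing in the function depends on a file name, a location, or a registry. Only the bytes matter, which is what makes the identifier reproducible by anyone.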

One of the strengths of SWHID is that it can be applied at multiple levels of software structure. Whether you are working with a small snippet of code, a single file, a directory, a revision, or a full repository snapshot, SWHID can identify it. It works with both source code and binaries. This range makes it suitable for many different situations, from documentation and compliance processes to archival systems.
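The level an identifier refers to is visible in the identifier itself. The following Python sketch reads the object type out of a core SWHID. It is a simplified illustration under stated assumptions: the helper `parse_core_swhid` and the descriptions in `OBJECT_TYPES` are my own informal glosses, the list includes the release type in addition to the levels named above, and the optional qualifiers used to point at a snippet inside a file are deliberately not handled.

```python
import re
from typing import NamedTuple

# Object type tags of a core SWHID, with informal descriptions (my wording).
OBJECT_TYPES = {
    "cnt": "content (the bytes of a single file)",
    "dir": "directory",
    "rev": "revision (a commit)",
    "rel": "release (a tag)",
    "snp": "snapshot (the state of a whole repository)",
}

# Core form only: swh:1:<type>:<40 lowercase hex digits>.
_CORE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


class CoreSwhid(NamedTuple):
    object_type: str
    object_id: str


def parse_core_swhid(text: str) -> CoreSwhid:
    """Split a core SWHID into its object type and hash, or raise ValueError."""
    match = _CORE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a core SWHID: {text!r}")
    return CoreSwhid(match.group(1), match.group(2))


swhid = parse_core_swhid("swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
print(swhid.object_type, "->", OBJECT_TYPES[swhid.object_type])
```

Because the type is part of the identifier, a tool receiving a SWHID can tell whether it points at one file or at a whole repository state without contacting any external service.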

At a high level, the characteristics that make SWHID useful across so many applications can be summarised as: cryptographic integrity, a decentralized design, comprehensiveness, a focus on provenance, and fast execution.

SWHID is an open standard, published as ISO/IEC 18670:2025. Its specification is publicly available and free of charge, released under the Community-Spec-1.0 license, and governed by the SWHID Working Group under an open governance model. More information is available at swhid.org.

Next in This Series

This article has introduced the concept of intrinsic identifiers and the SWHID standard. In the following articles we will explore how they work and how they can be applied in practice.

The next article will describe the syntax of SWHID and how it can be used to compare two different pieces of software. After that, we will look at concrete use cases and how SWHID can be applied in real situations.
