
The previous article introduced SWHID (Software Hash Identifier) as an open standard for identifying software artifacts using intrinsic identifiers derived from the content of the software itself. It explained why precise identification is becoming essential for software supply chains, especially in the context of SBOMs, traceability, and emerging regulations such as the Cyber Resilience Act.
In this second article of the series, I describe the syntax of SWHIDs and explain how their design enables reliable comparison between software artifacts.
Introduction
The detailed description of the SWHID syntax can be found on the syntax page of version 1.2 of the public specification. I will only summarise it here, highlighting some aspects that might not be obvious at first. Some passages of this article are copied verbatim from the specification.
A SWHID consists of two separate parts:
SWHID ::= <core_identifier> [ <qualifiers> ]
- <core_identifier> identifies the software artifact and is mandatory
- [<qualifiers>] provide context for the identifier and are optional. This context can indicate where the object was found or observed, and can point to a specific part of the artifact
Core identifier
The core_identifier consists of four fields:
- Prefix (identifier type): defined as swh, which tells you that the identifier is a SWHID
- Version (identifier version): the version of the identifier scheme, which is currently 1
- Tag (object_type): the type of the identified object. There are five options, all described in detail in the specification:
- cnt for contents (source code files, blobs…)
- dir for directories
- rev for revisions (commits)
- rel for releases (tags)
- snp for snapshots
- Object identifier (object_id): the intrinsic identifier of the object, a hash value computed from the content and relevant metadata of the object and hex-encoded as 40 lowercase ASCII hexadecimal digits. The design and math behind its computation will be the subject of a future article
Each of these fields is mandatory. They appear in the order above, separated by a colon “:”.
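As an illustration of this structure, the Python sketch below splits a core identifier into its four fields and rejects anything that does not match them. It is a hypothetical helper (parse_core and CoreIdentifier are illustrative names, not part of any SWHID library); it assumes version 1 with a 40-digit lowercase hex object_id, and it does not check that the hash refers to a real object.

```python
import re
from typing import NamedTuple

# prefix:version:object_type:object_id (assumption: version 1, 40 lowercase hex digits)
CORE_RE = re.compile(r"(swh):(1):(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


class CoreIdentifier(NamedTuple):
    prefix: str
    version: int
    object_type: str
    object_id: str


def parse_core(core: str) -> CoreIdentifier:
    """Split a core identifier into its four mandatory, colon-separated fields."""
    match = CORE_RE.fullmatch(core)  # the whole string must match, no extra text
    if match is None:
        raise ValueError(f"not a valid SWHID core identifier: {core!r}")
    prefix, version, object_type, object_id = match.groups()
    return CoreIdentifier(prefix, int(version), object_type, object_id)


print(parse_core("swh:1:dir:f920db730694e4c4c8631e661f46834d0bb52d9b").object_type)  # dir
```

Uppercase hex digits, unknown tags, or truncated hashes all raise ValueError, which mirrors the rule that every field is mandatory and strictly formatted.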
Examples
Okular repository at invent.kde.org at the time of the example: f0bf71e/
swh:1:snp:5428f4f096e9626f6c7dc1f603e83b2090f7338b
f0bf71e/generators/epub folder
swh:1:dir:f920db730694e4c4c8631e661f46834d0bb52d9b
f0bf71e/generators/epub/generator_epub.h file
swh:1:cnt:f848f69be68a8607f12854baf4edf19a11bc5837
028eaf6/generators/epub/generator_epub.h file corresponding to a branch associated with a previous version
swh:1:cnt:72243cee0edcd5468b64a5fc15dc017390d48ab6
Qualifiers
There are two types of qualifiers:
- Context qualifiers: provide context to the identifier, answering the question “where can I find it?”
- Fragment qualifiers: identify specific parts of a software artifact, such as code snippets or specific bytes of a binary
To learn more about each type of qualifier, see the corresponding section of the specification. Below I describe the most relevant information you need to know about them.
Context qualifiers answer the question “where?”, pointing to a specific place (a code hosting platform, a repository, a branch, a folder, a file…) where you can find the identified software artifact. There are four different context qualifiers:
- origin declares, as a URL, the software origin where the object was found or observed
- visit gives the SWHID of the snapshot of the repository where the object was found or observed. It is only valid when origin is present
- path declares the absolute file path, from the root directory associated with the anchor node, to the object designated by the core_identifier
- anchor identifies a node in the Merkle DAG (the subject of a future article) relative to which the path to the object is specified, given as a core_identifier of any type except cnt. It is used together with path
Fragment qualifiers identify subparts of a software artifact, such as snippets or a specific part of a given blob. There are two of them:
- lines qualifier designates a line range inside a content (cnt object_type)
- bytes allows designation of a byte range inside a content (cnt object_type)
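The lines qualifier can be illustrated with a small Python sketch that extracts the designated line range from the raw bytes of a cnt object. It assumes that a value such as 11-21 denotes an inclusive, 1-based range and that a single number designates one line; parse_lines and extract_lines are hypothetical names, and the bytes qualifier is left out.

```python
import re


def parse_lines(value: str) -> tuple[int, int]:
    """Parse a lines value such as "11-21" or "11" (assumed 1-based, inclusive)."""
    match = re.fullmatch(r"(\d+)(?:-(\d+))?", value)
    if match is None:
        raise ValueError(f"invalid lines value: {value!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start  # single line if no range
    if start < 1 or end < start:
        raise ValueError(f"invalid line range: {value!r}")
    return start, end


def extract_lines(content: bytes, value: str) -> bytes:
    """Return the part of a cnt object's bytes designated by a lines value."""
    start, end = parse_lines(value)
    lines = content.splitlines(keepends=True)  # keep line terminators intact
    return b"".join(lines[start - 1:end])  # a range past the end is simply truncated


blob = b"line 1\nline 2\nline 3\nline 4\n"
print(extract_lines(blob, "2-3"))  # b'line 2\nline 3\n'
```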
Some considerations about qualifiers
Some rules apply to qualifiers. These are the most relevant ones:
- Each qualifier is specified as a key-value pair, using a “=” character as a separator.
- Qualifiers are separated from the core identifier and from each other by a semicolon “;”.
- Some qualifiers only apply to specific object_type values
- anchor must not designate a cnt object
- Fragment qualifiers are valid for cnt object_type only.
- There are restrictions on the validity of some qualifiers
- Any qualifier shall appear at most once
- The validity of some qualifiers depends on the presence of other qualifiers.
- visit is only valid in the presence of origin
- anchor is only valid in the presence of path
- Canonical qualifier order (good practice): origin, visit, anchor, path, lines or bytes.
- A conformant implementation shall not generate invalid qualifiers or qualifier combinations, and shall ignore them if present.
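To make these rules concrete, here is a hypothetical Python checker (split_qualifiers and qualifier_problems are illustrative names, not a real SWHID API). It covers only the rules summarised above, assumes the core identifier is well formed and that qualifier values contain no unescaped semicolons, and reports violations instead of silently ignoring them, purely for illustration.

```python
# Qualifier names covered by this article (hypothetical constant name).
KNOWN_QUALIFIERS = ("origin", "visit", "anchor", "path", "lines", "bytes")


def split_qualifiers(swhid: str) -> tuple[str, list[tuple[str, str]]]:
    """Split a SWHID into its core identifier and its (key, value) qualifier pairs."""
    core, *parts = swhid.split(";")
    pairs = []
    for part in parts:
        # Split on the first '=' only; later '=' characters belong to the value.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"qualifier without '=': {part!r}")
        pairs.append((key, value))
    return core, pairs


def qualifier_problems(swhid: str) -> list[str]:
    """Report violations of the qualifier rules listed above (not exhaustive)."""
    core, pairs = split_qualifiers(swhid)
    object_type = core.split(":")[2]  # assumes a well-formed core identifier
    keys = [key for key, _ in pairs]
    values = dict(pairs)
    problems = []
    for key in dict.fromkeys(keys):  # unique keys, in order of appearance
        if key not in KNOWN_QUALIFIERS:
            problems.append(f"unknown qualifier {key}")
        if keys.count(key) > 1:
            problems.append(f"{key} appears more than once")
    if "visit" in values and "origin" not in values:
        problems.append("visit requires origin")
    if "anchor" in values and "path" not in values:
        problems.append("anchor requires path")
    for key in ("lines", "bytes"):
        if key in values and object_type != "cnt":
            problems.append(f"{key} is only valid for cnt objects")
    if values.get("anchor", "").startswith("swh:1:cnt:"):
        problems.append("anchor must not be a cnt object")
    return problems


example = (
    "swh:1:dir:f920db730694e4c4c8631e661f46834d0bb52d9b"
    ";visit=swh:1:snp:5428f4f096e9626f6c7dc1f603e83b2090f7338b;lines=1-2"
)
print(qualifier_problems(example))
# ['visit requires origin', 'lines is only valid for cnt objects']
```

Splitting each qualifier on its first “=” only keeps values such as origin URLs intact even if they contain further “=” characters.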
Examples
Let’s revisit the examples I gave earlier for core identifiers, this time with qualifiers added.
Okular repository at invent.kde.org at the time of the example: f0bf71e/
swh:1:snp:5428f4f096e9626f6c7dc1f603e83b2090f7338b;origin=https://invent.kde.org/graphics/okular
f0bf71e/generators/epub folder
swh:1:dir:f920db730694e4c4c8631e661f46834d0bb52d9b;origin=https://invent.kde.org/graphics/okular;visit=swh:1:snp:5428f4f096e9626f6c7dc1f603e83b2090f7338b;anchor=swh:1:rev:5f39918badc1ae31c09b401c1822509c07c6eb23;path=/generators/epub/
f0bf71e/generators/epub/generator_epub.h file
swh:1:cnt:f848f69be68a8607f12854baf4edf19a11bc5837;origin=https://invent.kde.org/graphics/okular;visit=swh:1:snp:5428f4f096e9626f6c7dc1f603e83b2090f7338b;anchor=swh:1:rev:5f39918badc1ae31c09b401c1822509c07c6eb23;path=/generators/epub/generator_epub.h
EPubGenerator class (lines 11-21) of the f0bf71e/generators/epub/generator_epub.h file
swh:1:cnt:f848f69be68a8607f12854baf4edf19a11bc5837;origin=https://invent.kde.org/graphics/okular;visit=swh:1:snp:5428f4f096e9626f6c7dc1f603e83b2090f7338b;anchor=swh:1:rev:5f39918badc1ae31c09b401c1822509c07c6eb23;path=/generators/epub/generator_epub.h;lines=11-21
028eaf6/generators/epub/generator_epub.h file corresponding to a branch associated with a previous version
swh:1:cnt:72243cee0edcd5468b64a5fc15dc017390d48ab6;origin=https://invent.kde.org/graphics/okular;visit=swh:1:snp:5428f4f096e9626f6c7dc1f603e83b2090f7338b;anchor=swh:1:rev:284bcc626c29f90b2f8f227b2af7e84377b81305;path=/generators/epub/generator_epub.h
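The qualified examples above can be assembled mechanically from their parts. The following Python sketch (build_swhid is a hypothetical helper) appends qualifiers in the canonical order recommended earlier, whatever order they are passed in, and reproduces the EPubGenerator example. It assumes the values are already suitably escaped: it performs no percent-encoding and does not enforce the validity rules.

```python
# Recommended canonical order of qualifiers (lines and bytes listed last).
CANONICAL_ORDER = ("origin", "visit", "anchor", "path", "lines", "bytes")


def build_swhid(core: str, **qualifiers: str) -> str:
    """Append qualifiers to a core identifier, in canonical order."""
    unknown = set(qualifiers) - set(CANONICAL_ORDER)
    if unknown:
        raise ValueError(f"unknown qualifiers: {sorted(unknown)}")
    parts = [core]
    for key in CANONICAL_ORDER:
        if key in qualifiers:
            parts.append(f"{key}={qualifiers[key]}")
    return ";".join(parts)  # ';' separates the core and each qualifier


# Qualifiers deliberately passed out of order; the output is canonical.
swhid = build_swhid(
    "swh:1:cnt:f848f69be68a8607f12854baf4edf19a11bc5837",
    lines="11-21",
    path="/generators/epub/generator_epub.h",
    origin="https://invent.kde.org/graphics/okular",
    anchor="swh:1:rev:5f39918badc1ae31c09b401c1822509c07c6eb23",
    visit="swh:1:snp:5428f4f096e9626f6c7dc1f603e83b2090f7338b",
)
print(swhid)
```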
Summary of the syntax
This first image summarises the scope of SWHID, that is, the kinds of software artifacts covered by the current specification.

The following image summarises the SWHID syntax, using as an example a snippet of the Apollo-11 code base, hosted on GitHub and also available in the Software Heritage archive.

Comparing Software
One of the core values of SWHID as an identifier lies in the fact that every SWHID is univocal by design. This means that, when comparing software artifacts:
- Two software artifacts are identical (bit by bit) if their core_identifiers are equal.
- In addition, two SWHIDs represent the same software artifact (or fragment thereof) if:
- They both have the same core_identifier
- They both have the same set of qualifiers and the values of these qualifiers are identical
- Two different core_identifiers correspond to two different software artifacts
For comparison purposes the order of the qualifiers does not matter.
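These comparison rules translate into a purely syntactic check. In the hypothetical Python sketch below, same_artifact compares only the core identifiers, while same_swhid also compares the qualifiers as a key-value mapping, so their order is ignored. It assumes well-formed SWHIDs without duplicate qualifiers and does not normalise values such as origin URLs.

```python
def core_and_qualifiers(swhid: str) -> tuple[str, dict[str, str]]:
    """Split a SWHID into its core identifier and a qualifier mapping."""
    core, *parts = swhid.split(";")
    qualifiers = {}
    for part in parts:
        key, _, value = part.partition("=")  # split on the first '=' only
        qualifiers[key] = value
    return core, qualifiers


def same_artifact(a: str, b: str) -> bool:
    """Equal core_identifiers: the artifacts are identical, bit by bit."""
    return core_and_qualifiers(a)[0] == core_and_qualifiers(b)[0]


def same_swhid(a: str, b: str) -> bool:
    """Equal core_identifiers and equal qualifier sets; dict equality ignores order."""
    return core_and_qualifiers(a) == core_and_qualifiers(b)


core = "swh:1:cnt:f848f69be68a8607f12854baf4edf19a11bc5837"
x = core + ";origin=https://invent.kde.org/graphics/okular;lines=11-21"
y = core + ";lines=11-21;origin=https://invent.kde.org/graphics/okular"
print(same_swhid(x, y))        # True
print(same_swhid(x, core))     # False
print(same_artifact(x, core))  # True
```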
Summary
In this article, I explained the syntax of SWHIDs, describing how their core identifier and optional qualifiers combine to uniquely reference software artifacts as well as specific fragments within them. I also showed how SWHIDs enable reliable comparison between software artifacts: identical artifacts produce identical identifiers, while different artifacts produce different ones.
In the next article, I will describe SWHID as an open standard and discuss swhid-rs, its reference implementation.