Everything you need to know about High-Value Datasets (HVDs): Part 3 – Guide to Publishing HVDs

As defined by the EU Directive, high-value datasets (HVDs) are those whose reuse brings significant benefits to society, the environment, and the economy.

According to the Regulation, these high-value datasets must be made available for reuse by June 2024. Starting in February 2025, member states are required to report their progress in this area to the European Commission every two years.

In our previous posts, we introduced the six thematic categories of HVDs from the European Commission and the possible future extensions to the categories.

In this blog post, we outline the data standards and infrastructure that can help public service bodies to publish high-value datasets, including some practical steps to take along the way. We also explain how Derilinx can assist in this process with a range of services, tools, and any additional support needed.

At the end of the blog post you’ll find a useful infographic that summarises the key information in a visual way.

Background

When data is published online, the descriptive information provided alongside datasets is known as metadata—structured data about the dataset. Metadata typically includes details such as the dataset title, description, data types, and data owner, among other attributes.

Over time, various metadata standards have been developed to ensure that data provided to data portals includes consistent descriptive information.

One of the primary vocabularies to emerge is the W3C Data Catalog Vocabulary (DCAT). As described by the W3C:

DCAT provides RDF classes and properties to allow datasets and data services to be described and included in a catalog. The use of a standard model and vocabulary facilitates the consumption and aggregation of metadata from multiple catalogs, which can:

  1. increase the discoverability of datasets and data services
  2. allow federated search for datasets across catalogs in multiple sites

The document contains useful fundamental definitions including catalog, resource, dataset, etc. in section 5 “Vocabulary overview”.
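To make the vocabulary more concrete, the sketch below builds a minimal dataset description using DCAT and Dublin Core terms, serialised as JSON-LD with Python's standard library. The dataset, its identifier, and the example.org URLs are hypothetical, and a real DCAT-AP record would carry more properties than are shown here.

```python
import json

# Namespace IRIs for the two vocabularies used in this sketch.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def describe_dataset(identifier, title, description, download_url, media_type):
    """Build a minimal JSON-LD description of a dataset with one distribution."""
    return {
        "@context": CONTEXT,
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": {"@id": media_type},
            }
        ],
    }


# Hypothetical dataset, used for illustration only.
record = describe_dataset(
    identifier="https://example.org/id/dataset/air-quality",
    title="Hourly air quality readings",
    description="Hourly readings from monitoring stations.",
    download_url="https://example.org/files/air-quality.csv",
    media_type="https://www.iana.org/assignments/media-types/text/csv",
)

print(json.dumps(record, indent=2))
```

Because every portal uses the same property names, a harvester can read records like this one from many catalogs without any portal-specific mapping.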

The document defines DCAT profiles (also known as “Application Profiles”):

DCAT profile is a specification for a data catalog that adds additional constraints to DCAT. A data catalog that conforms to the profile also conforms to DCAT.

An initiative within the European Commission, SEMIC (The Semantic Interoperability Community), released an application profile, the DCAT Application Profile for data portals in Europe (DCAT-AP).

The basic use case for DCAT-AP is to enable cross-data portal search for data sets and make public sector data better searchable across borders and sectors. This can be achieved by the exchange of descriptions of datasets among data portals.

This application profile is a specification for metadata records to meet the specific application needs of data portals in Europe while providing semantic interoperability with other applications on the basis of reuse of established controlled vocabularies (e.g. EuroVoc) and mappings to existing metadata vocabularies (e.g. Dublin Core, SDMX, INSPIRE metadata, etc.).

DCAT-AP for High-Value Datasets

In January 2024, SEMIC announced the release of the DCAT Application Profile for High-Value Datasets (DCAT-AP HVD). Known as an Annex, it describes additional usage of DCAT-AP.

“DCAT-AP HVD comes as a response to the High-Value Datasets Implementing Regulation (HVD IR), adopted by the European Commission in December 2022. This regulation highlights the growing importance of data, especially those considered high value. With DCAT-AP HVD, complying with these new regulations is straightforward, requiring little extra effort for users already familiar with DCAT-AP.”

DCAT-AP HVD specifies the extra information to provide along with datasets that are classified as high-value under the Implementing Regulation. As the documentation states:

“A Dataset is a HVD Dataset if and only if a [Member State] has included it in its reporting. The HVD IR defines High-Value Datasets. It may be possible that the same definition applies to multiple entities. In that case, a Member State should select the most appropriate one, according to the rules in the regulation.”

As discussed in our previous post, the HVD IR defines six data categories: geospatial, earth observation and environment, meteorological, statistics, companies and company ownership, and mobility.

If a dataset has been deemed high-value, its publication must cover the following:

  • Persistent Identifier: DCAT-AP has proposed several guidelines on identifiers, including 10 rules for persistent identifiers.
  • Licensing / Rights Conditions: Use non-restrictive open data licences (e.g., CC0, CC BY 4.0, or equivalent) or, where applicable, a European Legislation Identifier (ELI).
  • HVD Data Category: A resource may belong to more than one data category.
  • Contact Point for APIs: Provide either a persistent email address or a link to a contact form on a webpage.
  • Bulk Downloads and Real-Time Access: Enable bulk downloads where appropriate and offer real-time access when the nature of the information requires it.

Additionally, it is recommended to include a reference to a public document that describes the internals of the dataset.
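The checklist above can be turned into a simple pre-publication check. The sketch below is illustrative only: the field names (identifier, licence, hvd_categories, contact_point) are hypothetical and do not come from the DCAT-AP HVD specification, and the bulk download and real-time access requirement is left out because it depends on the nature of the data.

```python
# The six HVD data categories named in the Implementing Regulation.
HVD_CATEGORIES = {
    "geospatial",
    "earth observation and environment",
    "meteorological",
    "statistics",
    "companies and company ownership",
    "mobility",
}


def check_hvd_record(record):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    if not record.get("identifier"):
        problems.append("missing persistent identifier")

    if not record.get("licence"):
        problems.append("missing licence or rights statement")

    # A resource may belong to more than one category, so this is a list.
    categories = record.get("hvd_categories") or []
    if not categories:
        problems.append("no HVD data category assigned")
    for category in categories:
        if category.lower() not in HVD_CATEGORIES:
            problems.append(f"unknown HVD category: {category}")

    # Either a persistent email address or a contact form URL is acceptable.
    contact = record.get("contact_point") or {}
    if not (contact.get("email") or contact.get("url")):
        problems.append("missing contact point (email or contact form URL)")

    return problems


# Hypothetical record with several gaps, used for illustration only.
draft = {
    "identifier": "https://example.org/id/dataset/air-quality",
    "hvd_categories": ["Weather"],
    "contact_point": {},
}

for problem in check_hvd_record(draft):
    print(problem)
```

A check like this only confirms that the information is present. Whether a licence is suitably non-restrictive, or an identifier is genuinely persistent, still needs human review.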

How Ireland's Open Data Portal is becoming HVD-ready

Derilinx are currently updating the National Open Data Portal (data.gov.ie) to align with DCAT-AP HVD standards. To achieve this:

  • The new DCAT-AP HVD specification is being incorporated into the standard schema, including:
    • A flag indicating whether a dataset or resource is an HVD
    • A category on the dataset reflecting the six HVD categories
    • An applicable legislation field that includes the HVD ELI if the dataset is marked as HVD
    • An endpoint that provides the HVD-only DCAT catalog.
  • The new profiles will be utilised in the harvesting infrastructure.
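The logic behind these schema changes can be sketched in a few lines of Python. This is not the portal's actual implementation: the field names (is_hvd, hvd_categories, applicable_legislation) and both helper functions are hypothetical. The ELI value is our understanding of the identifier for the HVD Implementing Regulation and should be verified against the DCAT-AP HVD specification before use.

```python
# Assumed ELI of the HVD Implementing Regulation; verify against the
# DCAT-AP HVD specification before relying on it.
HVD_ELI = "http://data.europa.eu/eli/reg_impl/2023/138/oj"


def mark_as_hvd(dataset, categories):
    """Return a copy of the dataset flagged as HVD, with the HVD ELI added."""
    updated = dict(dataset)
    updated["is_hvd"] = True
    updated["hvd_categories"] = list(categories)

    # Keep any legislation already recorded and add the HVD ELI only once.
    legislation = list(dataset.get("applicable_legislation", []))
    if HVD_ELI not in legislation:
        legislation.append(HVD_ELI)
    updated["applicable_legislation"] = legislation
    return updated


def hvd_only(catalogue):
    """Return only the datasets flagged as HVD, as an HVD-only view would."""
    return [dataset for dataset in catalogue if dataset.get("is_hvd")]


# Hypothetical catalogue, used for illustration only.
catalogue = [
    {"title": "Hourly air quality readings"},
    {"title": "Council meeting minutes"},
]
catalogue[0] = mark_as_hvd(catalogue[0], ["Meteorological"])

for dataset in hvd_only(catalogue):
    print(dataset["title"], dataset["applicable_legislation"])
```

Tying the legislation field to the HVD flag means publishers set one value and the rest of the HVD metadata follows from it, which reduces the chance of inconsistent records.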

Next steps for public service bodies

Publishing HVDs is a multi-step process, and Derilinx can assist you at each stage.

If you are an Irish Public Service Body, you can use the Technical Services Framework for Open Data and Data Management to avail of Derilinx services, avoiding a tedious procurement process.

Get in touch

Contact us to find out more about how we can support you on your HVDs journey.

HVDs Infographic

