Everything you need to know about High-Value Datasets (HVDs) – Part 2: European Commission’s study on possible themes for extensions

In “Everything you need to know about HVDs: Part 1 – FAQs” we discussed the six existing thematic categories (Geospatial, Earth observation and environment, Meteorological, Statistics, Companies and company ownership, Mobility) that the European Commission (EC) published. Public sector bodies (PSBs) are required to make datasets in these categories available by June 2024. We also mentioned that, under Article 13(2) of the Open Data Directive, the thematic range could be expanded to reflect new technological advancements and market developments, so that HVDs continue to provide valuable insights and support innovation in various sectors.

The EC recently published a comprehensive study on the possible themes for the extensions of high-value datasets.

European Commission, Directorate-General for Communications Networks, Content and Technology, Identification of data themes for the extensions of public sector High-Value Datasets – Final study, Publications Office of the European Union, 2023, https://data.europa.eu/doi/10.2759/739414

PSBs and agencies should consider the study’s findings to identify potential HVDs from the start. This article summarises the study to keep you informed of potential changes in regulations.

What's the purpose of this study?

The purpose of the study is to identify additional thematic categories of HVDs beyond the list of Annex I of Directive (EU) 2019/1024 and support the European Commission in progressively expanding the list of HVD themes. Building upon previous work, which led to the identification of the existing HVDs list, this study identified and analysed new HVD themes and related high-value datasets, assessing their potential benefits.

How were the new themes chosen?

The study team undertook the following main activities:

  1. Shortlisting Data Themes: The team conducted a comprehensive shortlisting exercise to identify several data themes that had the potential to be considered as additional HVD themes. This involved a combination of desk research and stakeholder consultations through three workshops and a survey. Each data theme was assessed against the following specific criteria to determine its eligibility as a new HVD:

i) Existence (Does the data actually exist?);
ii) Availability (Is it made publicly available by public sector bodies?);
iii) Accessibility (Is it free of charge?);
iv) Reusability (Can it be downloaded in a reusable format?).
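The four criteria above can be read as a simple pass/fail checklist. The sketch below is only an illustration of that reading: it assumes a theme must satisfy all four criteria to be shortlisted, which the summary here does not state explicitly, and the class name, function names and sample themes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThemeAssessment:
    """Answers to the four shortlisting questions for one candidate theme."""
    name: str
    exists: bool          # i) Existence: does the data actually exist?
    available: bool       # ii) Availability: published by public sector bodies?
    free_of_charge: bool  # iii) Accessibility: is it free of charge?
    reusable: bool        # iv) Reusability: downloadable in a reusable format?


def is_eligible(theme: ThemeAssessment) -> bool:
    """Assumed rule: a theme is eligible only if all four criteria are met."""
    return all(
        [theme.exists, theme.available, theme.free_of_charge, theme.reusable]
    )


def shortlist(themes: list[ThemeAssessment]) -> list[str]:
    """Return the names of the themes that pass every criterion."""
    return [theme.name for theme in themes if is_eligible(theme)]


# Hypothetical assessments, for illustration only (not taken from the study).
candidates = [
    ThemeAssessment("Theme A", True, True, True, True),
    ThemeAssessment("Theme B", True, True, False, True),  # not free of charge
]
print(shortlist(candidates))
```

The same checklist could be applied to your own data inventory when flagging datasets as potential HVDs.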

Furthermore, the study team examined the regulatory ecosystem surrounding each data theme. They analysed the legal framework and policy initiatives related to data collection harmonisation and enhanced data sharing, especially considering the creation of Common European Data Spaces.

The picture below shows the approach adopted for shortlisting new potential HVD themes:

Themes shortlisting process

  2. Identification of Policy Intervention Scenarios: Different scenarios of policy intervention were identified for the inclusion of the new potential data themes into the EC list of HVDs.
  3. Estimation of Net Present Value (NPV): The team estimated the net present value related to data provision according to specific criteria across the new potential HVD themes. This activity involved:

i) Conducting a cost-benefit analysis that covered all actors and the broader economy.
ii) Assessing the impact on large digital platforms, as well as small and medium-sized enterprises.
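For readers unfamiliar with the term, the sketch below shows the standard net present value calculation: yearly benefits minus costs, discounted back to the first year. It is not the study's model. The discount rate, the yearly figures and the function name are hypothetical, and treating the first year as undiscounted is a convention chosen for this example.

```python
def net_present_value(
    benefits: list[float], costs: list[float], discount_rate: float
) -> float:
    """Discounted sum of yearly net benefits.

    Year 0 (the first entry) is not discounted; year t is divided by
    (1 + discount_rate) ** t. This timing convention is an assumption.
    """
    if len(benefits) != len(costs):
        raise ValueError("benefits and costs must cover the same years")
    return sum(
        (benefit - cost) / (1 + discount_rate) ** year
        for year, (benefit, cost) in enumerate(zip(benefits, costs))
    )


# Hypothetical figures in EUR million: high set-up costs in the first year,
# then lower running costs and growing re-use benefits.
yearly_benefits = [0.0, 4.0, 6.0, 8.0]
yearly_costs = [10.0, 1.0, 1.0, 1.0]
npv = net_present_value(yearly_benefits, yearly_costs, discount_rate=0.03)
print(f"NPV: {npv:.2f} EUR million")
```

A positive result means the discounted benefits outweigh the discounted costs over the period considered.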

Which are the new data themes?

After a progressive refinement of the potential list of HVD categories, the study team selected the following themes to be considered as new HVDs: Climate Loss, Government and Public Sector, Health, Justice and Legal, Language, Energy, and Financial.

New potential themes

These themes were chosen based on their alignment with the eligibility criteria and their potential to provide valuable insights and benefits across various sectors.

What are the policy options and benefits of the new themes?

Based on well-established methodologies for assessing data accessibility and availability, the study team developed policy options centred around the concept of open data maturity. This concept is based on two primary dimensions, Openness and Coverage, and considers the following criteria:

  • Openness: Machine readability, non-proprietary, download options, metadata available, terms of data (re-use).
  • Coverage: First administrative level, second administrative level and timely availability or frequency of the update.
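One way to picture the two dimensions is as a per-dataset scorecard. The sketch below is an assumption-heavy illustration: it scores each criterion between 0 and 1 and averages the criteria within a dimension. The summary here does not describe the study's scale or weighting, and the identifiers and sample scores are hypothetical.

```python
# Criteria per dimension, named after the lists above (identifiers are ours).
OPENNESS = [
    "machine_readable",
    "non_proprietary",
    "download_options",
    "metadata_available",
    "reuse_terms",
]
COVERAGE = ["first_admin_level", "second_admin_level", "timely_updates"]


def dimension_score(scores: dict[str, float], criteria: list[str]) -> float:
    """Average the 0-1 scores of the given criteria (missing ones count as 0).

    Equal weighting is an assumption made for this illustration.
    """
    return sum(scores.get(criterion, 0.0) for criterion in criteria) / len(criteria)


# Hypothetical assessment of a single dataset.
dataset_scores = {
    "machine_readable": 1.0,
    "non_proprietary": 1.0,
    "download_options": 0.0,
    "metadata_available": 1.0,
    "reuse_terms": 0.0,
    "first_admin_level": 1.0,
    "second_admin_level": 0.0,
    "timely_updates": 1.0,
}
openness = dimension_score(dataset_scores, OPENNESS)
coverage = dimension_score(dataset_scores, COVERAGE)
print(f"Openness: {openness:.2f}, Coverage: {coverage:.2f}")
```

Scoring datasets this way makes it easier to compare their current state against a higher target score for each criterion.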

Based on publicly available datasets, the study estimated the current data maturity of the new HVD themes across European countries, according to the two dimensions mentioned above. This process identified a baseline scenario and two main policy options. These policy options were defined based on progressively higher target maturity scores for each criterion. This reflects the theoretical possibility of the EC adding the new themes to the HVDs list, imposing additional requirements for data preparation and publication:

Policy Options

Policy options for HVDs publication, based on maturity, costs and benefits of potentially publishing data in the new categories.

Cost-benefit analysis and estimation of the Net Present Value

The study team conducted a cost-benefit analysis for each data theme, assessing the net present value (NPV) associated with applying the different policy options. The overall NPV for each theme varies significantly, influenced by factors such as market size and the number of publishers/base registries.

Upon analysis, Policy Option 1 (PO1) and Policy Option 2 (PO2) emerged as the recommended scenarios to be implemented. The chart below provides a summary of the NPVs’ magnitude per policy scenario, across different themes over the considered period (2024 – 2032):

Overall NPV per theme, across policy scenarios

What is the impact on Small and Medium-sized Enterprises (SMEs) and large companies?

As part of the overall analysis, the study also focused on estimating the potential value realised by large digital platforms and SMEs when reusing data from the newly selected themes.

  • Small and Medium-sized enterprises (SMEs) – Organisations that employ fewer than 250 persons and have an annual turnover not exceeding EUR 50M or an annual balance sheet total not exceeding EUR 43M.
  • Large digital platforms – Organisations larger than SMEs characterised by two-sided digital markets where distinct groups of customers/users interact with each other. As part of this study, no differentiation was made on the basis of price structures (neutral vs. non-neutral).
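The SME thresholds above translate directly into a small check. The sketch below follows the wording of the definition as given in this article; the function and parameter names are hypothetical, and it ignores refinements such as ownership links between companies.

```python
SME_MAX_EMPLOYEES = 250              # SMEs employ fewer than this many persons
SME_MAX_TURNOVER_EUR = 50_000_000    # annual turnover ceiling
SME_MAX_BALANCE_EUR = 43_000_000     # annual balance sheet ceiling


def is_sme(employees: int, turnover_eur: float, balance_sheet_eur: float) -> bool:
    """Apply the definition above: headcount AND (turnover OR balance sheet)."""
    within_headcount = employees < SME_MAX_EMPLOYEES
    within_financials = (
        turnover_eur <= SME_MAX_TURNOVER_EUR
        or balance_sheet_eur <= SME_MAX_BALANCE_EUR
    )
    return within_headcount and within_financials


# Hypothetical companies, for illustration only.
print(is_sme(employees=120, turnover_eur=60_000_000, balance_sheet_eur=40_000_000))
print(is_sme(employees=250, turnover_eur=10_000_000, balance_sheet_eur=10_000_000))
```

In the first call the turnover exceeds the ceiling but the balance sheet does not, so the company still counts as an SME; in the second, the headcount alone rules it out.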

The data market presents two key types of actors: data ‘user’ and data ‘supplier’ companies. As part of this analysis, companies were considered in scope if acting as data users.

The study examined historical statistics of companies in OECD countries, considering their size and economic activities. This analysis provided specific information about the number of organisations acting as data users. As a result, the study estimated the value distribution between SMEs and large digital platforms.

Additionally, the study team conducted a survey directed at different types of organisations to gather information about the potential added value of new themes in the Open Data Directive. This survey data was taken into account when quantifying the socio-economic benefits of making potential high-value datasets available for organisations and was used to validate the assumptions based on statistics and previous studies.

The study found that over 98% of data user companies in the EU27+UK are SMEs, while large data companies represent less than 2% of the total. Despite this difference in numbers, the revenue generated by large data organisations offsets it for most data themes, resulting in a relatively balanced distribution of the total benefits of open data between SMEs and large platforms across all policy options.

Our goal with this article is to provide you with valuable insights into a rapidly evolving policy area that holds significant importance. It’s crucial to consider how this area could impact your organisation. If you’re planning to assess your data inventory to identify high-value datasets (HVDs), it’s recommended that you also highlight any datasets falling under the new categories as potential HVDs. This will ensure that they are easily identifiable if the new categories are ultimately added to the legislation.

Part 3 coming soon!

In part 3 we will answer the question “What should Public Sector Bodies do now?”, highlighting the steps to follow for HVDs publication.
