Open Data Harmonisation through a real-world example: Smart Cities – Smart Dublin

By Katharine Cooney, Data Analyst at Derilinx

What is data harmonisation about?

Data harmonisation is about making different data sets fit together in terms of their contents, their standards and how they are published. This article documents how this can work in practice and demonstrates its benefits for the data user.

The Smart Dublin Open Data portal, “Dublinked”, publishes open data from several publishers, including the four Dublin local authorities: Dublin City Council, Dún Laoghaire-Rathdown County Council, Fingal County Council and South Dublin County Council.

In many cases the four authorities publish similar data, but there may be differences between them. Content may differ because the sources differ: perhaps one authority has a commercial contract with a company to provide the data, while the others rely on council sources. Sometimes the data covers only council-owned and council-operated resources; sometimes it also includes privately owned ones. Different authorities may update the data at different frequencies, one monthly and another annually. Different naming conventions can make it hard for the user to recognise similar data across the local authorities. If the authorities publish the data in different formats, at the very least this creates a conversion task for the user looking for Dublin-wide data. In the worst case, one or more authorities may not publish the data at all.

For any user trying to view or analyse data for the Dublin region, these issues can cause problems: they need data that is consistent across the four local authority areas.

What are the Benefits of Data Harmonisation?

Harmonisation removes, or at least reduces, many of these problems. For the four Dublin local authorities, it means that once their datasets on a given topic are harmonised, the user can combine them and view or analyse them as a whole, giving a Dublin-wide view.
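As a minimal sketch of what harmonisation gives the user, the Python below combines per-authority datasets into one Dublin-wide list. It assumes each dataset has already been loaded as a list of dicts and that the authorities have agreed a common set of field names; the `AGREED_FIELDS` schema and the `amalgamate` function are hypothetical, for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical agreed schema for a harmonised accessible-parking dataset.
AGREED_FIELDS = frozenset({"name", "spaces", "latitude", "longitude"})


def amalgamate(
    datasets: Dict[str, Iterable[dict]],
) -> Tuple[List[dict], List[Tuple[str, dict]]]:
    """Combine per-authority records into one Dublin-wide list.

    Records whose fields match the agreed schema are tagged with their
    publishing authority and added to the combined list. Records that do
    not match are returned separately, because they come from a source
    that has not been harmonised yet.
    """
    combined: List[dict] = []
    rejected: List[Tuple[str, dict]] = []
    for authority, records in datasets.items():
        for record in records:
            if set(record) == AGREED_FIELDS:
                combined.append({"authority": authority, **record})
            else:
                rejected.append((authority, record))
    return combined, rejected
```

Returning the non-matching records rather than silently dropping them lets the user see exactly which sources still need harmonising before a Dublin-wide view is complete.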

The same applies to any Smart City: each area or source of interest needs to publish comparable data to similar standards and keep it up to date.

For example, if I am interested in Accessible Parking Locations on Dublinked, I would like to see them across the whole Dublin region.

Currently, using the Map Explorer on Dublinked, it is straightforward to add the accessible parking data from three of the local authorities, but the data from Dublin City Council cannot be mapped because it has no latitude and longitude information. (See the Dublinked wiki for instructions on using the Map Explorer.)
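The mapping problem can be illustrated with a short sketch that turns parking records into a GeoJSON FeatureCollection, the kind of input a web map can display. The `latitude`/`longitude` field names and the `to_geojson` helper are assumptions, not part of Dublinked; records without coordinates, like the Dublin City Council data described above, end up in a separate list because they cannot be placed on a map.

```python
from typing import List, Tuple


def to_geojson(records: List[dict]) -> Tuple[dict, List[dict]]:
    """Build a GeoJSON FeatureCollection from records with coordinates.

    Returns the collection plus the records that could not be mapped
    because their latitude or longitude is missing or empty.
    """
    features = []
    unmappable = []
    for rec in records:
        lat = rec.get("latitude")
        lon = rec.get("longitude")
        if lat in (None, "") or lon in (None, ""):
            unmappable.append(rec)
            continue
        # Everything except the coordinates becomes a feature property.
        props = {k: v for k, v in rec.items() if k not in ("latitude", "longitude")}
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders point coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}, unmappable
```

The collection can be written out with `json.dumps` and loaded by most web-mapping tools; the unmappable list shows at a glance which publisher's data is missing coordinates.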

Methodology for Data Harmonisation

To make Dublin-wide data readily available to users, Derilinx are working as part of the Smart Dublin Open Data working group to harmonise the data that is published by the local authorities. We are already working under the Open Data Directive to publish as much open data as possible, and this group has been focusing on Transport data.

(An interesting side note: the new transport datasets published as a result of the working group’s effort consistently rank among the most popular on Dublinked.)

[Figure: Data Audit]

The first step in harmonisation is an audit of existing data. To set boundaries for the harmonisation task, a list of 42 key transport datasets has been compiled, with an initial assessment by publication date. Where there are gaps, the local authorities are working to source and publish the data.
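The gap analysis at the heart of such an audit can be sketched as a simple set comparison: for each key dataset, list the authorities that have not yet published it. This assumes datasets can be matched across authorities by a shared title, which is what a common naming convention provides; the `find_gaps` function and any example titles are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# The four Dublin local authorities.
AUTHORITIES = (
    "Dublin City Council",
    "Dún Laoghaire-Rathdown County Council",
    "Fingal County Council",
    "South Dublin County Council",
)


def find_gaps(
    key_datasets: Iterable[str], published: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Map each key dataset title to the authorities that have not published it.

    `published` maps an authority name to the set of dataset titles it has
    published; titles are assumed to follow a shared naming convention.
    Datasets already published by every authority are left out.
    """
    gaps: Dict[str, List[str]] = {}
    for title in key_datasets:
        missing = [a for a in AUTHORITIES if title not in published.get(a, set())]
        if missing:
            gaps[title] = missing
    return gaps
```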

In parallel, work has started on automatically assessing the quality of each published dataset and its level of harmonisation across the local authorities.

The quality and harmonisation criteria include:

  • Scope (allow the user to select the data to match scope across datasets)
  • Formats (for example, if the data is geolocated, it should be published in a geospatial format)
  • Metadata compliant with the DCAT-AP 2.0 open data standard, standardised across the datasets
  • A standard naming convention, update frequency and common keywords across the datasets
  • Keeping data up to date

Meeting these criteria will provide the data user with consistent data across the Dublin region.
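A minimal sketch of how some of these criteria might be checked automatically follows. It works on a simplified metadata record held as a plain dict; the field names (loosely modelled on DCAT-AP properties such as title, description, keywords, update frequency and modification date), the list of geospatial formats, the common keyword and the maximum age for each update frequency are all hypothetical choices, not the rules used by the Dublinked reporting system.

```python
from datetime import date, timedelta
from typing import Dict

# Hypothetical rules, for illustration only.
REQUIRED_METADATA = ("title", "description", "keywords", "update_frequency", "last_modified")
GEOSPATIAL_FORMATS = {"GEOJSON", "KML", "SHP", "WFS", "WMS"}
COMMON_KEYWORDS = {"transport"}
MAX_AGE_DAYS = {"monthly": 31, "quarterly": 92, "annual": 366}


def assess(dataset: dict, today: date) -> Dict[str, int]:
    """Return a 0/1 mark per criterion for one dataset's metadata record."""
    marks: Dict[str, int] = {}

    # Metadata: every required field is present and non-empty.
    marks["metadata"] = int(all(dataset.get(field) for field in REQUIRED_METADATA))

    # Format: geolocated data must be offered in at least one geospatial format.
    formats = {fmt.upper() for fmt in (dataset.get("formats") or [])}
    marks["format"] = int(not dataset.get("geolocated") or bool(formats & GEOSPATIAL_FORMATS))

    # Keywords: the dataset carries all of the agreed common keywords.
    marks["keywords"] = int(COMMON_KEYWORDS <= set(dataset.get("keywords") or []))

    # Up to date: last modified within the period implied by the update frequency.
    max_age = MAX_AGE_DAYS.get(dataset.get("update_frequency"))
    last_modified = dataset.get("last_modified")
    marks["up_to_date"] = int(
        max_age is not None
        and last_modified is not None
        and today - last_modified <= timedelta(days=max_age)
    )
    return marks
```

Binary marks keep the sketch simple; a real system could use graded scores instead.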

The reporting system reflects these criteria, providing both summary-level data at various levels of detail and a mark for each criterion for each dataset. This allows each local authority to assess its own performance and to identify areas for improvement easily.

[Figure: Data Harmonisation Reporting]
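The roll-up from per-dataset marks to summary figures can be sketched as averaging each criterion's marks per authority. The input shape (a list of `(authority, dataset title, marks)` tuples with 0/1 marks) and the `summarise` function are assumptions for illustration; the actual reporting system may weight or group the criteria differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: one (authority, dataset title, {criterion: mark}) tuple
# per dataset, with every dataset marked on the same set of criteria.
Marks = List[Tuple[str, str, Dict[str, int]]]


def summarise(marks: Marks) -> Dict[str, Dict[str, float]]:
    """Return, for each authority, the average mark for each criterion."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    counts: Dict[str, int] = defaultdict(int)
    for authority, _title, criteria in marks:
        counts[authority] += 1
        for criterion, mark in criteria.items():
            totals[authority][criterion] += mark
    # Divide each criterion total by the number of datasets for that authority.
    return {
        authority: {c: total / counts[authority] for c, total in per_criterion.items()}
        for authority, per_criterion in totals.items()
    }
```

Averaging per authority gives each council a single figure per criterion to compare against the others, while the per-dataset marks remain available for drilling down.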

Future Plans

As the working group’s focus moves to environmental data, Derilinx will again support the local authorities in publishing any new datasets to the agreed harmonised standards. The audit of existing data will highlight any discrepancies and missing data, and the reporting dashboard will help the authorities improve the level of harmonisation of this data.

Conclusion

The goal of data harmonisation is that someone using data from different sources has a unified view, in which conflicts and inconsistencies between the datasets have been removed.

The harmonisation effort on Dublinked has produced the following results:

  • Increased availability of open data across the Dublin region and, as a result, increased usage of this data
  • Improved quality of published data
  • Harmonised data across the Dublin local authorities, allowing users to view, analyse, and generally reuse Dublin-wide data

Data harmonisation is a key step in data publishing and sharing. It makes data reuse not only possible but also much more efficient, which is crucial for a Smart City as it progresses on its journey.
