How Open is Open Data?

This is a follow-up to our earlier blog post, "What is data circularity and why should you care?", where we discussed data circularity and why we need it. "Data circularity" is commonly mistaken for "open data", and vice versa. In this blog, we take a closer look at what open data is and the challenges associated with it, and try to understand its relationship with the concept of data circularity.

Introduction

The notion of open data has drawn greater interest over the past years from governments, companies, and individuals alike. The concept of open data refers to the idea that certain types of data, generally created or owned by governments, should be made freely available to the public for use and reuse without restrictions. The ideas of openness, accountability, and democratic governance underpin the concept of open data. As we learn more about open data, however, it becomes clear that the question of whether open data is genuinely "open" is more complicated than it appears.

Data is often referred to as the new oil, and open data initiatives have gained momentum in recent years with the aim of unlocking its full potential. In fact, open data has the potential to open up new avenues for innovation and societal impact. It may, for example, be used to develop applications and services that solve social issues, such as tracking air quality, monitoring public transit, or displaying government spending. By providing individuals with information to hold governments and organisations responsible, open data may also empower citizens to participate in decision-making processes. However, the reality of open data may not always match the idealistic vision of complete openness. Despite the noble intentions, many open data initiatives face significant challenges in breaking data silos and promoting standardisation. Several factors can limit the accessibility and transparency of open data, raising questions about its true openness.

Challenges with Open Data

According to the European Commission's Data Strategy, open-source initiatives are "a crucial resource for economic growth, job creation, and societal improvement." Yet open data initiatives have not been adopted widely or uniformly. Research suggests that this fragmentation is caused not only by disparities in resources, funding, and general data literacy within and between nations, but also by differing definitions of open data. According to the Open Data Barometer report, only one in every ten government datasets across 115 countries is fully available.

Listed below are some of the challenges with Open Data:

The Accessibility Challenge:

One of the key challenges in open data is ensuring that it is accessible to all members of the public. While many governments and organisations publish data on open data portals, issues such as data formats, technical skills, and usability can hinder its accessibility. Data may be stored in formats that require specialised skills or tools to analyse, making it difficult for non-experts to access and use. Additionally, not all data is published in a timely manner or updated often enough, which can reduce its usefulness and relevance for decision-making.
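As a small illustration of the format issue, the Python sketch below publishes the same records both as CSV and as JSON, two open, widely supported formats that can be opened without specialised software. The `publish_open_formats` function and the spending fields are hypothetical and only illustrate the idea; this is not how any particular portal works.

```python
import csv
import io
import json


def publish_open_formats(rows: list[dict]) -> tuple[str, str]:
    """Serialise the same records as CSV text and as JSON text."""
    if not rows:
        return "", "[]"
    # Use the keys of the first record as the CSV header.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue(), json.dumps(rows, indent=2, ensure_ascii=False)


# Hypothetical example: a small slice of a municipal spending table.
spending = [
    {"district": "North", "year": 2022, "budget_eur": 1200000},
    {"district": "South", "year": 2022, "budget_eur": 950000},
]
csv_text, json_text = publish_open_formats(spending)
```

The CSV version opens in any spreadsheet program, while the JSON version is convenient for developers building applications on top of the data.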

The Representation Challenge:

Another challenge in open data is ensuring that it is representative of the entire population. Data can reflect existing biases and inequalities, and open data initiatives may not always address these disparities. For example, data related to vulnerable or marginalised populations, such as racial or ethnic minorities, may be limited or omitted altogether, leading to incomplete or skewed representations of reality. This can perpetuate existing inequalities and hinder efforts to achieve social justice and equity.

The Privacy and Security Challenge:

Data privacy and security are crucial concerns in the era of digital data. Open data initiatives must strike a delicate balance between openness and protecting individuals' privacy rights. Sensitive information, such as personal, health, or financial data, may be included in open data sets, and improper handling or sharing of such data can lead to unintended consequences, including identity theft, discrimination, or surveillance. Ensuring appropriate safeguards for data privacy and security is essential to maintain the trust and confidence of the public.
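One common safeguard is to transform records before release. The Python sketch below, which assumes hypothetical field names, replaces a direct identifier with a keyed hash (HMAC-SHA256), drops the name, and coarsens age and postcode. It is a minimal illustration rather than a complete privacy solution: pseudonymised data can still be re-identified when combined with other sources, so pseudonymisation alone is generally not treated as anonymisation.

```python
import hashlib
import hmac

# Hypothetical secret held by the publisher and never released with the data.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymise_id(person_id: str) -> str:
    """Replace a direct identifier with a truncated keyed hash (HMAC-SHA256).

    The same ID always yields the same token, so records remain linkable,
    but without the key the token cannot be recomputed from a guessed ID.
    """
    digest = hmac.new(SECRET_KEY, person_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(age: int, width: int = 10) -> str:
    """Generalise an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def prepare_for_release(record: dict) -> dict:
    """Build the public version of a record (field names are hypothetical)."""
    return {
        "pid": pseudonymise_id(record["person_id"]),
        "age_band": age_band(record["age"]),
        # Keep only the first part of the postcode (the broader area).
        "postcode_area": record["postcode"].split()[0],
        "diagnosis": record["diagnosis"],
        # "name" is deliberately not copied into the released record.
    }


released = prepare_for_release(
    {"person_id": "A123", "name": "Jane Doe", "age": 34,
     "postcode": "SW1A 1AA", "diagnosis": "asthma"}
)
```

Here the released record contains an opaque `pid`, the band "30-39", the area "SW1A" and the diagnosis, but no name.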

The Context and Explanation Challenge:

Data can be complex and technical, and without proper context, it can be misinterpreted or used in misleading ways. Providing clear documentation, metadata, and explanations is crucial in ensuring that users understand the limitations, assumptions, and caveats associated with the data. This can help prevent misinterpretation or misuse of the data and promote informed decision-making. Contextual information is necessary to ensure that open data is truly open.
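One way to keep context attached to a dataset is to publish a structured metadata record alongside it and to check that the record is filled in before release. The Python sketch below uses hypothetical field names, loosely in the spirit of catalogue vocabularies such as W3C DCAT; it is not an implementation of any particular standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record published alongside a dataset."""
    title: str
    publisher: str
    licence: str
    last_updated: str        # ISO 8601 date, e.g. "2023-04-01"
    update_frequency: str    # e.g. "hourly", "monthly"
    methodology: str         # how the data was collected or derived
    known_limitations: list[str] = field(default_factory=list)
    column_descriptions: dict[str, str] = field(default_factory=dict)


def missing_context(meta: DatasetMetadata) -> list[str]:
    """Return the names of metadata fields that were left empty."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name)]


meta = DatasetMetadata(
    title="Hourly air quality readings",
    publisher="Example City Council",
    licence="CC-BY-4.0",
    last_updated="2023-04-01",
    update_frequency="hourly",
    methodology="",
)
```

For this example record, `missing_context(meta)` returns `["methodology", "known_limitations", "column_descriptions"]`, flagging exactly the context a reader would need to interpret the readings correctly.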

The Engagement and Participation Challenge:

Open data may not be truly open if it is not accompanied by meaningful engagement and participation. Open data initiatives should strive to involve diverse stakeholders, including the public, in the design, implementation, and evaluation of open data programs. This can help ensure that the data being made available is relevant, meaningful, and aligned with the needs and aspirations of the communities it is intended to serve.

How Open Data Fails Its Purpose

Having discussed the high-level challenges that open data may face, it is now time to understand when and how open data fails to be truly open.

The goal of open data platforms was to enable not just data access but also value creation. While open data initiatives have improved research and decision-making by making data accessible on a greater scale, they do not necessarily improve the quality of the data that is created and made available. Datafication, in particular, has now spread to most industries, generating ever-increasing volumes of data without a sustainable data structure or appropriate granularity behind it. Additionally, without open-source database standards, released data stays segregated. While each dataset may be acquired in a variety of ways, the lack of a standardised architecture and ontology for organising this data makes it difficult to combine with other datasets. Publishing such data does not, on its own, achieve the goals of open data, and it leaves the data's latent decision-support potential untapped.

Finally, the impact of open data varies by country, depending on whether it is developed, developing, or least developed. Even with funding, maintaining and updating data may be challenging, especially if the open data program is not expanded. Such projects are unsustainable in the long term and result in inaccurate, incomplete, and out-of-date data. Open data alone therefore does not lead to better decision-making or to a better assessment of a circular economy and its associated challenges. To give meaning to data and transform it into knowledge, data must be manageable, reliable, clear, and sustainable. Only then can it be efficiently used, reused, shared, and recycled, and have the expected positive effects on assessing, monitoring, and achieving circular economy objectives.

The Problem with Data Silos and Standardisation

Many businesses and industries have data silos: separate data repositories that are not easily accessible from, or connected to, other data sources. Open data projects seek to break down data silos by making data freely available to the public and encouraging its reuse for a variety of purposes. However, even with open data, silos can persist because of poor data quality and a lack of standardisation.

The first of these is the often poor quality of data made available through open data initiatives. Data quality issues can arise from various sources, such as inaccuracy, incompleteness, inconsistency, and lack of timeliness. Poor-quality data significantly reduces the usability and reliability of open data, making it difficult for users to trust and use the data effectively. It can also discourage data providers from sharing their data, further contributing to data silos.

In addition to data quality issues, the lack of standardisation across open data initiatives poses a significant challenge. Each initiative may have its own data standards, formats, and protocols, which creates inconsistencies and incompatibilities when trying to integrate and reuse data in different contexts. This lack of a common data architecture hinders interoperability and data exchange between open data initiatives, making meaningful data integration and analysis difficult.
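Some of the quality dimensions mentioned above can be measured quite simply. The Python sketch below, using hypothetical column names and thresholds, computes completeness for a column, finds repeated record IDs as a basic consistency check, and flags a dataset as stale when it has not been updated within an agreed number of days. Real quality frameworks cover much more, for example accuracy against a reference source.

```python
from datetime import date


def completeness(rows: list[dict], column: str) -> float:
    """Share of rows in which `column` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def duplicate_keys(rows: list[dict], key: str) -> set:
    """Key values that occur in more than one row (a basic consistency check)."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(key)
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


def is_stale(last_updated: date, max_age_days: int, today: date) -> bool:
    """True if the dataset is older than the agreed update interval."""
    return (today - last_updated).days > max_age_days


# Hypothetical air-quality extract with two gaps and one repeated record ID.
rows = [
    {"record_id": "r1", "pm25": "12"},
    {"record_id": "r2", "pm25": ""},
    {"record_id": "r1", "pm25": "9"},
    {"record_id": "r3"},
]
```

On this extract, `completeness(rows, "pm25")` is 0.5 and `duplicate_keys(rows, "record_id")` is `{"r1"}`; publishing such scores alongside a dataset lets users judge how far they can trust it.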

This is where newer concepts such as data circularity, and approaches such as data repurposing, come in handy. To ensure that open data is truly open and helps create value, the focus has to shift towards data circularity. In the next section, we look at the two concepts and how they complement one another.

Data Circularity vs Open Data

Data Circularity is a concept that goes beyond simply making data open. It emphasises the need to create a circular flow of data where data is not only open for access but also actively reused, repurposed, and shared among different stakeholders in a continuous loop. Data Circularity promotes the idea of data as a reusable resource that can be utilised multiple times across different domains and applications, rather than being used in a linear and one-time manner.

One of the key differences between Open Data and Data Circularity is the focus on reuse and repurposing of data in the latter. Open Data is primarily about making data available for access, while Data Circularity takes it a step further by emphasising the need to actively reuse and repurpose data to create value and generate positive impact. Data Circularity encourages stakeholders to think beyond their immediate use of data and consider how the data they produce or consume can be utilised by others in different contexts to solve different problems or create new opportunities.

Another difference between the two concepts is the mindset and culture they promote. Open Data is often driven by the idea of sharing data for transparency and accountability purposes, and it may be mandated by regulations or policies. On the other hand, Data Circularity promotes a culture of collaboration, innovation, and creativity, where stakeholders actively seek opportunities to share and reuse data to create value and achieve common goals. It requires a proactive approach towards data sharing and collaboration, and a willingness to think beyond traditional silos and boundaries.

Data Circularity also has the potential to address some of the challenges and risks associated with Open Data. Open Data can sometimes raise concerns about data privacy, security, and misuse. Data Circularity, on the other hand, promotes responsible and ethical data sharing by emphasising the need to repurpose data in a way that respects privacy and security concerns and aligns with legal and ethical standards. It encourages stakeholders to be mindful of the context in which data is shared and repurposed, and to take the necessary precautions to protect data integrity and confidentiality.

While Open Data and Data Circularity share some similarities in promoting data transparency and accessibility, they are not the same concept. Data Circularity goes beyond simply making data open by emphasising the need for active reuse and repurposing of data in a circular and sustainable manner. It promotes a culture of collaboration, innovation, and responsible data sharing, and has the potential to generate more value and impact from data. By adopting a Data Circularity mindset, organisations can unlock the full potential of data as a reusable resource and contribute towards building a more sustainable and efficient data ecosystem.

New approaches: Open Data Ecosystems

The notion of open data ecosystems emerged in response to the drawbacks of open data. Data ecosystems are platforms that integrate numerous data sources on the basis of a common data taxonomy and ontology, and that give users an incentive to share and discover new data sources.

These platforms are driven not just by data availability but also by the value of the data and, as a result, its potential for sharing and cooperation. One of the criteria for publishing data to the platform is that users map and upload standardised data rather than using their own data structure. Individuals, civil society, academia, governments, international organisations, and the corporate sector can all benefit from the flow of quality data through data ecosystems.
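The requirement to map local data structures onto a shared one can be sketched as a per-publisher lookup from local column names to common schema fields. In the Python example below, the schema, the two publishers, and their column names are all hypothetical; a real ecosystem would express its schema in a formal ontology rather than a Python dictionary.

```python
# Hypothetical shared schema agreed by the ecosystem: field name -> type.
COMMON_SCHEMA: dict[str, type] = {
    "station_id": str,
    "timestamp": str,
    "pm25_ugm3": float,
}

# Hypothetical per-publisher mappings from local column names to schema fields.
MAPPINGS: dict[str, dict[str, str]] = {
    "city_a": {"sensor": "station_id", "time": "timestamp", "pm2_5": "pm25_ugm3"},
    "city_b": {"StationCode": "station_id", "MeasuredAt": "timestamp", "PM25": "pm25_ugm3"},
}


def to_common_schema(publisher: str, record: dict) -> dict:
    """Rename and type-convert one record so it conforms to COMMON_SCHEMA."""
    mapping = MAPPINGS[publisher]
    out = {}
    for local_name, value in record.items():
        target = mapping.get(local_name)
        if target is None:
            continue  # drop columns the shared schema does not define
        out[target] = COMMON_SCHEMA[target](value)
    missing = set(COMMON_SCHEMA) - set(out)
    if missing:
        raise ValueError(f"{publisher}: missing required fields {sorted(missing)}")
    return out


a = to_common_schema("city_a", {"sensor": "A1", "time": "2023-05-01T10:00Z",
                                "pm2_5": "12.5", "note": "calibrated"})
b = to_common_schema("city_b", {"StationCode": "B7", "MeasuredAt": "2023-05-01T10:00Z",
                                "PM25": 8})
```

After mapping, records from both cities share the same field names and types, so they can be combined directly; a record that cannot be mapped completely is rejected at upload time instead of silently creating a new silo.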

Since data moves across multiple systems, building a data ecosystem requires interoperability from platform to platform. One of the pillars of such an approach is the development and distribution of data standards and rules. In the long run, this encourages collaboration between and among diverse actors by harnessing the decision-support potential of present and future data.

A heavily underutilised feature of data is that it does not lose or diminish in value after its initial use, which is why repurposing data goes a long way. Repurposing data, however, requires a long-term data infrastructure that enables not just availability but also the secure flow of data in forms that make it relevant to multiple users and use cases. This should be an inbuilt characteristic of open data ecosystems, in line with the principles of data circularity.

Trust is the second basis of such a data ecosystem. An integrated circular data ecosystem requires several degrees of trust, ranging from individual data privacy concerns to data sovereignty concerns. For data to flow seamlessly from actor to actor and platform to platform, the data creator must believe in the ecosystem's safety and security, while the potential client of their data must trust that the data supplied is of the required quality. The ecosystem governance model, described as "the set of rules, processes, and practices that govern how interactions between data economy participants are conducted," is a critical component of trust in a data ecosystem. To create a long-term data ecosystem, the governance model must address all stakeholders' privacy and security concerns.

Summary

In this post, we discussed the challenges associated with open data: accessibility, representation, privacy and security, context and explanation, and engagement and participation. We also explained how open data fails its purpose when data remains segregated due to the lack of a standardised architecture and ontology for organising it. Finally, we discussed open data ecosystems, which have the potential to support more informed, evidence-based decision-making while following the principles of data circularity.

References

European Commission. (2014) Towards a circular economy: A zero waste programme for Europe. Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions, COM. Brussels. Available online: https://eur-lex.europa.eu/resource.html?uri=cellar:50edd1fd-01ec-11e4-831f-01aa75ed71a1.0001.01/DOC_1&format=PDF
Joseph T. Bonivel Jr. and Solomon Wise. (2022) The Data Divide: How Emerging Technology and its Stakeholders can Influence the Fourth Industrial Revolution. Atlantic Council.
European Commission. (2020) Strategy for Data. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52020DC0066
Lorenzo Bigagli, Stefano Nativi and Merel Noorman. (2017) Visions of open data. In Open Data and the Knowledge Society by Bridgette Wessels, Rachel L. Finn, Kush Wadhwa, Thordis Sveinsdottir. Amsterdam University Press.
Ubaldi, B. (2013) Open Government Data: Towards Empirical Analysis of Open Government Data Initiatives. OECD Working Papers on Public Governance No. 22.
World Wide Web Foundation. (2016) Open Data Barometer Report, 4th Edition. Available online: https://opendatabarometer.org/doc/4thEdition/ODB-4thEdition-GlobalReport.pdf
Open Government Data. (2007) Open Government Data Principles. Available online: https://opengovdata.org/
Kenneth Cukier and Viktor Mayer-Schoenberger. (2013) The Rise of Big Data: How It's Changing the Way We Think About the World. Foreign Affairs, Vol. 92, No. 3, pp. 28-40.
Michael Canares and Satyarupa Shekhar. (2016) Open Data and Sub-national Governments: Lessons from developing countries. Open Data for Development.
Tim Davies. (2014). Open Data in Developing Countries: Emerging Insights from Phase 1.
Backx, M. (2003). Gebouwen redden levens. Toegankelijkheidseisen van gebouwgegevens in het kader van de openbare orde en veiligheid. [Buildings save lives. Accessibility requirements for buildings in the context of the public order and safety]. MSc. dissertation, Delft University of Technology.
Wikipedia. (2023) DIKW pyramid. Available online: https://en.wikipedia.org/wiki/DIKW_pyramid
Declan Deasy, Yaroslav Eferin, Oleg Petrov. (2022) Integrated national data ecosystems: the next stage of digital transformation. World Bank Blogs. Available online: https://blogs.worldbank.org/digital-development/integrated-national-data-ecosystems-next-stage-digital-transformation
World Bank. (2021) Data for Better Lives. Available online: https://www.worldbank.org/en/publication/wdr2021
Ericsson. (2020) Data interoperability across IoT ecosystems with One Data Model (OneDM). Ericsson Blog. Available online: https://www.ericsson.com/en/blog/2020/9/data-interoperability-across-iot-ecosystems-with-onedatamodel
One Data Model. (2022) Available online: https://onedm.org/
ELISE. (2023) Establishment of Sustainable Data Ecosystems - Recommendations for the evolution of spatial data infrastructures into self-sustainable data ecosystems. Available online: https://joinup.ec.europa.eu/collection/elise-european-location-interoperability-solutions-e-government/establishment-sustainable-data-ecosystems-recommendations-evolution-spatial-data-infrastructures
ONTOCHAIN. (2023) Available online: https://ontochain.ngi.eu/
Gaia X. (2020) Gaia X: A Federated Data Infrastructure for Europe. Available online: https://www.data-infrastructure.eu/Redaktion/EN/Dossier/gaia-x.html
European Commission. (2022) Interoperable Europe Act Proposal. Available online: https://commission.europa.eu/publications/interoperable-europe-act-proposal_en
ITIF. (2023) US-EU Data Sharing Partnership for AI Is a Welcome Step Forward, Says Center for Data Innovation. Available online: https://itif.org/publications/2023/01/27/us-eu-data-sharing-partnership-for-ai-is-a-welcome-step-forward/