Over the last 10 years there has been an explosion in data gathering. For example, a 2021 IDC study estimated that approximately 270 GB of healthcare and life science data will be created for every person in the world, and the National Library of Medicine projects that the world’s population will reach 8 billion people by 2025. The scale and scope of the data available to leverage are both enormous and daunting.
This is not a new problem; the phrase “Data Rich, Information Poor” was coined in 1996 to describe the struggle healthcare organizations faced in reviewing medical data, records, patient information, and medical history. If this was identified as a problem more than 25 years ago, why is it that we collect so much more data today yet remain information poor? CIOs and CDOs are asking the same question: “How can we harness our data to gain insight into our business and create new and innovative ways to enhance customer experiences?”
Manage Data Correctly With Cloud
It all starts with how you manage and leverage the data you collect. For healthcare and life science organizations, for example, the challenge is how best to store data so it can be analyzed to provide value, and so that the same data can be used to improve patient care and reduce costs.
Managing data sources is another challenge that is growing exponentially: increasingly diverse data arriving from new sources must be securely accessed and analyzed by any number of applications and people.
This creates the need for scalable, adaptable, and secure cloud infrastructure. It’s a primary driver for organizations to move to the cloud from legacy on-premises systems, and it opens new possibilities based on the pace of innovation from cloud providers. Choosing a provider based on these parameters will help enable the goal of better management and insights derived from the data.
Capitalize on Connectivity and Collaboration
There is a major shift away from traditional data warehouse architecture, largely because data ends up scattered across many different silos and the compute capacity to analyze it accurately is lacking. The result is that many organizations want to extract more value from their data but struggle to capture, store, and analyze everything generated by today’s modern, digital businesses.
As companies accumulate vast amounts of data, that data lives in different silos, making it difficult to analyze. The silos cause multiple problems: the data needed for a given workload may be split across several silos and inaccessible; the silo where the data lives might not meet the price-performance requirements of that workload; and each silo may require its own management, security, and authorization approach, all of which increases operational cost and risk.
Put the Pieces Together to Succeed at Scale
Organizations are looking for a highly scalable, available, secure, and flexible data storage solution that can handle extremely large data sets. To achieve this, companies should build data platforms that can store all the structured and unstructured data, use an open data format, and tag data in a central, searchable catalog. They also need to be able to run multiple analytics services against their data to ensure they have the right tool for the job.
For example, healthcare organizations can build state-of-the-art platforms such as AI-assisted decision support systems that apply machine learning to existing data to analyze images and symptoms. The resulting analytics can help care providers predict levels of need.
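As a minimal sketch of what “store in an open format and tag it in a searchable catalog” can look like in practice, the snippet below writes a small, invented patient dataset to Parquet and registers it in a simple in-memory catalog. The write_and_catalog helper and the catalog layout are illustrative assumptions, not the API of any particular service.

```python
# Minimal sketch: land data in an open columnar format (Parquet) and tag it
# in a searchable catalog. The catalog here is a plain dict for illustration;
# in practice it would be a managed service such as AWS Glue or Apache Atlas.
import pandas as pd

def write_and_catalog(df, path, catalog, name, owner, tags):
    # An open format keeps the data engine-agnostic (Spark, Trino, Athena, ...).
    df.to_parquet(path, index=False)
    # Register the dataset so any analytics service or team can discover it.
    catalog[name] = {
        "location": path,
        "format": "parquet",
        "owner": owner,
        "tags": tags,
        "columns": list(df.columns),
    }

catalog = {}
patients = pd.DataFrame({"patient_id": [1, 2], "ldl_mg_dl": [110, 145]})
write_and_catalog(patients, "patients.parquet", catalog,
                  name="cardiology.patients", owner="care-analytics",
                  tags=["phi", "cardiology"])
```

The point is not the dict itself but the pattern: every dataset lands in an open format and is discoverable, with its owner and tags, by whichever analytics engine is the right tool for the job.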
In a world full of structured and unstructured data, there exists a deep trove of valuable information. By moving that data into a solid cloud infrastructure and leveraging advanced data analytics, companies can more effectively mine and gather the information they need, making them both data rich and information wealthy.
Where We Stand Today (May 2025)
Fast-forward to 2025, and the data landscape has grown not just in volume but in velocity and variety:
Zettabyte Era
Global data volumes are projected to surpass 175 ZB by 2025, more than five times the 33 ZB created in 2018.¹
Edge & Hybrid Workloads
Nearly 75% of enterprise-generated data is created and processed at the edge, powering use cases from autonomous vehicles to real-time personalization.²
AI-Native Demand
Generative AI and ML pipelines now require curated, lineage-tracked, and quality-gated datasets—manual data prep consumes up to 70% of engineering time.
The old model of “lift and shift” into monolithic lakes falls short. Today’s organizations must adopt federated architectures and automated governance to keep pace.
Charting the Next 24 Months
Domain-Driven Data Mesh
Teams own and publish “data products” (e.g., customer profiles, risk scores) into a shared catalog. This can reduce time-to-insight from weeks to hours and align teams behind clear SLAs.
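As an illustrative sketch of what a published data product might carry with it (the field names and SLA values below are invented for this example, not a formal standard), a small machine-readable contract can travel with each product into the shared catalog:

```python
# Illustrative data-product contract for a domain-driven data mesh.
# Field names and SLA thresholds are hypothetical examples.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str                  # e.g. "crm.customer-profiles"
    owner_team: str            # the domain team accountable for the product
    schema: dict               # column name -> type: the published interface
    freshness_sla_hours: int   # how stale the data may get before breaching SLA
    quality_checks: list = field(default_factory=list)

customer_profiles = DataProduct(
    name="crm.customer-profiles",
    owner_team="customer-domain",
    schema={"customer_id": "string", "segment": "string", "updated_at": "timestamp"},
    freshness_sla_hours=24,
    quality_checks=["customer_id is unique", "segment is not null"],
)
```

Because the contract is code, consumers can discover the product, check its SLA programmatically, and hold the owning team to it.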
Metadata-Powered Governance Fabric
Automated engines (e.g., AWS Glue, Apache Atlas) tag, classify, and enforce policies via code. Privacy, masking, and retention rules apply at ingestion—no separate compliance projects required.
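As a hedged sketch of “privacy and masking rules applied at ingestion,” the snippet below classifies columns and hashes PII before data lands in the platform. The column list and the SHA-256 masking rule are assumptions made for illustration; they are not the specific behavior of Glue or Atlas.

```python
# Toy policy-as-code example: classify columns at ingestion and mask PII
# before the data is stored. Column names and the hashing rule are
# illustrative choices, not a particular product's implementation.
import hashlib
import pandas as pd

PII_COLUMNS = {"email", "phone", "ssn"}  # the classification policy, kept in code

def mask_value(value):
    # A one-way hash preserves joinability without exposing the raw value.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:16]

def apply_ingestion_policy(df):
    masked = df.copy()
    for column in df.columns:
        if column in PII_COLUMNS:
            masked[column] = masked[column].map(mask_value)
    return masked

raw = pd.DataFrame({"email": ["a@example.com"], "ldl_mg_dl": [120]})
clean = apply_ingestion_policy(raw)  # email is hashed, the clinical value is kept
```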
AI-Augmented Observability
The data observability market—valued at $2.3 billion in 2023 and growing over 11% annually³—will evolve into self-healing pipelines that recommend or enact fixes, cutting manual toil by up to 70%.
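A minimal sketch of the self-healing idea, under the assumption that a pipeline exposes a rerun hook: a freshness check that, on failure, triggers remediation instead of only paging a human. The 24-hour threshold and the rerun_pipeline callback are hypothetical.

```python
# Minimal data-observability check with an automated remediation hook.
# The freshness threshold and the remediation callback are illustrative.
from datetime import datetime, timedelta, timezone

def is_fresh(last_loaded_at, max_age_hours=24):
    return datetime.now(timezone.utc) - last_loaded_at <= timedelta(hours=max_age_hours)

def observe_and_heal(last_loaded_at, rerun_pipeline):
    if is_fresh(last_loaded_at):
        return "healthy"
    # Self-healing step: attempt an automated fix before alerting anyone.
    rerun_pipeline()
    return "stale: backfill triggered"

status = observe_and_heal(
    last_loaded_at=datetime.now(timezone.utc) - timedelta(hours=30),
    rerun_pipeline=lambda: print("re-running ingestion job..."),
)
print(status)  # -> "stale: backfill triggered"
```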
Privacy-Enhancing Collaboration
Data clean rooms and secure multi-party computation allow cross-enterprise analytics without exposing raw records—ideal for co-marketing and risk benchmarking.
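As a toy illustration of the clean-room idea (and only that: real deployments rely on secure multi-party computation or hardware enclaves, which this sketch does not implement), two parties can compare salted hashes of identifiers so that only the size of the overlap is learned, never the raw records:

```python
# Toy privacy-preserving audience overlap: each party hashes identifiers
# with a pre-agreed salt, and only hashed tokens cross the trust boundary.
# This is a simplification, not actual secure multi-party computation.
import hashlib

SHARED_SALT = b"agreed-out-of-band"  # hypothetical secret shared by both parties

def tokenize(emails):
    return {hashlib.sha256(SHARED_SALT + e.lower().encode()).hexdigest()
            for e in emails}

party_a = tokenize(["ana@example.com", "bo@example.com"])
party_b = tokenize(["bo@example.com", "cy@example.com"])

overlap = len(party_a & party_b)  # raw emails are never exchanged
print(f"shared audience size: {overlap}")  # -> 1
```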
Sustainable Data Operations
Carbon-aware scheduling and low-emission region targeting will optimize both cost and ESG impact, as sustainability becomes a board-level mandate.
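A small sketch of low-emission region targeting, with invented carbon-intensity and latency figures: pick the lowest-carbon region that still meets a latency constraint.

```python
# Sketch of carbon-aware placement. The regions, gCO2/kWh intensities, and
# latency figures below are invented for illustration.
REGION_CARBON = {"eu-north": 30, "us-east": 380, "ap-south": 630}
REGION_LATENCY_MS = {"eu-north": 120, "us-east": 40, "ap-south": 210}

def pick_region(max_latency_ms):
    candidates = [r for r, lat in REGION_LATENCY_MS.items() if lat <= max_latency_ms]
    return min(candidates, key=REGION_CARBON.__getitem__)

print(pick_region(max_latency_ms=150))  # -> "eu-north"
```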
Turning Insight into Advantage
The journey from Data Rich to Information Wealth starts with an integrated strategy of architecture, automation, and culture:
Pilot with Purpose: Focus on two high-value domains (e.g., fraud detection, supply chain). Measure time-to-insight, incident rates, and cost savings in six months.
Empower Teams: Launch a “data ambassadors” program, embedding champions in every business unit. Tie their objectives to data-product health metrics (freshness, quality, usage).
Automate & Scale: Package mesh and fabric components into infrastructure-as-code modules. Deploy self-service pipelines with low-code tools to halve central engineering tickets in a year.
By blending the lessons of 2022 with today’s innovations—domain meshes, governance fabrics, AI observability, and edge convergence—you’ll turn raw zettabytes into a sustainable competitive edge.
Link to the article I published on this topic in 2022:
https://www.informationweek.com/data-management/trolling-the-data-rich
Sources used for this article
¹ IDC, Global Datasphere Forecast, 2018–2025
² Gartner, Edge Computing Trends, 2024–2025
³ MarketsandMarkets, Data Observability Market—Global Forecast to 2033