A data warehouse is a central repository of integrated data from disparate sources which has been optimized for analytics and reporting. A cloud data warehouse takes a new approach by offering data warehouse functionalities as a service instead of in a traditional on-premise data center.
Availing of a data warehouse as a service (DWaaS) approach offers several benefits, including:
- Low-cost data warehousing because it negates the high expenses of setting up and maintaining an on-premise data center.
- More time to focus on the value derived from data instead of the intricacies of managing a data center.
- Effortless scalability, with the ability to easily adapt to increased (or decreased) data volumes and workloads.
- Makes data warehousing more accessible to a greater number of companies that might not have the resources to build their own on-premise data center.
The transition to a cloud data warehouses can prove difficult for enterprises. Whether your company is moving from a legacy on-premise setup or the cloud represents your first foray into data warehousing, these five best practices will help you get it right from the start.
Each cloud data warehouse service provider has its own unique architecture, however, there are some common architectural trends you’ll need to become familiar with before choosing a vendor.
- Massively parallel processing: several leading cloud data warehouse vendors use a massively parallel processing architecture, which partitions data across many servers or nodes, each with its own processor. The MPP design coordinates query workloads over all processors simultaneously for much faster data querying.
- Columnar storage: in a columnar database, tables are stored by their column values rather than by row. This type of design leads to better performance by reducing the time needed to answer queries.
- Serverless: some cloud data warehouse vendors have a serverless architecture wherein you don’t need to provision or manage any infrastructure.
Panoply has a good resource on cloud data warehouse architecture compared to traditional on-premise data warehouse architecture.
Security and compliance concerns are significant stumbling blocks for enterprises considering a cloud data warehouse service, and rightly so. It is no surprise that decision-makers worry about the risks of information security violations in the cloud—when data is trusted in the hands of a vendor, this can create fear due to a lack of control. The fear of security issues is further compounded when you consider that the data warehouse is a core component of business intelligence.
It’s imperative to perform due diligence and ensure potential data warehouse cloud vendors can satisfy all your security and data protection requirements. Important things to look out for include:
- Strong user authentication policies
- Data encryption
- Compliance with relevant industry regulations, for example, HIPAA for protected health information
- Detailed documentation highlighting the vendor’s policies for data security, protection, archiving, and replication on virtualized environments and the underlying hardware
Don’t go all-in from the get-go—a phased approach makes the most sense in which you first migrate the most straightforward, least technical workloads to the cloud-based data warehouse. Variable workloads, such as financial reporting activities are good candidates for initial migration because they do not continuously use data warehouse resources. Business unit specific workloads as part of a departmental data mart are also good for initial migration.
Sustained workloads such as the daily reporting that drives tactical business decisions can be migrated later, as can business-critical auditable workloads. Doing things incrementally can build confidence and trust in the cloud data warehouse, and the experience gained will be invaluable for migrating more complex workloads and use cases.
Traditionally, ETL (Extract, Load, Transform) tools are used to move and integrate data from transactional systems into a data warehouse. The data is consolidated and transformed before it ends up in the data warehouse. It’s important for enterprises to ensure existing ETL tools and data flows are validated for a cloud implementation with support for cloud-native technologies
For enterprises making their first move into data warehousing, it might be best to look for native cloud data integration tools (ETL as a service), which have been built to service the challenge of integrating data from transactional on-premise database systems into a cloud repository.
Increasingly, enterprises are using an ELT approach in which raw data goes straight from source systems to the cloud data warehouse, where it is transformed as-needed for particular business use cases. The power of cloud-based infrastructure makes ELT more feasible.
Even though the cloud is undoubtedly a cheaper option for data warehousing, it’s still vital to have transparency on the potential costs incurred from data warehouse activities. Different vendors use different cost models, with some charging for the amount of data queried. Most vendors also charge for data egress (moving data out of the public cloud back to on-premise systems).
It’s a good idea in all cases to anticipate costs and communicate to relevant business users the importance of being efficient with the queries they perform on data residing in the data warehouse. Visibility on other costs is also important, including data backup and disaster recovery expenses.
These best practices will help guide you in choosing the right service provider and getting set up in a cloud data warehouse. Once you have decided to make the transition, make sure your company builds a strong rapport with your chosen vendor, with a focus on responsiveness, performance, and engagement.