Marwa Talaat for AWS MENA Community

Amazon Connect Data Lake Best Practices | AWS Whitepaper Summary

Organizations are developing data lake strategies to harness intelligence from diverse and ever-growing data. According to a survey by Aberdeen, today’s organizations manage an average of 33 unique data sources for analytics and experience 50% year-over-year data volume growth. Such rapid growth creates challenges in data management and storage capacity.
In a traditional on-premises call center, the communication hardware, software, and infrastructure are all housed and operated on the business's premises. You therefore need a dedicated IT team responsible for installation, maintenance, software support, and much more, all handled internally. This means heavy recurring fees over the long term for small and midsize businesses.
Therefore, organizations need a robust and cost-effective platform that lets them get the most benefit from advanced analytics when running a solution like a contact center. Amazon Web Services (AWS) provides customers with a comprehensive set of services and a scalable platform to ensure high availability, security, and resiliency of a data lake in the cloud.

This whitepaper outlines the best practices for architecting a contact center data lake with Amazon Connect.

The figure shows how a traditional contact center involved multiple proprietary systems, resulting in disparate data sources containing data in various formats. This makes it hard to standardize and consolidate information, which slows down the discovery of new business insights or possible operational issues.


Data plays a crucial role in the success of contact center systems. A streamlined contact center design that uses a data lake can help customer service agents deliver a better customer experience.

What is a Data Lake?
A data lake is more than just a data repository. It is a centralized store that lets you keep all your structured and unstructured data at any scale. You can store your data as-is, without first structuring it, and then run different types of analytics on it: big data processing, real-time analytics, and machine learning for predictive decisions.
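
To make "store as-is, structure later" concrete, here is a minimal sketch of applying a schema at read time rather than at write time. The record fields (`contact_id`, `channel`, `hold_seconds`) and the `read_with_schema` helper are hypothetical and exist only for this illustration; they are not part of any AWS API.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (no upfront schema).
RAW_LINES = [
    '{"contact_id": "c-1", "channel": "VOICE", "hold_seconds": 42}',
    '{"contact_id": "c-2", "channel": "CHAT"}',
    '{"contact_id": "c-3", "channel": "VOICE", "hold_seconds": "17", "note": "free text"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: pick the wanted fields and coerce types.

    `schema` maps a field name to a (converter, default) pair. Fields missing
    from a record get the default; extra fields are simply ignored.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (convert, default) in schema.items():
            row[field] = convert(record[field]) if field in record else default
        rows.append(row)
    return rows


# A consumer interested only in hold time defines its own view of the raw data.
hold_schema = {"contact_id": (str, ""), "hold_seconds": (int, 0)}
rows = read_with_schema(RAW_LINES, hold_schema)
total_hold = sum(row["hold_seconds"] for row in rows)
```

A different consumer could read the same raw lines with a different schema, which is the main contrast with a data warehouse, where the structure is fixed before the data is loaded.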

Is there a difference between a Data Lake and a Data Warehouse? When should you use each?
Yes. Which one to use depends on the requirements; a typical organization will require both a data warehouse and a data lake, as they serve different needs and use cases. The following table shows some differences in characteristics between them.

Essential elements to consider in building Data Lakes:

1. Data Movement: A data lake lets you import any amount of data, including real-time data, from multiple sources. Because the data is kept in its original format, you can scale to data of any size without first spending time defining structures, schema, and transformations.
2. Securely store and catalogue data: A data lake lets you store relational and non-relational data. It also helps you understand what data is in the lake through crawling, cataloguing, and indexing. Finally, the data must be secured to ensure your data assets are protected.
3. Analytics: A data lake lets various roles, such as data scientists, access data with their choice of analytic tools and frameworks.
4. Machine Learning: A data lake lets organizations generate insights, build machine learning models, and produce predictions and recommendations to achieve the optimal result.

What are the main challenges of a Data Lake?

  1. Architecture: raw data is stored with no oversight of its contents.
  2. Making data usable: a data lake needs defined mechanisms to catalogue and secure data.

Should you host your data lake in the cloud?
Data Lakes are an ideal workload to be deployed in the cloud because the cloud provides performance, scalability, reliability, availability, a diverse set of analytic engines, and massive economies of scale.

Data Lakes in the cloud on AWS

AWS delivers the breadth and depth of services to build a secure, scalable, comprehensive, and cost-effective data lake solution. You can use the AWS services to ingest, store, find, process, and analyze data from a wide variety of sources. The following figure shows a strategic approach for a contact center with Amazon Connect.


Amazon Connect

Amazon Connect is a fully managed, cloud-based, artificial intelligence (AI) enabled contact center that you can set up within minutes. It follows a pay-as-you-go model, is easy to use and cost-effective, and has no infrastructure to manage or upfront costs.

Amazon Connect Benefits in Contact Center

Considerations when designing a data lake:

  • How data is collected, and its types and structure.
  • How to store and share petabytes of data on demand, globally and cost-effectively.
  • How to scale IT resources to support a high number of concurrent queries against the data.
  • How users view, search, and run queries on multiple data repositories.
  • How to derive future insights using historical data patterns and past scenarios.

Data lake data types

Some of the data types that Amazon Connect manages:
1. Customer profiles: Enable agents to deliver efficient and personalized customer service by importing customer information from various applications into a unified customer profile.
2. Contact trace records: Capture transactional metrics such as hold time, wait time, and agent interaction time in JSON format. Amazon Connect aggregates this data to create metrics reporting.
3. Contact flow logs: Capture real-time events and metrics about how your customers interact with contact flows.
4. Contact Lens output files: Using NLP and speech-to-text analytics, Contact Lens for Amazon Connect provides insights to analyze customer sentiment, identify conversation trends for product feedback, and run compliance audits for standard greetings and sign-offs.
5. Agent event streams: Capture and store agent activity in S3 via Amazon Kinesis Data Streams. You can create dashboards for near real-time agent reporting on events such as agent login, agent logout, agent connecting with a contact, and agent status change.
6. Voice and chat recordings: Amazon Connect records a conversation only when a customer connects to an agent. When the contact disconnects, the recordings are available in your S3 bucket or accessible from the customer's contact trace record.
7. Third-party integration: AWS Partners or other third-party solutions integrate with Amazon Connect to consolidate logs and external data sources in Amazon S3.
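
As an illustration of working with one of these data types, the sketch below reads a single contact trace record and pulls out the wait, hold, and interaction times mentioned above. The sample record is made up, and its field names (`Queue.Duration`, `Agent.CustomerHoldDuration`, and so on) are assumptions modeled on the general shape of the contact trace record format; check them against your own records before relying on them. The `summarize_ctr` helper is hypothetical.

```python
import json

# Hypothetical contact trace record. The field names are assumptions for this
# sketch and should be verified against real records from your instance.
SAMPLE_CTR = """
{
  "ContactId": "example-contact-id",
  "Channel": "VOICE",
  "Queue": {"Name": "ExampleQueue", "Duration": 35},
  "Agent": {
    "AgentInteractionDuration": 240,
    "CustomerHoldDuration": 20,
    "AfterContactWorkDuration": 60
  }
}
"""


def summarize_ctr(ctr_json):
    """Return wait, hold, interaction, and handle time (seconds) for one record."""
    record = json.loads(ctr_json)
    # A contact may never reach a queue or an agent, so tolerate missing sections.
    queue = record.get("Queue") or {}
    agent = record.get("Agent") or {}
    wait = queue.get("Duration", 0)
    hold = agent.get("CustomerHoldDuration", 0)
    interaction = agent.get("AgentInteractionDuration", 0)
    after_work = agent.get("AfterContactWorkDuration", 0)
    return {
        "contact_id": record.get("ContactId"),
        "wait_seconds": wait,
        "hold_seconds": hold,
        "interaction_seconds": interaction,
        # In this sketch, handle time = interaction + hold + after-contact work.
        "handle_seconds": interaction + hold + after_work,
    }


summary = summarize_ctr(SAMPLE_CTR)
```

Flattening each record into a small summary like this is one way to prepare contact trace data for dashboards; the handle-time formula used here is this sketch's own definition, not necessarily the one Amazon Connect reports.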

Data lake lifecycle

Building a data lake typically involves five stages:
• Setting up storage: As your data lake grows, you need an object storage service like S3 that offers industry-leading scalability, data availability, security, and performance. S3 Storage Lens provides organization-wide visibility into object storage usage and activity trends, with actionable recommendations to reduce cost and operational overhead.
• Moving data: AWS provides a comprehensive portfolio of data transfer services to move your existing data into a centralized data lake. AWS Storage Gateway and AWS Direct Connect can address hybrid cloud storage needs.
• Preparing and cataloguing data: Amazon Connect segregates data by AWS account ID and Amazon Connect instance ID to ensure authorized data access at the Amazon Connect instance level.
• Configuring security policies: You can create IAM users, groups, and roles in AWS accounts and associate them with identity-based policies that grant access to S3 resources.
• Making data available for consumption: Once your data lands in the S3 data lake, you can use purpose-built analytics services such as Amazon Athena and Amazon QuickSight for a wide range of use cases without labour-intensive extract, transform, and load (ETL) jobs.
The following figure is a high-level architecture diagram of an Amazon Connect contact center data lake.
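
To illustrate the security-policy stage, here is a minimal sketch that builds an identity-based IAM policy document granting read-only access to one prefix of a data lake bucket. The bucket name, the `ctr/` prefix, and the `build_read_only_policy` helper are hypothetical; the sketch only constructs the JSON document and does not call AWS.

```python
import json


def build_read_only_policy(bucket, prefix):
    """Build an identity-based policy document granting read access to one prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only for keys under the given prefix.
                "Sid": "ListDataLakePrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow reading objects under the prefix; no write or delete.
                "Sid": "ReadDataLakeObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


# Hypothetical bucket and prefix names, for illustration only.
policy = build_read_only_policy("example-connect-data-lake", "ctr/")
policy_json = json.dumps(policy, indent=2)
```

The resulting `policy_json` could then be attached to an IAM user, group, or role. Scoping each analyst role to the prefixes it needs is one way to apply least privilege to the data lake.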


Conclusion and further reading

Amazon Connect is a purpose-built omnichannel cloud contact center. You can simplify operations, improve agent efficiency, and lower contact center costs. Amazon S3 is a scalable, durable, and reliable service to build and manage a secure data lake at scale for contact centers. You can store all your contact center data as-is in the S3 data lake without restructuring the data. Your employees and stakeholders can run various analytics on the contact center data lake, including big data processing, real-time dashboards and visualizations, and ML to guide data-driven business decisions. You can harness the power and unleash the intelligence of your contact center data lake to accelerate business growth.

For additional information, see:

Original AWS Whitepaper
Data Lake Storage on AWS
Analytics on AWS
On-Premises vs. Cloud-Based Call Center Software: How to Make the Call
Data Lake vs Data Warehouse: What's the Difference?
