Bonkur Harshith Reddy


A Deep Technical Chronicle of the AWS Data and AI Meetup in Hyderabad: Unified Studio, Bedrock, and Modern Migration

Introduction

The AWS Data and AI Meetup in Hyderabad offered an entire day of hands-on learning across analytics, machine learning, generative AI, and large-scale data migration. Through a combination of conceptual sessions and practical workshops, the event demonstrated how AWS services integrate to build modern, scalable data and AI systems.

This article documents the full experience in depth, covering both the architectural discussions and the step-by-step implementations we followed throughout the workshops.


Organized By

This event was organized by Hafiz Mohammad Khan, an AWS Community Hero who actively leads and supports AWS events and developer communities across Hyderabad.

The AWS Community Heroes program recognizes technologists who consistently contribute knowledge, organize events, and support developers across the global AWS ecosystem. Hafiz coordinated the sessions, workshops, and overall flow of the meetup, ensuring a smooth and engaging technical experience.


Why I Attended This Meetup

My background is primarily in Google Cloud Platform, where I have worked with BigQuery, data processing workflows, and the broader GCP AI ecosystem. Over time, I grew increasingly curious about how AWS approaches the same large-scale data engineering, ML, and generative AI challenges.

I wanted to see firsthand how AWS enables:

  • Unified Analytics
    Combining structured, unstructured, and streaming data into a single platform so SQL, ML, and BI workloads operate from one unified layer.

  • ML Lifecycle Management
    Managing data preparation, training, tuning, deployment, and monitoring through a standardized and automated process.

  • Dataset Governance
    Managing access, lineage, quality, security policies, and compliance across complex datasets.

  • Lakehouse Architectures
    Combining the flexibility of data lakes with the reliability and performance of data warehouses using open formats like Iceberg.

  • GenAI Integration
    Building applications powered by embeddings, foundation models, and orchestration features through services like Amazon Bedrock.

  • Large-Scale Migration
    Moving enterprise databases and analytical workloads into AWS using tools like DMS Serverless and SCT.

This event offered the perfect opportunity to explore the AWS ecosystem from end to end.


Event Flow

The day followed this sequence:

Session 1 → Workshop 1 → Lunch → Workshop 2 → High Tea → Session 2

This structure created a balanced mix of learning and networking while giving time to interact with speakers, AWS specialists, and fellow participants.


Speakers and Their Expertise

  • Neha Prasad
    Analytics Specialist at AWS

  • Anirudh Chawla
    Analytics Specialist at AWS

  • Shivapriya
    Solutions Architect at AWS

  • Vishal Alhat
    Developer Advocate at AWS

  • Harsha Mathan
    Principal Data Engineer at Verisk


Session 1: The Modern Data and AI Problem Landscape

Speaker: Neha Prasad

The opening session focused on the challenges enterprises face while scaling data and AI initiatives.

High-Effort Machine Learning Systems

Enterprises often rely on disconnected tools for exploration, feature engineering, training, and deployment. This fragmentation slows iteration and increases operational complexity.

Persona Fragmentation

Data engineers, analysts, data scientists, and ML engineers use different tools with varying governance standards, making collaboration and reproducibility difficult.

Data Growth vs. Data Utilization

Although organizations collect massive amounts of data, only a small portion gets used effectively because ingestion, governance, analytics, and ML pipelines lack tight integration.

Governance Challenges

Access control, lineage tracking, quality checks, and cataloging tools often operate in silos, lowering confidence in large-scale pipelines.

Why SageMaker Unified Studio

Unified Studio solves these problems by centralizing analytics, data preparation, ML workflows, governance, and lineage into a single tightly integrated environment.


Understanding SageMaker Unified Studio

A Single Workspace

Unified Studio allows users to perform:

  • SQL Analytics
    Run SQL queries directly inside SageMaker to explore structured datasets.

  • Notebook-Based Experimentation
    Use Jupyter-style notebooks for prototyping and model development.

  • Data Preparation
    Clean, transform, and preprocess raw data for ML or analytics.

  • Pipeline Creation
    Build automated workflows for ingestion, training, evaluation, and deployment.

  • Training
    Run scalable distributed training jobs.

  • Deployment
    Publish models as endpoints or batch jobs for real applications.

  • Lineage Tracking
    Track dataset evolution, transformations, and model dependencies.

Kernel-per-Cell Model

Users can run SQL, Python, Bash, or PySpark within the same notebook, enabling hybrid workflows without switching tools.


Integrated Governance

Unified Studio connects directly to the AWS Data Catalog, enabling:

  • Dataset Versioning
    Automatically track dataset changes to enable rollback, comparison, and reproducibility.

  • Metadata Management
    Store schema information, owners, classifications, and descriptions.

  • Schema Rules
    Enforce structural and validation requirements across data pipelines.

  • Access Controls
    Manage who can view or modify datasets for secure and compliant usage.
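To make the catalog integration concrete, here is a minimal sketch of how this kind of metadata can be inspected programmatically through boto3 against the underlying AWS Glue Data Catalog. The database and table names are placeholders, not the workshop dataset.

```python
import boto3

# Hypothetical database and table names -- replace with your own catalog entries.
DATABASE = "sales_analytics"
TABLE = "sales_orders"

glue = boto3.client("glue")

# Fetch table metadata: schema, owner, classification, storage location.
table = glue.get_table(DatabaseName=DATABASE, Name=TABLE)["Table"]
print("Owner:", table.get("Owner"))
print("Location:", table["StorageDescriptor"]["Location"])
for column in table["StorageDescriptor"]["Columns"]:
    print(f"{column['Name']}: {column['Type']}")

# Table versions back the rollback/comparison story for dataset changes.
versions = glue.get_table_versions(DatabaseName=DATABASE, Name=TABLE)
for version in versions["TableVersions"]:
    print("Version:", version["VersionId"])
```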


Iceberg Support

Apache Iceberg integration enables:

  • ACID Compliance
    Ensures consistent concurrent reads and writes at any scale.

  • Schema Evolution
    Modify tables without breaking downstream jobs.

  • Time Travel
    Query historical versions for debugging or audits.

  • Partition Evolution
    Change partition strategies without reprocessing data.

These capabilities are essential for large-scale analytic pipelines.
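As a rough illustration of what these capabilities look like in practice, here is a hedged PySpark sketch using the Iceberg Spark extensions with a Glue-backed catalog. The catalog name, warehouse path, table, columns, and snapshot ID are all hypothetical, and Unified Studio provisions the Spark environment for you.

```python
from pyspark.sql import SparkSession

# Requires the Iceberg Spark runtime and AWS bundle jars on the classpath.
# Catalog, warehouse path, table, columns, and snapshot ID below are hypothetical.
spark = (
    SparkSession.builder.appName("iceberg-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-bucket/warehouse/")
    .getOrCreate()
)

# Schema evolution: add a column without rewriting data or breaking readers.
spark.sql("ALTER TABLE glue_catalog.sales.orders ADD COLUMNS (discount_pct double)")

# Partition evolution: the new spec applies only to data written from now on.
spark.sql("ALTER TABLE glue_catalog.sales.orders ADD PARTITION FIELD days(order_ts)")

# Time travel: list snapshots, then query the table as of an earlier snapshot.
spark.sql(
    "SELECT snapshot_id, committed_at FROM glue_catalog.sales.orders.snapshots"
).show()
spark.sql(
    "SELECT COUNT(*) FROM glue_catalog.sales.orders VERSION AS OF 1234567890123456789"
).show()
```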


What I Learned From This Session

Before this session, I only had a surface-level idea of how AWS unified analytics and ML workflows actually worked. Seeing Unified Studio in action made it clear how AWS connects data preparation, analytics, training, deployment, and governance inside one seamless environment.

I realized how powerful features like dataset versioning, schema evolution, time travel, lineage tracking, and multi-kernel execution are in reducing friction across teams and tools. These capabilities solve many of the coordination and reproducibility challenges I’ve faced in real projects.

This session showed me how mature and integrated the AWS data platform has become. It made me want to explore Iceberg tables, Unified Studio pipelines, and governed ML workflows in much more depth.


Workshop 1: End-to-End Analytics to ML Pipeline Using Unified Studio

Speaker: Anirudh Chawla

This workshop demonstrated how to build a complete analytics-to-ML workflow using a sales dataset.


Creating Analytics and ML Projects

We created two environments:

  • Analytics Project
    Used for dataset exploration.

  • ML Project
    Used for feature engineering and model training.

Unified Studio automatically provisioned infrastructure and configurations.


Dataset Exploration

Inside the Analytics Project, we:

  • Uploaded the sales dataset
    Imported the raw CSV so it could be profiled, queried, and analyzed.

  • Used SQL for exploratory queries
    Ran SQL statements to inspect row counts, filter data, aggregate metrics, and validate data quality.

  • Viewed auto-generated visualizations
    Quickly explored trends and anomalies with built-in charts.

  • Examined column-level statistics
    Reviewed min, max, mean, distinct counts, and missing values to assess readiness.
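The exploratory SQL step looked roughly like the sketch below, expressed here through awswrangler (AWS SDK for pandas) against Athena so it is self-contained; inside Unified Studio we ran the equivalent SQL directly in the project. The database, table, and column names are placeholders rather than the exact workshop schema.

```python
import awswrangler as wr

# Hypothetical names -- adjust to the dataset you uploaded.
DATABASE = "sales_analytics"
TABLE = "sales_orders"

# Row count and basic data-quality checks.
profile = wr.athena.read_sql_query(
    f"""
    SELECT
        COUNT(*)                                         AS total_rows,
        COUNT(DISTINCT order_id)                         AS distinct_orders,
        SUM(CASE WHEN revenue IS NULL THEN 1 ELSE 0 END) AS missing_revenue
    FROM {TABLE}
    """,
    database=DATABASE,
)
print(profile)

# Simple aggregation to spot trends and anomalies.
monthly = wr.athena.read_sql_query(
    f"""
    SELECT date_trunc('month', order_date) AS month,
           SUM(revenue)                    AS total_revenue
    FROM {TABLE}
    GROUP BY 1
    ORDER BY 1
    """,
    database=DATABASE,
)
print(monthly.head())
```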

Publishing the Dataset

Once the exploration phase was complete inside the Analytics Project, we published the cleaned and analyzed dataset to the AWS Data Catalog. This step essentially “promoted” the dataset from a local working copy into a governed, shareable asset. Publishing added metadata, schema details, and access controls, making the dataset discoverable to other projects inside Unified Studio. This also ensured that downstream teams or ML pipelines always referenced a validated, consistent version of the data rather than ad-hoc files.


Switching to the ML Project

After publishing, we switched from the Analytics Project into the ML Project to begin the machine learning workflow. Instead of manually uploading files again, we simply imported the published dataset from the Data Catalog. This guaranteed that the ML pipeline consumed the same curated data we explored earlier, with all transformations and schema definitions preserved. Once imported, the dataset became available inside Data Wrangler and the training workflows, allowing us to begin feature engineering, validation, and model development without repeating any exploration steps.


Data Wrangler Transformation

Using Data Wrangler, we:

  • Cleaned missing values
    Filled or removed incomplete entries.

  • Engineered features
    Created derived variables to enrich model performance.

  • Applied validation rules
    Ensured the dataset met quality and formatting requirements.

  • Prepared the dataset for training
    Output the processed data into a training-ready format.
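Data Wrangler builds these transformations visually, but the logic maps closely to ordinary dataframe operations. Below is a minimal pandas sketch of the same four steps, with hypothetical file and column names rather than the exact workshop flow.

```python
import pandas as pd

# Hypothetical file and columns; Data Wrangler produced equivalent steps visually.
df = pd.read_csv("sales_dataset.csv", parse_dates=["order_date"])

# 1. Clean missing values: drop rows missing the target, impute numeric gaps.
df = df.dropna(subset=["revenue"])
df["discount"] = df["discount"].fillna(0.0)

# 2. Engineer features: derive signals that help the model.
df["order_month"] = df["order_date"].dt.month
df["revenue_per_unit"] = df["revenue"] / df["quantity"].clip(lower=1)

# 3. Apply validation rules: enforce quality and formatting requirements.
assert (df["revenue"] >= 0).all(), "Negative revenue found"
assert df["order_id"].is_unique, "Duplicate order IDs found"

# 4. Output a training-ready file (e.g., for upload to S3).
df.to_csv("train.csv", index=False)
```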


Pipeline Construction

We built a complete ML pipeline consisting of:

  • Preprocessing
    Automated data cleaning, transformations, and feature engineering.

  • Training
    Triggered a job to train an ML model using the prepared data.

  • Evaluation
    Assessed model accuracy using validation metrics.

  • Conditional model registration
    Registered the model only if it met required quality thresholds.
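For readers who want to see how this pattern translates to code, here is a condensed sketch using the SageMaker Python SDK. The scripts (preprocess.py, evaluate.py), S3 prefixes, metric path, threshold, and model package group name are assumptions for illustration, not the workshop's exact pipeline definition.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = session.default_bucket()

# 1. Preprocessing: run a (hypothetical) preprocess.py on the published dataset.
processor = SKLearnProcessor(framework_version="1.2-1", role=role,
                             instance_type="ml.m5.xlarge", instance_count=1)
step_process = ProcessingStep(
    name="Preprocess",
    processor=processor,
    inputs=[ProcessingInput(source=f"s3://{bucket}/sales/raw/",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
             ProcessingOutput(output_name="test", source="/opt/ml/processing/test")],
    code="preprocess.py",
)

# 2. Training: built-in XGBoost container on the processed training split.
xgb_image = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")
estimator = Estimator(image_uri=xgb_image, role=role, instance_count=1,
                      instance_type="ml.m5.xlarge",
                      output_path=f"s3://{bucket}/sales/models/")
estimator.set_hyperparameters(objective="reg:squarederror", num_round=100)
step_train = TrainingStep(
    name="Train",
    estimator=estimator,
    inputs={"train": TrainingInput(
        s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
        content_type="text/csv")},
)

# 3. Evaluation: assumes evaluate.py writes regression metrics to evaluation.json.
report = PropertyFile(name="EvaluationReport", output_name="evaluation", path="evaluation.json")
step_eval = ProcessingStep(
    name="Evaluate",
    processor=processor,
    inputs=[ProcessingInput(source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
                            destination="/opt/ml/processing/model"),
            ProcessingInput(source=step_process.properties.ProcessingOutputConfig
                            .Outputs["test"].S3Output.S3Uri,
                            destination="/opt/ml/processing/test")],
    outputs=[ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation")],
    code="evaluate.py",
    property_files=[report],
)

# 4. Conditional registration: register only if the RMSE clears the threshold.
step_register = RegisterModel(
    name="RegisterModel",
    estimator=estimator,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"], response_types=["text/csv"],
    inference_instances=["ml.m5.large"], transform_instances=["ml.m5.large"],
    model_package_group_name="sales-forecast-models",
)
step_condition = ConditionStep(
    name="CheckQuality",
    conditions=[ConditionLessThanOrEqualTo(
        left=JsonGet(step_name=step_eval.name, property_file=report,
                     json_path="regression_metrics.rmse.value"),
        right=100.0)],
    if_steps=[step_register],
    else_steps=[],
)

pipeline = Pipeline(name="SalesAnalyticsToML",
                    steps=[step_process, step_train, step_eval, step_condition],
                    sagemaker_session=session)
pipeline.upsert(role_arn=role)
pipeline.start()
```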


Model Deployment and Lineage

The model was deployed as an endpoint. Unified Studio displayed full lineage from ingestion to deployment, supporting reproducibility and auditability.
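Deploying an approved model version from the registry can itself be just a few lines with the SageMaker SDK; the model package ARN and endpoint name below are placeholders for illustration.

```python
import sagemaker
from sagemaker import ModelPackage

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Hypothetical model package ARN from the registry group used in the pipeline.
model_package_arn = ("arn:aws:sagemaker:us-east-1:111122223333:"
                     "model-package/sales-forecast-models/1")

model = ModelPackage(role=role, model_package_arn=model_package_arn,
                     sagemaker_session=session)

# Real-time endpoint serving the registered model version.
predictor = model.deploy(initial_instance_count=1,
                         instance_type="ml.m5.large",
                         endpoint_name="sales-forecast-endpoint")
```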


What I Learned From This Workshop

Workshop 1 finally showed me how an end-to-end ML workflow actually comes together inside SageMaker Unified Studio. I’ve used separate tools for data exploration, feature engineering, pipeline orchestration, and deployment before, but I had never seen all of them integrated so tightly in one environment.

I learned how Unified Studio simplifies every step: exploring datasets with SQL, transforming them with Data Wrangler, and automating the entire process using ML Pipelines. Seeing preprocessing, training, evaluation, and conditional model registration run seamlessly in a single pipeline made it clear how mature the AWS MLOps ecosystem has become.

The hands-on demo also highlighted features I previously underestimated, like dataset publishing, lineage tracking, project-level separation, and automatic environment provisioning. These capabilities remove a lot of friction that usually slows down real-world ML workflows.

After this workshop, I now understand how to build production-ready ML pipelines the AWS way, and I’m excited to experiment more with Data Wrangler flows, conditional pipeline steps, and automated model deployment from end to end.


Workshop 2: Generative AI Image Editing Using Bedrock

Speakers: Vishal Alhat and Shivapriya

This workshop focused on building a generative AI application using a fully serverless architecture.


Architecture Components

The application used:

  • AWS Amplify
    Hosted and served the frontend with CI/CD capabilities.

  • Amazon Cognito
    Handled authentication and user session management.

  • API Gateway
    Routed frontend requests to backend Lambda functions.

  • AWS Lambda
    Executed backend logic, triggered Bedrock requests, and returned results.

  • Amazon Bedrock
    Performed generative AI image manipulation using foundation model APIs.

  • Amazon DynamoDB
    Stored metadata such as prompts, job IDs, timestamps, and output references.


Application Flow

Users authenticated through Cognito and submitted prompts or images through the Amplify frontend. API Gateway routed requests to Lambda, which invoked Bedrock models for image generation or editing. DynamoDB stored metadata for tracking and retrieval.
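A condensed sketch of what such a Lambda backend might look like is shown below. The Titan Image Generator request shape, the table and bucket names, and the use of an S3 bucket for the generated images are my assumptions for illustration; the workshop code differed in its details.

```python
import base64
import json
import os
import time
import uuid

import boto3

bedrock = boto3.client("bedrock-runtime")
dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

# Hypothetical resource names supplied through environment variables.
TABLE_NAME = os.environ.get("JOBS_TABLE", "genai-image-jobs")
BUCKET = os.environ.get("OUTPUT_BUCKET", "genai-image-outputs")
MODEL_ID = os.environ.get("MODEL_ID", "amazon.titan-image-generator-v1")


def handler(event, context):
    """API Gateway proxy handler: prompt in, generated image reference out."""
    body = json.loads(event.get("body") or "{}")
    prompt = body["prompt"]
    job_id = str(uuid.uuid4())

    # Invoke the Bedrock foundation model (Titan Image Generator request shape).
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {"text": prompt},
            "imageGenerationConfig": {"numberOfImages": 1, "width": 1024, "height": 1024},
        }),
    )
    image_b64 = json.loads(response["body"].read())["images"][0]

    # Persist the generated image and its metadata for tracking and retrieval.
    key = f"outputs/{job_id}.png"
    s3.put_object(Bucket=BUCKET, Key=key, Body=base64.b64decode(image_b64))
    dynamodb.Table(TABLE_NAME).put_item(Item={
        "job_id": job_id,
        "prompt": prompt,
        "output_key": key,
        "created_at": int(time.time()),
    })

    return {"statusCode": 200, "body": json.dumps({"job_id": job_id, "output_key": key})}
```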


Hands-On Takeaway

This workshop showcased how generative AI applications can be built without provisioning GPUs or managing ML infrastructure. Bedrock simplifies foundation model usage, while serverless components handle scalability.


My Takeaways from Workshop 2

Workshop 2 showed me how quickly a complete GenAI application can be built when every component is serverless. Seeing Amplify, Cognito, API Gateway, Lambda, Bedrock, and DynamoDB working together helped me understand how each service fits into the overall flow. I realized how much complexity disappears when authentication, API routing, backend logic, model invocation, and database storage are all managed for you by AWS.

The hands-on demo made it clear that Bedrock is not just an AI model hosting service. It becomes much more powerful when paired with Lambda for orchestration and DynamoDB for storing metadata and user context. I also learned how frontend and backend pieces communicate through API Gateway and how Amplify simplifies deployment.

Overall, this workshop gave me confidence that building a production-ready GenAI feature does not require managing GPUs or heavy ML infrastructure. The serverless architecture made the entire workflow feel simple, scalable, and practical for real applications.


Session 2: Database Migration Deep Dive (DMS, SCT, Snowflake)

Speaker: Harsha Mathan

This session walked through an enterprise migration from a legacy SQL Server system to Snowflake.


Migration Challenges

Large migrations often encounter:

  • Unpredictable CDC volume
    Change Data Capture streams may spike unexpectedly, causing lag or replication issues.

  • Schema incompatibilities
    Source and destination do not always align, requiring transformations.

  • High operational overhead
    Migration jobs require careful monitoring, troubleshooting, and coordination.

  • Infrastructure saturation during spikes
    Sudden load surges can overwhelm legacy systems and slow migration.


End-to-End Migration Architecture

The full migration pipeline included:

  • SQL Server (source)
    The transactional system that supplied both full and incremental data.

  • Step Functions (orchestration)
    Managed workflow sequencing, retries, and state tracking.

  • AWS DMS (replication)
    Performed full load and continuous CDC replication.

  • Amazon S3 (Parquet staging)
    Stored incoming replicated data in Parquet format.

  • AWS Glue (schema adjustments)
    Cleaned and transformed schema mismatches between the source and Snowflake.

  • Snowflake (destination)
    The cloud data warehouse used for analytics consumption.


Full Load and CDC Separation

Separating historical full loads from ongoing CDC streams created a much more stable migration flow. Full load jobs typically involve large volumes of static historical data, while CDC streams handle real-time incremental updates. Running them together often leads to contention, latency, and unnecessary retries. By isolating these two phases, the team ensured that the heavy historical batch did not interfere with the continuous replication pipeline. This also simplified troubleshooting, improved throughput, and enabled the migration to progress predictably without overwhelming the source system.


Parquet and Glue Integration

Storing replicated data in Parquet format offered significant performance and cost benefits. Parquet’s columnar structure compressed better, reduced storage footprint, and accelerated analytical queries compared to raw formats like CSV or JSON. AWS Glue then stepped in to handle schema alignment, type corrections, and transformation of fields that did not map cleanly from SQL Server to Snowflake. This combination of Parquet and Glue provided a clean, optimized staging layer that ensured data was structured correctly and efficiently before being loaded into Snowflake for analytics.
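A simplified sketch of this staging step, written as an AWS Glue PySpark job, is shown below. The S3 prefixes, column names, and type fixes are hypothetical; the point is the pattern of reading DMS-staged Parquet, correcting types that do not map cleanly, and writing a curated Parquet layer for Snowflake to ingest.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the Parquet files DMS staged in S3 (hypothetical path).
df = spark.read.parquet("s3://migration-staging/sales/orders/")

# Fix type mismatches that do not map cleanly from SQL Server to Snowflake,
# e.g. DATETIME2 strings to timestamps, MONEY values to DECIMAL(18,4).
df = (df
      .withColumn("order_ts", F.to_timestamp("order_ts"))
      .withColumn("total_amount", F.col("total_amount").cast("decimal(18,4)")))

# Write the cleaned, analytics-ready Parquet to a curated prefix
# that Snowflake loads (e.g., via an external stage and COPY INTO).
df.write.mode("overwrite").parquet("s3://migration-curated/sales/orders/")

job.commit()
```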


DMS Serverless

Using DMS Serverless removed much of the operational burden typically associated with managing migration infrastructure. Instead of manually allocating resources or worrying about capacity planning during CDC spikes, DMS Serverless automatically scaled replication capacity in response to workload changes. This eliminated throughput bottlenecks and reduced the chances of lag building up during peak periods. It also simplified administrative overhead, as there were no servers to patch, monitor, or resize. Overall, it made the migration pipeline more resilient and hands-off, especially for long-running enterprise workloads.
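For reference, configuring a serverless replication comes down to a replication config with capacity bounds instead of a provisioned replication instance. The sketch below uses boto3 with placeholder endpoint ARNs, schema names, and capacity values; it mirrors the CDC phase described above rather than the exact setup shown in the session.

```python
import json

import boto3

dms = boto3.client("dms")

# Hypothetical endpoint ARNs created beforehand for SQL Server and the S3 target.
SOURCE_ENDPOINT_ARN = "arn:aws:dms:us-east-1:111122223333:endpoint:SQLSERVER-SRC"
TARGET_ENDPOINT_ARN = "arn:aws:dms:us-east-1:111122223333:endpoint:S3-PARQUET-TGT"

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-sales-schema",
        "object-locator": {"schema-name": "sales", "table-name": "%"},
        "rule-action": "include",
    }]
}

# CDC-only replication (the full load ran separately, per the session's
# full-load/CDC separation). Capacity scales between the DCU bounds automatically.
config = dms.create_replication_config(
    ReplicationConfigIdentifier="sales-cdc-serverless",
    SourceEndpointArn=SOURCE_ENDPOINT_ARN,
    TargetEndpointArn=TARGET_ENDPOINT_ARN,
    ReplicationType="cdc",
    ComputeConfig={"MinCapacityUnits": 2, "MaxCapacityUnits": 16, "MultiAZ": False},
    TableMappings=json.dumps(table_mappings),
)

dms.start_replication(
    ReplicationConfigArn=config["ReplicationConfig"]["ReplicationConfigArn"],
    StartReplicationType="start-replication",
)
```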


Generative AI in SCT

AWS SCT uses generative AI to automatically convert SQL Server stored procedures and functions into Snowflake-compatible syntax, reducing manual rewriting.


My Key Takeaways

By the end of the meetup, I gained a deeper understanding of how modern data and AI systems are built on AWS:

  • I learned how SageMaker Unified Studio brings data exploration, feature engineering, ML pipelines, and deployment into a single governed workspace, removing the friction of switching between multiple tools.
  • I understood how features like dataset versioning, lineage tracking, schema evolution, and access controls play a critical role in building trustworthy and compliant analytics pipelines.
  • The Apache Iceberg discussion helped me see how open table formats enable scalable lakehouse architectures with ACID guarantees and reproducibility.
  • The GenAI workshop showed me how serverless components such as Amplify, Cognito, API Gateway, Lambda, Bedrock, and DynamoDB work together to form a simple, scalable, production-ready application architecture.
  • The migration deep dive clarified how enterprise systems move from legacy databases to modern warehouses using DMS Serverless, Step Functions, Glue transformations, and Parquet staging.
  • Overall, the event helped me connect analytics, ML, GenAI, and migration patterns into one cohesive view of how AWS approaches end-to-end data engineering and AI workflows.


What’s Next: AI for Bharat Program

During the meetup, the speakers also highlighted the AI for Bharat initiative, a nationwide program designed to help developers across India build real-world generative AI applications using AWS. The program combines structured workshops, hands-on labs, and a national-level hackathon focused on analytics, LLMs, Bedrock, agents, and scalable cloud architectures.

You can explore the program here:
🔗 https://vision.hack2skill.com/event/ai-for-bharat

After attending this meetup and getting hands-on experience with Unified Studio, Bedrock, serverless application design, and migration workflows, the AI for Bharat program feels like the perfect next step. It offers an opportunity to apply these skills in a competitive setting, build production-ready AI solutions, earn certificates, and collaborate with developers across India.

If you want to build with GenAI and cloud-native architectures on AWS, this is one of the best programs to join.


Conclusion

The AWS Data and AI Meetup in Hyderabad provided a comprehensive look into modern cloud-native data engineering, machine learning, and generative AI practices. The combination of conceptual sessions, detailed architecture discussions, and immersive hands-on workshops made the event extremely valuable.

For anyone exploring AWS for large-scale data and AI systems, this meetup offered a complete and practical blueprint for what modern cloud solutions look like in production.
