Manato Takai

Posted on May 31 • Originally published at kaminashi-developer.hatenablog.jp

Building an Application Log Analytics Platform with Amazon S3 Tables: Cost Optimization by Migrating from CloudWatch Logs

#architecture #aws #infrastructure #monitoring

Introduction

Our team had been using CloudWatch Logs as the log storage layer for our identity management system, but as the service grew, the associated cost became a concern. This article describes how we built and migrated to a log storage architecture using Amazon S3 Tables, referred to here as S3 Tables, and the cost optimization results we achieved.

Rising CloudWatch Logs Costs as the Service Grew

Amazon CloudWatch Logs, referred to here as CloudWatch Logs, is an AWS service for storing and analyzing logs. In many common architectures, application logs are stored directly in CloudWatch Logs. When an issue occurs, teams query logs using CloudWatch Logs Insights and configure alerts for specific error logs in combination with CloudWatch Alarms. Our identity management platform followed a similar pattern. Logs from ECS were delivered to CloudWatch Logs through a log router, and we used Amazon Managed Service for Grafana, referred to here as Grafana, for day-to-day development and operations.

As our services grew, the identity management platform, which handles authentication and authorization requests across all services, saw a corresponding increase in request volume and feature scope. As a result, logging costs grew to a level that could no longer be ignored. We had made steady improvements through weekly reviews by removing unnecessary logs and consolidating others, but these efforts alone were not enough to achieve substantial cost optimization. We therefore began considering a more fundamental solution.

The main costs incurred during normal CloudWatch Logs operations can be grouped into three categories: log ingestion, storage, and queries. As the pricing table also shows, log ingestion is the primary driver of cost. This meant that even if we optimized retention periods and other storage-related settings, log costs would continue to increase in proportion to request volume.
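To make the reasoning above concrete, here is a minimal sketch of a cost model that splits a monthly bill into the three categories. The per-GB rates and volumes are placeholders chosen for illustration, not AWS list prices or our actual figures, and the `LogPricing` and `monthly_cost` names are hypothetical. Substitute the rates for your region from the pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPricing:
    """Per-GB rates in USD. Callers supply the values; none are built in."""
    ingestion_per_gb: float
    storage_per_gb_month: float
    query_per_gb_scanned: float


def monthly_cost(pricing, ingested_gb, retention_months, scanned_gb):
    """Split an estimated monthly bill into ingestion, storage, and queries.

    Simplifying assumption: at steady state, the stored volume is roughly
    the monthly ingested volume multiplied by the retention period.
    """
    return {
        "ingestion": ingested_gb * pricing.ingestion_per_gb,
        "storage": ingested_gb * retention_months * pricing.storage_per_gb_month,
        "queries": scanned_gb * pricing.query_per_gb_scanned,
    }


# Placeholder rates and volumes, for illustration only.
example_pricing = LogPricing(
    ingestion_per_gb=0.50,
    storage_per_gb_month=0.03,
    query_per_gb_scanned=0.005,
)
costs = monthly_cost(example_pricing, ingested_gb=1000, retention_months=1, scanned_gb=2000)
ingestion_share = costs["ingestion"] / sum(costs.values())
```

With rates shaped like these, shortening retention only reduces the `storage` term, while the `ingestion` term scales directly with traffic.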

We therefore considered ingesting and storing logs in S3 Tables, which appeared to offer a reasonable balance between cost and operational usability.

Amazon CloudWatch Pricing

Migration to Amazon S3 Tables

What is Amazon S3 Tables?

S3 Tables is an AWS service that provides managed storage for tables in Apache Iceberg, an open table format. As shown in the diagram below, Iceberg maintains a metadata layer over the data files that hold the actual records. This allows query engines such as Amazon Athena and Spark to read and write the data as tables. Because the table structure is abstracted through metadata, Iceberg supports flexible changes to table definitions, such as partitions and columns. Its metadata history is managed as snapshots, which enables time travel: querying the table as it existed at a past point in time. In addition, it addresses several limitations of the Hive format traditionally used in data lakehouse architectures. For more detail, I recommend reading the official Apache Iceberg site.
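As an illustration of time travel, the sketch below builds an Athena query that reads an Iceberg table as of a past instant using the `FOR TIMESTAMP AS OF` clause. The helper name `time_travel_query` and the table name `app_logs` are hypothetical, and the sketch assumes the table name comes from trusted configuration rather than user input.

```python
from datetime import datetime, timezone


def time_travel_query(table, as_of, limit=100):
    """Build an Athena query that reads an Iceberg table as of a past instant.

    `table` is interpolated as-is, so it must come from trusted configuration.
    """
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    # Normalize to UTC so the literal in the query is unambiguous.
    ts = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"SELECT * FROM {table} "
        f"FOR TIMESTAMP AS OF TIMESTAMP '{ts} UTC' "
        f"LIMIT {int(limit)}"
    )


# Hypothetical table name, for illustration only.
query = time_travel_query("app_logs", datetime(2025, 5, 1, tzinfo=timezone.utc))
```

The resulting string can be submitted to Athena like any other query. Only snapshots that have not yet been expired are available for time travel.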

Unlike CloudWatch Logs, S3 Tables does not allow data to be queried flexibly without defining a schema in advance. Even so, we decided to migrate to S3 Tables after considering several advantages: the ability to search logs using familiar SQL syntax, the cost benefits of using low-cost S3 storage, the ability to delegate routine Iceberg operations such as compaction to a managed service, and the flexibility to adapt to future changes in log structure through schema evolution.
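To show what searching logs with SQL against a predefined schema can look like, here is a minimal sketch that builds a log search query for Athena. The column names `logged_at`, `level`, and `message`, the table name `app_logs`, and the helper `log_search_query` are all assumptions for illustration, not our actual schema.

```python
from datetime import datetime


def log_search_query(table, start, end, level="ERROR", limit=200):
    """Build a SQL query for logs of one level within [start, end).

    Assumes a table with `logged_at` (timestamp), `level`, and `message`
    columns. `table` must come from trusted configuration.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    # Double any single quotes so the level value stays inside its literal.
    safe_level = level.replace("'", "''")
    return (
        f"SELECT logged_at, level, message FROM {table} "
        f"WHERE logged_at >= TIMESTAMP '{start.strftime(fmt)}' "
        f"AND logged_at < TIMESTAMP '{end.strftime(fmt)}' "
        f"AND level = '{safe_level}' "
        f"ORDER BY logged_at DESC LIMIT {int(limit)}"
    )


# Hypothetical table name and time range, for illustration only.
search = log_search_query("app_logs", datetime(2025, 5, 30), datetime(2025, 5, 31))
```

Restricting the query to a time range matters for cost as well as speed, because it lets the engine skip data files outside that range when the table is partitioned by time.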

Gradual Migration Using a Log Delivery Pipeline

Before the migration, we sampled several logs from CloudWatch Logs, loaded them into S3 Tables, and validated the setup. However, we still did not know how practical it would be in real operations, so we migrated gradually using the following process.

1. The log router received logs from ECS and delivered them to two destinations in parallel: CloudWatch Logs, and S3 Tables via Amazon Data Firehose.
2. We changed the query destination in Grafana, which we use as our monitoring tool, to S3 Tables and identified usability issues and operational concerns through day-to-day use.
3. Once we determined that operations with S3 Tables were viable, we changed the log router destination to S3 Tables. Error logs continued to be delivered to CloudWatch Logs because they were also used for alerts.

With this approach, even if we discovered that S3 Tables was not suitable for our operations, logs stored in CloudWatch Logs would remain available and queryable without data loss. This allowed us to migrate safely.
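The routing rule behind this gradual migration can be sketched as follows. This is plain Python standing in for the log router configuration, which would express the same rule in the router's own format. The `Phase` enum, the `destinations` function, and the destination labels are hypothetical names.

```python
from enum import Enum


class Phase(Enum):
    DUAL_WRITE = "dual_write"                # validation period: write to both
    S3_TABLES_PRIMARY = "s3_tables_primary"  # after cutover


def destinations(level, phase):
    """Return the destinations a log record is delivered to in a given phase."""
    # Every record goes to S3 Tables through Firehose in both phases.
    targets = ["firehose_s3_tables"]
    # CloudWatch Logs receives everything during dual write, and only
    # error logs (still needed for alerts) after cutover.
    if phase is Phase.DUAL_WRITE or level.upper() == "ERROR":
        targets.append("cloudwatch_logs")
    return targets
```

During the dual-write phase, rolling back only requires pointing Grafana at CloudWatch Logs again, because no record has been removed from the original destination.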

Cost Optimization Results After Migrating to S3 Tables

After we stopped delivering logs other than error logs to CloudWatch Logs, costs dropped quickly and significantly. The cost of log ingestion, which had been the main cost driver, was reduced by more than 80 percent. Query costs also decreased substantially after moving from CloudWatch Logs Insights to Athena.

Current Technical Constraints in S3 Tables and the Surrounding Ecosystem

Changing Partitions in S3 Tables

The partition configuration of an S3 Tables table cannot be changed from Athena. If you want to modify it after table creation, you need to run Spark SQL from an AWS Glue job or execute the DDL from another Iceberg client. Since Athena already provides a managed query service, this is an area where I would like to see improvement in the future.
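As a rough sketch of the Spark route, the helper below builds the Iceberg `ALTER TABLE ... ADD PARTITION FIELD` statement that a Glue job could pass to `spark.sql(...)`. It assumes the Spark session has the Iceberg SQL extensions enabled. The helper name, the catalog and table identifier `s3tablesbucket.logs.app_logs`, and the column `logged_at` are hypothetical.

```python
# Time-based Iceberg partition transforms handled by this sketch.
ALLOWED_TRANSFORMS = {"year", "month", "day", "hour"}


def add_partition_field_ddl(table, column, transform="day"):
    """Build Iceberg Spark DDL that adds a time-based partition field.

    `table` and `column` must come from trusted configuration because they
    are interpolated into the statement as-is.
    """
    if transform not in ALLOWED_TRANSFORMS:
        raise ValueError(f"unsupported transform: {transform}")
    return f"ALTER TABLE {table} ADD PARTITION FIELD {transform}({column})"


# Hypothetical identifiers, for illustration only.
ddl = add_partition_field_ddl("s3tablesbucket.logs.app_logs", "logged_at", "day")
```

In Iceberg, adding a partition field is a metadata change: existing data files keep their old layout, and only data written afterwards uses the new partition spec.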

Additional Athena operations for querying Iceberg tables

How to add partition fields to an Iceberg table

When Intelligent Tiering Can Be Enabled

Although S3 Tables has lower storage costs than CloudWatch Logs, costs still increase day by day as logs accumulate. S3 Tables supports Intelligent Tiering, announced in December 2025, which automatically optimizes the storage class of files that have not been accessed for a defined period. If appropriate partitions are in place, logs are often accessed primarily for recent time ranges, so older logs can be moved to a different storage class and further cost optimization can be expected.

However, Intelligent Tiering can only be configured when a table is created. A table created with standard storage cannot be changed later. In most cases, Intelligent Tiering should be the right choice, so it is worth making sure it is configured from the start. We recreated the table with considerable regret.

Notes on Building with Terraform

Our team uses Terraform to manage AWS infrastructure. We rely heavily on the resources provided by hashicorp/terraform-provider-aws, but not all API parameters around S3 Tables are supported yet. For example, pull requests have been opened for the partition settings and Intelligent Tiering settings mentioned earlier, but they have not yet been merged. A pull request I previously submitted for Athena managed query results support took about three months to be merged. If these features are important to you, consider reacting to the relevant pull requests and communicating your interest in Iceberg to the maintainers.

https://github.com/hashicorp/terraform-provider-aws/pull/46015

https://github.com/hashicorp/terraform-provider-aws/pull/46532

Conclusion

This article introduced our experience migrating application log storage from CloudWatch Logs to S3 Tables for cost optimization. CloudWatch Logs is highly usable and easy to adopt, so I would still choose CloudWatch Logs by default in the early stages of application development. At the same time, logging costs tend to grow as an application scales. When that happens, S3 Tables is worth considering as one of the options for log storage.
