Break Through Data Silos: Practices of Multi-cloud Observability Integration Based on Object Storage Service (OSS)

Data Integration Dilemma in the Multi-cloud Environment
To support global strategies, modern enterprises often adopt multiple cloud service providers to mitigate potential business risks from relying on a single service provider while enhancing their bargaining power and flexibility. As a result, their business logs are often stored in the file systems of multiple cloud service providers. To achieve unified observability of business data, enterprises need to integrate data from a multi-cloud environment. This integration process presents several challenges:

● Failure to discover new files in a timely manner

Cloud file storage providers typically only offer bucket traversal interfaces and cannot list newly added files in chronological order. Therefore, when the number of files in a single bucket reaches hundreds of millions, efficiently identifying new files without significantly delaying downstream processing becomes the primary challenge for developers working on multi-cloud file data integration.

● Need for elastic scaling

The business volume of an enterprise often fluctuates periodically, and the log volume changes accordingly; log file sizes can differ significantly between peak and off-peak hours. If resources cannot scale elastically, the following problems will inevitably occur:

During off-peak hours, resources are wasted.
During peak hours, log visibility latency increases.
● Complex data parsing

In a multi-cloud environment, log fields and storage formats often vary across deployments of different services, or even of the same service, requiring unified conversion and adaptation.

Simple Log Service (SLS) is a cloud-native observability and analysis platform that provides large-scale, low-cost, and real-time one-stop services for various data types such as Log, Metric, and Trace. The object import feature of SLS uses innovative file discovery methods and an optimized overall architecture to implement high-performance and user-friendly multi-cloud file data import capabilities, effectively simplifying the cross-cloud data integration process.

Why SLS Object Import?
Overview of Object Import Architecture

To address the challenges of multi-cloud data import, the Alibaba Cloud Simple Log Service (SLS) team developed a new two-phase parallel processing architecture after in-depth research and careful design. This architecture separates file discovery from data pulling to achieve the following benefits:

● File discovery: Diverse intelligent file discovery strategies are used to efficiently identify and capture file changes while ensuring data integrity.

● Data pulling: It is independent of the file discovery process, focusing on high-speed data transmission.

● Parallel execution: The two phases are executed at the same time without interfering with each other to maximize performance.

This design not only overcomes the issue that file discovery may become a performance bottleneck in traditional methods but also significantly improves the overall data import efficiency, providing users with a faster and more reliable multi-cloud data integration solution.
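To make the separation concrete, here is a minimal Python sketch of the same pattern: one thread discovers object keys and hands them to a queue, while independent workers pull and parse the objects. This is only an illustration of the two-phase idea, not SLS's internal implementation; list_new_objects, pull_object, and parse_and_write are hypothetical placeholders.

import queue
import threading
import time

# Hypothetical stubs standing in for real discovery / download / parse logic.
def list_new_objects():
    return []           # would return newly discovered object keys

def pull_object(key):
    return b""          # would download the object's bytes

def parse_and_write(data):
    pass                # would parse records and write them to a Logstore

key_queue = queue.Queue(maxsize=10000)

def discovery_loop():
    # Phase 1: discover new object keys and hand them off via the queue.
    while True:
        for key in list_new_objects():
            key_queue.put(key)
        time.sleep(60)   # e.g. an incremental listing pass every minute

def pull_worker():
    # Phase 2: pull and parse objects, independent of the discovery loop.
    while True:
        key = key_queue.get()
        parse_and_write(pull_object(key))
        key_queue.task_done()

threading.Thread(target=discovery_loop, daemon=True).start()
for _ in range(8):       # pull parallelism can grow with the number of new files
    threading.Thread(target=pull_worker, daemon=True).start()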

How to quickly discover newly added files among hundreds of millions of files?

To detect newly added files efficiently and in a timely manner, we developed and implemented a set of diverse, intelligent file discovery strategies. This combined solution comprises the following components:

● Periodically traverse the entire bucket, serving as a reliable basic guarantee mechanism to ensure 100% file coverage and eliminate omissions.

● Incrementally traverse bucket files, ensuring that new files can be discovered within one minute in scenarios where newly added file names are in lexicographic order (see the sketch after this list).

● Use the OSS metadata indexing feature for assistance, ensuring that new files can be discovered within one minute in OSS file import scenarios.

● Use SQS capabilities for assistance, ensuring that new files can be discovered within one minute in S3 file import scenarios.

These file discovery methods together form a powerful ecosystem that can quickly and accurately identify new files in various cloud storage scenarios. Once a new file is detected, the system then initiates a parsing process to further process the data.
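As an illustration of the incremental-listing idea referenced above (not SLS's internal code), the following Python sketch uses the standard boto3 list_objects_v2 call with StartAfter: when object keys are created in lexicographic order, each pass only needs to list the keys that sort after the last key seen in the previous pass. The bucket name, prefix, and keys are hypothetical.

import boto3

s3 = boto3.client("s3")

def list_new_keys(bucket, prefix, last_seen_key):
    # Assumes keys are created in lexicographic order,
    # e.g. ingestion-test/2025/04/02/10-15-00.json
    paginator = s3.get_paginator("list_objects_v2")
    new_keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix,
                                   StartAfter=last_seen_key):
        for obj in page.get("Contents", []):
            new_keys.append(obj["Key"])
    return new_keys

# Hypothetical usage: remember the last key from the previous pass and only
# enumerate keys that sort after it, instead of traversing the whole bucket.
keys = list_new_keys("my-log-bucket", "ingestion-test/2025/04/",
                     "ingestion-test/2025/04/01/23-59-59.json")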

What file formats are supported for parsing/decompression?

● Supported file formats for parsing:

Single-line text logs, multi-line text logs, single-line JSON, CSV, ORC, and Parquet.

● Supported file formats for decompression:

zip, gzip, zstd, lz4, snappy, bzip2, and deflate.

What if there are traffic spikes? Will it cause latency?

In the face of surges in business volume, you do not need to worry about data import latency. The object import feature of Alibaba Cloud Simple Log Service (SLS) uses elastic scaling to maintain efficient performance under various load conditions.

After discovering new files, the system automatically sets the parallelism based on the number of files, so the storage and computing resources used by a single task can scale up quickly. This is especially suitable for scenarios with a large number of small files.

Which cloud service providers are supported?

Currently, you can import file data from Alibaba Cloud Object Storage Service (OSS) and S3. More cloud service providers will be covered in the future.

Based on the SLS object import feature, users can import files in multiple common formats from a multi-cloud environment into SLS without any development or adaptation work. When the number of files in the bucket is less than one million, or newly added file names are in lexicographic order, users can see new file data in the SLS Logstore within three minutes without any additional operations (except in cases of poor public network quality). Next, let's walk through the hands-on practice.

SLS Object Import Practice
Object Storage Service (OSS) Data Integration
The following example shows how to import a file created in April 2025 in an OSS bucket to the SLS Logstore.

Prepare the data
The data file to be imported is stored in any OSS bucket.

Create an OSS data import task
Grant permissions to the account that you use to import OSS data. For more information, see Import OSS data.

Then, go to the project to which you want to import data and create an OSS data import task on the Task Management page as prompted:

On the task creation page, select oss-ingestion-test for the Logstore parameter, and then click Next.

Next, fill in or select the settings in sequence as prompted in the following figure:

● Set the display name to oss-import-test. This name is displayed on the task overview page.
● Select the region where the source data bucket is located. Here, we select Hangzhou.
● Select the source bucket from the drop-down list.
● Since we only need to import data from April 2025, enter the specific file path prefix ingestion-test/2025/04. There is no need to set an additional file modification time range, as all matching files can be imported directly.
● Set the data format based on the actual file content. The data in this example is single-line JSON and the files are compressed with Snappy, so we select Single-line JSON as the data format and Snappy as the compression format.
● Since we only need to import the historical files once, select Never Check for the check interval.
● No archive files are involved, so this switch is not enabled.

At this point, the OSS data import task is configured. Click Preview to see a sample of the import result. If it meets your requirements, click Next to create the task. Otherwise, fine-tune the settings.

In the preceding scenario, when a log is imported to SLS, its timestamp is recorded as the time when the data enters SLS, which is sometimes not what users expect. This requires some configuration for the log time. As shown in the following figure, a field named time is used to record the timestamp, for example 1743604620, so we set the time field format to epoch. If you do not know how to set the format, see Time format. Click Preview to view the data import sample, as shown in the following figure, and then click Next to create the OSS data import task.
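For reference, a source record in this scenario might look like the hypothetical single-line JSON below (the level and message fields are made up for illustration); a quick local check confirms that the epoch value falls in April 2025:

import json
from datetime import datetime, timezone

# Hypothetical single-line JSON record from the source file.
line = '{"time": 1743604620, "level": "INFO", "message": "order created"}'

record = json.loads(line)
# The time field holds epoch seconds, so the time field format in SLS is set to epoch.
print(datetime.fromtimestamp(record["time"], tz=timezone.utc))
# 2025-04-02 14:37:00+00:00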

View task status and imported data
On the Task Management page, locate the newly created task by following the steps in the figure, and click the task name to go to the Task Overview page to view the execution details of the task.

On the Task Overview page, you can view the detailed configurations and running status of the task.

Click the target Logstore from the Task Overview page or click the Logstore in the left-side navigation pane to go to the Logstore and view the imported data. The following figure shows the final import result.

S3 Data Integration
Prepare the data
The data file to be imported is already stored in any S3 bucket.

Create an S3 data import task
Grant permissions to the AccessKey pair that you want to use. For more information, see Import an S3 file.

Then, go to the Task Management page and follow the instructions below to create an Amazon S3 data import task.

On the task creation page, select s3-ingestion-test for the Logstore parameter, and then click Next.

Next, fill in or select the settings in sequence as prompted in the following figure:

● Set the display name to s3-import-test. This name is displayed on the task overview page.
● Select the region where the source data bucket is located. Here, we select ap-northeast-1.
● Enter the source bucket.
● Enter an AWS AccessKey ID and AWS Secret AccessKey that have permission to pull the file data.
● Choose whether to enable SQS. When SQS is enabled, metadata about new objects is obtained from SQS; otherwise, the S3 bucket is traversed to obtain it. Generally, if the number of files in the bucket exceeds one million, it is recommended to enable SQS; otherwise, data latency will gradually increase. In this example, we enable SQS (see the sketch after this list).
● Optionally, specify the file path prefix from which to obtain data, and set a file path regular expression to filter files.
● Optionally, restrict the time range of the files to be imported by setting the file modification time filter.
● Set the data format, compression format, and encoding format to guide how the data is parsed.
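As an aside, the sketch below shows what SQS-based discovery looks like in general terms, using the standard S3 event-notification pattern and boto3 (an illustration only, not SLS's internal code; the queue URL and account ID are hypothetical): each notification already carries the bucket and key of the newly created object, so no bucket traversal is needed.

import json
import boto3

sqs = boto3.client("sqs", region_name="ap-northeast-1")
# Hypothetical queue that receives the bucket's ObjectCreated notifications.
queue_url = "https://sqs.ap-northeast-1.amazonaws.com/123456789012/s3-new-objects"

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10,
                           WaitTimeSeconds=10)
for msg in resp.get("Messages", []):
    body = json.loads(msg["Body"])
    for record in body.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"new object: s3://{bucket}/{key}")
    # Remove the notification once the new object has been handled.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])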

After the preceding configurations are completed, click Preview to check whether the data meets expectations. If not, modify the configurations. Otherwise, click Next to create an S3 file data import task.

View task status and imported data
On the Task Management page, locate the newly created task by following the steps in the figure, and click the task name to go to the Task Overview page to view the execution details of the task.

On the Task Overview page, you can view the detailed configurations and running status of the task.

Click the target Logstore from the Task Overview page or click the Logstore in the left-side navigation pane to go to the Logstore and view the imported data. The following figure shows the final import result.

Case Analysis of Multi-cloud Data Integration
Cross-cloud Bill Audit
As cloud service users, what enterprises care about most, apart from the business itself, is cost. However, billing data is scattered across multiple cloud service providers, making it difficult to aggregate and analyze. Now, users can export billing data to each provider's object storage and then use the SLS object import feature to import it into SLS for further analysis.

First, it is necessary to deliver billing logs from Alibaba Cloud to OSS, and then use the object import feature to import these logs from OSS to SLS. For more information about how to deliver Alibaba Cloud billing logs to OSS, see Billing Subscription. For more information about how to deliver AWS billing logs to S3, see Creating a Cost Explorer Report. In this section, we will briefly analyze the billing information of users on Alibaba Cloud and AWS as an example.

  1. After billing logs are imported to SLS, you must standardize the fields of different vendors by using data transformation and add the source vendor to each log.

Sample AWS transformation statement:

* | project-rename product = line_item_product_code
| extend originProduct='aws'
| extend cost = pricing_public_on_demand_cost
| project product, cost, originProduct

Sample Alibaba Cloud transformation statement:

* | project-rename product=ProductCode
| extend cost=PretaxGrossAmount
| extend originProduct='aliyun'
| project product, cost, originProduct

Sample transformation result:

  2. View the daily costs of all services from different cloud service providers.

Sample SQL statement:

* |  SELECT SUM(cost) AS cost, date_trunc('day', dt) AS dt, originProduct FROM (SELECT 
    SUM(CASE WHEN originProduct = 'aws' THEN cost * 7.19 ELSE cost END) AS cost, 
    date_parse(date, '%Y-%m-%d %H:%i:%S') AS dt, 
    originProduct 
FROM log
WHERE 
    cost > 0 
GROUP BY 
    dt, 
    originProduct )
GROUP BY dt, originProduct

Sample query result:

Based on the billing information, we can observe business growth, promptly detect unreasonable resource usage, determine whether resources need to be scaled down, and reduce unexpected cost.

  3. View the trend of cost increase for a single product
* | SELECT SUM(cost) AS cost, date_trunc('day', dt) AS dt, product FROM (SELECT 
    SUM(CASE WHEN originProduct = 'aws' THEN cost * 7.19 ELSE cost END) AS cost, 
    date_parse(date, '%Y-%m-%d %H:%i:%S') AS dt, 
    product  FROM log
WHERE 
    cost > 0
GROUP BY 
    dt, 
    product ) 
GROUP BY dt, product LIMIT 10000000

Sample query result:

Cross-cloud ActionTrail
In addition to billing data, operations on resources are also sensitive. We can also use the SLS object import feature to aggregate operation logs from different cloud service providers for further analysis. The following example shows how to analyze user operations on resources in Alibaba Cloud and AWS.

First, you need to deliver Alibaba Cloud ActionTrail logs to OSS and then import these logs from OSS to SLS by using the object import feature. For more information about how to deliver operation logs to OSS, see Continuously deliver events to a specified service provided by Alibaba Cloud ActionTrail. For more information about how to import OSS file data, see the preceding content. Similarly, deliver AWS CloudTrail logs to S3 and import them from S3 to SLS by using the object import feature. For more information about how to deliver CloudTrail logs to S3, see Create a trail for your AWS account provided by AWS CloudTrail. After importing the logs from OSS and S3 to SLS according to the preceding steps, we can easily analyze the operations on the resources of the two cloud service providers in SLS.

To simplify the data, first unify the audit logs of the two cloud service providers. For AWS, CloudTrail places multiple log records in a single Records field, so the records are split here and marked as coming from AWS.

* | expand-values -path='$.Records' content as item 
| parse-json item 
| project-away item 
| extend originProduct='aws'

A sample transformation result is as follows:

ActionTrail logs from Alibaba Cloud are processed in a similar manner. The transformation statement is as follows:

* | parse-json event 
| project-away event
| extend originProduct='aliyun'

Next, query and analyze these operation logs.

  1. View the hourly usage frequency of all services. The query statement is as follows:
* | SELECT COUNT(*) AS num, date_trunc('hour', __time__) AS dt, product GROUP BY dt, product LIMIT 100000000

Here we can see that the number of operations on ECS is much larger than that on other services.

Going a step further, we can monitor sensitive operations on cloud resources and configure alerts to notify us in a timely manner.

  2. Set query statements to monitor resource deletion events.
* | SELECT originProduct, eventName, eventId, userAgent WHERE eventName LIKE '%Delete%'
  3. Configure alerts for the event.

With this setup, whether resources are in Alibaba Cloud or AWS, we will receive an alert as soon as they are deleted, enabling timely awareness and handling.

Better Usage of SLS Object Import
High Internet traffic fees?

Use zstd compression
The SLS object import feature itself is free of charge. However, traffic fees are incurred when data is transferred over the Internet. In this case, it is recommended to compress files with the zstd algorithm before importing them. The zstd compression ratio is generally 2x to 5x. Taking the intermediate value of 3.5x and an OSS Internet traffic fee of 0.5 yuan/GB as an example, importing 100 GB of original files over the Internet costs 50 yuan, while after compression it costs only 14.29 yuan.
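As a rough sketch of this idea (the log content is made up and the real compression ratio depends on your data), a file's bytes can be compressed with the zstandard Python package before the object is uploaded, and the expected traffic saving estimated as in the text:

import zstandard as zstd

# Hypothetical raw log content; in practice this would be a log file's bytes
# read from disk before uploading the compressed object to the bucket.
raw = b'{"time": 1743604620, "level": "INFO", "message": "order created"}\n' * 1000
compressed = zstd.ZstdCompressor(level=3).compress(raw)

# Rough traffic-cost estimate from the text: 100 GB of raw logs, an assumed
# 3.5x compression ratio, and an OSS Internet traffic fee of 0.5 yuan/GB.
raw_gb, ratio, price_per_gb = 100, 3.5, 0.5
print(f"uncompressed: {raw_gb * price_per_gb:.2f} yuan")          # 50.00 yuan
print(f"compressed:   {raw_gb / ratio * price_per_gb:.2f} yuan")  # 14.29 yuan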
Create new files in lexicographic order
When the number of files in the bucket exceeds one million, creating new files in lexicographic order ensures that they are discovered within two minutes. This keeps the import real-time without requiring any third-party services.
Poor performance?

Replace large files with small files
SLS allocates at most one concurrent worker per file to pull data. For the same total data volume, the smaller each individual file is, the more concurrent workers can be allocated, and the higher the overall import rate. For example, splitting a 10 GB file into 100 files of 100 MB each allows up to 100 concurrent pulls instead of 1.
Store logs in different directories by business
To further improve the real-time performance of file import, we can create multiple import tasks based on the file directory prefix to concurrently pull data.
Other issues

Add files instead of appending
The SLS object import feature detects added or modified files and then imports them. Because a modified file is imported again, continuously appending to the same file causes data duplication.
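A minimal sketch of this recommendation using the oss2 SDK (the endpoint, bucket name, and credential placeholders are hypothetical): write each batch of logs to a new, time-stamped object instead of appending to a single object. As a bonus, keys named this way are also created in lexicographic order.

import time
import oss2

# Hypothetical credentials, endpoint, and bucket name.
auth = oss2.Auth("<access-key-id>", "<access-key-secret>")
bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "my-log-bucket")

def flush_logs(lines):
    # Recommended: each flush creates a new object, so the import task discovers
    # a new file and imports it exactly once.
    key = time.strftime("ingestion-test/%Y/%m/%d/%H-%M-%S.json")
    bucket.put_object(key, "\n".join(lines))
    # Not recommended: appending to one fixed key (e.g. with bucket.append_object)
    # marks the same object as modified repeatedly and leads to duplicate imports.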

Summary
The SLS object import feature is designed to import multi-cloud data into SLS in a unified way. It accelerates data import with a variety of new-file discovery techniques so that users can see new file data within three minutes. Although support is currently limited to OSS and S3, we are continuously upgrading the feature and will support data import in more complex scenarios in the future. Please stay tuned.
