Introduction
Simple Log Service (SLS) is a cloud-native observability and analysis platform. It provides large-scale, low-cost, and real-time services for managing data such as logs, metrics, and traces. Its capabilities cover the entire data lifecycle, from collection, processing, and query analysis to visualization, alerting, and delivery.
However, traditional solutions often involve high costs and risks when teams face challenges such as:
● Accidentally writing sensitive user data in plaintext to terabytes of logs during a release.
● Test data contaminating production data and skewing analysis.
● Urgently purging unexpected data after an online failure.
The new soft delete feature in SLS addresses these critical challenges in emergency data deletion and governance. Operating with performance comparable to an index query, it offers a powerful new way to manage your data.
What Is Soft Delete
Why did SLS not previously support hard deletes? As a real-time log platform for large-scale data, SLS was designed for maximum write and query performance. A hard delete must locate and remove data from original files, associated indexes, and column store files. In a distributed system, the resource usage, state synchronization, and real-time requirements for this process are prohibitively high.
SLS implements soft delete using a "mark and filter" mechanism. The physical data is retained but hidden from users. This design meets the urgent need for data deletion while ensuring system stability.
How Soft Delete Works
Traditional hard deletes require scanning and rewriting the underlying storage. With petabytes of data, this process can cause significant I/O overhead and system jitter.
SLS soft delete is based on a "mark and filter" mechanism:
Delete operation: This is a two-step process with performance close to an index query (see the conceptual sketch after this list).
Use a query to quickly filter the log rows for deletion.
Mark the filtered log rows as deleted.
Query filtering: Marked data is automatically hidden from queries. The change takes effect immediately and applies to both queries and SQL analysis.
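The idea is easy to picture in a few lines of code. The following is a minimal, self-contained sketch of mark-and-filter in plain Python, not the SLS implementation: the ToyLogStore class, its fields, and the predicates are all illustrative.

import time

class ToyLogStore:
    """Toy model of mark-and-filter; illustrative only, not SLS internals."""
    def __init__(self):
        self.rows = []          # physical data is retained, never rewritten
        self.delete_marks = []  # (from_time, to_time, predicate) tuples

    def delete_logs(self, from_time, to_time, predicate):
        # A delete only appends a mark; no data files or indexes are touched.
        self.delete_marks.append((from_time, to_time, predicate))

    def query(self, predicate):
        # Marked rows are filtered out at read time, so hiding is immediate.
        def hidden(row):
            return any(f <= row["time"] <= t and p(row)
                       for (f, t, p) in self.delete_marks)
        return [r for r in self.rows if predicate(r) and not hidden(r)]

store = ToyLogStore()
store.rows.append({"time": int(time.time()), "phoneNumber": "138****0000"})
store.delete_logs(0, int(time.time()), lambda r: "phoneNumber" in r)
print(store.query(lambda r: True))  # prints [] -- hidden at once, data still stored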
Think of it like tidying your house. You mark and set aside unwanted items (soft delete). Later, a garbage collector removes them all at once (physical delete after the TTL expires). This approach keeps your space tidy without the immediate cost of disposal.
Practical Use Cases
Use case 1: An emergency response at 3 a.m.
An O&M engineer at an e-commerce company faced an emergency before the Double 11 shopping festival. For two hours, a newly launched order system wrote user phone numbers in plaintext to the logs.
● Traditional solution
Downtime for fixes impacts the business.
Use SPL to filter logs in real time and rewrite them to a new logstore.
Processing terabytes of data takes six hours, causing a business interruption.
● Soft delete solution
import time

# Assumes an initialized SLS client plus existing project and logstore variables;
# DeleteLogsRequest and DeleteLogsResponse come from the SLS Python SDK.
from_time = int(time.time()) - 2 * 3600  # start of the two-hour incident window
to_time = int(time.time())
toDeleteQuery = 'phoneNumber:*'  # match every log row that contains a phone number field
request = DeleteLogsRequest(project, logstore, from_time, to_time, query=toDeleteQuery)
res: DeleteLogsResponse = client.delete_logs(request)
Delete the sensitive logs by specifying a time range and deletion criteria. The process completes in seconds.
Query results immediately hide the sensitive data, as the verification sketch below confirms.
The business continues to operate without interruption.
The data is physically purged automatically when the logstore TTL expires.
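As a quick check that the mark took effect, re-running the original filter over the same window should return no hits. A minimal sketch, assuming the standard SLS Python SDK's GetLogsRequest and the client, project, and logstore variables from the snippet above:

from aliyun.log import GetLogsRequest

# Re-run the sensitive-data filter over the same window; marked rows no longer match.
check = GetLogsRequest(project, logstore, fromTime=from_time, toTime=to_time,
                       query='phoneNumber:*')
resp = client.get_logs(check)
print(resp.get_count())  # expected to be 0 once the soft delete has taken effect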
Use case 2: Test data contaminates the production environment
An analyst at a finance company discovered anomalies in a risk control model. Tracing the issue revealed that data from the testing environment flowed into production logs, contaminating the model training data.
● Traditional solution 1
An extract, transform, and load (ETL) data cleansing solution is time-consuming and costly.
● Traditional solution 2
Stop the analysis jobs and define new filter conditions.
Modify all query statements to add the new filter conditions.
If indexing is not enabled for a field, the index must be rebuilt.
The entire process takes two to three days, impacting business decisions.
● Soft delete solution
from_time = int(time.time()) - 2 * 24 * 3600  # look back two days over the contaminated window
to_time = int(time.time())
toDeleteQuery = 'dataSource:testEnv'  # match only the logs tagged with the test environment
request = DeleteLogsRequest(project, logstore, from_time, to_time, query=toDeleteQuery)
res: DeleteLogsResponse = client.delete_logs(request)
Precisely mark the test data for deletion. The process completes in seconds.
Analysis jobs can resume running without any modifications, as the sketch below shows.
This provides a true "data rescue" capability.
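Because the filtering happens at read time, existing analysis statements keep working without extra exclusion clauses. A sketch of one such unchanged job, again assuming the SDK's GetLogsRequest; the SQL statement is illustrative:

from aliyun.log import GetLogsRequest

# The pre-incident analysis query runs as-is: soft-deleted test rows are excluded
# automatically, so no extra "not dataSource:testEnv" condition is needed.
analysis = GetLogsRequest(project, logstore, fromTime=from_time, toTime=to_time,
                          query='* | SELECT dataSource, count(*) AS cnt GROUP BY dataSource')
resp = client.get_logs(analysis)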
Use case 3: Precisely removing abnormal logs from a failure
A Software as a Service (SaaS) provider offers an online collaboration platform as its core business. Last week, an upgrade to a buggy version caused the backend actiontrail module to generate many abnormal logs. These logs had a specific error code and a key log mark (event_type: "file_upload_error"). The logs polluted alert monitoring and interfered with subsequent data analytics. The abnormal logs needed to be cleaned up urgently.
● Traditional solution 1
The ETL data cleansing solution is time-consuming and costly.
● Traditional solution 2
Each analysis or monitoring-chart view requires extra filter conditions to exclude the dirty data. This is tedious and can also trigger SLS resource limits.
● Soft delete solution
from_time = int(time.time()) - 7 * 24 * 3600  # the buggy version ran within the past week
to_time = int(time.time())
# Combine a numeric version range, a file-path tag, multiple error codes, and an event type.
toDeleteQuery = '''version >= 2.1 and version < 2.3 and __tag__:__path__: "/user/actiontrail.LOG" and (error_code:500 or error_code:502) and event_type:file_upload_error'''
request = DeleteLogsRequest(project, logstore, from_time, to_time, query=toDeleteQuery)
res: DeleteLogsResponse = client.delete_logs(request)
Accurately detect abnormal logs by matching numeric ranges and multiple text values.
Delete abnormal logs with precision in seconds. The backend automatically merges and efficiently caches deletion information. This has almost no impact on query analysis performance.
The backend automatically refreshes data reports to show the corrected results.
If you find more data to delete, simply trigger the delete operation again.
Conclusion
The need to instantly and safely delete data is no longer a niche request but a critical requirement, driven by compliance mandates, incident response, and daily O&M. Traditionally, this forced a difficult trade-off between data cleansing speed and system stability.
SLS soft delete eliminates that compromise. It delivers both speed and safety. Having already proven its stability and effectiveness in resolving numerous real-world emergencies, this feature is now ready for wider adoption.
Soft delete is now available in the Singapore and China (Ulanqab) regions, with a phased rollout to other regions underway. For a practical guide, see How to use soft delete.
