In this article, Alibaba Cloud Simple Log Service (SLS) systematically unveils, for the first time, the product philosophy, architecture design, and accumulated core technology behind SLS SQL Copilot. We provide an in-depth look at how this intelligent analysis assistant starts from real user needs and combines cutting-edge AI capabilities with over a decade of SLS log analysis best practices to create a future-oriented, intelligent log analysis experience.
Origin
Ten years ago, SLS was launched as a first-of-its-kind log service on the cloud. Its one-stop capabilities, including log collection, distributed storage, and high-performance query and retrieval, made a strong impression.
Eight years ago, SLS launched its SQL analysis service for the first time. This transformed log data from simple storage into a subject for interactive query analysis, and users began to realize the value of their data.
Five years ago, SLS experimented with self-service data exploration. The Data Explorer service was designed to help users build search statements on their own.
Three years ago, SLS implemented self-service SQL diagnosis. This feature aimed to lower the barrier to using SQL for users and automatically correct syntax errors.
History of SLS data insights
Throughout its history, SLS has never stopped its quest to explore and mine the value of log data.
However, as the user base and application scenarios continued to expand, inquiries and usage problems related to log query and analysis kept pouring in, turning into an endless flow of tickets.
"How should I configure tokens for Index Fields?"
"Why can't I retrieve this log?"
"I want to analyze..., how should I write the SQL statement?"
"The result of the SQL computation doesn't seem right?"
We became keenly aware that although SLS has powerful query and analysis capabilities, many users were deterred from log analysis by complex syntax rules, diverse data structures, or a lack of guidance on best practices. To address this, we repeatedly launched user experience improvement plans and ticket clearance initiatives. Through continuous user support and in-depth refinement, we gradually accumulated a wealth of practical experience and usage tips. At the time, we thought about how great it would be to have an intelligent system that could fully leverage this accumulated expert experience and knowledge, so that every user could benefit from professional analysis guidance and the knowledge would become universally accessible. It would be even better if the system could support self-service Q&A and directly convert natural language requirement descriptions into professional SQL statements.
Fast forward to 2025, AI is sweeping the globe and reshaping every corner of the world... The capabilities that were once "fantasies" are now becoming a reality.
Philosophy
First, what is a Copilot?
The concept originates from aviation. The person flying an aircraft is the pilot; as aviation technology advanced, assisted piloting systems were gradually introduced to help pilots complete their flight missions, and from this "co-pilot" role the term Copilot took hold.
Now, SLS is like an aircraft carrying massive amounts of user data. We want to build a similar intelligent assisted piloting system within SLS to assist you, the SLS user (our pilot), in efficiently completing query and analysis tasks and maximizing the value released from log data.
In log analysis and O&M scenarios, Alibaba Cloud SLS uses its own query | SQL pipe syntax to provide efficient data retrieval and real-time analysis capabilities.
However, users often face difficulties with queries and analysis because they are unfamiliar with the query syntax, the log data structures are complex, or they lack experience with best practices. They face challenges such as not knowing how to write search statements, being unable to retrieve data, obtaining incorrect results, or experiencing poor query performance. SLS SQL Copilot was created to address these issues. As an intelligent analysis tool natively integrated into SLS, it uses an efficient "conversational interaction" method to intelligently transform users' natural language requirement descriptions into search or SQL statements. This significantly lowers the barrier to log analysis and helps users quickly locate problems and gain insights into the value of their log data.
Here are answers to a few questions (which are also of great concern to users):
Q: What is the difference between this and the standard Text2SQL AI products that are common on the market?
A: We are not creating a generic Text2SQL tool. We are building an SQL Copilot exclusively for SLS, tailored to its unique characteristics so that it can leverage SLS's native attributes and capabilities as fully as possible.
Q: Are you building an application or a capability?
A: We are building a capability, not an application. It does not exist as a standalone application; it evolves along with the native capabilities and roadmap of SLS. This tighter coupling means we understand SLS better, can integrate capabilities more deeply and efficiently, and can keep pace with SLS feature updates faster.
Therefore,
Our design philosophy is: More native, more accurate, and more efficient --> Creating a new paradigm for log data query and analysis
Our mission: To build a native, convenient, and easy-to-use query and analysis assistant for SLS users.
More native
In what aspects is it native?
The three elements of query and analysis in SLS: time range, search statement, and SQL statement.
Native Time Range Integration
"Time" is a natural attribute of log data. SLS considers the time range an essential element for query and analysis, and this design philosophy is carried over to SLS SQL Copilot.
Commercially available Text2SQL tools or standard AI services usually embed time filter conditions within the SQL statement when generating SQL, which makes the SQL statement verbose.
SELECT ...
FROM ...
WHERE __time__ > to_unixtime(now() - INTERVAL '7' DAY)
However, we chose an approach that better aligns with the characteristics of SLS by separating the time range from the query logic. This keeps the SQL statement itself concise and allows it to focus on the query and analysis logic.
In terms of implementation, we support precise time range inference and output the time range as an independent generated component. This avoids query abnormalities caused by conflicts between time parameters and the SQL statement. Whether you are using the official SLS console, an Open API, or an MCP tool call, you can directly use the generated time range to seamlessly execute SLS queries. This detail fully reflects the native feel of SLS.
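As a minimal sketch of this separation (the fields and time values are illustrative assumptions, not the actual Copilot output format), the generated SQL stays focused on the analysis logic while the inferred time range travels alongside it as its own component:

-- Generated statement: no time filter embedded in the SQL itself
* | SELECT status, count(*) AS pv GROUP BY status ORDER BY pv DESC
-- Generated time range (separate component, illustrative): from = now - 7d, to = now,
-- ready to be passed to the console, an API call, or an MCP tool call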
Integration of Multiple Query Paradigms
SLS supports a rich set of query paradigms, such as full-text index, field-specific search, and statistical analysis, as well as continuously expanding new capabilities, such as Structured Process Language (SPL) data processing. Each query method has its own applicable scenarios and syntax characteristics.
This is precisely a pain point for ordinary Text2SQL tools. When faced with so many query paradigms, it is difficult to accurately determine which one should be used for a user's requirement.
This is where the advantages of SQL Copilot become apparent. We natively support the three major query paradigms of SLS: pure query, pure SQL, and mixed query + SQL. SQL Copilot can intelligently determine user requirements and automatically select the most suitable query pattern to achieve optimal query performance.
Take a game log scenario as an example. Three different questions from a user correspond to completely different query intents:
Scenario 1: "Query for logs of players older than 25" -- Pure query mode
Scenario 2: "Analyze the percentage of players older than 25" -- Pure SQL analysis mode
Scenario 3: "Analyze the top 10 users by page views (PVs) among players older than 25" -- Mixed query + SQL mode
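For illustration, the corresponding statements might look roughly like this (a sketch only: the field names age and user_id, and treating each log line as one page view, are assumptions about the game log schema):

-- Scenario 1, pure query: just filter the logs
age > 25
-- Scenario 2, pure SQL: analyze over the full data set
* | SELECT round(count_if(age > 25) * 100.0 / count(*), 2) AS pct_over_25
-- Scenario 3, mixed query + SQL: filter first, then aggregate
age > 25 | SELECT user_id, count(*) AS pv GROUP BY user_id ORDER BY pv DESC LIMIT 10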
Integration of Special Analytic Functions
SLS has developed many powerful special functions for log analysis scenarios, such as year-over-year and month-over-month analysis, IP geolocation parsing, and time series completion. These are all high-frequency requirements in log analysis. SQL Copilot natively integrates these special functions and automatically invokes them when it detects relevant requirements. This unleashes the full analysis potential of SLS and makes complex analysis simple.
Take year-over-year analysis as an example. To implement a "comparison of PVs for today vs. yesterday vs. a week ago", standard SQL requires complex nested subqueries and UNIONs, which is not only verbose but also error-prone. In contrast, the year-over-year and month-over-month functions in SLS can complete the analysis in just a few lines, which is concise and elegant.
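A minimal sketch using the SLS period-over-period compare function (pv as the metric name is an assumption; 86400 and 604800 are the one-day and one-week offsets in seconds):

-- PV now vs. one day ago vs. one week ago, without nested subqueries and UNIONs
* | SELECT compare(pv, 86400, 604800) FROM (SELECT count(*) AS pv FROM log)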
These are the manifestations of native integration. It deeply integrates the product DNA of SLS and its own design philosophy, allowing the AI to truly understand SLS and SLS to truly understand users.
More Accurate (Intelligent)
What constitutes "accurate" SQL generation? It is not just about correct syntax. It is more about truly understanding the user's intent and accurately matching business requirements. True intelligence is reflected in the deep understanding of and precise response to user requirements. However, in practice, achieving this ideal presents several significant challenges:
Incomplete user context, a lack of session coherence, vague requirement descriptions, hard-to-grasp business semantics: every such detail can affect the final generated result.
Through deep refinement in real-world, online practice, we have gradually built a complete intelligent system:
Intelligent Context Awareness
Index Field Metadata
The system automatically detects the index field configuration information of the current logstore and then intelligently infers relevant active fields based on the user's requirements.
Raw Log Data Sampling
Log data is often unstructured, and its format varies greatly. Relying solely on index field information makes it difficult to handle complex data parsing requirements.
Have you ever encountered this situation? You want to extract specific information from a large text field named content, but you do not know how to express the query requirement.
Therefore, with user Service-Linked Role (SLR) authorization, Copilot automatically samples raw log data to provide the system with first-hand raw data context. This is particularly important in scenarios where information is extracted from large content fields, whether it involves regular expression extraction, JSON parsing, or string splitting.
{
"__time__":1756287229,
"_image_name_":"ali-registry.net/test:v1",
"_container_name_":"test",
"_time_":"2025-08-27T09:33:48.499251778Z",
"__tag__:__hostname__":"2bdbfc7b6894",
"__tag__:__client_ip__":"47.98.125.123",
"__tag__:__receive_time__":"1756287231",
"__tag__:__pack_id__":"9505840C6C3CD469-18718A",
"content":"192.168.0.1 - - [27/Aug/2025:17:33:48 +0800] \"GET /hello/ HTTP/1.0\" 200 3876 \"-\" \"Chrome/0.1.0\" \"120.232.180.90, 120.232.180.90\""
}
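For a content field like the one above, the generated statement might use regular expression extraction to pull out the access IP (a sketch under the assumption that the log follows the NGINX-style access format shown, with the client IP as the first token):

-- Extract the first token of content as the access IP, then count visits per IP
* | SELECT regexp_extract(content, '^(\S+)', 1) AS access_ip, count(*) AS pv GROUP BY access_ip ORDER BY pv DESC LIMIT 10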
Preset Query Environment Awareness
In practice, we have observed an interesting phenomenon: Many users are accustomed to first pasting a large block of log data when asking a question and only state their actual requirement at the end.
User question:
body_bytes_sent:4017
host:www.yxw.mock.com
http_referer:www.pj.mock.com
http_user_agent:Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.724.100 Safari/534.30
http_x_forwarded_for:210.45.235.97
remote_addr:180.164.95.121
remote_user:hzc9
request_length:3940
request_method:GET
request_time:71
request_uri:/request/path-0/file-0
status:200
time_local:27/Aug/2025:10:06:33
upstream_response_time:0.06
...(The original log may be quite long)
[The actual question only starts here]
I want to extract the first IP address as the access IP field, or extract the access path, and so on...
This approach is understandable because the user wants to provide complete contextual information. However, SQL Copilot offers a more elegant method: preset query environment awareness.
The environment awareness feature of SQL Copilot automatically records the context of your most recent query. You only need to say "Based on the preset query...", and the system will know which data environment you are referring to. This not only makes questions more concise but also makes data sampling more precise and efficient.
External Knowledge Base Integration
Another interesting design is the integration of enterprise knowledge bases. Every enterprise or organization has its own "jargon", such as business terms, product codenames, acronyms, and expressions commonly used by teams. This information is often critical for analysis, but the Artificial Intelligence (AI) may not understand it at all.
We support the integration of user-created knowledge bases. This allows SQL Copilot to learn the user's "corporate language", enabling it to truly learn to speak "your language" and understand "your business".
Requirement Guidance and Understanding
"Help me check yesterday's data." Does this question sound familiar to you?
Based on extensive experience in user support, we have found that users' requirement descriptions often have various issues: They are too vague, unclear, or ambiguous, or users simply do not know how to ask. This is actually quite normal. After all, accurately expressing the analytical ideas in one's mind is not an easy task. However, for the AI, a clear requirement description directly determines the quality of the generated output.
To address this, we drew inspiration from the triage system in hospitals and designed an intelligent requirement guidance and routing system:
• When a requirement is vague → It intelligently recommends specific questions that fit your business scenario, allowing you to ask questions efficiently with a single click.
• When a requirement is unclear → It intelligently provides reminders and helps you confirm key information (such as specific fields) to avoid misunderstandings.
• When a requirement is clear → It helps connect the context, organize the language into a structured expression, and refine the key points of the requirement.
• For daily Q&A → It acts as a thoughtful chat companion and can even engage in some humorous technical banter.
The interaction spans several modes: requirement guidance, Q&A chat, reflection and inference, and requirement summary.
It acts like a patient technical expert, engaging in in-depth one-on-one communication with you to help you gradually clarify your requirements and ideas. It guides you from a vague request such as "Help me check the data" to a specific one such as "Analyze the distribution trend of the error status code over the last 7 days", and from fragmented expressions to a structured refinement of your requirements. This progressive and guided interaction not only improves the user experience but more importantly, it allows Copilot to truly understand and confirm your requirements.
User Semantic Model
Intelligent Perception of Log Data Structure and Features
Faced with complex and diverse log formats, how can the AI truly "understand" the data? SQL Copilot has an intelligent data understanding capability. Whether the data is in numeric, JSON, text, or custom formats, it can automatically parse the data features and distribution, providing support for generating precise queries later.
Semantic Extraction
Understanding the data structure is just the first step. The deeper challenge lies in understanding the user's business semantics. The following situations are common in real-world scenarios:
Difficulty in inferring semantic columns
User question: Group by 'Service Name' and count the frequency.
Field ambiguity: The logstore contains both the service_name and event.serviceName index fields. Which one should be chosen?
Ambiguous business rules
User question: Count the distribution of requests with 'high latency'.
Rule ambiguity: What is considered high latency? Is it 1 second? 3 seconds? Or a user-defined threshold?
Unknown analysis preferences
User question: Analyze the trend of request PV.
Implicit preference: Is the time granularity for the trend by minute, hour, or day?
These implicit business semantics often determine the accuracy of the analysis results. The more deeply the user's business semantics are understood, the more the generated query will align with the actual requirements.
The question is, how can we understand these business semantics of the user?
We came up with an idea: The reports, alerts, and query history that users have created in the past are actually the best 'semantic textbook'. Because these are all meant for people to read, users will naturally use field names, metric definitions, and analysis logic that have business significance.
Therefore, we thought of intelligently extracting semantic information from the user's historical queries, reports, and alert rules.
Take the reports created by users as an example. Behind every chart is an SQL statement. A seemingly ordinary SQL statement actually contains a wealth of business information, such as field aliases, filter conditions, and aggregation logic. Every detail 'tells' us how users think about and express their business requirements.
Specifically, we learn the user's business semantics from multiple dimensions (see the annotated example after this list):
• Naming and expression habits: identify business terms and naming conventions from AS aliases.
• Business rule determination: understand business rules and threshold standards from WHERE conditions.
• Data processing preferences: identify data parsing methods from expressions such as JSON extraction, regular expression matching, and string splitting.
• Analysis logic preferences: understand common choices for grouping dimensions and aggregation calculations from GROUP BY clauses and aggregate functions.
• Business meta information: identify and infer it from WITH common table expressions (CTEs) and subqueries.
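As a concrete illustration, consider a hypothetical chart SQL behind a user-built report (the field names, the assumption that request_time is in milliseconds, and the 3-second threshold are all hypothetical); the comments mark the semantics that could be extracted from each part:

* |
SELECT
  json_extract_scalar(content, '$.serviceName') AS service_name,   -- JSON parsing preference and naming habit
  count_if(request_time > 3000) AS high_latency_count              -- business rule: "high latency" means above 3 seconds
GROUP BY 1                                                         -- preferred grouping dimension
ORDER BY high_latency_count DESC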
Through semantic extraction, we ultimately obtain a user semantic model that captures these naming habits, business rules, and analysis preferences. This model provides significant reference value in Copilot's context engineering.
User Session Profile
In addition to learning from historical SQL, we have another source for learning: the user's real-time conversations.
This is another interesting design: We learn the user's 'chat profile' from each conversation. Every question and every round of interaction reveals the user's preferences and habits. The system continuously summarizes the user's session features to form a dynamic user chat profile. What analysis angles do you prefer? How do you typically express your requirements? Which data perspectives do you favor? These real-time behavior insights and personalized understandings allow Copilot to understand you better with each conversation.
To maintain a clear context and ensure focused analysis, it is recommended to start a new session when switching to a different topic.
Quality Validation and Correction
It is inevitable that statements generated by a Large Language Model (LLM) may contain errors. To ensure the quality of the generated output, we designed a quality validation and assurance mechanism. We added DryRun execution, validation, and correction steps to the flow to validate the syntax and execution logic and to perform a final verification against the user's requirements. If any issues are found, they are corrected immediately to guarantee the final quality of the generated output.
Higher Efficiency
Seamlessly Convert Natural Language to Query Execution
Ad hoc query → Requirement description → Statement generation → In-place execution. The entire workflow, from query to execution, is seamlessly integrated.
Users do not need to master SQL syntax. They only need to describe their requirements in everyday language, such as 'count the number of requests with an API latency greater than 1 second in the last 2 hours, grouped by service name'. Copilot automatically generates a complete SLS search statement. The time range, retrieval conditions, and aggregation logic are all handled in one step. With native integration, you can run the statement as soon as it is generated.
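For the request above, the generated result might look roughly like the following (service_name and a millisecond-valued request_time index field are assumptions; the 2-hour window would be returned as the separate time range component described earlier, not embedded in the SQL):

request_time > 1000 | SELECT service_name, count(*) AS request_count GROUP BY service_name ORDER BY request_count DESC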
This is the efficiency advantage of native integration.
Automatic Diagnosis and Correction of Query Errors
SQL Copilot also features automatic error diagnosis and correction.
Intelligent diagnosis → Issue localization → Precise repair. By converting erroneous statements into executable queries in seconds, this process minimizes debugging time and allows users to concentrate on generating insights rather than correcting syntax.
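A hypothetical before/after (with assumed field names) gives a feel for the kind of repair involved:

-- Before: fails because the function name is misspelled and the alias uv cannot be referenced in WHERE
* | SELECT service_name, approx_distinc(user_id) AS uv WHERE uv > 100 GROUP BY service_name
-- After: corrected function name, with the post-aggregation filter moved into HAVING
* | SELECT service_name, approx_distinct(user_id) AS uv GROUP BY service_name HAVING approx_distinct(user_id) > 100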
Core Feature Highlights
Now, after extensive refinement, the all-new SLS SQL Copilot brings you:
Smarter Generation
Supports multiple paradigms such as pure query, pure SQL, and query + SQL, and intelligently generates optimal statements.
Natively integrates time filtering and SLS's special analysis functions to fully unleash the potential of SLS.
Deeply understands business semantics and continuously learns from historical queries and sessions. The more you use it, the better it understands you.
Easier Debugging
Dry run pre-execution to detect and fix potential problems in advance.
Provides clear explanations to help you understand the logic of each step.
Intelligent diagnosis for incorrect SQL automatically detects problems and provides precise corrections.
More Interesting Interaction
Requirement guidance: Not sure how to perform an analysis? Let it recommend suitable questions for you.
On-demand Q&A: Generate examples, explain syntax, and provide analysis ideas.
Intelligent conversation: Chat with you in SQL (Yes, it even has a bit of technical humor).
More Convenient to Use
Natively integrated into the SLS console and ready to use out of the box.
Supports APIs and the Model Context Protocol (MCP) for easy integration into your pipelines.
Outlook
At this point, our story has just begun.
Intelligent Data Exploration
The content and format of log data are complex and varied. Important information is often hidden in complex unstructured text. This sometimes requires an exploratory process of "trying solution A, and if that does not work, trying solution B". We look forward to combining this with the autonomous decision-making capabilities of an Agent to allow the AI to investigate data features on its own and find truly valuable information—thinking and acting just like an experienced data analyst.
In-depth Understanding of User Semantics
Currently, we only learn semantics from a user's historical queries, but the user's actual needs are far more diverse. As the construction of the semantic model deepens, we hope to understand more of the user's business context, making each interaction closer to the user's true intent. Your business scenarios, analysis habits, and even unstated needs should all be understood and remembered. This allows every conversation with Copilot to better understand you.
Scenario-based Intelligent Insights
Most exciting of all, without the user having to ask questions proactively, the system can autonomously discover problems and potential needs based on business scenarios and log features, and provide data insights and analysis. It will evolve from a passive "Q&A assistant" into a proactive "analysis partner", truly unlocking the value of log data.

























