ObservabilityGuy

Posted on Mar 17 • Edited on Apr 21

Breaking Through the Key Bottlenecks in Observability: Ultimate Integration of Entities and Relationships

#ai #integration

This article introduces EntityStore's graph querying in UModel, shifting observability from isolated entities to relationship-aware topology analysis.
1. Bridging Observable Data with Relationship Graphs
1.1 From Isolated Entities to Connected Networks
In today's cloud-native world, we are accustomed to treating each component of a system, such as a service, container, middleware instance, or piece of infrastructure, as a separate entity for monitoring and management. We configure dashboards, set alerts, and track performance metrics for these individual components. However, this "individual-centric" perspective has a fundamental blind spot: it ignores the most essential characteristic of any system, which is its connections. No entity exists in isolation. Instead, entities form a vast, complex, and dynamically evolving graph of relationships through interactions such as calls, dependencies, and inclusion.

Traditional monitoring and querying tools, whether based on SQL or Search Processing Language (SPL), are fundamentally designed to process two-dimensional, tabular data. While they excel at answering questions about individuals ("What is the CPU usage of this pod?"), they struggle when faced with relational inquiries, such as "Which downstream services will be affected by the failure of this service?" and "Which intermediate services must be traversed to reach the core database?". Answering such questions typically requires complex JOIN operations, multi-step queries, or even manual reconstruction using offline architecture diagrams. This approach is not only inefficient but often impractical in systems with deep dependency chains and intricate topologies. We may possess comprehensive data about every individual "point," yet lack a clear map of the critical "lines" connecting them.

1.2 Our Approach: Integrating Graph Querying
To address this challenge, our solution centers on treating the graph as a core component of the observability data model. We believe that the true nature of a system is inherently graph-like. Therefore, its querying and analysis should also be conducted in a way that best reflects this essence—through graph queries.

To realize this vision, we have built EntityStore at the core of the UModel architecture. It features an innovative dual-storage design and maintains two dedicated logstores: entity and topo. The former stores detailed properties of individual entities, and the latter stores the topological relationships among entities. Together, they form a real-time, queryable digital twin graph of the entire observability system.

Based on this foundation, we provide three progressively powerful graph querying capabilities, designed to meet diverse user needs, from beginners to experts:

● graph-match: designed for common path-finding scenarios, with intuitive syntax that allows users to express queries like sentences (example: A calls C through B) to quickly identify specific call chains.

● graph-call: encapsulates the most frequently used graph algorithms (such as neighbor discovery and direct relationship query) into functional APIs. Users can focus on intent (example: "Find all neighbors of A within 3 hops") without needing to understand underlying implementation details.

● Cypher: incorporates the industry-standard graph query language and delivers the most comprehensive and powerful graph querying capabilities. It supports arbitrarily complex pattern matching, multi-hop traversals, and aggregation analysis, which makes it the ultimate tool for resolving complex graph problems.

This integrated solution aims to deliver powerful graph analytics capabilities in a low-barrier, engineering-friendly manner to every O&M and development engineer.

1.3 Core Value: Unlocking New Dimensions of System Insights
The introduction of graph querying capabilities not only adds a new query syntax, but also unlocks an entirely new dimension for understanding and analyzing systems.

● Global fault impact analysis (analysis of the impact scope of problems): When a fault occurs, a single graph query can rapidly trace all potential downstream propagation paths and identify affected business components. This enables real-time decision-making and helps prioritize incident response and mitigation efforts.

● End-to-end root cause tracing: In contrast to impact analysis, when a backend service exhibits exceptions, graph traversal can move upstream to locate the originating business request or recent change, which enables precise root cause identification.

● Architecture health and compliance auditing: Graph queries allow validation of runtime architectures against intended designs. For example, you can query "unauthorized cross-domain service calls" or "whether a core data service is relied on by unauthorized applications", which enables continuous architectural governance.

● Security and permission path analysis: In security audits, you can trace the complete access path from a user to a resource, verifying that each layer of authorization complies with security policies and mitigating risks of data leakage.

In summary, graph querying elevates our perception of systems from a mere collection of points to a structured network of interconnected components. It lets us ask and answer questions grounded in the actual relationships within the system, which supports efficient fault diagnostics, architectural governance, and security assurance in increasingly complex environments.

2. Concepts Related to Graph Queries
2.1 Key Concepts

Collaboration relationships:

UModel (knowledge graph)
├── EntitySet: apm.service (type definition)
│   ├── Entity: user-service (instance 1)
│   ├── Entity: payment-service (instance 2)
│   └── Entity: order-service (instance 3)
├── EntitySet: k8s.pod (type definition)
│   ├── Entity: web-pod-123 (instance 1)
│   └── Entity: api-pod-456 (instance 2)
└── EntitySetLink: service_runs_on_pod (relationship definition)
    ├── Relation: user-service -> web-pod-123
    └── Relation: payment-service -> api-pod-456

EntityStore uses Simple Log Service (SLS) Logstore resources to implement features such as data writing and consumption. When an EntityStore is created, the following Logstore assets are automatically created:

● ${workspace}__entity: for writing entity data.

● ${workspace}__topo: for writing relationship data.

The graph query methods described in this article focus specifically on querying the relationship data written to ${workspace}__topo. They support capabilities such as multi-hop path analysis, entity adjacency analysis, and custom topology pattern recognition.

Note: The graph query methods introduced in this article are based on the low-level querying layer of the Cloud Monitor 2.0 high-level PaaS API. They are intended for advanced users who require highly customized and flexible query patterns. If you need only simple association lookup, information query, and other capabilities, we recommend that you use the high-level PaaS API, which is more user-friendly.

2.2 Overview

3. Concepts Related to Graph Queries
Before you dive into the use of graph queries, it is essential to understand the foundational concepts. The core idea behind graph querying is to model data as a graph structure: Entities are represented as nodes, and relationships are represented as edges. Each node has a label and properties. The label identifies the type of the node, and properties store detailed information about the node. Similarly, each edge has a type and properties. The type indicates the category of the relationship, and properties store additional information about the relationship.

3.1 Syntax for Describing Nodes and Edges
In a graph query, a specific syntax is used to describe nodes and edges:

● Node: represented by using parentheses (()).
● Edge: represented by using square brackets ([]).
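● Format: variable:label {property key-value pairs}. The variable name, the label, and the property map are each optional, as the examples below show.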

Basic syntax examples:

// Any node
()

// A node with a specific label
(:"apm@apm.service")           // The graph-match syntax
(:`apm@apm.service`)           // The Cypher syntax

// A node with a label and properties
(:"apm@apm.service" { __entity_type__: 'apm.service' })

// A named variable node
(s:"apm@apm.service" { __entity_id__: '123456' })

// Any edge
[]

// A named edge
[edge]

// The edge with a specific type
[e:calls { __type__: "calls" }]

Syntax differences:

● graph-match: In the SPL context, special characters must be enclosed in double quotes (").

● Cypher: As a standalone syntax, labels are wrapped in backticks (`).

// graph-match syntax
.topo | graph-match (s:"apm@apm.service" {__entity_id__: '123'})-[e]-(d)
        project s, e, d

// Cypher syntax (the query is passed as a backtick string, so the label is escaped with doubled backticks: ``apm@apm.service``)
.topo | graph-call cypher(`
    MATCH (s:``apm@apm.service`` {__entity_id__: '35af918180394ff853be6c9b458704ea'})-[e]-(d)
    RETURN s, e, d
`)
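
The escaping rule is easy to get wrong when query strings are assembled programmatically. The following Python sketch shows one way to build the query text above. The helper names `quote_cypher_label` and `build_neighbor_query` are hypothetical, and the sketch only produces a string; how the query is submitted is outside its scope.

```python
def quote_cypher_label(label):
    """Wrap a label in doubled backticks, as used inside graph-call cypher(`...`)."""
    return "``" + label + "``"


def build_neighbor_query(label, entity_id):
    """Build the text of a one-hop, undirected Cypher query for one start node."""
    # This sketch rejects quotes and backticks instead of trying to escape them.
    if "'" in entity_id or "`" in entity_id or "`" in label:
        raise ValueError("quotes and backticks are not supported in this sketch")
    return (
        ".topo | graph-call cypher(`\n"
        f"    MATCH (s:{quote_cypher_label(label)} "
        f"{{__entity_id__: '{entity_id}'}})-[e]-(d)\n"
        "    RETURN s, e, d\n"
        "`)"
    )


query = build_neighbor_query("apm@apm.service", "35af918180394ff853be6c9b458704ea")
print(query)
```

Rejecting unexpected characters keeps the sketch from producing a malformed or injectable query; a real integration would need a proper escaping policy.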

3.2 Path Syntax and Direction
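In graph queries, ASCII characters are used to represent the direction of relationships:

● (A)-[e]->(B): a directed edge from A to B.
● (A)<-[e]-(B): a directed edge from B to A.
● (A)-[e]-(B): an edge in either direction (no direction enforced).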

3.3 Return Value Structure
In the EntityStore model, node labels follow the domain@entity_type format. For example, apm@apm.service represents a node in the apm domain with the entity type apm.service. This labeling convention clearly indicates the domain and type of the node and also enables fast filtering and querying by domain. Node properties include built-in system properties (such as __entity_id__, __domain__, and __entity_type__) and custom properties (such as servicename and instanceid). The type of an edge is represented by a string, such as calls, runs_on, or contains. Each edge also has its own properties.

3.3.1 Node in JSON Format

{
  "id": "apm@apm.service:347150ad7eaee43d2bd25d113f567569",
  "label": "apm@apm.service", 
  "properties": {
    "__domain__": "apm",
    "__entity_type__": "apm.service",
    "__entity_id__": "347150ad7eaee43d2bd25d113f567569",
    "__label__": "apm@apm.service"
  }
}

3.3.2 Edge in JSON Format

{
  "startNodeId": "apm@apm.service:347150ad7eaee43d2bd25d113f567569",
  "endNodeId": "apm@apm.service.host:34f627359470c9d36da593708e9f2db7",
  "type": "contains",
  "properties": {
    "__type__": "contains"
  }
}
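
When post-processing results outside the query engine, it helps to split node IDs back into their parts. The Python sketch below parses the edge shown above. It assumes that a node ID has the form label:entity_id, as in the sample JSON, and that entity IDs contain no colon; the helper names are hypothetical.

```python
import json


def split_node_id(node_id):
    """Split 'label:entity_id' into (label, entity_id). Assumes the ID has no colon."""
    label, entity_id = node_id.rsplit(":", 1)
    return label, entity_id


def split_label(label):
    """Split 'domain@entity_type' into (domain, entity_type)."""
    domain, entity_type = label.split("@", 1)
    return domain, entity_type


edge = json.loads("""
{
  "startNodeId": "apm@apm.service:347150ad7eaee43d2bd25d113f567569",
  "endNodeId": "apm@apm.service.host:34f627359470c9d36da593708e9f2db7",
  "type": "contains",
  "properties": {"__type__": "contains"}
}
""")

start_label, start_id = split_node_id(edge["startNodeId"])
end_label, end_id = split_node_id(edge["endNodeId"])
print(split_label(start_label), edge["type"], split_label(end_label))
```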

The essence of graph querying lies in pattern matching: Users describe a graph pattern, and the system searches a graph for all subgraphs that match this pattern. A graph pattern can be expressed by using a path expression. The most basic form is (Node)-[Edge]->(Node), which represents traversing from a source node to a destination node through an edge. Path expressions can be extended into more complex patterns. For example, (A)-[e1]->(B)-[e2]->(C) represents a two-hop path from A to C through B, whereas (A)-[*1..3]->(B) indicates a variable-length path from A to B. Because the hop range in EntityStore is left-closed and right-open, *1..3 covers paths of 1 or 2 hops. This approach is intuitive and powerful, capable of describing relationships ranging from simple one-to-one connections to complex, multi-layered network paths.
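
To make the idea of pattern matching concrete, here is a minimal Python sketch that finds every match of the fixed two-hop pattern (A)-[e1]->(B)-[e2]->(C) in a small in-memory edge list. It illustrates the concept only and is not how EntityStore executes queries. The entity names are borrowed from the model tree earlier in this article, and the relationships between them are invented for the example.

```python
from collections import defaultdict

# Each edge is (start, type, end). These relationships are illustrative only.
edges = [
    ("user-service", "calls", "order-service"),
    ("order-service", "calls", "payment-service"),
    ("order-service", "runs_on", "web-pod-123"),
    ("payment-service", "runs_on", "api-pod-456"),
]


def match_two_hop(edge_list, start):
    """Return all (A, e1, B, e2, C) tuples matching (A)-[e1]->(B)-[e2]->(C) with A == start."""
    outgoing = defaultdict(list)
    for src, etype, dst in edge_list:
        outgoing[src].append((etype, dst))
    matches = []
    for e1, mid in outgoing[start]:
        for e2, end in outgoing[mid]:
            matches.append((start, e1, mid, e2, end))
    return matches


for match in match_two_hop(edges, "user-service"):
    print(match)
```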

4. graph-match: Intuitive Path Querying
graph-match is the most intuitive and user-friendly feature in graph querying. Its design philosophy is to allow users to express their query intent in a way that closely resembles natural language, and then the system automatically executes the query and returns the results. The syntax of graph-match is relatively simple. It consists of two parts: path description and result projection.

A core characteristic of graph-match is that queries must start from a known starting point. This starting point requires both the label and the __entity_id__ property to be specified, which ensures that the system can quickly locate the exact entity. From a technical implementation perspective, this is a deliberate design choice: graph traversal is typically an operation with exponential complexity. Allowing queries to start from arbitrary patterns could lead to full graph scans, making performance unpredictable. By requiring a specified starting point, the system can perform directed traversal from that point, effectively constraining the search space within a manageable scope.

The path description syntax follows an intuitive directional expression: (A)-[e]->(B) represents a directed edge from A to B. (A)<-[e]-(B) represents a directed edge from B to A. (A)-[e]-(B) represents a bidirectional edge (no direction enforced). You can assign variables to nodes and edges within the path. These variables can then be referenced in subsequent project statements. Paths can connect multiple nodes and edges to form a multi-hop traversal, such as (start)-[e1]->(mid)-[e2]->(end).
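
The three direction forms can be illustrated with a small Python sketch over an in-memory edge list. The function `one_hop` and the sample edges are hypothetical; the sketch only shows which edges each form would consider, not how the engine evaluates them.

```python
# Each edge is (start, type, end). Sample data for illustration only.
edges = [
    ("gateway", "calls", "order-service"),
    ("order-service", "calls", "payment-service"),
    ("order-service", "runs_on", "web-pod-123"),
]


def one_hop(edge_list, node, direction):
    """Return (edge type, other node) pairs for one hop from `node`.

    direction="out" mirrors (A)-[e]->(B), "in" mirrors (A)<-[e]-(B),
    and "any" mirrors (A)-[e]-(B).
    """
    if direction not in ("out", "in", "any"):
        raise ValueError("direction must be 'out', 'in', or 'any'")
    matches = []
    for src, etype, dst in edge_list:
        if direction in ("out", "any") and src == node:
            matches.append((etype, dst))
        if direction in ("in", "any") and dst == node:
            matches.append((etype, src))
    return matches


print(one_hop(edges, "order-service", "out"))
print(one_hop(edges, "order-service", "in"))
print(one_hop(edges, "order-service", "any"))
```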

A project statement is used to specify the content to be returned. The system can directly return the JSON objects of nodes or edges, or use dot notation to extract specific properties, such as node.__entity_type__ and edge.__type__. The project statements also support field renaming, which allows returned fields to have more user-friendly names. This flexible output mechanism enables graph-match to meet both the needs of rapid exploration (by returning complete objects) and data analysis (by extracting specific fields).
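
The effect of projection and renaming can be mimicked in a few lines of Python. In this sketch, a matched row maps each path variable to a node or edge object shaped like the JSON in section 3.3, and the hypothetical `project` helper either returns the whole object or looks up one property. It is an illustration of the semantics described above, not the engine's implementation.

```python
def project(row, fields):
    """Map output names to whole objects ('s') or single properties ('e.__type__')."""
    out = {}
    for name, path in fields.items():
        var, _, prop = path.partition(".")
        obj = row[var]
        # No property part: return the full JSON object, as in `project s`.
        out[name] = obj if not prop else obj["properties"].get(prop)
    return out


# One matched row; the IDs are taken from the sample JSON in this article.
row = {
    "s": {
        "id": "apm@apm.service:347150ad7eaee43d2bd25d113f567569",
        "label": "apm@apm.service",
        "properties": {"__label__": "apm@apm.service", "__entity_type__": "apm.service"},
    },
    "e": {"type": "contains", "properties": {"__type__": "contains"}},
    "d": {
        "id": "apm@apm.service.host:34f627359470c9d36da593708e9f2db7",
        "label": "apm@apm.service.host",
        "properties": {"__label__": "apm@apm.service.host"},
    },
}

projected = project(row, {"s": "s", "eType": "e.__type__", "dLabel": "d.__label__"})
print(projected["eType"], projected["dLabel"])
```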

4.1 Practical Application Examples
4.1.1 End-to-end Path Querying
Query the complete call chain starting from a specific operation:

.topo |
  graph-match (s:"apm@apm.operation" {__entity_id__: '925f76b2a7943e910187fd5961125288'})
              <-[e1]-(v1)-[e2:calls]->(v2)-[e3]->(v3)
  project s, 
          "e1.__type__", 
          "v1.__label__", 
          "e2.__type__", 
          "v2.__label__", 
          "e3.__type__", 
          "v3.__label__", 
          v3

Return results:
● s: the start operation node
● e1.__type__: the type of the first relationship
● v1.__label__: the label of the first intermediate node
● e2.__type__, v2.__label__, e3.__type__, v3.__label__, v3: the relationship types and node information for the subsequent hops

4.1.2 Neighbor Node Statistics
Count the distribution of neighbors for a specific service:

.topo |
  graph-match (s:"apm@apm.service" {__entity_id__: '0e73700c768a8e662165a8d4d46cd286'})
              -[e]-(d)   
  project eType="e.__type__", dLabel="d.__label__"
| stats cnt=count(1) by dLabel, eType
| sort cnt desc
| limit 20

4.1.3 Conditional Path Querying
Find path destinations that meet specific criteria:

.topo |
  graph-match (s:"apm@apm.service.operation" {__entity_id__: '6f0bb4c892effff81538df574a5cfcd9'})
              <-[e1]-(v1)-[e2:runs_on]->(v2)-[e3]->(v3)
  project s, 
          "e1.__type__", 
          "v1.__label__", 
          "e2.__type__", 
          "v2.__label__", 
          "e3.__type__", 
          destId="v3.__entity_id__", 
          v3 
| where destId='9a3ad23aa0826d643c7b2ab7c6897591'
| project s, v3

4.1.4 Pod-to-node Relationship Chain
Trace the full deployment hierarchy of a pod:

.topo |
  graph-match (pod:"k8s@k8s.pod" {__entity_id__: '347150ad7eaee43d2bd25d113f567569'})
              <-[r1:contains]-(node:"k8s@k8s.node")
              <-[r2:contains]-(cluster:"k8s@k8s.cluster")
  project 
    pod,
    node, 
    cluster,
    "r1.__type__",
    "r2.__type__"

4.1.5 Limitations of graph-match
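Despite its intuitive and user-friendly syntax, graph-match has several limitations. Two of them follow from the design described in this article: every query must start from a node whose label and __entity_id__ are both specified, and intermediate nodes can be filtered only by label, not by custom properties.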

5. graph-call: Functional Graph Operations
graph-call provides a set of functional interfaces for graph querying. These functions encapsulate common graph operation patterns, enabling users to perform specific types of queries more efficiently. The design philosophy of graph-call is to provide declarative function APIs. Users need to only specify their intent and parameters, while the system handles and optimizes the underlying traversal algorithms.

getNeighborNodes is the most commonly used graph-call function. It is used to obtain the neighbor nodes of a specified node. The signature of the function is getNeighborNodes(type, depth, nodeList), where the type parameter controls the type of traversal, the depth parameter controls the depth of the traversal, and the nodeList parameter specifies the starting node list. Valid values of the type parameter: sequence (directed traversal, preserving edge direction), sequence_in (returns only paths leading into the starting node), sequence_out (returns only paths originating from the starting node), and full (traversal in all directions, regardless of edge direction). This classification allows users to select the most appropriate traversal policy based on their business requirements.

The depth parameter specifies the number of hops for traversal. In practice, we recommend that you do not set this value too large. A depth of 3 to 5 levels is typically sufficient to cover most scenarios. Excessively deep traversals can degrade performance and return overly broad results that lack practical significance because they include too many indirect associations. The nodeList parameter accepts an array of node descriptors. Each descriptor follows the same syntax as in graph-match, requiring both a label and __entity_id__. getNeighborNodes performs traversal separately for each starting node and then merges the results before returning.

The returned result of getNeighborNodes contains four fields: srcNode (JSON object representing the source node), destNode (JSON object representing the destination node), relationType (relationship type), and srcPosition (source node position in the path, with -1 indicating a direct neighbor). The srcPosition field is particularly useful. It allows users to distinguish between direct and indirect relationships. During statistical analysis, results can be grouped by position to understand the distribution of relationships across different levels of the graph.
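
Grouping by position can also be done client-side once results are fetched. The Python sketch below classifies rows by srcPosition using the same thresholds as the upstream impact example in section 5.1.2 ('-1' is direct, '-2' is secondary, anything else is indirect). The rows are made-up sample data with hypothetical IDs, and the sketch assumes srcPosition arrives as a string and srcNode as a JSON string, as that example suggests.

```python
import json
from collections import Counter


def impact_level(src_position):
    """Classify a row by its srcPosition value, compared as a string."""
    if src_position == "-1":
        return "direct"
    if src_position == "-2":
        return "secondary"
    return "indirect"


# Hypothetical rows shaped like getNeighborNodes output.
rows = [
    {"srcNode": '{"id": "apm@apm.service:a"}', "relationType": "calls", "srcPosition": "-1"},
    {"srcNode": '{"id": "apm@apm.service:b"}', "relationType": "calls", "srcPosition": "-1"},
    {"srcNode": '{"id": "apm@apm.service:c"}', "relationType": "depends_on", "srcPosition": "-2"},
    {"srcNode": '{"id": "apm@apm.service:d"}', "relationType": "calls", "srcPosition": "-3"},
]

counts = Counter((impact_level(r["srcPosition"]), r["relationType"]) for r in rows)
direct_ids = [json.loads(r["srcNode"])["id"] for r in rows if r["srcPosition"] == "-1"]

for (level, relation), cnt in sorted(counts.items()):
    print(level, relation, cnt)
```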

The getDirectRelations function is used to batch query direct relationships between nodes. Unlike getNeighborNodes, getDirectRelations returns only direct connections and does not perform multi-hop traversals. This function is especially useful for batch checking relationships among multiple known nodes, such as checking whether a set of services has call relationships and verifying dependencies among a group of resources. The function takes a list of nodes as input and returns an array of relationships, with each relationship containing complete information about the nodes and edges.
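
As a rough mental model, a direct-relation lookup over a known node set behaves like filtering an edge list down to edges whose two endpoints are both in that set. The Python sketch below illustrates this under that assumption; the actual matching rules of getDirectRelations may differ. The function `direct_relations` and the second sample edge are hypothetical.

```python
def direct_relations(edges, node_ids):
    """Keep only edges whose start and end nodes are both in node_ids (no multi-hop traversal)."""
    wanted = set(node_ids)
    return [e for e in edges if e["startNodeId"] in wanted and e["endNodeId"] in wanted]


service = "app@app.service:347150ad7eaee43d2bd25d113f567569"
operation = "app@app.operation:73ef19770998ff5d4c1bfd042bc00a0f"

edges = [
    {"startNodeId": service, "endNodeId": operation, "type": "contains"},
    # An edge to a node outside the requested set; it must not be returned.
    {"startNodeId": service, "endNodeId": "k8s@k8s.pod:web-pod-123", "type": "runs_on"},
]

relations = direct_relations(edges, [service, operation])
print(relations)
```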

5.1 Practical Application Examples
5.1.1 Obtain the Complete Neighbor Relationships of a Service

-- Obtain all neighbors of a service within 2 hops.
.topo | graph-call getNeighborNodes(
  'full', 2,
  [(:"apm@apm.service" {__entity_id__: '0e73700c768a8e662165a8d4d46cd286'})]
)
| stats cnt=count(1) by relationType
| sort cnt desc

5.1.2 Upstream Impact Analysis for Failures
Identify upstream services that may affect the destination service:

.topo | graph-call getNeighborNodes(
  'sequence_in', 3,
  [(:"apm@apm.service" {__entity_id__: '0e73700c768a8e662165a8d4d46cd286'})]
)
| where relationType in ('calls', 'depends_on')
| extend impact_level = CASE
    WHEN srcPosition = '-1' THEN 'direct'
    WHEN srcPosition = '-2' THEN 'secondary'
    ELSE 'indirect' END
| extend parsed_service_id = json_extract_scalar(srcNode, '$.id')
| project 
    upstream_service = parsed_service_id,
    impact_level,
    relation_type = relationType
| stats cnt=count(1) by impact_level, relation_type

5.1.3 Downstream Impact Analysis for Failures
Identify downstream services affected by a failing service:

.topo | graph-call getNeighborNodes(
  'sequence_out', 3,
  [(:"apm@apm.service" {__entity_id__: 'failing-service-id'})]
)
| where relationType in ('calls', 'depends_on')
| extend affected_service = json_extract_scalar(destNode, '$.id')
| stats impact_count=count(1) by affected_service
| sort impact_count desc
| limit 20

5.1.4 Analysis of Cloud Resource Dependencies
Analyze the network dependencies of an ECS instance:

.topo | graph-call getNeighborNodes(
  'sequence_out', 2,
  [(:"acs@acs.ecs.instance" {__entity_id__: 'i-bp1234567890'})]
)
| extend relation_category = CASE
    WHEN relationType in ('belongs_to', 'runs_in') THEN 'infrastructure'
    WHEN relationType in ('depends_on', 'uses') THEN 'dependency'
    WHEN relationType in ('connects_to', 'accesses') THEN 'network'
    ELSE 'other' END
| stats cnt=count(1) by relation_category
| sort cnt desc
| limit 0, 100

5.1.5 Query Direct Relationships between Nodes in Batches

.topo | graph-call getDirectRelations(
  [
    (:"app@app.service" {__entity_id__: '347150ad7eaee43d2bd25d113f567569'}),
    (:"app@app.operation" {__entity_id__: '73ef19770998ff5d4c1bfd042bc00a0f'})
  ]
)

Example returned relationships:

{
  "startNodeId": "app@app.service:347150ad7eaee43d2bd25d113f567569",
  "endNodeId": "app@app.operation:73ef19770998ff5d4c1bfd042bc00a0f", 
  "type": "contains",
  "properties": {"__type__": "contains"}
}

The functional design of graph-call offers the advantage of clear query intent, enabling the system to optimize execution for specific patterns. However, this also means that it is only suitable for predefined query scenarios. For scenarios where you need to customize complex path patterns, Cypher remains the necessary choice. In practice, we recommend that you preferentially use the predefined functions of graph-call, and resort to the more flexible Cypher only when these predefined functions cannot meet your requirements.

6. Cypher: A Powerful Declarative Query Language
Cypher is the standard query language in the graph database domain. It is designed to combine the usability and declarative style of SQL with optimizations specifically tailored for graph structures. In EntityStore, Cypher provides the most powerful and flexible graph query capabilities, capable of handling a wide range of scenarios, from simple single-node queries to complex multi-hop traversals across large-scale networks.

The syntax of Cypher follows a three-part structure: MATCH, WHERE, and RETURN. This structure is similar to the SELECT, FROM, and WHERE clauses of SQL but logically better aligned with the thinking pattern of graph queries. The MATCH clause describes the graph pattern to search for. The WHERE clause adds filtering conditions. The RETURN clause specifies what results to return. This structured syntax makes complex graph queries easy to read and maintain.

The power of the MATCH clause lies in the graph pattern description it supports. You can define arbitrarily complex path patterns within MATCH, including multi-hop paths, optional paths, and path variables. The syntax for multi-hop paths is [*min..max], where the range is left-closed and right-open. For example, [*2..3] matches 2-hop paths only, and [*2..4] matches paths of 2 or 3 hops. This syntax design allows users to flexibly control traversal depth, striking a balance between precision and performance. The MATCH clause also supports combining multiple path patterns: You can define several patterns simultaneously, and the system will return all subgraphs that match any of the specified patterns.

The WHERE clause supports rich filtering conditions. You can apply various predicates on node and edge properties, including equality, substring matching, prefix/suffix checks, and range queries. The WHERE clause also supports logical operators (AND, OR, and NOT) and complex expressions. Compared with graph-match, the WHERE clause of Cypher is more flexible. It not only filters final results but also allows constraints on intermediate nodes along a path. This is especially useful for queries with complex path patterns.

The RETURN clause provides fine-grained control over output. The system can return node objects, edge objects, and path objects, and extract specific property fields. The RETURN clause also supports aggregate functions (such as count, sum, and avg) and grouping operations, which enables Cypher to perform not only graph traversal but also graph analytics. Combined with the powerful data processing capabilities of SPL, the integration of Cypher and SPL enables a complete end-to-end workflow, from data querying to analytical computation.

6.1 Basic Query Examples
6.1.1 Single-node Query

-- Query all nodes of a specific type.
.topo | graph-call cypher(`
    MATCH (n {__entity_type__:"apm.service"})
    WHERE n.__domain__ STARTS WITH 'a' AND n.__entity_type__ = "apm.service"
    RETURN n
`)

Advantages over graph-match:
● Complex filtering using the WHERE clause is supported.
● MATCH can contain only nodes without specifying relationships.
● More property queries (such as __entity_type__ and __domain__) are supported.

6.1.2 Relationship Query

-- Query call relationships between services.
.topo | graph-call cypher(`
    MATCH (src:``apm@apm.service``)-[e:calls]->(dest:``apm@apm.service``)
    WHERE src.cluster = 'production' AND dest.cluster = 'production'
    RETURN src.service, dest.service, e.__type__
`)

6.2 Multi-hop Queries
6.2.1 Basic Multi-hop Syntax

-- Find call chains of 2 to 3 hops.
.topo | graph-call cypher(`
    MATCH (src {__entity_type__:"acs.service"})-[e:calls*2..4]->(dest)
    WHERE dest.__domain__ = 'acs'
    RETURN src, dest, dest.__entity_type__
`)

Note:
● The multi-hop range is left-closed and right-open. For example, *2..4 indicates 2 hops or 3 hops.
● *1..3 indicates 1 hop or 2 hops, but not 3 hops.
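
The left-closed, right-open rule behaves exactly like Python's built-in range, which makes it easy to check which hop counts a given *min..max expression covers. The helper name `hop_counts` is hypothetical.

```python
def hop_counts(lower, upper):
    """Hop counts covered by *lower..upper under the left-closed, right-open rule."""
    if lower < 0 or upper <= lower:
        raise ValueError("expected 0 <= lower < upper")
    return list(range(lower, upper))


print(hop_counts(2, 4))  # the *2..4 example above
print(hop_counts(1, 3))  # the *1..3 example above
```

To include paths of exactly N hops, the upper bound therefore has to be N + 1.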

6.2.2 Reachability Analysis

-- Find reachable paths between services.
.topo | graph-call cypher(`
    MATCH (startNode:``apm@apm.service`` {service: 'gateway'})
          -[path:calls*1..4]->
          (endNode:``apm@apm.service`` {service: 'database'})
    RETURN startNode.service, length(path) as hop_count, endNode.service
`)

6.2.3 Impact Chain Analysis

-- Analyze fault propagation paths.
.topo | graph-call cypher(`
    MATCH (failed:``apm@apm.test_service`` {status: 'error'})
          -[impact:depends_on*1..3]->
          (affected)
    WHERE affected.__entity_type__ = 'apm.service'
    RETURN failed.service, 
           length(impact) as impact_distance,
           affected.service
    ORDER BY impact_distance ASC
`)

6.2.4 Node Aggregation Statistics

-- Count multi-hop call connections for each source service.
.topo | graph-call cypher(`
    MATCH (src {__entity_type__:"apm.service"})-[e:calls*2..3]->(dest)
    WHERE dest.__domain__ = 'apm'
    RETURN src, count(src) as connection_count
`)

Applicable scenarios:
● Connected component analysis: Identify connected subgraphs in a graph
● Centrality calculation: Identify key nodes in the network
● Cluster detection: Detect clusters of tightly interconnected nodes

6.2.5 Path Pattern Matching

-- Find specific topological patterns.
.topo | graph-call cypher(`
    MATCH (src:``acs@acs.vpc.vswitch``)-[e1]->(n1)<-[e2]-(n2)-[e3]->(n3)
    WHERE NOT (src = n2 AND e1.__type__ = e2.__type__) 
        AND n1.__entity_type__ <> n3.__entity_type__ 
        AND NOT (src)<-[e1:``calls``]-(n1)
    RETURN src, e1.__type__, n1, e2.__type__, n2, e3.__type__, n3
`)

Applicable scenarios:
● Security auditing: Detect abnormal network connection patterns
● Compliance check: Verify the compliance of the network architecture
● Pattern recognition: Identify specific system topology structures

A key feature of Cypher is its support for queries based on custom entity properties. In graph-match, intermediate nodes can be filtered only by labels. In contrast, with Cypher, users can query and filter nodes based on any custom property of an entity. This feature allows Cypher to handle more fine-grained query requirements, such as finding all instances with CPU usage greater than 80%, or finding all resources belonging to a particular user.

6.3 Custom Property Query Examples
Querying based on custom entity properties is a core highlight of the full-featured Cypher. In standard queries, although entity details can be retrieved by using USearch, filtering by entity property during graph traversal is limited. The full-featured Cypher enables property-level querying and allows you to directly reference custom properties of entities within MATCH and WHERE clauses. The system automatically fetches detailed entity information from EntityStore and applies filters based on these properties. This design allows graph queries to go beyond mere traversal of the topological structure and to filter on the actual properties of entities, which greatly improves query accuracy.

Multi-level path output is another key feature. In traditional graph queries, multi-hop queries usually return only the start and end points, and the intermediate path information may be lost. However, in scenarios such as troubleshooting and impact analysis, knowing the complete path is often more valuable than knowing just the start and end points. The full-featured Cypher supports returning path objects, which contain detailed information about all nodes and edges along the path. You can examine the complete link of data flows based on path objects. This capability is especially useful for analyzing fault propagation paths, tracing data flows, and understanding system architecture.

6.3.1 Querying Based on Custom Entity Properties

-- Use the custom properties of an entity to query data. (This is an example. The actual key-value properties are subject to the actual scenario.)
.topo | graph-call cypher(`
    MATCH (n:``acs@acs.alb.listener`` {listener_id: 'lsn-rxp57*****'})-[e]->(d)
    WHERE d.vSwitchId CONTAINS 'vsw-bp1gvyids******' 
        AND d.user_id IN ['1654*******', '2'] 
        AND d.dns_name ENDS WITH '.com'
    RETURN n, e, d
`)

6.3.2 Querying Based on Complex Property Conditions

-- Use the complex properties of an entity to query data. (This is an example. The actual key-value properties are subject to the actual scenario.)
.topo | graph-call cypher(`
    MATCH (instance:``acs@acs.ecs.instance``)
    WHERE instance.instance_type STARTS WITH 'ecs.c6'
        AND instance.cpu_cores >= 4
        AND instance.memory_gb >= 8
        AND instance.status = 'Running'
    RETURN 
        instance.instance_id,
        instance.instance_type,
        instance.cpu_cores,
        instance.memory_gb,
        instance.availability_zone
    ORDER BY instance.cpu_cores DESC, instance.memory_gb DESC
`)

6.4 Multi-level Path Output
6.4.1 Return Complete Path Information

-- Return the complete path information across multiple hops
.topo | graph-call cypher(`
    MATCH (n:``acs@acs.alb.listener``)-[e:``calls``*2..3]-()
    RETURN e
`)

Path result format:
● An array of all edges in the path is returned.
● Each edge contains the complete start and end nodes and property information.
● Path length and path weight calculation are supported.
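
Given such an array of edges, the ordered list of nodes along the path can be rebuilt client-side. The Python sketch below assumes that each edge carries startNodeId and endNodeId fields, as in the edge JSON shown in section 3.3, and that edges are listed in traversal order. Because the query above is undirected, an individual edge may point against the direction of travel, so the sketch follows whichever endpoint is not the current node. The node IDs are hypothetical.

```python
def node_sequence(edges):
    """Rebuild the ordered node list of a path from its edges, ignoring edge direction."""
    if not edges:
        return []
    first = edges[0]
    current = first["startNodeId"]
    if len(edges) > 1:
        second_ends = (edges[1]["startNodeId"], edges[1]["endNodeId"])
        # If only the first edge's start node touches the second edge,
        # the walk must have begun at the first edge's end node.
        if first["startNodeId"] in second_ends and first["endNodeId"] not in second_ends:
            current = first["endNodeId"]
    nodes = [current]
    for edge in edges:
        if edge["startNodeId"] == current:
            current = edge["endNodeId"]
        elif edge["endNodeId"] == current:
            current = edge["startNodeId"]
        else:
            raise ValueError("edges do not form a connected path")
        nodes.append(current)
    return nodes


path = [
    {"startNodeId": "acs@acs.alb.listener:lsn-1", "endNodeId": "apm@apm.service:svc-a", "type": "calls"},
    # This edge points against the direction of travel.
    {"startNodeId": "apm@apm.service:svc-b", "endNodeId": "apm@apm.service:svc-a", "type": "calls"},
]

print(node_sequence(path))
```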

6.5 Fine-grained Link Control for Connectivity Search
6.5.1 Connection Analysis Across Network Layers

-- Find connection paths from ECS instances to load balancers
.topo | graph-call cypher(`
    MATCH (start_node:``acs@acs.ecs.instance``)
          -[e*2..3]-
          (mid_node {listener_name: 'entity-test-listener-zuozhi'})
          -[e2*1..2]-
          (end_node:``acs@acs.alb.loadbalancer``)
    WHERE start_node.__entity_id__ <> mid_node.__entity_id__ 
        AND start_node.__entity_type__ <> mid_node.__entity_type__
    RETURN 
        start_node.instance_name, 
        e, 
        mid_node.__entity_type__, 
        e2, 
        end_node.instance_name
`)

6.5.2 Service Mesh Connection Analysis

-- Analyze traffic paths in a microservices mesh
.topo | graph-call cypher(`
    MATCH (client:``apm@apm.service``)
          -[request:calls]->
          (gateway:``apm@apm.gateway``)
          -[route:routes_to]->
          (service:``apm@apm.service``)
          -[backend:calls]->
          (database:``middleware@database``)
    WHERE client.environment='production'
        AND request.protocol='HTTP'
        AND route.load_balancer_type='round_robin'
    RETURN 
        client.service,
        gateway.gateway_name,
        service.service,
        database.database_name,
        request.request_count,
        backend.connection_pool_size
`)

6.5.3 Cascading Failure Analysis

-- Analyze the cascading impact of service failures
.topo | graph-call cypher(`
    MATCH (failed_service:``apm@apm.service`` {service: 'load-generator'})
    MATCH (failed_service)-[cascade_path*1..4]->(affected_service:``apm@apm.service``)
    RETURN 
        failed_service.service as root_cause,
        length(cascade_path) as impact_depth,
        affected_service.service as affected_service,
        cascade_path as dependency_chain
    ORDER BY impact_depth ASC
`)
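
To make the idea of impact depth concrete, here is a small Python sketch that runs a breadth-first search over an in-memory list of directed call edges. It is an illustration only, not how EntityStore evaluates the query: the impact_depths function, the max_depth parameter, and the service names other than load-generator are hypothetical, and the sketch reports only the shortest distance to each affected service, whereas the Cypher query returns every matching path.

```python
from collections import deque


def impact_depths(edges, root, max_depth):
    """Return {service: shortest hop distance from root}, up to max_depth hops."""
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)

    depths = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for nxt in downstream.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                depths[nxt] = depth + 1
                queue.append((nxt, depth + 1))
    return depths


# Hypothetical call graph: (caller, callee) pairs.
calls = [
    ("load-generator", "frontend"),
    ("frontend", "cart"),
    ("frontend", "checkout"),
    ("checkout", "payment"),
    ("payment", "ledger"),
]
affected = impact_depths(calls, "load-generator", max_depth=3)
for service, depth in sorted(affected.items(), key=lambda kv: kv[1]):
    print(depth, service)
```

With a limit of three hops, ledger (four hops away) is not reported, which mirrors how an upper bound on the hop range trims the blast radius that the query examines.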

7.Typical Application Scenarios
Graph queries are widely used in day-to-day O&M and analysis. The following sections present several typical patterns that show how to apply graph query capabilities in practice.

7.1 Analyze Service Call Chains

-- Analyze the call patterns of a specific service
.topo |
  graph-match (s:"apm@apm.service" {__entity_id__: 'abcdefg123123'})
              -[e:calls]-(d:"apm@apm.service")
  project 
    source_service="s.service",
    target_service="d.service", 
    call_type="e.__type__"
| stats call_count=count(1) by source_service, target_service
| sort call_count desc
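
The aggregation step at the end of this query (stats ... by, then sort ... desc) is a plain group-and-count. A minimal Python equivalent, assuming the projected rows have already been fetched as (source_service, target_service) tuples, looks like this; the row values are made up for illustration.

```python
from collections import Counter

# Hypothetical projected rows: (source_service, target_service).
rows = [
    ("web", "cart"),
    ("web", "cart"),
    ("web", "auth"),
    ("cart", "inventory"),
    ("web", "cart"),
]

# Equivalent of: stats call_count=count(1) by source_service, target_service
call_counts = Counter(rows)

# Equivalent of: sort call_count desc
ranked = call_counts.most_common()
for (source, target), count in ranked:
    print(source, target, count)
```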

7.2 Permission Chain Tracing
In complex systems, understanding how user permissions propagate to resources is crucial for security auditing and compliance checks:

-- Trace access paths from users to resources
.topo |
  graph-match (user:"identity@user" {__entity_id__: 'user-123'})
              -[auth:authenticated_to]->(app:"apm@apm.service")
              -[access:accesses]->(resource:"acs@acs.rds.instance")
  project 
    user_id="user.user_id",
    app_name="app.service",
    resource_id="resource.instance_id",
    auth_method="auth.auth_method",
    access_level="access.permission_level"

7.3 Data Integrity Check
7.3.1 Check Data Integrity

.topo | graph-call cypher(`
    MATCH (n)-[e]->(m)
    RETURN 
        count(DISTINCT n) as unique_nodes,
        count(DISTINCT e) as unique_edges,
        count(DISTINCT e.__type__) as edge_types
`)

7.3.2 Identify Dangling Relationships

-- Find relationships pointing to non-existent entities
.let topoData = .topo | graph-call cypher(`
        MATCH ()-[e]->()
        RETURN e
    `)
    | extend startNodeId = json_extract_scalar(e, '$.startNodeId'), endNodeId = json_extract_scalar(e, '$.endNodeId'), relationType = json_extract_scalar(e, '$.type')
    | project startNodeId, endNodeId, relationType;
--$topoData
.let entityData = .entity with(domain='*', type='*') 
| project __entity_id__, __entity_type__, __domain__
| extend matchedId = concat(__domain__, '@', __entity_type__, ':', __entity_id__)
| join -kind='left' $topoData on matchedId = $topoData.endNodeId
| project matchedId, startNodeId, endNodeId, relationType
| extend status = COALESCE(startNodeId, 'Dangling')
| where status = 'Dangling';
$entityData
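
The underlying check can also be expressed as a set-membership test. The Python sketch below assumes the edges and entities have already been exported, that node IDs follow the domain@type:id format built by the concat call above, and that an edge is dangling when either endpoint is missing from the entity set. The find_dangling helper and all sample values are hypothetical.

```python
def find_dangling(edges, entities):
    """Return (edge, missing_ids) pairs for edges whose endpoints are unknown.

    edges:    dicts with startNodeId, endNodeId, and type keys
    entities: (domain, entity_type, entity_id) tuples
    """
    known = {f"{domain}@{etype}:{eid}" for domain, etype, eid in entities}
    dangling = []
    for edge in edges:
        missing = [
            node_id
            for node_id in (edge["startNodeId"], edge["endNodeId"])
            if node_id not in known
        ]
        if missing:
            dangling.append((edge, missing))
    return dangling


# Hypothetical sample data.
entities = [
    ("apm", "apm.service", "svc-a"),
    ("apm", "apm.service", "svc-b"),
]
edges = [
    {"startNodeId": "apm@apm.service:svc-a",
     "endNodeId": "apm@apm.service:svc-b", "type": "calls"},
    {"startNodeId": "apm@apm.service:svc-b",
     "endNodeId": "apm@apm.service:svc-gone", "type": "calls"},
]
report = find_dangling(edges, entities)
for edge, missing in report:
    print(edge["type"], "->", missing)
```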

8.Data Integrity and Query Mode Selection
When you use graph queries, data integrity is an issue that requires special attention. The graph query capability of EntityStore relies on three types of data: UModel (data model definitions), Entity (entity data), and Topo (topology relationship data). The data integrity of these three components directly affects query capabilities and results.

8.1 Analysis of Data Missing Scenarios

8.2 Pure-topo Mode
Full-featured Cypher requires complete data across UModel, Entity, and Topo. If entity data is incomplete, topological queries still run, but filtering on custom properties does not work. To address this, the system provides a pure-topo mode:

-- Standard mode (complete data required)
.topo | graph-call cypher(`
    MATCH (n:``acs@acs.alb.listener`` {ListenerId: 'lsn-123'})-[e]->(d)
    WHERE d.vSwitchId CONTAINS 'vsw-456'
    RETURN n, e, d
`)

-- pure-topo mode (relies only on relationship data)
.topo | graph-call cypher(`
    MATCH (n:``acs@acs.alb.listener``)-[e]->(d)
    RETURN n, e, d
`, 'pure-topo')

Characteristics of pure-topo mode:
● Advantages: Queries run faster because they do not depend on entity data.
● Limitations: Custom properties of entities cannot be used for filtering.
● Applicable scenarios: This mode is applicable to scenarios such as topology analysis and relationship verification.

8.3 Policy for Selecting Query Modes
When all three aspects of data are complete, you can use all the features of the full-featured Cypher, including queries based on custom properties and multi-level path output. If entity data is incomplete but Topo data is complete, the pure-topo mode can be used for querying. This mode offers faster query performance but supports only topology-based queries, and filtering by entity property is not available. If Topo data is incomplete but entity data is complete, graph queries cannot be performed. This is because graph queries depend on relationships. Without relationship data, graphs cannot be formed.

In practice, select a query method based on data integrity. If the data is complete, prefer full-featured Cypher for the convenience of property-level queries. If performance is the primary concern and only topology information is required, use the pure-topo mode. Before executing complex queries, we recommend that you first run simple test queries to check data integrity.
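
The selection policy above can be summarized as a small decision function. This is a sketch of the stated rules rather than an EntityStore API: the choose_query_mode name and its boolean flags are hypothetical, and falling back to pure-topo when only UModel is incomplete is an assumption inferred from the statement that pure-topo relies only on relationship data.

```python
from typing import Optional


def choose_query_mode(umodel_ok: bool, entity_ok: bool, topo_ok: bool) -> Optional[str]:
    """Pick a graph query mode from data completeness flags.

    Returns "full" for full-featured Cypher, "pure-topo" for the
    topology-only mode, or None when graph queries are not possible.
    """
    if not topo_ok:
        # Without relationship data, no graph can be formed.
        return None
    if umodel_ok and entity_ok:
        return "full"
    # Assumption: any gap outside Topo falls back to pure-topo.
    return "pure-topo"


print(choose_query_mode(True, True, True))
print(choose_query_mode(True, False, True))
print(choose_query_mode(True, True, False))
```

A caller could populate the flags from the simple test queries recommended above, then pass 'pure-topo' as the second argument to graph-call cypher only when the function returns that mode.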

9.Performance Optimization and Best Practices
Graph queries are powerful, but performance can become a bottleneck when dealing with large volumes of data. You can adopt proper methods and optimization policies to significantly improve query performance and ensure that the system remains stable and responsive even under heavy loads.

9.1 Query Structure Optimization
9.1.1 Proper Use of Indexes

-- ❌ Before optimization: full table scan
.topo | graph-call cypher(`
    MATCH (n) WHERE n.service = 'web-app'
    RETURN n
`)

-- ✅ After optimization: using label-based indexing
.topo | graph-call cypher(`
    MATCH (n:``apm@apm.service`` {service: 'web-app'})
    RETURN n
`)

9.1.2 Early Filtering with Conditions

-- ❌ Before optimization: late filtering
.topo | graph-call cypher(`
    MATCH (start)-[*1..5]->(endNode)
    WHERE start.environment = 'production' AND endNode.status = 'active'
    RETURN start, endNode
`)

-- ✅ After optimization: early filtering
.topo | graph-call cypher(`
    MATCH (start {environment: 'production'})-[*1..5]->(endNode {status: 'active'})
    RETURN start, endNode
`)

9.2 Query Scope Control
Precise control over the query scope is the most critical optimization policy:
● Time range optimization: Use time fields to limit the query range.
● Traversal depth limitation: Performance degrades significantly when the traversal depth exceeds five layers.
● Exact starting point: Match on a specific __entity_id__ instead of using a fuzzy match.
● Traversal type selection: Select between sequence or full traversal based on actual requirements.
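
A back-of-the-envelope calculation shows why traversal depth dominates cost. Assuming every node has the same fan-out (a simplification, since real topologies are uneven), the number of paths a traversal can touch grows geometrically with the hop limit. The paths_explored function and the fan-out of 10 are illustrative assumptions, not measurements of EntityStore.

```python
def paths_explored(fan_out: int, max_hops: int) -> int:
    """Upper bound on paths of length 1..max_hops from one start node,
    assuming a uniform fan-out at every node."""
    return sum(fan_out ** hops for hops in range(1, max_hops + 1))


for max_hops in (1, 3, 5, 6):
    print(max_hops, paths_explored(10, max_hops))
# With fan_out=10: 3 hops -> 1110 paths, 5 hops -> 111110, 6 hops -> 1111110
```

Each extra hop multiplies the work by roughly the fan-out, which is why pinning the start node and tightening the hop range usually pays off more than any downstream filtering.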

9.3 Result Set Control
9.3.1 Paging and Limits

-- Use LIMIT to control the number of results
.topo | graph-call cypher(`
    MATCH (service:``apm@apm.service``)-[calls:calls]->(target)
    WHERE calls.request_count > 1000
    RETURN service.service, target.service, calls.request_count
    ORDER BY calls.request_count DESC
    LIMIT 50
`)

9.3.2 Result Sampling

-- Sample large result sets
.topo | graph-call cypher(`
    MATCH (n:``apm@apm.service``)
    RETURN n.service
    LIMIT 100
`)
| extend seed = random()
| where seed < 0.1

9.4 Multi-hop Traversal Optimization
9.4.1 Control Hop Depth

-- Avoid excessively deep traversals
.topo | graph-call cypher(`
    MATCH (start)-[path*1..3]->(endNode)
    WHERE length(path) <= 2
    RETURN path
`)

9.4.2 Use Directional Optimization

-- Reduce search space by using an explicit relationship direction (->)
.topo | graph-call cypher(`
    MATCH (start)-[calls:calls*1..3]->(endNode)
    WHERE start.__entity_type__ = 'apm.service'
    RETURN start, endNode
`)

9.5 Best Practice Recommendations
● SPL filtering: Use SPL to filter out unwanted results immediately after the graph query.
● Batch processing: For large-scale graph queries, batch processing can be used.
● Result caching: For frequently queried paths, the results can be cached.
● Query splitting: You can break down complex queries into multiple simple ones, and then combine the results by using SPL.

10.FAQ
10.1 Edge Type Coincides with a Cypher Keyword

.topo | graph-call cypher(`
    MATCH (s)-[e:``contains``]->(d)
    WHERE s.__domain__ CONTAINS "apm"
    RETURN e
`)

Here, contains is both a Cypher keyword and an edge type. When it is used as an edge label, Cypher syntax requires it to be escaped with backticks (`). Furthermore, because the Cypher query is embedded in an SPL context, where backticks themselves need to be escaped, a single backtick in Cypher is written as a double backtick in SPL. The edge type contains must therefore be enclosed in double backticks to be parsed correctly.
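
When queries are generated programmatically, the escaping rule above can be wrapped in a tiny helper. The sketch below is an assumption about how one might build the query string in client code; spl_cypher_label is a hypothetical name, and the helper rejects names that already contain a backtick rather than guessing how to escape them.

```python
def spl_cypher_label(name: str) -> str:
    """Wrap a label or edge type in double backticks for use inside
    graph-call cypher(`...`), following the escaping rule described above."""
    if "`" in name:
        raise ValueError("label must not contain backticks")
    return f"``{name}``"


edge_type = spl_cypher_label("contains")
cypher = f"MATCH (s)-[e:{edge_type}]->(d) RETURN e"
spl = f".topo | graph-call cypher(`{cypher}`)"
print(spl)
```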

10.2 Multi-hop Syntax Description
-- Find call chains with 2 to 3 hops

.topo | graph-call cypher(`
    MATCH (src {__entity_type__:"acs.service"})-[e:calls*2..4]->(dest)
    WHERE dest.__domain__ = 'acs'
    RETURN src, dest, dest.__entity_type__
`)

Note:
● The multi-hop range is left-closed and right-open. For example, *2..4 indicates 2 hops or 3 hops.
● *1..3 indicates 1 hop or 2 hops, but not 3 hops.

Verify this conclusion:

.topo | graph-call cypher(`
    MATCH (s)-[e*1..3]->(d)
    RETURN length(e) as len
`, 'pure-topo')
| stats cnt=count(1) by len
| project len, cnt
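
The left-closed, right-open rule behaves like Python's built-in range. The sketch below parses a hop specification of the form *low..high and lists the hop counts it matches under that rule; hop_counts is a hypothetical helper for illustration, not part of any EntityStore client.

```python
import re
from typing import List


def hop_counts(spec: str) -> List[int]:
    """List the hop counts matched by a '*low..high' spec, treating the
    range as left-closed and right-open (low <= hops < high)."""
    match = re.fullmatch(r"\*(\d+)\.\.(\d+)", spec)
    if match is None:
        raise ValueError(f"unsupported hop spec: {spec!r}")
    low, high = int(match.group(1)), int(match.group(2))
    return list(range(low, high))


print(hop_counts("*2..4"))  # [2, 3]
print(hop_counts("*1..3"))  # [1, 2]
```

To include exactly N hops as the maximum, the upper bound therefore needs to be written as N+1.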

10.3 Cypher Relationship Abbreviation Not Supported
✅ Supported syntax:

.topo | graph-call cypher(`
    MATCH (s)-[]->(d)
    RETURN s
`, 'pure-topo')

❌ Unsupported syntax:

.topo | graph-call cypher(`
    MATCH (s)-->(d)
    RETURN s
`, 'pure-topo')
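
If existing Cypher snippets use the abbreviated arrow, they can be rewritten into the supported form before submission. The following Python sketch handles only the --> abbreviation between two node patterns, which is the case shown above; expand_arrow_abbreviation is a hypothetical helper, and other abbreviated forms are deliberately left untouched.

```python
import re


def expand_arrow_abbreviation(cypher: str) -> str:
    """Rewrite '(a)-->(b)' as '(a)-[]->(b)'; leave everything else as is."""
    return re.sub(r"\)\s*-->\s*\(", ")-[]->(", cypher)


original = "MATCH (s)-->(d)\nRETURN s"
rewritten = expand_arrow_abbreviation(original)
print(rewritten)
```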
