DEV Community

ObservabilityGuy

Building a Navigation Map for Data Assets: Data Discovery and End-to-End Analysis in UModel

This article introduces UModel’s data discovery and end-to-end analysis capabilities, enabling unified metadata exploration, relationship mapping, and...
1.Background Information
Imagine you are standing in a vast library filled with tens of thousands of books, where the catalog of each book is scattered across different rooms and each room uses its own unique indexing system. If you want to find a book about service calls, you have to go back and forth between the APM room, Kubernetes room, and cloud resource room, and remember different search rules for each room.

This is the real dilemma faced by many enterprises in the field of observability. UModel acts like an intelligent management system built for this chaotic library, which allows you to easily explore and understand the structure of the entire knowledge graph.

1.1 What is UModel?
UModel is a graph-based modeling method for observability data, designed to address core challenges in collecting, organizing, and using that data in enterprise-level environments. UModel describes the IT world as a graph of nodes and links, and uses standardized data modeling to provide unified representation, storage decoupling, and intelligent analysis of observability data.

As the foundational data modeling framework for Alibaba Cloud's observability system, UModel gives enterprises a common language for interacting with observability data, one that humans, programs, and AI can all use to understand and analyze that data, thereby building true full-stack observability capabilities.

Core Concepts
UModel employs fundamental graph theory concepts and uses nodes and links to form a directed graph for modeling IT systems:

● Node: The core node type is a Set (data collection), which represents a collection of homogeneous entities or data, such as an EntitySet (entity set), MetricSet (metric set), or LogSet (log set). Nodes also include the Storage type that backs a Set, such as Simple Log Service (SLS), Prometheus, or MySQL.

● Link: A relationship between nodes, such as an EntitySetLink (entity association), DataLink (data association), or StorageLink (storage association).

● Field: Defines constraints and properties for Sets and Links. It encompasses over 20 configuration items, including names, types, constraint rules, and analysis features.
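
To make the node-and-link idea concrete, here is a minimal Python sketch of a directed graph built from Sets and Links. It is an illustration only, not UModel's actual data structures: the class names, the kind strings "metric_set" and "data_link", and the MetricSet name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A Set in the graph (hypothetical shape, for illustration only)."""
    kind: str     # e.g. "entity_set"; "metric_set" is an assumed kind string
    domain: str
    name: str


@dataclass(frozen=True)
class Link:
    """A directed relationship from src to dest (hypothetical shape)."""
    kind: str     # e.g. "entity_set_link"; "data_link" is an assumed kind string
    src: Node
    dest: Node


service = Node("entity_set", "apm", "apm.service")
metrics = Node("metric_set", "apm", "apm.metric.service")  # hypothetical name
links = [Link("data_link", service, metrics)]


def outgoing(node, all_links):
    """Return the nodes that the given node points to."""
    return [link.dest for link in all_links if link.src == node]


print(outgoing(service, links))
```

Keeping the direction on the link itself (src and dest) is what later lets a query distinguish incoming from outgoing relationships.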

1.2 What is a UModel Query?
A UModel query is a dedicated interface in EntityStore for querying knowledge graph metadata. Using the .umodel query syntax, it enables exploration of EntitySet definitions, EntitySetLink relationships, and the complete knowledge graph structure. This provides robust support for data modeling analysis and schema management.

Query Differentiation
The following table describes the differences between UModel queries and queries of other types.


A UModel query operates at the metadata layer: it helps users understand the structure and definitions of data models rather than specific runtime data.

2.UModel Query
2.1 Data Model
Data Structure
The data returned by a UModel query has a fixed five-field structure. The queries in this article reference the fields __type__, kind, metadata, schema, and spec.


Note: metadata, schema, and spec are JSON-formatted strings. Use the json_extract_scalar function to extract values.
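
The note above means that a nested value has to be parsed out of a string before it can be compared. The Python sketch below imitates that step on a made-up record. The record contents are hypothetical, and extract_scalar only loosely mirrors json_extract_scalar (it returns the parsed value rather than a string).

```python
import json

# Hypothetical record using the field names that appear in this article's queries.
record = {
    "__type__": "node",
    "kind": "entity_set",
    "metadata": '{"name": "apm.service", "domain": "apm"}',
    "schema": '{"version": "v0.1.0"}',               # made-up value
    "spec": '{"fields": [{"name": "service_id"}]}',  # made-up value
}


def extract_scalar(json_text, *path):
    """Walk a key path in a JSON string; return None for missing or non-scalar values."""
    value = json.loads(json_text)
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    if isinstance(value, (dict, list)):
        return None  # objects and arrays are not scalars
    return value


print(extract_scalar(record["metadata"], "name"))
print(extract_scalar(record["schema"], "version"))
```

The top-level fields such as kind can be compared directly, which is why the filters later in this article use kind = 'entity_set' without any JSON function.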

2.2 Query Syntax
Basic Query Syntax

-- Basic query format
.umodel | [SPL operations...]
-- Query with constraints
.umodel | where <condition> | limit <count>

Core Query Patterns

1.List Queries - metadata enumeration
Query all UModel data:
-- Query all UModel data (not recommended for production environments):
.umodel
-- Paginated query
.umodel | limit 0, 10

Filter by type:

-- Query all EntitySet definitions
.umodel | where kind = 'entity_set' | limit 0, 10
-- Query all EntitySetLink definitions
.umodel | where kind = 'entity_set_link' | limit 0, 10
-- Query all link types (relationship definitions)
.umodel | where __type__ = 'link' | limit 0, 10
-- Query all node types (entity definitions)
.umodel | where __type__ = 'node' | limit 0, 10

Filter by property:

-- Query the definition of an entity with a specific name
.umodel | where json_extract_scalar(metadata, '$.name') = 'acs.ecs.instance' | limit 0, 10
-- Query all definitions in a specific domain
.umodel | where json_extract_scalar(metadata, '$.domain') = 'apm' | limit 0, 10
-- Query definitions across multiple domains
.umodel | where json_extract_scalar(metadata, '$.domain') in ('acs', 'apm', 'k8s') | limit 0, 10

2.Graph Analysis - relationship exploration
UModel supports metadata-driven graph computations for analyzing relationships between EntitySets:

Basic graph query syntax:

.umodel | graph-match <path> project <output>

Concepts:

In graph queries, two fundamental graph concepts are critical:

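Node type (label): represented as domain@kind in UModel metadata graph queries. Example: apm@entity_set.
Node ID: represented as __entity_id__ in UModel metadata graph queries, formatted as kind::domain::name. Example: entity_set::apm::apm.service.
Path queries in graphs use ASCII characters to represent the direction of relationships. In the examples in this article, --> denotes an outgoing relationship, <-- denotes an incoming relationship, and -[e]- matches a relationship in either direction while binding it to the variable e.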
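
The two identifiers are easy to confuse, so here is a small Python sketch that builds and parses them. It assumes, based on the examples apm@entity_set and entity_set::apm::apm.service, that the label is the domain followed by @ and the kind, and that the ID joins kind, domain, and name with ::. The function names are hypothetical helpers, not part of any UModel API.

```python
def node_label(domain, kind):
    """Build a node type label such as 'apm@entity_set'."""
    return f"{domain}@{kind}"


def entity_id(kind, domain, name):
    """Build a node ID such as 'entity_set::apm::apm.service'."""
    return "::".join((kind, domain, name))


def parse_entity_id(value):
    """Split a node ID back into (kind, domain, name)."""
    kind, domain, name = value.split("::", 2)
    return kind, domain, name


label = node_label("apm", "entity_set")
node_id = entity_id("entity_set", "apm", "apm.service")
print(label, node_id, parse_entity_id(node_id))
```

Note that the two formats put the parts in a different order: the label starts with the domain, while the ID starts with the kind.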


Query EntitySet relationships:

-- Query all relationships for a specific EntitySet
.umodel
| graph-match (s:"acs@entity_set" {__entity_id__: 'entity_set::acs::acs.ecs.instance'})
              -[e]-(d)
  project s, e, d | limit 0, 10

Directional relationship queries:

-- Incoming relationships (pointing to an EntitySet):
.umodel
| graph-match (s:"acs@entity_set" {__entity_id__: 'entity_set::acs::acs.ecs.instance'})
              <--(d)
  project s, d | limit 0, 10
-- Outgoing relationships (originating from an EntitySet):
.umodel
| graph-match (s:"acs@entity_set" {__entity_id__: 'entity_set::acs::acs.ack.cluster'})
              -->(d)
  project s, d | limit 0, 10
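
To show what the direction of a match means, the following Python sketch applies the same idea to a tiny edge list. The edges and the k8s.node name are made up for illustration, and the functions only imitate the semantics of <--, -->, and an undirected match; they are not how graph-match is implemented.

```python
# Each edge is (source node ID, destination node ID). Hypothetical sample data.
EDGES = [
    ("entity_set::k8s::k8s.node", "entity_set::acs::acs.ecs.instance"),
    ("entity_set::acs::acs.ack.cluster", "entity_set::k8s::k8s.node"),
]


def incoming(node, edges):
    """Nodes that point to the given node, like (s)<--(d)."""
    return sorted(src for src, dest in edges if dest == node)


def outgoing(node, edges):
    """Nodes that the given node points to, like (s)-->(d)."""
    return sorted(dest for src, dest in edges if src == node)


def neighbors(node, edges):
    """Nodes connected in either direction, like (s)-[e]-(d)."""
    return sorted(set(incoming(node, edges)) | set(outgoing(node, edges)))


print(incoming("entity_set::acs::acs.ecs.instance", EDGES))
print(outgoing("entity_set::acs::acs.ack.cluster", EDGES))
print(neighbors("entity_set::k8s::k8s.node", EDGES))
```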

2.3 Advanced Queries
JSON path extraction
Since UModel data is stored in JSON format, JSON functions are required for field extraction:

-- Extract basic information
.umodel
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    entity_domain = json_extract_scalar(metadata, '$.domain'),
    entity_description = json_extract_scalar(metadata, '$.description.zh_cn')
| project entity_name, entity_domain, entity_description | limit 0, 100

Composite filtering with multiple conditions

-- Query with complex conditions
.umodel
| where kind = 'entity_set'
  and json_extract_scalar(metadata, '$.domain') in ('apm', 'k8s')
  and json_array_length(json_extract(spec, '$.fields')) > 5
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    field_count = json_array_length(json_extract(spec, '$.fields'))
| sort field_count desc
| limit 20

Aggregate analysis

-- Count the number of EntitySets by domain
.umodel
| where kind = 'entity_set'
| extend domain = json_extract_scalar(metadata, '$.domain')
| stats entity_count = count() by domain
| sort entity_count desc

2.4 Performance Optimization Recommendations
Use Precise Filters

-- Before optimization: broad scope
.umodel | where json_extract_scalar(metadata, '$.name') like '%service%'
-- After optimization: precise matching
.umodel | where kind = 'entity_set'
  and json_extract_scalar(metadata, '$.domain') = 'apm'
  and json_extract_scalar(metadata, '$.name') = 'apm.service'

Pre-filtering

-- Before optimization: late filtering
.umodel
| extend name = json_extract_scalar(metadata, '$.name')
| where name = 'apm.service'
-- After optimization: pre-filtering
.umodel
| where json_extract_scalar(metadata, '$.name') = 'apm.service'
| extend name = json_extract_scalar(metadata, '$.name')

Graph Query Optimization

-- Before optimization: full graph search
.umodel | graph-match (s)-[e]-(d) project s, e, d
-- After optimization: specifying the start point
.umodel
| graph-match (s:"apm@entity_set" {__entity_id__: 'entity_set::apm::apm.service'})
              -[e]-(d)
  project s, e, d

3.Application Scenarios of UModel Queries
UModel queries can address a wide range of practical challenges and provide robust support for data modeling, schema management, and knowledge graph analysis.

3.1 Schema Exploration and Discovery
Scenario Description
In large-scale observability systems, hundreds of EntitySet definitions may be distributed across multiple domains. Users need to quickly identify what entity types are defined in the system and understand their basic information.

Application Examples
Explore all entity types:

-- List all EntitySets with their basic information
.umodel
| where kind = 'entity_set'
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    entity_domain = json_extract_scalar(metadata, '$.domain'),
    description = json_extract_scalar(metadata, '$.description.zh_cn')
| project entity_name, entity_domain, description
| sort entity_domain, entity_name
| limit 0, 100

View by domain:

-- View all entity definitions within a specific domain, such as APM
.umodel
| where kind = 'entity_set'
  and json_extract_scalar(metadata, '$.domain') = 'apm'
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    description = json_extract_scalar(metadata, '$.short_description.zh_cn')
| project entity_name, description
| limit 0, 50

3.2 Data Modeling and Analysis
Scenario Description
During data modeling optimization, you need to analyze information about existing EntitySets, including field complexity, primary key design, and index configuration, to identify the models that require optimization.

Application Examples
Analyze field complexity:

-- Analyze the distribution of field counts across EntitySets by domain
.umodel
| where kind = 'entity_set'
| extend
    domain = json_extract_scalar(metadata, '$.domain'),
    entity_name = json_extract_scalar(metadata, '$.name'),
    field_count = json_array_length(json_extract(spec, '$.fields'))
| stats
    avg_fields = avg(field_count),
    max_fields = max(field_count),
    min_fields = min(field_count),
    entity_count = count()
  by domain
| sort entity_count desc
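
For readers who want to see what the aggregation computes, here is a rough Python equivalent run against a few made-up records. The sample data and the k8s.pod name are hypothetical, and a missing fields array is counted as zero here, which is an assumption of this sketch rather than documented query behavior.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical sample records; only the fields used below are included.
RECORDS = [
    {"kind": "entity_set",
     "metadata": '{"name": "apm.service", "domain": "apm"}',
     "spec": '{"fields": [{"name": "a"}, {"name": "b"}, {"name": "c"}]}'},
    {"kind": "entity_set",
     "metadata": '{"name": "apm.instance", "domain": "apm"}',
     "spec": '{"fields": [{"name": "a"}]}'},
    {"kind": "entity_set",
     "metadata": '{"name": "k8s.pod", "domain": "k8s"}',
     "spec": '{"fields": [{"name": "a"}, {"name": "b"}]}'},
    {"kind": "entity_set_link",
     "metadata": '{"name": "ignored", "domain": "apm"}',
     "spec": "{}"},
]


def field_stats_by_domain(records):
    """Group EntitySets by domain and summarize their field counts."""
    counts = defaultdict(list)
    for record in records:
        if record["kind"] != "entity_set":
            continue  # same role as: where kind = 'entity_set'
        domain = json.loads(record["metadata"]).get("domain")
        fields = json.loads(record["spec"]).get("fields") or []
        counts[domain].append(len(fields))
    return {
        domain: {
            "avg_fields": mean(values),
            "max_fields": max(values),
            "min_fields": min(values),
            "entity_count": len(values),
        }
        for domain, values in counts.items()
    }


print(field_stats_by_domain(RECORDS))
```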

Identify complex entities:

-- Find EntitySets with the highest number of fields (potential candidates for optimization)
.umodel
| where kind = 'entity_set'
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    domain = json_extract_scalar(metadata, '$.domain'),
    field_count = json_array_length(json_extract(spec, '$.fields'))
| sort field_count desc
| limit 20

3.3 Relationship Graph Analysis
Scenario Description
Relationships between EntitySets are fundamental to building a complete knowledge graph. Graph queries enable the analysis of associations among entities, helping to uncover dependencies and connections within the data model.

Application Examples
Query all relationships of an entity:

-- Query all relationships of a specific EntitySet, such as apm.service
.umodel
| graph-match (s:"apm@entity_set" {__entity_id__: 'entity_set::apm::apm.service'})
              -[e]-(d)
  project s, e, d
| limit 0, 50

Analyze relationship type distribution:

-- Count the occurrences of each relationship type
.umodel
| where kind = 'entity_set_link'
| extend
    link_name = json_extract_scalar(metadata, '$.name'),
    link_type = json_extract_scalar(metadata, '$.link_type')
| stats link_count = count() by link_type
| sort link_count desc

Find specific relationships:

-- Find all relationship definitions of the runs_on type
.umodel
| where kind = 'entity_set_link'
  and json_extract_scalar(metadata, '$.link_type') = 'runs_on'
| extend
    link_name = json_extract_scalar(metadata, '$.name'),
    source = json_extract_scalar(metadata, '$.source'),
    target = json_extract_scalar(metadata, '$.target')
| project link_name, source, target

3.4 Metadata Quality Check
Scenario Description
Ensure the integrity and consistency of UModel metadata by identifying issues such as missing descriptions and undefined fields.

Application Examples
Check EntitySets with missing descriptions:

-- Find EntitySets without descriptions in Chinese
.umodel
| where kind = 'entity_set'
  and (json_extract_scalar(metadata, '$.description.zh_cn') = ''
       or json_extract_scalar(metadata, '$.description.zh_cn') is null)
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    domain = json_extract_scalar(metadata, '$.domain')
| project entity_name, domain

Verify the integrity of field definitions:

-- Identify EntitySets with no fields defined
.umodel
| where kind = 'entity_set'
  and (json_extract(spec, '$.fields') is null
       or json_array_length(json_extract(spec, '$.fields')) = 0)
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    domain = json_extract_scalar(metadata, '$.domain')
| project entity_name, domain
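
The two checks above can be combined into one pass if you export the metadata and inspect it offline. The Python sketch below is one way to do that under stated assumptions: the record shape follows the JSON paths used in the queries ($.description.zh_cn and $.fields), while the sample records and the issue labels are made up.

```python
import json


def quality_issues(record):
    """Return a list of issue labels for one EntitySet record."""
    issues = []
    metadata = json.loads(record["metadata"])
    spec = json.loads(record["spec"])

    # Missing or empty Chinese description, as in the first query above.
    description = metadata.get("description")
    zh_cn = description.get("zh_cn") if isinstance(description, dict) else None
    if not zh_cn:
        issues.append("missing_description")

    # Missing or empty field list, as in the second query above.
    if not spec.get("fields"):
        issues.append("no_fields")
    return issues


# Hypothetical sample records.
complete = {
    "metadata": '{"name": "apm.service", "description": {"zh_cn": "service"}}',
    "spec": '{"fields": [{"name": "service_id"}]}',
}
incomplete = {
    "metadata": '{"name": "apm.draft", "description": {"zh_cn": ""}}',
    "spec": '{"fields": []}',
}

print(quality_issues(complete))
print(quality_issues(incomplete))
```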

3.5 Cross-domain Association Analysis
Scenario Description
In complex observability systems, entities from different domains, such as APM, Kubernetes, and cloud resources, may have cross-domain relationships. UModel queries can be used to analyze these cross-domain association patterns and understand how domains are interconnected.

Application Examples
Find cross-domain relationships:

-- Identify EntitySetLinks that connect different domains
.umodel
| where kind = 'entity_set_link'
| extend
    link_name = json_extract_scalar(metadata, '$.name'),
    source_domain = json_extract_scalar(spec, '$.src.domain'),
    target_domain = json_extract_scalar(spec, '$.dest.domain')
| where source_domain != target_domain
| project link_name, source_domain, target_domain
| limit 0, 50

Analyze inter-domain connectivity:

-- Count the number of relationships between domains
.umodel
| where kind = 'entity_set_link'
| extend
    source_domain = json_extract_scalar(spec, '$.src.domain'),
    target_domain = json_extract_scalar(spec, '$.dest.domain')
| stats count = count() by source_domain, target_domain
| sort count desc
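
As a rough illustration of what the two cross-domain queries compute, the Python sketch below counts links per (source domain, target domain) pair and then keeps only the pairs that cross a domain boundary. The link records are made up, and the spec shape simply follows the $.src.domain and $.dest.domain paths used above.

```python
import json
from collections import Counter

# Hypothetical EntitySetLink records; only kind and spec are included.
LINKS = [
    {"kind": "entity_set_link", "spec": '{"src": {"domain": "apm"}, "dest": {"domain": "k8s"}}'},
    {"kind": "entity_set_link", "spec": '{"src": {"domain": "apm"}, "dest": {"domain": "k8s"}}'},
    {"kind": "entity_set_link", "spec": '{"src": {"domain": "k8s"}, "dest": {"domain": "acs"}}'},
    {"kind": "entity_set_link", "spec": '{"src": {"domain": "apm"}, "dest": {"domain": "apm"}}'},
]


def domain_pairs(records):
    """Count links for each (source domain, target domain) pair."""
    pairs = Counter()
    for record in records:
        if record["kind"] != "entity_set_link":
            continue
        spec = json.loads(record["spec"])
        pairs[(spec["src"]["domain"], spec["dest"]["domain"])] += 1
    return pairs


def cross_domain(pairs):
    """Keep only the pairs whose source and target domains differ."""
    return {pair: n for pair, n in pairs.items() if pair[0] != pair[1]}


pairs = domain_pairs(LINKS)
print(pairs.most_common())
print(cross_domain(pairs))
```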

3.6 Version and Evolution Analysis
Scenario Description
UModel schemas evolve as the business develops, so you need to track schema versions and historical changes.

Application Examples
View schema version information:

-- View the schema versions of all EntitySets
.umodel
| where kind = 'entity_set'
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    schema_version = json_extract_scalar(schema, '$.version'),
    schema_url = json_extract_scalar(schema, '$.url')
| project entity_name, schema_version, schema_url
| limit 0, 100

3.7 Fast Locating and Retrieval
Scenario Description
Quickly locate specific EntitySets or relationship definitions within a large volume of metadata. Both fuzzy matching and exact-match queries are supported.

Application Examples
Fuzzy search by name:

-- Search for EntitySets with "service" in the name
.umodel
| where kind = 'entity_set'
  and json_extract_scalar(metadata, '$.name') like '%service%'
| extend
    entity_name = json_extract_scalar(metadata, '$.name'),
    domain = json_extract_scalar(metadata, '$.domain')
| project entity_name, domain
| limit 0, 20

Exact search for a specific entity:

-- Retrieve the complete definition of a specific EntitySet by exact name
.umodel
| where json_extract_scalar(metadata, '$.name') = 'apm.service'
| limit 1

4.Summary
As the dedicated interface in EntityStore for querying knowledge graph metadata, the UModel query provides robust support for observability data modeling. You can use UModel queries for the following tasks:

1.Schema structure exploration: quickly understand all entity and relationship types defined in the system.
2.Data model analysis: examine the field design, primary key configuration, and complexity of EntitySets.
3.Relationship graph construction: use graph queries to analyze associations between entities and understand the topology of the knowledge graph.
4.Quality check: verify the integrity and consistency of metadata.
5.Cross-domain analysis: investigate association patterns across different domains.
6.Fast retrieval: rapidly locate target definitions within large volumes of metadata.

These capabilities make the UModel query a key tool for data modeling analysis, schema management, and knowledge graph exploration, and a solid foundation for building and maintaining high-quality observability data models.
