Challenges
Recently, my team was assigned the task of redesigning our monitoring system. My organization runs an ecosystem of hundreds of applications deployed across multiple cloud providers, mostly AWS (tens of AWS accounts in our AWS Organization).
The old monitoring system was designed and deployed years ago. It's a Prometheus stack with a single Prometheus instance, Grafana, Alertmanager, and various types of exporters. It was good enough at the time, but as the ecosystem grows fast, it now has problems:
- Not highly available
- Not scalable; scaling is too complex and inefficient
- Data retention is too short (14 days) because performance degrades dramatically and scaling is difficult
With all the above problems, the ideal solution must meet the requirements below:
- Highly available
- Scalable, able to scale easily
- Disaster recovery
- Data must be stored for at least a year
- Compatible with the Prometheus stack and PromQL, so that we don't spend much effort on migration or on getting familiar with the new stack
- Have an efficient way to collect metrics from multiple AWS accounts
- The deployment process must be automated, for both infrastructure and configuration
- Easy to manage/maintain, with daily operations tasks automated
- Nice to have: multi-tenancy support
Solution
After researching and building some PoCs, we found that Victoria Metrics is a good fit for us. Victoria Metrics has all of the required features: high availability is built-in, and scaling is easy since every component is separated. We implemented it and are now using it in the production environment. We call it Ultra Metrics. Let's look at our solution in detail.
High-level architecture
This is the high-level architecture of the solution:
We use the cluster version of Victoria Metrics (VM). The cluster has these major components:
- `vmstorage`: stores the raw data and returns the queried data in the given time range for the given label filters. This is the only stateful component in the cluster.
- `vminsert`: accepts the ingested data and spreads it among `vmstorage` nodes according to consistent hashing over the metric name and all its labels.
- `vmselect`: performs incoming queries by fetching the needed data from all the configured `vmstorage` nodes.
- `vmauth`: a simple auth proxy and router for the cluster. It reads auth credentials from the Authorization HTTP header (Basic Auth, Bearer token, and InfluxDB authorization are supported), matches them against configs, and proxies incoming HTTP requests to the configured targets.
- `vmagent`: a tiny but mighty agent which helps you collect metrics from various sources and store them in Victoria Metrics or any other Prometheus-compatible storage system that supports the remote_write protocol.
- `vmalert`: executes a list of the given alerting or recording rules against configured data sources. For sending alerting notifications `vmalert` relies on a configured Alertmanager. Recording rule results are persisted via the remote write protocol. `vmalert` is heavily inspired by the Prometheus implementation and aims to be compatible with its syntax.
- `promxy`: used for querying the data from multiple clusters. It's a Prometheus proxy that makes many shards of Prometheus appear as a single API endpoint to the user.
How does the solution fit into our case?
Here is how Ultra Metrics addresses the requirements:
High availability
The system is able to continue accepting new incoming data and processing new queries when some components of the cluster are temporarily unavailable.
We accomplish this by using the cluster version of VM. Each component is deployed with redundancy and auto-healing. Data is also made redundant by replication (read more) to multiple nodes in the same cluster.
- `vminsert` and `vmselect` are stateless components deployed behind the `vmauth` proxy. `vmauth` stops routing requests to unavailable nodes.
- `vmstorage` is the only stateful component; however, since data is redundant, it's fine if some nodes go down temporarily.
  - `vminsert` re-routes incoming data from unavailable `vmstorage` nodes to healthy `vmstorage` nodes.
  - `vmselect` continues serving responses if a `vmstorage` node is unavailable.
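As a rough illustration (not our actual deployment manifests), a minimal compose-style sketch of the write path with replication enabled could look like this; node names are placeholders and `8400` is `vmstorage`'s default port for `vminsert` traffic:

```yaml
# Hypothetical compose fragment: vminsert keeps every sample on 2 of the 3
# vmstorage nodes (-replicationFactor=2), so a single vmstorage node can go
# down temporarily without making recently ingested data unavailable.
services:
  vminsert:
    image: victoriametrics/vminsert:latest   # assumed image name/tag
    command:
      - -storageNode=vmstorage-1:8400
      - -storageNode=vmstorage-2:8400
      - -storageNode=vmstorage-3:8400
      - -replicationFactor=2
```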
Scalability
Since each component has a separate responsibility and most of the components are stateless services, it's much easier to scale both vertically and horizontally. Each component may scale independently.
The storage component is the only stateful one. However, `vmstorage` nodes don't know about each other, don't communicate with each other, and don't share any data, which simplifies cluster maintenance and cluster scaling. Scaling the storage layer is now easy, as the sketch below shows: just add new nodes and update the `vminsert` and `vmselect` configurations. That's it, no more steps are required.
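For example, here is a hedged sketch of adding a third `vmstorage` node (node names and manifests are illustrative, not our production config): start the new node, then append it to the `-storageNode` lists. `8400`/`8401` are `vmstorage`'s default ports for `vminsert`/`vmselect` traffic.

```yaml
# Hypothetical compose fragment: scaling the storage layer only requires
# restarting vminsert and vmselect with the extended -storageNode lists.
services:
  vminsert:
    image: victoriametrics/vminsert:latest   # assumed image name/tag
    command:
      - -storageNode=vmstorage-1:8400
      - -storageNode=vmstorage-2:8400
      - -storageNode=vmstorage-3:8400   # newly added node
  vmselect:
    image: victoriametrics/vmselect:latest   # assumed image name/tag
    command:
      - -storageNode=vmstorage-1:8401
      - -storageNode=vmstorage-2:8401
      - -storageNode=vmstorage-3:8401   # newly added node
```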
Disaster recovery
Following Victoria Metrics' recommendation, all components of a cluster run in the same subnet (same availability zone) to take advantage of high bandwidth, low latency, and thus low error rates. This increases cluster performance.
To get a multi-AZ, or even multi-region (which is what we chose), setup, we run an independent cluster in each AZ or region and configure `vmagent` to send data to all clusters; `vmagent` has this feature built-in. [promxy](https://github.com/jacksontj/promxy) may be used for querying the data from multiple clusters. It provides a single data source for all PromQL queries, meaning Grafana can have a single source and we can run globally aggregated PromQL queries.
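A minimal `promxy` configuration sketch for this setup might look like the snippet below; it reuses the example regional `vmauth` endpoints from this post, and auth/TLS options are omitted:

```yaml
# Each server group points at one regional cluster's vmauth endpoint;
# promxy merges them into a single Prometheus-compatible API for Grafana.
promxy:
  server_groups:
    - static_configs:
        - targets:
            - us-east-1.ultra-metrics.com:8427
      scheme: https
      labels:
        region: us-east-1
    - static_configs:
        - targets:
            - ap-southeast-1.ultra-metrics.com:8427
      scheme: https
      labels:
        region: ap-southeast-1
```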
Failover can be achieved by a combination of Route53 failover and/or `promxy`. When an entire AZ/region goes down, the system stays available for both read and write operations. Once the AZ/region is back in operation, the missing data is sent to that cluster by `vmagent` from its on-disk buffer.
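As a hedged sketch of that behaviour (paths and image tag are placeholders), `vmagent` can be started with one `-remoteWrite.url` per regional cluster plus an on-disk buffer directory:

```yaml
# Hypothetical compose fragment: vmagent replicates every scraped sample to
# both regional clusters and buffers data on disk while a cluster is unreachable.
services:
  vmagent:
    image: victoriametrics/vmagent:latest   # assumed image name/tag
    command:
      - -promscrape.config=/etc/vmagent/scrape.yml   # illustrative path
      - -remoteWrite.url=https://us-east-1.ultra-metrics.com:8427/api/v1/write
      - -remoteWrite.url=https://ap-southeast-1.ultra-metrics.com:8427/api/v1/write
      - -remoteWrite.tmpDataPath=/vmagent-buffer   # buffered data is replayed once the cluster recovers
```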
Multi-tenancy
The system is a centralized monitoring system used by multiple teams. Each team's data is stored independently and isolated from the others, and a team can access only its own data. This is exactly what VM's multi-tenancy feature offers.
The Victoria Metrics cluster has built-in support for multiple isolated tenants. Tenant authentication and routing are expected to be handled by a separate service sitting in front of the Victoria Metrics cluster, such as `vmauth`.
Data for all the tenants is evenly spread among the available `vmstorage` nodes. This guarantees an even load among `vmstorage` nodes even when different tenants have different amounts of data and different query loads. Performance and resource usage also don't depend on the number of tenants.
Let’s say a tenant is an AWS account in the above architecture.
The `vmagent` remote write URLs are configured as in the example below:
- URLs for data ingestion:
https://us-east-1.ultra-metrics.com:8427/api/v1/write
https://ap-southeast-1.ultra-metrics.com:8427/api/v1/write
- URLs for Prometheus querying:
https://us-east-1.ultra-metrics.com:8427/api/v1/query
https://ap-southeast-1.ultra-metrics.com:8427/api/v1/query
The `vmauth` configuration looks like this snippet:
```yaml
users:
  ...
  # Requests with the 'Authorization: Bearer account1Secret' and 'Authorization: Token account1Secret'
  # header are proxied to https://<internal-nlb-domain>:8481
  # For example, https://<internal-nlb-domain>:8427/api/v1/query is proxied to
  # https://<internal-nlb-domain>:8481/select/1/prometheus/api/v1/query
  - bearer_token: account1Secret
    url_map:
      - src_paths:
          - /api/v1/query
          - /api/v1/query_range
          - /api/v1/series
          - /api/v1/label/[^/]+/values
          - /api/v1/metadata
          - /api/v1/labels
          - /api/v1/query_exemplars
        url_prefix:
          - https://<internal-nlb-domain>:8481/select/1/prometheus
      - src_paths:
          - /api/v1/write
        url_prefix:
          - https://<internal-nlb-domain>:8480/insert/1/prometheus
  # Requests with the 'Authorization: Bearer account2Secret' and 'Authorization: Token account2Secret'
  # header are proxied to https://<internal-nlb-domain>:8481
  # For example, https://<internal-nlb-domain>:8427/api/v1/query is proxied to
  # https://<internal-nlb-domain>:8481/select/2/prometheus/api/v1/query
  - bearer_token: account2Secret
    url_map:
      - src_paths:
          - /api/v1/query
          - /api/v1/query_range
          - /api/v1/series
          - /api/v1/label/[^/]+/values
          - /api/v1/metadata
          - /api/v1/labels
          - /api/v1/query_exemplars
        url_prefix:
          - https://<internal-nlb-domain>:8481/select/2/prometheus
      - src_paths:
          - /api/v1/write
        url_prefix:
          - https://<internal-nlb-domain>:8480/insert/2/prometheus
```
Note that:
- `8427` is `vmauth`'s port
- `8481` is `vmselect`'s port
- `8480` is `vminsert`'s port
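Putting it together, a tenant's Prometheus only needs to present its bearer token when remote-writing; `vmauth` then routes the request to that tenant's `/insert/<accountID>/...` path. A hedged example, reusing `account1Secret` from the snippet above:

```yaml
# remote_write config for the tenant mapped to account 1 in vmauth.
remote_write:
  - url: https://us-east-1.ultra-metrics.com:8427/api/v1/write
    authorization:
      type: Bearer
      credentials: account1Secret
  - url: https://ap-southeast-1.ultra-metrics.com:8427/api/v1/write
    authorization:
      type: Bearer
      credentials: account1Secret
```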
Prom-stack compatibility
- VM implements the Prometheus querying API, so there are no changes to query APIs, syntax, etc. All the tools we use continue to function as they are.
- We don't even need to make any changes (sidecar, agent, etc.) to the old monitoring system, except adding a few lines of configuration to make it work with the new system:

```yaml
remote_write:
  - url: https://us-east-1.ultra-metrics.com:8427/api/v1/write
  - url: https://ap-southeast-1.ultra-metrics.com:8427/api/v1/write
```

Thus, we can continue using the old monitoring system while experimenting with the new one.
Some statistics
Will be updated soon.
What’s next?
- Infrastructure provisioning by CDK
- Automate cluster deployment using AWS Automation runbook
- GitOps for daily operation tasks on the cluster
Top comments (3)
Interesting. Is it Open Source?
Yep, it's open source @mmuller88. All the things mentioned in this post are available in the open source version. Victoria Metrics also has an enterprise version.
woww.. great post!