Goal
[!TIP]
In a hurry? Jump directly to the Result section to see the outcome of this dive.
The goal of this dive is to read and understand the patch notes for Athenz v1.12.27.
ToC
- v1.12.27
  - PR: expose on-call URL value in client-side config #3055
    - Glossary
    - PR summary in my own words
    - PR summary from AI
    - Can I test it out?
  - PR: ui - switch from zms to msd for policy creation by @ArtjomsPorss in #3034
    - Glossary
    - PR summary in my own words
    - PR summary from AI
    - Can I test it out?
  - PR: feat: Add functionality to search My Domains in UI by @chandrasekhar1996 in #3058
    - PR Summary in my own words
    - PR Summary from AI
    - Can I test it out?
  - PR: fix: preserve domain contacts when updating an individual contact without page refresh by @chandrasekhar1996 in #3083
    - PR Summary in my own words
    - PR Summary from AI
    - Can I test it out?
  - PR: Use correct URL path and query param for athenz role. #3089
    - PR Summary from AI
    - PR Summary in my own words
    - Can I test it out?
    - What I learned
  - PR: use metadata to specify use of default identity #3084
    - Prerequisites
    - PR Summary in my own words
    - PR Summary from AI
    - Can I test it out?: Yes, but skipped
    - Future Potentials
  - PR: Make ZpeUpdPolLoader ScheduledExecutorService thread daemon #3086
    - Prerequisites
    - PR Summary from AI
    - PR Summary in my own words
    - Can I test it out?: Yes
    - Future Potentials
  - PR: make otel metric options more configurable #3090
    - Prerequisites
    - PR Summary in my own words
    - PR Summary from AI
    - Breaking Changes?: Yes
    - Can I test it out?: Yes, but skipped
    - Future Potentials
  - PR: expose openid_issuer field for access tokens in zts java client #3091
    - Prerequisites
    - PR Summary in my own words
    - PR Summary from AI
    - Breaking Changes?: No
    - Can I test it out?: Yes, but skipped
    - Future Potentials
  - PR: Add FreeBSD support to libs/go/sia/util #3093
    - Glossary
    - Prerequisites
    - PR Summary in my own words
    - PR Summary from AI
    - Breaking Changes?: No
    - Can I test it out?: Yes, but skipped
    - Future Potentials
  - PR: expose x509/ssh key id for instance register/refresh operations #3092
    - Prerequisites
    - PR Summary in my own words
    - PR Summary from AI
    - Breaking Changes?: No
    - Can I test it out?: Yes, but skipped
    - Future Potentials
  - PR: fix util test os filenames + new GetGroupGID impl #3094
    - Prerequisites
    - PR Summary in my own words
    - PR Summary from AI
    - Breaking Changes?: No
    - Can I test it out?: Yes, but skipped
    - Future Potentials
  - PR: update go and java dependencies to their latest releases #3095
    - Prerequisites
    - PR Summary in my own words
    - PR Summary from AI
    - Breaking Changes?: No
    - Can I test it out?: Yes, but skipped
    - Future Potential
  - PR: allow wildcard in first domain component of StaticWorkloadName #3096
    - Glossary
    - Prerequisites
    - PR Summary in my own words
    - PR Summary from AI
    - Breaking Changes?: No
    - Can I test it out?: No
    - Future Potential
PR: expose on-call URL value in client-side config #3055
https://github.com/AthenZ/athenz/pull/3055
Glossary
- On call: Emergency calls
- client-side: Web browser
PR summary in my own words
Simply speaking:
- Before: We had to set the NEXT_PUBLIC_ONCALL_URL environment variable at build time.
- After: We can now set onCallUrl="" in the client-side configuration or as an ENV variable. There is no need to rebuild; simply modify the config and restart the pod.
This is just like how we can set ZMS_URL for k8s-athenz-sia without needing to rebuild the image.
PR summary from AI
This pull request refactors how the on-call URL is exposed to the client-side in the UI application. It moves away from relying on NEXT_PUBLIC environment variables, which are baked in at build time, to a more robust client-side configuration mechanism using Next.js's publicRuntimeConfig. This ensures the onCallUrl is consistently available for client-side usage without being tied to the build process.
Can I test it out?
Yes, I've done it.
Setup: extended-config.js
The following change allows the athenz-ui to read the ENV value ON_CALL_URL:
apiVersion: v1
data:
  extended-config.js: |
    'use strict';
    const config = {
      authProxy: {
        onCallUrl: process.env.ON_CALL_URL, // << HERE
        timeZone: 'Asia/Tokyo',
Setup: ON_CALL_URL env for the athenz-ui deployment
And pass the env ON_CALL_URL inside the k8s deployment of the athenz-ui pod:
- name: NODE_EXTRA_CA_CERTS
  value: /etc/ssl/certs/ca-certificates.crt
- name: ON_CALL_URL
  value: https://www.google.com # << HERE
image: ghcr.io/mlajkim/athenz-ui:latest
Verify: On Call URL
- Choose any Athenz domain in the UI (Sample URL)
- Click More Details
- Click add on the On Call section (the name of the team is displayed instead if you have already set one up)
- Set whatever team member you want to test with
- Click the link; you will be redirected to {ON_CALL_URL}/{whatever team member}
PR: ui - switch from zms to msd for policy creation by @ArtjomsPorss in #3034
https://github.com/AthenZ/athenz/pull/3034
Glossary
- MSD: Micro Segmentation Daemon. It is basically an API server that sets up IP policies in the ZMS Server to achieve micro-segmentation. Note that IPs change all the time in a cloud environment, so something has to handle these changes continuously.
PR summary in my own words
The UI used to handle Micro-Segmentation business logic by communicating directly with ZMS. Now, the MSD handles this for the UI instead. The UI simply calls the API, and it's done.
PR summary from AI
This pull request refactors the microsegmentation policy management workflow within the UI. It transitions the backend interaction from orchestrating multiple calls to the ZMS to directly communicating with the Microsegmentation Daemon (MSD). By utilizing the createOrUpdateTransportPolicy API, this change significantly simplifies client-side logic by removing redundant ZMS calls and delegates policy validation and IP mapping directly to the MSD service.
Can I test it out?
No. MSD is not yet open-sourced.
So only Yahoo Inc. can test this. However, I learned that I should do the following:
featureFlag: true,
pageFeatureFlag: {
    microsegmentation: {
        policyValidation: true,
    },
},
This enables the micro-segmentation feature in the UI. Since the PR is just a refactor, the behavior is expected to remain the same.
PR: feat: Add functionality to search My Domains in UI by @chandrasekhar1996 in #3058
https://github.com/AthenZ/athenz/pull/3058
PR Summary in my own words
An Athenz-UI-side filter (search) feature for the domain list.
PR Summary from AI
This pull request introduces a new search capability for the 'My Domains' section in the UI. It allows users to efficiently filter their list of domains using a search bar, improving navigation and usability for large domain sets. The implementation includes client-side filtering with intelligent result prioritization and robust input handling.
Can I test it out?
Yes, you can search directly in the UI.
PR: fix: preserve domain contacts when updating an individual contact without page refresh by @chandrasekhar1996 in #3083
https://github.com/AthenZ/athenz/pull/3083
PR Summary in my own words
It fixes a bug that required users to refresh the page to see updated contact info.
PR Summary from AI
This pull request implements a crucial fix to prevent the loss of domain contact information when an individual contact (like a product owner or security owner) is updated without a full page refresh. By introducing a new helper method, the system now intelligently merges existing contacts with the newly updated ones, ensuring that all contact types are preserved during the update process. This enhances data integrity and provides a smoother user experience.
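The merge idea described above can be sketched roughly as follows. This is only an illustration: the real fix lives in the UI's JavaScript code, and the class and method names here (ContactMergeSketch, mergeContact) are hypothetical, not taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;

public class ContactMergeSketch {
    /**
     * Returns a new map that keeps every existing contact and overwrites
     * (or adds) only the contact type that was just updated.
     * Hypothetical helper, not the PR's actual code.
     */
    static Map<String, String> mergeContact(Map<String, String> existing,
                                            String contactType,
                                            String user) {
        Map<String, String> merged = new HashMap<>(existing); // copy, do not mutate the input
        merged.put(contactType, user);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> existing = Map.of("product-owner", "user.athenz_admin");
        Map<String, String> merged = mergeContact(existing, "security-owner", "user.jekim");
        // Both contact types are present after the update.
        System.out.println(merged);
    }
}
```

Replacing the whole contacts object with only the updated entry would drop the other contact types, which matches the symptom the PR title describes.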
Can I test it out?
I was not able to test it out right away and got the following status:
- ✅ Was able to add the contact
- ✅ Was able to see the contact in the DB
- ❌ Failed to show the added contact in the UI
- ❌ Failed to reproduce the issue
Setup: ZMS properties domain_contact_types
The UI defaults to these two contact types and sends them as they are, so we need to let ZMS know about them too:
zms.properties: |
  athenz.zms.domain_contact_types=product-owner,security-owner
This will fix the following error:
Setup: UI users_data.json
Add hard-coded users to the UI for testing:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: athenz-ui-users-cm
  namespace: athenz
data:
  users_data.json: |
    [
      {
        "is_human": 1,
        "login": "__admin__",
        "gecos": "__fullname__",
        "enabled_status": 1
      },
      {
        "is_human": 1,
        "login": "jekim",
        "gecos": "Jeongwoo Kim",
        "enabled_status": 1
      },
      {
        "is_human": 1,
        "login": "athenz_admin",
        "gecos": "Athenz Admin",
        "enabled_status": 1
      },
      {
        "is_human": 1,
        "login": "test",
        "gecos": "Test User",
        "enabled_status": 1
      }
    ]
EOF
Setup: Patch UI deployment to read the users_data.json
kubectl patch deployment athenz-ui -n athenz --type='json' -p='[
{
"op": "add",
"path": "/spec/template/spec/volumes/-",
"value": {
"name": "users-config",
"configMap": {
"name": "athenz-ui-users-cm"
}
}
},
{
"op": "add",
"path": "/spec/template/spec/containers/0/volumeMounts/-",
"value": {
"name": "users-config",
"mountPath": "/home/athenz/src/config/users_data.json",
"subPath": "users_data.json"
}
}
]'
Verify: UI Behavior
[!WARNING]
I was not able to reproduce the issue yet, but I was able to add the contact.
Verify: UI Network
Confirm that an API call was made to /api/v1/domain;domain=user?returnMeta=true containing the contacts field:
{
"data": {
"enabled": true,
"auditEnabled": false,
"ypmId": 0,
"contacts": {
"product-owner": "user.athenz_admin",
"security-owner": "user.jekim"
},
"autoDeleteTenantAssumeRoleAssertions": false,
"name": "user",
"modified": "2026-02-09T23:15:19.456Z",
"id": "75911440-052f-11f1-b5de-8cd6cbf5b517"
},
"meta": {}
}
Verify: DB Table
mariadb
SHOW DATABASES;
USE zms_server;
SHOW TABLES;
SELECT * FROM domain_contacts
WHERE domain_id = (
SELECT domain_id
FROM domain
WHERE name = 'user'
);
# +-----------+----------------+-------------------+
# | domain_id | type           | name              |
# +-----------+----------------+-------------------+
# |         1 | product-owner  | user.athenz_admin |
# |         1 | security-owner | user.jekim        |
# +-----------+----------------+-------------------+
PR: Use correct URL path and query param for athenz role. #3089
https://github.com/AthenZ/athenz/pull/3089
PR Summary from AI
Fixes an inaccuracy in the documentation.
PR Summary in my own words
The underlying feature uses Athenz role members as an SSOT (single source of truth) to control who has access to specific Azure resources, without relying on the Azure Console's user management. This PR simply fixes an inaccuracy in its documentation. The correct URL path is:
// principal authorized to request a given scope in the credentials).
resource ExternalCredentialsResponse POST "/external/{provider}/domain/{domainName}/creds" {
SimpleName provider; //provider name to request credentials from
https://github.com/AthenZ/athenz/blob/master/core/zts/src/main/rdl/ExternalCredentials.rdli#L22
And the role attribute is NOT athenzRole but athenzRoleName:
public static final String ZTS_EXTERNAL_ATTR_ROLE_NAME = "athenzRoleName";
Can I test it out?
Not practical, as I do not have an Azure subscription.
What I learned
I had heard that Yahoo Inc. uses Athenz as an SSOT for logging in to other platforms, and I learned that this includes Azure too, as implemented in Jonmv/assume azure services #2634.
sequenceDiagram
autonumber
actor User as Athenz Client (User/Service)
participant ZTS as Athenz ZTS Server
participant AzureARM as Azure Resource Manager
participant AzureAD as Azure AD (Entra ID)
Note over User, ZTS: 1. Request Token
User->>ZTS: POST /external/azure/.../creds<br/>{Role: "azure-log-reader",<br/>IdentityName: "log-reader",<br/>ResourceGroup: "system"}
Note over ZTS, AzureARM: 2. Lookup Identity Info (Name -> Client ID)
ZTS->>AzureARM: Query Client ID for "log-reader" in Resource Group<br/>(Using ZTS's own Azure Identity permissions)
AzureARM-->>ZTS: Returns Client ID (UUID)
Note over ZTS, AzureAD: 3. Token Exchange (Federation)
ZTS->>ZTS: Generate ID Token<br/>(iss: ZTS, sub: "coretech:role.azure-log-reader")
ZTS->>AzureAD: Submit ID Token & Request Access Token<br/>(aud: api://AzureADTokenExchange)
AzureAD->>AzureAD: Validate Federated Credential<br/>(Check Issuer & Subject match)
AzureAD-->>ZTS: Issues Azure Access Token
Note over ZTS, User: 4. Final Response
ZTS-->>User: Returns Azure Access Token
classDiagram
direction LR
class AthenzDomain {
Name: coretech
AzureSubscription: (ID)
AzureTenant: (ID)
}
class AthenzRole {
Name: azure-log-reader
FullARN: coretech:role.azure-log-reader
Members: (Users/Services)
}
class AzureManagedIdentity {
Name: log-reader
ResourceGroup: system
ClientID: (UUID)
}
class FederatedCredential {
Issuer: <ZTS API URL>
Subject: coretech:role.azure-log-reader
Audience: api://AzureADTokenExchange
}
%% Relationship Definitions
AthenzDomain "1" *-- "many" AthenzRole : contains
AzureManagedIdentity "1" *-- "1" FederatedCredential : contains config
AthenzRole .. FederatedCredential : 1. Mapping (Subject Match)
FederatedCredential .. AzureManagedIdentity : 2. Grant Access (Allows usage of this ID)
note for FederatedCredential "Key Connector:\nAzure grants access based on this credential\nwhen a specific Athenz Role requests it."
PR: use metadata to specify use of default identity #3084
https://github.com/AthenZ/athenz/pull/3084
Prerequisites
- A GCP service account name must be at least 6 characters long
- Yahoo Inc. already has many services with names shorter than 6 characters, like zts, zms, msd
- GCP instances allow you to specify arbitrary key-value pairs as metadata
PR Summary in my own words
Allows GCP users to set a new metadata key defaultServiceIdentity in their GCP instances. This maps to an Athenz service name that differs from the native GCP service account name, allowing them to use their short Athenz service names (like zts or zms, shorter than 6 characters).
PR Summary from AI
This pull request introduces a mechanism to use the instance's default identity for a service by specifying a defaultServiceIdentity metadata attribute. This is useful when the desired service name doesn't match the GCP service account name. The changes in attestation.go implement this logic, and new tests are added in attestation_test.go to cover the new functionality. My review focuses on improving the robustness of the new logic and correcting issues in the new tests. I've suggested handling a potential edge case with empty service names and improving error visibility in the main logic. For the tests, I've pointed out a logic bug that prevents a test from failing correctly and recommended using test-specific logging for cleaner output. Overall, the changes are good and the tests are comprehensive, but these adjustments will improve the quality and maintainability.
Can I test it out?: Yes, but skipped
Probably yes, but I would need to:
- Create a GCP instance
- Set up an Athenz service whose name is shorter than 6 characters
- Set up the GCP instance metadata to use the short service name with the new key defaultServiceIdentity
I don't think that is easy to do by myself. Since I understand the concept, I think it's fine to skip it.
Future Potentials
If we ever face a situation where we need to use a short service name with any other 3rd party system, we can consider storing an alias in the instance metadata.
PR: Make ZpeUpdPolLoader ScheduledExecutorService thread daemon #3086
https://github.com/AthenZ/athenz/pull/3086
Prerequisites
1
In Java, there are two types of threads:
- Non-Daemon Thread: A thread that runs in the foreground and is essential to the application's execution. It is like the head chef in a restaurant. (Which also implies that you usually do not shut down your restaurant while the head chef is still cooking, i.e. running.)
- Daemon Thread: A thread that runs in the background and is not essential to the application's execution. It is like background music in a restaurant. (Which also implies that you can shut down your restaurant even while the background music is still playing.)
Platform threads are designated daemon or non-daemon threads.
— Class Thread - Oracle Docs
2
Starting with Java 21 (Sep 2023~), there are two types of threads:
- Platform threads: The threads that we are familiar with, which are mapped to OS threads.
- Virtual threads: Lightweight threads introduced in Java 21 that are managed by the JVM and are not tied to a particular OS thread. You can run millions of them.
3
[!NOTE]
Please note that a Daemon Thread does not initiate the shutdown sequence by itself.
The JVM initiates the shutdown sequence in response to one of several events:
- when the number of live non-daemon threads drops to zero for the first time
- when the Runtime.exit or System.exit method is called for the first time
- when some external event occurs, such as an interrupt or a signal is received from the operating system
Then, regardless of whether daemon or non-daemon threads are still running:
the registered shutdown hooks are started in some unspecified order.
Finally:
The shutdown sequence finishes when all shutdown hooks have terminated.
And note that:
It is possible that one or more shutdown hooks do not terminate, for example, because of an infinite loop. In this case, the shutdown sequence will never finish.
4
Since Java's shutdown sequence can also be initiated by external events, it is important to give your threads a safe shutdown method like close() or shutdown() that runs during the registered shutdown hooks stage, to prevent:
- deadlocks in the DB services you rely on
- resource leaks
- etc.
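As a minimal sketch of that idea, the snippet below registers a shutdown hook that calls close() on a resource. PolicyReloader and registerCloseOnShutdown are hypothetical names made up for this note; only Runtime.addShutdownHook is the real JDK API.

```java
public class ShutdownHookSketch {
    /** Hypothetical stand-in for anything that owns threads or connections. */
    static final class PolicyReloader implements AutoCloseable {
        private volatile boolean closed = false;

        @Override
        public void close() {
            closed = true; // a real implementation would stop threads and release resources here
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Registers a hook that closes the resource during the JVM shutdown sequence. */
    static Thread registerCloseOnShutdown(AutoCloseable resource) {
        Thread hook = new Thread(() -> {
            try {
                resource.close();
            } catch (Exception e) {
                e.printStackTrace(); // the JVM is going down; just report the failure
            }
        }, "close-on-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        PolicyReloader reloader = new PolicyReloader();
        registerCloseOnShutdown(reloader);
        System.out.println("main() returns; the hook closes the reloader during shutdown.");
    }
}
```

The hook body should stay short and must terminate, since a hook that never returns keeps the shutdown sequence from finishing.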
5
Relying users call AuthZpeClient.allowAccess() to determine whether a given request is allowed. AuthZpeClient uses ZpeUpdPolLoader to load the policies defined in ZMS. ZpeUpdPolLoader creates a ScheduledExecutorService thread behind the scenes (without letting the relying users know) to reload the policies periodically. Policies are loaded up front rather than requested on every access check in order to protect ZMS from overload and to keep access checks distributed, so that they work regardless of ZMS's availability.
PR Summary from AI
This pull request addresses a resource management issue where the ScheduledExecutorService in ZpeUpdPolLoader could prevent graceful application shutdown by keeping non-daemon threads alive. The change modifies the executor's thread creation to mark them as daemon threads and provides them with explicit names, ensuring they do not block the JVM exit and improving troubleshooting capabilities.
PR Summary in my own words
Changes the ScheduledExecutorService thread from a non-daemon thread to a daemon thread, so that applications no longer need to close() a thread that was started by the library, one that developers using the library do not even know exists.
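The general technique can be sketched like this: pass a ThreadFactory that marks its threads as daemon (and names them) when creating the scheduler. This is my own minimal sketch, not the actual Athenz code; the class name, the daemonFactory helper, and the thread name "policy-reloader" are assumptions for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class DaemonSchedulerSketch {
    /** Thread factory that names its threads and marks them as daemon. */
    static ThreadFactory daemonFactory(String name) {
        return runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(true); // a daemon thread does not keep the JVM alive
            return t;
        };
    }

    /** Runs a probe task on the scheduler and reports whether its worker thread is a daemon. */
    static boolean workerIsDaemon() {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(daemonFactory("policy-reloader"));
        try {
            Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
            return scheduler.submit(probe).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("probe task failed", e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker is daemon: " + workerIsDaemon());
    }
}
```

With the default thread factory the worker would be a non-daemon thread, so a scheduler that is never shut down keeps the JVM running after main() returns.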
Can I test it out?: Yes
[!TIP]
Does not require Athenz server to be up & running.
Setup: Namespace athenz
kubectl create ns athenz
Setup: Java code as a CM
Create a configmap with athenz zpe java client & its code that uses ZpeUpdPolLoader:
cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: athenz-thread-test-code
  namespace: athenz
data:
  pom.xml: |
    <project>
      <modelVersion>4.0.0</modelVersion>
      <groupId>demo</groupId>
      <artifactId>repro</artifactId>
      <version>1.0</version>
      <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
      </properties>
      <dependencies>
        <dependency>
          <groupId>com.yahoo.athenz</groupId>
          <artifactId>athenz-zpe-java-client</artifactId>
          <version>${athenz.version}</version>
        </dependency>
      </dependencies>
      <build>
        <plugins>
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>3.1.0</version>
            <configuration>
              <mainClass>demo.Repro</mainClass>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </project>
  Repro.java: |
    package demo;
    import com.yahoo.athenz.zpe.ZpeUpdater;
    import java.nio.file.Files;
    import java.nio.file.Path;
    public class Repro {
        public static void main(String[] args) throws Exception {
            // Dummy environment settings to prevent errors during ZPE initialization
            System.setProperty("athenz.zpe.skip_policy_dir_check", "true");
            Path tmp = Files.createTempDirectory("athenz-zpe-pol");
            System.setProperty("athenz.zpe.policy_dir", tmp.toString());
            System.out.println("[INFO] Initializing ZpeUpdater...");
            // This line creates a scheduler thread in the background.
            ZpeUpdater zpe = new ZpeUpdater();
            Thread.sleep(1000); // Wait for thread creation
            // List the live non-daemon threads; these are the ones that keep the JVM alive.
            Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && !t.isDaemon())
                .forEach(t -> System.out.println(" - " + t.getName()));
            System.out.println("[INFO] main() finishes here. No explicit close() called.");
        }
    }
EOF
Verify: v1.12.26 vs v1.12.27
The pod with Athenz v1.12.27 or later shuts down and completes successfully, while the pod with Athenz v1.12.26 or earlier hangs and stays in the Running state.
To verify it yourself, you can create a pod that runs the java code with athenz-zpe-java-client version 1.12.26:
kubectl run test-bug-1-12-26 --image=maven:3.9-eclipse-temurin-17 \
--namespace=athenz \
--restart=Never \
--overrides='{
"spec": {
"containers": [{
"name": "test",
"image": "maven:3.9-eclipse-temurin-17",
"command": ["/bin/sh", "-c"],
"args": ["mkdir -p /app/src/main/java/demo && cp /config/pom.xml /app/ && cp /config/Repro.java /app/src/main/java/demo/ && cd /app && mvn -q compile exec:java -Dathenz.version=1.12.26"],
"volumeMounts": [{"name": "code-vol", "mountPath": "/config"}]
}],
"volumes": [{"name": "code-vol", "configMap": {"name": "athenz-thread-test-code"}}]
}
}'
Then compare with the pod that runs the java code with athenz-zpe-java-client version 1.12.27:
kubectl run test-fix-1-12-27 --image=maven:3.9-eclipse-temurin-17 \
--namespace=athenz \
--restart=Never \
--overrides='{
"spec": {
"containers": [{
"name": "test",
"image": "maven:3.9-eclipse-temurin-17",
"command": ["/bin/sh", "-c"],
"args": ["mkdir -p /app/src/main/java/demo && cp /config/pom.xml /app/ && cp /config/Repro.java /app/src/main/java/demo/ && cd /app && mvn -q compile exec:java -Dathenz.version=1.12.27"],
"volumeMounts": [{"name": "code-vol", "mountPath": "/config"}]
}],
"volumes": [{"name": "code-vol", "configMap": {"name": "athenz-thread-test-code"}}]
}
}'
Future Potentials
Not for this specific improvement, but I might be able to spot similar thread-related problems in the future.
PR: make otel metric options more configurable #3090
https://github.com/AthenZ/athenz/pull/3090
Prerequisites
1
OTel offers three main kinds of metrics:
- counter: total number of requests, total number of errors, etc.
- gauge: memory usage, current live users
- histogram: latency (heavy by nature)
Unlike logs, OTel keeps its metrics in memory, with one entry per unique label combination. For example:
| Label (Time Series) | Value |
|---|---|
| athenz_api_request_duration_seconds{api="getDomain", domain="sys.auth", method="GET", status="200"} | 0.045 |
| athenz_api_request_duration_seconds{api="getDomain", domain="my.custom.domain", method="GET", status="200"} | 0.051 |
The problem with the labels above is that a production environment can have many domains (one internal case has 200,000 domains), and OTel will suffer from OOM due to the massive number of label combinations. This is called "Cardinality Explosion".
2
What is Cardinality?
Cardinality refers to the number of unique combinations of labels (tags) attached to a metric. In time-series databases (like Prometheus or Datadog), every unique combination creates a brand new, separate data stream called a Time Series.
3
What is then "Cardinality Explosion"?
The number of time series grows multiplicatively: it is the product of the number of possible values of each label.
- Safe: Method (GET, POST = 2) × Status (200, 400, 500 = 3) 👉 Total 6 time series. (Easy to manage)
- 💥 Explosion: Add Domain (10,000 user domains) 👉 2 × 3 × 10,000 = Total 60,000 time series!
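The arithmetic above can be written as a tiny helper that multiplies the number of distinct values of each label. It is just a worst-case estimate for illustration (not every combination necessarily occurs in practice), and the class and method names are my own.

```java
import java.util.List;

public class CardinalityEstimate {
    /** Worst-case number of time series: the product of the distinct-value counts of all labels. */
    static long seriesCount(List<Integer> distinctValuesPerLabel) {
        long total = 1;
        for (int count : distinctValuesPerLabel) {
            total = Math.multiplyExact(total, (long) count); // throws on overflow instead of wrapping
        }
        return total;
    }

    public static void main(String[] args) {
        long safe = seriesCount(List.of(2, 3));             // method x status
        long exploded = seriesCount(List.of(2, 3, 10_000)); // method x status x domain
        System.out.println("without domain label: " + safe);
        System.out.println("with domain label:    " + exploded);
    }
}
```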
Why is it dangerous?
The monitoring server has to keep a separate "memory bucket" active for every single time series. A sudden explosion in cardinality will cause:
- OOM (Out of Memory) Crashes: Your monitoring server (e.g., Prometheus) runs out of RAM and dies.
- Slow Queries: Grafana dashboards will freeze or time out while trying to load the massive number of series.
- Massive Bills: Cloud monitoring tools like Datadog charge by the number of custom metrics/series. It can lead to a huge unexpected bill.
The Golden Rule: Never use labels with unbounded or highly variable values (like User IDs, IP addresses, or highly diverse Domain names) in your metrics.
PR Summary in my own words
[!TIP]
Histogram data (latency) is heavy and expensive. If you enable the separate option, skipping the domain-specific data is highly recommended to save resources. That is why the default for athenz.otel_skip_domain_histogram_metrics is true.
For Histogram (athenz.otel_separate_domain_histogram_metrics):
- false (default): Creates a single "fat" metric containing all labels, including domain names. (High risk of cardinality explosion)
- true: Separates the metric into 3 distinct metrics (Core API metric, _requestDomain, and _principalDomain).
  - athenz.otel_skip_domain_histogram_metrics:
    - true (default): Skips recording the two domain-specific metrics. Only the core metric (with api, method, status) is recorded. (Maximum cost efficiency, but no domain-level latency tracking)
    - false: Records all 3 separated metrics. (Moderate cost efficiency, retains domain-level latency tracking)
For Counter (athenz.otel_separate_domain_counter_metrics):
- false (default): Creates a single "fat" metric containing all labels, including domain names.
- true: Separates the metric into 3 distinct metrics (Core API metric, _requestDomain, and _principalDomain).
  - athenz.otel_skip_domain_counter_metrics:
    - false (default): Records all 3 separated metrics. (Reduces core cardinality while still tracking domain-level request counts)
    - true: Skips recording the two domain-specific metrics. Only the core metric is recorded. (Maximum cost efficiency, but no domain-level count tracking)
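To double-check my reading of the two flags, here is a small model of the decision as I understand it from the description above. It is not Athenz code: the class, the method, and the returned strings are hypothetical, and it only encodes the separate/skip combinations listed in this section.

```java
import java.util.ArrayList;
import java.util.List;

public class OtelMetricPlan {
    /**
     * Models which metric variants get recorded for one request,
     * given a "separate domain metrics" flag and a "skip domain metrics" flag.
     */
    static List<String> recordedMetrics(boolean separateDomainMetrics, boolean skipDomainMetrics) {
        List<String> metrics = new ArrayList<>();
        if (!separateDomainMetrics) {
            // One "fat" metric that carries every label, domains included.
            metrics.add("single metric with all labels");
            return metrics;
        }
        metrics.add("core (api, method, status)");
        if (!skipDomainMetrics) {
            metrics.add("_requestDomain");
            metrics.add("_principalDomain");
        }
        return metrics;
    }

    public static void main(String[] args) {
        System.out.println("separate=false           -> " + recordedMetrics(false, true));
        System.out.println("separate=true, skip=true  -> " + recordedMetrics(true, true));
        System.out.println("separate=true, skip=false -> " + recordedMetrics(true, false));
    }
}
```

Note that in this model the skip flag only matters once the separate flag is true, which is how the nested options above read to me.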
PR Summary from AI
This pull request enhances the configurability of OpenTelemetry metrics by separating the settings for histogram and counter metrics, and adding options to skip domain-specific metrics to control cardinality. The changes are logical and well-implemented, with corresponding updates to tests.
A key point to consider is that renaming the configuration property athenz.otel_separate_domain_metrics to athenz.otel_separate_domain_histogram_metrics constitutes a breaking change for users who might have the old property configured. It would be beneficial to update the pull request description to acknowledge this.
Breaking Changes?: Yes
Users using athenz.otel_separate_domain_metrics will need to update their configuration to athenz.otel_separate_domain_histogram_metrics.
Can I test it out?: Yes, but skipped
It seems that Athenz includes the library by default, so all I have to do is apply the config change to the cluster and run something like:
curl localhost:8080/metrics | grep athenz_api_request_duration_seconds
But skipped for now.
Future Potentials
Once we deploy OTel support, this knowledge will be very beneficial.
PR: expose openid_issuer field for access tokens in zts java client #3091
https://github.com/AthenZ/athenz/pull/3091
Prerequisites
AWS EKS can be configured to use Athenz as an OIDC Authentication Provider to authorize access to configured EKS clusters.
https://athenz.github.io/athenz/oidc_aws_eks/
ZTS has an openid_issuer field in the AccessToken request, for when you want the AccessToken's issuer field to be the OpenID issuer.
PR Summary in my own words
The ZTS Java Client did not support any parameter for requesting openid_issuer=true against ZTS. This PR allows you to do so, so that your Java application can connect to an Athenz-protected AWS EKS cluster.
They also introduced AccessTokenRequestBuilder, so that you can rely on the default behavior and do not have to pass every option as a parameter.
PR Summary from AI
This pull request successfully exposes the openid_issuer field for access tokens in the ZTS Java client. The introduction of the AccessTokenRequestBuilder is a great design choice to avoid further overloading the getAccessToken method, improving the client's usability and maintainability. The changes are well-implemented, and the tests have been updated accordingly to cover the new functionality, including a new comprehensive test for PrefetchTokenScheduledItem. My review includes one suggestion to improve the maintainability of a newly added test method by refactoring it into smaller, more focused tests.
Breaking Changes?: No
No. It adds a feature that had been missing; it is not a bug fix but more of a missing core feature.
Can I test it out?: Yes, but skipped
I think I could write my own Java code to request an AccessToken with openid_issuer=true, but I skipped it for now.
Future Potentials
If I ever work with Athenz <=> AWS EKS, I might be able to use it.
PR: Add FreeBSD support to libs/go/sia/util #3093
https://github.com/AthenZ/athenz/pull/3093
Glossary
What is FreeBSD?
An OS great for networking, a direct descendant of Unix
└── Unix
    ├── BSD (by UC Berkeley in the 1970s)
    │   ├── macOS
    │   └── FreeBSD
    │       ├── Netflix
    │       └── Uber
    └── Linux (only influenced, though)
What is Vendor Build?
- Normal Build: Every build fetches dependencies from the internet (versions may differ).
- Vendor Build: Allows you to build using local ingredients. It's always the same because the local ingredients never change, making it much safer and guaranteed to work.
Prerequisites
OSS SIA allows you to vendor-build for:
- darwin
- linux
- windows
but did not yet support freebsd.
PR Summary in my own words
It now supports vendor builds for freebsd as well, using libs/go/sia/util/os_util_freebsd.go
PR Summary from AI
This pull request adds support for FreeBSD by introducing os_util_freebsd.go, which is a good step towards broader platform compatibility.
Breaking Changes?: No
Enhancement; you can now vendor-build on FreeBSD OS too.
Can I test it out?: Yes, but skipped
I would need to prepare a FreeBSD machine and try a vendor build. Skipped for now.
Future Potentials
If we ever want to vendor-build for another OS, I think I can work on it using this PR as a reference.
PR: expose x509/ssh key id for instance register/refresh operations #3092
https://github.com/AthenZ/athenz/pull/3092
Prerequisites
[!NOTE]
The same as Description in https://github.com/AthenZ/athenz/pull/3092
You can fetch an X.509 cert signed by a custom CA for a specific:
- Athenz domain
- Athenz service
However, this requires a system admin to set it up.
Also note that Vespa used Yahoo Inc.'s Athenz domain and wanted to use their custom CA, so they allowed Vespa-owned domains to have custom CAs. However, adding custom CAs for a specific domain/service can only be done by system admins.
PR Summary in my own words
This PR allows a service admin (not a system admin) to test a custom or default CA. They no longer have to ask system admins to modify CA settings just for testing purposes, allowing them to try different CAs without changing the custom CA setting for the domain/service permanently.
Non-system admins can test it by including the new fields:
- x509CertSignerKeyId
- sshCertSignerKeyId
It also allows a domain using a custom CA to connect to services that only accept the default CA. It does this by letting them request a cert signed by the default CA, potentially allowing a single service to operate with both a default and custom CA.
PR Summary from AI
This pull request successfully exposes x509/ssh key IDs for instance register/refresh and role certificate operations, enhancing testing capabilities for service admins. The changes across the Go and Java codebases are mostly well-implemented and consistent. I've identified a few opportunities for improvement, primarily around reducing code duplication in both Go validation logic and the new Java tests. Additionally, I've found a minor bug in one of the new tests that should be addressed.
Breaking Changes?: No
It is more of an enhancement: it allows you to submit extra fields in InstanceRefreshInformation or InstanceRegisterInformation.
Can I test it out?: Yes, but skipped
I think I can register a CA for a specific athenz service, and try to generate X.509 cert signed by the specific CA, but skipped for now.
Future Potentials
If one of our services is ever spun out, maybe.
PR: fix util test os filenames + new GetGroupGID impl #3094
https://github.com/AthenZ/athenz/pull/3094
Prerequisites
SIA wants to ensure that the outputted cert files have the correct GroupID. SIA builders can specify a group name (not a GroupID; defaults to athenz if unset) to allow members of that group to read the cert file. SIA reads the /etc/group file to find the GID matching the provided group name, ensuring the outputted cert file is assigned the correct GroupID.
PR Summary in my own words
Uses Go's standard library to read and parse the /etc/group data directly instead of relying on the grep command. This makes the lookup more robust and more portable across different OSes.
PR Summary from AI
This pull request significantly refactors the group ID retrieval mechanism within the utility package by introducing a new GetGroupGID function that directly reads and parses the /etc/group file. This change moves away from executing external grep commands, enhancing the robustness and cross-platform compatibility of the group lookup logic. Additionally, the PR includes important fixes to OS-specific test file names and expands the test suite for the updated group ID retrieval functionality, ensuring comprehensive validation of the new implementation.
Breaking Changes?: No
It simply improves how /etc/group is read.
Can I test it out?: Yes, but skipped
I think I can create an instance with grep removed and try to build the sia package with the old code, which is expected to fail.
This enhancement should then allow me to build it, because it no longer uses grep.
Future Potentials
A good reminder that using standard Go libraries is much better than depending on external OS packages.
External packages might be:
- deleted
- modified
- or differ between OSes
It's dangerous to rely on them for important data outputs, so internal handling is always safer.
PR: update go and java dependencies to their latest releases #3095
https://github.com/AthenZ/athenz/pull/3095
Prerequisites
Athenz used to handle email validation internally, in libs/go/athenzutils/principal.go, because its dependency package didn't enforce it:
// athenz always verifies that we include a valid
// email in the certificate
idx := strings.Index(emails[0], "@")
if idx == -1 {
	return "", fmt.Errorf("certificate email is invalid: %s", emails[0])
}
PR Summary in my own words
A periodic package upgrade. One of the upgrades includes net/mail's stricter parsing of email addresses, meaning we no longer need to maintain internal logic to test email validation.
While reading this PR, I realized that the internal email check might no longer be needed at all. I wasn't completely sure, so I submitted a PR to get feedback: https://github.com/AthenZ/athenz/pull/3226
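To get a feel for the difference between the loose "@" check quoted above and stricter address parsing, here is a small sketch that compares the two on a few inputs. It uses `mail.ParseAddress` from the standard library purely as an illustration of strict parsing; I am not claiming this is the exact call the upgraded dependency makes. The helper names `hasAtSign` and `parsesAsAddress` are my own.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// hasAtSign mirrors the loose check quoted above: any "@" is enough.
func hasAtSign(email string) bool {
	return strings.Contains(email, "@")
}

// parsesAsAddress applies the stricter address parsing from net/mail.
func parsesAsAddress(email string) bool {
	_, err := mail.ParseAddress(email)
	return err == nil
}

func main() {
	for _, e := range []string{"user@example.com", "user@", "@"} {
		fmt.Printf("%q loose=%v strict=%v\n", e, hasAtSign(e), parsesAsAddress(e))
	}
}
```

Inputs such as `user@` and `@` pass the loose check but are rejected by the parser, which is the kind of gap a stricter library closes.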
PR Summary from AI
This pull request updates various Go and Java dependencies to their latest versions. The changes primarily consist of version bumps in go.mod, go.sum, and pom.xml. A key consequence of these updates is the removal of Go test cases that relied on a certificate with an invalid email format. As noted in the description, stricter parsing in newer libraries now causes these tests to fail during certificate loading, making this a reasonable and well-documented adjustment. The dependency updates are beneficial for project maintenance and security. The changes appear correct and well-justified.
Breaking Changes?: No
No breaking change. The same invalid-email check still happens; it is simply done by a different component.
Can I test it out?: Yes, but skipped
I think I can prepare a certificate with an invalid email address in it and send it to the Athenz server, but I skipped it for now.
Future Potential
If we ever upgrade to 1.25.2+ in our other packages, the stricter checking mechanism may bring breaking changes.
PR: allow wildcard in first domain component of StaticWorkloadName #3096
https://github.com/AthenZ/athenz/pull/3096
Glossary
- StaticWorkload: The opposite of a ServiceIdentity (e.g., IP Address, FQDN, VIP, LB, NAT). These are resources you don't name using an Athenz Service.
Prerequisites
Before this PR, if you wanted to include a StaticWorkloadFQDN for each of the following domains, you had to add them all individually:
- keep.google.com
- docs.google.com
- drive.google.com
- mail.google.com
- and more
PR Summary in my own words
This PR allows you to use * (wildcard) for the static workload's name like *.google.com to include subdomains.
Note that, for stricter checking, the * can be used only once and must be at the front of the domain name.
The PR's tests include samples showing which names evaluate to true (allowed) or false (not allowed).
It also silently added a CLAUDE.md for AIs
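The wildcard rule from this PR (at most one *, and only at the front) can be sketched like this. The real check lives in the RDL/schema definitions, so `validWildcardName` is a hypothetical helper of mine, and it makes one extra assumption: the * must be the entire first component, followed by a dot. It checks wildcard placement only, not the other characters of the name.

```go
package main

import (
	"fmt"
	"strings"
)

// validWildcardName reports whether name uses "*" at most once and only
// as its entire first component (an assumption made for this sketch).
func validWildcardName(name string) bool {
	switch strings.Count(name, "*") {
	case 0:
		// No wildcard: any non-empty name passes this placement check.
		return name != ""
	case 1:
		// One wildcard: it must be the leading "*." with something after it.
		return strings.HasPrefix(name, "*.") && len(name) > 2
	default:
		// More than one wildcard is never allowed.
		return false
	}
}

func main() {
	for _, n := range []string{"*.google.com", "docs.google.com", "docs.*.com", "*.*.com"} {
		fmt.Printf("%-16s allowed=%v\n", n, validWildcardName(n))
	}
}
```

With this sketch, `*.google.com` and `docs.google.com` are allowed, while `docs.*.com` and `*.*.com` are rejected.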
PR Summary from AI
This pull request introduces the ability to use a wildcard character (*) in the first domain component of a StaticWorkloadName. This change is implemented across the RDL definition, Java schema, and Go client schema for the MSD (Microservice Daemon) component. Additionally, a new CLAUDE.md file has been added to provide guidance for AI assistants working with the repository.
Breaking Changes?: No
Nope.
Can I test it out?: No
MSD is not yet open sourced
Future Potential
If we ever use MSD, I can see us using the * wildcard for StaticWorkloadFQDN.
What's Next?
Reading the Athenz patch notes definitely helped me understand the recent changes. I will continue doing this for later releases as well.
Closing
If you enjoyed this deep dive, please leave a like & subscribe for more!






