Jeongwoo Kim

Posted on Mar 6 • Edited on Apr 9

Read Athenz Patch Note v1.12.27

#athenz

Goal

[!TIP]
In a hurry? Jump directly to the Result section to see the outcome of this dive.

The goal of this dive is to read and understand the patch notes for Athenz v1.12.27.

This pull request refactors how the on-call URL is exposed to the client-side in the UI application. It moves away from relying on NEXT_PUBLIC environment variables, which are baked in at build time, to a more robust client-side configuration mechanism using Next.js's publicRuntimeConfig. This ensures the onCallUrl is consistently available for client-side usage without being tied to the build process.

Can I test it out?

Yes, I've done it.

Setup: `extended-config.js`

The following change allows athenz-ui to read the ON_CALL_URL environment variable:

apiVersion: v1
data:
  extended-config.js: |
    'use strict';

    const config = {
        authProxy: {
            onCallUrl: process.env.ON_CALL_URL, // <-- HERE
            timeZone: 'Asia/Tokyo',

Setup: `ON_CALL_URL` env for the `athenz-ui` deployment

Then pass the ON_CALL_URL env in the k8s deployment of the athenz-ui pod:

        - name: NODE_EXTRA_CA_CERTS
          value: /etc/ssl/certs/ca-certificates.crt
        - name: ON_CALL_URL
          value: https://www.google.com # <-- HERE
        image: ghcr.io/mlajkim/athenz-ui:latest

Verify: On Call URL

Choose any Athenz domain in the UI (Sample URL)
Click More Details
Click add in the On Call section (the team name is displayed instead if you have already set one up)
Enter whichever team member you want to test with
Click the link; you will be redirected to {ON_CALL_URL}/{whatever team member}

PR: ui - switch from zms to msd for policy creation by @ArtjomsPorss in #3034

https://github.com/AthenZ/athenz/pull/3034

Glossary

MSD: Or Micro Segmentation Daemon, is basically an API server that sets up IP policies in the ZMS Server to achieve micro-segmentation. Note that IPs change all the time in a cloud environment, so something has to handle these changes continuously.

PR summary in my own words

The UI used to handle Micro-Segmentation business logic by communicating directly with ZMS. Now, the MSD handles this for the UI instead. The UI simply calls the API, and it's done.

PR summary from AI

This pull request refactors the microsegmentation policy management workflow within the UI. It transitions the backend interaction from orchestrating multiple calls to the ZMS to directly communicating with the Microsegmentation Daemon (MSD). By utilizing the createOrUpdateTransportPolicy API, this change significantly simplifies client-side logic by removing redundant ZMS calls and delegates policy validation and IP mapping directly to the MSD service.

Can I test it out?

No. MSD is not yet open-sourced.
So only Yahoo Inc. can test this. However, I learned that I should do the following:

  featureFlag: true,
  pageFeatureFlag: {
      microsegmentation: {
          policyValidation: true,
      },
  },

This enables the microsegmentation feature in the UI. But since the PR is just a refactor, the behavior is expected to remain the same.

PR: feat: Add functionality to search My Domains in UI by @chandrasekhar1996 in #3058

https://github.com/AthenZ/athenz/pull/3058

PR Summary in my own words

An Athenz-UI-side filter (search) feature for the domain list.

PR Summary from AI

This pull request introduces a new search capability for the 'My Domains' section in the UI. It allows users to efficiently filter their list of domains using a search bar, improving navigation and usability for large domain sets. The implementation includes client-side filtering with intelligent result prioritization and robust input handling.
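To picture what "client-side filtering with result prioritization" could look like, here is a minimal Java sketch. It is not the Athenz UI code (the UI is JavaScript); the class name `DomainSearch` and the ranking order (exact match, then prefix match, then any other substring match) are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of client-side domain filtering; not the actual UI implementation.
final class DomainSearch {

    // Assumed ranking: exact match = 0, prefix match = 1, other substring match = 2.
    static int rank(String lowerCaseDomain, String query) {
        if (lowerCaseDomain.equals(query)) {
            return 0;
        }
        return lowerCaseDomain.startsWith(query) ? 1 : 2;
    }

    static List<String> search(List<String> domains, String rawQuery) {
        // Basic input handling: treat null as empty, trim whitespace, ignore case.
        String query = rawQuery == null ? "" : rawQuery.trim().toLowerCase(Locale.ROOT);
        if (query.isEmpty()) {
            return new ArrayList<>(domains); // an empty query shows the full list
        }
        List<String> matches = new ArrayList<>();
        for (String domain : domains) {
            if (domain.toLowerCase(Locale.ROOT).contains(query)) {
                matches.add(domain);
            }
        }
        // Best-ranked results first; ties are ordered alphabetically.
        matches.sort(Comparator
                .comparingInt((String d) -> rank(d.toLowerCase(Locale.ROOT), query))
                .thenComparing(Comparator.naturalOrder()));
        return matches;
    }

    public static void main(String[] args) {
        List<String> domains = List.of("sys.auth", "home.auth", "auth", "user.jekim");
        System.out.println(search(domains, " AUTH ")); // prints [auth, home.auth, sys.auth]
    }
}
```

Because everything happens on the already-loaded list, no extra request to the server is needed per keystroke, which is presumably why a client-side filter was chosen.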

Can I test it out?

Yes, you can search directly in the UI.

PR: fix: preserve domain contacts when updating an individual contact without page refresh by @chandrasekhar1996 in #3083

https://github.com/AthenZ/athenz/pull/3083

PR Summary in my own words

It fixes a bug that required users to refresh the page to see updated contact info.

PR Summary from AI

This pull request implements a crucial fix to prevent the loss of domain contact information when an individual contact (like a product owner or security owner) is updated without a full page refresh. By introducing a new helper method, the system now intelligently merges existing contacts with the newly updated ones, ensuring that all contact types are preserved during the update process. This enhances data integrity and provides a smoother user experience.
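The merge idea described above can be sketched in a few lines of Java. This is only an illustration of the technique (start from the existing contacts, then overlay the updated one); the class and method names are hypothetical, and the real helper lives in the JavaScript UI code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: merge existing domain contacts with an individual update.
final class ContactMerge {

    static Map<String, String> mergeContacts(Map<String, String> existing,
                                             Map<String, String> updated) {
        // Copy the existing contacts first so that untouched contact types survive.
        Map<String, String> merged = new LinkedHashMap<>(existing);
        // Overlay the update: only the contact types present in 'updated' change.
        merged.putAll(updated);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("product-owner", "user.athenz_admin");
        existing.put("security-owner", "user.jekim");

        // Update only the security owner; the product owner must be preserved.
        Map<String, String> merged = mergeContacts(existing, Map.of("security-owner", "user.test"));
        System.out.println(merged); // prints {product-owner=user.athenz_admin, security-owner=user.test}
    }
}
```

Sending only the updated contact without this merge step would overwrite the whole contacts map, which is the kind of data loss the PR describes.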

Can I test it out?

I was not able to test it out right away and got the following status:

✅ Was able to add the contact
✅ Was able to see the contact in the DB
❌ Failed to show the added contact in the UI
❌ Failed to reproduce the issue

Setup: ZMS properties `domain_contact_types`

The UI defaults to two contact types (product-owner and security-owner) and sends requests using them, so ZMS needs to be configured to accept them too:

zms.properties: |
    athenz.zms.domain_contact_types=product-owner,security-owner

This will fix the following error:

Setup: UI users_data.json

Add hard-coded users to the UI for testing:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: athenz-ui-users-cm
  namespace: athenz
data:
  users_data.json: |
    [
      {
        "is_human": 1,
        "login": "__admin__",
        "gecos": "__fullname__",
        "enabled_status": 1
      },
      {
        "is_human": 1,
        "login": "jekim",
        "gecos": "Jeongwoo Kim",
        "enabled_status": 1
      },
      {
        "is_human": 1,
        "login": "athenz_admin",
        "gecos": "Athenz Admin",
        "enabled_status": 1
      },
      {
        "is_human": 1,
        "login": "test",
        "gecos": "Test User",
        "enabled_status": 1
      }
    ]
EOF

Setup: Patch UI deployment to read the users_data.json

kubectl patch deployment athenz-ui -n athenz --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/template/spec/volumes/-",
    "value": {
      "name": "users-config",
      "configMap": {
        "name": "athenz-ui-users-cm"
      }
    }
  },
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/volumeMounts/-",
    "value": {
      "name": "users-config",
      "mountPath": "/home/athenz/src/config/users_data.json",
      "subPath": "users_data.json"
    }
  }
]'

Verify: UI Behavior

[!WARNING]
I was not able to reproduce the issue yet, but I was able to add the contact.

Verify: UI Network

Confirm that an API call was made to /api/v1/domain;domain=user?returnMeta=true containing the contacts field:

{
  "data": {
    "enabled": true,
    "auditEnabled": false,
    "ypmId": 0,
    "contacts": {
      "product-owner": "user.athenz_admin",
      "security-owner": "user.jekim"
    },
    "autoDeleteTenantAssumeRoleAssertions": false,
    "name": "user",
    "modified": "2026-02-09T23:15:19.456Z",
    "id": "75911440-052f-11f1-b5de-8cd6cbf5b517"
  },
  "meta": {}
}

Verify: DB Table

mariadb

SHOW DATABASES;
USE zms_server;

SHOW TABLES;
SELECT * FROM domain_contacts
WHERE domain_id = (
    SELECT domain_id
    FROM domain
    WHERE name = 'user'
);

# +-----------+----------------+-------------------+
# | domain_id | type           | name              |
# +-----------+----------------+-------------------+
# |         1 | product-owner  | user.athenz_admin |
# |         1 | security-owner | user.jekim        |
# +-----------+----------------+-------------------+

PR: Use correct URL path and query param for athenz role. #3089

https://github.com/AthenZ/athenz/pull/3089

PR Summary from AI

Fixes an inaccuracy in the documentation.

PR Summary in my own words

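The underlying feature uses Athenz role membership as an SSOT (single source of truth) to control who has access to specific Azure resources, without relying on the Azure console's user management. The PR itself simply fixes an inaccuracy in the documentation. The URL path should match the RDL definition: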

// principal authorized to request a given scope in the credentials).
resource ExternalCredentialsResponse POST "/external/{provider}/domain/{domainName}/creds" {
    SimpleName provider; //provider name to request credentials from

https://github.com/AthenZ/athenz/blob/master/core/zts/src/main/rdl/ExternalCredentials.rdli#L22

And the attribute name is not athenzRole but athenzRoleName:

public static final String ZTS_EXTERNAL_ATTR_ROLE_NAME     = "athenzRoleName";

https://github.com/AthenZ/athenz/blob/4af98e4a2a338c638c44a30f0322e6ee328c953e/servers/zts/src/main/java/com/yahoo/athenz/zts/ZTSConsts.java#L248

Can I test it out?

Not practical, as I do not have an Azure subscription.

What I learned

I had heard that Yahoo Inc. uses Athenz as an SSOT for login to other platforms, and I learned that this includes Azure too, implemented in "Jonmv/assume azure services" #2634.

sequenceDiagram
    autonumber
    actor User as Athenz Client (User/Service)
    participant ZTS as Athenz ZTS Server
    participant AzureARM as Azure Resource Manager
    participant AzureAD as Azure AD (Entra ID)

    Note over User, ZTS: 1. Request Token
    User->>ZTS: POST /external/azure/.../creds<br/>{Role: "azure-log-reader",<br/>IdentityName: "log-reader",<br/>ResourceGroup: "system"}

    Note over ZTS, AzureARM: 2. Lookup Identity Info (Name -> Client ID)
    ZTS->>AzureARM: Query Client ID for "log-reader" in Resource Group<br/>(Using ZTS's own Azure Identity permissions)
    AzureARM-->>ZTS: Returns Client ID (UUID)

    Note over ZTS, AzureAD: 3. Token Exchange (Federation)
    ZTS->>ZTS: Generate ID Token<br/>(iss: ZTS, sub: "coretech:role.azure-log-reader")
    ZTS->>AzureAD: Submit ID Token & Request Access Token<br/>(aud: api://AzureADTokenExchange)
    AzureAD->>AzureAD: Validate Federated Credential<br/>(Check Issuer & Subject match)
    AzureAD-->>ZTS: Issues Azure Access Token

    Note over ZTS, User: 4. Final Response
    ZTS-->>User: Returns Azure Access Token

classDiagram
    direction LR

    class AthenzDomain {
        Name: coretech
        AzureSubscription: (ID)
        AzureTenant: (ID)
    }

    class AthenzRole {
        Name: azure-log-reader
        FullARN: coretech:role.azure-log-reader
        Members: (Users/Services)
    }

    class AzureManagedIdentity {
        Name: log-reader
        ResourceGroup: system
        ClientID: (UUID)
    }

    class FederatedCredential {
        Issuer: <ZTS API URL>
        Subject: coretech:role.azure-log-reader
        Audience: api://AzureADTokenExchange
    }

    %% Relationship Definitions
    AthenzDomain "1" *-- "many" AthenzRole : contains
    AzureManagedIdentity "1" *-- "1" FederatedCredential : contains config

    AthenzRole .. FederatedCredential : 1. Mapping (Subject Match)
    FederatedCredential .. AzureManagedIdentity : 2. Grant Access (Allows usage of this ID)

    note for FederatedCredential "Key Connector:\nAzure grants access based on this credential\nwhen a specific Athenz Role requests it."

PR: use metadata to specify use of default identity #3084

https://github.com/AthenZ/athenz/pull/3084

Prerequisites

GCP Service Account Name must be at least 6 characters long
Yahoo Inc. already has many services whose names are shorter than 6 characters, like zts, zms, and msd
GCP instances allow you to specify metadata of any key-value pairs

PR Summary in my own words

Allows GCP users to set a new metadata key defaultServiceIdentity in their GCP instances. This maps to an Athenz service name that differs from the native GCP service account name, allowing them to use their short Athenz service names (like zts or zms, shorter than 6 characters).

PR Summary from AI

This pull request introduces a mechanism to use the instance's default identity for a service by specifying a defaultServiceIdentity metadata attribute. This is useful when the desired service name doesn't match the GCP service account name. The changes in attestation.go implement this logic, and new tests are added in attestation_test.go to cover the new functionality. My review focuses on improving the robustness of the new logic and correcting issues in the new tests. I've suggested handling a potential edge case with empty service names and improving error visibility in the main logic. For the tests, I've pointed out a logic bug that prevents a test from failing correctly and recommended using test-specific logging for cleaner output. Overall, the changes are good and the tests are comprehensive, but these adjustments will improve the quality and maintainability.

Can I test it out?: Yes, but skipped

Probably yes, but I would need to:

Create a GCP instance
Set up an Athenz service whose name is shorter than 6 characters
Set up GCP instance metadata to use the short service name with the new key defaultServiceIdentity

I don't think that is easy to do by myself. Since I understand the concept, I think it's fine to skip it.

Future Potentials

If we ever face a situation where we need to use a short service name with any other third-party system, we can consider storing an alias inside the instance metadata.

PR: Make ZpeUpdPolLoader ScheduledExecutorService thread daemon #3086

https://github.com/AthenZ/athenz/pull/3086

Prerequisites

1

In Java, there are two types of threads:

Non-Daemon Thread: A thread that runs in the foreground and is essential to the application's execution. It is like the head chef in a restaurant. (Which also implies you usually do not shut down your restaurant while your head chef is still cooking, i.e. running.)
Daemon Thread: A thread that runs in the background and is not essential to the application's execution. It is like the background music in a restaurant. (Which also implies you can shut down your restaurant even while the background music is still playing.)

Platform threads are designated daemon or non-daemon threads.
— Class Thread - Oracle Docs
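A minimal Java sketch of the two thread kinds, using the restaurant analogy for the thread names (the class name and the sleep durations are arbitrary choices of mine):

```java
// Sketch: the daemon flag is set per thread, before the thread is started.
final class DaemonFlagDemo {

    static Thread newWorker(String name, boolean daemon, long sleepMillis) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(sleepMillis); // stand-in for real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(daemon); // must be called before start()
        return t;
    }

    public static void main(String[] args) {
        Thread chef = newWorker("head-chef", false, 500);          // non-daemon
        Thread music = newWorker("background-music", true, 60_000); // daemon
        chef.start();
        music.start();
        System.out.println(chef.getName() + " daemon=" + chef.isDaemon());
        System.out.println(music.getName() + " daemon=" + music.isDaemon());
        // main() returns here. The JVM waits for head-chef to finish,
        // but it does not wait for background-music.
    }
}
```

Running this, the process ends shortly after the non-daemon thread finishes, even though the daemon thread would otherwise sleep for a minute.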

2

Starting with Java 21 (September 2023), there are two types of threads:

Platform threads: The threads that we are familiar with, which are mapped to OS threads.
Virtual threads: Lightweight threads introduced in Java 21 that are managed by the JVM rather than mapped one-to-one to OS threads. An application can run millions of them.

3

[!NOTE]
Please note that a daemon thread does not initiate the shutdown sequence by itself.

The JVM initiates the shutdown sequence in response to one of several events:

when the number of live non-daemon threads drops to zero for the first time

when the Runtime.exit or System.exit method is called for the first time

when some external event occurs, such as an interrupt or a signal is received from the operating system

— Shutdown Sequence - Class Runtime - Oracle Docs

Then, regardless of whether daemon or non-daemon threads are still running:

the registered shutdown hooks are started in some unspecified order.

Finally:

The shutdown sequence finishes when all shutdown hooks have terminated.

And note that:

one or more shutdown hooks do not terminate, for example, because of an infinite loop. In this case, the shutdown sequence will never finish.

4

Since Java's shutdown sequence can also be initiated by external requests, it is important to give your threads a safe shutdown method such as close() or shutdown() that runs during the registered shutdown hooks stage, to prevent:

deadlocks in the DB services you rely on
resource leak
etc
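As a rough sketch of that advice, the following Java snippet registers a close() method as a shutdown hook. `ManagedResource` is a hypothetical class of mine standing in for anything that holds connections, files, or threads; `Runtime.addShutdownHook` is the standard JDK API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical resource with an idempotent close(), wired into the JVM shutdown sequence.
final class ManagedResource implements AutoCloseable {

    private final AtomicBoolean closed = new AtomicBoolean(false);

    @Override
    public void close() {
        // compareAndSet makes close() safe to call more than once
        // (for example, once explicitly and once again from the hook).
        if (closed.compareAndSet(false, true)) {
            System.out.println("[INFO] releasing resources");
        }
    }

    boolean isClosed() {
        return closed.get();
    }

    public static void main(String[] args) {
        ManagedResource resource = new ManagedResource();
        // The hook runs when the shutdown sequence starts, whether that is
        // triggered by the last non-daemon thread ending, System.exit, or a signal.
        Runtime.getRuntime().addShutdownHook(new Thread(resource::close, "resource-shutdown-hook"));
        System.out.println("[INFO] main() finishes; the hook runs during JVM shutdown");
    }
}
```

The hook itself should finish quickly and never loop forever; otherwise, as quoted above, the shutdown sequence never completes.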

5

Relying users call AuthZpeClient.allowAccess() to determine whether a given request is allowed. AuthZpeClient uses ZpeUpdPolLoader to load the policies that originate from ZMS. ZpeUpdPolLoader does this by creating a ScheduledExecutorService thread behind the scenes (without letting the relying users know) that reloads the policies periodically. Policies are loaded ahead of time, instead of asking ZMS on every request, to protect ZMS from overload and to allow distributed access checks regardless of ZMS's availability.

PR Summary from AI

This pull request addresses a resource management issue where the ScheduledExecutorService in ZpeUpdPolLoader could prevent graceful application shutdown by keeping non-daemon threads alive. The change modifies the executor's thread creation to mark them as daemon threads and provides them with explicit names, ensuring they do not block the JVM exit and improving troubleshooting capabilities.

PR Summary in my own words

Changes the ScheduledExecutorService thread from a non-daemon thread to a daemon thread, so that applications no longer need to close() a thread that was started by the library, a thread whose existence the developers using the library may not even know about.
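The general technique (a named, daemon ThreadFactory handed to the executor) can be sketched as follows. This is my own minimal illustration using only standard `java.util.concurrent` APIs, not the code from the PR; the class name and the "policy-loader" prefix are made up.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a scheduler whose worker threads are named daemon threads.
final class DaemonSchedulerDemo {

    static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            // Explicit names make the threads easy to find in thread dumps.
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true); // daemon threads do not keep the JVM alive
            return t;
        };
    }

    static ScheduledExecutorService newDaemonScheduler(String prefix) {
        return Executors.newScheduledThreadPool(1, daemonFactory(prefix));
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = newDaemonScheduler("policy-loader");
        scheduler.scheduleAtFixedRate(
                () -> System.out.println(Thread.currentThread().getName()
                        + " daemon=" + Thread.currentThread().isDaemon()),
                0, 200, TimeUnit.MILLISECONDS);
        Thread.sleep(500);
        // No shutdown() call: with daemon workers the JVM still exits after main() returns.
        System.out.println("[INFO] main() returns without calling shutdown()");
    }
}
```

With the default thread factory the workers are non-daemon, so the same program would keep running after main() returns until someone calls shutdown().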

Can I test it out?: Yes

[!TIP]
Does not require Athenz server to be up & running.

Setup: Namespace `athenz`

kubectl create ns athenz

Setup: Java code as a CM

Create a ConfigMap containing a pom.xml that pulls in the Athenz ZPE Java client, plus the Java code that exercises ZpeUpdPolLoader:

cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: athenz-thread-test-code
  namespace: athenz
data:
  pom.xml: |
    <project>
      <modelVersion>4.0.0</modelVersion>
      <groupId>demo</groupId>
      <artifactId>repro</artifactId>
      <version>1.0</version>
      <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
      </properties>
      <dependencies>
        <dependency>
          <groupId>com.yahoo.athenz</groupId>
          <artifactId>athenz-zpe-java-client</artifactId>
          <version>${athenz.version}</version>
        </dependency>
      </dependencies>
      <build>
        <plugins>
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>3.1.0</version>
            <configuration>
              <mainClass>demo.Repro</mainClass>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </project>

  Repro.java: |
    package demo;
    import com.yahoo.athenz.zpe.ZpeUpdater;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class Repro {
      public static void main(String[] args) throws Exception {
        // Dummy environment settings to prevent errors during ZPE initialization
        System.setProperty("athenz.zpe.skip_policy_dir_check", "true");
        Path tmp = Files.createTempDirectory("athenz-zpe-pol");
        System.setProperty("athenz.zpe.policy_dir", tmp.toString());

        System.out.println("[INFO] Initializing ZpeUpdater...");
        // This line creates a scheduler thread in the background.
        ZpeUpdater zpe = new ZpeUpdater(); 

        Thread.sleep(1000); // Wait for thread creation

        Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.isAlive() && !t.isDaemon())
            .forEach(t -> System.out.println("  - " + t.getName()));

        System.out.println("[INFO] main() finishes here. No explicit close() called.");
      }
    }
EOF

Verify: v1.12.26 vs v1.12.27

You can see that the pod with Athenz v1.12.27 shuts down and completes successfully, while the pod with Athenz v1.12.26 hangs and stays in the Running state.

To verify it yourself, you can create a pod that runs the Java code with athenz-zpe-java-client version 1.12.26:

kubectl run test-bug-1-12-26 --image=maven:3.9-eclipse-temurin-17 \
  --namespace=athenz \
  --restart=Never \
  --overrides='{
    "spec": {
      "containers": [{
        "name": "test",
        "image": "maven:3.9-eclipse-temurin-17",
        "command": ["/bin/sh", "-c"],
        "args": ["mkdir -p /app/src/main/java/demo && cp /config/pom.xml /app/ && cp /config/Repro.java /app/src/main/java/demo/ && cd /app && mvn -q compile exec:java -Dathenz.version=1.12.26"],
        "volumeMounts": [{"name": "code-vol", "mountPath": "/config"}]
      }],
      "volumes": [{"name": "code-vol", "configMap": {"name": "athenz-thread-test-code"}}]
    }
  }'

Then compare with the pod that runs the java code with athenz-zpe-java-client version 1.12.27:

kubectl run test-fix-1-12-27 --image=maven:3.9-eclipse-temurin-17 \
  --namespace=athenz \
  --restart=Never \
  --overrides='{
    "spec": {
      "containers": [{
        "name": "test",
        "image": "maven:3.9-eclipse-temurin-17",
        "command": ["/bin/sh", "-c"],
        "args": ["mkdir -p /app/src/main/java/demo && cp /config/pom.xml /app/ && cp /config/Repro.java /app/src/main/java/demo/ && cd /app && mvn -q compile exec:java -Dathenz.version=1.12.27"],
        "volumeMounts": [{"name": "code-vol", "mountPath": "/config"}]
      }],
      "volumes": [{"name": "code-vol", "configMap": {"name": "athenz-thread-test-code"}}]
    }
  }'

Future Potentials

Not for this specific improvement, but I might be able to spot similar thread-related problems in the future.

PR: make otel metric options more configurable #3090

https://github.com/AthenZ/athenz/pull/3090

Prerequisites

1

Otel offers three main types of metrics:

counter: total number of requests, total number of errors, etc.
gauge: memory usage, current live users
histogram: latency (heavy by nature)

Unlike logs, Otel keeps its metrics in memory, with one entry per unique label combination, for example:

Label (Time Series)	Value
athenz_api_request_duration_seconds{api="getDomain", domain="sys.auth", method="GET", status="200"}	0.045
athenz_api_request_duration_seconds{api="getDomain", domain="my.custom.domain", method="GET", status="200"}	0.051

The problem with the labels above is that a production environment can have many domains (one internal case has 200,000 domains), and Otel will suffer from OOM due to this massive number of combinations. We call this "Cardinality Explosion".

2

What is Cardinality?

Cardinality refers to the number of unique combinations of labels (tags) attached to a metric. In time-series databases (like Prometheus or Datadog), every unique combination creates a brand new, separate data stream called a Time Series.

3

What is then "Cardinality Explosion"?

The number of time series grows multiplicatively: the total is the product of the number of possible values for each label.

Safe: Method (GET, POST = 2) × Status (200, 400, 500 = 3) 👉 Total 6 time series. (Easy to manage)
💥 Explosion: Add Domain (10,000 user domains) 👉 2 × 3 × 10,000 = Total 60,000 time series!
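The arithmetic above is just a product, which a tiny Java helper makes explicit (the class and method names are mine):

```java
// Sketch: worst-case number of time series = product of distinct values per label.
final class CardinalityDemo {

    static long seriesCount(long... valuesPerLabel) {
        long total = 1;
        for (long n : valuesPerLabel) {
            // multiplyExact throws ArithmeticException instead of silently overflowing.
            total = Math.multiplyExact(total, n);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(seriesCount(2, 3));          // method x status: prints 6
        System.out.println(seriesCount(2, 3, 10_000));  // plus 10,000 domains: prints 60000
    }
}
```

This is an upper bound: a series only exists once that exact label combination has actually been observed.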

Why is it dangerous?

The monitoring server has to keep a separate "memory bucket" active for every single time series. A sudden explosion in cardinality will cause:

OOM (Out of Memory) Crashes: Your monitoring server (e.g., Prometheus) runs out of RAM and dies.
Slow Queries: Grafana dashboards will freeze or time out while trying to load the massive number of series.
Massive Bills: Cloud monitoring tools like Datadog charge by the number of custom metrics/series. It can lead to a huge unexpected bill.

The Golden Rule: Never use labels with unbounded or highly variable values (like User IDs, IP addresses, or highly diverse Domain names) in your metrics.

PR Summary in my own words

[!TIP]
Histogram data (latency) is heavy and expensive. If you enable the separate option, skipping the domain-specific data is highly recommended to save resources. That is why athenz.otel_skip_domain_histogram_metrics defaults to true.

For Histogram (athenz.otel_separate_domain_histogram_metrics):

false (default): Creates a single "fat" metric containing all labels, including domain names. (High risk of cardinality explosion)
true: Separates the metric into 3 distinct metrics (Core API metric, _requestDomain, and _principalDomain). In this mode, athenz.otel_skip_domain_histogram_metrics decides whether the two domain-specific metrics are recorded:
    - true (default): Skips recording the two domain-specific metrics. Only the core metric (with api, method, status) is recorded. (Maximum cost efficiency, but no domain-level latency tracking)
    - false: Records all 3 separated metrics. (Moderate cost efficiency, retains domain-level latency tracking)

For Counter (athenz.otel_separate_domain_counter_metrics):

false (default): Creates a single "fat" metric containing all labels, including domain names.
true: Separates the metric into 3 distinct metrics (Core API metric, _requestDomain, and _principalDomain). In this mode, athenz.otel_skip_domain_counter_metrics decides whether the two domain-specific metrics are recorded:
    - false (default): Records all 3 separated metrics. (Reduces core cardinality while still tracking domain-level request counts)
    - true: Skips recording the two domain-specific metrics. Only the core metric is recorded. (Maximum cost efficiency, but no domain-level count tracking)
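To check my own understanding of how the "separate" and "skip" flags combine, I wrote the decision down as a small Java function. This only models the behavior as summarized above; it is not the Athenz implementation, and the class name, the base metric name, and the label descriptions in braces are placeholders of mine.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of my reading of the two flags; not the actual Athenz metric code.
final class OtelMetricPlan {

    static List<String> plan(String base, boolean separateDomain, boolean skipDomain) {
        if (!separateDomain) {
            // One "fat" metric that carries every label, including the domain labels.
            return List.of(base + " {core labels + domain labels}");
        }
        List<String> metrics = new ArrayList<>();
        metrics.add(base + " {core labels}"); // the core metric is always recorded
        if (!skipDomain) {
            // The two domain-specific metrics are recorded only when not skipped.
            metrics.add(base + "_requestDomain");
            metrics.add(base + "_principalDomain");
        }
        return metrics;
    }

    public static void main(String[] args) {
        String base = "example_latency"; // placeholder metric name
        System.out.println(plan(base, false, true));  // 1 fat metric
        System.out.println(plan(base, true, true));   // core metric only
        System.out.println(plan(base, true, false));  // core + 2 domain metrics
    }
}
```

Note that in this reading the skip flag has no effect unless the separate flag is enabled.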

PR Summary from AI

This pull request enhances the configurability of OpenTelemetry metrics by separating the settings for histogram and counter metrics, and adding options to skip domain-specific metrics to control cardinality. The changes are logical and well-implemented, with corresponding updates to tests.

A key point to consider is that renaming the configuration property athenz.otel_separate_domain_metrics to athenz.otel_separate_domain_histogram_metrics constitutes a breaking change for users who might have the old property configured. It would be beneficial to update the pull request description to acknowledge this.

Breaking Changes?: Yes

Users using athenz.otel_separate_domain_metrics will need to update their configuration to athenz.otel_separate_domain_histogram_metrics.

Can I test it out?: Yes, but skipped

It seems that Athenz includes the library by default, so all I have to do is apply the config change to the cluster and run something like:

curl localhost:8080/metrics | grep athenz_api_request_duration_seconds

But skipped for now.

Future Potentials

Once we deploy the Otel support, this knowledge will be very beneficial.

PR: expose openid_issuer field for access tokens in zts java client #3091

https://github.com/AthenZ/athenz/pull/3091

Prerequisites

AWS EKS can be configured to use Athenz as an OIDC authentication provider to authorize access to the configured EKS clusters

https://athenz.github.io/athenz/oidc_aws_eks/

ZTS accepts an openid_issuer field in the AccessToken request, for when you want the AccessToken's issuer field to be the OpenID issuer

PR Summary in my own words

Until now, the ZTS Java Client had no parameter for requesting openid_issuer=true from ZTS. This PR adds one, so that your Java application can connect to an Athenz-protected AWS EKS cluster.

They also introduced AccessTokenRequestBuilder, so that callers can rely on the default behavior and set only the options they need instead of passing every parameter.

PR Summary from AI

This pull request successfully exposes the openid_issuer field for access tokens in the ZTS Java client. The introduction of the AccessTokenRequestBuilder is a great design choice to avoid further overloading the getAccessToken method, improving the client's usability and maintainability. The changes are well-implemented, and the tests have been updated accordingly to cover the new functionality, including a new comprehensive test for PrefetchTokenScheduledItem. My review includes one suggestion to improve the maintainability of a newly added test method by refactoring it into smaller, more focused tests.

Breaking Changes?: No

No. It is not a bug fix but rather a core feature that had been missing.

Can I test it out?: Yes, but skipped

I think I could test getting the AccessToken with openid_issuer=true using my own Java code, but I skipped it for now.

Future Potentials

If I ever work with Athenz <=> AWS EKS, I might be able to use it.

PR: Add FreeBSD support to libs/go/sia/util #3093

https://github.com/AthenZ/athenz/pull/3093

Glossary

What is FreeBSD?

An OS great for networking, a direct descendant of Unix

unix
├── BSD (by UC Berkeley in 1970s)
│   ├── macOS
│   └── FreeBSD
│       ├── Netflix
│       └── Uber
└── Linux (Only Influenced Though)

What is Vendor Build?

Normal Build: Every build fetches dependencies from the internet (versions may differ).
Vendor Build: Allows you to build using local ingredients. It's always the same because the local ingredients never change, making it much safer and guaranteed to work.

Prerequisites

OSS SIA allows you to vendor-build for:

darwin
linux
windows

but did not yet support freebsd.

PR Summary in my own words

It now supports vendor builds for freebsd as well, using libs/go/sia/util/os_util_freebsd.go

PR Summary from AI

This pull request adds support for FreeBSD by introducing os_util_freebsd.go, which is a good step towards broader platform compatibility.

Breaking Changes?: No

Enhancement; you can now vendor-build on FreeBSD OS too.

Can I test it out?: Yes, but skipped

I would need to prepare a FreeBSD machine and try a vendor build. Skipped for now.

Future Potentials

If we want to vendor-build for yet another OS, I think I can work on it using this PR as a reference.

PR: expose x509/ssh key id for instance register/refresh operations #3092

https://github.com/AthenZ/athenz/pull/3092

Prerequisites

[!NOTE]
The same as Description in https://github.com/AthenZ/athenz/pull/3092

You can fetch X.509 cert with custom CA for specific:

Athenz domain
Athenz service

This however requires system admin to set it.

Also note that Vespa used Yahoo Inc.'s Athenz domain and wanted to use their custom CA, so they allowed Vespa-owned domains to have custom CAs. However, adding custom CAs for a specific domain/service can only be done by system admins.

PR Summary in my own words

This PR allows a service admin (not a system admin) to test a custom or default CA. They no longer have to ask system admins to modify CA settings just for testing purposes, allowing them to try different CAs without changing the custom CA setting for the domain/service permanently.

Non-system admins can test it by including the new fields:

x509CertSignerKeyId
sshCertSignerKeyId

It also allows a domain using a custom CA to connect to services that only accept the default CA. It does this by letting them request a cert signed by the default CA, potentially allowing a single service to operate with both a default and custom CA.

PR Summary from AI

This pull request successfully exposes x509/ssh key IDs for instance register/refresh and role certificate operations, enhancing testing capabilities for service admins. The changes across the Go and Java codebases are mostly well-implemented and consistent. I've identified a few opportunities for improvement, primarily around reducing code duplication in both Go validation logic and the new Java tests. Additionally, I've found a minor bug in one of the new tests that should be addressed.

Breaking Changes?: No

It is more of an enhancement: it allows you to submit an extra field in InstanceRefreshInformation or InstanceRegisterInformation

Can I test it out?: Yes, but skipped

I think I can register a CA for a specific athenz service, and try to generate X.509 cert signed by the specific CA, but skipped for now.

Future Potentials

If one of our services is ever spun out, maybe.

PR: fix util test os filenames + new GetGroupGID impl #3094

https://github.com/AthenZ/athenz/pull/3094

Prerequisites

SIA wants to ensure that the outputted cert files have the correct GroupID. SIA builders can specify a group name (not a GroupID; defaults to athenz if unset) to allow members of that group to read the cert file. SIA reads the /etc/group file to find the GID matching the provided group name, ensuring the outputted cert file is assigned the correct GroupID.

PR Summary in my own words

Uses Go's standard os package to read and parse the /etc/group data directly instead of relying on the grep command. This makes it more robust and more portable across different OSes.
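The parsing itself is simple once the file is read in-process. Here is a minimal Java sketch of the same idea (the real implementation is in Go; `GroupFileParser` and `findGid` are hypothetical names of mine). It assumes the usual /etc/group layout of colon-separated fields: group name, password, GID, member list.

```java
import java.util.List;
import java.util.OptionalInt;

// Sketch: look up a GID by group name from /etc/group-style lines, without shelling out.
final class GroupFileParser {

    static OptionalInt findGid(List<String> lines, String groupName) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            // Expected layout: name:password:gid:member1,member2
            String[] fields = trimmed.split(":", -1);
            if (fields.length < 3 || !fields[0].equals(groupName)) {
                continue; // malformed line, or a different group
            }
            try {
                return OptionalInt.of(Integer.parseInt(fields[2]));
            } catch (NumberFormatException e) {
                return OptionalInt.empty(); // group found, but the GID is not a number
            }
        }
        return OptionalInt.empty(); // no such group
    }

    public static void main(String[] args) {
        // Sample content; a real caller could use Files.readAllLines(Path.of("/etc/group")).
        List<String> lines = List.of("root:x:0:", "athenz:x:1001:jekim,test");
        System.out.println(findGid(lines, "athenz")); // prints OptionalInt[1001]
    }
}
```

Matching on the exact first field also means a group such as `athenz-admin` is never mistaken for `athenz`.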

PR Summary from AI

This pull request significantly refactors the group ID retrieval mechanism within the utility package by introducing a new GetGroupGID function that directly reads and parses the /etc/group file. This change moves away from executing external grep commands, enhancing the robustness and cross-platform compatibility of the group lookup logic. Additionally, the PR includes important fixes to OS-specific test file names and expands the test suite for the updated group ID retrieval functionality, ensuring comprehensive validation of the new implementation.

Breaking Changes?: No

It simply improves how /etc/group is read.

Can I test it out?: Yes, but skipped

I think I can create an instance with grep removed and try to build the sia package, which is expected to fail.

This enhancement would then allow me to build it, because it no longer uses grep.

Future Potentials

A good reminder that using standard Go libraries is much better than depending on external OS packages.

External packages might be:

deleted
modified
or differ between OSes

It's dangerous to rely on them for important data outputs, so internal handling is always safer.

PR: update go and java dependencies to their latest releases #3095

https://github.com/AthenZ/athenz/pull/3095

Prerequisites

In libs/go/athenzutils/principal.go, Athenz used to handle email validation internally because its dependency package didn't enforce it:

// athenz always verifies that we include a valid
// email in the certificate

idx := strings.Index(emails[0], "@")
if idx == -1 {
  return "", fmt.Errorf("certificate email is invalid: %s", emails[0])
}

PR Summary in my own words

This is a periodic dependency upgrade. One of the upgrades brings net/mail's stricter parsing of email addresses, meaning we no longer need to maintain internal logic to validate emails.

While reading this PR, I realized that the internal email check might no longer be needed at all. I wasn't completely sure, so I submitted a PR to get feedback: https://github.com/AthenZ/athenz/pull/3226
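To illustrate why a stricter parser can make a hand-written check redundant, here is a small sketch comparing a "contains @" test like the one quoted in the prerequisites with net/mail.ParseAddress from the Go standard library. The function names legacyEmailOK and strictEmailOK are hypothetical, and using ParseAddress here is my assumption for illustration; I have not confirmed which exact code path the upgraded dependency uses when loading certificates.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// legacyEmailOK mirrors the hand-written check quoted above:
// it only verifies that an '@' appears somewhere in the string.
func legacyEmailOK(email string) bool {
	return strings.Contains(email, "@")
}

// strictEmailOK delegates to net/mail, which parses the whole address
// and rejects inputs such as a missing domain.
// Assumption: used here only to illustrate "stricter parsing".
func strictEmailOK(email string) bool {
	_, err := mail.ParseAddress(email)
	return err == nil
}

func main() {
	samples := []string{"user@example.com", "user.example.com", "user@"}
	for _, e := range samples {
		fmt.Printf("%q legacy=%v strict=%v\n", e, legacyEmailOK(e), strictEmailOK(e))
	}
}
```

The interesting case is "user@": it passes the simple '@' check but is rejected by the parser because the domain is missing, which is the kind of input a stricter library catches before the internal check ever runs.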

PR Summary from AI

This pull request updates various Go and Java dependencies to their latest versions. The changes primarily consist of version bumps in go.mod, go.sum, and pom.xml. A key consequence of these updates is the removal of Go test cases that relied on a certificate with an invalid email format. As noted in the description, stricter parsing in newer libraries now causes these tests to fail during certificate loading, making this a reasonable and well-documented adjustment. The dependency updates are beneficial for project maintenance and security. The changes appear correct and well-justified.

Breaking Changes?: No

No breaking change: the same invalid-email check still happens; it is simply performed by a different component.

Can I test it out?: Yes, but skipped

I think I could prepare a certificate with an invalid email address in it and send it to the Athenz server, but I skipped it for now.

Future Potential

If one of our other packages is ever upgraded to 1.25.2+, the stricter checking mechanism may bring breaking changes.

PR: allow wildcard in first domain component of StaticWorkloadName #3096

https://github.com/AthenZ/athenz/pull/3096

Glossary

StaticWorkload: The opposite of a ServiceIdentity, that is, a resource you don't name using an Athenz Service (e.g., IP Address, FQDN, VIP, LB, NAT).

Prerequisites

Before this PR, if you wanted to include a StaticWorkloadFQDN for each of the following domains, you had to add them all individually:

keep.google.com
docs.google.com
drive.google.com
mail.google.com
and more

PR Summary in my own words

This PR allows you to use * (wildcard) for the static workload's name like *.google.com to include subdomains.

Note that, for stricter checking, the * may be used only once, and it must be at the front of the domain name.

See the tests in the PR for samples of what evaluates to true (allowed) or false (not allowed).
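The rule described above can be sketched as a tiny Go check. This is a hypothetical helper, wildcardAllowed, written only to illustrate the "once, and only at the front" rule; the real validation lives in the schema definition touched by the PR, and I am assuming here that the wildcard must be the entire first label (as in *.google.com). The sketch does not validate the rest of the name.

```go
package main

import (
	"fmt"
	"strings"
)

// wildcardAllowed reports whether name uses '*' in the way described above:
// at most one '*', and only as the entire first label ("*.example.com").
// Hypothetical helper; it does not check label characters or lengths.
func wildcardAllowed(name string) bool {
	switch strings.Count(name, "*") {
	case 0:
		// No wildcard at all: a plain name is fine as long as it is non-empty.
		return name != ""
	case 1:
		// Exactly one wildcard: it must be the whole first label,
		// followed by at least one more character.
		return strings.HasPrefix(name, "*.") && len(name) > 2
	default:
		// More than one wildcard is never allowed.
		return false
	}
}

func main() {
	for _, n := range []string{"*.google.com", "mail.google.com", "mail.*.com", "*.*.google.com"} {
		fmt.Printf("%-16s allowed=%v\n", n, wildcardAllowed(n))
	}
}
```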

It also quietly adds a CLAUDE.md file for AI assistants.

PR Summary from AI

This pull request introduces the ability to use a wildcard character (*) in the first domain component of a StaticWorkloadName. This change is implemented across the RDL definition, Java schema, and Go client schema for the MSD (Microservice Daemon) component. Additionally, a new CLAUDE.md file has been added to provide guidance for AI assistants working with the repository.

Breaking Changes?: No

Nope.

Can I test it out?: No

MSD is not yet open sourced.

Future Potential

If we ever use MSD, the * wildcard could come in handy for StaticWorkloadFQDN entries.

What's Next?

Reading the Athenz patch notes definitely helped me understand the recent changes. I will continue doing this for later releases as well.

Closing

If you enjoyed this deep dive, please leave a like & subscribe for more!
