I spent three days trying to get a multi-agent system to talk to a Kubernetes API endpoint. Every time I used the default service account, the agent would hit a 403 and get locked out. I had the right permissions, the right roles, the right RBAC rules. It wasn't until I implemented a two-tier service account system that the agents finally stopped throwing errors. It's not just about having the right permissions; it's about structuring them in a way that isolates each agent's access and limits its blast radius.
If you're running AI agents in Kubernetes, especially ones that interact with external systems or sensitive data, this is a pattern you should consider. This isn't just about security; it's about making sure your agents fail safely and don't accidentally break your entire infrastructure if they're compromised.
I first tried using the default service account for all my agents. It worked fine for a while, but as I scaled out to more agents and more workflows, I started seeing odd behavior. The agents would occasionally call APIs with incorrect credentials, or worse, they'd silently fail without any logs. I checked the RBAC policies, adjusted them, and kept getting the same issues. It was like the agents were leaking credentials or being intercepted by something in the cluster.
At one point, I even tried using a single dedicated service account for all agents, which I thought would give them more consistent permissions. But that only made things worse. Now every agent had the same access, and when one got compromised, the whole system was at risk. I realized I needed a way to isolate each agent’s credentials while still maintaining some level of central control.
The solution came in the form of a two-tier service account system. Instead of giving each agent its own service account with directly attached permissions, I created a central service account with limited permissions that acted as a "proxy" tier for all the agent workloads. Each agent then got its own "child" service account, which was granted nothing of its own and instead bound to the role defined at the central tier. This way, I could manage permissions at the central level without having to update every agent individually when something changed.
Here’s how I set it up. First, I created the central service account with a role that had access to the necessary resources but nothing more:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: agent-proxy
  namespace: agent-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: agent-proxy-role
  namespace: agent-system
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: agent-proxy-binding
  namespace: agent-system
subjects:
  - kind: ServiceAccount
    name: agent-proxy
    namespace: agent-system
roleRef:
  kind: Role
  name: agent-proxy-role
  apiGroup: rbac.authorization.k8s.io
```
Then, for each agent, I created a separate service account with its own RoleBinding that referenced the same central role. This gave each agent its own identity while still limiting its access to only what the central role allowed:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: agent-worker-1
  namespace: agent-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: agent-worker-1-binding
  namespace: agent-system
subjects:
  - kind: ServiceAccount
    name: agent-worker-1
    namespace: agent-system
roleRef:
  kind: Role
  name: agent-proxy-role
  apiGroup: rbac.authorization.k8s.io
```
Each agent's Deployment was configured to use its own service account via `serviceAccountName`, and I made sure `automountServiceAccountToken` was set to `true` so the agents could pick up their tokens automatically. This way, the agents had their own credentials but were still limited by the permissions defined in the central role.
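As a sketch, a Deployment for one of the workers looked roughly like this (the image name and labels are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-worker-1
  namespace: agent-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: agent-worker-1
  template:
    metadata:
      labels:
        app: agent-worker-1
    spec:
      serviceAccountName: agent-worker-1   # the per-agent identity
      automountServiceAccountToken: true   # mount the token for API access
      containers:
        - name: agent
          image: registry.example.com/agent:latest  # placeholder image
```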
This approach has a few key benefits. First, it isolates each agent's access, so a compromise in one agent doesn't automatically give access to all others. Second, it simplifies permission management: you can update the central role once and all agents inherit the new permissions. And third, it makes auditing easier: if an agent does something unexpected, you can trace it back to its specific service account.
This isn't a perfect solution, though. There are tradeoffs. Setting this up requires more configuration than just using a single service account, and it adds some complexity to the system. You also have to make sure the central role doesn't grant more permissions than it needs; otherwise, you end up with the same security risks you were trying to avoid.
What I'd do differently is automate more of this process. I had to manually create each service account and role binding, which was tedious and error-prone. If I were to do this again, I'd set up a Kubernetes operator or a script that could generate the necessary resources from a list of agent names. That would reduce the risk of mistakes and make it easier to scale.
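As a sketch of that automation (the agent names and namespace here are assumptions), a small script can stamp out the per-agent resources and print a manifest stream you'd pipe to `kubectl apply -f -`:

```shell
#!/bin/sh
# Generate a ServiceAccount and RoleBinding for each agent name.
# The script only prints YAML; apply it with: ./gen-agents.sh | kubectl apply -f -
NAMESPACE="agent-system"
ROLE="agent-proxy-role"

for AGENT in agent-worker-1 agent-worker-2 agent-worker-3; do
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${AGENT}
  namespace: ${NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${AGENT}-binding
  namespace: ${NAMESPACE}
subjects:
  - kind: ServiceAccount
    name: ${AGENT}
    namespace: ${NAMESPACE}
roleRef:
  kind: Role
  name: ${ROLE}
  apiGroup: rbac.authorization.k8s.io
---
EOF
done
```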
Another thing I'd consider is adding more fine-grained permissions. Right now, the central role gives access to pods, services, and configmaps, but that might be more than some agents need. Going forward, I'd look into creating more specific roles for different types of agents: for example, one role for agents that only need to read pods and another for agents that can modify services. This would make the system more secure and more efficient.
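That split might look something like this (the role names are hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: agent-pod-reader        # hypothetical: read-only pod access
  namespace: agent-system
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: agent-service-editor    # hypothetical: can read and modify services
  namespace: agent-system
rules:
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch", "update", "patch"]
```

Each agent's RoleBinding would then reference whichever role matches its job, instead of everyone sharing `agent-proxy-role`.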
I also wish I had implemented some kind of token rotation or short-lived credentials for the agent service accounts. Right now, the tokens are long-lived, which means if an agent’s credentials are ever compromised, the attacker has a long window to do damage. Adding a system that rotates the tokens periodically would help reduce that risk, even if it adds some complexity.
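One way to get short-lived tokens without building rotation yourself is Kubernetes' projected service account tokens, which the kubelet rotates automatically before they expire. A pod spec fragment might look like this (image name is a placeholder):

```yaml
spec:
  serviceAccountName: agent-worker-1
  automountServiceAccountToken: false   # skip the default token mount
  containers:
    - name: agent
      image: registry.example.com/agent:latest  # placeholder image
      volumeMounts:
        - name: api-token
          mountPath: /var/run/secrets/tokens
  volumes:
    - name: api-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 3600   # token valid for one hour
```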
In the end, the two-tier service account system worked well for my needs. It gave me the security I wanted without sacrificing the flexibility I needed. It wasn't the easiest setup to get going, but the tradeoff was worth it; I now have a more secure and manageable agent system. If you're managing AI agents in Kubernetes, I'd recommend considering this approach for your credential management.