Soumyajyoti Mahalanobish

Posted on May 14

Managing Shared Environments with Grace

1. The Problem: A Single Test Environment, Too Many Developers

Every day, developers from multiple squads need access to a shared environment for staging deployments and end‑to‑end tests. Conflicts are routine. People hop into the environment mid‑session or, worse, forget to release it.

Imagine this: Developer A is debugging a flaky integration test. Mid‑way, Developer B deploys a different branch, unaware that the environment is in use. Chaos ensues. Slack threads grow long. Tempers rise. No one's to blame, but everyone's affected.

This scenario calls for:

A way to reserve the environment
Visibility into who's using it
Notifications for the next person in line

2. The Idea: A Self‑Service Queue with Visibility

This prototype envisions a lean and elegant system:

Developers click to join the queue
When their turn comes, they get notified on Slack
They can lock the environment with a single click
A reason is recorded to inform others of usage

No separate access control, no complicated UIs—just a small, robust system that integrates into developer tools and Slack.

3. System Architecture

The architecture comprises:

Twirp‑powered Go backend – lightweight RPCs over HTTP
Redis – queue state and locking with TTL
Slack – direct user notifications
Frontend – embedded in internal dashboards

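Redis stores per‑application queues as lists (e.g., queue:env:shared) and lock state as a hash (e.g., lock:env:shared). Each list element is a JSON‑encoded QueueEntry carrying the user's email, name, Slack ID, reason, and join timestamp, so a user's queue position is simply the element's index in the list. The lock hash records the holder, the reason, and an expiry, and a TTL on the lock key releases a forgotten lock automatically.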
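To make the data model concrete, here is a minimal sketch of the key layout and entry encoding. The key helpers, the `queueEntry` struct (a plain-Go stand-in for the generated protobuf type), and `positionOf` are hypothetical names, not code from the post. Position is derived from the list index, and users are matched on their exact email rather than by searching the raw JSON for a substring.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queueEntry mirrors the QueueEntry protobuf message; this plain struct is a
// hypothetical stand-in for the generated type.
type queueEntry struct {
	UserEmail string `json:"user_email"`
	UserName  string `json:"user_name"`
	Reason    string `json:"reason"`
	SlackID   string `json:"slack_id"`
	Timestamp int64  `json:"timestamp"`
}

// Hypothetical key helpers following the post's naming pattern.
func queueKey(app string) string { return "queue:env:" + app }
func lockKey(app string) string  { return "lock:env:" + app }

// positionOf returns the 1-based position of email in a list of JSON-encoded
// entries (as returned by LRANGE key 0 -1), or 0 if the user is not queued.
func positionOf(entries []string, email string) (int, error) {
	for i, raw := range entries {
		var e queueEntry
		if err := json.Unmarshal([]byte(raw), &e); err != nil {
			return 0, fmt.Errorf("entry %d: %w", i, err)
		}
		if e.UserEmail == email { // exact match, not strings.Contains
			return i + 1, nil
		}
	}
	return 0, nil
}

func main() {
	a, _ := json.Marshal(queueEntry{UserEmail: "ana@example.com", Reason: "e2e run"})
	b, _ := json.Marshal(queueEntry{UserEmail: "bo@example.com", Reason: "debug flaky test"})
	list := []string{string(a), string(b)}

	pos, _ := positionOf(list, "bo@example.com")
	fmt.Println(queueKey("shared"), lockKey("shared"), pos)
}
```

With exact matching, looking up "o@example.com" returns 0 even though that string appears inside "bo@example.com".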

4. Protobuf Definitions

Interfaces are defined in Protobuf, with the Go and Twirp code generated via buf.gen.sh:

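syntax = "proto3";

// Package name assumed to match the deployment Go package used below.
package deployment;

message QueueEntry {
  string user_email = 1;
  string user_name  = 2;
  string reason     = 3;
  string slack_id   = 4;
  int64  timestamp  = 5;
}

message JoinQueueRequest {
  string     app_name = 1;
  QueueEntry entry    = 2;
}

message JoinQueueResponse {}

message LeaveQueueRequest {
  string app_name   = 1;
  string user_email = 2;
}

message LeaveQueueResponse {}

message LockState {
  string user_email = 1;
  string reason     = 2;
  int64  timestamp  = 3;
  int64  expires_at = 4;
}

// Assumed shapes for the status RPC.
message GetQueueStatusRequest {
  string app_name = 1;
}

message GetQueueStatusResponse {
  repeated QueueEntry entries = 1;
  LockState           lock    = 2;
}

service Deployments {
  rpc JoinQueue(JoinQueueRequest) returns (JoinQueueResponse);
  rpc LeaveQueue(LeaveQueueRequest) returns (LeaveQueueResponse);
  rpc GetQueueStatus(GetQueueStatusRequest) returns (GetQueueStatusResponse);
}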

5. Go Implementation Highlights

The backend implements the Twirp service and uses Redis transactions (MULTI/EXEC via go-redis) so that related queue and lock updates are applied atomically.

Joining the Queue

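val, err := json.Marshal(req.Entry)
if err != nil {
  return nil, err
}
// WATCH the queue so the duplicate check and the push are atomic: if another
// client changes the list in between, EXEC fails with redis.TxFailedErr
// instead of silently enqueuing a duplicate.
err = s.redis.Watch(ctx, func(tx *redis.Tx) error {
  entries, err := tx.LRange(ctx, queueKey, 0, -1).Result()
  if err != nil {
    return err
  }
  for _, item := range entries {
    var existing deployment.QueueEntry
    // Compare decoded emails exactly; a substring match on raw JSON would
    // treat "bo@x.com" as queued whenever "jimbo@x.com" is.
    if json.Unmarshal([]byte(item), &existing) == nil && existing.UserEmail == req.Entry.UserEmail {
      return twirp.NewError(twirp.AlreadyExists, "already in queue")
    }
  }
  _, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
    pipe.RPush(ctx, queueKey, val)
    return nil
  })
  return err
}, queueKey)
if err != nil {
  return nil, err
}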

Promoting Lock on Leave

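pipe := s.redis.TxPipeline()
pipe.LRem(ctx, queueKey, 0, leavingEntry) // must be the exact JSON stored at join
nextCmd := pipe.LIndex(ctx, queueKey, 0)
// On an empty list LINDEX returns redis.Nil, which Exec reports as an error.
if _, err := pipe.Exec(ctx); err != nil && !errors.Is(err, redis.Nil) {
  return nil, err
}
if nextEntryJSON := nextCmd.Val(); nextEntryJSON != "" {
  var nextEntry deployment.QueueEntry
  if err := json.Unmarshal([]byte(nextEntryJSON), &nextEntry); err != nil {
    return nil, err
  }
  now := time.Now()
  expiresAt := now.Add(lockTTL) // lockTTL is an assumed setting, e.g. 2 * time.Hour
  lock := s.redis.TxPipeline()
  lock.HSet(ctx, lockKey, map[string]interface{}{
    "user_email": nextEntry.UserEmail,
    "reason":     nextEntry.Reason,
    "timestamp":  now.Unix(),
    "expires_at": expiresAt.Unix(),
  })
  lock.ExpireAt(ctx, lockKey, expiresAt) // a forgotten lock releases itself
  if _, err := lock.Exec(ctx); err != nil {
    return nil, err
  }
  notifySlack(&nextEntry)
}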

6. Slack Integration

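Slack alerts make hand-offs reliable: the next developer in line learns the environment is free as soon as it is released, instead of watching the dashboard or a Slack thread.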

Notifications use Slack's Block Kit, wrapped in small helper functions:

blocks := []BlockElement{
  TextBlock(fmt.Sprintf(
    "You are now first in the test queue! Reason: *%s*", entry.Reason)),
  Button("Take Lock", lockURL),
}
SendSlackMessage(entry.SlackId, blocks)
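For readers without a Slack SDK wrapper, here is a minimal standard-library sketch of the same message as a raw `chat.postMessage` request. The struct names, `buildPayload`, `newPostRequest`, and the token handling are assumptions; the JSON shape follows Block Kit, where a button must sit inside an `actions` block rather than at the top level. Slack accepts a user ID as the `channel` to send a direct message from the app.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type textObject struct {
	Type string `json:"type"` // "mrkdwn" or "plain_text"
	Text string `json:"text"`
}

type element struct {
	Type string     `json:"type"` // "button"
	Text textObject `json:"text"`
	URL  string     `json:"url,omitempty"`
}

type block struct {
	Type     string      `json:"type"` // "section" or "actions"
	Text     *textObject `json:"text,omitempty"`
	Elements []element   `json:"elements,omitempty"`
}

type message struct {
	Channel string  `json:"channel"` // a user ID targets a DM with the app
	Text    string  `json:"text"`    // fallback text for notifications
	Blocks  []block `json:"blocks"`
}

// buildPayload assembles the "you're next" notification (hypothetical helper).
func buildPayload(slackID, reason, lockURL string) message {
	summary := fmt.Sprintf("You are now first in the test queue! Reason: *%s*", reason)
	return message{
		Channel: slackID,
		Text:    summary,
		Blocks: []block{
			{Type: "section", Text: &textObject{Type: "mrkdwn", Text: summary}},
			{Type: "actions", Elements: []element{{
				Type: "button",
				Text: textObject{Type: "plain_text", Text: "Take Lock"},
				URL:  lockURL,
			}}},
		},
	}
}

// newPostRequest builds the HTTP request for chat.postMessage; the bot token
// is assumed to come from configuration.
func newPostRequest(token string, msg message) (*http.Request, error) {
	body, err := json.Marshal(msg)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, "https://slack.com/api/chat.postMessage", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json; charset=utf-8")
	return req, nil
}

func main() {
	msg := buildPayload("U012ABCDEF", "e2e run", "https://dashboard.example.com/lock")
	out, _ := json.Marshal(msg)
	fmt.Println(string(out))
}
```

Note that Slack's Web API usually answers with HTTP 200 even on failure and signals errors through an `"ok": false` field in the JSON body, so a sender should decode the response rather than rely on the status code alone.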

Retries and exponential back‑off are handled via a wrapper:

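const maxAttempts = 3
var err error
for attempt := 0; attempt < maxAttempts; attempt++ {
  if err = sendToSlackAPI(payload); err == nil {
    break
  }
  if attempt < maxAttempts-1 {
    time.Sleep(time.Duration(1<<attempt) * time.Second) // 1s, then 2s
  }
}
if err != nil {
  log.Printf("slack notification failed after %d attempts: %v", maxAttempts, err)
}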

7. Frontend Integration

A single button toggles lock intent:

const handleToggle = async () => {
  const res = await fetch('/api/toggle-lock', { method: 'POST' });
  if (!res.ok) {
    console.error(`Toggling the lock failed: HTTP ${res.status}`);
    return;
  }
  refreshQueue();
};

UX states:

Not in queue → Join
First in queue → Take lock
Has lock → Release and notify next
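The UX states can be derived on the server from the queue and the lock, so the single toggle button always does the right thing. A minimal sketch, assuming a hypothetical `toggleAction` function behind /api/toggle-lock; the `Action` names are illustrative, and it adds a fourth state, "wait", for users queued behind someone else, which the list leaves implicit.

```go
package main

import "fmt"

// Action is what a click on the toggle button should do for a given user.
type Action string

const (
	ActionJoin    Action = "join"         // not in queue
	ActionTake    Action = "take_lock"    // first in queue and the lock is free
	ActionRelease Action = "release_lock" // currently holds the lock
	ActionWait    Action = "wait"         // queued, but not yet eligible
)

// toggleAction maps the queue (emails in order) and the current lock holder
// ("" when unlocked) to the next action for user.
func toggleAction(queue []string, lockHolder, user string) Action {
	if lockHolder == user {
		return ActionRelease
	}
	for i, email := range queue {
		if email != user {
			continue
		}
		if i == 0 && lockHolder == "" {
			return ActionTake
		}
		return ActionWait
	}
	return ActionJoin
}

func main() {
	q := []string{"ana@example.com", "bo@example.com"}
	fmt.Println(toggleAction(q, "", "ana@example.com"))                // take_lock
	fmt.Println(toggleAction(q, "ana@example.com", "ana@example.com")) // release_lock
	fmt.Println(toggleAction(q, "", "bo@example.com"))                 // wait
	fmt.Println(toggleAction(q, "", "cy@example.com"))                 // join
}
```

Keeping this decision server-side means two open dashboard tabs cannot disagree about what the button does.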

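Real‑time updates are delivered via Redis pub/sub: the backend publishes a message whenever the queue or lock changes, so dashboards can refresh without polling.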

8. Testing: Critical for Internal Developer Platforms

Thorough testing is essential for internal developer platforms for several reasons:

Trust and Adoption: Developers will only use tools they can rely on. A flaky queue management system will be abandoned quickly.
Failure Amplification: A bug in a developer platform affects the productivity of entire engineering teams, not just individual users.
Complex State Transitions: Queue management involves multiple state transitions that must be tested rigorously to prevent deadlocks or lost queue positions.
Distributed Consistency: Ensuring consistent state between Redis, the application, and Slack notifications requires robust integration testing.
Error Recovery: Systems must be resilient to network issues, Redis failures, or Slack API outages.

Unit + integration tests:

func TestLockTransition(t *testing.T) {
  // Test Redis LRem, HSet, LIndex chain
  // Validate Slack calls were triggered
}

func TestQueueConcurrency(t *testing.T) {
  // Validate queue integrity with concurrent join/leave operations
}

func TestSlackNotificationRetries(t *testing.T) {
  // Ensure backoff strategy properly handles temporary Slack outages
}
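TestQueueConcurrency needs a precise definition of "queue integrity". One sketch, assuming integrity means every element decodes and no email appears twice: fire many concurrent JoinQueue and LeaveQueue calls against a test Redis, read the list back with LRANGE, and pass it to a checker like the hypothetical `checkQueue` below.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type queuedUser struct {
	UserEmail string `json:"user_email"`
}

// checkQueue verifies invariants over LRANGE output: every element is valid
// JSON with a non-empty user_email, and no email is queued twice.
func checkQueue(entries []string) error {
	seen := make(map[string]int, len(entries))
	for i, raw := range entries {
		var u queuedUser
		if err := json.Unmarshal([]byte(raw), &u); err != nil {
			return fmt.Errorf("entry %d is not valid JSON: %w", i, err)
		}
		if u.UserEmail == "" {
			return fmt.Errorf("entry %d has no user_email", i)
		}
		if first, dup := seen[u.UserEmail]; dup {
			return fmt.Errorf("%s queued twice (positions %d and %d)", u.UserEmail, first+1, i+1)
		}
		seen[u.UserEmail] = i
	}
	return nil
}

func main() {
	ok := []string{`{"user_email":"ana@example.com"}`, `{"user_email":"bo@example.com"}`}
	dup := []string{`{"user_email":"ana@example.com"}`, `{"user_email":"ana@example.com"}`}
	fmt.Println(checkQueue(ok))
	fmt.Println(checkQueue(dup))
}
```

Run against the original join snippet, which pushes before checking, this checker would be expected to flag duplicates under concurrent joins.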

Conclusion

This setup demonstrates how simple components like Redis, Slack, and Go can come together to orchestrate shared environment access with minimal friction. The queue model introduces fairness and predictability to what was once a chaotic workflow. If your developers are wrestling for staging access, it might be time to queue up—with grace.
