Soumyajyoti Mahalanobish

Managing Shared Environments with Grace

1. The Problem: A Single Test Environment, Too Many Developers

Every day, developers from multiple squads need access to a shared environment for staging deployments and end‑to‑end tests. Conflicts are routine. People hop into the environment mid‑session or, worse, forget to release it.

Imagine this: Developer A is debugging a flaky integration test. Mid‑way, Developer B deploys a different branch, unaware that the environment is in use. Chaos ensues. Slack threads grow long. Tempers rise. No one's to blame, but everyone's affected.

This scenario calls for:

  • A way to reserve the environment
  • Visibility into who's using it
  • Notifications for the next person in line

2. The Idea: A Self‑Service Queue with Visibility

This prototype envisions a lean and elegant system:

  • Developers click to join the queue
  • When their turn comes, they get notified on Slack
  • They can lock the environment with a single click
  • A reason is recorded to inform others of usage

No separate access control, no complicated UIs—just a small, robust system that integrates into developer tools and Slack.

3. System Architecture

The architecture comprises:

  • Twirp‑powered Go backend – lightweight RPCs over HTTP
  • Redis – queue state and locking with TTL
  • Slack – direct user notifications
  • Frontend – embedded in internal dashboards

[Figure: Minimal architecture]

Redis stores per‑application queues as lists (e.g., queue:env:shared) and lock state as a hash (e.g., lock:env:shared). Queue position, TTLs, and Slack IDs are all stored as structured JSON entries.
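
The key layout and entry encoding described above can be sketched in a few lines. This is a minimal illustration, not the author's code: the `queueEntry` struct, its JSON field names, and the `queueKey`, `lockKey`, and `encodeEntry` helpers are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queueEntry mirrors the per-developer data the article stores in the queue.
// The JSON field names are assumptions for this sketch.
type queueEntry struct {
	UserEmail string `json:"user_email"`
	UserName  string `json:"user_name"`
	Reason    string `json:"reason"`
	SlackID   string `json:"slack_id"`
	Timestamp int64  `json:"timestamp"`
}

// queueKey returns the Redis list key for a queue, e.g. "queue:env:shared".
func queueKey(name string) string { return "queue:env:" + name }

// lockKey returns the Redis hash key for a lock, e.g. "lock:env:shared".
func lockKey(name string) string { return "lock:env:" + name }

// encodeEntry serializes an entry to the JSON string that would be pushed
// onto the Redis list.
func encodeEntry(e queueEntry) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	s, err := encodeEntry(queueEntry{
		UserEmail: "dev@example.com",
		UserName:  "Dev",
		Reason:    "debugging a flaky test",
		SlackID:   "U123",
		Timestamp: 1700000000,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(queueKey("shared"), lockKey("shared"))
	fmt.Println(s)
}
```

Because the queue is a Redis list, a developer's position is simply the index of their entry, so it does not need to be stored separately.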

4. Protobuf Definitions

Interfaces are defined via buf.gen.sh:

message QueueEntry {
  string user_email = 1;
  string user_name  = 2;
  string reason     = 3;
  string slack_id   = 4;
  int64  timestamp  = 5;
}

message JoinQueueRequest {
  string     app_name = 1;
  QueueEntry entry    = 2;
}

message LeaveQueueRequest {
  string app_name   = 1;
  string user_email = 2;
}

message LockState {
  string user_email = 1;
  string reason     = 2;
  int64  timestamp  = 3;
  int64  expires_at = 4;
}

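// The messages referenced below but not defined above (JoinQueueResponse,
// LeaveQueueResponse, GetQueueStatusRequest, GetQueueStatusResponse) are
// omitted from this excerpt.
service Deployments {
  rpc JoinQueue(JoinQueueRequest) returns (JoinQueueResponse);
  rpc LeaveQueue(LeaveQueueRequest) returns (LeaveQueueResponse);
  rpc GetQueueStatus(GetQueueStatusRequest) returns (GetQueueStatusResponse);
}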

5. Go Implementation Highlights

The backend implements Twirp RPCs and Redis atomic operations.

Joining the Queue

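val, err := json.Marshal(req.Entry)
if err != nil {
  return nil, err
}
// WATCH the queue so the duplicate check and the push commit together.
// Checking before pushing matters: pushing first appends the entry even
// when the user is already queued.
err = s.redis.Watch(ctx, func(tx *redis.Tx) error {
  entries, err := tx.LRange(ctx, queueKey, 0, -1).Result()
  if err != nil {
    return err
  }
  for _, item := range entries {
    var existing deployment.QueueEntry
    // Compare the decoded email exactly; a substring match on raw JSON
    // would confuse ann@example.com with joann@example.com.
    if json.Unmarshal([]byte(item), &existing) == nil && existing.UserEmail == req.Entry.UserEmail {
      return errors.New("already in queue")
    }
  }
  _, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
    pipe.RPush(ctx, queueKey, val)
    return nil
  })
  return err // redis.TxFailedErr means the queue changed; the caller may retry
}, queueKey)
if err != nil {
  return nil, err
}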
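
A duplicate check that decodes each stored entry and compares the email exactly avoids false positives from substring matches (for example, `ann@example.com` inside `joann@example.com`). The following is a self-contained sketch, assuming entries are stored as JSON with a `user_email` field; the `storedEntry` struct and `inQueue` helper are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storedEntry decodes only the field needed for the duplicate check.
type storedEntry struct {
	UserEmail string `json:"user_email"`
}

// inQueue reports whether any stored JSON entry has exactly the given email.
// Entries that fail to decode are skipped rather than treated as matches.
func inQueue(entries []string, email string) bool {
	for _, raw := range entries {
		var e storedEntry
		if err := json.Unmarshal([]byte(raw), &e); err != nil {
			continue
		}
		if e.UserEmail == email {
			return true
		}
	}
	return false
}

func main() {
	entries := []string{`{"user_email":"joann@example.com","reason":"smoke test"}`}
	fmt.Println(inQueue(entries, "ann@example.com"))   // false: exact match, not substring
	fmt.Println(inQueue(entries, "joann@example.com")) // true
}
```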

Promoting Lock on Leave

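pipe := s.redis.TxPipeline()
pipe.LRem(ctx, queueKey, 0, leavingEntry)
next := pipe.LIndex(ctx, queueKey, 0)
// Exec returns (cmds, err), so its result cannot be indexed directly.
// LIndex reports redis.Nil when the queue is now empty; that is not a failure.
if _, err := pipe.Exec(ctx); err != nil && !errors.Is(err, redis.Nil) {
  return nil, err
}
if nextEntryJSON := next.Val(); nextEntryJSON != "" {
  var nextEntry deployment.QueueEntry
  if err := json.Unmarshal([]byte(nextEntryJSON), &nextEntry); err != nil {
    return nil, err
  }
  // Hand the lock to the new head of the queue and notify them on Slack.
  s.redis.HSet(ctx, lockKey, map[string]interface{}{
    "user_email": nextEntry.UserEmail,
    "reason":     nextEntry.Reason,
    "timestamp":  time.Now().Unix(),
  })
  notifySlack(nextEntry)
}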
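
The `LockState` message carries an `expires_at` field, which suggests a lock should go stale if its holder forgets to release it. A minimal sketch of that bookkeeping follows; the `lockState` struct, the `newLock` and `expired` helpers, and the 30-minute TTL are illustrative assumptions, not values taken from the article.

```go
package main

import "fmt"

// lockState mirrors the LockState message; all times are Unix seconds.
type lockState struct {
	UserEmail string
	Reason    string
	Timestamp int64 // when the lock was taken
	ExpiresAt int64 // after this instant the lock is considered stale
}

// newLock builds a lock taken at nowUnix that expires ttlSeconds later.
func newLock(email, reason string, nowUnix, ttlSeconds int64) lockState {
	return lockState{
		UserEmail: email,
		Reason:    reason,
		Timestamp: nowUnix,
		ExpiresAt: nowUnix + ttlSeconds,
	}
}

// expired reports whether the lock is stale at nowUnix.
func (l lockState) expired(nowUnix int64) bool {
	return nowUnix >= l.ExpiresAt
}

func main() {
	// Assumed TTL of 30 minutes (1800 seconds).
	l := newLock("dev@example.com", "debugging a flaky test", 1700000000, 1800)
	fmt.Println(l.ExpiresAt)           // 1700001800
	fmt.Println(l.expired(1700000600)) // false: 10 minutes in
	fmt.Println(l.expired(1700001800)) // true: TTL reached
}
```

With Redis, the same effect could also be had by setting an expiry on the lock key itself, so a forgotten lock disappears without any application-side check.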

6. Slack Integration

Slack alerts keep handoffs smooth by telling the next person in line as soon as it is their turn.

[Figure: Slack bot]

Notifications are enhanced with Block Kit:

blocks := []BlockElement{
  TextBlock(fmt.Sprintf(
    "You are now first in the test queue! Reason: *%s*", entry.Reason)),
  Button("Take Lock", lockURL),
}
SendSlackMessage(entry.SlackId, blocks)
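
The helpers above (`TextBlock`, `Button`, `SendSlackMessage`) are not shown in the article. As a rough guide to what they might produce, here is a sketch of the Block Kit JSON for the same message: a `section` block with `mrkdwn` text and an `actions` block holding a link button. The struct and function names are hypothetical, and the lock URL is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// textObject is a Block Kit text object ("mrkdwn" or "plain_text").
type textObject struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// element is a Block Kit interactive element; only buttons are used here.
type element struct {
	Type string     `json:"type"`
	Text textObject `json:"text"`
	URL  string     `json:"url,omitempty"`
}

// block is a Block Kit layout block ("section" or "actions" in this sketch).
type block struct {
	Type     string      `json:"type"`
	Text     *textObject `json:"text,omitempty"`
	Elements []element   `json:"elements,omitempty"`
}

// turnBlocks builds the "your turn" message: a section with the recorded
// reason, followed by a button that links to the lock page.
func turnBlocks(reason, lockURL string) []block {
	msg := fmt.Sprintf("You are now first in the test queue! Reason: *%s*", reason)
	return []block{
		{Type: "section", Text: &textObject{Type: "mrkdwn", Text: msg}},
		{Type: "actions", Elements: []element{{
			Type: "button",
			Text: textObject{Type: "plain_text", Text: "Take Lock"},
			URL:  lockURL,
		}}},
	}
}

// blocksJSON encodes the blocks as they would appear in the "blocks" field
// of a Slack message payload.
func blocksJSON(reason, lockURL string) (string, error) {
	b, err := json.Marshal(turnBlocks(reason, lockURL))
	return string(b), err
}

func main() {
	s, err := blocksJSON("debugging a flaky test", "https://example.com/lock")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```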

Retries and exponential back‑off are handled via a wrapper:

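// Retry up to three times, doubling the delay after each failure (1s, then 2s).
var err error
for i := 0; i < 3; i++ {
  if err = sendToSlackAPI(payload); err == nil {
    break
  }
  if i < 2 {
    time.Sleep(time.Duration(1<<i) * time.Second) // no sleep after the final attempt
  }
}
if err != nil {
  log.Printf("slack notification failed after 3 attempts: %v", err)
}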
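
The same retry logic can be pulled into a reusable helper that returns the final error and takes the sleep function as a parameter, which keeps tests fast. This is a sketch under assumed names (`retry`, `backoffDelay`, `noSleep`, `errTemporary`); none of them come from the article.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTemporary stands in for a transient Slack API failure.
var errTemporary = errors.New("temporary failure")

// noSleep is a drop-in sleep function for tests that should not wait.
var noSleep = func(time.Duration) {}

// backoffDelay returns base * 2^attempt for a zero-based attempt number.
func backoffDelay(base time.Duration, attempt int) time.Duration {
	return base << uint(attempt)
}

// retry calls fn up to attempts times (attempts must be at least 1),
// sleeping with exponential back-off between failed attempts. It returns
// nil on the first success, or the last error wrapped with context.
func retry(attempts int, base time.Duration, sleep func(time.Duration), fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 { // no point sleeping after the final attempt
			sleep(backoffDelay(base, i))
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(3, time.Millisecond, time.Sleep, func() error {
		calls++
		if calls < 3 {
			return errTemporary
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```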

7. Frontend Integration

A single button toggles lock intent:

const handleToggle = async () => {
  try {
    const res = await fetch('/api/toggle-lock', { method: 'POST' });
    // res.ok covers any 2xx status, not only 200
    if (res.ok) refreshQueue();
  } catch (err) {
    console.error('toggle-lock request failed', err);
  }
};

UX states:

  • Not in queue → Join
  • First in queue → Take lock
  • Has lock → Release and notify next
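
The three states above map naturally onto a small pure function that the backend could expose to the button. This is a sketch with assumed names; the `"wait"` result for a developer who is queued but not yet first is an addition for completeness and is not listed in the article.

```go
package main

import "fmt"

// nextAction decides which button the UI should show for user, given the
// queue (emails in order) and the email of the current lock holder
// (empty when the environment is unlocked).
func nextAction(queue []string, lockHolder, user string) string {
	if lockHolder != "" && lockHolder == user {
		return "release" // has lock: release and notify next
	}
	for i, email := range queue {
		if email != user {
			continue
		}
		if i == 0 && lockHolder == "" {
			return "take_lock" // first in queue and the environment is free
		}
		return "wait" // assumed state: queued, but not yet eligible
	}
	return "join" // not in queue
}

func main() {
	queue := []string{"a@example.com", "b@example.com"}
	fmt.Println(nextAction(queue, "", "c@example.com"))              // join
	fmt.Println(nextAction(queue, "", "a@example.com"))              // take_lock
	fmt.Println(nextAction(queue, "", "b@example.com"))              // wait
	fmt.Println(nextAction(queue, "a@example.com", "a@example.com")) // release
}
```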

Real‑time updates are supported via Redis pub/sub.

8. Testing: Critical for Internal Developer Platforms

Thorough testing is essential for internal developer platforms for several reasons:

  • Trust and Adoption: Developers will only use tools they can rely on. A flaky queue management system will be abandoned quickly.
  • Failure Amplification: A bug in a developer platform affects the productivity of entire engineering teams, not just individual users.
  • Complex State Transitions: Queue management involves multiple state transitions that must be tested rigorously to prevent deadlocks or lost queue positions.
  • Distributed Consistency: Ensuring consistent state between Redis, the application, and Slack notifications requires robust integration testing.
  • Error Recovery: Systems must be resilient to network issues, Redis failures, or Slack API outages.

Unit + integration tests:

func TestLockTransition(t *testing.T) {
  // Test Redis LRem, HSet, LIndex chain
  // Validate Slack calls were triggered
}

func TestQueueConcurrency(t *testing.T) {
  // Validate queue integrity with concurrent join/leave operations
}

func TestSlackNotificationRetries(t *testing.T) {
  // Ensure backoff strategy properly handles temporary Slack outages
}
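
To make the concurrency placeholder more concrete, here is a sketch of the kind of check `TestQueueConcurrency` might perform. It uses an in-memory, mutex-guarded queue as a stand-in for Redis, so it exercises the invariant (no duplicates, no lost entries) rather than the real storage layer. The `memQueue` type and `hammer` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// memQueue is an in-memory stand-in for the Redis list, guarded by a mutex.
type memQueue struct {
	mu      sync.Mutex
	entries []string
}

// join appends email unless it is already queued; it reports whether the
// entry was added.
func (q *memQueue) join(email string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, e := range q.entries {
		if e == email {
			return false
		}
	}
	q.entries = append(q.entries, email)
	return true
}

// leave removes email from the queue; it reports whether it was present.
func (q *memQueue) leave(email string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	for i, e := range q.entries {
		if e == email {
			q.entries = append(q.entries[:i], q.entries[i+1:]...)
			return true
		}
	}
	return false
}

// size returns the current queue length.
func (q *memQueue) size() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.entries)
}

// hammer starts workers goroutines that all try to join the same set of
// users, and returns how many joins succeeded in total.
func hammer(q *memQueue, workers, users int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	added := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := 0; u < users; u++ {
				if q.join(fmt.Sprintf("user%d@example.com", u)) {
					mu.Lock()
					added++
					mu.Unlock()
				}
			}
		}()
	}
	wg.Wait()
	return added
}

func main() {
	q := &memQueue{}
	// Eight workers race to add the same 20 users; each user must land once.
	fmt.Println(hammer(q, 8, 20), q.size()) // 20 20
}
```

Against real Redis, the same assertions would apply to the list contents after the concurrent calls complete.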

Conclusion

This setup demonstrates how simple components like Redis, Slack, and Go can come together to orchestrate shared environment access with minimal friction. The queue model introduces fairness and predictability to what was once a chaotic workflow. If your developers are wrestling for staging access, it might be time to queue up—with grace.

Top comments (1)

Wrishav_OSS

What happens if the queue grows too long? Is the queue unbounded?