1. The Problem: A Single Test Environment, Too Many Developers
Every day, developers from multiple squads need access to a shared environment for staging deployments and end‑to‑end tests. Conflicts are routine. People hop into the environment mid‑session or, worse, forget to release it.
Imagine this: Developer A is debugging a flaky integration test. Mid‑way, Developer B deploys a different branch, unaware that the environment is in use. Chaos ensues. Slack threads grow long. Tempers rise. No one's to blame, but everyone's affected.
This scenario calls for:
- A way to reserve the environment
- Visibility into who's using it
- Notifications for the next person in line
2. The Idea: A Self‑Service Queue with Visibility
This prototype envisions a lean and elegant system:
- Developers click to join the queue
- When their turn comes, they get notified on Slack
- They can lock the environment with a single click
- A reason is recorded to inform others of usage
No separate access control, no complicated UIs—just a small, robust system that integrates into developer tools and Slack.
3. System Architecture
The architecture comprises:
- Twirp‑powered Go backend – lightweight RPCs over HTTP
- Redis – queue state and locking with TTL
- Slack – direct user notifications
- Frontend – embedded in internal dashboards
Redis stores per‑application queues as lists (e.g., queue:env:shared) and lock state as a hash (e.g., lock:env:shared). Queue position, TTLs, and Slack IDs are all stored as structured JSON entries.
4. Protobuf Definitions
Interfaces are defined via buf.gen.sh:
message QueueEntry {
  string user_email = 1;
  string user_name = 2;
  string reason = 3;
  string slack_id = 4;
  int64 timestamp = 5;
}
message JoinQueueRequest {
  string app_name = 1;
  QueueEntry entry = 2;
}
message LeaveQueueRequest {
  string app_name = 1;
  string user_email = 2;
}
message LockState {
  string user_email = 1;
  string reason = 2;
  int64 timestamp = 3;
  int64 expires_at = 4;
}
// The response messages and GetQueueStatusRequest are omitted from this excerpt.
service Deployments {
  rpc JoinQueue(JoinQueueRequest) returns (JoinQueueResponse);
  rpc LeaveQueue(LeaveQueueRequest) returns (LeaveQueueResponse);
  rpc GetQueueStatus(GetQueueStatusRequest) returns (GetQueueStatusResponse);
}
5. Go Implementation Highlights
The backend implements Twirp RPCs and Redis atomic operations.
Joining the Queue
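// Serialize the entry first so a marshalling failure never touches Redis.
val, err := json.Marshal(req.Entry)
if err != nil {
	return nil, err
}
// Read the queue and reject duplicates before pushing. Pushing in the same
// pipeline as the read would store the duplicate before it is detected.
entries, err := s.redis.LRange(ctx, queueKey, 0, -1).Result()
if err != nil {
	return nil, err
}
for _, item := range entries {
	var existing deployment.QueueEntry
	if err := json.Unmarshal([]byte(item), &existing); err != nil {
		continue // skip malformed entries
	}
	// Compare the decoded email; a substring match on raw JSON is fragile.
	if existing.UserEmail == req.Entry.UserEmail {
		return nil, errors.New("already in queue")
	}
}
// Note: check-then-push is not atomic. Two concurrent joins by the same user
// can still race; WATCH or a Lua script would close that gap.
if err := s.redis.RPush(ctx, queueKey, val).Err(); err != nil {
	return nil, err
}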
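A substring match on the raw JSON can report false duplicates: ann@example.com is contained in joann@example.com, and an address mentioned in another entry's reason would match as well. One way to isolate the duplicate check is to decode each entry and compare the email field exactly. The sketch below assumes entries are JSON objects with a user_email field; the names queuedUser and alreadyQueued are hypothetical.

```go
package main

import "encoding/json"

// queuedUser decodes only the field needed for the duplicate check.
// The "user_email" JSON name is an assumption about the stored format.
type queuedUser struct {
	UserEmail string `json:"user_email"`
}

// alreadyQueued reports whether any JSON entry belongs to email.
// Entries that fail to decode are skipped rather than treated as matches.
func alreadyQueued(entries []string, email string) bool {
	for _, raw := range entries {
		var u queuedUser
		if err := json.Unmarshal([]byte(raw), &u); err != nil {
			continue
		}
		if u.UserEmail == email {
			return true
		}
	}
	return false
}
```

Because the helper takes plain strings, it can be unit tested without a Redis instance.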
Promoting Lock on Leave
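pipe := s.redis.TxPipeline()
pipe.LRem(ctx, queueKey, 0, leavingEntry)
next := pipe.LIndex(ctx, queueKey, 0)
// Exec returns ([]Cmder, error), so its result cannot be indexed directly.
// LIndex on an empty list yields redis.Nil, which is not a failure here.
if _, err := pipe.Exec(ctx); err != nil && err != redis.Nil {
	return nil, err
}
if nextEntryJSON := next.Val(); nextEntryJSON != "" {
	var nextEntry deployment.QueueEntry
	if err := json.Unmarshal([]byte(nextEntryJSON), &nextEntry); err != nil {
		return nil, err
	}
	// Promote the next user to lock holder, then notify them.
	err := s.redis.HSet(ctx, lockKey, map[string]interface{}{
		"user_email": nextEntry.UserEmail,
		"reason":     nextEntry.Reason,
		"timestamp":  time.Now().Unix(),
	}).Err()
	if err != nil {
		return nil, err
	}
	notifySlack(nextEntry)
}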
6. Slack Integration
Slack alerts are essential to ensure seamless transitions.
Notifications are enhanced with Block Kit:
blocks := []BlockElement{
	TextBlock(fmt.Sprintf(
		"You are now first in the test queue! Reason: *%s*", entry.Reason)),
	Button("Take Lock", lockURL),
}
SendSlackMessage(entry.SlackId, blocks)
Retries and exponential back‑off are handled via a wrapper:
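var err error
for i := 0; i < 3; i++ {
	if err = sendToSlackAPI(payload); err == nil {
		break
	}
	if i < 2 {
		// Back off 1s, then 2s; no sleep after the final attempt.
		time.Sleep(time.Duration(1<<i) * time.Second)
	}
}
// err is non-nil here only if all three attempts failed.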
7. Frontend Integration
A single button toggles lock intent:
const handleToggle = async () => {
  const res = await fetch(`/api/toggle-lock`, { method: 'POST' });
  if (res.status === 200) refreshQueue();
};
UX states:
- Not in queue → Join
- First in queue → Take lock
- Has lock → Release and notify next
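The button states listed above can be derived from two pieces of data: the current lock holder and the ordered queue. The sketch below shows one possible server-side derivation. The names UIState and buttonState are hypothetical, and the extra "Waiting" state for users who are queued but cannot take the lock yet is an assumption that the article does not list.

```go
package main

// UIState is the label the toggle button shows for one user.
type UIState string

const (
	StateJoin     UIState = "Join"
	StateWaiting  UIState = "Waiting" // assumed state, not listed in the article
	StateTakeLock UIState = "Take lock"
	StateRelease  UIState = "Release and notify next"
)

// buttonState derives the button label for user from the current lock holder
// (empty string when unlocked) and the queue's emails in order.
func buttonState(user, lockHolder string, queue []string) UIState {
	if lockHolder != "" && user == lockHolder {
		return StateRelease
	}
	for i, email := range queue {
		if email != user {
			continue
		}
		// Only the head of the queue may take the lock, and only when free.
		if i == 0 && lockHolder == "" {
			return StateTakeLock
		}
		return StateWaiting
	}
	return StateJoin
}
```

Computing the state in one function keeps the frontend a thin renderer of whatever the backend reports.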
Real‑time updates are supported via Redis pub/sub.
8. Testing: Critical for Internal Developer Platforms
Thorough testing is essential for internal developer platforms for several reasons:
- Trust and Adoption: Developers will only use tools they can rely on. A flaky queue management system will be abandoned quickly.
- Failure Amplification: A bug in a developer platform affects the productivity of entire engineering teams, not just individual users.
- Complex State Transitions: Queue management involves multiple state transitions that must be tested rigorously to prevent deadlocks or lost queue positions.
- Distributed Consistency: Ensuring consistent state between Redis, the application, and Slack notifications requires robust integration testing.
- Error Recovery: Systems must be resilient to network issues, Redis failures, or Slack API outages.
Unit + integration tests:
func TestLockTransition(t *testing.T) {
	// Test Redis LRem, HSet, LIndex chain
	// Validate Slack calls were triggered
}
func TestQueueConcurrency(t *testing.T) {
	// Validate queue integrity with concurrent join/leave operations
}
func TestSlackNotificationRetries(t *testing.T) {
	// Ensure backoff strategy properly handles temporary Slack outages
}
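As an illustration of what TestQueueConcurrency might assert, the sketch below models the queue with an in-memory stand-in and fires many concurrent joins per user. The invariant is that each user ends up in the queue exactly once. Both memQueue and joinConcurrently are hypothetical names; the stand-in models the invariant only and says nothing about how the Redis-backed store behaves under the same load.

```go
package main

import (
	"errors"
	"sync"
)

var errAlreadyQueued = errors.New("already in queue")

// memQueue is an in-memory stand-in for the Redis list, guarded by a mutex.
type memQueue struct {
	mu     sync.Mutex
	emails []string
}

// Join appends email unless it is already queued.
func (q *memQueue) Join(email string) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, e := range q.emails {
		if e == email {
			return errAlreadyQueued
		}
	}
	q.emails = append(q.emails, email)
	return nil
}

// Len returns the current queue length.
func (q *memQueue) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.emails)
}

// joinConcurrently starts perUser goroutines for every user, each attempting
// a join, and returns how many joins succeeded.
func joinConcurrently(q *memQueue, users []string, perUser int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	succeeded := 0
	for _, u := range users {
		for i := 0; i < perUser; i++ {
			wg.Add(1)
			go func(email string) {
				defer wg.Done()
				if q.Join(email) == nil {
					mu.Lock()
					succeeded++
					mu.Unlock()
				}
			}(u)
		}
	}
	wg.Wait()
	return succeeded
}
```

A real test would run the same scenario against the Redis-backed implementation, ideally with the race detector enabled via go test -race.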
Conclusion
This setup demonstrates how simple components like Redis, Slack, and Go can come together to orchestrate shared environment access with minimal friction. The queue model introduces fairness and predictability to what was once a chaotic workflow. If your developers are wrestling for staging access, it might be time to queue up—with grace.