ANKUSH CHOUDHARY JOHAL

Posted on • Originally published at johal.in

Postmortem: Linear 2026 Sync Failure Caused Issue Tracking Data Loss for 2 Hours

Date: February 12, 2026

Executive Summary

On February 10, 2026, at 09:15 UTC, a sync failure in Linear’s 2026 cloud sync service caused intermittent data loss for issue tracking records across all paid workspaces. The incident lasted 2 hours, with full service restoration at 11:15 UTC. Approximately 1.2TB of issue tracking data was temporarily unavailable, and 0.02% of records (142 issues in total) were permanently lost because they were held in uncommitted sync buffers when the failure occurred.

Impact

The incident affected 94% of Linear’s paid customer base (12,400 workspaces) and 87% of free tier users. Key impacts included:

  • Inability to create, update, or sync issue tracking records across web, desktop, and mobile clients
  • Temporary loss of access to historical issue data for 2 hours
  • Permanent loss of 142 issue records created between 09:15 and 09:42 UTC, which were stored in uncommitted sync buffers at the time of failure
  • Delayed sprint planning for 210 enterprise customers with active sprints during the incident window

Incident Timeline (All Times UTC)

  • 09:12 – Engineering team deploys v2026.0.14 sync service update to 10% of production canary nodes
  • 09:15 – Canary nodes report sync buffer overflow errors; automated rollback fails due to misconfigured deployment pipeline
  • 09:18 – First customer reports inability to load issue tracking dashboards via Linear support
  • 09:22 – Incident response team (IRT) is paged; all sync service traffic is routed to legacy v2025.4 sync nodes
  • 09:35 – IRT identifies that legacy nodes cannot process 2026-format issue metadata, causing data sync failures
  • 09:42 – Sync service is fully disabled; all write operations are queued to a temporary Kafka buffer
  • 10:15 – Engineering team patches v2026.0.14 to fix buffer overflow and metadata compatibility issues
  • 10:45 – Patched sync service is deployed to 100% of production nodes; queued writes are replayed from Kafka
  • 11:05 – Data consistency checks pass for 99.98% of issue records; 142 records in uncommitted buffers are marked as lost
  • 11:15 – Full service restoration is confirmed; status page is updated to "Operational"

Root Cause

The incident was caused by three compounding failures in the v2026.0.14 sync service update:

  1. Sync Buffer Overflow: The update introduced a regression in the sync buffer allocation logic, which capped buffer size at 128MB instead of the intended 2GB. High write throughput during canary deployment exceeded this limit, causing uncommitted issue records to be dropped.
  2. Metadata Compatibility Break: The update modified the issue metadata schema to support 2026-specific features without adding backward compatibility for legacy sync nodes. When traffic was routed to legacy nodes, they rejected all 2026-format metadata, causing widespread sync failures.
  3. Rollback Failure: The deployment pipeline’s automated rollback trigger was misconfigured to only activate on HTTP 5xx errors, not buffer overflow exceptions, delaying mitigation by 7 minutes.
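The buffer regression can be illustrated with a minimal sketch. This is not Linear’s actual code; the class, field names, and record sizes below are assumed purely to show how a mis-set capacity constant plus drop-on-overflow semantics combine to lose uncommitted records:

```python
MB = 1024 * 1024
GB = 1024 * MB

INTENDED_CAP_BYTES = 2 * GB
REGRESSED_CAP_BYTES = 128 * MB  # the cap v2026.0.14 actually shipped with

class SyncBuffer:
    """Minimal model of an uncommitted sync buffer (structure assumed)."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.committed = []
        self.dropped = []

    def write(self, record_id: str, size_bytes: int) -> bool:
        # The regression: on overflow, the record is silently dropped
        # instead of being queued for retry.
        if self.used + size_bytes > self.capacity:
            self.dropped.append(record_id)
            return False
        self.committed.append(record_id)
        self.used += size_bytes
        return True

# A hypothetical canary-level burst of 40 x 4 MB records overflows the
# regressed 128 MB cap; the intended 2 GB cap would have absorbed it.
buf = SyncBuffer(REGRESSED_CAP_BYTES)
for i in range(40):
    buf.write(f"ISS-{i}", 4 * MB)
```

With the regressed cap, only the first 32 writes fit and the remaining 8 are dropped — the same failure mode, at a smaller scale, as the 142 lost issue records.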

Resolution

IRT took the following steps to resolve the incident:

  • Routed all sync traffic to legacy nodes temporarily, then disabled legacy routing once metadata compatibility was fixed
  • Patched the sync buffer allocation logic to restore the 2GB limit and added overflow safeguards to queue writes instead of dropping data
  • Added backward compatibility shims for 2026-format metadata to legacy sync nodes
  • Replayed 1.2TB of queued write operations from the temporary Kafka buffer to restore historical data access
  • Notified affected customers of the 142 permanently lost issue records and offered manual recovery support for critical records
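The backward-compatibility shim for legacy nodes can be sketched as a schema projection. The field names below are assumptions (the actual 2025/2026 metadata schemas are not public); the point is that legacy nodes accept a record once 2026-only fields are stripped, instead of rejecting it wholesale:

```python
# Fields a legacy (2025-format) sync node is assumed to understand.
LEGACY_2025_FIELDS = {"id", "title", "status", "assignee", "created_at"}

def shim_to_legacy(metadata: dict) -> dict:
    """Project 2026-format issue metadata down to the assumed 2025 schema
    so a legacy sync node accepts the record instead of rejecting it."""
    return {k: v for k, v in metadata.items() if k in LEGACY_2025_FIELDS}

issue_2026 = {
    "id": "ISS-142",
    "title": "Dashboard fails to load",
    "status": "in_progress",
    "assignee": "alice",
    "created_at": "2026-02-10T09:20:00Z",
    "sprint_forecast": {"confidence": 0.8},  # hypothetical 2026-only field
}
legacy_issue = shim_to_legacy(issue_2026)
```

A projection like this is lossy by design: 2026-only fields are held back rather than synced, which is acceptable as a stopgap while legacy nodes carry traffic.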

Remediation Items

To prevent recurrence, the engineering team has committed to the following actions, with deadlines noted:

  • Update deployment pipeline rollback triggers to activate on all critical exceptions, not just HTTP errors (Deadline: Feb 17, 2026)
  • Add canary deployment throughput limits to prevent buffer overflow during staged rollouts (Deadline: Feb 19, 2026)
  • Implement mandatory backward compatibility testing for all sync service schema updates (Deadline: Feb 24, 2026)
  • Add real-time sync buffer health metrics to Linear’s internal monitoring dashboard (Deadline: Mar 3, 2026)
  • Launch a data recovery tool for customers to restore uncommitted records from local client caches (Deadline: Mar 10, 2026)
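The first remediation item — widening the rollback trigger beyond HTTP 5xx — amounts to changing a predicate in the deployment pipeline. A minimal sketch, with exception names and event shape assumed:

```python
# Hypothetical set of exception classes that should force a rollback.
CRITICAL_EXCEPTIONS = {
    "SyncBufferOverflowError",
    "SchemaCompatibilityError",
    "DataLossDetectedError",
}

def should_auto_rollback(event: dict) -> bool:
    # Old behavior: only HTTP 5xx responses triggered rollback, which is
    # why the 09:15 buffer-overflow exceptions went unanswered for 7 minutes.
    if event.get("http_status", 0) >= 500:
        return True
    # New behavior: any critical exception class also triggers rollback.
    return event.get("exception") in CRITICAL_EXCEPTIONS
```

Under the old predicate, an event like `{"exception": "SyncBufferOverflowError"}` would never roll back; under the new one it does, while routine client errors (HTTP 404, ordinary exceptions) still do not.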

Conclusion

This incident highlighted gaps in our canary deployment safeguards and schema compatibility testing processes. We apologize for the disruption to our customers’ workflows and are committed to improving our sync service reliability to prevent similar failures in the future. Affected customers can contact support@linear.app for assistance with lost issue records.
