Modern graph databases often represent dynamic systems: applications evolving over time, relationships appearing and disappearing, and entities acquiring new attributes as data changes.
When the underlying graph is user-facing, maintaining a complete history of nodes and relationships becomes a critical capability.
This article presents a production-grade, bitemporal versioning model for Neo4j, supporting:
- Accurate historical reconstruction
- Time-travel queries
- Temporal relationship tracking
- Efficient ingestion
- Minimal impact on existing “current” queries
The approach is designed for high-read systems where graph state changes incrementally and users must view data at any point in time.
1. Design Goals
A temporal graph versioning system must satisfy the following constraints:
1.1 Minimal disruption to existing queries
Everyday queries (fetching the “current” graph) must remain simple:
MATCH (n) WHERE NOT n:Deleted RETURN n
MATCH ()-[r]->() WHERE r.Status = "Active" RETURN r
Most queries should need no temporal logic at all.
1.2 Complete bitemporal representation
Every node or relationship must encode:
- StartDate: when it became valid
- EndDate: when it stopped being valid (NULL = current)
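This enables time-travel queries and historical reconstruction. Note that both fields are set from the ingestion's process date, so together they record when the pipeline observed each change.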
1.3 Deterministic version merging
Each node and relationship must have a stable primary key so the ingestion pipeline can decide:
- Should this entity be created?
- Should it be updated?
- Should old versions be closed?
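The three questions above reduce to a small decision function. The following is a minimal sketch rather than the article's actual pipeline code; the `Action` type, its constants, and the `decide` function are hypothetical names introduced for illustration. It assumes the pipeline already knows whether an open (non-closed) version exists for the key and whether the incoming attributes differ from it.

```go
package main

import "fmt"

// Action is the ingestion decision for one incoming entity (hypothetical type).
type Action int

const (
	Create     Action = iota // no open version exists: create the first one
	Touch                    // open version is unchanged: only refresh lastUpdated
	NewVersion               // attributes differ: close the open version, create a new one
)

// String returns a readable name for logging.
func (a Action) String() string {
	return [...]string{"Create", "Touch", "NewVersion"}[a]
}

// decide maps the two facts known about a primary key to an action.
// hasCurrent: an open version with this Id exists.
// changed: the incoming attributes differ from that open version.
func decide(hasCurrent, changed bool) Action {
	switch {
	case !hasCurrent:
		return Create
	case changed:
		return NewVersion
	default:
		return Touch
	}
}

func main() {
	fmt.Println(decide(false, false))
	fmt.Println(decide(true, false))
	fmt.Println(decide(true, true))
}
```

Because the decision depends only on the stable key and a change flag, the same input always yields the same action, which is what makes the merge deterministic.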
1.4 Efficient deletion detection
We cannot “blindly” delete nodes. Instead, the pipeline must:
- Mark entities touched in this ingestion cycle (via lastUpdated)
- Infer deletions by comparing against the process date
1.5 Neo4j MERGE limitations must be respected
Neo4j rejects a MERGE pattern that contains a null property value, so this statement fails:
MERGE (a)-[r:LINK {EndDate: NULL}]->(b)
This is why relationships use a Status property rather than attempting NULL-based merges.
2. Data Model
2.1 Versioned Nodes
Each logical entity is represented as multiple immutable node versions:
(:Entity {
  Id: "E123",
  StartDate: datetime("2024-01-10T00:00:00Z"),
  EndDate: null,
  lastUpdated: datetime("2024-12-01T10:00:00Z")
})
When a node becomes invalid:
- EndDate is set
- :Deleted label is added
ASCII Diagram
+------------------+ +------------------+
| Entity (v1) | ----> | Entity (v2) |
| Id: E123 | | Id: E123 |
| Start: T1 | | Start: T2 |
| End: T2 | | End: null |
| Label: Deleted | | Label: <none> |
+------------------+ +------------------+
2.2 Versioned Relationships
Like nodes, relationships also maintain temporal state:
(a)-[:LINK {
  Id: "R987",
  StartDate: datetime("2024-01-10T00:00:00Z"),
  EndDate: null,
  Status: "Active",
  lastUpdated: datetime("2024-12-01T10:00:00Z")
}]->(b)
Why we need Status
Neo4j cannot MERGE on EndDate = NULL, so we use:
- Status = "Active"
- Status = "Deleted"
This provides a safe, deterministic merge target.
3. Ingestion Architecture (Multi-Phase)
The ingestion pipeline runs in three phases, in order, so that versioning stays consistent.
+-----------------------------------------------------+
| Ingestion Pipeline |
+-----------------------------------------------------+
| |
| Phase 1: Nodes → Create or update nodes |
| Phase 2: Links → Create or update relationships|
| Phase 3: Clean-up → Close missing versions |
| |
+-----------------------------------------------------+
3.1 Phase 1 — Node Ingestion
For each incoming node:
- MERGE by Id
- If node exists and attributes differ → close old version, create new
- Update lastUpdated = processTime
Cypher (simplified)
MERGE (n:Entity {Id: $id})
  ON MATCH SET
    n.lastUpdated = $processDate
  ON CREATE SET
    n.StartDate = $processDate,
    n.lastUpdated = $processDate
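MERGE on Id alone also matches versions that are already closed, so change detection has to target the open version explicitly. When it finds that attributes differ, the ingestion process:
- Sets EndDate on the previous version
- Adds :Deleted to it
- Creates a fresh version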
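The article does not say how "attributes differ" is detected. One possible approach, sketched here as an assumption rather than the author's method, is to store a fingerprint of the attribute map on each version and compare it with the fingerprint of the incoming attributes. The `fingerprint` and `changed` functions and the idea of a stored fingerprint property are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// fingerprint returns a stable hash of an attribute map. encoding/json writes
// map keys in sorted order, so the result does not depend on insertion order.
func fingerprint(props map[string]interface{}) (string, error) {
	b, err := json.Marshal(props)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// changed reports whether the incoming attributes differ from the fingerprint
// stored on the open version (the stored value is a hypothetical property).
func changed(stored string, incoming map[string]interface{}) (bool, error) {
	fp, err := fingerprint(incoming)
	if err != nil {
		return false, err
	}
	return fp != stored, nil
}

func main() {
	stored, err := fingerprint(map[string]interface{}{"name": "billing", "tier": 2})
	if err != nil {
		panic(err)
	}

	same, _ := changed(stored, map[string]interface{}{"tier": 2, "name": "billing"})
	diff, _ := changed(stored, map[string]interface{}{"tier": 3, "name": "billing"})
	fmt.Println("reordered keys changed:", same)
	fmt.Println("new tier changed:", diff)
}
```

A single string comparison per entity keeps the check cheap, at the cost of not knowing which attribute changed.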
3.2 Phase 2 — Relationship Ingestion
For each incoming relationship:
MATCH (a:Entity {Id: $src})
MATCH (b:Entity {Id: $dst})
MERGE (a)-[r:LINK {Id: $id}]->(b)
  ON MATCH SET
    r.lastUpdated = $processDate
  ON CREATE SET
    r.StartDate = $processDate,
    r.Status = "Active",
    r.lastUpdated = $processDate
If a relationship changed (attribute changes), the pipeline must:
- Mark the old relationship as:
  r.EndDate = $processDate
  r.Status = "Deleted"
- Create a new version with:
  StartDate = $processDate
  Status = "Active"
3.3 Phase 3 — Version Closure (Deletion Detection)
After phases 1 and 2 have completed successfully, deletions can be detected.
Any open node whose lastUpdated differs from processDate was absent from this cycle's input and is no longer valid:
MATCH (n:Entity)
WHERE n.lastUpdated <> $processDate AND NOT n:Deleted
SET n.EndDate = $processDate, n:Deleted
Same for relationships:
MATCH ()-[r:LINK]->()
WHERE r.lastUpdated <> $processDate AND r.Status = "Active"
SET r.EndDate = $processDate, r.Status = "Deleted"
This allows ingestion to determine “missing = deleted” without manual intervention, provided each cycle ingests the complete source data set rather than a partial delta.
4. Querying the Current Graph
Your versioning design enables extremely simple “current state” queries:
Nodes
MATCH (n:Entity)
WHERE NOT n:Deleted
RETURN n
Relationships
MATCH (a)-[r:LINK]->(b)
WHERE r.Status = "Active"
RETURN a, r, b
The result is minimal query logic, high performance, and clean integration with the UI/API.
5. Querying Historical Snapshots
To reconstruct graph state for a given timestamp T:
Nodes
MATCH (n:Entity)
WHERE n.StartDate <= $T AND (n.EndDate IS NULL OR n.EndDate > $T)
RETURN n
Relationships
MATCH (a)-[r:LINK]->(b)
WHERE r.StartDate <= $T AND (r.EndDate IS NULL OR r.EndDate > $T)
RETURN a, r, b
This produces an accurate, complete view of the graph at time T.
6. Go + Neo4j Driver Pseudo-code
Below is idiomatic Go pseudocode demonstrating versioned ingestion logic.
6.1 Creating/Updating a Node
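// Assumes the v4 Go driver ("github.com/neo4j/neo4j-go-driver/v4/neo4j")
// and a package-level `driver neo4j.Driver` created elsewhere.
func ingestNode(id string, props map[string]interface{}, processDate time.Time) error {
	session := driver.NewSession(neo4j.SessionConfig{AccessMode: neo4j.AccessModeWrite})
	defer session.Close()

	_, err := session.WriteTransaction(func(tx neo4j.Transaction) (interface{}, error) {
		params := map[string]interface{}{
			"id":          id,
			"processDate": processDate,
			"props":       props,
		}
		// Simplified: touches an existing node or creates the first version.
		// Change detection and version closure are not shown here.
		query := `
			MERGE (n:Entity {Id: $id})
			ON MATCH SET
				n.lastUpdated = $processDate
			ON CREATE SET
				n.StartDate = $processDate,
				n.lastUpdated = $processDate,
				n += $props
		`
		result, err := tx.Run(query, params)
		if err != nil {
			return nil, err
		}
		// Consume the result inside the transaction function.
		return result.Consume()
	})
	// Return the error to the caller instead of terminating the process.
	return err
}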
6.2 Closing Stale Nodes
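// Assumes the v4 Go driver ("github.com/neo4j/neo4j-go-driver/v4/neo4j")
// and a package-level `driver neo4j.Driver` created elsewhere.
func closeStaleNodes(processDate time.Time) error {
	session := driver.NewSession(neo4j.SessionConfig{AccessMode: neo4j.AccessModeWrite})
	defer session.Close()

	// Close every open node that was not touched in this cycle.
	result, err := session.Run(`
		MATCH (n:Entity)
		WHERE n.lastUpdated <> $processDate AND NOT n:Deleted
		SET n.EndDate = $processDate, n:Deleted
	`, map[string]interface{}{
		"processDate": processDate,
	})
	if err != nil {
		return err
	}
	// Consume drains the result so that execution errors surface here.
	_, err = result.Consume()
	return err
}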
7. Common Pitfalls & How This Model Solves Them
7.1 MERGE cannot match on NULL
Many developers attempt:
MERGE (a)-[r:LINK {EndDate: NULL}]->(b)
This does not work in Neo4j.
Solution:
Use Status for deterministic relationship merging.
7.2 Avoid overwriting nodes
You never update older versions.
Instead:
- Close old version (EndDate, :Deleted)
- Create new version
This preserves full history.
7.3 Efficient current-state filtering
Instead of comparing timestamps, we rely on:
- NOT n:Deleted
- r.Status = "Active"
The label check is a cheap label lookup, and the Status equality can be served by a relationship property index.
8. Performance Considerations
Indexes
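You should index:
Node: Entity(Id)
Rel: LINK(Id)
Rel: LINK(Status)
:Deleted is a label, not a property, so it needs no property index; Neo4j resolves label predicates through its built-in label lookup. Relationship property indexes require Neo4j 4.3 or later.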
Batching
Sending rows in batches, rather than running one transaction per entity, reduces round trips and transaction overhead substantially.
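A minimal sketch of the batching step follows. The `chunk` function is a hypothetical helper, and the batch size used in the example is an arbitrary assumption, not a tuned value. Each batch could then be passed as a single `$rows` parameter to a query that begins with `UNWIND $rows AS row`, so that one transaction handles many entities.

```go
package main

import "fmt"

// chunk splits rows into consecutive batches of at most size elements.
// A non-positive size yields a single batch holding all rows.
func chunk(rows []map[string]interface{}, size int) [][]map[string]interface{} {
	if size <= 0 {
		size = len(rows)
	}
	var batches [][]map[string]interface{}
	for start := 0; start < len(rows); start += size {
		end := start + size
		if end > len(rows) {
			end = len(rows)
		}
		batches = append(batches, rows[start:end])
	}
	return batches
}

func main() {
	// Build a few illustrative rows; real rows would carry Id and attributes.
	rows := make([]map[string]interface{}, 0, 5)
	for i := 0; i < 5; i++ {
		rows = append(rows, map[string]interface{}{"id": fmt.Sprintf("E%d", i)})
	}

	for i, batch := range chunk(rows, 2) {
		// In a real pipeline, each batch would become the $rows parameter
		// of one write transaction.
		fmt.Printf("batch %d: %d rows\n", i, len(batch))
	}
}
```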
Avoiding deep history scans
Historical reconstruction always uses date filtering, not traversal of version chains.
9. Summary of the Model
Nodes:
- Id
- StartDate
- EndDate
- lastUpdated
- :Deleted label
Relationships:
- Id
- StartDate
- EndDate
- Status ("Active"/"Deleted")
- lastUpdated
Ingestion phases:
1. Node ingest
2. Relationship ingest
3. Close stale versions
Key benefits:
- Clean, fast “current” queries
- Complete historical accuracy
- Deterministic version merging
- No risk of MERGE-on-NULL issues
- History queries that filter on dates rather than walking version chains
10. Conclusion
Temporal versioning in Neo4j is not just a schema change—it is an architectural decision that affects ingestion pipelines, storage models, and query semantics.
The strategy described above enables:
- Efficient ingestion without overwriting data
- Simple current-state queries
- Accurate time-travel analysis
- Clean separation of active vs. historical data
- A scalable, deterministic versioning model
This design supports both high-performance applications and advanced tooling such as diffing, history exploration, and lineage tracking.
If you are building any graph system where state changes matter, this approach provides a strong, production-grade foundation for temporal graph modeling.