SQL databases are grounded in relational algebra, but they are not the only databases with a strong theoretical basis. MongoDB, designed as a document database to solve practical engineering problems and improve developer experience, also builds on theory. It supports non-first-normal-form (¬1NF) schemas and extends relational operators to work with them. This foundation is described in a 1986 paper, published 23 years before MongoDB's first release: Theory of Non-First Normal Form Relational Databases
MongoDB's aggregation pipeline operators ($unwind, $group, $lookup, set operations) are concrete implementations of the paper's abstract algebraic operators.
I built the following examples while reading the paper, to illustrate its concepts in practice.
The Basic ¬1NF Data Model
The paper defines nested relations where attributes can be atomic or relation-valued:
This structure records facts about employees, such as their children, their skills, and the exams they passed.
This maps directly to a MongoDB document schema:
// This IS the paper's example, expressed as MongoDB documents
db.employees.insertMany([
{
ename: "Smith",
Children: [
{ name: "Sam", dob: "2/10/84" },
{ name: "Sue", dob: "1/20/85" }
],
Skills: [
{
type: "typing",
Exams: [
{ year: 1984, city: "Atlanta" },
{ year: 1985, city: "Dallas" }
]
},
{
type: "dictation",
Exams: [
{ year: 1984, city: "Atlanta" }
]
}
]
},
{
ename: "Watson",
Children: [
{ name: "Sam", dob: "3/12/78" }
],
Skills: [
{
type: "filing",
Exams: [
{ year: 1984, city: "Atlanta" },
{ year: 1975, city: "Austin" },
{ year: 1971, city: "Austin" }
]
},
{
type: "typing",
Exams: [
{ year: 1962, city: "Waco" }
]
}
]
}
])
MongoDB's document model is essentially what Roth called a "database scheme" — a collection of rules where attributes can be zero-order (scalar fields) or higher-order (embedded arrays of documents). The paper's Figure 3-1 is literally a MongoDB collection:
When using an object-oriented language, this model looks natural. But the paper is from 1986, so its motivation was different.
Motivation: ¬1NF as an alternative to 4NF decomposition
The paper's example (Figure 1-1) shows an employee relation in 1NF requires 10 rows with massive redundancy:
employee | child | skill
-----------------------------
Smith | Sam | typing
Smith | Sue | typing
Smith | Sam | filing
Smith | Sue | filing
Jones | Joe | typing
Jones | Mike | typing
Jones | Joe | dictation
Jones | Mike | dictation
Jones | Joe | data entry
Jones | Mike | data entry
The ¬1NF version needs only 2 tuples. The paper notes three problems with 1NF:
- Insert anomaly: Adding a child for Jones requires adding 3 tuples (one per skill)
- Update anomaly: Changing Smith's "typing" to "word processing" requires updating multiple rows
- Decomposition cost: The 1NF solution requires splitting into two tables and joining them back
Today, these anomalies are less of a problem because a single updateMany() or insertMany() operation in a MongoDB transaction is atomic, just like an UPDATE or INSERT in SQL databases. More importantly, the document model makes multi-document operations rare. The Cartesian explosion of redundant data, however, remains undesirable.
The first four normal forms (1NF, 2NF, 3NF, BCNF) do not address this, because there are no non-trivial functional dependencies in this relation: all three attributes together form the only candidate key. This relation is therefore already in BCNF. This is precisely why Fagin introduced 4NF in 1977.
This example illustrates a multivalued dependency (MVD) violation: children and skills are independent multivalued attributes of an employee, but they’re stored together. This creates a redundant Cartesian product (10 rows for 2 employees). Fourth Normal Form (4NF) addresses this by splitting the independent dependencies into separate tables to remove these anomalies, but at the cost of more tables touched per transaction and more joins per query.
Even a Third Normal Form (3NF) schema can still suffer from insert and update anomalies and data redundancy. MongoDB’s document model can offer many of 4NF’s benefits without incurring the join overhead.
Here is the MongoDB Equivalent:
// The ¬1NF representation — exactly what MongoDB does naturally
db.employees.insertMany([
{
employee: "Smith",
Children: [{ child: "Sam" }, { child: "Sue" }],
Skills: [{ skill: "typing" }, { skill: "filing" }]
},
{
employee: "Jones",
Children: [{ child: "Joe" }, { child: "Mike" }],
Skills: [
{ skill: "typing" },
{ skill: "dictation" },
{ skill: "data entry" }
]
}
])
Adding a new child for Jones is just one operation, no anomaly:
db.employees.updateOne(
{ employee: "Jones" },
{ $push: { Children: { child: "Sara" } } }
)
Changing Smith's "typing" to "word processing" — one operation:
db.employees.updateOne(
{ employee: "Smith", "Skills.skill": "typing" },
{ $set: { "Skills.$.skill": "word processing" } }
)
Querying everything about an employee doesn't need a join:
db.employees.findOne({ employee: "Smith" })
MongoDB exists for a fundamental reason: its document model eliminates the need for decomposition and joins, as well as the update anomalies introduced by the 1NF constraint. Most workloads rely on single-document operations, which MongoDB physically optimizes as single-shard transactions, a single disk read or write, and a single replication operation. These inserts and updates lock the document, which serves as the consistency boundary, but modify only the fields whose values change and update only the corresponding index entries.
Nest and Unnest Operators
The paper defines nest (ν) to aggregate flat rows into nested structures, and unnest (μ) to flatten nested structures back:
- ν_{B=(C,D)}(r): nest attributes C, D into a new nested relation B
- μ_{B}(r): unnest nested relation B
MongoDB's aggregation pipeline has direct equivalents.
The paper's unnest (μ) operator is $unwind:
db.employees.aggregate([
{ $unwind: "$Skills" }
])
Each skill becomes its own document, duplicating employee info.
Deep unnest (μ* in the paper) is possible with multiple $unwind stages:
db.employees.aggregate([
{ $unwind: "$Children" },
{ $unwind: "$Skills" }
])
This produces the full 1NF Cartesian expansion, flattening both Children and Skills.
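Outside the database, the semantics of μ can be sketched in a few lines of plain JavaScript (an illustration, not MongoDB code): emit one document per element of the nested array.

```javascript
// Sketch of the unnest (μ) operator over in-memory documents:
// emit one copy of the document per element of the nested array,
// replacing the array with that single element (like $unwind).
function unnest(docs, field) {
  return docs.flatMap(doc =>
    (doc[field] || []).map(elem => ({ ...doc, [field]: elem }))
  );
}

const employees = [{
  ename: "Smith",
  Children: [{ name: "Sam" }, { name: "Sue" }],
  Skills: [{ type: "typing" }, { type: "filing" }]
}];

// Single unnest: one document per child
console.log(unnest(employees, "Children").length); // 2
// Deep unnest (μ*): the full Cartesian expansion
console.log(unnest(unnest(employees, "Children"), "Skills").length); // 4
```

Applying the function twice, in either order, yields the same fully flat result, which is exactly the property demonstrated with the two $unwind pipelines below.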
When starting with a flat collection, the paper's nest operator (ν) is $group:
db.flat_employees.aggregate([
{
$group: {
_id: "$employee",
Children: { $addToSet: { child: "$child" } },
Skills: { $addToSet: { skill: "$skill" } }
}
}
])
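The semantics of ν can be sketched the same way in plain JavaScript (illustrative only): group rows by a key and collect the nested attributes with set semantics, as $addToSet does.

```javascript
// Sketch of the nest (ν) operator: group flat rows by a key and collect
// selected attributes into a nested relation, skipping duplicates
// (set semantics, like $addToSet).
function nest(rows, key, nestedName, pick) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row[key])) {
      groups.set(row[key], { [key]: row[key], [nestedName]: [] });
    }
    const nested = groups.get(row[key])[nestedName];
    const value = pick(row);
    if (!nested.some(v => JSON.stringify(v) === JSON.stringify(value))) {
      nested.push(value);
    }
  }
  return [...groups.values()];
}

// The paper's redundant 1NF rows for Smith collapse into one nested tuple:
const flat = [
  { employee: "Smith", child: "Sam", skill: "typing" },
  { employee: "Smith", child: "Sue", skill: "typing" },
  { employee: "Smith", child: "Sam", skill: "filing" },
  { employee: "Smith", child: "Sue", skill: "filing" }
];
const nested = nest(flat, "employee", "Children", r => ({ child: r.child }));
console.log(nested.length);             // 1
console.log(nested[0].Children.length); // 2
```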
The paper proves that the order of unnesting doesn't matter (Thomas & Fischer's result). In MongoDB, these produce the same fully-flat result:
db.employees.aggregate([
{ $unwind: "$Children" },
{ $unwind: "$Skills" }
])
db.employees.aggregate([
{ $unwind: "$Skills" },
{ $unwind: "$Children" }
])
The paper also notes that nest is not always an inverse for unnest in general, but IS an inverse for Partitioned Normal Form (PNF) relations.
Partitioned Normal Form (PNF)
PNF requires that the atomic attributes of each relation (and each nested relation) form a key. The paper shows a pathological relation (Figure 1-3) that violates PNF:
Smith, the same employee, has two different skill sets, and Jones has a duplicate across sets.
In MongoDB, PNF corresponds to the fundamental design principle that _id determines the document. The pathological case above would mean having two documents for "Smith" with different Skills arrays — which is exactly what MongoDB's _id uniqueness constraint prevents at the collection level.
Without using _id as the employee identifier, we could have two employees with the same name:
db.bad_design.insertMany([
{ employee: "Smith", Skills: ["typing", "filing"] },
{ employee: "Smith", Skills: ["sorting", "mailing"] }
])
If we don't want to use _id, the correct design enforces PNF with a unique index:
db.good_design.createIndex({ employee: 1 }, { unique: true })
db.good_design.insertOne({
employee: "Smith",
Skills: ["typing", "filing", "sorting", "mailing"]
})
Like in many papers, the employee name is used as the identifier (two "Smith" tuples refer to one person), but in real life a generated identifier would identify the physical person.
The paper, in theorem 5-1, proves that PNF is closed under unnesting. In MongoDB terms: if your documents are well-designed (one document per logical entity), then $unwind won't create ambiguous or inconsistent flat results.
For nested relations, PNF also applies recursively.
MongoDB schema validation can enforce the PNF constraint by comparing the size of an array with its size after duplicates are removed (using $setUnion on the array itself to deduplicate):
db.createCollection("employees", {
validator: {
$and: [
// JSON Schema for structure and types
{
$jsonSchema: {
bsonType: "object",
required: ["ename"],
properties: {
ename: { bsonType: "string" },
Children: { bsonType: "array", items: { bsonType: "object", required: ["name"], properties: { name: { bsonType: "string" }, dob: { bsonType: "string" } } } },
Skills: { bsonType: "array", items: { bsonType: "object", required: ["type"], properties: { type: { bsonType: "string" },
Exams: { bsonType: "array", items: { bsonType: "object", required: ["year"], properties: { year: { bsonType: "int" }, city: { bsonType: "string" } } } } } } }
}
}
},
// PNF uniqueness constraints using $expr
{
$expr: {
$and: [
// Level 1: Children.name must be unique within the document
{
$eq: [
{ $size: { $ifNull: ["$Children.name", []] } },
{ $size: { $setUnion: [{ $ifNull: ["$Children.name", []] }] } }
]
},
// Level 1: Skills.type must be unique within the document
{
$eq: [
{ $size: { $ifNull: ["$Skills.type", []] } },
{ $size: { $setUnion: [{ $ifNull: ["$Skills.type", []] }] } }
]
},
// Level 2: Within EACH Skills element, Exams.year must be unique
{
$allElementsTrue: {
$map: {
input: { $ifNull: ["$Skills", []] },
as: "skill",
in: { $eq: [ { $size: { $ifNull: ["$$skill.Exams.year", []] } }, { $size: { $setUnion: [{ $ifNull: ["$$skill.Exams.year", []] }] } } ] }
}
}
}
]
}
}
]
},
validationLevel: "strict",
validationAction: "error"
});
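The core of this validator is the size comparison: an array satisfies the uniqueness part of PNF exactly when deduplicating it changes nothing. The same check, sketched in plain JavaScript:

```javascript
// PNF uniqueness check: the key attribute values of a nested relation
// must be unique, i.e. the array keeps the same length after deduplication.
function satisfiesPNF(keyValues) {
  return new Set(keyValues).size === keyValues.length;
}

console.log(satisfiesPNF(["Sam", "Sue"]));       // true
console.log(satisfiesPNF(["typing", "typing"])); // false
```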
The fact that nest is not always the inverse of unnest is a well-known MongoDB pitfall:
db.employees.drop()
db.employees.insertMany([
{
_id: 1,
employee: "Smith",
Children: [{ child: "Sam" }, { child: "Sue" }],
Skills: [{ skill: "typing" }]
},
{
_id: 2,
employee: "Smith",
Children: [{ child: "Tom" }],
Skills: [{ skill: "filing" }, { skill: "sorting" }]
},
{
_id: 3,
employee: "Jones",
Children: [{ child: "Joe" }],
Skills: [{ skill: "typing" }]
}
])
db.employees.aggregate([
// unnest (μ)
{ $unwind: "$Children" },
// nest (ν), grouping by employee, which is NOT a key
{
$group: {
_id: "$employee",
Children: { $addToSet: "$Children" },
Skills: { $first: "$Skills" }
}
},
// Clean up
{ $project: { _id: 0, employee: "$_id", Children: 1, Skills: 1 } },
{ $sort: { employee: 1 } }
])
This results in two Smiths collapsed into one:
[
{
Children: [ { child: 'Joe' } ],
Skills: [ { skill: 'typing' } ],
employee: 'Jones'
},
{
Children: [ { child: 'Tom' }, { child: 'Sam' }, { child: 'Sue' } ],
Skills: [ { skill: 'typing' } ],
employee: 'Smith'
}
]
The correct way is grouping by the proper key:
db.employees.aggregate([
// unnest (μ)
{ $unwind: "$Children" },
// nest (ν), grouping by _id, which IS a key (PNF)
{
$group: {
_id: "$_id",
employee: { $first: "$employee" },
Children: { $push: "$Children" },
Skills: { $first: "$Skills" }
}
},
// Clean up
{ $project: { _id: 1, employee: 1, Children: 1, Skills: 1 } },
{ $sort: { _id: 1 } }
])
This is the paper's Theorem 5-2 demonstrated in MongoDB: nest inverts unnest if and only if the grouping key satisfies PNF.
Extended Algebra Operators
In a previous post, From Relational Algebra to Document Semantics, I explained how MongoDB extends the semantics of relational selection (σ) to non-1NF schemas, and mentioned that other relational operations are available in MongoDB. They are covered by the ¬1NF paper.
The paper defines extended union (∪ᵉ) to merge nested relations for tuples that agree on atomic attributes, rather than treating them as separate tuples. In MongoDB, this is achieved with $merge or with aggregation. Suppose we want to merge new course data into existing student records:
db.students.insertMany([
{
sname: "Jones",
Courses: [
{ cname: "Math", grade: "A" },
{ cname: "Science", grade: "B" }
]
},
{
sname: "Smith",
Courses: [
{ cname: "Math", grade: "A" },
{ cname: "Physics", grade: "C" },
{ cname: "Science", grade: "A" }
]
}
])
db.students.updateOne(
{ sname: "Jones" },
{ $addToSet: { Courses: { cname: "Physics", grade: "B" } } }
)
db.students.updateOne(
{ sname: "Smith" },
{
$addToSet: {
Courses: {
$each: [
{ cname: "Chemistry", grade: "A" },
{ cname: "English", grade: "B" }
]
}
}
}
)
This added "Physics: B" to Jones, and "Chemistry: A" and "English: B" to Smith, without creating new tuples with different course sets as a standard union would:
db.students.find()
[
{
_id: ObjectId('69c3a72344fc089068d4b0c2'),
sname: 'Jones',
Courses: [
{ cname: 'Math', grade: 'A' },
{ cname: 'Science', grade: 'B' },
{ cname: 'Physics', grade: 'B' }
]
},
{
_id: ObjectId('69c3a72344fc089068d4b0c3'),
sname: 'Smith',
Courses: [
{ cname: 'Math', grade: 'A' },
{ cname: 'Physics', grade: 'C' },
{ cname: 'Science', grade: 'A' },
{ cname: 'Chemistry', grade: 'A' },
{ cname: 'English', grade: 'B' }
]
}
]
The paper emphasizes that standard set union treats entire tuples as atomic — if two tuples differ at all, both appear in the result. Extended union is smarter: it matches on keys and merges the nested parts. MongoDB's $addToSet and $merge stage embody this exact semantics.
MongoDB provides $setIntersection (∩ᵉ), $setUnion (∪ᵉ), and $setDifference (−ᵉ): these are the extended set operators from the paper, applied to arrays within documents.
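Their semantics can be sketched in plain JavaScript (illustrative only), treating arrays as sets of scalar values the way the aggregation operators do:

```javascript
// Array-as-set semantics of the extended set operators:
const setUnion        = (a, b) => [...new Set([...a, ...b])];
const setIntersection = (a, b) => [...new Set(a)].filter(x => new Set(b).has(x));
const setDifference   = (a, b) => [...new Set(a)].filter(x => !new Set(b).has(x));

console.log(setUnion(["Math", "Physics"], ["Physics", "English"]));
// [ 'Math', 'Physics', 'English' ]
console.log(setIntersection(["Math", "Physics"], ["Physics", "English"]));
// [ 'Physics' ]
console.log(setDifference(["Math", "Physics"], ["Physics", "English"]));
// [ 'Math' ]
```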
The paper defines extended projection (πᵉ) as normal projection followed by tuplewise extended union, merging tuples that agree on the remaining atomic attributes. In MongoDB, this is $group after $project.
The paper limits extended join (⋈ᵉ) to relations sharing top-level attributes. Common nested attributes are intersected. In MongoDB, $lookup performs the join, and we can intersect nested parts.
Null Values and Empty Nested Relations
The paper defines three null types ("does not exist", "unknown", "no information") and establishes that an empty nested relation means "no information", not "nonexistent".
MongoDB has null, missing fields, and empty arrays, which map to the paper's null types.
- "no information" is an empty array
For example, this adds employee Jones with no information, represented as empty arrays:
db.employees.insertOne({
ename: "Jones",
Children: [], // Paper says this equals {(ni, ni)} — we know nothing
Skills: [] // Same — empty array = no information
})
- "unknown" is a field with a null value
Jones has some skills, but we don't know which ones, only that he took an exam somewhere in 1981:
db.employees.updateOne(
{ ename: "Jones" },
{
$set: {
Children: null, // MongoDB null ≈ paper's "we don't know"
Skills: [{
type: null, // there IS a skill, we don't know which
Exams: [{ year: 1981, city: null }] // unknown city
}]
}
}
)
- "does not exist" is a missing field
Jones definitively has no children:
db.employees.updateOne(
{ ename: "Jones" },
{
$unset: { Children: "" } // Remove the field entirely — no children
}
)
By default, MongoDB's $unwind removes documents whose array is empty or missing. The paper says that unnesting an empty set should produce a null tuple, not delete the tuple. The MongoDB equivalent uses preserveNullAndEmptyArrays:
db.employees.aggregate([
{ $unwind: { path: "$Skills", preserveNullAndEmptyArrays: true } }
])
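The difference between the two behaviors can be sketched in plain JavaScript (illustrative only): an empty or missing array yields one document with a null value instead of no document at all.

```javascript
// $unwind-like flattening that keeps documents whose array is empty or
// missing, emitting null in place of the element
// (like preserveNullAndEmptyArrays: true).
function unwindPreserve(docs, field) {
  return docs.flatMap(doc => {
    const arr = doc[field];
    if (!Array.isArray(arr) || arr.length === 0) {
      return [{ ...doc, [field]: null }]; // keep the tuple, as the paper requires
    }
    return arr.map(elem => ({ ...doc, [field]: elem }));
  });
}

const docs = [
  { ename: "Smith", Skills: [{ type: "typing" }] },
  { ename: "Jones", Skills: [] } // empty set: no information
];
console.log(unwindPreserve(docs, "Skills").length); // 2 — Jones is preserved
```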
SQL/NF Query Language → MongoDB Query Language
The paper introduces SQL/NF, a user-oriented non-1NF query language based on commercial SQL and a proposed relational database standard. It never became a standard, but MongoDB's query language implements many of the same ideas.
SQL/NF accepts nested expressions in SELECT (Projection with filtering):
SELECT dname, (SELECT ALL FROM Emps WHERE sal > 35000)
FROM Company
WHERE EXISTS(Emps WHERE sal > 35000)
MongoDB can do the same in $project:
db.company.aggregate([
{ $match: { "Emps.sal": { $gt: 35000 } } },
{
$project: {
_id: 0,
dname: 1,
Emps: {
$filter: {
input: "$Emps",
as: "emp",
cond: { $gt: ["$$emp.sal", 35000] }
}
}
}
}
])
The paper argues that functions should be applied to relations, not columns:
SELECT dno, COUNT(Emps)
FROM Company;
SELECT dno
FROM Company
WHERE COUNT(Emps) > 10;
This is exactly how MongoDB's aggregation works:
db.company.aggregate([
{
$project: {
dno: 1,
emp_count: { $size: "$Emps" }
}
}
])
db.company.aggregate([
{
$match: {
$expr: { $gt: [{ $size: "$Emps" }, 10] }
}
},
{ $project: { dno: 1 } }
])
The paper makes a profound observation: GROUP BY creates a ¬1NF structure (partitioned relations) with special rules, and is unnecessary when you already have nested relations.
In SQL on a 1NF schema, the following gets the departments with more than ten employees:
SELECT dno
FROM Dept
WHERE dno IN (
SELECT dno FROM Emp
GROUP BY dno
HAVING COUNT(DISTINCT eno) > 10
)
In SQL/NF on a ¬1NF schema where the employees are nested, the query is simpler:
SELECT dno
FROM Company
WHERE COUNT(Emps) > 10
In MongoDB, when data is already nested, no $group is needed:
db.company.find(
{ $expr: { $gt: [{ $size: "$Emps" }, 10] } },
{ dno: 1 }
)
This is one of MongoDB's core value propositions: by embedding related data, many queries that require GROUP BY in SQL become trivial reads.
SQL/NF introduces NEST and UNNEST:
NEST Emp ON eno, ename, sal AS Emps;
UNNEST Company ON Emps;
We have seen that the equivalents in MongoDB are $group and $unwind:
db.emp.aggregate([
{
$group: {
_id: "$dno",
Emps: {
$push: { eno: "$eno", ename: "$ename", sal: "$sal" }
}
}
}
])
db.company.aggregate([
{ $unwind: "$Emps" }
])
The paper specifies that modifying nested data with SQL/NF is a MODIFY on the outer relation:
-- Insert a new employee into department 10
MODIFY Company
SET Emps = (STORE Emps VALUES <32, Samuels, 49000>)
WHERE dno = 10
-- Give 10% raise to dept 10 employees making >30K
MODIFY Company
SET Emps = (MODIFY Emps SET sal = sal * 1.1 WHERE sal > 30000)
WHERE dno = 10
MongoDB uses $push to insert in a nested array:
// Insert new employee into department 10
db.company.updateOne(
{ dno: 10 },
{ $push: { Emps: { eno: 32, ename: "Samuels", sal: 49000 } } }
)
MongoDB uses arrayFilters with the filtered positional operator $[<identifier>] to target array items:
// 10% raise for dept 10 employees making > 30K
db.company.updateOne(
{ dno: 10 },
{ $mul: { "Emps.$[elem].sal": 1.1 } },
{ arrayFilters: [{ "elem.sal": { $gt: 30000 } }] }
)
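The nested MODIFY semantics can be sketched in plain JavaScript (illustrative only): rewrite only the array elements that match the inner predicate, leaving the rest untouched.

```javascript
// Nested MODIFY: apply `change` to the elements of doc[field] that satisfy
// `predicate` (like arrayFilters with $[elem]).
function modifyNested(doc, field, predicate, change) {
  return {
    ...doc,
    [field]: doc[field].map(e => (predicate(e) ? change(e) : e))
  };
}

const dept = {
  dno: 10,
  Emps: [{ ename: "A", sal: 40000 }, { ename: "B", sal: 20000 }]
};
const raised = modifyNested(
  dept, "Emps",
  e => e.sal > 30000,
  e => ({ ...e, sal: e.sal * 1.1 })
);
// Only the first employee gets the raise; the second keeps sal 20000.
```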
The paper also defined SQL/NF Data Definition Language (DDL):
SCHEMA
TABLE Company
ITEM dno INTEGER UNIQUE NOT NULL
ITEM dname CHARACTER 10
ITEM loc CHARACTER 10
ITEM (TABLE Emps
ITEM eno INTEGER UNIQUE
ITEM ename CHARACTER 10
ITEM sal REAL)
While MongoDB doesn't require declaring the schema upfront, which allows more agility when building an application, it provides schema validation to enforce the schema once it is stable:
db.createCollection("company", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["dno"],
properties: {
dno: {
bsonType: "int",
description: "department number — required, unique via index"
},
dname: {
bsonType: "string",
maxLength: 10
},
loc: {
bsonType: "string",
maxLength: 10
},
Emps: {
bsonType: "array",
items: {
bsonType: "object",
properties: {
eno: { bsonType: "int" },
ename: { bsonType: "string", maxLength: 10 },
sal: { bsonType: "double" }
}
}
}
}
}
}
})
// PNF enforcement via unique index:
db.company.createIndex({ dno: 1 }, { unique: true })
To avoid an additional index, you can use "_id" instead of "dno".
The paper defines "don't care values" and pattern matching in SQL/NF:
-- Find company tuples where an employee is named Smith with sal 20000
Company WHERE <?, "Smith", 20000> IN Emps
In MongoDB, $elemMatch is the "don't care" pattern match:
db.company.find({
Emps: {
$elemMatch: {
// eno: ? — not specified (don't care)
ename: "Smith",
sal: 20000
}
}
})
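The "don't care" semantics can be sketched in plain JavaScript (illustrative only): an array element matches when it agrees on every specified field, and unspecified fields are ignored.

```javascript
// $elemMatch-like partial match: some element agrees on all given fields;
// fields not listed in the pattern are "don't care".
function elemMatch(arr, pattern) {
  return arr.some(elem =>
    Object.entries(pattern).every(([k, v]) => elem[k] === v)
  );
}

const Emps = [{ eno: 7, ename: "Smith", sal: 20000 }];
console.log(elemMatch(Emps, { ename: "Smith", sal: 20000 })); // true (eno ignored)
console.log(elemMatch(Emps, { ename: "Smith", sal: 30000 })); // false
```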
SQL/NF uses reference names and aliasing:
SELECT dname, (Emps WHERE sal > 35000) AS Emps_rich
FROM Company
WHERE EXISTS(Emps_rich)
In MongoDB we use $let or $addFields:
db.company.aggregate([
{
$addFields: {
Emps_rich: {
$filter: {
input: "$Emps",
as: "emp",
cond: { $gt: ["$$emp.sal", 35000] }
}
}
}
},
{ $match: { "Emps_rich.0": { $exists: true } } },
{ $project: { dname: 1, Emps_rich: 1 } }
])
¬1NF applications
The paper shows some examples of application models in non-first normal form; here are the equivalents in MongoDB.
The paper shows an example of office forms, such as invoices:
db.invoices.insertMany([
{
cname: "Smith",
caddress: "Chicago",
total: 80.00,
Orders: [
{ prod_no: 102, qty: 10, price: 1.30, amount: 13.00 },
{ prod_no: 210, qty: 1, price: 67.00, amount: 67.00 }
]
},
{
cname: "West",
caddress: "Auburn",
total: 20.80,
Orders: [
{ prod_no: 102, qty: 5, price: 1.30, amount: 6.50 },
{ prod_no: 213, qty: 43, price: 0.10, amount: 4.30 },
{ prod_no: 456, qty: 10, price: 1.00, amount: 10.00 }
]
}
])
The paper shows a circuit design requiring 6 tables in 1NF:

It is a single table in ¬1NF:

MongoDB does the same in one document:
db.entities.insertMany([
{
id: 100,
name: "2-AND",
Geometry: [
{ x1: 0, y1: 0, x2: 0, y2: 10 },
{ x1: 0, y1: 10, x2: 10, y2: 10 },
{ x1: 10, y1: 10, x2: 10, y2: 0 },
{ x1: 10, y1: 0, x2: 0, y2: 0 }
],
Pins: [
{ no: 1, class: "IN", x: 0, y: 3 },
{ no: 2, class: "IN", x: 0, y: 7 },
{ no: 3, class: "OUT", x: 10, y: 5 }
],
Blocks: [],
Connections: []
},
{
id: 200,
name: "4-AND",
Geometry: [
{ x1: 0, y1: 0, x2: 0, y2: 50 },
{ x1: 0, y1: 50, x2: 50, y2: 50 },
{ x1: 50, y1: 50, x2: 50, y2: 0 },
{ x1: 50, y1: 0, x2: 0, y2: 0 }
],
Pins: [
{ no: 1, class: "IN", x: 0, y: 13 },
{ no: 2, class: "IN", x: 0, y: 17 },
{ no: 3, class: "IN", x: 0, y: 33 },
{ no: 4, class: "IN", x: 0, y: 37 },
{ no: 5, class: "OUT", x: 50, y: 25 }
],
Blocks: [
{ no: 1, type: 100, x: 10, y: 10 },
{ no: 2, type: 100, x: 10, y: 30 },
{ no: 3, type: 100, x: 30, y: 20 }
],
Connections: [
{ cno: 1, block1: 0, pin1: 1, block2: 1, pin2: 1, segments: [] },
{ cno: 2, block1: 0, pin1: 2, block2: 1, pin2: 2, segments: [] },
{ cno: 6, block1: 1, pin1: 3, block2: 3, pin2: 1,
segments: [{ x: 25, y: 15 }, { x: 25, y: 23 }] },
{ cno: 7, block1: 2, pin1: 3, block2: 3, pin2: 2,
segments: [{ x: 25, y: 35 }, { x: 25, y: 27 }] }
]
}
])
Complex queries on this structure can use a single multikey index and a single read, with no joins.
The paper shows an example of statistical databases:

Here is the MongoDB document for it:
db.salary_summary.insertMany([
{
age_group: "[18,30]",
divisions: [
{
div: "div1",
departments: [
{ dept: "man", sum_sal: "100K" },
{ dept: "personnel", sum_sal: "150K" }
]
},
{
div: "div2",
departments: [
{ dept: "acct", sum_sal: "290K" }
]
}
]
},
{
age_group: "[31,40]",
divisions: [
{
div: "div1",
departments: [
{ dept: "man", sum_sal: "200K" },
{ dept: "personnel", sum_sal: "300K" }
]
},
{
div: "div2",
departments: [
{ dept: "acct", sum_sal: "400K" }
]
}
]
}
])
Nested Normal Form (NNF)
The paper presents a method for achieving nested normal form (NNF) that removes anomalies from partial and transitive dependencies in PNF relations. It differs from prior algorithms by deriving non-1NF relations from an initial 4NF decomposition, incorporating embedded multivalued dependencies, and improving how functional dependencies are handled.
The paper's NNF algorithm starts with functional and multivalued dependencies and builds scheme trees that minimize redundancy. This is essentially MongoDB schema design. Here are a few examples.
In example 9.1, the employee determines the location, so instead of a singleton array:
{
employee: "Smith",
Locations: ["Austin"],
Skills: ["typing", "filing"],
Children: ["Sam", "Sue"]
}
we use a scalar:
{
employee: "Smith",
location: "Austin",
Skills: ["typing", "filing"],
Children: ["Sam", "Sue"]
}
Queries and indexing work the same, but it’s preferable to use a scalar for one-to-one relationships and an array for one-to-many. With polymorphism, the schema is per document, so a collection can hold both.
In example 9.3, the scouting database, the paper's final design is three scheme trees:
This is three collections in MongoDB:
// Collection 1: Boy-Girl activities
db.activities.insertMany([
{
boy: "Tom",
girl: "Jane",
Dates: ["2024-01-15", "2024-02-20"],
Dances: ["2024-03-01"]
},
{
boy: "Tom",
girl: "Lisa",
Dates: ["2024-01-22"],
Dances: []
}
])
// Collection 2: Boy scout leaders and their boys
db.bsl_boys.insertMany([
{
bsl: "Mr. Adams",
Boys: ["Tom", "Jerry", "Mike"]
}
])
// Collection 3: Girl scout leaders and their girls
db.gsl_girls.insertMany([
{
gsl: "Ms. Baker",
Girls: ["Jane", "Lisa", "Sara"]
}
])
Normalization still applies in MongoDB to clearly depict the intended relationships. There is no referential integrity ("foreign keys") between documents. This preserves the decoupling of scout activities from scout leadership. If all scouts involved in an activity must have a leader, this must be enforced by the application, or checked asynchronously by an aggregation pipeline with $lookup.
In example 9.5, the university database, the tutor's office is redundant:

{
class: "CS101",
day: "Monday",
students: [
{ student: "Alice", major: "CS", tutor: "Prof X", office: "Room 101", exam: "A" },
{ student: "Bob", major: "Math", tutor: "Prof X", office: "Room 101", exam: "B" }
// ^^^^^^^^^^^^^^^^ redundant!
]
}
Further normalization produces three collections to separate the functional dependency groups:
// Collection 1: Class schedule (Class → Day, then Students and Tutors nested)
db.classes.insertOne({
class: "CS101",
day: "Monday",
Students: [
{ student: "Alice", exam: "A" },
{ student: "Bob", exam: "B" }
],
Tutors: [
{ tutor: "Prof X", hour: "10am" },
{ tutor: "Prof Y", hour: "2pm" }
]
})
// Collection 2: Student majors (Student → Major, separate because it's independent)
db.students.insertMany([
{ student: "Alice", major: "CS" },
{ student: "Bob", major: "Math" }
])
// Collection 3: Tutor offices (Tutor → Office, separate for same reason)
db.tutors.insertMany([
{ tutor: "Prof X", office: "Room 101" },
{ tutor: "Prof Y", office: "Room 205" }
])
This is exactly the MongoDB schema design principle: embed data that is naturally "owned" by the parent document, but reference data that has independent identity and would be redundantly repeated.
This level of normalization is not always necessary, and you may accept some redundancy for performance reasons. The initial design, where the tutor’s office is repeated for all students in a class, is acceptable if there is clear ownership in the application and it can run an updateMany() to update all related records in a single transaction.
Summary
Normalization and the relational operations exist for non-1NF schemas as well. Functional dependencies are enforced with schema validation and unique indexes. Multivalued dependencies can be modeled as separate arrays in the same document. Join dependencies may use embedding or separate collections.
Here is a summary of the mapping between the paper's concepts and MongoDB:
| Paper Concept | MongoDB Equivalent |
|---|---|
| ¬1NF Relation | Collection of documents with embedded arrays/documents |
| Relation scheme / Database scheme | JSON Schema / Collection structure |
| Zero-order attribute | Scalar field |
| Higher-order attribute | Embedded array or sub-document |
| Nest (ν) | $group with $push / $addToSet |
| Unnest (μ) | $unwind |
| Unnest* (μ*) | Multiple $unwind stages |
| PNF | Unique _id / unique indexes; one document per entity |
| Extended union (∪ᵉ) | $addToSet, $setUnion, updateOne with $push |
| Extended intersection (∩ᵉ) | $setIntersection |
| Extended difference (−ᵉ) | $setDifference |
| Extended join (⋈ᵉ) | $lookup + $setIntersection on common nested fields |
| Extended projection (πᵉ) | $project + $group with $setUnion |
| Selection (σ) | $match, find() |
| SQL/NF nested SELECT | $filter, $map in $project |
| SQL/NF functions on relations | $size, $sum, $avg on arrays |
| GROUP BY elimination | Natural consequence of the document model |
| NEST ON | $group in aggregation |
| UNNEST ON | $unwind in aggregation |
| NULL / ni | null, missing fields |
| Empty nested relation | Empty array [] |
| Unnesting empty sets preserves tuples | preserveNullAndEmptyArrays |
| dne (does not exist) | Field absent |
| Outer join | $lookup (preserves unmatched documents) |
| SUBSUME | No built-in; implementable with aggregation |
| NNF design | MongoDB schema design best practices |
| FD → scalar field | Embed as scalar, not array, for 1:1 relationships |
| MVD → separate arrays | Use separate arrays for independent multi-valued attributes |
| Scheme trees | Document structure hierarchy |
| 4NF decomposition → collections | "When to embed vs. reference" decision |
| Reference names / aliasing | $addFields, $let, as in $lookup |
| Don't-care values | $elemMatch (match on specified fields, ignore others) |
| Recursive schemes (future work) | $graphLookup (added later to MongoDB) |
Roth's 1986 dissertation effectively laid the theoretical foundation for document databases—23 years before MongoDB. The parallels are striking:
Data model: Documents with embedded document arrays correspond to ¬1NF relations with nested relations.
Query language: SQL/NF's orthogonality principle ("wherever a scalar can appear, a relation can appear") matches MongoDB's rule that array operations are available wherever scalar operations exist.
Schema design: NNF maps to MongoDB's embedding-vs-referencing rules: embed for MVDs (independent multi-valued relationships), use scalar fields for FDs (1:1 relationships), and split into collections when data has its own identity.
Operators: $unwind (unnest), $group (nest), and $setUnion / $setIntersection / $setDifference (extended set operators) all match what the paper anticipated.
Null handling: Options like preserveNullAndEmptyArrays implement the paper's recommendation that empty nested relations should not cause information loss during unnesting.
Where MongoDB goes beyond the paper is in recursive structures, which Roth explicitly left as future work. $graphLookup supports the recursive schema traversal the dissertation identified as an open problem.