Divyanshu Deepam

Posted on May 26

10 Avro Schema Mistakes Even Experienced Developers Make

#programming #automation #productivity #kafka

Avro schemas are widely used with messaging systems like Apache Kafka to serialize messages into a compact binary format. This can dramatically reduce bandwidth and storage overhead compared to sending verbose formats like JSON or XML.
They look deceptively simple, until a tiny mistake breaks your serializer, schema registry validation, or downstream consumers.

Some mistakes are obvious rookie errors. Others are subtle enough that even experienced developers make them when moving fast.

I built the Dev Suite Avro Schema Validator (https://devsuite.tools/avro-schema-validator) to automate the painful part. Paste your schema, and it validates, analyzes, and flags production-breaking issues in seconds.

Here are 10 real Avro schema mistakes that can quietly break production pipelines:

(1) Referencing a Named Type with an Incorrect Fullname

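Rule: A reference to a previously declared named type MUST resolve to that type's full name. A reference containing a dot is treated as a full name; a reference without a dot is resolved against the enclosing namespace.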

Test Case

{
  "type": "record",
  "name": "Order",
  "namespace": "com.devsuite",
  "fields": [
    {
      "name": "shippingAddress",
      "type": {
        "type": "record",
        "name": "Address",
        "fields": [
          { "name": "city", "type": "string" }
        ]
      }
    },
    {
      "name": "billingAddress",
      "type": "com.Address"
    }
  ]
}

Why this fails
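Avro treats any named type reference containing a dot as an explicit full name, not as a partially qualified name. Address inherits the com.devsuite namespace from Order, so its full name is com.devsuite.Address. Since com.Address does not match it, schema resolution fails. Reference it as com.devsuite.Address, or simply as Address.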
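A minimal Python sketch of the resolution rule the Avro specification describes: a reference containing a dot is taken as a full name, while a dotless reference inherits the namespace of the enclosing definition. The resolve_fullname helper and the hard-coded set of declared names are hypothetical, written only for illustration; this is not how the Dev Suite validator or any Avro library is implemented.

```python
def resolve_fullname(ref, enclosing_namespace):
    """Resolve a named-type reference: dotted names are already full names,
    dotless names take the namespace of the enclosing definition."""
    if "." in ref or not enclosing_namespace:
        return ref
    return f"{enclosing_namespace}.{ref}"

# Full names declared by the Order schema (Address inherits com.devsuite).
declared = {"com.devsuite.Order", "com.devsuite.Address"}

for ref in ["com.Address", "com.devsuite.Address", "Address"]:
    full = resolve_fullname(ref, "com.devsuite")
    status = "ok" if full in declared else "unknown type"
    print(f"{ref!r} -> {full!r}: {status}")
```

Only com.Address ends up as an unknown type; the other two spellings both resolve to com.devsuite.Address.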

(2) Referencing a Named Type Before It Is Declared

Rule: A named type MUST be declared before, or at, the point where it is first used.

Test Case

{
  "type": "record",
  "name": "Customer",
  "namespace": "com.devsuite.core",
  "fields": [
    {
      "name": "primaryAccount",
      "type": "com.devsuite.core.Account"
    },
    {
      "name": "secondaryAccount",
      "type": {
        "type": "record",
        "name": "Account",
        "fields": [
          { "name": "routingNumber", "type": "string" }
        ]
      }
    }
  ]
}

Why this fails
The primaryAccount field references com.devsuite.core.Account before the secondaryAccount field declares it inline. Avro resolves names in a depth-first, left-to-right walk over the schema, so a type cannot be referenced by name until the parser has already encountered its full structural declaration.

Declare the named type before the first reference that uses it.
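The define-before-use rule can be sketched as a depth-first, left-to-right walk that records each named type as soon as its definition is seen. check_define_before_use is a hypothetical helper that handles only records, enums, fixed, arrays, maps, unions, and primitives, and it assumes an otherwise well-formed schema (no aliases, no protocol files).

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

def full_name(name, namespace):
    """A dotted name is already full; otherwise prefix the namespace, if any."""
    return name if "." in name or not namespace else f"{namespace}.{name}"

def check_define_before_use(schema, namespace=None, defined=None):
    """Walk the schema depth-first, left to right; raise ValueError when a
    named type is referenced before its definition has been seen."""
    if defined is None:
        defined = set()
    if isinstance(schema, str):                       # primitive or name reference
        if schema not in PRIMITIVES and full_name(schema, namespace) not in defined:
            raise ValueError(f"{full_name(schema, namespace)} is referenced before it is defined")
    elif isinstance(schema, list):                    # union: branches in order
        for branch in schema:
            check_define_before_use(branch, namespace, defined)
    elif schema["type"] in ("record", "enum", "fixed"):
        name = full_name(schema["name"], schema.get("namespace", namespace))
        defined.add(name)                             # registered before its own fields
        inner_ns = name.rpartition(".")[0] or None    # nested types inherit this namespace
        for field in schema.get("fields", []):
            check_define_before_use(field["type"], inner_ns, defined)
    elif schema["type"] == "array":
        check_define_before_use(schema["items"], namespace, defined)
    elif schema["type"] == "map":
        check_define_before_use(schema["values"], namespace, defined)
    else:                                             # e.g. {"type": "string", "logicalType": ...}
        check_define_before_use(schema["type"], namespace, defined)
    return defined

broken = {
    "type": "record", "name": "Customer", "namespace": "com.devsuite.core",
    "fields": [
        {"name": "primaryAccount", "type": "com.devsuite.core.Account"},
        {"name": "secondaryAccount", "type": {
            "type": "record", "name": "Account",
            "fields": [{"name": "routingNumber", "type": "string"}]}},
    ],
}
try:
    check_define_before_use(broken)
except ValueError as err:
    print(err)  # com.devsuite.core.Account is referenced before it is defined

# Keep the field order, but move the definition to the first use.
account_def = broken["fields"][1]["type"]
fixed = {**broken, "fields": [
    {"name": "primaryAccount", "type": account_def},
    {"name": "secondaryAccount", "type": "com.devsuite.core.Account"},
]}
print(sorted(check_define_before_use(fixed)))
```

Registering a record's name before walking its fields is what lets a record refer to itself (for example a linked-list node), which Avro allows. The fix keeps the original field order because field order is part of the binary encoding.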

(3) Alias Contains the Type’s Own Name

Rule: The aliases attribute MUST NOT contain the name attribute of the named type.

Test Case

{
  "type": "record",
  "name": "Account",
  "namespace": "com.devsuite",
  "aliases": ["Account", "OldAccount"],
  "fields": [
    { "name": "accountId", "type": "string" }
  ]
}

Why this fails
Aliases in Avro exist to support schema evolution, particularly when a type has been renamed but older producers or consumers may still refer to the previous name. Including the current schema name inside its own aliases list defeats the purpose entirely because the alias mechanism is meant to represent alternate historical identities, not duplicate the current one.

Remove Account from the aliases array, leaving only OldAccount.
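As a sketch, assuming aliases are resolved like names (either fully qualified or relative to the namespace of the name they alias), checking for this mistake only requires comparing each resolved alias with the type's own full name. qualify and alias_problems are hypothetical helpers.

```python
def qualify(name, namespace):
    """A dotted name is already full; otherwise prefix the namespace, if any."""
    return name if "." in name or not namespace else f"{namespace}.{name}"

def alias_problems(schema):
    """Return the aliases that resolve to the type's own full name."""
    own = qualify(schema["name"], schema.get("namespace"))
    ns = own.rpartition(".")[0] or None   # aliases are relative to the name's namespace
    return [a for a in schema.get("aliases", []) if qualify(a, ns) == own]

account = {
    "type": "record", "name": "Account", "namespace": "com.devsuite",
    "aliases": ["Account", "OldAccount"],
    "fields": [{"name": "accountId", "type": "string"}],
}
print(alias_problems(account))  # ['Account']

cleaned = {**account, "aliases": [a for a in account["aliases"]
                                  if a not in alias_problems(account)]}
print(cleaned["aliases"])  # ['OldAccount']
```

Resolving aliases before comparing also catches the fully qualified variant, com.devsuite.Account, which a plain string comparison against "Account" would miss.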

(4) Duplicate Equivalent Types Inside a Union

Rule: Any primitive type MUST be included at most once, which also applies to logical type annotations. A UUID logical type, which annotates string, and a string primitive type therefore MUST NOT appear in the same type union.

Test Case

{
  "type": "record",
  "name": "Event",
  "namespace": "com.devsuite",
  "fields": [
    {
      "name": "identifier",
      "type": [
        "string",
        {
          "type": "string",
          "logicalType": "uuid"
        }
      ]
    }
  ]
}

Why this fails
Avro unions distinguish branches by schema type category, not semantic meaning. Since a UUID logical type is still fundamentally a string, combining both creates ambiguity during resolution.
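One rough way to see the ambiguity is to reduce each union branch to the key that tells branches apart, collapsing logical types to their underlying type, and then look for repeats. The helpers below are hypothetical and simplified: named types are keyed by their short name only.

```python
from collections import Counter

def branch_key(branch):
    """Reduce a union branch to the key used to tell branches apart (simplified)."""
    if isinstance(branch, str):
        return branch              # primitive name or named-type reference
    kind = branch["type"]
    if kind in ("record", "enum", "fixed"):
        return branch["name"]      # named types are distinguished by name
    return kind                    # logical types collapse: a uuid string is "string"

def duplicate_union_branches(union):
    """Return the branch keys that appear more than once in a union."""
    counts = Counter(branch_key(b) for b in union)
    return sorted(key for key, n in counts.items() if n > 1)

identifier = ["string", {"type": "string", "logicalType": "uuid"}]
print(duplicate_union_branches(identifier))  # ['string']

# Keeping only one string-based branch removes the clash.
print(duplicate_union_branches(["null", {"type": "string", "logicalType": "uuid"}]))  # []
```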

(5) Multiple Arrays or Maps Inside the Same Union

Rule: A union MUST NOT contain more than one array type or more than one map type.

Test Case

{
  "type": "record",
  "name": "DataPayload",
  "namespace": "com.devsuite",
  "fields": [
    {
      "name": "collections",
      "type": [
        { "type": "array", "items": "string" },
        { "type": "array", "items": "int" }
      ]
    }
  ]
}

Why this fails
Avro distinguishes union members by their top-level schema category, not by their internal configuration. Two arrays are still both arrays, regardless of whether their item definitions differ, and the same applies to maps. Because both branches share the same top-level type, Avro cannot reliably decide which branch a given value belongs to. This makes the schema ambiguous and therefore invalid.

Tip - Wrap structurally different meanings inside named records instead of directly placing multiple arrays or maps in the same union.
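The same counting idea, applied to arrays and maps, also shows why the tip works: once each array is wrapped in its own named record, the branches are told apart by record name. union_category_clashes and the StringList and IntList record names are hypothetical, chosen for this sketch.

```python
def union_category_clashes(union):
    """Return "array" and/or "map" if the union holds more than one of either."""
    kinds = [b["type"] for b in union if isinstance(b, dict)]
    return [k for k in ("array", "map") if kinds.count(k) > 1]

bad = [{"type": "array", "items": "string"}, {"type": "array", "items": "int"}]
print(union_category_clashes(bad))  # ['array']

# Wrapping each array in its own named record makes the branches distinct.
wrapped = [
    {"type": "record", "name": "StringList",
     "fields": [{"name": "values", "type": {"type": "array", "items": "string"}}]},
    {"type": "record", "name": "IntList",
     "fields": [{"name": "values", "type": {"type": "array", "items": "int"}}]},
]
print(union_category_clashes(wrapped))  # []
```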

(6) Union Default Value Does Not Match the First Type

Rule: The default value of a union field MUST match the structure of the first type declared in the union array.

Test Case

{
  "type": "record",
  "name": "UserStatus",
  "namespace": "com.devsuite",
  "fields": [
    {
      "name": "state",
      "type": ["null", "string"],
      "default": "ACTIVE"
    }
  ]
}

Why this fails
The union allows null or a string. However, because “null” is the first element in the union, the default value must be null.
If a developer wants the default value to be “ACTIVE”, they must reorder the union to ["string", "null"].
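A minimal sketch of the first-branch rule, assuming the first branch is a primitive type. The PRIMITIVE_CHECKS table and union_default_ok are hypothetical simplifications: they skip int and long range checks and treat a bytes default as a JSON string.

```python
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, str),   # JSON defaults encode bytes as strings
}

def union_default_ok(union, default):
    """A union default is checked against the FIRST branch only."""
    return PRIMITIVE_CHECKS[union[0]](default)

print(union_default_ok(["null", "string"], "ACTIVE"))  # False
print(union_default_ok(["string", "null"], "ACTIVE"))  # True
print(union_default_ok(["null", "string"], None))      # True
```

The second branch never takes part in the check, which is why reordering the union, rather than changing the default, is the fix when “ACTIVE” is the intended default.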

(7) Enum Default Not Present in Symbols

Rule: The default value for an enum must be a string that exactly matches one of the values defined in the symbols array.

Test Case

{
  "type": "record",
  "name": "UserStatus",
  "namespace": "com.devsuite.test",
  "fields": [
    {
      "name": "status",
      "type": {
        "type": "enum",
        "name": "StatusEnum",
        "symbols": ["ACTIVE", "INACTIVE", "BANNED"]
      },
      "default": "PENDING"
    }
  ]
}

Why this fails
Enum defaults are not arbitrary fallback strings chosen by business meaning; they must correspond exactly to one of the enum’s declared symbols. Developers frequently make this mistake when renaming enum values during schema evolution or when choosing a semantically meaningful default that feels right but no longer exists in the actual symbols array.
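Enum defaults can appear in two places: the field's default, as in the test case, and the enum type's own optional default attribute, which Avro uses during schema resolution when a reader meets a writer symbol it does not know. A hypothetical check covering both might look like this.

```python
def enum_default_errors(field):
    """Check the field-level default and the enum type's own optional default
    against the declared symbols (exact, case-sensitive match)."""
    enum = field["type"]
    symbols = enum["symbols"]
    candidates = []
    if "default" in field:
        candidates.append(("field default", field["default"]))
    if "default" in enum:
        candidates.append(("enum default", enum["default"]))
    return [f"{where} {value!r} is not one of {symbols}"
            for where, value in candidates if value not in symbols]

status = {
    "name": "status",
    "type": {"type": "enum", "name": "StatusEnum",
             "symbols": ["ACTIVE", "INACTIVE", "BANNED"]},
    "default": "PENDING",
}
print(enum_default_errors(status))
# ["field default 'PENDING' is not one of ['ACTIVE', 'INACTIVE', 'BANNED']"]
print(enum_default_errors({**status, "default": "ACTIVE"}))  # []
```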

(8) Default Object Missing Required Fields

Rule: The default attribute value MUST be a structurally valid instance representation of that specific type.

Test Case

{
  "type": "record",
  "name": "MapData",
  "namespace": "com.devsuite",
  "fields": [
    {
      "name": "coordinates",
      "type": {
        "type": "record",
        "name": "Point",
        "fields": [
          { "name": "x", "type": "int" },
          { "name": "y", "type": "int" }
        ]
      },
      "default": {
        "x": 100
      }
    }
  ]
}

Why this fails
Default values in Avro are not placeholders or partially descriptive hints; they must be fully valid structural representations of the declared schema. A common mistake happens when developers define a nested record default and provide only some of the fields, assuming omitted values will somehow be inferred or auto-filled. Avro does not do that unless those omitted fields themselves define valid defaults. If a nested record requires fields x and y, providing only x makes the entire default structurally incomplete. This becomes especially easy to miss in large schemas where nested records span many fields and developers manually craft defaults under time pressure.
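A recursive sketch of the check described above: a record default is valid only if every field either appears in the default with a valid value or falls back to its own valid default. is_valid_default is hypothetical; it checks union defaults against the first branch only, and it does not resolve named-type references or handle fixed.

```python
def is_valid_default(schema, value):
    """Return True if value is a structurally valid default for schema (simplified)."""
    if isinstance(schema, list):                  # union: must match the first branch
        return is_valid_default(schema[0], value)
    if isinstance(schema, str):
        schema = {"type": schema}
    kind = schema["type"]
    if kind == "null":
        return value is None
    if kind == "boolean":
        return isinstance(value, bool)
    if kind in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    if kind in ("float", "double"):
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind in ("string", "bytes"):
        return isinstance(value, str)
    if kind == "enum":
        return value in schema["symbols"]
    if kind == "array":
        return isinstance(value, list) and all(
            is_valid_default(schema["items"], v) for v in value)
    if kind == "map":
        return isinstance(value, dict) and all(
            is_valid_default(schema["values"], v) for v in value.values())
    if kind == "record":
        if not isinstance(value, dict):
            return False
        for field in schema["fields"]:
            if field["name"] in value:
                candidate = value[field["name"]]
            elif "default" in field:
                candidate = field["default"]      # fall back to the field's own default
            else:
                return False                      # required field with no value at all
            if not is_valid_default(field["type"], candidate):
                return False
        return True
    raise ValueError(f"type {kind!r} is not handled in this sketch")

point = {"type": "record", "name": "Point",
         "fields": [{"name": "x", "type": "int"}, {"name": "y", "type": "int"}]}
print(is_valid_default(point, {"x": 100}))          # False: y has no value and no default
print(is_valid_default(point, {"x": 100, "y": 0}))  # True

point_with_y_default = {"type": "record", "name": "Point",
                        "fields": [{"name": "x", "type": "int"},
                                   {"name": "y", "type": "int", "default": 0}]}
print(is_valid_default(point_with_y_default, {"x": 100}))  # True
```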

(9) Invalid Duration Logical Type Definition

Rule: The duration logical type annotates a fixed type, and that fixed type must have a size of exactly 12 bytes.

Test Case

{
  "type": "record",
  "name": "Timeline",
  "namespace": "com.devsuite",
  "fields": [
    {
      "name": "windowSize",
      "type": {
        "type": "fixed",
        "name": "Interval",
        "size": 8,
        "logicalType": "duration"
      }
    }
  ]
}

Why this fails
In Avro, duration is extremely specific. It is stored as three little-endian unsigned 32-bit integers representing months, days, and milliseconds, which together require exactly 12 bytes of fixed storage. If duration is attached to anything other than a fixed type of size 12, the binary representation no longer matches Avro’s expected encoding contract.
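To make the 12-byte requirement concrete, here is a sketch that packs a duration as three little-endian unsigned 32-bit integers with Python's struct module, plus a hypothetical duration_schema_ok check for the schema itself. encode_duration and decode_duration are illustrative helpers, not an Avro library API.

```python
import struct

def encode_duration(months, days, millis):
    """Pack a duration as three little-endian unsigned 32-bit integers."""
    return struct.pack("<III", months, days, millis)

def decode_duration(raw):
    """Unpack 12 bytes back into (months, days, millis)."""
    return struct.unpack("<III", raw)

def duration_schema_ok(schema):
    """A duration must annotate a fixed type of size 12."""
    return (schema.get("logicalType") == "duration"
            and schema.get("type") == "fixed"
            and schema.get("size") == 12)

encoded = encode_duration(1, 15, 3_600_000)
print(len(encoded))              # 12
print(decode_duration(encoded))  # (1, 15, 3600000)

interval = {"type": "fixed", "name": "Interval", "size": 8, "logicalType": "duration"}
print(duration_schema_ok(interval))                  # False
print(duration_schema_ok({**interval, "size": 12}))  # True
```

With "size": 8 there is no room for the third integer, so a 12-byte value simply cannot fit the declared fixed type.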

(10) Fixed Type Without Valid Size

Rule: A fixed type must have a size attribute that is an integer strictly greater than zero.

Test Case

{
  "type": "record",
  "name": "HashRecord",
  "namespace": "com.devsuite.test",
  "fields": [
    {
      "name": "md5",
      "type": {
        "type": "fixed",
        "name": "MD5"
      }
    }
  ]
}

Why this fails
A fixed type exists specifically to represent a binary blob of exact known size. Without a valid positive size, Avro has no idea how much memory should be allocated or how many bytes should be read and written during serialization. Developers often scaffold fixed types quickly for hashes, binary identifiers, or protocol payloads and forget to define the actual size, treating it as metadata to fill later. But for Avro, size is fundamental to the schema’s structural definition.
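Below is a hypothetical fixed_size_error check, plus a reminder of where the size comes from in the MD5 case: an MD5 digest is always 16 bytes, so "size": 16 is the value this schema was missing.

```python
import hashlib

def fixed_size_error(schema):
    """Return an error message for a missing or invalid fixed size, else None."""
    size = schema.get("size")
    if size is None:
        return f"fixed {schema['name']!r} has no size"
    if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
        return f"fixed {schema['name']!r} has invalid size {size!r}"
    return None

md5 = {"type": "fixed", "name": "MD5"}
print(fixed_size_error(md5))                  # fixed 'MD5' has no size
print(fixed_size_error({**md5, "size": 16}))  # None

digest = hashlib.md5(b"hello").digest()
print(len(digest))                            # 16, matching "size": 16
```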

DEV Community
