<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mikhail Shytsko</title>
    <description>The latest articles on DEV Community by Mikhail Shytsko (@mikh-sh).</description>
    <link>https://dev.to/mikh-sh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3948102%2Fa736e791-77e9-40e8-b6df-1059059f6f5e.jpg</url>
      <title>DEV Community: Mikhail Shytsko</title>
      <link>https://dev.to/mikh-sh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mikh-sh"/>
    <language>en</language>
    <item>
      <title>Your Test Data Is Type-Correct and Still Invalid: 6 Postgres Schema Features Generators Skip</title>
      <dc:creator>Mikhail Shytsko</dc:creator>
      <pubDate>Mon, 01 Jun 2026 19:25:26 +0000</pubDate>
      <link>https://dev.to/mikh-sh/your-test-data-is-type-correct-and-still-invalid-6-postgres-schema-features-generators-skip-4c7m</link>
      <guid>https://dev.to/mikh-sh/your-test-data-is-type-correct-and-still-invalid-6-postgres-schema-features-generators-skip-4c7m</guid>
      <description>&lt;h1&gt;
  
  
  Your Test Data Is Type-Correct and Still Invalid: 6 Postgres Schema Features Generators Skip
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Composite primary keys, partial unique indexes, cross-column CHECK constraints, JSONB shape, &lt;code&gt;GENERATED ALWAYS&lt;/code&gt; columns, and row-level security all reject type-correct data, because column types are not what your schema actually enforces.&lt;/p&gt;

&lt;p&gt;A few months ago I watched a seed run finish with a clean green summary: every column populated, every type correct, a few thousand rows inserted. The first integration test then failed on an &lt;code&gt;INSERT&lt;/code&gt; the application itself ran. The generated data was valid the way a sentence with correct grammar can still be a lie. Each value matched its column type. The combination of values broke a constraint the generator never looked at.&lt;/p&gt;

&lt;p&gt;That gap has a simple cause. A column type is a per-column promise: this is an &lt;code&gt;integer&lt;/code&gt;, this is &lt;code&gt;text&lt;/code&gt;, this is &lt;code&gt;jsonb&lt;/code&gt;. Most of what a real schema enforces is not per-column. It lives one level up: across columns in a row, across rows in a table, or across the role doing the writing. A generator that thinks in columns produces data that is type-correct and still invalid.&lt;/p&gt;

&lt;p&gt;Here are six places that gap shows up in Postgres, what each one actually enforces, and a query you can run to see whether your own generated data respects it.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Composite primary keys: the tuple is unique, not the columns
&lt;/h2&gt;

&lt;p&gt;A composite primary key enforces uniqueness over the &lt;em&gt;combination&lt;/em&gt; of columns, not over each column on its own. The docs put it plainly: "the combination of values in the indicated columns is unique across the whole table, though any one of the columns need not be (and ordinarily isn't) unique."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;enrollment&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;course_id&lt;/span&gt;  &lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;course_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A column-by-column generator handles this badly in two opposite ways. It either treats &lt;code&gt;student_id&lt;/code&gt; as a unique key and never lets a student enroll in two courses, or it generates both columns independently and produces duplicate &lt;code&gt;(student_id, course_id)&lt;/code&gt; pairs that collide on insert. Both are wrong, and the second one only surfaces once enough rows exist to cause a collision, usually in CI rather than on a laptop with ten rows.&lt;/p&gt;

&lt;p&gt;There is a second trap here: a primary key forces every participating column to &lt;code&gt;NOT NULL&lt;/code&gt;. Adding a primary key "will force the column(s) to be marked &lt;code&gt;NOT NULL&lt;/code&gt;," so a generator that emits an occasional &lt;code&gt;NULL&lt;/code&gt; for a nullable-looking integer will fail against a column it didn't realize was mandatory.&lt;/p&gt;

&lt;p&gt;To count the duplicate tuples your data would reject:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;course_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;enrollment&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;course_id&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If that returns any rows, your generator is treating a tuple constraint as a set of column constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Partial unique indexes: uniqueness with a WHERE clause
&lt;/h2&gt;

&lt;p&gt;This is the one I see missed most often, because it isn't a constraint at all. It's an index, and generators that introspect constraints never see it. A partial unique index enforces uniqueness "among the rows that satisfy the index predicate, without constraining those that do not."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;one_active_subscription&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That index says: a user may have many subscriptions, but only one &lt;em&gt;active&lt;/em&gt; one. A generator that produces realistic-looking subscription histories, several rows per user with a mix of statuses, will happily hand two of them &lt;code&gt;status = 'active'&lt;/code&gt; and hit a unique violation that exists only for the active subset. Nothing in the column types hints at it. Nothing in the foreign keys hints at it. The rule lives in a &lt;code&gt;WHERE&lt;/code&gt; clause on an index.&lt;/p&gt;

&lt;p&gt;Diagnostic, to find the predicate subset that would collide:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Worth knowing: you cannot express this as a table constraint with &lt;code&gt;ALTER TABLE ... ADD CONSTRAINT&lt;/code&gt;. Partial uniqueness only exists through &lt;code&gt;CREATE UNIQUE INDEX ... WHERE&lt;/code&gt;, which is exactly why constraint-only introspection misses it.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. CHECK constraints: the rule that spans two columns
&lt;/h2&gt;

&lt;p&gt;A CHECK constraint can reference more than one column in the same row, and that cross-column form is where generated data falls down. A per-column generator picks each value in isolation, so it has no way to satisfy a rule that relates two of them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;bookings&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;starts_at&lt;/span&gt; &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ends_at&lt;/span&gt;   &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ends_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;starts_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Generate &lt;code&gt;starts_at&lt;/code&gt; and &lt;code&gt;ends_at&lt;/code&gt; independently from a plausible date range and roughly half your rows will have an end before the start. Every value is a valid timestamp. The row is still rejected.&lt;/p&gt;

&lt;p&gt;Two details that bite specifically during seeding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NULL passes the check.&lt;/strong&gt; A CHECK is satisfied when its expression is true &lt;em&gt;or&lt;/em&gt; null. The columns above are &lt;code&gt;NOT NULL&lt;/code&gt;, so it can't bite in this example, but the moment a checked column is nullable, &lt;code&gt;CHECK (ends_at &amp;gt; starts_at)&lt;/code&gt; passes on every row where &lt;code&gt;ends_at&lt;/code&gt; is null. On a nullable schema, write the diagnostic as &lt;code&gt;WHERE col IS NULL OR NOT (...)&lt;/code&gt; so those rows aren't silently skipped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;NOT VALID&lt;/code&gt; constraints lie about coverage.&lt;/strong&gt; A constraint added &lt;code&gt;NOT VALID&lt;/code&gt; is enforced for new rows immediately but never checked against existing rows until you run &lt;code&gt;VALIDATE CONSTRAINT&lt;/code&gt;. If you seed into a table that has a &lt;code&gt;NOT VALID&lt;/code&gt; check, the seed is held to the rule even though the old data isn't.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run the constraint expression as a query and count the violations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bookings&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ends_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;starts_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. JSONB: the column type that enforces almost nothing
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;jsonb&lt;/code&gt; column guarantees one thing: the value is valid JSON. It does not enforce keys, required fields, or value types. The structure "is typically unenforced," in the documentation's words. The shape your application depends on lives entirely in application code, not in the column.&lt;/p&gt;

&lt;p&gt;This is a problem for generators in both directions. A naive generator drops &lt;code&gt;'{}'&lt;/code&gt; or a random string-keyed blob into the column, it's valid JSON, the insert succeeds, and the first code path that reads &lt;code&gt;payload-&amp;gt;&amp;gt;'amount'&lt;/code&gt; gets null and breaks far away from the cause. A generator that knows the column is &lt;code&gt;jsonb&lt;/code&gt; still has no schema to generate against, because Postgres never had one to give it.&lt;/p&gt;

&lt;p&gt;You can pull &lt;em&gt;some&lt;/em&gt; of the contract back into the database with a CHECK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
    &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;events_payload_shape&lt;/span&gt;
    &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'type'&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'amount'&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;jsonb_typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'amount'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'number'&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The explicit &lt;code&gt;payload ? 'amount'&lt;/code&gt; test is doing real work. Drop it and a row with no &lt;code&gt;amount&lt;/code&gt; key passes anyway, because &lt;code&gt;jsonb_typeof(payload -&amp;gt; 'amount')&lt;/code&gt; returns SQL NULL on a missing key, and a CHECK is satisfied by NULL. It's the same trap as Section 3, hiding in a different operator.&lt;/p&gt;

&lt;p&gt;Diagnostic, with absence treated as a violation rather than skipped over:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
   &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'type'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;jsonb_typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'amount'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'number'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;IS DISTINCT FROM&lt;/code&gt; is the key move: unlike &lt;code&gt;= 'number'&lt;/code&gt;, it returns true (a violation) when the left side is NULL, so a missing &lt;code&gt;amount&lt;/code&gt; is counted instead of silently passing. If you have no such CHECK and no such query, your generated JSON is "valid" only in the sense that Postgres accepts it, not in the sense that your application will.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Generated columns: the value you must not write
&lt;/h2&gt;

&lt;p&gt;A generated column is computed from other columns, and the docs are blunt about it: "A generated column cannot be written to directly." Try to insert one anyway and Postgres returns &lt;code&gt;cannot insert a non-DEFAULT value into column&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;line_items&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;quantity&lt;/span&gt;    &lt;span class="nb"&gt;integer&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;unit_price&lt;/span&gt;  &lt;span class="nb"&gt;numeric&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;total_price&lt;/span&gt; &lt;span class="nb"&gt;numeric&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;unit_price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;STORED&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A generator that builds its column list by reading every column in the table will try to insert &lt;code&gt;total_price&lt;/code&gt; and fail on the first row. A generator that simply skips unknown columns might omit a column that &lt;em&gt;isn't&lt;/em&gt; generated. Knowing which columns are write-protected is the difference, and it is not visible from the column type. As far as the type system is concerned, &lt;code&gt;total_price&lt;/code&gt; is just a &lt;code&gt;numeric&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;There's a version trap here worth flagging, because it changed recently. &lt;code&gt;STORED&lt;/code&gt; generated columns arrived in Postgres 12, and through Postgres 17 the keyword was &lt;strong&gt;required&lt;/strong&gt;: omit it and you get a syntax error, so every generated column on those versions is stored. Postgres 18 made the keyword optional and &lt;strong&gt;virtual&lt;/strong&gt; the default, so an unqualified &lt;code&gt;GENERATED ALWAYS AS (...)&lt;/code&gt; is now a virtual column computed at read time. That bites on upgrade. Virtual columns can't be indexed yet (planned for a later release), so a column that used to be &lt;code&gt;STORED&lt;/code&gt; and indexed becomes un-indexable the moment someone drops the keyword. Always write &lt;code&gt;STORED&lt;/code&gt; explicitly when you mean stored. The no-direct-write restriction applies to both kinds.&lt;/p&gt;

&lt;p&gt;List the generated columns a writer must never populate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;column_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_generated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generation_expression&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;table_schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'public'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'line_items'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;is_generated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ALWAYS'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Row-level security: "valid data" depends on who is writing
&lt;/h2&gt;

&lt;p&gt;Row-level security is the feature that breaks the assumption underneath all the others, namely that whether a row is valid is a property of the row. Under RLS it is a property of the row &lt;em&gt;and the role doing the write&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="n"&gt;ENABLE&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;LEVEL&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;tenant_isolation&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt;
    &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
    &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_setting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.tenant_id'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_setting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.tenant_id'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;WITH CHECK&lt;/code&gt; clause is applied to every &lt;code&gt;INSERT&lt;/code&gt; and &lt;code&gt;UPDATE&lt;/code&gt;: a row whose &lt;code&gt;tenant_id&lt;/code&gt; doesn't match the current tenant is rejected even though every value in it is type-correct and constraint-clean. Two facts compound this for anyone generating data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The table owner bypasses RLS by default.&lt;/strong&gt; Seed as the owner and every policy stays silent, so the data looks fine until the application connects as a normal role and the same rows turn out to be invisible or the same inserts get refused. Unless the table has &lt;code&gt;FORCE ROW LEVEL SECURITY&lt;/code&gt;, your seed never exercised the policy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Superusers and &lt;code&gt;BYPASSRLS&lt;/code&gt; roles skip policies entirely.&lt;/strong&gt; Seeding scripts often connect as exactly these privileged roles, so generate through one and you've tested nothing about the rules that govern real traffic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Diagnostic, to see which tables have policies your seed role might be quietly bypassing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;schemaname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;policyname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_policies&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If those tables matter and your generator connects as the owner, your "valid" rows were never measured against the rules that decide validity in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the tools actually sit
&lt;/h2&gt;

&lt;p&gt;None of this means generated test data is a bad idea. It means the question to ask a generator is not "does it produce realistic values" but "how much of the schema does it treat as input." Roughly three tiers exist today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free and DIY.&lt;/strong&gt; &lt;a href="https://github.com/joke2k/faker" rel="noopener noreferrer"&gt;Faker&lt;/a&gt;, ORM seeders, and hand-written scripts generate values per column. Relationships, table-level constraints, and the features above stay your job, in your code, kept in sync by hand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema-aware generators.&lt;/strong&gt; A newer middle tier reads the schema and treats its structure as a first-class input. &lt;a href="https://github.com/nucleuscloud/neosync" rel="noopener noreferrer"&gt;Neosync&lt;/a&gt; and Seedfast are two of several tools that take this approach in different ways. The honest trade-off is that "schema-aware" is a spectrum. Foreign keys and &lt;code&gt;NOT NULL&lt;/code&gt; are widely handled, while partial unique indexes, cross-column CHECKs, and RLS are exactly where coverage varies between tools, so it's worth testing each against your own constraints rather than trusting the label.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise TDM platforms.&lt;/strong&gt; Anonymization and masking suites that transform production data. They cover a lot, at the cost of needing production access and a setup measured in weeks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The useful question isn't which price tier a tool sits in, it's how far past column types it actually looks. Run your real constraints through any candidate before you trust it, whatever tier it claims.&lt;/p&gt;

&lt;h2&gt;
  
  
  A quick decision table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;If your schema relies on...&lt;/th&gt;
&lt;th&gt;Before trusting generated rows, check...&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Composite primary keys&lt;/td&gt;
&lt;td&gt;duplicate tuples, and &lt;code&gt;NULL&lt;/code&gt; in any PK column&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Partial unique indexes&lt;/td&gt;
&lt;td&gt;uniqueness only within the predicate subset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-column CHECKs&lt;/td&gt;
&lt;td&gt;the expression as a &lt;code&gt;WHERE NOT (...)&lt;/code&gt; query, adding &lt;code&gt;col IS NULL OR&lt;/code&gt; for nullable columns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSONB shape&lt;/td&gt;
&lt;td&gt;required keys and value types via a CHECK or a probe query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;GENERATED ALWAYS&lt;/code&gt; columns&lt;/td&gt;
&lt;td&gt;the writer skips them;&lt;code&gt;STORED&lt;/code&gt; is explicit on PG 18+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Row-level security&lt;/td&gt;
&lt;td&gt;the seed role isn't the owner or &lt;code&gt;BYPASSRLS&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When none of this matters
&lt;/h2&gt;

&lt;p&gt;Plenty of test data doesn't need to clear this bar. A unit test that touches one flat table is better served by a three-line fixture than by anything that introspects a schema. Not every column needs realistic values; sometimes &lt;code&gt;'x'&lt;/code&gt; and &lt;code&gt;1&lt;/code&gt; are the honest choice because the test doesn't care. And there's a class of rules no generator can infer, the domain invariants that live only in your head or your application, like "a refund row must point at a captured payment." Those you encode yourself, or you assert in the test.&lt;/p&gt;

&lt;p&gt;Generators aren't the problem here. The trouble is that "the insert succeeded" and "the data is valid" are two different claims, and the gap between them lives in exactly the schema features that never show up in a column type. Run the six queries above against your own generated data before you trust the green summary.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>supabase</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
