<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Marcus</title>
    <description>The latest articles on DEV Community by Marcus (@marcus1968).</description>
    <link>https://dev.to/marcus1968</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4009616%2F12158dea-c717-41b8-be68-130eacdf65ed.jpg</url>
      <title>DEV Community: Marcus</title>
      <link>https://dev.to/marcus1968</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/marcus1968"/>
    <language>en</language>
    <item>
      <title>Data Quality: Dimensions and Error Classes — the Theory Behind the SQL Checks</title>
      <dc:creator>Marcus</dc:creator>
      <pubDate>Sat, 04 Jul 2026 15:38:03 +0000</pubDate>
      <link>https://dev.to/marcus1968/data-quality-dimensions-and-error-classes-the-theory-behind-the-sql-checks-47bb</link>
      <guid>https://dev.to/marcus1968/data-quality-dimensions-and-error-classes-the-theory-behind-the-sql-checks-47bb</guid>
      <description>&lt;p&gt;A lot gets written about data quality, and very little gets measured. The German-language practitioner's standard reference alone lists sixty possible quality criteria — from timeliness to reliability —, and even the lean models still arrive at six to fifteen dimensions. Yet the core of the matter is surprisingly tangible: a data error caught during loading costs you a log entry. The same error running through to the report costs a wrong invoice, a bad decision — and the trust in every number that follows. This article makes the unde&lt;/p&gt;

&lt;p&gt;rlying framework tangible, without a theory slog: it sorts the common data errors into classes, maps them to the established dimensions of data quality — and uses an honest coverage map to show which of them three generic SQL routines actually cover. And it names the two they cannot reach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The essentials up front:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data errors sort into a handful of &lt;strong&gt;classes&lt;/strong&gt; — technical vs. business, and field-level vs. record-level vs. relationship-level. That is enough to place any concrete error.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;dimensions of data quality&lt;/strong&gt; are established canon, not taste — defined criteria in the German-language standard reference by Apel et al., internationally in Wang &amp;amp; Strong, DAMA, and ISO/IEC 25012.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generic SQL checks four criteria directly&lt;/strong&gt; in the database: completeness, validity, uniqueness, and referential integrity/consistency. &lt;strong&gt;Two stay out of reach&lt;/strong&gt; — content accuracy and timeliness; both need information from &lt;em&gt;outside&lt;/em&gt; the database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checking early is cheap, repairing late is expensive:&lt;/strong&gt; prevention at the source cuts total costs by roughly two thirds (Redman, as cited by Apel et al.) — and 100 % data quality is not the economic target anyway.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; none. This article provides the framing before the SQL practice begins; the theory part needs no SQL knowledge. If you want the how right away, head for the framework article &lt;a href="https://sql.marcus-belz.de/en/data-quality-checks-with-sql/" rel="noopener noreferrer"&gt;Data quality checks with SQL&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What data errors look like: a classification&lt;/li&gt;
&lt;li&gt;The dimensions of data quality&lt;/li&gt;
&lt;li&gt;What can SQL actually check?&lt;/li&gt;
&lt;li&gt;The two blind spots&lt;/li&gt;
&lt;li&gt;What bad data costs&lt;/li&gt;
&lt;li&gt;From theory to practice&lt;/li&gt;
&lt;li&gt;FAQ&lt;/li&gt;
&lt;li&gt;Related Articles&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What data errors look like: a classification
&lt;/h2&gt;

&lt;p&gt;Before slicing data quality into dimensions, it helps to look at the errors themselves. They can be sorted along two axes, and together the two axes cover practically every case.&lt;/p&gt;

&lt;p&gt;The first axis is &lt;strong&gt;technical versus business&lt;/strong&gt;. A technical (syntactic) error violates the form: letters in a numeric column, a date that is no date, a foreign key pointing nowhere. You can spot these errors without any domain knowledge — they break the rules of the data type or the model. A business (semantic) error, by contrast, is formally flawless and still wrong: an age of 200, a ship date before the order date, a customer flagged as "premium" without a single order. Only domain knowledge makes the error visible.&lt;/p&gt;

&lt;p&gt;The second axis is the &lt;strong&gt;scope&lt;/strong&gt; of the error:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;field-level&lt;/strong&gt; — the error sits in a single value (&lt;code&gt;age = 200&lt;/code&gt;, &lt;code&gt;"abc"&lt;/code&gt; in a numeric column).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;record-level&lt;/strong&gt; — the error only emerges from the interplay of several fields in one row (&lt;code&gt;ship_date &amp;lt; order_date&lt;/code&gt;) or between rows of one table (two records sharing the same key).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;relationship-level&lt;/strong&gt; — the error lies in the relationship between tables (a reference whose target does not exist).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The two axes combine into a compact taxonomy in which every concrete error gets a place and a responsible check routine:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Frmip929qpdek14o0e1p6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Frmip929qpdek14o0e1p6.png" alt="Error-class taxonomy: a matrix of technical/syntactic versus business/semantic on one axis and field-level, record-level, relationship-level on the other. Each cell names a concrete example and the violated criterion; a band below marks the missing value as the special case completeness, cutting across all cells." width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;technical / syntactic&lt;/th&gt;
&lt;th&gt;business / semantic&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;field-level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;"abc"&lt;/code&gt; in a numeric column, an invalid date → &lt;em&gt;validity&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;age = 200&lt;/code&gt;, a negative price → &lt;em&gt;validity (business rule)&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;record-level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;two records with the same primary key → &lt;em&gt;uniqueness&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ship_date &amp;lt; order_date&lt;/code&gt; → &lt;em&gt;consistency (cross-field)&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;relationship-level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;foreign key pointing to a missing master → &lt;em&gt;consistency (integrity)&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;"premium" customer without an order → &lt;em&gt;business rule&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One special case falls outside the grid: the value that is &lt;strong&gt;missing&lt;/strong&gt;. An empty mandatory field is neither technically nor semantically malformed — it simply is not there. A criterion of its own is responsible for it, completeness — it cannot be pinned to any single cell but cuts across the whole matrix. Two more kinds of error are absent from the matrix entirely — the value that is well-formed and still &lt;em&gt;factually&lt;/em&gt; wrong, and the value that is &lt;em&gt;outdated&lt;/em&gt;. We will come back to both; they are the blind spots.&lt;/p&gt;

&lt;h2&gt;
  
  
  The dimensions of data quality
&lt;/h2&gt;

&lt;p&gt;The classification says &lt;em&gt;what&lt;/em&gt; an error looks like. The &lt;strong&gt;dimensions of data quality&lt;/strong&gt; say &lt;em&gt;which property&lt;/em&gt; of good data it violates. There is no shortage of theory here — rather a surplus, and that is why the term so often stays fuzzy.&lt;/p&gt;

&lt;p&gt;The German-language practitioner's standard reference on data quality in business intelligence projects is &lt;em&gt;Datenqualität erfolgreich steuern&lt;/em&gt; ("Managing Data Quality Successfully") by Apel, Behme, Eberlein, and Merighi (3rd edition, Edition TDWI). Following Würthele, it defines data quality as a "multidimensional measure of the suitability of data to fulfill the purpose tied to its capture/generation" — a suitability that can change over time as needs change (all book quotes translated from the German original). Two things are already baked into this definition: quality is &lt;strong&gt;multidimensional&lt;/strong&gt;, and it is &lt;strong&gt;purpose-bound&lt;/strong&gt;. There is no absolute "good", only a "good enough for this purpose".&lt;/p&gt;

&lt;p&gt;How many dimensions there are is a matter of definition. The book starts with an alphabetical catalog of sixty possible quality criteria and narrows it down to the practically viable ones; for the business intelligence context it highlights six — accuracy, consistency, reliability, completeness, timeliness, and relevance. For this article, the criteria that matter are the ones data errors can technically be pinned to:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;What it demands (per Apel et al.)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Completeness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The attributes are populated with values that "semantically differ from the value &lt;code&gt;NULL&lt;/code&gt; (unknown)"; no data gets lost in transformations.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Validity&lt;/strong&gt; &lt;em&gt;(in the book: formal accuracy + uniformity)&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;The values arrive in the predefined format and are represented uniformly — in practice, the allowed value range belongs here too.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Uniqueness&lt;/strong&gt; &lt;em&gt;(in the book: freedom from redundancy + key uniqueness)&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;No record describes the same real-world entity twice; the business key occurs only as often as it may.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Referential integrity / consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every foreign key uniquely references an existing primary key; values do not contradict each other — within a record, between records, across applications.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accuracy (content)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The values match the entities of the real world — the data corresponds to reality.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Timeliness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The records reflect the current state of the modeled world and are not outdated.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two names in this table each bundle two of the book's criteria — the book slices finer than international usage does. &lt;strong&gt;Validity&lt;/strong&gt; does appear in the catalog, but the actual definition lives in the &lt;em&gt;formal&lt;/em&gt; component of accuracy (delivery in the predefined data format) and in uniformity. And what DAMA and ISO call &lt;strong&gt;uniqueness&lt;/strong&gt;, the book lists as freedom from redundancy and key uniqueness; its own criterion named "uniqueness" means something else there, namely the unambiguous interpretability of a record through its metadata. Key uniqueness the book phrases in terms of primary keys — yet the check is only meaningful on the &lt;em&gt;business&lt;/em&gt; key, because a constraint-enforced primary key cannot occur twice in the first place. The check gets interesting exactly where the constraint is (still) missing: in staging tables and at interfaces. This article sticks to the common names and means the book criteria listed above.&lt;/p&gt;

&lt;p&gt;That several such lists exist is not a contradiction but a method: group the criteria and you get a &lt;em&gt;quality model&lt;/em&gt;. The book presents two side by side — a theoretical taxonomy after Hinrichs, and the categorization of the German Society for Information and Data Quality (DGIQ), derived from a user survey and in turn based on the much-cited study by &lt;strong&gt;Wang and Strong (1996)&lt;/strong&gt; with its fifteen dimensions. Internationally, the &lt;strong&gt;six dimensions of DAMA UK&lt;/strong&gt; (2013) and the &lt;strong&gt;ISO/IEC 25012&lt;/strong&gt; standard are widely used as well. They are different cuts through the same subject — which one fits depends on the purpose. For the rest of this article we stay with the book's criteria, because they point most directly at an SQL check.&lt;/p&gt;

&lt;h2&gt;
  
  
  What can SQL actually check?
&lt;/h2&gt;

&lt;p&gt;Now for the decisive question: how many of these criteria does a handful of generic SQL checks actually reach? The practical framework of the &lt;a href="https://sql.marcus-belz.de/en/data-quality-checks-with-sql/" rel="noopener noreferrer"&gt;SQL check series&lt;/a&gt; builds on three routines — a value-based &lt;code&gt;WHERE&lt;/code&gt; check, a duplicate check, and a foreign-key check. Plot them against the criteria and you get this coverage map:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fnjj4p3u8l0wc0qrczmtb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fnjj4p3u8l0wc0qrczmtb.png" alt="Coverage map: three SQL routines (value check, duplicate check, foreign-key check) against the data quality criteria per Apel et al. Four criteria — completeness, validity, uniqueness, and referential integrity/consistency — are marked as covered, two — accuracy and timeliness — as gaps." width="799" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;verifiable with SQL?&lt;/th&gt;
&lt;th&gt;Routine&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Completeness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ yes&lt;/td&gt;
&lt;td&gt;value check (&lt;code&gt;IS NULL&lt;/code&gt; on mandatory fields)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Validity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ yes&lt;/td&gt;
&lt;td&gt;value check (value range, format, safe type conversion)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Uniqueness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ yes&lt;/td&gt;
&lt;td&gt;duplicate check (&lt;code&gt;GROUP BY … HAVING count(*) &amp;gt; 1&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Referential integrity / consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ partly&lt;/td&gt;
&lt;td&gt;foreign-key check; cross-field rules in the &lt;code&gt;WHERE&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accuracy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ no&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Timeliness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ no&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Four criteria, then, are within reach of generic, configurable SQL — and with surprisingly little code. "Verifiable" means: SQL &lt;em&gt;executes&lt;/em&gt; the rule; the domain has to formulate it — a &lt;code&gt;price &amp;gt; 0&lt;/code&gt; is written into no engine by default. &lt;strong&gt;Completeness&lt;/strong&gt; is an &lt;code&gt;IS NULL&lt;/code&gt; check on the mandatory columns — in the book's wording, "the individual attributes contain no &lt;code&gt;NULL&lt;/code&gt; values". &lt;strong&gt;Validity&lt;/strong&gt; is a value-range or format check; its classic case is safe type conversion, which — for example with &lt;a href="https://sql.marcus-belz.de/en/dataquality-converting-data-to-date-datetime-datetime2-time/" rel="noopener noreferrer"&gt;TRY_CONVERT&lt;/a&gt; in SQL Server — turns an invalid value into &lt;code&gt;NULL&lt;/code&gt; instead of aborting the load (more in the &lt;a href="https://sql.marcus-belz.de/en/type-conversion-basics-t-sql/" rel="noopener noreferrer"&gt;type conversion basics&lt;/a&gt;). &lt;strong&gt;Uniqueness&lt;/strong&gt; is a grouping over the business key with &lt;code&gt;HAVING count(*) &amp;gt; 1&lt;/code&gt;. &lt;strong&gt;Referential integrity&lt;/strong&gt; — in the book's wording, "every foreign key must uniquely reference an existing primary key" — is a foreign-key check.&lt;/p&gt;

&lt;p&gt;The "partly" for consistency deserves honesty — and a distinction: referential integrity is only the relational special case of consistency; the book explicitly calls its key criteria a "special alignment toward the relational database model". Consistency itself reaches further: the book explicitly includes the &lt;em&gt;reconciliation of data across different applications&lt;/em&gt; — that a value does not contradict itself across systems. A single SQL check in &lt;em&gt;one&lt;/em&gt; database cannot deliver that cross-system reconciliation; the variants checkable &lt;em&gt;inside&lt;/em&gt; the database (referential integrity, cross-field rules within a row) it covers cleanly. The boundary, however, is set by the process, not the technology: bring the data needed for the reconciliation — the master from the other system, the reference table of the other application — into the ETL process, and they sit side by side in &lt;em&gt;one&lt;/em&gt; database; the cross-system reconciliation becomes an ordinary SQL check. The foreign-key check then runs against a master that originally came from an entirely different system.&lt;/p&gt;

&lt;p&gt;The book draws this line itself, by the way — just elsewhere: in its example metrics per criterion (chapter 7, table 7–3), completeness is an automated "number of &lt;code&gt;NULL&lt;/code&gt; values" query, while content accuracy and timeliness list "user feedback" as the measuring method. Where no query can reach, a human has to answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two blind spots
&lt;/h2&gt;

&lt;p&gt;That leaves the two criteria where SQL alone has to pass — not because of a weakness of the language, but because the required information simply is not in the database. Both sit in the book's catalog right next to the checkable ones.&lt;/p&gt;

&lt;p&gt;The first blind spot is &lt;strong&gt;timeliness&lt;/strong&gt;. Whether an address is "current" cannot be read off the address itself — it looks identical yesterday and today. Timeliness demands that the data reflect "the current state of the modeled world", as the book puts it — and that takes a &lt;em&gt;temporal reference&lt;/em&gt;: a timestamp of when the value was last confirmed, and an expectation of how long it stays valid. If such a &lt;code&gt;last_verified&lt;/code&gt; field exists, SQL can check the rule (&lt;code&gt;last_verified &amp;lt; now() - interval '1 year'&lt;/code&gt;) — but then it checks the completeness and validity &lt;em&gt;of that timestamp&lt;/em&gt;, not the currency of the actual value. Without the timestamp, the criterion is invisible to a pure database check.&lt;/p&gt;

&lt;p&gt;The second blind spot is &lt;strong&gt;accuracy&lt;/strong&gt; — and it is the subtlest point of this whole article, because the book defines it as having "a content component and a formal component". The &lt;em&gt;formal&lt;/em&gt; side — the right data type, the predefined format — is exactly validity, and thus checkable. The &lt;em&gt;content&lt;/em&gt; side is not: a birth date of &lt;code&gt;1990-05-14&lt;/code&gt; is perfectly valid — right type, plausible range, clean format. Whether it is the person's &lt;em&gt;actual&lt;/em&gt; birth date, the database does not know and cannot know. That would take an external source of truth: the ID document, a population register, a second independent record. SQL compares data against &lt;em&gt;rules&lt;/em&gt;, not against the &lt;em&gt;world&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This makes the boundary more precise: accuracy is unverifiable &lt;em&gt;as long as the source of truth lies outside the database&lt;/em&gt;. As soon as it becomes available as data — say, a leading system that is co-extracted in the ETL process, the same pattern as with the cross-system consistency reconciliation —, the check becomes an ordinary SQL comparison: source against leading system, deviation equals finding. The typical case is data migration, where the target system already knows many of the records; how to build such reconciliations systematically is shown in the article on &lt;a href="https://sql.marcus-belz.de/en/verify-migration-data-quality/" rel="noopener noreferrer"&gt;verifying a migration&lt;/a&gt;. What is checked then, however, is agreement with the &lt;em&gt;designated&lt;/em&gt; source of truth — whether that source itself agrees with the world remains a governance decision, not an SQL question.&lt;/p&gt;

&lt;p&gt;This boundary is not an excuse but the honest core of the matter: an SQL check guarantees that data is &lt;em&gt;well-formed and internally coherent&lt;/em&gt; — not that it is &lt;em&gt;true&lt;/em&gt;. Whoever says "automatically checked" has to name the two criteria left out — otherwise they are selling a green checkmark as proof of truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  What bad data costs
&lt;/h2&gt;

&lt;p&gt;You will find plenty of charts online along the lines of "37 % of all data errors are completeness errors". No such chart appears here, deliberately — there is no citable primary source for a reliable frequency distribution per dimension, and a made-up number would be exactly the kind of data error this article is about.&lt;/p&gt;

&lt;p&gt;What &lt;em&gt;is&lt;/em&gt; citable is the economics behind it — and it has two sides. The first concerns &lt;em&gt;when&lt;/em&gt; you check. The &lt;strong&gt;1-10-100 rule&lt;/strong&gt; originally comes from quality management (Labovitz and Chang, &lt;em&gt;Making Quality Work&lt;/em&gt;, 1992) and has since been applied to data quality: what costs &lt;strong&gt;1&lt;/strong&gt; to prevent at the source costs &lt;strong&gt;10&lt;/strong&gt; to correct afterwards and &lt;strong&gt;100&lt;/strong&gt; if you let the error take effect — as a wrong invoice, a lost customer, a bad decision. The specific factors are a rule of thumb, not a law of nature; the direction, though, is undisputed and matches everyday experience: an error gets more expensive the later it surfaces. That is exactly why checking &lt;em&gt;at the source&lt;/em&gt; (staging) pays off, before the data flows on — there an error is still reportable, at the target it is fatal.&lt;/p&gt;

&lt;p&gt;The second side concerns not &lt;em&gt;when&lt;/em&gt; you check but &lt;em&gt;how far&lt;/em&gt; you should push data quality. In their book &lt;em&gt;Datenqualität erfolgreich steuern&lt;/em&gt; (3rd edition, Edition TDWI, figure 3–2, p. 43), Apel, Behme, Eberlein, and Merighi contrast two opposing cost curves: the &lt;strong&gt;costs caused by poor data quality&lt;/strong&gt; fall as quality rises — bad decisions, rework, and lost customers become rarer. The &lt;strong&gt;costs of producing and assuring good quality&lt;/strong&gt; rise instead, and disproportionately so the closer you get to 100 %. The sum of both — the &lt;em&gt;total quality cost&lt;/em&gt; — has its minimum not at 100 % but at an optimum in between. That is the core message: 100 % data quality is rarely the economic target; what you are looking for is the most cost-effective combination for the purpose at hand.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6z7tpr4y10bfcu8omodc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F6z7tpr4y10bfcu8omodc.png" alt="Two opposing cost curves over data quality (x-axis up to 100 percent, y-axis costs): the costs caused by poor data quality fall as quality rises, the costs of improving and assuring it rise and shoot up toward 100 percent. A dashed sum curve hints at the total costs; a gray band at the intersection marks their minimum — the economic optimum, well before 100 percent." width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Own rendering after Apel et al., "Datenqualität erfolgreich steuern" (3rd edition, Edition TDWI, figure 3–2, p. 43) — the classic total-cost-of-quality model.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The chart is deliberately simplified — a principle, not a measurement; the optimum shifts with data and purpose. Two practical consequences follow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Achieve a lot with little effort.&lt;/strong&gt; The production curve stays flat for a long stretch and only explodes at the top end. The cheap early stretch is the obvious &lt;code&gt;NULL&lt;/code&gt; values, broken formats, and duplicates that generic SQL extracts with little effort — exactly the four checkable criteria. The last few percent, and even more so the two uncheckable criteria (accuracy and timeliness), cost far more and are only worth it where the domain demands it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prevent early instead of cleaning up late.&lt;/strong&gt; Prevention at the source is the strongest lever: avoiding errors there, instead of detecting and cleansing them later, cuts total costs by roughly two thirds on average, according to the research cited in the same chapter (Redman 2008).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  From theory to practice
&lt;/h2&gt;

&lt;p&gt;The framework is in place: errors sort into a handful of classes, good data describes itself in a handful of criteria, and four of them are checkable with generic SQL. The road from here into practice runs through four articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;framework&lt;/strong&gt; — &lt;a href="https://sql.marcus-belz.de/en/data-quality-checks-with-sql/" rel="noopener noreferrer"&gt;Data quality checks with SQL&lt;/a&gt; — builds the shared error table, the severity gate, and the configurable runner that generates the three routines at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validity + completeness&lt;/strong&gt; are deepened in &lt;a href="https://sql.marcus-belz.de/en/validate-data-with-sql/" rel="noopener noreferrer"&gt;Validate data with SQL&lt;/a&gt; — value ranges, mandatory fields, and the distinction between "no value" and "unknown value".&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uniqueness&lt;/strong&gt; is deepened in &lt;a href="https://sql.marcus-belz.de/en/find-duplicates-with-sql/" rel="noopener noreferrer"&gt;Find duplicates with SQL&lt;/a&gt; — from &lt;code&gt;count(*) &amp;gt; 1&lt;/code&gt; to composite keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency / integrity&lt;/strong&gt; is deepened in &lt;a href="https://sql.marcus-belz.de/en/find-orphaned-records-sql/" rel="noopener noreferrer"&gt;Find orphaned records&lt;/a&gt; — checking referential integrity, even without a foreign-key constraint.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the dimensions in mind, these four articles no longer read as loose tricks but as what they are: one answer each to one measurable property of good data.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What data quality dimensions are there?&lt;/strong&gt;&lt;br&gt;
It depends on the model. The German-language practitioner's standard reference (Apel et al., &lt;em&gt;Datenqualität erfolgreich steuern&lt;/em&gt;) lists sixty possible quality criteria and highlights six for the business intelligence context: accuracy, consistency, reliability, completeness, timeliness, and relevance. Internationally, the six dimensions of DAMA UK (2013) and the fifteen characteristics of ISO/IEC 25012 are widespread; the academic origin is the fifteen dimensions of Wang &amp;amp; Strong (1996). They all describe the same subject at different resolutions — which list fits depends on the purpose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the difference between validity and accuracy?&lt;/strong&gt;&lt;br&gt;
Validity means: the value conforms to the rules — right type, valid format, allowed value range. Accuracy means: the value corresponds to reality. The standard reference bundles both under "accuracy" (a content and a formal component) — but for an SQL check, exactly this separation is decisive: a birth date can be perfectly valid (well-formed, plausible) and still factually wrong. SQL checks the formal side, because the rules live in the database; content agreement with reality needs an external source of truth and cannot be checked with SQL alone. There is one exception: if a designated source of truth is available as data — such as the leading system in a data migration —, the accuracy check becomes an SQL reconciliation against that source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How many data quality dimensions can you check with SQL?&lt;/strong&gt;&lt;br&gt;
Four criteria can be checked with generic SQL directly in the database: completeness (&lt;code&gt;IS NULL&lt;/code&gt; check on mandatory fields), validity (value range/format), uniqueness (&lt;code&gt;GROUP BY … HAVING count(*) &amp;gt; 1&lt;/code&gt;), and referential integrity/consistency (foreign-key check). Not checkable are content accuracy and timeliness — both need information from outside the database (an external reference or a timestamp).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the difference between technical and business data errors?&lt;/strong&gt;&lt;br&gt;
A technical (syntactic) error violates the form and is detectable without domain knowledge: letters in a numeric column, a foreign key pointing nowhere. A business (semantic) error is formally correct and still wrong: an age of 200 or a ship date before the order date. Generic SQL catches technical errors easily; business errors need an explicitly formulated business rule.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need a tool to measure data quality?&lt;/strong&gt;&lt;br&gt;
Not necessarily for the four checkable criteria — a handful of generic SQL checks with a central error table covers completeness, validity, uniqueness, and referential integrity. Specialized tools (dbt tests, Great Expectations, Soda) automate and orchestrate this more comfortably and add reporting, but at their core they check the same criteria. Accuracy and timeliness they do not solve either — not without an external reference or a timestamp.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Articles
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://sql.marcus-belz.de/en/data-quality-checks-with-sql/" rel="noopener noreferrer"&gt;Data quality checks with SQL&lt;/a&gt; — the configurable SQL framework that casts the four checkable criteria of this article into three generic routines: shared error table, severity gate, dynamic runner.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://sql.marcus-belz.de/en/validate-data-with-sql/" rel="noopener noreferrer"&gt;Validate data with SQL&lt;/a&gt; — the routine for validity and completeness, including the NULL trap.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://sql.marcus-belz.de/en/find-duplicates-with-sql/" rel="noopener noreferrer"&gt;Find duplicates with SQL&lt;/a&gt; — the routine for uniqueness: maximum cardinality and composite keys.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://sql.marcus-belz.de/en/find-orphaned-records-sql/" rel="noopener noreferrer"&gt;Find orphaned records&lt;/a&gt; — the routine for consistency / referential integrity.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://sql.marcus-belz.de/en/data-quality-in-an-etl-process/" rel="noopener noreferrer"&gt;Data quality in an ETL process&lt;/a&gt; — the bigger picture: where in the ETL process the dimensions are checked and how bad data is isolated.&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Data Migration: SQL Server to PostgreSQL — the Complete Guide</title>
      <dc:creator>Marcus</dc:creator>
      <pubDate>Tue, 30 Jun 2026 12:46:36 +0000</pubDate>
      <link>https://dev.to/marcus1968/data-migration-sql-server-to-postgresql-the-complete-guide-4mgk</link>
      <guid>https://dev.to/marcus1968/data-migration-sql-server-to-postgresql-the-complete-guide-4mgk</guid>
      <description>&lt;p&gt;A &lt;strong&gt;data migration&lt;/strong&gt; from SQL Server to PostgreSQL rarely fails at actually copying the data. It fails at the silent differences that only surface in the target: &lt;code&gt;datetime&lt;/code&gt;, which knows no time zone, &lt;code&gt;bit&lt;/code&gt;, which is not a &lt;code&gt;boolean&lt;/code&gt;, an &lt;code&gt;IDENTITY&lt;/code&gt; that turns into a sequence, and a collation that suddenly compares case-sensitively. Anyone who sets out to &lt;strong&gt;migrate SQL Server to PostgreSQL&lt;/strong&gt; isn't just copying tables — they're translating types, schema, code and behaviour from one engine into another.&lt;/p&gt;

&lt;p&gt;This guide gives the overview: it sorts the move into five phases and names the key trip-ups for each. The depth lives in a dedicated article per phase — this one supplies the through-line that ties the phases together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The essentials up front:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Five phases instead of "moving data":&lt;/strong&gt; the move breaks down into data types, schema/DDL, data transfer, code porting and verification. Tackling them in this order spares you the typical setbacks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Each phase with its own trip-up:&lt;/strong&gt; sometimes it's breaking types, sometimes &lt;code&gt;IDENTITY&lt;/code&gt; becomes a sequence, sometimes it's the transfer method or translating T-SQL into PL/pgSQL. None of them shows up until it strikes in the target — which is why verification comes last.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What tooling takes off your plate:&lt;/strong&gt; tools like &lt;code&gt;pgloader&lt;/code&gt; handle the mechanical ~80% — the uncritical types and the bulk transfer. The last 20% — type edge cases, procedure logic, triggers — stay handwork.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The through-line:&lt;/strong&gt; each phase has its own detailed article with the full depth. This hub connects them and tells you in which order they play together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisite:&lt;/strong&gt; a basic grasp of relational databases. SQL Server 2017+ as the source, PostgreSQL 16/17 as the target. Postgres concepts (sequences, &lt;code&gt;text&lt;/code&gt;, &lt;code&gt;COPY&lt;/code&gt;, PL/pgSQL) are placed in context briefly on first mention — no prior Postgres knowledge required, T-SQL basics yes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Why migrate SQL Server to PostgreSQL?&lt;/li&gt;
&lt;li&gt;The migration path in five phases&lt;/li&gt;
&lt;li&gt;One Table, Several Phases at Once&lt;/li&gt;
&lt;li&gt;What tooling takes off your plate — and where handwork remains&lt;/li&gt;
&lt;li&gt;FAQ&lt;/li&gt;
&lt;li&gt;Related articles&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why migrate SQL Server to PostgreSQL?
&lt;/h2&gt;

&lt;p&gt;The reasons are rarely technical — they're usually economic or strategic. &lt;strong&gt;Licensing costs&lt;/strong&gt; fall away: PostgreSQL is open source and usable without core or CAL licences. The &lt;strong&gt;open-source stack&lt;/strong&gt; runs freely on any infrastructure, with no vendor lock-in. And &lt;strong&gt;cloud portability&lt;/strong&gt; is high — nearly every provider offers managed PostgreSQL.&lt;/p&gt;

&lt;p&gt;This is no holy war over "Postgres is better than SQL Server". Both engines are mature, and not every workload belongs on the move. The point is sober: once the decision to &lt;strong&gt;migrate SQL Server to PostgreSQL&lt;/strong&gt; has been made, it pays to plan the move as a structured path rather than a one-off copy operation. That's exactly what this guide is for.&lt;/p&gt;

&lt;h2&gt;
  
  
  The migration path in five phases
&lt;/h2&gt;

&lt;p&gt;A database migration is not a single step but a chain of dependent phases. Tackling them in this order avoids the typical setbacks — such as transferring data before the target schema has the right types. Each phase has its own detailed article.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Data types.&lt;/strong&gt; The foundation. Most types convert one-to-one (&lt;code&gt;int&lt;/code&gt;, &lt;code&gt;numeric&lt;/code&gt;, &lt;code&gt;varchar&lt;/code&gt;, &lt;code&gt;date&lt;/code&gt;) — but &lt;code&gt;datetime&lt;/code&gt;, &lt;code&gt;bit&lt;/code&gt;, &lt;code&gt;money&lt;/code&gt;, &lt;code&gt;uniqueidentifier&lt;/code&gt;, &lt;code&gt;nvarchar&lt;/code&gt; and &lt;code&gt;tinyint&lt;/code&gt; have &lt;strong&gt;no&lt;/strong&gt; one-to-one equivalent. &lt;code&gt;datetime&lt;/code&gt; forces the time-zone question (&lt;code&gt;timestamp&lt;/code&gt; vs. &lt;code&gt;timestamptz&lt;/code&gt;), &lt;code&gt;bit&lt;/code&gt; becomes a real &lt;code&gt;boolean&lt;/code&gt;, and the Postgres &lt;code&gt;money&lt;/code&gt; type is best left untouched. Which types convert cleanly and which break is covered in &lt;a href="https://sql.marcus-belz.de/en/data-type-mapping-sql-server-postgresql/" rel="noopener noreferrer"&gt;Data Type Mapping SQL Server → PostgreSQL&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Schema &amp;amp; DDL.&lt;/strong&gt; Once you have the types comes the structure around them: &lt;code&gt;IDENTITY&lt;/code&gt; becomes &lt;code&gt;GENERATED AS IDENTITY&lt;/code&gt; or a sequence, default constraints and named constraints move over, and the case sensitivity of identifiers flips from SQL-Server-tolerant to Postgres-exact. Details in &lt;a href="https://sql.marcus-belz.de/en/schema-migration-sql-server-to-postgresql/" rel="noopener noreferrer"&gt;Schema Migration SQL Server → PostgreSQL&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Data transfer.&lt;/strong&gt; Only once the target schema stands do the data move. The choice of method depends on data volume and downtime tolerance: a &lt;code&gt;bcp&lt;/code&gt; export plus &lt;code&gt;COPY&lt;/code&gt;, the all-in-one tool &lt;code&gt;pgloader&lt;/code&gt;, or an ETL pipeline. Which method fits when is compared in &lt;a href="https://sql.marcus-belz.de/en/transfer-data-sql-server-to-postgresql/" rel="noopener noreferrer"&gt;Transferring Data: bcp, COPY, pgloader, ETL&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Code porting.&lt;/strong&gt; Tables are only half the database. Stored procedures, functions and triggers have to be translated from T-SQL to PL/pgSQL — different error handling, different variable syntax, different idioms (&lt;code&gt;TRY_CONVERT&lt;/code&gt; has no direct counterpart). This is the part with the highest handwork share. Explored in depth in &lt;a href="https://sql.marcus-belz.de/en/port-t-sql-to-pl-pgsql/" rel="noopener noreferrer"&gt;Porting T-SQL to PL/pgSQL&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Verification.&lt;/strong&gt; A migration is only done once you've proven that nothing was lost or silently corrupted: row reconciliation per table, spot-check comparisons, data-quality checks after the load. How to check that systematically is shown in &lt;a href="https://sql.marcus-belz.de/en/verify-migration-data-quality/" rel="noopener noreferrer"&gt;Verifying the Migration — Data Quality and Row Reconciliation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The phases build on one another but can be iterated — it's typical to run types and schema together on a pilot table before transferring the full load.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Table, Several Phases at Once
&lt;/h2&gt;

&lt;p&gt;To make the phase path tangible, a small table that bundles several trip-ups at once. First the source in T-SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;
  &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="n"&gt;customer_id&lt;/span&gt;   &lt;span class="nb"&gt;int&lt;/span&gt;             &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;full_name&lt;/span&gt;     &lt;span class="n"&gt;nvarchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;is_active&lt;/span&gt;     &lt;span class="nb"&gt;bit&lt;/span&gt;             &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;  &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;    &lt;span class="nb"&gt;datetime&lt;/span&gt;        &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;  &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;GETDATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;pk_customer&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the same as a PostgreSQL target:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt;
  &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="n"&gt;customer_id&lt;/span&gt;   &lt;span class="nb"&gt;integer&lt;/span&gt;       &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;
  &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;full_name&lt;/span&gt;     &lt;span class="nb"&gt;text&lt;/span&gt;          &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;is_active&lt;/span&gt;     &lt;span class="nb"&gt;boolean&lt;/span&gt;       &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;  &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
  &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;    &lt;span class="nb"&gt;timestamp&lt;/span&gt;     &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;  &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;pk_customer&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four columns, and already two phases interlock — data types and schema — in a table nobody would call migration-critical. Three of the changes you see coming:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Line 3 (schema):&lt;/strong&gt; &lt;code&gt;int IDENTITY(1, 1)&lt;/code&gt; → &lt;code&gt;integer GENERATED BY DEFAULT AS IDENTITY&lt;/code&gt; — the auto-value column moves to the SQL-standard mechanism. The real work comes after the load: the sequence must be advanced to the highest value, or the next &lt;code&gt;INSERT&lt;/code&gt; collides.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Line 5 (data type):&lt;/strong&gt; &lt;code&gt;bit&lt;/code&gt; → &lt;code&gt;boolean&lt;/code&gt;, &lt;code&gt;1&lt;/code&gt; becomes &lt;code&gt;true&lt;/code&gt;. Beware: ported queries like &lt;code&gt;WHERE is_active = 1&lt;/code&gt; break in Postgres.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Line 6 (data type):&lt;/strong&gt; &lt;code&gt;datetime&lt;/code&gt; → &lt;code&gt;timestamp&lt;/code&gt;. The trap isn't &lt;code&gt;now()&lt;/code&gt; instead of &lt;code&gt;GETDATE()&lt;/code&gt;, but that &lt;code&gt;datetime&lt;/code&gt; carries no time zone: whether &lt;code&gt;timestamp&lt;/code&gt; or &lt;code&gt;timestamptz&lt;/code&gt; is right depends on whether the values were meant as local time or UTC — if you don't decide deliberately, it shifts silently later.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fourth one slips by when you skim — and is the most likely to go unnoticed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Line 4 (data type):&lt;/strong&gt; &lt;code&gt;nvarchar(100)&lt;/code&gt; → &lt;code&gt;text&lt;/code&gt;. "Postgres is natively Unicode, so &lt;code&gt;text&lt;/code&gt;" is true — but it hides that the &lt;strong&gt;length limit disappears&lt;/strong&gt;. This is exactly what an auto-converter like &lt;code&gt;pgloader&lt;/code&gt; reaches for by default: &lt;code&gt;nvarchar(100)&lt;/code&gt; becomes &lt;code&gt;text&lt;/code&gt;, the limit drops without being asked. If the &lt;code&gt;100&lt;/code&gt; was just technical baggage, &lt;code&gt;text&lt;/code&gt; is the right choice. If it was a business rule — the database was not allowed to accept anything longer — a silent validation has been lost, and what belongs there is a &lt;code&gt;varchar(100)&lt;/code&gt; or a &lt;code&gt;CHECK&lt;/code&gt; constraint rather than bare &lt;code&gt;text&lt;/code&gt;. The tool makes the translation automatically — but whether the limit was intended by the domain, it cannot know: that question isn't in the type, it's in the domain.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the whole guide in one table: four lines that look like search-and-replace carry two phases and at least one trap you only see if you know the data by its meaning — not just its type.&lt;/p&gt;

&lt;h2&gt;
  
  
  What tooling takes off your plate — and where handwork remains
&lt;/h2&gt;

&lt;p&gt;The honest expectation-setting first: there is &lt;strong&gt;no&lt;/strong&gt; "one click and done". Tools like &lt;code&gt;pgloader&lt;/code&gt; (free, takes schema and data over in a single run) or commercial converters handle the mechanical bulk — and that's a lot: the uncritical types, the bulk transfer, the standard constraints. As a rule of thumb they cover around 80% of the mechanics.&lt;/p&gt;

&lt;p&gt;The remaining 20% are exactly the spots where a business decision is needed that no tool can know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Breaking types&lt;/strong&gt; — &lt;code&gt;datetime&lt;/code&gt; → &lt;code&gt;timestamp&lt;/code&gt; or &lt;code&gt;timestamptz&lt;/code&gt;? &lt;code&gt;money&lt;/code&gt; → which &lt;code&gt;numeric&lt;/code&gt; scale? &lt;code&gt;tinyint&lt;/code&gt; with a preserving &lt;code&gt;CHECK&lt;/code&gt; constraint? These cases need checking, not blind adoption.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Procedure and trigger logic&lt;/strong&gt; — T-SQL to PL/pgSQL is translation work, not search-and-replace.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance tuning&lt;/strong&gt; — indexes, statistics and query plans differ; what was fast in SQL Server may need a different index in Postgres.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Behavioural differences&lt;/strong&gt; — collation/case, NULL ordering, transaction semantics on errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The art of migration lies not in transferring the easy 80% but in the clean, verified translation of the hard 20%. This cluster devotes a dedicated article to each of these spots.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How long does a SQL-Server-to-Postgres migration take?&lt;/strong&gt;&lt;br&gt;
It depends on schema complexity and the amount of code, not primarily on data volume. The data transfer itself is often done in hours; the time goes into code porting (stored procedures, triggers) and verification. A simple database with little logic is doable in days, one with hundreds of procedures in weeks to months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I automate this completely?&lt;/strong&gt;&lt;br&gt;
No. Tools like &lt;code&gt;pgloader&lt;/code&gt; take the mechanical ~80% off your plate — the uncritical types and the bulk transfer. The last 20% (breaking types, procedure logic, triggers, performance tuning) need human decisions. Expecting "one click and done" builds in silent errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens to my stored procedures?&lt;/strong&gt;&lt;br&gt;
They have to be ported from T-SQL to PL/pgSQL — that's the most demanding phase. Error handling (&lt;code&gt;TRY/CATCH&lt;/code&gt; → &lt;code&gt;EXCEPTION&lt;/code&gt;), variable syntax and many idioms differ. There's no direct translation tool; the dedicated article on code porting shows the patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need downtime?&lt;/strong&gt;&lt;br&gt;
For the simplest variant (export → transfer → switch over), yes — the source is ideally read-only during the transfer so no changes are lost. Low-downtime strategies (logical replication, gradual cutover) are possible but considerably more involved and a topic of their own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Should I migrate gradually or as a big bang?&lt;/strong&gt;&lt;br&gt;
For most solo and mid-size scenarios, a big-bang cutover in a quiet maintenance window is pragmatic. Gradual migration (both systems in parallel) pays off for large, continuously available systems — but at the cost of considerable synchronisation complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this also work for Azure SQL or other source databases?&lt;/strong&gt;&lt;br&gt;
The phases (types → schema → transfer → code → verification) apply generally, and Azure SQL is closely related to SQL Server — much of it carries over. This cluster, though, is specifically tailored to &lt;strong&gt;SQL Server → PostgreSQL&lt;/strong&gt;; Oracle or MySQL as the source bring different type and dialect traps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related articles
&lt;/h2&gt;

&lt;p&gt;This guide is the &lt;strong&gt;overview&lt;/strong&gt; of the cluster on &lt;strong&gt;migrating from SQL Server to PostgreSQL&lt;/strong&gt;. Each phase has its own detailed article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data types:&lt;/strong&gt; &lt;a href="https://sql.marcus-belz.de/en/data-type-mapping-sql-server-postgresql/" rel="noopener noreferrer"&gt;Data Type Mapping SQL Server → PostgreSQL — what converts cleanly and what breaks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema &amp;amp; DDL:&lt;/strong&gt; &lt;a href="https://sql.marcus-belz.de/en/schema-migration-sql-server-to-postgresql/" rel="noopener noreferrer"&gt;Schema Migration SQL Server → PostgreSQL — Identity, Constraints, Defaults, Sequences&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data transfer:&lt;/strong&gt; &lt;a href="https://sql.marcus-belz.de/en/transfer-data-sql-server-to-postgresql/" rel="noopener noreferrer"&gt;Transferring Data: bcp, COPY, pgloader, ETL — Which Method When&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code porting:&lt;/strong&gt; &lt;a href="https://sql.marcus-belz.de/en/port-t-sql-to-pl-pgsql/" rel="noopener noreferrer"&gt;Porting T-SQL to PL/pgSQL — Migrating Procedures and Functions&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verification:&lt;/strong&gt; &lt;a href="https://sql.marcus-belz.de/en/verify-migration-data-quality/" rel="noopener noreferrer"&gt;Verifying the Migration — Data Quality and Row Reconciliation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more on adjacent topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://sql.marcus-belz.de/en/postgres-table-conventions/" rel="noopener noreferrer"&gt;Postgres Table Conventions&lt;/a&gt; — what the target schema looks like idiomatically&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://sql.marcus-belz.de/en/etl-vs-elt-explained/" rel="noopener noreferrer"&gt;ETL vs. ELT — Explained&lt;/a&gt; — transfer patterns and tool choice&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://sql.marcus-belz.de/en/data-quality-checks-with-sql/" rel="noopener noreferrer"&gt;Data Quality Checks with SQL&lt;/a&gt; — the checking framework for the verification phase&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://sql.marcus-belz.de/en/type-conversion-basics-t-sql/" rel="noopener noreferrer"&gt;Data Quality // Fundamentals of Type Conversion with T-SQL&lt;/a&gt; — background on the conversion pitfalls&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>postgres</category>
      <category>sql</category>
      <category>database</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
