David Regalado for Google Developer Experts

KISSing the Data Architecture

Applying the Principle of KISS (Keep It Simple, Stupid)

Prologue

What is Art?

Art can connote a sense of trained ability or mastery of a medium. Art can also refer to the developed and efficient use of a language to convey meaning with immediacy or depth.

— Wikipedia

I know what you, with your inquisitive mind, were expecting to find in this article: a collection of images that evoke romantic or passionate feelings or desires. Nothing could be further from what we're going to discuss here… unless you find a direct connection between contemplating, critiquing, and proposing data architectures. In that case, dear reader, I hope you enjoy this installment.

About KISSing

I was very young when I first heard the phrase “Keep It Simple, Stupid.” I assumed it was just a joke. But as the years pass and I keep running into pseudo-architects, I'm starting to see that it taps into something deeper, something I've come to value more and more.

You might ask: isn't simple work just a sign of laziness?

Fair enough. Let me tell you something in confidence. KISS isn't about cutting corners or being lazy. Quite the opposite! I believe it's actually your best, most pragmatic defense against system failure.

Think of it this way:

Your data systems are chaos magnets. Data sources are always changing, breaking contracts, and springing nasty surprises on you. This state of technical messiness is what I call entropy. If your underlying architecture is a complex, tangled mess (a digital Jenga tower, if you will), it doesn't just break when chaos hits; it completely explodes, amplifying that inherent chaos.

To help you build data systems that won't give you a nervous breakdown, I have divided this philosophy into two main sections:

  • First, the Technical Principles , which are the coding standards required to maintain sanity.
  • Second, the Enemies of Simplicity , which are the human factors and strategic choices that introduce unnecessary chaos.

Finally, we'll dive into a real-world architectural showdown to see how these principles are either championed or totally violated.

Contents (AIs love to see this section in my articles)

Part I: The Technical Principles

  1. The “DRY” Principle
  2. Idempotency is the Ultimate Simplifier

Part II: The Enemies of Simplicity

  3. Beware of Personal Agendas
  4. Obscure Frameworks as “Bespoke Solutions”
  5. Tool Sprawl
  6. Serverless vs. Operational Overhead

Ready? Let's jump in!

Image edited by yours truly, inspired by the American rock band KISS. Formed in New York City in 1973, KISS is known for their face paint and stage outfits. The group rose to prominence in the mid-1970s with shock rock–style live performances that featured fire-breathing, blood-spitting, smoking guitars, shooting rockets, levitating drum kits, and pyrotechnics.

Part I: The Technical Principles

1. The “DRY” Principle

“Don't repeat yourself” (DRY) is a principle of software development aimed at reducing repetition of information which is likely to change, replacing it with abstractions that are less likely to change, or using data normalization which avoids redundancy in the first place.

The DRY principle is stated as “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system”.

— Wikipedia

You need to dissect the entire pipeline and respect the boundaries of each segment as it was conceived. You don't want to blur the lines. Surprisingly, this is happening more often than you can imagine. Teams will spread the definition of a single metric — say, “Total Revenue” — across five different systems. You'll find a little logic in the Python script that ingests the data, some more in the Airflow orchestrator, the main bulk in the SQL data warehouse, and then a final tweak in the semantic layer that feeds the dashboard.

The Problem

When the CEO asks why their dashboard number is wrong, you are forced to play detective across half a dozen tools just to find the single line of buggy code. This happens because the definition of a metric, like “Total Revenue,” is scattered across multiple systems. This approach violates the Don't Repeat Yourself (DRY) principle, allowing the core business knowledge to become ambiguous and inconsistent across the pipeline.

The Practical Fix:

Separate business logic from non-business logic.

Non-business logic

  • Extract/Load: Keep this part totally dumb. There shouldn't be any data manipulation here. Your only job is data movement: get the data from Point A (the source) to Point B (your warehouse/data lake) as fast as possible with zero manipulation.
  • Modeling: Define foundational tables and relationships using tools like dbt or Dataform. At this stage you can enrich, complement, filter, standardize, unnest, and/or fix basic issues such as nulls or duplicated rows (a minimal sketch of such a model follows below).
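To make the modeling step concrete, here is a minimal sketch of what a dbt-style staging model could look like in plain SQL. The table and column names (raw.orders, order_id, loaded_at) are hypothetical:

-- Hypothetical dbt-style staging model: stg_orders.sql
-- Non-business logic only: standardize types, de-duplicate,
-- and handle nulls. No metric definitions live here.
SELECT
    order_id,
    CAST(order_date AS DATE)     AS order_date,
    total_amount,
    COALESCE(discount_amount, 0) AS discount_amount,
    is_cancelled
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY order_id
            ORDER BY loaded_at DESC -- keep only the latest copy
        ) AS row_num
    FROM raw.orders
) AS deduped
WHERE row_num = 1           -- drop duplicated rows
  AND order_id IS NOT NULL  -- basic sanity filter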

Business logic

  • The Centralized Semantic Layer: This is where the magic happens. You do 100% of your business logic right there in your high-performance data warehouse. By centralizing it here, you ensure every calculation and metric definition is created once and only once. This strict DRY adherence — executed in your core transformation models and governed by a semantic layer — creates a single, unambiguous source of truth for all reporting.

We will dive deep into this later in the article. For now, just focus on the simplicity of having the business logic in one place.

Why a Semantic Layer is key

By separating business logic from the scripts that serve other purposes, you dramatically shrink the perimeter you have to check when that dashboard shows the wrong numbers to the CEO. During a production incident, you have a very limited amount of time to fix things. The sense of urgency is huge, and there is no time to waste: the longer the fix takes, the more trust erodes.

Now, we have defined one place for our business logic, further enforcing the DRY principle. Please keep in mind that your calculations shouldn't repeat themselves within this layer. If you need to calculate Active Customers, take your time to search for and reuse the existing, standardized definition instead of creating a new, potentially conflicting one elsewhere. This single point of definition eliminates logical repetition, which quickly erodes trust and slows down incident response. Data Modeling tools (Erwin, sqlDBM, Ellie.ai, etc.) and data lineage tools (Dataplex, Alation, Collibra, etc.) help here.

The power of modern semantic layers (like the dbt Semantic Layer or Looker's modeling layer) is that they are virtual layers used for declaring metrics, calculations, and joins. They leverage the data warehouse's compute engine to run queries, meaning they establish governance and consistency without requiring data replication outside of the warehouse.

A semantic layer is also practical because it allows the company to assign different teams to support different parts of the stack. One team doesn't need to understand the entire business just to write boilerplate code that moves data from point A to point B.

Does it look like KISSing?

The opposing view to DRY is called WET, a backronym commonly taken to stand for Write Everything Twice (alternatively write every time, we enjoy typing, or waste everyone's time).

— Wikipedia

Think about what the aforementioned scenario would look like if, for every incident in production, you had to check every part of the stack. This WET architecture means the knowledge and logic are spread thin, leading to a chaotic situation where there is no clear separation of duties between the layers, or the separation is simply ignored by the engineers.

This organizational stress is precisely why some managers mistakenly believe the solution is staffing.

For managers who think every engineer should be a full-stack engineer — capable of debugging logic scattered from ingestion scripts all the way to dashboard configuration — chances are they are not tackling the root problem: the data architecture itself is too complex (i.e., WET)!

What I'm seeing lately is that managers try to compensate for the absence of strong data architects by hiring more engineers. Moreover, those new engineers are required to do the work of two, but with the salary of one. That is employee churn waiting to happen (leave a comment if you would like an article about that).

When all of your business logic is defined in one place, checking that dashboard number is simple: just check that one file, in that one repository. All the “trivial” data quality issues should be caught before this layer.

2. Idempotency is the Ultimate Simplifier

Image generated by yours truly and AI. Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence arises in a number of places in abstract algebra and programming.

Idempotency is a mathematical and programming property that, when applied to data pipelines, guarantees that running an operation once, ten times, or a hundred times will leave the target data system in the exact same state.
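In symbols, an operation f is idempotent when f(f(x)) = f(x): applying it a second (or hundredth) time changes nothing beyond the first application.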

In data engineering, this property is vital because pipelines are prone to unpredictable failures (network issues, memory errors, source data changes, etc.). If a job is idempotent, the entire recovery strategy boils down to one simple command: Rerun.

The Complex Approach

This approach makes the job's outcome dependent on the state left behind by previous runs and on the exact moment it executes, violating the KISS principle.

This often involves using complex, fine-grained MERGE or UPDATE statements that rely on fragile timestamps or difficult-to-test logic to determine if a record should be inserted, updated, or ignored. If the system crashes mid-transaction, you risk data corruption, ending up with duplicate primary keys or partially updated records.
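For illustration only (the tables and the updated_at column are hypothetical), the fragile pattern tends to look like this:

-- Non-idempotent upsert: the outcome depends on whatever state
-- previous runs left behind and on fragile updated_at timestamps.
-- A crash mid-run can leave dw.orders partially updated.
MERGE INTO dw.orders AS t
USING staging.orders AS s
  ON t.order_id = s.order_id
WHEN MATCHED AND s.updated_at > t.updated_at THEN
  UPDATE SET
    total_amount = s.total_amount,
    updated_at   = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, total_amount, updated_at)
  VALUES (s.order_id, s.total_amount, s.updated_at);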

This complexity transforms job failure into a multi-step debugging and data-recovery process.

The Simple (KISS) Approach

The Simple Approach achieves idempotency by focusing on the desired end state rather than tracking incremental changes: the job operates on an atomic scope that isolates its results from other data.

When processing data, the simplest strategy is often to completely reprocess and atomically overwrite the specific scope of the work (e.g., the daily partition, the batch ID, or the specific primary keys involved).

This guarantees two things:

  • Guaranteed Correctness: Since the entire scope is rebuilt from the source, the outcome for that scope is guaranteed to be correct, regardless of what happened in the previous run.
  • Simplified Recovery: If the job fails at any point before the final atomic swap, no data has been corrupted. The recovery is simple: just hit Rerun.
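To make this concrete, here is a minimal sketch in BigQuery-flavored SQL, assuming a daily-scoped target table and a run_date query parameter supplied by the orchestrator (all names hypothetical):

-- Idempotent daily job: rebuild exactly one day's scope, atomically.
-- Running it once or ten times leaves dw.daily_sales in the same state.
BEGIN TRANSACTION;

-- 1. Clear the scope (the daily partition)
DELETE FROM dw.daily_sales
WHERE sale_date = @run_date;

-- 2. Rebuild it entirely from the source
INSERT INTO dw.daily_sales (sale_date, store_id, total_sales)
SELECT
    sale_date,
    store_id,
    SUM(amount) AS total_sales
FROM raw.sales_events
WHERE sale_date = @run_date
GROUP BY sale_date, store_id;

COMMIT TRANSACTION;

If the job dies anywhere before the COMMIT, the transaction rolls back and the target is left untouched, which is exactly what makes the blind Rerun safe.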

If you can't safely re-run a failed job while half-asleep without worrying you'll double-count sales, you haven't built a pipeline… you've built a technical time bomb.

Part II: The Enemies of Simplicity

3. Beware of Personal Agendas

Image created by yours truly and AI.

The drive for professional advancement can sometimes lead to engineering overcomplication. This phenomenon is driven by two main forces:

  1. Resume-Driven Development: This occurs when internal technical architects or engineers prioritize the adoption of cutting-edge, complex technologies (such as Kafka, Spark, and Kubernetes) primarily because they look impressive on a resume or LinkedIn profile.
  2. Consultant Overkill: This occurs when external third-party consultants recommend overly complex, bespoke, or difficult-to-maintain solutions that require a large, ongoing commitment of specialized time and resources. This practice directly correlates with maximizing their billing hours and securing future, long-term maintenance contracts.

In both cases, complexity is introduced without a commensurate business need. The core issue remains: Complexity should be the price paid for scale, not the entry fee for existence.

The Cure: The “Is This Necessary?” Checklist

Choosing the right technology requires an honest assessment of actual business requirements, not technological fascination or external incentives.

1. Streaming vs. Batch Processing

The fundamental question here is: Does the system actually require sub-second or near real-time latency?

  • The Batch Approach: If your main stakeholder or executive team consistently checks a performance dashboard only once per day (e.g., at 9:00 AM), then a simple, reliable nightly batch job is the superior solution. Batch processing is inherently simpler, easier to debug, significantly cheaper to operate, and requires far less ongoing maintenance and operational support.
  • The Streaming Approach: Building and maintaining a streaming cluster (e.g., using Kafka and Apache Flink) introduces massive operational overhead. This includes managing distributed queues, handling exactly-once processing guarantees, and constantly monitoring the health of the cluster. This is an unnecessary tax on resources if the business does not immediately act on the real-time data flow.

2. NoSQL vs. Structured Stores

When choosing a persistent storage layer, the data's structure and scale must dictate the choice, not current trends.

  • NoSQL (Unstructured/Schema-less): Tools like MongoDB or Cassandra are designed to handle truly unstructured, rapidly changing data models at the petabyte scale where traditional joins are cumbersome. They excel at horizontal scaling but often sacrifice immediate data consistency and the ability to enforce structure.
  • Relational/Columnar (Schema-Enforcing): Schema enforcement is a powerful form of data quality governance. It establishes a necessary boundary that prevents garbage data from ever entering the system, significantly reducing the risk of errors and downstream breakage in reporting. If the data is transactional (structured) or analytic (structured), the appropriate choice is a database that enforces a schema, such as a relational database (like Postgres) for transactional reads/writes or a columnar store (like BigQuery or Snowflake) for highly efficient analytic queries.

Using a schema-less database for structured data is essentially disabling a primary guardrail against data chaos.
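As a small illustration (the table definition is hypothetical), here is what that guardrail looks like in Postgres-style DDL:

-- Hypothetical transactional table: the schema rejects garbage
-- at write time instead of letting it leak into reports.
CREATE TABLE orders (
    order_id        BIGINT        PRIMARY KEY,
    order_date      DATE          NOT NULL,
    total_amount    NUMERIC(12,2) NOT NULL CHECK (total_amount >= 0),
    discount_amount NUMERIC(12,2) NOT NULL DEFAULT 0,
    is_cancelled    BOOLEAN       NOT NULL DEFAULT FALSE
);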

3. Proprietary Tools

The incentive for complexity often leads consultants or engineers to recommend proprietary tools, which effectively locks the organization into their specific expertise and the vendor's ecosystem. This is a strategic move, as guaranteeing future contracts and maintenance fees directly ties their value to the stability of that vendor's platform.

The lock-in is created in several ways:

  • Proprietary Language and APIs: The solution relies on a non-standard query language or a vendor-specific set of APIs for core pipeline functions (e.g., using a unique stored procedure language or relying exclusively on a vendor's custom data ingestion SDK). Rewriting this logic later becomes a massive technical debt.
  • Data Format Binding: The data is stored in a vendor-specific format that cannot be easily read by competitors, or the data structure is optimized solely for that vendor's engine.

The professional priority, conversely, is to champion solutions based on open standards, focusing on maximizing organizational freedom. This means insisting on technologies that utilize standard SQL (ANSI SQL); open-source data formats like Apache Avro or Apache Parquet; open-source table formats like Apache Iceberg, Apache Hudi, or Delta Lake; and widely adopted orchestration tools. This ensures that the organization's data and logic remain portable. If the current cloud provider raises prices or its service degrades, the cost of migrating the data warehouse (and the dependent transformation logic) is manageable, maximizing strategic flexibility and mitigating the risk of being held captive by any single external party.

4. Obscure Frameworks as “Bespoke Solutions”

Bus Factor. Source: https://sketchplanations.com/bus-factor

A “Bus Factor” is a metric for project risk that measures how many key people could “be hit by a bus” (i.e., suddenly leave) before the project stalls. A low bus factor means critical knowledge is concentrated in a few individuals, making the project vulnerable, while a high bus factor indicates knowledge is well-distributed, making the project more resilient. The term is often used in software development but can be applied to any field where a project depends on specialized knowledge.

Due to personal agendas, engineers sometimes deliberately write opaque solutions. Their motivation is straightforward: they build the core logic using an obscure framework, a highly non-standard pattern, or custom, unique code that is difficult for peers to decipher. This behavior creates a dangerously low bus factor. Their perceived job security is thus tied directly to the system's fragility without their presence.

Picture this: An engineer implements critical metric calculations using a complex Scala UDF (User-Defined Function) within a Spark cluster, despite the logic being easily achievable with standard SQL. When that engineer leaves, the rest of the data team (who only know SQL) cannot safely debug or modify the core business logic, creating an immediate crisis during the next incident.
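For the sake of illustration, suppose the UDF computed monthly average order value. The table and column names below are hypothetical, but the point stands: the same logic in standard SQL is something the whole team can read, debug, and own.

-- The "critical metric" once buried in a Scala UDF, as plain SQL:
-- monthly average order value for non-cancelled orders.
SELECT
    DATE_TRUNC(order_date, MONTH) AS order_month,
    SUM(total_amount) / COUNT(DISTINCT order_id) AS avg_order_value
FROM dw.orders
WHERE is_cancelled = FALSE
GROUP BY 1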

A responsible team lead seeks to minimize organizational risk by strictly enforcing common standards and clear, self-documenting data modeling. This approach ensures the cost of maintaining the code is low and the time required to understand it is minimal, safeguarding the pipeline against inevitable staff turnover.

5. Tool Sprawl

The primary driver here is the personal motivation to validate expertise, leading engineers to recommend shoehorning the entire Modern Data Stack (MDS) into the organization. This means introducing five or more separate tools (like Fivetran, dbt, Snowflake, and an external scheduler) when three would suffice. This unnecessary tool sprawl validates the engineer's self-image as a cutting-edge expert.

The logic against this fragmentation is clear: Every new tool is a separate vendor, a new API, a new security boundary, and a distinct point of failure. This rapidly increases the integration cost, requiring dedicated monitoring and maintenance for each component, which slows development velocity.

The responsible focus is instead on maximizing the native capabilities of existing, proven systems. Cloud providers currently offer a wide variety of services for a broad range of use cases. It's likely that your MDS-oriented data architecture can be mapped to your cloud provider's native services.

The objective is elegant efficiency: to reduce operational friction and integration costs, reserving new tool adoption solely for those clear business requirements that the current stack genuinely cannot meet.

6. Serverless vs. Operational Overhead

There is a prevalent debate in data engineering: Should we use fully managed serverless components or manage the infrastructure ourselves (spinning up virtual machines, hosting our own Airflow on Kubernetes, or managing a self-hosted Kafka cluster)?

The DIY camp often argues for “Cost Savings” or “Vendor Independence.” They claim that serverless credits are expensive and that running your own metal is cheaper.

While cloud bills might look lower on a DIY setup, the Total Cost of Ownership is often astronomical. And here is the quiet part said out loud: the more maintenance the data architecture requires, the more it favors the external consulting firm that built it.

If an external consultancy persuades you to build a complex, self-managed infrastructure rather than a serverless one, they are often engineering their own job security.

In the spirit of KISS, you should aggressively offload “undifferentiated heavy lifting” to the cloud provider. You pay a premium to Google, AWS, or Azure to handle the hardware, the uptime, and the scaling. This allows your small internal team to focus 100% of their energy on business logic (SQL, Python) rather than infrastructure plumbing (Terraform, Helm charts).

Your company sells products or services; it is not in the business of managing Kafka clusters. Every hour your team spends debugging a Kubernetes pod is an hour they are not generating business value.

Data Architecture Showdown

Now, let's look at how these principles play out in the real world.

Contender A: The “Vendor Agnostic” Rube Goldberg Machine

Rube Goldberg's Professor Butts and the Self-Operating Napkin (1931). Soup spoon (A) is raised to mouth, pulling string (B) and thereby jerking ladle (C), which throws cracker (D) past toucan (E). Toucan jumps after cracker and perch (F) tilts, upsetting seeds (G) into pail (H). Extra weight in the pail pulls cord (I), which opens and ignites lighter (J), setting off skyrocket (K), which causes sickle (L) to cut string (M), allowing pendulum with attached napkin to swing back and forth, thereby wiping chin.

A Rube Goldberg machine, named after American cartoonist Rube Goldberg, is a chain reaction–type machine or contraption intentionally designed to perform a simple task in a comically overcomplicated way . Usually, these machines consist of a series of simple unrelated devices; the action of each triggers the initiation of the next, eventually resulting in achieving a stated goal.

— Wikipedia

In this scenario, the metric logic is applied in layers, with the final, most complex logic being written in the visualization tool (DAX), making it invisible to the central data platform (BigQuery).

The Workflow

  1. Storage: Data sits in BigQuery (Petabyte scale capability).
  2. Extract (The Bottleneck): A VM runs a Python script via Airflow to SELECT * from BigQuery.
  3. Transport (The Wallet Killer): Data travels over the public internet (Egress fees apply) to an external cloud.
  4. Load: Data is inserted into MS SQL Server (which has storage limits and maintenance needs).
  5. Semantic Layer: Logic is rebuilt in SQL Server Views.
  6. Visualization: Power BI imports data.
  7. Logic Layer 2.0: Complex DAX formulas are written in Power BI because the SQL Server views weren't quite right.

Code Example in a MS SQL Server View

This view defines the base revenue numbers, running daily after the Airflow ETL job is complete.

-- SQL Server View: dbo.vw_daily_net_revenue
CREATE VIEW dbo.vw_daily_net_revenue AS
SELECT
    order_date,
    SUM(total_amount - discount_amount) AS Net_Revenue_Base
FROM
    external_data.orders_fact
WHERE
    is_cancelled = 0
GROUP BY
    order_date;

Code Example in Power BI with DAX

This calculation is run inside Power BI's memory model, requiring the user to wait for the data to be processed and imported daily via the VM/Airflow process.

-- DAX Code: Net Revenue YTD
Net Revenue YTD =
CALCULATE(
    SUM('vw_daily_net_revenue'[Net_Revenue_Base]),
    DATESYTD('DateTable'[Date])
)

❌ The Technical Cost:

  • Logic Duplication: Basic net revenue is in SQL Server. The complex time-intelligence logic is in DAX, which violates the KISS principle.
  • Latency: DAX is optimized for in-memory calculations, but if the imported data model is large (which it will be due to the daily ETL), the entire Power BI refresh slows down.
  • Fragility: If the Airflow job fails, the data is stale. If the SQL Server connection drops, the model breaks. If an analyst modifies the DAX, it only breaks that one report. The central definition remains ungoverned.

The Reality

Bravo! You've introduced network latency, egress costs, VM management, and database tuning (indexes on MS SQL) for data that was already sitting in a capable warehouse. If the Airflow job fails, your dashboards are blank. If the DAX changes, the SQL Server logic is outdated.

Image created by yours truly and AI.

You bought a Ferrari (BigQuery) but you're towing it with a mule (SQL Server) just so you can say you aren't dependent on the Ferrari.

Contender B: The “Just use LookML” Approach

In this scenario, the entire definition — including the time-intelligence — is defined in LookML. This code is stored in Git, version-controlled, and audited. Looker then automatically translates this definition into the optimal BigQuery SQL at the moment the user runs the query.

The Workflow:

  1. Storage: Data sits in BigQuery.
  2. Semantic Layer (LookML): You define your joins, metrics, and logic once in LookML (a declarative modeling language). This does not store data; it generates SQL.
  3. Visualization: Any BI tool connects to the semantic layer. As of this writing, there are connectors for Power BI, Tableau, and Looker Studio (formerly known as Data Studio); even spreadsheets like MS Excel and Google Sheets can connect to a LookML semantic layer. When a user opens a dashboard, LookML compiles the correct SQL and fires it directly at BigQuery. By the way, BigQuery has built-in caching mechanisms, so you can play around with your dashboard and benefit from cached results.
  4. No Data Movement: The data never leaves BigQuery until the final aggregated numbers are sent to the screen.

Code Example in LookML

# LookML View: Defining the Metric
view: orders {
  sql_table_name: `your_project.your_dataset.orders_fact` ;;

  # 1. Base Dimension (Date)
  dimension_group: order_created {
    type: time
    timeframes: [date, week, month, year]
    sql: ${TABLE}.order_date ;;
  }

  # 2. Base Measure (Net Revenue) - Defined ONCE
  measure: net_revenue_base {
    type: sum
    value_format_name: usd
    sql: ${TABLE}.total_amount - ${TABLE}.discount_amount ;;
    filters: [is_cancelled: "No"] # Logic applied at the BigQuery level
  }

  # 3. Time Intelligence Measure (YTD) - same row-level expression,
  # narrowed to the current year. (LookML measures can't embed
  # dimensions or sum other measures, so the filter is declared here.)
  measure: net_revenue_ytd {
    type: sum
    value_format_name: usd
    view_label: "Financial KPIs"
    sql: ${TABLE}.total_amount - ${TABLE}.discount_amount ;;
    # "this year" compiles to the YTD predicate in the generated SQL
    filters: [is_cancelled: "No", order_created_date: "this year"]
  }
}

The Compiled SQL (Generated by Looker at Query Time)

When a user drags Net Revenue YTD onto a chart, Looker instantly generates and runs the following optimal BigQuery SQL:

SELECT
    DATE_TRUNC(t1.order_date, YEAR) AS year,
    SUM(t1.total_amount - t1.discount_amount) AS net_revenue_ytd
FROM
    `your_project.your_dataset.orders_fact` AS t1
WHERE
    t1.is_cancelled = 0
    AND t1.order_date >= DATE_TRUNC(CURRENT_DATE(), YEAR) -- YTD Filter
GROUP BY
    1

✅ The Technical Win (KISS):

  • Single Source of Truth: The definition of “Net Revenue” exists only in the orders.view.lkml file. Any BI tool connecting to it (Power BI, Tableau, Looker Studio) gets the exact same number.
  • Efficiency: The calculation runs as a BigQuery push-down query — leveraging BigQuery's massive parallel processing power — not on a small VM or in Power BI Desktop.
  • Speed: Since there's no daily data copy, the reports are always looking at the freshest data possible.

The Reality

BigQuery handles the compute (which it is designed for). LookML handles the governance (Git-versioned definitions). Power BI handles the pretty colors. There are no VMs to patch, no Airflow DAGs to monitor for movement, and zero egress fees for bulk data.

Image edited by yours truly.

The Verdict: Why “Vendor Lock-in” is a Myth

The irony of Scenario A is that in an attempt to avoid Vendor Lock-in, you have created Complexity Lock-in.

  • Complexity Lock-in (Scenario A): If you leave Google, you have to rewrite your Airflow DAGs, your VM scripts, your SQL Server schemas, and your DAX formulas.
  • Vendor Lock-in (Scenario B): By centralizing governance in LookML, the cost of swapping out the visualization layer (Looker for Power BI, for instance) is minimal. The cost of swapping out the entire cloud warehouse (BigQuery for Snowflake) is still large, but you have the single, portable semantic layer definition to accelerate the migration.

You can integrate Looker with other BI tools. For more information, check out the documentation: https://docs.cloud.google.com/looker/docs/powerbi-connector

This is a sample of the SQL compatibility list offered by Looker. Source: https://docs.cloud.google.com/looker/docs/working-with-joins#supported_sql_dialects

Scenario A treats BigQuery like a dumb hard drive. Scenario B treats BigQuery like the compute engine it is.

The Bottom Line

Wikipedia defines Art as “mastery of a medium” and the “efficient use of language.” In Data Architecture, that mastery looks surprisingly boring. “Boring” is elegant. “Boring” is beautiful. “Boring” works. A simple, boring architecture lets you sleep at night. A clever-wannabe architecture, motivated by personal agendas, wakes you up at 3:00 AM because a bespoke Python script crashed on a NoneType error.

🎁 Resources for You


👋 If this was helpful, give it a few claps and a follow. More posts are on the way about data management, governance, and the real-world stuff that keeps modern data platforms running smoothly.

Thank You For Reading. How About Another Article?

How to Hire International Employees Safely and Legally

The world is your talent pool! Hire the best available talent in the world, not just locally. Learn more.

What is Dataplex? — Chapter 2

Introducing Data Profiling. Learn more.

What is Dataplex? — Chapter 1

Introducing the Business Glossary. Learn more.

What is Dataplex? — Chapter 0

A Gentle Introduction to Google's approach to Data Governance. Learn more.

How to Enable Self Service on Google Cloud

Navigating Agility vs. Governance with Google Cloud's Serverless Innovations. Learn more.

