<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lameck Odhiambo</title>
    <description>The latest articles on DEV Community by Lameck Odhiambo (@lameck_odhiambo_748e9ef18).</description>
    <link>https://dev.to/lameck_odhiambo_748e9ef18</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3952278%2F4afea2a0-ed01-42f9-8a14-0ce8258f8063.jpg</url>
      <title>DEV Community: Lameck Odhiambo</title>
      <link>https://dev.to/lameck_odhiambo_748e9ef18</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lameck_odhiambo_748e9ef18"/>
    <language>en</language>
    <item>
      <title>Data Modeling, Joins, Relationships and Schemas</title>
      <dc:creator>Lameck Odhiambo</dc:creator>
      <pubDate>Mon, 22 Jun 2026 07:52:40 +0000</pubDate>
      <link>https://dev.to/lameck_odhiambo_748e9ef18/data-modeling-joins-relationships-and-schemas-26ln</link>
      <guid>https://dev.to/lameck_odhiambo_748e9ef18/data-modeling-joins-relationships-and-schemas-26ln</guid>
      <description>&lt;p&gt;Before data reaches its final destination, it needs to be organized in a structured way that supports easy retrieval, good query performance, and efficient storage. This is data modeling: the process of creating a blueprint of how data is connected, stored, and retrieved in a system. It gives you an organized structure for your tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reasons for data modeling
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Data consistency&lt;/li&gt;
&lt;li&gt;Optimized query performance&lt;/li&gt;
&lt;li&gt;Scalability and maintainability&lt;/li&gt;
&lt;li&gt;Lower storage cost&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Layers of data modeling
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Conceptual Data Model&lt;/strong&gt;&lt;br&gt;
The highest-level, business-focused view. It defines what data is collected and how business concepts relate to one another, that is, the subjects (entities), their characteristics, and the relationships between them. It involves gathering requirements from stakeholders.&lt;br&gt;
&lt;strong&gt;Agile&lt;/strong&gt; and &lt;strong&gt;Waterfall&lt;/strong&gt; requirements gathering: Waterfall and Agile are two fundamentally different approaches to project management. Waterfall is a linear, step-by-step process where each phase must be completed before the next begins. Agile is an iterative, flexible approach that breaks projects into smaller cycles for continuous improvement and rapid delivery.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvo94bruryr4vrtiuwit6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvo94bruryr4vrtiuwit6.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Focus: Business entities (e.g., Customers, Products, Orders) and their relationships.&lt;/li&gt;
&lt;li&gt;Audience: Business stakeholders, domain experts, and product managers.&lt;/li&gt;
&lt;li&gt;Details: Tech-agnostic; no attributes, data types, or system implementations are specified.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Logical Data Model&lt;/strong&gt;&lt;br&gt;
The bridge between the business requirements and the technical solution. It defines structure by establishing facts (events) and dimensions (context).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Focus: Data attributes, primary/foreign keys, and specific data objects.&lt;/li&gt;
&lt;li&gt;Audience: Data architects and business analysts.&lt;/li&gt;
&lt;li&gt;Details: Technology-neutral; independent of the specific database management system (DBMS) being used.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this stage we produce an Entity-Relationship (ER) diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Firz4krzdu017ih9fabcp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Firz4krzdu017ih9fabcp.png" alt=" " width="800" height="677"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Physical Data Model&lt;/strong&gt;&lt;br&gt;
The most technical and concrete layer. It dictates exactly how the data will be stored and structured in a specific database.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Focus: Table names, column specifications, data types, storage methods, and compression techniques.&lt;/li&gt;
&lt;li&gt;Audience: Database administrators, developers, and data engineers.&lt;/li&gt;
&lt;li&gt;Details: Highly specific to a chosen engine (e.g., PostgreSQL, Snowflake, BigQuery)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Types of Data modeling
&lt;/h1&gt;

&lt;h1&gt;
  
  
  1. OLTP (Online Transactional Processing)
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;The process of designing databases to handle high volumes of fast, real-time, day-to-day transactions (such as e-commerce checkouts or banking transfers). Its primary goal is to ensure data integrity, eliminate redundancy, and support rapid write, update, and delete operations.&lt;/li&gt;
&lt;li&gt;This is the first step, taken before data is moved from a database to a data warehouse.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Core Principles of OLTP Modeling
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Normalization (up to 3NF):&lt;/strong&gt; Data is broken down into smaller, logical tables to eliminate duplication. For instance, a customer’s address will live in a single Addresses table rather than being repeated on every single order.&lt;br&gt;
&lt;strong&gt;Entity-Relationship (ER) Design:&lt;/strong&gt; Models are created by identifying distinct entities (e.g., Customers, Products, Orders) and establishing strict relationships (e.g., one-to-many, many-to-many) between them.&lt;br&gt;
&lt;strong&gt;ACID Compliance:&lt;/strong&gt; The model prioritizes atomicity, consistency, isolation, and durability so that complex, multi-step transactions either succeed entirely or roll back cleanly without data corruption.&lt;/p&gt;

&lt;h3&gt;
  
  
  Best Practices
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Implement Strong Constraints:&lt;/strong&gt; Use Primary Keys (PK), Foreign Keys (FK), UNIQUE constraints, and NOT NULL rules at the database level to enforce strict data integrity.&lt;br&gt;
&lt;strong&gt;Index Wisely:&lt;/strong&gt; Index your Primary and Foreign Keys to speed up row retrieval, but avoid over-indexing, as this will slow down write-heavy transactions.&lt;br&gt;
&lt;strong&gt;Choose the Right Technology:&lt;/strong&gt; Use a robust Relational Database Management System (RDBMS) such as PostgreSQL, Oracle, or MySQL.&lt;/p&gt;
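&lt;p&gt;The constraints above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the customers and orders tables, their columns, and the rejected helper are hypothetical names chosen for the example, and other engines will differ in syntax details.&lt;/p&gt;

```python
import sqlite3

# Hypothetical two-table schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      NUMERIC NOT NULL
);
-- Index the foreign key so lookups by customer stay fast.
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

conn.execute("INSERT INTO customers VALUES (1, 'ann@example.com')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.50)")


def rejected(sql):
    """Return True when the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_email = rejected("INSERT INTO customers VALUES (2, 'ann@example.com')")
missing_parent = rejected("INSERT INTO orders VALUES (101, 999, 10)")
null_amount = rejected("INSERT INTO orders VALUES (102, 1, NULL)")
print(duplicate_email, missing_parent, null_amount)
```

&lt;p&gt;Each of the three bad inserts is refused by the database itself, so the rule holds no matter which application writes the data.&lt;/p&gt;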

&lt;h3&gt;
  
  
  Common Data Types in Data Modeling
&lt;/h3&gt;

&lt;p&gt;Data types are generally divided into standard primitive types and advanced complex structures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Numeric Types&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Integer:&lt;/em&gt; Stores whole numbers without decimals (e.g., ID numbers or inventory counts).&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Float / Real:&lt;/em&gt; Stores approximate numerical values with fractional decimals for scientific data.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Decimal / Numeric:&lt;/em&gt; Stores exact fixed-point decimals, making it ideal for financial amounts.&lt;/li&gt;
&lt;/ol&gt;
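&lt;p&gt;To see why exact decimals matter for money, here is a small sketch in Python, whose float behaves like a database Float and whose decimal.Decimal behaves like a Decimal / Numeric column. The variable names are illustrative only.&lt;/p&gt;

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2

# Decimal values built from strings keep the exact base-10 digits.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)                 # False
print(decimal_total == Decimal("0.30"))   # True
```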

&lt;p&gt;&lt;strong&gt;String and Text Types&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;CHAR:&lt;/em&gt; Holds fixed-length text character strings, padding shorter inputs with spaces.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;VARCHAR:&lt;/em&gt; Holds variable-length text strings up to a specified maximum length.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;TEXT / CLOB:&lt;/em&gt; Stores large blocks of character data, such as product descriptions or articles.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Date and Time Types&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;DATE:&lt;/em&gt; Records calendar dates consisting of the year, month, and day.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;TIME:&lt;/em&gt; Captures precise hours, minutes, and seconds.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;TIMESTAMP:&lt;/em&gt; Combines date and time to track real-time system events or logs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Logical and Binary Types&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Boolean:&lt;/em&gt; Evaluates to true or false states to support logical checks.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;BLOB:&lt;/em&gt; Keeps raw binary large objects, including uploaded imagery, video files, or document attachments.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Complex and Semi-Structured Types&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Array:&lt;/em&gt; Groups a list of multiple values inside a single column field.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Struct / JSON:&lt;/em&gt; Embeds a nested key-value format block to represent flexible, semi-structured object details.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Keys in a Database
&lt;/h2&gt;

&lt;p&gt;In a database, a key is an attribute (column) or a collection of attributes used to uniquely identify rows within a table and establish relationships between multiple tables. Keys are foundational for enforcing data integrity, preventing duplication, and ensuring efficient data retrieval.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F5ch6pi31nr3zftqvars4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F5ch6pi31nr3zftqvars4.png" alt=" " width="681" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Database Keys Matter
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Enforce Uniqueness:&lt;/em&gt; They prevent duplicate rows from muddying your datasets.&lt;br&gt;
&lt;em&gt;Connect Data:&lt;/em&gt; They link related concepts (e.g., matching a CustomerID foreign key in an Orders table back to the master Customers profile).&lt;br&gt;
&lt;em&gt;Speed Up Searches:&lt;/em&gt; Most database engines automatically index primary key and unique columns, which speeds up lookups considerably. Foreign key columns are not always indexed for you (PostgreSQL, for example, does not do so), so they may need an explicit index.&lt;/p&gt;

&lt;h2&gt;
  
  
  Relationships in a database
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Database relationships are logical links established between two or more tables based on a common column. In a relational database management system (RDBMS) like MySQL or PostgreSQL, these connections dictate how records interact. They use Primary Keys (PK) and Foreign Keys (FK) to eliminate redundant data and maintain data integrity.&lt;/li&gt;
&lt;li&gt;To connect different entities in a database we need relationships, and to configure a relationship we need its cardinality, that is, how many rows on one side can match a row on the other.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcwybibpp6dfx034da7pm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fcwybibpp6dfx034da7pm.png" alt=" " width="738" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Cases
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;One to Many&lt;/strong&gt;&lt;br&gt;
A one-to-many (1:N) relationship occurs when a single record in one table (the parent) links to multiple records in another table (the child), but each child record maps back to exactly one parent record. It is the most common pattern in database design because it minimizes redundant data and enforces clear hierarchies.&lt;br&gt;
Examples: Customers and Orders, Departments and Employees.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F2og08qw5e4m28dbj0kg3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F2og08qw5e4m28dbj0kg3.png" alt=" " width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Many to Many&lt;/strong&gt;&lt;br&gt;
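A many-to-many (M:N) relationship occurs in a database when multiple records in one table are associated with multiple records in another table. Relational database systems cannot link two tables directly in this manner: a foreign key column holds a single value, so storing several related IDs per row would break normalization and lead to data duplication and maintenance issues. The standard solution is a junction (bridge) table that holds one foreign key to each side, turning the M:N relationship into two one-to-many relationships, e.g. Students and Courses linked through Enrollments.&lt;/p&gt;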

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdft58u6ozmc7gymqrqvp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdft58u6ozmc7gymqrqvp.png" alt=" " width="288" height="175"&gt;&lt;/a&gt;&lt;/p&gt;
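&lt;p&gt;A many-to-many relationship resolved through a junction table might look like the sketch below. It uses Python's built-in sqlite3 module; the students, courses, and enrollments tables and the sample rows are hypothetical and exist only to illustrate the pattern.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Junction table: one row per (student, course) pairing.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    course_id  INTEGER NOT NULL REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO students VALUES (1, 'Amina'), (2, 'Brian');
INSERT INTO courses  VALUES (10, 'SQL'), (20, 'Python');
INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# One student, many courses.
courses_for_amina = [row[0] for row in conn.execute("""
    SELECT c.title
    FROM enrollments e
    JOIN courses c ON c.course_id = e.course_id
    WHERE e.student_id = 1
    ORDER BY c.title
""")]

# One course, many students.
students_in_sql = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM enrollments e
    JOIN students s ON s.student_id = e.student_id
    WHERE e.course_id = 10
    ORDER BY s.name
""")]

print(courses_for_amina, students_in_sql)
```

&lt;p&gt;The composite primary key on the junction table also prevents the same pairing from being stored twice.&lt;/p&gt;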

&lt;p&gt;&lt;strong&gt;One to One&lt;/strong&gt;&lt;br&gt;
A one-to-one (1:1) database relationship occurs when a single record in Table A is linked to exactly one record in Table B, and vice versa. Each row in either table has at most one matching row on the opposite side. Examples: Person and Passport, Country and Capital City, Car and License Plate, Store User and Shopping Cart, Employee and Desk Assignment, App Account and Premium Subscription.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fxhvh0lbpmfchc53o0ybf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fxhvh0lbpmfchc53o0ybf.png" alt=" " width="385" height="131"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Normalization
&lt;/h2&gt;

&lt;p&gt;Database normalization is a systematic design process used to organize data in a relational database to minimize data redundancy and eliminate data modification anomalies. It proceeds through normal forms such as 1NF, 2NF, and 3NF.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After these relationships are defined, the next step is the final layer, the physical layer, where the conceptual and logical models are implemented by writing SQL scripts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  2. OLAP (Online Analytical Processing) Data Modeling
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;The data sources are the databases created through OLTP data modeling.
Online Analytical Processing (OLAP) data modeling structures data for rapid querying and business intelligence. It organizes information into a multidimensional model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Database ---------&amp;gt; Bronze -----------&amp;gt; Silver ----------&amp;gt; Gold&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bronze&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exact replica of tables from database&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Silver&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transformed data&lt;/li&gt;
&lt;li&gt;Aggregations, e.g. One Big Table (OBT)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Gold&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dimensional data model&lt;/li&gt;
&lt;li&gt;Fact and dimension tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fsyv5d86u4xfrun2zi6gc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fsyv5d86u4xfrun2zi6gc.png" alt=" " width="800" height="629"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most common ways to physically structure OLAP models are through specific schemas in a data warehouse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Star Schema&lt;/strong&gt;&lt;br&gt;
The most widely used and recognizable model.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Structure:&lt;/em&gt; Consists of a central fact table surrounded by multiple dimension tables.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Fact Table:&lt;/em&gt; Contains the quantitative measurements (e.g., Sales Amount, Units Sold) and foreign keys mapping to the dimensions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Dimension Tables:&lt;/em&gt; Highly denormalized tables containing descriptive attributes (e.g., Customer Name, Store Location, Product Category).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Benefit&lt;/em&gt;: Simplicity and extremely fast read times, as it requires fewer table joins to get analytical results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4h5arazyf630l53ehr9x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4h5arazyf630l53ehr9x.png" alt=" " width="279" height="181"&gt;&lt;/a&gt;&lt;/p&gt;
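&lt;p&gt;As a rough sketch of a star schema in practice, the example below builds one fact table and two dimension tables with Python's built-in sqlite3 module and runs a typical analytical query. The table names (fact_sales, dim_product, dim_store), columns, and figures are invented for illustration.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    units_sold   INTEGER,
    sales_amount INTEGER   -- whole currency units, so the sums stay exact
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_store   VALUES (1, 'Nairobi'), (2, 'Mombasa');
INSERT INTO fact_sales  VALUES (1, 1, 2, 2000), (1, 2, 1, 1000), (2, 1, 3, 450);
""")

# One join from the fact table to a dimension answers the business question.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount) AS total_sales
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(sales_by_category)
```

&lt;p&gt;Because the category lives directly on the product dimension, the query needs a single join; in a snowflake schema the same question would require an extra join to a separate category table.&lt;/p&gt;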

&lt;p&gt;&lt;strong&gt;2. Snowflake Schema&lt;/strong&gt;&lt;br&gt;
A refinement of the star schema.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Structure:&lt;/em&gt; Similar to the star schema, but the dimension tables are normalized, meaning they branch out into sub-dimension tables.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Example:&lt;/em&gt; A "Product" dimension might connect to a "Category" sub-dimension, which connects to a "Department" sub-dimension.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Benefit:&lt;/em&gt; Reduces data redundancy and takes up less storage space, though queries may require more complex joins.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fqkh74w7n5mevw5mj1765.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fqkh74w7n5mevw5mj1765.png" alt=" " width="318" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F26mcee9jip9u6mgono2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F26mcee9jip9u6mgono2c.png" alt=" " width="456" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dimension tables change over time, hence the need for Slowly Changing Dimensions (SCDs), which define how those changes are handled:&lt;/p&gt;

&lt;p&gt;Type 0 - No change: the original value is retained&lt;br&gt;
Type 1 - Overwrite (upsert): the old value is replaced and no history is kept&lt;br&gt;
Type 2 - Full history: each change adds a new row&lt;br&gt;
Type 3 - Limited history: a new column holds the previous value&lt;/p&gt;

&lt;p&gt;For a fuller explanation of SCDs, see &lt;a href="https://en.wikipedia.org/wiki/Slowly_changing_dimension" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Slowly_changing_dimension&lt;/a&gt;&lt;/p&gt;
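&lt;p&gt;A Type 2 change can be sketched as follows, using Python's built-in sqlite3 module. The dim_customer table, its valid_from, valid_to, and is_current columns, and the apply_type2_change helper are assumptions made for this example; real warehouses use many variations of this layout.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key, one per version
    customer_id  INTEGER NOT NULL,      -- business key from the source system
    city         TEXT NOT NULL,
    valid_from   TEXT NOT NULL,
    valid_to     TEXT,                  -- NULL while the version is current
    is_current   INTEGER NOT NULL
);
INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
VALUES (42, 'Kisumu', '2024-01-01', NULL, 1);
""")


def apply_type2_change(customer_id, new_city, change_date):
    """Close the current version and insert a new one (SCD Type 2)."""
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1",
            (change_date, customer_id),
        )
        conn.execute(
            "INSERT INTO dim_customer "
            "(customer_id, city, valid_from, valid_to, is_current) "
            "VALUES (?, ?, ?, NULL, 1)",
            (customer_id, new_city, change_date),
        )


apply_type2_change(42, "Nairobi", "2025-06-01")
history = conn.execute(
    "SELECT city, valid_from, valid_to, is_current "
    "FROM dim_customer WHERE customer_id = 42 ORDER BY customer_key"
).fetchall()
print(history)
```

&lt;p&gt;A Type 1 change would instead be a single UPDATE of the city column, which is simpler but loses the fact that the customer ever lived in Kisumu.&lt;/p&gt;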

&lt;h2&gt;
  
  
  Joins in data modeling
&lt;/h2&gt;

&lt;p&gt;Joins are heavily used in both OLTP and OLAP, but they are used for completely different reasons and perform differently in each system.&lt;br&gt;
In data modeling, joins are operations used to combine rows from two or more tables horizontally into a single dataset, based on a related common key (such as an ID). They are fundamental for integrating normalized databases and bringing related data together for reporting and analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  OLTP (Online Transaction Processing)
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;How it’s used:&lt;/em&gt; Joins are necessary. OLTP systems process day-to-day business transactions (like an e-commerce checkout) and use highly normalized schemas.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The Goal:&lt;/em&gt; Data is split into many small tables (e.g., customers, orders, products) to prevent duplication and ensure fast, accurate data entry.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Impact:&lt;/em&gt; Queries join a few tables together, but they typically only touch a very small number of rows (e.g., a single customer's specific order), making these joins extremely fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  OLAP (Online Analytical Processing)
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;How it’s used:&lt;/em&gt; Joins are typically used in relational data warehouses (using Star or Snowflake schemas) to connect a central Fact table to surrounding Dimension tables.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The Goal:&lt;/em&gt; OLAP is designed for complex, historical analysis scanning millions of rows. Because large-scale joins are computationally expensive, OLAP models use denormalization (duplicating some data) to keep joins to a minimum and boost query performance.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Impact:&lt;/em&gt; Queries involve multi-table joins and massive aggregations, which naturally take longer (seconds or minutes) but yield deep business insights.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 4 Primary Types of Joins
&lt;/h3&gt;

&lt;p&gt;The type of join you choose determines how unmatched data (rows that don't share a common key) is handled:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;INNER JOIN:&lt;/strong&gt; Returns only the rows where there is a matching value in both tables. If a record doesn't exist on both sides, it is excluded.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdzdvkgznc2vuj4muswln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdzdvkgznc2vuj4muswln.png" alt=" " width="231" height="158"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LEFT JOIN (Left Outer):&lt;/strong&gt; Returns all rows from the left table, and the matching rows from the right table. If there is no match on the right side, the result will contain null for the right-hand columns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fbeuf4vho6b6772l6tzqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fbeuf4vho6b6772l6tzqf.png" alt=" " width="231" height="158"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RIGHT JOIN (Right Outer):&lt;/strong&gt; Returns all rows from the right table, and the matching rows from the left table. If there is no match on the left side, the result will show null for the left-hand columns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvfychzc999xsnwyhmpw3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvfychzc999xsnwyhmpw3.png" alt=" " width="231" height="158"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FULL JOIN (Full Outer):&lt;/strong&gt; Returns all rows from both tables, matched where possible. Where a row has no match on the other side, the columns from that side contain null.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvybb0fjm75uoeo16iuxl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvybb0fjm75uoeo16iuxl.png" alt=" " width="231" height="158"&gt;&lt;/a&gt;&lt;/p&gt;
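&lt;p&gt;The difference between the join types is easiest to see on a tiny dataset. The sketch below uses Python's built-in sqlite3 module with hypothetical customers and orders tables. It shows INNER and LEFT joins directly and obtains the RIGHT join result by swapping the table order, since older SQLite versions do not support RIGHT or FULL joins.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian');
-- Order 12 points at customer 3, who does not exist in customers.
INSERT INTO orders VALUES (10, 1, 'Laptop'), (12, 3, 'Desk');
""")

# INNER JOIN: only customers that have a matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

# LEFT JOIN: every customer, with NULL where no order matches.
left_rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

# Swapping the table order in a LEFT JOIN gives the rows a RIGHT JOIN would.
right_rows = conn.execute("""
    SELECT c.name, o.item
    FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

print(inner_rows)
print(left_rows)
print(right_rows)
```

&lt;p&gt;Python prints SQL null as None. A FULL JOIN on the same data would return the union of the LEFT and RIGHT results: the matched pair plus one row for Brian with no order and one row for the order with no customer.&lt;/p&gt;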

&lt;h3&gt;
  
  
  Point worth noting
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Joins&lt;/strong&gt;&lt;/em&gt; vs. &lt;em&gt;&lt;strong&gt;Relationships&lt;/strong&gt;&lt;/em&gt;: Joins physically merge or combine datasets to create a new, static result set (commonly used in SQL queries or Power Query). Relationships establish an ongoing, logical connection between tables so the modeling tool (like Microsoft Power BI or Tableau) can calculate metrics across the tables dynamically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Master Blueprint of Data Success&lt;/strong&gt;&lt;br&gt;
Data modeling is not just about organizing tables. It is the blueprint that turns messy, raw numbers into powerful business insights. By mastering schemas, relationships, and joins, you build a solid foundation for any data project. The Star Schema serves as your map, keeping your data clean and organized. Relationships act as bridges, letting your tables talk to each other without creating clutter. Meanwhile, joins work like glue to merge data when you need a single, complete view. When these three tools work together, your reports run faster, your numbers stay accurate, and your business can grow without slowing down. In short, a great data model turns confusing data into clear, actionable answers.&lt;/p&gt;

</description>
      <category>data</category>
      <category>database</category>
      <category>datamodeling</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Linux Fundamentals for Data Engineers</title>
      <dc:creator>Lameck Odhiambo</dc:creator>
      <pubDate>Mon, 08 Jun 2026 19:30:49 +0000</pubDate>
      <link>https://dev.to/lameck_odhiambo_748e9ef18/linux-fundamentals-for-data-engineers-2162</link>
      <guid>https://dev.to/lameck_odhiambo_748e9ef18/linux-fundamentals-for-data-engineers-2162</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Linux is a popular open-source operating system modeled after UNIX (Think of Unix as the original blueprint or architectural inspiration, and Linux as a modern, completely independent recreation built using that same blueprint). At its core is the Linux kernel - the base code that manages the communication between a computer's hardware and software.&lt;/p&gt;

&lt;h1&gt;
  
  
  Use cases of Linux beyond Data Engineering
&lt;/h1&gt;

&lt;p&gt;You likely use Linux every day without realizing it:&lt;br&gt;
&lt;strong&gt;Mobile Devices&lt;/strong&gt;: The Android operating system is built on top of the Linux kernel.&lt;br&gt;
&lt;strong&gt;Servers &amp;amp; Cloud&lt;/strong&gt;: The vast majority of web servers and cloud services (like AWS and Google Cloud) run on Linux.&lt;br&gt;
&lt;strong&gt;Smart Home &amp;amp; IoT&lt;/strong&gt;: Smart TVs, routers, and embedded devices often use Linux.&lt;br&gt;
&lt;strong&gt;Supercomputers&lt;/strong&gt;: Every system on the TOP500 list of the world’s fastest supercomputers has run Linux since November 2017.&lt;br&gt;
&lt;strong&gt;Gaming&lt;/strong&gt;: Handheld gaming devices such as the Steam Deck run SteamOS, a Linux-based system that can run many Windows games through the Proton compatibility layer.&lt;/p&gt;

&lt;p&gt;Since our focus is Data Engineering, let's look at how data engineers use Linux.&lt;/p&gt;

&lt;p&gt;Data engineers use Linux as the underlying foundation for modern data infrastructure, since nearly all cloud environments, container systems, and big data frameworks run natively on Linux servers. &lt;/p&gt;

&lt;h2&gt;
  
  
  Linux use cases for Data Engineers
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Processing data before Python touches it&lt;/li&gt;
&lt;li&gt;Building automation and ingestion scripts&lt;/li&gt;
&lt;li&gt;Interacting with cloud systems and remote servers&lt;/li&gt;
&lt;li&gt;Deploying containers and Orchestration tools&lt;/li&gt;
&lt;li&gt;Debugging and Infrastructure monitoring&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Sample Linux Commands
&lt;/h2&gt;

&lt;h3&gt;
  
  
  File &amp;amp; Directory management
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt;                     &lt;span class="c"&gt;# List all files (including hidden) with details&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-lh&lt;/span&gt;                     &lt;span class="c"&gt;# List files with human-readable sizes&lt;/span&gt;
&lt;span class="nb"&gt;pwd&lt;/span&gt;                        &lt;span class="c"&gt;# Print current working directory&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /path/to/dir            &lt;span class="c"&gt;# Change directory&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~                       &lt;span class="c"&gt;# Go to home directory&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; -                       &lt;span class="c"&gt;# Go back to previous directory&lt;/span&gt;

&lt;span class="nb"&gt;mkdir &lt;/span&gt;foldername           &lt;span class="c"&gt;# Create directory&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; dir1/dir2/dir3    &lt;span class="c"&gt;# Create nested directories&lt;/span&gt;
&lt;span class="nb"&gt;touch &lt;/span&gt;filename.txt         &lt;span class="c"&gt;# Create empty file&lt;/span&gt;

&lt;span class="nb"&gt;cp &lt;/span&gt;file.txt /dest/         &lt;span class="c"&gt;# Copy file&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; folder/ /dest/       &lt;span class="c"&gt;# Copy folder recursively&lt;/span&gt;
&lt;span class="nb"&gt;mv &lt;/span&gt;oldname newname         &lt;span class="c"&gt;# Rename or move file/folder&lt;/span&gt;
&lt;span class="nb"&gt;rm &lt;/span&gt;file.txt                &lt;span class="c"&gt;# Remove file&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; folder/             &lt;span class="c"&gt;# Remove folder and contents (use with caution!)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
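&lt;p&gt;The commands above can be combined into a short session. This is a minimal sketch that assumes a throwaway directory created with mktemp; the folder and file names (project, events.csv) are hypothetical and only serve as illustration.&lt;/p&gt;

```shell
# Work inside a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir -p project/raw project/archive                     # create nested directories in one call
touch project/raw/events.csv                             # hypothetical empty input file
cp project/raw/events.csv project/archive/events.bak     # keep a backup copy
mv project/raw/events.csv project/raw/events_2024.csv    # rename in place

ls -la project/raw project/archive                       # confirm the result
```

&lt;p&gt;When you are done experimenting, the whole directory can be removed with rm -rf "$workdir". Double-check the variable before running it, since rm -rf does not ask for confirmation.&lt;/p&gt;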



&lt;h3&gt;
  
  
  System Information
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt;                   &lt;span class="c"&gt;# Show kernel and system info&lt;/span&gt;
lsb_release &lt;span class="nt"&gt;-a&lt;/span&gt;             &lt;span class="c"&gt;# Show distribution info&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; /etc/os-release        &lt;span class="c"&gt;# Show OS details&lt;/span&gt;
&lt;span class="nb"&gt;hostname&lt;/span&gt;                   &lt;span class="c"&gt;# Show hostname&lt;/span&gt;
&lt;span class="nb"&gt;uptime&lt;/span&gt;                     &lt;span class="c"&gt;# Show system uptime&lt;/span&gt;
free &lt;span class="nt"&gt;-h&lt;/span&gt;                    &lt;span class="c"&gt;# Show memory usage (human readable)&lt;/span&gt;
&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt;                      &lt;span class="c"&gt;# Show disk space usage&lt;/span&gt;
&lt;span class="nb"&gt;du&lt;/span&gt; &lt;span class="nt"&gt;-sh&lt;/span&gt; /path               &lt;span class="c"&gt;# Show size of directory&lt;/span&gt;
top                        &lt;span class="c"&gt;# Live process viewer (press q to quit)&lt;/span&gt;
htop                       &lt;span class="c"&gt;# Better interactive process viewer (if installed)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Process Management
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ps aux                     &lt;span class="c"&gt;# List all running processes&lt;/span&gt;
ps aux | &lt;span class="nb"&gt;grep &lt;/span&gt;nginx        &lt;span class="c"&gt;# Find specific process&lt;/span&gt;
&lt;span class="nb"&gt;kill &lt;/span&gt;1234                  &lt;span class="c"&gt;# Kill process by PID&lt;/span&gt;
&lt;span class="nb"&gt;kill&lt;/span&gt; &lt;span class="nt"&gt;-9&lt;/span&gt; 1234               &lt;span class="c"&gt;# Force kill process&lt;/span&gt;
pkill nginx                &lt;span class="c"&gt;# Kill process by name&lt;/span&gt;
&lt;span class="nb"&gt;jobs&lt;/span&gt;                       &lt;span class="c"&gt;# List background jobs&lt;/span&gt;
&lt;span class="nb"&gt;fg&lt;/span&gt; %1                      &lt;span class="c"&gt;# Bring job to foreground&lt;/span&gt;
&lt;span class="nb"&gt;bg&lt;/span&gt; %1  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                # Send job to background
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  File Searching &amp;amp; Content
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;find / &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.txt"&lt;/span&gt; 2&amp;gt;/dev/null   &lt;span class="c"&gt;# Find files by name&lt;/span&gt;
locate filename                    &lt;span class="c"&gt;# Fast search (needs updatedb)&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"search text"&lt;/span&gt; file.txt        &lt;span class="c"&gt;# Search inside file&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"text"&lt;/span&gt; /path/              &lt;span class="c"&gt;# Recursive search in directory&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;file.txt                       &lt;span class="c"&gt;# Display file content&lt;/span&gt;
less file.txt                      &lt;span class="c"&gt;# View file with scrolling&lt;/span&gt;
&lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 20 file.txt                &lt;span class="c"&gt;# First 20 lines&lt;/span&gt;
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 20 file.txt                &lt;span class="c"&gt;# Last 20 lines&lt;/span&gt;
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /var/log/syslog            &lt;span class="c"&gt;# Follow log file in real-time&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
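&lt;p&gt;As an illustration of how these search commands fit together, the sketch below builds a small log file and inspects it. The file name app.log and its contents are made up for the example; real logs will differ.&lt;/p&gt;

```shell
workdir=$(mktemp -d)
log="$workdir/app.log"

# Write a small hypothetical log file (five lines).
printf '%s\n' \
  'INFO pipeline started' \
  'ERROR failed to read source' \
  'INFO retrying' \
  'ERROR timeout on load step' \
  'INFO pipeline finished' > "$log"

error_count=$(grep -c 'ERROR' "$log")                  # count lines containing ERROR
first_line=$(head -n 1 "$log")                         # first line only
last_line=$(tail -n 1 "$log")                          # last line only
found=$(find "$workdir" -name '*.log' 2>/dev/null)     # locate the file by name pattern

echo "errors: $error_count"
echo "first:  $first_line"
echo "last:   $last_line"
echo "found:  $found"
```

&lt;p&gt;Capturing results in variables, as done here, is one common way to reuse them later in a script, for example to fail a pipeline step when the error count is above zero.&lt;/p&gt;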



&lt;h3&gt;
  
  
  Networking
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip addr show               &lt;span class="c"&gt;# Show network interfaces (modern)&lt;/span&gt;
ifconfig                   &lt;span class="c"&gt;# Show interfaces (older)&lt;/span&gt;
ping google.com            &lt;span class="c"&gt;# Test connectivity&lt;/span&gt;
curl &lt;span class="nt"&gt;-I&lt;/span&gt; https://example.com &lt;span class="c"&gt;# Get HTTP headers&lt;/span&gt;
wget https://example.com/file.zip   &lt;span class="c"&gt;# Download a file&lt;/span&gt;
ssh user@192.168.1.100     &lt;span class="c"&gt;# SSH into remote server&lt;/span&gt;
scp file.txt user@host:/path/   &lt;span class="c"&gt;# Copy file via SSH&lt;/span&gt;
netstat &lt;span class="nt"&gt;-tuln&lt;/span&gt;              &lt;span class="c"&gt;# Show listening ports&lt;/span&gt;
ss &lt;span class="nt"&gt;-tuln&lt;/span&gt;                   &lt;span class="c"&gt;# Modern alternative to netstat&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Package Management
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#### Debian/Ubuntu&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt upgrade
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;htop
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt remove htop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  User &amp;amp; Permissions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;whoami&lt;/span&gt;                     &lt;span class="c"&gt;# Current user&lt;/span&gt;
&lt;span class="nb"&gt;sudo command&lt;/span&gt;               &lt;span class="c"&gt;# Run as superuser&lt;/span&gt;
su - username              &lt;span class="c"&gt;# Switch user&lt;/span&gt;
&lt;span class="nb"&gt;chmod &lt;/span&gt;755 file.sh          &lt;span class="c"&gt;# Change permissions (rwxr-xr-x)&lt;/span&gt;
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x script.sh         &lt;span class="c"&gt;# Make executable&lt;/span&gt;
&lt;span class="nb"&gt;chown &lt;/span&gt;user:group file.txt  &lt;span class="c"&gt;# Change owner&lt;/span&gt;
&lt;span class="nb"&gt;id&lt;/span&gt;                         &lt;span class="c"&gt;# Show user/group IDs&lt;/span&gt;
passwd                     &lt;span class="c"&gt;# Change password&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
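&lt;p&gt;To see what the permission bits change in practice, here is a small sketch that creates a script, checks whether it is executable, and then runs it. The script name hello.sh and its contents are hypothetical, and the example assumes the temporary directory allows executing files.&lt;/p&gt;

```shell
workdir=$(mktemp -d)
script="$workdir/hello.sh"

# Create a tiny two-line script.
printf '%s\n' '#!/bin/sh' 'echo "hello from script"' > "$script"

chmod 644 "$script"          # rw-r--r--: readable, but not executable
if [ -x "$script" ]; then before=yes; else before=no; fi

chmod 755 "$script"          # rwxr-xr-x: owner can write, everyone can read and run
if [ -x "$script" ]; then after=yes; else after=no; fi

output=$("$script")          # run the script now that it is executable
echo "executable before: $before, after: $after"
echo "$output"
```

&lt;p&gt;The numeric form (755) sets all permission bits at once, while the symbolic form (chmod +x) only adds the execute bit and leaves the rest unchanged.&lt;/p&gt;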



&lt;h3&gt;
  
  
  Compression &amp;amp; Archives
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-czvf&lt;/span&gt; archive.tar.gz /folder/     &lt;span class="c"&gt;# Create compressed tarball&lt;/span&gt;
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xzvf&lt;/span&gt; archive.tar.gz              &lt;span class="c"&gt;# Extract&lt;/span&gt;
zip &lt;span class="nt"&gt;-r&lt;/span&gt; archive.zip folder/            &lt;span class="c"&gt;# Create zip&lt;/span&gt;
unzip archive.zip                     &lt;span class="c"&gt;# Extract zip&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
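&lt;p&gt;A quick round trip helps confirm that an archive contains what you expect. The sketch below assumes a hypothetical data folder with one CSV file; it creates a tarball, lists it, and extracts it into a separate directory.&lt;/p&gt;

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir -p data
echo "id,value" > data/sample.csv        # hypothetical file to archive

tar -czf backup.tar.gz data/             # c=create, z=gzip, f=archive file name
listing=$(tar -tzf backup.tar.gz)        # t=list contents without extracting

mkdir -p restore
tar -xzf backup.tar.gz -C restore        # x=extract, -C=target directory

restored=$(cat restore/data/sample.csv)
echo "$listing"
echo "restored: $restored"
```

&lt;p&gt;Listing with -t before extracting is a cheap safety check, and -C keeps extracted files from overwriting anything in the current directory.&lt;/p&gt;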



&lt;h2&gt;
  
  
  Practical Example
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2ng20yaskfw4qhwsgsl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2ng20yaskfw4qhwsgsl.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhnxjty6ipm5vpb9pto2i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhnxjty6ipm5vpb9pto2i.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Linux is the foundation of modern data engineering. Command-line skills, shell scripting, text processing, process management, and server administration are critical for building, managing, and troubleshooting data pipelines.&lt;/p&gt;

&lt;p&gt;As data infrastructure grows more complex with cloud, containers, and tools like Spark, Kafka, Airflow, and Kubernetes, strong Linux knowledge provides a significant competitive edge. It enables faster automation, better problem-solving, and higher efficiency.&lt;/p&gt;

&lt;p&gt;Key takeaway: investing in Linux fundamentals offers one of the best returns for any data engineer. The terminal is the primary language of data platforms, so mastering it unlocks greater productivity and career growth.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>database</category>
      <category>dataengineering</category>
      <category>techtalks</category>
    </item>
  </channel>
</rss>
