<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Edmund Eryuba</title>
    <description>The latest articles on DEV Community by Edmund Eryuba (@edmund_eryuba).</description>
    <link>https://dev.to/edmund_eryuba</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3709646%2F1933a2c9-5b22-43f1-afba-7b59f1701990.png</url>
      <title>DEV Community: Edmund Eryuba</title>
      <link>https://dev.to/edmund_eryuba</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/edmund_eryuba"/>
    <language>en</language>
    <item>
      <title>Data Management Systems: Transactional to Analytical Architectures</title>
      <dc:creator>Edmund Eryuba</dc:creator>
      <pubDate>Mon, 04 May 2026 07:29:01 +0000</pubDate>
      <link>https://dev.to/edmund_eryuba/data-management-systems-from-transactional-databases-to-analytical-architectures-b7j</link>
      <guid>https://dev.to/edmund_eryuba/data-management-systems-from-transactional-databases-to-analytical-architectures-b7j</guid>
      <description>&lt;p&gt;Data is no longer treated as a byproduct of business operations and has become one of the most valuable organizational assets. Every interaction on a banking application, e-commerce platform, hospital system, logistics network or social media service generates data continuously. As organizations increasingly adopt digital workflows, cloud platforms, machine learning systems and real-time applications, the amount of data being generated has grown exponentially.&lt;/p&gt;

&lt;p&gt;This rapid expansion introduces significant challenges. Organizations must ensure that data remains accurate, secure, accessible and useful while simultaneously supporting millions of users and analytical operations. Businesses are not only expected to store data efficiently, but also to transform it into meaningful insights that influence strategic decisions, operational efficiency and customer experiences.&lt;br&gt;
Modern data management exists to address these challenges.&lt;/p&gt;

&lt;p&gt;This article explores the major systems and architectures used in contemporary data management, beginning with traditional &lt;strong&gt;databases&lt;/strong&gt; and extending into modern analytical ecosystems such as &lt;strong&gt;data warehouses&lt;/strong&gt;, &lt;strong&gt;data lakes&lt;/strong&gt;, and &lt;strong&gt;data lakehouses&lt;/strong&gt;. Particular attention is given to the distinction between OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) systems, since these two paradigms form the operational and analytical backbone of most modern data infrastructures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data management approaches overview:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Primary purpose&lt;/th&gt;
&lt;th&gt;Data type&lt;/th&gt;
&lt;th&gt;Query style&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Database (OLTP)&lt;/td&gt;
&lt;td&gt;Real-time transactional processing&lt;/td&gt;
&lt;td&gt;Structured&lt;/td&gt;
&lt;td&gt;Short, frequent reads/writes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database (OLAP)&lt;/td&gt;
&lt;td&gt;Analytical queries &amp;amp; reporting&lt;/td&gt;
&lt;td&gt;Structured/Semi&lt;/td&gt;
&lt;td&gt;Complex, long-running aggregations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Lake&lt;/td&gt;
&lt;td&gt;Raw data storage at scale&lt;/td&gt;
&lt;td&gt;Any (structured, semi, unstructured)&lt;/td&gt;
&lt;td&gt;Ad hoc, exploratory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Warehouse&lt;/td&gt;
&lt;td&gt;Aggregated analytics &amp;amp; BI&lt;/td&gt;
&lt;td&gt;Structured (cleaned)&lt;/td&gt;
&lt;td&gt;Pre-defined analytical queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Lakehouse&lt;/td&gt;
&lt;td&gt;Unified storage + analytics&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;Both transactional &amp;amp; analytical&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Understanding Data Management
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Data management&lt;/strong&gt; refers to the collection of practices, technologies and architectural strategies used to acquire, store, organize, secure, process and analyze data throughout its lifecycle.&lt;br&gt;
At its core, effective data management ensures that organizations can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reliably process operational activities &lt;/li&gt;
&lt;li&gt;Maintain data consistency and integrity &lt;/li&gt;
&lt;li&gt;Support analytical and business intelligence workflows &lt;/li&gt;
&lt;li&gt;Scale systems as data volume increases &lt;/li&gt;
&lt;li&gt;Secure sensitive information &lt;/li&gt;
&lt;li&gt;Enable data-driven decision making &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The complexity of managing data arises because different types of workloads require different architectural approaches. A banking application processing financial transfers has very different requirements from a business intelligence dashboard analyzing five years of customer purchasing behavior.&lt;/p&gt;

&lt;p&gt;This distinction introduces two foundational concepts in data systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OLTP&lt;/strong&gt; systems, which handle operational transactions in real time. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OLAP&lt;/strong&gt; systems, which handle analytical processing and large-scale querying.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding the relationship between these systems provides the foundation for understanding modern data architectures.&lt;/p&gt;




&lt;h2&gt;
  
  
  OLTP Systems: Powering Real-Time Operations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Online Transaction Processing (OLTP)&lt;/strong&gt; systems are designed to manage day-to-day operational activities. These systems process large numbers of small, fast, real-time transactions while ensuring that the data remains accurate and consistent.&lt;/p&gt;

&lt;p&gt;Whenever a customer transfers money using a banking application, purchases an item online, books a flight ticket or updates a user profile, an OLTP system is involved behind the scenes.&lt;br&gt;
The defining characteristic of OLTP systems is their focus on transactional reliability. These systems must process operations quickly while supporting thousands or even millions of simultaneous users.&lt;/p&gt;

&lt;p&gt;A modern e-commerce platform provides a useful example. When a customer places an order, several operations happen almost instantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The payment is validated &lt;/li&gt;
&lt;li&gt;Inventory levels are updated &lt;/li&gt;
&lt;li&gt;The order record is created &lt;/li&gt;
&lt;li&gt;Shipping information is generated &lt;/li&gt;
&lt;li&gt;Transaction confirmations are sent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any part of the operation fails, the system must ensure that the database does not become inconsistent. For example, inventory should not decrease if payment processing fails.&lt;/p&gt;

&lt;p&gt;This reliability is achieved through the use of ACID properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Atomicity&lt;/strong&gt; ensures transactions fully succeed or fully fail &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt; maintains valid data states &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt; prevents concurrent transactions from interfering &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Durability&lt;/strong&gt; guarantees committed data persists after failures&lt;/li&gt;
&lt;/ul&gt;
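
&lt;p&gt;The atomicity guarantee can be sketched with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module, whose connection context manager commits on success and rolls back on error. The table and order logic here are purely illustrative, not taken from any particular system:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 10)")
conn.commit()

def place_order(conn, payment_ok):
    # Atomicity: either every statement in the transaction commits, or none do.
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE inventory SET stock = stock - 1 WHERE item = 'widget'")
            if not payment_ok:
                raise RuntimeError("payment declined")
    except RuntimeError:
        pass  # the inventory update above was rolled back

place_order(conn, payment_ok=False)
stock = conn.execute("SELECT stock FROM inventory").fetchone()[0]
print(stock)  # still 10: the failed order left no partial state behind
```

&lt;p&gt;Because the payment check failed inside the transaction, the earlier inventory update never became visible, which is exactly the e-commerce scenario described above.&lt;/p&gt;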

&lt;p&gt;Because OLTP systems prioritize speed and consistency, they commonly rely on highly structured relational databases.&lt;/p&gt;

&lt;p&gt;Popular OLTP databases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.mysql.com/" rel="noopener noreferrer"&gt;MySQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.oracle.com/" rel="noopener noreferrer"&gt;Oracle Database&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.microsoft.com/en-us/sql-server" rel="noopener noreferrer"&gt;Microsoft SQL Server&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These systems typically use normalized schemas to reduce redundancy and improve transactional integrity.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Limitations of Transactional Systems
&lt;/h2&gt;

&lt;p&gt;Although OLTP systems excel at operational processing, they are not optimized for deep analytical workloads.&lt;/p&gt;

&lt;p&gt;Consider a multinational retailer attempting to answer questions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which products generated the highest revenue over the last five years? &lt;/li&gt;
&lt;li&gt;Which region experienced the fastest growth? &lt;/li&gt;
&lt;li&gt;What customer segment has the highest retention rate? &lt;/li&gt;
&lt;li&gt;Which marketing campaigns produced the best conversion rates? &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These queries often require scanning millions or billions of records and performing complex aggregations across historical datasets.&lt;/p&gt;

&lt;p&gt;Running such analytical queries directly on operational OLTP systems creates performance problems. Transactional systems are optimized for short, fast queries, not for computationally intensive analytical workloads.&lt;/p&gt;

&lt;p&gt;This challenge led to the development of OLAP systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  OLAP Systems: Transforming Data into Insight
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Online Analytical Processing (OLAP)&lt;/strong&gt; systems are designed specifically for analytical workloads, reporting, forecasting and business intelligence.&lt;/p&gt;

&lt;p&gt;Unlike OLTP systems, which focus on operational speed, OLAP systems focus on extracting strategic insights from large volumes of historical data.&lt;/p&gt;

&lt;p&gt;Organizations use OLAP systems to answer complex business questions, identify patterns, predict trends, and support executive decision making.&lt;/p&gt;

&lt;p&gt;For example, a retail organization may use OLAP systems to analyze:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Seasonal purchasing behavior &lt;/li&gt;
&lt;li&gt;Customer segmentation trends &lt;/li&gt;
&lt;li&gt;Revenue performance across regions &lt;/li&gt;
&lt;li&gt;Supply chain inefficiencies &lt;/li&gt;
&lt;li&gt;Long-term sales forecasting &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OLAP systems are therefore optimized for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex joins and aggregations &lt;/li&gt;
&lt;li&gt;Large-scale reads &lt;/li&gt;
&lt;li&gt;Historical data analysis &lt;/li&gt;
&lt;li&gt;Multidimensional querying &lt;/li&gt;
&lt;li&gt;High-volume analytical processing &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of storing only current operational data, OLAP systems typically maintain years of historical information aggregated from multiple operational systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Comparing OLTP and OLAP Architectures
&lt;/h2&gt;

&lt;p&gt;Although OLTP and OLAP systems both manage data, they solve fundamentally different problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OLTP vs OLAP comparison:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;OLTP&lt;/th&gt;
&lt;th&gt;OLAP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Record operational transactions&lt;/td&gt;
&lt;td&gt;Analyze historical data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple inserts, updates, lookups&lt;/td&gt;
&lt;td&gt;Complex aggregations, multi-table joins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Volume per Query&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rows (single record)&lt;/td&gt;
&lt;td&gt;Millions to billions of rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Milliseconds&lt;/td&gt;
&lt;td&gt;Seconds to minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Concurrency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Thousands of concurrent users&lt;/td&gt;
&lt;td&gt;Tens to hundreds of users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema Design&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Normalized (3NF)&lt;/td&gt;
&lt;td&gt;Denormalized (Star/Snowflake)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Row-oriented&lt;/td&gt;
&lt;td&gt;Column-oriented&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Freshness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time (seconds)&lt;/td&gt;
&lt;td&gt;Near real-time to batch (hours/days)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Users&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Application users, customers&lt;/td&gt;
&lt;td&gt;Analysts, data scientists, executives&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data History&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Current/recent operational state&lt;/td&gt;
&lt;td&gt;Months to years of history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backup Priority&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Continuous, mission-critical&lt;/td&gt;
&lt;td&gt;Important, but less time-sensitive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example Systems&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MySQL, PostgreSQL, Oracle&lt;/td&gt;
&lt;td&gt;Snowflake, BigQuery, Redshift&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;An airline reservation system is an example of an OLTP environment because it processes live ticket bookings continuously. A business intelligence dashboard analyzing global travel trends over ten years represents an OLAP workload.&lt;/p&gt;
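
&lt;p&gt;The difference in query shape can be made concrete with a small sketch, again using &lt;code&gt;sqlite3&lt;/code&gt; and a hypothetical &lt;code&gt;orders&lt;/code&gt; table:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("EU", 120.0), ("EU", 80.0), ("US", 200.0), ("US", 50.0)],
)

# OLTP-style: a short, indexed, single-row lookup (milliseconds, one record)
order = conn.execute("SELECT region, amount FROM orders WHERE id = 1").fetchone()

# OLAP-style: an aggregation that scans the whole table (here four rows;
# in production, millions to billions)
totals = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))

print(order)   # ('EU', 120.0)
print(totals)  # {'EU': 200.0, 'US': 250.0}
```

&lt;p&gt;The first query touches one row through the primary key; the second must read every row. At warehouse scale, that full-table scan is what column-oriented OLAP engines are built to make fast.&lt;/p&gt;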

&lt;p&gt;This architectural separation is essential because attempting to optimize a single system for both operational and analytical workloads often leads to poor performance in both areas.&lt;/p&gt;

&lt;p&gt;As organizations matured technologically, they began building specialized systems dedicated to analytics.&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Warehouses: Centralized Analytical Repositories
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;data warehouse&lt;/strong&gt; is a centralized system designed to support OLAP workloads by consolidating data from multiple operational sources into a structured analytical environment.&lt;/p&gt;

&lt;p&gt;Data warehouses allow organizations to combine information from different departments and systems into a unified repository for analysis and reporting.&lt;/p&gt;

&lt;p&gt;Instead of querying live transactional systems directly, analysts query the warehouse.&lt;br&gt;
This approach improves both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Operational system performance &lt;/li&gt;
&lt;li&gt;Analytical query efficiency &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data warehouses commonly support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Executive dashboards &lt;/li&gt;
&lt;li&gt;Business intelligence tools &lt;/li&gt;
&lt;li&gt;Financial reporting &lt;/li&gt;
&lt;li&gt;KPI monitoring &lt;/li&gt;
&lt;li&gt;Predictive analytics &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data is typically moved into warehouses through ETL or ELT pipelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ETL (Extract, Transform, Load)&lt;/strong&gt;: Data is extracted, cleaned, transformed, and then loaded into the warehouse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ELT (Extract, Load, Transform)&lt;/strong&gt;: Raw data is loaded first and transformed within the warehouse itself.&lt;/li&gt;
&lt;/ul&gt;
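
&lt;p&gt;A minimal ETL pass can be sketched with only the standard library; the CSV source is stood in for by an in-memory string, and the "warehouse" by an SQLite table, both purely illustrative:&lt;/p&gt;

```python
import csv
import io
import sqlite3

# Extract: read raw records from a source (in practice a file export or API call)
raw_csv = "customer,amount\nalice, 100 \nbob,\ncarol,250\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: clean the data BEFORE it reaches the warehouse (the ETL order) -
# trim whitespace, drop records with missing amounts, cast strings to numbers
clean = [(r["customer"], float(r["amount"])) for r in rows if r["amount"].strip()]

# Load: write the cleaned rows into the analytical store
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", clean)

loaded = warehouse.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
print(loaded)  # (2, 350.0) - the incomplete record was filtered out
```

&lt;p&gt;In the ELT variant, the raw rows would be loaded first and the filtering and casting expressed as SQL inside the warehouse itself.&lt;/p&gt;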

&lt;p&gt;Modern cloud data warehouses include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/bigquery?gad_campaignid=23729599437" rel="noopener noreferrer"&gt;Google BigQuery&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/redshift/" rel="noopener noreferrer"&gt;Amazon Redshift&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.snowflake.com/en/" rel="noopener noreferrer"&gt;Snowflake&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://azure.microsoft.com/en-us/products/synapse-analytics" rel="noopener noreferrer"&gt;Azure Synapse Analytics&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Data Lakes: Managing Raw and Large-Scale Data
&lt;/h2&gt;

&lt;p&gt;As organizations began collecting increasingly diverse forms of data, such as logs, multimedia, IoT streams and machine learning datasets, traditional warehouses became insufficient for certain workloads. This led to the emergence of data lakes.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;data lake&lt;/strong&gt; is a large-scale storage environment capable of storing raw data in its original format without requiring immediate transformation.&lt;br&gt;
Unlike warehouses, which impose predefined schemas, data lakes often use a schema-on-read approach, meaning structure is applied later during analysis.&lt;/p&gt;
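
&lt;p&gt;Schema-on-read can be illustrated with a few raw JSON events (hypothetical, with deliberately inconsistent shapes) and the standard library's &lt;code&gt;json&lt;/code&gt; module:&lt;/p&gt;

```python
import json

# A data lake stores records as-is; no schema is enforced at write time.
# Note the events do not all share the same fields:
raw_events = [
    '{"user": "alice", "action": "click", "ms": 13}',
    '{"user": "bob", "action": "view"}',
    '{"user": "carol", "action": "click", "ms": 27, "device": "mobile"}',
]

# Schema-on-read: structure is imposed only at analysis time, and the
# "schema" is simply whichever fields this particular query cares about.
parsed = [json.loads(line) for line in raw_events]
click_latencies = [e["ms"] for e in parsed if e["action"] == "click"]
print(click_latencies)  # [13, 27]
```

&lt;p&gt;A warehouse would have rejected or padded these records at load time; the lake defers that decision to each reader.&lt;/p&gt;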

&lt;p&gt;Data lakes are particularly useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Machine learning workloads &lt;/li&gt;
&lt;li&gt;Streaming data ingestion &lt;/li&gt;
&lt;li&gt;Scientific research &lt;/li&gt;
&lt;li&gt;IoT ecosystems &lt;/li&gt;
&lt;li&gt;Large-scale log analytics &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common technologies associated with data lakes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://hadoop.apache.org/" rel="noopener noreferrer"&gt;Apache Hadoop&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://spark.apache.org/" rel="noopener noreferrer"&gt;Apache Spark&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/s3/" rel="noopener noreferrer"&gt;Amazon S3&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While highly scalable and flexible, early data lakes often suffered from governance and quality-control problems, resulting in poorly organized “data swamps.”&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Lakehouses: Bridging Operational Flexibility and Analytics
&lt;/h2&gt;

&lt;p&gt;To overcome the limitations of both warehouses and data lakes, modern architectures increasingly adopt the concept of the &lt;strong&gt;data lakehouse&lt;/strong&gt;.&lt;br&gt;
A data lakehouse combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The scalability and flexibility of data lakes &lt;/li&gt;
&lt;li&gt;The governance and analytical performance of warehouses &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lakehouses support both business intelligence and machine learning workloads within a unified architecture.&lt;br&gt;
They introduce features such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ACID transactions &lt;/li&gt;
&lt;li&gt;Metadata governance &lt;/li&gt;
&lt;li&gt;Versioned datasets &lt;/li&gt;
&lt;li&gt;High-performance querying &lt;/li&gt;
&lt;li&gt;Open storage formats &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Popular lakehouse technologies include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://delta.io/" rel="noopener noreferrer"&gt;Delta Lake&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://iceberg.apache.org/" rel="noopener noreferrer"&gt;Apache Iceberg&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://hudi.apache.org/" rel="noopener noreferrer"&gt;Apache Hudi&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architectural evolution reflects the growing need for unified platforms capable of supporting increasingly complex data ecosystems.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Rise of Integrated Data Architectures
&lt;/h2&gt;

&lt;p&gt;Modern organizations rarely rely on a single data system. Instead, they build interconnected ecosystems where different technologies handle different responsibilities.&lt;/p&gt;

&lt;p&gt;A modern architecture may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OLTP databases for operational processing &lt;/li&gt;
&lt;li&gt;Streaming platforms for real-time ingestion &lt;/li&gt;
&lt;li&gt;Data lakes for raw storage &lt;/li&gt;
&lt;li&gt;Warehouses for analytics &lt;/li&gt;
&lt;li&gt;Lakehouses for unified workloads &lt;/li&gt;
&lt;li&gt;Business intelligence tools for reporting &lt;/li&gt;
&lt;li&gt;Machine learning platforms for predictive modeling &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Workflow orchestration platforms such as Apache Airflow help coordinate these pipelines and automate data movement across systems.&lt;/p&gt;

&lt;p&gt;This layered architecture enables organizations to process operational workloads efficiently while simultaneously extracting strategic insights from historical and large-scale data.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Modern data management is fundamentally about balancing operational efficiency with analytical capability.&lt;/p&gt;

&lt;p&gt;OLTP systems ensure that real-time business operations remain fast, reliable, and consistent. OLAP systems transform accumulated data into strategic insight through large-scale analysis and reporting. Data warehouses centralize structured analytical workloads, data lakes enable flexible large-scale storage, and lakehouses attempt to unify both worlds into a scalable modern architecture.&lt;/p&gt;

&lt;p&gt;As organizations continue generating unprecedented amounts of data, the ability to design and manage these interconnected systems becomes increasingly critical. Businesses that successfully integrate transactional reliability with analytical intelligence gain not only operational stability, but also the strategic advantage necessary to compete in a data-driven world.&lt;/p&gt;

</description>
      <category>database</category>
      <category>dataengineering</category>
      <category>datastructures</category>
      <category>data</category>
    </item>
    <item>
      <title>Automating Data Workflows with Apache Airflow</title>
      <dc:creator>Edmund Eryuba</dc:creator>
      <pubDate>Mon, 27 Apr 2026 14:41:22 +0000</pubDate>
      <link>https://dev.to/edmund_eryuba/automating-data-workflows-with-apache-airflow-1llg</link>
      <guid>https://dev.to/edmund_eryuba/automating-data-workflows-with-apache-airflow-1llg</guid>
      <description>&lt;p&gt;As organizations become increasingly data-driven, the scale of their pipelines has grown from modest daily batches to continuous, high-volume streams. What appears to be overwhelming complexity is, in practice, a matter of structure and discipline; imposed through the right tools. Apache Airflow embodies this principle as a batch-oriented orchestration framework, enabling the construction of scheduled, reliable data pipelines in Python while seamlessly integrating the diverse technologies that define modern data ecosystems.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Airflow actually does
&lt;/h2&gt;

&lt;p&gt;&lt;a href="(https://airflow.apache.org)"&gt;Apache Airflow&lt;/a&gt; is an open source tool used to write, schedule, and manage workflows as code. Whenever you have actions that depend on one another and must be performed in a specific order, you can define them as a workflow in Airflow.&lt;/p&gt;

&lt;p&gt;Workflows in Airflow are modelled as &lt;strong&gt;DAGs (Directed Acyclic Graphs)&lt;/strong&gt;. A DAG is simply a collection of tasks with defined dependencies between them. The "acyclic" part just means there are no circular loops; Task A might trigger Task B, but Task B can never circle back and trigger Task A. This constraint keeps pipelines predictable and debuggable.&lt;/p&gt;

&lt;p&gt;At its core, Airflow does three things well: it defines workflows as DAGs, schedules and executes tasks on a timeline and tracks state and dependencies so you always know what ran, when, and whether it succeeded.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting up Airflow on Linux
&lt;/h2&gt;

&lt;p&gt;A stable Airflow deployment starts with a clean Python environment. Skipping this step is a common source of dependency conflicts down the road.&lt;/p&gt;

&lt;p&gt;Start by creating and activating a virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv airflow_env
&lt;span class="nb"&gt;source &lt;/span&gt;airflow_env/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then install Airflow and initialise its metadata database. This is the internal database Airflow uses to track DAG runs, task states, and logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;apache-airflow
airflow db init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that's done, start the webserver and scheduler as separate processes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow webserver &lt;span class="nt"&gt;--port&lt;/span&gt; 8080
airflow scheduler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The web UI will be available at &lt;code&gt;http://localhost:8080&lt;/code&gt;. If this is a fresh installation, you'll need to create an admin user before you can log in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow &lt;span class="nb"&gt;users &lt;/span&gt;create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--username&lt;/span&gt; admin &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--firstname&lt;/span&gt; Admin &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--lastname&lt;/span&gt; User &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; Admin &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--email&lt;/span&gt; admin@example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Getting your DAG to appear in the UI
&lt;/h2&gt;

&lt;p&gt;A surprisingly common stumbling block is writing a DAG that simply doesn't show up in the Airflow interface. If that happens to you, run through this checklist before assuming something deeper is wrong.&lt;/p&gt;

&lt;p&gt;Your DAG file must live in the &lt;code&gt;~/airflow/dags/&lt;/code&gt; directory, have a &lt;code&gt;.py&lt;/code&gt; extension, contain no syntax errors and instantiate the DAG object at the &lt;strong&gt;global scope&lt;/strong&gt; of the file (not inside a function). After placing the file correctly, restart the scheduler and refresh the UI. The scheduler needs to re-parse the DAGs folder before new files become visible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Designing the ETL pipeline
&lt;/h2&gt;

&lt;p&gt;The pipeline in this guide follows a classic three-stage structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extract&lt;/strong&gt; — fetch stock data from an external API (such as Polygon.io)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transform&lt;/strong&gt; — clean and reshape the data using Pandas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load&lt;/strong&gt; — write the results to a PostgreSQL database&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  TaskFlow API vs. Operators and XComs
&lt;/h3&gt;

&lt;p&gt;Airflow offers two styles for wiring tasks together. The &lt;strong&gt;TaskFlow API&lt;/strong&gt; uses Python decorators (&lt;code&gt;@task&lt;/code&gt;) and handles data passing automatically; it's clean and concise, but abstracts away some of what's happening under the hood. &lt;strong&gt;Operators with XComs&lt;/strong&gt;, by contrast, give you explicit control over how data flows between tasks.&lt;/p&gt;

&lt;p&gt;XCom (short for "cross-communication") is Airflow's built-in mechanism for passing small pieces of data between tasks, stored in the metadata database. Here's how the pattern looks across all three stages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extract
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_api_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ti&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;xcom_push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;raw_data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Transform
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ti&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;xcom_pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;raw_data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ti&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;xcom_push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;clean_data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Load
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ti&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;xcom_pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;clean_data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One important caveat: &lt;strong&gt;XCom is designed for small payloads only.&lt;/strong&gt; If you're passing large DataFrames between tasks, store them in external storage (an S3 bucket or a staging table) and pass only a reference through XCom.&lt;/p&gt;
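&lt;p&gt;As a minimal sketch of that reference-passing pattern, the snippet below uses a local file to stand in for external storage; the directory, file name and helper functions are illustrative, not part of the Airflow API:&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path

# Stand-in for external storage (in production: an S3 bucket or staging table).
STAGING_DIR = Path(tempfile.mkdtemp())

def extract_and_stage():
    """Write the large payload to storage and return only a small reference."""
    rows = [{"symbol": "AAPL", "price": 178.5}, {"symbol": "MSFT", "price": 402.1}]
    path = STAGING_DIR / "raw_data.json"
    path.write_text(json.dumps(rows))
    return str(path)  # this short string is what would travel through XCom

def load_from_reference(ref):
    """Downstream task: resolve the reference back into the full payload."""
    return json.loads(Path(ref).read_text())

ref = extract_and_stage()        # upstream task would push `ref` via xcom_push
rows = load_from_reference(ref)  # downstream task would pull the same key
print(len(rows))                 # 2
```

&lt;p&gt;Only the path crosses the task boundary; the payload itself never touches Airflow's metadata database.&lt;/p&gt;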




&lt;h2&gt;
  
  
  Connecting to PostgreSQL
&lt;/h2&gt;

&lt;p&gt;Airflow manages external connections through its Connections store. Add your PostgreSQL connection via the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow connections add postgres_default &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--conn-uri&lt;/span&gt; &lt;span class="s2"&gt;"postgresql+psycopg2://airflow_user:password@localhost:5432/stocks_db"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In your DAG, reference this connection by ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nc"&gt;PostgresHook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postgres_conn_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres_default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The connection ID in your DAG must match exactly what you registered in Airflow's Connections store. If they don't match, the task will fail with a connection error — and the error message won't always make it obvious why.&lt;/p&gt;




&lt;h2&gt;
  
  
  Fixing database permission errors
&lt;/h2&gt;

&lt;p&gt;If your pipeline fails with a message like &lt;code&gt;ERROR: permission denied for schema public&lt;/code&gt;, the issue is almost certainly that your database user lacks the necessary privileges. Fix it by granting the required permissions in PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;airflow_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;airflow_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is easy to miss because the user might be able to connect to the database but still be blocked from creating or writing to tables in the public schema.&lt;/p&gt;




&lt;h2&gt;
  
  
  Handling API constraints
&lt;/h2&gt;

&lt;p&gt;When fetching data from an external API, you may encounter a &lt;code&gt;403 NOT_AUTHORIZED&lt;/code&gt; error even when your credentials are correct. This usually means your API subscription doesn't cover the time range you're requesting. Many free or basic tiers restrict historical data access.&lt;/p&gt;

&lt;p&gt;A simple fix is to narrow your query window to recent data only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you need broader historical access, you'll need to upgrade your API plan.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scheduling and time management
&lt;/h2&gt;

&lt;p&gt;Airflow operates entirely in &lt;strong&gt;UTC internally&lt;/strong&gt;, which is worth keeping in mind when debugging timing issues. When defining a DAG, always specify &lt;code&gt;start_date&lt;/code&gt;, &lt;code&gt;schedule_interval&lt;/code&gt;, and &lt;code&gt;catchup&lt;/code&gt; explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2025&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;@hourly&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;catchup&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setting &lt;code&gt;catchup=False&lt;/code&gt; is particularly important. Without it, Airflow will attempt to backfill all the runs it "missed" between &lt;code&gt;start_date&lt;/code&gt; and now, which can trigger dozens or hundreds of unexpected runs the first time you enable a DAG.&lt;/p&gt;
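&lt;p&gt;A rough, illustrative back-of-the-envelope (plain Python, no Airflow needed) shows how quickly those "missed" runs pile up for an hourly schedule:&lt;/p&gt;

```python
from datetime import datetime, timedelta

def missed_runs(start_date, now, interval):
    """Count how many schedule intervals fall between start_date and now."""
    return max(0, int((now - start_date) / interval))

start = datetime(2025, 1, 1)
enabled = datetime(2025, 1, 8)  # DAG switched on one week after start_date
n = missed_runs(start, enabled, timedelta(hours=1))
print(n)  # 168 hourly runs would be queued for backfill with catchup=True
```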




&lt;h2&gt;
  
  
  The DAG execution lifecycle
&lt;/h2&gt;

&lt;p&gt;Understanding what happens when a DAG runs helps enormously when something goes wrong. The sequence is: the scheduler parses your DAG file, creates a DAG Run, queues the individual tasks, hands them to the executor, and then updates task states (success, failed, retrying) as they complete.&lt;/p&gt;

&lt;p&gt;You can configure retry behaviour in the &lt;code&gt;default_args&lt;/code&gt; dictionary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;default_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retry_delay&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means any failed task will automatically retry twice, with a three-minute gap between attempts.&lt;/p&gt;
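&lt;p&gt;The attempt sequence those two settings produce can be sketched with a small simulation (plain Python; the function is hypothetical, and real Airflow waits between attempts rather than looping like this):&lt;/p&gt;

```python
RETRIES = 2  # mirrors "retries": 2 in default_args

def simulate_attempts(succeeds_on):
    """Return the attempts a task makes: one initial try plus up to RETRIES retries."""
    attempts = []
    for attempt in range(1, RETRIES + 2):
        attempts.append(attempt)
        if attempt == succeeds_on:
            return attempts, "success"
        # Airflow would pause here for retry_delay (3 minutes) before retrying.
    return attempts, "failed"

print(simulate_attempts(succeeds_on=2))   # ([1, 2], 'success')
print(simulate_attempts(succeeds_on=99))  # ([1, 2, 3], 'failed')
```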




&lt;h2&gt;
  
  
  Observability and Debugging
&lt;/h2&gt;

&lt;p&gt;Airflow's UI gives you a clear view into what's happening at every level. The &lt;strong&gt;Graph View&lt;/strong&gt; shows task dependencies at a glance, the &lt;strong&gt;Tree View&lt;/strong&gt; shows run history over time, and clicking into any task gives you access to its full execution logs.&lt;/p&gt;

&lt;p&gt;You can also exercise a single task from the command line with &lt;code&gt;airflow tasks test&lt;/code&gt;, which runs it outside the scheduler and streams its log output straight to the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow tasks test &amp;lt;dag_id&amp;gt; &amp;lt;task_id&amp;gt; &amp;lt;execution_date&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When debugging a failure, always start with the task logs. They usually tell you exactly what went wrong, whether it's a Python exception, a connection error, or an API rejection.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common failure points at a glance
&lt;/h2&gt;

&lt;p&gt;Most issues in an Airflow workflow fall into a handful of categories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DAG not appearing in UI&lt;/strong&gt; — wrong directory, syntax error in the file, or the scheduler hasn't been restarted since the file was added.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection issues&lt;/strong&gt; — mismatched connection ID between your DAG code and Airflow's Connections store, or incorrect credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database permission errors&lt;/strong&gt; — the database user hasn't been granted the right privileges on the target schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API failures&lt;/strong&gt; — rate limiting, subscription restrictions, or requesting data outside the allowed time window.&lt;/p&gt;
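&lt;p&gt;The first category is worth a pre-flight check before you even deploy, since a DAG file with a syntax error is silently skipped by the scheduler. A hedged, standard-library-only sketch (the file names are made up for the demo):&lt;/p&gt;

```python
import ast
import tempfile
from pathlib import Path

def dag_file_parses(path):
    """Return (True, None) if the file is valid Python, else (False, message)."""
    try:
        ast.parse(Path(path).read_text())
        return True, None
    except SyntaxError as e:
        return False, f"line {e.lineno}: {e.msg}"

# Two throwaway files standing in for real DAG files:
tmp = Path(tempfile.mkdtemp())
(tmp / "good_dag.py").write_text("def extract():\n    return 42\n")
(tmp / "bad_dag.py").write_text("def extract(:\n")

print(dag_file_parses(tmp / "good_dag.py"))  # (True, None)
ok, msg = dag_file_parses(tmp / "bad_dag.py")
print(ok)  # False
```

&lt;p&gt;Airflow surfaces the same information after the fact through the import-error banner in the web UI.&lt;/p&gt;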




&lt;h2&gt;
  
  
  Moving toward production
&lt;/h2&gt;

&lt;p&gt;Running Airflow locally on SQLite is fine for development, but not suitable for production use. When you're ready to take things further, consider these changes:&lt;/p&gt;

&lt;p&gt;Swap SQLite for &lt;strong&gt;PostgreSQL or MySQL&lt;/strong&gt; as Airflow's metadata database — SQLite has locking issues under concurrent load. Move from the default &lt;code&gt;SequentialExecutor&lt;/code&gt; to a &lt;strong&gt;CeleryExecutor&lt;/strong&gt; or &lt;strong&gt;KubernetesExecutor&lt;/strong&gt; to run tasks in parallel. Add a &lt;strong&gt;remote logging backend&lt;/strong&gt; (S3, Elasticsearch) so logs persist beyond the local machine. And implement &lt;strong&gt;secrets management&lt;/strong&gt; — environment variables, HashiCorp Vault, or Airflow's built-in Secrets Backend — rather than storing credentials in plain text.&lt;/p&gt;
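&lt;p&gt;As a rough sketch, those changes correspond to a handful of settings in &lt;code&gt;airflow.cfg&lt;/code&gt; or their environment-variable equivalents. The values below are placeholders, and section names shift slightly between Airflow versions (for example, &lt;code&gt;sql_alchemy_conn&lt;/code&gt; lived under &lt;code&gt;[core]&lt;/code&gt; before Airflow 2.3):&lt;/p&gt;

```ini
[core]
executor = CeleryExecutor

[database]
# swap SQLite for a server-grade metadata database
sql_alchemy_conn = postgresql+psycopg2://airflow:CHANGE_ME@db-host:5432/airflow_meta

[logging]
remote_logging = True
remote_base_log_folder = s3://my-bucket/airflow-logs
remote_log_conn_id = aws_default
```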




&lt;h2&gt;
  
  
  Insight
&lt;/h2&gt;

&lt;p&gt;At a high level, building this pipeline involved: setting up a Linux environment with Airflow, configuring the metadata database, writing a DAG using Operators and XComs, integrating an external stock data API, transforming the results with Pandas, loading into PostgreSQL with the right permissions, wiring up Airflow connections, and monitoring everything through the scheduler and web UI.&lt;/p&gt;

&lt;p&gt;The patterns used here include modular tasks, explicit data flow, external system integration, scheduled automation and observability. These are essential practices that apply equally to production-level data pipelines. Choosing Operators over the TaskFlow API gives you a deeper understanding of how Airflow manages execution and state, which pays dividends when things inevitably break in unexpected ways.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>airflow</category>
      <category>database</category>
      <category>etl</category>
    </item>
    <item>
      <title>ETL vs ELT: Two Paradigms, One Goal</title>
      <dc:creator>Edmund Eryuba</dc:creator>
      <pubDate>Mon, 13 Apr 2026 11:09:58 +0000</pubDate>
      <link>https://dev.to/edmund_eryuba/etl-vs-elt-two-paradigms-one-goal-12fc</link>
      <guid>https://dev.to/edmund_eryuba/etl-vs-elt-two-paradigms-one-goal-12fc</guid>
      <description>&lt;p&gt;What are the similarities, differences, benefits and use cases of ELT and ETL.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pipelines at a glance
&lt;/h2&gt;

&lt;p&gt;ELT &lt;em&gt;(extract, load, transform)&lt;/em&gt; and ETL &lt;em&gt;(extract, transform, load)&lt;/em&gt; are both data integration processes that move raw data from a source system to a target database. These sources may be spread across several different repositories or legacy systems, from which ELT or ETL transfers the data to a target location. Both approaches move data from source to destination, but where &lt;em&gt;&lt;strong&gt;transformation&lt;/strong&gt;&lt;/em&gt; happens changes everything about cost, speed, flexibility, and the kind of analytics you can build.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is ELT (extract, load, transform)?
&lt;/h2&gt;

&lt;p&gt;With ELT, unstructured data is extracted from a source system and loaded into a target system, to be transformed later as needed. The extracted data is made available to business intelligence systems immediately, with no need for a staging area. &lt;/p&gt;

&lt;p&gt;ELT leverages the data warehouse itself to perform basic transformations, such as data validation or removal of duplicates. These processes run in near real time and suit large volumes of raw data. ELT is the newer of the two approaches, and its tooling is less mature than ETL's: early ELT implementations relied on hand-coded SQL scripts, which are more error-prone than the purpose-built methods common in ETL.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is ETL (extract, transform, load)?
&lt;/h2&gt;

&lt;p&gt;With ETL, unstructured data is extracted from a source system and specific data points and potential “keys” are identified prior to loading data into the target systems. &lt;/p&gt;

&lt;p&gt;In a traditional ETL scenario, the source data is extracted to a staging area and moved into the target system. In the staging area, the data undergoes a transformation process that organizes and cleans all data types. This transformation process allows for the now structured data to be compatible with the target data storage systems. &lt;/p&gt;

&lt;h3&gt;
  
  
  Where they share common ground
&lt;/h3&gt;

&lt;p&gt;Despite their differences, the two patterns share a substantial foundation; both solve the same fundamental problem of moving heterogeneous data into a single analytical store.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data integration:&lt;/strong&gt; Both consolidate data from multiple disparate sources into a unified destination&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transformation necessity:&lt;/strong&gt; Neither skips transformation; they differ only in when and where it happens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestration:&lt;/strong&gt; Both require scheduling, dependency management, and error-handling tooling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data quality concerns:&lt;/strong&gt; Validation, deduplication, and schema enforcement are needed in both patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; Lineage tracking, logging, and alerting are essential regardless of order.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Where they diverge
&lt;/h3&gt;

&lt;p&gt;The key difference between ELT and ETL is the order of operations, which makes each uniquely suited to different situations. They also differ in the data sizes and data types each process handles best. &lt;/p&gt;

&lt;h4&gt;
  
  
  Performance and scalability
&lt;/h4&gt;

&lt;p&gt;ETL pipelines transform data on a dedicated server whose capacity is fixed; when data volumes spike, you hit a ceiling. ELT offloads transformation to cloud warehouses like BigQuery, Snowflake, or Redshift, which scale compute horizontally on demand. For organizations processing billions of rows, ELT's elastic compute model is a significant structural advantage.&lt;/p&gt;

&lt;h4&gt;
  
  
  Data freshness and raw access
&lt;/h4&gt;

&lt;p&gt;Because ELT loads raw data first, analysts retain access to the original source records. If a transformation rule is wrong, you can fix it and rerun without re-ingesting from the source. With ETL, if data was dropped or transformed before loading, it may be gone permanently, making reruns more expensive and re-extraction from source systems often necessary.&lt;/p&gt;
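&lt;p&gt;That rerun-from-raw property is easy to see in miniature. In this illustrative sketch (plain Python standing in for warehouse SQL), a buggy transformation rule is fixed and rerun against the retained raw copy, with no re-extraction from the source:&lt;/p&gt;

```python
# The raw records stay in the warehouse untouched (ELT loads before transforming).
raw = [
    {"symbol": "aapl", "price": "178.5"},
    {"symbol": "msft", "price": "402.1"},
]

def transform_v1(rows):
    # Buggy rule: forgot to normalize ticker symbols to upper case.
    return [{"symbol": r["symbol"], "price": float(r["price"])} for r in rows]

def transform_v2(rows):
    # Fixed rule: simply rerun against the same raw copy.
    return [{"symbol": r["symbol"].upper(), "price": float(r["price"])} for r in rows]

print(transform_v1(raw)[0]["symbol"])  # aapl  (wrong)
print(transform_v2(raw)[0]["symbol"])  # AAPL  (fixed, from the same raw data)
```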

&lt;h4&gt;
  
  
  Compliance and sensitivity
&lt;/h4&gt;

&lt;p&gt;ETL's pre-load transformation gives security teams an easier lever: strip or mask personally identifiable information before it ever enters the warehouse. ELT stores raw, potentially sensitive data in the destination system, demanding robust row-level security, column masking, and access policies inside the warehouse itself. It is manageable, but a larger governance surface area.&lt;/p&gt;
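&lt;p&gt;A minimal sketch of that pre-load masking step (the field names and hashing rule are assumptions for the example, not a prescribed scheme):&lt;/p&gt;

```python
import hashlib

def mask_email(email):
    """Replace an email with a stable, non-reversible token before loading."""
    return "user_" + hashlib.sha256(email.lower().encode()).hexdigest()[:12]

record = {"order_id": 1001, "email": "jane@example.com", "total": 59.90}

# ETL: the sensitive column is masked *before* the row reaches the warehouse.
staged = {**record, "email": mask_email(record["email"])}

print(staged["email"].startswith("user_"))  # True
print("@" in staged["email"])               # False: the raw address is never loaded
```

&lt;p&gt;Because the token is deterministic, analysts can still join and count by customer without ever seeing the underlying address.&lt;/p&gt;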

&lt;h2&gt;
  
  
  Benefits of ELT and ETL
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ELT Strengths
&lt;/h3&gt;

&lt;p&gt;The ELT approach enables faster implementation than the ETL process, though the data is messy when it first lands. Because transformation occurs after the load step, it cannot slow the migration down, and decoupling the two stages means a coding error (or any other error in the transformation stage) does not halt the migration effort. Additionally, ELT avoids server scaling issues by using the processing power and size of the data warehouse to perform transformation at scale. ELT also works with cloud data warehouse solutions to support structured, unstructured, semi-structured and raw data types.&lt;/p&gt;

&lt;h3&gt;
  
  
  ETL Strengths
&lt;/h3&gt;

&lt;p&gt;ETL takes longer to implement but results in cleaner data. This process is well suited to smaller target data repositories that require less frequent updating. ETL also works with cloud data warehouses, through cloud-based SaaS platforms, as well as with on-site data warehouses.&lt;br&gt;
There are also many open-source and commercial ETL tools whose capabilities and benefits include the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Comprehensive automation and ease-of-use functions that can automate the entire data flow and make recommendations on rules for the extract, transform and load process.&lt;/li&gt;
&lt;li&gt;A visual drag-and-drop interface used for specifying rules and data flows.&lt;/li&gt;
&lt;li&gt;Support for complex data management to assist with complex calculations, data integrations and string manipulation.&lt;/li&gt;
&lt;li&gt;Security and compliance features that encrypt sensitive data and are certified compliant with industry or government regulations, providing a more secure way to encrypt, remove or mask specific data fields to protect clients’ privacy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ELT
&lt;/h3&gt;

&lt;p&gt;An ELT process is best used with high-volume data sets or in real-time data environments. Specific examples include the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-volume environments:&lt;/strong&gt; Meteorological systems like weather services collect, collate and use large amounts of data on a regular basis. Businesses with large transaction volumes also fall into this category. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud-native data platforms:&lt;/strong&gt; Scalable warehouses such as Snowflake, Databricks, and BigQuery leverage microservices, containerization, and distributed storage, enabling modern and flexible analytics architectures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time ingestion systems:&lt;/strong&gt; Stock exchanges generate and use large amounts of data in real-time, where delays can be harmful. Additionally, large-scale distributors of materials and components need real-time access to current data for business intelligence.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ETL
&lt;/h2&gt;

&lt;p&gt;ETL is best used for synchronizing several data use environments and migrating data from legacy systems. The following are some specific examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need for data synchronization from several sources:&lt;/strong&gt; Companies that are merging their ventures may have multiple consumers, suppliers and partners in common. This data can be stored in separate repositories and formatted differently. ETL transforms the data into a unified format before loading it into the target data location.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updating and migrating data from legacy systems:&lt;/strong&gt; Legacy systems require the ETL process to transform their data into a format compatible with the structure of the new target database.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The verdict
&lt;/h2&gt;

&lt;p&gt;ETL predates the cloud era and remains the right choice when the destination system cannot bear transformation workloads, or when sensitive data must be sanitized before storage. ELT has become the dominant pattern for modern data teams precisely because cloud warehouses turned transformation into a solved compute problem: cheap, fast, and version-controlled through SQL.&lt;/p&gt;

&lt;p&gt;In practice, many mature data platforms run both: ELT for the bulk of analytical pipelines, and targeted ETL steps where compliance or system constraints demand it. Understanding the tradeoffs of each and not treating one as universally superior is what separates thoughtful data architecture from following a trend.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>datascience</category>
      <category>database</category>
      <category>cicd</category>
    </item>
    <item>
      <title>Connecting Power BI to a SQL Database (PostgreSQL)</title>
      <dc:creator>Edmund Eryuba</dc:creator>
      <pubDate>Fri, 13 Mar 2026 17:16:20 +0000</pubDate>
      <link>https://dev.to/edmund_eryuba/connecting-power-bi-to-a-sql-database-postgresql-4lk1</link>
      <guid>https://dev.to/edmund_eryuba/connecting-power-bi-to-a-sql-database-postgresql-4lk1</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Power Query is a data transformation and data preparation engine. Power Query comes with a graphical interface for getting data from sources and a Power Query editor for applying transformations. Because the engine is available in many products and services, the destination where the data is stored depends on where Power Query is used. Using Power Query, you can perform the extract, transform, and load (ETL) processing of data.&lt;/p&gt;

&lt;p&gt;Microsoft Power BI is a comprehensive business intelligence platform that uses Power Query to ingest data, then adds modelling, visualization, and sharing capabilities. Power Query preps the data, while Power BI turns it into reports.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg4zfdiwey7xt5bajqhav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg4zfdiwey7xt5bajqhav.png" alt="Power Query" width="739" height="503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why use PostgreSQL?
&lt;/h3&gt;

&lt;p&gt;Maintaining dynamic database systems is critical in today’s digital landscape, especially considering the rate at which newer technologies emerge. PostgreSQL is &lt;strong&gt;expandable&lt;/strong&gt; and &lt;strong&gt;versatile&lt;/strong&gt;, so it can quickly support a variety of specialized use cases through its powerful extension ecosystem, which covers everything from time-series data types to geospatial analytics.&lt;/p&gt;

&lt;p&gt;Its versatile and approachable design makes PostgreSQL a “&lt;em&gt;one-size-fits-all&lt;/em&gt;” solution for many enterprises looking for cost-effective and efficient ways to improve their database management systems.&lt;/p&gt;

&lt;p&gt;Built as an &lt;strong&gt;open-source&lt;/strong&gt; database solution, PostgreSQL is completely free from licensing restrictions, vendor lock-in potential, or the risk of over-deployment. &lt;/p&gt;

&lt;p&gt;Expert developers and commercial enterprises who understand the limitations of traditional database systems heavily support PostgreSQL. They work diligently to provide a battle-tested, best-of-breed relational database management system.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Power Query helps with data acquisition
&lt;/h3&gt;

&lt;p&gt;Business users spend up to 80 percent of their time on data preparation, which delays the work of analysis and decision-making. Several challenges contribute to this situation and Power Query helps address many of them.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enables &lt;strong&gt;wide range connectivity&lt;/strong&gt; of data sources, including data of all sizes and shapes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt; of experience, and parity of query capabilities over all data sources.&lt;/li&gt;
&lt;li&gt;Highly &lt;strong&gt;interactive&lt;/strong&gt; and &lt;strong&gt;intuitive&lt;/strong&gt; experience for rapidly and iteratively building queries over any data source, of any size.&lt;/li&gt;
&lt;li&gt;When using Power Query to access and transform data, you define a &lt;strong&gt;repeatable process&lt;/strong&gt; (query) that can be easily refreshed in the future to get up-to-date data. &lt;/li&gt;
&lt;li&gt;Power Query offers the ability to &lt;strong&gt;work against a subset of the entire data set&lt;/strong&gt; to define the required data transformations, allowing you to easily filter down and transform your data to a manageable size.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Power Query experiences
&lt;/h3&gt;

&lt;p&gt;The Power Query user experience is provided through the Power Query editor user interface. The goal of this interface is to help you apply the transformations you need simply by interacting with a user-friendly set of ribbons, menus, buttons, and other interactive components.&lt;/p&gt;

&lt;p&gt;The Power Query editor is the primary data preparation experience. In the editor, you can connect to a wide range of data sources and apply hundreds of different data transformations by previewing data and selecting transformations from the UI. These data transformation capabilities are common across all data sources, whatever the underlying data source limitations.&lt;/p&gt;

&lt;p&gt;When you create a new transformation step by interacting with the components of the Power Query interface, Power Query automatically creates the M code required to do the transformation so you don't need to write any code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting Power BI to a Local PostgreSQL Database
&lt;/h2&gt;

&lt;p&gt;Connecting Microsoft Power BI to an SQL database allows you to import data and build dashboards directly from your database tables. The exact steps depend slightly on the SQL engine (e.g., PostgreSQL, MySQL, or Microsoft SQL Server), but the workflow is mostly the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Get data in Power BI Desktop
&lt;/h3&gt;

&lt;p&gt;Launch Power BI Desktop. On the &lt;em&gt;Home screen&lt;/em&gt; you will see options for selecting a data source or starting with a blank report. Click the &lt;strong&gt;Blank report&lt;/strong&gt; option to be directed to the &lt;strong&gt;Home&lt;/strong&gt; tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fyalt0t18usb0p1k8sx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fyalt0t18usb0p1k8sx.png" alt="Blank Report" width="765" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Power BI Desktop, you can directly select an Excel worksheet, a Power BI semantic model, a SQL server database, direct data entry, Dataverse data or recently used data source without using the &lt;strong&gt;Get data&lt;/strong&gt; option. &lt;/p&gt;

&lt;p&gt;From the &lt;strong&gt;Data&lt;/strong&gt; ribbon, selecting &lt;strong&gt;Get Data&lt;/strong&gt; provides additional methods to select the desired connector.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhatzo8txhe55e529p5y6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhatzo8txhe55e529p5y6.png" alt="Get" width="453" height="656"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select the More option which opens a Get Data window that contains a complete list of available connectors. &lt;/p&gt;

&lt;h3&gt;
  
  
  2. Connection settings
&lt;/h3&gt;

&lt;p&gt;After clicking Get Data: Choose Database &amp;gt; Select PostgreSQL Database &amp;gt; Click Connect&lt;/p&gt;

&lt;p&gt;Power BI includes a built-in PostgreSQL connector that allows direct communication with PostgreSQL servers.&lt;/p&gt;

&lt;p&gt;Scroll to &lt;strong&gt;PostgreSQL database&lt;/strong&gt;, select the option and click &lt;strong&gt;Connect&lt;/strong&gt; to close the window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqte3w9kyujju6ck8fm25.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqte3w9kyujju6ck8fm25.png" alt="Get Data" width="800" height="772"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;strong&gt;PostgreSQL database&lt;/strong&gt; dialog that appears, provide the name of the server and database.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsclxvfg3weru0k1tczj2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsclxvfg3weru0k1tczj2.png" alt="PgSQL" width="593" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select either the &lt;strong&gt;Import&lt;/strong&gt; or &lt;strong&gt;DirectQuery&lt;/strong&gt; data connectivity mode. Use Import for snapshots or DirectQuery for live data.&lt;/p&gt;

&lt;p&gt;Power BI allows you to optionally include a SQL query in the &lt;strong&gt;Advanced Options&lt;/strong&gt; section if you want to retrieve only specific data from the database.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Authentication Credentials
&lt;/h3&gt;

&lt;p&gt;If you're connecting to this database for the first time, select the authentication type you want to use, and then enter your credentials. The authentication types available are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database (Username and password)&lt;/li&gt;
&lt;li&gt;Microsoft account (Microsoft Entra ID)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx20kpd0fdc07tyb0i9io.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx20kpd0fdc07tyb0i9io.png" alt="Authenticate" width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These credentials must match the PostgreSQL database user account.&lt;/p&gt;

&lt;p&gt;Once authenticated, Power BI establishes a connection to the PostgreSQL server and retrieves metadata about the available tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data preview
&lt;/h3&gt;

&lt;p&gt;The goal of the data preview stage is to provide you with a user-friendly way to preview and select your data. &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Navigator&lt;/strong&gt; lists all tables in the database, allowing you to preview each table and select one or more of them. Select &lt;strong&gt;Load&lt;/strong&gt; to load the data as-is, or &lt;strong&gt;Transform Data&lt;/strong&gt; to continue shaping it in the Power Query Editor.&lt;/p&gt;

&lt;p&gt;The data preview pane on the right side of the window shows a preview of the data from the object you selected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting Power BI to a Cloud PostgreSQL Database (Aiven)
&lt;/h2&gt;

&lt;p&gt;Organizations often host databases in the cloud rather than on local machines. One example is Aiven, which provides managed PostgreSQL services.&lt;/p&gt;

&lt;p&gt;Connecting Power BI to a cloud PostgreSQL database is similar to connecting to a local database, but additional security settings are required.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Obtain Connection Details from Aiven
&lt;/h3&gt;

&lt;p&gt;Log in to your Aiven web console and navigate to your PostgreSQL service to find the host, port, database name, username and password.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6gy7sz8agsxblvc2f8j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6gy7sz8agsxblvc2f8j.png" alt="Aiven" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Download the SSL Certificate
&lt;/h3&gt;

&lt;p&gt;Most cloud database providers require encrypted connections for security reasons. Aiven provides an SSL certificate that ensures secure communication between Power BI and the database.&lt;/p&gt;

&lt;p&gt;From the Aiven console:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to Connection Information&lt;/li&gt;
&lt;li&gt;Download the CA Certificate&lt;/li&gt;
&lt;li&gt;Save the certificate file on your computer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;SSL encryption ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data transmitted over the internet cannot be intercepted&lt;/li&gt;
&lt;li&gt;Authentication of the database server&lt;/li&gt;
&lt;li&gt;Secure communication between the client and server&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Connect Using Power BI
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Open Power BI:&lt;/strong&gt; Launch Power BI Desktop, click &lt;strong&gt;Get Data&lt;/strong&gt; on the Home ribbon, and select &lt;strong&gt;More...&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmmf8vhnw56s7gdmfnkh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmmf8vhnw56s7gdmfnkh.png" alt="More" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Select connector:&lt;/strong&gt; Choose Database &amp;gt; PostgreSQL database and click Connect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foe8xf06a2hq2bxr8dhwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foe8xf06a2hq2bxr8dhwi.png" alt="Datag" width="707" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enter Credentials:&lt;/strong&gt; Provide the server and database name, enter your Aiven username and password, and enable SSL if required. Power BI will use the certificate to verify the database server and establish a secure connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdzvk4ffv6wi9agw0ew2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdzvk4ffv6wi9agw0ew2.png" alt="pgsql db" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Loading Data and Creating Relationships
&lt;/h3&gt;

&lt;p&gt;After connecting to the database, the tables are loaded into Power BI’s data model.&lt;/p&gt;

&lt;p&gt;Power BI automatically detects relationships based on matching column names such as &lt;code&gt;customer_id&lt;/code&gt; or &lt;code&gt;product_id&lt;/code&gt;. &lt;br&gt;
However, relationships can also be created manually.&lt;/p&gt;

&lt;p&gt;These relationships form a data model, which defines how tables interact with each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;Power BI is a powerful business intelligence platform that allows organizations to transform raw data into meaningful insights. By connecting Power BI to SQL databases such as PostgreSQL, companies can access large datasets, build interactive dashboards and make data-driven decisions.&lt;/p&gt;

&lt;p&gt;Connecting to a local PostgreSQL database involves selecting the PostgreSQL connector in Power BI, entering the server and database details, authenticating with credentials, and loading tables into the Power BI model. When connecting to cloud databases such as those hosted on Aiven, additional security measures such as SSL certificates ensure that the connection is encrypted and secure.&lt;/p&gt;

&lt;p&gt;Once the data is loaded, Power BI allows analysts to create relationships between tables, forming a structured data model that supports accurate analysis. Strong SQL skills further enhance a Power BI analyst’s ability to retrieve, filter, aggregate, and prepare data efficiently before building reports.&lt;/p&gt;

</description>
      <category>database</category>
      <category>datascience</category>
      <category>dataengineering</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Designing Efficient Queries with SQL Joins and Window Functions</title>
      <dc:creator>Edmund Eryuba</dc:creator>
      <pubDate>Mon, 02 Mar 2026 11:08:12 +0000</pubDate>
      <link>https://dev.to/edmund_eryuba/designing-efficient-queries-with-sql-joins-and-window-functions-447k</link>
      <guid>https://dev.to/edmund_eryuba/designing-efficient-queries-with-sql-joins-and-window-functions-447k</guid>
      <description>&lt;p&gt;SQL(&lt;em&gt;Structured Query Language&lt;/em&gt;) is a powerful tool to search through large amounts of data and return specific information for analysis. Learning SQL is crucial for anyone aspiring to be a data analyst, data engineer, or data scientist, and helpful in many other fields such as web development or marketing.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL Joins
&lt;/h2&gt;

&lt;p&gt;Joins in SQL combine rows from two or more tables based on a related column between them. They are most often used to extract data from tables that have one-to-many or many-to-many relationships.&lt;/p&gt;

&lt;p&gt;There are four main types of joins to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;(INNER) JOIN&lt;/li&gt;
&lt;li&gt;LEFT (OUTER) JOIN&lt;/li&gt;
&lt;li&gt;RIGHT (OUTER) JOIN&lt;/li&gt;
&lt;li&gt;FULL (OUTER) JOIN&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  INNER JOIN
&lt;/h3&gt;

&lt;p&gt;INNER JOIN is used to retrieve rows where matching values exist in both tables. It helps in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Combining records based on a related column.&lt;/li&gt;
&lt;li&gt;Returning only matching rows from both tables.&lt;/li&gt;
&lt;li&gt;Excluding non-matching data from the result set.&lt;/li&gt;
&lt;li&gt;Ensuring accurate data relationships between tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Syntax:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu2197m57t1949wpkufx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu2197m57t1949wpkufx.png" alt="INNER Join" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  LEFT JOIN
&lt;/h3&gt;

&lt;p&gt;LEFT JOIN is used to retrieve all rows from the left table and matching rows from the right table. It helps in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Returning all records from the left table.&lt;/li&gt;
&lt;li&gt;Showing matching data from the right table.&lt;/li&gt;
&lt;li&gt;Displaying NULL values where no match exists in the right table.&lt;/li&gt;
&lt;li&gt;Performing outer joins, also known as LEFT OUTER JOIN.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Syntax:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table
LEFT JOIN right_table
ON left_table.id = right_table.id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez41dby5s8essarrtq8k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez41dby5s8essarrtq8k.png" alt="Left Join" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  RIGHT JOIN
&lt;/h3&gt;

&lt;p&gt;RIGHT JOIN is used to retrieve all rows from the right table and the matching rows from the left table. It helps in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Returning all records from the right-side table.&lt;/li&gt;
&lt;li&gt;Showing matching data from the left-side table.&lt;/li&gt;
&lt;li&gt;Displaying NULL values where no match exists in the left table.&lt;/li&gt;
&lt;li&gt;Performing outer joins, also known as RIGHT OUTER JOIN.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Syntax:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table 
RIGHT JOIN right_table
ON left_table.id = right_table.id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkya3bsmfp6lp4mj6pnvk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkya3bsmfp6lp4mj6pnvk.png" alt="Right Join" width="800" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  FULL JOIN
&lt;/h3&gt;

&lt;p&gt;FULL JOIN is used to combine the results of both LEFT JOIN and RIGHT JOIN. It helps in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Returning all rows from both tables.&lt;/li&gt;
&lt;li&gt;Showing matching records from each table.&lt;/li&gt;
&lt;li&gt;Displaying NULL values where no match exists in either table.&lt;/li&gt;
&lt;li&gt;Providing complete data from both sides of the join.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Syntax:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table 
FULL JOIN right_table
ON left_table.id = right_table.id; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0k8dorr2pydcdwn4djb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0k8dorr2pydcdwn4djb.png" alt="Full Join" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Insights
&lt;/h3&gt;

&lt;p&gt;SQL joins are fundamental for relational data modeling, enabling the combination of rows from multiple tables based on defined relationships, typically via primary and foreign keys. &lt;/p&gt;

&lt;p&gt;Proper join selection directly affects result cardinality, null propagation, and business logic interpretation. Performance considerations include indexing join columns, minimizing unnecessary joins and understanding join order in execution plans. &lt;/p&gt;

&lt;p&gt;Key takeaways are that joins operationalize relational integrity, drive multi-table analytics and must be designed carefully to avoid duplication, unintended filtering or performance degradation especially in high-volume transactional or analytical databases.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL Window Functions
&lt;/h2&gt;

&lt;p&gt;A window function in SQL is a type of function that performs a calculation across a specific set of rows (the 'window' in question), defined by an &lt;code&gt;OVER()&lt;/code&gt; clause.&lt;/p&gt;

&lt;p&gt;Window functions use values from one or multiple rows to return a value for each row, which makes them different from traditional aggregate functions, which return a single value for multiple rows.&lt;/p&gt;

&lt;p&gt;Like an aggregate function used with &lt;code&gt;GROUP BY&lt;/code&gt;, a window function performs calculations across multiple rows. Unlike aggregate functions, however, a window function does not collapse those rows into a single row.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funh439xdkbv5ut6xcww5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funh439xdkbv5ut6xcww5.png" alt="Window Functions" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Key components of SQL window functions
&lt;/h3&gt;

&lt;p&gt;The syntax for window functions is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column_1, column_2, column_3, function()
OVER (PARTITION BY partition_expression ORDER BY order_expression) AS output_column_name
FROM table_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this syntax:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;SELECT&lt;/code&gt; clause defines the columns you want to select from the &lt;code&gt;table_name&lt;/code&gt; table.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;function()&lt;/code&gt; is the window function you want to use.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;OVER&lt;/code&gt; clause defines the partitioning and ordering of rows in the window.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;PARTITION BY&lt;/code&gt; clause divides rows into partitions based on the specified &lt;code&gt;partition_expression&lt;/code&gt;; if not specified, the result set will be treated as a single partition.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;ORDER BY&lt;/code&gt; clause uses the specified &lt;code&gt;order_expression&lt;/code&gt; to define the order in which rows will be processed within each partition; if not specified, rows will be processed in an undefined order.&lt;/li&gt;
&lt;li&gt;Finally, &lt;code&gt;output_column_name&lt;/code&gt; is the name of your output column.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the key SQL window function components. One more thing worth mentioning is that window functions are evaluated after the &lt;code&gt;WHERE&lt;/code&gt;, &lt;code&gt;GROUP BY&lt;/code&gt;, and &lt;code&gt;HAVING&lt;/code&gt; clauses. This means a window function only sees the rows that survive those filters, and its output can be referenced in the &lt;code&gt;SELECT&lt;/code&gt; list and the &lt;code&gt;ORDER BY&lt;/code&gt; clause; to filter on a window function's result, wrap the query in a subquery or CTE.&lt;/p&gt;

&lt;h3&gt;
  
  
  The OVER() clause
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;OVER()&lt;/code&gt; clause in SQL is essentially the core of window functions. It determines the partitioning and ordering of a rowset before the associated window function is applied. &lt;br&gt;
The &lt;code&gt;OVER()&lt;/code&gt; clause can be applied with functions to compute aggregated values such as moving averages, running totals, cumulative aggregates, or top N per group results.&lt;/p&gt;
&lt;h3&gt;
  
  
  The PARTITION BY clause
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;PARTITION BY&lt;/code&gt; clause is used to partition the rows of a table into groups. This comes in handy when dealing with large datasets that need to be split into smaller parts, which are easier to manage. &lt;br&gt;
&lt;code&gt;PARTITION BY&lt;/code&gt; is always used inside the &lt;code&gt;OVER()&lt;/code&gt; clause; if it is omitted, the entire table is treated as a single partition.&lt;/p&gt;
&lt;h3&gt;
  
  
  The ORDER BY clause
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;ORDER BY&lt;/code&gt; determines the order of rows within a partition; if it is omitted, the order is undefined. &lt;br&gt;
For instance, when it comes to ranking functions, &lt;code&gt;ORDER BY&lt;/code&gt; specifies the order in which ranks are assigned to rows.&lt;/p&gt;
&lt;h3&gt;
  
  
  Frame Specification
&lt;/h3&gt;

&lt;p&gt;In the same &lt;code&gt;OVER()&lt;/code&gt; clause, you can specify the upper and lower bounds of a window frame using one of the two subclauses, &lt;code&gt;ROWS&lt;/code&gt; or &lt;code&gt;RANGE&lt;/code&gt;. The basic syntax for both of these subclauses is essentially the same:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ROWS BETWEEN lower_bound AND upper_bound&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;RANGE BETWEEN lower_bound AND upper_bound&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;And in some cases, they might even return the same result. However, there's an important difference.&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;ROWS&lt;/code&gt; subclause, the frame is defined by beginning and ending row positions. Offsets are differences in row numbers from the current row number.&lt;/p&gt;

&lt;p&gt;As opposed to that, in the &lt;code&gt;RANGE&lt;/code&gt; subclause, the frame is defined by a value range. Offsets are differences in row values from the current row value.&lt;/p&gt;
&lt;h2&gt;
  
  
  Types of SQL Window Functions
&lt;/h2&gt;

&lt;p&gt;Window functions in SQL Server are divided into three main types: aggregate, ranking, and value functions. Let's have a brief overview of each.&lt;/p&gt;
&lt;h3&gt;
  
  
  Aggregate Window Functions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AVG()&lt;/code&gt;: returns the average of the values in a group, ignoring null values.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MAX()&lt;/code&gt;: returns the maximum value in the expression.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MIN()&lt;/code&gt;: returns the minimum value in the expression.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SUM()&lt;/code&gt;: returns the sum of all the values, or only the DISTINCT values, in the expression.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;COUNT()&lt;/code&gt;: returns the number of items found in a group.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;STDEV()&lt;/code&gt;: returns the statistical standard deviation of all values in the specified expression.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;STDEVP()&lt;/code&gt;: returns the statistical standard deviation for the population for all values in the specified expression.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;VAR()&lt;/code&gt;: returns the statistical variance of all values in the specified expression; it may be followed by the OVER clause.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;VARP()&lt;/code&gt;: returns the statistical variance for the population for all values in the specified expression.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sample query:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT name, salary,
  SUM(salary) OVER (PARTITION BY dept) AS dept_total,
  AVG(salary) OVER (PARTITION BY dept) AS dept_avg
FROM employees;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
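&lt;p&gt;The sample query above can be run end-to-end against a small SQLite database from Python (the &lt;code&gt;employees&lt;/code&gt; rows are made up for illustration). Note that each row keeps its own salary while also carrying its department's total and average, with no &lt;code&gt;GROUP BY&lt;/code&gt; collapse:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ada',  'Eng',   100),
        ('Ben',  'Eng',    80),
        ('Cara', 'Sales',  60);
""")

# Aggregate window functions: one output row per input row,
# each annotated with its partition's SUM and AVG.
rows = conn.execute("""
    SELECT name, salary,
           SUM(salary) OVER (PARTITION BY dept) AS dept_total,
           AVG(salary) OVER (PARTITION BY dept) AS dept_avg
    FROM employees ORDER BY name;
""").fetchall()
print(rows)
# [('Ada', 100, 180, 90.0), ('Ben', 80, 180, 90.0), ('Cara', 60, 60, 60.0)]
```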



&lt;h3&gt;
  
  
  Ranking Window Functions
&lt;/h3&gt;

&lt;p&gt;Used to assign rank or position within partitions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ROW_NUMBER()&lt;/code&gt;: assigns a unique sequential integer to rows within a partition of a result set.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RANK()&lt;/code&gt;: assigns a unique rank to each row within a partition with gaps in the ranking sequence when there are ties.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DENSE_RANK()&lt;/code&gt;: assigns a unique rank to each row within a partition without gaps in the ranking sequence when there are ties.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PERCENT_RANK()&lt;/code&gt;: calculates the relative rank of a row within a group of rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;NTILE()&lt;/code&gt;: distributes rows in an ordered partition into a specified number of approximately equal groups.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sample query:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT name, salary,
  RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
FROM employees;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
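&lt;p&gt;A runnable variant of the sample query, with a deliberate salary tie in made-up data to show how &lt;code&gt;RANK()&lt;/code&gt; and &lt;code&gt;DENSE_RANK()&lt;/code&gt; diverge (the named &lt;code&gt;WINDOW&lt;/code&gt; clause is standard SQL, supported by SQLite 3.25+ and PostgreSQL):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ada',  'Eng', 100),
        ('Ben',  'Eng', 100),
        ('Cara', 'Eng',  80);
""")

# The tie on 100 shows the difference: RANK leaves a gap (1, 1, 3)
# while DENSE_RANK does not (1, 1, 2).
rows = conn.execute("""
    SELECT name,
           RANK()       OVER w AS dept_rank,
           DENSE_RANK() OVER w AS dept_dense_rank
    FROM employees
    WINDOW w AS (PARTITION BY dept ORDER BY salary DESC)
    ORDER BY name;
""").fetchall()
print(rows)  # [('Ada', 1, 1), ('Ben', 1, 1), ('Cara', 3, 2)]
```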



&lt;h3&gt;
  
  
  Offset (Value) Window Functions
&lt;/h3&gt;

&lt;p&gt;Used to access data from other rows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;LAG()&lt;/code&gt;: retrieves values from rows that precede the current row in the result set.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LEAD()&lt;/code&gt;: retrieves values from rows that follow the current row in the result set.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FIRST_VALUE()&lt;/code&gt;: returns the first value in an ordered set of values within a partition.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LAST_VALUE()&lt;/code&gt;: returns the last value in an ordered set of values within a partition.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;NTH_VALUE()&lt;/code&gt;: returns the value of the nth row in the ordered set of values.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CUME_DIST()&lt;/code&gt;: returns the cumulative distribution of a value in a group of values.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sample Query:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT date, revenue,
  LAG(revenue, 1) OVER (ORDER BY date) AS prev_month,
  revenue - LAG(revenue, 1) OVER (ORDER BY date) AS change
FROM monthly_sales;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
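&lt;p&gt;The month-over-month query above can be exercised with a few made-up rows in SQLite. The first month has no predecessor, so &lt;code&gt;LAG()&lt;/code&gt; returns NULL and the subtraction propagates it:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (date TEXT, revenue INTEGER);
    INSERT INTO monthly_sales VALUES
        ('2025-01', 100), ('2025-02', 120), ('2025-03', 90);
""")

# LAG pulls the previous month's revenue onto the current row;
# the first row has no predecessor, so both derived columns are NULL.
rows = conn.execute("""
    SELECT date, revenue,
           LAG(revenue, 1) OVER (ORDER BY date) AS prev_month,
           revenue - LAG(revenue, 1) OVER (ORDER BY date) AS change
    FROM monthly_sales ORDER BY date;
""").fetchall()
print(rows)
# [('2025-01', 100, None, None), ('2025-02', 120, 100, 20), ('2025-03', 90, 120, -30)]
```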



&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;SQL window functions provide a powerful analytical layer within standard SQL, enabling complex calculations across related rows while preserving row-level granularity. Unlike &lt;code&gt;GROUP BY&lt;/code&gt;, they do not collapse result sets, which makes them ideal for scenarios requiring both detail and aggregate insight in the same query. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;OVER()&lt;/code&gt; clause is central, with &lt;code&gt;PARTITION BY&lt;/code&gt; defining logical groups, &lt;code&gt;ORDER BY&lt;/code&gt; controlling calculation sequence, and optional frame specifications (&lt;code&gt;ROWS&lt;/code&gt; or &lt;code&gt;RANGE&lt;/code&gt;) refining scope. &lt;/p&gt;

&lt;p&gt;Key functional categories include aggregate window functions for running totals and moving averages, ranking functions such as &lt;code&gt;ROW_NUMBER()&lt;/code&gt; and &lt;code&gt;RANK()&lt;/code&gt; for ordered comparisons and offset functions like &lt;code&gt;LAG()&lt;/code&gt; and &lt;code&gt;LEAD()&lt;/code&gt; for time-series or sequential analysis. &lt;/p&gt;

&lt;p&gt;When used correctly, window functions significantly reduce query complexity, eliminate the need for self-joins in many analytical patterns and improve expressiveness in reporting and business intelligence workloads.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>datascience</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Turning Data into Insight: An Analyst’s Guide to Power BI</title>
      <dc:creator>Edmund Eryuba</dc:creator>
      <pubDate>Sun, 08 Feb 2026 18:44:08 +0000</pubDate>
      <link>https://dev.to/edmund_eryuba/turning-data-into-insight-an-analysts-guide-to-power-bi-3lc1</link>
      <guid>https://dev.to/edmund_eryuba/turning-data-into-insight-an-analysts-guide-to-power-bi-3lc1</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: The reality of messy business data
&lt;/h2&gt;

&lt;p&gt;In most organizations, data rarely arrives in a clean, analysis-ready format. Analysts typically receive information from multiple sources: spreadsheets maintained by business teams, exports from transactional systems, cloud applications, and enterprise platforms such as ERPs or CRMs. These datasets often contain inconsistent formats, missing values, duplicate records, and unclear naming conventions.&lt;/p&gt;

&lt;p&gt;Working directly with such data leads to unreliable metrics, incorrect aggregations and ultimately poor business decisions. This is where Power BI plays a critical role. Power BI is not just a visualization tool, it is an analytical platform that allows analysts to clean, model, and interpret data before presenting it in a form that decision-makers can trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  From raw data to business action: The analyst workflow
&lt;/h2&gt;

&lt;p&gt;A typical analytical workflow in Power BI follows a logical sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Load&lt;/strong&gt; raw data from multiple sources, e.g., Excel files, databases or online services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean&lt;/strong&gt; and &lt;strong&gt;transform&lt;/strong&gt; the data using Power Query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model&lt;/strong&gt; the data into a meaningful structure.&lt;/li&gt;
&lt;li&gt;Create business &lt;strong&gt;logic using DAX.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Design &lt;strong&gt;dashboards&lt;/strong&gt; that communicate insight.&lt;/li&gt;
&lt;li&gt;Enable &lt;strong&gt;decisions&lt;/strong&gt; and &lt;strong&gt;actions&lt;/strong&gt; by stakeholders.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each step builds on the previous one. If any stage is poorly executed, the final insight becomes misleading, regardless of how attractive the dashboard looks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cleaning and transforming data with Power Query
&lt;/h2&gt;

&lt;p&gt;Data cleaning is the foundation of all reliable analytics. Common data quality issues include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Columns stored in the wrong data type.&lt;/li&gt;
&lt;li&gt;Missing or null values.&lt;/li&gt;
&lt;li&gt;Duplicate customer or transaction records.&lt;/li&gt;
&lt;li&gt;Inconsistent naming and coding systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These issues directly affect calculations. For example, a null freight value treated as blank instead of zero will distort average shipping costs. Duplicate customer records inflate revenue totals. Incorrect data types prevent time-based analysis entirely.&lt;/p&gt;

&lt;p&gt;Power Query provides a transformation layer where analysts can reshape data without altering the original source. This ensures reproducibility and auditability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Transformation Principles
&lt;/h2&gt;

&lt;p&gt;There are several key principles that should guide an analyst in their approach to data transformation: &lt;/p&gt;

&lt;h3&gt;
  
  
  1. Remove what is not needed
&lt;/h3&gt;

&lt;p&gt;Unnecessary columns increase model size, memory usage, and cognitive complexity. Every column should justify its existence in a business question.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Standardize naming
&lt;/h3&gt;

&lt;p&gt;Column and table names should reflect business language, not system codes.&lt;br&gt;
For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Cust_ID → Customer ID&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;vSalesTbl → Sales&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This improves both usability and long-term maintainability.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Handle missing and invalid values
&lt;/h3&gt;

&lt;p&gt;Nulls, errors, and placeholders must be explicitly addressed. Analysts must decide whether missing values represent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero&lt;/li&gt;
&lt;li&gt;Unknown&lt;/li&gt;
&lt;li&gt;Not applicable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each choice has analytical consequences.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Remove duplicates strategically
&lt;/h3&gt;

&lt;p&gt;Duplicates should be removed only when they represent the same real-world entity. Otherwise, analysts risk deleting legitimate records.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building meaningful data models
&lt;/h2&gt;

&lt;p&gt;Most analytical errors in Power BI do not come from DAX formulas or charts. They come from poor data models.&lt;/p&gt;

&lt;p&gt;A strong model reflects how the business actually operates. This typically follows a star schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fact tables&lt;/strong&gt;: transactions (Sales, Orders, Payments)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dimension tables&lt;/strong&gt;: descriptive attributes (Date, Product, Customer, Region)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This structure ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correct aggregations.&lt;/li&gt;
&lt;li&gt;Predictable filter behavior.&lt;/li&gt;
&lt;li&gt;High performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without proper modeling, even simple metrics like “Total Sales by Region” can produce incorrect results due to ambiguous relationships or double counting.&lt;/p&gt;
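&lt;p&gt;The star-schema idea can be sketched in plain SQL (the tables and amounts below are made up; SQLite stands in for the model's storage engine). Because each sale row points at exactly one customer row, aggregating the fact table through the dimension counts every sale exactly once:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One fact table (transactions) and one dimension table (attributes).
    CREATE TABLE sales    (sale_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customer (customer_id INTEGER, region TEXT);
    INSERT INTO customer VALUES (1, 'North'), (2, 'South');
    INSERT INTO sales    VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0);
""")

# "Total Sales by Region": filter/group on the dimension,
# aggregate the fact -- no double counting.
rows = conn.execute("""
    SELECT c.region, SUM(s.amount) AS total_sales
    FROM sales s
    JOIN customer c ON s.customer_id = c.customer_id
    GROUP BY c.region ORDER BY c.region;
""").fetchall()
print(rows)  # [('North', 150.0), ('South', 70.0)]
```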

&lt;h2&gt;
  
  
  Creating business logic with DAX
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DAX&lt;/strong&gt; (Data Analysis Expressions)  is a library of functions and operators that can be combined to build formulas and expressions in Power BI, Analysis Services, and Power Pivot in Excel data models. It enables dynamic, context-aware analysis that goes beyond traditional spreadsheet formulas.&lt;/p&gt;

&lt;p&gt;Examples of business logic encoded in DAX:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What counts as “Revenue”?&lt;/li&gt;
&lt;li&gt;How is “Customer Retention” defined?&lt;/li&gt;
&lt;li&gt;What is the official “Profit Margin” formula?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These definitions must be centralized and reusable. Measures become the organization’s single source of analytical truth.&lt;/p&gt;
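&lt;p&gt;As a hedged illustration (the table and column names are assumptions, not from the source), such definitions might be centralized as explicit measures:&lt;/p&gt;

```dax
-- The organization's official definitions, written once and reused.
Total Revenue = SUM ( Sales[SalesAmount] )

Total Cost = SUM ( Sales[CostAmount] )

Profit Margin =
DIVIDE ( [Total Revenue] - [Total Cost], [Total Revenue] )
```

&lt;p&gt;&lt;code&gt;DIVIDE&lt;/code&gt; is generally preferred over the &lt;code&gt;/&lt;/code&gt; operator because it handles division by zero gracefully.&lt;/p&gt;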

&lt;p&gt;DAX uses a formula syntax similar to Excel but extends it with advanced functions designed specifically for tabular data models in Power BI. It allows users to create measures, calculated columns and calculated tables to perform dynamic and context-aware calculations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Measures vs Calculated Columns
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Calculated columns&lt;/strong&gt;: a calculated column is added to an existing table in the model designer, with a DAX formula defining its values. Calculated columns are computed row by row at refresh time and stored in memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measures&lt;/strong&gt; are evaluated dynamically at query time, so their results change with the report context.&lt;/li&gt;
&lt;/ul&gt;
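&lt;p&gt;The difference is easiest to see side by side. A minimal sketch, assuming a &lt;code&gt;Sales&lt;/code&gt; table with a &lt;code&gt;UnitPrice&lt;/code&gt; column:&lt;/p&gt;

```dax
-- Calculated column: evaluated row by row at refresh, stored in memory.
Price Band = IF ( Sales[UnitPrice] >= 100, "Premium", "Standard" )

-- Measure: evaluated at query time; the result changes with filters.
Average Unit Price = AVERAGE ( Sales[UnitPrice] )
```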

&lt;h3&gt;
  
  
  Creating Measures for Advanced Calculations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Measures are a core component of DAX used for calculations on aggregated data.&lt;/li&gt;
&lt;li&gt;They are evaluated at query time, not stored in the data model.&lt;/li&gt;
&lt;li&gt;Measures respond dynamically to filters, slicers, and report context.&lt;/li&gt;
&lt;li&gt;Common aggregation functions used in measures include SUM, AVERAGE, and COUNT.&lt;/li&gt;
&lt;li&gt;DAX supports both implicit measures (created by dragging a field into a visual) and explicit measures (defined with a DAX formula).&lt;/li&gt;
&lt;li&gt;Using correct data types is essential for accurate measure calculations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most analytical metrics, measures are preferred, because they respond to filters, slicers, and user interactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding Context: The Core of Correct Analytics
&lt;/h3&gt;

&lt;p&gt;Context is one of the most important concepts in DAX because it determines how and where a formula is evaluated. It is what makes DAX calculations dynamic: the same formula can return different results depending on the row, cell, or filters applied in a report. &lt;/p&gt;

&lt;p&gt;Without understanding context, it becomes difficult to build accurate measures, optimize performance, or troubleshoot unexpected results.&lt;/p&gt;

&lt;p&gt;There are three main types of context in DAX:&lt;/p&gt;

&lt;h3&gt;
  
  
  Row Context
&lt;/h3&gt;

&lt;p&gt;Refers to the &lt;em&gt;current row&lt;/em&gt; being evaluated. It is most commonly seen in calculated columns, where the formula is applied row by row.&lt;/p&gt;
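&lt;p&gt;For example, a calculated column such as the following (column names assumed) is evaluated once per row, with each row supplying its own values:&lt;/p&gt;

```dax
-- Row context: Sales[Quantity] and Sales[UnitPrice] refer to the current row.
Line Total = Sales[Quantity] * Sales[UnitPrice]
```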

&lt;h3&gt;
  
  
  Filter Context
&lt;/h3&gt;

&lt;p&gt;It is the set of filters applied to the data. These filters can come from slicers and visuals in the report, or they can be explicitly defined inside a DAX formula.&lt;/p&gt;
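&lt;p&gt;&lt;code&gt;CALCULATE&lt;/code&gt; is the main DAX function for modifying filter context from inside a formula. A hedged sketch (the table names and the region value are assumptions):&lt;/p&gt;

```dax
-- Replaces whatever region the user selected with a fixed filter.
West Sales =
CALCULATE (
    SUM ( Sales[SalesAmount] ),
    Region[RegionName] = "West"
)
```

&lt;p&gt;In a visual, this measure returns the same region's total regardless of slicer selections on &lt;code&gt;Region&lt;/code&gt;, because the explicit filter overrides the external one.&lt;/p&gt;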

&lt;h3&gt;
  
  
  Query Context
&lt;/h3&gt;

&lt;p&gt;Query context is created by the layout of the report itself: each cell of a visual (for example, each row of a table or each bar in a chart) is evaluated in its own query context.&lt;/p&gt;

&lt;p&gt;If analysts misunderstand context, they produce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrong totals.&lt;/li&gt;
&lt;li&gt;Misleading KPIs.&lt;/li&gt;
&lt;li&gt;Inconsistent executive reports.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In summary, context is the foundation of how DAX works. It controls what data a formula can “see” and therefore directly affects the result of every calculation. Mastering row, query, and filter context is essential for building reliable, high-performing, and truly dynamic analytical models in Power BI and other tabular environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Designing dashboards that communicate insight
&lt;/h2&gt;

&lt;p&gt;Designing interactive dashboards helps businesses make data-driven decisions. A dashboard is not a collection of charts. It is a decision interface.&lt;/p&gt;

&lt;p&gt;It is essential to design professional reports that optimize layouts for different audiences and leverage Power BI’s interactive features.&lt;/p&gt;

&lt;p&gt;Good dashboards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Highlight trends and deviations.&lt;/li&gt;
&lt;li&gt;Compare performance against targets.&lt;/li&gt;
&lt;li&gt;Expose anomalies and risks.&lt;/li&gt;
&lt;li&gt;Support follow-up questions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad dashboards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Show too many metrics.&lt;/li&gt;
&lt;li&gt;Focus on visuals over meaning.&lt;/li&gt;
&lt;li&gt;Require explanation to interpret.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7n1810hjmrvo1upaz5m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm7n1810hjmrvo1upaz5m.png" alt="Sample Dashboard Data" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Turning Dashboards into Business Decisions
&lt;/h2&gt;

&lt;p&gt;This is the most important step, and the most neglected.&lt;/p&gt;

&lt;p&gt;Dashboards should answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which regions are underperforming?&lt;/li&gt;
&lt;li&gt;Which products drive the most margin?&lt;/li&gt;
&lt;li&gt;Where is customer churn increasing?&lt;/li&gt;
&lt;li&gt;What happens if we change pricing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real business actions include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reallocating marketing budgets.&lt;/li&gt;
&lt;li&gt;Optimizing inventory levels.&lt;/li&gt;
&lt;li&gt;Identifying operational bottlenecks.&lt;/li&gt;
&lt;li&gt;Redesigning sales strategies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If no decision changes because of a dashboard, the analysis has failed to capture the key business indicators.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common pitfalls that undermine analytical value
&lt;/h2&gt;

&lt;p&gt;Even experienced analysts fall into these traps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Treating Power BI as a visualization tool instead of a modeling tool.&lt;/li&gt;
&lt;li&gt;Writing complex DAX on top of poor data models.&lt;/li&gt;
&lt;li&gt;Using calculated columns instead of measures.&lt;/li&gt;
&lt;li&gt;Ignoring filter propagation and relationship direction.&lt;/li&gt;
&lt;li&gt;Optimizing visuals before validating metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These issues lead to highly polished dashboards with fundamentally wrong numbers, the most damaging outcome in analytics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Power BI provides an integrated analytical environment where data preparation, semantic modeling, calculation logic, and visualization are combined into a single workflow.&lt;/p&gt;

&lt;p&gt;The analytical value of the platform does not emerge from individual components such as Power Query, DAX, or reports in isolation, but from how these components are systematically designed and aligned with business requirements.&lt;/p&gt;

&lt;p&gt;Effective use of Power BI requires analysts to impose structure on raw data, define consistent relationships, implement reusable calculation logic through measures and ensure that visual outputs reflect correct filter and evaluation contexts. &lt;/p&gt;

&lt;p&gt;When these layers are properly engineered, Power BI supports reliable aggregation, scalable analytical models, and consistent interpretation of metrics across the organization, enabling stakeholders to base operational and strategic decisions on a shared and technically sound analytical foundation.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>datamodelling</category>
      <category>powerbi</category>
      <category>datastructures</category>
    </item>
    <item>
      <title>Data Modelling for High Performance and Accurate Analytics in Power BI</title>
      <dc:creator>Edmund Eryuba</dc:creator>
      <pubDate>Sun, 01 Feb 2026 09:16:40 +0000</pubDate>
      <link>https://dev.to/edmund_eryuba/data-modelling-for-high-performance-and-accurate-analytics-in-power-bi-5988</link>
      <guid>https://dev.to/edmund_eryuba/data-modelling-for-high-performance-and-accurate-analytics-in-power-bi-5988</guid>
      <description>&lt;p&gt;This article explores data modelling in Power BI with a focus on different schema types and explains how proper modelling enhances performance and ensures accurate reporting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Data Modelling
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Data modelling&lt;/em&gt;&lt;/strong&gt; is one of the most critical steps in building effective business intelligence (BI) solutions. In Power BI, data modelling refers to the process of structuring data into related tables, defining relationships and creating a logical framework that supports efficient querying, accurate calculations and meaningful reporting.&lt;/p&gt;

&lt;p&gt;A well-designed data model is not just about organizing tables; it directly impacts report performance, usability, scalability, and the correctness of insights derived from data. Poor data modelling leads to slow reports, incorrect aggregations, complex DAX expressions, and ultimately unreliable business decisions.&lt;/p&gt;

&lt;p&gt;In Power BI, data modelling involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identifying business entities (facts and dimensions)&lt;/li&gt;
&lt;li&gt;Structuring tables logically&lt;/li&gt;
&lt;li&gt;Defining relationships between tables&lt;/li&gt;
&lt;li&gt;Setting cardinality and filter direction&lt;/li&gt;
&lt;li&gt;Creating calculated columns and measures&lt;/li&gt;
&lt;li&gt;Ensuring data granularity and consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is a Schema in Power BI?
&lt;/h2&gt;

&lt;p&gt;In Power BI, a schema refers to the structure and organization of data within a data model. Schemas define how data is connected and related within the model, influencing the efficiency and performance of data queries and reports. Understanding schemas requires modelers to classify their model tables as either dimension or fact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fact tables
&lt;/h3&gt;

&lt;p&gt;Fact tables store quantitative, transactional data, such as sales orders, quantities sold, revenue, and profit. A fact table contains dimension key columns that relate to dimension tables, and numeric measure columns. The dimension key columns determine the dimensionality of a fact table, while the dimension key values determine the granularity of a fact table.&lt;/p&gt;

&lt;p&gt;Example of a fact table: Consider a simple sales analytics scenario in Power BI.&lt;/p&gt;

&lt;p&gt;This table stores transactional (measurable) data.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SalesID&lt;/th&gt;
&lt;th&gt;DateKey&lt;/th&gt;
&lt;th&gt;ProductKey&lt;/th&gt;
&lt;th&gt;CustomerKey&lt;/th&gt;
&lt;th&gt;Quantity&lt;/th&gt;
&lt;th&gt;SalesAmount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1001&lt;/td&gt;
&lt;td&gt;20240101&lt;/td&gt;
&lt;td&gt;501&lt;/td&gt;
&lt;td&gt;301&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1002&lt;/td&gt;
&lt;td&gt;20240101&lt;/td&gt;
&lt;td&gt;502&lt;/td&gt;
&lt;td&gt;302&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1003&lt;/td&gt;
&lt;td&gt;20240102&lt;/td&gt;
&lt;td&gt;501&lt;/td&gt;
&lt;td&gt;303&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1004&lt;/td&gt;
&lt;td&gt;20240103&lt;/td&gt;
&lt;td&gt;503&lt;/td&gt;
&lt;td&gt;301&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;120&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Contains foreign keys to dimensions.&lt;/li&gt;
&lt;li&gt;Contains numeric measures.&lt;/li&gt;
&lt;li&gt;Has many rows (high volume).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Dimension tables
&lt;/h3&gt;

&lt;p&gt;Dimension tables describe the business entities that are modelled. Entities can include products, people, places, and concepts, including time itself. A dimension table contains a key column (or columns) that acts as a unique identifier, plus descriptive columns that support filtering and grouping your data.&lt;/p&gt;

&lt;p&gt;This table provides descriptive attributes about products.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ProductKey&lt;/th&gt;
&lt;th&gt;ProductName&lt;/th&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Brand&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;501&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;Dell&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;502&lt;/td&gt;
&lt;td&gt;Headphones&lt;/td&gt;
&lt;td&gt;Accessories&lt;/td&gt;
&lt;td&gt;Sony&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;503&lt;/td&gt;
&lt;td&gt;Mouse&lt;/td&gt;
&lt;td&gt;Accessories&lt;/td&gt;
&lt;td&gt;Logitech&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Types of Schemas in Power BI:&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Star Schema
&lt;/h2&gt;

&lt;p&gt;The star schema consists of a central fact table connected directly to multiple dimension tables, much like the appearance of a star.&lt;br&gt;
The central fact table contains quantitative data (e.g., sales), while the dimension tables hold descriptive attributes related to the facts (e.g. Employee, Date, Territory). Dimension tables are not connected to each other.&lt;/p&gt;

&lt;p&gt;Star schemas are ideal for straightforward reporting and querying. They are efficient for read-heavy operations, making them suitable for dashboards and summary reports.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgc7rgra129a65q3exmao.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgc7rgra129a65q3exmao.png" alt="Power BI Star Schema" width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Snowflake Schema
&lt;/h2&gt;

&lt;p&gt;The snowflake schema is a normalized version of the star schema. In this design, dimension tables are further divided into related tables, resulting in a more complex structure.&lt;br&gt;
The normalization process eliminates redundancy by splitting dimension tables into multiple related tables. This results in a web-like structure, resembling a snowflake.&lt;/p&gt;

&lt;p&gt;Snowflake schemas are used in scenarios requiring detailed data models and efficient storage. They are beneficial when dealing with large datasets where data redundancy needs to be minimized.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cubxjxf50x74bwpfflg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cubxjxf50x74bwpfflg.png" alt="Snowflake Schema" width="520" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Galaxy Schema (Fact Constellation Schema)
&lt;/h2&gt;

&lt;p&gt;The galaxy schema involves multiple fact tables that share dimension tables, creating a complex, interconnected data model.&lt;/p&gt;

&lt;p&gt;This schema consists of multiple fact tables linked to shared dimension tables, enabling the analysis of different business processes within a single model.&lt;/p&gt;

&lt;p&gt;Galaxy schemas are suitable for large-scale enterprise environments where multiple related business processes need to be analysed. They support complex queries and detailed reporting across various domains.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxna99qc2m9ca576v23q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxna99qc2m9ca576v23q.png" alt="Galaxies Schema" width="617" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing schemas in Power BI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  a. Creating a Star Schema
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Set Up Fact and Dimension Tables&lt;/em&gt;: Identify and create the central fact table and surrounding dimension tables.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Link Tables&lt;/em&gt;: Establish relationships between the fact table and dimension tables using foreign keys.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Optimize for Performance&lt;/em&gt;: Index key columns and use efficient data types to enhance query performance.&lt;/li&gt;
&lt;/ol&gt;
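&lt;p&gt;Once the relationships are in place, measures stay simple because the schema does the filtering. A minimal DAX sketch (the &lt;code&gt;FactSales&lt;/code&gt; table name is an assumption):&lt;/p&gt;

```dax
-- With a star schema, one measure serves every dimension:
-- slicing by Product, Customer, or Date needs no extra logic.
Total Sales Amount = SUM ( FactSales[SalesAmount] )
```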

&lt;h3&gt;
  
  
  b. Implementing a Snowflake Schema
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Normalize Dimension Tables&lt;/em&gt;: Split dimension tables into related sub-tables to reduce redundancy.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Create Relationships&lt;/em&gt;: Define relationships between sub-tables and the main dimension tables, ensuring referential integrity.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Optimize Storage&lt;/em&gt;: Use appropriate storage and indexing strategies to manage complex joins efficiently.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  c. Setting Up a Galaxy Schema
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;Identify Fact Tables&lt;/em&gt;: Determine the various fact tables needed for different business processes.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Share Dimension Tables&lt;/em&gt;: Create shared dimension tables to link multiple fact tables.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Ensure Efficient Querying&lt;/em&gt;: Design the schema to support complex queries and optimize performance through indexing and data partitioning.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Role of data modelling in performance and reliable reporting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  a. Query and Engine Performance
&lt;/h3&gt;

&lt;p&gt;Query and engine performance refers to how efficiently Power BI processes data when users interact with reports. Power BI’s VertiPaq engine performs best when data is organised using a star schema, where fact tables store numerical data and dimension tables store descriptive attributes. &lt;/p&gt;

&lt;p&gt;This structure improves compression, reduces the number of joins required during query execution, and simplifies DAX calculations. As a result, reports load faster, visuals respond more quickly, and overall system performance improves.&lt;/p&gt;

&lt;h3&gt;
  
  
  b. Memory and Scalability
&lt;/h3&gt;

&lt;p&gt;Memory and scalability describe the ability of a Power BI model to handle large and growing datasets. Proper data modelling controls dataset size by reducing column cardinality and removing unnecessary fields. Low-cardinality columns compress efficiently, while column pruning helps minimise memory usage and refresh time. &lt;/p&gt;

&lt;p&gt;By structuring data into lean fact and dimension tables rather than wide flat tables, Power BI models become more scalable and capable of supporting high data volumes without performance degradation.&lt;/p&gt;

&lt;h3&gt;
  
  
  c. Correct Aggregations and Metrics
&lt;/h3&gt;

&lt;p&gt;Correct aggregations ensure that reported values accurately reflect business operations. This depends on defining clear data granularity and using proper relationship structures. Each fact table must represent a consistent level of detail, and filters should flow from dimensions to facts. &lt;/p&gt;

&lt;p&gt;Poor modelling can result in double counting, ambiguous totals, or misleading KPIs. A well-designed model prevents these issues by enforcing one-to-many relationships and maintaining logical data structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  d. Filter Propagation and User Trust
&lt;/h3&gt;

&lt;p&gt;Filter propagation determines how slicers and filters affect report visuals. In a properly modelled system, filters behave predictably and consistently across all visuals, allowing users to explore data intuitively. &lt;/p&gt;

&lt;p&gt;When modelling is poor, filters may behave inconsistently, leading to confusing or contradictory results. Reliable filter behavior builds user trust and ensures that insights derived from reports are credible and easy to interpret.&lt;/p&gt;

&lt;h3&gt;
  
  
  e. Maintainability and Governance
&lt;/h3&gt;

&lt;p&gt;Maintainability refers to how easy the model is to manage and extend over time. A strong data model supports reusable measures, consistent dimensions, and standard business definitions across reports. This creates a single source of truth for the organisation, reduces duplication of logic, and simplifies governance. &lt;/p&gt;

&lt;p&gt;As a result, the reporting environment becomes easier to maintain, more consistent, and more reliable for long-term decision-making.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Understanding different schemas in Power BI is crucial for designing efficient data models. Each schema has unique advantages: the star schema is ideal for straightforward reporting and querying, offering simplicity and ease of use; the snowflake schema provides detailed, normalized structures, reducing redundancy and optimizing storage; and the galaxy schema supports complex, large-scale data models with multiple fact tables sharing dimension tables. &lt;/p&gt;

&lt;p&gt;Choosing the right schema improves query performance, data storage efficiency, and data refresh operations. By mastering these schemas, you can create robust and scalable data models that support comprehensive analysis and insightful reporting, enabling your organization to make data-driven decisions effectively. Experiment with different designs based on your data needs and continue refining your skills to become a Power BI expert.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>powerfuldevs</category>
      <category>datastructures</category>
      <category>powerbi</category>
    </item>
    <item>
      <title>Linux for Data Engineers: From Terminal to Text Editing</title>
      <dc:creator>Edmund Eryuba</dc:creator>
      <pubDate>Sun, 25 Jan 2026 16:13:21 +0000</pubDate>
      <link>https://dev.to/edmund_eryuba/introduction-to-linux-for-data-engineers-299o</link>
      <guid>https://dev.to/edmund_eryuba/introduction-to-linux-for-data-engineers-299o</guid>
      <description>&lt;p&gt;Linux is an &lt;strong&gt;open-source&lt;/strong&gt; operating system that is based on the Unix operating system. It was created by Linus Torvalds in 1991.&lt;br&gt;
Open-source means that the source code of the operating system is available to the public. This allows anyone to modify the original code, customize it, and distribute the new operating system to potential users.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why should you learn about Linux?
&lt;/h2&gt;

&lt;p&gt;In today's data center landscape, Linux and Microsoft Windows stand out as the primary contenders, with Linux having a major share.&lt;/p&gt;

&lt;p&gt;Here are several compelling reasons to learn Linux:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Given the prevalence of Linux &lt;strong&gt;hosting&lt;/strong&gt;, there is a high chance that your application will be hosted on Linux. So, learning Linux as a data engineer or developer becomes increasingly valuable.&lt;/li&gt;
&lt;li&gt;With &lt;strong&gt;cloud computing&lt;/strong&gt; becoming the norm, chances are high that your cloud instances will rely on Linux.&lt;/li&gt;
&lt;li&gt;Linux serves as the &lt;strong&gt;foundation&lt;/strong&gt; for many operating systems for the Internet of Things (IoT) and mobile applications.&lt;/li&gt;
&lt;li&gt;Linux is built for &lt;strong&gt;automation&lt;/strong&gt;, which is central to data engineering. Linux enables repeatability, fault tolerance and observability of the entire workflow.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  What is a Linux Kernel?
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;kernel&lt;/strong&gt; is the central component of an operating system that manages the computer and its hardware operations. It handles memory operations and CPU time.&lt;/p&gt;

&lt;p&gt;The kernel acts as a bridge between applications and the hardware-level data processing using inter-process communication and system calls.&lt;br&gt;
The kernel loads into memory first when an operating system starts and remains there until the system shuts down. It is responsible for tasks like disk management, task management, and memory management.&lt;/p&gt;
&lt;h2&gt;
  
  
  What is a Linux distribution?
&lt;/h2&gt;

&lt;p&gt;The Linux kernel is reused and configured differently across distributions. You can further combine different utilities and software to create a completely new operating system.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Linux distribution&lt;/strong&gt; or &lt;strong&gt;distro&lt;/strong&gt; is a version of the Linux operating system that includes the Linux kernel, system utilities, and other software. Being open source, a Linux distribution is a collaborative effort involving multiple independent open-source development communities.&lt;/p&gt;

&lt;p&gt;Today, there are thousands of Linux distributions to choose from, each with its own goals and its own criteria for selecting and supporting the software it includes.&lt;/p&gt;

&lt;p&gt;Distributions vary from one to the other, but they generally have several common characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A distribution consists of a Linux kernel.&lt;/li&gt;
&lt;li&gt;It supports user space programs.&lt;/li&gt;
&lt;li&gt;A distribution may be small and single-purpose or include thousands of open-source programs.&lt;/li&gt;
&lt;li&gt;Some means of installing and updating the distribution and its components should be provided.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some popular Linux distributions are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://ubuntu.com/" rel="noopener noreferrer"&gt;Ubuntu&lt;/a&gt;: One of the most widely used and popular Linux distributions. It is user-friendly and recommended for beginners.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://linuxmint.com/" rel="noopener noreferrer"&gt;Linux Mint&lt;/a&gt;: Based on Ubuntu, Linux Mint provides a user-friendly experience with a focus on multimedia support.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.archlinux.org/" rel="noopener noreferrer"&gt;Arch Linux&lt;/a&gt;: Popular among experienced users, Arch is a lightweight and flexible distribution aimed at users who prefer a DIY approach.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://manjaro.org/" rel="noopener noreferrer"&gt;Manjaro&lt;/a&gt;: Based on Arch Linux, Manjaro provides a user-friendly experience with pre-installed software and easy system management tools.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kali.org/" rel="noopener noreferrer"&gt;Kali Linux&lt;/a&gt;: Kali Linux provides a comprehensive suite of security tools and is mostly focused on cybersecurity and hacking.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  How to install and access Linux
&lt;/h2&gt;

&lt;p&gt;There are several ways to install and access Linux, including on a Windows machine. This section explores these methods in detail.&lt;/p&gt;
&lt;h3&gt;
  
  
  Install Linux as the primary OS
&lt;/h3&gt;

&lt;p&gt;Installing Linux as the primary OS is the most efficient way to use Linux, as you get the full power of your machine.&lt;br&gt;
We'll focus on installing Ubuntu, one of the most popular Linux distributions. Numerous other distributions, suited to specific applications, can be explored based on your preference. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1&lt;/strong&gt; – Download the &lt;a href="https://ubuntu.com/download" rel="noopener noreferrer"&gt;Ubuntu&lt;/a&gt; iso file. Make sure to select a stable release that is labelled "LTS". LTS stands for Long Term Support which means you can get free security and maintenance updates for a long time (usually 5 years).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 2&lt;/strong&gt; – Create a bootable pen drive: several free tools, such as Rufus or balenaEtcher, can write the ISO image to a pen drive. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 3&lt;/strong&gt; – Boot from the pen drive: Once your bootable pen drive is ready, insert it and boot from the pen drive. The boot menu depends on your laptop. You can google the boot menu for your laptop model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 4&lt;/strong&gt; – Follow the prompts: once the boot process starts, select "Try or Install Ubuntu".
The process will take some time. Once the GUI appears, select your language and keyboard layout and continue. Enter your name and login credentials, and remember them; you will need them to log in to your system and access full privileges. Wait for the installation to complete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 5&lt;/strong&gt; – Restart: Click on restart now and remove the pen drive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 6&lt;/strong&gt; – Login: Login with the credentials you entered earlier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And there you go! Now you can install apps and customize your desktop.&lt;/p&gt;
&lt;h3&gt;
  
  
  Accessing the terminal
&lt;/h3&gt;

&lt;p&gt;An important part is learning about the &lt;strong&gt;terminal&lt;/strong&gt;, where you'll run all the commands and see the magic happen. You can search for the terminal by pressing the "Windows" (Super) key and typing "terminal". &lt;br&gt;
The shortcut for opening the terminal is &lt;code&gt;ctrl + alt + t&lt;/code&gt;.&lt;/p&gt;
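&lt;p&gt;Once the terminal is open, a few basic commands are enough to start exploring (the &lt;code&gt;demo_dir&lt;/code&gt; folder below is just a practice example):&lt;/p&gt;

```shell
pwd                  # print the current working directory
ls -lh               # list files with human-readable sizes
mkdir -p demo_dir    # create a practice directory (no error if it exists)
cd demo_dir          # move into it
pwd                  # confirm the path now ends in /demo_dir
```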

&lt;p&gt;You can also open the terminal from inside a folder. Right click where you are and click on "Open in Terminal". This will open the terminal in the same path.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to use Linux on a Windows machine
&lt;/h2&gt;

&lt;p&gt;Sometimes you might need to run both Linux and Windows side by side. Luckily, there are some ways you can get the best of both worlds without getting different computers for each operating system.&lt;br&gt;
This section explores a few ways to use Linux on a Windows machine. &lt;/p&gt;
&lt;h3&gt;
  
  
  Option 1: "Dual-boot" Linux + Windows
&lt;/h3&gt;

&lt;p&gt;With dual boot, you can install Linux alongside Windows on your computer, allowing you to choose which operating system to use at startup.&lt;/p&gt;

&lt;p&gt;This requires partitioning your hard drive and installing Linux on a separate partition. With this approach, you can only use one operating system at a time.&lt;/p&gt;
&lt;h3&gt;
  
  
  Option 2: Use Windows Subsystem for Linux (WSL)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Windows Subsystem for Linux&lt;/strong&gt; provides a compatibility layer that lets you run Linux binary executables natively on Windows.&lt;/p&gt;

&lt;p&gt;Using WSL has some advantages. The setup for WSL is simple and not time-consuming. It is lightweight compared to virtual machines (VMs) where you have to allocate resources from the host machine. You don't need to install any ISO or virtual disc image for Linux machines which tend to be heavy files. You can use Windows and Linux side by side.&lt;/p&gt;
&lt;h3&gt;
  
  
  How to install WSL2
&lt;/h3&gt;

&lt;p&gt;First, enable the Windows Subsystem for Linux option in settings.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to Start. Search for "Turn Windows features on or off."&lt;/li&gt;
&lt;li&gt;Check the option "Windows Subsystem for Linux" if it isn't already.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9207kkwiwapvxitirtmo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9207kkwiwapvxitirtmo.png" alt="Windows features" width="549" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next, open Command Prompt as an administrator.&lt;/li&gt;
&lt;li&gt;Run the command below:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wsl –install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: By default, Ubuntu will be installed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Once installation is complete, restart your Windows machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the Ubuntu installation finishes, you'll be prompted to create a username and password.&lt;br&gt;
And that's it! You are ready to use Ubuntu.&lt;br&gt;
Launch Ubuntu by searching for it in the Start menu.&lt;/p&gt;
&lt;h3&gt;
  
  
  Option 3: Use a Virtual Machine (VM)
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;virtual machine&lt;/strong&gt; (VM) is a software emulation of a physical computer system. It allows you to run multiple operating systems and applications on a single physical machine simultaneously.&lt;/p&gt;

&lt;p&gt;You can use virtualization software such as &lt;strong&gt;Oracle VirtualBox&lt;/strong&gt; or &lt;strong&gt;VMware&lt;/strong&gt; to create a virtual machine running Linux within your Windows environment. This allows you to run Linux as a guest operating system alongside Windows.&lt;/p&gt;

&lt;p&gt;VM software provides options to allocate and manage hardware resources for each VM, including CPU cores, memory, disk space, and network bandwidth. You can adjust these allocations based on the requirements of the guest operating systems and applications.&lt;/p&gt;
&lt;h3&gt;
  
  
  Option 4: Use a Browser-based Solution
&lt;/h3&gt;

&lt;p&gt;Browser-based solutions are particularly useful for quick testing, learning, or accessing Linux environments from devices that don't have Linux installed.&lt;br&gt;
You can either use online code editors or web-based terminals to access Linux. Note that you usually don't have full administration privileges in these cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Online code editors:&lt;/strong&gt; These provide a code editor with a built-in Linux terminal. While their primary purpose is coding, you can also use the terminal to execute commands and perform tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://replit.com/" rel="noopener noreferrer"&gt;Replit&lt;/a&gt; is an example of an online code editor, where you can write your code and access the Linux shell at the same time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Web-based Linux terminals:&lt;/strong&gt; Online Linux terminals allow you to access a Linux command-line interface directly from your browser. These terminals provide a web-based interface to a Linux shell, enabling you to execute commands and work with Linux utilities.&lt;br&gt;
One such example is &lt;a href="https://jslinux.org/" rel="noopener noreferrer"&gt;JSLinux&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Option 5: Use a Cloud-based Solution
&lt;/h3&gt;

&lt;p&gt;Instead of running Linux directly on your Windows machine, you can consider using cloud-based Linux environments or &lt;strong&gt;virtual private servers&lt;/strong&gt; (VPS) to access and work with Linux remotely.&lt;/p&gt;

&lt;p&gt;Services like &lt;strong&gt;Amazon EC2&lt;/strong&gt;, &lt;strong&gt;Microsoft Azure&lt;/strong&gt;, or &lt;strong&gt;DigitalOcean&lt;/strong&gt; provide Linux instances that you can connect to from your Windows computer. Note that some of these services offer free tiers, but they are not usually free in the long run.&lt;/p&gt;
&lt;h2&gt;
  
  
  Introduction to Bash Shell and System Commands
&lt;/h2&gt;

&lt;p&gt;The Linux command line is provided by a program called the &lt;strong&gt;shell&lt;/strong&gt;. Over the years, many shell programs have been developed, each offering different features.&lt;/p&gt;

&lt;p&gt;Different users can be configured to use different shells, but most users stick with the default. The default shell for many Linux distros is the GNU Bourne-Again Shell (&lt;code&gt;bash&lt;/code&gt;). Bash is the successor to the original Bourne shell (&lt;code&gt;sh&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;To find out your current shell, open your terminal and enter the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo $SHELL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Command breakdown:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;echo&lt;/code&gt; command is used to print on the terminal.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;$SHELL&lt;/code&gt; is a special variable that holds the name of the current shell.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my setup, the output is &lt;code&gt;/bin/bash&lt;/code&gt;. This means that I am using the bash shell.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo82oq9d55m3q5dvhhndb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo82oq9d55m3q5dvhhndb.png" alt="bin bash" width="800" height="112"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bash is very powerful, as it simplifies operations that are hard to accomplish efficiently through a graphical user interface (GUI). Remember that most servers do not have a GUI, so it is best to learn the powers of the command-line interface (CLI).&lt;/p&gt;
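&lt;p&gt;As a small, hedged illustration of that power (the filenames here are made up for the demo), the shell can rename a whole batch of files in one loop — a task that takes many clicks in a GUI:&lt;/p&gt;

```shell
# create two throwaway files for the demo
touch notes.txt todo.txt

# add a .bak suffix to every .txt file in the current directory
for f in *.txt; do
  mv "$f" "$f.bak"
done

ls   # the files are now notes.txt.bak and todo.txt.bak
```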

&lt;h3&gt;
  
  
  Terminal vs Shell
&lt;/h3&gt;

&lt;p&gt;The terms &lt;strong&gt;terminal&lt;/strong&gt; and &lt;strong&gt;shell&lt;/strong&gt; are often used interchangeably, but they refer to different parts of the command-line interface.&lt;br&gt;
The terminal is the interface you use to interact with the shell. The shell is the command interpreter that processes and executes your commands.&lt;/p&gt;
&lt;h3&gt;
  
  
  What is a prompt?
&lt;/h3&gt;

&lt;p&gt;When a shell is used interactively, it displays a &lt;code&gt;$&lt;/code&gt; when it is waiting for a command from the user. This is called the &lt;strong&gt;shell prompt&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[username@host ~]$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the shell is running as root, the prompt changes to &lt;code&gt;#&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@host ~]#
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Command Structure
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;command&lt;/strong&gt; is a program that performs a specific operation. Once you have access to the shell, you can enter any command after the &lt;code&gt;$&lt;/code&gt; sign and see the output on the terminal.&lt;br&gt;
Generally, Linux commands follow this syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;command [options] [arguments]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the breakdown of the above syntax:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;command&lt;/code&gt;: This is the name of the command you want to execute. &lt;code&gt;ls&lt;/code&gt; (list), &lt;code&gt;cp&lt;/code&gt; (copy), and &lt;code&gt;rm&lt;/code&gt; (remove) are common Linux commands.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[options]&lt;/code&gt;: Options, or flags, often preceded by a hyphen (-) or double hyphen (--), modify the behavior of the command. They can change how the command operates. For example, &lt;code&gt;ls -a&lt;/code&gt; uses the &lt;code&gt;-a&lt;/code&gt; option to display hidden files in the current directory.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[arguments]&lt;/code&gt;: Arguments are the inputs for the commands that require one. These could be filenames, user names, or other data that the command will act upon. For example, in the command &lt;code&gt;cat access.log&lt;/code&gt;, &lt;code&gt;cat&lt;/code&gt; is the command and &lt;code&gt;access.log&lt;/code&gt; is the input. As a result, the &lt;code&gt;cat&lt;/code&gt; command displays the contents of the &lt;code&gt;access.log&lt;/code&gt; file.&lt;/li&gt;
&lt;/ul&gt;
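&lt;p&gt;Putting the three parts together, a single invocation might look like this (the directory is just an example):&lt;/p&gt;

```shell
# command: ls, option: -l (long listing), argument: /tmp (the directory to list)
ls -l /tmp

# options and arguments are not always needed — a bare command also works
ls
```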

&lt;p&gt;Options and arguments are not required for all commands. Some commands can be run without any options or arguments, while others might require one or both to function correctly. You can always refer to the command's manual to check the options and arguments it supports. You can view a command's manual using the &lt;code&gt;man&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;You can access the manual page for &lt;code&gt;ls&lt;/code&gt; with &lt;code&gt;man ls&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffd5mlvngd6qv50bqlq4p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffd5mlvngd6qv50bqlq4p.png" alt="man ls" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manual pages are a great and quick way to access the documentation. I highly recommend going through man pages for the commands that you use the most.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managing Files From the Command line
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Linux File-system Hierarchy
&lt;/h3&gt;

&lt;p&gt;All files in Linux are stored in a file system. It follows an inverted tree structure, with the root at the topmost part.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;/&lt;/code&gt; is the root directory and the starting point of the file system. The root directory contains all other directories and files on the system. The &lt;code&gt;/&lt;/code&gt; character also serves as a directory separator between path names. For example, &lt;code&gt;/home/alice&lt;/code&gt; forms a complete path.&lt;br&gt;
You can learn more about the file system using the &lt;code&gt;man hier&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdw22ay6a6ydn6vnkawq3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdw22ay6a6ydn6vnkawq3.png" alt="man hier output" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Navigating the Linux File-system
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;absolute path&lt;/strong&gt; is the full path from the root directory to the file or directory. It always starts with a &lt;code&gt;/&lt;/code&gt;. For example, &lt;code&gt;/home/john/documents&lt;/code&gt;.&lt;br&gt;
The &lt;strong&gt;relative path&lt;/strong&gt;, on the other hand, is the path from the current directory to the destination file or directory. It does not start with a &lt;code&gt;/&lt;/code&gt;. For example, &lt;code&gt;documents/work/project&lt;/code&gt;.&lt;/p&gt;
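&lt;p&gt;A quick sketch of the difference (the folder names are hypothetical):&lt;/p&gt;

```shell
mkdir -p documents/work/project   # demo folders

cd documents/work/project   # relative path: resolved from the current directory
pwd                         # the absolute path it prints always starts with /

cd /tmp                     # absolute path: works from anywhere
```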

&lt;p&gt;&lt;strong&gt;Locating your current directory:&lt;/strong&gt; You can locate your current directory in the Linux file system using the &lt;code&gt;pwd&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32rl61a89cakdxv491ln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32rl61a89cakdxv491ln.png" alt="pwd command" width="800" height="189"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Changing directories:&lt;/strong&gt; The command to change directories is &lt;code&gt;cd&lt;/code&gt; and it stands for &lt;strong&gt;&lt;em&gt;change directory&lt;/em&gt;&lt;/strong&gt;. You can use the &lt;code&gt;cd&lt;/code&gt; command to navigate to a different directory.&lt;/p&gt;

&lt;p&gt;Some other commonly used &lt;code&gt;cd&lt;/code&gt; shortcuts are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cd ..&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Go back one directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cd ../..&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Go back two directories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;cd&lt;/code&gt; or &lt;code&gt;cd ~&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Go to the home directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cd -&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Go to the previous path&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
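&lt;p&gt;A short session showing these shortcuts in action:&lt;/p&gt;

```shell
cd /usr/share   # jump to an absolute path
cd ..           # back one directory
pwd             # /usr
cd -            # back to the previous path (cd - also prints it)
pwd             # /usr/share
cd ~            # back to the home directory
```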
&lt;h2&gt;
  
  
  Managing Files and Directories
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Creating new directories:&lt;/strong&gt; You can create an empty directory using the &lt;code&gt;mkdir&lt;/code&gt; command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# creates an empty directory named "foo" in the current folder
mkdir foo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also create directories recursively using the &lt;code&gt;-p&lt;/code&gt; option.&lt;/p&gt;
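&lt;p&gt;With &lt;code&gt;-p&lt;/code&gt;, &lt;code&gt;mkdir&lt;/code&gt; creates any missing parent directories in one go (the directory names below are arbitrary):&lt;/p&gt;

```shell
# without -p, "mkdir projects/src/utils" fails if "projects" does not exist yet;
# -p creates the whole chain and does not complain if parts already exist
mkdir -p projects/src/utils

ls projects/src   # utils
```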

&lt;p&gt;&lt;strong&gt;Creating new files:&lt;/strong&gt; The &lt;code&gt;touch&lt;/code&gt; command creates an empty file. You can use it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# creates empty file "file.txt" in the current folder
touch file.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The file names can be chained together if you want to create multiple files in a single command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# creates empty files "file1.txt", "file2.txt", and "file3.txt" in the current folder

touch file1.txt file2.txt file3.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Removing files and directories:&lt;/strong&gt; You can use the &lt;code&gt;rm&lt;/code&gt; command to remove files and, with the &lt;code&gt;-r&lt;/code&gt; option, directories along with their contents. The &lt;code&gt;rmdir&lt;/code&gt; command removes an empty directory.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rm file.txt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Removes the file file.txt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rm -r directory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Removes the directory directory and its contents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rm -f file.txt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Removes the file file.txt without prompting for confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;rmdir&lt;/code&gt; directory&lt;/td&gt;
&lt;td&gt;Removes an empty directory&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
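&lt;p&gt;The commands above can be tried safely on throwaway files and folders:&lt;/p&gt;

```shell
mkdir empty_dir full_dir
touch notes.txt full_dir/file.txt

rmdir empty_dir   # works only because the directory is empty
rm notes.txt      # removes a single file
rm -r full_dir    # removes the directory and everything inside it
```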

&lt;p&gt;&lt;strong&gt;Copying files using the cp command:&lt;/strong&gt; To copy files in Linux, use the cp command.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Syntax to copy files: &lt;code&gt;cp source_file destination&lt;/code&gt;
This command copies a file named file1.txt to the directory /home/adam/logs.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cp file1.txt /home/adam/logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;cp&lt;/code&gt; command can also copy a file under a new name.&lt;br&gt;
This command copies a file named file1.txt to another file named file2.txt in the same folder.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cp file1.txt file2.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
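&lt;p&gt;Copying a directory requires an extra step: &lt;code&gt;cp&lt;/code&gt; refuses directories by default, so pass the &lt;code&gt;-r&lt;/code&gt; (recursive) option to copy a folder and its contents (the names below are made up for the demo):&lt;/p&gt;

```shell
mkdir reports && touch reports/jan.txt

# -r copies the directory and everything inside it
cp -r reports reports_backup

ls reports_backup   # jan.txt
```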



&lt;p&gt;&lt;strong&gt;Moving and renaming files and folders:&lt;/strong&gt; The &lt;code&gt;mv&lt;/code&gt; command is used to rename files and folders and to move them from one directory to another.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Syntax to move files: &lt;code&gt;mv source_file destination_directory&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Moves a file named file1.txt to a directory named backup

mv file1.txt backup/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To move a directory and its contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mv dir1/ backup/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Renaming files and folders in Linux is also done with the &lt;code&gt;mv&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;Syntax to rename files: &lt;code&gt;mv old_name new_name&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Renames a file from file1.txt to file2.txt

mv file1.txt file2.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Locating Files and Folders:&lt;/strong&gt; The &lt;code&gt;find&lt;/code&gt; command lets you efficiently search for files, folders, and character and block devices.&lt;br&gt;
Below is the basic syntax of the &lt;code&gt;find&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;find /path/ -type f -name file-to-search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/path&lt;/code&gt; is the path where the file is expected to be found. This is the starting point for searching files. The path can also be &lt;code&gt;/&lt;/code&gt; or &lt;code&gt;.&lt;/code&gt;, which represent the root and current directory, respectively.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-type&lt;/code&gt; filters results by file type, which can be any of the below:&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;f&lt;/code&gt; – &lt;strong&gt;Regular file&lt;/strong&gt; such as text files, images, and hidden files.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;d&lt;/code&gt; – &lt;strong&gt;Directory&lt;/strong&gt;. These are the folders under consideration.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;l&lt;/code&gt; – &lt;strong&gt;Symbolic link&lt;/strong&gt;. Symbolic links point to files and are similar to shortcuts.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;c&lt;/code&gt; – &lt;strong&gt;Character devices&lt;/strong&gt;. Files that are used to access character devices are called character device files. Drivers communicate with character devices by sending and receiving single characters (bytes, octets). Examples include keyboards, sound cards, and the mouse.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;b&lt;/code&gt; – &lt;strong&gt;Block devices&lt;/strong&gt;. Files that are used to access block devices are called block device files. Drivers communicate with block devices by sending and receiving entire blocks of data. Examples include USB drives and CD-ROMs.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-name&lt;/code&gt; is the name of the file that you want to search for.&lt;/li&gt;
&lt;/ul&gt;
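&lt;p&gt;For example, the following searches a small demo tree (created here just for illustration):&lt;/p&gt;

```shell
mkdir -p logs
touch logs/access.log logs/error.log logs/readme.txt

# all regular files ending in .log below the current directory
find . -type f -name "*.log"

# all directories named "logs"
find . -type d -name logs   # ./logs
```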

&lt;h2&gt;
  
  
  Basic Commands for Viewing Files
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Displaying file contents:&lt;/strong&gt; The &lt;code&gt;cat&lt;/code&gt; command in Linux is used to display the contents of a file.&lt;/p&gt;

&lt;p&gt;Here is the basic syntax of the cat command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat [options] [file]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to view the contents of a file named file.txt, you can use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat file.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will display all the contents of the file on the terminal at once.&lt;/p&gt;
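&lt;p&gt;One option worth knowing is &lt;code&gt;-n&lt;/code&gt;, which numbers each output line — handy when you need to refer to a specific line of a file:&lt;/p&gt;

```shell
printf 'first line\nsecond line\n' > file.txt

cat file.txt      # prints the file as-is
cat -n file.txt   # prints each line prefixed with its line number
```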

&lt;h3&gt;
  
  
  Viewing text files interactively using &lt;code&gt;less&lt;/code&gt; and &lt;code&gt;more&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;While &lt;code&gt;cat&lt;/code&gt; displays the entire file at once, &lt;code&gt;less&lt;/code&gt; and &lt;code&gt;more&lt;/code&gt; allow you to view the contents of a file interactively. This is useful when you want to scroll through a large file or search for specific content.&lt;/p&gt;

&lt;p&gt;The syntax of the &lt;code&gt;less&lt;/code&gt; command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;less [options] [file]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;more&lt;/code&gt; command is similar to &lt;code&gt;less&lt;/code&gt; but has fewer features. It is used to display the contents of a file one screen at a time.&lt;br&gt;
The syntax of the more command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;more [options] [file]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Essentials of Text Editing in Linux
&lt;/h2&gt;

&lt;p&gt;Editing text from the command line is one of the most crucial skills in Linux. In this section, you will learn how to use two popular text editors: Vim and Nano. Both are safe choices to learn, as they are present on most Linux distributions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mastering Vim: Introductory Guide to Vim
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Introduction to Vim
&lt;/h3&gt;

&lt;p&gt;Vim is a popular text editing tool for the command line. It is powerful, customizable, and fast. Vim has two variations: &lt;strong&gt;Vim&lt;/strong&gt; (vim) and &lt;strong&gt;Vim tiny&lt;/strong&gt; (vi). Vim tiny is a smaller version that lacks some of Vim's features.&lt;/p&gt;

&lt;p&gt;Here are some reasons why you should consider learning Vim:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most servers are accessed via a CLI, so in system administration, you don't necessarily have the luxury of a GUI. But Vim will always be there.&lt;/li&gt;
&lt;li&gt;Vim uses a keyboard-centric approach, as it is designed to be used without a mouse, which can significantly speed up editing tasks once you have learned the keyboard shortcuts. This also makes it faster than GUI tools.&lt;/li&gt;
&lt;li&gt;Vim is suitable for all – beginners and advanced users. Vim supports complex string searches, search highlighting, and much more. Through plugins, Vim provides extended capabilities to developers and system admins that include code completion, syntax highlighting, file management, version control, and more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The three Vim modes
&lt;/h2&gt;

&lt;p&gt;You need to know the three operating modes of Vim and how to switch between them. Keystrokes behave differently in each mode. The three modes are as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Command mode.&lt;/li&gt;
&lt;li&gt;Edit mode.&lt;/li&gt;
&lt;li&gt;Visual mode.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Command Mode
&lt;/h3&gt;

&lt;p&gt;When you start Vim, you land in the command mode by default. This mode allows you to access other modes.&lt;br&gt;
To switch to another mode, you need to be in command mode first.&lt;/p&gt;
&lt;h3&gt;
  
  
  Edit Mode
&lt;/h3&gt;

&lt;p&gt;This mode allows you to make changes to the file. To enter edit mode, press &lt;code&gt;i&lt;/code&gt; while in command mode. &lt;/p&gt;
&lt;h3&gt;
  
  
  Visual mode
&lt;/h3&gt;

&lt;p&gt;This mode allows you to work on a single character, a block of text, or lines of text. Let's break it down into simple steps. Remember, use the below combinations when in command mode.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Shift + V&lt;/code&gt; → Select multiple lines.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Ctrl + V&lt;/code&gt; → Block mode&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;v&lt;/code&gt; → Character mode&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The visual mode comes in handy when you need to copy and paste or edit lines in bulk.&lt;/p&gt;
&lt;h3&gt;
  
  
  Extended command mode
&lt;/h3&gt;

&lt;p&gt;The extended command mode allows you to perform advanced operations like searching, setting line numbers, and highlighting text. We'll cover extended mode in the next section.&lt;/p&gt;
&lt;h2&gt;
  
  
  Shortcuts in Vim: Making Editing Faster
&lt;/h2&gt;

&lt;p&gt;Note: All these shortcuts work in the command mode only.&lt;/p&gt;
&lt;h3&gt;
  
  
  Basic Navigation
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;h&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move left&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;j&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move down&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;k&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move up&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;l&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move right&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move to the beginning of the line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;$&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move to the end of the line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gg&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move to the beginning of the file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;G&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move to the end of the file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+d&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move half-page down&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+u&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move half-page up&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Editing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;i&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enter insert mode before the cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;I&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enter insert mode at the beginning of the line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;a&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enter insert mode after the cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;A&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enter insert mode at the end of the line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;o&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open a new line below the current line and enter insert mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;O&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open a new line above the current line and enter insert mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;x&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete the character under the cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete the current line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;yy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Yank (copy) the current line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;p&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Paste below the cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;P&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Paste above the cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Searching and Replacing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search for a pattern which will take you to its next occurrence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;?&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search for a pattern that will take you to its previous occurrence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;n&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Repeat the last search in the same direction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;N&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Repeat the last search in the opposite direction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;:%s/old/new/g&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Replace all occurrences of &lt;code&gt;old&lt;/code&gt; with &lt;code&gt;new&lt;/code&gt; in the file&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Exiting
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;:w&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Save the file but don't exit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;:q&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quit Vim (fails if there are unsaved changes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;:wq&lt;/code&gt; or &lt;code&gt;:x&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Save and quit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;:q!&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quit without saving&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Multiple Windows
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;:split&lt;/code&gt; or &lt;code&gt;:sp&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Split the window horizontally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;:vsplit&lt;/code&gt; or &lt;code&gt;:vsp&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Split the window vertically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+w&lt;/code&gt; then &lt;code&gt;h/j/k/l&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Navigate between split windows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Mastering Nano
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Getting started with Nano: The user-friendly text editor
&lt;/h3&gt;

&lt;p&gt;Nano is a simple, user-friendly text editor that is well suited to beginners. It comes pre-installed on most Linux distributions.&lt;br&gt;
To open Nano with a new, empty buffer, use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nano
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To edit an existing file with Nano (the file is created if it does not exist), use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nano filename
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
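&lt;p&gt;Nano also accepts a few handy command-line options. A small sketch (the file names are placeholders):&lt;br&gt;
&lt;/p&gt;

```
nano +25 notes.txt    open notes.txt with the cursor on line 25
nano -l script.sh     show line numbers while editing
nano -v config.txt    open the file in read-only (view) mode
```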



&lt;h2&gt;
  
  
  List of key bindings in Nano
&lt;/h2&gt;

&lt;h3&gt;
  
  
  General
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+X&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exit Nano (prompting to save if changes are made)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+O&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Save the file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+R&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Read a file into the current file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+G&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Display the help text&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Editing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+K&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cut the current line and store it in the cutbuffer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+U&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Paste the contents of the cutbuffer into the current line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Alt+6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Copy the current line and store it in the cutbuffer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+J&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Justify the current paragraph&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Navigation
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+A&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move to the beginning of the line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+E&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Move to the end of the line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+C&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Display the current line number and file information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+_ (Ctrl+Shift+-)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Go to a specific line (and optionally, column) number&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+Y&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scroll up one page&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+V&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scroll down one page&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Search and Replace
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+W&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search forward for a string (pressing Enter at an empty prompt repeats the last search)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Alt+W&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Repeat the last search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+\&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search and replace&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Miscellaneous
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+T&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Invoke the spell checker, if available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+D&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete the character under the cursor (does not cut it)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Ctrl+L&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Refresh (redraw) the current screen&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Alt+U&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Undo the last operation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Alt+E&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Redo the last undone operation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;This article introduced Linux from both a conceptual and practical perspective, covering its core components, common distributions, and different ways to access it. We explored essential command-line skills, including file system navigation, system commands, and text editing using Vim and Nano.&lt;/p&gt;

&lt;p&gt;For data engineers, Linux is a critical platform because most data systems and cloud infrastructures run on it. Mastery of Linux enables efficient automation, system management, troubleshooting, and deployment of data pipelines. As a result, Linux is not just a supporting skill, but a foundational requirement for working effectively in modern data engineering environments.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>dataengineering</category>
      <category>opensource</category>
    </item>
    <item>
      <title>An Introduction to Git: Concepts, Commands, and Workflows</title>
      <dc:creator>Edmund Eryuba</dc:creator>
      <pubDate>Sat, 17 Jan 2026 18:35:17 +0000</pubDate>
      <link>https://dev.to/edmund_eryuba/an-introduction-to-git-concepts-commands-and-workflows-1je3</link>
      <guid>https://dev.to/edmund_eryuba/an-introduction-to-git-concepts-commands-and-workflows-1je3</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdio90ub4gq7v6ao12t2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdio90ub4gq7v6ao12t2.png" title="Basic Git Workflow" alt="Github workflow diagram" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Git?
&lt;/h2&gt;

&lt;p&gt;Git is a &lt;em&gt;distributed&lt;/em&gt; &lt;strong&gt;version control system&lt;/strong&gt; designed to track changes in source code and manage collaboration among developers. It enables individuals and teams to maintain a complete history of a project, revert to previous versions when needed, and work on the same codebase without conflicts. Because each developer has a full copy of the repository, Git offers high performance, reliability, and the ability to work offline. Git is the core technology behind popular platforms such as GitHub, GitLab, and Bitbucket.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Git Bash
&lt;/h2&gt;

&lt;p&gt;Git Bash is a &lt;strong&gt;command-line interface&lt;/strong&gt; that provides a Unix-like shell environment for using Git on Windows systems. It allows users to run Git commands in a terminal similar to those found on Linux and macOS, making cross-platform development easier. In addition to Git commands, Git Bash supports basic shell operations such as navigating directories, creating files, and managing folders, which are commonly used during development workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Initializing and Managing a Repository
&lt;/h2&gt;

&lt;p&gt;To begin tracking a project with Git, a repository is initialized using the &lt;code&gt;git init&lt;/code&gt; command. This creates a hidden &lt;code&gt;.git&lt;/code&gt; directory that stores all version history and configuration data. Files are then added to the staging area with &lt;code&gt;git add .&lt;/code&gt;, which prepares changes for inclusion in a commit. A commit is created using &lt;code&gt;git commit -m "message"&lt;/code&gt;, capturing a snapshot of the staged changes along with a descriptive message. The &lt;code&gt;git status&lt;/code&gt; command is used regularly to view the current state of files in the repository.&lt;/p&gt;
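&lt;p&gt;Putting those commands together, a minimal first-commit session might look like this (the project name, file, and identity are placeholders):&lt;br&gt;
&lt;/p&gt;

```shell
# Placeholder names throughout; adjust to your own project.
git init my-project                     # creates my-project/.git
cd my-project
git config user.name "Your Name"        # identity recorded in each commit
git config user.email "you@example.com"
echo "# My Project" > README.md
git status                              # README.md appears as untracked
git add .                               # stage all changes
git commit -m "Initial commit"          # snapshot the staged changes
git log --oneline                       # one line per commit in the history
```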

&lt;h2&gt;
  
  
  Working with Remote Repositories
&lt;/h2&gt;

&lt;p&gt;Remote repositories enable collaboration by allowing developers to share code through platforms such as GitHub. The &lt;code&gt;git clone&lt;/code&gt; command creates a local copy of an existing remote repository. Once changes are made locally, they can be uploaded to the remote repository using &lt;code&gt;git push&lt;/code&gt;. To retrieve updates made by others, the &lt;code&gt;git pull&lt;/code&gt; command is used, which fetches and merges changes into the local branch.&lt;/p&gt;
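&lt;p&gt;A minimal sketch of this cycle, using a local bare repository as a stand-in for a GitHub remote (in practice the clone URL would be an HTTPS or SSH GitHub address; all paths and names are placeholders):&lt;br&gt;
&lt;/p&gt;

```shell
# The bare repository below stands in for a GitHub remote.
git init --bare /tmp/demo-remote.git

git clone /tmp/demo-remote.git demo     # local working copy; remote is "origin"
cd demo
git config user.name "Your Name"
git config user.email "you@example.com"
echo "hello" > hello.txt
git add hello.txt
git commit -m "Add hello.txt"
git branch -M main                      # name the branch main, as on GitHub
git push -u origin main                 # upload local commits to the remote
git pull origin main                    # fetch and merge remote changes
```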

&lt;h2&gt;
  
  
  Branching and Collaboration Basics
&lt;/h2&gt;

&lt;p&gt;Branching allows developers to work on new features or fixes independently without affecting the main codebase. The &lt;code&gt;git branch&lt;/code&gt; command lists available branches, while &lt;code&gt;git checkout&lt;/code&gt; or &lt;code&gt;git switch&lt;/code&gt; is used to move between them. After completing work on a branch, changes can be merged back into the main branch. Through these features, Git and Git Bash provide a structured and efficient approach to version control and team collaboration.&lt;/p&gt;
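&lt;p&gt;A self-contained sketch of a local branch-and-merge cycle (repository, branch, and file names are placeholders; &lt;code&gt;git switch&lt;/code&gt; needs Git 2.23 or later):&lt;br&gt;
&lt;/p&gt;

```shell
# Set up a throwaway repository with one commit on main.
git init demo && cd demo
git config user.name "Your Name"
git config user.email "you@example.com"
git commit --allow-empty -m "Initial commit"
git branch -M main

git switch -c feature-login             # create and switch to a new branch
echo "login stub" > login.txt
git add login.txt
git commit -m "Add login stub"
git switch main                         # return to the main branch
git merge feature-login                 # bring the feature work into main
git branch -d feature-login             # delete the merged branch
```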




&lt;h2&gt;
  
  
  Getting Started with the Git Workflow
&lt;/h2&gt;

&lt;p&gt;To get started, it's important to understand the basics of how Git works. You may choose to do the actual work in a terminal, in an app such as GitHub Desktop, or on GitHub.com. Below are the basic Git terms as they apply to GitHub usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repository (Repo)
&lt;/h2&gt;

&lt;p&gt;On GitHub, a repository is the hosted, online version of your project. It stores your code, commit history, issues, pull requests, and documentation. Most collaboration happens around the GitHub repository, which acts as the central reference point for all contributors. A local repository is connected to GitHub using a remote named &lt;code&gt;origin&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git remote add origin https://github.com/username/repository.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Commit
&lt;/h2&gt;

&lt;p&gt;A commit is a recorded change to the repository. On GitHub, commits are visible in the Commits tab, where collaborators can review what changed and who made the change. Each commit pushed to GitHub becomes part of the shared project history.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git commit -m "Add login validation"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Branch
&lt;/h2&gt;

&lt;p&gt;On GitHub, branches are heavily used to isolate work. The default branch is usually &lt;code&gt;main&lt;/code&gt;, which represents production-ready code. Feature development and bug fixes are done in separate branches, which are later merged into &lt;code&gt;main&lt;/code&gt; through pull requests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git branch feature-auth
git switch feature-auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Push
&lt;/h2&gt;

&lt;p&gt;Pushing sends your local commits to GitHub, making them visible to collaborators. Until you push, your commits exist only on your local machine. This step is essential for teamwork and backup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git push origin feature-auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pull
&lt;/h2&gt;

&lt;p&gt;Pulling retrieves updates from GitHub and merges them into your local branch. This ensures you are working with the latest version of the project, especially when multiple contributors are involved.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git pull origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pull Request (PR)
&lt;/h2&gt;

&lt;p&gt;A pull request is a GitHub feature used to propose merging one branch into another, usually into main. It allows team members to review code, leave comments, request changes, and approve work before it becomes part of the main codebase. Pull requests are central to GitHub-based collaboration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Merge
&lt;/h2&gt;

&lt;p&gt;Merging on GitHub usually happens through a pull request rather than directly via the command line. Once approved, GitHub merges the feature branch into the target branch and records the merge in the project history.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fork
&lt;/h2&gt;

&lt;p&gt;A fork is a GitHub-specific feature that creates a personal copy of someone else’s repository under your account. Forks are common in open-source projects where contributors do not have direct access to the main repository. Changes made in a fork are submitted back using a pull request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Collaboration workflow
&lt;/h2&gt;

&lt;p&gt;A typical GitHub collaboration workflow involves cloning the repository, creating a branch, committing changes, pushing the branch to GitHub, opening a pull request, undergoing code review, and merging the changes. GitHub enhances Git by adding visibility, discussion, and project management features around this workflow.&lt;/p&gt;
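&lt;p&gt;The shell side of that workflow can be sketched as follows, again with a local bare repository standing in for GitHub (every name is a placeholder, and the review-and-merge step would happen through a pull request on GitHub itself):&lt;br&gt;
&lt;/p&gt;

```shell
# One-time setup: a stand-in "GitHub" remote and a clone of it.
git init --bare /tmp/team-remote.git
git clone /tmp/team-remote.git project
cd project
git config user.name "Your Name"
git config user.email "you@example.com"
git commit --allow-empty -m "Initial commit"
git branch -M main
git push -u origin main

# Day-to-day cycle: pull, branch, commit, push, then open a PR on GitHub.
git pull origin main                    # start from the latest main
git switch -c fix-typo                  # one branch per feature or fix
echo "corrected" > notes.txt
git add notes.txt
git commit -m "Fix typo in notes"
git push -u origin fix-typo             # publish the branch for review
```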




&lt;p&gt;When using GitHub, beginners should always pull before starting work, create one branch per feature or fix, push changes frequently, use pull requests instead of direct merges, and write clear commit messages. This approach keeps the repository clean, traceable, and easy to collaborate on.&lt;/p&gt;

</description>
      <category>github</category>
      <category>git</category>
      <category>dataengineering</category>
      <category>githubactions</category>
    </item>
  </channel>
</rss>
