<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joan</title>
    <description>The latest articles on DEV Community by Joan (@joanwanjiru).</description>
    <link>https://dev.to/joanwanjiru</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F879704%2Ffbf71d6d-395e-4c40-b300-a111cbec37aa.png</url>
      <title>DEV Community: Joan</title>
      <link>https://dev.to/joanwanjiru</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/joanwanjiru"/>
    <language>en</language>
    <item>
      <title>Made easy: Installing dbt and Building Your First Model 'Haay!'</title>
      <dc:creator>Joan</dc:creator>
      <pubDate>Mon, 12 May 2025 19:55:40 +0000</pubDate>
      <link>https://dev.to/joanwanjiru/made-easy-installing-dbt-and-building-your-first-model-haay-1lja</link>
      <guid>https://dev.to/joanwanjiru/made-easy-installing-dbt-and-building-your-first-model-haay-1lja</guid>
      <description>&lt;p&gt;Prerequisites: Python and SQL knowledge.&lt;br&gt;
Install the Python and dbt extensions in VS Code.&lt;br&gt;
Steps:&lt;br&gt;
Open a terminal:&lt;br&gt;
&lt;code&gt;cd &amp;lt;your_dir&amp;gt;&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--create python virtual environment
python -m venv dbt_venv

--activate the env on cmd/powershell
.\dbt_venv\Scripts\activate

--to deactivate the venv
deactivate

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install dbt. In this case I am using the dbt-postgres adapter (you are free to use other integrations; see &lt;a href="https://docs.getdbt.com/docs/core/pip-install" rel="noopener noreferrer"&gt;Install with pip&lt;/a&gt;),&lt;br&gt;
together with dbt Core, an open-source tool that enables data practitioners to transform data, suitable for users who prefer to set up dbt manually and maintain it locally.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python -m pip install dbt-core dbt-postgres&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Create a &lt;code&gt;.dbt&lt;/code&gt; folder in your user home directory; dbt will create and maintain &lt;code&gt;profiles.yml&lt;/code&gt; there, the dbt configuration file in which database and user credentials are stored:&lt;br&gt;
&lt;code&gt;mkdir $HOME\.dbt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Initialize the dbt project:&lt;br&gt;
&lt;code&gt;dbt init&lt;/code&gt; and then follow the command prompts that appear.&lt;/p&gt;

&lt;p&gt;Navigate to the project folder that was created:&lt;br&gt;
&lt;code&gt;cd dbt_project&lt;/code&gt;&lt;br&gt;
Verify the connection between dbt and your data platform with the&lt;br&gt;
&lt;code&gt;dbt debug&lt;/code&gt; command. &lt;/p&gt;

&lt;p&gt;Create a dbt model: a SQL query designed to perform a transformation task on the data platform.&lt;br&gt;
It's important to note that dbt makes use of CTEs for improved readability and modularity.&lt;/p&gt;

&lt;p&gt;Create a &lt;code&gt;.sql&lt;/code&gt; file in the models folder, write your query using CTEs, and save it.&lt;br&gt;
To run the model use:&lt;br&gt;
&lt;code&gt;dbt run&lt;/code&gt;&lt;br&gt;
which creates a view in your data platform with the same name as the model.&lt;/p&gt;
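
&lt;p&gt;As a minimal sketch, assuming a hypothetical &lt;code&gt;raw_orders&lt;/code&gt; table already exists in your warehouse, a model file such as &lt;code&gt;models/orders_summary.sql&lt;/code&gt; could look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- models/orders_summary.sql (illustrative example; table names are hypothetical)
with orders as (

    -- stage the raw data
    select order_id, customer_id, amount
    from raw_orders

),

final as (

    -- aggregate per customer
    select
        customer_id,
        count(order_id) as order_count,
        sum(amount)     as total_amount
    from orders
    group by customer_id

)

select * from final
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running &lt;code&gt;dbt run&lt;/code&gt; would then create a view named &lt;code&gt;orders_summary&lt;/code&gt; in your target schema.&lt;/p&gt;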

&lt;p&gt;It's also important to note that the default materialization for dbt models is a view; it can be changed to a table either in the &lt;code&gt;.yml&lt;/code&gt; file or in the model itself.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6yi9w17jx9t7zcjx6la.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6yi9w17jx9t7zcjx6la.png" alt="Default dbt model materialization" width="800" height="175"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fliovatehiypddj7fhhy9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fliovatehiypddj7fhhy9.png" alt="Updating materialization on .yml file" width="800" height="162"&gt;&lt;/a&gt;&lt;br&gt;
Updating the materialization in the model itself:&lt;br&gt;
&lt;code&gt;{{ config(materialized='table') }}&lt;/code&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Recommendations for Normalization between OLAP and OLTP systems</title>
      <dc:creator>Joan</dc:creator>
      <pubDate>Tue, 10 Sep 2024 08:53:23 +0000</pubDate>
      <link>https://dev.to/joanwanjiru/recommendations-for-normalization-between-olap-vs-oltp-systems-1ea9</link>
      <guid>https://dev.to/joanwanjiru/recommendations-for-normalization-between-olap-vs-oltp-systems-1ea9</guid>
      <description>&lt;p&gt;&lt;strong&gt;OLAP (Online Analytical Processing)&lt;/strong&gt; and &lt;strong&gt;OLTP (Online Transaction Processing)&lt;/strong&gt; systems differ due to their distinct purposes and usage patterns. Here’s a breakdown:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Normalization in OLTP Systems&lt;/strong&gt;:
&lt;/h3&gt;

&lt;p&gt;OLTP systems focus on daily transactional data operations like inserting, updating, and deleting data quickly. Normalization in OLTP databases is critical to ensure data integrity, eliminate redundancy, and improve data efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommendations for OLTP:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High Normalization (3NF and above)&lt;/strong&gt;: OLTP databases should follow a highly normalized structure, often up to the Third Normal Form (3NF) or beyond. This helps reduce data redundancy, ensuring that each piece of information is stored only once. It makes updates efficient and maintains consistency across the system.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1NF (First Normal Form)&lt;/strong&gt;: Ensure that the table has no repeating groups, and each field contains atomic values.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2NF (Second Normal Form)&lt;/strong&gt;: All non-key attributes must depend on the primary key, eliminating partial dependency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3NF (Third Normal Form)&lt;/strong&gt;: Eliminate transitive dependencies, where non-key attributes depend on other non-key attributes.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The goal is to make the system efficient for fast transactional operations like insertions and updates while maintaining data consistency.&lt;/p&gt;
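
&lt;p&gt;As an illustrative sketch (table and column names are hypothetical), a normalized OLTP design stores each fact only once and links tables through keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- 3NF sketch: customer details live in one place only
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers(customer_id),
    order_date  DATE NOT NULL,
    amount      DECIMAL(10,2) NOT NULL
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Updating a customer's email touches a single row in &lt;code&gt;customers&lt;/code&gt;, with no risk of inconsistent copies.&lt;/p&gt;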

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Normalization in OLAP Systems&lt;/strong&gt;:
&lt;/h3&gt;

&lt;p&gt;OLAP systems are designed for complex queries and reporting, where data is analyzed and aggregated over time. The focus is on read-heavy operations like running complex queries for reports and trends, rather than real-time updates or inserts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommendations for OLAP:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Denormalization&lt;/strong&gt;: Unlike OLTP, OLAP systems often use denormalized structures. This means merging related tables and duplicating some data for faster querying and easier aggregation. In OLAP, data redundancy is acceptable because the focus is on optimizing read performance, not minimizing storage or maintaining quick updates.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Star Schema&lt;/strong&gt;: This is a common design where a central fact table is surrounded by dimension tables. Each dimension is denormalized to allow quicker joins and easier reporting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snowflake Schema&lt;/strong&gt;: A variation of the star schema, but more normalized. Dimension tables are further divided into additional related tables. This increases the complexity but reduces redundancy, offering a middle ground.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Denormalization helps OLAP systems avoid the need for multiple joins in complex queries, making analysis faster, especially with large datasets.&lt;/p&gt;
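
&lt;p&gt;A hedged sketch of a star schema (names are hypothetical) shows how denormalized dimensions keep reporting queries down to single joins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Star schema sketch: one fact table, flat (denormalized) dimension
CREATE TABLE dim_customer (
    customer_key INT PRIMARY KEY,
    name         VARCHAR(100),
    city         VARCHAR(100),
    region       VARCHAR(100)  -- city and region kept inline rather than split out
);

CREATE TABLE fact_sales (
    sale_id      INT PRIMARY KEY,
    customer_key INT REFERENCES dim_customer(customer_key),
    sale_date    DATE,
    amount       DECIMAL(10,2)
);

-- A typical report needs only one join:
SELECT d.region, SUM(f.amount) AS total_sales
FROM fact_sales f
JOIN dim_customer d ON d.customer_key = f.customer_key
GROUP BY d.region;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;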

&lt;h3&gt;
  
  
  Key Differences:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;OLTP Normalization&lt;/th&gt;
&lt;th&gt;OLAP Denormalization&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast, frequent transactional operations&lt;/td&gt;
&lt;td&gt;Complex queries, reporting, and analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Normalization Level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (up to 3NF or higher)&lt;/td&gt;
&lt;td&gt;Low (denormalized, star or snowflake schema)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Redundancy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minimized&lt;/td&gt;
&lt;td&gt;Acceptable to improve query performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple queries involving small datasets&lt;/td&gt;
&lt;td&gt;Complex queries involving large datasets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Update Frequency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Frequent updates and inserts&lt;/td&gt;
&lt;td&gt;Infrequent bulk loading and queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Join Operations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Efficient joins due to normalized structure&lt;/td&gt;
&lt;td&gt;Avoids multiple joins by denormalizing data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why These Differences Matter:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OLTP&lt;/strong&gt;: Normalization is key to ensure consistency and avoid data anomalies, especially when handling frequent updates. It also minimizes storage by eliminating redundant data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OLAP&lt;/strong&gt;: Denormalization is used to optimize read-heavy queries where performance is prioritized. Since updates are less frequent, maintaining multiple copies of data is not a major concern.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In summary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OLTP systems&lt;/strong&gt; use &lt;strong&gt;highly normalized structures&lt;/strong&gt; for efficient transaction processing and data integrity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OLAP systems&lt;/strong&gt; use &lt;strong&gt;denormalized structures&lt;/strong&gt; to optimize for complex queries and reporting performance.&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Understanding data engineering with Datacamp</title>
      <dc:creator>Joan</dc:creator>
      <pubDate>Wed, 09 Aug 2023 13:07:37 +0000</pubDate>
      <link>https://dev.to/joanwanjiru/understanding-data-engineering-with-datacamp-2ang</link>
      <guid>https://dev.to/joanwanjiru/understanding-data-engineering-with-datacamp-2ang</guid>
      <description>&lt;p&gt;&lt;strong&gt;Data Processing&lt;/strong&gt;: converting raw data into meaningful information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data processing Value:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove unwanted data&lt;/li&gt;
&lt;li&gt;Optimize memory, process, and network costs&lt;/li&gt;
&lt;li&gt;Convert data from one type to another&lt;/li&gt;
&lt;li&gt;Organize data&lt;/li&gt;
&lt;li&gt;To fit into a schema/structure &lt;/li&gt;
&lt;li&gt;Increase productivity&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How data engineers process data:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Data manipulation, cleaning and tidying tasks e.g. dealing with missing values&lt;/li&gt;
&lt;li&gt;Store data in a sanely structured database&lt;/li&gt;
&lt;li&gt;Create views on top of the database tables for easy access to the data&lt;/li&gt;
&lt;li&gt;Normalize the data&lt;/li&gt;
&lt;li&gt;Optimize the performance of the databases, e.g. indexing the data for easier retrieval&lt;/li&gt;
&lt;/ul&gt;
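
&lt;p&gt;For instance (table and column names are hypothetical), a single index can turn a frequent lookup into an index seek instead of a full scan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Index the column used in frequent lookups
CREATE INDEX idx_employees_last_name
ON employees (last_name);

-- Queries filtering on last_name can now use the index
SELECT * FROM employees WHERE last_name = 'Njeri';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;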

&lt;h4&gt;
  
  
  Tools used in data processing
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxbs80jg5w3hj5vwe122.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxbs80jg5w3hj5vwe122.png" alt="Tools used in data processing" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scheduling&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can apply to any task listed in data processing.&lt;/li&gt;
&lt;li&gt;Holds each piece of the pipeline and organizes how they work together.&lt;/li&gt;
&lt;li&gt;Runs tasks in a specific order and resolves all dependencies correctly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scheduling data:&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Manually&lt;/em&gt;: e.g. a manual update of the employee data.&lt;br&gt;
&lt;em&gt;Automatically&lt;/em&gt;: run at a specific time, say update the employee table daily at 6 AM.&lt;br&gt;
&lt;em&gt;Automatically run if a specified condition is met&lt;/em&gt;, known as &lt;em&gt;&lt;strong&gt;sensor scheduling&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Ingestion&lt;/strong&gt;:&lt;br&gt;
Batches &amp;amp; streams.&lt;br&gt;
Batch processing: groups records at intervals; often cheaper.&lt;br&gt;
Streaming: sends individual records into the database right away, e.g. a new user signing in.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tools used in scheduling
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furle1gur5ysynvfujzg2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furle1gur5ysynvfujzg2.png" alt="Tools used in scheduling" width="800" height="646"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallel computing/processing&lt;/strong&gt;&lt;br&gt;
It's the basis of modern data processing tools, necessary because of memory and processing-power limits.&lt;br&gt;
How it works:&lt;br&gt;
Split tasks up into several smaller subtasks.&lt;br&gt;
Distribute these subtasks over several computers. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits and risks of parallel computing&lt;/strong&gt;&lt;br&gt;
Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extra processing power&lt;/li&gt;
&lt;li&gt;Reduced memory footprint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Moving data incurs a cost&lt;/li&gt;
&lt;li&gt;Communication time&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cloud Computing vs On premises computing
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7z20e6h6z9vdxitig6n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7z20e6h6z9vdxitig6n.png" alt="cloud providers" width="605" height="860"&gt;&lt;/a&gt;&lt;br&gt;
Servers on premises: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Incur costs for equipment&lt;/li&gt;
&lt;li&gt;Need space&lt;/li&gt;
&lt;li&gt;Electrical and maintenance costs&lt;/li&gt;
&lt;li&gt;Must provision enough power for peak moments&lt;/li&gt;
&lt;li&gt;Processing power sits unused at quieter times&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Servers on the cloud:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pay as you go&lt;/li&gt;
&lt;li&gt;No need for space&lt;/li&gt;
&lt;li&gt;Use resources when, and only when, we need them&lt;/li&gt;
&lt;li&gt;The closer to the user, the better the latency&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Cloud Computing for Data storage&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft9arqpppeb8u1pusdbwb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft9arqpppeb8u1pusdbwb.png" alt="Data storage" width="800" height="431"&gt;&lt;/a&gt;&lt;br&gt;
Pros:&lt;br&gt;
Database reliability, via data replication.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Introduction to Data Engineering in Microsoft Fabric</title>
      <dc:creator>Joan</dc:creator>
      <pubDate>Wed, 02 Aug 2023 06:38:11 +0000</pubDate>
      <link>https://dev.to/joanwanjiru/introduction-to-data-engineering-in-microsoft-fabric-421f</link>
      <guid>https://dev.to/joanwanjiru/introduction-to-data-engineering-in-microsoft-fabric-421f</guid>
      <description>&lt;p&gt;&lt;strong&gt;Data engineering&lt;/strong&gt; in Microsoft Fabric enables users to design, build, and maintain infrastructures and systems that enable their organizations to collect, store, process, and analyze large volumes of data.&lt;/p&gt;

&lt;p&gt;Fabric data engineering enables you to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create and manage your data using a &lt;strong&gt;lakehouse&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Design &lt;strong&gt;data pipelines&lt;/strong&gt; to copy data into your lakehouse&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Spark job definitions&lt;/strong&gt; to submit batch/streaming jobs to a Spark cluster&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;notebooks&lt;/strong&gt; to write code for ELT processes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwd7m077kqqk44c4d0e1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwd7m077kqqk44c4d0e1j.png" alt="Fabric data Engineering homepage" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a Lakehouse:&lt;/strong&gt;&lt;br&gt;
A data architecture that enables organizations to store and manage structured data in a single location, using tools and frameworks to process and analyze that data, e.g. SQL queries on&lt;br&gt;
the SQL endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is an Apache Spark job definition:&lt;/strong&gt;&lt;br&gt;
A set of instructions that defines how to execute a job on a &lt;strong&gt;Spark cluster&lt;/strong&gt;,&lt;br&gt;
for instance: the input/output data sources, the transformation, and the configuration settings for the Spark application.&lt;br&gt;
Spark job definitions allow data engineers to submit batch/streaming jobs to a Spark cluster, perform transformations on the data hosted in the lakehouse, etc. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a notebook:&lt;/strong&gt;&lt;br&gt;
An interactive compute environment that allows users to create and share documents containing live code, equations, visualizations, and narrative text.&lt;br&gt;
Users can write code in Python, R, and Scala to perform data ingestion, preparation, analysis, and other data-related tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a data pipeline:&lt;/strong&gt;&lt;br&gt;
A series of steps used to collect, process, and transform raw data into a format that can be used for analysis and decision-making.&lt;br&gt;
Data pipelines are crucial in that they help move data from its source to its destination in a reliable, scalable, and efficient way.&lt;/p&gt;

&lt;p&gt;Reference, &lt;a href="https://learn.microsoft.com/en-us/fabric/data-engineering/data-engineering-overview" rel="noopener noreferrer"&gt;Data Engineering in Microsoft Fabric&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SQL Server Recovery Model</title>
      <dc:creator>Joan</dc:creator>
      <pubDate>Fri, 07 Oct 2022 12:52:51 +0000</pubDate>
      <link>https://dev.to/joanwanjiru/sql-server-recovery-model-43ca</link>
      <guid>https://dev.to/joanwanjiru/sql-server-recovery-model-43ca</guid>
      <description>&lt;h2&gt;
  
  
  Introduction to SQL Server Recovery Model
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Recovery Model:&lt;/strong&gt; A database property that controls: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How transactions are logged&lt;/li&gt;
&lt;li&gt;Whether the transaction log requires (and allows) backing up&lt;/li&gt;
&lt;li&gt;Which restore operations are available (SIMPLE, FULL, and BULK_LOGGED recovery models)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Create a &lt;strong&gt;sample DB HR&lt;/strong&gt;, in it create &lt;strong&gt;Table People&lt;/strong&gt; and &lt;strong&gt;insert some values:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Create Database HR
CREATE DATABASE HR;

GO
-- swith the current databse to HR
USE HR;

-- Create Table Poeple in DB HR
CREATE TABLE People(
Id INT IDENTITY PRIMARY KEY,
FristName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
);

--Insert some values into Poeple Table
INSERT INTO People (FristName,LastName)
    Values('John', 'Doe'),
            ('Joan', 'Njeri'),
            ('Jane', 'M'),
            ('Kyle', 'G')
GO 
-- Query all items from Table People
SELECT * FROM People;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To view the recovery model of a database use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;USE master;

GO 
/** To view Recovery model for HR DB **/

SELECT name, recovery_model_desc

FROM master.sys.databases

ORDER BY name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgt92btt2m5oa3b4bprva.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgt92btt2m5oa3b4bprva.png" alt="Image description" width="514" height="178"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;NOTE:&lt;/strong&gt; It is possible to change the recovery model using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALTER DATABASE database_name 
SET RECOVERY recovery_model;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, let's try changing the recovery model from &lt;strong&gt;FULL&lt;/strong&gt; to &lt;strong&gt;SIMPLE&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GO 
-- Change Recovery model for HR Database from FULL to SIMPLE

ALTER DATABASE HR
SET RECOVERY SIMPLE;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2g80xphr1ol9l1rlzafq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2g80xphr1ol9l1rlzafq.png" alt="Image description" width="532" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Differences in Recovery Models
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. SIMPLE Recovery Model&lt;/strong&gt;&lt;br&gt;
SQL Server truncates the transaction log at every checkpoint. This model does not retain log records, making it impossible to use advanced backup strategies to minimize data loss.&lt;br&gt;
Thus, use this model only if the database can be reloaded from other sources, e.g. a database used for reporting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. FULL Recovery Model&lt;/strong&gt;&lt;br&gt;
Unlike the SIMPLE recovery model, under the FULL recovery model SQL Server &lt;em&gt;keeps the transaction log records until a BACKUP LOG statement is executed, which truncates them from the transaction log files&lt;/em&gt;.&lt;br&gt;
This means that if the BACKUP LOG statement is not run regularly, SQL Server keeps all the log records until the transaction log files fill up and the database becomes inaccessible.&lt;br&gt;
The FULL recovery model allows you to restore the database to any point in time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Point: &lt;em&gt;Schedule BACKUP LOG statement to run at regular intervals in cases of FULL Recovery Model&lt;/em&gt;.&lt;/strong&gt;&lt;/p&gt;
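
&lt;p&gt;A minimal sketch of such a log backup (the file path is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Back up the HR transaction log, truncating inactive log records
BACKUP LOG HR
TO DISK = 'C:\Backups\HR_log.trn';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;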

&lt;p&gt;&lt;strong&gt;3. BULK_LOGGED Recovery Model&lt;/strong&gt;&lt;br&gt;
It behaves almost like &lt;strong&gt;FULL&lt;/strong&gt; but is used for bulk operations such as a &lt;code&gt;BULK INSERT&lt;/code&gt; of flat files into a database, recording those operations minimally in the transaction log files. However, it does not allow you to restore the database to an arbitrary point in time. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bulk_logged recovery model scenario:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For a periodic bulk data load on a database that uses the FULL recovery model, set the recovery model to BULK_LOGGED&lt;/li&gt;
&lt;li&gt;Load the data into the DB&lt;/li&gt;
&lt;li&gt;After the data load completes, set the recovery model back to FULL&lt;/li&gt;
&lt;li&gt;Back up the database.
For more, visit &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/backup-restore/recovery-models-sql-server?view=sql-server-ver16" rel="noopener noreferrer"&gt;Recovery Models (SQL Server)&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
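
&lt;p&gt;The scenario above can be sketched as follows (file paths are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Switch to minimal logging for the bulk load
ALTER DATABASE HR SET RECOVERY BULK_LOGGED;

-- Load the data
BULK INSERT People
FROM 'C:\Data\people.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Restore full logging, then take a backup
ALTER DATABASE HR SET RECOVERY FULL;
BACKUP DATABASE HR TO DISK = 'C:\Backups\HR_full.bak';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;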

</description>
      <category>database</category>
      <category>programming</category>
      <category>sql</category>
    </item>
    <item>
      <title>Introduction to Data Structures and Algorithms</title>
      <dc:creator>Joan</dc:creator>
      <pubDate>Mon, 20 Jun 2022 19:20:34 +0000</pubDate>
      <link>https://dev.to/joanwanjiru/data-structures-101-introduction-to-data-structures-and-algorithms-2lhd</link>
      <guid>https://dev.to/joanwanjiru/data-structures-101-introduction-to-data-structures-and-algorithms-2lhd</guid>
      <description>&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Structures&lt;/strong&gt; &lt;br&gt;
A particular way of organizing, storing, and managing data to increase the efficiency (with respect to &lt;strong&gt;time&lt;/strong&gt; and &lt;strong&gt;memory&lt;/strong&gt;) of a program in a computer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Algorithms&lt;/strong&gt; &lt;br&gt;
A set of instructions to be executed in a certain way &lt;br&gt;
to get the desired output.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Classification of Data Structures
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Primitive Data Structures:&lt;/strong&gt; numbers and characters built into a program, meaning they can be manipulated by machine-level instructions. Ex. &lt;em&gt;integers, characters, Booleans&lt;/em&gt;. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Non-Primitive Data Structures:&lt;/strong&gt; derived from primitive data structures and thus cannot be manipulated directly by machine-level instructions. They form a set of data elements that is either homogeneous (same data types) or heterogeneous (different data types).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Next,&lt;br&gt;
Non-Primitive Data Structures are further divided into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Linear data structures&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Elements in a linear data structure maintain a linear relationship among them; although the data is arranged in linear form, its arrangement in memory may not be sequential.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ex. &lt;em&gt;Arrays&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Non-Linear data structures&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;In this kind of data structure, data elements form a hierarchical relationship among themselves.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ex. &lt;em&gt;Trees and graphs&lt;/em&gt;  &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        Classification of Data Structure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4cfzwnvx03p359vqfna.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4cfzwnvx03p359vqfna.png" alt="Classification of Data Structures" width="645" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Structures can be of two types:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Static Data Structures:&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The size of this type of structure is fixed, meaning data elements can be modified without changing the memory space allocated to it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;e.g. &lt;em&gt;Arrays&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Dynamic Data Structures:&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;This data structure allows the size of the allocated memory to change; the contents of the structure can be modified at runtime, during the operations performed on it. e.g. &lt;em&gt;Linked Lists&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Comparison between Static vs Dynamic Data Structures&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Static Data Structures&lt;/th&gt;
&lt;th&gt;Dynamic Data Structures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fixed memory size&lt;/td&gt;
&lt;td&gt;Size can be updated during run time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory allocation done prior to program execution&lt;/td&gt;
&lt;td&gt;Memory allocation done during program execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overflow cannot occur since memory allocation is fixed&lt;/td&gt;
&lt;td&gt;Overflow or underflow can occur since memory allocation is dynamic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>python</category>
      <category>algorithms</category>
    </item>
  </channel>
</rss>
