<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amoskinuthia</title>
    <description>The latest articles on DEV Community by Amoskinuthia (@amoskinuthia).</description>
    <link>https://dev.to/amoskinuthia</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F852643%2F5ba4254c-a898-4304-886f-31bb83b733b4.png</url>
      <title>DEV Community: Amoskinuthia</title>
      <link>https://dev.to/amoskinuthia</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amoskinuthia"/>
    <language>en</language>
    <item>
      <title>Data Engineering Toolbox</title>
      <dc:creator>Amoskinuthia</dc:creator>
      <pubDate>Fri, 16 Sep 2022 15:31:10 +0000</pubDate>
      <link>https://dev.to/amoskinuthia/data-engineering-toolbox-409e</link>
      <guid>https://dev.to/amoskinuthia/data-engineering-toolbox-409e</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Data engineering toolbox&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A data engineer is an IT professional responsible for ensuring that data is available in the right place, secure, and in the form required for analysis. The title reflects the work: designing systems and processes that collect data from diverse sources and move it to central storage, i.e. a data warehouse or a data lake. To do this, a data engineer needs several tools and technologies, and the tools and skills vary with the amount of data to be handled or processed. A data engineer must also work with other departments in the company to understand the requirements for the data, mostly with executives, data analysts, and data scientists. After understanding the kind of data the company needs, the engineer advises the company on the technology to deploy. In this post, I will discuss the tools an engineer can have in their toolbox, along with their use cases and options.&lt;br&gt;
&lt;strong&gt;Data engineering skill sets&lt;/strong&gt;&lt;br&gt;
To design systems and solutions for data, engineers must be equipped with software development skills. A development background enables them to use a wide variety of programming languages, and to learn new ones easily, in order to build the data pipelines through which data passes. In these pipelines, the data is transformed and put into the required form before being deposited in the data warehouse or data lake. It is impossible to master every available language, but a solid foundation is enough to learn a new technology on the go. The technologies below follow Jeff Hale's analysis of data engineering job listings in 2020.&lt;/p&gt;
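&lt;p&gt;The pipeline idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the function names (extract, transform, load) and the orders table are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.&lt;/p&gt;

```python
import sqlite3

# Hypothetical raw records, standing in for data collected from a source system.
raw_orders = [
    {"order_id": "1", "customer": " alice ", "amount": "19.99"},
    {"order_id": "2", "customer": "BOB", "amount": "5.00"},
    {"order_id": "3", "customer": "carol", "amount": ""},  # missing amount
]


def extract():
    """Extract: return the raw records (a real pipeline would read files, APIs or databases)."""
    return raw_orders


def transform(records):
    """Transform: drop incomplete rows and put the rest into a consistent form."""
    cleaned = []
    for record in records:
        if record["amount"] == "":
            continue  # skip rows that have no amount
        cleaned.append(
            (int(record["order_id"]), record["customer"].strip().title(), float(record["amount"]))
        )
    return cleaned


def load(rows, connection):
    """Load: write the cleaned rows into a table that stands in for the warehouse."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


warehouse = sqlite3.connect(":memory:")  # assumption: in-memory stand-in for a warehouse
load(transform(extract()), warehouse)
loaded_rows = warehouse.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded_rows)
```

&lt;p&gt;Keeping the three stages as separate functions is one common way to make each step easy to test and replace on its own.&lt;/p&gt;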

&lt;p&gt;&lt;strong&gt;SQL&lt;/strong&gt;&lt;br&gt;
Structured Query Language (SQL) is the language used to communicate with relational databases. &lt;br&gt;
Relational databases store related data, i.e. data organized according to predefined relationships and held in tables with rows and columns. SQL is often described as several sub-languages rather than one: data definition language (DDL) creates or modifies the database structure, e.g. with the CREATE and ALTER commands; data manipulation language (DML) lets users query and change data with commands like SELECT, UPDATE, and DELETE; and data control language (DCL) handles access control and security with commands like GRANT and REVOKE.&lt;br&gt;
SQL has several dialects that extend standard SQL with their own features. Some of these dialects include:&lt;br&gt;
PL/SQL – procedural language/SQL&lt;br&gt;
Transact-SQL&lt;br&gt;
PostgreSQL&lt;br&gt;
MySQL&lt;br&gt;
This is a must-have tool for all professionals working with data.&lt;/p&gt;
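&lt;p&gt;To make the DDL and DML split above more concrete, here is a small sketch that runs a few SQL statements through Python's built-in sqlite3 module. The employees table and its rows are made up for illustration, and SQLite is used only because it needs no server; other dialects differ in their details.&lt;/p&gt;

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define and then alter the structure of a (hypothetical) table.
connection.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
connection.execute("ALTER TABLE employees ADD COLUMN department TEXT")

# DML: add, change, remove and query rows.
connection.execute("INSERT INTO employees (name, department) VALUES ('Amina', 'Data')")
connection.execute("INSERT INTO employees (name, department) VALUES ('Brian', 'Sales')")
connection.execute("UPDATE employees SET department = 'Analytics' WHERE name = 'Amina'")
connection.execute("DELETE FROM employees WHERE name = 'Brian'")

remaining = connection.execute("SELECT id, name, department FROM employees").fetchall()
print(remaining)

# DCL (GRANT / REVOKE) is not shown: SQLite has no user accounts, so those
# commands only apply on server databases such as PostgreSQL or MySQL.
```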

&lt;p&gt;&lt;strong&gt;Python&lt;/strong&gt; &lt;br&gt;
Python is one of the most widely used languages in the data field. It is a general-purpose, high-level, interpreted programming language. Because it is high-level, it is easier to learn: beginners do not need to understand what happens under the hood when a program runs.&lt;br&gt;
Because it is general-purpose, it can be used in a wide variety of domains, such as web application development, automation, data engineering, data science, AI, machine learning, software development, and mobile applications.&lt;br&gt;
Because it is interpreted, it needs no separate compilation step: the interpreter runs the source code directly, statement by statement. The main reasons Python has become the de facto language for data are its simple syntax and the rich set of third-party libraries developed for data applications.&lt;br&gt;
&lt;strong&gt;NoSQL&lt;/strong&gt;&lt;br&gt;
NoSQL refers to an approach to storing and accessing data that is unstructured, unlike the data in relational databases. NoSQL data is modeled in forms other than the tables used in relational databases. NoSQL is preferred where high scalability and availability are required, especially in big data, where data grows continuously. Examples of NoSQL databases are MongoDB and Cassandra.&lt;br&gt;
&lt;strong&gt;Cloud platforms&lt;/strong&gt;&lt;br&gt;
Because of the huge amount of data generated each day, data engineers must be well versed in the cloud technologies available for storing data. There are many cloud platform providers, and mastering one or more of them is invaluable for a data engineer. Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform are the industry leaders.&lt;/p&gt;
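&lt;p&gt;Going back to the NoSQL section above, the sketch below hints at what "modeled in forms other than tables" can look like. It uses plain Python dictionaries as stand-ins for the JSON-like documents that a document store such as MongoDB keeps; no real database is involved, and the field names are invented for the example.&lt;/p&gt;

```python
import json

# Two hypothetical "documents" in the same collection; note that their fields differ,
# which a fixed table of rows and columns would not allow without empty columns.
users = [
    {"_id": 1, "name": "Amina", "email": "amina@example.com"},
    {"_id": 2, "name": "Brian", "skills": ["SQL", "Python"], "address": {"city": "Nairobi"}},
]

# A simple lookup by field, tolerating documents that lack the field.
with_skills = [user["name"] for user in users if "skills" in user]
print(with_skills)

# Documents serialize naturally to JSON, nested values included.
print(json.dumps(users[1], indent=2))
```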

&lt;p&gt;&lt;strong&gt;Open frameworks&lt;/strong&gt;&lt;br&gt;
There are several open frameworks used to work on big data. Mastering the following will keep you equipped for data engineering roles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Apache Spark&lt;/li&gt;
&lt;li&gt; Hadoop&lt;/li&gt;
&lt;li&gt; Kafka&lt;/li&gt;
&lt;li&gt; MapReduce&lt;/li&gt;
&lt;li&gt; Apache Hive&lt;/li&gt;
&lt;li&gt; Apache Airflow&lt;/li&gt;
&lt;li&gt; Apache Storm&lt;/li&gt;
&lt;li&gt; Apache SAMOA (Scalable Advanced Massive Online Analysis)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In conclusion, data engineering is an ever-growing and evolving field, and new tools appear all the time. To stay effective, one has to keep learning to remain up to date. The goal is to develop efficient systems that are stable and reliable in collecting and maintaining data. A solid foundation in the technologies above will keep you ahead, and you can always learn other tools on the go.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Data Engineering 101</title>
      <dc:creator>Amoskinuthia</dc:creator>
      <pubDate>Sun, 21 Aug 2022 16:51:22 +0000</pubDate>
      <link>https://dev.to/amoskinuthia/data-engineering-101-3911</link>
      <guid>https://dev.to/amoskinuthia/data-engineering-101-3911</guid>
      <description>&lt;p&gt;Data engineering is a very distinct role in the field of IT and is at the top of the ranks in the data roles stack since it lays the foundation for the rest of the professionals like; business analysts, data analysts and data scientists. Precisely data engineering is the process of preparing data for data analysis and general consumption. This done through complex processes that ensure that the needed datasets are collected, transported and stored in the right place and can be accessed by consumers in the right format. Therefore, a data engineer is responsible for deciding the data to be collected, how to collect it, how to transfer and store it securely in the right format.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Tools used in data engineering&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In the recent past, big data has grown immensely, and it is projected to grow even further. One widely cited projection is that by 2025 the world will be producing around 463 exabytes of data each day. An exabyte is 10^18 bytes, a 1 followed by 18 zeros! Data at this scale needs special skills and tools to be handled correctly. The commonly used tools and techniques include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Programming in languages such as Python, Java, R, and Scala. These languages are used to build tools such as data pipelines and to write automation scripts.&lt;/li&gt;
&lt;li&gt;Cloud computing technologies, which help store data in large amounts because they can scale as the data grows. Examples of providers include Microsoft Azure, Amazon Web Services, and Google Cloud Platform.&lt;/li&gt;
&lt;li&gt;Database management systems, both SQL and NoSQL, such as MySQL and MongoDB. These systems are used to store and manipulate data.&lt;/li&gt;
&lt;li&gt;Machine learning – though machine learning is mostly used by data scientists, data engineers need the skills to understand the needs of data scientists and serve them better.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Path to becoming a data engineer&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;A university degree&lt;/strong&gt;&lt;br&gt;
 A university degree in computer science, mathematics and statistics, physics, or any other related field is important but not mandatory. Employers look at the skill set and the ability to solve problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Certifications&lt;/strong&gt;&lt;br&gt;
There are a number of certification programs that test specific skills and award certificates. Most of these programs are taught by industry experts, and the certificates are an added advantage in the job market. Some examples of platforms offering certifications include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Udacity&lt;/li&gt;
&lt;li&gt; Coursera (which offers a number of them)&lt;/li&gt;
&lt;li&gt; edX&lt;/li&gt;
&lt;li&gt; Udemy&lt;/li&gt;
&lt;li&gt; Google&lt;/li&gt;
&lt;li&gt; Microsoft&lt;/li&gt;
&lt;li&gt; AWS&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Bootcamps and communities&lt;/strong&gt;&lt;br&gt;
These add value because they are made up of people practicing data engineering and related skills.&lt;/p&gt;

&lt;p&gt;After acquiring the necessary skills, what's next? To secure a job, one needs to build a personal brand as a data engineer. This can be done by creating an online portfolio of the projects one has undertaken and hosting it on a personal website or on GitHub.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Introduction to python for data engineering</title>
      <dc:creator>Amoskinuthia</dc:creator>
      <pubDate>Sun, 24 Apr 2022 14:22:10 +0000</pubDate>
      <link>https://dev.to/amoskinuthia/python-basics-101-32c3</link>
      <guid>https://dev.to/amoskinuthia/python-basics-101-32c3</guid>
      <description>&lt;p&gt;Do you want to build a website? an application? a video game&lt;br&gt;
name it or manipulate data, python got you covered. Python is a multi-purpose high level programming language that can help you develop or do almost anything.&lt;br&gt;
It is easily available for free and open source , it has gained popularity and it is being used by big organizations like Google, Disney and even NASA.&lt;br&gt;
Python has evolved and now we have python 3 which is the latest version. Python is famous among data handlers that is data engineers, data scientists and data analysts, this because it has a wide variety of libraries and modules like Pandas, numpy, scikit-learn,seaborn, matplotlib etc that make data preparation,data processing and data analysis efficient. data engineers use python to create data pipelines and to write script to automate data cleaning amongst other processes&lt;br&gt;
lets try it!&lt;br&gt;
lets try writing our 1st python program and see how simple it is&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print("hello world!")
# output: hello world!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The print function is followed by parentheses that enclose what we want displayed as our output.&lt;br&gt;
&lt;strong&gt;Python syntax&lt;/strong&gt; &lt;br&gt;
&lt;em&gt;whitespace and indentation&lt;/em&gt;&lt;br&gt;
Python uses newlines and indentation to define the structure of the code, unlike many other programming languages, which use semicolons to separate statements.&lt;br&gt;
&lt;em&gt;comments&lt;/em&gt;&lt;br&gt;
Comments help us document our code for future reference. &lt;br&gt;
There are several ways of writing comments in Python: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;using #
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# this is a comment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;It is mostly used for single-line comments.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;using triple quotes (""")
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;""" this is a multiline comment """
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;em&gt;identifiers&lt;/em&gt;&lt;br&gt;
Identifiers are the names used to identify variables, functions, and other objects in Python.&lt;br&gt;
An identifier must begin with a letter or an underscore (_).&lt;br&gt;
Python is case sensitive, so care should be taken when naming identifiers.&lt;br&gt;
Note that Python keywords cannot be used as identifiers.&lt;br&gt;
&lt;em&gt;keywords&lt;/em&gt;&lt;br&gt;
Keywords are words that have a special meaning in Python. Some of them are listed below:&lt;br&gt;
False&lt;br&gt;
class&lt;br&gt;
None&lt;br&gt;
True&lt;br&gt;
while&lt;br&gt;
raise&lt;br&gt;
You can use the code below to list all the keywords:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import keyword 
print(keyword.kwlist) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
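&lt;p&gt;As a small sketch of the identifier rules above, the snippet below checks a few candidate names using the built-in str.isidentifier() method together with keyword.iskeyword(). The candidate names themselves are made up for the example.&lt;/p&gt;

```python
import keyword

# Hypothetical names to test against the identifier rules.
candidates = ["total_sales", "_cache", "2nd_value", "class", "Total_Sales"]

for name in candidates:
    # A usable name must be a valid identifier and must not be a reserved keyword.
    valid = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "is usable" if valid else "is not usable")

usable = [name for name in candidates if name.isidentifier() and not keyword.iskeyword(name)]
```

&lt;p&gt;Because Python is case sensitive, total_sales and Total_Sales are both usable and refer to two different names.&lt;/p&gt;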



&lt;p&gt;&lt;strong&gt;Data types&lt;/strong&gt; &lt;br&gt;
strings &lt;br&gt;
integers&lt;br&gt;
floats&lt;br&gt;
&lt;strong&gt;Control flow&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Booleans and comparisons&lt;/em&gt;&lt;br&gt;
Booleans have two values: True and False. A comparison such as x == 5 evaluates to one of them.&lt;br&gt;
&lt;em&gt;if statements&lt;/em&gt;&lt;br&gt;
With an if statement, the condition is checked; if it is true, the indented statements are executed, otherwise they are skipped.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = 45
if x &amp;gt; 5:
    print("x is greater than 5")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
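&lt;p&gt;Going back to the data types and Booleans listed above, here is a minimal sketch that creates one value of each type and asks Python what it is. The variable names and values are arbitrary choices for the example.&lt;/p&gt;

```python
name = "Ada"              # string
count = 3                 # integer
price = 2.5               # float
is_match = price == 2.5   # a comparison produces a Boolean

# Print each value next to the name of its type.
for value in (name, count, price, is_match):
    print(repr(value), type(value).__name__)

# Mixing an integer and a float in arithmetic gives a float.
total = count * price
print(total)
```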



&lt;p&gt;&lt;em&gt;else statements&lt;/em&gt;&lt;br&gt;
The else statement is used to execute statements when the if condition is false.&lt;br&gt;
As with if statements, the code inside the else block needs to be indented.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = 4
if x == 5:
    print("yes")
else:
    print("no")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;while loops&lt;/em&gt;&lt;br&gt;
We use while loops to repeat a block of code for as long as a condition remains true.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
i = 1
while i &amp;lt;= 5:
    print(i)
    i = i + 1
print("finished!")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;break&lt;/em&gt;&lt;br&gt;
The break statement is used to exit a while loop early once a condition we are watching for is met.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;i = 0
while True:
    print(i)
    i = i + 1
    # the check must sit inside the loop body, otherwise the loop never ends
    if i &amp;gt;= 5:
        print("breaking")
        break
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
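&lt;p&gt;As one more illustration of break, the sketch below processes a list of readings and stops at a sentinel value. The list contents and the "STOP" marker are assumptions made up for this example, not part of any library.&lt;/p&gt;

```python
readings = ["12", "7", "STOP", "30"]  # hypothetical input; "STOP" marks the end

processed = []
position = 0
while True:
    if position == len(readings):
        break  # ran out of items without seeing the sentinel
    item = readings[position]
    if item == "STOP":
        print("breaking")
        break  # leave the loop; anything after the sentinel is ignored
    processed.append(int(item))
    position = position + 1

print(processed)
```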



</description>
    </item>
  </channel>
</rss>
