<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kasamba Lumwagi</title>
    <description>The latest articles on DEV Community by Kasamba Lumwagi (@kasambx).</description>
    <link>https://dev.to/kasambx</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F911910%2Ff3c7a7a2-7624-461c-9bca-1bab4999a1d3.jpeg</url>
      <title>DEV Community: Kasamba Lumwagi</title>
      <link>https://dev.to/kasambx</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kasambx"/>
    <language>en</language>
    <item>
      <title>Data Engineering 102: Introduction to Python for Data Engineering</title>
      <dc:creator>Kasamba Lumwagi</dc:creator>
      <pubDate>Mon, 29 Aug 2022 20:07:00 +0000</pubDate>
      <link>https://dev.to/kasambx/data-engineering-101-introduction-to-python-for-data-engineering-410b</link>
      <guid>https://dev.to/kasambx/data-engineering-101-introduction-to-python-for-data-engineering-410b</guid>
      <description>&lt;p&gt;Commit to learning the basics of Python, specifically the areas related to data engineering.&lt;br&gt;
You will need Python installed on the operating system you are using, be it Linux, macOS, or the most common, Windows. These are the topics to put the most emphasis on:&lt;/p&gt;

&lt;p&gt;1.&lt;strong&gt;&lt;em&gt;MATH EXPRESSIONS&lt;/em&gt;&lt;/strong&gt; &lt;/p&gt;

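&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Syntax&lt;/th&gt;&lt;th&gt;Math&lt;/th&gt;&lt;th&gt;Meaning&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;a+b&lt;/td&gt;&lt;td&gt;a + b&lt;/td&gt;&lt;td&gt;addition&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;a-b&lt;/td&gt;&lt;td&gt;a − b&lt;/td&gt;&lt;td&gt;subtraction&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;a*b&lt;/td&gt;&lt;td&gt;a × b&lt;/td&gt;&lt;td&gt;multiplication&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;a/b&lt;/td&gt;&lt;td&gt;a ÷ b&lt;/td&gt;&lt;td&gt;division (always returns a float in Python 3)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;a//b&lt;/td&gt;&lt;td&gt;⌊a ÷ b⌋&lt;/td&gt;&lt;td&gt;floor division (Python 2.2 and above)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;a%b&lt;/td&gt;&lt;td&gt;a mod b&lt;/td&gt;&lt;td&gt;modulo&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;-a&lt;/td&gt;&lt;td&gt;−a&lt;/td&gt;&lt;td&gt;negation&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;abs(a)&lt;/td&gt;&lt;td&gt;|a|&lt;/td&gt;&lt;td&gt;absolute value&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;a**b&lt;/td&gt;&lt;td&gt;a&lt;sup&gt;b&lt;/sup&gt;&lt;/td&gt;&lt;td&gt;exponent&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;math.sqrt(a)&lt;/td&gt;&lt;td&gt;√a&lt;/td&gt;&lt;td&gt;square root (requires import math)&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;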
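&lt;p&gt;As a minimal sketch of the operators listed above, the snippet below applies them to two sample values. The variable names and the numbers 7 and 2 are chosen purely for illustration.&lt;/p&gt;

```python
import math

a, b = 7, 2

quotient = a / b        # true division always returns a float
floored = a // b        # floor division rounds down to a whole number
remainder = a % b       # modulo gives the remainder
power = a ** b          # exponentiation
root = math.sqrt(16)    # square root, returned as a float

print(quotient, floored, remainder, power, root)
```

&lt;p&gt;Note the difference between the two division operators: one keeps the fractional part, the other discards it.&lt;/p&gt;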

&lt;p&gt;2.&lt;strong&gt;&lt;em&gt;Strings&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Strings in Python are surrounded by either single or double quotation marks.&lt;br&gt;
For example:&lt;/p&gt;

&lt;p&gt;'hello' is the same as "hello".&lt;/p&gt;

&lt;p&gt;You can display a string literal with the print() function:&lt;/p&gt;
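&lt;p&gt;A short sketch of both points: the two quoting styles give the same value, and print() displays a literal. The variable names here are illustrative only.&lt;/p&gt;

```python
single = 'hello'
double = "hello"

# Both quoting styles produce the same string value.
same = single == double

print("hello")   # display a string literal
print(same)
```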

&lt;p&gt;For more info:&lt;br&gt;
&lt;a href="https://www.w3schools.com/python/python_strings.asp"&gt;W3Schools: Python Strings&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3.&lt;strong&gt;&lt;em&gt;Variables&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Variables are containers for storing data values.&lt;/p&gt;

&lt;p&gt;A variable is created the moment you first assign a value to it.&lt;/p&gt;

&lt;p&gt;x = 5&lt;br&gt;
y = "John"&lt;br&gt;
print(x)&lt;br&gt;
print(y)&lt;/p&gt;

&lt;p&gt;For more info:&lt;br&gt;
&lt;a href="https://www.w3schools.com/python/python_variables.asp"&gt;W3Schools: Python Variables&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;4.&lt;strong&gt;&lt;em&gt;Loops&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Python provides three ways to execute loops: the while loop, the for in loop, and nested loops.&lt;/p&gt;

&lt;p&gt;(a) &lt;strong&gt;&lt;em&gt;While Loop:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
In Python, a while loop executes a block of statements repeatedly as long as a given condition is true. When the condition becomes false, the line immediately after the loop is executed.&lt;/p&gt;

&lt;p&gt;while expression:&lt;br&gt;
    statement(s)&lt;/p&gt;
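&lt;p&gt;A minimal sketch of the syntax above, summing the numbers 1 to 5. The names count and total are illustrative.&lt;/p&gt;

```python
count = 0
total = 0

# The loop body runs as long as the condition is true.
while count != 5:
    count += 1
    total += count

# This line runs once the condition becomes false.
print(total)
```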

&lt;p&gt;(b) &lt;strong&gt;&lt;em&gt;For in Loop:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
For loops are used for sequential traversal, for example traversing a list, a string, or an array. Python has no C-style for loop, i.e., for (i=0; i&amp;lt;n; i++). Instead it has a “for in” loop, which is similar to the for-each loop in other languages. The syntax for sequential traversal is:&lt;/p&gt;

&lt;p&gt;for iterator_var in sequence:&lt;br&gt;
    statement(s)&lt;/p&gt;
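&lt;p&gt;As a sketch of sequential traversal, the loop below walks a small list. The list contents are made up for illustration, and the second loop shows one common way to get an index when you need it.&lt;/p&gt;

```python
languages = ["Python", "SQL", "Scala"]
lengths = []

# Visit each element of the list in order.
for name in languages:
    lengths.append(len(name))

# enumerate() supplies the index that a C-style loop would track by hand.
for index, name in enumerate(languages):
    print(index, name)

print(lengths)
```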

&lt;p&gt;(c) &lt;strong&gt;&lt;em&gt;Nested Loops:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Python allows you to use one loop inside another loop.&lt;br&gt;
Syntax:&lt;/p&gt;

&lt;p&gt;while expression:&lt;br&gt;
   while expression:&lt;br&gt;
      statement(s)&lt;br&gt;
   statement(s)&lt;/p&gt;
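&lt;p&gt;The syntax above nests two while loops, but the same idea works with for loops. Here is a minimal sketch, assuming a small made-up list of rows, where the outer loop picks a row and the inner loop visits each value in it.&lt;/p&gt;

```python
rows = [[1, 2, 3], [4, 5, 6]]
flat = []

# The outer loop walks the rows; the inner loop walks each value in a row.
for row in rows:
    for value in row:
        flat.append(value * 10)

print(flat)
```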

&lt;p&gt;5.&lt;strong&gt;&lt;em&gt;Functions.&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
A function is a block of code which only runs when it is called.&lt;/p&gt;

&lt;p&gt;The basic syntax is:&lt;br&gt;
 def my_function():&lt;br&gt;
   print("Hello from a function")&lt;/p&gt;
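&lt;p&gt;Functions usually take arguments and return a value. The sketch below uses a hypothetical greet function with one required parameter and one default parameter, just to show the calling pattern.&lt;/p&gt;

```python
def greet(name, greeting="Hello"):
    """Return a greeting for the given name."""
    return f"{greeting}, {name}!"


# The function body only runs when the function is called.
message = greet("data engineer")
print(message)
print(greet("data engineer", greeting="Welcome"))
```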

&lt;p&gt;6.&lt;strong&gt;&lt;em&gt;Lists, Tuples, Dictionaries and Sets&lt;/em&gt;&lt;/strong&gt; &lt;/p&gt;
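&lt;p&gt;A quick sketch of the four collection types side by side. The values are invented for illustration, and the comments summarize the main property of each type.&lt;/p&gt;

```python
tools = ["pandas", "numpy", "pandas"]        # list: ordered, mutable, allows duplicates
point = (3, 4)                               # tuple: ordered, immutable
ports = {"postgres": 5432, "mysql": 3306}    # dictionary: key-value pairs
unique_tools = set(tools)                    # set: unordered, no duplicates

tools.append("boto3")
print(len(tools), point[0], ports["postgres"], sorted(unique_tools))
```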

&lt;p&gt;It's also important to learn how to connect to databases and cloud services with:&lt;br&gt;
1.Boto3 (the AWS SDK for Python)&lt;br&gt;
2.Psycopg2 (a PostgreSQL adapter)&lt;br&gt;
3.MySQL&lt;/p&gt;

&lt;p&gt;Together with the following essentials:&lt;br&gt;
1). JSON&lt;br&gt;
2). JSONSchema&lt;br&gt;
3). datetime&lt;br&gt;
4). Pandas&lt;br&gt;
5). NumPy&lt;/p&gt;
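&lt;p&gt;Two of these essentials, json and datetime, ship with Python's standard library. As a minimal sketch, the snippet below parses a made-up event record and turns its timestamp string into a datetime object. The record and its field names are assumptions for illustration.&lt;/p&gt;

```python
import json
from datetime import datetime

raw = '{"event": "signup", "timestamp": "2022-08-29T20:07:00"}'

record = json.loads(raw)                              # JSON text to a dict
when = datetime.fromisoformat(record["timestamp"])    # ISO string to datetime
record["year"] = when.year                            # derive a new field

print(json.dumps(record, indent=2))                   # dict back to JSON text
```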

</description>
    </item>
    <item>
      <title>Data Engineering 101: Introduction to Data Engineering</title>
      <dc:creator>Kasamba Lumwagi</dc:creator>
      <pubDate>Sat, 20 Aug 2022 19:41:00 +0000</pubDate>
      <link>https://dev.to/kasambx/data-engineering-101-introduction-to-data-engineering-2d95</link>
      <guid>https://dev.to/kasambx/data-engineering-101-introduction-to-data-engineering-2d95</guid>
      <description>&lt;p&gt;I used to think data engineering and data science were more or less the same field. That changed this week. Data engineering entails making quality data available from various sources: maintaining databases, building data pipelines, querying data, preprocessing data, feature engineering, working with Apache Hadoop and Spark, and developing data workflows using Airflow. Data science, on the other hand, is about building ML algorithms, building and deploying data and ML models, applying statistical and mathematical knowledge, and measuring, optimizing, and improving results.&lt;/p&gt;

&lt;p&gt;Week one is done, and I now have the basic layout of the topics we will be tackling. It feels manageable because I have a clear path to becoming a data engineer. The following tools are very useful in data engineering:&lt;br&gt;
1.Cloud&lt;br&gt;
(AWS, Azure, GCP) Master one, but get a good grasp of all three. AWS is the preferred choice because it is widely used. The cloud also serves as the development environment for learning to build data engineering applications on GCP, AWS, and Microsoft Azure.&lt;/p&gt;

&lt;p&gt;2.Programming Language&lt;br&gt;
I recommend Python: it is quick to write and has libraries and frameworks that are well suited to data engineering tasks.&lt;/p&gt;

&lt;p&gt;3.SQL (Structured Query Language)&lt;br&gt;
SQL makes it easy to query and manipulate the data in relational databases.&lt;/p&gt;
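&lt;p&gt;To give a feel for SQL without installing a database server, here is a minimal sketch that runs a few statements through Python's built-in sqlite3 module. The users table and its rows are invented for illustration.&lt;/p&gt;

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO users (name, country) VALUES (?, ?)",
    [("Amina", "KE"), ("Brian", "UG"), ("Chen", "KE")],
)

# Count users per country, ordered by country code.
rows = conn.execute(
    "SELECT country, COUNT(*) FROM users GROUP BY country ORDER BY country"
).fetchall()
conn.close()

print(rows)
```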

&lt;p&gt;4.A Text Editor&lt;br&gt;
For example, Visual Studio Code.&lt;/p&gt;

&lt;p&gt;5.Anaconda &lt;/p&gt;

&lt;p&gt;Anaconda is a distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment.&lt;/p&gt;

&lt;p&gt;6.Hadoop&lt;br&gt;
It is an open-source framework that provides a distributed file system for big data sets. This allows users to process and transform big data sets into useful information using the MapReduce Programming Model of data processing.&lt;/p&gt;
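&lt;p&gt;The MapReduce model itself can be sketched in a few lines of plain Python. This is not Hadoop's API, only an illustration of the map, shuffle, and reduce phases on a made-up word count; Hadoop runs the same phases across many machines.&lt;/p&gt;

```python
from collections import defaultdict

documents = ["big data big pipelines", "data pipelines move data"]

# Map: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the emitted values by key.
grouped = defaultdict(list)
for word, one in mapped:
    grouped[word].append(one)

# Reduce: combine each key's values into a single count.
counts = {word: sum(ones) for word, ones in grouped.items()}

print(counts)
```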

&lt;p&gt;7.PySpark&lt;br&gt;
PySpark is the Python API for Apache Spark, created by the Apache Spark community. It lets us work with RDDs (Resilient Distributed Datasets) and DataFrames in Python. When dealing with huge amounts of data, PySpark offers fast, real-time processing, flexibility, in-memory computation, and various other features. It combines the simplicity of the Python language with the efficiency of Spark.&lt;/p&gt;

&lt;p&gt;THE TOPICS TO BE COVERED ARE:&lt;br&gt;
1). Data Engineering&lt;br&gt;
-What’s Data Engineering&lt;br&gt;
-Why Data Engineering&lt;br&gt;
-Data Engineers vs. ML Engineers vs. Data Scientists&lt;/p&gt;

&lt;p&gt;2). Python for Data Engineering&lt;br&gt;
-Basic Python with Project&lt;br&gt;
-Advanced Python with Project&lt;br&gt;
-Techniques and Optimization  &lt;/p&gt;

&lt;p&gt;3). Scripting and Automation&lt;br&gt;
-Shell Scripting&lt;br&gt;
-CRON&lt;br&gt;
-ETL&lt;/p&gt;

&lt;p&gt;4). Relational Databases and SQL&lt;br&gt;
-RDBMS&lt;br&gt;
-Data Modeling&lt;br&gt;
-Basic SQL&lt;br&gt;
-Advanced SQL&lt;br&gt;
-BigQuery&lt;/p&gt;

&lt;p&gt;5). NoSQL Databases and MapReduce&lt;br&gt;
-Unstructured Data&lt;br&gt;
-Advanced ETL&lt;br&gt;
-MapReduce&lt;br&gt;
-Data Warehouses&lt;br&gt;
-Data API&lt;/p&gt;

&lt;p&gt;6). Data Analysis&lt;br&gt;
-Pandas&lt;br&gt;
-Numpy&lt;br&gt;
-Web Scraping&lt;br&gt;
-Data Visualization&lt;/p&gt;

&lt;p&gt;7). Data Processing Techniques&lt;br&gt;
-Batch Processing: Apache Spark&lt;br&gt;
-Stream Processing: Spark Streaming&lt;br&gt;
-Build Data Pipelines&lt;br&gt;
-Target Databases&lt;br&gt;
-Machine Learning Algorithms&lt;/p&gt;

&lt;p&gt;8). Big Data&lt;br&gt;
-Big data basics&lt;br&gt;
-HDFS in detail&lt;br&gt;
-Hadoop YARN&lt;br&gt;
-Sqoop&lt;br&gt;
-Hive&lt;br&gt;
-Pig&lt;br&gt;
-HBase&lt;/p&gt;

&lt;p&gt;9). Workflows&lt;br&gt;
-Introduction to Airflow&lt;br&gt;
-Airflow hands-on project&lt;/p&gt;

&lt;p&gt;10). Infrastructure&lt;br&gt;
-Docker&lt;br&gt;
-Kubernetes&lt;br&gt;
-Business Intelligence&lt;/p&gt;

&lt;p&gt;11). Cloud Computing&lt;br&gt;
-AWS&lt;br&gt;
-Google Cloud Platform&lt;br&gt;
-Microsoft Azure &lt;/p&gt;

&lt;p&gt;THE FIRE KEEPS ON BURNING.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
