<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kimeu</title>
    <description>The latest articles on DEV Community by Kimeu (@kimeu22).</description>
    <link>https://dev.to/kimeu22</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F918505%2F413f40da-6663-47c9-b978-d35869214580.jpg</url>
      <title>DEV Community: Kimeu</title>
      <link>https://dev.to/kimeu22</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kimeu22"/>
    <language>en</language>
    <item>
      <title>Introduction to Data Engineering</title>
      <dc:creator>Kimeu</dc:creator>
      <pubDate>Sun, 04 Sep 2022 16:36:26 +0000</pubDate>
      <link>https://dev.to/kimeu22/introduction-to-data-engineering-5fgm</link>
      <guid>https://dev.to/kimeu22/introduction-to-data-engineering-5fgm</guid>
      <description>&lt;p&gt;Data engineering is the discipline of collecting, transforming, and validating data so that it is ready for analysis. A good data engineer makes quality data available for analysis and data-driven decision making. There are four areas a data engineer should be well versed in:&lt;br&gt;
Data. Data comes in different file formats, for example CSV, TSV, and JSON.&lt;br&gt;
Data stores and repositories. These include relational and non-relational databases, data lakes, and data warehouses.&lt;br&gt;
Data pipelines. Building them entails collecting data from different sources.&lt;br&gt;
Analytics and data-driven decision making.&lt;br&gt;
Python is a popular programming language for data engineering because it has a wide variety of packages that are easy to import and that help with data wrangling, ETL (Extract, Transform, Load), and feature engineering.&lt;/p&gt;
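&lt;p&gt;As a minimal sketch of the ETL idea mentioned above, the snippet below extracts rows from CSV text, transforms one field, and loads the result as JSON. The sample data and the function names extract, transform, and load are hypothetical and chosen only for illustration. It uses the Python standard library alone.&lt;/p&gt;

```python
import csv
import io
import json

# Hypothetical CSV input; a real pipeline would read from a file or an API.
RAW_CSV = "name,age\nAmina,31\nBrian,27\n"


def extract(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Cast the age field from text to an integer."""
    return [{"name": row["name"], "age": int(row["age"])} for row in rows]


def load(rows):
    """Serialize the cleaned rows as JSON, standing in for a data store."""
    return json.dumps(rows)


result = load(transform(extract(RAW_CSV)))
print(result)
```

&lt;p&gt;Keeping each stage in its own function makes it easier to swap the source or the destination later without touching the other steps.&lt;/p&gt;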

</description>
      <category>python</category>
    </item>
    <item>
      <title>Introduction to Python for Data Engineering</title>
      <dc:creator>Kimeu</dc:creator>
      <pubDate>Wed, 31 Aug 2022 10:57:09 +0000</pubDate>
      <link>https://dev.to/kimeu22/introduction-to-python-for-data-engineering-55n1</link>
      <guid>https://dev.to/kimeu22/introduction-to-python-for-data-engineering-55n1</guid>
      <description>&lt;p&gt;Python is one of the best programming languages for data analysis thanks to packages such as pandas and NumPy, which make working with data efficient.&lt;br&gt;
To become an expert in data engineering, one needs knowledge of both software development and data analysis.&lt;br&gt;
Python works well for data analysis because Python code can be run interactively in a Jupyter notebook.&lt;br&gt;
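For example, to change the data type of a pandas column to integer, assign the result of astype back to the column, because astype returns a new object rather than modifying the column in place:&lt;br&gt;
df['colName'] = df['colName'].astype(int)&lt;br&gt;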
Jupyter notebook is an app that makes data analysis easier: you can import packages and run operations on a collected dataset interactively to draw meaning from it.&lt;br&gt;
One has to understand how pandas data types differ from Python data types.&lt;br&gt;
For example, pandas has traditionally reported a column of strings with the object dtype, while plain Python reports a string as str.&lt;br&gt;
During data collection, prefer an API over web scraping where one is available. With web scraping, the underlying HTML structure can change, so the same code may no longer reproduce the same dataset.&lt;br&gt;
To install Python packages with pip, use "pip install package-name". To install packages in a conda environment, use "conda install package-name".&lt;/p&gt;
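&lt;p&gt;The astype conversion mentioned above can be tried end to end with a small sketch. The sample values and the label column are made up for illustration, and the snippet assumes pandas is installed. Note that astype returns a new Series, so the result is assigned back to the column.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical sample data; the column name "colName" follows the post.
df = pd.DataFrame({"colName": ["1", "2", "3"], "label": ["a", "b", "c"]})

# Record the dtype pandas chose for the text column before converting.
dtype_before = df["colName"].dtype

# astype returns a new Series, so assign it back to keep the change.
df["colName"] = df["colName"].astype(int)

dtype_after = df["colName"].dtype
total = int(df["colName"].sum())

print(dtype_before, dtype_after, total)
```

&lt;p&gt;Printing the dtype before and after is a quick way to confirm that the column really holds integers before doing arithmetic on it.&lt;/p&gt;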

</description>
      <category>datascience</category>
      <category>jupyter</category>
      <category>python</category>
    </item>
  </channel>
</rss>
