<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Haji Rufai</title>
    <description>The latest articles on DEV Community by Haji Rufai (@thyalpha001).</description>
    <link>https://dev.to/thyalpha001</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F912296%2F8007456a-d83a-4495-b748-48eaa0f94666.png</url>
      <title>DEV Community: Haji Rufai</title>
      <link>https://dev.to/thyalpha001</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thyalpha001"/>
    <language>en</language>
    <item>
      <title>Introduction to Python for Data Engineering</title>
      <dc:creator>Haji Rufai</dc:creator>
      <pubDate>Thu, 01 Sep 2022 17:40:10 +0000</pubDate>
      <link>https://dev.to/thyalpha001/introduction-to-python-for-data-engineering-2pjp</link>
      <guid>https://dev.to/thyalpha001/introduction-to-python-for-data-engineering-2pjp</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rleY6kh7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gyngv0eo41hkzhk0tdy4.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rleY6kh7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gyngv0eo41hkzhk0tdy4.PNG" alt="Chart on Python basics for data engineering" width="547" height="391"&gt;&lt;/a&gt;&lt;br&gt;
&lt;b&gt;Hello!&lt;/b&gt; As organizations' interest in data engineering expertise grows, so does the demand for data engineers. Python is one of the main pillars of &lt;a href="https://dev.to/thyalpha001/101-data-engineering-3bpo"&gt;data engineering&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So, what is Python? Why is it preferred for data engineering? And, most importantly, what is the &lt;strong&gt;scope&lt;/strong&gt; a data engineer needs, and how do you get started?&lt;/p&gt;







&lt;h1&gt;
  
  
  What is Python?
&lt;/h1&gt;

&lt;p&gt;Python is a high-level, dynamically typed, general-purpose programming language. Its readable syntax makes it comparatively easy to learn and understand. &lt;/p&gt;

&lt;p&gt;Python's use has grown thanks to its ease of use and flexibility. You'll get a nice introduction to Python &lt;a href="https://dev.to/seniorcitizen/introduction-to-python-456f"&gt;here&lt;/a&gt;.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Python for data engineering?
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A data engineer's job entails working with various data formats, and Python is well suited to this. Its standard library makes these formats simple to handle. One of the most popular data file types is the CSV file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A data engineer is often required to use APIs to retrieve data from databases and other services. The data in such cases is usually in JSON (JavaScript Object Notation) format, and Python's standard library includes the &lt;code&gt;json&lt;/code&gt; module to handle this type of data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The responsibility of a data engineer is not only to obtain data from different sources but also to process it. One of the most popular data processing engines is Apache Spark, which offers a Python API, PySpark, with its own DataFrame abstraction for building scalable big data projects.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data engineering tools like Apache Airflow and Apache NiFi organise work as graphs of tasks. In Airflow, these directed acyclic graphs (DAGs) are task specifications written in Python. Learning Python therefore helps data engineers make better use of such tools.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Last but not least, Python has many libraries that data engineers find useful:&lt;/p&gt;
&lt;h2&gt;
  
  
  Some Python libraries for data engineering
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Pandas
&lt;/h3&gt;

&lt;p&gt;Pandas is a Python library popular among data analysts and data scientists. It is equally useful for data engineers, who use it for reading, writing, querying, and manipulating data. Pandas DataFrames work well with two popular data formats: CSV and JSON.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. psycopg2, pyodbc, SQLAlchemy
&lt;/h3&gt;

&lt;p&gt;When people hear the word "database," they often picture information kept in tables with rows and columns. This kind of database is called a relational database (RDB).&lt;br&gt;
There are many ways to communicate with these databases, and the majority of them rely on Structured Query Language (SQL). PostgreSQL is one such database that is well liked by data engineers, and Python has a number of libraries to connect to it, including pyodbc, SQLAlchemy, and psycopg2.&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Scientific Python (SciPy)
&lt;/h3&gt;

&lt;p&gt;As its name suggests, SciPy is a Python library that provides a number of functions for rapid mathematical operations. This library allows a data engineer to do mathematical computations on their data for more accurate analysis.&lt;/p&gt;
&lt;h3&gt;
  
  
  4. BeautifulSoup
&lt;/h3&gt;

&lt;p&gt;This well-known library is used for web scraping and data mining. Data engineers use it to extract information from websites by parsing HTML and XML documents when preparing their data.&lt;/p&gt;
&lt;h3&gt;
  
  
  5. Petl
&lt;/h3&gt;

&lt;p&gt;Petl is a Python package for extracting, transforming, and loading tabular data. Data engineers use this library for building ETL (Extract, Transform, and Load) pipelines.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
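
&lt;p&gt;As a minimal sketch of the first two points above, the snippet below reads CSV text with the standard &lt;code&gt;csv&lt;/code&gt; module and serialises the rows with the standard &lt;code&gt;json&lt;/code&gt; module. The sample data and the function names &lt;code&gt;csv_to_records&lt;/code&gt; and &lt;code&gt;records_to_json&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
import csv
import io
import json

# Hypothetical sample data standing in for a real CSV file.
CSV_TEXT = "name,role\nAmina,data engineer\nBrian,analyst\n"


def csv_to_records(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def records_to_json(records):
    """Serialise the records as a JSON string."""
    return json.dumps(records)


records = csv_to_records(CSV_TEXT)
payload = records_to_json(records)
print(payload)
```

&lt;p&gt;With a real file you would pass an opened file object to &lt;code&gt;csv.DictReader&lt;/code&gt; instead of the in-memory text used here.&lt;/p&gt;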




&lt;h1&gt;
  
  
  Scope (and let's get started)
&lt;/h1&gt;

&lt;p&gt;Python is a general-purpose programming language that is used in many fields: web development, automation, networking, you name it.&lt;/p&gt;

&lt;p&gt;A data engineer does not need to master every area of Python, because each is a large field of its own and a separate journey that is not on our roadmap. &lt;/p&gt;

&lt;p&gt;For example, a data engineer does not need to go deep into Python for web development (Flask and Django) or machine learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  0. Getting started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Install Anaconda on your machine
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.anaconda.com/products/distribution"&gt;Anaconda/download&lt;/a&gt;&lt;br&gt;
No compromising here.&lt;/p&gt;

&lt;p&gt;Anaconda is a free and open source distribution that bundles the packages and tools, such as Jupyter, that you'll need.&lt;/p&gt;

&lt;p&gt;After installing Anaconda you'll automatically have Jupyter Notebook installed. It is a great Python environment that saves your source files as .ipynb files.&lt;/p&gt;

&lt;p&gt;Jupyter runs in your browser:&lt;/p&gt;

&lt;h4&gt;
  
  
  Illustration
&lt;/h4&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--654uDWuD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/76b8za2u6jz8s3rw11wd.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--654uDWuD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/76b8za2u6jz8s3rw11wd.PNG" alt="Jupyter notebook demo appearance" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
The above image shows how Jupyter will look on your machine when opened. You can navigate to the folder where you want to create your .ipynb file. &lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eXNoA_kM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pd7jqnw7mzka34ejkuva.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eXNoA_kM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pd7jqnw7mzka34ejkuva.PNG" alt="Jupyter notebook outlook" width="800" height="354"&gt;&lt;/a&gt;&lt;br&gt;
When you are in your desired directory, click &lt;em&gt;'New'&lt;/em&gt; and then select &lt;em&gt;'Python 3 (ipykernel)'&lt;/em&gt; to open a new .ipynb file. &lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Psaw_ZTS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cs80wr1753wug5yb4d9b.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Psaw_ZTS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cs80wr1753wug5yb4d9b.PNG" alt="Jupyter notebook rename illustration" width="800" height="293"&gt;&lt;/a&gt;&lt;br&gt;
This opens an untitled .ipynb file (notebook file), which will look like the above picture.&lt;br&gt;
You can rename your notebook file by clicking on 'Untitled' as shown. &lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b6YV5EU5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cfp8qsdfr89t757cb7ma.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b6YV5EU5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cfp8qsdfr89t757cb7ma.PNG" alt="Working with Jupyter notebook" width="800" height="287"&gt;&lt;/a&gt;&lt;br&gt;
Start coding there! Press 'Shift + Enter' to run a cell (the rectangular input field for your code).&lt;br&gt;
All the best.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Python basics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Where to Learn
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://learnpython.org/"&gt;learnpython.org&lt;/a&gt;&lt;br&gt;
It is a nice, interactive, beginner-friendly website for the Python language. The topics are arranged in order, and each topic ends with a coding exercise that tests whether you have mastered it.&lt;/p&gt;

&lt;p&gt;The good part (not the lazy part): there are solutions to all the exercises!&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Data structures and algorithms
&lt;/h2&gt;

&lt;p&gt;Learning data structures and algorithms is mandatory for a good data engineer, and it will also make you a better programmer. These concepts should be in your RAM!&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to Learn
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.udacity.com/course/data-structures-and-algorithms-in-python--ud513"&gt;Google/free/Udacity/data structures and algorithms&lt;/a&gt;&lt;br&gt;
This comprehensive course will give you an in-depth grasp of data structures and algorithms. The good part is that it is taught in Python. And yes, it is free!&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Python Statistics
&lt;/h2&gt;

&lt;p&gt;A data engineer needs a foundation in the mathematics of data and should have a grounding in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Descriptive and inferential statistics. &lt;/li&gt;
&lt;li&gt;Probability distributions&lt;/li&gt;
&lt;li&gt;Hypothesis testing&lt;/li&gt;
&lt;/ol&gt;
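
&lt;p&gt;As a small taste of descriptive statistics, here is a sketch using Python's built-in &lt;code&gt;statistics&lt;/code&gt; module. The list &lt;code&gt;row_counts&lt;/code&gt; is invented sample data, not taken from a real pipeline.&lt;/p&gt;

```python
import statistics

# Hypothetical daily row counts from a pipeline run log.
row_counts = [120, 135, 128, 150, 142]

mean_rows = statistics.mean(row_counts)      # arithmetic mean
median_rows = statistics.median(row_counts)  # middle value when sorted
stdev_rows = statistics.stdev(row_counts)    # sample standard deviation

print(mean_rows, median_rows, round(stdev_rows, 2))
```

&lt;p&gt;The mean and median describe the centre of the data, while the standard deviation describes how widely the values spread around it.&lt;/p&gt;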

&lt;h3&gt;
  
  
  Where to learn:
&lt;/h3&gt;

&lt;p&gt;Resource: &lt;a href="https://www.fd.cvut.cz/department/k611/PEDAGOG/THO_A/A_soubory/statistics_firstfive.pdf"&gt;Brief/Comprehensive/pdf&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Python Developer
&lt;/h2&gt;

&lt;p&gt;You have been coding for a while. Now you need to learn how to write clean code.&lt;/p&gt;

&lt;p&gt;That is where Python Enhancement Proposal 8 (PEP 8) comes in. It is a document written in 2001 by Guido van Rossum (the creator of Python), Barry Warsaw, and Nick Coghlan. &lt;/p&gt;

&lt;p&gt;The primary focus of PEP 8 is to improve the readability and consistency of Python code.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Code is read more often than it is written.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;Guido van Rossum, the creator of the Python programming language.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to learn
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://peps.python.org/pep-0008/"&gt;Official/Documenation&lt;/a&gt;&lt;br&gt;
&lt;a href="https://realpython.com/python-pep8/"&gt;realpython.com/pep8&lt;/a&gt;&lt;br&gt;
Yes you need to learn how to write clean code. You will need to how to proper document your functions, methods, etc&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;em&gt;Micro Illustration&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--131Oqjvl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sksogg5jtksitk9hgzmr.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--131Oqjvl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sksogg5jtksitk9hgzmr.PNG" alt="Properly writing assignment code example" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;









&lt;h1&gt;
  
  
  CONCLUSION
&lt;/h1&gt;

&lt;p&gt;For the reasons above, Python is a strong first-choice programming language for a data engineer. Congratulations on making it this far!&lt;/p&gt;

&lt;h1&gt;
  
  
  REFERENCES
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://explore-datascience.net/"&gt;https://explore-datascience.net/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.projectpro.io/article/python-for-data-engineering/592"&gt;https://www.projectpro.io/article/python-for-data-engineering/592&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.youtube.com/c/DataEngUncomplicated"&gt;https://www.youtube.com/c/DataEngUncomplicated&lt;/a&gt;&lt;/p&gt;







&lt;p&gt;&lt;em&gt;&lt;strong&gt;So, have you started your journey yet?&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>dataengineering</category>
      <category>pythonfordataengineering</category>
      <category>beginners</category>
    </item>
    <item>
      <title>101 DATA ENGINEERING</title>
      <dc:creator>Haji Rufai</dc:creator>
      <pubDate>Sun, 21 Aug 2022 13:27:00 +0000</pubDate>
      <link>https://dev.to/thyalpha001/101-data-engineering-3bpo</link>
      <guid>https://dev.to/thyalpha001/101-data-engineering-3bpo</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yBZDpm7F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l69ydkgdpmkowbb0fp2d.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yBZDpm7F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l69ydkgdpmkowbb0fp2d.PNG" alt="Data pipeline chart" width="800" height="454"&gt;&lt;/a&gt;&lt;br&gt;
Hello there! You may or may not have heard about data engineering and data engineers. What is it? Who are these data engineers? What do they do? Are they paid well? If so, what technology and knowledge do they have? And if it is feasible, how do I become one?&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;/p&gt;







&lt;h1&gt;
  
  
  What is Data Engineering?
&lt;/h1&gt;

&lt;p&gt;Data engineering is the practice of designing and building systems for collecting, storing, and analysing data at massive scale.&lt;/p&gt;

&lt;p&gt;That's it. Read it again to let it sink in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Relationship and difference with Data Science
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GIR9VEtB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wxq1wzis28tudkq8j2ig.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GIR9VEtB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wxq1wzis28tudkq8j2ig.jpeg" alt="A chart comparing and contrasting Data science and Data Engineer" width="700" height="393"&gt;&lt;/a&gt;&lt;br&gt;
Data science and data engineering are both big data and data analytics fields that have gained momentum in recent years.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data science&lt;/strong&gt; is a multi-disciplinary field that involves extracting knowledge from data to solve problems. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data engineers&lt;/strong&gt; are responsible for building the data pipelines that move data from where it is stored to where it can be used, e.g. by data scientists. &lt;/p&gt;




&lt;h1&gt;
  
  
  What does a Data engineer do?
&lt;/h1&gt;

&lt;p&gt;A data engineer builds and maintains the systems and pipelines that collect, store, and deliver data. &lt;/p&gt;

&lt;p&gt;Often drawing on a background in machine learning, analytics, or data science, the data engineer combines and manages data sets and makes them available to users, for example through data visualization tools.&lt;/p&gt;




&lt;h1&gt;
  
  
  Pay??
&lt;/h1&gt;

&lt;p&gt;Data engineers are among the top-paid tech professionals, by some reports earning more than software engineers and data scientists. &lt;/p&gt;

&lt;p&gt;What makes it even better? A data engineering job posting reportedly attracts around 8x fewer applicants than a software engineering one.&lt;/p&gt;




&lt;h1&gt;
  
  
  Requirements?
&lt;/h1&gt;

&lt;p&gt;There are no strict requirements, but a degree in Mathematics, Computer Science, Engineering, or a related field is a plus.&lt;/p&gt;

&lt;p&gt;Though the majority of those entering the field are software engineers, data analysts, and data scientists, the field is not closed to complete beginners. &lt;/p&gt;




&lt;h1&gt;
  
  
  What is the career path?
&lt;/h1&gt;

&lt;p&gt;There are self-taught data engineers, but the most effective are those who follow a clear guideline on what to cover. There are online courses, e.g. &lt;a href="https://www.udacity.com/course/data-engineer-nanodegree--nd027"&gt;Udacity Data Engineering&lt;/a&gt; and &lt;a href="https://www.edx.org/professional-certificate/ibm-data-engineering?index=product&amp;amp;queryID=f0c5a1093f6466b9ff98f989037626fa&amp;amp;position=10&amp;amp;linked_from=autocomplete"&gt;edx.IBM Professional Data Engineering&lt;/a&gt;, most of which are premium (Ouch!).&lt;/p&gt;

&lt;p&gt;That's not the only route. You can follow a guideline of your own and stick with it strictly, since each topic can be found independently on the internet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Here are the base foundations that you need
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Python
&lt;/h3&gt;

&lt;p&gt;No compromising here. You need to get started and dig deep into Python. Also, keep in mind that it is Python, not any other programming language. &lt;/p&gt;

&lt;h4&gt;
  
  
  Where to learn
&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://learnpython.org/"&gt;LearnPython.org&lt;/a&gt;&lt;br&gt;
It is a nice, interactive, beginner-friendly website for the Python language. The topics are arranged in order, and each topic ends with a coding exercise that tests whether you have mastered it.&lt;/p&gt;

&lt;p&gt;The good part (not the lazy part): there are solutions to all the exercises!&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Structured Query Language (SQL)
&lt;/h3&gt;

&lt;p&gt;A data engineer interacts with databases a lot more than a back-end software engineer or a data scientist does. A solid grasp of SQL is mandatory for data engineers.&lt;/p&gt;
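
&lt;p&gt;To get a feel for SQL without installing a database server, here is a minimal sketch that runs a query through Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The &lt;code&gt;events&lt;/code&gt; table and its rows are invented for illustration.&lt;/p&gt;

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, action) VALUES (?, ?)",
    [(1, "login"), (1, "upload"), (2, "login")],
)

# Count events per user with a GROUP BY query.
rows = conn.execute(
    "SELECT user_id, COUNT(*) FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()
conn.close()
print(rows)
```

&lt;p&gt;The &lt;code&gt;?&lt;/code&gt; placeholders let the driver pass values safely instead of building the SQL string by hand.&lt;/p&gt;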

&lt;h4&gt;
  
  
  Where to learn
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.w3schools.com/sql/"&gt;W3Schools/SQL&lt;/a&gt;&lt;br&gt;
The SQL material at W3Schools is so extensive that you may feel overwhelmed by it, especially if you are just starting. But hey! It is very &lt;em&gt;interactive&lt;/em&gt; and &lt;em&gt;user-friendly&lt;/em&gt;, and it can cover about 50% of your database journey in data engineering. The good part is that you can practice your SQL skills right on their platform, which is very flexible. Take your time and take key notes as you learn. Find out how you learn best and improve on it. Cheers!&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Big Data (Spark and Hadoop)
&lt;/h3&gt;

&lt;p&gt;Let's talk about big data and the trend. The term big data is often used to describe large data sets and data warehouses used by businesses to store and analyze large amounts of data. &lt;/p&gt;

&lt;p&gt;The main frameworks used are Apache Spark and Apache Hadoop.&lt;/p&gt;

&lt;h4&gt;
  
  
  Where to learn
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.simplilearn.com/learn-hadoop-spark-basics-skillup"&gt;SkillUP by simplilearn&lt;/a&gt;&lt;br&gt;
The above will give you a great start even if you are a beginner. Yes, it's free!&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Cloud Computing
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What is it?
&lt;/h4&gt;

&lt;p&gt;Cloud computing is the use of computing resources (computers, storage, networking equipment, and applications) that are accessed on demand over the Internet. &lt;/p&gt;

&lt;p&gt;These resources are provided remotely - usually through the Internet - so that users do not have to be at the workplace to get their computing needs met.&lt;/p&gt;

&lt;p&gt;There are several cloud computing providers. Some of them are: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google Cloud Platform (GCP)&lt;/li&gt;
&lt;li&gt;Amazon Web Services (AWS)&lt;/li&gt;
&lt;li&gt;Microsoft Azure&lt;/li&gt;
&lt;li&gt;Oracle Cloud&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  What you need to have
&lt;/h4&gt;

&lt;p&gt;You need to master at least one cloud service and have a basic understanding of the others, as different companies use different cloud services.&lt;/p&gt;

&lt;p&gt;In my opinion, master AWS first, as it is the largest cloud service provider.&lt;/p&gt;

&lt;h4&gt;
  
  
  Where to learn
&lt;/h4&gt;

&lt;p&gt;Again, &lt;a href="https://www.simplilearn.com/learn-aws-services-basics-free-course-skillup"&gt;Simplilearn-SkillUp Getting started with AWS fundamental&lt;/a&gt; will give you a head start, especially if you are a beginner. &lt;/p&gt;

&lt;p&gt;It is a total of 4 hours of video divided into well-explained lessons, and it is free. From there, you will be ready to move on from 101 to 999 Data Engineering.&lt;/p&gt;







&lt;h1&gt;
  
  
  How long will it take?
&lt;/h1&gt;

&lt;p&gt;With the right plan and guidelines, and about 30 hours a week for 6 months, you will be ready for it!&lt;/p&gt;







&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Starting the data engineering journey can be easy. But maintaining the quest needs &lt;a href="https://www.youtube.com/watch?v=H14bBuluwB8"&gt;grit&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Have some patience as you begin your journey. For sure, you will face difficulties in grasping some concepts on the way (as you learn and do projects).&lt;/p&gt;

&lt;p&gt;Another thing I want to add is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Difficulty is relative. You may find some concepts easier and others harder than your peers do. Push yourself and learn.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Congrats!&lt;/p&gt;









&lt;p&gt;&lt;em&gt;&lt;strong&gt;Have you started your data engineering journey yet?&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
