<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nginacloud</title>
    <description>The latest articles on DEV Community by Nginacloud (@nginacloud).</description>
    <link>https://dev.to/nginacloud</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1026942%2F2d4f6a32-ee1a-48d3-9a88-1b1d9de34357.png</url>
      <title>DEV Community: Nginacloud</title>
      <link>https://dev.to/nginacloud</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nginacloud"/>
    <language>en</language>
    <item>
      <title>Beginner's Guide to SQL for Data Analysis</title>
      <dc:creator>Nginacloud</dc:creator>
      <pubDate>Sun, 27 Jul 2025 20:37:33 +0000</pubDate>
      <link>https://dev.to/nginacloud/beginners-guide-to-sql-for-data-analysis-27lg</link>
      <guid>https://dev.to/nginacloud/beginners-guide-to-sql-for-data-analysis-27lg</guid>
      <description>&lt;p&gt;In today’s data-driven world, the ability to extract, analyze, and interpret data has become a critical skill across industries. Whether you're in finance, healthcare, marketing, or tech, understanding how to work with data is no longer optional—it's essential. One of the most powerful and accessible tools for data analysis is SQL (Structured Query Language). If you're new to SQL and wondering how it fits into data analysis, this guide is for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is SQL?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SQL&lt;/strong&gt; is a programming language used to manage and manipulate relational databases. It allows you to access and work with data stored in tables, making it ideal for querying large datasets efficiently. SQL is the backbone of many popular database systems, including MySQL, PostgreSQL, Microsoft SQL Server, and SQLite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Use SQL for Data Analysis?&lt;/strong&gt;&lt;br&gt;
SQL is a favorite among data analysts for several reasons:&lt;/p&gt;

&lt;p&gt;Simplicity: Its syntax is straightforward and readable, even for non-programmers.&lt;/p&gt;

&lt;p&gt;Efficiency: A well-indexed database can filter and aggregate millions of rows in seconds.&lt;/p&gt;

&lt;p&gt;Universality: It works across many database systems.&lt;/p&gt;

&lt;p&gt;Integration: SQL can be used alongside tools like Excel, Python, R, and Power BI.&lt;/p&gt;
&lt;h2&gt;
  
  
  Getting Started with SQL
&lt;/h2&gt;

&lt;p&gt;To begin analyzing data with SQL, you'll need access to a database. Free options such as SQLite and MySQL, or cloud-based environments such as Google BigQuery and PostgreSQL on Render, are great for practice.&lt;/p&gt;
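&lt;p&gt;For a quick practice setup, Python's built-in sqlite3 module can create a throwaway in-memory database. The sketch below is one possible starting point; the table and its rows are made-up sample data, not part of any real dataset.&lt;/p&gt;

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical practice table with made-up rows.
cur.execute("CREATE TABLE customers (first_name TEXT, last_name TEXT, age INTEGER)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Amina", "Otieno", 34), ("Brian", "Mwangi", 27), ("Carol", "Njeri", 41)],
)
conn.commit()

# Run a query and fetch every row as a list of tuples.
rows = cur.execute(
    "SELECT first_name, age FROM customers ORDER BY age DESC"
).fetchall()
print(rows)
conn.close()
```

&lt;p&gt;Any query from the sections that follow can be passed to execute() in the same way, provided the tables it names exist.&lt;/p&gt;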

&lt;p&gt;Here are some fundamental concepts and commands every beginner should know:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. SELECT: Retrieving Data
&lt;/h3&gt;

&lt;p&gt;The SELECT statement is the cornerstone of SQL. It lets you choose specific columns from a table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT first_name, last_name, age FROM customers;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. WHERE: Filtering Records
&lt;/h3&gt;

&lt;p&gt;Use WHERE to filter rows based on conditions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM orders
WHERE order_date &amp;gt;= '2024-01-01' AND amount &amp;gt; 100;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. ORDER BY: Sorting Results
&lt;/h3&gt;

&lt;p&gt;Sort your results using ORDER BY.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT name, salary FROM employees
ORDER BY salary DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. GROUP BY: Aggregating Data
&lt;/h3&gt;

&lt;p&gt;For summary statistics, use GROUP BY with aggregate functions like COUNT(), SUM(), AVG().&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. JOIN: Combining Tables
&lt;/h3&gt;

&lt;p&gt;Data is often spread across multiple tables. Use JOIN to bring them together.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT customers.name, orders.amount
FROM customers
JOIN orders ON customers.id = orders.customer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
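&lt;p&gt;To see what the join actually returns, here is a small sketch that runs a similar query from Python with sqlite3 and adds a SUM per customer. The two tables and their rows are hypothetical sample data.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian'), (3, 'Carol');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0);
    """
)

# Inner join: only customers with at least one matching order appear.
totals = conn.execute(
    """
    SELECT customers.name, SUM(orders.amount) AS total
    FROM customers
    JOIN orders ON customers.id = orders.customer_id
    GROUP BY customers.name
    ORDER BY customers.name
    """
).fetchall()
print(totals)
conn.close()
```

&lt;p&gt;Carol has no orders, so a plain JOIN leaves her out of the result; a LEFT JOIN would keep her with an empty total.&lt;/p&gt;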



&lt;h3&gt;
  
  
  6. LIMIT: Restricting Output
&lt;/h3&gt;

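&lt;p&gt;If you only want to see a subset of results, use LIMIT. It works in MySQL, PostgreSQL, and SQLite; Microsoft SQL Server uses SELECT TOP instead.&lt;br&gt;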
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM products
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Practical Tips&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Comment your queries: Use -- to explain parts of your SQL queries for future reference.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL vs. Excel for Data Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While Excel is familiar and user-friendly, SQL is better suited for large datasets and repeatable, automated analysis. SQL also offers better control over data cleaning, transformation, and aggregation.&lt;/p&gt;

&lt;p&gt;SQL is a must-have tool in a data analyst’s toolkit. Its ability to handle complex queries across large datasets makes it indispensable for anyone seeking to make data-driven decisions. With consistent practice and exploration, you’ll quickly move from writing basic queries to performing advanced analyses and uncovering powerful insights.&lt;/p&gt;

&lt;p&gt;Whether you're analyzing sales performance, customer behavior, or financial trends, SQL gives you the edge to work smarter with data.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>analytics</category>
    </item>
    <item>
      <title>The Ultimate Guide to Data Analytics.</title>
      <dc:creator>Nginacloud</dc:creator>
      <pubDate>Sun, 25 Aug 2024 19:37:52 +0000</pubDate>
      <link>https://dev.to/nginacloud/the-ultimate-guide-to-data-analytics-3o8i</link>
      <guid>https://dev.to/nginacloud/the-ultimate-guide-to-data-analytics-3o8i</guid>
      <description>&lt;p&gt;Data analysis involves a series of steps and methods that help transform raw data into meaningful insights. Building a career in data analysis, and staying competitive in an evolving market, takes a combination of programming, statistical methods, and real-world applications.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This guide highlights the basic processes and examples essential for a beginner-level data analysis track.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Foundations of Data Analysis
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Data Structures
&lt;/h2&gt;

&lt;p&gt;A data structure is a specialized format for organizing data on a computer so that it can be processed, stored, and retrieved quickly and effectively, which is essential for large datasets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Operations in Data Structures
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Searching - locating an element inside a data structure, such as an array or a list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sorting - arranging the elements of a data structure in ascending or descending order.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Insertion - adding new data to the structure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Updating and deleting - modifying or removing existing elements of the structure.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
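&lt;p&gt;The four operations above can be sketched on a plain Python list. This is only an illustration with made-up values; other structures expose the same operations through different methods.&lt;/p&gt;

```python
import bisect

scores = [72, 45, 90, 61]   # hypothetical sample values

# Searching: find the position of a value in the list.
position = scores.index(90)

# Sorting: order the elements ascending (use reverse=True for descending).
scores.sort()

# Insertion: add a new value while keeping the list sorted.
bisect.insort(scores, 68)

# Updating and deleting: change one element, then remove another.
scores[0] = 50
scores.remove(72)

print(position, scores)
```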

&lt;h2&gt;
  
  
  Data Types
&lt;/h2&gt;

&lt;p&gt;Understanding data types helps determine the kind of operations one can perform on the data. Different data types require different analysis techniques, visualization and data preparation.&lt;/p&gt;

&lt;p&gt;a) Qualitative Data: Represents non-numerical information that describes the qualities or characteristics of a variable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Nominal Data&lt;/em&gt;: Categories without a specific order or ranking (e.g., Gender, Types of Fruits).&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Ordinal Data&lt;/em&gt;: Categories with a defined order or ranking, but without measurable differences between ranks (e.g., Education Level, Customer Satisfaction Ratings).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;b) Quantitative Data: Represents numerical values that measure the quantity or magnitude of a variable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Discrete Data&lt;/em&gt;: Countable values (e.g., Number of Students, Cars Sold).&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Continuous Data&lt;/em&gt;: Measurable values that can take any number within a range (e.g., Height, Temperature).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;c) Date and Time Data: Specific points in time or durations, crucial for time-based analysis and forecasting.&lt;/p&gt;

&lt;p&gt;d) Compound Data Types: Combines multiple data types within a single dataset or variable to store complex data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Arrays&lt;/em&gt;: Homogeneous data structures for numerical computations.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Lists&lt;/em&gt;: Ordered, mutable collections of elements that can contain different data types.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Tuples&lt;/em&gt;: Ordered, immutable collections, often used for storing related data.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Dictionaries&lt;/em&gt;: Collections of key-value pairs, useful for fast lookups (Python dictionaries preserve insertion order since version 3.7).&lt;/li&gt;
&lt;/ul&gt;
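&lt;p&gt;A minimal sketch of the four compound types in Python follows. The standard-library array module stands in for a numerical array here (in practice a NumPy array is the more common choice), and all names and values are made up for illustration.&lt;/p&gt;

```python
from array import array

# Array: homogeneous, typed storage ('d' means double-precision floats).
heights = array("d", [1.62, 1.75, 1.80])

# List: ordered and mutable, and may mix data types.
record = ["Amina", 34, True]
record.append("Nairobi")

# Tuple: ordered and immutable; a natural fit for a fixed pair of related values.
coordinates = (-1.2921, 36.8219)

# Dictionary: key-value pairs with fast lookup by key.
ages = {"Amina": 34, "Brian": 27}
ages["Carol"] = 41

print(heights[1], record, coordinates[0], ages["Carol"])
```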

&lt;h1&gt;
  
  
  Data Collection and Preparation
&lt;/h1&gt;

&lt;p&gt;Data collection involves distinguishing between primary and secondary data sources. Primary data can be collected using web scraping tools like Scrapy, Beautiful Soup, and Selenium, or through APIs. Secondary data is obtained from existing or external databases. Example project: &lt;a href="https://github.com/Nginacloud/Webcrawl2" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;scrapy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;webdriver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.chrome.service&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Service&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.chrome.options&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Options&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.support.ui&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WebDriverWait&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.support&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;expected_conditions&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;EC&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.common.by&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;By&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scrapy.http&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HtmlResponse&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Data Analysis Techniques
&lt;/h2&gt;

&lt;p&gt;Each technique suits a particular kind of data and a particular analysis objective.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Descriptive analysis - provides a quantitative summary of historical data.
&lt;em&gt;Central tendency&lt;/em&gt; (mean, median, mode)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Python&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# assuming the file is named age.csv and has a numeric 'age' column&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# mean&lt;/span&gt;
&lt;span class="n"&gt;mean_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# median&lt;/span&gt;
&lt;span class="n"&gt;median_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;median_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# mode (pandas returns every tied mode; print the first)&lt;/span&gt;
&lt;span class="n"&gt;mode_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mode_value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Variability&lt;/em&gt; (range, variance, standard deviation)&lt;br&gt;
&lt;code&gt;SQL&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;variance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;column_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Variance_value&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;--std deviation&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;stddev&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;column_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Stddev_value&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
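&lt;p&gt;The same three measures of variability can be sketched in Python with the standard statistics module. The values are a made-up sample. Note that statistics.variance and statistics.stdev compute the sample versions (dividing by n - 1), while databases differ on whether their variance and stddev functions return the sample or the population version, so it is worth checking your database's documentation.&lt;/p&gt;

```python
import statistics

values = [4, 8, 6, 5, 7]  # hypothetical sample

value_range = max(values) - min(values)          # spread between the extremes
sample_variance = statistics.variance(values)    # divides by n - 1
sample_std = statistics.stdev(values)            # square root of the sample variance

print(value_range, sample_variance, sample_std)
```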



&lt;p&gt;&lt;em&gt;Frequency distribution&lt;/em&gt; (tables and charts)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="c1"&gt;# table (assuming df is a DataFrame with a numeric 'age' column)&lt;/span&gt;
&lt;span class="n"&gt;freq_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;freq_table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# chart (histogram)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;bins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edgecolor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;black&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Frequency Distribution&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Frequency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Inferential analysis - makes inferences and predictions about a population based on a sample of data.
&lt;em&gt;Hypothesis testing&lt;/em&gt; : t-tests, chi-square tests
&lt;em&gt;Regression analysis&lt;/em&gt; : linear regression
&lt;em&gt;ANOVA&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;f_oneway&lt;/span&gt;

&lt;span class="c1"&gt;# Sample data
&lt;/span&gt;&lt;span class="n"&gt;group1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;group2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;group3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;f_stat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;f_oneway&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;group1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;group2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;group3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;F-Statistic:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f_stat&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;P-Value:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
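&lt;p&gt;Linear regression, listed among the inferential techniques, fits a straight line through paired observations. The sketch below works out the ordinary least squares slope and intercept by hand for one predictor; the numbers are made up, and in practice a library such as SciPy or statsmodels would usually do this for you.&lt;/p&gt;

```python
xs = [1, 2, 3, 4, 5]   # hypothetical predictor values
ys = [2, 4, 5, 4, 5]   # hypothetical response values

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Ordinary least squares for one predictor:
# slope = sum of cross-deviations / sum of squared x-deviations,
# and the fitted line passes through the point (mean_x, mean_y).
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))
```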



&lt;p&gt;&lt;em&gt;Confidence intervals&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;scipy.stats&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.95&lt;/span&gt;
&lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;std_err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std_err&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ppf&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;confidence_interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Confidence Interval:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;confidence_interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Exploratory Data Analysis (EDA) - exploring and identifying patterns, trends, and relationships within the data.
&lt;em&gt;Data visualization&lt;/em&gt; - scatter plots, histograms, box plots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Summary statistics&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Correlation matrices&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;correlation_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;corr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;correlation_matrix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
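&lt;p&gt;Each cell of a correlation matrix is a correlation coefficient between two columns; pandas uses the Pearson coefficient by default. As a rough sketch of what one cell represents, the function below computes it by hand. The column names and values are hypothetical.&lt;/p&gt;

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

hours = [1, 2, 3, 4, 5]        # hypothetical columns
score = [2, 4, 6, 8, 10]       # rises in step with hours
errors = [10, 8, 6, 4, 2]      # falls as hours rise

print(pearson(hours, score), pearson(hours, errors))
```

&lt;p&gt;A value near 1 means the two columns rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship.&lt;/p&gt;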



&lt;p&gt;&lt;em&gt;Heatmaps&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
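&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;

&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;correlation_matrix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;annot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;coolwarm&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;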
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Correlation Heatmap&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Text analysis - deriving meaningful information from text data, such as keywords, phrases, sentiments, or patterns, using statistical and machine learning techniques.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Natural language processing (NLP) - a method for analyzing and interpreting human language data.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
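&lt;p&gt;As a very small illustration of keyword extraction, the sketch below counts word frequencies in a made-up review. The stop-word list is a tiny hand-picked assumption; real text analysis would normally rely on a dedicated NLP library.&lt;/p&gt;

```python
import re
from collections import Counter

# Hypothetical customer review.
text = "Great product, great price. The delivery was slow, but the product is great."

# Lowercase the text and split it into alphabetic words.
words = re.findall(r"[a-z]+", text.lower())

# Drop a few very common words (a tiny, hand-picked stop list).
stop_words = {"the", "was", "but", "is"}
keywords = [w for w in words if w not in stop_words]

# The two most frequent remaining words.
top = Counter(keywords).most_common(2)
print(top)
```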

&lt;h2&gt;
  
  
  Data Analysis Process
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Define the objective: decide what you want to achieve with the analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Collection: gather data from various sources using the appropriate methods.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Cleaning: handle missing values and inconsistencies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Exploratory Data Analysis: understand the data and discover patterns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Analysis: apply analysis methods suited to the objectives.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interpret Results: translate findings into actionable insights and recommendations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Visualization and Reporting: present findings in a clear and accessible way.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
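&lt;p&gt;To make the data cleaning step concrete, here is a minimal sketch that fills missing values with the median of the observed ones. Median imputation is only one possible strategy, chosen here as an assumption for illustration, and the values are made up.&lt;/p&gt;

```python
import statistics

raw_ages = [34, None, 27, 41, None, 30]   # hypothetical column with gaps

# Fill each missing value with the median of the observed values.
observed = [a for a in raw_ages if a is not None]
median_age = statistics.median(observed)
clean_ages = [median_age if a is None else a for a in raw_ages]

print(median_age, clean_ages)
```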

&lt;p&gt;Follow this guide to build a foundational skill set in data analysis, from core concepts to techniques and applications. This approach prepares you to tackle real-world data challenges and make impactful data-driven decisions.&lt;/p&gt;

</description>
      <category>dataanalysis</category>
      <category>career</category>
      <category>beginners</category>
      <category>guide</category>
    </item>
    <item>
      <title>Understanding Your Data: The Essentials of Exploratory Data Analysis</title>
      <dc:creator>Nginacloud</dc:creator>
      <pubDate>Sun, 11 Aug 2024 15:56:07 +0000</pubDate>
      <link>https://dev.to/nginacloud/understanding-your-data-the-essentials-of-exploratory-data-analysis-2p39</link>
      <guid>https://dev.to/nginacloud/understanding-your-data-the-essentials-of-exploratory-data-analysis-2p39</guid>
      <description>&lt;h1&gt;
  
  
  What is EDA?
&lt;/h1&gt;

&lt;p&gt;Exploratory data analysis (EDA) is the practice of examining and summarizing a dataset to get the answers one needs. It helps data analysts discover patterns, check assumptions, test hypotheses, and gain a better understanding of the dataset.&lt;/p&gt;

&lt;h1&gt;
  
  
  Four primary types of EDA
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Univariate non-graphical&lt;/strong&gt;&lt;br&gt;
This type focuses on analyzing a single variable at a time without using visualizations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Descriptive Statistics:
Measures like mean, median, mode, variance, standard deviation, and range.&lt;/li&gt;
&lt;li&gt;Frequency Distribution:
Count of occurrences for each value in the dataset.&lt;/li&gt;
&lt;li&gt;Percentiles and Quartiles:
Identifying specific points in the data distribution (e.g., 25th, 50th, and 75th percentiles).
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F317epn0i9d35qa8af7y1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F317epn0i9d35qa8af7y1.png" alt="Result" width="548" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, for percentiles:&lt;br&gt;
25th Percentile: The value below which 25% of the data falls.&lt;br&gt;
50th Percentile: The median value.&lt;br&gt;
75th Percentile: The value below which 75% of the data falls.&lt;/p&gt;
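
&lt;p&gt;As a minimal sketch of the measures listed above, the following uses only Python's standard library. The &lt;code&gt;temps&lt;/code&gt; list is made-up sample data, not the dataset behind the screenshot.&lt;/p&gt;

```python
import statistics
from collections import Counter

# Hypothetical temperature readings in degrees Celsius
temps = [12.5, 14.0, 14.0, 15.5, 17.0, 18.5, 21.0, 23.5]

# Descriptive statistics
mean_temp = statistics.mean(temps)
median_temp = statistics.median(temps)
mode_temp = statistics.mode(temps)
std_temp = statistics.stdev(temps)      # sample standard deviation
temp_range = max(temps) - min(temps)

# Frequency distribution: how often each value occurs
frequency = Counter(temps)

# Quartiles: the 25th, 50th and 75th percentiles
q1, q2, q3 = statistics.quantiles(temps, n=4)

print("mean:", mean_temp, "median:", median_temp, "mode:", mode_temp)
print("std:", round(std_temp, 2), "range:", temp_range)
print("most common:", frequency.most_common(1))
print("quartiles:", q1, q2, q3)
```

&lt;p&gt;The middle quartile is the median, which is why the 50th percentile and the median always agree.&lt;/p&gt;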

&lt;p&gt;&lt;strong&gt;Univariate graphical&lt;/strong&gt;&lt;br&gt;
This type also focuses on a single variable but uses visualizations to better understand its distribution. Common visual tools include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Histograms
Show the distribution of a variable by grouping data into bins.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
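&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;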
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;histplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Temp_C&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;kde&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Distribution of Temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Temperature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Frequency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbnckngyn9oge1wm5iba.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbnckngyn9oge1wm5iba.png" alt="Temperature histogram" width="800" height="515"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Box Plots
Display the distribution of data based on five summary statistics (minimum, first quartile, median, third quartile, and maximum).&lt;/li&gt;
&lt;li&gt;Density Plots
A smoothed version of a histogram that shows the data distribution.&lt;/li&gt;
&lt;/ul&gt;
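
&lt;p&gt;To make the five summary statistics behind a box plot concrete, here is a small sketch using the standard library. The sample values are invented, and the 1.5 * IQR fences follow a common convention (Tukey's rule) for deciding which points a box plot draws as outliers; plotting libraries may use other settings.&lt;/p&gt;

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical temperatures with one unusually high reading
temps = [12.5, 14.0, 14.0, 15.5, 17.0, 18.5, 21.0, 35.0]

minimum, q1, median, q3, maximum = five_number_summary(temps)
iqr = q3 - q1  # interquartile range: the height of the box

# Tukey's rule: points more than 1.5 * IQR beyond the box count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Clamping a value into the fence interval changes it only if it lies outside
outliers = [t for t in temps if min(max(t, lower_fence), upper_fence) != t]

print("summary:", minimum, q1, median, q3, maximum)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```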

&lt;p&gt;&lt;strong&gt;Multivariate non-graphical&lt;/strong&gt;&lt;br&gt;
This type analyzes relationships between two or more variables without visual aids:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correlation Analysis
Examining the linear relationship between two variables using correlation coefficients.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;correlation_matrix = df.corr(numeric_only=True)  # use numeric columns only
print(correlation_matrix)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghg5kbrkctxmef57quax.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghg5kbrkctxmef57quax.png" alt="correlation" width="580" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-tabulation
Summarizing data by showing the relationship between categorical variables.&lt;/li&gt;
&lt;li&gt;Covariance
Measuring the extent to which two variables change together.&lt;/li&gt;
&lt;/ul&gt;
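
&lt;p&gt;The three measures above can be sketched without any third-party library. The readings and categories below are hypothetical, and the helper functions &lt;code&gt;covariance&lt;/code&gt; and &lt;code&gt;correlation&lt;/code&gt; are written out only to show the arithmetic; in practice a library such as pandas would normally do this.&lt;/p&gt;

```python
from collections import Counter
from math import sqrt

def covariance(xs, ys):
    """Sample covariance: average product of deviations from each mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to the range -1 to 1."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical paired readings: temperature (C) and relative humidity (%)
temp = [10, 12, 14, 16, 18]
humidity = [80, 76, 72, 68, 64]

# Hypothetical categorical observations for a cross-tabulation
season = ["winter", "winter", "summer", "summer", "summer"]
weather = ["snow", "clear", "clear", "rain", "clear"]
crosstab = Counter(zip(season, weather))  # counts per (season, weather) pair

print("covariance:", covariance(temp, humidity))
print("correlation:", round(correlation(temp, humidity), 3))
print("summer and clear:", crosstab[("summer", "clear")])
```

&lt;p&gt;Covariance depends on the units of the variables, while correlation is unit-free, which is why correlation is usually easier to compare across pairs of variables.&lt;/p&gt;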

&lt;p&gt;&lt;strong&gt;Multivariate graphical&lt;/strong&gt;&lt;br&gt;
This type involves visualizing relationships between multiple variables to identify patterns and interactions. Common visual tools include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scatter Plots
Show the relationship between two continuous variables.&lt;/li&gt;
&lt;li&gt;Pair Plots
Provide scatter plots for all possible pairs of variables in the dataset.&lt;/li&gt;
&lt;li&gt;Heatmaps
Display correlation or other matrix-based data, using color to represent values.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
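&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;correlation_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;corr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;numeric_only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;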
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;correlation_matrix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;annot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;coolwarm&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Correlation Matrix&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91bkptfto5ij698cexo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91bkptfto5ij698cexo4.png" alt="Heatmap" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Red shades (closer to 1) represent a positive correlation.&lt;br&gt;
Blue shades (closer to -1) represent a negative correlation.&lt;br&gt;
White shades represent little to no correlation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3D Plots
Visualize the relationship between three variables simultaneously.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Tools and Libraries
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Python-Based Tools&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pandas &lt;br&gt;
A powerful data manipulation library that offers tools for data cleaning, aggregation, and simple statistical analysis. It integrates well with other Python libraries for visualizations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Matplotlib &lt;br&gt;
A plotting library for creating static, animated, and interactive visualizations in Python. It’s often used for basic graphs like histograms, scatter plots, and line plots.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Seaborn&lt;br&gt;
Seaborn provides a high-level interface for drawing attractive and informative statistical graphics, such as pair plots, heatmaps, and box plots.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Jupyter Notebooks&lt;/strong&gt;&lt;br&gt;
Jupyter Notebooks allow you to create and share documents containing live code, equations, visualizations, and narrative text. They are highly flexible for combining code, output, and documentation in one place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BI Tools&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tableau 
A popular business intelligence tool that allows for drag-and-drop creation of interactive dashboards, visualizations, and in-depth data analysis.&lt;/li&gt;
&lt;li&gt;Power BI 
Microsoft’s business analytics service that offers powerful data visualization and reporting capabilities, making it a strong tool for EDA in a business context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Excel and Spreadsheet Tools&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Microsoft Excel: A widely used tool for data analysis that offers built-in features for EDA, such as pivot tables, descriptive statistics, and basic charts like histograms and scatter plots.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6t5k5zegmp4dfs7l3c3a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6t5k5zegmp4dfs7l3c3a.png" alt="Excel" width="644" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>sql</category>
      <category>python</category>
      <category>learning</category>
      <category>eventdriven</category>
    </item>
    <item>
      <title>The Ultimate Guide to Data Analytics: Techniques and Tools</title>
      <dc:creator>Nginacloud</dc:creator>
      <pubDate>Sat, 03 Aug 2024 19:02:49 +0000</pubDate>
      <link>https://dev.to/nginacloud/the-ultimate-guide-to-data-analytics-techniques-and-tools-216h</link>
      <guid>https://dev.to/nginacloud/the-ultimate-guide-to-data-analytics-techniques-and-tools-216h</guid>
      <description>&lt;p&gt;In today's world, data analytics is not just a tool but a fundamental capability for organizations seeking to stay competitive and make informed decisions. As data continues to grow exponentially, the ability to effectively analyze and interpret this data has become crucial. This guide explores the essential techniques and tools necessary to harness the power of data, enabling organizations to drive strategic decision-making and maintain a competitive edge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding data analysis
&lt;/h2&gt;

&lt;p&gt;Applying statistical methods to analyze and interpret data requires efficient tools and techniques.&lt;/p&gt;

&lt;p&gt;The data analysis process follows structured steps that lead from raw data to actionable solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the Data Analysis Process/Workflow?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Data Collection&lt;/strong&gt; involves gathering data from relevant sources with a focus on ensuring data quality, integrity, and credibility. This step requires selecting reliable data sources and verifying the information's accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Cleaning&lt;/strong&gt; prepares the data for analysis by addressing inconsistencies and errors. This involves handling missing values (removing or filling them in), correcting inaccuracies, and standardizing data formats so that the subsequent analysis rests on reliable data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Correcting data entry errors
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Allice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Davidd&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;David&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
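
&lt;p&gt;The cleaning steps named above (missing values, typos, inconsistent formats) can be sketched in plain Python as follows. The records, the &lt;code&gt;City&lt;/code&gt; and &lt;code&gt;Age&lt;/code&gt; fields, and the &lt;code&gt;clean_row&lt;/code&gt; helper are all hypothetical and only illustrate the idea.&lt;/p&gt;

```python
# Hypothetical raw records; None marks a missing value
raw_rows = [
    {"Name": " Allice ", "City": "nairobi", "Age": "29"},
    {"Name": "Davidd", "City": "MOMBASA", "Age": None},
    {"Name": "Carol", "City": "Kisumu ", "Age": "41"},
]

# Known data entry errors and their corrections
name_fixes = {"Allice": "Alice", "Davidd": "David"}

def clean_row(row):
    """Standardize formats and correct known typos in one record."""
    name = row["Name"].strip()
    return {
        "Name": name_fixes.get(name, name),
        "City": row["City"].strip().title(),  # consistent capitalization
        "Age": int(row["Age"]),               # text to number
    }

# Drop rows with missing values, then clean what remains
complete_rows = [row for row in raw_rows if None not in row.values()]
cleaned = [clean_row(row) for row in complete_rows]
print(cleaned)
```

&lt;p&gt;Dropping incomplete rows is the simplest option; depending on the analysis, filling in a default or estimated value may be more appropriate.&lt;/p&gt;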



&lt;p&gt;&lt;strong&gt;Exploratory Data Analysis (EDA)&lt;/strong&gt; helps in gaining a deeper understanding of the data. Techniques such as data visualization, statistical summaries, and database queries are used to explore data distributions and relationships.&lt;br&gt;
&lt;code&gt;This query computes the average number of requests created per day, for each month&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt; &lt;span class="c1"&gt;-- Aggregate daily counts by month&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;day&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="c1"&gt;-- Subquery to compute daily counts&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date_created&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;
          &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;
         &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date_created&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;daily_count&lt;/span&gt;
 &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;
 &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;OUTPUT&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx17qlp29e7jnke6mhyp8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx17qlp29e7jnke6mhyp8.png" alt="OUTPUT" width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;
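
&lt;p&gt;For readers more comfortable with Python, the same two-step logic (count per day, then average those counts per month) might look like the sketch below. The timestamps are invented stand-ins for the &lt;code&gt;date_created&lt;/code&gt; column and are not the data behind the output above.&lt;/p&gt;

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical creation timestamps standing in for the date_created column
date_created = [
    datetime(2024, 1, 1, 9, 30), datetime(2024, 1, 1, 14, 5),
    datetime(2024, 1, 2, 8, 0),
    datetime(2024, 2, 10, 11, 45), datetime(2024, 2, 10, 16, 20),
    datetime(2024, 2, 10, 18, 0), datetime(2024, 2, 11, 7, 15),
]

# Inner step: count requests per day (the role of the subquery)
daily_count = Counter(ts.date() for ts in date_created)

# Outer step: group the daily counts by month and average them
counts_by_month = defaultdict(list)
for day, count in daily_count.items():
    counts_by_month[(day.year, day.month)].append(count)

monthly_avg = {
    month: sum(counts) / len(counts)
    for month, counts in sorted(counts_by_month.items())
}
print(monthly_avg)
```

&lt;p&gt;As in the SQL version, days with no requests never appear in the daily counts, so the average is taken over active days only.&lt;/p&gt;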

&lt;p&gt;&lt;strong&gt;Data Transformation&lt;/strong&gt; adjusts the data based on the analysis objectives. This might involve normalization, aggregation, or feature extraction to prepare the data for specific analyses.&lt;/p&gt;
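
&lt;p&gt;Normalization is the easiest of these to illustrate. The sketch below shows two common variants, min-max scaling and z-scores, on made-up sales figures. The function names are my own, and both assume the values are not all identical (otherwise the divisor would be zero).&lt;/p&gt;

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def z_scores(values):
    """Center values on the mean and express them in standard deviations."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

# Hypothetical monthly sales figures
sales = [200, 300, 400, 500, 600]

print(min_max_scale(sales))
print([round(z, 3) for z in z_scores(sales)])
```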

&lt;p&gt;&lt;strong&gt;Interpretation and Visualization&lt;/strong&gt; focuses on conveying findings in a clear and actionable manner. Using charts, graphs, and summary statistics helps present data insights effectively, making complex information accessible to stakeholders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation of Insights&lt;/strong&gt; translates data findings into actionable solutions or strategies. This step involves developing and executing strategies based on data insights to drive decision-making and achieve organizational goals.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Analytics Techniques
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Descriptive Statistics&lt;/strong&gt;&lt;br&gt;
Descriptive Statistics summarizes and describes the main features of a dataset. Key measures of central tendency (mean, median) and variability (standard deviation, variance) are calculated. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;values&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Exploratory Data Analysis (EDA)&lt;/strong&gt; uncovers patterns, trends, and relationships within the data. Techniques such as data visualization and correlation analysis are used to identify trends and relationships between variables. In short, EDA answers questions about the data and presents the facts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools for Data Analytics
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Programming Languages&lt;/strong&gt; like Python are versatile and come with extensive libraries for data manipulation and machine learning. Notable libraries include Pandas for data manipulation, NumPy for numerical computations, and Scikit-learn for machine learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Visualization Tools&lt;/strong&gt; include Matplotlib, a basic plotting library in Python for creating various visualizations, and Seaborn, which offers advanced and aesthetically pleasing charts. Power BI is another tool for creating interactive reports and dashboards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database Management Systems&lt;/strong&gt; such as MySQL and PostgreSQL are essential for managing relational databases. They are queried with SQL (Structured Query Language), a specialized programming language that is crucial for handling large datasets and performing complex queries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices: Perfecting the art
&lt;/h2&gt;

&lt;p&gt;Mastering data analytics is an art that shows in how data is presented and interpreted. Perfecting it includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Effective data visualization&lt;/li&gt;
&lt;li&gt;Narrative crafting&lt;/li&gt;
&lt;li&gt;Attention to detail&lt;/li&gt;
&lt;li&gt;Innovation and creativity, among others&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quality and clarity in data analysis are achieved through continuous practice and staying updated with new advances in tools and techniques. Adhering to best practices ensures a successful data analysis process and insightful outcomes.&lt;/p&gt;

</description>
      <category>python</category>
      <category>coding</category>
      <category>data</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Introduction to Python for Data Science</title>
      <dc:creator>Nginacloud</dc:creator>
      <pubDate>Sat, 18 Feb 2023 17:39:07 +0000</pubDate>
      <link>https://dev.to/nginacloud/introduction-to-python-for-data-science-3l03</link>
      <guid>https://dev.to/nginacloud/introduction-to-python-for-data-science-3l03</guid>
      <description>&lt;p&gt;&lt;strong&gt;Python101&lt;/strong&gt;&lt;br&gt;
Python is a high-level, &lt;em&gt;general-purpose&lt;/em&gt; programming language: it was not created for one specific task and can be used across a wide range of domains.&lt;br&gt;
It ships with a large standard library of &lt;em&gt;built-in modules&lt;/em&gt;, which helps make it an easy and simple language to learn.&lt;/p&gt;
&lt;h1&gt;
  
  
  Syntax and Semantics in python
&lt;/h1&gt;

&lt;p&gt;Compared to other languages like Java, Python syntax reads much like English, making it easier to write, read and understand.&lt;br&gt;
Python has fewer syntactic exceptions and special cases. Curly brackets &lt;em&gt;{}&lt;/em&gt; are not used to delimit code blocks; they appear mainly in expressions such as dictionaries and sets.&lt;br&gt;
&lt;em&gt;Here are the top concepts to master for your data career&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Indentation and whitespace&lt;/strong&gt;&lt;br&gt;
Python uses indentation rather than curly brackets {} to structure its code. Indentation refers to the spaces at the beginning of a line of code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if 5 &amp;gt; 2:
    print('five is greater than two')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Identifiers&lt;/strong&gt;&lt;br&gt;
These are user-defined names used to identify variables, modules, classes, functions or other objects.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Rules followed in defining identifiers&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An identifier cannot start with a number.&lt;/li&gt;
&lt;li&gt;It cannot contain spaces.&lt;/li&gt;
&lt;li&gt;The first character must be a letter (A to Z or a to z) or an underscore (_).&lt;/li&gt;
&lt;li&gt;The first character can be followed by zero or more letters, underscores or digits (0 to 9).&lt;/li&gt;
&lt;li&gt;Identifiers are case sensitive.&lt;/li&gt;
&lt;/ul&gt;
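
&lt;p&gt;These rules can be checked from Python itself. The sketch below combines &lt;code&gt;str.isidentifier()&lt;/code&gt; with the standard &lt;code&gt;keyword&lt;/code&gt; module; the helper name &lt;code&gt;is_valid_identifier&lt;/code&gt; and the candidate names are made up for illustration.&lt;/p&gt;

```python
import keyword

def is_valid_identifier(name):
    """True if name follows the identifier rules and is not a reserved word."""
    # isidentifier() checks the character rules; iskeyword() catches
    # reserved words such as "class" that would otherwise pass
    return name.isidentifier() and not keyword.iskeyword(name)

# Hypothetical candidate names to check
candidates = ["first_name", "_total", "Value2", "2nd_value", "my var", "class"]

for name in candidates:
    verdict = "valid" if is_valid_identifier(name) else "not valid"
    print(repr(name), "is", verdict)
```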

&lt;p&gt;&lt;strong&gt;Comments&lt;/strong&gt;&lt;br&gt;
Comments are notes used to describe code; Python ignores them when the program runs.&lt;br&gt;
The hash symbol (#) is used to mark a comment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# This is a comment
print("To more life!")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Variables&lt;/strong&gt;&lt;br&gt;
A variable is a container that stores a data value.&lt;br&gt;
It is created the moment you assign a value to it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = 2
y = 'word'
print(x)
print(y)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Casting&lt;/strong&gt;&lt;br&gt;
Casting converts a value to a specific data type, using functions such as int(), float() and str().&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = float(9)  #x will be 9.0
y = str(4)  #y will be '4'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Variables are case-sensitive&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;a = 4
A = 9.0   #A will not overwrite a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Rules followed when naming variables&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Variable names must start with a letter or an underscore (_). They cannot start with a number.&lt;/li&gt;
&lt;li&gt;Variable names can only contain letters, numbers, and underscores. They cannot contain any other special characters such as !, @, #, $, %, etc.&lt;/li&gt;
&lt;li&gt;Variable names are case sensitive. For example, "myVar" and "myvar" are two different variables.&lt;/li&gt;
&lt;li&gt;Variable names should be descriptive and meaningful.&lt;/li&gt;
&lt;li&gt;If a variable name consists of multiple words, it is recommended to use underscores to separate the words. For example, "first_name" instead of "firstname".&lt;/li&gt;
&lt;li&gt;Reserved keywords such as "if" or "class" cannot be used as variable names. Built-in function names such as "print" can technically be reassigned, but doing so hides the built-in function, so it should be avoided.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;String&lt;/strong&gt;&lt;br&gt;
Strings are distinguished from numbers by surrounding them with single or double quotation marks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print("String")   # double quotes
print('String')   # single quotes work the same way
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Boolean values&lt;/strong&gt;&lt;br&gt;
A Boolean represents one of two values: True or False.&lt;br&gt;
When Python evaluates the condition of an if statement, the result is True or False.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;a = 200
b = 33

if b &amp;gt; a:
  print("b is greater than a")
else:
  print("b is not greater than a")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
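
&lt;p&gt;To see the Boolean values themselves rather than only the branch that runs, a comparison can be stored in a variable and printed. This short sketch reuses the values of &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; from above; the variable names &lt;code&gt;same&lt;/code&gt; and &lt;code&gt;different&lt;/code&gt; are my own.&lt;/p&gt;

```python
a = 200
b = 33

# A comparison is an expression that evaluates to a Boolean value
same = (a == b)
different = (a != b)
print(same, type(same))
print(different)

# Other values can be converted with bool():
# zero and empty containers are False, most other values are True
print(bool(0), bool(""), bool([]))
print(bool(42), bool("text"), bool([0]))
```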



&lt;p&gt;&lt;strong&gt;Arithmetic Operations&lt;/strong&gt;&lt;br&gt;
The (+) symbol represents addition.&lt;br&gt;
The (-) symbol represents subtraction.&lt;br&gt;
The (*) symbol represents multiplication.&lt;br&gt;
The (/) symbol represents division.&lt;br&gt;
The (%) symbol represents the modulus: it produces the remainder of an integer division.&lt;br&gt;
The (**) symbol represents an exponent: it raises a number to the power of another.&lt;br&gt;
The (//) symbol represents floor division: it divides and rounds the result down to the nearest whole number.&lt;/p&gt;
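
&lt;p&gt;A quick way to get a feel for these operators is to try them on two small numbers. The values 7 and 2 below are arbitrary.&lt;/p&gt;

```python
a, b = 7, 2

print(a + b)    # 9    addition
print(a - b)    # 5    subtraction
print(a * b)    # 14   multiplication
print(a / b)    # 3.5  division always returns a float
print(a % b)    # 1    remainder of 7 divided by 2
print(a ** b)   # 49   7 to the power of 2
print(a // b)   # 3    floor division

# Floor division rounds down, so negative results move away from zero
print(-a // b)  # -4
```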

&lt;p&gt;&lt;strong&gt;Functions&lt;/strong&gt;&lt;br&gt;
A function is a block of code that runs only when it is called. It is defined using the &lt;em&gt;def&lt;/em&gt; keyword.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def my_function():
    print("Hello World")
my_function()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
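
&lt;p&gt;Functions become more useful once they accept parameters and return a value. The sketch below is one possible illustration; the function name &lt;code&gt;greet&lt;/code&gt; and its arguments are invented for this example.&lt;/p&gt;

```python
def greet(name, greeting="Hello"):
    """Return a greeting for name; greeting defaults to 'Hello'."""
    return f"{greeting}, {name}!"

# Call with only the required argument, then with both
message = greet("World")
print(message)
print(greet("Ada", "Welcome"))
```

&lt;p&gt;Returning the text instead of printing it inside the function lets the caller decide what to do with the result.&lt;/p&gt;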



&lt;p&gt;&lt;strong&gt;Arrays&lt;/strong&gt;&lt;br&gt;
An array is a variable that can hold more than one value at a time.&lt;br&gt;
Python has no built-in array type, but Python lists can be used instead. The standard library's array module and NumPy arrays are alternatives for numeric data.&lt;/p&gt;
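
&lt;p&gt;As a brief sketch of a list standing in for an array (the temperature values are made up):&lt;/p&gt;

```python
# A list holds several values in one variable
temperatures = [21.5, 19.0, 23.4]

temperatures.append(18.2)   # add a value at the end
first = temperatures[0]     # indexing starts at 0
last = temperatures[-1]     # negative indexes count from the end
count = len(temperatures)

# Unlike typed arrays, a list can mix data types
mixed = [1, "two", 3.0]

print(first, last, count)
for value in temperatures:
    print(value)
```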

&lt;h2&gt;
  
  
  Development Environment
&lt;/h2&gt;

&lt;p&gt;These are software platforms that help maximize programmer productivity.&lt;br&gt;
The most common are Integrated Development Environments (IDEs) and code editors.&lt;br&gt;
Examples include Visual Studio Code, Jupyter Notebook, and Spyder.&lt;br&gt;
They help programmers write and debug programs easily.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why python?
&lt;/h3&gt;

&lt;p&gt;Many frameworks and libraries, such as NumPy and Pandas, that save time and effort in development.&lt;br&gt;
Reliability and speed.&lt;br&gt;
Easy to learn and use: it is a common first-language choice for developers and students.&lt;/p&gt;

&lt;h4&gt;
  
  
  What can python do
&lt;/h4&gt;

&lt;p&gt;Due to Python's simplified syntax, it has been adopted by programmers for tasks such as:&lt;br&gt;
AI and machine learning&lt;br&gt;
Data Visualization&lt;br&gt;
Programming Applications&lt;br&gt;
Web Development&lt;br&gt;
Game Development&lt;br&gt;
among others.&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>data</category>
    </item>
  </channel>
</rss>
