<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andrea_Aline</title>
    <description>The latest articles on DEV Community by Andrea_Aline (@andreaaline).</description>
    <link>https://dev.to/andreaaline</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1081903%2F0caf9297-52f9-4bb9-8c23-0ee664beca63.jpeg</url>
      <title>DEV Community: Andrea_Aline</title>
      <link>https://dev.to/andreaaline</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/andreaaline"/>
    <language>en</language>
    <item>
      <title>Using pandas.cut() in Python for Data Analysis: Creating Number and Date Intervals</title>
      <dc:creator>Andrea_Aline</dc:creator>
      <pubDate>Sat, 13 May 2023 12:22:30 +0000</pubDate>
      <link>https://dev.to/andreaaline/using-pandascut-in-python-for-data-analysis-creating-number-and-date-intervals-708</link>
      <guid>https://dev.to/andreaaline/using-pandascut-in-python-for-data-analysis-creating-number-and-date-intervals-708</guid>
      <description>&lt;p&gt;In this article, we will explore how to use the pandas.cut() method to create number and date intervals for data analysis.&lt;/p&gt;

&lt;h2&gt;What is pandas.cut()?&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;pandas.cut()&lt;/em&gt; is a function in the pandas library that sorts the values of a continuous variable into discrete intervals, also called bins.&lt;/p&gt;

&lt;p&gt;It returns a new categorical variable based on the bins you specify.&lt;/p&gt;

&lt;p&gt;The bins can be specified as a list of bin edges or as an integer, in which case pandas creates that many equal-width bins across the range of the data.&lt;/p&gt;
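
&lt;p&gt;As a rough illustration of the two ways to specify bins, here is a minimal sketch. The sample values and the variable names (values, by_edges, by_count) are made up for this example and do not come from the article’s screenshots.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical sample values, for illustration only
values = pd.Series([1, 7, 5, 4, 6, 3])

# Explicit bin edges: the intervals are (0, 2], (2, 4] and (4, 8]
by_edges = pd.cut(values, bins=[0, 2, 4, 8])

# An integer: three equal-width bins spanning the range of the data
by_count = pd.cut(values, bins=3)

# Number of values per interval, in interval order
print(by_edges.value_counts().sort_index())

# The three intervals pandas derived from the data
print(by_count.cat.categories)
```

&lt;p&gt;With explicit edges you decide exactly where each interval starts and ends; with an integer, pandas derives the edges from the minimum and maximum of the data.&lt;/p&gt;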

&lt;p&gt;It is commonly used in data analysis to group continuous data into categories or bins. Binning is useful for data transformation, for time series analysis, and for making data visualizations more informative.&lt;/p&gt;

&lt;p&gt;If you want a deeper understanding of these topics, I recommend the book &lt;strong&gt;&lt;a href="https://amzn.to/41AM571"&gt;Python for Data Analysis&lt;/a&gt;&lt;/strong&gt;, a definitive guide on how to work with data using Python. You can find it &lt;strong&gt;&lt;a href="https://amzn.to/41AM571"&gt;here&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now, let’s move on to the first example of how to use &lt;em&gt;pandas.cut()&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;Creating Number Intervals with pandas.cut()&lt;/h2&gt;

&lt;p&gt;Suppose we have a dataset of student grades, and we want to categorize them into letter grades (A, B, C, D, and F).&lt;/p&gt;

&lt;p&gt;We can do this by creating bins based on the grade ranges.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KYC7hQYK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s3u38p7ph435ovtvlada.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KYC7hQYK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s3u38p7ph435ovtvlada.png" alt="Screenshot of code importing pandas library and creating a series called grades" width="769" height="138"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let’s create the bins for the grades:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--F8mpHfyA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/40fwbzncvh1wef2lsuwb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--F8mpHfyA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/40fwbzncvh1wef2lsuwb.png" alt="Series called bins" width="457" height="77"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We want to categorize the grades into the following letter grades: F (below 60), D (60–69), C (70–79), B (80–89), and A (90–100).&lt;/p&gt;

&lt;p&gt;We can achieve this by using the &lt;em&gt;pandas.cut()&lt;/em&gt; method:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4aN32IKC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/67plni1z3up6jcsxhpas.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4aN32IKC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/67plni1z3up6jcsxhpas.png" alt="letter_grades variable created using pandas.cut()" width="720" height="63"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The resulting variable &lt;em&gt;letter_grades&lt;/em&gt; is a categorical variable that holds the letter grade for each score in the dataset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OyCkfpoA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v1dvi97smg7aspbp2elp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OyCkfpoA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v1dvi97smg7aspbp2elp.png" alt="letter_grades variable" width="717" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also sort and group it, if you would like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Qid-s9wn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gpg2fut32fwlpz0rftbo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Qid-s9wn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gpg2fut32fwlpz0rftbo.png" alt="letter_grades variable grouped and sorted" width="655" height="229"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Creating Date Intervals with pandas.cut()&lt;/h2&gt;

&lt;p&gt;Now let’s see how to use pandas.cut() to create date intervals.&lt;/p&gt;

&lt;p&gt;Suppose we have a dataset of daily sales, and we want to categorize them into monthly intervals. We can do this by creating bins based on the month ranges.&lt;/p&gt;

&lt;p&gt;First, as before, we need to import the pandas library and create a sample dataset:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--23uNjSi1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2r9sruv02hah0bixjyr6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--23uNjSi1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2r9sruv02hah0bixjyr6.png" alt="Import pandas library and create a dataset" width="720" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let’s create the bins for the sales:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--u_0NNG7v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0x0inj0ge6h880fj4xon.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--u_0NNG7v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0x0inj0ge6h880fj4xon.png" alt="Create bins for monthly intervals" width="632" height="84"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the labels:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vSs5MGQh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3z2eg253jumej201t0p1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vSs5MGQh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3z2eg253jumej201t0p1.png" alt="Create labels for each interval" width="720" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We want to categorize the sales into monthly intervals. We can achieve this by using the &lt;em&gt;pandas.cut()&lt;/em&gt; method:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--plT4sXip--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jws4gwoe5yrxr6pb9xo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--plT4sXip--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jws4gwoe5yrxr6pb9xo4.png" alt="Categorize sales data into monthly intervals using pandas.cut() method" width="720" height="63"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And that’s the result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LdBXx2aS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hhabumra3ql6x55vjux9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LdBXx2aS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hhabumra3ql6x55vjux9.png" alt="Print the resulting data frame" width="657" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Grouping values into intervals is useful for plotting concise charts, in this case with monthly_sales on the x-axis. It makes the chart more compact and easier to read.&lt;/p&gt;

&lt;p&gt;This is crucial when presenting data, as explained in &lt;strong&gt;&lt;a href="https://amzn.to/3HaAV0D"&gt;Storytelling with Data&lt;/a&gt;&lt;/strong&gt;, the definitive handbook on how to communicate effectively with data.&lt;/p&gt;

&lt;p&gt;Find it &lt;strong&gt;&lt;a href="https://amzn.to/3HaAV0D"&gt;here&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In conclusion, &lt;em&gt;pandas.cut()&lt;/em&gt; is a powerful function in the pandas library that lets you sort the values of a continuous variable into intervals.&lt;/p&gt;

&lt;p&gt;By using this method, you can create categorical variables for data analysis and draw insights from raw data.&lt;/p&gt;

&lt;p&gt;If you want to learn more about data analysis with Python, I highly recommend the following books:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://amzn.to/3H9EDrm"&gt;Python for Data Analysis 3e: Data Wrangling with pandas, NumPy, and Jupyter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://amzn.to/40I9dQ4"&gt;Think Python: How to Think Like a Computer Scientist&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://amzn.to/3oEzL7m"&gt;Learning Python: Powerful Object-Oriented Programming&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://amzn.to/41Oyk4q"&gt;Time Series Forecasting in Python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>analytics</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
