<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ryan Perry</title>
    <description>The latest articles on DEV Community by Ryan Perry (@ryan_perry_aa806d7a49198e).</description>
    <link>https://dev.to/ryan_perry_aa806d7a49198e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F567971%2F1027a605-d1f4-435c-8616-86de19f7acdc.jpg</url>
      <title>DEV Community: Ryan Perry</title>
      <link>https://dev.to/ryan_perry_aa806d7a49198e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ryan_perry_aa806d7a49198e"/>
    <language>en</language>
    <item>
      <title>How to Debug Ruby Performance Issues Using Profiling</title>
      <dc:creator>Ryan Perry</dc:creator>
      <pubDate>Fri, 31 Dec 2021 18:50:14 +0000</pubDate>
      <link>https://dev.to/ryan_perry_aa806d7a49198e/how-to-debug-ruby-performance-issues-using-profiling-2210</link>
      <guid>https://dev.to/ryan_perry_aa806d7a49198e/how-to-debug-ruby-performance-issues-using-profiling-2210</guid>
      <description>&lt;h3&gt;
  
  
  Pyroscope Rideshare Example
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135726784-0c367d3f-c9e5-4e3f-91be-761d4d6d21b1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135726784-0c367d3f-c9e5-4e3f-91be-761d4d6d21b1.gif" alt="ruby_example_architecture_05"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: For documentation on the Pyroscope Ruby gem, visit &lt;a href="https://pyroscope.io/docs/ruby/" rel="noopener noreferrer"&gt;Pyroscope's website&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;This example shows a simplified, basic use case of Pyroscope. We simulate a "ride share" company which has three endpoints found in &lt;code&gt;server.rb&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/bike&lt;/code&gt;    : calls the &lt;code&gt;order_bike(search_radius)&lt;/code&gt; function to order a bike&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/car&lt;/code&gt;     : calls the &lt;code&gt;order_car(search_radius)&lt;/code&gt; function to order a car&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/scooter&lt;/code&gt; : calls the &lt;code&gt;order_scooter(search_radius)&lt;/code&gt; function to order a scooter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also simulate running 3 distinct servers in 3 different regions (via &lt;a href="https://github.com/pyroscope-io/pyroscope/blob/main/examples/ruby/docker-compose.yml" rel="noopener noreferrer"&gt;docker-compose.yml&lt;/a&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;us-east-1&lt;/li&gt;
&lt;li&gt;us-west-1&lt;/li&gt;
&lt;li&gt;eu-west-1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of the most useful capabilities of Pyroscope is the ability to tag your data in a way that is meaningful to you. In this case, we have two natural divisions, and so we "tag" our data to represent those:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;region&lt;/code&gt;: statically tags the region of the server running the code&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vehicle&lt;/code&gt;: dynamically tags the endpoint (similar to how one might tag a controller in Rails)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tagging static region
&lt;/h2&gt;

&lt;p&gt;Tagging something static, like the &lt;code&gt;region&lt;/code&gt;, can be done in the initialization code in the &lt;code&gt;config.tags&lt;/code&gt; variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pyroscope.configure do |config|
  config.app_name = "ride-sharing-app"
  config.server_address = "http://pyroscope:4040"
  config.tags = {
    "region": ENV["REGION"],                     # Tags the region based on the environment variable
  }
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tagging dynamically within functions
&lt;/h2&gt;

&lt;p&gt;Tagging something more dynamic, like the &lt;code&gt;vehicle&lt;/code&gt; tag, can be done inside our utility &lt;code&gt;find_nearest_vehicle()&lt;/code&gt; function using a &lt;code&gt;Pyroscope.tag_wrapper&lt;/code&gt; block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def find_nearest_vehicle(n, vehicle)
  Pyroscope.tag_wrapper({ "vehicle" =&amp;gt; vehicle }) do
    # ...code to find nearest vehicle
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What this block does is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Adds the tag &lt;code&gt;{ "vehicle" =&amp;gt; "car" }&lt;/code&gt; (here, when ordering a car)&lt;/li&gt;
&lt;li&gt;Executes the body of &lt;code&gt;find_nearest_vehicle()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Removes the tag &lt;code&gt;{ "vehicle" =&amp;gt; "car" }&lt;/code&gt; behind the scenes when the block ends, since it no longer applies&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Resulting flamegraph / performance results from the example
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Running the example
&lt;/h3&gt;

&lt;p&gt;To run the example, run the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pull latest pyroscope image:
docker pull pyroscope/pyroscope:latest

# Run the example project:
docker-compose up --build

# Reset the database (if needed):
# docker-compose down
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example runs all the code mentioned above and also sends some mock load to the 3 servers and their respective 3 endpoints. If you select our application, &lt;code&gt;ride-sharing-app.cpu&lt;/code&gt;, from the dropdown, you should see a flamegraph like the one below. After giving the flamegraph 20-30 seconds to update and clicking the refresh button, we see our 3 functions at the bottom of the flamegraph taking CPU resources &lt;em&gt;proportional to the size&lt;/em&gt; of their respective &lt;code&gt;search_radius&lt;/code&gt; parameters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where's the performance bottleneck?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F139566972-2f04b826-d05c-4307-9b60-4376840001ab.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F139566972-2f04b826-d05c-4307-9b60-4376840001ab.jpg" alt="ruby_first_slide_01-01"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first step when analyzing a profile output by your application is to take note of the &lt;em&gt;largest node&lt;/em&gt;, which is where your application is spending the most resources. In this case, it happens to be the &lt;code&gt;order_car&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;The benefit of using the Pyroscope package is that we can now investigate further into &lt;em&gt;why&lt;/em&gt; the &lt;code&gt;order_car()&lt;/code&gt; function is problematic. Tagging both &lt;code&gt;region&lt;/code&gt; and &lt;code&gt;vehicle&lt;/code&gt; allows us to test two good hypotheses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Something is wrong with the &lt;code&gt;/car&lt;/code&gt; endpoint code&lt;/li&gt;
&lt;li&gt;Something is wrong with one of our regions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To analyze this we can select one or more tags from the "Select Tag" dropdown:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135525308-b81e87b0-6ffb-4ef0-a6bf-3338483d0fc4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135525308-b81e87b0-6ffb-4ef0-a6bf-3338483d0fc4.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Narrowing in on the Issue Using Tags
&lt;/h2&gt;

&lt;p&gt;Knowing there is an issue with the &lt;code&gt;order_car()&lt;/code&gt; function, we naturally select that tag first. Then, after inspecting multiple &lt;code&gt;region&lt;/code&gt; tags, it becomes clear from the timeline that there is an issue with the &lt;code&gt;us-west-1&lt;/code&gt; region, which alternates between high-CPU and low-CPU periods.&lt;/p&gt;

&lt;p&gt;We can also see that the &lt;code&gt;mutex_lock()&lt;/code&gt; function is consuming almost 70% of CPU resources during this time period. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F139566994-f3f8c2f3-6bc4-40ca-ac4e-8fc862d0c0ad.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F139566994-f3f8c2f3-6bc4-40ca-ac4e-8fc862d0c0ad.jpg" alt="ruby_second_slide_01"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing two time periods
&lt;/h2&gt;

&lt;p&gt;Using Pyroscope's "comparison view" we can select two different time ranges from the timeline to compare the resulting flamegraphs. The pink selection on the left timeline produces the left flamegraph, and the blue selection on the right produces the right flamegraph.&lt;/p&gt;

&lt;p&gt;When we select a period of low-CPU utilization and a period of high-CPU utilization, we can see clearly different behavior in the &lt;code&gt;mutex_lock()&lt;/code&gt; function, which takes &lt;strong&gt;23% of CPU&lt;/strong&gt; during low-CPU times and &lt;strong&gt;70% of CPU&lt;/strong&gt; during high-CPU times.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F139567004-96064c5b-570c-48a4-aa9a-07a46a0646a5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F139567004-96064c5b-570c-48a4-aa9a-07a46a0646a5.jpg" alt="ruby_third_slide_01-01"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing Diff Between Two Flamegraphs
&lt;/h2&gt;

&lt;p&gt;While the difference &lt;em&gt;in this case&lt;/em&gt; is stark enough to see in the comparison view, sometimes the diff between two flamegraphs is better visualized by overlaying them on each other. Without changing any parameters, we can simply select the diff view tab and see the difference represented in a color-coded diff flamegraph.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F139567016-3f738923-2429-4f93-8fe0-cc0ca8c765fd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F139567016-3f738923-2429-4f93-8fe0-cc0ca8c765fd.jpg" alt="ruby_fourth_slide_01-01"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  More use cases
&lt;/h3&gt;

&lt;p&gt;We have been beta testing this feature with several different companies; some of the ways we've seen them tag their performance data include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tagging controllers&lt;/li&gt;
&lt;li&gt;Tagging regions&lt;/li&gt;
&lt;li&gt;Tagging jobs from a Redis or Sidekiq queue&lt;/li&gt;
&lt;li&gt;Tagging commits&lt;/li&gt;
&lt;li&gt;Tagging staging / production environments&lt;/li&gt;
&lt;li&gt;Tagging different parts of their testing suites&lt;/li&gt;
&lt;li&gt;Etc...&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Future Roadmap
&lt;/h3&gt;

&lt;p&gt;We would love for you to try out this example and see how you can adapt it to your Ruby application. Continuous profiling has become an increasingly popular tool for monitoring and debugging performance issues (arguably the fourth pillar of observability).&lt;/p&gt;

&lt;p&gt;We'd love to continue improving this gem by adding integrations with popular tools, memory profiling, and more, and we would love to hear what features &lt;em&gt;you would like to see&lt;/em&gt;.&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>performance</category>
      <category>observability</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to Profile Python Code</title>
      <dc:creator>Ryan Perry</dc:creator>
      <pubDate>Wed, 08 Dec 2021 18:13:31 +0000</pubDate>
      <link>https://dev.to/ryan_perry_aa806d7a49198e/how-to-profile-python-code-4f2a</link>
      <guid>https://dev.to/ryan_perry_aa806d7a49198e/how-to-profile-python-code-4f2a</guid>
      <description>&lt;h3&gt;
  
  
  Pyroscope Rideshare Example
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135728737-0c5e54ca-1e78-4c6d-933c-145f441c96a9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135728737-0c5e54ca-1e78-4c6d-933c-145f441c96a9.gif" alt="python_example_architecture_05_00"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: For documentation on the Pyroscope pip package, visit &lt;a href="https://pyroscope.io/docs/python/" rel="noopener noreferrer"&gt;our website&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;In this example we show a simplified, basic use case of Pyroscope. We simulate a "ride share" company which has three endpoints found in &lt;code&gt;server.py&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/bike&lt;/code&gt;    : calls the &lt;code&gt;order_bike(search_radius)&lt;/code&gt; function to order a bike&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/car&lt;/code&gt;     : calls the &lt;code&gt;order_car(search_radius)&lt;/code&gt; function to order a car&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/scooter&lt;/code&gt; : calls the &lt;code&gt;order_scooter(search_radius)&lt;/code&gt; function to order a scooter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also simulate running 3 distinct servers in 3 different regions (via &lt;a href="https://github.com/pyroscope-io/pyroscope/blob/main/examples/python/docker-compose.yml" rel="noopener noreferrer"&gt;docker-compose.yml&lt;/a&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;us-east-1&lt;/li&gt;
&lt;li&gt;us-west-1&lt;/li&gt;
&lt;li&gt;eu-west-1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of the most useful capabilities of Pyroscope is the ability to tag your data in a way that is meaningful to you. In this case, we have two natural divisions, and so we "tag" our data to represent those:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;region&lt;/code&gt;: statically tags the region of the server running the code&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vehicle&lt;/code&gt;: dynamically tags the endpoint (similar to how one might tag a controller in Rails)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tagging static region
&lt;/h2&gt;

&lt;p&gt;Tagging something static, like the &lt;code&gt;region&lt;/code&gt;, can be done in the initialization code in the &lt;code&gt;config.tags&lt;/code&gt; variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pyroscope.configure(
    app_name       = "ride-sharing-app",
    server_address = "http://pyroscope:4040",
    tags           = {
        "region":   f'{os.getenv("REGION")}', # Tags the region based on the environment variable
    }
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tagging dynamically within functions
&lt;/h2&gt;

&lt;p&gt;Tagging something more dynamic, like the &lt;code&gt;vehicle&lt;/code&gt; tag, can be done inside our utility &lt;code&gt;find_nearest_vehicle()&lt;/code&gt; function using a &lt;code&gt;with pyroscope.tag_wrapper()&lt;/code&gt; block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def find_nearest_vehicle(n, vehicle):
    with pyroscope.tag_wrapper({ "vehicle": vehicle }):
        i = 0
        start_time = time.time()
        while time.time() - start_time &amp;lt; n:
            i += 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What this block does is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Adds the tag &lt;code&gt;{ "vehicle": "car" }&lt;/code&gt; (here, when ordering a car)&lt;/li&gt;
&lt;li&gt;Executes the body of &lt;code&gt;find_nearest_vehicle()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Removes the tag &lt;code&gt;{ "vehicle": "car" }&lt;/code&gt; behind the scenes when the block ends, since it no longer applies&lt;/li&gt;
&lt;/ol&gt;
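&lt;p&gt;To make that add-execute-remove lifecycle concrete, here is a minimal, self-contained sketch of a tag-scoping context manager. This illustrates the behavior described above, not Pyroscope's actual implementation; the &lt;code&gt;tag_scope&lt;/code&gt; helper and &lt;code&gt;active_tags&lt;/code&gt; dict are made-up names.&lt;/p&gt;

```python
# Illustrative sketch only: tag_scope and active_tags are hypothetical
# names, not part of the pyroscope package.
import contextlib

active_tags = {}  # tags currently attached to outgoing profile samples

@contextlib.contextmanager
def tag_scope(tags):
    """Attach tags on entry, detach them when the block exits."""
    active_tags.update(tags)
    try:
        yield
    finally:
        for key in tags:
            active_tags.pop(key, None)

def find_nearest_vehicle(n, vehicle):
    # n (the search radius) is unused in this sketch
    with tag_scope({"vehicle": vehicle}):
        # ...search work happens here; samples taken now carry the tag
        snapshot = dict(active_tags)
    return snapshot

tagged = find_nearest_vehicle(0.2, "car")
print(tagged)       # tag was present while the block ran
print(active_tags)  # tag is gone after the block ended
```

&lt;p&gt;The &lt;code&gt;finally&lt;/code&gt; clause is what guarantees the tag is removed even if the wrapped code raises, which is why a context-manager API is a natural fit for scoped tags.&lt;/p&gt;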

&lt;h2&gt;
  
  
  Resulting flamegraph / performance results from the example
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Running the example
&lt;/h3&gt;

&lt;p&gt;To run the example, run the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pull latest pyroscope image:
docker pull pyroscope/pyroscope:latest

# Run the example project:
docker-compose up --build

# Reset the database (if needed):
# docker-compose down
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example runs all the code mentioned above and also sends some mock load to the 3 servers and their respective 3 endpoints. If you select our application, &lt;code&gt;ride-sharing-app.cpu&lt;/code&gt;, from the dropdown, you should see a flamegraph like the one below. After giving the flamegraph 20-30 seconds to update and clicking the refresh button, we see our 3 functions at the bottom of the flamegraph taking CPU resources &lt;em&gt;proportional to the size&lt;/em&gt; of their respective &lt;code&gt;search_radius&lt;/code&gt; parameters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where's the performance bottleneck?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135881284-c75a5b65-6151-44fb-a459-c1f9559cb51a.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135881284-c75a5b65-6151-44fb-a459-c1f9559cb51a.jpg" alt="python_first_slide_05"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first step when analyzing a profile output by your application is to take note of the &lt;em&gt;largest node&lt;/em&gt;, which is where your application is spending the most resources. In this case, it happens to be the &lt;code&gt;order_car&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;The benefit of using the Pyroscope package is that we can now investigate further into &lt;em&gt;why&lt;/em&gt; the &lt;code&gt;order_car()&lt;/code&gt; function is problematic. Tagging both &lt;code&gt;region&lt;/code&gt; and &lt;code&gt;vehicle&lt;/code&gt; allows us to test two good hypotheses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Something is wrong with the &lt;code&gt;/car&lt;/code&gt; endpoint code&lt;/li&gt;
&lt;li&gt;Something is wrong with one of our regions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To analyze this we can select one or more tags from the "Select Tag" dropdown:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135525308-b81e87b0-6ffb-4ef0-a6bf-3338483d0fc4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135525308-b81e87b0-6ffb-4ef0-a6bf-3338483d0fc4.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Narrowing in on the Issue Using Tags
&lt;/h2&gt;

&lt;p&gt;Knowing there is an issue with the &lt;code&gt;order_car()&lt;/code&gt; function, we naturally select that tag first. Then, after inspecting multiple &lt;code&gt;region&lt;/code&gt; tags, it becomes clear from the timeline that there is an issue with the &lt;code&gt;us-west-1&lt;/code&gt; region, which alternates between high-CPU and low-CPU periods.&lt;/p&gt;

&lt;p&gt;We can also see that the &lt;code&gt;mutex_lock()&lt;/code&gt; function is consuming almost 70% of CPU resources during this time period. &lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135805908-ae9a1650-51fc-457a-8c47-0b56e8538b08.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135805908-ae9a1650-51fc-457a-8c47-0b56e8538b08.jpg" alt="python_second_slide_05"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing two time periods
&lt;/h2&gt;

&lt;p&gt;Using Pyroscope's "comparison view" we can select two different time ranges from the timeline to compare the resulting flamegraphs. The pink selection on the left timeline produces the left flamegraph, and the blue selection on the right produces the right flamegraph.&lt;/p&gt;

&lt;p&gt;When we select a period of low-CPU utilization and a period of high-CPU utilization, we can see clearly different behavior in the &lt;code&gt;mutex_lock()&lt;/code&gt; function, which takes &lt;strong&gt;51% of CPU&lt;/strong&gt; during low-CPU times and &lt;strong&gt;78% of CPU&lt;/strong&gt; during high-CPU times.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135805969-55fdee40-fe0c-412d-9ec0-0bbc6a748ed4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135805969-55fdee40-fe0c-412d-9ec0-0bbc6a748ed4.jpg" alt="python_third_slide_05"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing Diff Between Two Flamegraphs
&lt;/h2&gt;

&lt;p&gt;While the difference &lt;em&gt;in this case&lt;/em&gt; is stark enough to see in the comparison view, sometimes the diff between two flamegraphs is better visualized by overlaying them on each other. Without changing any parameters, we can simply select the diff view tab and see the difference represented in a color-coded diff flamegraph.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135805986-594ffa3b-e735-4f91-875d-4f76fdff2b60.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F23323466%2F135805986-594ffa3b-e735-4f91-875d-4f76fdff2b60.jpg" alt="python_fourth_slide_05"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  More use cases
&lt;/h3&gt;

&lt;p&gt;We have been beta testing this feature with several different companies; some of the ways we've seen them tag their performance data include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tagging controllers&lt;/li&gt;
&lt;li&gt;Tagging regions&lt;/li&gt;
&lt;li&gt;Tagging jobs from a Redis / Sidekiq / RabbitMQ queue&lt;/li&gt;
&lt;li&gt;Tagging commits&lt;/li&gt;
&lt;li&gt;Tagging staging / production environments&lt;/li&gt;
&lt;li&gt;Tagging different parts of their testing suites&lt;/li&gt;
&lt;li&gt;Etc...&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Future Roadmap
&lt;/h3&gt;

&lt;p&gt;We would love for you to try out this example and see how you can adapt it to your Python application. Continuous profiling has become an increasingly popular tool for monitoring and debugging performance issues (arguably the fourth pillar of observability).&lt;/p&gt;

&lt;p&gt;We'd love to continue improving this pip package by adding integrations with popular tools, memory profiling, and more, and we would love to hear what features &lt;em&gt;you would like to see&lt;/em&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>performance</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>O(log n) makes continuous profiling possible in production</title>
      <dc:creator>Ryan Perry</dc:creator>
      <pubDate>Thu, 11 Mar 2021 03:33:13 +0000</pubDate>
      <link>https://dev.to/ryan_perry_aa806d7a49198e/o-log-n-makes-continuous-profiling-possible-in-production-1377</link>
      <guid>https://dev.to/ryan_perry_aa806d7a49198e/o-log-n-makes-continuous-profiling-possible-in-production-1377</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LF-Z92hR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/110414341-8ad0c000-8044-11eb-9628-7b24e50295b2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LF-Z92hR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/110414341-8ad0c000-8044-11eb-9628-7b24e50295b2.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  O(log n) makes continuous profiling possible
&lt;/h1&gt;

&lt;p&gt;Pyroscope is software that lets you &lt;strong&gt;continuously&lt;/strong&gt; profile your code to debug performance issues down to a single line of code. With just a few lines of setup, it will do the following:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pyroscope Agent
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Polls the stack trace every 0.01 seconds to see which functions are consuming resources&lt;/li&gt;
&lt;li&gt;Batches that data into 10-second blocks and sends it to the Pyroscope server&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pyroscope Server
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Receives data from the Pyroscope agent and processes it to be stored efficiently&lt;/li&gt;
&lt;li&gt;Pre-aggregates profiling data for fast querying when data needs to be retrieved&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Storage Efficiency
&lt;/h2&gt;

&lt;p&gt;The challenge with continuous profiling is that if you just take frequent chunks of profiling data, compress them, and store them somewhere, you end up with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Too much data to store efficiently&lt;/li&gt;
&lt;li&gt;Too much data to query quickly&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We solve these problems by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Using a combination of tries and trees to compress data efficiently&lt;/li&gt;
&lt;li&gt;Using segment trees to return queries for any timespan of data in O(log n) vs O(n) time complexity&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 1: Turning the profiling data into a tree
&lt;/h2&gt;

&lt;p&gt;The simplest way to represent profiling data is as a list of strings, each one representing a stack trace and the number of times that particular stack trace was seen during a profiling session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;server.py&lt;span class="p"&gt;;&lt;/span&gt;fast_function&lt;span class="p"&gt;;&lt;/span&gt;work 2
server.py&lt;span class="p"&gt;;&lt;/span&gt;slow_function&lt;span class="p"&gt;;&lt;/span&gt;work 8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first obvious thing we do is turn this data into a tree. Conveniently, this representation also makes it easy to later generate flamegraphs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2uX_sN4B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110378930-0f065180-800b-11eb-9357-71724bc7258c.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2uX_sN4B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110378930-0f065180-800b-11eb-9357-71724bc7258c.gif" alt="raw_vs_flame_graph"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Compressing the stack traces into trees saves space on repeated elements. By using trees, we go from storing common paths like &lt;code&gt;net/http.request&lt;/code&gt; in the db multiple times to storing them only once, with a reference to the location where each is stored. This is fairly standard among profiling libraries, since it's the lowest-hanging fruit when it comes to optimizing storage of profiling data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UQ9aAloi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110227218-e109fb80-7eaa-11eb-81a8-cdf2b3944f1c.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UQ9aAloi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110227218-e109fb80-7eaa-11eb-81a8-cdf2b3944f1c.gif" alt="fast-compress-stack-traces"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Adding tries to store individual symbols more efficiently
&lt;/h2&gt;

&lt;p&gt;Now that we've compressed the raw profiling data by converting it into a tree, many of the nodes in this compressed tree contain symbols that still share repeated elements with other nodes. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;net/http.request;net/io.read 100 samples
net/http.request;net/io.write 200 samples
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While the &lt;code&gt;net/http.request&lt;/code&gt;, &lt;code&gt;net/io.read&lt;/code&gt;, and &lt;code&gt;net/io.write&lt;/code&gt; functions differ, they share the common prefix &lt;code&gt;net/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Each of these lines can be serialized using a prefix tree (trie) as follows. Instead of storing the same prefixes multiple times, we store them once in the trie and access them via a pointer to their position in memory:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xaw7jVir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110520399-446e7600-80c3-11eb-84e9-ecac7c0dbf23.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xaw7jVir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110520399-446e7600-80c3-11eb-84e9-ecac7c0dbf23.gif" alt="storage-design-0"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this basic example we save ~80% of the space, going from 39 bytes to 8 bytes. Typically, symbol names are much longer, and as the number of symbols grows, storage requirements grow logarithmically rather than linearly.&lt;/p&gt;
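To make the prefix-sharing idea concrete, here is a minimal character-level trie in Python. This is an illustration only (Pyroscope's actual storage engine is written in Go, and its node layout is more sophisticated than this sketch):

```python
class TrieNode:
    """One node per character; symbols with a shared prefix share nodes."""
    def __init__(self):
        self.children = {}

def insert(root, symbol):
    """Insert a symbol name into the trie character by character."""
    node = root
    for ch in symbol:
        node = node.children.setdefault(ch, TrieNode())

def count_nodes(node):
    """Total nodes in the trie, root included."""
    return 1 + sum(count_nodes(c) for c in node.children.values())

root = TrieNode()
insert(root, "net/io.read")   # 11 characters
insert(root, "net/io.write")  # 12 characters, but "net/io." is already stored

# Naive storage: 11 + 12 = 23 characters.
# Trie storage: the 7-character prefix "net/io." is stored once.
print(count_nodes(root) - 1)  # → 16
```

Storing only 16 nodes instead of 23 characters is a modest win here, but with thousands of long, similarly-prefixed symbol names the shared prefixes dominate and the savings compound.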

&lt;h2&gt;
  
  
  Step 1 + 2: Combining the trees with the tries
&lt;/h2&gt;

&lt;p&gt;In the end, by using a tree to compress the raw profiling data and then using tries to compress the symbols, we get the following storage amounts for our simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| data type           | bytes |
|---------------------|-------|
| raw data            | 93    |
| tree                | 58    |
| tree + trie         | 10    |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, this is a 9x improvement for a fairly trivial case. In real-world scenarios, the compression factor gets much larger.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UxAbZcE2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110262208-ca75aa00-7f67-11eb-8f16-0572a4641ee1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UxAbZcE2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110262208-ca75aa00-7f67-11eb-8f16-0572a4641ee1.gif" alt="combine-segment-and-prefix_1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Optimizing for fast reads using Segment Trees
&lt;/h2&gt;

&lt;p&gt;Now that we have a way of storing the data efficiently, the next problem is how to query it efficiently. We solve this by pre-aggregating the profiling data and storing it in a special segment tree.&lt;/p&gt;

&lt;p&gt;Every 10 seconds, the Pyroscope agent sends a chunk of profiling data to the server, which writes it into the database with the corresponding timestamp. You'll notice that each write happens once but is replicated multiple times.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Each layer represents time blocks of larger units; in this case, for every two 10s time blocks, one 20s time block is created. This makes reading the data more efficient (more on that in a second)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LD3m0ZTh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110259555-196a1200-7f5d-11eb-9223-218bb4b34c6b.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LD3m0ZTh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110259555-196a1200-7f5d-11eb-9223-218bb4b34c6b.gif" alt="segment_tree_animation_1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Turn reads from O(n) to O(log n)
&lt;/h2&gt;

&lt;p&gt;If you don't use segment trees and just write data in 10-second chunks, the read time complexity becomes a function of how many 10s units the query spans. If you want one year of data, you'll have to merge roughly 3,154,000 trees representing the profiling data. By using segment trees, you effectively decrease the number of merge operations from O(n) to O(log n).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HAtKBEGI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110277713-b98a6000-7f8a-11eb-942f-3a924a6e0b09.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HAtKBEGI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://user-images.githubusercontent.com/23323466/110277713-b98a6000-7f8a-11eb-942f-3a924a6e0b09.gif" alt="segment_tree_reads"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Help us add more profilers
&lt;/h2&gt;

&lt;p&gt;We spent a lot of time on solving this storage / querying problem because we wanted to make software that can do truly continuous profiling in production without causing too much overhead.&lt;/p&gt;

&lt;p&gt;While Pyroscope currently supports 4 languages, we would love to add more.&lt;/p&gt;

&lt;p&gt;Any sampling profiler that can export data in the "raw" format linked above can become a profiling agent for Pyroscope. We'd love your help building out profilers for other languages!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[x] Go&lt;/li&gt;
&lt;li&gt;[x] Python&lt;/li&gt;
&lt;li&gt;[x] eBPF&lt;/li&gt;
&lt;li&gt;[x] Ruby&lt;/li&gt;
&lt;li&gt;[ ] &lt;a href="https://github.com/pyroscope-io/pyroscope/issues/94"&gt;Java&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[ ] &lt;a href="https://github.com/pyroscope-io/pyroscope/issues/83#issuecomment-784947654"&gt;Rust&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[ ] &lt;a href="https://github.com/pyroscope-io/pyroscope/issues/8"&gt;Node&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[ ] &lt;a href="https://github.com/pyroscope-io/pyroscope/issues/30"&gt;PHP&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to help contribute or need help setting up Pyroscope, here's how you can reach us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Join our &lt;a href="https://pyroscope.io/slack"&gt;Slack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Set up a time to meet with us &lt;a href="https://pyroscope.io/setup-call"&gt;here&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Write an &lt;a href="https://github.com/pyroscope-io/pyroscope/issues"&gt;issue&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://twitter.com/PyroscopeIO"&gt;Twitter&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>performance</category>
      <category>devops</category>
      <category>monitoring</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Debug Performance issues in Python with Pyroscope (OSS)</title>
      <dc:creator>Ryan Perry</dc:creator>
      <pubDate>Tue, 26 Jan 2021 20:47:58 +0000</pubDate>
      <link>https://dev.to/ryan_perry_aa806d7a49198e/debug-performance-issues-in-python-with-pyroscope-oss-3chb</link>
      <guid>https://dev.to/ryan_perry_aa806d7a49198e/debug-performance-issues-in-python-with-pyroscope-oss-3chb</guid>
      <description>&lt;h1&gt;
  
  
  How to Debug Performance Issues in Python with Profilers
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Using flame graphs to get to the root of the problem
&lt;/h2&gt;

&lt;p&gt;I know from personal experience that debugging performance issues on Python servers can be incredibly frustrating. Usually, increased traffic or a transient bug would cause end users to report that something was wrong. &lt;/p&gt;

&lt;p&gt;More often than not, it's &lt;em&gt;impossible&lt;/em&gt; to exactly replicate the conditions under which the bug occurred, and so I was stuck trying to figure out which part of our code or infrastructure was responsible for the performance issue on our server.&lt;/p&gt;

&lt;p&gt;This article explains how to use flame graphs to continuously profile your code and reveal exactly which lines are responsible for those pesky performance issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why you should care about CPU performance
&lt;/h2&gt;

&lt;p&gt;CPU utilization is a metric of application performance commonly used by companies that run their software in the cloud (e.g. on AWS, Google Cloud, etc.).&lt;/p&gt;

&lt;p&gt;In fact, Netflix performance architect Brendan Gregg mentioned that decreasing CPU usage by even 1% is seen as an enormous improvement because of the resource savings that occur at that scale. However, smaller companies can see similar benefits when improving performance because regardless of size, CPU is often directly correlated with two very important facets of running software:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How much money you're spending on servers - The more CPU resources you need, the more it costs to run servers&lt;/li&gt;
&lt;li&gt;End-user experience - The more load placed on your server's CPUs, the slower your website or server becomes &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So when you see a graph of CPU utilization that looks like this:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ybxXC8nb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/105662478-aa40ce80-5e84-11eb-800a-57735c688fc9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ybxXC8nb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/105662478-aa40ce80-5e84-11eb-800a-57735c688fc9.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During the period of 100% CPU utilization, you can assume:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;End-users are having a frustrating experience (i.e. the app or website is loading slowly)&lt;/li&gt;
&lt;li&gt;Server costs will increase after you provision new servers to handle the additional load&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question is: &lt;strong&gt;which part of the code is responsible for the increase in CPU utilization?&lt;/strong&gt; That's where flame graphs come in!&lt;/p&gt;
&lt;h2&gt;
  
  
  How to use flame graphs to debug performance issues (and save $66,000 on servers)
&lt;/h2&gt;

&lt;p&gt;Let's say the flame graph below represents the timespan that corresponds with the "incident" in the picture above where CPU usage spiked. During this spike, the server's CPUs were spending:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;75% of time in &lt;code&gt;foo()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;25% of time in &lt;code&gt;bar()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;$100,000 on server costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uUxqZQVu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/105620812-75197b00-5db5-11eb-92af-33e356d9bb42.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uUxqZQVu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/105620812-75197b00-5db5-11eb-92af-33e356d9bb42.png" alt="pyro_python_blog_example_00-01"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can think of a flame graph as a super-detailed pie chart, where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The width of the flame graph represents 100% of the time range&lt;/li&gt;
&lt;li&gt;Each node represents a function&lt;/li&gt;
&lt;li&gt;The biggest nodes are taking up most of the CPU resources&lt;/li&gt;
&lt;li&gt;Each node is called by the node above it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this case, &lt;code&gt;foo()&lt;/code&gt; is taking up 75% of the total time range, so we can improve &lt;code&gt;foo()&lt;/code&gt; and the functions it calls in order to decrease our CPU usage (and save $$).&lt;/p&gt;
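The relationship between node widths and sample counts can be sketched in a few lines of Python. The collapsed-stack input below is a hypothetical example made up for illustration (frames joined by `;`, mapped to sample counts):

```python
from collections import Counter

# Hypothetical profiler output in collapsed-stack format: "frame;frame" -> samples
samples = {
    "main;foo": 75,
    "main;bar": 25,
}

total = sum(samples.values())

# A function's node width is the share of samples in which it appears
# anywhere on the stack: its own time plus everything it calls.
# (Recursive frames would need de-duplication; ignored here for brevity.)
widths = Counter()
for stack, count in samples.items():
    for frame in stack.split(";"):
        widths[frame] += count

for frame, count in widths.most_common():
    print(f"{frame}: {100 * count // total}%")  # main: 100%, foo: 75%, bar: 25%
```

This is why the root node always spans the full width of the graph, while `foo()` and `bar()` split the row beneath it in proportion to their CPU time.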
&lt;h2&gt;
  
  
  Creating a flame graph and Table with Pyroscope
&lt;/h2&gt;

&lt;p&gt;To recreate this example with actual code, we'll use Pyroscope, an open-source continuous profiler built specifically for debugging performance issues. To simulate the server doing work, I've created a &lt;code&gt;work(duration)&lt;/code&gt; function that simulates work for the duration passed in. This way, we can replicate &lt;code&gt;foo()&lt;/code&gt; taking 75% of the time and &lt;code&gt;bar()&lt;/code&gt; taking 25% of the time, producing this flame graph from the code below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_wlP29sO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/105665338-acf2f200-5e8b-11eb-87b7-d94b7bdda0fc.png" class="article-body-image-wrapper"&gt;&lt;img width="897" alt="foo_75_bar_25_minutes_30" src="https://res.cloudinary.com/practicaldev/image/fetch/s--_wlP29sO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/105665338-acf2f200-5e8b-11eb-87b7-d94b7bdda0fc.png"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# where each iteration simulates CPU time
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="c1"&gt;# This would simulate a CPU running for 7.5 seconds
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;75000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# This would simulate a CPU running for 2.5 seconds
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;25000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, let's say you optimize your code to decrease &lt;code&gt;foo()&lt;/code&gt;'s time from 75000 to 8000, leaving all other portions of the code the same. The new code and flame graph would look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1EQUa438--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/105665392-cd22b100-5e8b-11eb-97cc-4dfcceb44cdc.png" class="article-body-image-wrapper"&gt;&lt;img width="935" alt="foo_25_bar_75_minutes_10" src="https://res.cloudinary.com/practicaldev/image/fetch/s--1EQUa438--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/105665392-cd22b100-5e8b-11eb-97cc-4dfcceb44cdc.png"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This would simulate a CPU running for 0.8 seconds
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# work(75000)
&lt;/span&gt;    &lt;span class="n"&gt;work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# This would simulate a CPU running for 2.5 seconds
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;25000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Improving &lt;code&gt;foo()&lt;/code&gt; saved us $66,000
&lt;/h2&gt;

&lt;p&gt;Thanks to the flame graphs, we were able to identify immediately that &lt;code&gt;foo()&lt;/code&gt; was the bottleneck in our code. After optimizing it, we were able to significantly decrease our CPU utilization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--d3idjzur--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/105666001-1a535280-5e8d-11eb-9407-c63955ba86a1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--d3idjzur--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/23323466/105666001-1a535280-5e8d-11eb-9407-c63955ba86a1.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This means your total CPU utilization decreased by about 66%. If you were paying $100,000 for your servers, you could now manage the same load for roughly $34,000.&lt;/p&gt;
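As a rough sanity check, here is the arithmetic using the simulated workloads from the code earlier (exact figures land around 67% and $33,000; the numbers above are ballpark):

```python
# Simulated CPU work before and after optimizing foo()
before = 75000 + 25000  # foo + bar
after = 8000 + 25000    # optimized foo + unchanged bar

reduction = 1 - after / before
print(f"CPU reduction: {reduction:.0%}")                      # → 67%
print(f"New server cost: ${100_000 * after / before:,.0f}")   # → $33,000
```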

</description>
      <category>opensource</category>
      <category>python</category>
      <category>performance</category>
      <category>monitoring</category>
    </item>
  </channel>
</rss>
