boroskoyo

Posted on Aug 11, 2022 • Edited on Aug 31, 2022 • Originally published at Medium

Production Debuggers — 2022 Benchmark Results

#java #benchmark #debug #cloud

Production Debuggers — 2022 Benchmark Results — Part-1

Passive impact of agents under high load

Case1: Passive impact under high load

Story Behind the Research

We develop Sidekick to give developers new ways to collect data from their running applications. As we work to make live debugging and observability easier for developers, our performance impact is among the most frequently asked questions. To answer it, we ran a benchmark study to measure how much overhead Sidekick and its competitors, Lightrun and Rookout, add to an application.

This benchmarking research consists of 3 parts.

Case1: Passive impact under high load
Case2: Comparing Sidekick & Lightrun & Rookout load test performance with active tracepoints
Case3: Comparing Sidekick and Rookout performance with active tracepoints with 1000-hit limits under high load

All cases consist of load tests, which we run with JMeter. You can check it out here: https://jmeter.apache.org/

To measure the performance impact fairly, we chose the pet clinic example app provided by the Spring community itself. Below you can see the system’s overall design.

If you want to repeat this test yourself, the repo below contains both the pet clinic example and our JMeter test plans (JMX files):
https://github.com/runsidekick/sidekick-load-test

JMeter & Throughput Shaping Timer plugin links:
https://jmeter.apache.org/
https://jmeter-plugins.org/wiki/ThroughputShapingTimer/

Hardware info:

We run the tests on AWS EC2 instances with default settings. You can find the image details below.


{
  "Images": [
    {
      "Architecture": "x86_64",
      "CreationDate": "2022-04-28T21:51:14.000Z",
      "ImageId": "ami-0ca285d4c2cda3300",
      "ImageLocation": "amazon/amzn2-ami-kernel-5.10-hvm-2.0.20220426.0-x86_64-gp2",
      "ImageType": "machine",
      "Public": true,
      "OwnerId": "137112412989",
      "PlatformDetails": "Linux/UNIX",
      "UsageOperation": "RunInstances",
      "State": "available",
      "BlockDeviceMappings": [
        {
          "DeviceName": "/dev/xvda",
          "Ebs": {
            "DeleteOnTermination": true,
            "SnapshotId": "snap-0ce66497b69246e3d",
            "VolumeSize": 8,
            "VolumeType": "gp2",
            "Encrypted": false
          }
        }
      ],
      "Description": "Amazon Linux 2 Kernel 5.10 AMI 2.0.20220426.0 x86_64 HVM gp2",
      "EnaSupport": true,
      "Hypervisor": "xen",
      "ImageOwnerAlias": "amazon",
      "Name": "amzn2-ami-kernel-5.10-hvm-2.0.20220426.0-x86_64-gp2",
      "RootDeviceName": "/dev/xvda",
      "RootDeviceType": "ebs",
      "SriovNetSupport": "simple",
      "VirtualizationType": "hvm"
    }
  ]
}

AWS offers many EC2 instance types. Amazon EC2 C5 instances are compute-optimized and aimed at compute-intensive workloads. We used c5.4xlarge with the default configuration for all our cases. You can find the details for this instance type below. If you would like to learn more about EC2 C5 instances, you can check out the official AWS website.


{
  "InstanceTypes": [
    {
      "InstanceType": "c5.4xlarge",
      "CurrentGeneration": true,
      "FreeTierEligible": false,
      "SupportedUsageClasses": [
        "on-demand",
        "spot"
      ],
      "SupportedRootDeviceTypes": [
        "ebs"
      ],
      "SupportedVirtualizationTypes": [
        "hvm"
      ],
      "BareMetal": false,
      "Hypervisor": "nitro",
      "ProcessorInfo": {
        "SupportedArchitectures": [
          "x86_64"
        ],
        "SustainedClockSpeedInGhz": 3.4
      },
      "VCpuInfo": {
        "DefaultVCpus": 16,
        "DefaultCores": 8,
        "DefaultThreadsPerCore": 2,
        "ValidCores": [
          2,
          4,
          6,
          8
        ],
        "ValidThreadsPerCore": [
          1,
          2
        ]
      },
      "MemoryInfo": {
        "SizeInMiB": 32768
      },
      "InstanceStorageSupported": false,
      "EbsInfo": {
        "EbsOptimizedSupport": "default",
        "EncryptionSupport": "supported",
        "EbsOptimizedInfo": {
          "BaselineBandwidthInMbps": 4750,
          "BaselineThroughputInMBps": 593.75,
          "BaselineIops": 20000,
          "MaximumBandwidthInMbps": 4750,
          "MaximumThroughputInMBps": 593.75,
          "MaximumIops": 20000
        },
        "NvmeSupport": "required"
      },
      "NetworkInfo": {
        "NetworkPerformance": "Up to 10 Gigabit",
        "MaximumNetworkInterfaces": 8,
        "MaximumNetworkCards": 1,
        "DefaultNetworkCardIndex": 0,
        "NetworkCards": [
          {
            "NetworkCardIndex": 0,
            "NetworkPerformance": "Up to 10 Gigabit",
            "MaximumNetworkInterfaces": 8
          }
        ],
        "Ipv4AddressesPerInterface": 30,
        "Ipv6AddressesPerInterface": 30,
        "Ipv6Supported": true,
        "EnaSupport": "required",
        "EfaSupported": false
      },
      "PlacementGroupInfo": {
        "SupportedStrategies": [
          "cluster",
          "partition",
          "spread"
        ]
      },
      "HibernationSupported": true,
      "BurstablePerformanceSupported": false,
      "DedicatedHostsSupported": true,
      "AutoRecoverySupported": true
    }
  ]
}

For the databases we use db.t2.micro MySQL instances. T2 instances are burstable general-purpose instances that provide a baseline level of CPU performance with the ability to burst above the baseline. T2 instances are a good choice for a variety of database workloads, including microservices and test and staging databases. The hardware details for the t2.micro instance type are shown below. For more information, you can visit the official AWS website.


{
  "InstanceTypes": [
    {
      "InstanceType": "t2.micro",
      "CurrentGeneration": true,
      "FreeTierEligible": true,
      "SupportedUsageClasses": [
        "on-demand",
        "spot"
      ],
      "SupportedRootDeviceTypes": [
        "ebs"
      ],
      "SupportedVirtualizationTypes": [
        "hvm"
      ],
      "BareMetal": false,
      "Hypervisor": "xen",
      "ProcessorInfo": {
        "SupportedArchitectures": [
          "i386",
          "x86_64"
        ],
        "SustainedClockSpeedInGhz": 2.5
      },
      "VCpuInfo": {
        "DefaultVCpus": 1,
        "DefaultCores": 1,
        "DefaultThreadsPerCore": 1
      },
      "MemoryInfo": {
        "SizeInMiB": 1024
      },
      "InstanceStorageSupported": false,
      "EbsInfo": {
        "EbsOptimizedSupport": "unsupported",
        "EncryptionSupport": "supported",
        "NvmeSupport": "unsupported"
      },
      "NetworkInfo": {
        "NetworkPerformance": "Low to Moderate",
        "MaximumNetworkInterfaces": 2,
        "MaximumNetworkCards": 1,
        "DefaultNetworkCardIndex": 0,
        "NetworkCards": [
          {
            "NetworkCardIndex": 0,
            "NetworkPerformance": "Low to Moderate",
            "MaximumNetworkInterfaces": 2
          }
        ],
        "Ipv4AddressesPerInterface": 2,
        "Ipv6AddressesPerInterface": 2,
        "Ipv6Supported": true,
        "EnaSupport": "unsupported",
        "EfaSupported": false
      },
      "PlacementGroupInfo": {
        "SupportedStrategies": [
          "partition",
          "spread"
        ]
      },
      "HibernationSupported": true,
      "BurstablePerformanceSupported": true,
      "DedicatedHostsSupported": false,
      "AutoRecoverySupported": true
    }
  ]
}
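
To make the two hardware dumps easier to compare, here is a minimal Python sketch that pulls out the vCPU count, memory, and burstable flag. The embedded JSON is a trimmed copy of the output above, and the `summarize` helper is our own illustrative name, not part of any AWS tooling.

```python
import json

# Trimmed copies of the instance-type details shown above.
# Only the fields this sketch reads are kept.
C5_4XLARGE = json.loads("""
{"InstanceTypes": [{
  "InstanceType": "c5.4xlarge",
  "VCpuInfo": {"DefaultVCpus": 16},
  "MemoryInfo": {"SizeInMiB": 32768},
  "BurstablePerformanceSupported": false
}]}
""")

T2_MICRO = json.loads("""
{"InstanceTypes": [{
  "InstanceType": "t2.micro",
  "VCpuInfo": {"DefaultVCpus": 1},
  "MemoryInfo": {"SizeInMiB": 1024},
  "BurstablePerformanceSupported": true
}]}
""")


def summarize(doc):
    """Return (type, vCPUs, memory in GiB, burstable) for the first entry.

    `summarize` is a hypothetical helper written for this post.
    """
    info = doc["InstanceTypes"][0]
    return (
        info["InstanceType"],
        info["VCpuInfo"]["DefaultVCpus"],
        info["MemoryInfo"]["SizeInMiB"] / 1024,  # MiB -> GiB
        info["BurstablePerformanceSupported"],
    )


for doc in (C5_4XLARGE, T2_MICRO):
    name, vcpus, mem_gib, burstable = summarize(doc)
    print(f"{name}: {vcpus} vCPU, {mem_gib:g} GiB, burstable={burstable}")
```

The application hosts are much larger than the database hosts, and only the database instance type is burstable.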

The case:

In this case, we investigate the passive impact of the Java agents of the production debuggers in our scope.

Agents in this case:

Agentless: The pet clinic app without any agent, which we use as the reference for our comparison.
Sidekick: https://www.runsidekick.com
Rookout: https://www.rookout.com
Lightrun: https://www.lightrun.com

Each agent runs on its own EC2 instance, and we use JMeter for all our tests. The only variable that changes between setups is the agent.

We send the same number of requests to the same endpoints for each setup and observe the impact of each agent by comparing latency and throughput.
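
As an illustration of that comparison, here is a minimal Python sketch that computes throughput and latency statistics from a JMeter CSV results file. It assumes the file has a header row, that `timeStamp` is the request start in epoch milliseconds, and that `elapsed` is the response time in milliseconds. The sample rows and the `summarize_jtl` helper are invented for illustration and are not our benchmark data.

```python
import csv
import io
import statistics


def summarize_jtl(csv_text):
    """Compute count, throughput, and latency stats from JTL-style CSV text.

    Assumes `timeStamp` (request start, epoch ms) and `elapsed`
    (response time, ms) columns are present.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    starts = [int(r["timeStamp"]) for r in rows]
    elapsed = [int(r["elapsed"]) for r in rows]
    # Duration runs from the first request start to the last request end.
    ends = [s + e for s, e in zip(starts, elapsed)]
    duration_s = (max(ends) - min(starts)) / 1000
    return {
        "count": len(rows),
        "throughput_per_s": len(rows) / duration_s,
        "min_ms": min(elapsed),
        "median_ms": statistics.median(elapsed),
        "max_ms": max(elapsed),
    }


# Made-up sample rows, only to show the expected shape of the input.
SAMPLE = """timeStamp,elapsed,label,responseCode,success
1000,10,owners,200,true
1200,30,owners,200,true
1500,20,vets,200,true
1980,20,vets,200,true
"""

stats = summarize_jtl(SAMPLE)
print(stats)
```

Running the same summary over the results of each setup gives numbers that can be compared side by side against the agentless reference.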

Below you can find our JMeter setup for the load test:

The pattern above (shaping-timer-load) is repeated for each setup. You can download the .jmx file from here: https://github.com/runsidekick/sidekick-load-test/blob/master/petclinic-app/src/test/jmeter/petclinic_test_plan.jmx

Passive Impact Benchmark Results

Agentless (Ref):

Reference Statistics

Lightrun:

Lightrun Statistics

Rookout:

Rookout Statistics

Sidekick:

Sidekick Statistics

Results Summary:

Statistics comparison table

The lowest throughput is 982.30 transactions/s, which is only 1.14% lower than the reference value. This shows how small the impact is when the agents are in their passive state. Furthermore, the Min and Median latency values are almost identical, and the graphs show that the Max values are usually one-time occurrences that do not represent a negative outcome.
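
The percentage above is a simple relative drop. The Python sketch below shows the arithmetic, using only the two figures quoted in the text (982.30 and 1.14%). The reference throughput it prints is back-calculated from those two numbers, so treat it as an implied value rather than a measured one. Both function names are our own.

```python
def percent_drop(reference, measured):
    """Relative throughput drop of `measured` versus `reference`, in percent."""
    return (reference - measured) / reference * 100


def implied_reference(measured, drop_percent):
    """Reference throughput implied by a measured value and its percent drop."""
    return measured / (1 - drop_percent / 100)


lowest_tps = 982.30    # lowest Transactions/s quoted in the text
reported_drop = 1.14   # percent below the reference, quoted in the text

# Back-calculated, not measured: derived only from the two figures above.
reference_tps = implied_reference(lowest_tps, reported_drop)

print(f"implied reference: {reference_tps:.2f} transactions/s")
print(f"drop: {percent_drop(reference_tps, lowest_tps):.2f}%")
```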

Conclusion:

All 3 alternatives and the agentless instance performed almost identically. This means you can ship your applications with any of these live debugger agents without worrying about performance losses.

Bear in mind that this research was done with the agents idle. To be more specific, we attached the agents but did not use the products yet. We did not set any tracepoints or send any requests from the agents.

In this research, we came to the conclusion that the Sidekick, Rookout, and Lightrun agents all add negligible overhead compared to the application without any agent attached.

This means these 3 Java agents cause no meaningful performance loss while idle. From another perspective, production debuggers differ from APMs or error/bug tracking tools in terms of performance.

Sidekick is going open source to allow self-hosting and make live debugging more accessible.

Subscribe and get the latest news from Sidekick Open-Source here:

https://www.producthunt.com/upcoming/sidekick-open-source-collect-traces-and-generate-logs-on-demand-without-stopping-redeploying-your-applications
