DEV Community: Akshay Siwal

Interview Question: How to Analyze iostat Output

Akshay Siwal — Tue, 17 Dec 2024 21:04:54 +0000

Scenario: Diagnosing Disk I/O Latency

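You suspect that a disk is experiencing high latency during peak traffic. To monitor real-time disk performance, you run iostat with -y, which skips the initial since-boot report so that every report reflects current activity: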

iostat -y -x 5 3

Output:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle  
           1.50    0.00    0.40    0.10    0.00   98.00  

Device             rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await  r_await  w_await  svctm  %util  
sda                 0.00     1.00    10.00   20.00     0.50     1.00    100.00     0.50   25.00    20.00    30.00   5.00   15.00  

avg-cpu:  %user   %nice %system %iowait  %steal   %idle  
           2.00    0.00    0.50    0.20    0.00   97.30  

Device             rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await  r_await  w_await  svctm  %util  
sda                 0.00     2.00    15.00   25.00     0.75     1.25    120.00     0.60   30.00    25.00    35.00   6.00   20.00

What is iostat?

iostat is a Linux/Unix command-line utility that provides detailed statistics about CPU usage and input/output (I/O) performance of storage devices (disks, partitions, or logical volumes). It is part of the sysstat package and is widely used by system administrators and SREs to diagnose performance bottlenecks related to disk I/O and CPU utilization.

Let’s go through the full output of iostat step by step, explaining each section and metric in detail.

Command Used

iostat -x 5 2

-x: Displays extended statistics for devices.
5: Interval in seconds between reports.
2: Number of reports (including the first one). Without -y, the first report shows averages since boot rather than for the last 5 seconds.

Example Output

Linux 5.15.0-73-generic (hostname)   12/18/2024  _x86_64_    (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.50    0.00    1.00    5.00    0.00   91.50

Device             rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await  r_await  w_await  svctm  %util
sda                 0.00     1.00    10.00   20.00     0.50     1.00    100.00     0.50   25.00    20.00    30.00   5.00   15.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.00    0.00    1.50    4.00    0.00   91.50

Device             rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await  r_await  w_await  svctm  %util
sda                 0.00     2.00    15.00   25.00     0.75     1.25    120.00     0.60   30.00    25.00    35.00   6.00   20.00

Section 1: CPU Statistics

The first section of the output shows CPU utilization metrics.

Header:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

Metrics Explained:

%user: Percentage of CPU time spent on user processes (non-kernel processes).
%nice: Percentage of CPU time spent on user processes with a "nice" priority (low-priority tasks).
%system: Percentage of CPU time spent on kernel/system processes.
%iowait: Percentage of time the CPU was idle while the system had outstanding disk I/O requests. It is a form of idle time, not busy time, and ordinary network traffic is not counted.
%steal: Percentage of CPU time "stolen" by the hypervisor for other virtual machines (in virtualized environments).
%idle: Percentage of CPU time spent idle (not doing any work).

Example:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.50    0.00    1.00    5.00    0.00   91.50

Interpretation:
- 2.5% of the CPU is being used by user processes.
- 1% is being used by system/kernel processes.
- For 5% of the time the CPU was idle while waiting for disk I/O to complete (worth watching; sustained higher values could indicate a disk bottleneck).
- 91.5% of the CPU is idle, meaning there is plenty of CPU capacity available.

Section 2: Device I/O Statistics

The second section provides detailed statistics for each storage device (e.g., /dev/sda).

Header:

Device             rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await  r_await  w_await  svctm  %util

Metrics Explained:

Device: The name of the storage device (e.g., sda, nvme0n1).
rrqm/s: The number of read requests merged per second. If multiple queued read requests target adjacent blocks, they are merged into one request.
wrqm/s: The number of write requests merged per second.
r/s: The number of read requests completed per second.
w/s: The number of write requests completed per second.
rMB/s: The amount of data read from the device per second (in megabytes).
wMB/s: The amount of data written to the device per second (in megabytes).
avgrq-sz: The average size of I/O requests (in sectors). Larger values indicate larger I/O operations.
avgqu-sz: The average number of I/O requests in the queue. Higher values indicate more queuing and potential contention.
await: The average time (in milliseconds) for I/O requests to be completed, including both queue time and service time.
- High await values indicate that I/O operations are taking too long, which could be due to disk contention or slow storage.
r_await: The average time (in milliseconds) for read requests to be completed.
w_await: The average time (in milliseconds) for write requests to be completed.
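svctm: The average service time (in milliseconds) for I/O requests. This is meant to be the time the device spends servicing requests, excluding queue time. The iostat man page warns that this field is not reliable, and newer sysstat releases no longer print it, so treat it as a rough hint.
%util: The percentage of time the device was busy handling I/O requests. If this value is close to 100%, the device is saturated and may be a bottleneck. For devices that serve requests in parallel, such as SSDs and RAID arrays, %util can reach 100% before the device is truly saturated.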

Example:

Device             rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await  r_await  w_await  svctm  %util
sda                 0.00     1.00    10.00   20.00     0.50     1.00    100.00     0.50   25.00    20.00    30.00   5.00   15.00

Interpretation:
- rrqm/s and wrqm/s: Very low values (0.00 and 1.00), meaning there is little merging of I/O requests.
- r/s and w/s: The device is handling 10 read requests and 20 write requests per second.
- rMB/s and wMB/s: The device is reading 0.5 MB/s and writing 1 MB/s.
- avgrq-sz: The average request size is 100 sectors (50 KB per request, as 1 sector = 512 bytes).
- avgqu-sz: The average queue size is 0.5, meaning there is some queuing but not excessive.
- await: The average time for I/O requests is 25 ms, which is relatively high and could indicate a performance issue.
- r_await: Read requests take 20 ms on average.
- w_await: Write requests take 30 ms on average.
- svctm: The service time is 5 ms, meaning the device itself is fast, but the queuing time is causing delays.
- %util: The device is 15% utilized, so it is not saturated.
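
The avgrq-sz arithmetic above can be checked with a small C sketch. The helper names (sectors_to_kib, print_request_size) are hypothetical, and the code assumes the 512-byte sector that iostat uses for this column.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes per sector as used by iostat's avgrq-sz column (assumption: 512). */
#define SECTOR_BYTES 512.0

/* Convert an avgrq-sz value (in sectors) to kibibytes per request. */
double sectors_to_kib(double sectors) {
    assert(sectors >= 0.0);
    return sectors * SECTOR_BYTES / 1024.0;
}

/* Print the average request size for one device row. */
void print_request_size(const char *device, double avgrq_sz) {
    printf("%s: %.2f sectors = %.2f KiB per request\n",
           device, avgrq_sz, sectors_to_kib(avgrq_sz));
}
```

Calling print_request_size("sda", 100.0) reports 50.00 KiB per request, which matches the 50 KB figure in the interpretation.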

Second Report

The second report shows updated statistics after 5 seconds.

Device             rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s  avgrq-sz avgqu-sz   await  r_await  w_await  svctm  %util
sda                 0.00     2.00    15.00   25.00     0.75     1.25    120.00     0.60   30.00    25.00    35.00   6.00   20.00

Changes:
- r/s and w/s: Read and write requests have increased to 15 and 25 per second, respectively.
- rMB/s and wMB/s: Read and write throughput have increased to 0.75 MB/s and 1.25 MB/s.
- await: The average time for I/O requests has increased to 30 ms, indicating worsening performance.
- %util: The device utilization has increased to 20%, meaning the disk is busier.

How to Use This Data

High await and %util:
- If await is high and %util is close to 100%, the disk is likely a bottleneck.
- Solution: Upgrade to faster storage (e.g., SSDs) or optimize the application to reduce disk I/O.
High avgqu-sz:
- If avgqu-sz is high, it indicates queuing, which could be due to contention or insufficient IOPS.
- Solution: Increase IOPS (e.g., provisioned IOPS on AWS EBS) or reduce the number of concurrent I/O operations.
Low svctm but High await:
- If svctm is low but await is high, the delay is in the queue rather than the device itself.
- Solution: Investigate the application or workload causing excessive I/O.
High r_await or w_await:
- If read or write latency is significantly higher than the other, it could indicate a specific issue with read or write operations.
- Solution: Optimize the workload (e.g., caching for reads, batching for writes).
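
The rules above can be sketched as a small C classifier. This is only an illustration: the struct, the enum, the function name, and above all the numeric thresholds are assumptions that do not come from iostat itself, so tune them for your own storage.

```c
#include <assert.h>
#include <stddef.h>

/* One row of `iostat -x` output, reduced to the fields used below. */
struct io_sample {
    double avgqu_sz; /* average queue length */
    double await;    /* ms, queue time plus service time */
    double r_await;  /* ms, reads only */
    double w_await;  /* ms, writes only */
    double svctm;    /* ms, service time only */
    double util;     /* percent of time the device was busy */
};

enum io_verdict {
    IO_OK,
    IO_SATURATED,   /* high await and %util near 100 */
    IO_DEEP_QUEUE,  /* high avgqu-sz */
    IO_QUEUE_DELAY, /* low svctm but high await */
    IO_SLOW_READS,  /* r_await far above w_await */
    IO_SLOW_WRITES  /* w_await far above r_await */
};

/* Hypothetical cut-offs chosen for illustration only. */
#define AWAIT_HIGH_MS 20.0
#define UTIL_HIGH_PCT 90.0
#define QUEUE_HIGH    2.0
#define SVCTM_LOW_MS  10.0
#define SKEW_FACTOR   2.0

/* Apply the rules in the order they are listed; the first match wins. */
enum io_verdict diagnose(const struct io_sample *s) {
    assert(s != NULL);
    if (s->await >= AWAIT_HIGH_MS && s->util >= UTIL_HIGH_PCT)
        return IO_SATURATED;
    if (s->avgqu_sz >= QUEUE_HIGH)
        return IO_DEEP_QUEUE;
    if (s->await >= AWAIT_HIGH_MS && s->svctm <= SVCTM_LOW_MS)
        return IO_QUEUE_DELAY;
    if (s->r_await > SKEW_FACTOR * s->w_await)
        return IO_SLOW_READS;
    if (s->w_await > SKEW_FACTOR * s->r_await)
        return IO_SLOW_WRITES;
    return IO_OK;
}
```

With the first sample row (avgqu-sz 0.50, await 25, svctm 5, %util 15), diagnose returns IO_QUEUE_DELAY under these assumed thresholds, which agrees with the interpretation given earlier: the device is not saturated, but requests are waiting.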

Conclusion

The iostat command provides a wealth of information about CPU and disk performance. By understanding the metrics and their relationships, you can diagnose performance bottlenecks and take corrective actions. In this example, await is high and rising while %util is only 15-20%, so the disk is not yet saturated; the delay comes mostly from queuing, and the workload needs further investigation or optimization.

Tech Interview Series: What Happens When You malloc 2 GB but Don't Use It?

Akshay Siwal — Tue, 17 Dec 2024 14:41:23 +0000

What Happens When You malloc 2 GB but Don’t Use It?

malloc reserves virtual memory, not physical memory:
- When you call malloc(2GB), the operating system reserves 2 GB of address space for your process in the virtual memory.
- No physical RAM is allocated yet because you haven’t accessed (touched) the memory.
RES and VIRT behavior:
- VIRT will increase by 2 GB because the reserved memory adds to your process's virtual address space.
- RES will not increase at this point because physical memory (RAM) has not been allocated.
When does RES increase?
- Physical memory is only allocated when you touch (read/write) the memory.
  - For example, if you write to a page of the allocated memory, the kernel will allocate a physical RAM page to your process. This will reflect in the RES size.

Question:

If I malloc 2 GB of memory but don’t use it (i.e., the pages are allocated virtually but not yet touched), will it count toward the RES (resident memory)? And since the memory hasn't been touched, can other processes still use that space?

1. If I malloc 2 GB and do not use it, will it be counted in RES?

No, it will not be counted in RES.

Here’s why:

When you call malloc(2GB), the operating system reserves virtual address space for the requested memory.
However, no physical memory (RAM) is allocated until you actually access or "touch" those pages.
This is due to the lazy allocation strategy used by modern operating systems. Pages are only backed by physical memory (loaded into RAM) when they are accessed for the first time.

In the top command output:

VIRT will increase by 2 GB because the virtual memory address space has been reserved.
RES will remain unchanged because no physical memory is used yet.

2. What happens when you “touch” the memory?

The first time you write to a page in the allocated memory:

The operating system generates a page fault.
It assigns a physical page (RAM) to the virtual address space.
That page now counts towards the RES value.

So, RES only grows as you actually use the memory.
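
To watch this page by page, here is a minimal Linux-only sketch that asks the kernel which pages of a mapping are resident, using mincore(). The helper names (reserve, touch_pages, resident_pages) are hypothetical, and the code maps anonymous memory with mmap directly, which is an assumption standing in for a large malloc.

```c
#define _DEFAULT_SOURCE /* exposes mincore() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of anonymous memory without touching any of it. */
void *reserve(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Write one byte to each of the first `npages` pages to fault them in. */
void touch_pages(void *addr, size_t npages) {
    assert(addr != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *p = addr;
    for (size_t i = 0; i < npages; i++)
        p[i * page] = 1;
}

/* Count how many pages of [addr, addr + len) are resident in RAM.
 * Returns 0 if the query itself fails. */
size_t resident_pages(void *addr, size_t len) {
    assert(addr != NULL && len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    size_t count = 0;

    if (vec != NULL && mincore(addr, len, vec) == 0) {
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1; /* low bit set means the page is resident */
    }
    free(vec);
    return count;
}
```

On a typical Linux system, resident_pages should report no resident pages right after reserve and at least as many as were touched after touch_pages. The count can be higher than the number of pages touched if transparent huge pages are enabled.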

3. Can other processes use the untouched memory (from malloc)?

Yes, absolutely.

Here’s why:

When you call malloc, you are only reserving virtual address space in your process. The physical memory (RAM) is not yet allocated.
Since the physical RAM is not committed to your process, it remains free for other processes to use.
Until you “touch” the memory (write to it), the operating system doesn’t allocate RAM to it.

Analogy to Simplify

Think of the OS as a hotel manager and memory as rooms:

Virtual Address Space: You’ve “booked” 2 GB of rooms (via malloc), but the hotel manager only writes your name in the reservation book.
Physical Memory: Rooms in the hotel (RAM). The manager doesn’t hand over any rooms to you yet.
Touching Memory: When you enter the rooms (write to the memory), the manager actually allocates rooms (physical memory) for you.
Until you use the rooms, they are still available for other guests (processes).
Your name in the reservation book (VIRT) just says you’ve reserved them if needed.

Quick Example with Code

Here’s a C example to test this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    const size_t size = (size_t)2 * 1024 * 1024 * 1024; // 2 GB (needs a 64-bit build)
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);  // Usually 4096 bytes

    printf("PID: %d\n", (int)getpid());
    printf("Allocating 2 GB using malloc...\n");

    char *ptr = malloc(size); // Reserves virtual address space only
    if (!ptr) {
        perror("malloc failed");
        return 1;
    }

    printf("Press Enter to touch memory...\n");
    getchar(); // Pause to check top command before touching

    for (size_t i = 0; i < size; i += page) {
        ptr[i] = 1; // Write one byte per page so every page is faulted in
    }

    printf("Memory touched. Press Enter to exit...\n");
    getchar(); // Pause to check top command after touching

    free(ptr);
    return 0;
}

Steps to Test:

Compile and run the program: gcc test.c -o test && ./test
Note the PID and check the top command output (for example, top -p <PID>):

Before touching memory:
- VIRT increases by 2 GB.
- RES remains small.
After touching memory:
- RES increases to reflect the actual physical memory used.
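
If you prefer to print the numbers from inside the program instead of switching to top, the following sketch reads them from /proc/self/status. It is Linux-specific, the helper names (status_kb, print_memory) are hypothetical, and it assumes that the VmSize and VmRSS fields correspond to top's VIRT and RES columns.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value in kB of a field such as "VmRSS" from /proc/self/status,
 * or -1 if the file or the field is unavailable (for example, off Linux). */
long status_kb(const char *field) {
    assert(field != NULL);
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;

    char line[256];
    size_t n = strlen(field);
    long kb = -1;
    while (fgets(line, sizeof line, f)) {
        /* Lines look like "VmRSS:      1234 kB". */
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            sscanf(line + n + 1, "%ld", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}

/* Print both counters with a label, e.g. before and after touching memory. */
void print_memory(const char *label) {
    printf("%s: VIRT (VmSize) = %ld kB, RES (VmRSS) = %ld kB\n",
           label, status_kb("VmSize"), status_kb("VmRSS"));
}
```

Calling print_memory("before touch") right after malloc and print_memory("after touch") after the loop should show VmSize jumping at allocation time and VmRSS growing only after the pages are written.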

Conclusion

malloc without usage:
- Increases VIRT, but not RES
- Physical RAM is not allocated.
Other processes:
- The untouched memory is still free for other processes to use.
- Physical RAM is only committed to your process when you "touch" the memory.
Key Point:
- malloced but unused memory increases VIRT but not RES.
- Virtual memory reservation (VIRT) does not mean physical memory usage (RES).
- Physical memory is allocated lazily – only when you access the memory.

How to Choose the Right AWS Region: Key Factors and Real-Life Lessons

Akshay Siwal — Mon, 28 Oct 2024 00:31:27 +0000

Picking an AWS Region may seem like a small detail at first, but it's one of the most impactful decisions you can make in setting up your cloud infrastructure. Getting it right means better performance, compliance, and cost savings; getting it wrong can lead to unforeseen challenges. Here's a straightforward guide to the top factors you should consider, with real-life examples of companies who've nailed it - or learned the hard way.

1. Stay Compliant with Data Laws and Governance Rules

Why It Matters: Choosing an AWS Region means deciding where your data is stored. For industries with tight data residency laws, like finance or healthcare, this decision is critical. AWS Regions keep your data in specific locations, so it won't leave the region without your approval.
When It Works: Salesforce leveraged AWS to launch its Hyperforce infrastructure, which allows the company to deploy its services in various regions while ensuring compliance with local data residency regulations. This strategic move enables Salesforce to quickly adapt to data sovereignty laws in different countries. Read more here.
When It Doesn't:

Google found itself in hot water when it didn't offer similar regional storage flexibility, which led to regulatory pushback in Europe. They faced hefty fines and damage to user trust. Read more here.
Meta (Facebook) incurred a €1.2 billion fine under GDPR for transferring EU user data to the United States, highlighting how much data location matters when you choose a region. Read more here.

Pro Tip: Before choosing your region, check in with your legal or compliance team to ensure you're aligning with any local regulations.

2. Choose a Region Close to Your Customers for Faster Performance

Why It Matters: Just like how having a warehouse near your customers reduces delivery times, placing your servers close to users reduces the time it takes for data to travel. This leads to smoother, faster user experiences.
When It Works: Netflix does this well, hosting data near key user populations across the globe to minimize buffering and keep viewers engaged. Read more here.
When It Doesn't: Snapchat struggled initially due to geographical distance from users, resulting in high latency and frustrating app performance. Read more here.

Pro Tip: Use AWS's network tools to test latency for different regions, and choose one close to your main users to ensure great performance.

3. Ensure the Services You Need Are Available in Your Region

Why It Matters: Not every AWS service is available in every region. Sometimes, new or niche services, like machine learning or real-time applications, are only offered in select regions first.
When It Works: Airbnb chose AWS Regions where all the services it needed were ready and fully supported, allowing them to offer a seamless experience for travelers without delays. Read more here.
When It Doesn't: Some startups in Europe faced delays when their chosen AWS region lacked essential services, driving up costs and slowing projects.

Pro Tip: The AWS Regional Service Availability Guide is an excellent resource for finding the AWS services you need and verifying their availability in your selected region.

4. Check the Regional Pricing Differences

Why It Matters: AWS costs vary by region due to factors like local operating expenses. Just like some cities are pricier to live in than others, some AWS Regions are pricier to run in.
When It Works: Expedia made use of regional pricing differences by selecting regions where they could maximize cost savings without compromising on quality. Read more here.
When It Doesn't: Smaller companies have learned this lesson the hard way, finding themselves over budget after choosing pricier regions and later moving to less costly options for sustainability.

Pro Tip: The AWS Pricing Calculator is a great tool to compare costs by region and help you choose a budget-friendly option.

Final Thoughts

Choosing an AWS Region is a strategic decision that will impact your project for years to come. From compliance and latency to service availability and cost, each factor affects your application's performance and your bottom line. By carefully weighing each of these aspects and learning from the experiences of others, you can make the best choice for your business and your users.
Stay Informed: Check AWS's global infrastructure updates to track new region launches and service availability for optimal planning.

Is Your Data Really Safe in the Cloud?

Akshay Siwal — Wed, 22 Jul 2020 01:13:26 +0000

I am sure you and the company you work for never want to see such a message on the company's website. Unfortunately, for some companies this worst nightmare has become reality, and a few of them were on the verge of losing their business until they recovered or negotiated with the hackers.

One sad story

Ashley Madison was hacked on 15 July 2015 by a group called "The Impact Team", which threatened to expose users' identities if its parent company, Avid Life Media, did not shut down Ashley Madison and its sister site, Established Men. A few users committed suicide after their highly personal data was made public via torrent. The story does not end here. It is 2020 now, and Ashley Madison users are still being blackmailed.

After an attack, the consequences vary depending on what the company does, but one side effect is always there.

Any guesses?

Yes, you are correct. The company loses its customers' trust, which is not built in a day.

Enough of this sad story. Let us end it here and think about how you can prevent this from happening to your product. These days most companies offer SaaS and run in the Cloud because the Cloud gives them a lot of flexibility. However, just as all good things come with side effects, the Cloud is no exception: it is easy to inadvertently expose Cloud resources to the public.

We already know how much damage a publicly exposed resource can cause a company. That is why every company has a security team that proactively scans for unauthorized access and for resources accidentally left open to the world, and bombards teams with Jira tickets asking for explanations.

In my opinion, security is just perception, and no product is 100% secure. A product is either hard or easy to hack, so what matters is how hard we can make ours to hack.

Coming to the main point that motivated me to write this blog: my very first blog on medium.com was about AWS EBS. One day my manager paged me to find all the EC2 instances with unencrypted EBS volumes and to encrypt them as soon as possible, without affecting any production service and without changing any EC2 private IP. While working on this task, out of curiosity, I ended up Googling hacks that had happened because of unencrypted EBS, and the results were scary. The case I highlighted above was one of them, where people at Ashley Madison made two mistakes. First, they hardcoded secrets in source code, and second, all this critical information was unencrypted. If the EBS volumes had been encrypted, hackers would have had a hard time getting this data, even with a publicly accessible EBS snapshot, unless they also got access to its KMS keys.

Let me tell you, one unencrypted, publicly accessible snapshot can make you live your worst nightmare. A public snapshot! Now you might think I am an idiot. Who does that? Right?

Wait. Do not judge too early. Read this report on Ben Morris's findings, which he presented at DEF CON 27 in August 2019: several major private companies and even federal agencies were unknowingly exposing sensitive data such as admin passwords, application keys, and VPN configurations, which can be exploited to tunnel into their corporate networks.

This nightmare is real

This nightmare is real. Most of the time, we are just one misconfiguration away from a potential hack. If you are still not convinced, read this interesting comment from "Hacker News", which describes a case where a user needs to share an EBS snapshot between two AWS accounts.

Even if someone exposes a snapshot publicly for just a couple of minutes, hackers have bots planted in every region that look for such exposed snapshots and copy them as soon as they see them. Hackers can then create an EBS volume from the copy and attach it to an EC2 instance to view its data, including any secrets or hardcoded API keys present in it.

Now, consider a scenario where you have a use-case that requires you to share a snapshot with other accounts. While implementing this use-case, you realize that the EBS volume contains confidential information, so you delete this sensitive information from the volume before creating the snapshot. SMART!

What if I tell you that others can still see the file you deleted before creating the snapshot? They can still see your sensitive information. Scary, isn't it?

Here is proof of what I have just said

I created an EC2 instance with Ubuntu AMI ami-0caae0b310f01ff33, which had an EBS volume of 8 GB. For the demonstration, I created a file production.json with some fake credentials under the /home/ubuntu/test_directory directory.

To prove the point, I deleted production.json and created a snapshot of this EBS volume. Let's say I accidentally shared it with an unintended person or made it public.

To simulate a hacker, I created an EBS volume from the snapshot snap-0f4a66cec80757, which I had created after deleting production.json, and attached this volume to another EC2 instance.

On the new EC2 instance, I ran the lsblk command to see the device name of the new volume, which is nvme1n1 in this case, and mounted it at the /data_from_snapshot directory. Once it is mounted, /data_from_snapshot appears in the output of the df command.

Now we are almost set to steal the victim's confidential data. All we need is a utility that can recover deleted data, such as extundelete or testdisk, chosen based on the underlying filesystem.

I used extundelete because my test EC2 instance has an ext4 filesystem. extundelete puts all recovered files in a directory called RECOVERED_FILES, with a directory structure similar to that of the original volume. Therefore, the RECOVERED_FILES directory is all I need to examine to see the deleted files.

In this example, we already know we deleted file production.json containing secrets under /home/ubuntu/test_directory directory, so we should check a similar directory structure inside the RECOVERED_FILES directory.

Voilà! Now I can see the password and other secrets from production.json, a file that was not even present when the snapshot was created. This means information can leak not just from files currently on the EBS volume but also from old, deleted files. Therefore, never share a snapshot if your EBS volume ever held any confidential data.

Why can a hacker see it?

When we delete a file, it only gets unlinked. The delete command only breaks the link between the name and the inode and marks the inode as unused so that it can be reused. It does not wipe the data from the filesystem blocks. As long as those data blocks have not been overwritten by new data, it may be possible to recover the data, depending on the underlying filesystem. There are lots of details on how the operating system and filesystem work behind the scenes when we delete a file and what they check. I will discuss these in detail in my future blogs.
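
The split between a file's name and its data can be seen in a few lines of C. This is a minimal sketch for POSIX systems, and the function name read_after_unlink is hypothetical. It demonstrates only that unlink() removes the name while the data survives; it is not how extundelete recovers files, since in this demo the open descriptor keeps the inode alive, whereas recovery tools look for blocks that have been freed but not yet overwritten.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Create `path`, write `secret` into it, delete it with unlink(), and then
 * read it back through the descriptor that is still open.
 * Returns the number of bytes recovered into `out`, or -1 on error. */
ssize_t read_after_unlink(const char *path, const char *secret,
                          char *out, size_t out_size) {
    assert(path && secret && out && out_size > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(secret);
    if (write(fd, secret, len) != (ssize_t)len || unlink(path) != 0) {
        close(fd);
        return -1;
    }

    /* The name is gone now, but the inode and its data blocks still exist. */
    ssize_t n = -1;
    if (lseek(fd, 0, SEEK_SET) != (off_t)-1)
        n = read(fd, out, out_size - 1);
    if (n >= 0)
        out[n] = '\0';
    close(fd);
    return n;
}
```

After the call, the path no longer exists in the directory, yet the buffer holds the original contents. Deleting a file removes a directory entry; it does not erase the bytes.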

What can we do to avoid such scenarios?

AWS provides a straightforward solution: encrypted EBS volumes. Even if you accidentally share an encrypted EBS snapshot with unintended users or AWS accounts, only users with appropriate permissions on the KMS key used for encryption can see what is inside it. Data is encrypted before it leaves the EC2 instance, which secures data at rest as well as data in transit between an instance and its attached EBS storage.
I will share how you can automatically find and encrypt unencrypted-volumes without impacting your production services in my future blogs.

Impact on performance

As per AWS, an encrypted EBS volume delivers the same IOPS performance as an unencrypted one. However, encryption does add some overhead to I/O requests, since data is encrypted before it leaves the EC2 instance. Note that EBS encryption is not supported on every instance type (some older generations lack it), so check the AWS documentation for the instance type you use.

Final Note

Always use encrypted EBS volumes, as encrypted volumes automatically produce encrypted snapshots, and never share a snapshot containing sensitive data with anyone you do not trust unless you have an unavoidable use-case. Even then, think twice, because you are just one mistake away from being hacked.
