Ben Corcoran

Posted on • Originally published at Medium

Understanding the SAS log: Time and Switches

If you’ve worked with SAS before, you’ve seen the SAS log. It’s a beast, an unwieldy and at times unhelpful beast. But we can tame it. Well, maybe not, but we can hit it with a stick for long enough that it becomes docile around tourists. Let’s use the great work of elephant tamers and orca ‘trainers’ as inspiration. We’re diving into the world of the SAS log.

Real time vs CPU time

You’ve run SAS code before and ended up with log lines like this. We’ve looked at this log before.

Proc Sort Log Results

And, err, it took 3 minutes? Or did it take 1 minute? What? Why are there two times? I don’t… what? If you’re like me, then this was accompanied by a hammed-up, performative mental breakdown, followed by your boss taking you aside to make sure everything is okay at home. But now that we’ve had our 5 minutes of amateur dramatics, let’s dig into what these two times actually mean.

Real time

Let’s start simple first: let’s start with real time. This is a measurement of exactly how much physical time has passed since the job was started. Start a stopwatch and hit run; the time you’d get at the end of execution is the real time.

Real time is heavily dependent on the resources and load on the system. If there is a huge queue of jobs, you can expect real time to be significantly extended. If the load on the system is low, our real time depends solely on our shitty optimisation.

User CPU time

User CPU time is a slightly different measurement. It is a measurement of how much ‘time’ the job spent utilising the CPU, or, more explicitly, how much of the job occurs in the CPU. In the above example, almost ⅓ of the real time was spent inside the CPU; the rest of the time was used for read/write operations and system processes.
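To see the two numbers for yourself, any step will do. Here’s a minimal sketch, assuming the sashelp.class sample dataset that ships with SAS is available on your install:

```sas
/* A tiny step purely to generate a timing note in the log.
   sashelp.class ships with SAS, so this should run anywhere. */
proc sort data=sashelp.class out=work.class_sorted;
    by age name;
run;
```

The note the log prints for this step is where the real time and CPU time figures above come from; on a dataset this small, expect both to be fractions of a second.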

On a system with a single CPU, the user CPU time will always be less than the real time. However, if a machine has multiple processors then it’s more than possible for the user CPU time to be longer than the real time. Woah woah woooah, before the urge to go full tilt Peter Finch overtakes, let’s look at this.

The CPU time is the sum of the amount of time the job utilises in each CPU. Say you have a machine with four processors. You’ve absolutely smashed your rig/SAS setup; this thing is purring like a cat when it comes to efficient processing. You run a job that takes two seconds of real time. Your setup spreads that computational load across the four processors. If each processor spends one second working on this job then your total user CPU time is four seconds, two seconds longer than your real time.
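If you want to nudge your own setup towards that situation, threading is the usual route. A hedged sketch using the standard THREADS and CPUCOUNT= system options; work.big_table and some_key are placeholder names rather than anything from the example above:

```sas
/* Allow threaded processing and tell SAS it may use up to four CPUs.
   With the load spread across processors, the summed user CPU time
   can come out longer than the elapsed real time. */
options threads cpucount=4;

proc sort data=work.big_table out=work.big_sorted threads;
    by some_key;
run;
```

Whether the CPU time actually overtakes the real time depends on your data, your disks, and how well the step parallelises.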

Now we understand the difference between these two time stats. We’re now savant kings of the SAS log, so let’s step this up a notch.

FULLSTIMER, or ‘stats for nerds’ for nerds.

In our previous example, we got two stats outlining the performance of our proc sort: real time and user CPU time. However, there is an option we can set to give us even more information about how inefficient our programming is: FULLSTIMER.
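FULLSTIMER is a system option, so switching it on is one line. A minimal sketch, again leaning on sashelp.class:

```sas
/* Every step from here on gets the extended statistics in its log note. */
options fullstimer;

proc sort data=sashelp.class out=work.class_sorted;
    by age;
run;

/* Switch back to the brief real time / CPU time note. */
options nofullstimer;
```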

FULLSTIMER SAS Log example

Ho hoo, look at all these new stats we’ve got to inves… Jesus Christ, is that a third time measurement?! /)_-). Right, well, we’d best figure out what these ones measure as well.

System time

We know that the user CPU time is the time taken for the processor(s) to execute our job. The system time then represents the time taken by the processor(s) to execute the operating system tasks that support our job. More simply, the system time is the time spent managing your request. For example, if your job was waiting for input/output from a slow disk, it might be involuntarily swapped out of the CPU until that I/O resource became available. In a multiprocessor system, this also keeps track of the time spent splitting up and pulling together the various parallel threads. These system tasks are the ones that contribute to the system time.

Memory and OS memory

Memory is fairly straightforward: the amount of memory used in processing the job. This doesn’t include memory used for the job’s overheads, for example, running SAS manager.

OS memory is the amount of memory that the operating system released to complete the job. In other words, it’s the maximum amount of memory the OS allocated to the job.
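If you want to know the ceiling the OS memory figure is working under, the MEMSIZE system option is the usual suspect. It’s fixed at start-up, so a sketch like this only inspects it rather than changes it:

```sas
/* Print the current MEMSIZE setting (the limit on memory available
   to the SAS session) to the log. */
proc options option=memsize;
run;
```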

Paging

Right, come on, let’s look at paging. We’ve all seen it before: page faults here, page something-else there. You’ll react with fear before you realise nothing significant has happened. I can still open Excel, everything must be fine, I obviously don’t need to worry about paging.

A page is essentially memory on disk. In order to conserve space in the main memory, the operating system can store information on disk until it is required by the OS. When we need that data, it is copied from the disk into memory for the operating system to use. There are entire books devoted to paging, how it works and how it is implemented, so let’s not go into it here. Suffice to say, we’re cheating and storing some of the stuff we should be putting in memory onto the disk instead. This means the system can multitask more effectively. We can do this because not everything is required to be in memory at the same time.

Page faults

A page fault is where we get caught. That thing we pretended was in memory but was actually on the disk? Well, it’s wanted now. This means the OS has to do an I/O read from the disk in order to load in this data. If there are a high number of page faults, it’s likely that your system isn’t well optimised: each page fault requires an expensive I/O read rather than a quick dip in and out of the system memory.

Page reclaims

Some of the data we need may already be in a different area of memory when we need it. A page reclaim is a page fault that is handled entirely in memory (no I/O operation). The data has already been loaded into memory, used, and then marked as over-writable, but crucially hasn’t yet been overwritten. So we can jump in and grab that data before anyone notices.

Page swaps

The page swaps statistic represents the number of times our job is swapped out of main memory, i.e. the instructions for our job have been written to disk so another process can use our slot of the memory.

Context switches

Nerds are actual nerds, you know. Okay, so you’ve got an amazing multitasking system. We can kick out any job that is currently being processed and pick it up at a later time. This is called preemption. In this system, we deal out discrete chunks of time to each job. Now you’ve got this, what are you going to name the chunks of CPU time you’re allocating? Oh… a time slice? Really? A time slice… fine (nerds).

A context switch is when our multitasking system switches out one job for another. There are two kinds of context switch: one is voluntary, the other involuntary.

Voluntary context switches

These occur when the job hands back the CPU in the middle of processing because it is waiting on some other information, typically an I/O read. Voluntary context switches are good; they mean that the job is sharing the CPU resource with other jobs when it has to wait for a slow I/O read. Too many of them, though, might be a sign that your I/O is hampering the speed at which your system could be processing.

Involuntary context switches

These occur when our job has overstayed its welcome. If the job hasn’t completed by the end of its *sigh* time slice, then the job is kicked out, to be picked up at a later point. This can also happen if a higher-priority job comes in requesting CPU time.

Block operations

Block operations are the number of times a read or write of one buffer-sized block occurs. The buffer size, or bufsize, is a dataset option that defines the amount of data that can be transferred in a single I/O operation. By default this value is set to be optimised for sequential reads on your system. You may see improved performance for random, direct reads by increasing this value, but it will incur additional memory costs.
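Since bufsize is a dataset option, it’s set when the dataset is created. A hedged sketch, assuming sashelp.cars is available; the 128k figure is an arbitrary illustration, not a recommendation:

```sas
/* Create a copy of sashelp.cars with a larger page size, so each
   block I/O operation moves more data (at the cost of more memory
   per buffer). */
data work.cars_big_buffer(bufsize=128k);
    set sashelp.cars;
run;
```

Benchmark against your own access patterns before settling on a value.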

Input/Output block operations

Now that we’ve defined what a block operation is, it’s fairly simple to see that an input block operation is a bufsize-sized I/O read and an output block operation is a bufsize-sized I/O write. Not all of these will be disk operations, as some reads will be of data already stored in memory, like we saw when we discussed page reclaims.

So what?

Well hey, at least we know what all these things are on FULLSTIMER now. We’ve understood our SAS log a little better. We know that nerds are terrible at naming things. We also have a better understanding of how our system fares with various jobs.

High values for any of the above statistics can be indicative of poor optimisation, be that in the code, the software or the subsystems. Now that you have a slightly better understanding of the logging, you might be able to narrow down exactly where in the process your job is bottlenecking. Or you can just show off, if your friends are impressed by this sort of knowledge. If they are though, ooof mate, you need better friends.

Originally published at Benjamin Corcoran.
