AIX 3.2 Performance Monitoring and Tuning

For any system, continued customer satisfaction and purchasing decisions depend strongly on performance. The job of the performance analyst is to understand the system behavior and identify the usage of resources. The analyst's task in tuning the system is to change the way the system uses the available resources or acquire additional resources. System tuning is an iterative process of resource measurement, resource utilization and resource re-allocation.

Traditional tools on the system do not provide enough detail to analyze the system fully. AIX V3.2 provides a suite of performance tools that provide detailed reports about resource usage. This chapter discusses these tools in detail.

AIX V3.2 Enhancements

This section describes various enhancements in AIX V3.2 that provide improved performance. The following enhancements are outlined in this section:

  Page Replacement Algorithm
  Memory Load Control
  I/O Pacing
  Tuning Asynchronous Ports for File Transfer

Page Replacement Algorithm

The highlights of the page replacement algorithm in AIX V3.2 are as follows:

The page replacement algorithm has been enhanced to reduce the amount of repaging performed. The new algorithm distinguishes between the program text (computational) and the data (file) pages when choosing replacement victims.

Computational memory consists of the pages that belong to working storage segments or program text segments. File memory consists of the remaining pages. The virtual memory manager (VMM) now maintains a repage history buffer. It estimates the computational memory repaging rate (C_R) and the file memory repaging rate (F_R). When a page needs to be stolen, the values of C_R and F_R determine whether the stolen page will be a file memory page or a computational memory page. This algorithm is defined as follows:

if (C_R > F_R)
{
   Steal file memory pages
}
else
{
  Steal either file memory or computational memory pages
}

Memory Load Control

The highlights of the memory load control in AIX V3.2 are as follows:

Prior to AIX V3.2, AIX was solely a demand paging system. Each page was read and written one at a time, on demand. This yielded good performance as long as memory was not overcommitted, but performance degraded rapidly once memory became overcommitted. This could also lead to a thrashing condition, where the system loops on swapping pages in and out without performing any other useful work. This was a problem before AIX V3.2.

AIX V3.2 monitors the disk activity continuously. If it determines that a thrashing condition exists, then it will suspend the offending process. In addition, newly created processes will also be suspended. The pages of the suspended processes quickly become stale and are paged out by the page replacement facility. This releases enough memory for the active processes to execute successfully. The suspended processes are then reactivated gradually, when there is no thrashing condition. This technique is called lazy swapping.

Please note that this facility is not a substitute for sufficient system memory. The system should still be configured with enough memory that it is not overcommitted. However, if such a condition does occur, this facility allows the system to remain functional.

I/O Pacing

The highlights of I/O pacing in AIX V3.2 are as follows:

Prior to AIX V3.2, users occasionally experienced long terminal response times when another application in the system was doing large writes to disk. Because most writes are synchronous, FIFO I/O queues of several MB can build up, which can take several seconds to complete. The performance of an interactive process (for example, a query) is severely impacted if every disk read spends several seconds waiting in the queue. This was a problem before AIX V3.2.

AIX V3.2 virtual memory manager (VMM) now includes I/O pacing to control writes. I/O pacing limits the number of I/Os that can be outstanding against a file. When a process tries to write to a file that is at the high water mark, it is put to sleep until enough I/Os have completed to meet the low water mark for that file.

Use the command smit chgsys to go into the SMIT menu Change / Show Characteristics of Operating System. There you can set the high and low water marks for pending write I/Os per file by entering the number of pages.

Use the command smit chgsys to look at the current settings of these parameters. These parameters are called maxpout and minpout in the system parameter definitions, which may be listed by the following command:

# lsattr -l sys0 -E

These parameters have to be set by trial and error. I/O pacing sacrifices throughput in order to improve response time. In general, a value of 4 x (number of pages) + 1 for the parameter maxpout works well, because write-behind sends the previous four pages to disk when a logical write occurs on the fifth page.

A value of 0 for maxpout and minpout disables I/O pacing, which is the default after installation.
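
The water marks can also be set from the command line with the chdev command. The following is an illustrative sketch only; the values follow the 4 x (number of pages) + 1 rule for maxpout and are examples, not recommendations for your workload:

# chdev -l sys0 -a maxpout=33 -a minpout=24

Because this changes the sys0 attributes shown by lsattr, the new values persist across reboots.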

Tuning Asynchronous Ports for File Transfer

The highlight of asynchronous ports tuning in AIX V3.2 is as follows:

The asynchronous (async) ports (or tty ports) have primarily been used interactively so far. However, with the increased use of modems and wide area networks (WAN), the load on tty ports has steadily increased. Originally, tty ports were designed for interactive users whose typing speed normally does not exceed 10 to 15 characters per second, so the ports were tuned for output rather than input. When these ports are used for file transfers or large data transfers, this creates an enormous load on the CPU. This was the problem before AIX V3.2.

AIX V3.2 provides a shell script, fastport, which addresses the above problem. This script tunes the tty ports for raw mode file transfers. It disables most of the input and output post-processing, thus reducing the load on the CPU. In addition, up to 14 characters can be buffered on input before a CPU interrupt is issued, by defining a parameter rtrig.
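
The precise settings made by fastport are documented in the guide referenced below; as a rough sketch of the idea, raw mode tuning amounts to turning off canonical input processing, echo and output post-processing on the port with stty (the port name tty0 and the exact flag set shown are illustrative, not the actual fastport settings):

# stty -opost -isig -icanon -echo min 1 time 0 < /dev/tty0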

A complete description of the tuning process and the script fastport are found in the document AIX Performance Monitoring and Tuning Guide.

Performance Analysis and Tuning Overview

There are two primary areas of performance analysis and tuning: system management and application development. Properly designed applications can take advantage of system-specific features. If the developer has knowledge of the system and of the measurement and tuning tools, the application development job becomes much easier.

The system manager can do the following to the existing resources:

Additional resource allocation requires that the system manager first analyses the existing system to determine what and how much of additional resources are needed.

The application developer has control in the following areas:

The developer also has a choice of the resources to be used and controls the system interactions which may affect performance.

Analysis Guidelines

The system administrator should follow the guidelines below for system tuning. Not all steps are required for every situation, and some suggestions are not feasible on some systems. The guidelines are as follows:

The system administrator must define the requirements and expectations of the system. After that the goals must be prioritized based on criteria specific to the situation. Once the required resources are determined, they must be added and used appropriately. Whenever possible, the resources should be used in parallel rather than serially. For example, spread the disk activity across multiple drives to take advantage of concurrent seek operations or to reduce seek distances.

The workloads should be properly time scheduled to minimize resource requirements. The applications may be written to make more efficient use of the existing resources. Some functions and subroutines that do basically the same thing may have vastly different resource utilization.

Performance tuning in a system like AIX presents the following problems to the system administrator:

Most of the multiuser system users want a consistent and acceptable terminal response time. Some other users may forfeit good terminal response time in favor of some real time application or another high priority process. So the requirements from system to system are different.

The system may have to support more users than were originally planned. Or the storage requirements may have increased due to changes in applications. Executive decisions may also completely alter system configurations.

Complex system interactions like the paging system, mapped files and I/O affect performance. These factors need to be considered while tuning a system for performance.

Standard Tools for Performance Monitoring

In this section standard tools for performance monitoring are discussed. These tools are common to most UNIX systems. See AIX Tools for Performance Monitoring for the discussion on tools which are specific to AIX V3.2.

In this section, for each command some description and an example output is provided. For a complete description of the commands please see InfoExplorer or AIX Commands Reference.

Following are the standard tools for performance monitoring:

  ps
  vmstat
  iostat
  timex
  sar
  gprof
  sa1
  sa2


**** NOTE: **** You can access most of these commands directly by using the SMIT fastpath smit monitors.

ps Command

This section contains some example screens for different ps command outputs. The figure captions are self-explanatory.


# ps vg
PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND
0 - S 12:44 8 8 8 xx 0 0 0.1 0.0 swapper
1 - S 10:27 336 260 180 xx 21 24 0.1 0.0 /etc/init
514 - R 9033:39 0 28 8 xx 0 0 0.1 0.0 kproc
771 - S 33:00 0 32 16 xx 0 0 0.0 0.0 kproc
1028 - S 0:00 23 32 12 xx 0 0 0.0 0.0 kproc
1460 - S 0:00 4 32 8 xx 0 0 0.0 0.0 kproc
1558 pts/6 S 0:00 61 136 288 32768 302 276 0.0 1.0 -ksh
2149 - S 0:00 0 32 8 xx 0 0 0.0 0.0 kproc
2405 - S 0:21 137 132 84 32768 6 8 0.0 0.0 /usr/etc/portmap
2800 - S 3:45 2056 88 28 xx 1 4 0.0 0.0 /etc/syncd

Key fields:

STAT    Process status: S - sleeping, R - running, Z - cancelled
TIME    CPU time used by the process
PGIN    Pages paged in during the life of the process
SIZE    Virtual size of the data section on page space
RSS     KB of real memory used by the text and data segments
TRS     Text portion of RSS
%CPU    Ratio of CPU time to real time
%MEM    Percentage of real memory that the process uses for its text and data segments

Figure: ps BSD Syntax and Output for Running Processes




# ps -ef
USER PID PPID C STIME TTY TIME CMD
root 1 0 0 Aug 10 - 10:42 /etc/init
root 2405 5202 0 Aug 10 - 0:21 /usr/etc/portmap
root 2800 1 0 Aug 10 - 3:50 /etc/syncd 60
root 2947 5202 0 Aug 10 - 0:05 /usr/etc/biod 6
root 19323 24440 0 Aug 12 pts/17 0:00 /bin/ksh
swamy 19627 1 0 10:05:30 - 0:08 xant -title SYDVM1 9.8.0.2
pascale 34478 30893 0 09:18:34 pts/12 0:00 -ksh
yannick 27191 1 1 Aug 11 pts/6 0:26 mwm
roland 31515 18160 0 Aug 12 pts/0 0:34 mwm

Key fields:

STIME   Time the process started
CMD     Command string with arguments

Figure: ps System V Syntax and Output for Running Processes




# ps -el
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
202803 S 0 1 0 0 60 20 1004 260 - 10:42 init
240801 S 18002 1558 33554 0 60 20 16e5 136 pts/6 0:00 ksh
260801 S 0 2405 5202 0 60 20 755d 132 - 0:21 portmap
240801 S 0 2800 1 0 60 20 24a9 88 5a5df18 - 3:50 syncd
40001 S 0 13498 1 0 23 20 2e0b 236 1fa648 - 0:01 afsd
60801 S 203 43966 1 0 60 20 23a8 404 - 0:35 xant
60801 S 202 44052 1 0 60 20 72dc 1408 - 3:11 xant
220801 S 18002 44982 29617 0 64 24 803 152 pts/24 0:00 uca

Key fields:

C     CPU usage value
PRI   Priority
NI    Nice value

Figure: ps System V Syntax and Output for Priorities of Running Processes

vmstat Command

Some output examples of the vmstat command and an explanation of some of the fields follow. This information supplements the sar report.


# vmstat hdisk0 hdisk1 5 5
procs memory page faults cpu disk xfer
----- ----------- ------------------------ ------------ ----------- -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa 1 2 3 4
0 0 18097 215 0 0 0 0 4 0 194 480 98 8 9 17 1 0 0
0 0 18097 212 0 0 0 0 0 0 246 309 141 8 10 81 1 0 0
0 0 18097 212 0 0 0 0 0 0 224 226 123 4 12 84 0 0 0
0 0 18097 212 0 0 0 0 0 0 216 255 109 4 10 85 0 0 0
0 0 18097 212 0 0 0 0 0 0 193 221 91 5 10 86 0 0 0

Key fields:

r            Run queue arrivals per second
b            Processes in wait queue
avm          Active virtual memory pages
fre          Free RAM pages (real memory slots on the free list)
pi           Pages in from page space per second
po           Pages out to page space per second
cs           Process context switches per second
us sy id wa  Percentage CPU usage for user, system, idle and disk I/O wait
disk xfer    Physical disk transfers per second

Figure: vmstat Output Example for Disk Activity

The following command can be run to produce global statistics accumulated since system boot:


# vmstat -s
1616324 total address trans. faults
109776 page ins
502683 page outs
6553 paging space page ins
10895 paging space page outs
750 total reclaims
557653 zero filled pages faults
12333 executable filled pages faults
581155 pages examined by clock
47 revolutions of the clock hand
116399 pages freed by the clock
13766 backtracks
343562 lock misses
90 free frame waits
0 extend XPT waits
64437 pending I/O waits
612459 start I/Os
612459 iodones
36104655 cpu context switches
171002835 device interrupts
4297522 software interrupts
0 traps
69829565 syscalls

Figure: vmstat Output Example for Global Statistics

iostat Command

Some output examples for the iostat command and an explanation of some of the fields follow. This information is useful in determining whether the disk load is balanced correctly.


# iostat -d

Disks: % tm_act Kbps tps msps Kb_read Kb_wrtn
hdisk0 1.5 4.0 0.8 689675 1957268
hdisk1 0.5 1.7 0.3 448356 668020
hdisk2 0.0 0.0 0.0 15988 13765
cd0 0.0 0.0 0.0 82 0

Key fields:

% tm_act          Percentage of time the disk was busy during the interval
                  (sampled 100 times per second)
Kbps              Throughput in KB per second
tps               Transfers per second
Kb_read, Kb_wrtn  KB read and written during the interval

Figure: iostat Output Example for Disk Activity




# iostat -t 5 4

tty: tin tout cpu: % user % sys % idle % iowait
0.8 71.5 3.6 5.6 84.7 6.1
0.0 0.0 4.6 7.4 88.0 0.0
0.8 2.4 8.4 9.0 74.6 8.0
0.0 0.0 7.2 7.6 85.2 0.0

Key fields:

tin    Total characters read by the system for all ttys
tout   Total characters written by the system for all ttys
Figure: iostat Output Example for TTY Activity

timex Command

An example output for the timex command follows.


# timex ls -l 1>/dev/null

real 0.42
user 0.13
sys 0.06

Figure: timex Output Example

The timex command shows the elapsed time, user process CPU time and system (kernel) process CPU time for the command ls -l in the above example. There are three flags that you can specify in the timex command. See InfoExplorer or AIX Commands Reference for details of this command.

sar Utility

The AIX system maintains a series of system activity counters that record various activities and provide the data that sar reports. The counters are updated automatically whether sar is running or not. The monitoring tool sar extracts the data in the counters and saves it, based on the sampling rate and number of samples specified to sar.

System activities monitored include CPU utilization, VMM activity, file access system calls, raw and block I/O operations, forks and execs, messages and semaphores, and so on.

The sar command extracts and writes to the standard output records previously saved in a file or directly from the system counters. The sar command reports only local system activities.

How to Set Up sar Data Files

Follow these guidelines to set up sar data files:
  1. Log in as root; then run the following command: su - adm.
  2. Enter: crontab -e.
  3. Uncomment the following lines: (remove the # sign from the front of the line)
    #0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3 &
    #0 * * * 0,6 /usr/lib/sa/sa1 &
    #0 18-7 * * 1-5 /usr/lib/sa/sa1 &
    #5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 3600 -ubcwyaqvm &

    The lines should now look like this:


    0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3 &
    0 * * * 0,6 /usr/lib/sa/sa1 &
    0 18-7 * * 1-5 /usr/lib/sa/sa1 &
    5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 3600 -ubcwyaqvm &

  4. Uncomment the following line in the /etc/rc file:
    # /bin/su - root -c /usr/lib/sa/sadc /usr/adm/sa/sa `date +%d`

    The line should now look like this:


    /bin/su - root -c /usr/lib/sa/sadc /usr/adm/sa/sa `date +%d`

  5. Reboot the system. This will turn on the data collection programs that the sar command uses for displaying data.
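
Once collection is running, records from a daily data file can be displayed with the -f flag of sar. As an example (the day number 18 is illustrative):

# sar -u -f /var/adm/sa/sa18

This reports the CPU utilization records collected on the 18th day of the month.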

sar Examples

Some examples for the command sar follow. The figure captions are self-explanatory.


# sar -c 10 2

AIX pippin 2 3 000111873000 08/18/93

09:08:03 scall/s sread/s swrit/s fork/s exec/s rchar/s wchar/s
09:08:13 242 120 0 0.00 0.00 31029 24
09:08:24 243 121 0 0.00 0.00 31304 23

Average 243 120 0 0.00 0.00 31167 23

Key fields:

scall/s            Total system calls per second
sread/s, swrit/s   Total reads and writes per second
fork/s, exec/s     Total forks and execs per second
rchar/s, wchar/s   Total characters read and written per second
Figure: sar Output Example for System Calls Usage




# sar -m 10 2

AIX pippin 2 3 000111873000 08/18/93

09:08:03 msg/s sema/s
09:08:13 0.00 0.00
09:08:24 0.00 0.00

Average 0.00 0.00

Figure: sar Output Example for Messages and Semaphores




# sar -u 60 3

AIX pippin 2 3 000111873000 08/18/93

09:32:31 %usr %sys %wio %idle
09:33:01 2 4 0 93
09:33:32 1 5 0 93
09:34:02 1 4 0 95

Average 1 4 0 94

Key fields:

%usr    Percentage of time spent in user mode
%sys    Percentage of time spent in system (kernel) mode
%wio    Percentage of time spent in the idle routine with pending disk I/O
%idle   Percentage of time spent in the idle routine with no pending disk I/O
Figure: sar Output Example for CPU Utilization

gprof Utility

The utility gprof generates two useful reports: a flat profile of CPU usage by routine and number of calls, and a call-graph profile showing routines plus their descendants in descending order by CPU time. This allows you to determine which parent routines called a particular routine most frequently and which child routines were called by a particular routine most frequently.

The utility gprof works with C, Fortran, Pascal and Cobol. Source code must be compiled with the -pg option. This causes the compiler to insert a call to the mcount function into the object code. This function mcount maintains counters for each time a parent called a child function.

When the program is executed, statistics are collected and saved in the gmon.out file in the current directory. Later when the gprof command is issued, it reads the a.out and gmon.out files by default, to generate the two reports mentioned above.
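
As a minimal sketch of a typical session (the program name myprog is hypothetical):

# cc -pg -o myprog myprog.c
# myprog
# gprof myprog gmon.out > myprog.profile

Running the program writes gmon.out in the current directory; gprof then reads the executable and gmon.out and produces both reports, which are redirected here to myprog.profile.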

sa1 Command

The purpose of this command is to collect and store binary data in the /var/adm/sa/sadd file. The syntax of this command is:

# /usr/lib/sa/sa1 [ Interval Number ]

The sa1 command is a shell procedure variant of the sadc command that handles all of the flags and parameters of that command. The sa1 command collects and stores binary data in the /var/adm/sa/sadd file, where dd is the day of the month. The Interval and Number parameters specify that the record should be written Number times at intervals of Interval seconds. If you do not specify these parameters, a single record is written. You must have permission to write in the /var/adm/sa directory to use this command.

The sa1 command is designed to be started automatically by the cron command. If the sa1 command is not run daily from the cron command, the sar command displays a message about the non-existence of the /usr/lib/sa/sa1 data file. As an example of a cron entry to create a daily record of sar activities, place the following entry in the adm crontab file:


0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3 &

sa2 Command

The purpose of this command is to write a daily report in the /var/adm/sa/sardd file. The syntax of this command is:

# /usr/lib/sa/sa2

The sa2 command is a shell procedure variant of the sar command, which writes a daily report in the /var/adm/sa/sardd file, where dd is the day of the month. The sa2 command handles all of the flags and parameters of the sar command.

The sa2 command is designed to be run automatically by the cron command and run concurrently with the sa1 command.

As an example of a cron entry to run the sa2 command daily, place the following entry in the root crontab file:


5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 3600 -ubcwyaqvm &

This will generate a daily report called /var/adm/sa/sardd. It will also remove any report more than one week old.

AIX Tools for Performance Monitoring

All the tools discussed in Standard Tools for Performance Monitoring are available in AIX V3.2. In addition, the following tools, which are specific to AIX V3.2, are provided to ease the job of performance monitoring and tuning.

AIX Trace Facility

The trace daemon configures a trace session and starts the collection of system events. The data collected by the trace function is recorded in the trace log (see /usr/adm/ras/trcfile). A report from the trace log can be generated with the trcrpt command.

The command rmap provides extended analysis and reporting facilities for the trace log data.

These tools provide the finest level of system visibility of any of the performance tools. These tools may be used after a primary area of concern has been identified by using other tools such as tprof or when you need complete details of the situation.

The command trace is used to start the trace log, which will be used by other tools such as rmap, tprof and filemon.

You can control which system events are monitored by the trace command by using trace hook IDs. To see which IDs are available to you, examine your /usr/include/sys/trchkid.h file.

Tracing TTY Information

To trace tty information, use the following format:

trace -a -j EVENT [,EVENT,EVENT]

Substitute trace hook IDs for the EVENTs shown above after the -j flag. These hooks are described in your trchkid.h file mentioned above. Below are some of the most used hook IDs for ttys:

401   common tty code
402   pty
403   rs asynchronous driver (native, 8 and 16-port adapters)
404   lion asynchronous driver (64-port adapter)
405   hft
406   rts/cts pacing
407   xon/xoff pacing
408   dtr pacing
409   dtr open
40c   posix line discipline
40d   serial printer discipline
40f   wopen discipline
410   cxma asynchronous driver (128-port adapter)
107   filename (unode)
15b   sys open (port)

You can stop the trace with the trcstop command. You can examine the contents of the trcfile by using the trcrpt command.

Trace Example

Here is an example of trace:

  1. Try to limit the number of operational tty devices on the system before running trace. If possible, only have the problem device operational when issuing the trace.
  2. Start the trace with:
    # trace -a -j 403,107,15b
    
  3. If possible, get a printout of your process table to send to the defect support center. Use the command: ps -ef | enq -P<print_queue_name>.
  4. Allow trace to run until the problem reoccurs. The size of trcfile is fixed and will only change if you specify the appropriate command line arguments. The data in trcfile wraps around continually when the file gets full (see the -s, -L, and -T flags for trace).
  5. Stop the trace with:
    # trcstop
    
  6. Run trcrpt to verify the output of the trace. You can then send your /usr/adm/ras/trcfile to your IBM Support Center.

tprof Command

The command tprof is an excellent tool for profiling CPU usage. It estimates where the CPU spends its cycles by periodically sampling the program counter (100 times per second). The process whose address space is referenced by the program counter is charged with the time (tic). A tic is 1/100th of a second.

The tool tprof provides CPU usage information for each process or program subroutine that is charged with any CPU usage and for each source statement for C and Fortran programs.

The source programs need not be recompiled for the default reports. If the non-stripped code resides in the current directory, subroutine profiling is reported. If the source is compiled with the -qlist option and resides in the current directory, then source statement usage is reported.
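
As an illustrative sketch of profiling a single program at the source statement level (the program name myprog is hypothetical):

# cc -qlist -o myprog myprog.c
# tprof -p myprog -x myprog

The -x flag runs the command while samples are collected, and -p names the program to be reported in detail. The output example in the following figure was instead produced by profiling the whole system while the command sleep 2 ran: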


# tprof -x sleep 2
# tprof -k
# cat __prof.all
Process PID Total Kernel User Shared Other
======= === ===== ====== ==== ====== =====
PID.514 514 204 204 0 0 0
bsh 33560248 9 9 0 0 0
tprof 6316 3 3 0 0 0
sleep 33561529 3 3 0 0 0
tprof 5805 2 2 0 0 0
tprof 6583 1 1 0 0 0
bsh 7097 1 1 0 0 0
======= === ===== ====== ==== ====== =====
Total 223 223 0 0 0


Process FREQ Total Kernel User Shared Other
======= === ===== ====== ==== ====== =====
PID.514 1 204 204 0 0 0
bsh 2 10 10 0 0 0
tprof 3 6 6 0 0 0
sleep 1 3 3 0 0 0
======= === ===== ====== ==== ====== =====
Total 7 223 223 0 0 0


Total Ticks For All Processes( KERNEL) = 223

Subroutine Ticks % Source Address Bytes
============= ====== ====== ======= ======= =====
.waitproc 204 91.5 waitproc.s 34068 52
.vmvcs 3 1.3 vmvcs.s 29812 2272
.v_copypage_pwr 2 0.9 cpage_pwr.s 40360 300
.ld_hash 2 0.9 ld_lookup.c 711856 144
.v_alloc 1 0.4 v_alloc.c 346976 460
.deny_job_ctl_write 1 0.4 sysproc.c 265028 364
.getvmpage 1 0.4 v_getsubs1.c 376124 496
.v_releasexpt 1 0.4 v_relsubs.c 403728 456
.call_dispatch 1 0.4 flih_util.s 34608 1512
.dir_search 1 0.4 xix_dir.c 776724 552
.getparent 1 0.4 v_getsubs1.c 374764 1360
.v_dpfget 1 0.4 v_getsubs.c 364900 408
.v_updtree 1 0.4 v_alloc.c 348796 256
.ld_sanity 1 0.4 ld_files.c 718900 840
.ld_relocate1 1 0.4 ld_symbols.c 753156 924
.dc_lookup 1 0.4 xix_dc.c 771600 176

Figure: tprof Output Example for CPU usage

rmap Command

Three files must be created prior to running the command rmap. They are as follows:

You can find more details about this command in InfoExplorer if you wish.

svmon Command

The command svmon provides information about virtual memory manager (VMM) statistics. It does not provide a true snapshot of memory, because svmon runs at user level with interrupts enabled.

Following is an example of a global summary report, using the command svmon:


# svmon -G
m e m o r y i n u s e p i n p g s p a c e
size inuse free pin work pers clnt work pers clnt size inuse
12288 12168 120 1772 7123 4980 65 1772 0 0 28672 18833

Key fields:

size    Physical memory frames
inuse   Frames in use
free    Frames on the free list

Figure: svmon Global Summary Report




# svmon -Pa

Pid Command Inuse Pin Pgspace
0 2167 1449 2454
1 init 2562 1171 5036
514 wait 1742 1171 2249
771 netw 1744 1171 2250
1028 netm 1742 1171 2250
1460 fkpr 1742 1171 2250
1558 ksh 2588 1171 5005
.... ..... ..... .... ....

Pid: 1558
Command: ksh

Segid Type Description Inuse Pin Pgspace Address Range
5534 pers /dev/hd1:3121 0 0 0 0..2
7c9f pers /dev/hd2:29905 2 0 0 0..1
1806 work kernel extension 12 12 13 0..12300 : 65536..65535
2409 work shared library 774 0 2729 0..2876 : 60123..65535
16e5 work private 7 2 34 0..23 : 65404..65535
38ae pers code,/dev/hd2:8820 69 0 0 0..90
0 work kernel 1724 1157 2229 0..23836 : 65476..65535

Key fields:

Inuse     Pages allocated by this process
Pin       Pages locked in memory by this process
Pgspace   Pages allocated to page space by this process
Figure: svmon Output Example

rmss Utility

The reduced memory simulator (rmss) provides the capability to simulate a reduction in memory on a RISC System/6000 without having to remove memory cards physically. It also provides a facility to run an application over a range of memory sizes and display the performance statistics.

The following example provides a sample output for the command rmss.


# rmss -s 24 -f 8 -d 4 -n 1 cc -o t.c

Hostname: pippin.itsc.austin.ibm.com
Real memory size: 32.00 Mb
Time of day: Wed Aug 18 15:43:08 1993
Command: cc -o t.c

Simulated memory size initialized to 24.00 Mb.

Number of iterations per memory size = 1 warmup + 1 measured = 2.

Memory size Avg. Pageins Avg. Response Time Avg. Pagein Rate
(megabytes) (sec.) (pageins / sec.)
--------------------------------------------------------------------------------
24.00 0.0 2.1 0.0
20.00 62.0 3.0 21.0
16.00 0.0 2.1 0.0
12.00 1.0 2.0 0.5
8.00 900.0 14.1 64.1

Simulated final memory size.

Figure: rmss Output Example for Memory Capacity Planning

filemon Command

The command filemon monitors disk I/O using trace log data. When invoked, it puts itself in the background. It provides statistics for the following I/O levels:

  logical file system
  virtual memory (segments)
  logical volumes
  physical volumes

This utility provides a summary and a detailed report. The report can be used to identify a very busy I/O subsystem. Please note that large trace log areas are needed to use this command.

Partial steps to use this command are given below. The output produced by the command filemon is long and it is not included in this section.

  1. To start the utility use the following command:
    # filemon -o /tmp/filemon.report
    
  2. To start the trace log, use the command trcon.
  3. Run your test program.
  4. Issue the command trcoff.
  5. Issue the command trcstop.
  6. After the trace is stopped, the utility filemon stores its report in the file /tmp/filemon.report, as specified by the -o flag above. View this report for analysis.

fileplace Command

This command displays a mapping of logical files to blocks in logical and physical volumes. This command also reveals fragmentation - poor placement of active files. This command complements filemon for detecting file I/O problems.

Some examples using the command fileplace follow. The figure captions are self-explanatory.


# fileplace -l /unix
File: /unix Size: 1485502 bytes Vol: /dev/hd2 (4096 byte blks)

Logical blocks
--------------
02978-02985 8 blks, 32 KB, 2.2%
02987-03341 355 blks, 1420 KB, 97.8%

Figure: Logical Characteristics of a File




# fileplace -p /unix

Warning: 'l' and 'p' flags are mutually exclusive, 'p' flag is used

File: /unix Size: 1485502 bytes Vol: /dev/hd2 (4096 byte blks)

Physical blocks (mirror copy 1) Logical blocks
------------------------------- --------------
44002-44009 hdisk0 8 blks, 32 KB, 2.2% 02978-02985
44011-44365 hdisk0 355 blks, 1420 KB, 97.8% 02987-03341

Figure: Physical Characteristics of a File

Administrative Remedies for Performance Tuning

This section describes the possible steps to be taken to improve system performance. The user should have knowledge in the following areas before remedial measures can be tried.

Administrative remedies may be taken once a problem area has been identified using the monitoring tools. The following performance tuning tools and techniques are discussed in this section:

  schedtune utility
  vmtune utility
  log logical volume per file system
  lvedit command
  resizing or moving the hd6 paging space
  tips to increase disk performance

schedtune Utility

The utility schedtune is made available by the performance PTFs. It is part of the option bosadt.lib.obj in AIX 3.2.5. The source for it is placed in the directory /usr/lpp/bos/samples. The command is built by running make schedtune in that directory.

The command schedtune -? displays the usage of the command. The command schedtune without any parameters will display the current settings.

With schedtune you are able to define the parameters h, p, w, m and e. The utility schedtune sets these parameters, which are used by the scheduler and for process pacing, as described below.

The memory control algorithm polls the system every second to determine if thrashing is likely. System thrashing is likely when:

(pages written in last second / pages stolen in last second) > 1/h

A process is considered eligible for suspension if:

(pages repaged in last second / page faults in last second) > 1/p

As mentioned earlier, during thrashing conditions, new processes are also suspended.

Prior to reactivating the suspended processes, w consecutive one-second intervals without thrashing must occur.

The parameter m is the minimum degree of multiprogramming, or the minimum number of active processes.

Each time a suspended process is reactivated, it is guaranteed to be exempt from suspension for e seconds.
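
A sketch of a typical invocation follows. The values shown happen to match the commonly documented defaults, but treat them as illustrative rather than as recommendations:

# /usr/lpp/bos/samples/schedtune
# /usr/lpp/bos/samples/schedtune -h 6 -p 4 -w 1 -m 2 -e 2

The first command displays the current settings; the second sets all five parameters explicitly. The settings last only until the next reboot.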

vmtune Utility

The utility vmtune is made available by the performance PTFs. It is part of the option bosadt.lib.obj in AIX 3.2.5. The source for it is placed in the directory /usr/lpp/bos/samples. The command is built by running make vmtune in that directory. This command controls the VMM page replacement code (PRC) algorithm.

The command vmtune without any parameters shows the current settings, as in the following example:


# /usr/lpp/bos/samples/vmtune

minperm maxperm minpgahead maxpgahead minfree maxfree numperm
1433 5734 2 8 56 64 873

number of memory frames = 8192 number of bad memory pages = 0
maxperm=70.0% of real memory
minperm=17.5% of real memory


Figure: vmtune Output Example

Following is the explanation of the output:

minperm      Steal from both file and computational (process) pages if the
             number of file pages is less than this number

maxperm      Steal only file pages if the number of file pages is greater
             than this number

minfree      Minimum number of pages in the free list; page steal operations
             start when the free list falls below this number

maxfree      Page steal operations stop when the number of pages in the free
             list reaches this number

minpgahead   Minimum number of pages to read ahead when sequential file
             access is detected

maxpgahead   Maximum number of pages to read ahead when sequential file
             access is detected
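
A sketch of changing these values follows; the numbers are illustrative, not recommendations:

# /usr/lpp/bos/samples/vmtune -f 64 -F 128

Here -f sets minfree and -F sets maxfree. As with schedtune, the settings last only until the next reboot.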

Log Logical Volume per File System

In order to improve the performance of file system access, it is possible to give every file system its own log logical volume. By default, there is only one log logical volume per volume group. If the file system activity produces many metadata changes (inodes, directories and so on), these changes result in increased activity to the log. If the file system and the log reside on the same physical volume, a considerable amount of time may be spent updating the log logical volume.

If we configure a separate log logical volume for a file system, this log may be placed on a less active hard disk. This could reduce the disk contention. Use the following procedure to create and implement an additional log logical volume.

  1. Create a new log logical volume on the selected drive by the command:
    # mklv -t jfslog -y LVname VGname 1 PVname
    

    where:
    LVname Logical Volume name
    VGname Volume Group name
    1 Partitions
    PVname Physical Volume name

    A new log logical volume may also be created with the SMIT fastpath command smit mklv. In the SMIT menu, enter the volume group name, logical volume name, number of partitions and the physical volume name. Then enter jfslog for the field Logical volume type. The command will make a logical volume to be used as a log logical volume. Let's assume its name is loglv00.

  2. Format the log logical volume by the command:
    # /usr/sbin/logform /dev/loglv00
    
  3. Modify the attribute log = in the file /etc/filesystems for the affected file systems stanza as below.
    log = /dev/loglv00
    

    **** NOTE: **** Do not change the log logical volume for the file systems /, /tmp, /usr and /var.
  4. Unmount the affected file systems and mount them again.
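
For reference, a stanza modified in step 3 might look like the following sketch, where the file system name and device are hypothetical and only the log attribute has been changed:

/home/projects:
        dev     = /dev/lv02
        vfs     = jfs
        log     = /dev/loglv00
        mount   = true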

lvedit Command

The logical volume editor is used for interactive definition and placement of logical volumes within a volume group. The syntax of the command is:

# lvedit VGname

where:
VGname Volume Group name

The command lvedit invokes the logical volume editor in an interactive screen. From within the editor, the user can display the current state of the logical and physical volumes in that volume group. The user can make any changes that are possible with the commands mklv, chlv, extendlv and rmlv. However, the lvedit editor provides precise control over the placement of logical volumes on a physical volume (mapping of logical partitions to physical partitions).

When a user changes a logical volume within the editor, the editor checks that the modification is possible. If the changes are allowed the editor screen is modified to reflect that change. The actual logical and physical volumes are not altered until the user commits these changes, upon exiting the editor.

Note that the lvedit editor can change the attributes of a logical volume, but cannot be used to create or modify file systems, paging space, jfs logs and so on.

Most editing operations (other than extensions) are not permitted on active logical volumes. Before a logical volume can be edited, it should be unmounted with the unmount command. It is suggested that the file systems being modified be backed up, to avoid losing any valuable data.

The commands available in lvedit are explained in detail in the document AIX Performance Monitoring and Tuning Guide.

How to Resize or Move the hd6 Paging Space

You may want to move the existing hd6 default paging space logical volume to another disk, or to make it smaller, for two reasons:

Here are the two procedures that describe how to do it in these cases.

Reduce the Size of hd6 Paging Space

The purpose of some of the steps described may not be apparent, but it is important to know that they are all necessary. A paging space cannot be deactivated while the system is running, which is why the additional steps are needed.

If you want to reduce the hd6 paging space, be sure to leave enough paging space for your software. Generally, the paging space is at least two times the size of your real memory (you can check this with the command bootinfo -r, which gives you the amount of real memory in kilobytes).

In this procedure, we assume the hd6 paging space is defined in the rootvg volume group, on the disk hdisk0.

  1. Create a temporary paging space on rootvg (its size is 80MB):
    # mkps -a -n -s 20 rootvg
    

    **** NOTE: **** Let's assume the name of this new paging space is paging00.
  2. Deactivate hd6 paging space for the next reboot:
    # chps -a n hd6
    
  3. Change the swapon /dev/hd6 entry in the file /sbin/rc.boot:

    Replace this entry by this new one: swapon /dev/paging00.

  4. Build a new boot image on the hard disk:
    # bosboot -d/dev/hdisk0 -a
    
  5. Shut down the operating system and reboot it:
    # shutdown -r
    
  6. Remove the hd6 paging space:
    # rmps hd6
    
  7. Create a new logical volume of the size you want for the hd6 paging space (here its size is 40MB):
    # mklv -t paging -y hd6 rootvg 10
    
  8. Change the swapon /dev/paging00 entry in the file /sbin/rc.boot back:

    Replace this entry by this new one: swapon /dev/hd6.

  9. Build a new boot image on the hard disk:
    # bosboot -d/dev/hdisk0 -a
    
  10. Make the hd6 paging space available to the system:
    # swapon /dev/hd6
    
  11. Change the temporary paging space paging00 to not activate at the next reboot:
    # chps -a n paging00
    
  12. Shut down and reboot the system with the following command:
    # shutdown -r
    
  13. Remove the temporary paging space:
    # rmps paging00
    

Move the hd6 Paging Space on Another Disk of the Same Volume Group

It is not recommended to move the hd6 paging space from rootvg to another volume group, because the system assumes in many cases that it is on rootvg, and its name is hard-coded in the boot process and in the getrootfs procedure, which is used in maintenance mode. It is better to create hd6 as small as possible on rootvg, then create other paging spaces on other volume groups if you wish.

Here is the simple command that moves hd6 from hdisk0 to hdisk1:

# migratepv -l hd6 hdisk0 hdisk1

Tips to Increase Disks Performance

Here are some tips to increase disk performance by organizing the logical volumes, determining the LVM configuration, and eliminating fragmentation.

Organizing Logical Volumes on the Disk to Increase I/O Performance

The intra-physical volume allocation policy describes the location of the logical volume on the disk. This may be set when the logical volume is created with the mklv command, or changed with the chlv command. The values may be center, midway, or edge. The disk is laid out as described in Figure - Physical Partitions Distribution Summary in Logical Volume Manager.

To change the intra-policy, use the chlv command as follows:

chlv -a [c m e] LVname

Logical volumes with the most active I/O should be placed at the center, while inactive ones should reside on the edges. Note that a logical volume might be split across the different edges when allocated. Since it is assumed that the disk heads generally linger in the center of the disk, it would take equal time for the heads to move to either edge to capture data from the least active logical volumes. This would impact performance if the data on the edges is read frequently or in large blocks.

The inter-physical volume allocation policy should also be considered when organizing logical volumes on disk. This policy determines whether a logical volume can be extended across multiple physical volumes (disks). To increase performance, the value should be maximum, indicating that the logical volume should be spread across the maximum number of physical volumes. This, however, will decrease the availability of the logical volume should any one of the physical volumes become unavailable. To restrict a logical volume to a single physical volume (or the minimum number of physical volumes that will hold the logical volume), use minimum allocation.

To change the inter-policy, use the chlv command as follows:

chlv -e [x m] LVname

where:
x = maximum allocation policy
m = minimum allocation policy

After the policies have been set for all logical volumes involved, the logical volumes may be reorganized by using the reorgvg command. This command will reorganize all physical partitions to match the allocation policies assigned to each logical volume. Since the system cannot always fill all of the requirements, it will put the logical volumes as close as possible to the positions specified. Use the command:

reorgvg VGname LVname1 LVname2 ...

where:
VGname is the name of the volume group in which the logical volumes are contained
LVname1, LVname2 and so on are the names of the logical volumes to be reorganized.

Note that the order of the logical volume names indicates the precedence of the organization of the logical volumes. The first one on the list will get the highest consideration in its allocation position. If you do not list any LVnames, then all logical volumes will be reorganized. Note that, in this case, the precedence will simply be determined by the order in which the logical volumes were originally created.
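
For example, assuming a hypothetical volume group datavg in which lv02 carries the heaviest I/O and lv03 the next heaviest:

# reorgvg datavg lv02 lv03

Here lv02 receives the highest consideration for its requested (typically center) position, followed by lv03.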

Determining LVM Configuration Based on Performance and Availability

In general, whenever LVM is configured to provide better performance, the availability of the system is impacted.

To configure the system for the highest performance, follow these guidelines when creating logical volumes:

To configure the system for the highest availability, follow these guidelines when creating logical volumes:

Also consider these factors which may increase system availability:

Eliminating Fragmentation on the File Systems

To determine the physical partition fragmentation on a disk, such as hdisk0, issue the command lspv -l hdisk0. This lists the logical volumes that are on that disk, the number of physical partitions from each logical volume that are on that disk, and how those physical partitions are distributed on the disk. Under DISTRIBUTION you can see how many physical partitions are in the outer edge, outer middle, center, inner middle, and inner edge sections of the disk.

To view the actual order of physical partitions on the disk, you would issue the lslv -M hdisk0 command. You will be able to see which logical volumes are on the fragmented portions of the disk. The numbers shown in the output are the physical partition number on the disk (physical volume) and the physical partition number on the logical volume.

If you have a logical volume on fragmented portions of a disk, you may reorganize your disk. You would want the logical volumes with the highest I/O to be located nearest the center of the disk.

Although you can check the fragmentation of physical partitions, you cannot check the fragmentation of your actual data within a file system. This fragmentation is caused by continual creation and removal of files. If you notice that performance has degraded over a period of time, then it is quite possible that there is fragmentation within your file system.

The only way to alleviate this type of fragmentation is to back up the file system by name, delete the file system, then recreate the file system with the mkfs command, and then restore the file system data.
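
A sketch of that sequence follows, assuming a hypothetical file system /home/projects on the logical volume /dev/lv02 and a tape drive at /dev/rmt0; here the file system is re-initialized in place with mkfs rather than deleted and recreated:

# cd /home/projects
# find . -print | backup -i -f /dev/rmt0
# cd /
# umount /home/projects
# mkfs /dev/lv02
# mount /home/projects
# cd /home/projects
# restore -x -f /dev/rmt0

The backup command with -i performs a backup by name of the files listed on its standard input; mkfs re-initializes the file system (destroying its contents), and restore -x extracts the saved files into the now unfragmented file system.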