User Resource Management

To get anything useful done with a computer, a user must appropriate some of the system's resources. Some users clean their directories often, erase temporary files regularly, and design efficient programs; others accumulate large outdated files and write inefficient programs that consume a lot of CPU.

The system lets you set user limits, control disk resources with quota limits, and collect valuable information about the users and what they are running on the system.

User Limits

Various limits are imposed by the kernel on a per-process basis. The kernel maintains the list of limits imposed on a process inside its own address space, and controls access to this data through system calls.

Hard and Soft Limits

The complete set of user attributes is defined in the /usr/lib/security/mkuser.default file, the /etc/security/user file, and the /etc/security/limits file. The defaults can be changed by editing the default stanza in the appropriate security file. Many of the defaults are defined to be the standard behavior.

This section discusses the default values in the /etc/security/limits file. This file defines process resource limits for each user.


**** NOTE: **** When editing the /etc/security/limits file, putting a value less than or equal to zero implies it is unlimited.

Here are the default values set when you create a user:


default:
fsize = 2097151
core = 2048
cpu = -1
data = 262144
rss = 65536
stack = 65536

These values are used as default settings when a new user is added to the system. They can be changed when the user is added (mkuser command) or after the user is created (chuser command). At login time, these values will be used to set the user's process limits.
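As a minimal sketch of how a value can be read back from a stanza-format file such as /etc/security/limits, the following helper looks up one attribute in one stanza. The function name stanza_attr and the scratch file are hypothetical, introduced only for illustration; the sketch assumes each stanza header is a single word ending in a colon and each attribute line has the form "name = value".

```shell
#!/bin/sh
# Print the value of one attribute from one stanza of a stanza-format file.
# Usage: stanza_attr FILE STANZA ATTRIBUTE   (hypothetical helper)
stanza_attr() {
    awk -v stanza="$2" -v attr="$3" '
        NF == 1 && $1 ~ /:$/ {           # a stanza header such as "default:"
            name = $1
            sub(/:$/, "", name)
            in_stanza = (name == stanza)
            next
        }
        in_stanza && $1 == attr && $2 == "=" { print $3; exit }
    ' "$1"
}

# Demonstration on a scratch copy of the default stanza shown above,
# so that the real /etc/security/limits file is never touched.
sample=${TMPDIR:-/tmp}/limits.sample.$$
cat > "$sample" <<'EOF'
default:
        fsize = 2097151
        core = 2048
        cpu = -1
        data = 262144
        rss = 65536
        stack = 65536
EOF

stanza_attr "$sample" default fsize
stanza_attr "$sample" default cpu
rm -f "$sample"
```

With the sample stanza, the two calls print 2097151 and -1. Supported changes should still be made with the mkuser and chuser commands.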

The ulimit command (a shell built-in of the ksh and bsh shells; csh provides the similar limit built-in) can be used to view these values and to change them (if you have the authority to change the hard limit).

Resource limits are divided into two types: the soft limits and the hard limits.

Soft limits are resource use limits currently applied by default when a new user is created. A user may increase these values up to the system-wide hard limits.

Hard limits are thus defined as absolute ceilings on resource use. To increase the hard limits you need privilege.

The current limits of a user are inherited by the child processes of this user and stay in effect until the user logs in to the system again.

A user who changes a current soft limit to a lower value cannot change it back to a greater value until the user logs in to the system again.

Most of the following information comes from the documentation of the setrlimit() and getrlimit() subroutines, and from contributors. The RLIMIT_XXXX definitions come from the setrlimit system call. The units mentioned are the ones used to set limits with setrlimit.

Example: If you use setrlimit() to set fsize, you pass the value in bytes, but if you use the Korn shell built-in ulimit, you specify the value in 512-byte blocks.
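The unit difference is easy to get wrong, so here is a small arithmetic sketch. The helper name blocks_to_bytes is hypothetical, and the sketch assumes the default fsize value listed earlier is expressed in the same 512-byte blocks that the ulimit built-in uses.

```shell
#!/bin/sh
# Convert a count of 512-byte blocks (the unit used by the ulimit built-in
# for fsize) into bytes (the unit passed to setrlimit()).
blocks_to_bytes() {
    echo $(( $1 * 512 ))
}

# The default fsize value from the stanza shown earlier.
blocks_to_bytes 2097151
```

This prints 1073741312, that is, 512 bytes short of 1 GB.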

There is also a ulimit system call that the shell built-ins call. The ulimit referenced in this document is the shell built-in ulimit.

/etc/security/limits File

Here is a description of the different fields that the /etc/security/limits file contains. You can use the setrlimit() and getrlimit() subroutines with the parameters described with each field of this file.

Disk Quotas

AIX 3.2 implements a disk quota system that provides a method for controlling disk space usage. This system is based on the Berkeley disk quota system and allows assignment of quotas for the AIX journaled file system only.

Disk quotas may be assigned based on three parameters and may be modified using the edquota command. The three parameters are:

  1. User's or group's soft limits

    The number of 1KB disk blocks or files below which the user should remain.

  2. User's or group's hard limits

    The maximum number of disk blocks or files the user can accumulate.

  3. Quota grace period

    There is a grace period that will allow the soft limit to be exceeded (the default is one week). If the user fails to reduce usage below the soft limit during the specified time, the system will interpret the soft limit as the maximum allocation for that user. The user may correct this condition by removing enough files to reduce usage to below the soft limit.
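The interaction of the three parameters can be sketched as a small decision function. This is only an illustration of the rules described above, not the quota system's actual implementation: the name quota_state, its arguments, and the convention that 0 means "no limit set" are assumptions, and the special limit value of 1 is not modeled.

```shell
#!/bin/sh
# quota_state USAGE SOFT HARD GRACE_LEFT   (hypothetical helper)
#   USAGE, SOFT, HARD  counts of 1KB blocks (0 is assumed to mean "no limit")
#   GRACE_LEFT         seconds of the grace period still remaining
# Prints ok, warning, or denied for a further allocation.
quota_state() {
    usage=$1 soft=$2 hard=$3 grace_left=$4
    if [ "$hard" -gt 0 ] && [ "$usage" -ge "$hard" ]; then
        echo denied            # the hard limit is an absolute ceiling
    elif [ "$soft" -gt 0 ] && [ "$usage" -gt "$soft" ]; then
        if [ "$grace_left" -gt 0 ]; then
            echo warning       # over the soft limit, still within grace
        else
            echo denied        # grace expired: soft limit acts as the maximum
        fi
    else
        echo ok
    fi
}

# 16 blocks in use, soft limit 10, hard limit 30, one day of grace left.
quota_state 16 10 30 86400
```

The call prints warning: the user is over the soft limit but may keep allocating until the grace period runs out or the hard limit is reached.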

The disk quota system tracks user and group quotas in the quota.user and quota.group files. These files reside in the root directories of file systems that have quotas assigned. These files are created with the edquota and quotacheck commands and are readable with the quota command.

Typically, only file systems that contain user home directories and files would use quotas. It is recommended that disk quotas never be assigned to the /tmp file system.

A system administrator would consider implementing the disk quota system under the following conditions:

Recovering from Over-Quota Conditions

When you have exceeded quota limits, here are the three procedures you can follow to go on with your process.

How to Set Up the Disk Quota System

Prerequisites: In order to set up the disk quota system you must have root user level authority.

  1. Determine the file systems that you will apply quotas to. In AIX 3.2 user directories default to the /home file system.
  2. Identify the stanza that applies to /home in the /etc/filesystems file and edit it to include the quota, userquota, and groupquota configuration attributes.

    An example stanza line that enables user quotas would be in the format:


      quota = userquota

    In a file system that has both user and group quotas enabled the entry might look similar to the following:


    /home:
    dev = /dev/hd1
    vol = "/home"
    mount = true
    check = true
    free = false
    vfs = jfs
    log = /dev/hd8
    quota = userquota,groupquota
    options = rw

  3. An option would be to specify disk quota file names. The file names quota.user and quota.group are the default names located at the root directory of the file system enabled with quotas. You may specify alternate names or directories for these quota files with the userquota and groupquota attributes.

    A sample entry might look like the following for user bill of the qgroup group:


    /home:
    dev = /dev/hd1
    vol = "/home"
    mount = true
    check = true
    free = false
    vfs = jfs
    log = /dev/hd8
    quota = userquota,groupquota
    userquota = /home/myquota.bill
    groupquota = /home/myquota.qgroup
    options = rw

  4. If not previously mounted, mount the file systems.
    # mount /home
    
  5. Use the edquota command to set the user or group soft and hard limits. The edquota command has the following syntax:

    User Quotas
    edquota [ -u ] [ -p Proto-UserName ] UserName ...

    Group Quotas
    edquota -g [ -p Proto-GroupName ] GroupName ...

    Change User or Group Grace Period
    edquota -t [ -u | -g ]

    The edquota command creates and edits quotas. It creates a temporary file that contains each user's and group's current disk quotas. It determines the list of file systems with established quotas from the /etc/filesystems file.

    The edquota command invokes the vi editor (or the editor specified by the EDITOR environment variable) on the temporary file, so that quotas can be added and modified.

    The fields that are displayed in the temporary file are:


    **** NOTE: **** A hard limit with a value of 1 indicates that no allocations are permitted. A soft limit with a value of 1, in conjunction with a hard limit with a value of 0, indicates that allocations are permitted only on a temporary basis.

    More information on the edquota command can be found in InfoExplorer and the edquota man page, or the hardcopy manuals.

    A sample edquota file edit might look like:


    # edquota -u bill
    Quotas for user bill:
    /home: blocks in use: 16, limits (soft = 10, hard = 30)
    inodes in use: 4, limits (soft = 0, hard = 0)

  6. The quota system must be enabled using the quotaon command. The syntax for the quotaon command follows:
    quotaon or quotaoff Command

    Purpose - Turns on and off file system quotas.

    Syntax
    quotaon [ -g ] [ -u ] [ -v ] { -a | FileSystem ... }

    quotaoff [ -g ] [ -u ] [ -v ] { -a | FileSystem ... }

    The quotaon command enables disk quotas for one or more file systems specified by the FileSystem parameter. The specified file system must have an entry in the /etc/filesystems file, and must be mounted. The quotaon command looks for the quota.user and quota.group default files in the root directory of the associated file system. These file names may be changed in the /etc/filesystems file.

    By default, both user and group quotas are enabled. The related flags or options are:

    Examples:

    1. To enable user quotas for the /home file system, enter:
      # quotaon -u /home
      
    2. To disable user and group quotas for all file systems in the /etc/filesystems file and print a message, enter:
      # quotaoff -v -a
      
  7. Use the quotacheck command to check the quota files against actual disk usage. It is recommended that you do this each time you first enable quotas on a file system, and after you reboot the system. This process can be automated by adding the following entries to the /etc/rc file:
    echo " Enabling file system quotas "
    /usr/sbin/quotacheck -a
    /usr/sbin/quotaon -a

    The quotacheck command has the following format and syntax:


    quotacheck Command

    Purpose - Checks file system quota consistency.

    Syntax
    quotacheck [ -g ] [ -u ] [ -v ] { -a | FileSystem ... }

    By default, both user and group quotas are checked.


    **** NOTE: **** It is recommended that the specified file system not be active while the quotacheck command is running.

    Examples:

    1. To check the user and group quotas in the /home file system, enter:
      # quotacheck /home
      
    2. To check only the group quotas in the /home file system, enter:
      # quotacheck -g /home
      
  8. You can use the repquota command to print a summary of quotas and disk usage for a file system. If the -a flag is specified instead of a file system, the repquota command prints the summary for all file systems enabled with quotas.
    Syntax
    repquota [ -v ] [ -g ] [ -u ] { -a | FileSystem ... }

    Example:

Accounting

To manage a system, you need a view of system performance throughout the day. AIX offers several methods for understanding performance.

One of them is the accounting process, which shows the overall usage of your workstation. This chapter explains how to use the accounting process to collect and display performance data.

Accounting Process

The accounting process relies on data that the following commands collect automatically:

startup
Initializes the accounting process (/var/adm/pacct file)
login/init
Record connection time in the /var/adm/wtmp file
dodisk
Records disk usage in the /var/adm/acct/nite/dacct file
enq/qdaemon
Record printer usage in the /var/adm/qacct file
chargefee
Records charges in the /var/adm/fee file

Using the data from these commands, you can determine how each kind of resource is consumed.

Figure - Simplified Accounting Organization shows the simplified organization of AIX accounting.


Figure: Simplified Accounting Organization

AIX accounting is a complex system of commands, shell scripts and C programs derived from System V.

The raw data files are written by various processes in the /var/adm directory. The runacct command puts daily summary files in the /var/adm/acct/nite directory. The prdaily script works with those input files to create a daily report. Finally, the cumulative summary files provide the input from which the monacct script creates a monthly report.

runacct Command

The runacct command can seem complicated because it is powerful. Figure - runacct Detail shows the logic of this command. Data collection begins with the startup command, which turns on process accounting so that the CPU consumption of each process is recorded in the pacct file (/var/adm/pacct). You can add this command to the /etc/rc script to start the accounting process automatically. The login and init processes record connection time in the /var/adm/wtmp file. The dodisk script records file system consumption in the /var/adm/acct/nite/dacct file, and enq/qdaemon do the same for printer usage in the /var/adm/qacct file. You can charge a user for extra work with the chargefee script.

All this data is analyzed by runacct, which uses the acctcon1, acctcon2 and acctmerg programs to create the results of the day in the /var/adm/acct/nite directory: the daytacct file holds the process time and connect time. These results are then merged with the disk usage, printer usage and fees charged for the day into the tacctmmdd file in the sum directory (where mm is the month and dd the day).

The acctcms command writes the command usage summary of the day to the /var/adm/acct/sum/daycms file, and merges the daily file with the cumulative totals from previous days into the /var/adm/acct/sum/cms file.

After all this data has been consolidated in the sum directory, runacct calls the /var/adm/siteacct script if it exists; you can set it up as a special user exit.

Finally, runacct calls the prdaily command, which creates a large report of the day that includes all the special reports.


Figure: runacct Detail

Accounting Reports

To find out how long users were connected yesterday, use the ac command, which prints the connect time records. This command can show you the connect time of each user per day. By default its input comes from the /var/adm/wtmp file, which is cleaned up after each accounting run (normally every day).


$ ac -p
user1 5.64
user2 15.48
user3 20.05
total 41.17
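As a quick sanity check on output of this shape, the per-user hours can be summed and compared with the reported total. The sketch below is fed the sample lines above rather than a live ac run; the function name check_ac_total is hypothetical.

```shell
#!/bin/sh
# Sum the per-user hours in "ac -p" style output and show them next to
# the value on the "total" line.
check_ac_total() {
    awk '
        $1 == "total" { reported = $2; next }
        NF == 2       { sum += $2 }
        END           { printf "sum=%.2f reported=%.2f\n", sum, reported }
    '
}

check_ac_total <<'EOF'
user1 5.64
user2 15.48
user3 20.05
total 41.17
EOF
```

For the sample data this prints sum=41.17 reported=41.17. On a live system the input would come from piping ac -p into the function.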

To see what happened on your system on previous days, use the -d flag to get a total for each day and the -w flag to read the /etc/utmp file:

$ ac -p -d -w /etc/utmp
user1 5.64
user2 15.48
user3 20.05
Jul 28 total 41.17
root 16.05
Jul 29 total 16.05

General Daily Report

If you want to have a daily report of your activity you can use the prtacct command, if you belong to the adm group. You can specify which column you want to see:


# prtacct -f 1-6 /var/adm/acct/sum/tacct0729 | pg

Fri Jul 30 11:34:40 CDT 1993 Page 1

        LOGIN      CPU      CPU      KCORE       KCORE
UID     NAME       PRIME    NPRIME   PRIME       NPRIME
(1)     (2)        (3)      (4)      (5)         (6)         (*)
0       TOTAL      41       45       4.100e+06   4.398e+06
203     pascale    4        0        100546      0
201     roland     6        0        213954      3704
202     swamy      10       1        1.795e+06   239507
18002   yannick    14       32       1.380e+06   3.256e+06
. . . . . . . .

The meaning of the columns follows:
(*)
Column number. This line does not appear in the report; it is shown here only to identify the columns.
UID
User ID
LOGIN NAME
User name
CPU PRIME
Total CPU time of all the user's processes, in minutes, during prime time (defined in the /etc/acct/holidays file)
CPU NPRIME
Total CPU time, in minutes, during non-prime time (defined in the /etc/acct/holidays file)
KCORE PRIME
Total memory used by running processes during prime time, in kilobyte-minutes. This is the amount of memory used multiplied by the length of time it was used; it is only an approximation.
KCORE NPRIME
Total memory used by running processes during non-prime time, in kilobyte-minutes

You specify the prime time and the holidays in the /etc/acct/holidays file; see How to Set Up the Accounting Process in this chapter.
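To make the kilobyte-minute unit of the KCORE columns concrete, here is a small arithmetic sketch. The helper name kcore_minutes and the two processes are hypothetical; real accounting samples memory use over time, which is why the reported figure is approximate.

```shell
#!/bin/sh
# kcore_minutes KILOBYTES MINUTES   (hypothetical helper)
# Memory held multiplied by how long it was held, in kilobyte-minutes.
kcore_minutes() {
    echo $(( $1 * $2 ))
}

# A process holding 2048 KB for 3 minutes ...
kcore_minutes 2048 3
# ... is charged the same as one holding 512 KB for 12 minutes.
kcore_minutes 512 12
```

Both calls print 6144, which shows that the figure rewards neither short, large processes nor long, small ones.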

You can pass a comma-separated list of field numbers or ranges to the prtacct command with the -f flag.


# prtacct -f 2,7-12 /var/adm/acct/sum/tacct0729 | pg

Fri Jul 30 11:34:48 CDT 1993 Page 1

LOGIN      RD/WR     RD/WR     BLKIO    BLKIO    CONNECT   CONNECT
NAME       PRIME     NPRIME    PRIME    NPRIME   PRIME     NPRIME
(2)        (7)       (8)       (9)      (10)     (11)      (12)      (*)
TOTAL      781380    917795    14517    2903     2720      361
pascale    8795      0         0        0        554       0
roland     390454    726       0        0        45        3
swamy      33888     2818      0        0        38        5
yannick    133270    733176    0        0        22        310
. . . . . . . .

(*)
Column number. This line does not appear in the report; it is shown here only to identify the columns.
RD/WR
Number of reads and writes per user
BLKIO
Number of block-I/O per user
CONNECT
Total connect time (how long the user was logged in)

Here are the last columns of this report:


# prtacct -f 2,13-18 /var/adm/acct/sum/tacct0729 | pg


Fri Jul 30 11:34:57 CDT 1993 Page 1

LOGIN      DISK      PRINT    FEES    # OF     # OF    # DISK
NAME       BLOCKS                     PROCS    SESS    SAMPLES
(2)        (13)      (14)     (15)    (16)     (17)    (18)      (*)
TOTAL      499608    0        15      8004     61      12
pascale    2760      0        5       0        1       1
roland     4260      0        0       134      1       1
swamy      50        0        0       254      1       1
yannick    8530      0        0       4693     1       1
. . . . . . . .

(*)
Column number. This line does not appear in the report; it is shown here only to identify the columns.
DISK BLOCKS
Average total amount of disk space used by the user
PRINT
Number of lines printed
FEES
Total fees entered with chargefee
# OF PROCS
Total number of processes belonging to this user
# OF SESS
Number of distinct login sessions
# DISK SAMPLES
Number of times dodisk was run during the accounting period

You can choose the columns and the day you want to analyze, for example:

# prtacct -f 2,3,5,7,11 /var/adm/acct/sum/tacct0730 | pg

You can use any file in the acct format, for example:

# prtacct -f 2-6,11 /var/adm/acct/nite/daytacct | pg

Process Time Report

To find the process time of finished processes, use the /var/adm/pacct file. You can see the active processes with the ps command, and you can measure how much time the system spends on a particular command with the time or timex command, discussed later.

By default, the acctcom command reads the /var/adm/pacct file, where the system collects the process times. This powerful command accepts many options. To see the most recent processes (-b option) with the number of characters transferred (-i option), limited to 10 lines, enter:


# acctcom -b -i | head

COMMAND                        START      END        REAL     CPU      CHARS    BLOCKS
NAME       USER      TTYNAME   TIME       TIME       (SECS)   (SECS)   TRNSFD   READ
bsh        yannick   pts       13:59:00   13:59:00   0.05     0.01     0        0
lscons     yannick   pts       13:59:00   13:59:00   0.02     0.02     11       0
#acctcom   root      pts       13:58:51   13:58:51   0.19     0.14     34800    0
#head      root      pts       13:58:51   13:58:51   0.17     0.02     9504     0
sendmail   root      ?         13:58:44   13:58:44   0.02     0.01     5120     0
#vi        root      pts       13:57:52   13:58:22   30.72    0.23     30416    0
ls         roland    pts       13:57:27   13:57:32   5.00     1.30     35856    0

The headings are self-explanatory. If you want to know just the average statistics about these processes:


# acctcom -a -q

cmds=3757 Real=48.96 CPU=0.13 USER=0.04 SYS=0.08
CHAR=30150.63 BLK=1.55 USR/TOT=0.34 HOG=0.26


where:
the HOG factor gives the ratio:
         (total CPU time) / (elapsed time)
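The ratio can be worked through for a single process. The sketch below uses the values from the ls line of the acctcom sample shown earlier; the helper name hog_factor is hypothetical and is not an accounting command.

```shell
#!/bin/sh
# hog_factor CPU_SECONDS ELAPSED_SECONDS   (hypothetical helper)
# Prints total CPU time divided by elapsed time, or n/a if no time elapsed.
hog_factor() {
    awk -v cpu="$1" -v real="$2" 'BEGIN {
        if (real > 0) printf "%.2f\n", cpu / real
        else          print "n/a"
    }'
}

# The "ls" line of the sample: REAL = 5.00 seconds, CPU = 1.30 seconds.
hog_factor 1.30 5.00
```

This prints 0.26: the command kept the CPU busy for about a quarter of the time it was running. A value near 1 indicates a CPU-bound process, a value near 0 a process that mostly waited.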

Command Usage Summaries

The acctcms command, similar to acctcom, analyzes command usage summaries and also has some useful options. To see yesterday's top 10 CPU-consuming commands (the daycms file covers the previous day), enter:


# acctcms -a -s -j -c /var/adm/acct/sum/daycms | head -16

                                 TOTAL COMMAND SUMMARY
COMMAND    NUMBER   TOTAL      TOTAL     TOTAL      MEAN      MEAN      HOG      CHARS       BLOCKS
NAME       CMDS     KCOREMIN   CPU-MIN   REAL-MIN   SIZE-K    CPU-MIN   FACTOR   TRNSFD      READ

TOTALS     7777     89895.01   61.29     14890.07   1466.61   0.01      0.41     8.093e+08   2906.00

ileaf      5        78213.12   19.34     1947.93    4044.80   3.87      0.99     5.943e+07   0.00
xant       29       4271.04    12.61     2151.44    338.71    0.43      0.59     1.235e+07   0.00
xlock      6        1358.19    12.28     1530.37    110.59    2.05      0.80     3.835e+08   0.00
uncompre   53       1747.57    5.75      6.48       304.01    0.11      88.75    1.916e+08   0.00
compress   6        2358.44    4.96      5.73       475.35    0.83      86.53    1.095e+08   0.00
custom     8        657.37     0.94      105.83     702.95    0.12      0.88     2.959e+06   0.00
bsh        3021     40.23      0.83      17.44      48.46     0.00      4.76     1.204e+06   0.00
lscons     2874     36.87      0.71      0.78       52.01     0.00      90.92    31614.00    0.00
aixterm    7        195.64     0.45      1087.51    435.52    0.06      0.04     1.23e+06    0.00
***other   35       268.85     0.43      33.85      627.60    0.01      1.27     1.551e+07   2903.00

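We use the -a option to print the report in ASCII format, the -s option because the daycms file is already in internal summary format, the -j option to combine the commands that were called only once, and the -c option to sort the commands by total CPU time rather than total Kcore-minutes.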
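The MEAN CPU-MIN column of the report above appears to be the TOTAL CPU-MIN column divided by NUMBER CMDS. The sketch below recomputes it for two sample rows under that assumption; the function name recompute_mean_cpu is hypothetical.

```shell
#!/bin/sh
# Recompute mean CPU minutes per invocation from acctcms -a style rows:
# field 2 is NUMBER CMDS, field 4 is TOTAL CPU-MIN.
# Heading lines are skipped because their second field is not a number.
recompute_mean_cpu() {
    awk 'NF == 10 && $2 + 0 > 0 { printf "%s %.2f\n", $1, $4 / $2 }'
}

recompute_mean_cpu <<'EOF'
ileaf 5 78213.12 19.34 1947.93 4044.80 3.87 0.99 5.943e+07 0.00
xant 29 4271.04 12.61 2151.44 338.71 0.43 0.59 1.235e+07 0.00
EOF
```

The output, ileaf 3.87 and xant 0.43, agrees with the MEAN CPU-MIN values in the report. A command with a high mean is expensive per run, whereas a command such as bsh is cheap per run but invoked thousands of times.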

Disk Usage Report

To know how much disk space is allocated to each user, you need to run the dodisk command to create your /var/adm/acct/nite/dacct input file. You can use, as usual, the prtacct command, to see your report; the acctmerg -a command works also.


**** NOTE: **** The dodisk command is generally run from the /var/spool/cron/crontabs/root cron file. See How to Set Up the Accounting Process.

# prtacct -f 1-2,13,18   /var/adm/acct/nite/dacct | pg

Sat Jul 31 15:43:38 CDT 1993 Page 1

        LOGIN      DISK      # DISK
UID     NAME       BLOCKS    SAMPLES

0       TOTAL      451440    13
0       root       317664    1
- - - - - - - - - - - - - - - - - -
201     roland     11824     1
202     swamy      22240     1
203     pascale    3048      1
18002   yannick    84096     1

Printer Usage Report

You use the pac command to show the printer usage. The acctmerg -q /var/adm/qacct command works also.


# pac -P lp1 yannick roland
Login pages/feet runs price

You can see printer activity per user.

Global Report

The runacct script calls the prdaily script, which builds a general daily report named rprtmmdd in your /var/adm/acct/sum directory.

Because it is a very wide report, you need to rotate it for printing. If you have a PostScript printer, you can print this report after you rotate it with the following command:

# enscript -r rprt0729 | qprt -P ps

where:
ps is the name of your PostScript print queue.

If you cannot print it, you can view it in an editor.

The /var/adm/acct/fiscal directory contains the same kind of report: the monacct script puts the fiscrptmm file for month mm there. You can edit or print this file.

How to Set Up the Accounting Process

It is very easy to start the accounting process.

First, make sure the bosext2.acct.obj option is installed. Then customize your files and create your collection directories.

In this section, we will use the root user for setting up system accounting and the user adm for running the reports.

You may want to modify the root user .profile file in order to provide access to the required executables. The modification of the root profile is optional, but will probably help with administration. As root user, follow these guidelines:

  1. Edit the .profile file and modify the PATH variable. It must include the /usr/sbin/acct and /var/adm/acct directories. PATH should look as follows:
    PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:\
    /usr/sbin/acct:/var/adm/acct
    export PATH
    
  2. Enter this command to ensure correct permissions and provide access to the wtmp and pacct files. The command string is quoted so that su passes the file names to nulladm:
    # su - adm -c "/usr/sbin/acct/nulladm /var/adm/wtmp /var/adm/pacct"
    
  3. Create the three directories that will collect the records if they don't already exist. They must be owned by user adm, group adm:
    # su - adm
    # cd /var/adm/acct
    # mkdir nite sum fiscal
    
  4. Update the /etc/acct/holidays file for the current year.

    The first data line (the first line that is not a comment) should contain the current year, the time when prime time begins, and the time when prime time ends.

    The following data lines should each contain four fields: the day of the year (from 1 through 365), the month, the day of the month, and a description of the holiday.

    Here is an example for year 1993:


      1993  0700            1900
    * Day of Calendar Company
    * Year Date Holiday
    *
    1 Jan 1 New Year's Day
    ---------------------
    185 Jul 4 Independence Day
    ---------------------
    359 Dec 25 Christmas Day
    365 Dec 31 New Years Eve

  5. Turn on process accounting. The startup command does that for you. You can set up this command in /etc/rc script file, adding the following entry at the end of the file (or just uncommenting it if it already exists):
    /usr/bin/su - root -c /usr/sbin/acct/startup

    Thus the accounting process starts after each boot.

    If you don't want the accounting process to be run at each boot, you can start it manually with this command:

    # /bin/su - root -c /usr/sbin/acct/startup
    
  6. Add this line in each stanza of /etc/filesystems file if you want to report disk accounting for this file system:
      account = true
    
  7. Add this line in each stanza of /etc/qconfig file if you want to report printer accounting for this print queue:
      acctfile = /var/adm/qacct
    

    **** NOTE: **** This statement only works for local printers, not remote printers. It doesn't work for transparent printers or PostScript printers either. There will be no accounting records if you specify the pass-through option when printing your file. The PostScript queue is considered to be pass-through, because it uses a pass-through formatter. The backend program determines what constitutes a page. When using the pass-through option and when printing remotely, you do not use the backend program. Also, if you specify your page length to be zero, qacct information will not be recorded. In addition, all print queues must use the same accounting file. The pac command (printer accounting command) is more versatile because it allows separate accounting files for each printer. For more information, see InfoExplorer.

  8. Edit the adm crontab file to run the accounting procedures automatically. You can do this with the following command while logged in as root. The command string is quoted so that su passes the -e flag to crontab:
    # su - adm -c "crontab -e"
    

    Here is a sample of /var/spool/cron/crontabs/adm file:



    30 01 * * 1-6 /usr/sbin/acct/runacct 2> /var/adm/acct/nite/accterr
    # Start runacct every Monday through Saturday (1-6) at 01:30 (30 01)
    0 1 * * 5 /usr/sbin/acct/dodisk
    # Start disk accounting at 1:00 am (0 1) every Friday (5), before runacct
    0 * * * * /usr/sbin/acct/ckpacct
    # Check the accounting file every hour (0 *) every day (* * *)
    0 2 1 * * /usr/sbin/acct/monacct
    # Start monacct at 2:00 am (0 2) on the first of every month
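Returning to the /etc/acct/holidays file of step 4: the first field of each holiday line is the day of the year, which is easy to miscount by hand. The following sketch computes it from a calendar date. The function name day_of_year is hypothetical, and the arguments are assumed to be plain numbers without leading zeros.

```shell
#!/bin/sh
# day_of_year YEAR MONTH DAY   (hypothetical helper; MONTH is 1-12)
day_of_year() {
    year=$1 month=$2 day=$3
    # The length of February follows the Gregorian leap-year rule.
    if [ $(( year % 4 )) -eq 0 ] && { [ $(( year % 100 )) -ne 0 ] || [ $(( year % 400 )) -eq 0 ]; }; then
        feb=29
    else
        feb=28
    fi
    total=$day
    m=1
    # Add the lengths of all the months that precede MONTH.
    for len in 31 $feb 31 30 31 30 31 31 30 31 30 31; do
        [ "$m" -ge "$month" ] && break
        total=$(( total + len ))
        m=$(( m + 1 ))
    done
    echo "$total"
}

day_of_year 1993 7 4      # Independence Day
day_of_year 1993 12 25    # Christmas Day
```

The two calls print 185 and 359, matching the sample holidays file for 1993. In a leap year every date after February 29 shifts by one, so the file needs to be recomputed each year rather than copied.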



**** NOTE: **** Don't forget to increase the /var file system to have enough space for your accounting data.

How to Use the Accounting

Basically, you use the accounting process to charge users according to the resources they consume. You can analyze the daily report or, better, the /var/adm/acct/fiscal/fiscrptmm report.

Monthly Fee

First extract from this report the general data, which you want to analyze. Use the sed command as follows:

# sed -n '/0 *TOTAL/,/^$/p' /var/adm/acct/fiscal/fiscrpt11 | \
sed '/^$/d' >general_input

The fields of this report are as follows:

UID NAME CPU   CPU     KCORE KCORE
         PRIME NPRIME  PRIME NPRIME
$1  $2   $3    $4      $5    $6
CONNECT CONNECT DISK   # OF  # OF    DISK    FEES
PRIME   NPRIME  BLOCKS PROCS SESSION SAMPLES
$7      $8      $9     $10   $11     $12     $13

Each $n is the field variable that an awk program can use to compute the charge for a user.

Here is a sample of an awk program to do that:


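BEGIN {
    OFMT = "%6.2f"
    OFS = " "
    cpucharg = .15      # rate applied to CPU time
    concharg = .0015    # rate applied to connect time
    dasdcharg = .005    # rate applied to average disk blocks
    fee = .1            # rate applied to chargefee units
    # print the heading once, not once per user
    printf("NAME CPU CON DASD FEE TOTAL\n")
}
{
    name = $2
    cpu = ( $3 + $4 ) * cpucharg
    con = ( $7 + $8 ) * concharg
    # average the disk blocks over the number of disk samples
    if ( $12 == 0 )
        dasd = 0
    else
        dasd = ( $9 / $12 ) * dasdcharg
    fees = $13 * fee
    total = cpu + con + dasd + fees
    # print the computed fees, not the fee rate
    printf("%s %6.2f %6.2f %6.2f %6.2f %6.2f\n", name, cpu, con, dasd, fees, total)
}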

You can call the awk program with the following command:

# awk -f fee.awk general_input

With this simple awk program example, you can charge any user for the resource consumed.
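To see the arithmetic on a single line, the following self-contained sketch applies the same rates to one input record. The input line is hypothetical, assembled for illustration in the $1 to $13 field order listed earlier, and the function name charge_line is not part of the accounting system.

```shell
#!/bin/sh
# Charge one accounting line, using the same rates as the fee program.
charge_line() {
    awk '{
        cpu  = ($3 + $4) * 0.15                       # CPU, prime + non-prime
        con  = ($7 + $8) * 0.0015                     # connect, prime + non-prime
        dasd = ($12 == 0) ? 0 : ($9 / $12) * 0.005    # average disk blocks
        fees = $13 * 0.1                              # chargefee units
        printf "%s %.2f %.2f %.2f %.2f %.2f\n", $2, cpu, con, dasd, fees, cpu + con + dasd + fees
    }'
}

# Hypothetical record: 6 CPU, 48 connect, 4260 disk blocks in 1 sample, no fees.
echo "201 roland 6 0 213954 3704 45 3 4260 134 1 1 0" | charge_line
```

This prints roland 0.90 0.07 21.30 0.00 22.27. With these rates the disk charge dominates, which suggests tuning the rates against a month of real data before billing anyone.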

User Analysis

With the accounting process we can find the biggest consumer of CPU, whether it is a user or a process. Every day a general report shows how much each user consumes and how many read/write I/Os each user performs. You can list the five users who consume the most CPU with the following script:


#! /bin/ksh
# name /var/adm/acct/sum/best5
# Every day find the five users who consume the most CPU
ACCT='/var/adm/acct'
# set -vx
MMDD=`date +%m%d`
ACCT_NAME="$ACCT/sum/tacct$MMDD"
if [ -r "$ACCT_NAME" ]
then
    # write the heading lines
    prtacct -f 2-3,5,7 "$ACCT_NAME" | sed -n '/LOGIN/,/^$/p' \
        > $ACCT/nite/BEST$MMDD
    # write the TOTAL line and the 5 top users, sorted by the CPU field
    prtacct -f 2-3,5,7 "$ACCT_NAME" | sed -n '/TOTAL/,/^$/p' \
        | sort -n -r +1 | head -6 >> $ACCT/nite/BEST$MMDD
fi

You'll find the top five users in the BESTmmdd file, in the /var/adm/acct/nite directory.

Process Analysis

You can find the processes that use the most CPU with the acctcom command, which works with the /var/adm/pacct file. Here is a script that sorts out the ten most CPU-intensive processes:


#! /bin/ksh
# name /var/adm/acct/sum/proc10
# best process
ACCT='/var/adm/acct'
# set -vx
MMDD=`date +%m%d`
# acctcom parameters -t CPU time system and user
# -r CPU factor ( user-time ) / ( system time + user time )
# -h HOG factor ( total CPU ) / ( elapsed time )
# -v no head lines
# head lines in the PROCmmdd file
acctcom -t -r -h | head -3 | sed -n '/COMMAND/,/NAME/p' \
> $ACCT/nite/PROC$MMDD
# sort on the CPU system and CPU user
acctcom -t -r -h -v | sort +6 -7 -n -r | head -10 \
>> $ACCT/nite/PROC$MMDD
# show the total statistics
acctcom -a -q >> $ACCT/nite/PROC$MMDD

Be careful with this analysis: the pacct file gives you the start and end times, but not the starting day. A process may therefore have run all day long, or for several days, and finished on the day being analyzed. Such processes show up with very large figures.

Command Analysis

Finally, you can use the powerful acctcms command to find expensive commands. Here is an example script file:


#! /bin/ksh
# name /var/adm/acct/sum/cmd10
# most expensive commands
ACCT='/var/adm/acct'
# set -vx
MMDD=`date +%m%d`
# acctcms parameters -a print the report in ASCII format
# -s the input file is already in internal summary format
# -j combine all the commands called only once
# -c sort by total CPU time rather than total Kcore-minutes
acctcms -a -s -j -c $ACCT/sum/daycms | head -16 > $ACCT/nite/CMD$MMDD

You may be surprised to find that a very common command is very expensive. For instance, the xlock command often consumes a lot of CPU. If necessary, you can remedy this situation with the nice command.

User Exit

You can run all these scripts from cron, but a better approach is to call them from the siteacct script, which is called by runacct.

Here is a sample of that script:


#! /bin/ksh
# name /var/adm/siteacct
/var/adm/acct/sum/best5
/var/adm/acct/sum/proc10
/var/adm/acct/sum/cmd10

You can then read three short reports every day to understand your resource utilization. The accounting process is a powerful tool for following your machine activity.

Diagnosing Problems

This section helps you diagnose problems by suggesting the questions to ask yourself, and gives you an idea of some other things to check.

General Information Needed

Here is the information you should gather when you have an accounting problem, summarized as a list of questions to ask yourself. You can send this information to the IBM Support Center to help determine the problem.

  1. Which accounting command is being used?
    1. acctcms
    2. acctcom
    3. acctcon
    4. acctdisk
    5. acctmerg
    6. acctprc
    7. acctprc1
    8. acctprc2
  2. All of the accounting commands listed above accept input from standard input (for example, acctcms < /usr/adm/pacct) and write to standard output, which can be redirected (for example, acctcms < /usr/adm/pacct > /tmp/report). Find out which accounting file is being used as standard input and where the output is being directed. The defaults are standard input and standard output.
      acct_command < In_file > Out_file
    
  3. Note all command flags used. Exact syntax is very important.
  4. How is accounting started? Via cron or via the command line?
  5. If accounting is started via cron, note the crontab file entries. These entries will be found in one of two places. If accounting is run from root, then the file will be in /usr/spool/cron/crontabs/root. If accounting is run from adm, then the file will be in /usr/spool/cron/crontabs/adm.

When Accounting Fails

When accounting fails, you have to check the following points:

  1. Check to see what state accounting is in. Look at the /usr/adm/acct/nite/active file. States are:
    1. setup
    2. wtmpfix
    3. connect1
    4. connect2
    5. process
    6. merge
    7. fees
    8. disk
    9. queueacct
    10. mergetacct
    11. cms
    12. userexit
    13. cleanup
  2. Check /var/adm/acct/nite/accterr file for additional messages.
  3. Based on error messages, fix the files by using the guidelines in "Fixing Damaged Files" in InfoExplorer.
  4. Restart accounting.

Other Things to Check

If you didn't solve the problem, check the following points also:
  1. /usr is out of space (use the df command to check)
  2. /var/adm/wtmp file has records with inconsistent date stamps
  3. cron fails (check mail for root and adm users)