To get anything useful done with a computer, a user must appropriate some of the system's resources. Some users clean their home directories often, erase temporary files regularly, and write well-optimized programs; others accumulate large outdated files and write inefficient programs that consume a lot of CPU time.
The system lets you set per-user process limits, control disk resources with quota limits, and use accounting to collect valuable information about the users and what they are running on the system.
Process limits are imposed by the kernel on a per-process basis. The kernel maintains a list of the limits imposed on a process inside its own address space, and controls access to this data through system calls.
The complete set of user attributes is defined in the /usr/lib/security/mkuser.default file, /etc/security/user file, and /etc/security/limits. The defaults can be changed by editing the default stanza in the appropriate security file. Many of the defaults are defined to be the standard behavior.
This section discusses the default values in the /etc/security/limits file. This file defines process resource limits for each user.
Here are the default values set when you create a user:
default:
fsize = 2097151
core = 2048
cpu = -1
data = 262144
rss = 65536
stack = 65536
These values are used as default settings when a new user is added to the system. They can be changed when the user is added (mkuser command) or after the user is created (chuser command). At login time, these values will be used to set the user's process limits.
ulimit (a shell built-in in the ksh, bsh, and csh shells) can be used to view and change these values (if you have the authority to change the hard limit).
Resource limits are divided into two types: the soft limits and the hard limits.
$ ulimit -a
time(seconds) unlimited
file(blocks) 2097151
data(kbytes) 131072
stack(kbytes) 32768
memory(kbytes) 32768
coredump(blocks) 2048
$ ulimit -Ha
time(seconds) unlimited
file(blocks) 2097151
data(kbytes) unlimited
stack(kbytes) unlimited
memory(kbytes) unlimited
coredump(blocks) unlimited
Soft limits are the resource use limits currently in effect; they are applied by default when a new user is created. A user may increase these values up to the system-wide hard limits.
Hard limits are the absolute ceilings on resource use. Increasing a hard limit requires privilege.
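The soft/hard distinction can be sketched as follows. This is a minimal illustration, assuming a shell whose ulimit built-in accepts the -S (soft) and -H (hard) flags; the variable names are hypothetical. It lowers only the soft core limit and shows that the hard limit stays where it was. Run it in a scratch shell, because the change applies to the current shell.

```shell
# Read the hard core file limit before changing anything.
hard_before=$(ulimit -H -c)
echo "core: soft=$(ulimit -S -c) hard=$hard_before"

# Lower only the soft limit; going down needs no privilege.
ulimit -S -c 0

# Read both limits again: the soft limit is now 0, the hard limit is unchanged.
soft_after=$(ulimit -S -c)
hard_after=$(ulimit -H -c)
echo "core: soft=$soft_after hard=$hard_after"
```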
A user's current limits are inherited by the child processes of that user, and remain in effect until the user logs in to the system again.
When users change a current soft limit to a lower value, they cannot change it back to a greater value until they log in to the system again.
Most of the following information comes from the setrlimit() and getrlimit() documentation and from contributors. The RLIMIT_XXXX definitions come from the setrlimit system call. The units mentioned are the ones used to set limits with setrlimit().
Example: If you were using setrlimit() to set the fsize limit, you would pass the value in bytes, but if you were using the Korn shell built-in ulimit you would specify the value in 512-byte blocks.
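The conversion between the two units is simple arithmetic. The following sketch assumes a shell with $(( )) arithmetic; the function names blocks_to_bytes and bytes_to_blocks are hypothetical helpers, not system commands.

```shell
# Convert a limit expressed in 512-byte blocks (shell ulimit,
# /etc/security/limits) into bytes (setrlimit units).
blocks_to_bytes() {
    echo $(( $1 * 512 ))
}

# Convert a limit expressed in bytes back into 512-byte blocks.
bytes_to_blocks() {
    echo $(( $1 / 512 ))
}

blocks_to_bytes 2097151      # default fsize; prints 1073741312
bytes_to_blocks 2147483136   # prints 4194303
```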
There is also a ulimit system call that the shell built-ins call. The ulimit referenced in this document is the shell built-in ulimit.
Here is a description of the different fields that the /etc/security/limits file contains. You can use the setrlimit() and getrlimit() subroutines with the parameters described with each field of this file.
RLIMIT_FSIZE The largest size, in bytes, of any single file that can be created.
This value is a limit that the kernel will enforce on a user process. This means a user cannot create a file greater than their soft limit unless the root user changes that user's file limit with a ulimit -Hf call. The largest number the limit can be set to is 2147483136 bytes, that is, 2GB. The term unlimited could be used in place of 2147483136, but the limit would still be 2GB. The corresponding value in 512-byte blocks is 4194303.
The minimum value the superuser can set for this parameter with the chuser command is 8192 512-byte blocks.
RLIMIT_CORE The largest size, in bytes, of a core file that may be created.
This limit is maintained by the kernel and it will not allow a user to create a core file larger than the set soft limit. Actually, the system will use the minimum value of the core and fsize soft limit.
Users can set this value to 0 to save disk space. No core file is then created when one of their programs fails, for example with a segmentation violation. The following example for user isa shows how to avoid wasting disk space on core files: first you can see the output with the standard defaults, then the output with the core limit set to 0.
$ ulimit -a
time(seconds) unlimited
file(blocks) 2097151
data(kbytes) 131072
stack(kbytes) 32768
memory(kbytes) 32768
coredump(blocks) 2048
$ sleep 200
^\Quit(coredump)
$ ls -l core
-rw-r--r-- 1 isa staff 24951 Nov 19 12:17 core
$ rm core
$ ulimit -c 0
$ ulimit -a
time(seconds) unlimited
file(blocks) 2097151
data(kbytes) 131072
stack(kbytes) 32768
memory(kbytes) 32768
coredump(blocks) 0
$ sleep 200
^\Quit
$ ls -l core
ls: 0653-341 The file core does not exist.
RLIMIT_CPU The maximum amount of CPU time (in seconds) to be used by each process. This limit is set (a call is made to setrlimit() by getty()) and can be used by each process, but it is not enforced by the kernel. The value may be checked by application code or user code. In other words, if a process passes its soft CPU time limit, the kernel will not send a SIGXCPU signal to the offending process (as stated in the documentation). The kernel will let processes run as long as they want. So this value can be misleading.
As an example, user isa will be modified so that the cpu value is 25 seconds.
# chuser cpu=25 isa
# su - isa
$ ulimit -a
time(seconds) 25
file(blocks) 2097151
data(kbytes) 131072
stack(kbytes) 32768
memory(kbytes) 32768
coredump(blocks) 2048
$ ulimit -t 3600
ksh: ulimit: 0403-045 The specified value exceeds the user's allowable limit.
$ integer i=0
$ while [[ $i -lt 100000 ]]
> do echo "hello" > /dev/null
> i=i+1
> done
Cputime limit exceeded
#
In this example, you can see that the session of user isa has been killed. The shell started by the initial su command no longer exists once the cpu limit is exceeded, and the root prompt returns.
A user can trap the SIGXCPU signal sent by the system. Here is an example:
# su - isa
$ ulimit -a
time(seconds) 25
file(blocks) 2097151
data(kbytes) 131072
stack(kbytes) 32768
memory(kbytes) 32768
coredump(blocks) 2048
$ trap "" XCPU
$ trap
24:
$ integer i=0
$ while [[ $i -lt 100000 ]]
> do echo "hello" > /dev/null
> i=i+1
> done
$
In the example above, the shell executed properly to the end of the loop, and the session process is still alive. You can see how user isa can trap the signal and not be impacted by this limit.
RLIMIT_DATA The maximum size, in bytes, of the data segment for a process; this defines how far a program may extend its break value with the sbrk() system call.
Data and stack are tied together. They exist in Segment 2, and together they can never be greater than approximately 256MB. This value is actually larger, but 256MB is a good number to use. You can use the shell ulimit to increase the soft/hard limit for either data or stack, but the hard limits can never cross. sbrk() moves the break value up and down by an increment. malloc() calls sbrk().
The minimum value the superuser can set for this parameter with the chuser command is 1272.
Here is an example of a data limit set to 2500KB for user isa:
# chuser data=5000 isa
# su - isa
$ ulimit -a
time(seconds) unlimited
file(blocks) 2097151
data(kbytes) 2500
stack(kbytes) 32768
memory(kbytes) 32768
coredump(blocks) 2048
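$ cat prog.c
int main(void)
{
    int i;
    /* 2600 blocks of 1KB each: more than the 2500KB data limit */
    static char tab[2600][1024];

    for (i = 0; i < 2600; i++)
    {
        /* touch the first byte of every block */
        tab[i][0] = 'x';
    }
    return 0;
}
$ cc -o e prog.c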
$ ./e
Could not load program ./e
Error was: Not enough space
$ ulimit -d 131072
$ ulimit -a
time(seconds) unlimited
file(blocks) 2097151
data(kbytes) 131072
stack(kbytes) 32768
memory(kbytes) 32768
coredump(blocks) 2048
$ ./e
$
In this example you can see the difference between a data limit set to a low value and the default value when you execute a program that defines a big array and writes the 'x' character at the beginning of each block of the table. In the first case the limit is exceeded and the program cannot be loaded.
The following diagram is a representation of how the user space is seen by the kernel.
Figure: How the User Space is Seen by the Kernel
Users can raise their hard/soft data or stack values with the shell ulimit or the setrlimit system call. The kernel will not let the maximums (MAXs) overlap.
The kernel loader sets the initial sbrk value after it loads the initialized and uninitialized variables, but it cannot set the sbrk value past the current (CUR, soft) limit. If more space is needed, the user (or parent process) must set the hard/soft limits higher before the process is executed.
Take the example of users running ksh. They can increase the soft limit (up to the hard limit) of ksh with the shell built-in ulimit before they run a program. Because ksh is the parent process that executes the program, the program inherits these limits from it.
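Inheritance from the parent shell can be observed directly. The sketch below is an illustration under assumptions: it uses the core limit rather than the data limit so that it is harmless to run, it assumes the ulimit built-in accepts -S, and the variable names are hypothetical. The limit is changed inside a subshell, so the child program sees the new value while the invoking shell keeps its own.

```shell
# Remember the soft core limit of the invoking shell.
outer_before=$(ulimit -S -c)

# Command substitution runs in a subshell: lower the soft limit there,
# then start a child process and ask it which limit it inherited.
child_limit=$(ulimit -S -c 0; sh -c 'ulimit -S -c')

# The invoking shell is not affected by the change made in the subshell.
outer_after=$(ulimit -S -c)
echo "child saw: $child_limit, parent before: $outer_before, after: $outer_after"
```

Wrapping the ulimit call and the program in a subshell in this way keeps a lowered limit from persisting in the login shell.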
With data, you might have to lower your stack value to be able to raise your data limit. Users could kill their own processes by lowering the stack limit below what they have already allocated on the stack and then trying to use it, which results in a SIGSEGV signal.
RLIMIT_RSS The maximum size, in bytes, to which a process's resident set size may grow.
The limit is not enforced by the kernel. The kernel will not check this value for a process. It will let the process use as much memory as it needs to run with, and it will not kill a process if it reaches its soft limit. The kernel will only start killing processes if system wide paging space gets very low, and that's after it has sent a SIGDANGER signal (init will catch this signal and send a warning message to the console). This happens when the system only has about 2MB of free paging space, and soon after this it will start killing the youngest process utilizing virtual memory.
RLIMIT_STACK The maximum size, in bytes, of the stack segment for a process. This defines how far a program's stack segment may be extended.
A process can access its stack only up to the soft limit.
AIX 3.2 has implemented a disk quota system which provides a method for controlling disk space usage. This system is based on the Berkeley disk quota system and allows assignment of quotas for the AIX journaled file system only.
Disk quotas may be assigned based on three parameters and may be modified using the edquota command. The three parameters are:
Soft limits: The number of 1KB disk blocks or files below which the user should remain.
Hard limits: The maximum number of disk blocks or files the user can accumulate.
Grace period: The time during which the soft limit may be exceeded (the default is one week). If the user fails to reduce usage below the soft limit during the specified time, the system will interpret the soft limit as the maximum allocation for that user. The user may correct this condition by removing enough files to reduce usage to below the soft limit.
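The relationship between usage, soft limit, and hard limit can be sketched as a small classifier. This is only an illustration: quota_state is a hypothetical helper, the sample values are made up, a limit of 0 is assumed to mean "no limit", and the expiry of the grace period is not modeled.

```shell
# quota_state USED SOFT HARD
# Classify a usage figure (blocks or files) against its two limits.
# Assumption: a limit of 0 means that no limit is set.
quota_state() {
    used=$1 soft=$2 hard=$3
    if [ "$hard" -gt 0 ] && [ "$used" -ge "$hard" ]; then
        echo "hard limit reached"
    elif [ "$soft" -gt 0 ] && [ "$used" -gt "$soft" ]; then
        echo "over soft limit"
    else
        echo "within limits"
    fi
}

quota_state 16 10 30   # prints: over soft limit
quota_state 8 20 40    # prints: within limits
quota_state 30 10 30   # prints: hard limit reached
quota_state 16 0 0     # prints: within limits
```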
The disk quota system tracks user and group quotas in the quota.user and quota.group files. These files reside in the root directories of file systems that have quotas assigned. These files are created with the edquota and quotacheck commands and are readable with the quota command.
Typically, only file systems that contain user home directories and files would use quotas. It is recommended that disk quotas never be assigned to the /tmp file system.
A system administrator would consider implementing the disk quota system under the following conditions:
When you have exceeded quota limits, here are the three procedures you can follow to go on with your process.
Prerequisites: In order to set up the disk quota system you must have root user level authority.
An example stanza line that enables user quotas would be in the format:
quota = userquota
In a file system that has both user and group quotas enabled the entry might look similar to the following:
/home:
dev = /dev/hd1
vol = "/home"
mount = true
check = true
free = false
vfs = jfs
log = /dev/hd8
quota = userquota,groupquota
options = rw
A sample entry might look like the following for user bill of the qgroup group:
/home:
dev = /dev/hd1
vol = "/home"
mount = true
check = true
free = false
vfs = jfs
log = /dev/hd8
quota = userquota,groupquota
userquota = /home/myquota.bill
groupquota = /home/myquota.qgroup
options = rw
# mount /home
User Quotas
edquota [ -u ] [ -p Proto-UserName ] UserName ...
Group Quotas
edquota -g [ -p Proto-GroupName ] GroupName ...
Change User or Group Grace Period
edquota -t [ -u | -g ]
The edquota command creates and edits quotas. It creates a temporary file that contains each user's and group's current disk quotas. It determines the list of file systems with established quotas from the /etc/filesystems file.
The edquota command invokes the vi editor (or the editor specified by the EDITOR environment variable) on the temporary file, so that quotas can be added and modified.
The fields that are displayed in the temporary file are:
Blocks in use: The current number of 1KB file system blocks used by this user or group.
Inodes in use: The current number of files used by this user or group.
Block soft limit: The number of 1KB blocks the user or group will be allowed to use during normal operations.
Block hard limit: The total number of 1KB blocks the user or group will be allowed to use, including temporary storage during a quota grace period.
Inode soft limit: The number of files the user or group will be allowed to create during normal operations.
Inode hard limit: The number of files the user or group will be allowed to create, including temporary files created during a quota grace period.
More information on the edquota command can be found in InfoExplorer and the edquota man page, or the hardcopy manuals.
A sample edquota file edit might look like:
# edquota -u bill
Quotas for user bill:
/home: blocks in use: 16, limits (soft = 10, hard = 30)
inodes in use: 4, limits (soft = 0, hard = 0)
quotaon or quotaoff Command
Purpose - Turns on and off file system quotas.
Syntax
quotaon [ -g ] [ -u ] [ -v ] { -a | FileSystem ... }
quotaoff [ -g ] [ -u ] [ -v ] { -a | FileSystem ... }
The quotaon command enables disk quotas for one or more file systems specified by the FileSystem parameter. The specified file system must have an entry in the /etc/filesystems file, and must be mounted. The quotaon command looks for the quota.user and quota.group default files in the root directory of the associated file system. These file names may be changed in the /etc/filesystems file.
By default, both user and group quotas are enabled. The related flags or options are:
-a Enables or disables all file systems that are read-write and have disk quotas, as indicated by the /etc/filesystems file. When used with the -g flag, only group quotas in the /etc/filesystems file are enabled or disabled; when used with the -u flag, only user quotas in the /etc/filesystems file are enabled or disabled.
-g Specifies that only group quotas are enabled or disabled.
-u Specifies that only user quotas are enabled or disabled.
-v Prints a message for each file system in which quotas are turned on or off.
Examples:
# quotaon -u /home
# quotaoff -v -a
echo " Enabling file system quotas "
/usr/sbin/quotacheck -a
/usr/sbin/quotaon -a
The quotacheck command has the following format and syntax:
quotacheck Command
Purpose - Checks file system quota consistency.
Syntax
quotacheck [ -g ] [ -u ] [ -v ] { -a | FileSystem ... }
By default, both user and group quotas are checked.
-a Checks all file systems with disk quotas and read-write permissions indicated by the /etc/filesystems file.
-g Checks group quotas only.
-u Checks user quotas only.
-v Reports discrepancies between the calculated and recorded disk quotas.
Examples:
# quotacheck /home
# quotacheck -g /home
Syntax
repquota [ -v ] [ -g ] [ -u ] { -a | FileSystem ... }
-a Specifies that quotas are printed for all file systems enabled.
-g Specifies that only group quotas are printed.
-u Specifies that only user quotas are printed.
-v Prints a header line before the summary of quotas for each file system.
Example:
# repquota -u /home
User            used    soft    hard   grace    used    soft    hard   grace
root      --      16       0       0               2       0       0
lanczi    --      12    3000    5000               3    3600    4500
biff      --       8      20      40               2      30      39
bill      --      16      10      30   7days       4      11      30
To manage a system, you need a view of its performance throughout the day. AIX offers several methods for understanding performance.
One of them is the accounting process, which shows the overall usage of your system. This chapter explains how you can use the accounting process to collect and display performance data.
The accounting process uses information that several commands collect automatically:
Figure - Simplified Accounting Organization shows the simplified organization of AIX accounting.
Figure: Simplified Accounting Organization
AIX accounting is a complex system of commands, shell scripts, and C programs derived from System V.
The raw data files are written by several processes in the /var/adm directory. The /var/adm/acct/nite directory is where runacct puts the daily summary files. The prdaily script uses those files as input to create the daily report. Finally, the cumulative summary files provide the input from which the monacct script creates a monthly report.
The runacct command can seem complicated because it is powerful. Figure - runacct Detail shows the logic of this command. Data collection begins with the startup command, which records the CPU consumption of each process in the pacct file (/var/adm/pacct). You can call this command from the /etc/rc script to start the accounting process automatically. The login and init processes record connect time in the /var/adm/wtmp file. The dodisk script records file system usage in /var/adm/dacct, as enq and qdaemon do for printer usage in /var/adm/qacct. You can charge a user for extra work with the chargefee script.
All this data is analyzed by runacct, which uses the acctcon1, acctcon2, and acctmerg programs to create the results of the day in the /var/adm/acct/nite directory: the daytacct file holds the process time and connect time. These are merged with the disk usage, printer usage, and fees charged for the day into the tacctmmdd file in the sum directory (where mm is the month and dd the day).
The acctcms command writes the day's command usage to the /var/adm/acct/sum/daycms file, and accumulates the previous totals and the daily file in the /var/adm/acct/sum/cms file.
After all the data has been merged into the sum directory, runacct calls the /var/adm/siteacct script if it exists, which you can set up as a special user exit.
Finally, runacct calls the prdaily command, which creates a large report of the day containing all the special reports.
Figure: runacct Detail
When you want to know how long users were connected yesterday, you can use the ac command, which prints the connect time records. This command can show the connect time of each user per day. Its input comes from /var/adm/wtmp by default, which is cleaned after each accounting run (normally every day).
$ ac -p
user1 5.64
user2 15.48
user3 20.05
total 41.17
If you want to know what happened on your system on the previous day, you can use the /etc/utmp file:
$ ac -p -d -w /etc/utmp
user1 5.64
user2 15.48
user3 20.05
Jul 28 total 41.17
root 16.05
Jul 29 total 16.05
If you want a daily report of your activity, you can use the prtacct command, provided you belong to the adm group. You can specify which columns you want to see:
# prtacct -f 1-6 /var/adm/acct/sum/tacct0729 | pg
Fri Jul 30 11:34:40 CDT 1993 Page 1
        LOGIN      CPU      CPU       KCORE       KCORE
UID     NAME       PRIME    NPRIME    PRIME       NPRIME
(1)     (2)        (3)      (4)       (5)         (6)        (*)
0       TOTAL      41       45        4.100e+06   4.398e+06
203     pascale    4        0         100546      0
201     roland     6        0         213954      3704
202     swamy      10       1         1.795e+06   239507
18002   yannick    14       32        1.380e+06   3.256e+06
. . . . . . . .
The meaning of the columns follows:
You can specify a comma-separated list of field numbers or ranges with the -f flag of the prtacct command.
# prtacct -f 2,7-12 /var/adm/acct/sum/tacct0729 | pg
Fri Jul 30 11:34:48 CDT 1993 Page 1
LOGIN      RD/WR     RD/WR     BLKIO    BLKIO     CONNECT   CONNECT
NAME       PRIME     NPRIME    PRIME    NPRIME    PRIME     NPRIME
(2)        (7)       (8)       (9)      (10)      (11)      (12)      (*)
TOTAL      781380    917795    14517    2903      2720      361
pascale    8795      0         0        0         554       0
roland     390454    726       0        0         45        3
swamy      33888     2818      0        0         38        5
yannick    133270    733176    0        0         22        310
. . . . . . . .
Here are the last columns of this report:
# prtacct -f 2,13-18 /var/adm/acct/sum/tacct0729 | pg
Fri Jul 30 11:34:57 CDT 1993 Page 1
LOGIN      DISK      PRINT     FEES      # OF     # OF    # DISK
NAME       BLOCKS                        PROCS    SESS    SAMPLES
(2)        (13)      (14)      (15)      (16)     (17)    (18)      (*)
TOTAL      499608    0         15        8004     61      12
pascale    2760      0         5         0        1       1
roland     4260      0         0         134      1       1
swamy      50        0         0         254      1       1
yannick    8530      0         0         4693     1       1
. . . . . . . .
You can choose the columns and the day you want to analyze, for example:
# prtacct -f 2,3,5,7,11 /var/adm/acct/sum/tacct0730 | pg
You can use any file in the same accounting format, for example:
# prtacct -f 2-6,11 /var/adm/acct/nite/daytacct | pg
If you want to know the process time of finished processes, use the /var/adm/pacct file. You can see the active processes with the ps command, or see how much time the system uses for a particular command with the time or timex command, which are discussed later.
The acctcom command reads the /var/adm/pacct file by default, where the system collects the process times. This very powerful command accepts many options. To list the most recent processes first (option -b), with the number of characters transferred (option -i), limited to 10 lines, enter:
# acctcom -b -i | head
COMMAND                         START     END       REAL     CPU      CHARS    BLOCKS
NAME       USER     TTYNAME     TIME      TIME      (SECS)   (SECS)   TRNSFD   READ
bsh        yannick  pts         13:59:00  13:59:00  0.05     0.01     0        0
lscons     yannick  pts         13:59:00  13:59:00  0.02     0.02     11       0
#acctcom   root     pts         13:58:51  13:58:51  0.19     0.14     34800    0
#head      root     pts         13:58:51  13:58:51  0.17     0.02     9504     0
sendmail   root     ?           13:58:44  13:58:44  0.02     0.01     5120     0
#vi        root     pts         13:57:52  13:58:22  30.72    0.23     30416    0
ls         roland   pts         13:57:27  13:57:32  5.00     1.30     35856    0
The headings are self-explanatory. If you want to know just the average statistics about these processes:
# acctcom -a -q
cmds=3757 Real=48.96 CPU=0.13 USER=0.04 SYS=0.08
CHAR=30150.63 BLK=1.55 USR/TOT=0.34 HOG=0.26
HOG is the hog factor: (total CPU time) / (elapsed time).
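For a single process, the hog factor is the CPU time divided by the elapsed (real) time. As a worked example, here is the calculation for the ls line of the acctcom listing above (REAL 5.00, CPU 1.30). The hog_factor function is a hypothetical helper written for this illustration, not part of the accounting commands.

```shell
# hog_factor CPU_SECS REAL_SECS
# Print (total CPU time) / (elapsed time) with two decimals.
hog_factor() {
    awk -v cpu="$1" -v real="$2" 'BEGIN {
        if (real > 0)
            printf "%.2f\n", cpu / real
        else
            print "n/a"    # no elapsed time recorded
    }'
}

hog_factor 1.30 5.00    # the ls process; prints 0.26
```

A value close to 1 means the process was using the CPU for almost all of its elapsed time; a value close to 0 means it was mostly waiting.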
A command similar to acctcom, the acctcms command, lets you analyze command usage summaries. It also has some useful options. To see the 10 commands that consumed the most CPU time yesterday (the daycms file covers the previous day), enter:
# acctcms -a -s -j -c /var/adm/acct/sum/daycms | head -16
TOTAL COMMAND SUMMARY
COMMAND   NUMBER  TOTAL     TOTAL    TOTAL     MEAN     MEAN     HOG     CHARS      BLOCKS
NAME      CMDS    KCOREMIN  CPU-MIN  REAL-MIN  SIZE-K   CPU-MIN  FACTOR  TRNSFD     READ
TOTALS    7777    89895.01  61.29    14890.07  1466.61  0.01     0.41    8.093e+08  2906.00
ileaf     5       78213.12  19.34    1947.93   4044.80  3.87     0.99    5.943e+07  0.00
xant      29      4271.04   12.61    2151.44   338.71   0.43     0.59    1.235e+07  0.00
xlock     6       1358.19   12.28    1530.37   110.59   2.05     0.80    3.835e+08  0.00
uncompre  53      1747.57   5.75     6.48      304.01   0.11     88.75   1.916e+08  0.00
compress  6       2358.44   4.96     5.73      475.35   0.83     86.53   1.095e+08  0.00
custom    8       657.37    0.94     105.83    702.95   0.12     0.88    2.959e+06  0.00
bsh       3021    40.23     0.83     17.44     48.46    0.00     4.76    1.204e+06  0.00
lscons    2874    36.87     0.71     0.78      52.01    0.00     90.92   31614.00   0.00
aixterm   7       195.64    0.45     1087.51   435.52   0.06     0.04    1.23e+06   0.00
***other  35      268.85    0.43     33.85     627.60   0.01     1.27    1.551e+07  2903.00
We use the -a and -s options to see the report in ASCII mode, the -j option to combine the commands together, and the -c option to sort the commands by total CPU time rather than total Kcore-minutes.
To know how much disk space is allocated to each user, you need to run the dodisk command to create your /var/adm/acct/nite/dacct input file. As usual, you can use the prtacct command to see your report; the acctmerg -a command also works.
# prtacct -f 1-2,13,18 /var/adm/acct/nite/dacct | pg
Sat Jul 31 15:43:38 CDT 1993 Page 1
        LOGIN      DISK      # DISK
UID     NAME       BLOCKS    SAMPLES
0       TOTAL      451440    13
0       root       317664    1
- - - - - - - - - - - - - - - - - -
201     roland     11824     1
202     swamy      22240     1
203     pascale    3048      1
18002   yannick    84096     1
Use the pac command to show printer usage; the acctmerg -q /var/adm/qacct command also works. You can see printer activity per user.
# pac -P lp1 yannick roland
Login pages/feet runs price
The runacct script calls the prdaily script, which builds a general daily report, rprtmmdd, in your /var/adm/acct/sum directory.
Because the report is very wide, you need to rotate it. If you have a PostScript printer, you can rotate and print this report with the following command:
# enscript -r rprt0729 | qprt -P ps
If you cannot print it, you can view it in an editor.
You have the same kind of report in the /var/adm/acct/fiscal directory, where the monacct script puts the fiscrptmm file for month mm. You can edit or print this file.
It is very easy to start the accounting process.
First, check that the bosext2.acct.obj option is installed. Then customize your files and create your collection directories.
In this section, we will use the root user for setting up system accounting and the user adm for running the reports.
You may want to modify the root user .profile file in order to provide access to the required executables. The modification of the root profile is optional, but will probably help with administration. As root user, follow these guidelines:
PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:\
/usr/sbin/acct:/var/adm/acct
export PATH
# su - adm -c /usr/sbin/acct/nulladm /var/adm/wtmp /var/adm/pacct
# su - adm
# cd /var/adm/acct
# mkdir nite sum fiscal
The first data line (the first line that is not a comment) should contain the current year, the time when prime time begins, and the time when prime time ends.
Each of the following data lines should contain four fields: the day of the year (from 1 through 365), the month, the day of the month, and the description of the holiday.
Here is an example for year 1993:
1993 0700 1900
* Day of Calendar Company
* Year Date Holiday
*
1 Jan 1 New Year's Day
---------------------
185 Jul 4 Independence Day
---------------------
359 Dec 25 Christmas Day
365 Dec 31 New Years Eve
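The day-of-year field in the first column can be computed rather than counted by hand. The sketch below assumes a shell with $(( )) arithmetic; day_of_year is a hypothetical helper that takes a numeric year, month, and day without leading zeros.

```shell
# day_of_year YEAR MONTH DAY
# Print the day number within the year (1 through 365, or 366 in leap years).
day_of_year() {
    year=$1 month=$2 day=$3
    # Days elapsed before the first of each month in a non-leap year.
    case $month in
        1)  before=0 ;;   2)  before=31 ;;  3)  before=59 ;;
        4)  before=90 ;;  5)  before=120 ;; 6)  before=151 ;;
        7)  before=181 ;; 8)  before=212 ;; 9)  before=243 ;;
        10) before=273 ;; 11) before=304 ;; 12) before=334 ;;
        *)  echo "bad month: $month" >&2; return 1 ;;
    esac
    # Add the leap day for dates after February in a leap year.
    if [ "$month" -gt 2 ] &&
       { [ $((year % 400)) -eq 0 ] ||
         { [ $((year % 4)) -eq 0 ] && [ $((year % 100)) -ne 0 ]; }; }; then
        before=$((before + 1))
    fi
    echo $((before + day))
}

day_of_year 1993 7 4     # Independence Day; prints 185
day_of_year 1993 12 25   # Christmas Day; prints 359
```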
/usr/bin/su - root -c /usr/sbin/acct/startup
Thus the accounting process starts after each boot.
If you don't want the accounting process to be run at each boot, you can start it manually with this command:
# /bin/su - root -c /usr/sbin/acct/startup
account = true
acctfile = /var/adm/qacct
# su - adm -c crontab -e
Here is a sample of /var/spool/cron/crontabs/adm file:
30 01 * * 1-6 /usr/sbin/acct/runacct 2> /var/adm/acct/nite/accterr
# Start runacct every Monday through Saturday (1-6) at 01:30 (30 01)
0 1 * * 5 /usr/sbin/acct/dodisk
# Start disk accounting at 1:00 am (0 1) every Friday (5), before runacct
0 * * * * /usr/sbin/acct/ckpacct
# Check the accounting file every hour (0 *) every day (* * *)
0 2 1 * * /usr/sbin/acct/monacct
# Start monacct at 2:00 am (0 2) on the first of every month
Basically, you use the accounting process to charge users according to the resources they consume. You can analyze the daily report or, better, the /var/adm/acct/fiscal/fiscrptmm report.
First, extract from this report the general data that you want to analyze. Use the sed command as follows:
# sed -n '/0 *TOTAL/,/^$/p' /var/adm/acct/fiscal/fiscrpt11 | \
sed '/^$/d' >general_input
The fields of this report are as follows:
             CPU      CPU       KCORE    KCORE
UID   NAME   PRIME    NPRIME    PRIME    NPRIME
$1    $2     $3       $4        $5       $6

CONNECT   CONNECT   DISK     # OF    # OF      DISK      FEES
PRIME     NOPRIME   BLOCKS   PROCS   SESSION   SAMPLES
$7        $8        $9       $10     $11       $12       $13
The $n variables are the field references that an awk program uses to charge users.
Here is a sample of an awk program to do that:
BEGIN {
    OFMT = "%6.2f"
    OFS = " "
    cpucharg = .15
    concharg = .0015
    dasdcharg = .005
    fee = .1
    # print the heading once, not for every user
    printf("NAME CPU CON DASD FEE TOTAL\n")
}
{
    name = $2
    cpu = ( $3 + $4 ) * cpucharg
    con = ( $7 + $8 ) * concharg
    # average disk blocks over the samples taken; avoid dividing by zero
    if ( $12 == 0 )
        dasd = 0
    else
        dasd = ( $9 / $12 ) * dasdcharg
    fees = $13 * fee
    total = cpu + con + dasd + fees
    # print the computed fees, not the fee rate
    printf("%s %6.2f %6.2f %6.2f %6.2f %6.2f\n", name, cpu, con, dasd, fees, total)
}
You can call the awk program with the following command:
# awk -f fee.awk general_input
With this simple awk program example, you can charge any user for the resource consumed.
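To check the arithmetic, the same formulas can be applied to a single line. The sketch below is an illustration only: the input line is hypothetical (13 fields in the order described for general_input), charge_line is a hypothetical helper, and the rates are the ones used in the sample awk program.

```shell
# One hypothetical report line with the 13 fields $1 through $13.
line="201 roland 6 0 213954 3704 45 3 4260 134 1 1 0"

# charge_line LINE: apply the charging formulas to one report line.
charge_line() {
    echo "$1" | awk '{
        cpu = ($3 + $4) * 0.15          # prime + non-prime CPU
        con = ($7 + $8) * 0.0015        # prime + non-prime connect time
        if ($12 == 0)
            dasd = 0                    # no disk samples taken
        else
            dasd = ($9 / $12) * 0.005   # average disk blocks per sample
        fees = $13 * 0.1
        printf "%s %.2f %.2f %.2f %.2f %.2f\n", $2, cpu, con, dasd, fees, cpu + con + dasd + fees
    }'
}

charge_line "$line"    # prints: roland 0.90 0.07 21.30 0.00 22.27
```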
With the accounting process we can find the biggest consumer of CPU time, whether it is a user or a process. Every day we have a general report on how much CPU each user consumes and how many read/write I/Os each user performs. You can list the five users who consume the most with the following script:
#! /bin/ksh
# name /var/adm/acct/sum/best5
# Every day find the five most consuming CPU users
ACCT='/var/adm/acct'
# set -vx
MMDD=`date +%m%d`
ACCT_NAME="$ACCT/sum/tacct$MMDD"
if [ -r "$ACCT_NAME" ]
then
    # write the head lines
    prtacct -f 2-3,5,7 "$ACCT_NAME" | sed -n '/LOGIN/,/^$/p' \
        > "$ACCT/nite/BEST$MMDD"
    # write the TOTAL line and the 5 users sorted by CPU field
    prtacct -f 2-3,5,7 "$ACCT_NAME" | sed -n '/TOTAL/,/^$/p' \
        | sort -n -r +1 | head -6 >> "$ACCT/nite/BEST$MMDD"
fi
You will find the five top users in the BESTmmdd file, in the /var/adm/acct/nite directory.
You can find the processes that use the most CPU time with the acctcom command, which works from the /var/adm/pacct file. Here is a script to list the ten heaviest processes:
#! /bin/ksh
# name /var/adm/acct/sum/proc10
# best process
ACCT='/var/adm/acct'
# set -vx
MMDD=`date +%m%d`
# acctcom parameters -t CPU time system and user
# -r CPU factor ( user-time ) / ( system time + user time )
# -h HOG factor ( total CPU ) / ( elapsed time )
# -v no head lines
# head lines in the PROCmmdd file
acctcom -t -r -h | head -3 | sed -n '/COMMAND/,/NAME/p' \
> $ACCT/nite/PROC$MMDD
# sort on the CPU system and CPU user
acctcom -t -r -h -v | sort +6 -7 -n -r | head -10 \
>> $ACCT/nite/PROC$MMDD
# show the total statistics
acctcom -a -q >> $ACCT/nite/PROC$MMDD
Be careful with this analysis: the pacct file gives the start and end times, but not the starting day. Some processes may therefore have run all day long, or for several days, and finished on the day being analyzed. These processes show up with very large figures.
At the end you can use the powerful command acctcms to see expensive commands. Here is a script file example:
#! /bin/ksh
# name /var/adm/acct/sum/cmd10
# best command
ACCT='/var/adm/acct'
# set -vx
MMDD=`date +%m%d`
# acctcms parameters -a print the report in ASCII
# -s read input files that are already in summary format
# -j combine all the commands called only once
# -c sort by total CPU time rather than total Kcore-minutes
acctcms -a -s -j -c $ACCT/sum/daycms | head -16 > $ACCT/nite/CMD$MMDD
Sometimes you will be surprised to find that a very common command is very expensive. For instance, the xlock command often consumes a lot of CPU time. You can correct this situation with the nice command if necessary.
You can combine all the different scripts in the cron process, but the better idea is to use the siteacct script called by runacct.
Here is a sample of that script:
#! /bin/ksh
# name /var/adm/siteacct
/var/adm/acct/sum/best5
/var/adm/acct/sum/proc10
/var/adm/acct/sum/cmd10
You can then read three short reports every day to understand your resource utilization. The accounting process is a powerful tool for following your machine's activity.
This section helps you diagnose problems by suggesting the useful questions to ask yourself, and gives you an idea of some other things to check.
Here is the information you should gather when you have an accounting problem. The questions to ask yourself are summarized as follows. You can send this information to the IBM Support Center to help determine the problem.
acct_command < In_file > Out_file
When accounting fails, you have to check the following points: