D. CP CHECKS:

 

D1. "snap_cmp"

You remember best what you learned the hard way. Many years ago I had to reboot a CP while the plant was online. I did not expect any problems; however, the customer had changed the INITMA of a few PIDs to 0, so those controllers came up in Manual after the reboot. After a couple of minutes I started hearing the alarms. They did not allow me to reboot any other CP for a few days!

After that experience I learned to check the critical block parameters before any CP reboot. I remember writing scripts to extract all those parameters: INITMA, INITI, BCALCI, FBK, PRIBLK and, of course, FSENAB from the FBM ECBs.

Not long ago I had to do an online upgrade to install a Quick Fix. I was not sure what was going to happen after the CP was rebooted. To reduce the risk, and to be able to quickly correct any serious change in the status/output of the controllers, I wrote small scripts to get all current values before the reboot: a total of 4 pages (single column).

After the CP was rebooted I ran those scripts again and printed the results. Then I placed both printouts on a table and compared the two listings as quickly as possible. That time we found only three controllers that came up in a different status. Once the operators were informed, they corrected the situation promptly. The cause of the problem was that the OLUG (OnLine UpGrade) had been aborted the previous day, just before the CP reboot. The CP rebooted later with the previous day's checkpoint. Human errors do happen!

That day I wished I had an automatic comparison tool to use before a CP is rebooted.

With that purpose in mind, I later wrote the script snap_cmp, which checks the status (M/A, L/R) of all the PID controllers, and also the outputs of AOUT blocks.

The script snap_cmp will show on screen, and also save to a file (/opt/ac/malr.txt), the above-mentioned parameters from PID, PIDA, PIDE, PIDX, PIDXE, RATIO, AOUT and AOUTR blocks. Then it will compare them with the values found in a previous file (/opt/ac/malr.old) and save the differences to the file /opt/ac/changes.txt, which you can print at your convenience.

The procedure to use it is very simple. In addition to following the OLUG recommendations, run snap_cmp just BEFORE the CP is rebooted. Once the CP has rebooted and is back ONLINE, run snap_cmp a second time. This time snap_cmp will report on screen the blocks whose status or values have changed. Use that information to correct any possible upset caused by those blocks.

Recommendations:
- As usual, put the script snap_cmp in directory /opt/ac, for proper operation.
- It is normal to see an error the very first time you run the script. There is no old file to compare to.
- WITHOUT rebooting the CP, run snap_cmp a few times until you understand how it works and you feel comfortable with it.
- Even though the script will not run while the CP is rebooting, be sure to wait until the CP has finished rebooting. If you do not wait long enough you might end up with an empty file, which will then be used as the reference the next time you run the script.
- Do not run snap_cmp a third time. You will lose the values from before the CP reboot. To see the differences again, use: more /opt/ac/changes.txt
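
The rotate-and-compare cycle behind these recommendations can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not the actual snap_cmp: the names snap_compare, take_snapshot and malr.tmp are hypothetical, the step that reads the block parameters from the CP is replaced by a placeholder that copies standard input, and the directory is passed as an argument so the sketch can be tried outside /opt/ac. The file names malr.txt, malr.old and changes.txt are the ones given above.

```shell
#!/bin/sh
# Sketch only, NOT the real snap_cmp.
# take_snapshot is a hypothetical placeholder: the real script reads the
# M/A, L/R and output values from the CP here. This one copies stdin.
take_snapshot() {
    cat
}

snap_compare() {
    dir=$1
    new="$dir/malr.txt"
    old="$dir/malr.old"
    chg="$dir/changes.txt"
    tmp="$dir/malr.tmp"      # hypothetical scratch file

    take_snapshot > "$tmp"

    # An empty snapshot usually means the CP was not ready yet.
    # Refuse it so it can never become the reference for the next run.
    if [ ! -s "$tmp" ]; then
        echo "snapshot is empty, run again once the CP is ONLINE" >&2
        rm -f "$tmp"
        return 1
    fi

    # The previous snapshot becomes the reference, the new one takes its place.
    if [ -f "$new" ]; then
        mv "$new" "$old"
    fi
    mv "$tmp" "$new"

    # Very first run: nothing to compare with yet.
    if [ ! -f "$old" ]; then
        echo "no previous snapshot to compare with" >&2
        return 0
    fi

    # diff exits with 1 when the files differ; that is the expected case.
    diff "$old" "$new" > "$chg" || [ "$?" -eq 1 ]
    cat "$chg"
}
```

Note that a third call to this sketch would rotate the files once more and overwrite the pre-reboot reference, which is exactly the situation the last recommendation warns about.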

Based on this script you might want to write a similar one to grab the status/values of digital blocks: COUT, MCOUT, GDEV, VLV, MTR, MOVLV, PLB, and all the others that interface to digital FBM outputs.


D2. "chk_cp"

The purpose of "chk_cp" is to retrieve several important parameters from the STATION blocks of ALL the Control Processors in the system. After retrieval, it will RESET all overrun and PIO counters.

Parameters read are:

IDLETM = Idle time (%)
CPLOAD = Current CP load (%)
CUMOVR = Cumulative Compound/block processor overruns
OMOVRN = Cumulative Object Manager overruns counter
PIOE1R = Fieldbus retries
PIOEFT = Fault Tolerant Output mismatches
PIOEGB = Good-to-Bad FBM state changes
PP_NFD = Points NOT Found (deleted blocks, etc)

The script's output has a simple reminder of the meaning of the parameters.

This script should be run regularly by crontab (every 15 days, weekly, monthly, etc.), in a similar way to "chk_awp", from only ONE station in the system.
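
As an illustration, a crontab entry along the following lines would run the script on the 1st and 15th of each month at 06:00. It is a sketch under assumptions: the script is taken to live in /opt/ac like the others, and the output file name chk_cp.out is hypothetical.

```
# min hour day-of-month month day-of-week  command
0 6 1,15 * * /opt/ac/chk_cp > /opt/ac/chk_cp.out 2>&1
```

Because chk_cp resets the counters, each report then covers the period since the previous scheduled run.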

Output sample:
IDLETM = Idle Time;  CUMOVR = Cmpnd/Blk Overruns; OMOVRN = OM Overruns
PIOE1R = FB Retries; PIOEFT = Fault Tolerant Output mismatches
PIOEGB = Good->Bad FBM state changes; PP_NFD = Points NOT Found

                         CPs Report
-----------------------------------------------------------------
CPLBUG  IDLETM  CPLOAD    CUMOVR  OMOVRN   PIOE1R  PIOEFT PIOEGB PP_NFD

HLCP03    0.0    6.00         2       0       0       0       0  20
HLCP01   77.8   23.60         0       1       0       0       0   0
HLCP02   49.1   37.40        16      51       0       3       0   1
HLCP04   88.2   76.00        38       0       0      23     103   0
HLCP05   89.3   76.00        48       0       0      24       0   0
HLCP06   86.3   74.00        38       0       0      19      31   0
HLCP07   91.0   70.00         9       0       0       4       0   0
HLCP08   84.9   78.60        32       0       0      16     103   0
HLCP09   89.0   75.60        78       0       0      39       0   0
HLCP10   89.5   77.40        16       0       0       8       0   3
HLCP11   91.6   75.00        26       0       0      13     214   0
HLCP12   90.6   68.20        26       0       0      13       0   0
HLCP13   85.9   74.00        52       0       0      26       0   0
HLCP14   88.7   92.20        66       0       0      22       0   0
HLCP15   92.3   18.00         0       0       0       0       0   0
HLAB01   52.4   38.90         2      22       0       0       0   0
HLAB02   39.1   51.20         5      13       0       0       0   0
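
With many CPs it can help to filter a report like this one down to the lines that need a second look. The following is a minimal sketch, assuming the column order shown in the header above; the function name flag_cps is hypothetical, and the choice of OMOVRN and PP_NFD as the counters to flag is only an example (which counters matter, and at what level, is a site decision). It assumes an awk that follows POSIX (nawk on older Solaris).

```shell
#!/bin/sh
# Sketch only: list the CPs whose report line shows OM overruns or
# points not found. Columns, as in the report header:
# CPLBUG IDLETM CPLOAD CUMOVR OMOVRN PIOE1R PIOEFT PIOEGB PP_NFD
flag_cps() {
    # Only lines with 9 fields and a numeric second field are data rows;
    # the legend, title and header lines are skipped.
    awk 'NF == 9 && $2 ~ /^[0-9.]+$/ {
        if ($5 > 0 || $9 > 0)
            printf "%s OMOVRN=%s PP_NFD=%s\n", $1, $5, $9
    }'
}

# Usage (assumes the report is written to standard output):
#   g_cpstat | flag_cps
```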

D3. "g_cpstat"

"g_cpstat" is the same as the previous script (chk_cp), without the RESET part.

Run it any time you just want to take a look at the current status of the CP counters.

The script's output is sent to the screen. If you want to capture it, just redirect the output to a file:

g_cpstat > cp_stats

 


D4. "g_locked"

The script "g_locked" will simply list all the CPs that are currently 'locked' in the system. It might be because someone is editing the control configuration (via ICC), or maybe because the communication link was broken and the CP remained in that state.

I use this utility before running a script or utility that might require the CP to be unlocked: iccapi, cpoint, upload, etc.

Output sample:
Getting CPs list...Getting HOSTs lists...Done.

Locked CPs for host CCAW01:

Locked CPs for host D1AP01:
D1CP06

Locked CPs for host D1AW01:

Locked CPs for host D2AW01:

Locked CPs for host D2AW02:
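
To use this check from another script, the listing can be scanned for one particular CP. The sketch below assumes the output format shown in the sample above (a "Locked CPs for host ..." line followed by one letterbug per line); the function name is_locked is hypothetical, and a POSIX awk is assumed (nawk on older Solaris).

```shell
#!/bin/sh
# Sketch only: exit status 0 if the given CP letterbug appears in
# g_locked-style output read from standard input, 1 otherwise.
is_locked() {
    awk -v cp="$1" '
        /^Locked CPs for host/ { in_list = 1; next }   # start of a host list
        in_list && $1 == cp    { found = 1 }           # letterbug listed
        END                    { exit !found }
    '
}

# Usage:
#   if g_locked | is_locked D1CP06; then echo "D1CP06 is locked"; fi
```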

D5. "get_cio"

The script "get_cio" will help you under the following scenarios.

Scenario 1. It's Friday afternoon and you are almost ready to leave the main office and head home. At that moment you start wondering how many screens might have been left with the Integrated Control Configurator running in the background.
You start thinking about potential CSA corruption, etc.:
Should you go to each control room, check the screens and close them?

Well, if you are lucky enough to have (vt100) access to your system from your office, you can run this script and find out how many WPs are currently running ICC, and also how many CPs are locked in your system. With a simple call to the plant control room(s) you can ask them to close those programs. Now you can safely go home.

Scenario 2. You want to open a remote ICC session from your office (I DO NOT recommend it, except for brief peeks), or maybe just use iccapi to extract a database, when you get the message that the CP is locked. Who might be working on that CP? You might have to call each location to find out which WPs are running ICC. By running this script you will know which WP has ICC open. Just a simple call and the problem is solved.

Sometimes, after running get_cio, you will find that no WP has ICC open and yet one CP is locked. The reason might be a communication problem that left the CP in that condition. In that case you can safely remove the lock, because you know nobody else is accessing that CP.

The script was written (and tested) to be run on any 50/51 station (AP/AW/WP) in the system. (Need feedback on how it works under AP20/PW hosting CPs).

In summary, the script gets the list of all CP hosts and checks whether ICC is running on each one of them. If it is, it extracts the WP name from the ICC process.


D6. "getcpload"

The script "getcpload" is by Duc M. Do.
It uses OM commands to compile all of the loading parameters for the Control stations on your system. The user can supply the CP letterbug(s) on the command line, or use 'all' to get data for all stations listed in /etc/cplns.

I highly recommend paying a visit to Do's beautiful Home Page at: http://ducdo.iperweb.com


