D. CP CHECKS:
D1. "snap_cmp"
You remember best what you learn the hard way. Many years ago I had to reboot a CP while the plant was online. I did not expect any problems; however, the customer had changed the INITMA of a few PIDs to 0, and those controllers came up in Manual after the reboot. After a couple of minutes I started hearing the alarms. They did not allow me to reboot any other CP for a few days!
After that experience I learned to check the critical block parameters before any CP reboot. I remember writing scripts to extract all those parameters: INITMA, INITI, BCALCI, FBK, PRIBLK, and of course FSENAB from the FBM ECBs.
Not long ago I had to do an online upgrade to install a Quick Fix. I was not sure what was going to happen after the CP was rebooted. To reduce the risk, and to be able to quickly correct any serious change in the status/output of the controllers, I wrote small scripts to get all current values before the reboot: a total of 4 pages (single column).
After the CP was rebooted I ran those scripts again and printed the results. Then I placed both printouts on a table and compared the two listings as quickly as possible. That time we found only three controllers that came up in a different status. After the operators were informed, they corrected the situation promptly. The cause of the problem was that the OLUG (OnLine UpGrade) had been aborted the previous day just before the CP reboot. The CP rebooted later with the previous day's checkpoint. Human errors do happen!
That day I wished I had an automatic comparison tool to use whenever a CP is rebooted.
With that purpose in mind I later wrote the script snap_cmp, which checks the status (M/A, L/R) of all the PID controllers, and also the outputs of AOUT blocks.
The script snap_cmp will show on screen, and also save to a file (/opt/ac/malr.txt), the above mentioned parameters from PID, PIDA, PIDE, PIDX, PIDXE, RATIO, AOUT and AOUTR blocks. Then it will compare them to the values found in a previous file (/opt/ac/malr.old), and save the differences to the file /opt/ac/changes.txt, which you can print at your convenience.
The procedure to use it is very simple. In addition to following the OLUG recommendations, run snap_cmp just BEFORE the CP is rebooted. Once the CP has rebooted and is back ONLINE, run snap_cmp a second time. This time snap_cmp will report on screen the blocks whose status or values have changed. Use that information to correct any possible upset caused by those blocks.
Recommendations:
- As usual, put the script snap_cmp in directory /opt/ac, for proper operation.
- It is normal to see an error the very first time you run the script. There is no old file to compare to.
- WITHOUT rebooting the CP, run snap_cmp a few times until you understand how it works and you feel comfortable with it.
- Even though the script will not run while the CP is rebooting, be sure to wait until the CP has finished rebooting. If you do not wait long enough you might end up with an empty file, which in turn will be used as the reference the next time you run the script.
- Do not run snap_cmp a third time. You will lose the values from before the CP reboot. To see the differences again, use: more /opt/ac/changes.txt
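The compare step described above can be sketched as follows. This is only a minimal illustration, not the actual snap_cmp code: the function name compare_snapshots, its return codes, and the one-value-per-line snapshot layout are my assumptions. Collecting the block values from the CP is not shown.

```shell
#!/bin/sh
# Hypothetical sketch of the compare step only (not the real snap_cmp).
# Usage: compare_snapshots NEW OLD CHANGES
# All three paths are supplied by the caller, e.g. malr.txt, malr.old
# and changes.txt under /opt/ac.
compare_snapshots() {
    new=$1
    old=$2
    changes=$3

    # An empty snapshot usually means the CP was not back online yet.
    # Refuse to use it, so it cannot become the next reference file.
    if [ ! -s "$new" ]; then
        echo "snapshot $new is empty, try again once the CP is online" >&2
        return 2
    fi

    # First run: there is no old file to compare to.
    if [ ! -f "$old" ]; then
        echo "no previous snapshot $old, nothing to compare" >&2
        return 3
    fi

    # diff exits with 0 when both files are identical.
    # Any other exit status is reported here as "changes found".
    if diff "$old" "$new" > "$changes"; then
        echo "no changes"
        return 0
    fi
    echo "changes found, see $changes"
    return 1
}
```

A call such as compare_snapshots /opt/ac/malr.txt /opt/ac/malr.old /opt/ac/changes.txt would leave the diff output in changes.txt. The sketch deliberately does not rotate the files, so running it again does not overwrite the reference.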
Based on this script you might want to write a similar one to grab the status/values of digital blocks: COUT, MCOUT, GDEV, VLV, MTR, MOVLV, PLB, and any others that interface to digital FBM outputs.
D2. "chk_cp"
The purpose of "chk_cp" is just to retrieve several important parameters from ALL the Control Processor's STATION blocks in the system. After retrieval, it will RESET all overrun and PIO counters.
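Parameters read are the ones shown as column headers in the output sample below: IDLETM, CPLOAD, CUMOVR, OMOVRN, PIOE1R, PIOEFT, PIOEGB and PP_NFD.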
The script's output has a simple reminder of the meaning of the parameters.
This script should be run regularly by crontab (every 15 days, weekly, monthly, etc.), in a similar way to "chk_awp", from only ONE station in the system.
Output sample:

IDLETM = Idle Time; CUMOVR = Cmpnd/Blk Overruns; OMOVRN = OM Overruns
PIOE1R = FB Retries; PIOEFT = Fault Tolerant Output mismatches
PIOEGB = Good->Bad FBM state changes; PP_NFD = Points NOT Found

CPs Report
-----------------------------------------------------------------
CPLBUG  IDLETM  CPLOAD  CUMOVR  OMOVRN  PIOE1R  PIOEFT  PIOEGB  PP_NFD
HLCP03     0.0    6.00       2       0       0       0       0      20
HLCP01    77.8   23.60       0       1       0       0       0       0
HLCP02    49.1   37.40      16      51       0       3       0       1
HLCP04    88.2   76.00      38       0       0      23     103       0
HLCP05    89.3   76.00      48       0       0      24       0       0
HLCP06    86.3   74.00      38       0       0      19      31       0
HLCP07    91.0   70.00       9       0       0       4       0       0
HLCP08    84.9   78.60      32       0       0      16     103       0
HLCP09    89.0   75.60      78       0       0      39       0       0
HLCP10    89.5   77.40      16       0       0       8       0       3
HLCP11    91.6   75.00      26       0       0      13     214       0
HLCP12    90.6   68.20      26       0       0      13       0       0
HLCP13    85.9   74.00      52       0       0      26       0       0
HLCP14    88.7   92.20      66       0       0      22       0       0
HLCP15    92.3   18.00       0       0       0       0       0       0
HLAB01    52.4   38.90       2      22       0       0       0       0
HLAB02    39.1   51.20       5      13       0       0       0       0
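If the report is captured to a file, it is easy to pick out the stations that deserve a closer look. The following is a small sketch of my own, not part of chk_cp: the function name low_idle and the idea of filtering on a minimum IDLETM are assumptions, and the threshold is whatever you decide is reasonable for your plant. It assumes each data row has the nine whitespace-separated columns shown in the sample.

```shell
#!/bin/sh
# Hypothetical helper: print letterbug and IDLETM for every CP whose
# idle time is below the given minimum. Reads a report on stdin.
# Usage: low_idle MINIMUM < report_file
low_idle() {
    awk -v min="$1" '
        # Data rows have 9 fields and a numeric IDLETM in column 2.
        # The legend, title, dashes and header lines do not match.
        NF == 9 && $2 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 + 0 < min + 0 {
            print $1, $2
        }
    '
}
```

For example, low_idle 50 < report_file would list only the stations with less than 50 in the IDLETM column.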
D3. "g_cpstat"
"g_cpstat" is the same as the previous script (chk_cp), without the RESET part.
Run it any time you want to take a look at the current status of the CP counters.
The script's output is sent to the screen. If you want to capture to a file, just redirect the output to a file:
g_cpstat > cp_stats
D4. "g_locked"
The script "g_locked" will simply list all the CPs that are currently 'locked' in the system. It might be because someone is editing the control configuration (via ICC), or maybe because the communication link was broken and the CP remained in that state.
I use this utility before running any script/utility that might require the CP to be unlocked: iccapi, cpoint, upload, etc.
Output sample:

Getting CPs list...
Getting HOSTs lists...
Done.
Locked CPs for host CCAW01:
Locked CPs for host D1AP01: D1CP06
Locked CPs for host D1AW01:
Locked CPs for host D2AW01:
Locked CPs for host D2AW02:
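When the system has many hosts, most of the lines in that output are empty. A short filter can reduce it to host/CP pairs. This is only a sketch under my own assumptions: the function name locked_pairs is made up, and since I am not certain whether a locked CP is printed on the same line as its host or on the following line, the filter accepts both layouts.

```shell
#!/bin/sh
# Hypothetical helper: reduce g_locked style output (on stdin) to
# "HOST CP" pairs, one pair per line.
# Usage: g_locked | locked_pairs
locked_pairs() {
    awk '
        /^Locked CPs for host/ {
            host = $5
            sub(/:$/, "", host)          # drop the trailing colon
            # CPs listed on the same line as the host
            for (i = 6; i <= NF; i++) print host, $i
            next
        }
        # CPs listed on the lines that follow a host line.
        # Progress lines before the first host are ignored.
        host != "" && NF > 0 {
            for (i = 1; i <= NF; i++) print host, $i
        }
    '
}
```

With the sample above, the only pair printed would be the host D1AP01 with the CP D1CP06.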
D5. "get_cio"
The script "get_cio" will help you under the following scenarios.
Scenario 1. It's Friday afternoon and you are almost ready to leave the main office and head home. At that moment you start wondering how many screens might have been left with the Integrated Control Configurator running in the background.
You start thinking about potential CSA corruption, etc.
Should you go to each control room, check, and close them?
Well, if you are lucky enough to have (vt100) access to your system from your office, you can run this script and find out how many WPs are currently running ICC, and also how many CPs are locked in your system. Just a simple call to the plant control room(s) to ask them to close those programs, and now you can safely go home.
Scenario 2. From your office, you want to open a remote ICC session (I DO NOT recommend it, except for brief peeks), or maybe just use iccapi to extract a database, when you get the message that the CP is locked. Who might be working on that CP? You might have to call each location to find the WPs running ICC. By running this script you will know which WP has ICC open. Just a simple call and the problem is solved.
Sometimes, after running the script get_cio you will find that there are no WPs with ICC opened, however one CP is locked. The reason might be a communication problem that left that CP in that condition. This time however you can safely remove the lock because you know nobody else is accessing that CP.
The script was written (and tested) to be run on any 50/51 station (AP/AW/WP) in the system. (Need feedback on how it works under AP20/PW hosting CPs).
In summary what this script does is: get the list of all CP hosts and check if ICC is running on each one of them. If true, it will extract the WP name from the ICC process.
D6. "getcpload"
The script "getcpload" was written by Duc M. Do.
It uses OM commands to compile all of the loading parameters for the Control stations on your system. The user can supply the CP letterbug(s) on the command line, or use 'all' to get data for all stations listed in /etc/cplns.
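The "letterbugs or 'all'" convention is handy for your own scripts too. Here is a minimal sketch of that argument handling only. It is not taken from getcpload: the function name cp_list is made up, the letterbug file is passed in as the first argument, and I am assuming the file holds one letterbug as the first word of each line.

```shell
#!/bin/sh
# Hypothetical helper: expand the command line into a list of letterbugs.
# Usage: cp_list LETTERBUG_FILE all
#        cp_list LETTERBUG_FILE LBUG1 [LBUG2 ...]
cp_list() {
    cplns=$1
    shift
    if [ "${1:-}" = "all" ]; then
        # Assumed layout: letterbug is the first word of each line.
        # Blank lines are skipped.
        awk 'NF > 0 { print $1 }' "$cplns"
    else
        # Otherwise just echo back the letterbugs given by the user.
        for lb in "$@"; do
            echo "$lb"
        done
    fi
}
```

A caller could then loop over the result, for example: for cp in $(cp_list /etc/cplns all); do ...; done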
A visit to Do's beautiful Home Page is highly recommended: http://ducdo.iperweb.com