Networks and the systems that comprise them have become the backbone of almost every business. As a company grows, its LAN and WAN networks grow more complex, and management issues become more numerous. When a fault occurs on the network, it is more difficult to isolate and fix. To monitor key links and devices efficiently, I set up a number of TCP/IP-based programs on dedicated PCs. These let me work with and monitor WAN and LAN equipment at all times, and ultimately paged me if a critical link failed. What can be done with this setup?
Here is a simplified schematic view of the three computers I configured:
SNMP? RMON? Syslog? TFTP? SNTP? PING? What can these protocols and servers do?
The Simple Network Management Protocol defines a method by which a remote user can view or change management information for a network element (a host, gateway, server, router, or other device). With SNMP, a monitoring or management application on the remote user's computer can communicate with an SNMP agent on the network device to access the management data. The SNMP agent on each network device can provide information about the device's network configuration and operations, such as its network interfaces, routing tables, IP packets sent and received, and IP packets lost. Examples of these (called SNMP objects; the object identifiers are in parentheses) are:
Router CPU Workload: Here is a sample plot of data I obtained from a set of seven WAN routers located in manufacturing plants in the US, Canada, and the UK. The SNMP script reads the CPU workload variable from the SNMP table of each of the seven routers every five minutes.
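The five-minute polling cycle described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `read_cpu` is a hypothetical callable standing in for whatever SNMP library or script performs the actual read, and the function names are my own.

```python
import time
from typing import Callable, Dict, Iterable, Optional

POLL_INTERVAL_SECONDS = 5 * 60  # five minutes, as in the text


def poll_once(routers: Iterable[str],
              read_cpu: Callable[[str], float]) -> Dict[str, Optional[float]]:
    """Read the CPU workload from each router once.

    read_cpu is a hypothetical stand-in for the real SNMP read. A router
    that cannot be reached is recorded as None instead of stopping the poll.
    """
    sample: Dict[str, Optional[float]] = {}
    for router in routers:
        try:
            sample[router] = read_cpu(router)
        except OSError:  # includes timeouts
            sample[router] = None
    return sample


def poll_forever(routers, read_cpu, log, sleep=time.sleep):
    """Append one timestamped sample to log every five minutes."""
    while True:
        log.append((time.time(), poll_once(routers, read_cpu)))
        sleep(POLL_INTERVAL_SECONDS)
```

Recording an unreachable router as `None` keeps one dead link from hiding the readings of the other six.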
There are thousands of standard variables that can be read, modified, or monitored on SNMP-compliant devices (including PCs). There are also many proprietary variable tables for specific devices. In addition to the numerous variables that can be probed, some devices allow SNMP Traps to be configured. With the equipment setup above I monitored and logged all dropped T1 links (a very critical issue for the company) and all changes in the status of all Fibre Modules (i.e. something causing the Main Link to fail, which then invokes the Standby Link). A dropped T1 line resulted in my getting a page and an e-mail message with more detail. (Yes, I did get paged on weekends and after hours... but at least it let me get started on the problem before anyone else encountered it! It also made me look like some kinda genius when a user called to complain about "not being able to get through", and I could tell them exactly where the problem was and what we had done so far.)
RMON identifies errors, alerts you to network problems, baselines networks and acts like a remote network analyzer. Somewhat like SNMP, Remote Monitoring is used to:
While SNMP is based on polling device data variables (in standardized MIB trees), RMON gathers a group of variables and tracks them over time. This makes it easy, among other things, to view Statistics (including errors), History, and Top N users, and to set Alarms. The alarms could, for example, notify you when traffic on a segment rises above the 60% level.
Below is a graph of traffic data from the screening router located on the outside of the corporate firewall. It shows the variation in bandwidth use over a three-day period. The connection is an ISDN line to the Internet Service Provider. Traffic is typically "bursty" with a few moments of sustained activity. The company does not need to upgrade its connection bandwidth yet. If later monitoring shows a significant increase in traffic compared to this plot, proactive action can be taken before it becomes a problem.
Here is a one-day snapshot of traffic in and out of the network Gateway router (depicted above). Some interfaces are busy, but everything is well below congestion levels. The WAN connections appear to be healthy with room for growth. (They are all T1 private circuits; if money is an issue, the bandwidths could be reduced.)
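The 60% alarm mentioned above can be illustrated with a small Python sketch. It assumes utilization is derived from two readings of a 32-bit octet counter, and it uses a falling threshold of 50% so the alarm does not fire repeatedly while traffic hovers near the limit. The 50% value and all names here are my own assumptions, not settings taken from the equipment.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit counters wrap around at this value


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_seconds: float, link_bps: float) -> float:
    """Percent of link capacity used between two octet-counter readings."""
    # The modulo keeps the delta correct across a single counter wrap.
    delta = (curr_octets - prev_octets) % COUNTER32_MODULUS
    return delta * 8 / (interval_seconds * link_bps) * 100.0


class RisingAlarm:
    """Fire once when a value reaches the rising threshold, then stay quiet
    until it has dropped back to the falling threshold."""

    def __init__(self, rising: float = 60.0, falling: float = 50.0):
        self.rising = rising
        self.falling = falling    # assumed value, for illustration only
        self.armed = True

    def update(self, value: float) -> bool:
        """Return True only on the sample that triggers the alarm."""
        if self.armed and value >= self.rising:
            self.armed = False
            return True
        if not self.armed and value <= self.falling:
            self.armed = True     # re-arm once traffic has calmed down
        return False
```

For a T1 circuit (1,544,000 bits per second) sampled every 300 seconds, 28,950,000 octets in one interval works out to 50% utilization.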
Below are two plots of data obtained from the network Gateway router. On the remote end of its serial interfaces are the divisional routers, all connected by full T1 private circuits. Over a four-day interval we can see that traffic fluctuates but does not approach the 50 percent mark yet. The two graphs provide just two examples of determining a baseline for WAN links. With these sets of data as a reference, you can compare traffic volumes obtained at later times.
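Comparing later traffic against a baseline can be as simple as the following Python sketch. It assumes the samples are utilization percentages for one link. The 1.5 growth factor is an arbitrary illustrative choice, not a recommendation.

```python
from statistics import mean


def summarize(samples):
    """Return (average, peak) utilization for a list of percentages."""
    return mean(samples), max(samples)


def exceeds_baseline(baseline, later, factor=1.5):
    """True if the later average is more than `factor` times the baseline
    average. The factor is an assumed, illustrative value."""
    base_avg, _ = summarize(baseline)
    later_avg, _ = summarize(later)
    return later_avg > factor * base_avg
```

Averages hide bursts, so the peak is returned as well. Either figure could be compared, depending on what matters for the link.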
The Syslog server is used to monitor serious security issues on the company PIX firewall, routers, and certain UNIX workstations. See a sample of messages received from a PIX firewall. The Syslog server receives standard UDP syslog messages sent from routers, switches, or UNIX hosts, and displays the details on the screen. The network-level messages generated are based on user-selected error levels that highlight specific errors, severity conditions, or specific devices, and help identify when specific events occur (such as a link down or a device reboot). Once you have received a syslog message you can perform a number of actions, such as: displaying the message on the screen, logging the message to a file on disk, e-mailing the message via SMTP to anyone you choose, raising an audible alarm with the system beep or a chosen WAV file, or running an external program, such as a paging notification system. Each router to be managed must have SNMP and syslog configured for the management software to communicate with it.
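To make the mechanics concrete, here is a minimal Python sketch of a UDP syslog receiver that filters on severity. It is not the program I used. The function names and the default severity cut-off are assumptions for illustration. The priority arithmetic (facility = PRI div 8, severity = PRI mod 8) follows the standard syslog convention.

```python
import re
import socket
from typing import Optional, Tuple

SEVERITIES = ("emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug")

_PRI = re.compile(r"<(\d{1,3})>", re.ASCII)


def parse_priority(message: str) -> Optional[Tuple[int, int]]:
    """Return (facility, severity) from a leading '<PRI>' field, or None."""
    match = _PRI.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    return pri // 8, pri % 8


def serve(host: str = "0.0.0.0", port: int = 514, max_severity: int = 4) -> None:
    """Print every received message at or above the chosen severity.

    Lower numbers are more severe. Binding to port 514 usually requires
    administrator rights.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, (sender, _port) = sock.recvfrom(4096)
            text = data.decode("utf-8", errors="replace")
            parsed = parse_priority(text)
            if parsed is not None and parsed[1] <= max_severity:
                print(f"{sender} [{SEVERITIES[parsed[1]]}] {text}")
```

The `print` call is where the other actions (logging to disk, e-mail, running a paging program) would be hooked in.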
Trivial File Transfer Protocol is, in essence, a stripped-down version of FTP used to download software into internetworking devices. It allows files to be transferred to or from a computer over a network. For example, it allows you to copy the system image of a router or switch to the TFTP server, which gives you a backup of the configuration or firmware. TFTP can also be used to download a new image or configuration information from the network TFTP server. If you replace a switch and want to use the configuration file that you created for the original switch, you can restore that file instead of recreating it. This ability has come in very handy during network emergencies in the past: once when a remote divisional router died and was replaced with a new unit (according to the maintenance agreement), and once when the corporate main Gateway router completely reset itself to the factory default settings! On a monthly basis I could look for new versions of network equipment software on the vendors' web sites. If a new version was available, I loaded it (after hours!) onto the relevant hubs or switches... all without leaving my desk.
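How "stripped-down" TFTP is can be seen from the packet that opens a transfer. The Python sketch below builds a read or write request in the RFC 1350 layout. The function name is my own, and any filename passed to it is hypothetical.

```python
import struct

OPCODE_RRQ = 1  # read request: fetch a file from the server
OPCODE_WRQ = 2  # write request: send a file to the server


def build_request(opcode: int, filename: str, mode: str = "octet") -> bytes:
    """Build the first packet of a TFTP transfer.

    Layout: 2-byte opcode, filename, zero byte, transfer mode, zero byte.
    """
    if opcode not in (OPCODE_RRQ, OPCODE_WRQ):
        raise ValueError("opcode must be RRQ (1) or WRQ (2)")
    return (struct.pack("!H", opcode)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")
```

There is no login and no directory listing. The request names a file and a mode, and the data blocks follow over UDP. That simplicity is why TFTP fits in router and switch firmware.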
The time server is both a time client and a time server: it acquires a time value from the Internet, sets the computer's clock, and provides time signals to your local network using four time protocols. The program uses advanced techniques and protocols to permit clock-setting accuracies of ±50 milliseconds in most cases. A good, easy-to-use time server is AboutTime. This can run on home PCs with modem connections too!
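For SNTP, the client side of this exchange is small enough to sketch in Python. This is an illustration only, not how AboutTime works internally. The function names are mine, and the `server` argument is whatever time server you choose.

```python
import socket
import struct

NTP_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request() -> bytes:
    """48-byte client request: leap indicator 0, version 3, mode 3 (client)."""
    return bytes([0x1B]) + bytes(47)


def transmit_time_unix(response: bytes) -> float:
    """Extract the server's transmit timestamp as Unix time in seconds."""
    if len(response) < 48:
        raise ValueError("SNTP response must be at least 48 bytes")
    # Bytes 40..47 hold whole seconds since 1900 and a 32-bit fraction.
    seconds, fraction = struct.unpack("!II", response[40:48])
    return seconds - NTP_UNIX_OFFSET + fraction / 2 ** 32


def query(server: str, timeout: float = 2.0) -> float:
    """Ask a time server (UDP port 123) for the current time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        response, _addr = sock.recvfrom(512)
    return transmit_time_unix(response)
```

This sketch ignores the network round-trip delay, which a real client measures and corrects for in order to reach millisecond-level accuracy.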
Ping sends an "echo request" in the form of a data packet to a remote host and displays the results for each "echo reply". This exchange is referred to as pinging. When ping terminates, it displays a brief summary of round-trip times and packet loss statistics. Round-trip times indicate the time (in milliseconds) it takes for the packet to get to the remote host and for a response to arrive back. This time varies depending on network load.
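The closing summary can be reproduced from the individual results. In this Python sketch each entry is a round-trip time in milliseconds, and `None` marks an echo request that received no reply. The function name and dictionary keys are my own.

```python
from typing import List, Optional


def summarize_pings(rtts_ms: List[Optional[float]]) -> dict:
    """Summarize one ping run; None marks an echo request with no reply."""
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    sent = len(rtts_ms)
    loss = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    summary = {"sent": sent, "received": len(replies), "loss_percent": loss}
    if replies:
        summary.update(min_ms=min(replies),
                       avg_ms=sum(replies) / len(replies),
                       max_ms=max(replies))
    return summary
```

Four requests with one lost reply, for example, give a loss of 25 percent, and the minimum, average, and maximum are taken over the three replies that did arrive.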
In the implementation illustrated above I configured a program that pinged a list of IP addresses sequentially. Every network device that could be configured with an IP address (hubs, switches, routers, and CSU/DSUs) and all important corporate fileservers and ERP machines were included in this list. If a ping failed (timed out), the program waited a few minutes and retried. If the ping attempt failed a second time the program would:
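The ping-and-retry logic described above can be sketched in Python as follows. The `ping` and `on_failure` callables are hypothetical placeholders: the first stands in for a real ICMP ping, the second for whatever the program does after the second failure. The three-minute delay is my assumed reading of "a few minutes".

```python
import time
from typing import Callable, Iterable, List


def sweep(hosts: Iterable[str],
          ping: Callable[[str], bool],
          on_failure: Callable[[str], None],
          retry_delay_seconds: float = 180.0,
          sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Ping each host in order and retry once after a delay.

    Hosts that fail both attempts are passed to on_failure and returned.
    """
    failed = []
    for host in hosts:
        if ping(host):
            continue
        sleep(retry_delay_seconds)   # wait before the second attempt
        if not ping(host):
            failed.append(host)
            on_failure(host)
    return failed
```

The retry matters because a single lost echo reply is common on a busy link, while two failures a few minutes apart are much more likely to mean a real outage.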
Questions or comments? Send mail to paul@PaulAcacia.com or to pacacia@mail.com.
Copyright ©2000 P. Acacia Consulting. Last modified: Sunday, April 30, 2000