Network Monitoring


    Networks and the systems that comprise them have become the backbone of almost every business.  As a company grows, its LAN and WAN networks grow more complex, and management issues become more numerous.  If a fault occurs on the network, it is more difficult to isolate and fix.

    To monitor key links and devices efficiently, I set up a number of TCP/IP-based programs on dedicated PCs.  These let me work with and monitor WAN and LAN equipment at all times, and they paged me if a critical link failed.

What can be done with this setup?

24 by 7 systems monitoring   
Performance management
Fault monitoring
Remote management
Network benchmarking / baseline assessment

Here is a simplified schematic view of the three computers I configured, with sample charts of real data obtained from the monitored equipment:

Network Monitoring Layout (schematic; sample charts: Screening Router Traffic, Denied Inbound Attempts, Three Day Traffic Snapshot)

 


SNMP?    RMON?    Syslog?    TFTP?    SNTP?    PING?

What can these protocols and servers do?


SNMP server -- Simple Network Management Protocol is a standard protocol that allows SNMP-enabled network devices to communicate management variables across a LAN or WAN to a monitoring workstation.

    The Simple Network Management Protocol defines a method by which a remote user can view or change management information for a network element (a host, gateway, server, router, device).  With SNMP, a monitoring or management application on the remote user's computer can communicate with an SNMP agent on the network device to access the management data.
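To make the manager-to-agent exchange more concrete, here is a minimal sketch that builds an SNMPv1 GetRequest message by hand using BER encoding. It assumes SNMPv1, the community string `public`, and a single requested object; the function names are hypothetical, and a real monitoring setup would use an SNMP library rather than hand-built packets. The resulting bytes would be sent by UDP to port 161 on the device's agent.

```python
# Sketch only: hypothetical helper names; real tools use an SNMP library.

def encode_length(n: int) -> bytes:
    """BER length: short form below 128, long form (0x80 | byte count) otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    """Wrap a value in a tag-length-value triple."""
    return bytes([tag]) + encode_length(len(value)) + value


def encode_integer(n: int) -> bytes:
    """BER INTEGER for the non-negative values used here (request id, status)."""
    length = max(1, (n.bit_length() + 8) // 8)  # leave room for the sign bit
    return tlv(0x02, n.to_bytes(length, "big", signed=True))


def encode_oid(dotted: str) -> bytes:
    """BER OBJECT IDENTIFIER: first two arcs packed as 40*X + Y, the rest base-128."""
    arcs = [int(part) for part in dotted.split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))  # high bit marks "more bytes follow"
            arc >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


def snmp_get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    """Build an SNMPv1 GetRequest message asking for a single object."""
    varbind = tlv(0x30, encode_oid(oid) + b"\x05\x00")  # value is NULL in a GET
    varbind_list = tlv(0x30, varbind)
    pdu = tlv(
        0xA0,  # context tag for GetRequest-PDU
        encode_integer(request_id)
        + encode_integer(0)  # error-status
        + encode_integer(0)  # error-index
        + varbind_list,
    )
    version = encode_integer(0)  # 0 means SNMPv1
    return tlv(0x30, version + tlv(0x04, community.encode()) + pdu)
```

For example, `snmp_get_request("public", "1.3.6.1.2.1.1.1.0")` produces a 40-byte request for sysDescr.0, which could be sent with `socket.sendto(packet, (host, 161))`.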

    The SNMP agent on each network device can provide information about the device's network configuration and operations, such as its network interfaces, routing tables, IP packets sent and received, and IP packets lost.  Examples of these (called “SNMP objects”; the object names are in parentheses) are:
-    System -- description of host (sysDescr), name of person responsible (sysContact), name of host (sysName)
-    Interfaces -- name of interface (ifDescr), its status, type, and physical address (ifOperStatus, ifType, ifPhysAddress)
-    IP routing -- destination (ipRouteDest), next hop of the route (ipRouteNextHop), and type of route (ipRouteType), for example, direct or indirect.
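Each object name above corresponds to a numeric object identifier in the standard MIB-II tree (RFC 1213). The sketch below lists those identifiers and builds the instance OID a manager would actually request: scalar objects such as sysName take the suffix `.0`, while table columns such as ifDescr are indexed by row (an interface index, or a destination address for the route table). The helper name `instance_oid` is hypothetical.

```python
# MIB-II object identifiers (RFC 1213) for the objects named above.
MIB2 = {
    # system group: scalars, instance suffix .0
    "sysDescr": "1.3.6.1.2.1.1.1",
    "sysContact": "1.3.6.1.2.1.1.4",
    "sysName": "1.3.6.1.2.1.1.5",
    # ifTable columns: indexed by ifIndex
    "ifDescr": "1.3.6.1.2.1.2.2.1.2",
    "ifType": "1.3.6.1.2.1.2.2.1.3",
    "ifPhysAddress": "1.3.6.1.2.1.2.2.1.6",
    "ifOperStatus": "1.3.6.1.2.1.2.2.1.8",
    # ipRouteTable columns: indexed by the route's destination address
    "ipRouteDest": "1.3.6.1.2.1.4.21.1.1",
    "ipRouteNextHop": "1.3.6.1.2.1.4.21.1.7",
    "ipRouteType": "1.3.6.1.2.1.4.21.1.8",
}

SCALARS = {"sysDescr", "sysContact", "sysName"}

# ipRouteType values defined by RFC 1213
IP_ROUTE_TYPE = {1: "other", 2: "invalid", 3: "direct", 4: "indirect"}


def instance_oid(name: str, index: str = "") -> str:
    """Return the OID of one instance of a MIB-II object (hypothetical helper)."""
    base = MIB2[name]
    if name in SCALARS:
        return base + ".0"
    if not index:
        raise ValueError(f"{name} is a table column and needs a row index")
    return f"{base}.{index}"
```

For instance, `instance_oid("ifOperStatus", "2")` gives `1.3.6.1.2.1.2.2.1.8.2`, the operational status of interface 2.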

    SNMP provides a standard way to view and change network management information on network elements from different vendors.

    Its purpose? To help network administrators maximize efficiency and productivity.  SNMP allows you to monitor your network, local or remote, for

network errors or failures
events at each port or user
potentially unstable network hardware
network traffic
collisions
data errors

Router CPU Workload:  Here is a sample plot of data I obtained from a set of seven WAN routers located in manufacturing plants in the US, Canada, and the UK.  The SNMP script reads the CPU workload variable from each of the seven routers every five minutes.

Router CPU loads    


This gives us insight into how the routers work together.  If one router is consistently overloaded, it may need upgrading.
One router in the UK consistently runs at or above 11%.  This is due to its secondary function of forwarding packets to another router in the UK.
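The five-minute polling and the "consistently overloaded" judgment can be sketched as follows. This is a hypothetical illustration, not the original script: `read_cpu` stands in for whatever SNMP GET returns a router's CPU load (the exact object is vendor-specific), and "consistently" is assumed to mean that at least 90% of the samples are at or above a chosen threshold.

```python
import time
from typing import Callable, Dict, List

POLL_INTERVAL_S = 300  # five minutes, as in the setup described above


def poll_round(routers: List[str], read_cpu: Callable[[str], float],
               history: Dict[str, List[float]]) -> None:
    """Read the CPU load (percent) once from every router and record it.

    read_cpu is a hypothetical stand-in for a vendor-specific SNMP GET.
    """
    for router in routers:
        history.setdefault(router, []).append(read_cpu(router))


def poll_forever(routers: List[str], read_cpu: Callable[[str], float],
                 history: Dict[str, List[float]]) -> None:
    """Poll all routers every POLL_INTERVAL_S seconds until interrupted."""
    while True:
        poll_round(routers, read_cpu, history)
        time.sleep(POLL_INTERVAL_S)


def consistently_at_or_above(samples: List[float], threshold: float,
                             fraction: float = 0.9) -> bool:
    """True if at least `fraction` of the samples are >= threshold (assumed rule)."""
    if not samples:
        return False
    hits = sum(1 for s in samples if s >= threshold)
    return hits >= fraction * len(samples)
```

With a threshold of 11, a router whose samples run 11, 12, 11.5, 13 would be flagged, while one that spikes only occasionally would not.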

    There are thousands of standard variables that can be read, modified, or monitored on SNMP-compliant devices (including PCs).  In addition, many specific devices have proprietary variable tables of their own.

    In addition to the numerous variables that can be probed, some devices allow SNMP traps to be configured.  With the equipment setup above I monitored and logged all dropped T1 links (a very critical issue for the company) and all changes in the status of the Fibre Modules (for example, a failure of the Main Link that brings up the Standby Link).  A dropped T1 line triggered a page to me and an E-mail message with more detail.
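SNMPv1 traps carry a generic-trap number (RFC 1157), and linkDown (2) and linkUp (3) are the ones that report interface state changes such as a dropped T1. Below is a minimal sketch of how a received trap might be turned into notifications. It assumes the trap has already been decoded into an agent address, a generic-trap number, and an interface index; the set of critical interfaces, the addresses in the example, and the "page"/"email" action names are all hypothetical.

```python
from typing import List, Set, Tuple

# Generic trap numbers from RFC 1157
GENERIC_TRAPS = {
    0: "coldStart",
    1: "warmStart",
    2: "linkDown",
    3: "linkUp",
    4: "authenticationFailure",
    5: "egpNeighborLoss",
    6: "enterpriseSpecific",
}


def route_trap(agent: str, generic_trap: int, if_index: int,
               critical: Set[Tuple[str, int]]) -> List[str]:
    """Decide which notifications a decoded trap should trigger.

    Every trap is logged; a linkDown on an interface listed as critical
    (for example, a T1 serial port) also pages and e-mails the administrator.
    The `critical` set and the action names are hypothetical.
    """
    name = GENERIC_TRAPS.get(generic_trap, "unknown")
    actions = [f"log:{agent}:{name}:{if_index}"]
    if name == "linkDown" and (agent, if_index) in critical:
        actions += ["page", "email"]
    return actions
```

Keeping the logging unconditional and the paging selective mirrors the split described above: every status change is recorded, but only a critical link failure wakes someone up.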

(Yes, I did get paged on weekends and after hours...  but at least it let me get started on the problem before anyone else encountered it!  It also made me look like some kinda genius when a user called to complain about "not being able to get through", and I could tell them exactly where the problem was and what we had done so far.)



RMON server -- Remote Monitoring (RMON) is a standard for monitoring network traffic over time and locating traffic bottlenecks.

RMON identifies errors, alerts you to network problems, baselines networks, and acts like a remote network analyzer.  Like SNMP, on which it is built, Remote Monitoring is used to:

Diagnose problems
Identify system overload
Locate persistent errors
Pin-point faulty equipment
Identify bottlenecks
Improve network design
Plan expansion effectively

    While SNMP monitoring is based on polling individual device variables (in standardized MIB trees), RMON gathers and tracks a group of variables over time.  This makes it easy, among other things, to view Statistics (including errors), History, and Top N users, and to set Alarms.  An alarm could, for example, notify you when utilization on a segment rises above 60%.
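An RMON alarm (the alarm group of RFC 2819) compares a sampled value against a rising and a falling threshold. After a rising event fires, another one does not fire until the value has dropped back to the falling threshold, which keeps a value hovering near the limit from producing a flood of alerts. A simplified model of that logic is sketched below; it uses the 60% figure mentioned above as the rising threshold, an assumed 50% falling threshold, and starts out as if the entry's startup setting allowed only a rising alarm first.

```python
from typing import Iterable, List, Tuple


def rmon_alarm_events(samples: Iterable[float], rising: float,
                      falling: float) -> List[Tuple[int, str]]:
    """Return (sample index, "rising"/"falling") events with RMON-style hysteresis.

    Simplified model: a rising event fires when a sample reaches `rising`;
    after that, only a falling event (a sample at or below `falling`) can
    fire, which re-arms the rising event, and so on.
    """
    events = []
    armed_for = "rising"  # like alarmStartupAlarm = risingAlarm(1)
    for i, value in enumerate(samples):
        if armed_for == "rising" and value >= rising:
            events.append((i, "rising"))
            armed_for = "falling"
        elif armed_for == "falling" and value <= falling:
            events.append((i, "falling"))
            armed_for = "rising"
    return events
```

For utilization samples of 40, 55, 62, 65, 58, 48, 61 (percent), only three events result: rising at 62, falling at 48, and rising again at 61. The sample at 65 does not produce a second alert.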

    Below is a graph of traffic data from the screening router located on the outside of the corporate firewall.  It shows the bandwidth use variation over a three day period.  The connection is an ISDN line to the Internet Service Provider.  Traffic is typically "bursty" with a few moments of sustained activity.  The company does not need to upgrade its connection bandwidth yet.  If monitoring at a later time shows a significant increase in traffic, as compared to this plot, proactive action can be taken before it becomes a problem.

Screening Router I/O Rates
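Utilization graphs like this one are typically derived from interface octet counters (ifInOctets and ifOutOctets in MIB-II) sampled at a fixed interval. The change in the counter is converted to bits, then divided by the interval and the line speed. The sketch below shows that arithmetic, including the wrap of a 32-bit counter. The 128 kbit/s line speed used in the example is an assumption (an ISDN BRI with both B channels in use), and each direction of a full-duplex link is computed separately.

```python
COUNTER32_MAX = 2**32  # Counter32 values wrap back to 0 after 2**32 - 1


def octet_delta(previous: int, current: int) -> int:
    """Difference between two Counter32 readings, allowing for a single wrap."""
    return (current - previous) % COUNTER32_MAX


def utilization_percent(previous: int, current: int, interval_s: float,
                        if_speed_bps: int) -> float:
    """Percent of line capacity used in one direction over the interval."""
    bits = octet_delta(previous, current) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)
```

For example, 2,400,000 octets in a 300-second interval on an assumed 128 kbit/s line is `utilization_percent(0, 2_400_000, 300, 128_000)`, or 50%.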

    Here is a one-day snapshot of traffic in and out of the network Gateway router (depicted above).  Some interfaces are busy, but everything is well below congestion levels.  The WAN connections appear to be healthy, with room for growth.  (They are all T1 private circuits; if cost is an issue, their bandwidth could be reduced.)

Network Hub Router I/O Rates -- Over a 24 hour period
The plot shows a large file transfer, or possibly a video conference session, lasting over one hour at the end of the workday.

    Below are two plots of data obtained from the Gateway router, which serves as the hub of the WAN.  On the remote end of its serial interfaces are the divisional routers, all connected by full T1 private circuits.

Network Hub Router I/O Rates    Network Hub Router Average I/O rates   

Over a four-day interval, traffic fluctuates but does not yet approach the 50 percent mark.  These two graphs are just two examples of establishing a baseline for WAN links.  With these data sets as a reference, you can evaluate traffic volumes measured at later times.
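One simple way to use such a baseline later is to summarize the reference period (for example, by its mean and 95th-percentile utilization) and compare a later period against it. The sketch below does this with Python's `statistics` module. The choice of the 95th percentile and the 25% growth margin are illustrative assumptions, not values derived from the measurements above.

```python
import statistics
from typing import Dict, List


def summarize(samples: List[float]) -> Dict[str, float]:
    """Mean and 95th percentile of utilization samples (needs at least two)."""
    p95 = statistics.quantiles(samples, n=20)[-1]  # last of 19 cut points
    return {"mean": statistics.mean(samples), "p95": p95}


def grown_beyond_baseline(baseline: List[float], later: List[float],
                          margin: float = 0.25) -> bool:
    """True if the later mean or p95 exceeds the baseline's by more than `margin`.

    The 25% default margin is an illustrative assumption.
    """
    base, now = summarize(baseline), summarize(later)
    return any(now[key] > base[key] * (1 + margin) for key in ("mean", "p95"))
```

Comparing both the mean and a high percentile catches two different trends: a general rise in load, and busier peaks on a link whose average has barely moved.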

          


Syslog server -- to monitor serious security issues on the company PIX firewall, routers, and certain UNIX workstations.


    The Syslog server receives standard UDP syslog messages sent from routers, switches, or Unix hosts, and displays the details on the screen.  Which messages are acted on depends on user-selected severity levels, so you can highlight specific errors, severity conditions, or devices, and see when particular events occur (such as a link going down or a device rebooting).

Once a syslog message has been received, the server can perform a number of actions, such as: displaying the message on the screen, logging it to a file on disk, e-mailing it via SMTP to anyone you choose, raising an audible alarm with the system beep or a chosen WAV file, or running an external program, such as a paging notification system.
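A minimal sketch of such a receiver follows, assuming RFC 3164-style messages over UDP. The leading `<PRI>` field encodes facility * 8 + severity, where severity runs from 0 (Emergency) to 7 (Debug). Every message is appended to a log file, and messages at or above a chosen severity (numerically at or below it) are handed to an alert callback that stands in for the e-mail, sound, or paging actions. The `on_alert` hook, the log file name, and the function names are hypothetical. Binding to the standard port 514 normally requires administrator privileges.

```python
import re
import socketserver
from typing import Callable, Optional, Tuple

PRI_RE = re.compile(rb"^<(\d{1,3})>")


def parse_pri(message: bytes) -> Optional[Tuple[int, int]]:
    """Return (facility, severity) from a leading <PRI> field, or None if absent."""
    match = PRI_RE.match(message)
    if not match:
        return None
    pri = int(match.group(1))
    return pri // 8, pri % 8


class SyslogHandler(socketserver.BaseRequestHandler):
    """Handle one UDP datagram: log it, and alert if it is severe enough."""

    def handle(self) -> None:
        data, _sock = self.request  # for UDP servers, request is (data, socket)
        parsed = parse_pri(data)
        with open(self.server.log_path, "ab") as log:
            log.write(data.rstrip(b"\n") + b"\n")
        # Lower severity numbers are more serious (0 = Emergency).
        if parsed and parsed[1] <= self.server.max_severity:
            self.server.on_alert(self.client_address[0], parsed[1], data)


def make_server(host: str, port: int, log_path: str, max_severity: int,
                on_alert: Callable[[str, int, bytes], None]) -> socketserver.UDPServer:
    """Create a UDP syslog receiver; on_alert is a hypothetical notification hook."""
    server = socketserver.UDPServer((host, port), SyslogHandler)
    server.log_path = log_path
    server.max_severity = max_severity
    server.on_alert = on_alert
    return server
```

For example, `make_server("0.0.0.0", 514, "syslog.log", 3, notify).serve_forever()` would log everything and call `notify` for Error (3) and more serious messages.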

Each router to be managed must have SNMP and syslog configured so that the management software can communicate with it.



TFTP server -- to upgrade equipment firmware, and to back up router configurations and software.

    Trivial File Transfer Protocol is, in essence, a stripped-down version of FTP used to download software into internetworking devices.  It allows files to be transferred to or from a computer over a network.  For example, it allows you to copy the system image or configuration of a router or switch to the TFTP server, giving you a backup of its configuration or firmware.

    Conversely, TFTP can be used to download a new image or configuration information from the network TFTP server.  If you replace a switch and want to use the configuration file that you created for the original switch, you can restore that file instead of recreating it.
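TFTP (RFC 1350) runs over UDP, normally port 69, and has only five packet types. The sketch below builds the request and acknowledgement packets and uses them for a minimal read (download) in octet mode, which is roughly what a device does when it pulls an image or configuration from the TFTP server. It omits retransmission, options, and most error handling, and the function names are hypothetical.

```python
import socket
import struct

RRQ, WRQ, DATA, ACK, ERROR = 1, 2, 3, 4, 5  # TFTP opcodes (RFC 1350)
BLOCK_SIZE = 512


def request_packet(opcode: int, filename: str, mode: str = "octet") -> bytes:
    """RRQ or WRQ: 2-byte opcode, filename, NUL byte, transfer mode, NUL byte."""
    return struct.pack("!H", opcode) + filename.encode() + b"\0" + mode.encode() + b"\0"


def ack_packet(block: int) -> bytes:
    """ACK: 2-byte opcode followed by the 2-byte block number being acknowledged."""
    return struct.pack("!HH", ACK, block)


def tftp_get(server: str, filename: str, port: int = 69, timeout: float = 5.0) -> bytes:
    """Download `filename` in octet mode (sketch: no retransmission).

    The server answers from a new port, so ACKs go to whatever address sent
    the DATA packet. A DATA block shorter than 512 bytes ends the transfer.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request_packet(RRQ, filename), (server, port))
        received = bytearray()
        expected = 1
        while True:
            packet, peer = sock.recvfrom(4 + BLOCK_SIZE)
            opcode, block = struct.unpack("!HH", packet[:4])
            if opcode == ERROR:  # second field is the error code; message follows
                raise OSError(packet[4:].split(b"\0")[0].decode(errors="replace"))
            if opcode != DATA:
                continue
            sock.sendto(ack_packet(block), peer)  # (re-)acknowledge every DATA block
            if block != expected:
                continue  # duplicate of a block already stored
            data = packet[4:]
            received += data
            expected = (expected + 1) % 65536
            if len(data) < BLOCK_SIZE:
                return bytes(received)
```

A call such as `tftp_get("192.0.2.10", "router.cfg")` (hypothetical server address and file name) would return the file's contents as bytes.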

    This ability came in very handy during network emergencies: once when a remote divisional router died and was replaced with a new unit (under the maintenance agreement), and once when the corporate main Gateway router completely reset itself to the factory default settings!

    Each month I checked the vendors' web sites for new versions of network equipment software.  If a new version was available, I uploaded it (after hours!) to the relevant hubs or switches, all without leaving my desk.
(The key to this ability is to buy equipment that supports remote management and the relevant protocols.)



SNTP server -- Simple Network Time Protocol server, to provide highly accurate time to any machine on the company WAN.

    The server is both a time client and a time server: it acquires a time value from the Internet and sets the computer's clock, and it also provides time signals to your local network using four time protocols.  The program can set clocks to within ±50 milliseconds in most cases.
    First, the time server is installed on the Network Management machine shown above, which has tunnel access through the firewall.  It obtains an accurate time value from one of a list of Time Hosts (such as the U.S. Naval Observatory, NIST, or NASA) and synchronizes its own clock.  Then, other computers within the company WAN can access the Network Management machine to synchronize their own system clocks.
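The client half of this exchange is small enough to sketch. An SNTP request (RFC 4330) is a 48-byte NTP packet whose first byte sets version 4 and mode 3 (client). The server's reply carries 64-bit timestamps counted from 1 January 1900. From the four timestamps of one exchange (T1 request sent, T2 request received, T3 reply sent, T4 reply received), the client computes its clock offset and the round-trip delay. The function names are hypothetical, and this is a generic illustration of the protocol rather than a description of how any particular time server product works.

```python
import socket
import struct
import time
from typing import Tuple

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_unix(raw: bytes) -> float:
    """Convert a 64-bit NTP timestamp (seconds + 32-bit fraction) to Unix time."""
    seconds, fraction = struct.unpack("!II", raw)
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Clock offset and round-trip delay from one request/reply exchange (RFC 4330)."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def sntp_query(server: str, timeout: float = 5.0) -> Tuple[float, float]:
    """Ask one server for the time; returns (offset, delay) in seconds."""
    request = b"\x23" + 47 * b"\0"  # LI=0, VN=4, Mode=3 (client); other fields zero
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(1024)
        t4 = time.time()
    t2 = ntp_to_unix(reply[32:40])  # server's receive timestamp
    t3 = ntp_to_unix(reply[40:48])  # server's transmit timestamp
    return offset_and_delay(t1, t2, t3, t4)
```

A positive offset means the local clock is behind the server. Averaging the two one-way differences is what cancels out a symmetric network delay.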

A good, easy-to-use Time Server is AboutTime.  It can run on home PCs with modem connections too!
Never be out of sync with your neighborhood Atomic Clock again!



PING server -- Ping verifies connectivity to a particular host on your local network or on the Internet.

    Ping sends an ICMP "echo request" packet to a remote host and displays the results for each "echo reply."  This exchange is referred to as "pinging."  When ping terminates, it displays a brief summary of round-trip times and packet loss statistics.  Round-trip times indicate the time (in milliseconds) it takes for the packet to reach the remote host and for a response to arrive back.  This time varies depending on network load.
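At the packet level, the echo request is an ICMP message of type 8 (the reply is type 0). It carries an identifier, a sequence number, and a checksum computed as the ones' complement of the ones' complement sum of its 16-bit words (RFC 792, RFC 1071). The sketch below builds such a packet. Actually sending it requires a raw socket and usually administrator privileges, so in practice the system's ping command or a library does this. The identifier and payload values are arbitrary assumptions.

```python
import struct

ICMP_ECHO_REQUEST = 8


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\0"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


def echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """Build an ICMP echo request: type, code, checksum, identifier, sequence, data."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)  # computed with checksum field = 0
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence) + payload
```

A receiver verifies the packet by running the same checksum over the whole message, checksum field included; a result of 0 means the packet arrived intact.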

You can use the Ping tool to:

Ping a remote host to verify the network connection is up.
Ping a remote host and view the round-trip times to gauge the current latency of the network connection.
Ping hosts along a route to isolate a hardware or software problem.  First, ping 127.0.0.1 (or localhost) to verify that the local TCP/IP stack is up and running.  Then, ping hosts and gateways farther and farther away.

    In the implementation illustrated above I configured a program that pinged a list of IP addresses sequentially.  Every network device that could be configured with an IP address (hubs, switches, routers and CSU/DSUs), and all important corporate fileservers and ERP machines were included in this list.  

    If a ping failed (timed out), the program waited a few minutes and retried.  If the second attempt also failed, the program would:
    1)   send me an E-mail message with all the relevant details, and
    2)   page me with the IP address of the device.
From the IP address on the pager's display I knew which device or computer it was.  Most often the "problem" cleared up by the next pass of the ping loop, indicating network congestion or a busy fileserver.  If the pings continued to time out, I could take remedial action before anyone else ever realized there was a problem.
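The check-and-retry loop just described might be sketched like this. Everything here is hypothetical scaffolding rather than the original program: `ping` is any function that returns True when a host answers, `notify` stands in for the e-mail and pager messages, the three-minute default delay is an assumed value for "a few minutes", and `sleep` is passed in so the delay can be skipped in testing.

```python
import time
from typing import Callable, Iterable, List


def check_hosts(hosts: Iterable[str], ping: Callable[[str], bool],
                notify: Callable[[str], None], retry_delay_s: float = 180,
                sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """One pass over the host list; returns the hosts that failed twice.

    A host that does not answer is retried once after `retry_delay_s`
    seconds (an assumed default); only a second failure calls `notify`,
    a hypothetical hook for the e-mail and page.
    """
    down = []
    for host in hosts:
        if ping(host):
            continue
        sleep(retry_delay_s)
        if not ping(host):
            notify(host)  # e.g. send the e-mail and page with the IP address
            down.append(host)
    return down
```

On Linux or macOS, a simple `ping` function could wrap the system command, for example `subprocess.run(["ping", "-c", "1", host]).returncode == 0` (Windows uses `-n` instead of `-c`). Retrying before alerting is what filters out the transient timeouts caused by congestion or a busy fileserver.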


Questions or comments?  Send mail to  paul@PaulAcacia.com  or to  pacacia@mail.com.

Copyright©2000 P. Acacia Consulting

Last modified: Sunday, April 30, 2000

 
