
7. Troubleshooting

The troubleshooting section of the online FAQ answers a number of questions. A good way to isolate a problem is to stop OpenNMS, delete all the log files in /var/log/opennms/, and restart OpenNMS. A command like watch -d "ls -al" shows exactly which log files change while OpenNMS is restarting. Running grep ERROR * in the same directory then finds the log files that contain errors. Tarus describes the process as:

Watch the output of the "watch" command. The log files should steadily grow. First eventd.log, then capsd and collectd (usually the largest), followed by poller and finally threshd. After threshd.log has some content, you should see rtc.log and then rtcdata.log populate. When rtcdata.log has data, "Calculating" should be gone.

If it stops before then, do this in the logs directory:

    grep FATAL *
    grep ERROR *

and look for anything suspicious.

-T
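The two grep commands can be wrapped in a small helper that reports which log files contain FATAL or ERROR entries. This is only a sketch: the function name scan_logs is hypothetical, and it assumes the log files end in .log, as the names mentioned above do.

```shell
#!/bin/sh
# List the log files that contain FATAL or ERROR entries.
# Usage: scan_logs [LOG_DIR]   (defaults to /var/log/opennms)
# scan_logs is a hypothetical helper name, not part of OpenNMS.
scan_logs() {
    log_dir="${1:-/var/log/opennms}"
    for level in FATAL ERROR; do
        # -l prints only the names of matching files;
        # -s suppresses messages about missing or unreadable files
        grep -ls "$level" "$log_dir"/*.log | sed "s/^/$level: /"
    done
}
```

Calling scan_logs with no argument checks /var/log/opennms; each output line names a severity and a file that is worth opening first.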

7.1 Questions Answered in the Online FAQ

java.sql.SQLException: Sorry, too many clients already
".../[directory]" does not exist!
[chmod] /bin/chmod: too few arguments
Discovery and ICMP Service Monitor won't start...connection error.
An error occurred initializing the database connections: No suitable driver found
Category Not Found "Router" when starting the Web UI
I'm Installing the RPMs But It Still Can't Find DBI/DBD::Pg...
RTC Session Does Not exist
Web UI won't authenticate me even though I'm in the users.xml
I see a frightening number of Java processes/memory allocated to Java with ps or top
SNMP Data Not Collected on Linux machines
only packages with major numbers <= 3 are supported by this version of RPM
An error occurred initializing the event correlation manager: Connection refused.
I just upgraded to Red Hat 7.1, and java freezes, what gives?
ERROR: Java2 Virtual Machine Not Found.
error while loading shared libraries: libstdc++-libc6.1-1.so.2
assets table problem during install
ONC/RPC program not registered
PostgreSQL doesn't want to start/won't start automatically
Every thing's installed, but I get: HTTP Status 500 - No Context configured to process this request
jar_cacheXXXX.tmp files are filling up my /tmp
build.sh: line 189: 5672 General protect error $JAVA_HOME/bin/java ...
I get "can't parse argument 'RRA:AVERAGE:0.5:1:8928'"
What are the Steps for a Minimal OpenNMS installation?
I installed OpenNMS, and admin/admin Does Not Log Me On
Tomcat won't start, complains about JAVA_HOME
FATAL 1: IDENT authentication failed for user "postgres"
apt complains about zebra and gated in the lynx installer
OpenNMS Says My DNS Server is Down, When It Is Up
Why are some of my XML files all one line?
Why Don't My Linux Servers with the UCD SNMP Agent Show Up in Performance Reports?
opennms.sh status returns nothing, what's happening?!
Linux - OpenNMS stops working after about 1 hour or intermittent servlet crashes are seen in the Web GUI.
RPM install hangs on RedHat 8.0
How Can I Best Test My XML Files?
Why Do I Get an Invalid ifIndex Error?
Getting around IDENT auth error during installation
How are node labels determined?
Logout/Re-login
I upgraded to 1.1.1 and now "Manage/Unmanage" does not work
Internal Server Errors
Why doesn't the dhcpd process ever start?
Why Are Availability Reports Never Generated?
OpenNMS.Rtcd problem
Why Doesn't the DHCP Service Start?
I can snmpwalk a device, but OpenNMS won't collect data on it, why?
New Installation: I can't login
Why Does My Windows DHCP Server Show as Down?
Why do KSC reports give me a "null parameters" error?

7.2 OpenNMS Console Display Problems

Categories not updating properly

Bugzilla Bug 683: When SNMP is added to a device, categories do not update

It could also be this:

24hr avail did not get updated

OpenNMS 'List All Nodes' takes more than 5 minutes to display

A device may have hundreds or thousands of interfaces because of VoIP dial peers. On Cisco devices (AS5300s) the SNMP process will time out if the interface table is too long. SNMP views can be used to limit which interfaces are made available to OpenNMS. With that restriction in place, OpenNMS can gather a complete interface table that is limited to the main interfaces. Other devices with hundreds of sub-interfaces may cause a similar problem.

Strange characters are appearing, even after refresh

If the GUI shows strange characters in menu items or response time graphs, the Tomcat (Java engine) cache may be corrupt. To clear the cache and restart the GUI, perform the following steps:

  1. Stop Tomcat
    bash# /etc/init.d/tomcat4 stop

  2. Clear the cache
    bash# rm -rf /var/cache/tomcat4/*

  3. Restart Tomcat
    bash# /etc/init.d/tomcat4 start
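
The three steps above can be sketched as one shell function that guards the rm -rf. The function name and the TOMCAT_INIT and TOMCAT_CACHE variables are hypothetical; their defaults are the paths used in the steps above.

```shell
#!/bin/sh
# Stop Tomcat, clear its cache, and start it again.
# clear_tomcat_cache, TOMCAT_INIT and TOMCAT_CACHE are hypothetical names;
# the defaults are the paths from the manual steps.
clear_tomcat_cache() {
    init="${TOMCAT_INIT:-/etc/init.d/tomcat4}"
    cache="${TOMCAT_CACHE:-/var/cache/tomcat4}"
    # Refuse to continue if the cache directory is missing
    [ -d "$cache" ] || { echo "no cache directory: $cache" >&2; return 1; }
    "$init" stop || return 1
    # ${cache:?} aborts if the variable is empty, so this never becomes "rm -rf /*"
    rm -rf "${cache:?}"/*
    "$init" start
}
```

Run clear_tomcat_cache as root. If the stop step fails, the cache is left untouched.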
    
     
    

Tomcat HTTP Status 500 Errors

Error Trying to Rescan a Node from WebGUI

7.3 Problems running other daemons

DHCP Conflicts

If a DHCP client is running on the OpenNMS server, it will interfere with the startup of the OpenNMS poller. To resolve this, disable the DHCP client by assigning a static IP address, then restart the OpenNMS server.

Dhcpd or dhcpcd processes do not start on the OpenNMS server

SSH Connections Refused

The SSH poller will sometimes cause the maximum number of SSH connections to be opened on a monitored server. As a result, the server no longer accepts connections on the SSH port. To resolve this problem, change the poller plugin from 'ssh' to 'tcp' in capsd-configuration.xml. An SSH poller configured to avoid this SSH denial-of-service situation looks like the following in capsd-configuration.xml. Prior to v1.1.3 the class appeared as SshPlugin rather than TcpPlugin.

<protocol-plugin protocol="SSH" class-name="org.opennms.netmgt.capsd.TcpPlugin" scan="on" user-defined="false">
        <property key="banner" value="SSH"/>
        <property key="port" value="22"/>
        <property key="timeout" value="3000"/>
        <property key="retry" value="3"/>
</protocol-plugin>
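
To confirm which plugin class is currently configured for SSH, a short sed expression can pull the class-name attribute out of the file. This is a rough sketch, not an XML parser: ssh_plugin_class is a hypothetical helper, and it assumes the protocol and class-name attributes sit on one line in the order shown above. The path to capsd-configuration.xml is passed as the argument because it depends on the installation.

```shell
#!/bin/sh
# Print the class-name configured for the SSH protocol plugin.
# Usage: ssh_plugin_class /path/to/capsd-configuration.xml
# Assumes protocol="SSH" precedes class-name="..." on the same line.
ssh_plugin_class() {
    sed -n 's/.*<protocol-plugin[^>]*protocol="SSH"[^>]*class-name="\([^"]*\)".*/\1/p' "$1"
}
```

If the output ends in SshPlugin rather than TcpPlugin, the monitored servers are still exposed to the connection exhaustion described above.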

SNMP Trap Daemon Conflict

If the SNMP trap daemon is running, it will conflict with OpenNMS. To disable it, set SNMPDRUN=no in /etc/default/snmpd.

7.4 Log File Messages

Messages in collectd.log

Messages in capsd.log

Messages in manager.log

Messages in notifd.log

Messages in pollers.log

Errors in scriptd.log

Errors in threshd.log

Errors in web.log

7.5 Customization Problems

New trap definitions added to events.xml are categorized as unformatted.

How to add new trap descriptions on OpenNMS with mib2opennms

When you run mib2opennms, it usually does not set the value of "generic" to "6" and instead leaves it at "0". You almost always have to change that. When you get an unformatted trap event, it will list the enterprise id, the value for generic and the value for specific. Those three need to match the event in eventconf.xml for OpenNMS to not categorize your event as unformatted.
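
The generic value can be corrected in bulk with sed. The sketch below assumes the mib2opennms output uses the usual mask layout, with the mevalue element on the line directly after the mename element that names generic. The function name fix_generic is hypothetical. Generic 6 is the SNMPv1 code for enterprise-specific traps. Review the result before merging it, since the text above says the change is needed almost always, not always.

```shell
#!/bin/sh
# Rewrite generic 0 -> 6 in event definitions read from stdin.
# fix_generic is a hypothetical helper name.
# Assumes <mevalue> is on the line directly after <mename>generic</mename>.
fix_generic() {
    sed '/<mename>generic<\/mename>/{
n
s/<mevalue>0<\/mevalue>/<mevalue>6<\/mevalue>/
}'
}
```

For example, fix_generic < generated.xml > fixed.xml leaves every other mask element, including a specific value of 0, unchanged. Both file names are placeholders.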

Device names are being displayed improperly.

DNS Resolution and OpenNMS

7.6 Error in varbinds with Extreme and Cisco devices

http://lists.opennms.org/pipermail/discuss/2003-May/027914.html

7.7 Error detecting SNMP on some devices that support it

http://lists.opennms.org/pipermail/install/2003-April/002553.html

7.8 OpenNMS only partially resolving IPs

http://lists.opennms.org/pipermail/discuss/2002-October/025140.html

7.9 Error Trying to Rescan a Node from WebGUI

Try restarting the tomcat4 server if configuration changes have been made or the opennms service has been restarted recently.

http://lists.opennms.org/pipermail/install/2003-June/002779.html

Also, there may be an SMB poller issue. See here for details.

