A number of questions are answered in the troubleshooting section of the
online FAQ. A good process for tracking down problems is to stop OpenNMS,
delete all the log files in /var/log/opennms/ and restart OpenNMS.
A command like watch -d "ls -al" makes it possible to monitor
exactly which log files are changing while OpenNMS is restarting.
Running grep ERROR * in that directory finds the log files that contain
ERROR. Tarus describes the process as:
Watch the output of the "watch" command. The log files should steadily grow. First eventd.log, then capsd and collectd (usually the largest), followed by poller and finally threshd. After threshd.log has some content, you should see rtc.log and then rtcdata.log populate. When rtcdata.log has data, "Calculating" should be gone.
If it stops before then, do this in the logs directory:
grep FATAL *
grep ERROR *
and look for anything suspicious.
-T
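The two grep commands from the quote can be wrapped in a small helper. This is only a sketch: the function name scan_opennms_logs is made up for illustration, and the default directory is the /var/log/opennms/ path mentioned above.

```shell
# Hypothetical helper (not part of OpenNMS): print the names of the log
# files that contain FATAL or ERROR, one section per level.
scan_opennms_logs() {
    log_dir="${1:-/var/log/opennms}"
    for level in FATAL ERROR; do
        echo "== $level =="
        # -l prints only the names of matching files
        grep -l "$level" "$log_dir"/* 2>/dev/null
    done
    return 0
}
```

Running scan_opennms_logs with no argument after the restart lists the files worth opening first; a different directory can be passed as the first argument.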
java.sql.SQLException: Sorry, too many clients already
".../[directory]" does not exist!
[chmod] /bin/chmod: too few arguments
Discovery and ICMP Service Monitor won't start...connection error.
An error occurred initializing the database connections: No suitable driver found
Category Not Found "Router" when starting the Web UI
I'm Installing the RPMs But It Still Can't Find DBI/DBD::Pg...
RTC Session Does Not exist
Web UI won't authenticate me even though I'm in the users.xml
I see a frightening number of Java processes/memory allocated to Java with ps or top
SNMP Data Not Collected on Linux machines
only packages with major numbers <= 3 are supported by this version of RPM
An error occurred initializing the event correlation manager: Connection refused.
I just upgraded to Red Hat 7.1, and java freezes, what gives?
ERROR: Java2 Virtual Machine Not Found.
error while loading shared libraries: libstdc++-libc6.1-1.so.2
assets table problem during install
ONC/RPC program not registered
PostgreSQL doesn't want to start/won't start automatically
Every thing's installed, but I get: HTTP Status 500 - No Context configured to process this request
jar_cacheXXXX.tmp files are filling up my /tmp
build.sh: line 189: 5672 General protect error $JAVA_HOME/bin/java ...
I get "can't parse argument 'RRA:AVERAGE:0.5:1:8928'"
What are the Steps for a Minimal OpenNMS installation?
I installed OpenNMS, and admin/admin Does Not Log Me On
Tomcat won't start, complains about JAVA_HOME
FATAL 1: IDENT authentication failed for user "postgres"
apt complains about zebra and gated in the lynx installer
OpenNMS Says My DNS Server is Down, When It Is Up
Why are some of my XML files all one line?
Why Don't My Linux Servers with the UCD SNMP Agent Show Up in Performance Reports?
opennms.sh status returns nothing, what's happening?!
Linux - OpenNMS stops working after about 1 hour or intermittent servlet crashes are seen in the Web GUI.
RPM install hangs on RedHat 8.0
How Can I Best Test My XML Files?
Why Do I Get an Invalid ifIndex Error?
Getting around IDENT auth error during installation
How are node labels determined?
Logout/Re-login
I upgraded to 1.1.1 and now "Manage/Unmanage" does not work
Internal Server Errors
Why doesn't the dhcpd process ever start?
Why Are Availability Reports Never Generated?
OpenNMS.Rtcd problem
Why Doesn't the DHCP Service Start?
I can snmpwalk a device, but OpenNMS won't collect data on it, why?
New Installation: I can't login
Why Does My Windows DHCP Server Show as Down?
Why do KSC reports give me a "null parameters" error?
It could also be this:
24hr avail did not get updated
A device may have hundreds or thousands of interfaces due to VoIP dial peers. On Cisco devices (AS5300s) the SNMP process will time out if the interface table is too long. SNMP views can be used to limit which interfaces are exposed to OpenNMS (ONMS). With a view in place, ONMS can gather a complete interface table, restricted to the main interfaces. Other devices with hundreds of sub-interfaces may cause a similar problem.
If the GUI contains strange characters in menu items or response time graphs there is a chance of a corrupt Tomcat (Java Engine) cache. To clear the cache and restart the GUI perform the following steps:
bash# /etc/init.d/tomcat4 stop
bash# rm -rf /var/cache/tomcat4/*
bash# /etc/init.d/tomcat4 start
Error Trying to Rescan a Node from WebGUI
java.io.FileNotFoundException: /usr/share/OpenNMS/etc/users.xml
(Permission denied)
If there is a message related to (Permission denied) then Tomcat
is probably running under the tomcat4 user. This error is telling
you that the tomcat4 user cannot access the specified file (users.xml
above) and you must manually change the permissions to resolve this
problem.
bash# chown tomcat4 /usr/share/opennms/etc/users.xml
Running Tomcat as root will resolve the problem too.
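When several files are affected, it can help to list every file that the Tomcat user does not own before running chown. The following is a minimal sketch; the function name files_not_owned_by is hypothetical, and ownership is only an approximation of "can access", since group and world permissions are not checked.

```shell
# Hypothetical helper: list regular files under a directory that are NOT
# owned by the given user (a user name or a numeric uid).
files_not_owned_by() {
    dir="$1"
    user="$2"
    find "$dir" -type f ! -user "$user"
}
```

For example, files_not_owned_by /usr/share/opennms/etc tomcat4 prints the candidates for the chown shown above; empty output means the tomcat4 user already owns everything in that directory.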
org.apache.jasper.JasperException: You must set a DbConnectionFactory
before requesting a database connection.
This error occurs when tomcat4 runs as the tomcat4 user rather than as root
while the OpenNMS files still have their default install permissions.
Try setting TOMCAT4_USER="root" in /etc/default/tomcat4.
The OpenNMS server is probably not running. Restart the opennms service.
If this message appears while trying to view maps, the Sun JDK
is probably built improperly. Install xlibs and rebuild the Sun JDK
as shown here.
The non-IP interfaces may be set to null in the issnmpprimary column of the ipinterface table. SQL to fix this is here.
If there is a DHCP client running on the OpenNMS server, it will interfere with the starting of the OpenNMS poller. Disable the DHCP client by assigning a static IP address and restart the OpenNMS server to resolve the situation.
Dhcpd or dhcpcd processes do not start on the OpenNMS server
The ssh poller will sometimes cause the maximum number of ssh
connections to be opened on a monitored server. As a result the server
no longer accepts connections on the ssh port. To resolve this problem
change the poller plugin from 'ssh' to 'tcp' in the capsd-configuration.xml.
An SSH protocol plugin configured to avoid this SSH DoS situation looks like
the following in capsd-configuration.xml (prior to v1.1.3 the class appeared
as SshPlugin rather than TcpPlugin):
<protocol-plugin protocol="SSH" class-name="org.opennms.netmgt.capsd.TcpPlugin" scan="on" user-defined="false">
    <property key="banner" value="SSH"/>
    <property key="port" value="22"/>
    <property key="timeout" value="3000"/>
    <property key="retry" value="3"/>
</protocol-plugin>
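A quick way to tell whether a given capsd-configuration.xml still refers to the old plugin is to search it for the SshPlugin class name. This is a sketch only; the function name uses_ssh_plugin is made up for illustration, and the location of the configuration file depends on the installation.

```shell
# Hypothetical helper: succeed (exit status 0) if the given configuration
# file still mentions SshPlugin, fail otherwise.
uses_ssh_plugin() {
    grep -q 'SshPlugin' "$1"
}
```

It can be used as: uses_ssh_plugin capsd-configuration.xml && echo "still using SshPlugin", run from the directory that holds the file.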
If the SNMP trap daemon is running, it will conflict with OpenNMS.
To disable it set SNMPDRUN=no in /etc/default/snmpd.
collectd.log
Hmmm.. not sure about this one yet.
This can be related to the number of open files, the Java heap size and/or corrupt RRD files, as stated here.
capsd.log
An SNMP view is restricting full SNMP access to a Cisco device. Also see the Cisco configuration notes.
A device has other private interfaces that are not reachable by ONMS (but automatically detected and added to ONMS)
A Windows IIS server does not have a default page configured
Qmail mail server, possibly restricted to only allow mail from specific sources other than ONMS
A device is powered down during an SNMP poll
The SshPlugin poller in OpenNMS 1.1.2 has some problems; it
should be configured as TcpPlugin in capsd-configuration.xml.
Details on this can be found here.
manager.log
This error is harmless and always occurs; it likely has to do with the order in which services are stopped.
notifd.log
pollers.log
One reason why this could be happening is posted here: RRD database 'update' failed. Another reason, stated in a July 1, 2003 post by Tarus, is as follows:
Okay, when OpenNMS writes an RRD for an interface, it uses the directory name "ifDescr+MAC". On some machines, namely Compaq servers, it is possible for two interfaces to have the same ifDescr and MAC address. So what happens is that OpenNMS grabs the data for ifIndex=2, writes it, grabs the data for ifIndex=3, and attempts to write it to the same .rrd file. Since RRD requires a minimum one second step, this second write fails. You know the problems with Layer 2 interfaces, so we really don't have a solution, except not to poll interfaces where this occurs.
-T
scriptd.log
This error will prevent anyone from logging in. If TOMCAT4_USER
is modified while the tomcat4 server is running, the daemon will
not shut down properly (the only indication of this is that ps aux
still shows the processes running). Stop OpenNMS and Tomcat4, ensure
there are no Java processes remaining, and restart the system. All will
be fine.
threshd.log
Invalid ifTable on some Fibre devices (IBM SanDataGateway, McData Sphereon 4500)
web.log
This occurs when Tomcat connects to the Realtime Console (RTC) before OpenNMS is fully running. It eventually connects fine, so this message can be ignored.
This occurs when the Sun JDK 1.4 package is built without the
xlibs library installed. To fix this problem, perform the following
steps, pressing enter to accept all defaults. The package will be
automatically reinstalled and maps will work properly.
bash~# apt-get install xlibs
bash~# build-sun-jdk14
bash~# /etc/init.d/tomcat4 restart
How to add new trap descriptions on OpenNMS with mib2opennms
When you run mib2opennms, it usually does not set the value of
"generic" to "6" and instead leaves it at "0".
You almost always have to change that. When you get an unformatted
trap event, it will list the enterprise id, the value for generic
and the value for specific. Those three need to match the event in
eventconf.xml for OpenNMS to not categorize your event as unformatted.
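The change from "0" to "6" can be scripted. The following is a rough sketch, assuming the generated events use the usual mask layout with <mename>generic</mename> on one line and its <mevalue> on a following line; the function name fix_generic is hypothetical, and the result should be reviewed before it is merged into eventconf.xml.

```shell
# Hypothetical helper: read generated event XML on stdin and change the
# mevalue that follows <mename>generic</mename> from 0 to 6.
# Other mevalue elements (id, specific) are left untouched.
fix_generic() {
    awk '
        /<mename>generic<\/mename>/ { in_generic = 1; print; next }
        in_generic && /<mevalue>/ {
            sub(/<mevalue>0<\/mevalue>/, "<mevalue>6</mevalue>")
            in_generic = 0
        }
        { print }
    '
}
```

It works as a filter, for example fix_generic < generated-events.xml > fixed-events.xml, where both file names are placeholders.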
http://lists.opennms.org/pipermail/discuss/2003-May/027914.html
http://lists.opennms.org/pipermail/install/2003-April/002553.html
http://lists.opennms.org/pipermail/discuss/2002-October/025140.html
Try restarting the tomcat4 server if there have been configuration changes made, or the opennms service has been restarted recently.
http://lists.opennms.org/pipermail/install/2003-June/002779.html
Also, there may be an SMB poller issue. See here for details.