WebSphere 184.108.40.206 hangs on application deployment / server startup
We are having a very elusive issue in WebSphere, related to application deployment.
We have a set of EARs we want to deploy in WebSphere 220.127.116.11 ND, on AIX.
When we deploy them manually, or using a script through wsadmin, they can all be installed and started successfully, except for one:
1. When deploying through a script, the script fails with a time-out error
2. When deploying through the console, the process hangs forever
In the SystemOut.log, after 10 minutes or so, the following message appears:
6/27/08 4:56:06:635 CDT 0000003b ThreadMonitor W WSVR0605W: Thread "WebContainer : 2" (0000003f) has been active for 692135 milliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.
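To keep track of which threads get reported and for how long, the WSVR0605W lines can be pulled out of SystemOut.log with a small script. This is only a sketch: the regular expression is modelled on the single message quoted above, and the function name is made up for illustration.

```python
import re

# Pattern modelled on the WSVR0605W line quoted above (an assumption:
# other WebSphere releases may word the message differently).
HUNG_RE = re.compile(
    r'WSVR0605W: Thread "(?P<name>[^"]+)" \((?P<tid>[0-9a-fA-F]+)\) '
    r'has been active for (?P<ms>\d+) milliseconds'
)


def find_hung_threads(lines):
    """Return (thread_name, thread_id, active_ms) for each WSVR0605W line."""
    hits = []
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            hits.append((match.group("name"), match.group("tid"), int(match.group("ms"))))
    return hits
```

Typical use would be `find_hung_threads(open("SystemOut.log", errors="replace"))`, adjusting the path to the server's log directory.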
We then kill the WebSphere process and restart it, but the server never finishes restarting: the process hangs as well, with the same kind of message as above, except that the thread name is "server.startup: N".
We have clearly identified that this happens when starting one of the applications, which we'll call "AppA.ear" for this post.
In both cases, we then took a javacore dump (using kill -3) and saw what the threads were doing when hanging, as follows:
3XMTHREADINFO "WebContainer : 2" (TID:0x7000000000D4608, sys_thread_t:0x11CF0AEC0, state:MW, native ID:0x3484) prio=5
4XESTACKTRACE at java.util.jar.Manifest.read(Manifest.java(Compiled Code))
4XESTACKTRACE at java.util.jar.Manifest.<init>(Manifest.java(Compiled Code))
4XESTACKTRACE at java.util.jar.JarFile.getManifest(JarFile.java(Compiled Code))
4XESTACKTRACE at com.ibm.etools.j2ee.commonarchivecore.util.ClasspathUtil.getManifestPaths(ClasspathUtil.java(Compile
4XESTACKTRACE at com.ibm.etools.j2ee.commonarchivecore.util.ClasspathUtil.processManifest(ClasspathUtil.java:28)
4XESTACKTRACE at com.ibm.etools.j2ee.commonarchivecore.util.ClasspathUtil.processManifest(ClasspathUtil.java:34)
4XESTACKTRACE at com.ibm.etools.j2ee.commonarchivecore.impl.ArchiveImpl.getDependencyClassPath(ArchiveImpl.java:1594)
4XESTACKTRACE at com.ibm.etools.j2ee.commonarchivecore.impl.ArchiveImpl.getDependencyClassPath(ArchiveImpl.java:1579)
4XESTACKTRACE at com.ibm.ws.classloader.ClassGraph.addModule(ClassGraph.java:168)
4XESTACKTRACE at com.ibm.ws.classloader.ClassLoaderManager.initialize(ClassLoaderManager.java:182)
4XESTACKTRACE at com.ibm.ws.classloader.ClassLoaderManager.<init>(ClassLoaderManager.java:144)
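When comparing several javacores, it can help to reduce each one to "thread name, innermost frame" so it is easy to see whether the hung thread is always sitting in the same method. A minimal sketch, assuming the 3XMTHREADINFO / 4XESTACKTRACE line tags shown above (the layout may differ between JVM levels; the function name is hypothetical):

```python
def top_frames(javacore_lines):
    """Map each thread name to the innermost frame of its Java stack."""
    result = {}
    current = None
    for line in javacore_lines:
        line = line.rstrip("\n")
        if line.startswith("3XMTHREADINFO "):
            # The thread name is the first double-quoted string on the line.
            parts = line.split('"')
            current = parts[1] if len(parts) > 2 else None
        elif line.startswith("4XESTACKTRACE") and current is not None:
            if current not in result:
                # Only the first stack line after the thread header is kept.
                frame = line[len("4XESTACKTRACE"):].strip()
                if frame.startswith("at "):
                    frame = frame[3:]
                result[current] = frame
    return result
```

Running this over the javacore from a deployment hang and the one from a startup hang would show whether both end up in `Manifest.read` or whether one of them is in `ZipFile.open`.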
Note that in some other cases, the current method is actually "ZipFile.open", although I don't have a log to show it right now.
So we understand that this problem is somehow related to class loading...
We also noticed that, strangely enough, when we comment out another application (e.g. "AppB.ear") in 'serverindex.xml', "AppA.ear" can be started successfully, without hanging.
We found this note in IBM bug "PK30072: HANG DURING APPLICATION DEPLOYMENT ON AIX":
"This problem is a hang occurring when redeploying an
application from the administrative console, where the
application has many (600 or more) EJB classes, and where the
application contains JSPs. The hang occurs during JSP
compilation, in "java.util.zip.ZipFile.open" "
However, neither AppA.ear nor AppB.ear contains that many EJB classes or JSPs, and neither do any of our other EARs.
We had already encountered this problem some time ago, but back then it came and went randomly across server restarts: sometimes it would hang, sometimes not. Now the hang occurs at every startup, unless we comment out either AppB.ear or AppA.ear in serverindex.xml.
Note: we are using the default deployment parameters for all applications, i.e. Module-level class loader, 'PARENT FIRST', default load order (1 for all EARs, 5000 for nested JARs/WARs ?) etc.
Any clue is appreciated!