08 February 2013

Oracle Forms & Reports 11.1.2.1.0 config.sh fails, unable to start AdminServer

OK, here's another one that have had me tear my hair out for a couple of days - surprisingly, when looking in the mirror, I still have some left... Sigh...

So, you're attempting to install and configure the latest and greatest Oracle Forms & Reports 11.1.2.1.0. You do an Install-Only and subsequently call $ORACLE_HOME/bin/config.sh. The configuration stalls at "Creating Domain" for an hour before ultimately failing.

In the /u01/app/oraInventory/logs/install*.out logs you find:

  Creating a new AdminServer Object ...  
  AdminServer port is 7001  
  Starting the domain ...  
   
 oracle.as.provisioning.util.ConfigException:  
 Error while starting the domain.  
 Cause:  
 An error occurred while starting the domain.  
 Action:  
 See logs for more details.  
     at oracle.as.provisioning.util.ConfigException.createConfigException(ConfigException.java:123)  
     at oracle.as.provisioning.weblogic.ASDomain.startDomain(ASDomain.java:3180)  
     at oracle.as.provisioning.weblogic.ASDomain.startDomain(ASDomain.java:3040)  
     at oracle.as.provisioning.engine.WorkFlowExecutor._startAdminServer(WorkFlowExecutor.java:1645)  
     at oracle.as.provisioning.engine.WorkFlowExecutor._createDomain(WorkFlowExecutor.java:635)  
     at oracle.as.provisioning.engine.WorkFlowExecutor.executeWLSWorkFlow(WorkFlowExecutor.java:391)  
     at oracle.as.provisioning.engine.Config.executeConfigWorkflow_WLS(Config.java:866)  
   


Now, I know from extensive personal experience that this particular exception dump and similar dumps at this configuration run juncture is perhaps one of the most dreaded ever periods in a DBA's and Middleware administrator's and developer's life when they are attempting to install Oracle Forms & Reports. It can happen for sooo many reasons, including but not limited to:

  • Node Manager is running
  • nodemanager.properties is "wrong" for Forms&Reports installation (so just rename it to something else until install is finished)
  • Domain already exists in some shape, including inside domain-registry.xml and nodemanager.domains
  • Domain already exists on filesystem under $MW_HOME/user_projects/applications/ or $MW_HOME/user_projects/domains/
  • /etc/hosts is wrongly configured (host name pointing to 127.0.0.1 has been a bug bear for quite some time in my virtualized environments with DHCP configured)
  • /etc/nsswitch.conf pointing to DNS first and DNS requests do not resolve
  • And so on and so forth...
But here's a new one that I didn't know about or had experienced before (to my knowledge)... It's also not (yet?) documented on Oracle Support. So stay tuned...

You notice in $MW_HOME/logs/wlst_*.log that the domain appears to be created correctly from the output logged:

 2013-02-08 14:32:04,077 INFO [main] com.oracle.cie.domain.script.ScriptExecutor - succeed: read domain from "/u01/app/oracle/Middleware/user_projects/domains/frsdev"  
 2013-02-08 14:32:04,077 INFO [main] com.oracle.cie.domain.script.ScriptExecutor - succeed: update domain  
 2013-02-08 14:32:04,077 INFO [main] com.oracle.cie.domain.script.ScriptExecutor - close template  
 2013-02-08 14:32:04,077 INFO [main] com.oracle.cie.domain.script.ScriptExecutor - succeed: close template  
   

And yet, the AdminServer is simply not starting up as it should - and for this very reason the OUI waits for 1 hour for it to start before giving up and failing (that's a looong time to wait btw.). You check with ps and find no AdminServer java process.

At this point you might even get desperate enough to run an strace against the OUI java process using: strace -f -s80 -o/tmp/oui.strace -p and notice something along the lines of:
 31939 open("/tmp/tmp1360242470594.py", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 615  
 31939 write(615, "import os\nreadTemplate('/tmp/tmp1360242469306wls.jar')\ndumpStack()\nsetOption('Se"..., 1513) = 1513  
 31939 stat("/u01/app/oracle/Middleware/wlserver_10.3/common/bin/wlst.sh", {st_mode=S_IFREG|0750, st_size=785, ...}) = 0  
 31939 stat("/u01/app/oracle/Middleware/wlserver_10.3/common/bin/wlst.sh", {st_mode=S_IFREG|0750, st_size=785, ...}) = 0  
 31939 clone(child_stack=0x7f8fd7821ff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f8fd78229d0, tls=0x7f8fd7822700, child_tidptr=0x7f8fd78229d0) = 19272  
 31939 clone(child_stack=0x7f90152e1ff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f90152e29d0, tls=0x7f90152e2700, child_tidptr=0x7f90152e29d0) = 19274  
 31939 <... clone resumed> child_stack=0x7f8fd7922ff0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f8fd79239d0, tls=0x7f8fd7923700, child_tidptr=0x7f8fd79239d0) = 19275  
 31939 --- SIGSEGV (Segmentation fault) @ 0 (0) ---  
   

Hmmmm, SIGSEGV is no good - but you're not any wiser as to the root cause...

Ok, back in the /u01/app/oraInventory/logs/*.out logs you may (not necessarily) notice in the beginning of the file(s):

 Command Launched: /usr/java/jdk1.6.0_38/bin/java -cp /tmp/OraInstall2013-02-08_08-37-48AM/ext/jlib/handlers/weblogicHandlers.jar oracle.as.install.common.handlers.weblogicQueries.PrintJVMProperties  
 java.io.IOException: Cannot run program "/usr/java/jdk1.6.0_38/bin/java": java.io.IOException: error=2, No such file or directory  
     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)  
     at java.lang.Runtime.exec(Runtime.java:593)  
 ...  
 java.io.IOException: Cannot run program "/usr/java/jdk1.6.0_38/bin/java": java.io.IOException: error=2, No such file or directory  
     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)  
     at java.lang.Runtime.exec(Runtime.java:593)  
 ...  
 java.io.IOException: Cannot run program "/usr/java/jdk1.6.0_38/bin/java": java.io.IOException: error=2, No such file or directory  
     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)  
     at java.lang.Runtime.exec(Runtime.java:593)  
   

Hmmmm, what's going on here? There is no /usr/java/jdk1.6.0_38 JAVA_HOME on your file system.

Interestingly, you find that during installation (not configuration) similar errors may have been (quietly - i.e. no errors reported during installation) dropped in $ORACLE_HOME/cfgtoollogs/oui log files, e.g. install*.out:

 Command Launched: /usr/java/jdk1.6.0_38/bin/java -cp /tmp/OraInstall2013-02-08_11-31-25AM/ext/jlib/handlers/weblogicHandlers.jar oracle.as.install.common.handlers.weblogicQueries.PrintJVMProperties  
 java.io.IOException: Cannot run program "/usr/java/jdk1.6.0_38/bin/java": java.io.IOException: error=2, No such file or directory  
     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)  
     at java.lang.Runtime.exec(Runtime.java:593)  
 ...  
 java.io.IOException: Cannot run program "/usr/java/jdk1.6.0_38/bin/java": java.io.IOException: error=2, No such file or directory  
     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)  
     at java.lang.Runtime.exec(Runtime.java:593)  
 ...  

You have at this stage perhaps dilligently (and desperately) tried running the OUI and config.sh with all the Java location options in your arsenal - but to no avail... Things still keep failing:

 export JAVA_HOME=/usr/java/jdk1.6.0_39  
 export PATH=$JAVA_HOME/bin:$PATH  
 unset ORACLE_HOME LD_LIBRARY_PATH  
 
 which java  
 /usr/java/jdk1.6.0_39/bin/java

./runInstaller -logLevel finest -jreLoc /usr/java/jdk1.6.0_39 . frsdev cd $ORACLE_HOME bin/config.sh -logLevel finest -jreLoc /usr/java/jdk1.6.0_39

Ok, you've run out of options and stuff to try - there is but one remaining issue to try and identify - where on earth is this non-existing /usr/java/jdk1.6.0_38 home originating from...?!

Meet find/grep...

  pwd  
 /u01/app/oracle/Middleware/wlserver_10.3  
   
 find . -type f -exec grep -i "jdk1.6.0_38" {} \; -ls  
 JAVA_HOME=/usr/java/jdk1.6.0_38  
 WLS_JAVA_HOME=/usr/java/jdk1.6.0_38  
 JAVAHOME=/usr/java/jdk1.6.0_38  
 2638455 4 -rw-r----- 1 oracle oinstall 1180 Jan 10 11:27 ./.product.properties  

Yeah, I hear you... NO WAY!!! But the evidence is irrefutable - on the filesystem Oracle (or the old BEA staff) have quietly dropped in a hidden file, which contains its own JAVA_HOME definitions. And not only that, they override WHATEVER you throw at the shell (export JAVA_HOME=...), OUI (-jreLoc) and config.sh (-jreLoc).

Now, stop config.sh, fix this file, remove the old configuration remnants (as described above) and now retry.
Ok, so why-oh-why did this happen - how did this old(er) JDK definition make it into the file in the first place? Let me answer you with a question: did you ever upgrade your JDK(s) on the system and remove the old one(s)? Did you do this after you installed WebLogic?

Yes, I know you dilligently changed all {domain}/bin/setDomainEnv.sh, nodemanager.properties, startNodeManager.sh etc. etc. etc. There was just this tiny one .product.properties file left to ruin it all...


Sigh... 2 days wasted... There are times when I consider completely moving to an open source stack... At least you got better access to code and scripts to troubleshoot such trivialities and not having to waste this amount of time!

4 comments:

ghp said...

You're ugly but I like you.

It's for people like you that the internet was invented.

Thanks a bunch, you saved me 46 hours.

Morten Jensen said...

Just as well I'm not particularly sensitive...

I will therefore take your comment as a compliment.

Morten

ghp said...

I knew you weren't particularly sensitive since it says on your profile that you've been working with Oracle since 1997.

It leaves it's marks, though ;-)

raj said...

Respected Sir, I had the same issue configuring Oracle IDM and came across your blog.
Following the steps from your blog, i was able to identify the problem and fix the issue.
Thanks a bunch for the steps to troubleshoot the problem. I was struck with it for almost 2 days.