While trying to add a second node to a two-node Oracle Grid 11gR2 cluster, I ran into a problem during node verification.
# cluvfy stage -pre nodeadd -n node2
...
Checking DNS response time for an unreachable node
Node Name Status
------------------------------------ ------------------------
node1 failed
node2 failed
PRVF-5637 : DNS response time could not be checked on following nodes: node1, node2
File "/etc/resolv.conf" is not consistent across nodes
The nodes were running identical DNS server configurations, with resolv.conf pointing to localhost, which recursed to another DNS server. It turns out that if you run nslookup some-non-existent-host on this particular setup, an apparently valid response is returned, but the command exits with status 1.
# nslookup some-non-existent-host
Server: 127.0.0.1
Address: 127.0.0.1#53
** server can't find some-non-existent-host: NXDOMAIN
# echo $?
1
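To compare the exit status before and after a configuration change, a small helper along these lines may be handy. This is only a sketch: the function name lookup_exit_status and the LOOKUP_CMD variable are made up for illustration, and it does not claim to reproduce what cluvfy does internally.

```shell
# Hypothetical helper: print the exit status of a lookup command for one name.
# LOOKUP_CMD is an illustrative override; it defaults to nslookup.
lookup_exit_status() {
    name="$1"
    status=0
    # Discard the resolver output; only the exit status matters here.
    ${LOOKUP_CMD:-nslookup} "$name" >/dev/null 2>&1 || status=$?
    echo "$status"
}

# Usage (output depends on the local resolver setup):
#   lookup_exit_status some-non-existent-host
```

Running it against a name that should not resolve, on each node, gives a quick way to see whether the resolver exits zero or non-zero there.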
The hack that worked in this case was to disable recursion in the named.conf file. With that change, nslookup returns a non-authoritative answer and exits with status 0. cluvfy is happy, and there is no need to ignore errors to get the install to work.
options {
...
recursion no;
...
};
# nslookup some-non-existent-host
Server: 127.0.0.1
Address: 127.0.0.1#53
Non-authoritative answer:
*** Can't find some-non-existent-host: No answer
# echo $?
0
# cluvfy stage -pre nodeadd -n node2
...
Checking DNS response time for an unreachable node
Node Name Status
------------------------------------ ------------------------
node1 passed
node2 passed
The DNS response time for an unreachable node is within acceptable limit on all nodes
File "/etc/resolv.conf" is consistent across nodes
Pre-check for node addition was successful.
Now, if anyone wants to send a clue my way about this behavior, from either a named or cluvfy perspective, that would be great…
Edit: See https://forums.oracle.com/forums/thread.jspa?messageID=10756719#10756719 for information about the bind change.