From: Kris Kelley
Date: Wed Apr 03 2002 - 14:12:12 EST
Hello all. Last Thursday I experienced some very weird NFS behavior
that I am still at a loss to explain.
The set-up:
The clients are two linux servers, named mx-two and mx-three. They run
Red Hat 7.1, with the base versions of the kernel (2.4.2) and mount
(2.10r). nfs-utils is not installed, since these are only clients, and
I do not use file locking over NFS.
The server that both clients mount to is a Windows 2000 Server running
Maestro 7.0.
The clients each have three separate shares mounted, using NFS version 2
via UDP with these options: rsize=4096,wsize=4096,hard,intr
On Thursday, mx-two had two of its three mounts hang on several
occasions, and it was always the same two mounts. While those mounts
were hanging (and causing the associated processes to freeze
indefinitely), the mounts on mx-three, along with the remaining mount on
mx-two, experienced far less trouble.
The first time this happened was at 9 AM. I saw these logs on mx-two:
Mar 28 08:58:44 mx-two kernel: nfs: server 10.1.1.24 not responding, still trying
Mar 28 09:00:38 mx-two last message repeated 3 times
Mar 28 09:02:34 mx-two kernel: nfs: task 30994 can't get a request slot
Mar 28 09:03:27 mx-two kernel: nfs: task 34669 can't get a request slot
Mar 28 09:04:00 mx-two kernel: nfs: task 36090 can't get a request slot
Mar 28 09:04:27 mx-two kernel: nfs: task 36477 can't get a request slot
Mar 28 09:13:13 mx-two kernel: nfs: task 44698 can't get a request slot
The "can't get a request slot" errors continued for about 45 minutes, at
which point the problem seemingly cleared up by itself:
Mar 28 09:43:15 mx-two kernel: nfs: server 10.1.1.24 OK
Mar 28 09:43:22 mx-two last message repeated 19 times
mx-three's logs were somewhat more benign:
Mar 28 09:16:32 mx-three kernel: nfs: server 10.1.1.24 not responding, still trying
Mar 28 09:16:34 mx-three kernel: nfs: server 10.1.1.24 not responding, still trying
Mar 28 09:16:34 mx-three kernel: nfs: server 10.1.1.24 OK
Mar 28 09:16:36 mx-three kernel: nfs: server 10.1.1.24 OK
Mar 28 09:18:05 mx-three kernel: nfs: server 10.1.1.24 not responding, still trying
Mar 28 09:18:05 mx-three kernel: nfs: server 10.1.1.24 OK
Mar 28 09:42:13 mx-three kernel: nfs: server 10.1.1.24 not responding, still trying
Mar 28 09:42:37 mx-three kernel: nfs: server 10.1.1.24 OK
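
To compare how long each client stayed stuck, the "not responding" and
"OK" messages can be paired up per host. The following is only a rough
sketch, not a tool used in the original report: the function names
outage_windows and slot_errors are made up for illustration, the year
is assumed to be 2002 (syslog omits it), and each log entry is assumed
to sit on a single line rather than wrapped as in an email.

```python
import re
from collections import Counter
from datetime import datetime

# Syslog prefix: "Mon DD HH:MM:SS", then the host name, then the message.
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")


def outage_windows(lines, year=2002):
    """Pair the first 'not responding' per host with the next 'OK'.

    Returns {host: [(start, end, seconds), ...]}. The year is an
    assumption because syslog timestamps do not carry one.
    """
    open_since = {}  # host -> time of the first unanswered timeout
    windows = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp, host, msg = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if "not responding" in msg:
            # Keep the earliest timeout; later repeats do not reset it.
            open_since.setdefault(host, when)
        elif msg.endswith("OK") and host in open_since:
            start = open_since.pop(host)
            seconds = int((when - start).total_seconds())
            windows.setdefault(host, []).append((start, when, seconds))
    return windows


def slot_errors(lines):
    """Count "can't get a request slot" messages per host."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "can't get a request slot" in m.group(3):
            counts[m.group(2)] += 1
    return counts


# Four entries taken from the logs above, each joined onto one line.
sample = [
    "Mar 28 08:58:44 mx-two kernel: nfs: server 10.1.1.24 not responding, still trying",
    "Mar 28 09:43:15 mx-two kernel: nfs: server 10.1.1.24 OK",
    "Mar 28 09:42:13 mx-three kernel: nfs: server 10.1.1.24 not responding, still trying",
    "Mar 28 09:42:37 mx-three kernel: nfs: server 10.1.1.24 OK",
]

for host, spans in outage_windows(sample).items():
    print(host, [secs for _, _, secs in spans])
```

On those four entries the first mx-two outage comes out at 2671 seconds
(about 44.5 minutes), against 24 seconds for the last mx-three one.
"last message repeated N times" lines are ignored, so the error counts
from slot_errors are a lower bound.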
The problem resurfaced starting at 9:58 AM, with mx-two saying the NFS
server was not responding, and the "can't get a request slot" errors
beginning to pile up. This continued for nearly an hour and a half, and
eventually I saw this error in the logs:
Mar 28 11:25:00 mx-two kernel: nfs_statfs: statfs error = 512
Immediately after this, I tried rebooting the machine. Interestingly,
not only did the hung shares (/nfsmount2 and /nfsmount3) have trouble
unmounting, but the one good share (/nfsmount1) also had trouble
unmounting. I saw these logs during the shutdown process:
Mar 28 11:26:01 mx-two kernel: nfs: task 27416 can't get a request slot
Mar 28 11:26:08 mx-two kernel: nfs: task 27655 can't get a request slot
Mar 28 11:26:32 mx-two umount: Cannot MOUNTPROG RPC: RPC: Port mapper failure - RPC: Timed out
Mar 28 11:26:32 mx-two umount: umount2: Device or resource busy
Mar 28 11:26:32 mx-two umount: umount: /nfsmount2: device is busy
Mar 28 11:26:42 mx-two kernel: nfs: server 10.1.1.24 not responding, still trying
Mar 28 11:27:32 mx-two umount: Cannot MOUNTPROG RPC: RPC: Port mapper failure - RPC: Timed out
Mar 28 11:27:32 mx-two umount: umount2: Device or resource busy
Mar 28 11:27:32 mx-two umount: umount: /nfsmount3: device is busy
Mar 28 11:27:43 mx-two kernel: nfs: server 10.1.1.24 not responding, still trying
Mar 28 11:28:32 mx-two umount: Cannot MOUNTPROG RPC: RPC: Port mapper failure - RPC: Timed out
Mar 28 11:28:33 mx-two umount: umount2: Device or resource busy
Mar 28 11:28:33 mx-two umount: umount: /nfsmount1: device is busy
Mar 28 11:28:33 mx-two netfs: Unmounting NFS filesystems: failed
Mar 28 11:29:32 mx-two kernel: nfs: task 28525 can't get a request slot
Mar 28 11:30:35 mx-two kernel: nfs: task 28713 can't get a request slot
At this point I gave up and just hit the power button. When the machine
came back alive, the NFS shares failed to mount:
Mar 28 11:35:45 mx-two mount: mount: RPC: Timed out
Mar 28 11:36:06 mx-two mount: mount: RPC: Timed out
Mar 28 11:36:06 mx-two netfs: Mounting NFS filesystems: failed
I remounted the shares manually soon after the start-up process was
complete.
Meanwhile, mx-three had virtually no trouble during this 1.5-hour-long
period, logging only a few time-outs at about the time I was rebooting
mx-two:
Mar 28 11:32:31 mx-three kernel: nfs: server 10.1.1.24 not responding, still trying
Mar 28 11:32:32 mx-three last message repeated 2 times
Mar 28 11:32:32 mx-three kernel: nfs: server 10.1.1.24 OK
Mar 28 11:32:33 mx-three kernel: nfs: server 10.1.1.24 OK
Mar 28 11:32:34 mx-three kernel: nfs: server 10.1.1.24 OK
This entire cycle repeated itself several times during the day.
Sometimes I was able to work around the problem by "remounting", that
is, mounting the same shares at the same mount points, hiding the old,
hung mounts. While this did not clear up processes that were trying to
access those mounts at the time, it did allow newer processes to see the
shares properly. On one occasion, the problem cleared up by itself,
the same way it did at 9:43 AM, and the hung processes cleared out.
Three other times, however, I ended up rebooting mx-two to clean up the
broken mounts. All the while, mx-three reported time-outs, and
occasionally a "can't get a request slot" error, but did not have
extended problems the way mx-two did. And during the times
these errors were piling up, that one mount on mx-two still seemed to
behave itself.
On one occasion, while mx-two was thrashing about, I tried unmounting
one of the affected shares from mx-three. I got an RPC time-out but the
share still unmounted.
These machines stay fairly busy during the day, as both SMTP and IMAP
servers, and they share the load fairly equally (balanced behind a
common, outside IP). The network admin was trying different settings on
the firewall, but the problem persisted through several configuration
changes, and he is convinced that it is not a firewall issue.
mx-three's installation is more recent, but the relevant software
(kernel, mount, email software packages) running on both is exactly the
same.
NFS has been behaving itself ever since. At one point I tried switching
mx-two's mounts to TCP, and they are still mounted that way (using NFS
version 2). Otherwise, I have not changed anything on the server or the
clients.
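
One way to confirm which options a client is really using after such a
change is to read them back from /proc/mounts rather than trusting the
mount command line. The sketch below is illustrative only: the function
name nfs_mount_options is made up, and the sample text (export names,
option spellings such as "tcp") is hypothetical, since the exact
strings printed vary between kernel versions.

```python
def nfs_mount_options(mounts_text):
    """Return {mount_point: [options]} for NFS entries.

    Expects /proc/mounts-style text, one mount per line:
    device mount_point fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "nfs":
            result[fields[1]] = fields[3].split(",")
    return result


# Hypothetical sample; the export names and option spellings are
# assumptions, not output captured from mx-two.
sample = """\
10.1.1.24:/share1 /nfsmount1 nfs rw,rsize=4096,wsize=4096,hard,intr,tcp 0 0
10.1.1.24:/share2 /nfsmount2 nfs rw,rsize=4096,wsize=4096,hard,intr,tcp 0 0
/dev/hda1 / ext2 rw 0 0
"""

for mount_point, options in nfs_mount_options(sample).items():
    print(mount_point, options)
```

On a live client the same function could be fed the result of
open("/proc/mounts").read() in place of the sample text.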
This set-up has been in place for several months now, with nothing of
this nature happening before. I am very inexperienced at troubleshooting
NFS, so I would greatly welcome any pointers on where to start digging
to find the root of this problem. Thank you!
---Kris Kelley
This archive was generated by hypermail 2b29
: Tue Apr 30 2002 - 23:54:02 EDT