Steven Brudenell
2010-01-17 05:39:46 UTC
Hello list,
I'm having a pretty nasty issue with nfs4. I work in a Linux
environment where a lot of software is run off of a central nfs4
server (i.e., executables are on the nfs mount).
The issue is that client machines will suddenly enter a state where
they start spamming nfs traffic and become unusable. The nfs traffic
is apparently an infinite loop of the following:
-> PUTFH
<- NFS4_OK
-> READ
<- NFS4ERR_STALE_STATEID
-> RENEW
<- NFS4_OK
This repeats indefinitely, and the filehandle, clientid and stateid are
identical in every iteration. I have pcap captures I can send if
useful (I'm afraid attaching one to my initial message to the list
would put me on the wrong side of the spam filter).
When this starts happening, one process on the client machine (never
the same one twice) freezes completely. Running the same executable
again just produces another frozen process. These processes can be
killed, but the spam traffic does not stop; the only fix we've found
so far is rebooting the client. The issue has persisted across
multiple reboots of the server machine, and does not appear to be
triggered by any particular event on the server. There is no
deterministic trigger on the client side either, though it seems to
happen during periods of high traffic.
The server is nfs 4.0 (not 4.1). The server machine is Ubuntu 9.10
server (2.6.31-17-server); we've had this bug crop up on both Ubuntu
9.10 (2.6.31-17-generic) and Ubuntu 8.04 (not sure of kernel version)
desktops.
Now I know that Ubuntu ships a patched kernel, and all else being
equal this should be filed as a bug against Ubuntu. The reason I'm
sending it to this list instead is that I've spent the entire day
reading the nfs4 spec (rfc 3530) and the nfs client and server code in
both the Ubuntu and latest-stable vanilla trees, and
1) I'm confused and a little concerned about the way nfs4 works and I
would like to clarify my understanding, and
2) I hope I can help resolve the issue a little faster by working
directly with the devs, since I am also a dev (though not a kernel
dev).
I have not yet tried to understand how the client's stateid became
stale in the first place. The spec says that the client should be able
to deal with this happening at any time.
After reading the spec and the kernel code, it seems as though the
client code and server code have two different ideas about the
semantics of the RENEW operation.
The client code seems to respond to NFS4ERR_STALE_STATEID in almost
all cases by issuing a RENEW (via nfs4_schedule_state_recovery in
nfs/nfs4proc.c).
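For concreteness, here is roughly what that dispatch looks like (a
simplified paraphrase of nfs4_async_handle_error from the 2.6.31
source as I read it, not a verbatim quote; other cases elided):

static int nfs4_async_handle_error(struct rpc_task *task,
                                   const struct nfs_server *server,
                                   struct nfs4_state *state)
{
        struct nfs_client *clp = server->nfs_client;

        switch (task->tk_status) {
        case -NFS4ERR_STALE_CLIENTID:
        case -NFS4ERR_STALE_STATEID:
        case -NFS4ERR_EXPIRED:
                /* The stale-stateid case is lumped in with the lease
                 * errors: park the RPC and wake the state manager.
                 * Nothing here marks the open state itself as needing
                 * reclaim. */
                rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
                nfs4_schedule_state_recovery(clp);
                return -EAGAIN;
        /* ... other error cases elided ... */
        }
        return 0;
}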
The server decides whether a stateid is stale by comparing its si_boot
timestamp against the server's boot time (STALE_STATEID in
nfsd/nfs4state.c). The server's RENEW implementation (renew_client in
nfsd/nfs4state.c) just resets the client's "last active time"
(nfs4_client::cl_time, plus its position on the lru) to renew its
leases; it does not appear to touch the si_boot time of any associated
stateids. In fact, the only operations that seem to set
stateid_t::si_boot (thus freshening the stateid) are OPEN, LOCK, and
friends.
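Again paraphrasing the relevant server bits as I read them, rather
than quoting them verbatim:

/* A stateid is stale if it was minted under an older boot epoch
 * than the server's current one: */
static int STALE_STATEID(stateid_t *stateid)
{
        return time_after((unsigned long)boot_time,
                          (unsigned long)stateid->si_boot);
}

/* RENEW just freshens the client's lease; it never touches the
 * si_boot of any stateid hanging off that client: */
static void renew_client(struct nfs4_client *clp)
{
        list_move_tail(&clp->cl_lru, &client_lru);
        clp->cl_time = get_seconds();
}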
Going by the server implementation, it looks like a client that sees
STALE_STATEID must re-OPEN (or re-LOCK) the file to get a fresh
stateid.
However, I can't tell from rfc 3530 whether this is correct. It says:
"The RENEW operation is used by the client to renew leases which it
currently holds at a server."
And that a lease is:
"An interval of time defined by the server for which the client is
irrevocably granted a lock."
And a stateid:
"A 128-bit quantity returned by a server that uniquely defines the
open and locking state provided by the server for a specific open or
lock owner for a specific file."
I have not wrapped my head around the lock/lease/stateid trinity yet.
A strict interpretation points out that a stateid is "_open_ and
locking state", so the only proper response to STALE_STATEID is an
OPEN. A loose interpretation says they're all facets of the same
thing, and that RENEW should therefore refresh stateids.
It seems to me that the client was written with one assumption and the
server with the other; the READ/RENEW loop looks like the natural
outcome of that mismatch.
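If the strict interpretation is right, I would naively expect the
client dispatch above to treat STALE_STATEID more like BAD_STATEID and
mark the affected state for an open reclaim before waking the state
manager. Something like the following (purely hypothetical and
untested on my part, assuming nfs4_state_mark_reclaim_reboot in
fs/nfs/nfs4state.c is the right helper for this):

        case -NFS4ERR_STALE_STATEID:
                /* Hypothetical: re-establish the stateid itself with
                 * a reclaim-type OPEN, instead of settling for a
                 * lease RENEW. */
                if (state != NULL)
                        nfs4_state_mark_reclaim_reboot(clp, state);
                rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL);
                nfs4_schedule_state_recovery(clp);
                return -EAGAIN;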
There's also a third perspective in the spec:
"The client must also employ the SETCLIENTID operation when it
receives a NFS4ERR_STALE_STATEID error using a stateid derived from
its current clientid."
As far as I can tell, the SETCLIENTID path only gets triggered on
NFS4ERR_STALE_CLIENTID.
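That asymmetry would explain the loop mechanically. My (possibly
wrong) reading of the recovery scheduling in fs/nfs/nfs4state.c, again
paraphrased rather than quoted:

void nfs4_schedule_state_recovery(struct nfs_client *clp)
{
        /* Unless the lease is already known to be gone, this only
         * asks the state manager to *check* the lease... */
        if (!test_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state))
                set_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
        nfs4_schedule_state_manager(clp);
}

/* ...and checking the lease amounts to a RENEW, which succeeds
 * because the lease is perfectly healthy. No OPEN or SETCLIENTID is
 * ever sent, the READ is retried with the same stateid, and around
 * we go. */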
I am sure I have missed something important somewhere, because I can't
believe the implementation could be as divergent from the spec as I
think it is. However, it's currently the only explanation I have for
the READ/RENEW loop bug that's taking down all the machines where I
work.
What have I missed?
Thanks very much in advance,
~Steve