Discussion:
client problems with openSUSE 11.2
David Werner
2010-03-11 10:42:21 UTC
Hello,

We run an NFSv4 server on a RHELS 5.4 (Red Hat "Tikanga") system with kernel
2.6.18-164.11.1.el5 (64-bit). For the last few days we have been back on
kernel 2.6.18-128.1.1.el5, which seems to provide
a more stable setup.

NFS security (authentication) is set to none.
We still use a setup with NIS and rpc.idmapd.

The clients run openSUSE 11.1 and 11.2, on both 32-bit and 64-bit systems.

32-bit:

kernel-2.6.31.5-0.1-default
kernel-2.6.31.12-0.1-default
kernel-2.6.33-rc8-20-default (one I took from a special "HEAD" repository)
kernel-2.6.32.9-20-default (one I built myself: essentially a vanilla kernel with the config options from 2.6.31.12)

64-bit:

kernel-2.6.31.5-0.1-desktop
kernel-2.6.31.12-0.1-desktop


What I observe, with all current client kernels for openSUSE 11.2
and also with kernel-2.6.33-rc8-20 and kernel-2.6.32.9-20,
are frequent hangs when using KDE 4.3 applications such as Konqueror,
but also other applications. The hangs seem to occur more often with 2.6.18-164 than with 2.6.18-128
on the server.

The symptoms are always the same:

The server and client get into a kind of loop: hundreds to thousands of requests per second
are produced, and rpciod and kthreadd show some activity. The graphical display appears mostly frozen,
but work on the console is still possible. The system load goes up, typically to between 4 and 30.
I started to document cases with tcpdumps; someone with NFS programming skills
might want to look at them. They can be found at:

http://maultier.iws.uni-stuttgart.de:8080/nfsv4/

I will try to extend the documentation as far as I can, and whenever I observe the problem again.
I previously saw similar looping with openSUSE 11.1 when rebooting
the server while some clients were open, but I did not document it.


Regards, David
--
David Werner system administrator
IWS, Universität Stuttgart phone: +49 711 685 670 10
Pfaffenwaldring 61
D-70569 Stuttgart email: david.werner at iws.uni-stuttgart.de

pgp key: http://www.hydrosys.uni-stuttgart.de/institut/mitarbeiter/werner.php
pgp fingerprint: 1952 3442 47F7 F6A1 55B2 15F9 1E9D 487E B0DC 1862
J. Bruce Fields
2010-03-11 15:54:29 UTC
From 2010-03-11_event1/tcpdump: there's an infinite loop that looks
like:

write->bad_stateid
renew->ok

stateid is 487A914B510B0200ECB81F00,
clientid is 487A914B9C010000.

(so the timestamps (first 4 bytes) are consistent between the two,
unlike a similar example we saw on the list before).

--b.
_______________________________________________
NFSv4 mailing list
NFSv4 at linux-nfs.org
http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4
David Werner
2010-04-21 18:29:06 UTC
Over the last few days I noticed that our time configuration
had some unfortunate settings. I reconfigured it and learned how to watch the clocks with
ntpq (e.g. its peers command) and other tools.
Some of our problems seem to be gone, though I am not sure whether that is the reason.
Some time ago openSUSE dropped the initial "ntpdate" run at boot; ntpdate now seems to be
an obsolete tool. But they don't run "ntpd -q" in its place, so on the clients I configured
the server lines with the "iburst" parameter, so that they come up quickly with nearly the right time.
NTP now seems to run fine.
All of this leads me to some questions:

How large a time difference between client and server is generally tolerable for NFS version 4?
At what point does an NFS server/client configuration break down?
Which scenarios should one avoid, and why?
Should parts of an NTP-like protocol be integrated into a future file-server protocol stack
in order to detect or avoid client/server time problems? Or are there fundamental reasons
against that, such as fixed timestamps on files?

I guess some of these questions are answered by RFCs. Perhaps answers to the first ones could
be given, to recommend and justify some good practice.

Greetings, David
J. Bruce Fields
2010-04-22 21:05:03 UTC
Post by David Werner
How large time-differences for NFS version 4 are generally tolerable?
It really shouldn't be a problem, this is just a bug.

Could you test with the following patch, and your old (problematic)
configuration, and see if the problem is still there?

--b.

commit 1eb88dd568def1b25a521e66f0e03cee76bcdd90
Author: J. Bruce Fields <bfields at citi.umich.edu>
Date:   Thu Apr 22 16:21:39 2010 -0400

    Revert "nfsd4: distinguish expired from stale stateids"

    This reverts commit 78155ed75f470710f2aecb3e75e3d97107ba8374.

    Conflicts:

    	fs/nfsd/nfs4state.c

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6a8feda..aac1f08 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -190,7 +190,7 @@ alloc_init_deleg(struct nfs4_client *clp, struct nfs4_stateid *stp, struct svc_f
 	dp->dl_vfs_file = stp->st_vfs_file;
 	dp->dl_type = type;
 	dp->dl_ident = cb->cb_ident;
-	dp->dl_stateid.si_boot = get_seconds();
+	dp->dl_stateid.si_boot = boot_time;
 	dp->dl_stateid.si_stateownerid = current_delegid++;
 	dp->dl_stateid.si_fileid = 0;
 	dp->dl_stateid.si_generation = 0;
@@ -1827,7 +1827,7 @@ init_stateid(struct nfs4_stateid *stp, struct nfs4_file *fp, struct nfsd4_open *
 	stp->st_stateowner = sop;
 	get_nfs4_file(fp);
 	stp->st_file = fp;
-	stp->st_stateid.si_boot = get_seconds();
+	stp->st_stateid.si_boot = boot_time;
 	stp->st_stateid.si_stateownerid = sop->so_id;
 	stp->st_stateid.si_fileid = fp->fi_id;
 	stp->st_stateid.si_generation = 0;
@@ -2661,39 +2661,11 @@ nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stateid *stp)
 static int
 STALE_STATEID(stateid_t *stateid)
 {
-	if (time_after((unsigned long)boot_time,
-			(unsigned long)stateid->si_boot)) {
-		dprintk("NFSD: stale stateid " STATEID_FMT "!\n",
-			STATEID_VAL(stateid));
-		return 1;
-	}
-	return 0;
-}
-
-static int
-EXPIRED_STATEID(stateid_t *stateid)
-{
-	if (time_before((unsigned long)boot_time,
-			((unsigned long)stateid->si_boot)) &&
-	    time_before((unsigned long)(stateid->si_boot + lease_time), get_seconds())) {
-		dprintk("NFSD: expired stateid " STATEID_FMT "!\n",
-			STATEID_VAL(stateid));
-		return 1;
-	}
-	return 0;
-}
-
-static __be32
-stateid_error_map(stateid_t *stateid)
-{
-	if (STALE_STATEID(stateid))
-		return nfserr_stale_stateid;
-	if (EXPIRED_STATEID(stateid))
-		return nfserr_expired;
-
-	dprintk("NFSD: bad stateid " STATEID_FMT "!\n",
+	if (stateid->si_boot == boot_time)
+		return 0;
+	dprintk("NFSD: stale stateid " STATEID_FMT "!\n",
 		STATEID_VAL(stateid));
-	return nfserr_bad_stateid;
+	return 1;
 }
 
 static inline int
@@ -2817,10 +2789,8 @@ nfs4_preprocess_stateid_op(struct nfsd4_compound_state *cstate,
 	status = nfserr_bad_stateid;
 	if (is_delegation_stateid(stateid)) {
 		dp = find_delegation_stateid(ino, stateid);
-		if (!dp) {
-			status = stateid_error_map(stateid);
+		if (!dp)
 			goto out;
-		}
 		status = check_stateid_generation(stateid, &dp->dl_stateid,
 						  flags);
 		if (status)
@@ -2833,10 +2803,8 @@ nfs4_preprocess_stateid_op(struct nfsd4_compound_state *cstate,
 		*filpp = dp->dl_vfs_file;
 	} else { /* open or lock stateid */
 		stp = find_stateid(stateid, flags);
-		if (!stp) {
-			status = stateid_error_map(stateid);
+		if (!stp)
 			goto out;
-		}
 		if (nfs4_check_fh(current_fh, stp))
 			goto out;
 		if (!stp->st_stateowner->so_confirmed)
@@ -2908,7 +2876,7 @@ nfs4_preprocess_seqid_op(struct nfsd4_compound_state *cstate, u32 seqid,
 		 */
 		sop = search_close_lru(stateid->si_stateownerid, flags);
 		if (sop == NULL)
-			return stateid_error_map(stateid);
+			return nfserr_bad_stateid;
 		*sopp = sop;
 		goto check_replay;
 	}
@@ -3175,10 +3143,8 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (!is_delegation_stateid(stateid))
 		goto out;
 	dp = find_delegation_stateid(inode, stateid);
-	if (!dp) {
-		status = stateid_error_map(stateid);
+	if (!dp)
 		goto out;
-	}
 	status = check_stateid_generation(stateid, &dp->dl_stateid, flags);
 	if (status)
 		goto out;
@@ -3404,7 +3370,7 @@ alloc_init_lock_stateid(struct nfs4_stateowner *sop, struct nfs4_file *fp, struc
 	stp->st_stateowner = sop;
 	get_nfs4_file(fp);
 	stp->st_file = fp;
-	stp->st_stateid.si_boot = get_seconds();
+	stp->st_stateid.si_boot = boot_time;
 	stp->st_stateid.si_stateownerid = sop->so_id;
 	stp->st_stateid.si_fileid = fp->fi_id;
 	stp->st_stateid.si_generation = 0;
