----- Original Message -----
From: "William Jojo"
To: "Jeremy Allison"
Cc: ; "Gerald (Jerry) Carter" ;
"Andrew Tridgell" ; "Jeremy Allison"
Sent: Tuesday, February 28, 2006 4:33 PM
Subject: Re: [Samba] hanging smbd(s) revisited


>
> ----- Original Message -----
> From: "Jeremy Allison"
> To: "William Jojo"
> Cc: ; "Gerald (Jerry) Carter" ;
> "Andrew Tridgell" ; "Jeremy Allison"
> Sent: Tuesday, February 28, 2006 3:25 PM
> Subject: Re: [Samba] hanging smbd(s) revisited
>
>
> > On Tue, Feb 28, 2006 at 01:30:40PM -0500, William Jojo wrote:
> > >
> > > So we've gone back to 3.0.20 and we're stable again. I should indicate
> > > that it's 3.0.20 with patches 9484, 9481 and 9456 to fix Win98 dir loop,
> > > excel shared workbook and ACLs (not necessarily in that order).
> > >
> > > Since the problem manifests in the filesystem where our Samba install is,
> > > and it appears to be a tdb (namely locking.tdb for fd=15, but can't
> > > identify the fd=3 that spins unmercifully), I'm wondering if *maybe* it
> > > could be the "Fix for tdb clear-if-first race condition." or some other
> > > tdb change after 3.0.20 that traded one bug for another? I'm guessing... :-)
> >
> > Identifying that fd would be really useful.

>
> Ok, dug it up. This is the IBM info.
>
>
> ----- Original Message -----
> From: Robert Elias
> To: jojowil@hvcc.edu
> Sent: Monday, February 27, 2006 12:30 PM
> Subject: Pmr#47402,180
>
>
> Bill,
>
> Thank you for your patience while I work through your questions. I ran this
> issue by our level 3 performance team and received the following input.
>
> The file in question is inode 12363 in /samba. Use 'find /samba -inum 12363'
> to determine the file name.
>
> I ran this by the Samba team members that work for IBM and they suggested
> the following:
>
> As a long shot, I suggest that you have him run tdbtorture (a file I/O
> test case) from the Samba source tree, as that simulates the locking that
> Samba does and would show whether we have a bug in AIX locking.
>
> Your comments or thoughts?
>
> Thanks,
>
> Robert Elias
> AIX Duty Manager
> IBM Integrated Technology Services
> 214-257-9292 - T/L 972
>
>
>
>
>
>
> [storage:/samba/3.0.21b] # find /samba -inum 12363
> /samba/3.0.21b/var/locks/locking.tdb
>
>
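For anyone without find(1) handy, the inode lookup above can be sketched in
Python. This is only a minimal sketch, not something IBM or the Samba team
supplied: the function name find_by_inode is hypothetical, and restricting
matches to the root's own device is my assumption (inode numbers are only
unique within one filesystem, and plain 'find -inum' does not apply that
restriction unless you also pass -xdev).

```python
import os


def find_by_inode(root, inum):
    """Return paths below root whose inode number equals inum.

    Roughly what 'find root -xdev -inum N' reports, except that root
    itself is not examined. Hypothetical helper for illustration only.
    """
    matches = []
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
            # Assumption: only entries on the same device as root count.
            if st.st_ino == inum and st.st_dev == root_dev:
                matches.append(path)
    return matches
```

Called as find_by_inode("/samba", 12363), it should return the same
locking.tdb path that find printed, assuming the file is still there.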
>
> > > We are going to start moving to 20a, then 20b, then to 21 then back to 21a
> > > where we started (21b did it too, haven't tried 21c yet) after another day
> > > or two of 3.0.20 to make sure we're not losing our mind.
> >
> > I've looked over the logic for the acquiring/release of the lock
> > for the locking.tdb in the 3.0.21c release code - I can't see any possible
> > paths, error or otherwise, where the lock can be left live on a
> > record. I'll keep looking though. When it's spinning, what is the errno
> > that the fcntl call returns?
> >
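To capture the errno asked about here, a small probe along these lines could
be pointed at the file while the problem is happening. It is only a sketch:
try_lock is a hypothetical helper, it uses Python's fcntl.lockf (a wrapper
around fcntl() byte-range locking) rather than Samba's tdb code, and the
offset to probe is an assumption, since the actual lock offsets tdb uses are
not given in this thread.

```python
import errno
import fcntl
import os


def try_lock(fd, offset, length=1):
    """Try a non-blocking exclusive byte-range lock on fd.

    Returns (True, None) on success, or (False, errno_name) on failure.
    Hypothetical helper; not part of Samba or tdb.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                    length, offset, os.SEEK_SET)
        return True, None
    except OSError as e:
        # Map the numeric errno to its symbolic name where one is known.
        return False, errno.errorcode.get(e.errno, str(e.errno))
```

A non-blocking attempt that fails because another process holds the range
typically reports EAGAIN or EACCES; anything else would be worth including
in the bug report.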

>
> What appears to happen is pid 266946 is exiting (exited?) and some kind of
> deadlock has occurred, which shows the following in filemon.sum from the
> perfpmr that IBM had me run during the event.
>
>
>
> 9603204 hooks processed (incl. 2108 utility)
> 60.013 secs in measured interval
> Cpu utilization: 42.9%
>
> Most Active Files
> ------------------------------------------------------------------------
>   #MBs  #opns   #rds   #wrs  file                 volume:inode
> ------------------------------------------------------------------------
>  230.1      0  29492      0  pid=266946_fd=3
>   43.3      0   1588    129  pid=240270_fd=5
>

>
>
> My question to IBM was how can this happen? The above inode number is what
> was provided to me yesterday.
>
> Since moving to 3.0.20 the problem has subsided, so I'm back here and not
> bugging IBM at the moment. :-|
>
> Whatever else I can get you, just say the word. :-)
>
> Do you agree with our plan to step through 20a, 20b ... ?
>
>


We've survived two days on 3.0.20, and our load is even more than when we
started. We have over 1000 smbd's running on this machine and it's not even
breaking a sweat.

Additionally, looking through source/locking/locking.c, I notice that diffs
of 3.0.20 against 20a and 20b show no changes. Then in 3.0.21 there's an
invasive change. (locking/posix.c remains unchanged through 21b.)
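
For anyone who wants to repeat that comparison, here is a minimal sketch of
counting changed lines between two versions of a file. It assumes you have
both source trees unpacked somewhere and read the two copies of locking.c
into strings yourself; changed_line_count is a hypothetical name, and this
uses Python's difflib rather than diff(1), so the counts may not match
diff's output exactly.

```python
import difflib


def changed_line_count(old_text, new_text):
    """Return (added, removed) line counts between two file versions."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    added = removed = 0
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' contributes to both counts; 'equal' to neither.
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed
```

A result of (0, 0) corresponds to the "no changes" case between 3.0.20, 20a
and 20b; a large pair of numbers is what I mean by invasive.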

I'm pretty certain that 20a and 20b will be fine for us based on what I see,
but I'm still learning (and comprehending :-) ) these changes looking for a
smoking gun. And tomorrow I will put 20b (skipping 20a) in place on this
server.

I'm opening a bug because I think this one is real and load related.


Cheers,

Bill



> Cheers,
>
> Bill
>
>
> > Jeremy.
> > --
> > To unsubscribe from this list go to the following URL and read the
> > instructions: https://lists.samba.org/mailman/listinfo/samba
> >
