[Samba] secrets.tdb locking fun! - Samba


  1. [Samba] secrets.tdb locking fun!

    Hi all,

    After much grief today, we'd like to support the 'TODO' note in
    source/lib/util_tdb.c :-

    /* TODO: If we time out waiting for a lock, it might
    * be nice to use F_GETLK to get the pid of the
    * process currently holding the lock and print that
    * as part of the debugging message. -- mbp */


    It could have saved us a wholesale restart of Samba if we could have
    more easily identified the process that was causing us to get loads and
    loads of:-

    [2007/06/15 13:02:20, 0, pid=5430] tdb/tdbutil.c:tdb_log(783)
    tdb(/usr/local/samba/private/secrets.tdb): tdb_lock failed on list 2 ltype=2 (Interrupted system call)
    [2007/06/15 13:02:20, 0, pid=5430] tdb/tdbutil.c:tdb_chainlock_with_timeout_internal(82)
    tdb_chainlock_with_timeout_internal: alarm (10) timed out for key replay cache mutex in tdb /usr/local/samba/private/secrets.tdb


    We believe we did find it, but only by sending a SIGTERM to all 'smbd'
    processes and being left with 'smbd' processes that didn't die. (SIGKILL
    did remove them.)


    Mac
    Assistant Systems Administrator @nibsc.ac.uk
    mac@nibsc.ac.uk
    Work: +44 1707 641565 Everything else: +44 7956 237670 (anytime)
    --
    To unsubscribe from this list go to the following URL and read the
    instructions: https://lists.samba.org/mailman/listinfo/samba

  2. Re: [Samba] secrets.tdb locking fun!

    On Fri, Jun 15, 2007 at 03:19:08PM +0100, Mac wrote:
    > After much grief today, we'd like to support the 'TODO' note in
    > source/lib/util_tdb.c :-
    >
    > /* TODO: If we time out waiting for a lock, it might
    > * be nice to use F_GETLK to get the pid of the
    > * process currently holding the lock and print that
    > * as part of the debugging message. -- mbp */


    Not sure if you have Linux, but if you do, a "cat /proc/locks" in
    that situation would also have helped you.
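
Each line of /proc/locks on Linux names the lock type, the access mode, the owning pid, and the device and inode of the locked file. As a rough sketch, a line could be picked apart as follows. The struct and function names are hypothetical, and the field layout assumed is the common "N: POSIX ADVISORY WRITE pid major:minor:inode start end" form, with "->" marking a blocked waiter.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one /proc/locks line. */
struct proc_lock {
    char kind[16];           /* e.g. POSIX or FLOCK */
    char mode[16];           /* READ or WRITE */
    int pid;                 /* process owning (or waiting for) the lock */
    unsigned int dev_major;  /* device numbers are printed in hex */
    unsigned int dev_minor;
    unsigned long inode;
    int blocked;             /* 1 if the line describes a waiter ("->") */
};

/* Returns 0 on success, -1 if the line does not look like a lock entry. */
static int parse_proc_locks_line(const char *line, struct proc_lock *out)
{
    char advisory[16];
    const char *p = strchr(line, ':');   /* skip the leading "N:" ordinal */

    if (p == NULL) {
        return -1;
    }
    p++;
    while (*p == ' ') {
        p++;
    }
    out->blocked = 0;
    if (strncmp(p, "->", 2) == 0) {
        out->blocked = 1;
        p += 2;
    }
    if (sscanf(p, "%15s %15s %15s %d %x:%x:%lu",
               out->kind, advisory, out->mode, &out->pid,
               &out->dev_major, &out->dev_minor, &out->inode) != 7) {
        return -1;
    }
    return 0;
}
```

To find the holder for a particular file, compare the parsed inode with the inode of secrets.tdb (from stat() or "ls -i") and look for entries on it that are not marked as blocked.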

    Volker



  3. Re: [Samba] secrets.tdb locking fun!

    On Fri, Jun 15, 2007 at 03:19:08PM +0100, Mac wrote:
    > Hi all,
    >
    > After much grief today, we'd like to support the 'TODO' note in
    > source/lib/util_tdb.c :-
    >
    > /* TODO: If we time out waiting for a lock, it might
    > * be nice to use F_GETLK to get the pid of the
    > * process currently holding the lock and print that
    > * as part of the debugging message. -- mbp */
    >
    >
    > It could have saved us a wholesale restart of Samba if we could have
    > more easily identified the process that was causing us to get loads and
    > loads of:-
    >
    > [2007/06/15 13:02:20, 0, pid=5430] tdb/tdbutil.c:tdb_log(783)
    > tdb(/usr/local/samba/private/secrets.tdb): tdb_lock failed on list 2 ltype=2 (Interrupted system call)
    > [2007/06/15 13:02:20, 0, pid=5430] tdb/tdbutil.c:tdb_chainlock_with_timeout_internal(82)
    > tdb_chainlock_with_timeout_internal: alarm (10) timed out for key replay cache mutex in tdb /usr/local/samba/private/secrets.tdb
    >
    >
    > We believe we did find it, but only by sending a SIGTERM to all 'smbd'
    > processes and being left with 'smbd' processes that didn't die. (SIGKILL
    > did remove them.)


    If you get an smbd in that state, can you attach to it with gdb and
    get a stack backtrace so we can see where it was ?

    Thanks,

    Jeremy.

  4. Re: [Samba] secrets.tdb locking fun!

    Hi there,

    >Date: Fri, 15 Jun 2007 10:32:50 -0700
    >From: Jeremy Allison
    >To: Mac
    >Cc: samba@samba.org
    >Subject: Re: [Samba] secrets.tdb locking fun!
    >>
    >> [2007/06/15 13:02:20, 0, pid=5430] tdb/tdbutil.c:tdb_log(783)
    >> tdb(/usr/local/samba/private/secrets.tdb): tdb_lock failed on list 2 ltype=2 (Interrupted system call)
    >> [2007/06/15 13:02:20, 0, pid=5430] tdb/tdbutil.c:tdb_chainlock_with_timeout_internal(82)
    >> tdb_chainlock_with_timeout_internal: alarm (10) timed out for key replay cache mutex in tdb /usr/local/samba/private/secrets.tdb
    >>

    >If you get an smbd in that state, can you attach to it with gdb and
    >get a stack backtrace so we can see where it was ?



    We'll certainly have a go for you. I assume you mean the 'smbd' that
    has got wedged (non "TERM"-able). Or do you mean the still-running ones
    (i.e. the ones that are getting the 'tdb' errors)?





    (And yes, the very astute among you will have noticed that the code
    fragment I mentioned came from 3.0.25, but the log file entries are
    from 3.0.24. We'll move to 3.0.25 when 'b' comes out.)


    Mac

  5. Re: [Samba] secrets.tdb locking fun!

    On Fri, Jun 15, 2007 at 06:44:41PM +0100, Mac wrote:
    > Hi there,
    >
    > >Date: Fri, 15 Jun 2007 10:32:50 -0700
    > >From: Jeremy Allison
    > >To: Mac
    > >Cc: samba@samba.org
    > >Subject: Re: [Samba] secrets.tdb locking fun!
    > >>
    > >> [2007/06/15 13:02:20, 0, pid=5430] tdb/tdbutil.c:tdb_log(783)
    > >> tdb(/usr/local/samba/private/secrets.tdb): tdb_lock failed on list 2 ltype=2 (Interrupted system call)
    > >> [2007/06/15 13:02:20, 0, pid=5430] tdb/tdbutil.c:tdb_chainlock_with_timeout_internal(82)
    > >> tdb_chainlock_with_timeout_internal: alarm (10) timed out for key replay cache mutex in tdb /usr/local/samba/private/secrets.tdb
    > >>

    > >If you get an smbd in that state, can you attach to it with gdb and
    > >get a stack backtrace so we can see where it was ?

    >
    >
    > We'll certainly have a go for you. I assume you mean the 'smbd' that
    > has got wedged (non "TERM"-able). Or do you mean the still-running ones
    > (i.e. the ones that are getting the 'tdb' errors)?
    >
    >
    >
    >
    >
    > (And yes, the very astute among you will have noticed that the code
    > fragment I mentioned came from 3.0.25, but the log file entries are
    > from 3.0.24. We'll move to 3.0.25 when 'b' comes out.)


    I mean the one in the non-TERM'able state. That's the one that's
    blocking the others.

    Jeremy.

  6. Re: [Samba] secrets.tdb locking fun!

    Hi all,

    It seems to have happened again.

    Here are some sample log entries:-

    [2007/07/06 14:24:33, 1, pid=6128] smbd/service.c:make_connection_snum(950)
    212.219.219.243 (212.219.219.243) connect to service favorites initially as user tjowett (uid=1860, gid=270) (pid 6128)
    [2007/07/06 14:24:33, 0, pid=3074] tdb/tdbutil.c:tdb_log(783)
    tdb(/usr/local/samba/private/secrets.tdb): tdb_lock failed on list 2 ltype=2 (Interrupted system call)
    [2007/07/06 14:24:33, 0, pid=3074] tdb/tdbutil.c:tdb_chainlock_with_timeout_internal(82)
    tdb_chainlock_with_timeout_internal: alarm (10) timed out for key replay cache mutex in tdb /usr/local/samba/private/secrets.tdb
    [2007/07/06 14:24:33, 1, pid=3074] libads/kerberos_verify.c:ads_verify_ticket(357)
    ads_verify_ticket: unable to protect replay cache with mutex.
    [2007/07/06 14:24:33, 1, pid=3074] smbd/sesssetup.c:reply_spnego_kerberos(202)
    Failed to verify incoming ticket!

    >I mean the one in the non-TERM'able state. That's the one that's
    >blocking the others.


    We've used a 'pkill -TERM' to try to track down the "wedged" process.
    Last time this occurred, two processes did not die after a SIGTERM. We
    suspected one of them of being the culprit.

    We're not entirely sure that we've succeeded this time. The process that
    appeared not to have died with '-TERM' had a stack backtrace of :-

    (gdb) bt
    #0 0x00078760 in get_valid_user_struct ()
    #1 0x00078c0c in invalidate_vuid ()
    #2 0x00078f34 in invalidate_all_vuids ()
    #3 0x004a1868 in exit_server_common ()
    #4 0x004a1dc0 in exit_server_cleanly ()
    #5 0x00123c2c in async_processing ()
    #6 0x001244ec in receive_message_or_smb ()
    #7 0x00127d04 in smbd_process ()
    #8 0x004a306c in main ()
    (gdb) detach
    Detaching from program: /usr/local/samba/3.0.24/sbin/smbd process 17813
    (gdb) quit


    But by the time we'd exited gdb and had a further look around, the
    process (17813) appeared to have exited cleanly (with no need for a
    '-KILL'). This was not the observed behaviour last time.



    We're working on scripting the 'gdb bt' so that we can run it across all
    the smbd processes, when this next happens. We're also trying to work
    out if we can (sanely) use truss (or similar) to track what on earth is
    going on.
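
One possible shape for the scripted 'gdb bt' is sketched below. It assumes a gdb that accepts the -batch, -ex and -p options and that the caller already has the list of smbd pids (for example from pgrep); the function names are hypothetical.

```c
#include <stdio.h>

/*
 * Build the gdb command line for one pid. Returns 0 on success or -1 if
 * the buffer is too small.
 */
static int gdb_bt_command(char *buf, size_t buflen, long pid)
{
    int n = snprintf(buf, buflen, "gdb -batch -ex bt -p %ld", pid);

    return (n > 0 && (size_t)n < buflen) ? 0 : -1;
}

/*
 * Attach to each pid in turn, copy gdb's output (the backtrace) to `out`,
 * and return the number of pids for which gdb could not be run cleanly.
 */
static int dump_backtraces(const long *pids, size_t npids, FILE *out)
{
    char cmd[64];
    char line[512];
    int failures = 0;
    size_t i;

    for (i = 0; i < npids; i++) {
        FILE *p;

        if (gdb_bt_command(cmd, sizeof(cmd), pids[i]) != 0) {
            failures++;
            continue;
        }
        fprintf(out, "=== pid %ld ===\n", pids[i]);
        fflush(out);
        p = popen(cmd, "r");
        if (p == NULL) {
            failures++;
            continue;
        }
        while (fgets(line, sizeof(line), p) != NULL) {
            fputs(line, out);
        }
        if (pclose(p) != 0) {
            failures++;
        }
    }
    return failures;
}
```

Running gdb in batch mode means it detaches as soon as the backtrace has been printed, so each smbd is paused only briefly.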



    Mac
