[Samba] Weird behaviour when using "kernel oplocks = yes" leading to "corrupt" files - bug in samba? - Samba


  1. [Samba] Weird behaviour when using "kernel oplocks = yes" leading to "corrupt" files - bug in samba?

    Hi folks,

    Today I noticed some strange behaviour when accessing a Samba server
    (samba 3.0.25a) from Windows: On our Debian fileserver I prepared a
    file testfile.txt owned by user usera and group dpt-a. Then I ran
    "setfacl -m g:admins:rwx testfile.txt". User userb, who is only in
    group admins but not in dpt-a, is thus permitted to access and change
    this file by its POSIX ACL, which works flawlessly from Linux.

    $ getfacl testfile.txt
    # file: testfile.txt
    # owner: usera
    # group: dpt-a
    user::rwx
    group::r--
    group:admins:rwx
    mask::rwx
    other::r--
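
To make the expected result concrete, here is a minimal Python sketch of the access-check order described in acl(5), applied to the entries shown above. It is only an illustration: the real check is done by the kernel, and the names Acl and allowed, as well as the extra user userc, are hypothetical and not part of the original report.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    """Simplified model of a POSIX ACL (illustrative only)."""
    owner: str
    group: str
    user_obj: str                      # "user::" entry
    group_obj: str                     # "group::" entry
    other: str                         # "other::" entry
    mask: str = "rwx"                  # "mask::" entry
    named_users: dict = field(default_factory=dict)
    named_groups: dict = field(default_factory=dict)


def allowed(acl, user, groups, wanted):
    """Return True if `user` (member of `groups`) gets all `wanted` bits."""
    def grants(perms, masked):
        effective = set(perms) - {"-"}
        if masked:
            # Named entries and the owning group are limited by the mask.
            effective &= set(acl.mask)
        return set(wanted) <= effective

    if user == acl.owner:
        return grants(acl.user_obj, masked=False)
    if user in acl.named_users:
        return grants(acl.named_users[user], masked=True)

    # Collect every group entry that matches one of the user's groups.
    entries = []
    if acl.group in groups:
        entries.append(acl.group_obj)
    entries += [p for g, p in acl.named_groups.items() if g in groups]
    if entries:
        return any(grants(p, masked=True) for p in entries)

    return grants(acl.other, masked=False)


# The ACL from the getfacl output above.
testfile = Acl(owner="usera", group="dpt-a",
               user_obj="rwx", group_obj="r--", other="r--", mask="rwx",
               named_groups={"admins": "rwx"})

print(allowed(testfile, "userb", {"admins"}, "rw"))   # True
print(allowed(testfile, "userc", {"dpt-a"}, "w"))     # False (hypothetical user)
```

Under this model userb gets read and write access through the group:admins entry, which matches what is observed locally on the server.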


    Then I made some changes to that file from a Windows machine via
    notepad.exe and noticed that Notepad seemed to "succeed" in saving,
    but the changes were *not* written to the file! Very strange IMHO.


    So I did some more digging with strace, since I didn't find a clue in
    the logs.

    "strace -e open,close,write -f smbd -D" yielded:
    [pid 17704] open("foo/testfile.txt", O_RDWR|O_CREAT|O_NOFOLLOW, 0744) = 29
    [some write()s to FD 24]
    [pid 17704] open("foo/testfile.txt", O_WRONLY|O_NOFOLLOW) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 17704] --- SIGIO (I/O possible) @ 0 (0) ---
    [pid 17704] +++ killed by SIGIO +++
    [pid 17478] --- SIGCHLD (Child exited) @ 0 (0) ---

    This seemed to "explain" why Notepad thought the file was saved
    successfully, assuming the SMB protocol does not do "hard checks"
    for successful writes. Since the child serving my Windows session was
    killed, probably no error message was ever sent out.
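
The "killed by SIGIO" line in the trace is what one would expect whenever a process receives SIGIO without having a handler installed, because the default action for SIGIO on Linux is to terminate the process. The following small Python sketch reproduces just that effect with a throwaway child process. It does not involve Samba at all; the helper names run_child and describe are made up for this illustration.

```python
import signal
import subprocess
import sys

# Child that sends SIGIO to itself with the default disposition in place.
UNHANDLED = (
    "import os, signal\n"
    "signal.signal(signal.SIGIO, signal.SIG_DFL)\n"
    "os.kill(os.getpid(), signal.SIGIO)\n"
)

# Same child, but with a do-nothing handler installed first.
HANDLED = (
    "import os, signal\n"
    "signal.signal(signal.SIGIO, lambda signum, frame: None)\n"
    "os.kill(os.getpid(), signal.SIGIO)\n"
)


def run_child(code):
    """Run `code` in a fresh interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode


def describe(returncode):
    """Turn a subprocess return code into an strace-like description."""
    if returncode < 0:
        # subprocess reports death by signal N as return code -N.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


unhandled_rc = run_child(UNHANDLED)
handled_rc = run_child(HANDLED)
print(describe(unhandled_rc))
print(describe(handled_rc))
```

The first child dies from the signal, the second one survives and exits normally, which is the difference between a process that is prepared for the notification and one that is not.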

    When googling for SIGIO and samba, I noticed some hits talking
    about oplocks, so I tried disabling kernel oplocks in smb.conf:
    "kernel oplocks = no". This did the trick: after restarting samba, the
    writes were successful again.


    Since the manpage states I would want oplocks (and I do *g*), I
    enabled them again and tried debugging with gdb (to provide the
    samba-team with a more detailed report). As I don't really know gdb, my
    first attempt failed because samba forks multiple processes which
    were not "caught" by my gdb call (but the error still occurred). As
    the weekend was approaching, I didn't dig further into gdb, but read
    the manpage for smbd and started "gdb /usr/sbin/smbd -F -i". When
    trying to reproduce the error this way, I failed. I could reproduce
    this difference even without gdb: "smbd -F -i -d 5" started from the
    shell did the writes, whereas "normal" smbd (smbd -F) failed to write
    the changes.


    One wild guess: maybe oplocks can only be done by the file owner /
    group owner and the samba-process crashes because of such a thing? Is
    there a difference in privilege-handling between "smbd -F" and "smbd
    -F -i" that could explain this?

    I'd assume this to be a samba bug, because I could reproduce this both
    with a not-so-recent linux-2.6 i386 and with a more recent linux-2.6
    amd64.

    I can provide more debugging output etc. on Monday at the earliest;
    sorry, I forgot to take a log of a "full" strace call and to write
    down the exact kernel versions, which would of course have been very
    useful for you.


    Thanks for your replies and any help in solving this issue,
    Yours
    Matthias Merz

    --
    Beware of bugs in the above code; I have only proved it
    correct, not tried it. (Donald E. Knuth)
    --
    To unsubscribe from this list go to the following URL and read the
    instructions: https://lists.samba.org/mailman/listinfo/samba

  2. Re: [Samba] Weird behaviour when using "kernel oplocks = yes" leading to "corrupt" files - bug in samba?

    On Fri, Jun 01, 2007 at 11:44:29PM +0200, Matthias Merz wrote:
    > Hi folks,
    >
    > Today I noticed some strange behaviour when accessing a samba server
    > (samba 3.0.25a) from windows: On our Debian fileserver I prepared a
    > file testfile.txt being owned by user usera and group dpt-a. Then I
    > "setfacl -m g:admins:rwx testfile.txt". User userb who is only in
    > group admins, but not in dpt-a is thus permitted to access and change
    > this file by its POSIX-ACL, which works flawlessly from linux.
    >
    > $ getfacl testfile.txt
    > # file: testfile.txt
    > # owner: usera
    > # group: dpt-a
    > user::rwx
    > group::r--
    > group:admins:rwx
    > mask::rwx
    > other::r--
    >
    >
    > Then I did some changes to that file from a windows machine via
    > notepad.exe and noticed, that notepad seemed to "succeed" in saving,
    > but the changes were *not* written to that file! Very strange IMHO.
    >
    >
    > So I did some more digging with strace, since I didn't find a clue in
    > the logs.
    >
    > "strace -e open,close,write -f smbd -D" yielded:
    > [pid 17704] open("foo/testfile.txt", O_RDWR|O_CREAT|O_NOFOLLOW, 0744) = 29
    > [some write()s to FD 24]
    > [pid 17704] open("foo/testfile.txt", O_WRONLY|O_NOFOLLOW) = -1 EAGAIN (Resource temporarily unavailable)
    > [pid 17704] --- SIGIO (I/O possible) @ 0 (0) ---
    > [pid 17704] +++ killed by SIGIO +++
    > [pid 17478] --- SIGCHLD (Child exited) @ 0 (0) ---


    This actually looks like an old kernel bug that
    has been fixed - sorry I can't remember the
    version.

    The kernel shouldn't be sending a SIGIO for an
    oplock break; it should be sending a POSIX RT
    signal, defined in the Samba source as
    #define RT_SIGNAL_LEASE (SIGRTMIN+1).
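
For context, this is roughly how a process asks the Linux kernel to deliver lease-break notifications as a realtime signal instead of the default SIGIO, using the fcntl operations F_SETSIG and F_SETLEASE documented in fcntl(2). The Python sketch below is not Samba's code; only the RT_SIGNAL_LEASE value mirrors the #define quoted above, and the function name request_read_lease is hypothetical.

```python
import fcntl
import os
import signal

# Mirrors the define quoted from the Samba source.
RT_SIGNAL_LEASE = signal.SIGRTMIN + 1


def request_read_lease(path):
    """Take a read lease on `path` and route break notices to an RT signal.

    Returns the open file descriptor, or None if the kernel refuses
    (for example: unsupported filesystem, file open for writing
    elsewhere, or the caller is not allowed to lease this file).
    """
    fd = os.open(path, os.O_RDONLY)   # read leases need a read-only fd
    try:
        # Ask for RT_SIGNAL_LEASE instead of the default SIGIO.
        fcntl.fcntl(fd, fcntl.F_SETSIG, RT_SIGNAL_LEASE)
        # Request the lease itself.
        fcntl.fcntl(fd, fcntl.F_SETLEASE, fcntl.F_RDLCK)
    except OSError:
        os.close(fd)
        return None
    return fd
```

A real server would install a handler for RT_SIGNAL_LEASE before requesting the lease, since an unhandled realtime signal terminates the process just as an unhandled SIGIO does.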

    I recall this as a kernel bug that got fixed
    a few months or so ago.

    This isn't a Samba bug IMHO.

    Jeremy.
