[PATCH] VFS: lseek(fd, 0, SEEK_CUR) race condition - Kernel

This is a discussion on [PATCH] VFS: lseek(fd, 0, SEEK_CUR) race condition - Kernel ; This patch fixes a race condition in lseek. While it is expected that unpredictable behaviour may result while repositioning the offset of a file descriptor concurrently with reading/writing to the same file descriptor, this should not happen when merely *reading* ...

+ Reply to Thread
Results 1 to 3 of 3

Thread: [PATCH] VFS: lseek(fd, 0, SEEK_CUR) race condition

  1. [PATCH] VFS: lseek(fd, 0, SEEK_CUR) race condition

    This patch fixes a race condition in lseek. While it is expected that
    unpredictable behaviour may result while repositioning the offset of a
    file descriptor concurrently with reading/writing to the same file
    descriptor, this should not happen when merely *reading* the file
    descriptor's offset.

    Unfortunately, the only portable way in Unix to read a file
    descriptor's offset is lseek(fd, 0, SEEK_CUR); however executing this
    concurrently with read/write may mess up the position, as shown by the
    testcase below:

    #include
    #include
    #include
    #include
    #include
    #include
    #include


    void *loop(void *ptr)
    {
    fprintf(stderr, "Starting seek thread\n");
    while(1) {
    if(lseek(0, 0LL, SEEK_CUR) < 0LL)
    perror("seek");
    }
    }

    int main(int argc, char **argv) {
    long long l=0;
    int r;
    char buf[4096];

    pthread_t thread;
    pthread_create(&thread, 0, loop, 0);

    for(r=0; 1 ; r++) {
    int n = read(0, buf, 4096);
    if(n == 0)
    break;
    if(n < 4096) {
    fprintf(stderr, "Short read %d %s\n", n, strerror(errno));
    }
    l+= n;
    }
    fprintf(stderr, "Read %lld bytes\n", l);

    return 0;
    }

    Compile this and run it on a multi-processor machine as
    ./a.out
    where bigFile is a 1 Gigabyte file. It should print 1073741824.
    However, on a buggy kernel, it usually produces a bigger number. The
    problem only happens on a multiprocessor machine. This is because an
    lseek(fd, 0, SEEK_CUR) running concurrently with a read() or write()
    will reset the position back to what it used to be when the read()
    started.

    This behavior was observed "in the wild" when using udpcast which uses
    lseek to monitor progress of reading/writing the uncompressed data.

    The patch below fixes the issue by "special-casing" the lseek(fd, 0,
    SEEK_CUR) pattern.

    Apparently, an attempt was already made to fix the issue by the
    following code:

    if (offset != file->f_pos) {
    file->f_pos = offset;
    file->f_version = 0;
    }

    However, this doesn't work if file->f_pos was changed (by read() or
    write()) between the time offset was computed, and the time where it
    considers writing it back.

    Signed-off-by: Alain Knaff

    ---

    diff -pur kernel.orig/fs/read_write.c kernel/fs/read_write.c
    --- kernel.orig/fs/read_write.c 2008-10-11 14:12:07.000000000 +0200
    +++ kernel/fs/read_write.c 2008-11-06 19:55:59.000000000 +0100
    @@ -42,6 +42,8 @@ generic_file_llseek_unlocked(struct file
    offset += inode->i_size;
    break;
    case SEEK_CUR:
    + if(offset == 0)
    + return file->f_pos;
    offset += file->f_pos;
    }
    retval = -EINVAL;
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] VFS: lseek(fd, 0, SEEK_CUR) race condition

    On Thu, 6 Nov 2008 21:21:12 +0100
    Alain Knaff wrote:

    > This patch fixes a race condition in lseek. While it is expected that
    > unpredictable behaviour may result while repositioning the offset of a
    > file descriptor concurrently with reading/writing to the same file
    > descriptor, this should not happen when merely *reading* the file
    > descriptor's offset.
    >
    > Unfortunately, the only portable way in Unix to read a file
    > descriptor's offset is lseek(fd, 0, SEEK_CUR); however executing this
    > concurrently with read/write may mess up the position, as shown by the
    > testcase below:
    >
    > #include
    > #include
    > #include
    > #include
    > #include
    > #include
    > #include
    >
    >
    > void *loop(void *ptr)
    > {
    > fprintf(stderr, "Starting seek thread\n");
    > while(1) {
    > if(lseek(0, 0LL, SEEK_CUR) < 0LL)
    > perror("seek");
    > }
    > }
    >
    > int main(int argc, char **argv) {
    > long long l=0;
    > int r;
    > char buf[4096];
    >
    > pthread_t thread;
    > pthread_create(&thread, 0, loop, 0);
    >
    > for(r=0; 1 ; r++) {
    > int n = read(0, buf, 4096);
    > if(n == 0)
    > break;
    > if(n < 4096) {
    > fprintf(stderr, "Short read %d %s\n", n, strerror(errno));
    > }
    > l+= n;
    > }
    > fprintf(stderr, "Read %lld bytes\n", l);
    >
    > return 0;
    > }
    >
    > Compile this and run it on a multi-processor machine as
    > ./a.out >
    > where bigFile is a 1 Gigabyte file. It should print 1073741824.
    > However, on a buggy kernel, it usually produces a bigger number. The
    > problem only happens on a multiprocessor machine. This is because an
    > lseek(fd, 0, SEEK_CUR) running concurrently with a read() or write()
    > will reset the position back to what it used to be when the read()
    > started.
    >
    > This behavior was observed "in the wild" when using udpcast which uses
    > lseek to monitor progress of reading/writing the uncompressed data.
    >
    > The patch below fixes the issue by "special-casing" the lseek(fd, 0,
    > SEEK_CUR) pattern.
    >
    > Apparently, an attempt was already made to fix the issue by the
    > following code:
    >
    > if (offset != file->f_pos) {
    > file->f_pos = offset;
    > file->f_version = 0;
    > }
    >
    > However, this doesn't work if file->f_pos was changed (by read() or
    > write()) between the time offset was computed, and the time where it
    > considers writing it back.
    >
    > Signed-off-by: Alain Knaff
    >
    > ---
    >
    > diff -pur kernel.orig/fs/read_write.c kernel/fs/read_write.c
    > --- kernel.orig/fs/read_write.c 2008-10-11 14:12:07.000000000 +0200
    > +++ kernel/fs/read_write.c 2008-11-06 19:55:59.000000000 +0100
    > @@ -42,6 +42,8 @@ generic_file_llseek_unlocked(struct file
    > offset += inode->i_size;
    > break;
    > case SEEK_CUR:
    > + if(offset == 0)
    > + return file->f_pos;
    > offset += file->f_pos;
    > }
    > retval = -EINVAL;


    OK, I think that a concurrent lseek(fd, 0, SEEK_CUR) is a sufficiently
    sane operation that this is worth doing. As you point out, there is no
    other way of userspace doing what is effectively a read-only operation
    - userspace would be entitled to wonder "ytf did the kernel rewrite the
    file offset for that?".


    Do the below additions look OK?

    From: Andrew Morton

    - fix coding-style
    - fix default_llseek() as well
    - add comments

    Cc: Al Viro
    Cc: Alain Knaff
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    ---

    fs/read_write.c | 19 +++++++++++++++++--
    1 file changed, 17 insertions(+), 2 deletions(-)

    diff -puN fs/read_write.c~vfs-lseekfd-0-seek_cur-race-condition-fix fs/read_write.c
    --- a/fs/read_write.c~vfs-lseekfd-0-seek_cur-race-condition-fix
    +++ a/fs/read_write.c
    @@ -50,6 +50,14 @@ generic_file_llseek_unlocked(struct file
    offset += inode->i_size;
    break;
    case SEEK_CUR:
    + /*
    + * Here we special-case the lseek(fd, 0, SEEK_CUR)
    + * position-querying operation. Avoid rewriting the "same"
    + * f_pos value back to the file because a concurrent read(),
    + * write() or lseek() might have altered it
    + */
    + if (offset == 0)
    + return file->f_pos;
    offset += file->f_pos;
    break;
    }
    @@ -105,8 +113,14 @@ loff_t default_llseek(struct file *file,
    offset += i_size_read(file->f_path.dentry->d_inode);
    break;
    case SEEK_CUR:
    - if(offset == 0)
    - return file->f_pos;
    + /*
    + * See SEEK_CUR description in
    + * generic_file_llseek_unlocked()
    + */
    + if (offset == 0) {
    + retval = file->f_pos;
    + goto out;
    + }
    offset += file->f_pos;
    }
    retval = -EINVAL;
    @@ -117,6 +131,7 @@ loff_t default_llseek(struct file *file,
    }
    retval = offset;
    }
    +out:
    unlock_kernel();
    return retval;
    }
    _

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] VFS: lseek(fd, 0, SEEK_CUR) race condition

    Andrew Morton wrote:
    [...]
    > OK, I think that a concurrent lseek(fd, 0, SEEK_CUR) is a sufficiently
    > sane operation that this is worth doing. As you point out, there is no
    > other way of userspace doing what is effectively a read-only operation
    > - userspace would be entitled to wonder "ytf did the kernel rewrite the
    > file offset for that?".
    >
    >
    > Do the below additions look OK?


    Yes, these additions look ok to me. Thanks for your quick and easy reply.

    > From: Andrew Morton
    >
    > - fix coding-style
    > - fix default_llseek() as well


    Wow, that one was missing an unlock_kernel()... I'm wondering why this
    hadn't let to some major SNAFU...

    > - add comments


    Thanks,

    Alain
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread