[PATCH 0/3][RFC] ioctl dispatcher - Kernel

This is a discussion on [PATCH 0/3][RFC] ioctl dispatcher - Kernel ; While ioctls are officially ugly, they are the best choice for some use cases, not to mention compatibility issues. Currently ioctl writers face the following hurdles: - if the ioctl uses a data buffer, the ioctl handler must allocate kernel ...

+ Reply to Thread
Results 1 to 7 of 7

Thread: [PATCH 0/3][RFC] ioctl dispatcher

  1. [PATCH 0/3][RFC] ioctl dispatcher

    While ioctls are officially ugly, they are the best choice for some use
    cases, not to mention compatibility issues. Currently ioctl writers face
    the following hurdles:

    - if the ioctl uses a data buffer, the ioctl handler must allocate
    kernel memory for this buffer
    - the memory may be allocated on the heap or on the stack, depending on the
    buffer size
    - handle any errors from the operation
    - copy the data from userspace, if necessary
    - handle any errors from the operation
    - actually perform the operation
    - copy the data back to userspace, if necessary
    - handle any errors from the operation
    - free the buffer, if allocated from the heap

    The first patch automates these operations, only requiring the caller to
    supply the ioctl number and a callback in a table.

    The second patch addresses another problem with ioctls: they are brittle.
    Once written, an ioctl cannot be extended, since the buffer sizes used for
    transferring data are encoded in the ioctl number. This is addressed by
    allowing the user-supplied size and the kernel-visible size of the data
    buffer to be different; the kernel will zero fill or truncate appropriately.
    With the new mechanism, it is easy to write forward- and backward- compatible
    ioctl handlers.

    The third patch demonstrates the effectiveness of the first patch; it
    converts some of kvm's ioctl handlers to the new mechanism, removing
    around 90 lines in the process.

    Comments welcome.

    Avi Kivity (3):
    ioctl: generic ioctl dispatcher
    ioctl: extensible ioctl dispatch
    KVM: Convert x86 vcpu ioctls to use dispatch_ioctl_extensible()

    arch/x86/kvm/x86.c | 241 ++++++++++++++++---------------------------------
    fs/ioctl.c | 139 ++++++++++++++++++++++++++++
    include/linux/ioctl.h | 37 ++++++++
    3 files changed, 253 insertions(+), 164 deletions(-)

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. [PATCH 3/3] KVM: Convert x86 vcpu ioctls to use dispatch_ioctl_extensible()

    Signed-off-by: Avi Kivity
    ---
    arch/x86/kvm/x86.c | 241 +++++++++++++++++-----------------------------------
    1 files changed, 77 insertions(+), 164 deletions(-)

    diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
    index 4cfdd1b..aa9d228 100644
    --- a/arch/x86/kvm/x86.c
    +++ b/arch/x86/kvm/x86.c
    @@ -1084,26 +1084,24 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
    *
    * @return number of msrs set successfully.
    */
    -static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
    +static int msr_io(const struct ioctl_arg *iarg,
    int (*do_msr)(struct kvm_vcpu *vcpu,
    - unsigned index, u64 *data),
    - int writeback)
    + unsigned index, u64 *data))
    {
    - struct kvm_msrs msrs;
    + struct kvm_vcpu *vcpu = iarg->file->private_data;
    + struct kvm_msrs *msrs = iarg->argp;
    + struct kvm_msrs __user *user_msrs =
    + (void __user *)iarg->argl + _IOC_SIZE(iarg->cmd);
    struct kvm_msr_entry *entries;
    - int r, n;
    + int r;
    unsigned size;

    - r = -EFAULT;
    - if (copy_from_user(&msrs, user_msrs, sizeof msrs))
    - goto out;
    -
    r = -E2BIG;
    - if (msrs.nmsrs >= MAX_IO_MSRS)
    + if (msrs->nmsrs >= MAX_IO_MSRS)
    goto out;

    r = -ENOMEM;
    - size = sizeof(struct kvm_msr_entry) * msrs.nmsrs;
    + size = sizeof(struct kvm_msr_entry) * msrs->nmsrs;
    entries = vmalloc(size);
    if (!entries)
    goto out;
    @@ -1112,22 +1110,26 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
    if (copy_from_user(entries, user_msrs->entries, size))
    goto out_free;

    - r = n = __msr_io(vcpu, &msrs, entries, do_msr);
    + r = __msr_io(vcpu, msrs, entries, do_msr);
    if (r < 0)
    goto out_free;

    - r = -EFAULT;
    - if (writeback && copy_to_user(user_msrs->entries, entries, size))
    - goto out_free;
    -
    - r = n;
    -
    out_free:
    vfree(entries);
    out:
    return r;
    }

    +static long kvm_vcpu_ioctl_get_msrs(const struct ioctl_arg *iarg)
    +{
    + return msr_io(iarg, kvm_get_msr);
    +}
    +
    +static long kvm_vcpu_ioctl_set_msrs(const struct ioctl_arg *iarg)
    +{
    + return msr_io(iarg, do_set_msr);
    +}
    +
    int kvm_dev_ioctl_check_extension(long ext)
    {
    int r;
    @@ -1271,10 +1273,12 @@ static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu)
    }

    /* when an old userspace process fills a new kernel module */
    -static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
    - struct kvm_cpuid *cpuid,
    - struct kvm_cpuid_entry __user *entries)
    +static long kvm_vcpu_ioctl_set_cpuid(const struct ioctl_arg *iarg)
    {
    + struct kvm_vcpu *vcpu = iarg->file->private_data;
    + struct kvm_cpuid *cpuid = iarg->argp;
    + struct kvm_cpuid_entry __user *entries
    + = (void __user *)iarg->argl + _IOC_SIZE(iarg->cmd);
    int r, i;
    struct kvm_cpuid_entry *cpuid_entries;

    @@ -1311,10 +1315,12 @@ out:
    return r;
    }

    -static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
    - struct kvm_cpuid2 *cpuid,
    - struct kvm_cpuid_entry2 __user *entries)
    +static long kvm_vcpu_ioctl_set_cpuid2(const struct ioctl_arg *iarg)
    {
    + struct kvm_vcpu *vcpu = iarg->file->private_data;
    + struct kvm_cpuid2 *cpuid = iarg->argp;
    + struct kvm_cpuid_entry2 __user *entries
    + = (void __user *)iarg->argl + _IOC_SIZE(iarg->cmd);
    int r;

    r = -E2BIG;
    @@ -1331,10 +1337,12 @@ out:
    return r;
    }

    -static int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
    - struct kvm_cpuid2 *cpuid,
    - struct kvm_cpuid_entry2 __user *entries)
    +static long kvm_vcpu_ioctl_get_cpuid2(const struct ioctl_arg *iarg)
    {
    + struct kvm_vcpu *vcpu = iarg->file->private_data;
    + struct kvm_cpuid2 *cpuid = iarg->argp;
    + struct kvm_cpuid_entry2 __user *entries
    + = (void __user *)iarg->argl + _IOC_SIZE(iarg->cmd);
    int r;

    r = -E2BIG;
    @@ -1513,19 +1521,22 @@ out:
    return r;
    }

    -static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
    - struct kvm_lapic_state *s)
    +static long kvm_vcpu_ioctl_get_lapic(const struct ioctl_arg *iarg)
    {
    + struct kvm_vcpu *vcpu = iarg->file->private_data;
    + struct kvm_lapic_state *s = iarg->argp;
    +
    vcpu_load(vcpu);
    memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s);
    vcpu_put(vcpu);
    -
    return 0;
    }

    -static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
    - struct kvm_lapic_state *s)
    +static long kvm_vcpu_ioctl_set_lapic(const struct ioctl_arg *iarg)
    {
    + struct kvm_vcpu *vcpu = iarg->file->private_data;
    + struct kvm_lapic_state *s = iarg->argp;
    +
    vcpu_load(vcpu);
    memcpy(vcpu->arch.apic->regs, s->regs, sizeof *s);
    kvm_apic_post_state_restore(vcpu);
    @@ -1534,9 +1545,11 @@ static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
    return 0;
    }

    -static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
    - struct kvm_interrupt *irq)
    +static long kvm_vcpu_ioctl_interrupt(const struct ioctl_arg *iarg)
    {
    + struct kvm_vcpu *vcpu = iarg->file->private_data;
    + struct kvm_interrupt *irq = iarg->argp;
    +
    if (irq->irq < 0 || irq->irq >= 256)
    return -EINVAL;
    if (irqchip_in_kernel(vcpu->kvm))
    @@ -1551,148 +1564,48 @@ static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
    return 0;
    }

    -static int vcpu_ioctl_tpr_access_reporting(struct kvm_vcpu *vcpu,
    - struct kvm_tpr_access_ctl *tac)
    +static long vcpu_ioctl_tpr_access_reporting(const struct ioctl_arg *iarg)
    {
    + struct kvm_vcpu *vcpu = iarg->file->private_data;
    + struct kvm_tpr_access_ctl *tac = iarg->argp;
    +
    if (tac->flags)
    return -EINVAL;
    vcpu->arch.tpr_access_reporting = !!tac->enabled;
    return 0;
    }

    -long kvm_arch_vcpu_ioctl(struct file *filp,
    - unsigned int ioctl, unsigned long arg)
    +static long vcpu_ioctl_set_vapic_addr(const struct ioctl_arg *iarg)
    {
    - struct kvm_vcpu *vcpu = filp->private_data;
    - void __user *argp = (void __user *)arg;
    - int r;
    - struct kvm_lapic_state *lapic = NULL;
    -
    - switch (ioctl) {
    - case KVM_GET_LAPIC: {
    - lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
    -
    - r = -ENOMEM;
    - if (!lapic)
    - goto out;
    - r = kvm_vcpu_ioctl_get_lapic(vcpu, lapic);
    - if (r)
    - goto out;
    - r = -EFAULT;
    - if (copy_to_user(argp, lapic, sizeof(struct kvm_lapic_state)))
    - goto out;
    - r = 0;
    - break;
    - }
    - case KVM_SET_LAPIC: {
    - lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
    - r = -ENOMEM;
    - if (!lapic)
    - goto out;
    - r = -EFAULT;
    - if (copy_from_user(lapic, argp, sizeof(struct kvm_lapic_state)))
    - goto out;
    - r = kvm_vcpu_ioctl_set_lapic(vcpu, lapic);
    - if (r)
    - goto out;
    - r = 0;
    - break;
    - }
    - case KVM_INTERRUPT: {
    - struct kvm_interrupt irq;
    -
    - r = -EFAULT;
    - if (copy_from_user(&irq, argp, sizeof irq))
    - goto out;
    - r = kvm_vcpu_ioctl_interrupt(vcpu, &irq);
    - if (r)
    - goto out;
    - r = 0;
    - break;
    - }
    - case KVM_SET_CPUID: {
    - struct kvm_cpuid __user *cpuid_arg = argp;
    - struct kvm_cpuid cpuid;
    -
    - r = -EFAULT;
    - if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid))
    - goto out;
    - r = kvm_vcpu_ioctl_set_cpuid(vcpu, &cpuid, cpuid_arg->entries);
    - if (r)
    - goto out;
    - break;
    - }
    - case KVM_SET_CPUID2: {
    - struct kvm_cpuid2 __user *cpuid_arg = argp;
    - struct kvm_cpuid2 cpuid;
    + struct kvm_vcpu *vcpu = iarg->file->private_data;
    + struct kvm_vapic_addr *va = iarg->argp;

    - r = -EFAULT;
    - if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid))
    - goto out;
    - r = kvm_vcpu_ioctl_set_cpuid2(vcpu, &cpuid,
    - cpuid_arg->entries);
    - if (r)
    - goto out;
    - break;
    - }
    - case KVM_GET_CPUID2: {
    - struct kvm_cpuid2 __user *cpuid_arg = argp;
    - struct kvm_cpuid2 cpuid;
    + if (!irqchip_in_kernel(vcpu->kvm))
    + return -EINVAL;

    - r = -EFAULT;
    - if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid))
    - goto out;
    - r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
    - cpuid_arg->entries);
    - if (r)
    - goto out;
    - r = -EFAULT;
    - if (copy_to_user(cpuid_arg, &cpuid, sizeof cpuid))
    - goto out;
    - r = 0;
    - break;
    - }
    - case KVM_GET_MSRS:
    - r = msr_io(vcpu, argp, kvm_get_msr, 1);
    - break;
    - case KVM_SET_MSRS:
    - r = msr_io(vcpu, argp, do_set_msr, 0);
    - break;
    - case KVM_TPR_ACCESS_REPORTING: {
    - struct kvm_tpr_access_ctl tac;
    + kvm_lapic_set_vapic_addr(vcpu, va->vapic_addr);
    + return 0;
    +}

    - r = -EFAULT;
    - if (copy_from_user(&tac, argp, sizeof tac))
    - goto out;
    - r = vcpu_ioctl_tpr_access_reporting(vcpu, &tac);
    - if (r)
    - goto out;
    - r = -EFAULT;
    - if (copy_to_user(argp, &tac, sizeof tac))
    - goto out;
    - r = 0;
    - break;
    - };
    - case KVM_SET_VAPIC_ADDR: {
    - struct kvm_vapic_addr va;
    +static const struct ioctl_handler vcpu_ioctl_handlers[] = {
    + { KVM_GET_LAPIC, kvm_vcpu_ioctl_get_lapic },
    + { KVM_SET_LAPIC, kvm_vcpu_ioctl_set_lapic },
    + { KVM_INTERRUPT, kvm_vcpu_ioctl_interrupt },
    + { KVM_SET_CPUID, kvm_vcpu_ioctl_set_cpuid },
    + { KVM_SET_CPUID2, kvm_vcpu_ioctl_set_cpuid2 },
    + { KVM_GET_CPUID2, kvm_vcpu_ioctl_get_cpuid2 },
    + { KVM_GET_MSRS, kvm_vcpu_ioctl_get_msrs },
    + { KVM_SET_MSRS, kvm_vcpu_ioctl_set_msrs },
    + { KVM_TPR_ACCESS_REPORTING, vcpu_ioctl_tpr_access_reporting },
    + { KVM_SET_VAPIC_ADDR, vcpu_ioctl_set_vapic_addr },
    + { }
    +};

    - r = -EINVAL;
    - if (!irqchip_in_kernel(vcpu->kvm))
    - goto out;
    - r = -EFAULT;
    - if (copy_from_user(&va, argp, sizeof va))
    - goto out;
    - r = 0;
    - kvm_lapic_set_vapic_addr(vcpu, va.vapic_addr);
    - break;
    - }
    - default:
    - r = -EINVAL;
    - }
    -out:
    - if (lapic)
    - kfree(lapic);
    - return r;
    +long kvm_arch_vcpu_ioctl(struct file *filp,
    + unsigned int ioctl, unsigned long arg)
    +{
    + return dispatch_ioctl_extensible(NULL, filp, ioctl, arg,
    + vcpu_ioctl_handlers, NULL);
    }

    static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, unsigned long addr)
    --
    1.6.0.1

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. [PATCH 2/3] ioctl: extensible ioctl dispatch

    ioctls (as traditionally implemented) are very brittle; one cannot add
    a field to the argument structure without breaking the ABI. In many
    cases, this is a good thing as the ABI was not designed for extension.

    In other cases, however, it makes sense to design an ABI with extension
    in mind. This patch provides a new ioctl_dispatch_extensible(), which
    is designed to cope with evolving interfaces:

    - the ioctl number is matched only by its number and type, allowing
    size and direction to change
    - buffer space is allocated for the size the kernel expects, even if
    the user passed a smaller buffer
    - copying is limited by the size the kernel expects, even if the user
    passed a larger buffer
    - buffer size mismatches are handled by zeroing

    The ABI can account for new fields either by recognizing zero as a "missing"
    value, or by implementing bitflags for feature availability. Both an old
    kernel with new userspace and a new kernel with old userspace are accomodated.

    Signed-off-by: Avi Kivity
    ---
    fs/ioctl.c | 87 +++++++++++++++++++++++++++++++++++++++++-------
    include/linux/ioctl.h | 4 ++
    2 files changed, 78 insertions(+), 13 deletions(-)

    diff --git a/fs/ioctl.c b/fs/ioctl.c
    index f63d5ce..5a354fc 100644
    --- a/fs/ioctl.c
    +++ b/fs/ioctl.c
    @@ -225,12 +225,13 @@ asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
    * Locates and calls ioctl handler in @handlers; if none exist, calls
    * @fallback; if that does not exist, return -ENOTTY.
    */
    -long dispatch_ioctl(struct inode *inode, struct file *filp,
    - unsigned cmd, unsigned long arg,
    - const struct ioctl_handler *handlers,
    - long (*fallback)(const struct ioctl_arg *arg))
    +static long __dispatch_ioctl(struct inode *inode, struct file *filp,
    + unsigned cmd, unsigned long arg,
    + const struct ioctl_handler *handlers,
    + long (*fallback)(const struct ioctl_arg *arg),
    + unsigned cmd_mask)
    {
    - unsigned dir, size;
    + unsigned dir, size, usize, copysize;
    long ret;
    long stackbuf[IOCTL_STACK_BUF / sizeof(long)];
    struct ioctl_arg iarg;
    @@ -242,7 +243,7 @@ long dispatch_ioctl(struct inode *inode, struct file *filp,
    * used (and performance sensitive) items are in the front.
    */
    for (; handlers->handler; ++handlers)
    - if (handlers->cmd == cmd) {
    + if ((handlers->cmd & cmd_mask) == (cmd & cmd_mask)) {
    handler = handlers->handler;
    break;
    }
    @@ -255,10 +256,14 @@ long dispatch_ioctl(struct inode *inode, struct file *filp,
    iarg.inode = inode;
    iarg.argl = arg;

    - size = 0;
    + size = usize = 0;
    dir = _IOC_DIR(cmd);
    - if (dir != _IOC_NONE)
    - size = _IOC_SIZE(cmd);
    + if (dir != _IOC_NONE) {
    + size = _IOC_SIZE(handlers->cmd);
    + usize = _IOC_SIZE(cmd);
    + if (!handlers->handler)
    + size = usize;
    + }

    iarg.argp = stackbuf;
    if (size > sizeof(stackbuf)) {
    @@ -266,18 +271,28 @@ long dispatch_ioctl(struct inode *inode, struct file *filp,
    if (!iarg.argp)
    return -ENOMEM;
    }
    - if (dir & _IOC_WRITE)
    - if (copy_from_user(iarg.argp, (void __user *)arg, size))
    +
    + if (dir & _IOC_WRITE) {
    + copysize = min(size, usize);
    + if (copy_from_user(iarg.argp, (void __user *)arg, copysize))
    goto out_fault;
    + if (copysize < size)
    + memset(iarg.argp + copysize, 0, size - copysize);
    + }

    ret = handler(&iarg);

    if (ret < 0)
    goto out;

    - if (dir & _IOC_READ)
    - if (copy_to_user((void __user *)arg, iarg.argp, size))
    + if (dir & _IOC_READ) {
    + copysize = min(size, usize);
    + if (copy_to_user((void __user *)arg, iarg.argp, copysize))
    goto out_fault;
    + for (; copysize < usize; ++copysize)
    + if (put_user(0, (u8 __user *)iarg.argp + copysize))
    + goto out_fault;
    + }
    out:
    if (iarg.argp != stackbuf)
    kfree(iarg.argp);
    @@ -288,4 +303,50 @@ out_fault:
    ret = -EFAULT;
    goto out;
    }
    +
    +/**
    + * dispatch_ioctl - dispatch to an ioctl handler based on ioctl number
    + * @inode: inode to invoke ioctl method on
    + * @filp: open file to invoke ioctl method on
    + * @cmd: ioctl command to execute
    + * @arg: command-specific argument for ioctl
    + * @handlers: list of potential handlers for @cmd; null terminated;
    + * place frequently used cmds first
    + * @fallback: optional function to call if @cmd not found in @handlers
    + *
    + * Locates and calls ioctl handler in @handlers; if none exist, calls
    + * @fallback; if that does not exist, return -ENOTTY.
    + */
    +long dispatch_ioctl(struct inode *inode, struct file *filp,
    + unsigned cmd, unsigned long arg,
    + const struct ioctl_handler *handlers,
    + long (*fallback)(const struct ioctl_arg *arg))
    +{
    + return __dispatch_ioctl(inode, filp, cmd, arg, handlers, fallback, ~0);
    +}
    EXPORT_SYMBOL_GPL(dispatch_ioctl);
    +
    +/**
    + * dispatch_ioctl_extensible - dispatch to an ioctl handler based on ioctl
    + * number; adjust for differing buffer sizes
    + * between user and kernel
    + * @inode: inode to invoke ioctl method on
    + * @filp: open file to invoke ioctl method on
    + * @cmd: ioctl command to execute
    + * @arg: command-specific argument for ioctl
    + * @handlers: list of potential handlers for @cmd; null terminated;
    + * place frequently used cmds first
    + * @fallback: optional function to call if @cmd not found in @handlers
    + *
    + * Locates and calls ioctl handler in @handlers; if none exist, calls
    + * @fallback; if that does not exist, return -ENOTTY.
    + */
    +long dispatch_ioctl_extensible(struct inode *inode, struct file *filp,
    + unsigned cmd, unsigned long arg,
    + const struct ioctl_handler *handlers,
    + long (*fallback)(const struct ioctl_arg *arg))
    +{
    + return __dispatch_ioctl(inode, filp, cmd, arg, handlers, fallback,
    + ~(_IOC_SIZEMASK << _IOC_SIZESHIFT));
    +}
    +EXPORT_SYMBOL_GPL(dispatch_ioctl_extensible);
    diff --git a/include/linux/ioctl.h b/include/linux/ioctl.h
    index 881974a..5e73fbf 100644
    --- a/include/linux/ioctl.h
    +++ b/include/linux/ioctl.h
    @@ -33,6 +33,10 @@ long dispatch_ioctl(struct inode *inode, struct file *filp,
    unsigned int cmd, unsigned long arg,
    const struct ioctl_handler *handlers,
    long (*fallback)(const struct ioctl_arg *arg));
    +long dispatch_ioctl_extensible(struct inode *inode, struct file *filp,
    + unsigned int cmd, unsigned long arg,
    + const struct ioctl_handler *handlers,
    + long (*fallback)(const struct ioctl_arg *arg));

    #endif

    --
    1.6.0.1

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. [PATCH 1/3] ioctl: generic ioctl dispatcher

    ioctl handler writers currently have the following tasks:

    - allocate a buffer for the data, either in memory or on the heap, depending
    on the buffer size
    - handle errors
    - copy the data from userspace
    - handle errors
    - actually perform the intended operation
    - copy data back to userspace
    - handle errors
    - free the buffer, in case it was allocated on the heap

    This patch provides a dispatch_ioctl() helper, which will do all of these
    things for the caller, except for performing the intended operation. The
    caller need only provide a list of ioctl commands and handler functions.

    Signed-off-by: Avi Kivity
    ---
    fs/ioctl.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++
    include/linux/ioctl.h | 33 ++++++++++++++++++++
    2 files changed, 111 insertions(+), 0 deletions(-)

    diff --git a/fs/ioctl.c b/fs/ioctl.c
    index 7db32b3..f63d5ce 100644
    --- a/fs/ioctl.c
    +++ b/fs/ioctl.c
    @@ -211,3 +211,81 @@ asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
    out:
    return error;
    }
    +
    +/**
    + * dispatch_ioctl - dispatch to an ioctl handler based on ioctl number
    + * @inode: inode to invoke ioctl method on
    + * @filp: open file to invoke ioctl method on
    + * @cmd: ioctl command to execute
    + * @arg: command-specific argument for ioctl
    + * @handlers: list of potential handlers for @cmd; null terminated;
    + * place frequently used cmds first
    + * @fallback: optional function to call if @cmd not found in @handlers
    + *
    + * Locates and calls ioctl handler in @handlers; if none exist, calls
    + * @fallback; if that does not exist, return -ENOTTY.
    + */
    +long dispatch_ioctl(struct inode *inode, struct file *filp,
    + unsigned cmd, unsigned long arg,
    + const struct ioctl_handler *handlers,
    + long (*fallback)(const struct ioctl_arg *arg))
    +{
    + unsigned dir, size;
    + long ret;
    + long stackbuf[IOCTL_STACK_BUF / sizeof(long)];
    + struct ioctl_arg iarg;
    + long (*handler)(const struct ioctl_arg *arg) = fallback;
    +
    + /*
    + * Yes, we use a linear search. That's actually the fastest
    + * algorithm around if the list is small, or if the frequently
    + * used (and performance sensitive) items are in the front.
    + */
    + for (; handlers->handler; ++handlers)
    + if (handlers->cmd == cmd) {
    + handler = handlers->handler;
    + break;
    + }
    +
    + if (!handler)
    + return -ENOTTY;
    +
    + iarg.cmd = cmd;
    + iarg.file = filp;
    + iarg.inode = inode;
    + iarg.argl = arg;
    +
    + size = 0;
    + dir = _IOC_DIR(cmd);
    + if (dir != _IOC_NONE)
    + size = _IOC_SIZE(cmd);
    +
    + iarg.argp = stackbuf;
    + if (size > sizeof(stackbuf)) {
    + iarg.argp = kmalloc(size, GFP_KERNEL);
    + if (!iarg.argp)
    + return -ENOMEM;
    + }
    + if (dir & _IOC_WRITE)
    + if (copy_from_user(iarg.argp, (void __user *)arg, size))
    + goto out_fault;
    +
    + ret = handler(&iarg);
    +
    + if (ret < 0)
    + goto out;
    +
    + if (dir & _IOC_READ)
    + if (copy_to_user((void __user *)arg, iarg.argp, size))
    + goto out_fault;
    +out:
    + if (iarg.argp != stackbuf)
    + kfree(iarg.argp);
    +
    + return ret;
    +
    +out_fault:
    + ret = -EFAULT;
    + goto out;
    +}
    +EXPORT_SYMBOL_GPL(dispatch_ioctl);
    diff --git a/include/linux/ioctl.h b/include/linux/ioctl.h
    index aa91eb3..881974a 100644
    --- a/include/linux/ioctl.h
    +++ b/include/linux/ioctl.h
    @@ -3,5 +3,38 @@

    #include

    +#ifdef __KERNEL__
    +
    +struct inode;
    +struct file;
    +
    +/*
    + * for dispatch_ioctl(), how many bytes we allow to be allocated on stack.
    + * Arch may override.
    + */
    +#ifndef IOCTL_STACK_BUF
    +#define IOCTL_STACK_BUF 128
    +#endif
    +
    +struct ioctl_arg {
    + unsigned cmd;
    + struct file *file;
    + struct inode *inode;
    + long argl;
    + void *argp;
    +};
    +
    +struct ioctl_handler {
    + unsigned cmd;
    + long (*handler)(const struct ioctl_arg *arg);
    +};
    +
    +long dispatch_ioctl(struct inode *inode, struct file *filp,
    + unsigned int cmd, unsigned long arg,
    + const struct ioctl_handler *handlers,
    + long (*fallback)(const struct ioctl_arg *arg));
    +
    +#endif
    +
    #endif /* _LINUX_IOCTL_H */

    --
    1.6.0.1

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH 0/3][RFC] ioctl dispatcher

    On Sat, 27 Sep 2008 18:43:59 +0300
    Avi Kivity wrote:

    > While ioctls are officially ugly, they are the best choice for some
    > use cases, not to mention compatibility issues. Currently ioctl
    > writers face the following hurdles:
    >
    > - if the ioctl uses a data buffer, the ioctl handler must allocate
    > kernel memory for this buffer
    > - the memory may be allocated on the heap or on the stack,
    > depending on the buffer size
    > - handle any errors from the operation
    > - copy the data from userspace, if necessary
    > - handle any errors from the operation
    > - actually perform the operation
    > - copy the data back to userspace, if necessary
    > - handle any errors from the operation
    > - free the buffer, if allocated from the heap
    >
    > The first patch automates these operations, only requiring the caller
    > to supply the ioctl number and a callback in a table.
    >
    > The second patch addresses another problem with ioctls: they are
    > brittle. Once written, an ioctl cannot be extended, since the buffer
    > sizes used for transferring data are encoded in the ioctl number.
    > This is addressed by allowing the user-supplied size and the
    > kernel-visible size of the data buffer to be different; the kernel
    > will zero fill or truncate appropriately. With the new mechanism, it
    > is easy to write forward- and backward- compatible ioctl handlers.
    >
    > The third patch demonstrates the effectiveness of the first patch; it
    > converts some of kvm's ioctl handlers to the new mechanism, removing
    > around 90 lines in the process.
    >


    this doesn't seem to be much different from the way the DRM code deals
    with ioctls. Or the V4L code.
    Personally I hate that code though.....

    There is a fine balance here; between driver writers screwing something
    up they shouldn't be doing in the first place and us being able to
    clearly see what the code is doing; your patch kinda hides some key
    elements of the ioctl path... I'm afraid it gives a false sense of
    security though. Not having to deal with one aspect of security just to
    have to deal with the rest....
    Lets put it this way: if the driver author has to type "copy_from_user"
    there's a chance that he'll remember that the data comes from the user
    and isn't to be trusted on face value.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH 0/3][RFC] ioctl dispatcher

    Arjan van de Ven wrote:
    >
    >> While ioctls are officially ugly, they are the best choice for some
    >> use cases, not to mention compatibility issues. Currently ioctl
    >> writers face the following hurdles:
    >>
    >> - if the ioctl uses a data buffer, the ioctl handler must allocate
    >> kernel memory for this buffer
    >> - the memory may be allocated on the heap or on the stack,
    >> depending on the buffer size
    >> - handle any errors from the operation
    >> - copy the data from userspace, if necessary
    >> - handle any errors from the operation
    >> - actually perform the operation
    >> - copy the data back to userspace, if necessary
    >> - handle any errors from the operation
    >> - free the buffer, if allocated from the heap
    >>
    >> The first patch automates these operations, only requiring the caller
    >> to supply the ioctl number and a callback in a table.
    >>
    >>

    > this doesn't seem to be much different from the way the DRM code deals
    > with ioctls. Or the V4L code.
    > Personally I hate that code though.....
    >
    > There is a fine balance here; between driver writers screwing something
    > up they shouldn't be doing in the first place and us being able to
    > clearly see what the code is doing; your patch kinda hides some key
    > elements of the ioctl path...


    Which key elements?

    It hides the big switch (ioctl), memory allocation, and error handling,
    and exposes the actual ioctl-specific code, which I thought was the key
    element.

    Why are we interested in boilerplate?

    > I'm afraid it gives a false sense of
    > security though. Not having to deal with one aspect of security just to
    > have to deal with the rest....
    >


    It reduces the number of potential mistakes a driver author can make.

    > Lets put it this way: if the driver author has to type "copy_from_user"
    > there's a chance that he'll remember that the data comes from the user
    > and isn't to be trusted on face value.
    >


    I'll rename the argp variable to argp_user_supplied.

    I can't believe you think writing the copy code from scratch (or worse,
    copy/paste) each time helps security.

    --
    I have a truly marvellous patch that fixes the bug which this
    signature is too narrow to contain.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH 1/3] ioctl: generic ioctl dispatcher

    Andi Kleen wrote:
    > Avi Kivity writes:
    >
    >
    >> +long dispatch_ioctl(struct inode *inode, struct file *filp,
    >> + unsigned cmd, unsigned long arg,
    >> + const struct ioctl_handler *handlers,
    >> + long (*fallback)(const struct ioctl_arg *arg))
    >>

    >
    > The basic idea is good, but i don't like the proliferation of callbacks,
    > which tends to make complicated code and is ugly for simple code
    > (which a lot of ioctls are)
    >
    >


    If the simple calls mostly don't use the argument as a pointer, they are
    better off using a plain switch. For my own code, I usually leave the
    boilerplate within the switch and the app-specific code in a separate
    function anyway, so there's no big change in style.

    The main motivation here was the extensibility (patch 2), which becomes
    much more difficult with a switch.

    > How about you make it return an number that can index a switch() instead?
    > Then everything could be still kept in the same function.
    >
    >


    We need to execute code both before and after the handler, so it would
    look pretty ugly:

    long my_ioctl_handler(...)
    {
    struct ioctl_arg iarg;
    ...
    long ret;

    ret = dispatch_ioctl_begin(&iarg, ...);
    if (ret < 0)
    return ret;
    switch (ret) {
    case _IOC_KEY(MY_IOCTL):
    // your stuff goes here
    break;
    ...
    }
    dispatch_ioctl_end(&iarg, ret);
    return ret;
    }

    The only clean way to do this without callbacks is with
    constructors/destructors, but we don't have those in C.

    --
    I have a truly marvellous patch that fixes the bug which this
    signature is too narrow to contain.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread