[RFC] cgroups: implement device whitelist lsm (v2) - Kernel

This is a discussion on [RFC] cgroups: implement device whitelist lsm (v2) - Kernel ; Implement a cgroup using the LSM interface to enforce mknod and open on device files. This implements a simple device access whitelist. A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block). 'all' means it ...

+ Reply to Thread
Page 1 of 2 1 2 LastLast
Results 1 to 20 of 31

Thread: [RFC] cgroups: implement device whitelist lsm (v2)

  1. [RFC] cgroups: implement device whitelist lsm (v2)

    Implement a cgroup using the LSM interface to enforce mknod and open
    on device files.

    This implements a simple device access whitelist. A whitelist entry
    has 4 fields. 'type' is a (all), c (char), or b (block). 'all' means it
    applies to all types, all major numbers, and all minor numbers. Major and
    minor are obvious. Access is a composition of r (read), w (write), and
    m (mknod).

    The root devcgroup starts with rwm to 'all'. A child devcg gets a copy
    of the parent. Admins can then add and remove devices to the whitelist.
    Once CAP_HOST_ADMIN is introduced it will be needed to add entries as
    well or remove entries from another cgroup, though just CAP_SYS_ADMIN
    will suffice to remove entries for your own group.

    An entry is added by doing "echo " > devcg.allow,
    for instance:

    echo b 7 0 mrw > /cgroups/1/devcg.allow

    An entry is removed by doing likewise into devcg.deny. Since this is a
    pure whitelist, not acls, you can only remove entries which exist in the
    whitelist. You must explicitly

    echo a 0 0 mrw > /cgroups/1/devcg.deny

    to remove the "allow all" entry which is automatically inherited from
    the root cgroup.

    While composing this with the ns_cgroup may seem logical, it is not
    the right thing to do, because updates to /cg/cg1/devcg.deny are
    not reflected in /cg/cg1/cg2/devcg.allow.

    A task may only be moved to another devcgroup if it is moving to
    a direct descendent of its current devcgroup.

    CAP_NS_OVERRIDE is defined as the capability needed to cross namespaces.
    A task needs both CAP_NS_OVERRIDE and CAP_SYS_ADMIN to create a new
    devcgroup, update a devcgroup's access, or move a task to a new
    devcgroup.

    CONFIG_COMMONCAP is defined whenever security/commoncap.c should
    be compiled, so that the decision of whether to show the option
    for FILE_CAPABILITIES can be a bit cleaner.

    Pavel, do you have any experience with what sorts of rules a typical
    customer would get? I wasn't sure whether it was worth having some
    sort of hash or tree for searching through a cgroup's rules since if
    most containers have say 0-2 rules then the overhead will just be
    higher.

    Changelog:
    Mar 12 2008: allow dev_cgroup lsm to be used when
    SECURITY=n, and allow stacking with SELinux
    and Smack. Don't work too hard in Kconfig
    to prevent a warning when smack+devcg are
    both compiled in, worry about that later.

    Signed-off-by: Serge E. Hallyn
    ---
    include/linux/capability.h | 11 +-
    include/linux/cgroup_subsys.h | 6 +
    include/linux/devcg.h | 55 ++++++
    include/linux/security.h | 15 ++
    init/Kconfig | 7 +
    kernel/Makefile | 1 +
    kernel/dev_cgroup.c | 411 +++++++++++++++++++++++++++++++++++++++++
    security/Kconfig | 8 +-
    security/Makefile | 12 +-
    security/dev_cgroup.c | 138 ++++++++++++++
    security/smack/smack_lsm.c | 38 ++++
    11 files changed, 692 insertions(+), 10 deletions(-)
    create mode 100644 include/linux/devcg.h
    create mode 100644 kernel/dev_cgroup.c
    create mode 100644 security/dev_cgroup.c

    diff --git a/include/linux/capability.h b/include/linux/capability.h
    index eaab759..f8ecba1 100644
    --- a/include/linux/capability.h
    +++ b/include/linux/capability.h
    @@ -333,7 +333,16 @@ typedef struct kernel_cap_struct {

    #define CAP_MAC_ADMIN 33

    -#define CAP_LAST_CAP CAP_MAC_ADMIN
    +/* Allow acting on resources in another namespace. In particular:
    + * 1. when combined with CAP_MKNOD and dev_cgroup is enabled,
    + * allow creation of devices not in the device whitelist.
    + * 2. whencombined with CAP_SYS_ADMIN and dev_cgroup is enabled,
    + * allow editing device cgroup whitelist
    + */
    +
    +#define CAP_NS_OVERRIDE 34
    +
    +#define CAP_LAST_CAP CAP_NS_OVERRIDE

    #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)

    diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
    index 1ddebfc..01e8034 100644
    --- a/include/linux/cgroup_subsys.h
    +++ b/include/linux/cgroup_subsys.h
    @@ -42,3 +42,9 @@ SUBSYS(mem_cgroup)
    #endif

    /* */
    +
    +#ifdef CONFIG_CGROUP_DEV
    +SUBSYS(devcg)
    +#endif
    +
    +/* */
    diff --git a/include/linux/devcg.h b/include/linux/devcg.h
    new file mode 100644
    index 0000000..7d9f367
    --- /dev/null
    +++ b/include/linux/devcg.h
    @@ -0,0 +1,55 @@
    +#include
    +#include
    +#include
    +#include
    +#include
    +
    +#include
    +
    +#define ACC_MKNOD 1
    +#define ACC_READ 2
    +#define ACC_WRITE 4
    +
    +#define DEV_BLOCK 1
    +#define DEV_CHAR 2
    +#define DEV_ALL 4 /* this represents all devices */
    +
    +/*
    + * whitelist locking rules:
    + * cgroup_lock() cannot be taken under cgroup->lock.
    + * cgroup->lock can be taken with or without cgroup_lock().
    + *
    + * modifications always require cgroup_lock
    + * modifications to a list which is visible require the
    + * cgroup->lock *and* cgroup_lock()
    + * walking the list requires cgroup->lock or cgroup_lock().
    + *
    + * reasoning: dev_whitelist_copy() needs to kmalloc, so needs
    + * a mutex, which the cgroup_lock() is. Since modifying
    + * a visible list requires both locks, either lock can be
    + * taken for walking the list. Since the wh->spinlock is taken
    + * for modifying a public-accessible list, the spinlock is
    + * sufficient for just walking the list.
    + */
    +
    +struct dev_whitelist_item {
    + u32 major, minor;
    + short type;
    + short access;
    + struct list_head list;
    +};
    +
    +struct dev_cgroup {
    + struct cgroup_subsys_state css;
    + struct list_head whitelist;
    + spinlock_t lock;
    +};
    +
    +static inline struct dev_cgroup *cgroup_to_devcg(
    + struct cgroup *cgroup)
    +{
    + return container_of(cgroup_subsys_state(cgroup, devcg_subsys_id),
    + struct dev_cgroup, css);
    +}
    +
    +extern struct cgroup_subsys devcg_subsys;
    diff --git a/include/linux/security.h b/include/linux/security.h
    index 2231526..a818d3f 100644
    --- a/include/linux/security.h
    +++ b/include/linux/security.h
    @@ -1735,6 +1735,13 @@ int security_secctx_to_secid(char *secdata, u32 seclen, u32 *secid);
    void security_release_secctx(char *secdata, u32 seclen);

    #else /* CONFIG_SECURITY */
    +#ifdef CONFIG_CGROUP_DEV
    +extern int devcgroup_inode_mknod(struct inode *dir, struct dentry *dentry,
    + int mode, dev_t dev);
    +extern int devcgroup_inode_permission(struct inode *inode, int mask,
    + struct nameidata *nd);
    +#endif
    +
    struct security_mnt_opts {
    };

    @@ -2011,7 +2018,11 @@ static inline int security_inode_mknod (struct inode *dir,
    struct dentry *dentry,
    int mode, dev_t dev)
    {
    +#ifdef CONFIG_CGROUP_DEV
    + return devcgroup_inode_mknod(dir, dentry, mode, dev);
    +#else
    return 0;
    +#endif
    }

    static inline int security_inode_rename (struct inode *old_dir,
    @@ -2036,7 +2047,11 @@ static inline int security_inode_follow_link (struct dentry *dentry,
    static inline int security_inode_permission (struct inode *inode, int mask,
    struct nameidata *nd)
    {
    +#ifdef CONFIG_CGROUP_DEV
    + return devcgroup_inode_permission(inode, mask, nd);
    +#else
    return 0;
    +#endif
    }

    static inline int security_inode_setattr (struct dentry *dentry,
    diff --git a/init/Kconfig b/init/Kconfig
    index 009f2d8..05343a2 100644
    --- a/init/Kconfig
    +++ b/init/Kconfig
    @@ -298,6 +298,13 @@ config CGROUP_NS
    for instance virtual servers and checkpoint/restart
    jobs.

    +config CGROUP_DEV
    + bool "Device controller for cgroups"
    + depends on CGROUPS && EXPERIMENTAL
    + help
    + Provides a cgroup implementing whitelists for devices which
    + a process in the cgroup can mknod or open.
    +
    config CPUSETS
    bool "Cpuset support"
    depends on SMP && CGROUPS
    diff --git a/kernel/Makefile b/kernel/Makefile
    index 9cc073e..74cd321 100644
    --- a/kernel/Makefile
    +++ b/kernel/Makefile
    @@ -42,6 +42,7 @@ obj-$(CONFIG_CGROUPS) += cgroup.o
    obj-$(CONFIG_CGROUP_DEBUG) += cgroup_debug.o
    obj-$(CONFIG_CPUSETS) += cpuset.o
    obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
    +obj-$(CONFIG_CGROUP_DEV) += dev_cgroup.o
    obj-$(CONFIG_UTS_NS) += utsname.o
    obj-$(CONFIG_USER_NS) += user_namespace.o
    obj-$(CONFIG_PID_NS) += pid_namespace.o
    diff --git a/kernel/dev_cgroup.c b/kernel/dev_cgroup.c
    new file mode 100644
    index 0000000..f088824
    --- /dev/null
    +++ b/kernel/dev_cgroup.c
    @@ -0,0 +1,411 @@
    +/*
    + * dev_cgroup.c - device cgroup subsystem
    + *
    + * Copyright 2007 IBM Corp
    + */
    +
    +#include
    +
    +static int devcg_can_attach(struct cgroup_subsys *ss,
    + struct cgroup *new_cgroup, struct task_struct *task)
    +{
    + struct cgroup *orig;
    +
    + if (!capable(CAP_SYS_ADMIN) || !capable(CAP_NS_OVERRIDE))
    + return -EPERM;
    +
    + if (current != task) {
    + if (!cgroup_is_descendant(new_cgroup))
    + return -EPERM;
    + }
    +
    + if (atomic_read(&new_cgroup->count) != 0)
    + return -EPERM;
    +
    + orig = task_cgroup(task, devcg_subsys_id);
    + if (orig && orig != new_cgroup->parent)
    + return -EPERM;
    +
    + return 0;
    +}
    +
    +/*
    + * called under cgroup_lock()
    + */
    +int dev_whitelist_copy(struct list_head *dest, struct list_head *orig)
    +{
    + struct dev_whitelist_item *wh, *tmp, *new;
    +
    + list_for_each_entry(wh, orig, list) {
    + new = kmalloc(sizeof(*wh), GFP_KERNEL);
    + if (!new)
    + goto free_and_exit;
    + new->major = wh->major;
    + new->minor = wh->minor;
    + new->type = wh->type;
    + new->access = wh->access;
    + list_add_tail(&new->list, dest);
    + }
    +
    + return 0;
    +
    +free_and_exit:
    + list_for_each_entry_safe(wh, tmp, dest, list) {
    + list_del(&wh->list);
    + kfree(wh);
    + }
    + return -ENOMEM;
    +}
    +
    +/* Stupid prototype - don't bother combining existing entries */
    +/*
    + * called under cgroup_lock()
    + * since the list is visible to other tasks, we need the spinlock also
    + */
    +int dev_whitelist_add(struct dev_cgroup *dev_cgroup,
    + struct dev_whitelist_item *wh)
    +{
    + struct dev_whitelist_item *whcopy;
    +
    + whcopy = kmalloc(sizeof(*whcopy), GFP_KERNEL);
    + if (!whcopy)
    + return -ENOMEM;
    +
    + memcpy(whcopy, wh, sizeof(*whcopy));
    + spin_lock(&dev_cgroup->lock);
    + list_add_tail(&whcopy->list, &dev_cgroup->whitelist);
    + spin_unlock(&dev_cgroup->lock);
    + return 0;
    +}
    +
    +/*
    + * called under cgroup_lock()
    + * since the list is visible to other tasks, we need the spinlock also
    + */
    +void dev_whitelist_rm(struct dev_cgroup *dev_cgroup,
    + struct dev_whitelist_item *wh)
    +{
    + struct dev_whitelist_item *walk, *tmp;
    +
    + spin_lock(&dev_cgroup->lock);
    + list_for_each_entry_safe(walk, tmp, &dev_cgroup->whitelist, list) {
    + if (walk->type == DEV_ALL)
    + goto remove;
    + if (walk->type != wh->type)
    + continue;
    + if (walk->major != ~0 && walk->major != wh->major)
    + continue;
    + if (walk->minor != ~0 && walk->minor != wh->minor)
    + continue;
    +
    +remove:
    + walk->access &= ~wh->access;
    + if (!walk->access) {
    + list_del(&walk->list);
    + kfree(walk);
    + }
    + }
    + spin_unlock(&dev_cgroup->lock);
    +}
    +
    +/*
    + * Rules: you can only create a cgroup if
    + * 1. you are capable(CAP_SYS_ADMIN|CAP_NS_OVERRIDE)
    + * 2. the target cgroup is a descendant of your own cgroup
    + *
    + * Note: called from kernel/cgroup.c with cgroup_lock() held.
    + */
    +static struct cgroup_subsys_state *devcg_create(struct cgroup_subsys *ss,
    + struct cgroup *cgroup)
    +{
    + struct dev_cgroup *dev_cgroup, *parent_dev_cgroup;
    + struct cgroup *parent_cgroup;
    + int ret;
    +
    + if (!capable(CAP_SYS_ADMIN) || !capable(CAP_NS_OVERRIDE))
    + return ERR_PTR(-EPERM);
    + if (!cgroup_is_descendant(cgroup))
    + return ERR_PTR(-EPERM);
    +
    + dev_cgroup = kzalloc(sizeof(*dev_cgroup), GFP_KERNEL);
    + if (!dev_cgroup)
    + return ERR_PTR(-ENOMEM);
    + INIT_LIST_HEAD(&dev_cgroup->whitelist);
    + parent_cgroup = cgroup->parent;
    +
    + if (parent_cgroup == NULL) {
    + struct dev_whitelist_item *wh;
    + wh = kmalloc(sizeof(*wh), GFP_KERNEL);
    + wh->minor = wh->major = ~0;
    + wh->type = DEV_ALL;
    + wh->access = ACC_MKNOD | ACC_READ | ACC_WRITE;
    + list_add(&wh->list, &dev_cgroup->whitelist);
    + } else {
    + parent_dev_cgroup = cgroup_to_devcg(parent_cgroup);
    + ret = dev_whitelist_copy(&dev_cgroup->whitelist,
    + &parent_dev_cgroup->whitelist);
    + if (ret) {
    + kfree(dev_cgroup);
    + return ERR_PTR(ret);
    + }
    + }
    +
    + spin_lock_init(&dev_cgroup->lock);
    + return &dev_cgroup->css;
    +}
    +
    +static void devcg_destroy(struct cgroup_subsys *ss,
    + struct cgroup *cgroup)
    +{
    + struct dev_cgroup *dev_cgroup;
    + struct dev_whitelist_item *wh, *tmp;
    +
    + dev_cgroup = cgroup_to_devcg(cgroup);
    + list_for_each_entry_safe(wh, tmp, &dev_cgroup->whitelist, list) {
    + list_del(&wh->list);
    + kfree(wh);
    + }
    + kfree(dev_cgroup);
    +}
    +
    +#define DEVCG_ALLOW 1
    +#define DEVCG_DENY 2
    +
    +void set_access(char *acc, short access)
    +{
    + int idx = 0;
    + memset(acc, 0, 4);
    + if (access & ACC_READ)
    + acc[idx++] = 'r';
    + if (access & ACC_WRITE)
    + acc[idx++] = 'w';
    + if (access & ACC_MKNOD)
    + acc[idx++] = 'm';
    +}
    +
    +char type_to_char(short type)
    +{
    + if (type == DEV_ALL)
    + return 'a';
    + if (type == DEV_CHAR)
    + return 'c';
    + if (type == DEV_BLOCK)
    + return 'b';
    + return 'X';
    +}
    +
    +static void set_majmin(char *str, int len, unsigned m)
    +{
    + memset(str, 0, len);
    + if (m == ~0)
    + sprintf(str, "*");
    + else
    + snprintf(str, len, "%d", m);
    +}
    +
    +char *print_whitelist(struct dev_cgroup *devcgroup, int *len)
    +{
    + char *buf, *s, acc[4];
    + struct dev_whitelist_item *wh;
    + int ret;
    + int count = 0;
    + char maj[10], min[10];
    +
    + buf = kmalloc(4096, GFP_KERNEL);
    + if (!buf)
    + return ERR_PTR(-ENOMEM);
    + s = buf;
    + *s = '\0';
    + *len = 0;
    +
    + spin_lock(&devcgroup->lock);
    + list_for_each_entry(wh, &devcgroup->whitelist, list) {
    + set_access(acc, wh->access);
    + set_majmin(maj, 10, wh->major);
    + set_majmin(min, 10, wh->minor);
    + ret = snprintf(s, 4095-(s-buf), "%c %s %s %s\n",
    + type_to_char(wh->type), maj, min, acc);
    + if (s+ret >= buf+4095) {
    + kfree(buf);
    + buf = ERR_PTR(-ENOMEM);
    + break;
    + }
    + s += ret;
    + *len += ret;
    + count++;
    + }
    + spin_unlock(&devcgroup->lock);
    +
    + return buf;
    +}
    +
    +static ssize_t devcg_access_read(struct cgroup *cgroup,
    + struct cftype *cft, struct file *file,
    + char __user *userbuf, size_t nbytes, loff_t *ppos)
    +{
    + struct dev_cgroup *devcgrp = cgroup_to_devcg(cgroup);
    + int filetype = cft->private;
    + char *buffer;
    + int len, retval;
    +
    + if (filetype != DEVCG_ALLOW)
    + return -EINVAL;
    + buffer = print_whitelist(devcgrp, &len);
    + if (IS_ERR(buffer))
    + return PTR_ERR(buffer);
    +
    + retval = simple_read_from_buffer(userbuf, nbytes, ppos, buffer, len);
    + kfree(buffer);
    + return retval;
    +}
    +
    +static inline short convert_access(char *acc)
    +{
    + short access = 0;
    +
    + while (*acc) {
    + switch (*acc) {
    + case 'r':
    + case 'R': access |= ACC_READ; break;
    + case 'w':
    + case 'W': access |= ACC_WRITE; break;
    + case 'm':
    + case 'M': access |= ACC_MKNOD; break;
    + case '\n': break;
    + default:
    + return -EINVAL;
    + }
    + acc++;
    + }
    +
    + return access;
    +}
    +
    +static inline short convert_type(char intype)
    +{
    + short type = 0;
    + switch (intype) {
    + case 'a': type = DEV_ALL; break;
    + case 'c': type = DEV_CHAR; break;
    + case 'b': type = DEV_BLOCK; break;
    + default: type = -EACCES; break;
    + }
    + return type;
    +}
    +
    +static int convert_majmin(char *m, unsigned *u)
    +{
    + if (m[0] == '*') {
    + *u = ~0;
    + return 0;
    + }
    + if (sscanf(m, "%u", u) != 1)
    + return -EINVAL;
    + return 0;
    +}
    +
    +static ssize_t devcg_access_write(struct cgroup *cgroup, struct cftype *cft,
    + struct file *file, const char __user *userbuf,
    + size_t nbytes, loff_t *ppos)
    +{
    + struct cgroup *cur_cgroup;
    + struct dev_cgroup *devcgrp, *cur_devcgroup;
    + int filetype = cft->private;
    + char *buffer, acc[4], maj[10], min[10];
    + int retval = 0;
    + int nitems;
    + char type;
    + struct dev_whitelist_item wh;
    +
    + if (!capable(CAP_SYS_ADMIN) || !capable(CAP_NS_OVERRIDE))
    + return -EPERM;
    +
    + devcgrp = cgroup_to_devcg(cgroup);
    + cur_cgroup = task_cgroup(current, devcg_subsys.subsys_id);
    + cur_devcgroup = cgroup_to_devcg(cur_cgroup);
    +
    + buffer = kmalloc(nbytes+1, GFP_KERNEL);
    + if (!buffer)
    + return -ENOMEM;
    +
    + if (copy_from_user(buffer, userbuf, nbytes)) {
    + retval = -EFAULT;
    + goto out1;
    + }
    + buffer[nbytes] = 0; /* nul-terminate */
    +
    + cgroup_lock();
    + if (cgroup_is_removed(cgroup)) {
    + retval = -ENODEV;
    + goto out2;
    + }
    +
    + memset(&wh, 0, sizeof(wh));
    + memset(acc, 0, 4);
    + nitems = sscanf(buffer, "%c %9s %9s %3s", &type, maj, min,
    + acc);
    + retval = -EINVAL;
    + if (nitems != 4)
    + goto out2;
    + wh.type = convert_type(type);
    + if (wh.type < 0)
    + goto out2;
    + wh.access = convert_access(acc);
    + if (convert_majmin(maj, &wh.major))
    + goto out2;
    + if (convert_majmin(min, &wh.minor))
    + goto out2;
    + if (wh.access < 0)
    + goto out2;
    + retval = 0;
    + switch (filetype) {
    + case DEVCG_ALLOW:
    + retval = dev_whitelist_add(devcgrp, &wh);
    + break;
    + case DEVCG_DENY:
    + dev_whitelist_rm(devcgrp, &wh);
    + break;
    + default:
    + retval = -EINVAL;
    + goto out2;
    + }
    +
    + if (retval == 0)
    + retval = nbytes;
    +
    +out2:
    + cgroup_unlock();
    +out1:
    + kfree(buffer);
    + return retval;
    +}
    +
    +static struct cftype dev_cgroup_files[] = {
    + {
    + .name = "allow",
    + .read = devcg_access_read,
    + .write = devcg_access_write,
    + .private = DEVCG_ALLOW,
    + },
    + {
    + .name = "deny",
    + .write = devcg_access_write,
    + .private = DEVCG_DENY,
    + },
    +};
    +
    +static int devcg_populate(struct cgroup_subsys *ss,
    + struct cgroup *cont)
    +{
    + return cgroup_add_files(cont, ss, dev_cgroup_files,
    + ARRAY_SIZE(dev_cgroup_files));
    +}
    +
    +struct cgroup_subsys devcg_subsys = {
    + .name = "devcg",
    + .can_attach = devcg_can_attach,
    + .create = devcg_create,
    + .destroy = devcg_destroy,
    + .populate = devcg_populate,
    + .subsys_id = devcg_subsys_id,
    +};
    diff --git a/security/Kconfig b/security/Kconfig
    index 5dfc206..8082edc 100644
    --- a/security/Kconfig
    +++ b/security/Kconfig
    @@ -75,15 +75,19 @@ config SECURITY_NETWORK_XFRM

    config SECURITY_CAPABILITIES
    bool "Default Linux Capabilities"
    - depends on SECURITY
    + depends on SECURITY && !CGROUP_DEV
    default y
    help
    This enables the "default" Linux capabilities functionality.
    If you are unsure how to answer this question, answer Y.

    +config COMMONCAP
    + bool
    + default !SECURITY || SECURITY_CAPABILITIES || SECURITY_ROOTPLUG || SECURITY_SMACK || CGROUP_DEV
    +
    config SECURITY_FILE_CAPABILITIES
    bool "File POSIX Capabilities (EXPERIMENTAL)"
    - depends on (SECURITY=n || SECURITY_CAPABILITIES!=n) && EXPERIMENTAL
    + depends on COMMONCAP && EXPERIMENTAL
    default n
    help
    This enables filesystem capabilities, allowing you to give
    diff --git a/security/Makefile b/security/Makefile
    index 9e8b025..6093003 100644
    --- a/security/Makefile
    +++ b/security/Makefile
    @@ -6,15 +6,13 @@ obj-$(CONFIG_KEYS) += keys/
    subdir-$(CONFIG_SECURITY_SELINUX) += selinux
    subdir-$(CONFIG_SECURITY_SMACK) += smack

    -# if we don't select a security model, use the default capabilities
    -ifneq ($(CONFIG_SECURITY),y)
    -obj-y += commoncap.o
    -endif
    +obj-$(CONFIG_COMMONCAP) += commoncap.o

    # Object file lists
    obj-$(CONFIG_SECURITY) += security.o dummy.o inode.o
    # Must precede capability.o in order to stack properly.
    obj-$(CONFIG_SECURITY_SELINUX) += selinux/built-in.o
    -obj-$(CONFIG_SECURITY_SMACK) += commoncap.o smack/built-in.o
    -obj-$(CONFIG_SECURITY_CAPABILITIES) += commoncap.o capability.o
    -obj-$(CONFIG_SECURITY_ROOTPLUG) += commoncap.o root_plug.o
    +obj-$(CONFIG_SECURITY_SMACK) += smack/built-in.o
    +obj-$(CONFIG_SECURITY_CAPABILITIES) += capability.o
    +obj-$(CONFIG_SECURITY_ROOTPLUG) += root_plug.o
    +obj-$(CONFIG_CGROUP_DEV) += dev_cgroup.o
    diff --git a/security/dev_cgroup.c b/security/dev_cgroup.c
    new file mode 100644
    index 0000000..cef1527
    --- /dev/null
    +++ b/security/dev_cgroup.c
    @@ -0,0 +1,138 @@
    +/*
    + * LSM portion of the device cgroup subsystem.
    + *
    + * Copyright 2007 IBM Corp
    + */
    +
    +#include
    +
    +int devcgroup_inode_permission(struct inode *inode, int mask,
    + struct nameidata *nd)
    +{
    + struct cgroup *cgroup;
    + struct dev_cgroup *dev_cgroup;
    + struct dev_whitelist_item *wh;
    +
    + dev_t device = inode->i_rdev;
    + if (!device)
    + return 0;
    + if (!S_ISBLK(inode->i_mode) && !S_ISCHR(inode->i_mode))
    + return 0;
    + cgroup = task_cgroup(current, devcg_subsys.subsys_id);
    + dev_cgroup = cgroup_to_devcg(cgroup);
    + if (!dev_cgroup)
    + return 0;
    +
    + spin_lock(&dev_cgroup->lock);
    + list_for_each_entry(wh, &dev_cgroup->whitelist, list) {
    + if (wh->type & DEV_ALL)
    + goto acc_check;
    + if ((wh->type & DEV_BLOCK) && !S_ISBLK(inode->i_mode))
    + continue;
    + if ((wh->type & DEV_CHAR) && !S_ISCHR(inode->i_mode))
    + continue;
    + if (wh->major != ~0 && wh->major != imajor(inode))
    + continue;
    + if (wh->minor != ~0 && wh->minor != iminor(inode))
    + continue;
    +acc_check:
    + if ((mask & MAY_WRITE) && !(wh->access & ACC_WRITE))
    + continue;
    + if ((mask & MAY_READ) && !(wh->access & ACC_READ))
    + continue;
    + spin_unlock(&dev_cgroup->lock);
    + return 0;
    + }
    + spin_unlock(&dev_cgroup->lock);
    +
    + return -EPERM;
    +}
    +
    +int devcgroup_inode_mknod(struct inode *dir, struct dentry *dentry,
    + int mode, dev_t dev)
    +{
    + struct cgroup *cgroup;
    + struct dev_cgroup *dev_cgroup;
    + struct dev_whitelist_item *wh;
    +
    + cgroup = task_cgroup(current, devcg_subsys.subsys_id);
    + dev_cgroup = cgroup_to_devcg(cgroup);
    + if (!dev_cgroup)
    + return 0;
    +
    + spin_lock(&dev_cgroup->lock);
    + list_for_each_entry(wh, &dev_cgroup->whitelist, list) {
    + if (wh->type & DEV_ALL)
    + goto acc_check;
    + if ((wh->type & DEV_BLOCK) && !S_ISBLK(mode))
    + continue;
    + if ((wh->type & DEV_CHAR) && !S_ISCHR(mode))
    + continue;
    + if (wh->major != ~0 && wh->major != MAJOR(dev))
    + continue;
    + if (wh->minor != ~0 && wh->minor != MINOR(dev))
    + continue;
    +acc_check:
    + if (!(wh->access & ACC_MKNOD))
    + continue;
    + spin_unlock(&dev_cgroup->lock);
    + return 0;
    + }
    + spin_unlock(&dev_cgroup->lock);
    + return -EPERM;
    +}
    +
    +#ifdef CONFIG_SECURITY
    +static struct security_operations devcgroup_security_ops = {
    + .inode_mknod = devcgroup_inode_mknod,
    + .inode_permission = devcgroup_inode_permission,
    +
    + .ptrace = cap_ptrace,
    + .capget = cap_capget,
    + .capset_check = cap_capset_check,
    + .capset_set = cap_capset_set,
    + .capable = cap_capable,
    + .settime = cap_settime,
    + .netlink_send = cap_netlink_send,
    + .netlink_recv = cap_netlink_recv,
    +
    + .bprm_apply_creds = cap_bprm_apply_creds,
    + .bprm_set_security = cap_bprm_set_security,
    + .bprm_secureexec = cap_bprm_secureexec,
    +
    + .inode_setxattr = cap_inode_setxattr,
    + .inode_removexattr = cap_inode_removexattr,
    + .inode_need_killpriv = cap_inode_need_killpriv,
    + .inode_killpriv = cap_inode_killpriv,
    +
    + .task_kill = cap_task_kill,
    + .task_setscheduler = cap_task_setscheduler,
    + .task_setioprio = cap_task_setioprio,
    + .task_setnice = cap_task_setnice,
    + .task_post_setuid = cap_task_post_setuid,
    + .task_prctl = cap_task_prctl,
    + .task_reparent_to_init = cap_task_reparent_to_init,
    +
    + .syslog = cap_syslog,
    +
    + .vm_enough_memory = cap_vm_enough_memory,
    +};
    +
    +#define MY_NAME "dev_cgroup"
    +static int __init dev_cgroup_security_init(void)
    +{
    + /* register ourselves with the security framework */
    + if (register_security(&devcgroup_security_ops)) {
    + if (mod_reg_security(MY_NAME, &devcgroup_security_ops)) {
    + printk(KERN_ERR "Failure registering dev_cgroup LSM\n");
    + return -EINVAL;
    + }
    + printk(KERN_INFO "dev_cgroup LSM initialized as secondary\n");
    + return 0;
    + }
    + printk(KERN_INFO "Device cgroup LSM initialized\n");
    + return 0;
    +}
    +
    +security_initcall(dev_cgroup_security_init);
    +#endif
    diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
    index 20ec35c..08bf081 100644
    --- a/security/smack/smack_lsm.c
    +++ b/security/smack/smack_lsm.c
    @@ -36,6 +36,18 @@
    #define SOCKFS_MAGIC 0x534F434B
    #define TMPFS_MAGIC 0x01021994

    +#ifdef CONFIG_CGROUP_DEV
    +/*
    + * A better approach is probably to switch to full secondary->
    + * support for smack, but for now just hook the devcgroup lsm
    + * hooks explicitly when compiled in
    + */
    +int devcgroup_inode_mknod(struct inode *dir, struct dentry *dentry,
    + int mode, dev_t dev);
    +int devcgroup_inode_permission(struct inode *inode, int mask,
    + struct nameidata *nd);
    +#endif
    +
    /**
    * smk_fetch - Fetch the smack label from a file.
    * @ip: a pointer to the inode
    @@ -523,6 +535,12 @@ static int smack_inode_rename(struct inode *old_inode,
    static int smack_inode_permission(struct inode *inode, int mask,
    struct nameidata *nd)
    {
    +#ifdef CONFIG_CGROUP_DEV
    + int err;
    + err = devcgroup_inode_permission(inode, mask, nd);
    + if (err)
    + return err;
    +#endif
    /*
    * No permission to check. Existence test. Yup, it's there.
    */
    @@ -1370,6 +1388,23 @@ static int smack_inode_setsecurity(struct inode *inode, const char *name,
    return 0;
    }

    +#ifdef CONFIG_CGROUP_DEV
    +/**
    + * smack_inode_mknod
    + * @dir contains the inode structure of parent of the new file.
    + * @dentry contains the dentry structure of the new file.
    + * @mode contains the mode of the new file.
    + * @dev contains the device number.
    + *
    + * Just calls the devcgroup hook to allow stacking.
    + */
    +static int smack_inode_mknod(struct inode *dir, struct dentry *dentry,
    + int mode, dev_t dev)
    +{
    + return devcgroup_inode_mknod(dir, dentry, mode, dev);
    +}
    +#endif
    +
    /**
    * smack_socket_post_create - finish socket setup
    * @sock: the socket
    @@ -2460,6 +2495,9 @@ struct security_operations smack_ops = {
    .inode_getsecurity = smack_inode_getsecurity,
    .inode_setsecurity = smack_inode_setsecurity,
    .inode_listsecurity = smack_inode_listsecurity,
    +#ifdef CONFIG_CGROUP_DEV
    + .inode_mknod = smack_inode_mknod,
    +#endif

    .file_permission = smack_file_permission,
    .file_alloc_security = smack_file_alloc_security,
    --
    1.5.1

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    On Wed, 12 Mar 2008, Serge E. Hallyn wrote:

    > +#ifdef CONFIG_SECURITY
    > +static struct security_operations devcgroup_security_ops = {
    > + .inode_mknod = devcgroup_inode_mknod,
    > + .inode_permission = devcgroup_inode_permission,
    > +
    > + .ptrace = cap_ptrace,
    > + .capget = cap_capget,
    > + .capset_check = cap_capset_check,
    > + .capset_set = cap_capset_set,
    > + .capable = cap_capable,
    > + .settime = cap_settime,
    > + .netlink_send = cap_netlink_send,
    > + .netlink_recv = cap_netlink_recv,
    > +
    > + .bprm_apply_creds = cap_bprm_apply_creds,
    > + .bprm_set_security = cap_bprm_set_security,
    > + .bprm_secureexec = cap_bprm_secureexec,
    > +
    > + .inode_setxattr = cap_inode_setxattr,
    > + .inode_removexattr = cap_inode_removexattr,
    > + .inode_need_killpriv = cap_inode_need_killpriv,
    > + .inode_killpriv = cap_inode_killpriv,
    > +
    > + .task_kill = cap_task_kill,
    > + .task_setscheduler = cap_task_setscheduler,
    > + .task_setioprio = cap_task_setioprio,
    > + .task_setnice = cap_task_setnice,
    > + .task_post_setuid = cap_task_post_setuid,
    > + .task_prctl = cap_task_prctl,
    > + .task_reparent_to_init = cap_task_reparent_to_init,
    > +
    > + .syslog = cap_syslog,
    > +
    > + .vm_enough_memory = cap_vm_enough_memory,
    > +};


    For lower overall complexity, why not just extend the capability LSM to
    include the devcgroup_ perms if CONFIG_CGROUP_DEV ?



    - James
    --
    James Morris

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    Quoting James Morris (jmorris@namei.org):
    > On Wed, 12 Mar 2008, Serge E. Hallyn wrote:
    >
    > > +#ifdef CONFIG_SECURITY
    > > +static struct security_operations devcgroup_security_ops = {
    > > + .inode_mknod = devcgroup_inode_mknod,
    > > + .inode_permission = devcgroup_inode_permission,
    > > +
    > > + .ptrace = cap_ptrace,
    > > + .capget = cap_capget,
    > > + .capset_check = cap_capset_check,
    > > + .capset_set = cap_capset_set,
    > > + .capable = cap_capable,
    > > + .settime = cap_settime,
    > > + .netlink_send = cap_netlink_send,
    > > + .netlink_recv = cap_netlink_recv,
    > > +
    > > + .bprm_apply_creds = cap_bprm_apply_creds,
    > > + .bprm_set_security = cap_bprm_set_security,
    > > + .bprm_secureexec = cap_bprm_secureexec,
    > > +
    > > + .inode_setxattr = cap_inode_setxattr,
    > > + .inode_removexattr = cap_inode_removexattr,
    > > + .inode_need_killpriv = cap_inode_need_killpriv,
    > > + .inode_killpriv = cap_inode_killpriv,
    > > +
    > > + .task_kill = cap_task_kill,
    > > + .task_setscheduler = cap_task_setscheduler,
    > > + .task_setioprio = cap_task_setioprio,
    > > + .task_setnice = cap_task_setnice,
    > > + .task_post_setuid = cap_task_post_setuid,
    > > + .task_prctl = cap_task_prctl,
    > > + .task_reparent_to_init = cap_task_reparent_to_init,
    > > +
    > > + .syslog = cap_syslog,
    > > +
    > > + .vm_enough_memory = cap_vm_enough_memory,
    > > +};

    >
    > For lower overall complexity, why not just extend the capability LSM to
    > include the devcgroup_ perms if CONFIG_CGROUP_DEV ?


    That does make for a simpler implementation at this point, however if
    any other such LSMs come along (as Casey seemed to think they would) the
    end result could end up being horrible spaghetti code of dependencies
    and interrelated configs.

    But OTOH we went years with no such changes, so that's probably not a
    particularly practical concern unless someone can cite specific upcoming
    examples. So if noone objects I'll try that approach.

    thanks,
    -serge
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    On Thu, 13 Mar 2008, Serge E. Hallyn wrote:

    > That does make for a simpler implementation at this point, however if
    > any other such LSMs come along (as Casey seemed to think they would) the
    > end result could end up being horrible spaghetti code of dependencies
    > and interrelated configs.


    That can be addressed as the need arises. A basic tenet of kernel
    development is to avoid speculative infrastructure.

    > But OTOH we went years with no such changes, so that's probably not a
    > particularly practical concern unless someone can cite specific upcoming
    > examples. So if noone objects I'll try that approach.


    --
    James Morris

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    Quoting James Morris (jmorris@namei.org):
    > On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    >
    > > That does make for a simpler implementation at this point, however if
    > > any other such LSMs come along (as Casey seemed to think they would) the
    > > end result could end up being horrible spaghetti code of dependencies
    > > and interrelated configs.

    >
    > That can be addressed as the need arises. A basic tenet of kernel
    > development is to avoid speculative infrastructure.


    True, but while this change simplifies the code a bit, the semantics
    seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
    and:

    SECURITY=n or
    rootplug is enabled
    capabilities is enabled
    smack is enabled
    selinux+capabilities is enabled

    It will not be enforcing when
    dummy is loaded
    only selinux (and not capabilities) is loaded

    If that's ok with people then I'm fine with it. I suppose it should be
    explained in the CONFIG_CGROUP_DEV help section, which it isn't in this
    version I'm about to set. Patch hitting the wire in a minute.

    thanks,
    -serge

    > > But OTOH we went years with no such changes, so that's probably not a
    > > particularly practical concern unless someone can cite specific upcoming
    > > examples. So if noone objects I'll try that approach.

    >
    > --
    > James Morris
    >

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    On Thu, 13 Mar 2008, Serge E. Hallyn wrote:

    > True, but while this change simplifies the code a bit, the semantics
    > seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
    > and:
    >
    > SECURITY=n or
    > rootplug is enabled
    > capabilities is enabled
    > smack is enabled
    > selinux+capabilities is enabled


    Well, this is how real systems are going to be deployed.

    It becomes confusing, IMHO, if you have to change which secondary LSM you
    stack with SELinux to enable a cgroup feature.

    --
    James Morris

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    Quoting James Morris (jmorris@namei.org):
    > On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    >
    > > True, but while this change simplifies the code a bit, the semantics
    > > seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
    > > and:
    > >
    > > SECURITY=n or
    > > rootplug is enabled
    > > capabilities is enabled
    > > smack is enabled
    > > selinux+capabilities is enabled

    >
    > Well, this is how real systems are going to be deployed.


    Sorry, do you mean with capabilities?

    > It becomes confusing, IMHO, if you have to change which secondary LSM you
    > stack with SELinux to enable a cgroup feature.


    So you're saying selinux without capabilities should still be able to
    use dev_cgroup? (Just making sure I understand right)

    -serge
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    On Thu, 13 Mar 2008, Serge E. Hallyn wrote:

    > Quoting James Morris (jmorris@namei.org):
    > > On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    > >
    > > > True, but while this change simplifies the code a bit, the semantics
    > > > seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
    > > > and:
    > > >
    > > > SECURITY=n or
    > > > rootplug is enabled
    > > > capabilities is enabled
    > > > smack is enabled
    > > > selinux+capabilities is enabled

    > >
    > > Well, this is how real systems are going to be deployed.

    >
    > Sorry, do you mean with capabilities?


    Yes.

    All Fedora, RHEL, CentOS etc. ship with SELinux+capabilities. I can't
    imagine not enabling them on other kernels.

    > > It becomes confusing, IMHO, if you have to change which secondary LSM you
    > > stack with SELinux to enable a cgroup feature.

    >
    > So you're saying selinux without capabilities should still be able to
    > use dev_cgroup? (Just making sure I understand right)


    Nope, SELinux always stacks with capabilities, so havng the cgroup hooks
    in capabilities makes sense (rather than having us change the secondary
    stacking LSM just to enable a feature).





    --
    James Morris

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    Quoting James Morris (jmorris@namei.org):
    > On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    >
    > > Quoting James Morris (jmorris@namei.org):
    > > > On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    > > >
    > > > > True, but while this change simplifies the code a bit, the semantics
    > > > > seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
    > > > > and:
    > > > >
    > > > > SECURITY=n or
    > > > > rootplug is enabled
    > > > > capabilities is enabled
    > > > > smack is enabled
    > > > > selinux+capabilities is enabled
    > > >
    > > > Well, this is how real systems are going to be deployed.

    > >
    > > Sorry, do you mean with capabilities?

    >
    > Yes.
    >
    > All Fedora, RHEL, CentOS etc. ship with SELinux+capabilities. I can't
    > imagine not enabling them on other kernels.
    >
    > > > It becomes confusing, IMHO, if you have to change which secondary LSM you
    > > > stack with SELinux to enable a cgroup feature.

    > >
    > > So you're saying selinux without capabilities should still be able to
    > > use dev_cgroup? (Just making sure I understand right)

    >
    > Nope, SELinux always stacks with capabilities, so havng the cgroup hooks
    > in capabilities makes sense (rather than having us change the secondary
    > stacking LSM just to enable a feature).


    Oh, ok.

    Will let the patch stand until Pavel and Greg comment then.

    thanks,
    -serge
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [RFC] cgroups: implement device whitelist lsm (v2)


    --- James Morris wrote:

    > On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    >
    > > Quoting James Morris (jmorris@namei.org):
    > > > On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    > > >
    > > > > True, but while this change simplifies the code a bit, the semantics
    > > > > seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
    > > > > and:
    > > > >
    > > > > SECURITY=n or
    > > > > rootplug is enabled
    > > > > capabilities is enabled
    > > > > smack is enabled
    > > > > selinux+capabilities is enabled
    > > >
    > > > Well, this is how real systems are going to be deployed.

    > >
    > > Sorry, do you mean with capabilities?

    >
    > Yes.
    >
    > All Fedora, RHEL, CentOS etc. ship with SELinux+capabilities. I can't
    > imagine not enabling them on other kernels.
    >
    > > > It becomes confusing, IMHO, if you have to change which secondary LSM you

    >
    > > > stack with SELinux to enable a cgroup feature.

    > >
    > > So you're saying selinux without capabilities should still be able to
    > > use dev_cgroup? (Just making sure I understand right)

    >
    > Nope, SELinux always stacks with capabilities, so havng the cgroup hooks
    > in capabilities makes sense (rather than having us change the secondary
    > stacking LSM just to enable a feature).


    That's what I was getting at. When the next feature comes along
    are we going to stuff it into capabilities, too? Maybe we'll
    cram it into audit or CIPSO instead, but how long can this go on?
    Eventually we need a mechanism that allows more or less general
    mix-and-match, maybe with a few rules like "don't mix plaids and
    stripes" to keep things sane or these lesser facilities have no
    chance. Seems like we're still making LSM too hard to use.

    Unless I take an aggressive approach to adding them to Smack.
    Hmm, might be a way to make a buck or two that way.
    (Only kidding)(Well, mostly only kidding)

    And yes, I understand that there are still those about who
    don't like LSM in any form, much less in a useful one.


    Casey Schaufler
    casey@schaufler-ca.com
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    On Thu, Mar 13, 2008 at 08:41:21PM -0500, Serge E. Hallyn wrote:
    > Quoting James Morris (jmorris@namei.org):
    > > On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    > >
    > > > Quoting James Morris (jmorris@namei.org):
    > > > > On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    > > > >
    > > > > > True, but while this change simplifies the code a bit, the semantics
    > > > > > seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
    > > > > > and:
    > > > > >
    > > > > > SECURITY=n or
    > > > > > rootplug is enabled
    > > > > > capabilities is enabled
    > > > > > smack is enabled
    > > > > > selinux+capabilities is enabled
    > > > >
    > > > > Well, this is how real systems are going to be deployed.
    > > >
    > > > Sorry, do you mean with capabilities?

    > >
    > > Yes.
    > >
    > > All Fedora, RHEL, CentOS etc. ship with SELinux+capabilities. I can't
    > > imagine not enabling them on other kernels.
    > >
    > > > > It becomes confusing, IMHO, if you have to change which secondary LSM you
    > > > > stack with SELinux to enable a cgroup feature.
    > > >
    > > > So you're saying selinux without capabilities should still be able to
    > > > use dev_cgroup? (Just making sure I understand right)

    > >
    > > Nope, SELinux always stacks with capabilities, so havng the cgroup hooks
    > > in capabilities makes sense (rather than having us change the secondary
    > > stacking LSM just to enable a feature).

    >
    > Oh, ok.
    >
    > Will let the patch stand until Pavel and Greg comment then.


    My main question was why was that file in the kernel/ directory?
    Shouldn't that also be in the security/ directory?

    And to be honest, I didn't really look at it at all other than the
    diffstat to make sure you weren't messing with the kobj_map stuff
    anymore

    thanks,

    greg k-h
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    On Wed, Mar 12, 2008 at 8:27 PM, Serge E. Hallyn wrote:
    >
    > While composing this with the ns_cgroup may seem logical, it is not
    > the right thing to do, because updates to /cg/cg1/devcg.deny are
    > not reflected in /cg/cg1/cg2/devcg.allow.


    Maybe you should follow up the tree to ensure that all parent groups
    have access to the device too? Or alternatively, cache the results of
    this lookup whenever permissions for a device change?

    >
    > A task may only be moved to another devcgroup if it is moving to
    > a direct descendent of its current devcgroup.


    What's the rationale for that?

    >
    > CAP_NS_OVERRIDE is defined as the capability needed to cross namespaces.
    > A task needs both CAP_NS_OVERRIDE and CAP_SYS_ADMIN to create a new
    > devcgroup, update a devcgroup's access, or move a task to a new
    > devcgroup.


    But this isn't necessarily crossing namespaces. It could be used for
    device control in the same namespace (e.g. allowing a job to access a
    raw disk for its data storage rather than going through the
    filesystem).

    Paul
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    On Wed, Mar 12, 2008 at 8:27 PM, Serge E. Hallyn wrote:
    > Implement a cgroup using the LSM interface to enforce mknod and open
    > on device files.
    >
    > This implements a simple device access whitelist. A whitelist entry
    > has 4 fields. 'type' is a (all), c (char), or b (block). 'all' means it
    > applies to all types, all major numbers, and all minor numbers. Major and
    > minor are obvious. Access is a composition of r (read), w (write), and
    > m (mknod).
    >
    > The root devcgroup starts with rwm to 'all'. A child devcg gets a copy
    > of the parent. Admins can then add and remove devices to the whitelist.
    > Once CAP_HOST_ADMIN is introduced it will be needed to add entries as
    > well or remove entries from another cgroup, though just CAP_SYS_ADMIN
    > will suffice to remove entries for your own group.
    >
    > An entry is added by doing "echo " > devcg.allow,
    > for instance:
    >
    > echo b 7 0 mrw > /cgroups/1/devcg.allow
    >
    > An entry is removed by doing likewise into devcg.deny. Since this is a
    > pure whitelist, not acls, you can only remove entries which exist in the
    > whitelist. You must explicitly
    >
    > echo a 0 0 mrw > /cgroups/1/devcg.deny
    >
    > to remove the "allow all" entry which is automatically inherited from
    > the root cgroup.


    In keeping with the naming convention for control groups, "devices"
    would be better than "devcg".

    Paul
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    Serge E. Hallyn wrote:
    > Quoting James Morris (jmorris@namei.org):
    >> On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    >>
    >>> Quoting James Morris (jmorris@namei.org):
    >>>> On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    >>>>
    >>>>> True, but while this change simplifies the code a bit, the semantics
    >>>>> seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
    >>>>> and:
    >>>>>
    >>>>> SECURITY=n or
    >>>>> rootplug is enabled
    >>>>> capabilities is enabled
    >>>>> smack is enabled
    >>>>> selinux+capabilities is enabled
    >>>> Well, this is how real systems are going to be deployed.
    >>> Sorry, do you mean with capabilities?

    >> Yes.
    >>
    >> All Fedora, RHEL, CentOS etc. ship with SELinux+capabilities. I can't
    >> imagine not enabling them on other kernels.
    >>
    >>>> It becomes confusing, IMHO, if you have to change which secondary LSM you
    >>>> stack with SELinux to enable a cgroup feature.
    >>> So you're saying selinux without capabilities should still be able to
    >>> use dev_cgroup? (Just making sure I understand right)

    >> Nope, SELinux always stacks with capabilities, so havng the cgroup hooks
    >> in capabilities makes sense (rather than having us change the secondary
    >> stacking LSM just to enable a feature).

    >
    > Oh, ok.
    >
    > Will let the patch stand until Pavel and Greg comment then.


    Well, I saw your previous patch, that was implemented as just another
    LSM module and I liked it except for the LSM dependency.

    Since this version can happily work w/o LSM, I like it too

    > thanks,
    > -serge
    >


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    Quoting Pavel Emelyanov (xemul@openvz.org):
    > Serge E. Hallyn wrote:
    > > Quoting James Morris (jmorris@namei.org):
    > >> On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    > >>
    > >>> Quoting James Morris (jmorris@namei.org):
    > >>>> On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    > >>>>
    > >>>>> True, but while this change simplifies the code a bit, the semantics
    > >>>>> seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
    > >>>>> and:
    > >>>>>
    > >>>>> SECURITY=n or
    > >>>>> rootplug is enabled
    > >>>>> capabilities is enabled
    > >>>>> smack is enabled
    > >>>>> selinux+capabilities is enabled
    > >>>> Well, this is how real systems are going to be deployed.
    > >>> Sorry, do you mean with capabilities?
    > >> Yes.
    > >>
    > >> All Fedora, RHEL, CentOS etc. ship with SELinux+capabilities. I can't
    > >> imagine not enabling them on other kernels.
    > >>
    > >>>> It becomes confusing, IMHO, if you have to change which secondary LSM you
    > >>>> stack with SELinux to enable a cgroup feature.
    > >>> So you're saying selinux without capabilities should still be able to
    > >>> use dev_cgroup? (Just making sure I understand right)
    > >> Nope, SELinux always stacks with capabilities, so havng the cgroup hooks
    > >> in capabilities makes sense (rather than having us change the secondary
    > >> stacking LSM just to enable a feature).

    > >
    > > Oh, ok.
    > >
    > > Will let the patch stand until Pavel and Greg comment then.

    >
    > Well, I saw your previous patch, that was implemented as just another
    > LSM module and I liked it except for the LSM dependency.


    James and Stephen agree with your LSM qualms. I suppose we could add
    cgroups next to the lsm hooks. I suspect Paul Menage would complain
    about that (Paul?), and I do think it's silly as they are security
    questions, not group tracking questions, but if it's what people want
    I can send out a new patch next week.

    > Since this version can happily work w/o LSM, I like it too


    In an earlier version I asked whether you had any experience with usual
    # rules per container. Do you have an idea? Right now the whitelist is
    a straight list we search through linearly. If # rules is generally
    tiny then I'm inclined to keep it that way...

    thanks,
    -serge
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    Quoting Greg KH (greg@kroah.com):
    > On Thu, Mar 13, 2008 at 08:41:21PM -0500, Serge E. Hallyn wrote:
    > > Quoting James Morris (jmorris@namei.org):
    > > > On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    > > >
    > > > > Quoting James Morris (jmorris@namei.org):
    > > > > > On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    > > > > >
    > > > > > > True, but while this change simplifies the code a bit, the semantics
    > > > > > > seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
    > > > > > > and:
    > > > > > >
    > > > > > > SECURITY=n or
    > > > > > > rootplug is enabled
    > > > > > > capabilities is enabled
    > > > > > > smack is enabled
    > > > > > > selinux+capabilities is enabled
    > > > > >
    > > > > > Well, this is how real systems are going to be deployed.
    > > > >
    > > > > Sorry, do you mean with capabilities?
    > > >
    > > > Yes.
    > > >
    > > > All Fedora, RHEL, CentOS etc. ship with SELinux+capabilities. I can't
    > > > imagine not enabling them on other kernels.
    > > >
    > > > > > It becomes confusing, IMHO, if you have to change which secondary LSM you
    > > > > > stack with SELinux to enable a cgroup feature.
    > > > >
    > > > > So you're saying selinux without capabilities should still be able to
    > > > > use dev_cgroup? (Just making sure I understand right)
    > > >
    > > > Nope, SELinux always stacks with capabilities, so havng the cgroup hooks
    > > > in capabilities makes sense (rather than having us change the secondary
    > > > stacking LSM just to enable a feature).

    > >
    > > Oh, ok.
    > >
    > > Will let the patch stand until Pavel and Greg comment then.

    >
    > My main question was why was that file in the kernel/ directory?
    > Shouldn't that also be in the security/ directory?


    I'm using cgroups to track the tasks which should have their device
    permissions restricted. Right now cgroups are all under kernel/.

    > And to be honest, I didn't really look at it at all other than the
    > diffstat to make sure you weren't messing with the kobj_map stuff
    > anymore
    >
    > thanks,
    >
    > greg k-h
    > --
    > To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
    > the body of a message to majordomo@vger.kernel.org
    > More majordomo info at http://vger.kernel.org/majordomo-info.html

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    Quoting Paul Menage (menage@google.com):
    > On Wed, Mar 12, 2008 at 8:27 PM, Serge E. Hallyn wrote:
    > > Implement a cgroup using the LSM interface to enforce mknod and open
    > > on device files.
    > >
    > > This implements a simple device access whitelist. A whitelist entry
    > > has 4 fields. 'type' is a (all), c (char), or b (block). 'all' means it
    > > applies to all types, all major numbers, and all minor numbers. Major and
    > > minor are obvious. Access is a composition of r (read), w (write), and
    > > m (mknod).
    > >
    > > The root devcgroup starts with rwm to 'all'. A child devcg gets a copy
    > > of the parent. Admins can then add and remove devices to the whitelist.
    > > Once CAP_HOST_ADMIN is introduced it will be needed to add entries as
    > > well or remove entries from another cgroup, though just CAP_SYS_ADMIN
    > > will suffice to remove entries for your own group.
    > >
    > > An entry is added by doing "echo " > devcg.allow,
    > > for instance:
    > >
    > > echo b 7 0 mrw > /cgroups/1/devcg.allow
    > >
    > > An entry is removed by doing likewise into devcg.deny. Since this is a
    > > pure whitelist, not acls, you can only remove entries which exist in the
    > > whitelist. You must explicitly
    > >
    > > echo a 0 0 mrw > /cgroups/1/devcg.deny
    > >
    > > to remove the "allow all" entry which is automatically inherited from
    > > the root cgroup.

    >
    > In keeping with the naming convention for control groups, "devices"
    > would be better than "devcg".


    Noted, thanks.

    -serge
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    Quoting Paul Menage (menage@google.com):
    > On Wed, Mar 12, 2008 at 8:27 PM, Serge E. Hallyn wrote:
    > >
    > > While composing this with the ns_cgroup may seem logical, it is not
    > > the right thing to do, because updates to /cg/cg1/devcg.deny are
    > > not reflected in /cg/cg1/cg2/devcg.allow.

    >
    > Maybe you should follow up the tree to ensure that all parent groups
    > have access to the device too? Or alternatively, cache the results of
    > this lookup whenever permissions for a device change?


    Yes, I considered that. Alternatively additions to a parent cgroup's
    ..deny could be propagated to all its descendents (but not additions to
    the .allow).

    I've noted this as something to add to the next version.

    > > A task may only be moved to another devcgroup if it is moving to
    > > a direct descendent of its current devcgroup.

    >
    > What's the rationale for that?


    To prevent it escaping to laxer device permissions, which of course only
    makes sense if we do what you recommend above

    > > CAP_NS_OVERRIDE is defined as the capability needed to cross namespaces.
    > > A task needs both CAP_NS_OVERRIDE and CAP_SYS_ADMIN to create a new
    > > devcgroup, update a devcgroup's access, or move a task to a new
    > > devcgroup.

    >
    > But this isn't necessarily crossing namespaces. It could be used for
    > device control in the same namespace (e.g. allowing a job to access a
    > raw disk for its data storage rather than going through the
    > filesystem).


    Yeah it should be renamed. I want to use the same cap which we would
    use for user namespaces though. CAP_NS_CONT(ainer)? Even though there
    really is no such thing as a 'container'. But that would tie together
    any such privileges for cgroups and namespaces.

    thanks,
    -serge
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    On Fri, Mar 14, 2008 at 7:05 AM, Serge E. Hallyn wrote:
    > > > A task may only be moved to another devcgroup if it is moving to
    > > > a direct descendent of its current devcgroup.

    > >
    > > What's the rationale for that?

    >
    > To prevent it escaping to laxer device permissions, which of course only
    > makes sense if we do what you recommend above
    >


    That makes it impossible for a root process to enter a child cgroup,
    do something, and then go back to its own cgroup. Why aren't the
    existing cgroup security semantics sufficient?

    Paul
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [RFC] cgroups: implement device whitelist lsm (v2)

    Serge E. Hallyn wrote:
    > Quoting Pavel Emelyanov (xemul@openvz.org):
    >> Serge E. Hallyn wrote:
    >>> Quoting James Morris (jmorris@namei.org):
    >>>> On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    >>>>
    >>>>> Quoting James Morris (jmorris@namei.org):
    >>>>>> On Thu, 13 Mar 2008, Serge E. Hallyn wrote:
    >>>>>>
    >>>>>>> True, but while this change simplifies the code a bit, the semantics
    >>>>>>> seem more muddled - devcg will be enforcing when CONFIG_CGROUP_DEV=y
    >>>>>>> and:
    >>>>>>>
    >>>>>>> SECURITY=n or
    >>>>>>> rootplug is enabled
    >>>>>>> capabilities is enabled
    >>>>>>> smack is enabled
    >>>>>>> selinux+capabilities is enabled
    >>>>>> Well, this is how real systems are going to be deployed.
    >>>>> Sorry, do you mean with capabilities?
    >>>> Yes.
    >>>>
    >>>> All Fedora, RHEL, CentOS etc. ship with SELinux+capabilities. I can't
    >>>> imagine not enabling them on other kernels.
    >>>>
    >>>>>> It becomes confusing, IMHO, if you have to change which secondary LSM you
    >>>>>> stack with SELinux to enable a cgroup feature.
    >>>>> So you're saying selinux without capabilities should still be able to
    >>>>> use dev_cgroup? (Just making sure I understand right)
    >>>> Nope, SELinux always stacks with capabilities, so havng the cgroup hooks
    >>>> in capabilities makes sense (rather than having us change the secondary
    >>>> stacking LSM just to enable a feature).
    >>> Oh, ok.
    >>>
    >>> Will let the patch stand until Pavel and Greg comment then.

    >> Well, I saw your previous patch, that was implemented as just another
    >> LSM module and I liked it except for the LSM dependency.

    >
    > James and Stephen agree with your LSM qualms. I suppose we could add


    Thanks!

    > cgroups next to the lsm hooks. I suspect Paul Menage would complain
    > about that (Paul?), and I do think it's silly as they are security
    > questions, not group tracking questions, but if it's what people want
    > I can send out a new patch next week.


    The way I see this is: cgroups provide a common way to group tasks
    and an API for general configuration - that's the controller "face",
    and it's up to the controller to decide where he turns his "back",
    IOW where the hooks are placed. For the memory controller - they are
    injected directly into the mm code. For this controller, I think it
    would be OK to use LSM or about-LSM hooks.

    >> Since this version can happily work w/o LSM, I like it too

    >
    > In an earlier version I asked whether you had any experience with usual
    > # rules per container. Do you have an idea? Right now the whitelist is
    > a straight list we search through linearly. If # rules is generally
    > tiny then I'm inclined to keep it that way...


    The # of rules usually has a linear dependency on the number of containers
    (each of then has to have an access to /dev/null,zero,random at least), so
    having 100 containers we will have to scan through a 300-entries list. I'd
    vote for a hash table or a radix/binary/rb tree for that. Or any other way
    for non-linear search you can provide

    > thanks,
    > -serge
    >


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 1 of 2 1 2 LastLast