[PATCH 0/16 v6] PCI: Linux kernel SR-IOV support - Kernel


Thread: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

  1. [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

    Greetings,

    The following patches add support for the SR-IOV capability to the
    Linux kernel. With these patches, a PCI device that has the capability
    can be turned into multiple devices from the software perspective,
    which will benefit KVM and serve other purposes such as QoS and
    security.

    Changes from v5 to v6:
    1. Update ABI document to include SR-IOV sysfs entries (Greg KH)
    2. Fix two coding style problems (Ingo Molnar)

    ---

    [PATCH 1/16 v6] PCI: remove unnecessary arg of pci_update_resource()
    [PATCH 2/16 v6] PCI: define PCI resource names in an 'enum'
    [PATCH 3/16 v6] PCI: export __pci_read_base
    [PATCH 4/16 v6] PCI: make pci_alloc_child_bus() be able to handle NULL bridge
    [PATCH 5/16 v6] PCI: add a wrapper for resource_alignment()
    [PATCH 6/16 v6] PCI: add a new function to map BAR offset
    [PATCH 7/16 v6] PCI: cleanup pcibios_allocate_resources()
    [PATCH 8/16 v6] PCI: add boot options to reassign resources
    [PATCH 9/16 v6] PCI: add boot option to align MMIO resources
    [PATCH 10/16 v6] PCI: cleanup pci_bus_add_devices()
    [PATCH 11/16 v6] PCI: split a new function from pci_bus_add_devices()
    [PATCH 12/16 v6] PCI: support the SR-IOV capability
    [PATCH 13/16 v6] PCI: reserve bus range for SR-IOV device
    [PATCH 14/16 v6] PCI: document for SR-IOV user and developer
    [PATCH 15/16 v6] PCI: document the SR-IOV sysfs entries
    [PATCH 16/16 v6] PCI: document the new PCI boot parameters

    ---

    The Single Root I/O Virtualization (SR-IOV) capability defined by PCI-SIG
    is intended to enable multiple system software instances to share PCI
    hardware resources. A PCI device that supports this capability can be
    extended to one Physical Function plus multiple Virtual Functions. The
    Physical Function, which can be considered the "real" PCI device,
    reflects the hardware instance and manages all physical resources.
    Virtual Functions are associated with a Physical Function and share
    physical resources with it. Software can control allocation of
    Virtual Functions via registers encapsulated in the capability structure.

    The SR-IOV specification can be found at
    http://www.pcisig.com/members/downlo....0_11Sep07.pdf

    Devices that support SR-IOV are available from the following vendors:
    http://download.intel.com/design/net...Brf/320025.pdf
    http://www.netxen.com/products/chips...ns/NX3031.html
    http://www.neterion.com/products/x3100.html
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. [PATCH 4/16 v6] PCI: make pci_alloc_child_bus() be able to handle NULL bridge

    Make pci_alloc_child_bus() able to allocate buses without bridge
    devices. Some SR-IOV devices can occupy more than one bus number,
    but there are no explicit bridges because the devices have an
    internal routing mechanism.
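
The idea of a bridge-less child bus can be sketched in user space as follows. This is only an illustration, not the kernel code: `struct mock_dev`, `struct mock_bus` and `mock_alloc_child_bus()` are hypothetical stand-ins, and the reference count increment stands in for get_device(). The point is the ordering: setup common to every bus comes first, and the bridge is only dereferenced after the NULL check.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct pci_dev / struct pci_bus. */
struct mock_dev {
    int refcount;
};

struct mock_bus {
    struct mock_bus *parent;
    struct mock_dev *self;      /* bridge device, NULL for a bridge-less bus */
    unsigned char number;
    unsigned char primary;
    unsigned char subordinate;
};

/*
 * Allocate a child bus. Bridge-specific setup is skipped when no bridge
 * is given, mirroring the early return added by the patch.
 */
static struct mock_bus *mock_alloc_child_bus(struct mock_bus *parent,
                                             struct mock_dev *bridge,
                                             unsigned char busnr)
{
    struct mock_bus *child = calloc(1, sizeof(*child));

    if (!child)
        return NULL;

    /* Setup that is common to both cases. */
    child->parent = parent;
    child->number = busnr;
    child->primary = parent->number;
    child->subordinate = 0xff;

    if (!bridge)
        return child;           /* nothing to reference or inherit */

    /* Only touch the bridge once we know it exists. */
    child->self = bridge;
    bridge->refcount++;         /* stands in for get_device() */
    return child;
}
```

A caller that creates the extra buses of an SR-IOV device would pass NULL for the bridge and get back a bus whose `self` pointer stays NULL.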

    Cc: Alex Chiang
    Cc: Grant Grundler
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Jesse Barnes
    Cc: Matthew Wilcox
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Signed-off-by: Yu Zhao

    ---
    drivers/pci/probe.c | 7 +++++--
    1 files changed, 5 insertions(+), 2 deletions(-)

    diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
    index db3e5a7..4b12b58 100644
    --- a/drivers/pci/probe.c
    +++ b/drivers/pci/probe.c
    @@ -401,12 +401,10 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
    if (!child)
    return NULL;

    - child->self = bridge;
    child->parent = parent;
    child->ops = parent->ops;
    child->sysdata = parent->sysdata;
    child->bus_flags = parent->bus_flags;
    - child->bridge = get_device(&bridge->dev);

    /* initialize some portions of the bus device, but don't register it
    * now as the parent is not properly set up yet. This device will get
    @@ -423,6 +421,11 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
    child->primary = parent->secondary;
    child->subordinate = 0xff;

    + if (!bridge)
    + return child;
    +
    + child->self = bridge;
    + child->bridge = get_device(&bridge->dev);
    /* Set up default resource pointers and names.. */
    for (i = 0; i < PCI_BRIDGE_RES_NUM; i++) {
    child->resource[i] = &bridge->resource[PCI_BRIDGE_RESOURCES+i];
    --
    1.5.6.4


  3. [PATCH 10/16 v6] PCI: cleanup pci_bus_add_devices()

    This cleanup makes pci_bus_add_devices() easier to read.

    Cc: Alex Chiang
    Cc: Grant Grundler
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Jesse Barnes
    Cc: Matthew Wilcox
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Signed-off-by: Yu Zhao

    ---
    drivers/pci/bus.c | 56 +++++++++++++++++++++++++------------------------
    drivers/pci/remove.c | 2 +
    2 files changed, 31 insertions(+), 27 deletions(-)

    diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
    index 999cc40..7a21602 100644
    --- a/drivers/pci/bus.c
    +++ b/drivers/pci/bus.c
    @@ -71,7 +71,7 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
    }

    /**
    - * add a single device
    + * pci_bus_add_device - add a single device
    * @dev: device to add
    *
    * This adds a single pci device to the global
    @@ -105,7 +105,7 @@ int pci_bus_add_device(struct pci_dev *dev)
    void pci_bus_add_devices(struct pci_bus *bus)
    {
    struct pci_dev *dev;
    - struct pci_bus *child_bus;
    + struct pci_bus *child;
    int retval;

    list_for_each_entry(dev, &bus->devices, bus_list) {
    @@ -120,39 +120,41 @@ void pci_bus_add_devices(struct pci_bus *bus)
    list_for_each_entry(dev, &bus->devices, bus_list) {
    BUG_ON(!dev->is_added);

    + child = dev->subordinate;
    /*
    * If there is an unattached subordinate bus, attach
    * it and then scan for unattached PCI devices.
    */
    - if (dev->subordinate) {
    - if (list_empty(&dev->subordinate->node)) {
    - down_write(&pci_bus_sem);
    - list_add_tail(&dev->subordinate->node,
    - &dev->bus->children);
    - up_write(&pci_bus_sem);
    - }
    - pci_bus_add_devices(dev->subordinate);
    -
    - /* register the bus with sysfs as the parent is now
    - * properly registered. */
    - child_bus = dev->subordinate;
    - if (child_bus->is_added)
    - continue;
    - child_bus->dev.parent = child_bus->bridge;
    - retval = device_register(&child_bus->dev);
    - if (retval)
    - dev_err(&dev->dev, "Error registering pci_bus,"
    - " continuing...\n");
    - else {
    - child_bus->is_added = 1;
    - retval = device_create_file(&child_bus->dev,
    - &dev_attr_cpuaffinity);
    - }
    + if (!child)
    + continue;
    + if (list_empty(&child->node)) {
    + down_write(&pci_bus_sem);
    + list_add_tail(&child->node,
    + &dev->bus->children);
    + up_write(&pci_bus_sem);
    + }
    + pci_bus_add_devices(child);
    +
    + /*
    + * register the bus with sysfs as the parent is now
    + * properly registered.
    + */
    + if (child->is_added)
    + continue;
    + child->dev.parent = child->bridge;
    + retval = device_register(&child->dev);
    + if (retval)
    + dev_err(&dev->dev, "Error registering pci_bus,"
    + " continuing...\n");
    + else {
    + child->is_added = 1;
    + retval = device_create_file(&child->dev,
    + &dev_attr_cpuaffinity);
    if (retval)
    dev_err(&dev->dev, "Error creating cpuaffinity"
    " file, continuing...\n");

    - retval = device_create_file(&child_bus->dev,
    + retval = device_create_file(&child->dev,
    &dev_attr_cpulistaffinity);
    if (retval)
    dev_err(&dev->dev,
    diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
    index 042e089..bfa0869 100644
    --- a/drivers/pci/remove.c
    +++ b/drivers/pci/remove.c
    @@ -72,6 +72,8 @@ void pci_remove_bus(struct pci_bus *pci_bus)
    list_del(&pci_bus->node);
    up_write(&pci_bus_sem);
    pci_remove_legacy_files(pci_bus);
    + if (!pci_bus->is_added)
    + return;
    device_remove_file(&pci_bus->dev, &dev_attr_cpuaffinity);
    device_remove_file(&pci_bus->dev, &dev_attr_cpulistaffinity);
    device_unregister(&pci_bus->dev);
    --
    1.5.6.4


  4. [PATCH 7/16 v6] PCI: cleanup pcibios_allocate_resources()

    This cleanup makes pcibios_allocate_resources() easier to read.
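
The two-pass selection that this function performs can be sketched in isolation. The sketch below is a user-space illustration under stated assumptions: `CMD_IO` and `CMD_MEMORY` carry the values of PCI_COMMAND_IO and PCI_COMMAND_MEMORY, and the helper names are hypothetical. It normalizes the masked command bit to 0 or 1 before comparing it with the pass number, because a raw masked value such as 0x2 would never compare equal to a pass number of 1.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_IO     0x1   /* value of PCI_COMMAND_IO */
#define CMD_MEMORY 0x2   /* value of PCI_COMMAND_MEMORY */

/* True when the decoder matching the resource type is enabled. */
static bool decoder_enabled(uint16_t command, bool is_io)
{
    return (command & (is_io ? CMD_IO : CMD_MEMORY)) != 0;
}

/*
 * Pass 0 handles resources whose decoder is enabled, pass 1 handles the
 * remaining (disabled) ones. This is the inverse of the
 * "if (pass == enabled) continue;" guard, with enabled normalized to 0/1.
 */
static bool handled_in_pass(int pass, uint16_t command, bool is_io)
{
    int enabled = decoder_enabled(command, is_io) ? 1 : 0;

    return pass != enabled;
}
```

With this predicate every resource is handled in exactly one of the two passes, whichever command bit it depends on.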

    Cc: Alex Chiang
    Cc: Grant Grundler
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Jesse Barnes
    Cc: Matthew Wilcox
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Signed-off-by: Yu Zhao

    ---
    arch/x86/pci/i386.c | 28 ++++++++++++++--------------
    1 files changed, 14 insertions(+), 14 deletions(-)

    diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
    index 844df0c..8729bde 100644
    --- a/arch/x86/pci/i386.c
    +++ b/arch/x86/pci/i386.c
    @@ -147,7 +147,7 @@ static void __init pcibios_allocate_bus_resources(struct list_head *bus_list)
    static void __init pcibios_allocate_resources(int pass)
    {
    struct pci_dev *dev = NULL;
    - int idx, disabled;
    + int idx, enabled;
    u16 command;
    struct resource *r, *pr;

    @@ -160,22 +160,22 @@ static void __init pcibios_allocate_resources(int pass)
    if (!r->start) /* Address not assigned at all */
    continue;
    if (r->flags & IORESOURCE_IO)
    - disabled = !(command & PCI_COMMAND_IO);
    + enabled = !!(command & PCI_COMMAND_IO);
    else
    - disabled = !(command & PCI_COMMAND_MEMORY);
    - if (pass == disabled) {
    - dev_dbg(&dev->dev, "resource %#08llx-%#08llx (f=%lx, d=%d, p=%d)\n",
    + enabled = !!(command & PCI_COMMAND_MEMORY);
    + if (pass == enabled)
    + continue;
    + dev_dbg(&dev->dev, "resource %#08llx-%#08llx (f=%lx, d=%d, p=%d)\n",
    (unsigned long long) r->start,
    (unsigned long long) r->end,
    - r->flags, disabled, pass);
    - pr = pci_find_parent_resource(dev, r);
    - if (!pr || request_resource(pr, r) < 0) {
    - dev_err(&dev->dev, "BAR %d: can't allocate resource\n", idx);
    - /* We'll assign a new address later */
    - r->end -= r->start;
    - r->start = 0;
    - }
    - }
    + r->flags, enabled, pass);
    + pr = pci_find_parent_resource(dev, r);
    + if (pr && !request_resource(pr, r))
    + continue;
    + dev_err(&dev->dev, "BAR %d: can't allocate resource\n", idx);
    + /* We'll assign a new address later */
    + r->end -= r->start;
    + r->start = 0;
    }
    if (!pass) {
    r = &dev->resource[PCI_ROM_RESOURCE];
    --
    1.5.6.4


  5. [PATCH 13/16 v6] PCI: reserve bus range for SR-IOV device

    Reserve a bus range for SR-IOV at the device scanning stage.
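
The arithmetic behind the reservation can be sketched as follows. This is a user-space illustration that assumes the routing ID formula used by vf_rid() in this series (PF bus and devfn plus the First VF Offset plus the VF Stride times the VF index); `struct vf_layout` and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the fields the calculation needs. */
struct vf_layout {
    uint8_t  pf_bus;     /* bus number of the Physical Function */
    uint8_t  pf_devfn;   /* device/function number of the PF */
    uint16_t offset;     /* First VF Offset */
    uint16_t stride;     /* VF Stride */
    uint16_t total_vfs;  /* TotalVFs */
};

/* Routing ID (bus in the high byte, devfn in the low byte) of VF number vfn. */
static uint16_t vf_routing_id(const struct vf_layout *l, unsigned int vfn)
{
    return (uint16_t)(((unsigned int)l->pf_bus << 8) + l->pf_devfn +
                      l->offset + l->stride * vfn);
}

/* Number of buses beyond the PF's own bus that the VFs can reach. */
static unsigned int extra_buses(const struct vf_layout *l)
{
    uint8_t last_bus;

    if (l->total_vfs == 0)
        return 0;

    /* The highest-numbered VF determines the highest bus in use. */
    last_bus = (uint8_t)(vf_routing_id(l, l->total_vfs - 1u) >> 8);
    return last_bus > l->pf_bus ? (unsigned int)(last_bus - l->pf_bus) : 0;
}
```

For a PF at bus 1, devfn 0, with offset 0x80 and stride 2, 64 VFs all fit on bus 1, while 128 VFs spill onto bus 2, so one extra bus has to be reserved.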

    Cc: Alex Chiang
    Cc: Grant Grundler
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Jesse Barnes
    Cc: Matthew Wilcox
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Signed-off-by: Yu Zhao

    ---
    drivers/pci/iov.c | 24 ++++++++++++++++++++++++
    drivers/pci/pci.h | 5 +++++
    drivers/pci/probe.c | 3 +++
    3 files changed, 32 insertions(+), 0 deletions(-)

    diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
    index dd299aa..c86bd54 100644
    --- a/drivers/pci/iov.c
    +++ b/drivers/pci/iov.c
    @@ -498,6 +498,30 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno,
    }

    /**
    + * pci_iov_bus_range - find bus range used by SR-IOV capability
    + * @bus: the PCI bus
    + *
    + * Returns max number of buses (exclude current one) used by Virtual
    + * Functions.
    + */
    +int pci_iov_bus_range(struct pci_bus *bus)
    +{
    + int max = 0;
    + u8 busnr, devfn;
    + struct pci_dev *dev;
    +
    + list_for_each_entry(dev, &bus->devices, bus_list) {
    + if (!dev->iov)
    + continue;
    + vf_rid(dev, dev->iov->totalvfs - 1, &busnr, &devfn);
    + if (busnr > max)
    + max = busnr;
    + }
    +
    + return max ? max - bus->number : 0;
    +}
    +
    +/**
    * pci_iov_register - register SR-IOV service
    * @dev: the PCI device
    * @callback: callback function for SR-IOV events
    diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
    index 7735d92..5206ae7 100644
    --- a/drivers/pci/pci.h
    +++ b/drivers/pci/pci.h
    @@ -204,6 +204,7 @@ void pci_iov_remove_sysfs(struct pci_dev *dev);
    extern int pci_iov_resource_align(struct pci_dev *dev, int resno);
    extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
    enum pci_bar_type *type);
    +extern int pci_iov_bus_range(struct pci_bus *bus);
    #else
    static inline int pci_iov_init(struct pci_dev *dev)
    {
    @@ -227,6 +228,10 @@ static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno,
    {
    return 0;
    }
    +static inline int pci_iov_bus_range(struct pci_bus *bus)
    +{
    + return 0;
    +}
    #endif /* CONFIG_PCI_IOV */

    #endif /* DRIVERS_PCI_H */
    diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
    index 18ce9c0..50a1380 100644
    --- a/drivers/pci/probe.c
    +++ b/drivers/pci/probe.c
    @@ -1068,6 +1068,9 @@ unsigned int __devinit pci_scan_child_bus(struct pci_bus *bus)
    for (devfn = 0; devfn < 0x100; devfn += 8)
    pci_scan_slot(bus, devfn);

    + /* Reserve buses for SR-IOV capability. */
    + max += pci_iov_bus_range(bus);
    +
    /*
    * After performing arch-dependent fixup of the bus, look behind
    * all PCI-to-PCI bridges on this bus.
    --
    1.5.6.4


  6. [PATCH 8/16 v6] PCI: add boot options to reassign resources

    This patch adds boot options that let the user reassign the resources
    of all devices under a bus.

    The boot options can be used as:
    pci=assign-mmio=0000:01,assign-pio=0000:02
    '[dddd:]bb' is the domain (optional) and bus number.
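
How such an option value might be matched against a bus can be sketched in user space. The sketch follows the parsing approach of the patch below (try '[dddd:]bb' with a domain first, then fall back to a bare bus number, with ';' separating multiple entries); the function name `bus_in_list()` is hypothetical, and unsigned variables are used because the %x conversion expects them.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if (domain, busnr) appears in a ';'-separated list of
 * '[dddd:]bb' entries, e.g. "0000:01;0000:03" or "01".
 */
static bool bus_in_list(const char *list, unsigned int domain,
                        unsigned int busnr)
{
    const char *str = list;

    while (str && *str) {
        unsigned int d, b;

        /* Try the long form first, then the form without a domain. */
        if (sscanf(str, "%4x:%2x", &d, &b) != 2) {
            if (sscanf(str, "%2x", &b) != 1)
                break;          /* unparsable entry: stop scanning */
            d = 0;              /* domain defaults to 0 */
        }

        if (d == domain && b == busnr)
            return true;

        /* Advance to the entry after the next ';', if any. */
        str = strchr(str, ';');
        if (str)
            str++;
    }

    return false;
}
```

Note that the comma in the example command line separates different pci= options, so each option only ever sees its own value.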

    Cc: Alex Chiang
    Cc: Grant Grundler
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Jesse Barnes
    Cc: Matthew Wilcox
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Signed-off-by: Yu Zhao

    ---
    arch/x86/pci/common.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
    arch/x86/pci/i386.c | 10 ++++---
    arch/x86/pci/pci.h | 3 ++
    3 files changed, 82 insertions(+), 4 deletions(-)

    diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
    index b67732b..06e1ce0 100644
    --- a/arch/x86/pci/common.c
    +++ b/arch/x86/pci/common.c
    @@ -137,6 +137,72 @@ static void __devinit pcibios_fixup_device_resources(struct pci_dev *dev)
    }
    }

    +static char *pci_assign_pio;
    +static char *pci_assign_mmio;
    +
    +static int pcibios_bus_resource_needs_fixup(struct pci_bus *bus)
    +{
    + int i;
    + int type = 0;
    + int domain, busnr;
    +
    + if (!bus->self)
    + return 0;
    +
    + for (i = 0; i < 2; i++) {
    + char *str = i ? pci_assign_pio : pci_assign_mmio;
    +
    + while (str && *str) {
    + if (sscanf(str, "%04x:%02x", &domain, &busnr) != 2) {
    + if (sscanf(str, "%02x", &busnr) != 1)
    + break;
    + domain = 0;
    + }
    +
    + if (pci_domain_nr(bus) == domain &&
    + bus->number == busnr) {
    + type |= i ? IORESOURCE_IO : IORESOURCE_MEM;
    + break;
    + }
    +
    + str = strchr(str, ';');
    + if (str)
    + str++;
    + }
    + }
    +
    + return type;
    +}
    +
    +static void __devinit pcibios_fixup_bus_resources(struct pci_bus *bus)
    +{
    + int i;
    + int type = pcibios_bus_resource_needs_fixup(bus);
    +
    + if (!type)
    + return;
    +
    + for (i = 0; i < PCI_BUS_NUM_RESOURCES; i++) {
    + struct resource *res = bus->resource[i];
    +
    + if (!res)
    + continue;
    + if (res->flags & type)
    + res->flags = 0;
    + }
    +}
    +
    +int pcibios_resource_needs_fixup(struct pci_dev *dev, int resno)
    +{
    + struct pci_bus *bus;
    +
    + for (bus = dev->bus; bus && bus != pci_root_bus; bus = bus->parent)
    + if (pcibios_bus_resource_needs_fixup(bus))
    + return 1;
    +
    + return 0;
    +}
    +
    /*
    * Called after each bus is probed, but before its children
    * are examined.
    @@ -147,6 +213,7 @@ void __devinit pcibios_fixup_bus(struct pci_bus *b)
    struct pci_dev *dev;

    pci_read_bridge_bases(b);
    + pcibios_fixup_bus_resources(b);
    list_for_each_entry(dev, &b->devices, bus_list)
    pcibios_fixup_device_resources(dev);
    }
    @@ -519,6 +586,12 @@ char * __devinit pcibios_setup(char *str)
    } else if (!strcmp(str, "skip_isa_align")) {
    pci_probe |= PCI_CAN_SKIP_ISA_ALIGN;
    return NULL;
    + } else if (!strncmp(str, "assign-pio=", 11)) {
    + pci_assign_pio = str + 11;
    + return NULL;
    + } else if (!strncmp(str, "assign-mmio=", 12)) {
    + pci_assign_mmio = str + 12;
    + return NULL;
    }
    return str;
    }
    diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
    index 8729bde..ea82a5b 100644
    --- a/arch/x86/pci/i386.c
    +++ b/arch/x86/pci/i386.c
    @@ -169,10 +169,12 @@ static void __init pcibios_allocate_resources(int pass)
    (unsigned long long) r->start,
    (unsigned long long) r->end,
    r->flags, enabled, pass);
    - pr = pci_find_parent_resource(dev, r);
    - if (pr && !request_resource(pr, r))
    - continue;
    - dev_err(&dev->dev, "BAR %d: can't allocate resource\n", idx);
    + if (!pcibios_resource_needs_fixup(dev, idx)) {
    + pr = pci_find_parent_resource(dev, r);
    + if (pr && !request_resource(pr, r))
    + continue;
    + dev_err(&dev->dev, "BAR %d: can't allocate resource\n", idx);
    + }
    /* We'll assign a new address later */
    r->end -= r->start;
    r->start = 0;
    diff --git a/arch/x86/pci/pci.h b/arch/x86/pci/pci.h
    index 15b9cf6..f22737d 100644
    --- a/arch/x86/pci/pci.h
    +++ b/arch/x86/pci/pci.h
    @@ -117,6 +117,9 @@ extern int __init pcibios_init(void);
    extern int __init pci_mmcfg_arch_init(void);
    extern void __init pci_mmcfg_arch_free(void);

    +/* pci-common.c */
    +extern int pcibios_resource_needs_fixup(struct pci_dev *dev, int resno);
    +
    /*
    * AMD Fam10h CPUs are buggy, and cannot access MMIO config space
    * on their northbrige except through the * %eax register. As such, you MUST
    --
    1.5.6.4


  7. [PATCH 11/16 v6] PCI: split a new function from pci_bus_add_devices()

    This patch splits a new function out of pci_bus_add_devices(). The new
    function can be used to register a PCI bus with the device core and
    create its sysfs entries.

    Cc: Alex Chiang
    Cc: Grant Grundler
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Jesse Barnes
    Cc: Matthew Wilcox
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Signed-off-by: Yu Zhao

    ---
    drivers/pci/bus.c | 47 ++++++++++++++++++++++++++++-------------------
    include/linux/pci.h | 1 +
    2 files changed, 29 insertions(+), 19 deletions(-)

    diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
    index 7a21602..1713c35 100644
    --- a/drivers/pci/bus.c
    +++ b/drivers/pci/bus.c
    @@ -91,6 +91,32 @@ int pci_bus_add_device(struct pci_dev *dev)
    }

    /**
    + * pci_bus_add_child - add a child bus
    + * @bus: bus to add
    + *
    + * This adds sysfs entries for a single bus
    + */
    +int pci_bus_add_child(struct pci_bus *bus)
    +{
    + int retval;
    +
    + if (bus->bridge)
    + bus->dev.parent = bus->bridge;
    +
    + retval = device_register(&bus->dev);
    + if (retval)
    + return retval;
    +
    + bus->is_added = 1;
    +
    + retval = device_create_file(&bus->dev, &dev_attr_cpuaffinity);
    + if (retval)
    + return retval;
    +
    + return device_create_file(&bus->dev, &dev_attr_cpulistaffinity);
    +}
    +
    +/**
    * pci_bus_add_devices - insert newly discovered PCI devices
    * @bus: bus to check for new devices
    *
    @@ -141,26 +167,9 @@ void pci_bus_add_devices(struct pci_bus *bus)
    */
    if (child->is_added)
    continue;
    - child->dev.parent = child->bridge;
    - retval = device_register(&child->dev);
    + retval = pci_bus_add_child(child);
    if (retval)
    - dev_err(&dev->dev, "Error registering pci_bus,"
    - " continuing...\n");
    - else {
    - child->is_added = 1;
    - retval = device_create_file(&child->dev,
    - &dev_attr_cpuaffinity);
    - if (retval)
    - dev_err(&dev->dev, "Error creating cpuaffinity"
    - " file, continuing...\n");
    -
    - retval = device_create_file(&child->dev,
    - &dev_attr_cpulistaffinity);
    - if (retval)
    - dev_err(&dev->dev,
    - "Error creating cpulistaffinity"
    - " file, continuing...\n");
    - }
    + dev_err(&dev->dev, "Error adding bus, continuing\n");
    }
    }

    diff --git a/include/linux/pci.h b/include/linux/pci.h
    index 6ac69af..80d88f8 100644
    --- a/include/linux/pci.h
    +++ b/include/linux/pci.h
    @@ -528,6 +528,7 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
    void pci_device_add(struct pci_dev *dev, struct pci_bus *bus);
    unsigned int pci_scan_child_bus(struct pci_bus *bus);
    int __must_check pci_bus_add_device(struct pci_dev *dev);
    +int pci_bus_add_child(struct pci_bus *bus);
    void pci_read_bridge_bases(struct pci_bus *child);
    struct resource *pci_find_parent_resource(const struct pci_dev *dev,
    struct resource *res);
    --
    1.5.6.4


  8. [PATCH 9/16 v6] PCI: add boot option to align MMIO resources

    This patch adds a boot option to align the MMIO resources of a device.
    The alignment is the larger of PAGE_SIZE and the resource size.

    The boot option can be used as:
    pci=align-mmio=0000:01:02.3
    '[0000:]01:02.3' is the domain (optional), bus, device and function
    number of the device.
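
The two pieces described above, parsing the device address and picking the larger alignment, can be sketched in user space. The names `struct pci_addr`, `parse_pci_addr()` and `effective_alignment()` are hypothetical; `DEVFN()` uses the same bit layout as the kernel's PCI_DEVFN() macro (slot in bits 7..3, function in bits 2..0).

```c
#include <stdbool.h>
#include <stdio.h>

/* Same bit layout as the kernel's PCI_DEVFN(). */
#define DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Hypothetical parsed form of a '[dddd:]bb:dd.f' address. */
struct pci_addr {
    unsigned int domain;
    unsigned int bus;
    unsigned int devfn;
};

/* Parse '[dddd:]bb:dd.f'; the domain defaults to 0 when omitted. */
static bool parse_pci_addr(const char *str, struct pci_addr *out)
{
    unsigned int domain, bus, slot, func;

    if (sscanf(str, "%4x:%2x:%2x.%u", &domain, &bus, &slot, &func) != 4) {
        if (sscanf(str, "%2x:%2x.%u", &bus, &slot, &func) != 3)
            return false;
        domain = 0;
    }

    out->domain = domain;
    out->bus = bus;
    out->devfn = DEVFN(slot, func);
    return true;
}

/* The effective alignment is the larger of the two requirements. */
static unsigned long effective_alignment(unsigned long res_align,
                                         unsigned long requested_align)
{
    return res_align > requested_align ? res_align : requested_align;
}
```

A 256-byte BAR on a listed device would therefore be aligned to the page size, while a BAR larger than a page keeps its natural alignment.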

    Cc: Alex Chiang
    Cc: Grant Grundler
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Jesse Barnes
    Cc: Matthew Wilcox
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Signed-off-by: Yu Zhao

    ---
    arch/x86/pci/common.c | 37 +++++++++++++++++++++++++++++++++++++
    drivers/pci/pci.c | 20 ++++++++++++++++++--
    include/linux/pci.h | 1 +
    3 files changed, 56 insertions(+), 2 deletions(-)

    diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
    index 06e1ce0..3c5d230 100644
    --- a/arch/x86/pci/common.c
    +++ b/arch/x86/pci/common.c
    @@ -139,6 +139,7 @@ static void __devinit pcibios_fixup_device_resources(struct pci_dev *dev)

    static char *pci_assign_pio;
    static char *pci_assign_mmio;
    +static char *pci_align_mmio;

    static int pcibios_bus_resource_needs_fixup(struct pci_bus *bus)
    {
    @@ -192,6 +193,36 @@ static void __devinit pcibios_fixup_bus_resources(struct pci_bus *bus)
    }
    }

    +int pcibios_resource_alignment(struct pci_dev *dev, int resno)
    +{
    + int domain, busnr, slot, func;
    + char *str = pci_align_mmio;
    +
    + if (dev->resource[resno].flags & IORESOURCE_IO)
    + return 0;
    +
    + while (str && *str) {
    + if (sscanf(str, "%04x:%02x:%02x.%d",
    + &domain, &busnr, &slot, &func) != 4) {
    + if (sscanf(str, "%02x:%02x.%d",
    + &busnr, &slot, &func) != 3)
    + break;
    + domain = 0;
    + }
    +
    + if (pci_domain_nr(dev->bus) == domain &&
    + dev->bus->number == busnr &&
    + dev->devfn == PCI_DEVFN(slot, func))
    + return PAGE_SIZE;
    +
    + str = strchr(str, ';');
    + if (str)
    + str++;
    + }
    +
    + return 0;
    +}
    +
    int pcibios_resource_needs_fixup(struct pci_dev *dev, int resno)
    {
    struct pci_bus *bus;
    @@ -200,6 +231,9 @@ int pcibios_resource_needs_fixup(struct pci_dev *dev, int resno)
    if (pcibios_bus_resource_needs_fixup(bus))
    return 1;

    + if (pcibios_resource_alignment(dev, resno))
    + return 1;
    +
    return 0;
    }

    @@ -592,6 +626,9 @@ char * __devinit pcibios_setup(char *str)
    } else if (!strncmp(str, "assign-mmio=", 12)) {
    pci_assign_mmio = str + 12;
    return NULL;
    + } else if (!strncmp(str, "align-mmio=", 11)) {
    + pci_align_mmio = str + 11;
    + return NULL;
    }
    return str;
    }
    diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
    index b02167a..11ecd6f 100644
    --- a/drivers/pci/pci.c
    +++ b/drivers/pci/pci.c
    @@ -1015,6 +1015,20 @@ int __attribute__ ((weak)) pcibios_set_pcie_reset_state(struct pci_dev *dev,
    }

    /**
    + * pcibios_resource_alignment - get resource alignment requirement
    + * @dev: the PCI device
    + * @resno: resource number
    + *
    + * Queries the resource alignment from PCI low level code. Returns positive
    + * if there is alignment requirement of the resource, or 0 otherwise.
    + */
    +int __attribute__ ((weak)) pcibios_resource_alignment(struct pci_dev *dev,
    + int resno)
    +{
    + return 0;
    +}
    +
    +/**
    * pci_set_pcie_reset_state - set reset state for device dev
    * @dev: the PCI-E device reset
    * @state: Reset state to enter into
    @@ -1913,12 +1927,14 @@ int pci_select_bars(struct pci_dev *dev, unsigned long flags)
    */
    int pci_resource_alignment(struct pci_dev *dev, int resno)
    {
    - resource_size_t align;
    + resource_size_t align, bios_align;
    struct resource *res = dev->resource + resno;

    + bios_align = pcibios_resource_alignment(dev, resno);
    +
    align = resource_alignment(res);
    if (align)
    - return align;
    + return align > bios_align ? align : bios_align;

    dev_err(&dev->dev, "alignment: invalid resource #%d\n", resno);
    return 0;
    diff --git a/include/linux/pci.h b/include/linux/pci.h
    index 2ada2b6..6ac69af 100644
    --- a/include/linux/pci.h
    +++ b/include/linux/pci.h
    @@ -1121,6 +1121,7 @@ int pcibios_add_platform_entries(struct pci_dev *dev);
    void pcibios_disable_device(struct pci_dev *dev);
    int pcibios_set_pcie_reset_state(struct pci_dev *dev,
    enum pcie_reset_state state);
    +int pcibios_resource_alignment(struct pci_dev *dev, int resno);

    #ifdef CONFIG_PCI_MMCONFIG
    extern void __init pci_mmcfg_early_init(void);
    --
    1.5.6.4


  9. [PATCH 12/16 v6] PCI: support the SR-IOV capability

    Support the Single Root I/O Virtualization (SR-IOV) capability.
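
One detail of the patch that is easy to miss in the diff is how each Virtual Function gets its memory: vf_add() divides the PF's SR-IOV resource evenly by the total number of VFs and gives VF number vfn the vfn-th slice. A minimal user-space sketch of that slicing, with the hypothetical names `struct range` and `vf_bar_slice()`, and assuming total_vfs is non-zero:

```c
#include <stdint.h>

/* Inclusive address range, like the start/end pair of a struct resource. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Slice of the PF's VF BAR resource that belongs to VF number vfn.
 * The caller must ensure total_vfs is non-zero and vfn < total_vfs.
 */
static struct range vf_bar_slice(struct range pf_res, unsigned int total_vfs,
                                 unsigned int vfn)
{
    /* Size of one VF's share of the whole resource. */
    uint64_t size = (pf_res.end - pf_res.start + 1) / total_vfs;
    struct range r;

    r.start = pf_res.start + size * vfn;
    r.end = r.start + size - 1;
    return r;
}
```

For a 1 MiB resource starting at 0x10000000 shared by 4 VFs, each VF receives 256 KiB, and VF 2 covers 0x10080000 through 0x100bffff.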

    Cc: Alex Chiang
    Cc: Grant Grundler
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Jesse Barnes
    Cc: Matthew Wilcox
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Signed-off-by: Yu Zhao

    ---
    drivers/pci/Kconfig | 12 +
    drivers/pci/Makefile | 2 +
    drivers/pci/iov.c | 592 ++++++++++++++++++++++++++++++++++++++++++++++
    drivers/pci/pci-sysfs.c | 4 +
    drivers/pci/pci.c | 14 +
    drivers/pci/pci.h | 48 ++++
    drivers/pci/probe.c | 4 +
    include/linux/pci.h | 39 +++
    include/linux/pci_regs.h | 21 ++
    9 files changed, 736 insertions(+), 0 deletions(-)
    create mode 100644 drivers/pci/iov.c

    diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
    index e1ca425..e7c0836 100644
    --- a/drivers/pci/Kconfig
    +++ b/drivers/pci/Kconfig
    @@ -50,3 +50,15 @@ config HT_IRQ
    This allows native hypertransport devices to use interrupts.

    If unsure say Y.
    +
    +config PCI_IOV
    + bool "PCI SR-IOV support"
    + depends on PCI
    + select PCI_MSI
    + default n
    + help
    + This option allows device drivers to enable Single Root I/O
    + Virtualization. Each Virtual Function's PCI configuration
    + space can be accessed using its own Bus, Device and Function
    + Number (Routing ID). Each Virtual Function also has PCI Memory
    + Space, which is used to map its own register set.
    diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
    index 4b47f4e..abbfcfa 100644
    --- a/drivers/pci/Makefile
    +++ b/drivers/pci/Makefile
    @@ -55,3 +55,5 @@ obj-$(CONFIG_PCI_SYSCALL) += syscall.o
    ifeq ($(CONFIG_PCI_DEBUG),y)
    EXTRA_CFLAGS += -DDEBUG
    endif
    +
    +obj-$(CONFIG_PCI_IOV) += iov.o
    diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
    new file mode 100644
    index 0000000..dd299aa
    --- /dev/null
    +++ b/drivers/pci/iov.c
    @@ -0,0 +1,592 @@
    +/*
    + * drivers/pci/iov.c
    + *
    + * Copyright (C) 2008 Intel Corporation
    + *
    + * PCI Express Single Root I/O Virtualization capability support.
    + */
    +
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include "pci.h"
    +
    +
    +#define iov_config_attr(field) \
    +static ssize_t field##_show(struct device *dev, \
    + struct device_attribute *attr, char *buf) \
    +{ \
    + struct pci_dev *pdev = to_pci_dev(dev); \
    + return sprintf(buf, "%d\n", pdev->iov->field); \
    +}
    +
    +iov_config_attr(status);
    +iov_config_attr(totalvfs);
    +iov_config_attr(initialvfs);
    +iov_config_attr(numvfs);
    +
    +static inline void vf_rid(struct pci_dev *dev, int vfn, u8 *busnr, u8 *devfn)
    +{
    + u16 rid;
    +
    + rid = (dev->bus->number << 8) + dev->devfn +
    + dev->iov->offset + dev->iov->stride * vfn;
    + *busnr = rid >> 8;
    + *devfn = rid & 0xff;
    +}
    +
    +static int vf_add(struct pci_dev *dev, int vfn)
    +{
    + int i;
    + int rc;
    + u8 busnr, devfn;
    + struct pci_dev *vf;
    + struct pci_bus *bus;
    + struct resource *res;
    + resource_size_t size;
    +
    + vf_rid(dev, vfn, &busnr, &devfn);
    +
    + vf = alloc_pci_dev();
    + if (!vf)
    + return -ENOMEM;
    +
    + if (dev->bus->number == busnr)
    + vf->bus = bus = dev->bus;
    + else {
    + list_for_each_entry(bus, &dev->bus->children, node)
    + if (bus->number == busnr) {
    + vf->bus = bus;
    + break;
    + }
    + BUG_ON(!vf->bus);
    + }
    +
    + vf->sysdata = bus->sysdata;
    + vf->dev.parent = dev->dev.parent;
    + vf->dev.bus = dev->dev.bus;
    + vf->devfn = devfn;
    + vf->hdr_type = PCI_HEADER_TYPE_NORMAL;
    + vf->multifunction = 0;
    + vf->vendor = dev->vendor;
    + pci_read_config_word(dev, dev->iov->cap + PCI_IOV_VF_DID, &vf->device);
    + vf->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
    + vf->error_state = pci_channel_io_normal;
    + vf->is_pcie = 1;
    + vf->pcie_type = PCI_EXP_TYPE_ENDPOINT;
    + vf->dma_mask = 0xffffffff;
    +
    + dev_set_name(&vf->dev, "%04x:%02x:%02x.%d", pci_domain_nr(bus),
    + busnr, PCI_SLOT(devfn), PCI_FUNC(devfn));
    +
    + pci_read_config_byte(vf, PCI_REVISION_ID, &vf->revision);
    + vf->class = dev->class;
    + vf->current_state = PCI_UNKNOWN;
    + vf->irq = 0;
    +
    + for (i = 0; i < PCI_IOV_NUM_BAR; i++) {
    + res = dev->resource + PCI_IOV_RESOURCES + i;
    + if (!res->parent)
    + continue;
    + vf->resource[i].name = pci_name(vf);
    + vf->resource[i].flags = res->flags;
    + size = resource_size(res);
    + do_div(size, dev->iov->totalvfs);
    + vf->resource[i].start = res->start + size * vfn;
    + vf->resource[i].end = vf->resource[i].start + size - 1;
    + rc = request_resource(res, &vf->resource[i]);
    + BUG_ON(rc);
    + }
    +
    + vf->subsystem_vendor = dev->subsystem_vendor;
    + pci_read_config_word(vf, PCI_SUBSYSTEM_ID, &vf->subsystem_device);
    +
    + pci_device_add(vf, bus);
    + return pci_bus_add_device(vf);
    +}
    +
    +static void vf_remove(struct pci_dev *dev, int vfn)
    +{
    + u8 busnr, devfn;
    + struct pci_dev *vf;
    +
    + vf_rid(dev, vfn, &busnr, &devfn);
    +
    + vf = pci_get_bus_and_slot(busnr, devfn);
    + if (!vf)
    + return;
    +
    + pci_dev_put(vf);
    + pci_remove_bus_device(vf);
    +}
    +
    +static int iov_enable(struct pci_dev *dev)
    +{
    + int rc;
    + int i, j;
    + u16 ctrl;
    + struct pci_iov *iov = dev->iov;
    +
    + if (!iov->callback)
    + return -ENODEV;
    +
    + if (!iov->numvfs)
    + return -EINVAL;
    +
    + if (iov->status)
    + return 0;
    +
    + rc = iov->callback(dev, PCI_IOV_ENABLE);
    + if (rc)
    + return rc;
    +
    + pci_read_config_word(dev, iov->cap + PCI_IOV_CTRL, &ctrl);
    + ctrl |= (PCI_IOV_CTRL_VFE | PCI_IOV_CTRL_MSE);
    + pci_write_config_word(dev, iov->cap + PCI_IOV_CTRL, ctrl);
    + ssleep(1);
    +
    + for (i = 0; i < iov->numvfs; i++) {
    + rc = vf_add(dev, i);
    + if (rc)
    + goto failed;
    + }
    +
    + iov->status = 1;
    + return 0;
    +
    +failed:
    + for (j = 0; j < i; j++)
    + vf_remove(dev, j);
    +
    + pci_read_config_word(dev, iov->cap + PCI_IOV_CTRL, &ctrl);
    + ctrl &= ~(PCI_IOV_CTRL_VFE | PCI_IOV_CTRL_MSE);
    + pci_write_config_word(dev, iov->cap + PCI_IOV_CTRL, ctrl);
    + ssleep(1);
    +
    + return rc;
    +}
    +
    +static int iov_disable(struct pci_dev *dev)
    +{
    + int i;
    + int rc;
    + u16 ctrl;
    + struct pci_iov *iov = dev->iov;
    +
    + if (!iov->callback)
    + return -ENODEV;
    +
    + if (!iov->status)
    + return 0;
    +
    + rc = iov->callback(dev, PCI_IOV_DISABLE);
    + if (rc)
    + return rc;
    +
    + for (i = 0; i < iov->numvfs; i++)
    + vf_remove(dev, i);
    +
    + pci_read_config_word(dev, iov->cap + PCI_IOV_CTRL, &ctrl);
    + ctrl &= ~(PCI_IOV_CTRL_VFE | PCI_IOV_CTRL_MSE);
    + pci_write_config_word(dev, iov->cap + PCI_IOV_CTRL, ctrl);
    + ssleep(1);
    +
    + iov->status = 0;
    + return 0;
    +}
    +
    +static int iov_set_numvfs(struct pci_dev *dev, int numvfs)
    +{
    + int rc;
    + u16 offset, stride;
    + struct pci_iov *iov = dev->iov;
    +
    + if (!iov->callback)
    + return -ENODEV;
    +
    + if (numvfs < 0 || numvfs > iov->initialvfs || iov->status)
    + return -EINVAL;
    +
    + if (numvfs == iov->numvfs)
    + return 0;
    +
    + rc = iov->callback(dev, PCI_IOV_NUMVFS | numvfs);
    + if (rc)
    + return rc;
    +
    + pci_write_config_word(dev, iov->cap + PCI_IOV_NUM_VF, numvfs);
    + pci_read_config_word(dev, iov->cap + PCI_IOV_VF_OFFSET, &offset);
    + pci_read_config_word(dev, iov->cap + PCI_IOV_VF_STRIDE, &stride);
    + if ((numvfs && !offset) || (numvfs > 1 && !stride))
    + return -EIO;
    +
    + iov->offset = offset;
    + iov->stride = stride;
    + iov->numvfs = numvfs;
    + return 0;
    +}
    +
    +static ssize_t status_store(struct device *dev,
    + struct device_attribute *attr,
    + const char *buf, size_t count)
    +{
    + int rc;
    + long enable;
    + struct pci_dev *pdev = to_pci_dev(dev);
    +
    + rc = strict_strtol(buf, 0, &enable);
    + if (rc)
    + return rc;
    +
    + mutex_lock(&pdev->iov->ops_lock);
    + switch (enable) {
    + case 0:
    + rc = iov_disable(pdev);
    + break;
    + case 1:
    + rc = iov_enable(pdev);
    + break;
    + default:
    + rc = -EINVAL;
    + }
    + mutex_unlock(&pdev->iov->ops_lock);
    +
    + return rc ? rc : count;
    +}
    +
    +static ssize_t numvfs_store(struct device *dev,
    + struct device_attribute *attr,
    + const char *buf, size_t count)
    +{
    + int rc;
    + long numvfs;
    + struct pci_dev *pdev = to_pci_dev(dev);
    +
    + rc = strict_strtol(buf, 0, &numvfs);
    + if (rc)
    + return rc;
    +
    + mutex_lock(&pdev->iov->ops_lock);
    + rc = iov_set_numvfs(pdev, numvfs);
    + mutex_unlock(&pdev->iov->ops_lock);
    +
    + return rc ? rc : count;
    +}
    +
    +static DEVICE_ATTR(totalvfs, S_IRUGO, totalvfs_show, NULL);
    +static DEVICE_ATTR(initialvfs, S_IRUGO, initialvfs_show, NULL);
    +static DEVICE_ATTR(numvfs, S_IWUSR | S_IRUGO, numvfs_show, numvfs_store);
    +static DEVICE_ATTR(enable, S_IWUSR | S_IRUGO, status_show, status_store);
    +
    +static struct attribute *iov_attrs[] = {
    + &dev_attr_totalvfs.attr,
    + &dev_attr_initialvfs.attr,
    + &dev_attr_numvfs.attr,
    + &dev_attr_enable.attr,
    + NULL
    +};
    +
    +static struct attribute_group iov_attr_group = {
    + .attrs = iov_attrs,
    + .name = "iov",
    +};
    +
    +static int iov_alloc_bus(struct pci_bus *bus, int busnr)
    +{
    + int i;
    + int rc;
    + struct pci_dev *dev;
    + struct pci_bus *child;
    +
    + list_for_each_entry(dev, &bus->devices, bus_list)
    + if (dev->iov)
    + break;
    +
    + BUG_ON(!dev->iov);
    + pci_dev_get(dev);
    + mutex_lock(&dev->iov->bus_lock);
    +
    + for (i = bus->number + 1; i <= busnr; i++) {
    + list_for_each_entry(child, &bus->children, node)
    + if (child->number == i)
    + break;
    + if (child->number == i)
    + continue;
    + child = pci_add_new_bus(bus, NULL, i);
    + if (!child)
    + return -ENOMEM;
    +
    + child->subordinate = i;
    + child->dev.parent = bus->bridge;
    + rc = pci_bus_add_child(child);
    + if (rc)
    + return rc;
    + }
    +
    + mutex_unlock(&dev->iov->bus_lock);
    +
    + return 0;
    +}
    +
    +static void iov_release_bus(struct pci_bus *bus)
    +{
    + struct pci_dev *dev, *tmp;
    + struct pci_bus *child, *next;
    +
    + list_for_each_entry(dev, &bus->devices, bus_list)
    + if (dev->iov)
    + break;
    +
    + BUG_ON(!dev->iov);
    + mutex_lock(&dev->iov->bus_lock);
    +
    + list_for_each_entry(tmp, &bus->devices, bus_list)
    + if (tmp->iov && tmp->iov->callback)
    + goto done;
    +
    + list_for_each_entry_safe(child, next, &bus->children, node)
    + if (!child->bridge)
    + pci_remove_bus(child);
    +done:
    + mutex_unlock(&dev->iov->bus_lock);
    + pci_dev_put(dev);
    +}
    +
    +/**
    + * pci_iov_init - initialize device's SR-IOV capability
    + * @dev: the PCI device
    + *
    + * Returns 0 on success, or negative on failure.
    + *
    + * The major differences between Virtual Function and PCI device are:
    + * 1) the device with multiple bus numbers uses internal routing, so
    + * there is no explicit bridge device in this case.
    + * 2) Virtual Function memory spaces are designated by BARs encapsulated
    + * in the capability structure, and the BARs in Virtual Function PCI
    + * configuration space are read-only zero.
    + */
    +int pci_iov_init(struct pci_dev *dev)
    +{
    + int i;
    + int pos;
    + u32 pgsz;
    + u16 ctrl, total, initial, offset, stride;
    + struct pci_iov *iov;
    + struct resource *res;
    +
    + if (!dev->is_pcie || (dev->pcie_type != PCI_EXP_TYPE_RC_END &&
    + dev->pcie_type != PCI_EXP_TYPE_ENDPOINT))
    + return -ENODEV;
    +
    + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_IOV);
    + if (!pos)
    + return -ENODEV;
    +
    + ctrl = pci_ari_enabled(dev) ? PCI_IOV_CTRL_ARI : 0;
    + pci_write_config_word(dev, pos + PCI_IOV_CTRL, ctrl);
    + ssleep(1);
    +
    + pci_read_config_word(dev, pos + PCI_IOV_TOTAL_VF, &total);
    + pci_read_config_word(dev, pos + PCI_IOV_INITIAL_VF, &initial);
    + pci_write_config_word(dev, pos + PCI_IOV_NUM_VF, initial);
    + pci_read_config_word(dev, pos + PCI_IOV_VF_OFFSET, &offset);
    + pci_read_config_word(dev, pos + PCI_IOV_VF_STRIDE, &stride);
    + if (!total || initial > total || (initial && !offset) ||
    + (initial > 1 && !stride))
    + return -EIO;
    +
    + pci_read_config_dword(dev, pos + PCI_IOV_SUP_PGSIZE, &pgsz);
    + i = PAGE_SHIFT > 12 ? PAGE_SHIFT - 12 : 0;
    + pgsz &= ~((1 << i) - 1);
    + if (!pgsz)
    + return -EIO;
    +
    + pgsz &= ~(pgsz - 1);
    + pci_write_config_dword(dev, pos + PCI_IOV_SYS_PGSIZE, pgsz);
    +
    + iov = kzalloc(sizeof(*iov), GFP_KERNEL);
    + if (!iov)
    + return -ENOMEM;
    +
    + iov->cap = pos;
    + iov->totalvfs = total;
    + iov->initialvfs = initial;
    + iov->offset = offset;
    + iov->stride = stride;
    + iov->align = pgsz << 12;
    +
    + for (i = 0; i < PCI_IOV_NUM_BAR; i++) {
    + res = dev->resource + PCI_IOV_RESOURCES + i;
    + pos = iov->cap + PCI_IOV_BAR_0 + i * 4;
    + i += __pci_read_base(dev, pci_bar_unknown, res, pos);
    + if (!res->flags)
    + continue;
    + res->flags &= ~IORESOURCE_SIZEALIGN;
    + res->end = res->start + resource_size(res) * total - 1;
    + }
    +
    + mutex_init(&iov->ops_lock);
    + mutex_init(&iov->bus_lock);
    +
    + dev->iov = iov;
    +
    + return 0;
    +}
    +
    +/**
    + * pci_iov_release - release resources used by SR-IOV capability
    + * @dev: the PCI device
    + */
    +void pci_iov_release(struct pci_dev *dev)
    +{
    + if (!dev->iov)
    + return;
    +
    + mutex_destroy(&dev->iov->ops_lock);
    + mutex_destroy(&dev->iov->bus_lock);
    + kfree(dev->iov);
    + dev->iov = NULL;
    +}
    +
    +/**
    + * pci_iov_create_sysfs - create sysfs for SR-IOV capability
    + * @dev: the PCI device
    + */
    +void pci_iov_create_sysfs(struct pci_dev *dev)
    +{
    + if (!dev->iov)
    + return;
    +
    + sysfs_create_group(&dev->dev.kobj, &iov_attr_group);
    +}
    +
    +/**
    + * pci_iov_remove_sysfs - remove sysfs of SR-IOV capability
    + * @dev: the PCI device
    + */
    +void pci_iov_remove_sysfs(struct pci_dev *dev)
    +{
    + if (!dev->iov)
    + return;
    +
    + sysfs_remove_group(&dev->dev.kobj, &iov_attr_group);
    +}
    +
    +int pci_iov_resource_align(struct pci_dev *dev, int resno)
    +{
    + if (resno < PCI_IOV_RESOURCES || resno > PCI_IOV_RESOURCES_END)
    + return 0;
    +
    + BUG_ON(!dev->iov);
    +
    + return dev->iov->align;
    +}
    +
    +int pci_iov_resource_bar(struct pci_dev *dev, int resno,
    + enum pci_bar_type *type)
    +{
    + if (resno < PCI_IOV_RESOURCES || resno > PCI_IOV_RESOURCES_END)
    + return 0;
    +
    + BUG_ON(!dev->iov);
    +
    + *type = pci_bar_unknown;
    + return dev->iov->cap + PCI_IOV_BAR_0 +
    + 4 * (resno - PCI_IOV_RESOURCES);
    +}
    +
    +/**
    + * pci_iov_register - register SR-IOV service
    + * @dev: the PCI device
    + * @callback: callback function for SR-IOV events
    + *
    + * Returns 0 on success, or negative on failure.
    + */
    +int pci_iov_register(struct pci_dev *dev,
    + int (*callback)(struct pci_dev *, u32))
    +{
    + u8 busnr, devfn;
    + struct pci_iov *iov = dev->iov;
    +
    + if (!iov)
    + return -ENODEV;
    +
    + if (!callback || iov->callback)
    + return -EINVAL;
    +
    + vf_rid(dev, iov->totalvfs - 1, &busnr, &devfn);
    + if (busnr > dev->bus->subordinate)
    + return -EIO;
    +
    + iov->callback = callback;
    + return iov_alloc_bus(dev->bus, busnr);
    +}
    +EXPORT_SYMBOL_GPL(pci_iov_register);
    +
    +/**
    + * pci_iov_unregister - unregister SR-IOV service
    + * @dev: the PCI device
    + */
    +void pci_iov_unregister(struct pci_dev *dev)
    +{
    + struct pci_iov *iov = dev->iov;
    +
    + if (!iov || !iov->callback)
    + return;
    +
    + iov->callback = NULL;
    + iov_release_bus(dev->bus);
    +}
    +EXPORT_SYMBOL_GPL(pci_iov_unregister);
    +
    +/**
    + * pci_iov_enable - enable SR-IOV capability
    + * @dev: the PCI device
    + * @numvfs: number of VFs to be available
    + *
    + * Returns 0 on success, or negative on failure.
    + */
    +int pci_iov_enable(struct pci_dev *dev, int numvfs)
    +{
    + int rc;
    + struct pci_iov *iov = dev->iov;
    +
    + if (!iov)
    + return -ENODEV;
    +
    + if (!iov->callback)
    + return -EINVAL;
    +
    + mutex_lock(&iov->ops_lock);
    + rc = iov_set_numvfs(dev, numvfs);
    + if (rc)
    + goto done;
    + rc = iov_enable(dev);
    +done:
    + mutex_unlock(&iov->ops_lock);
    +
    + return rc;
    +}
    +EXPORT_SYMBOL_GPL(pci_iov_enable);
    +
    +/**
    + * pci_iov_disable - disable SR-IOV capability
    + * @dev: the PCI device
    + *
    + * Should be called upon Physical Function driver removal and on power
    + * state changes. All previously allocated Virtual Functions are reclaimed.
    + */
    +void pci_iov_disable(struct pci_dev *dev)
    +{
    + struct pci_iov *iov = dev->iov;
    +
    + if (!iov || !iov->callback)
    + return;
    +
    + mutex_lock(&iov->ops_lock);
    + iov_disable(dev);
    + mutex_unlock(&iov->ops_lock);
    +}
    +EXPORT_SYMBOL_GPL(pci_iov_disable);
    diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
    index 5c456ab..18881f2 100644
    --- a/drivers/pci/pci-sysfs.c
    +++ b/drivers/pci/pci-sysfs.c
    @@ -847,6 +847,9 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
    /* Active State Power Management */
    pcie_aspm_create_sysfs_dev_files(dev);

    + /* Single Root I/O Virtualization */
    + pci_iov_create_sysfs(dev);
    +
    return 0;
    }

    @@ -932,6 +935,7 @@ static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
    }

    pcie_aspm_remove_sysfs_dev_files(dev);
    + pci_iov_remove_sysfs(dev);
    }

    /**
    diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
    index 11ecd6f..10a43b2 100644
    --- a/drivers/pci/pci.c
    +++ b/drivers/pci/pci.c
    @@ -1936,6 +1936,13 @@ int pci_resource_alignment(struct pci_dev *dev, int resno)
    if (align)
    return align > bios_align ? align : bios_align;

    + if (resno > PCI_ROM_RESOURCE && resno < PCI_BRIDGE_RESOURCES) {
    + /* device specific resource */
    + align = pci_iov_resource_align(dev, resno);
    + if (align)
    + return align > bios_align ? align : bios_align;
    + }
    +
    dev_err(&dev->dev, "alignment: invalid resource #%d\n", resno);
    return 0;
    }
    @@ -1950,12 +1957,19 @@ int pci_resource_alignment(struct pci_dev *dev, int resno)
    */
    int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type)
    {
    + int reg;
    +
    if (resno < PCI_ROM_RESOURCE) {
    *type = pci_bar_unknown;
    return PCI_BASE_ADDRESS_0 + 4 * resno;
    } else if (resno == PCI_ROM_RESOURCE) {
    *type = pci_bar_mem32;
    return dev->rom_base_reg;
    + } else if (resno < PCI_BRIDGE_RESOURCES) {
    + /* device specific resource */
    + reg = pci_iov_resource_bar(dev, resno, type);
    + if (reg)
    + return reg;
    }

    dev_err(&dev->dev, "BAR: invalid resource #%d\n", resno);
    diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
    index d707477..7735d92 100644
    --- a/drivers/pci/pci.h
    +++ b/drivers/pci/pci.h
    @@ -181,4 +181,52 @@ static inline int pci_ari_enabled(struct pci_dev *dev)
    return dev->ari_enabled;
    }

    +/* Single Root I/O Virtualization */
    +struct pci_iov {
    + int cap; /* capability position */
    + int align; /* page size used to map memory space */
    + int status; /* status of SR-IOV */
    + u16 totalvfs; /* total VFs associated with the PF */
    + u16 initialvfs; /* initial VFs associated with the PF */
    + u16 numvfs; /* number of VFs available */
    + u16 offset; /* first VF Routing ID offset */
    + u16 stride; /* following VF stride */
    + struct mutex ops_lock; /* lock for SR-IOV operations */
    + struct mutex bus_lock; /* lock for VF bus */
    + int (*callback)(struct pci_dev *, u32); /* event callback function */
    +};
    +
    +#ifdef CONFIG_PCI_IOV
    +extern int pci_iov_init(struct pci_dev *dev);
    +extern void pci_iov_release(struct pci_dev *dev);
    +void pci_iov_create_sysfs(struct pci_dev *dev);
    +void pci_iov_remove_sysfs(struct pci_dev *dev);
    +extern int pci_iov_resource_align(struct pci_dev *dev, int resno);
    +extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
    + enum pci_bar_type *type);
    +#else
    +static inline int pci_iov_init(struct pci_dev *dev)
    +{
    + return -EIO;
    +}
    +static inline void pci_iov_release(struct pci_dev *dev)
    +{
    +}
    +static inline void pci_iov_create_sysfs(struct pci_dev *dev)
    +{
    +}
    +static inline void pci_iov_remove_sysfs(struct pci_dev *dev)
    +{
    +}
    +static inline int pci_iov_resource_align(struct pci_dev *dev, int resno)
    +{
    + return 0;
    +}
    +static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno,
    + enum pci_bar_type *type)
    +{
    + return 0;
    +}
    +#endif /* CONFIG_PCI_IOV */
    +
    #endif /* DRIVERS_PCI_H */
    diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
    index 4b12b58..18ce9c0 100644
    --- a/drivers/pci/probe.c
    +++ b/drivers/pci/probe.c
    @@ -779,6 +779,7 @@ static int pci_setup_device(struct pci_dev * dev)
    static void pci_release_capabilities(struct pci_dev *dev)
    {
    pci_vpd_release(dev);
    + pci_iov_release(dev);
    }

    /**
    @@ -962,6 +963,9 @@ static void pci_init_capabilities(struct pci_dev *dev)

    /* Alternative Routing-ID Forwarding */
    pci_enable_ari(dev);
    +
    + /* Single Root I/O Virtualization */
    + pci_iov_init(dev);
    }

    void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
    diff --git a/include/linux/pci.h b/include/linux/pci.h
    index 80d88f8..77af7e0 100644
    --- a/include/linux/pci.h
    +++ b/include/linux/pci.h
    @@ -87,6 +87,12 @@ enum {
    /* #6: expansion ROM */
    PCI_ROM_RESOURCE,

    + /* device specific resources */
    +#ifdef CONFIG_PCI_IOV
    + PCI_IOV_RESOURCES,
    + PCI_IOV_RESOURCES_END = PCI_IOV_RESOURCES + PCI_IOV_NUM_BAR - 1,
    +#endif
    +
    /* address space assigned to buses behind the bridge */
    #ifndef PCI_BRIDGE_RES_NUM
    #define PCI_BRIDGE_RES_NUM 4
    @@ -165,6 +171,7 @@ struct pci_cap_saved_state {

    struct pcie_link_state;
    struct pci_vpd;
    +struct pci_iov;

    /*
    * The pci_dev structure is used to describe PCI devices.
    @@ -253,6 +260,7 @@ struct pci_dev {
    struct list_head msi_list;
    #endif
    struct pci_vpd *vpd;
    + struct pci_iov *iov;
    };

    extern struct pci_dev *alloc_pci_dev(void);
    @@ -1147,5 +1155,36 @@ static inline void * pci_ioremap_bar(struct pci_dev *pdev, int bar)
    }
    #endif

    +/* SR-IOV events masks */
    +#define PCI_IOV_NUM_VIRTFN 0x0000FFFFU /* NumVFs to be set */
    +/* SR-IOV events values */
    +#define PCI_IOV_ENABLE 0x00010000U /* SR-IOV enable request */
    +#define PCI_IOV_DISABLE 0x00020000U /* SR-IOV disable request */
    +#define PCI_IOV_NUMVFS 0x00040000U /* SR-IOV NumVFs change request */
    +
    +#ifdef CONFIG_PCI_IOV
    +extern int pci_iov_enable(struct pci_dev *dev, int numvfs);
    +extern void pci_iov_disable(struct pci_dev *dev);
    +extern int pci_iov_register(struct pci_dev *dev,
    + int (*callback)(struct pci_dev *dev, u32 event));
    +extern void pci_iov_unregister(struct pci_dev *dev);
    +#else
    +static inline int pci_iov_enable(struct pci_dev *dev, int numvfs)
    +{
    + return -EIO;
    +}
    +static inline void pci_iov_disable(struct pci_dev *dev)
    +{
    +}
    +static inline int pci_iov_register(struct pci_dev *dev,
    + int (*callback)(struct pci_dev *dev, u32 event))
    +{
    + return -EIO;
    +}
    +static inline void pci_iov_unregister(struct pci_dev *dev)
    +{
    +}
    +#endif /* CONFIG_PCI_IOV */
    +
    #endif /* __KERNEL__ */
    #endif /* LINUX_PCI_H */
    diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
    index eb6686b..1b28b3f 100644
    --- a/include/linux/pci_regs.h
    +++ b/include/linux/pci_regs.h
    @@ -363,6 +363,7 @@
    #define PCI_EXP_TYPE_UPSTREAM 0x5 /* Upstream Port */
    #define PCI_EXP_TYPE_DOWNSTREAM 0x6 /* Downstream Port */
    #define PCI_EXP_TYPE_PCI_BRIDGE 0x7 /* PCI/PCI-X Bridge */
    +#define PCI_EXP_TYPE_RC_END 0x9 /* Root Complex Integrated Endpoint */
    #define PCI_EXP_FLAGS_SLOT 0x0100 /* Slot implemented */
    #define PCI_EXP_FLAGS_IRQ 0x3e00 /* Interrupt message number */
    #define PCI_EXP_DEVCAP 4 /* Device capabilities */
    @@ -434,6 +435,7 @@
    #define PCI_EXT_CAP_ID_DSN 3
    #define PCI_EXT_CAP_ID_PWR 4
    #define PCI_EXT_CAP_ID_ARI 14
    +#define PCI_EXT_CAP_ID_IOV 16

    /* Advanced Error Reporting */
    #define PCI_ERR_UNCOR_STATUS 4 /* Uncorrectable Error Status */
    @@ -551,4 +553,23 @@
    #define PCI_ARI_CTRL_ACS 0x0002 /* ACS Function Groups Enable */
    #define PCI_ARI_CTRL_FG(x) (((x) >> 4) & 7) /* Function Group */

    +/* Single Root I/O Virtualization */
    +#define PCI_IOV_CAP 0x04 /* SR-IOV Capabilities */
    +#define PCI_IOV_CTRL 0x08 /* SR-IOV Control */
    +#define PCI_IOV_CTRL_VFE 0x01 /* VF Enable */
    +#define PCI_IOV_CTRL_MSE 0x08 /* VF Memory Space Enable */
    +#define PCI_IOV_CTRL_ARI 0x10 /* ARI Capable Hierarchy */
    +#define PCI_IOV_STATUS 0x0a /* SR-IOV Status */
    +#define PCI_IOV_INITIAL_VF 0x0c /* Initial VFs */
    +#define PCI_IOV_TOTAL_VF 0x0e /* Total VFs */
    +#define PCI_IOV_NUM_VF 0x10 /* Number of VFs */
    +#define PCI_IOV_FUNC_LINK 0x12 /* Function Dependency Link */
    +#define PCI_IOV_VF_OFFSET 0x14 /* First VF Offset */
    +#define PCI_IOV_VF_STRIDE 0x16 /* Following VF Stride */
    +#define PCI_IOV_VF_DID 0x1a /* VF Device ID */
    +#define PCI_IOV_SUP_PGSIZE 0x1c /* Supported Page Sizes */
    +#define PCI_IOV_SYS_PGSIZE 0x20 /* System Page Size */
    +#define PCI_IOV_BAR_0 0x24 /* VF BAR0 */
    +#define PCI_IOV_NUM_BAR 6 /* Number of VF BARs */
    +
    #endif /* LINUX_PCI_REGS_H */
    --
    1.5.6.4

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/
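
    As a rough illustration of the arithmetic in vf_rid() and vf_add() in
    the patch above, here is a userspace sketch of how a VF's Routing ID
    and its slice of a VF BAR aperture are derived. The struct pf_info type
    and the function names are hypothetical stand-ins, not kernel
    definitions, and the sample values in the note below are made up.

```c
#include <stdint.h>

/* Hypothetical stand-in for the PF fields the patch reads; not a kernel type. */
struct pf_info {
	uint8_t  bus;       /* bus number of the Physical Function */
	uint8_t  devfn;     /* device/function number of the PF */
	uint16_t offset;    /* First VF Offset from the SR-IOV capability */
	uint16_t stride;    /* VF Stride from the SR-IOV capability */
	uint16_t totalvfs;  /* TotalVFs; the VF BAR aperture is sized for this many */
};

/* Routing ID of VF number vfn (0-based), same formula as vf_rid(). */
static uint16_t vf_routing_id(const struct pf_info *pf, unsigned int vfn)
{
	return (uint16_t)((pf->bus << 8) + pf->devfn +
			  pf->offset + pf->stride * vfn);
}

/* Upper byte of the Routing ID is the bus number. */
static uint8_t rid_bus(uint16_t rid)
{
	return (uint8_t)(rid >> 8);
}

/* Lower byte of the Routing ID is the device/function number. */
static uint8_t rid_devfn(uint16_t rid)
{
	return (uint8_t)(rid & 0xff);
}

/*
 * Start address of VF vfn's window inside a VF BAR aperture that begins
 * at 'start' and spans 'len' bytes. As in vf_add(), the aperture is
 * divided evenly by TotalVFs, not by the number of VFs enabled.
 */
static uint64_t vf_bar_start(const struct pf_info *pf, uint64_t start,
			     uint64_t len, unsigned int vfn)
{
	uint64_t per_vf = len / pf->totalvfs;

	return start + per_vf * vfn;
}
```

    With a PF at 01:00.0, a First VF Offset of 0x80 and a stride of 2, VF 0
    gets Routing ID 0x0180 (01:10.0) and VF 64 gets 0x0200, which falls on
    bus 02. That spill onto a higher bus number is the case the bus check
    in pci_iov_register() appears to guard against.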

  10. Re: [PATCH 2/16 v6] PCI: define PCI resource names in an 'enum'

    On Wednesday 22 October 2008 02:40:41 am Yu Zhao wrote:
    > This patch moves all definitions of the PCI resource names to an 'enum',
    > and also replaces some hard-coded resource variables with symbol
    > names. This change eases introduction of device specific resources.


    Thanks for removing a bunch of magic numbers from the code.

    > static void
    > pci_restore_bars(struct pci_dev *dev)
    > {
    > - int i, numres;
    > -
    > - switch (dev->hdr_type) {
    > - case PCI_HEADER_TYPE_NORMAL:
    > - numres = 6;
    > - break;
    > - case PCI_HEADER_TYPE_BRIDGE:
    > - numres = 2;
    > - break;
    > - case PCI_HEADER_TYPE_CARDBUS:
    > - numres = 1;
    > - break;
    > - default:
    > - /* Should never get here, but just in case... */
    > - return;
    > - }
    > + int i;
    >
    > - for (i = 0; i < numres; i++)
    > + for (i = 0; i < PCI_BRIDGE_RESOURCES; i++)
    > pci_update_resource(dev, i);
    > }


    The behavior of this function used to depend on dev->hdr_type. Now
    we don't look at hdr_type at all, so we do the same thing for all
    devices.

    For example, for a CardBus device, we used to call pci_update_resource()
    only for BAR 0; now we call it for resources 0-6 (BARs 0-5 plus the
    expansion ROM).

    Maybe this is safe, but I can't tell from the patch, so I think you
    should explain *why* it's safe in the changelog.
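
    To make the behavior change concrete, here is a small userspace sketch
    that compares how many resources the old and the new loop would visit
    for each header type. The enum and function names are hypothetical; the
    header type values mirror PCI_HEADER_TYPE_*, and BRIDGE_RESOURCES is
    assumed to be 7, as it would be with the enum quoted further down in
    this message. It says nothing about whether the extra updates are
    safe.

```c
/* Values mirror PCI_HEADER_TYPE_NORMAL/BRIDGE/CARDBUS. */
enum { HDR_NORMAL = 0, HDR_BRIDGE = 1, HDR_CARDBUS = 2 };

/* Assumed from the quoted enum: BARs are 0-5, the expansion ROM is 6. */
enum { ROM_RESOURCE = 6, BRIDGE_RESOURCES = 7 };

/* Number of resources the removed switch statement would restore. */
static int old_restore_count(int hdr_type)
{
	switch (hdr_type) {
	case HDR_NORMAL:
		return 6;
	case HDR_BRIDGE:
		return 2;
	case HDR_CARDBUS:
		return 1;
	default:
		return 0;	/* the old code returned without restoring */
	}
}

/* Number of resources the new loop restores: independent of hdr_type. */
static int new_restore_count(int hdr_type)
{
	(void)hdr_type;
	return BRIDGE_RESOURCES;
}
```

    For a CardBus device the count goes from 1 to 7, and for a bridge from
    2 to 7, so the changelog would need to cover both the unimplemented
    BARs and the ROM resource.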

    > +/*
    > + * For PCI devices, the region numbers are assigned this way:
    > + */
    > +enum {
    > + /* #0-5: standard PCI regions */
    > + PCI_STD_RESOURCES,
    > + PCI_STD_RESOURCES_END = 5,
    > +
    > + /* #6: expansion ROM */
    > + PCI_ROM_RESOURCE,
    > +
    > + /* address space assigned to buses behind the bridge */
    > +#ifndef PCI_BRIDGE_RES_NUM
    > +#define PCI_BRIDGE_RES_NUM 4
    > +#endif
    > + PCI_BRIDGE_RESOURCES,
    > + PCI_BRIDGE_RES_END = PCI_BRIDGE_RESOURCES + PCI_BRIDGE_RES_NUM - 1,


    Since you used "PCI_STD_RESOURCES_END" above, maybe you should use
    "PCI_BRIDGE_RESOURCES_END" instead of "PCI_BRIDGE_RES_END".

    Bjorn

  11. Re: [PATCH 8/16 v6] PCI: add boot options to reassign resources

    On Wednesday 22 October 2008 02:43:03 am Yu Zhao wrote:
    > This patch adds boot options so user can reassign device resources
    > of all devices under a bus.
    >
    > The boot options can be used as:
    > pci=assign-mmio=0000:01,assign-pio=0000:02
    > '[dddd:]bb' is the domain and bus number.


    I think this example is incorrect because you look for ";" to
    separate options, not ",".
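
    For reference, the list walk in pcibios_bus_resource_needs_fixup() can
    be tried out in userspace. The sketch below copies the sscanf()/strchr()
    logic from the quoted patch into a hypothetical helper, bus_in_list();
    it only models how one option's value is scanned, not how the kernel
    splits the "pci=" string before pcibios_setup() sees it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if (domain, busnr) appears in 'list', a string of
 * "[dddd:]bb" entries separated by ';', and 0 otherwise.
 * Mirrors the loop in the quoted patch; the name is hypothetical.
 */
static int bus_in_list(const char *list, int domain, int busnr)
{
	const char *str = list;
	unsigned int d, b;

	while (str && *str) {
		/* Try "dddd:bb" first, then fall back to a bare "bb". */
		if (sscanf(str, "%04x:%02x", &d, &b) != 2) {
			if (sscanf(str, "%02x", &b) != 1)
				break;
			d = 0;
		}

		if ((int)d == domain && (int)b == busnr)
			return 1;

		/* Advance to the entry after the next ';', if any. */
		str = strchr(str, ';');
		if (str)
			str++;
	}

	return 0;
}
```

    With this helper, "0000:01;0000:02" matches both bus 01 and bus 02,
    while a second entry that follows a "," inside the same value is never
    reached.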

    Bjorn

    > Cc: Alex Chiang
    > Cc: Grant Grundler
    > Cc: Greg KH
    > Cc: Ingo Molnar
    > Cc: Jesse Barnes
    > Cc: Matthew Wilcox
    > Cc: Randy Dunlap
    > Cc: Roland Dreier
    > Signed-off-by: Yu Zhao
    >
    > ---
    > arch/x86/pci/common.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
    > arch/x86/pci/i386.c | 10 ++++---
    > arch/x86/pci/pci.h | 3 ++
    > 3 files changed, 82 insertions(+), 4 deletions(-)
    >
    > diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
    > index b67732b..06e1ce0 100644
    > --- a/arch/x86/pci/common.c
    > +++ b/arch/x86/pci/common.c
    > @@ -137,6 +137,72 @@ static void __devinit pcibios_fixup_device_resources(struct pci_dev *dev)
    > }
    > }
    >
    > +static char *pci_assign_pio;
    > +static char *pci_assign_mmio;
    > +
    > +static int pcibios_bus_resource_needs_fixup(struct pci_bus *bus)
    > +{
    > + int i;
    > + int type = 0;
    > + int domain, busnr;
    > +
    > + if (!bus->self)
    > + return 0;
    > +
    > + for (i = 0; i < 2; i++) {
    > + char *str = i ? pci_assign_pio : pci_assign_mmio;
    > +
    > + while (str && *str) {
    > + if (sscanf(str, "%04x:%02x", &domain, &busnr) != 2) {
    > + if (sscanf(str, "%02x", &busnr) != 1)
    > + break;
    > + domain = 0;
    > + }
    > +
    > + if (pci_domain_nr(bus) == domain &&
    > + bus->number == busnr) {
    > + type |= i ? IORESOURCE_IO : IORESOURCE_MEM;
    > + break;
    > + }
    > +
    > + str = strchr(str, ';');
    > + if (str)
    > + str++;
    > + }
    > + }
    > +
    > + return type;
    > +}
    > +
    > +static void __devinit pcibios_fixup_bus_resources(struct pci_bus *bus)
    > +{
    > + int i;
    > + int type = pcibios_bus_resource_needs_fixup(bus);
    > +
    > + if (!type)
    > + return;
    > +
    > + for (i = 0; i < PCI_BUS_NUM_RESOURCES; i++) {
    > + struct resource *res = bus->resource[i];
    > +
    > + if (!res)
    > + continue;
    > + if (res->flags & type)
    > + res->flags = 0;
    > + }
    > +}
    > +
    > +int pcibios_resource_needs_fixup(struct pci_dev *dev, int resno)
    > +{
    > + struct pci_bus *bus;
    > +
    > + for (bus = dev->bus; bus && bus != pci_root_bus; bus = bus->parent)
    > + if (pcibios_bus_resource_needs_fixup(bus))
    > + return 1;
    > +
    > + return 0;
    > +}
    > +
    > /*
    > * Called after each bus is probed, but before its children
    > * are examined.
    > @@ -147,6 +213,7 @@ void __devinit pcibios_fixup_bus(struct pci_bus *b)
    > struct pci_dev *dev;
    >
    > pci_read_bridge_bases(b);
    > + pcibios_fixup_bus_resources(b);
    > list_for_each_entry(dev, &b->devices, bus_list)
    > pcibios_fixup_device_resources(dev);
    > }
    > @@ -519,6 +586,12 @@ char * __devinit pcibios_setup(char *str)
    > } else if (!strcmp(str, "skip_isa_align")) {
    > pci_probe |= PCI_CAN_SKIP_ISA_ALIGN;
    > return NULL;
    > + } else if (!strncmp(str, "assign-pio=", 11)) {
    > + pci_assign_pio = str + 11;
    > + return NULL;
    > + } else if (!strncmp(str, "assign-mmio=", 12)) {
    > + pci_assign_mmio = str + 12;
    > + return NULL;
    > }
    > return str;
    > }
    > diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
    > index 8729bde..ea82a5b 100644
    > --- a/arch/x86/pci/i386.c
    > +++ b/arch/x86/pci/i386.c
    > @@ -169,10 +169,12 @@ static void __init pcibios_allocate_resources(int pass)
    > (unsigned long long) r->start,
    > (unsigned long long) r->end,
    > r->flags, enabled, pass);
    > - pr = pci_find_parent_resource(dev, r);
    > - if (pr && !request_resource(pr, r))
    > - continue;
    > - dev_err(&dev->dev, "BAR %d: can't allocate resource\n", idx);
    > + if (!pcibios_resource_needs_fixup(dev, idx)) {
    > + pr = pci_find_parent_resource(dev, r);
    > + if (pr && !request_resource(pr, r))
    > + continue;
    > + dev_err(&dev->dev, "BAR %d: can't allocate resource\n", idx);
    > + }
    > /* We'll assign a new address later */
    > r->end -= r->start;
    > r->start = 0;
    > diff --git a/arch/x86/pci/pci.h b/arch/x86/pci/pci.h
    > index 15b9cf6..f22737d 100644
    > --- a/arch/x86/pci/pci.h
    > +++ b/arch/x86/pci/pci.h
    > @@ -117,6 +117,9 @@ extern int __init pcibios_init(void);
    > extern int __init pci_mmcfg_arch_init(void);
    > extern void __init pci_mmcfg_arch_free(void);
    >
    > +/* pci-common.c */
    > +extern int pcibios_resource_needs_fixup(struct pci_dev *dev, int resno);
    > +
    > /*
    > * AMD Fam10h CPUs are buggy, and cannot access MMIO config space
    > * on their northbrige except through the * %eax register. As such, you MUST




  12. Re: [PATCH 9/16 v6] PCI: add boot option to align MMIO resources

    On Wednesday 22 October 2008 02:43:24 am Yu Zhao wrote:
    > This patch adds boot option to align MMIO resource for a device.
    > The alignment is a bigger value between the PAGE_SIZE and the
    > resource size.


    It looks like this forces alignment on PAGE_SIZE, not "a bigger
    value between the PAGE_SIZE and the resource size." Can you
    clarify the changelog to specify exactly what alignment this
    option forces?

    > The boot option can be used as:
    > pci=align-mmio=0000:01:02.3
    > '[0000:]01:02.3' is the domain, bus, device and function number
    > of the device.


    I think you also support using multiple "align-mmio=DDDD:BB:dd.f"
    options separated by ";", but I had to read the code to figure that
    out. Can you give an example of this in the changelog and the
    kernel-parameters.txt patch?
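
    A userspace sketch of the matching loop in pcibios_resource_alignment()
    shows both points: several devices can be listed with ";" between them,
    and a match yields the page size regardless of the resource size. The
    helper name mmio_alignment() and the page_size parameter are
    hypothetical, introduced only so the logic can run outside the kernel;
    the I/O-port check from the patch is left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return page_size if the device (domain, busnr, slot, func) appears in
 * 'list', a string of "[dddd:]bb:dd.f" entries separated by ';'.
 * Return 0 when the device is not listed. Mirrors the quoted loop.
 */
static unsigned long mmio_alignment(const char *list, int domain, int busnr,
				    int slot, int func,
				    unsigned long page_size)
{
	const char *str = list;
	unsigned int d, b, s;
	int f;

	while (str && *str) {
		/* Try the full form first, then the form without a domain. */
		if (sscanf(str, "%04x:%02x:%02x.%d", &d, &b, &s, &f) != 4) {
			if (sscanf(str, "%02x:%02x.%d", &b, &s, &f) != 3)
				break;
			d = 0;
		}

		if ((int)d == domain && (int)b == busnr &&
		    (int)s == slot && f == func)
			return page_size;	/* always the page size */

		/* Advance to the entry after the next ';', if any. */
		str = strchr(str, ';');
		if (str)
			str++;
	}

	return 0;
}
```

    A value such as "0000:01:02.3;04:00.0" would select both 0000:01:02.3
    and 0000:04:00.0 under this logic, which is the kind of example the
    changelog could spell out.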

    Bjorn

    > Cc: Alex Chiang
    > Cc: Grant Grundler
    > Cc: Greg KH
    > Cc: Ingo Molnar
    > Cc: Jesse Barnes
    > Cc: Matthew Wilcox
    > Cc: Randy Dunlap
    > Cc: Roland Dreier
    > Signed-off-by: Yu Zhao
    >
    > ---
    > arch/x86/pci/common.c | 37 +++++++++++++++++++++++++++++++++++++
    > drivers/pci/pci.c | 20 ++++++++++++++++++--
    > include/linux/pci.h | 1 +
    > 3 files changed, 56 insertions(+), 2 deletions(-)
    >
    > diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
    > index 06e1ce0..3c5d230 100644
    > --- a/arch/x86/pci/common.c
    > +++ b/arch/x86/pci/common.c
    > @@ -139,6 +139,7 @@ static void __devinit pcibios_fixup_device_resources(struct pci_dev *dev)
    >
    > static char *pci_assign_pio;
    > static char *pci_assign_mmio;
    > +static char *pci_align_mmio;
    >
    > static int pcibios_bus_resource_needs_fixup(struct pci_bus *bus)
    > {
    > @@ -192,6 +193,36 @@ static void __devinit pcibios_fixup_bus_resources(struct pci_bus *bus)
    > }
    > }
    >
    > +int pcibios_resource_alignment(struct pci_dev *dev, int resno)
    > +{
    > + int domain, busnr, slot, func;
    > + char *str = pci_align_mmio;
    > +
    > + if (dev->resource[resno].flags & IORESOURCE_IO)
    > + return 0;
    > +
    > + while (str && *str) {
    > + if (sscanf(str, "%04x:%02x:%02x.%d",
    > + &domain, &busnr, &slot, &func) != 4) {
    > + if (sscanf(str, "%02x:%02x.%d",
    > + &busnr, &slot, &func) != 3)
    > + break;
    > + domain = 0;
    > + }
    > +
    > + if (pci_domain_nr(dev->bus) == domain &&
    > + dev->bus->number == busnr &&
    > + dev->devfn == PCI_DEVFN(slot, func))
    > + return PAGE_SIZE;
    > +
    > + str = strchr(str, ';');
    > + if (str)
    > + str++;
    > + }
    > +
    > + return 0;
    > +}
    > +
    > int pcibios_resource_needs_fixup(struct pci_dev *dev, int resno)
    > {
    > struct pci_bus *bus;
    > @@ -200,6 +231,9 @@ int pcibios_resource_needs_fixup(struct pci_dev *dev, int resno)
    > if (pcibios_bus_resource_needs_fixup(bus))
    > return 1;
    >
    > + if (pcibios_resource_alignment(dev, resno))
    > + return 1;
    > +
    > return 0;
    > }
    >
    > @@ -592,6 +626,9 @@ char * __devinit pcibios_setup(char *str)
    > } else if (!strncmp(str, "assign-mmio=", 12)) {
    > pci_assign_mmio = str + 12;
    > return NULL;
    > + } else if (!strncmp(str, "align-mmio=", 11)) {
    > + pci_align_mmio = str + 11;
    > + return NULL;
    > }
    > return str;
    > }
    > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
    > index b02167a..11ecd6f 100644
    > --- a/drivers/pci/pci.c
    > +++ b/drivers/pci/pci.c
    > @@ -1015,6 +1015,20 @@ int __attribute__ ((weak)) pcibios_set_pcie_reset_state(struct pci_dev *dev,
    > }
    >
    > /**
    > + * pcibios_resource_alignment - get resource alignment requirement
    > + * @dev: the PCI device
    > + * @resno: resource number
    > + *
    > + * Queries the resource alignment from PCI low level code. Returns positive
    > + * if there is alignment requirement of the resource, or 0 otherwise.
    > + */
    > +int __attribute__ ((weak)) pcibios_resource_alignment(struct pci_dev *dev,
    > + int resno)
    > +{
    > + return 0;
    > +}
    > +
    > +/**
    > * pci_set_pcie_reset_state - set reset state for device dev
    > * @dev: the PCI-E device reset
    > * @state: Reset state to enter into
    > @@ -1913,12 +1927,14 @@ int pci_select_bars(struct pci_dev *dev, unsigned long flags)
    > */
    > int pci_resource_alignment(struct pci_dev *dev, int resno)
    > {
    > - resource_size_t align;
    > + resource_size_t align, bios_align;
    > struct resource *res = dev->resource + resno;
    >
    > + bios_align = pcibios_resource_alignment(dev, resno);
    > +
    > align = resource_alignment(res);
    > if (align)
    > - return align;
    > + return align > bios_align ? align : bios_align;
    >
    > dev_err(&dev->dev, "alignment: invalid resource #%d\n", resno);
    > return 0;
    > diff --git a/include/linux/pci.h b/include/linux/pci.h
    > index 2ada2b6..6ac69af 100644
    > --- a/include/linux/pci.h
    > +++ b/include/linux/pci.h
    > @@ -1121,6 +1121,7 @@ int pcibios_add_platform_entries(struct pci_dev *dev);
    > void pcibios_disable_device(struct pci_dev *dev);
    > int pcibios_set_pcie_reset_state(struct pci_dev *dev,
    > enum pcie_reset_state state);
    > +int pcibios_resource_alignment(struct pci_dev *dev, int resno);
    >
    > #ifdef CONFIG_PCI_MMCONFIG
    > extern void __init pci_mmcfg_early_init(void);




  13. Re: [PATCH 2/16 v6] PCI: define PCI resource names in an 'enum'

    Bjorn Helgaas wrote:
    > On Wednesday 22 October 2008 02:40:41 am Yu Zhao wrote:
    >> This patch moves all definitions of the PCI resource names to an 'enum',
    >> and also replaces some hard-coded resource variables with symbol
    >> names. This change eases introduction of device specific resources.

    >
    > Thanks for removing a bunch of magic numbers from the code.
    >
    >> static void
    >> pci_restore_bars(struct pci_dev *dev)
    >> {
    >> - int i, numres;
    >> -
    >> - switch (dev->hdr_type) {
    >> - case PCI_HEADER_TYPE_NORMAL:
    >> - numres = 6;
    >> - break;
    >> - case PCI_HEADER_TYPE_BRIDGE:
    >> - numres = 2;
    >> - break;
    >> - case PCI_HEADER_TYPE_CARDBUS:
    >> - numres = 1;
    >> - break;
    >> - default:
    >> - /* Should never get here, but just in case... */
    >> - return;
    >> - }
    >> + int i;
    >>
    >> - for (i = 0; i < numres; i++)
    >> + for (i = 0; i < PCI_BRIDGE_RESOURCES; i++)
    >> pci_update_resource(dev, i);
    >> }

    >
    > The behavior of this function used to depend on dev->hdr_type. Now
    > we don't look at hdr_type at all, so we do the same thing for all
    > devices.
    >
    > For example, for a CardBus device, we used to call pci_update_resource()
    > only for BAR 0; now we call it for BARs 0-6.
    >
    > Maybe this is safe, but I can't tell from the patch, so I think you
    > should explain *why* it's safe in the changelog.


    It's safe because pci_update_resource() ignores unused resources.
    E.g., for a CardBus device, only BAR 0 is used and its 'flags' is set,
    so pci_update_resource() updates only that one. BARs 1-6 are ignored
    since their 'flags' are 0.

    I'll put more explanation in the changelog.

    Thanks,
    Yu
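
The argument above rests on the update routine returning early for slots whose 'flags' are 0. A minimal sketch of that behaviour follows, using hypothetical user-space stand-ins (`struct sketch_resource`, `sketch_update_resource()`, `sketch_restore_bars()`) rather than the kernel's own types; the slot count of 7 is an assumption taken from the "BARs 0-6" wording in this thread.

```c
#include <stddef.h>

#define SKETCH_NUM_RESOURCES 7   /* slots 0-6; assumed from the thread */

/* Hypothetical, simplified resource slot. */
struct sketch_resource {
    unsigned long flags;   /* 0 marks an unused slot */
    int updated;           /* set to 1 once the slot has been written back */
};

/* Write back one slot; unused slots are skipped. Returns 1 if updated. */
static int sketch_update_resource(struct sketch_resource *res)
{
    if (!res->flags)
        return 0;
    res->updated = 1;
    return 1;
}

/* Walk every slot regardless of header type; return how many were updated. */
static int sketch_restore_bars(struct sketch_resource *res, size_t n)
{
    size_t i;
    int touched = 0;

    for (i = 0; i < n; i++)
        touched += sketch_update_resource(&res[i]);

    return touched;
}
```

For a device with only slot 0 in use, the loop visits all seven slots but updates just one, which is the behaviour the old hdr_type switch produced for CardBus.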

  14. Re: [PATCH 9/16 v6] PCI: add boot option to align MMIO resources

    Bjorn Helgaas wrote:
    > On Wednesday 22 October 2008 02:43:24 am Yu Zhao wrote:
    >> This patch adds boot option to align MMIO resource for a device.
    >> The alignment is a bigger value between the PAGE_SIZE and the
    >> resource size.

    >
    > It looks like this forces alignment on PAGE_SIZE, not "a bigger
    > value between the PAGE_SIZE and the resource size." Can you
    > clarify the changelog to specify exactly what alignment this
    > option forces?


    I think the following hunk answers your question: it returns the
    larger of 'align' and 'bios_align'.

    >> int pci_resource_alignment(struct pci_dev *dev, int resno)
    >> {
    >> - resource_size_t align;
    >> + resource_size_t align, bios_align;
    >> struct resource *res = dev->resource + resno;
    >>
    >> + bios_align = pcibios_resource_alignment(dev, resno);
    >> +
    >> align = resource_alignment(res);
    >> if (align)
    >> - return align;
    >> + return align > bios_align ? align : bios_align;
    >>
    >> dev_err(&dev->dev, "alignment: invalid resource #%d\n", resno);
    >> return 0;
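
To make the effect of the quoted hunk concrete, here is a minimal user-space sketch of the same comparison. `effective_alignment()` and the `resource_size_t` typedef are stand-ins introduced for illustration, and the 4 KiB page size is an assumption.

```c
#include <stdint.h>

typedef uint64_t resource_size_t;      /* stand-in for the kernel type */

#define SKETCH_PAGE_SIZE 4096ULL       /* assumed 4 KiB pages */

/*
 * Mirror of the patched return statement: a valid resource gets the
 * larger of its own alignment and the platform-requested alignment.
 * An alignment of 0 is treated as invalid, as in the quoted code.
 */
static resource_size_t effective_alignment(resource_size_t align,
                                           resource_size_t bios_align)
{
    if (!align)
        return 0;
    return align > bios_align ? align : bios_align;
}
```

Under these assumptions, a resource whose own alignment is 256 bytes is raised to 4096 when the option applies, while one already aligned to 1 MiB keeps 1 MiB. Without the option, `bios_align` is 0 and the resource's own alignment is returned unchanged.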



  15. Re: [PATCH 2/16 v6] PCI: define PCI resource names in an 'enum'

    On Wednesday 22 October 2008 08:44:24 am Yu Zhao wrote:
    > Bjorn Helgaas wrote:
    > > On Wednesday 22 October 2008 02:40:41 am Yu Zhao wrote:
    > >> This patch moves all definitions of the PCI resource names to an 'enum',
    > >> and also replaces some hard-coded resource variables with symbol
    > >> names. This change eases introduction of device specific resources.

    > >
    > > Thanks for removing a bunch of magic numbers from the code.
    > >
    > >> static void
    > >> pci_restore_bars(struct pci_dev *dev)
    > >> {
    > >> - int i, numres;
    > >> -
    > >> - switch (dev->hdr_type) {
    > >> - case PCI_HEADER_TYPE_NORMAL:
    > >> - numres = 6;
    > >> - break;
    > >> - case PCI_HEADER_TYPE_BRIDGE:
    > >> - numres = 2;
    > >> - break;
    > >> - case PCI_HEADER_TYPE_CARDBUS:
    > >> - numres = 1;
    > >> - break;
    > >> - default:
    > >> - /* Should never get here, but just in case... */
    > >> - return;
    > >> - }
    > >> + int i;
    > >>
    > >> - for (i = 0; i < numres; i++)
    > >> + for (i = 0; i < PCI_BRIDGE_RESOURCES; i++)
    > >> pci_update_resource(dev, i);
    > >> }

    > >
    > > The behavior of this function used to depend on dev->hdr_type. Now
    > > we don't look at hdr_type at all, so we do the same thing for all
    > > devices.
    > >
    > > For example, for a CardBus device, we used to call pci_update_resource()
    > > only for BAR 0; now we call it for BARs 0-6.
    > >
    > > Maybe this is safe, but I can't tell from the patch, so I think you
    > > should explain *why* it's safe in the changelog.

    >
    > It's safe because pci_update_resource() will ignore unused resources.
    > E.g., for a Cardbus, only BAR 0 is used and its 'flags' is set, then
    > pci_update_resource() only updates it. BAR 1-6 are ignored since their
    > 'flags' are 0.
    >
    > I'll put more explanation in the changelog.


    This is a logically separate change from merely substituting enum
    names for magic numbers, so you might even consider splitting it
    into a separate patch. Better bisection and all that, you know :-)

    Bjorn

  16. Re: [PATCH 8/16 v6] PCI: add boot options to reassign resources

    Bjorn Helgaas wrote:
    > On Wednesday 22 October 2008 02:43:03 am Yu Zhao wrote:
    >> This patch adds boot options so user can reassign device resources
    >> of all devices under a bus.
    >>
    >> The boot options can be used as:
    >> pci=assign-mmio=0000:01,assign-pio=0000:02
    >> '[dddd:]bb' is the domain and bus number.

    >
    > I think this example is incorrect because you look for ";" to
    > separate options, not ",".


    The semicolon separates multiple values within a single assign-mmio
    or assign-pio option, e.g. 'pci=assign-mmio=0000:01;0001:02;0004:03'.
    The comma separates different options under 'pci='.
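
The two separator levels can be illustrated with a small user-space sketch. `count_entries()` is a hypothetical helper, not part of the patch: it looks for one option among the ','-separated options and counts the ';'-separated values inside it.

```c
#include <string.h>

/*
 * Find `key` (e.g. "assign-mmio=") among the ','-separated options in
 * `opts` and return how many ';'-separated values it carries.
 * Returns 0 if the key is absent or its value is empty.
 */
static int count_entries(const char *opts, const char *key)
{
    size_t klen = strlen(key);

    while (opts && *opts) {
        size_t len = strcspn(opts, ",");   /* length of this option */

        if (len >= klen && !strncmp(opts, key, klen)) {
            const char *val = opts + klen;
            size_t vlen = len - klen;
            size_t i;
            int n;

            if (vlen == 0)
                return 0;

            n = 1;                         /* one value plus one per ';' */
            for (i = 0; i < vlen; i++)
                if (val[i] == ';')
                    n++;
            return n;
        }

        opts += len;                       /* skip this option ... */
        if (*opts == ',')
            opts++;                        /* ... and its trailing comma */
    }

    return 0;
}
```

Given "assign-mmio=0000:01;0001:02;0004:03,assign-pio=0000:02", the sketch finds three values for assign-mmio and one for assign-pio.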

  17. Re: [PATCH 2/16 v6] PCI: define PCI resource names in an 'enum'

    Bjorn Helgaas wrote:
    > On Wednesday 22 October 2008 08:44:24 am Yu Zhao wrote:
    >> Bjorn Helgaas wrote:
    >>> On Wednesday 22 October 2008 02:40:41 am Yu Zhao wrote:
    >>>> This patch moves all definitions of the PCI resource names to an 'enum',
    >>>> and also replaces some hard-coded resource variables with symbol
    >>>> names. This change eases introduction of device specific resources.
    >>> Thanks for removing a bunch of magic numbers from the code.
    >>>
    >>>> static void
    >>>> pci_restore_bars(struct pci_dev *dev)
    >>>> {
    >>>> - int i, numres;
    >>>> -
    >>>> - switch (dev->hdr_type) {
    >>>> - case PCI_HEADER_TYPE_NORMAL:
    >>>> - numres = 6;
    >>>> - break;
    >>>> - case PCI_HEADER_TYPE_BRIDGE:
    >>>> - numres = 2;
    >>>> - break;
    >>>> - case PCI_HEADER_TYPE_CARDBUS:
    >>>> - numres = 1;
    >>>> - break;
    >>>> - default:
    >>>> - /* Should never get here, but just in case... */
    >>>> - return;
    >>>> - }
    >>>> + int i;
    >>>>
    >>>> - for (i = 0; i < numres; i++)
    >>>> + for (i = 0; i < PCI_BRIDGE_RESOURCES; i++)
    >>>> pci_update_resource(dev, i);
    >>>> }
    >>> The behavior of this function used to depend on dev->hdr_type. Now
    >>> we don't look at hdr_type at all, so we do the same thing for all
    >>> devices.
    >>>
    >>> For example, for a CardBus device, we used to call pci_update_resource()
    >>> only for BAR 0; now we call it for BARs 0-6.
    >>>
    >>> Maybe this is safe, but I can't tell from the patch, so I think you
    >>> should explain *why* it's safe in the changelog.

    >> It's safe because pci_update_resource() will ignore unused resources.
    >> E.g., for a Cardbus, only BAR 0 is used and its 'flags' is set, then
    >> pci_update_resource() only updates it. BAR 1-6 are ignored since their
    >> 'flags' are 0.
    >>
    >> I'll put more explanation in the changelog.

    >
    > This is a logically separate change from merely substituting enum
    > names for magic numbers, so you might even consider splitting it
    > into a separate patch. Better bisection and all that, you know :-)


    Will do.

    Thanks,
    Yu

  18. Re: [PATCH 7/16 v6] PCI: cleanup pcibios_allocate_resources()

    On Wed, Oct 22, 2008 at 1:42 AM, Yu Zhao wrote:
    > This cleanup makes pcibios_allocate_resources() easier to read.
    >
    > Cc: Alex Chiang
    > Cc: Grant Grundler
    > Cc: Greg KH
    > Cc: Ingo Molnar
    > Cc: Jesse Barnes
    > Cc: Matthew Wilcox
    > Cc: Randy Dunlap
    > Cc: Roland Dreier
    > Signed-off-by: Yu Zhao
    >
    > ---
    > arch/x86/pci/i386.c | 28 ++++++++++++++--------------
    > 1 files changed, 14 insertions(+), 14 deletions(-)
    >
    > diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
    > index 844df0c..8729bde 100644
    > --- a/arch/x86/pci/i386.c
    > +++ b/arch/x86/pci/i386.c
    > @@ -147,7 +147,7 @@ static void __init pcibios_allocate_bus_resources(struct list_head *bus_list)
    > static void __init pcibios_allocate_resources(int pass)
    > {
    > struct pci_dev *dev = NULL;
    > - int idx, disabled;
    > + int idx, enabled;
    > u16 command;
    > struct resource *r, *pr;
    >
    > @@ -160,22 +160,22 @@ static void __init pcibios_allocate_resources(int pass)
    > if (!r->start) /* Address not assigned at all */
    > continue;
    > if (r->flags & IORESOURCE_IO)
    > - disabled = !(command & PCI_COMMAND_IO);
    > + enabled = command & PCI_COMMAND_IO;
    > else
    > - disabled = !(command & PCI_COMMAND_MEMORY);
    > - if (pass == disabled) {
    > - dev_dbg(&dev->dev, "resource %#08llx-%#08llx (f=%lx, d=%d, p=%d)\n",
    > + enabled = command & PCI_COMMAND_MEMORY;
    > + if (pass == enabled)
    > + continue;


    It seems this changes the flow for MMIO: PCI_COMMAND_MEMORY is 2,
    so 'enabled' can be 2, which never equals 'pass' (0 or 1).

    YH
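
The difference can be checked with a small user-space sketch. The three helpers below are hypothetical and only model the comparison against `pass`. The bit values 0x1 and 0x2 match PCI_COMMAND_IO and PCI_COMMAND_MEMORY, and `pass` is assumed to be 0 or 1. The last helper shows one possible fix (normalizing with `!!`), which is an alternative to simply reverting to 'disabled'.

```c
#define SKETCH_COMMAND_IO     0x1u   /* same value as PCI_COMMAND_IO */
#define SKETCH_COMMAND_MEMORY 0x2u   /* same value as PCI_COMMAND_MEMORY */

/* Original logic: the resource is handled when pass == disabled. */
static int handled_original(int pass, unsigned int command, unsigned int bit)
{
    int disabled = !(command & bit);    /* always 0 or 1 */

    return pass == disabled;
}

/* v6 logic: the resource is skipped when pass == enabled (raw bit value). */
static int handled_v6(int pass, unsigned int command, unsigned int bit)
{
    int enabled = command & bit;        /* 0, 1 or 2: not a boolean */

    return !(pass == enabled);
}

/* Possible fix: normalize the bit test to 0 or 1 before comparing. */
static int handled_normalized(int pass, unsigned int command, unsigned int bit)
{
    int enabled = !!(command & bit);

    return pass != enabled;
}
```

For an MMIO resource with memory decoding enabled, `handled_original()` selects it in pass 0 only, whereas `handled_v6()` selects it in both passes because 2 never equals 0 or 1. The I/O case is unaffected since its bit value is already 1.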

  19. Re: [PATCH 7/16 v6] PCI: cleanup pcibios_allocate_resources()

    On Thu, Oct 23, 2008 at 03:10:26PM +0800, Yinghai Lu wrote:
    > On Wed, Oct 22, 2008 at 1:42 AM, Yu Zhao wrote:
    > > diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
    > > index 844df0c..8729bde 100644
    > > --- a/arch/x86/pci/i386.c
    > > +++ b/arch/x86/pci/i386.c
    > > @@ -147,7 +147,7 @@ static void __init pcibios_allocate_bus_resources(struct list_head *bus_list)
    > > static void __init pcibios_allocate_resources(int pass)
    > > {
    > > struct pci_dev *dev = NULL;
    > > - int idx, disabled;
    > > + int idx, enabled;
    > > u16 command;
    > > struct resource *r, *pr;
    > >
    > > @@ -160,22 +160,22 @@ static void __init pcibios_allocate_resources(int pass)
    > > if (!r->start) /* Address not assigned at all */
    > > continue;
    > > if (r->flags & IORESOURCE_IO)
    > > - disabled = !(command & PCI_COMMAND_IO);
    > > + enabled = command & PCI_COMMAND_IO;
    > > else
    > > - disabled = !(command & PCI_COMMAND_MEMORY);
    > > - if (pass == disabled) {
    > > - dev_dbg(&dev->dev, "resource %#08llx-%#08llx (f=%lx, d=%d, p=%d)\n",
    > > + enabled = command & PCI_COMMAND_MEMORY;
    > > + if (pass == enabled)
    > > + continue;

    >
    > it seems you change the flow here for MMIO
    > because PCI_COMMAND_MEMORY is 2.
    >
    > YH


    Nice find! I will change it back to 'disabled' in the next version.

    Thanks,
    Yu
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

    On Wed, Oct 22, 2008 at 04:38:09PM +0800, Yu Zhao wrote:
    > Greetings,
    >
    > Following patches are intended to support SR-IOV capability in the
    > Linux kernel. With these patches, people can turn a PCI device with
    > the capability into multiple ones from software perspective, which
    > will benefit KVM and achieve other purposes such as QoS, security,
    > and etc.


    Are there any actual users of this API around yet? How was it tested,
    given that there is no hardware to test on? Which drivers are going to
    have to be rewritten to take advantage of this new interface?

    thanks,

    greg k-h
