[PATCH 0/4, v3] Physical PCI slot objects - Kernel

This is a discussion on [PATCH 0/4, v3] Physical PCI slot objects - Kernel ; On Mon, 26 Nov 2007 19:04:54 -0800 Gary Hade wrote: > > Is this patchset appropriate for the -mm tree yet? > > I would defer to our illustrious maintainers on this one. > > > Or do you think ...

+ Reply to Thread
Page 2 of 2 FirstFirst 1 2
Results 21 to 34 of 34

Thread: [PATCH 0/4, v3] Physical PCI slot objects

  1. Re: [PATCH 0/4, v3] Physical PCI slot objects

    On Mon, 26 Nov 2007 19:04:54 -0800
    Gary Hade wrote:

    > > Is this patchset appropriate for the -mm tree yet?

    >
    > I would defer to our illustrious maintainers on this one.
    >
    > > Or do you think it still needs more work?

    >
    > I am now much more comfortable with your changes with respect
    > to acpiphp on the systems I worry about but others may have
    > concerns with respect to the other hotplug drivers, or even
    > acpiphp, on other systems.


    I'm ok with adding it to my quilt tree (and into -mm) after Gary gets
    back to us later this week.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH 0/4, v3] Physical PCI slot objects

    On Tue, Nov 27, 2007 at 11:11:36AM -0800, Kristen Carlson Accardi wrote:
    > On Mon, 26 Nov 2007 19:04:54 -0800
    > Gary Hade wrote:
    >
    > > > Is this patchset appropriate for the -mm tree yet?

    > >
    > > I would defer to our illustrious maintainers on this one.
    > >
    > > > Or do you think it still needs more work?

    > >
    > > I am now much more comfortable with your changes with respect
    > > to acpiphp on the systems I worry about but others may have
    > > concerns with respect to the other hotplug drivers, or even
    > > acpiphp, on other systems.

    >
    > I'm ok with adding it to my quilt tree (and into -mm) after Gary gets
    > back to us later this week.


    I'm getting back to you but unfortunately with not so good
    news. Sorry Alex.

    On the x3950 (configured single node) I encountered the below
    problem when attempting to hotplug a PCIe adapter when 'pci_slot'
    was loaded prior to 'acpiphp'. I did not see the problem when
    the drivers were loaded in the opposite order.

    After the messages appeared, the hotplug operation did not
    complete and I was unable to recover hotplug functionality
    without a reboot. I attempted recovery by unloading both
    drivers and then reloading only 'acpiphp' but even though
    an LED for the slot that had received the problem triggering
    adapter indicated that the slot was receiving power, the
    sysfs slot power file indicated otherwise (contained 0).
    So, I was unable to power off the slot to allow the removal
    and reinsertion of the adapter.

    FYI, the node contains 2 hotpluggable PCIe slots and 5
    non-hotpluggable PCIe slots but 'pci_slot' only exposed
    the 2 hotpluggable slots. This does not appear to be due
    to a 'pci_slot' driver problem since I looked at the DSDT
    and SSDT and found that there are currently no _SUN methods
    for the non-hotpluggable slots.

    Gary

    --
    Gary Hade
    System x Enablement
    IBM Linux Technology Center
    503-578-4503 IBM T/L: 775-4503
    garyhade@us.ibm.com
    http://www.ibm.com/linux/ltc

    invalid opcode: 0000 [1] SMP
    CPU 1
    Modules linked in: acpiphp pci_slot e1000 aic79xx scsi_transport_spi shpchp dock pci_hotplug ipt_LOG xt_limit xt_pkttype button battery ac power_supply ip6t_REJECT xt_tcpudp ipt_REJECT iptable_mangle iptable_filter ip6table_mangle ip_tables ip6table_filter ip6_tables x_tables ipv6 usbhid ff_memless ext3 jbd loop dm_mod ehci_hcd uhci_hcd usbcore ide_cd bnx2 cdrom rng_core reiserfs ata_piix ahci libata thermal processor piix sg megaraid_sas fan edd sd_mod scsi_mod ide_disk ide_core
    Pid: 121, comm: kacpi_notify Not tainted 2.6.24-rc3-gh-smp #1
    RIP: 0010:[] [] ci_slot:__this_module+0x21c4/0xfffffffffffff204
    RSP: 0018:ffff81103fa43ea8 EFLAGS: 00010216
    RAX: ffff81103f944a18 RBX: ffff81103d4fe910 RCX: 000000000000000f
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8110400d13d0
    RBP: ffffffff8032d97b R08: ffff8110400fc7e0 R09: 0000000000000002
    R10: 0000000000000000 R11: ffffffff8021d193 R12: ffff811040105cf0
    R13: ffffffffffffffff R14: ffffffff80635820 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff8110400ed8c0(0000) knlGS:0000000000000000
    CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00002b266d876471 CR3: 000000103c825000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process kacpi_notify (pid: 121, threadinfo ffff81103fa42000, task ffff81103f9f8040)
    Stack: ffffffff8033339c ffff81103d119a00 ffffffff8032d99e ffff81103f9fc540
    ffffffff8024618d ffff81103f9fc540 ffff81103f9fc540 ffffffff8024696c
    ffffffff80246a46 0000000000000000 ffff81103f9f8040 ffffffff80249ada
    Call Trace:
    [] acpi_ev_notify_dispatch+0x57/0x60
    [] acpi_os_execute_notify+0x23/0x2c
    [] run_workqueue+0x7f/0x10b
    [] worker_thread+0x0/0xe4
    [] worker_thread+0xda/0xe4
    [] autoremove_wake_function+0x0/0x2e
    [] kthread+0x47/0x73
    [] child_rip+0xa/0x12
    [] kthread+0x0/0x73
    [] child_rip+0x0/0x12


    Code: ff ff ff ff 40 23 2c 88 ff ff ff ff 00 c8 c6 3b 10 81 ff ff
    RIP [] ci_slot:__this_module+0x21c4/0xfffffffffffff204
    RSP

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH 0/4, v3] Physical PCI slot objects

    On Wed, 28 Nov 2007 13:31:47 -0800
    Gary Hade wrote:

    > FYI, the node contains 2 hotpluggable PCIe slots and 5
    > non-hotpluggable PCIe slots but 'pci_slot' only exposed
    > the 2 hotpluggable slots. This does not appear to be due
    > to a 'pci_slot' driver problem since I looked at the DSDT
    > and SSDT and found that there are currently no _SUN methods
    > for the non-hotpluggable slots.


    Thanks for testing Gary. I would think this situation would be the
    common case, since I doubt most firmware writers would bother to
    implement _SUN for non-hotpluggable slots -- at least on other DSDT
    I've seen this has been the case as well.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH 0/4, v3] Physical PCI slot objects

    On Wed, Nov 28, 2007 at 04:02:38PM -0800, Kristen Carlson Accardi wrote:
    > On Wed, 28 Nov 2007 13:31:47 -0800
    > Gary Hade wrote:
    >
    > > FYI, the node contains 2 hotpluggable PCIe slots and 5
    > > non-hotpluggable PCIe slots but 'pci_slot' only exposed
    > > the 2 hotpluggable slots. This does not appear to be due
    > > to a 'pci_slot' driver problem since I looked at the DSDT
    > > and SSDT and found that there are currently no _SUN methods
    > > for the non-hotpluggable slots.

    >
    > Thanks for testing Gary. I would think this situation would be the
    > common case, since I doubt most firmware writers would bother to
    > implement _SUN for non-hotpluggable slots -- at least on other DSDT
    > I've seen this has been the case as well.


    Yea, I was also not surprised although features such as
    Alex working on may provide some motivation to change that.

    Gary

    --
    Gary Hade
    System x Enablement
    IBM Linux Technology Center
    503-578-4503 IBM T/L: 775-4503
    garyhade@us.ibm.com
    http://www.ibm.com/linux/ltc

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH 0/4, v3] Physical PCI slot objects

    > Hi Gary, Kenji-san, et. al,
    >
    > * Gary Hade :
    >> Alex, What I was trying to suggest is a boot-time kernel
    >> option, not a kernel configuration option. The basic idea is
    >> to give the user (with a single binary kernel) the ability to
    >> include your ACPI-PCI slot driver feature changes only when
    >> they are really needed. In addition to reducing the number of
    >> system/PCI hotplug driver combinations where your changes would
    >> need to be validated, I believe would also help alleviate other
    >> worries (e.g. Andi Kleen's memory consumption concern). I
    >> believe this goal could also be achieved with the kernel config
    >> option by making the pci_slot module runtime loadable with the
    >> PCI hotplug drivers only visiting your new code when the
    >> pci_slot driver is loaded, although I think this would be more
    >> difficult to implement.

    >
    > I have modified my patch series so that the final patch that
    > introduces my ACPI-PCI slot driver is a full-fledged module, that
    > has a tristate Kconfig option.
    >


    Thank you for your good job.

    I tested shpchp and pciehp both with and without pci_slot module. There
    seems no regression from shpchp and pciehp's point of view.
    (I had a little concern about the hotplug slots' name that vary depending
    on whether pci_slot functionality is enabled or disabled. But, now that we
    can build pci_slot driver as a kernel module, I don't think it is a big
    problem).

    Only the problems is that I got Call Traces with the following error
    messages when pci_slot driver was loaded, and one strange slot named
    '1023' was registered (other slots are fine). This is the same problem
    I reported before.

    sysfs: duplicate filename '1023' can not be created
    WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()

    kobject_add failed for 1023 with -EEXIST, don't try to register
    things with the same name in the same directory.

    On my system, hotplug slots themselves can be added, removed and replaced
    with the ohter type of I/O box. The ACPI firmware tells OS the presence of
    those slots using _STA method (That is, it doesn't use 'LoadTable()' AML
    operator). On the other hand, current pci_slot driver doesn't check _STA.
    As a result, pci_slot driver tryied to register the invalid (non-existing)
    slots. The ACPI firmware of my system returns '1023' if the invalid slot's
    _SUN is evaluated. This is the cause of Call Traces mentioned above. To
    fix this problem, pci_slot driver need to check _STA when scanning ACPI
    Namespace.

    I'm sorry for reporting this so late. I'm attaching the patch to fix the
    problem. This is against 2.6.24-rc3 with your patches applied. Could you
    try it?

    BTW, acpiphp also seems to have the same problem...

    Thanks,
    Kenji Kaneshige

    ---
    drivers/acpi/pci_slot.c | 13 +++++++++++++
    1 file changed, 13 insertions(+)

    Index: linux-2.6.24-rc3/drivers/acpi/pci_slot.c
    ================================================== =================
    --- linux-2.6.24-rc3.orig/drivers/acpi/pci_slot.c
    +++ linux-2.6.24-rc3/drivers/acpi/pci_slot.c
    @@ -113,10 +113,17 @@ register_slot(acpi_handle handle, u32 lv
    int device;
    unsigned long sun;
    char name[KOBJ_NAME_LEN];
    + acpi_status status;
    + struct acpi_device *dummy_device;

    struct pci_slot *pci_slot;
    struct pci_bus *pci_bus = context;

    + /* Skip non-existing device object. */
    + status = acpi_bus_get_device(handle, &dummy_device);
    + if (ACPI_FAILURE(status))
    + return AE_OK;
    +
    if (check_slot(handle, &device, &sun))
    return AE_OK;

    @@ -150,12 +157,18 @@ walk_p2p_bridge(acpi_handle handle, u32
    acpi_status status;
    acpi_handle dummy_handle;
    acpi_walk_callback user_function;
    + struct acpi_device *dummy_device;

    struct pci_dev *dev;
    struct pci_bus *pci_bus;
    struct p2p_bridge_context child_context;
    struct p2p_bridge_context *parent_context = context;

    + /* Skip non-existing device object. */
    + status = acpi_bus_get_device(handle, &dummy_device);
    + if (ACPI_FAILURE(status))
    + return AE_OK;
    +
    pci_bus = parent_context->pci_bus;
    user_function = parent_context->user_function;


    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH 0/4, v3] Physical PCI slot objects

    Hi Gary,

    First, thanks for all the help and testing -- I really appreciate
    it.

    * Gary Hade :
    >
    > I'm getting back to you but unfortunately with not so good
    > news. Sorry Alex.


    :-/

    > On the x3950 (configured single node) I encountered the below
    > problem when attempting to hotplug a PCIe adapter when 'pci_slot'
    > was loaded prior to 'acpiphp'. I did not see the problem when
    > the drivers were loaded in the opposite order.


    Very bizarre, especially given the stack trace below, which
    doesn't really make any sense to me at all.

    > FYI, the node contains 2 hotpluggable PCIe slots and 5
    > non-hotpluggable PCIe slots but 'pci_slot' only exposed
    > the 2 hotpluggable slots. This does not appear to be due
    > to a 'pci_slot' driver problem since I looked at the DSDT
    > and SSDT and found that there are currently no _SUN methods
    > for the non-hotpluggable slots.


    Ok, this is not too surprising, but it's a different can o'
    worms. Let's save this for another day...

    > invalid opcode: 0000 [1] SMP
    > CPU 1
    > Modules linked in: acpiphp pci_slot e1000 aic79xx scsi_transport_spi shpchp dock pci_hotplug ipt_LOG xt_limit xt_pkttype button battery ac power_supply ip6t_REJECT xt_tcpudp ipt_REJECT iptable_mangle iptable_filter ip6table_mangle ip_tables ip6table_filter ip6_tables x_tables ipv6 usbhid ff_memless ext3 jbd loop dm_mod ehci_hcd uhci_hcd usbcore ide_cd bnx2 cdrom rng_core reiserfs ata_piix ahci libata thermal processor piix sg megaraid_sas fan edd sd_mod scsi_mod ide_disk ide_core
    > Pid: 121, comm: kacpi_notify Not tainted 2.6.24-rc3-gh-smp #1
    > RIP: 0010:[] [] ci_slot:__this_module+0x21c4/0xfffffffffffff204
    > RSP: 0018:ffff81103fa43ea8 EFLAGS: 00010216
    > RAX: ffff81103f944a18 RBX: ffff81103d4fe910 RCX: 000000000000000f
    > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8110400d13d0
    > RBP: ffffffff8032d97b R08: ffff8110400fc7e0 R09: 0000000000000002
    > R10: 0000000000000000 R11: ffffffff8021d193 R12: ffff811040105cf0
    > R13: ffffffffffffffff R14: ffffffff80635820 R15: 0000000000000000
    > FS: 0000000000000000(0000) GS:ffff8110400ed8c0(0000) knlGS:0000000000000000
    > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    > CR2: 00002b266d876471 CR3: 000000103c825000 CR4: 00000000000006e0
    > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    > Process kacpi_notify (pid: 121, threadinfo ffff81103fa42000, task ffff81103f9f8040)
    > Stack: ffffffff8033339c ffff81103d119a00 ffffffff8032d99e ffff81103f9fc540
    > ffffffff8024618d ffff81103f9fc540 ffff81103f9fc540 ffffffff8024696c
    > ffffffff80246a46 0000000000000000 ffff81103f9f8040 ffffffff80249ada
    > Call Trace:
    > [] acpi_ev_notify_dispatch+0x57/0x60
    > [] acpi_os_execute_notify+0x23/0x2c
    > [] run_workqueue+0x7f/0x10b
    > [] worker_thread+0x0/0xe4
    > [] worker_thread+0xda/0xe4
    > [] autoremove_wake_function+0x0/0x2e
    > [] kthread+0x47/0x73
    > [] child_rip+0xa/0x12
    > [] kthread+0x0/0x73
    > [] child_rip+0x0/0x12


    Maybe we're trying to kick off a hotplug event on the wrong slot?
    I really have no idea...

    > Code: ff ff ff ff 40 23 2c 88 ff ff ff ff 00 c8 c6 3b 10 81 ff ff
    > RIP [] ci_slot:__this_module+0x21c4/0xfffffffffffff204
    > RSP


    Can you apply this debug patch on top of your tree, and send me
    the output?

    I'd be curious to see the output for your failure case:

    # modprobe pci_slot debug=1
    # modprobe acpiphp debug=1

    Thanks.

    /ac

    diff --git a/drivers/acpi/pci_slot.c b/drivers/acpi/pci_slot.c
    index 724f4f0..5a62def 100644
    --- a/drivers/acpi/pci_slot.c
    +++ b/drivers/acpi/pci_slot.c
    @@ -30,12 +30,16 @@
    #include
    #include

    +static int debug;
    +
    #define DRIVER_VERSION "0.1"
    #define DRIVER_AUTHOR "Alex Chiang "
    #define DRIVER_DESC "ACPI PCI Slot Detection Driver"
    MODULE_AUTHOR(DRIVER_AUTHOR);
    MODULE_DESCRIPTION(DRIVER_DESC);
    MODULE_LICENSE("GPL");
    +MODULE_PARM_DESC(debug, "Debugging mode enabled or not");
    +module_param(debug, bool, 0644);

    #define _COMPONENT ACPI_PCI_COMPONENT
    ACPI_MODULE_NAME("pci_slot");
    @@ -43,6 +47,12 @@ ACPI_MODULE_NAME("pci_slot");
    #define MY_NAME "pci_slot"
    #define err(format, arg...) printk(KERN_ERR "%s: " format , MY_NAME , ## arg)
    #define info(format, arg...) printk(KERN_INFO "%s: " format , MY_NAME , ## arg)
    +#define dbg(format, arg...) \
    + do { \
    + if (debug) \
    + printk(KERN_DEBUG "%s: " format, \
    + MY_NAME , ## arg); \
    + } while (0)

    static int acpi_pci_slot_add(acpi_handle handle);
    static void acpi_pci_slot_remove(acpi_handle handle);
    @@ -125,6 +135,9 @@ register_slot(acpi_handle handle, u32 lvl, void *context, void **rv)
    if (IS_ERR(pci_slot))
    err("pci_create_slot returned %ld\n", PTR_ERR(pci_slot));

    + dbg("pci_slot: %#lx, pci_bus: %x, device: %d, name: %s\n",
    + (uint64_t)pci_slot, pci_bus->number, device, name);
    +
    return AE_OK;
    }

    @@ -174,6 +187,7 @@ walk_p2p_bridge(acpi_handle handle, u32 lvl, void *context, void **rv)
    if (!dev || !dev->subordinate)
    goto out;

    + dbg("p2p bridge walk, pci_bus = %x\n", dev->subordinate->number);
    status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle, (u32)1,
    user_function, dev->subordinate, NULL);
    if (ACPI_FAILURE(status))
    @@ -231,6 +245,7 @@ walk_root_bridge(acpi_handle handle, acpi_walk_callback user_function)
    if (!pci_bus)
    return 0;

    + dbg("root bridge walk, pci_bus = %x\n", pci_bus->number);
    status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle, (u32)1,
    user_function, pci_bus, NULL);
    if (ACPI_FAILURE(status))
    @@ -283,7 +298,6 @@ acpi_pci_slot_init(void)
    return 0;
    }

    -
    static void __exit
    acpi_pci_slot_exit(void)
    {
    diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
    index f16e0af..80c4928 100644
    --- a/drivers/pci/slot.c
    +++ b/drivers/pci/slot.c
    @@ -9,6 +9,15 @@
    #include
    #include "pci.h"

    +static int pci_slot_debug;
    +#define MY_NAME "slot"
    +#define dbg(format, arg...) \
    + do { \
    + if (pci_slot_debug) \
    + printk(KERN_DEBUG "%s: " format, \
    + MY_NAME , ## arg); \
    + } while (0)
    +
    struct kset pci_slots_subsys;
    EXPORT_SYMBOL_GPL(pci_slots_subsys);

    @@ -63,6 +72,9 @@ static void pci_slot_release(struct kobject *kobj)
    struct pci_slot **pprev;
    struct pci_slot *slot = to_pci_slot(kobj);

    + dbg("%s: releasing pci_slot on %x:%d\n", __FUNCTION__,
    + slot->bus->number, slot->number);
    +
    for (pprev = &slot->bus->slot; *pprev; pprev = &(*pprev)->next) {
    if (*pprev == slot) {
    *pprev = slot->next;
    @@ -103,15 +115,19 @@ int pci_slot_add_hotplug(struct pci_bus *parent, int slot_nr,
    }

    if (!found) {
    + dbg("%s: slot not found\n", __FUNCTION__);
    retval = -EEXIST;
    goto out;
    }

    if (slot->release) {
    + dbg("%s: already claimed\n", __FUNCTION__);
    retval = -EBUSY;
    goto out;
    }

    + dbg("%s: adding release function to %x:%d\n",
    + __FUNCTION__, parent->number, slot_nr);
    slot->release = release;
    out:
    up_write(&pci_bus_sem);
    @@ -131,6 +147,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
    for (slot = parent->slot; slot; slot = slot->next) {
    if (slot->number == slot_nr) {
    kobject_get(&slot->kobj);
    + dbg("%s: bumped refcount to %d on %x:%d\n",
    + __FUNCTION__, slot->kobj.kref.refcount,
    + parent->number, slot_nr);
    goto out;
    }
    }
    @@ -159,6 +178,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
    slot->next = parent->slot;
    parent->slot = slot;

    + dbg("%s: created pci_slot on %x:%d\n",
    + __FUNCTION__, parent->number, slot_nr);
    +
    out:
    up_write(&pci_bus_sem);
    return slot;
    @@ -175,6 +197,10 @@ EXPORT_SYMBOL_GPL(pci_create_slot);
    int pci_destroy_slot(struct pci_slot *slot)
    {
    kobject_put(&slot->kobj);
    +
    + dbg("%s: decreased refcount to %d on %x:%d\n", __FUNCTION__,
    + slot->kobj.kref.refcount, slot->bus->number, slot->number);
    +
    if (atomic_read(&slot->kobj.kref.refcount) == 1)
    kobject_unregister(&slot->kobj);

    @@ -186,6 +212,8 @@ static int pci_slot_init(void)
    {
    int result;

    + pci_slot_debug = 1;
    +
    kobj_set_kset_s(&pci_slots_subsys, pci_bus_type.subsys);
    result = subsystem_register(&pci_slots_subsys);
    if (result)
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH 0/4, v3] Physical PCI slot objects

    Hi Kenji-san,

    * Kenji Kaneshige :
    > > Hi Gary, Kenji-san, et. al,
    > >
    > > * Gary Hade :
    > >> Alex, What I was trying to suggest is a boot-time kernel
    > >> option, not a kernel configuration option. The basic idea is
    > >> to give the user (with a single binary kernel) the ability to
    > >> include your ACPI-PCI slot driver feature changes only when
    > >> they are really needed. In addition to reducing the number of
    > >> system/PCI hotplug driver combinations where your changes would
    > >> need to be validated, I believe would also help alleviate other
    > >> worries (e.g. Andi Kleen's memory consumption concern). I
    > >> believe this goal could also be achieved with the kernel config
    > >> option by making the pci_slot module runtime loadable with the
    > >> PCI hotplug drivers only visiting your new code when the
    > >> pci_slot driver is loaded, although I think this would be more
    > >> difficult to implement.

    > >
    > > I have modified my patch series so that the final patch that
    > > introduces my ACPI-PCI slot driver is a full-fledged module, that
    > > has a tristate Kconfig option.
    > >

    >
    > Thank you for your good job.


    Thanks for testing.

    > I tested shpchp and pciehp both with and without pci_slot
    > module. There seems no regression from shpchp and pciehp's
    > point of view. (I had a little concern about the hotplug
    > slots' name that vary depending on whether pci_slot
    > functionality is enabled or disabled. But, now that we can
    > build pci_slot driver as a kernel module, I don't think it is a
    > big problem).


    Hm, you are right. On my machine, if I load pciehp first and
    acpiphp second (even without loading pci_slot), I will see the
    following:

    [root@canola slots]# ls
    0016_0006 0197_0005 10 3 4 7 8 9

    [root@canola slots]# lsmod | grep pci_slot
    [root@canola slots]# lsmod | grep hp
    acpiphp 115984 0
    pciehp 140616 0
    pci_hotplug 123972 2 acpiphp,pciehp

    On the other hand, if I do load pci_slot first, and then pciehp,
    you are right, I will see something like this:

    [root@canola slots]# ls
    1 10 2 3 4 5 6 7 8 9

    [root@canola slots]# lsmod | grep pci_slot
    pci_slot 74436 0
    [root@canola slots]# lsmod | grep hp
    pciehp 140616 0
    pci_hotplug 123972 1 pciehp

    But I do agree, people don't need to load pci_slot at all if they
    don't want it, and they won't be bothered.

    > Only the problems is that I got Call Traces with the following
    > error messages when pci_slot driver was loaded, and one strange
    > slot named '1023' was registered (other slots are fine). This
    > is the same problem I reported before.
    >
    > sysfs: duplicate filename '1023' can not be created
    > WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
    >
    > kobject_add failed for 1023 with -EEXIST, don't try to
    > register things with the same name in the same directory.
    >
    > On my system, hotplug slots themselves can be added, removed
    > and replaced with the ohter type of I/O box. The ACPI firmware
    > tells OS the presence of those slots using _STA method (That
    > is, it doesn't use 'LoadTable()' AML operator). On the other
    > hand, current pci_slot driver doesn't check _STA. As a result,
    > pci_slot driver tryied to register the invalid (non-existing)
    > slots. The ACPI firmware of my system returns '1023' if the
    > invalid slot's _SUN is evaluated. This is the cause of Call
    > Traces mentioned above. To fix this problem, pci_slot driver
    > need to check _STA when scanning ACPI Namespace.


    Now this is very curious. The relevant line in pci_slot is:

    check_slot()
    status = acpi_evaluate_integer(handle, "_SUN", NULL, sun);
    if (ACPI_FAILURE(status))
    return -1;

    Why does your firmware return the error information inside sun,
    instead of returning an error in status? That doesn't seem right
    to me...

    > I'm sorry for reporting this so late. I'm attaching the patch
    > to fix the problem. This is against 2.6.24-rc3 with your
    > patches applied. Could you try it?


    Applying this patch causes me to only detect populated slots in
    my system, which isn't what I want -- otherwise, I could have
    just enumerated the PCI bus and found the devices that way.

    Maybe on your machine, checking existence of _STA might do the
    right thing, but I don't think we should actually be looking at
    any of the actual bits returned.

    If we check ACPI_STA_DEVICE_PRESENT, then we will not detect
    empty slots on my system. Can you try this patch to see if at
    least the first call to acpi_evaluate_integer helps? If that
    doesn't help, maybe the second block will help you, but it breaks
    my machine...

    Thanks.

    /ac


    diff --git a/drivers/acpi/pci_slot.c b/drivers/acpi/pci_slot.c
    index 724f4f0..63a4dc8 100644
    --- a/drivers/acpi/pci_slot.c
    +++ b/drivers/acpi/pci_slot.c
    @@ -55,9 +65,21 @@ static struct acpi_pci_driver acpi_pci_slot_driver = {
    static int
    check_slot(acpi_handle handle, int *device, unsigned long *sun)
    {
    - unsigned long adr;
    + unsigned long adr, sta;
    acpi_status status;

    + /* Doesn't seem to hurt anything on hp systems */
    + status = acpi_evaluate_integer(handle, "_STA", NULL, &sta);
    + if (ACPI_FAILURE(status))
    + return -1;
    +
    + /* This code causes us to fail to detect empty slots, so
    + * commented out for now.
    + *
    + if (!(sta & ACPI_STA_DEVICE_PRESENT))
    + return -1;
    + */
    +
    status = acpi_evaluate_integer(handle, "_ADR", NULL, &adr);
    if (ACPI_FAILURE(status))
    return -1;
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH 0/4, v3] Physical PCI slot objects

    On Thu, Nov 29, 2007 at 06:19:41PM -0700, Alex Chiang wrote:

    < snip >

    > > [] kthread+0x47/0x73
    > > [] child_rip+0xa/0x12
    > > [] kthread+0x0/0x73
    > > [] child_rip+0x0/0x12

    >
    > Maybe we're trying to kick off a hotplug event on the wrong slot?
    > I really have no idea...


    One hotplug event related difference that I believe I
    noticed between the x3850 were I did not see the problem and
    the x3950 is the hotplug event arrival location. On the
    x3850 the handler receives the handle for the transparent
    p2p bridge above the slot. On the x3950 I believe the handler
    is receiving the handle for the root bridge (directly above
    the transparent p2p bridge). acpiphp installs a handler for
    each of these possible arrival locations. pci_slot installs
    a handler for neither location which seems like it should be
    okay.

    I notice that acpi_pci_register_driver() was only being
    used by acpiphp before pci_slot. Perhaps your changes are
    flushing out some sort of ACPI core coexistence issue not
    seen previously because there was only a single user?

    My silly idea is probably no better than your no idea.

    >
    > > Code: ff ff ff ff 40 23 2c 88 ff ff ff ff 00 c8 c6 3b 10 81 ff ff
    > > RIP [] ci_slot:__this_module+0x21c4/0xfffffffffffff204
    > > RSP

    >
    > Can you apply this debug patch on top of your tree, and send me
    > the output?
    >
    > I'd be curious to see the output for your failure case:
    >
    > # modprobe pci_slot debug=1
    > # modprobe acpiphp debug=1


    Someone else is using the system right now so I may not
    be able to do this until next week.

    Gary

    --
    Gary Hade
    System x Enablement
    IBM Linux Technology Center
    503-578-4503 IBM T/L: 775-4503
    garyhade@us.ibm.com
    http://www.ibm.com/linux/ltc

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [PATCH 0/4, v3] Physical PCI slot objects

    Hi Alex-san,

    >> On my system, hotplug slots themselves can be added, removed
    >> and replaced with the ohter type of I/O box. The ACPI firmware
    >> tells OS the presence of those slots using _STA method (That
    >> is, it doesn't use 'LoadTable()' AML operator). On the other
    >> hand, current pci_slot driver doesn't check _STA. As a result,
    >> pci_slot driver tryied to register the invalid (non-existing)
    >> slots. The ACPI firmware of my system returns '1023' if the
    >> invalid slot's _SUN is evaluated. This is the cause of Call
    >> Traces mentioned above. To fix this problem, pci_slot driver
    >> need to check _STA when scanning ACPI Namespace.

    >
    > Now this is very curious. The relevant line in pci_slot is:
    >
    > check_slot()
    > status = acpi_evaluate_integer(handle, "_SUN", NULL, sun);
    > if (ACPI_FAILURE(status))
    > return -1;
    >
    > Why does your firmware return the error information inside sun,
    > instead of returning an error in status? That doesn't seem right
    > to me...


    Because ACPI spec doesn't provide any way for firmware (AML)
    to return as error.

    In addtion, I think we should not trust the _SUN value of
    non-existing device because the ACPI spec says in "6.5.1 _INI
    (Init)" that _INI method is run before _ADR, _CID, _HID, _SUN, and
    _UID are run. It means _SUN could be initialized in _INI method
    implecitely. And it also says that "If the _STA method indicates
    that the device is not present, OSPM will not run the _INI and will
    not examine the children of the device for _INI methods.". After all,
    _SUN for non-existing device is not reliable because it might not
    initialized by _INI method.

    >
    >> I'm sorry for reporting this so late. I'm attaching the patch
    >> to fix the problem. This is against 2.6.24-rc3 with your
    >> patches applied. Could you try it?

    >
    > Applying this patch causes me to only detect populated slots in
    > my system, which isn't what I want -- otherwise, I could have
    > just enumerated the PCI bus and found the devices that way.
    >
    > Maybe on your machine, checking existence of _STA might do the
    > right thing, but I don't think we should actually be looking at
    > any of the actual bits returned.
    >
    > If we check ACPI_STA_DEVICE_PRESENT, then we will not detect
    > empty slots on my system. Can you try this patch to see if at
    > least the first call to acpi_evaluate_integer helps? If that
    > doesn't help, maybe the second block will help you, but it breaks
    > my machine...


    Maybe the result is as you guess.
    The first block doesn't help me (with the first block, all of the
    slot disappeared. Please see the bottom of this mail for details).
    The second block helps me.

    There seems a difference of the interpretation about _STA for PCI
    hotplug slot between your firmware and my firmware. The difference
    is:

    - Your firmware provides the _STA method to represent the presence
    of PCI adapter card on the slot.

    - My firmware provides the _STA method to represent the presence
    of the slot.

    Providing _STA method to represent the presence of PCI adpater card
    on the slot (as your firmware does) doesn't seem right to me because
    of the following reasons.

    - ACPI spec says "After calling _EJ0, OSPM verifies the device no
    longer exists to determine if the eject succeeded. For _HID devices,
    OSPM evaluates the _STA method. For _ADR devices, OSPM checks with
    the bus driver for that device." in "6.3.3 _EJx (Eject)". Because
    PCI adapter card on the slot is _ADR device, the presence of the
    card must be checked with bus driver, not _STA.

    - ACPI spec provides the example AML code which uses _STA to
    represent Docking Station (See 6.3.2 _EJD (Ejection Dependent
    Device)". The usage of this is same as my firmware.

    What do you think about that?

    P.S. None of the slots except the strange slot named '1023' were
    detected with your patch. It would happen on other machines
    (might including hp machine) too. The reason is _STA evaluation
    fails on the hotplug slot which doesn't have _STA method. If the
    device object doesn't have a _STA method, we need to handle it as
    if it is present. I believe firmware normally doesn't provide
    _STA method for PCI hotplug slots.

    Thanks,
    Kenji Kaneshige

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [PATCH 0/4, v3] Physical PCI slot objects

    Hi Kenji-san,

    * Kenji Kaneshige :
    > Hi Alex-san,
    >
    >>> On my system, hotplug slots themselves can be added, removed
    >>> and replaced with the ohter type of I/O box.


    Are you talking about some sort of I/O cabinet/chassis that you
    can attach to the actual computer? Can the I/O expander unit be
    hotplugged? Or do you need to power your machine down to attach
    it?

    If you can hotplug it, I'm guessing that is why your firmware
    presents SxFy objects in the namespace with "weird" _SUN values,
    and it's why you have to check _STA to see if the slots are valid
    or not. That means the value returned by _SUN will change too,
    right? What will it turn into?

    Is that right? Or am I completely wrong?

    >>> The ACPI firmware tells OS the presence of those slots using
    >>> _STA method (That is, it doesn't use 'LoadTable()' AML
    >>> operator). On the other hand, current pci_slot driver doesn't
    >>> check _STA. As a result, pci_slot driver tryied to register
    >>> the invalid (non-existing) slots. The ACPI firmware of my
    >>> system returns '1023' if the invalid slot's _SUN is
    >>> evaluated. This is the cause of Call Traces mentioned above.
    >>> To fix this problem, pci_slot driver need to check _STA when
    >>> scanning ACPI Namespace.

    >>
    >> Now this is very curious. The relevant line in pci_slot is:
    >> check_slot()
    >> status = acpi_evaluate_integer(handle, "_SUN", NULL, sun);
    >> if (ACPI_FAILURE(status))
    >> return -1;
    >> Why does your firmware return the error information inside sun,
    >> instead of returning an error in status? That doesn't seem right
    >> to me...

    >
    > Because ACPI spec doesn't provide any way for firmware (AML)
    > to return as error.


    You are right -- I got confused about the interpreter returning
    AE_NOT_FOUND vs the actual firmware returning an error value.
    Thank you for this clarification.

    > In addtion, I think we should not trust the _SUN value of
    > non-existing device because the ACPI spec says in "6.5.1 _INI
    > (Init)" that _INI method is run before _ADR, _CID, _HID, _SUN, and
    > _UID are run. It means _SUN could be initialized in _INI method
    > implecitely. And it also says that "If the _STA method indicates
    > that the device is not present, OSPM will not run the _INI and will
    > not examine the children of the device for _INI methods.". After all,
    > _SUN for non-existing device is not reliable because it might not
    > initialized by _INI method.


    This is true, but HP platforms provide _INI at the root
    device/host bridge level, not on SxFy objects, so it doesn't seem
    that we would need to call _STA before calling _SUN for SxFy.

    Does your firmware provide _INI on SxFy objects?

    >>> I'm sorry for reporting this so late. I'm attaching the patch
    >>> to fix the problem. This is against 2.6.24-rc3 with your
    >>> patches applied. Could you try it?

    >> Applying this patch causes me to only detect populated slots in
    >> my system, which isn't what I want -- otherwise, I could have
    >> just enumerated the PCI bus and found the devices that way.
    >> Maybe on your machine, checking existence of _STA might do the
    >> right thing, but I don't think we should actually be looking at
    >> any of the actual bits returned. If we check ACPI_STA_DEVICE_PRESENT, then
    >> we will not detect
    >> empty slots on my system. Can you try this patch to see if at
    >> least the first call to acpi_evaluate_integer helps? If that
    >> doesn't help, maybe the second block will help you, but it breaks
    >> my machine...

    >
    > Maybe the result is as you guess.
    > The first block doesn't help me (with the first block, all of the
    > slot disappeared. Please see the bottom of this mail for details).
    > The second block helps me.
    >
    > There seems a difference of the interpretation about _STA for PCI
    > hotplug slot between your firmware and my firmware. The difference
    > is:
    >
    > - Your firmware provides the _STA method to represent the presence
    > of PCI adapter card on the slot.
    >
    > - My firmware provides the _STA method to represent the presence
    > of the slot.


    Yes, that sounds right...

    > Providing _STA method to represent the presence of PCI adpater card
    > on the slot (as your firmware does) doesn't seem right to me because
    > of the following reasons.
    >
    > - ACPI spec says "After calling _EJ0, OSPM verifies the device no
    > longer exists to determine if the eject succeeded. For _HID devices,
    > OSPM evaluates the _STA method. For _ADR devices, OSPM checks with
    > the bus driver for that device." in "6.3.3 _EJx (Eject)". Because
    > PCI adapter card on the slot is _ADR device, the presence of the
    > card must be checked with bus driver, not _STA.
    >
    > - ACPI spec provides the example AML code which uses _STA to
    > represent Docking Station (See 6.3.2 _EJD (Ejection Dependent
    > Device)". The usage of this is same as my firmware.
    >
    > What do you think about that?


    Our firmware teams seem to think that _STA should give the status
    of the card for hotplug support and general functional state.
    They claim that it doesn't makes much sense to support _STA on
    the slot itself unless you can physically change the slot
    topology on the machine at runtime, which we can't do (although
    maybe you can).

    The section of the spec you quoted is correct as long as we are
    talking ACPI 2.0 or later. My platforms implement ACPI 1.0b for
    legacy reasons. :-/

    In ACPI 1.0b, _EJx definition says (section 6.3.2):

    For hot removal, the device must be immediately ejected
    when the OS calls the _EJ0 control method. The _EJ0
    control method does not return until ejection is
    complete. After calling _EJ0, the OS will call _STA to
    determine whether or not the eject succeeded.

    So your firmware implementation does not seem backward compatible
    with the 1.0b spec. The different versions of ACPI is part of the
    reason why my patch is breaking on your machine.

    But as long as we are quoting the spec...

    _SUN evaluates to a DWORD that is the number to be used
    in the user interface. This number is required to be
    unique among the slots of the same type. It is also
    recommended that this number match the slot number
    printed on the physical slot whenever possible.

    section 6.1.6 of ACPI 2.0c

    My question is, why is your firmware returning multiple values of
    1023 then? This seems to be the real reason why my patch is
    breaking on your machine.

    While depending on ACPI 1.0b behavior might be somewhat risky,
    returning the same value for _SUN multiple times, for slots of
    the same type, definitely seems non-compliant.

    What is your opinion?

    > P.S. None of the slots except the strange slot named '1023'
    > were detected with your patch. It would happen on other
    > machines (might including hp machine) too. The reason is _STA
    > evaluation fails on the hotplug slot which doesn't have _STA
    > method. If the device object doesn't have a _STA method, we
    > need to handle it as if it is present.


    Thanks for this explanation about treating non-present _STA as
    present. It makes sense.

    > I believe firmware normally doesn't provide _STA method for PCI
    > hotplug slots.


    I somewhat disagree that firmware doesn't normally provide _STA
    for PCI hotplug slots, but I think that is a side-issue.

    Thanks.

    /ac

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [PATCH 0/4, v3] Physical PCI slot objects

    Hi Alex-san,

    > Hi Kenji-san,
    >
    > * Kenji Kaneshige :
    >> Hi Alex-san,
    >>
    >>>> On my system, hotplug slots themselves can be added, removed
    >>>> and replaced with the ohter type of I/O box.

    >
    > Are you talking about some sort of I/O cabinet/chassis that you
    > can attach to the actual computer? Can the I/O expander unit be
    > hotplugged? Or do you need to power your machine down to attach
    > it?
    >
    > If you can hotplug it, I'm guessing that is why your firmware
    > presents SxFy objects in the namespace with "weird" _SUN values,
    > and it's why you have to check _STA to see if the slots are valid
    > or not. That means the value returned by _SUN will change too,
    > right? What will it turn into?
    >


    Currently, it's not hotpluggable (will be hotpluggable in the future).
    Here is a sample AML code to explain what my firmware is doing.

    Device (PCI0) {
    Device (P2PA) {
    Device (P2PB) { // for I/O unit (A)
    Name (_ADR, ...)
    Method (_STA) { ... }
    }
    Device (S0F0) { // for I/O unit (B)
    Name (_ADR, ...)
    Method (_STA) { ... }
    Method (_EJx) { ... }
    Method (_SUN) { ... }
    }
    ...
    }
    ...
    }

    If the I/O unit (A) is connected, _STA of P2PB returns as present
    and _STA of S0F0 returns as not present.
    If the I/O unit (B) is connected, _STA of P2PB returns as not
    present and _STA of S0F0 returns as present.

    >> In addtion, I think we should not trust the _SUN value of
    >> non-existing device because the ACPI spec says in "6.5.1 _INI
    >> (Init)" that _INI method is run before _ADR, _CID, _HID, _SUN, and
    >> _UID are run. It means _SUN could be initialized in _INI method
    >> implecitely. And it also says that "If the _STA method indicates
    >> that the device is not present, OSPM will not run the _INI and will
    >> not examine the children of the device for _INI methods.". After all,
    >> _SUN for non-existing device is not reliable because it might not
    >> initialized by _INI method.

    >
    > This is true, but HP platforms provide _INI at the root
    > device/host bridge level, not on SxFy objects, so it doesn't seem
    > that we would need to call _STA before calling _SUN for SxFy.
    >
    > Does your firmware provide _INI on SxFy objects?


    No, it doesn't. But what I wanted to say was we should not use _SUN
    value of non-existing device object.

    >
    > Our firmware teams seem to think that _STA should give the status
    > of the card for hotplug support and general functional state.
    > They claim that it doesn't makes much sense to support _STA on
    > the slot itself unless you can physically change the slot
    > topology on the machine at runtime, which we can't do (although
    > maybe you can).
    >
    > The section of the spec you quoted is correct as long as we are
    > talking ACPI 2.0 or later. My platforms implement ACPI 1.0b for
    > legacy reasons. :-/
    >
    > In ACPI 1.0b, _EJx definition says (section 6.3.2):
    >
    > For hot removal, the device must be immediately ejected
    > when the OS calls the _EJ0 control method. The _EJ0
    > control method does not return until ejection is
    > complete. After calling _EJ0, the OS will call _STA to
    > determine whether or not the eject succeeded.
    >
    > So your firmware implementation does not seem backward compatible
    > with the 1.0b spec. The different versions of ACPI is part of the
    > reason why my patch is breaking on your machine.
    >


    I think this is the real reason. My platform implements ACPI 2.0 or
    later. I didn't notice the chage to_EJx definition. Maybe we need to
    check ACPI version in pci_slot driver.

    > But as long as we are quoting the spec...
    >
    > _SUN evaluates to a DWORD that is the number to be used
    > in the user interface. This number is required to be
    > unique among the slots of the same type. It is also
    > recommended that this number match the slot number
    > printed on the physical slot whenever possible.
    >
    > section 6.1.6 of ACPI 2.0c
    >
    > My question is, why is your firmware returning multiple values of
    > 1023 then? This seems to be the real reason why my patch is
    > breaking on your machine.
    >
    > While depending on ACPI 1.0b behavior might be somewhat risky,
    > returning the same value for _SUN multiple times, for slots of
    > the same type, definitely seems non-compliant.
    >


    The reason is very simple. The reason is your patch is evaluating
    invalid _SUN method. We must skip non-existing device object. This
    is what your patch is already doing for pci root bridges.
    In addition, even if those _SUN method were changed to return unique
    number, none of the problems would be solved. Maybe pci_slot driver
    would detect many unknown slots.

    Thanks,
    Kenji Kaneshige


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [PATCH 0/4, v3] Physical PCI slot objects

    Hi Kenji-san,

    I have been thinking about this problem for quite a bit, and
    think that there are no good solutions...

    * Kenji Kaneshige :
    >>>>> On my system, hotplug slots themselves can be added, removed
    >>>>> and replaced with the ohter type of I/O box.

    >> Are you talking about some sort of I/O cabinet/chassis that you
    >> can attach to the actual computer? Can the I/O expander unit be
    >> hotplugged? Or do you need to power your machine down to attach
    >> it?
    >> If you can hotplug it, I'm guessing that is why your firmware
    >> presents SxFy objects in the namespace with "weird" _SUN values,
    >> and it's why you have to check _STA to see if the slots are valid
    >> or not. That means the value returned by _SUN will change too,
    >> right? What will it turn into?
    >>

    >
    > Currently, it's not hotpluggable (will be hotpluggable in the future).
    > Here is a sample AML code to explain what my firmware is doing.
    >
    > Device (PCI0) {
    > Device (P2PA) {
    > Device (P2PB) { // for I/O unit (A)
    > Name (_ADR, ...)
    > Method (_STA) { ... }
    > }
    > Device (S0F0) { // for I/O unit (B)
    > Name (_ADR, ...)
    > Method (_STA) { ... }
    > Method (_EJx) { ... }
    > Method (_SUN) { ... }
    > }
    > ...
    > }
    > ...
    > }
    >
    > If the I/O unit (A) is connected, _STA of P2PB returns as present
    > and _STA of S0F0 returns as not present.
    > If the I/O unit (B) is connected, _STA of P2PB returns as not
    > present and _STA of S0F0 returns as present.


    If I/O unit A or B can never appear while the system is turned on
    (aka not hotpluggable), then it is incorrect to present them in
    the current namespace.

    >>> In addtion, I think we should not trust the _SUN value of
    >>> non-existing device because the ACPI spec says in "6.5.1 _INI
    >>> (Init)" that _INI method is run before _ADR, _CID, _HID, _SUN, and
    >>> _UID are run. It means _SUN could be initialized in _INI method
    >>> implecitely. And it also says that "If the _STA method indicates
    >>> that the device is not present, OSPM will not run the _INI and will
    >>> not examine the children of the device for _INI methods.". After all,
    >>> _SUN for non-existing device is not reliable because it might not
    >>> initialized by _INI method.

    >> This is true, but HP platforms provide _INI at the root
    >> device/host bridge level, not on SxFy objects, so it doesn't seem
    >> that we would need to call _STA before calling _SUN for SxFy.
    >> Does your firmware provide _INI on SxFy objects?

    >
    > No, it doesn't. But what I wanted to say was we should not use _SUN
    > value of non-existing device object.


    There is nothing illegal about evaluating _SUN for an object that
    returns 0x0 for _STA.

    Also, when you say "non-existing", I think of the ACPI CA
    exception code AE_NOT_EXIST which means "absent from
    the namespace", and is the reason why my code works on both HP
    and IBM machines. It does not mean "_STA == 0x0".

    >> Our firmware teams seem to think that _STA should give the status
    >> of the card for hotplug support and general functional state.
    >> They claim that it doesn't makes much sense to support _STA on
    >> the slot itself unless you can physically change the slot
    >> topology on the machine at runtime, which we can't do (although
    >> maybe you can).
    >> The section of the spec you quoted is correct as long as we are
    >> talking ACPI 2.0 or later. My platforms implement ACPI 1.0b for
    >> legacy reasons. :-/
    >> In ACPI 1.0b, _EJx definition says (section 6.3.2):
    >> For hot removal, the device must be immediately ejected
    >> when the OS calls the _EJ0 control method. The _EJ0
    >> control method does not return until ejection is
    >> complete. After calling _EJ0, the OS will call _STA to
    >> determine whether or not the eject succeeded.
    >> So your firmware implementation does not seem backward compatible
    >> with the 1.0b spec. The different versions of ACPI is part of the
    >> reason why my patch is breaking on your machine.

    >
    > I think this is the real reason. My platform implements ACPI 2.0 or
    > later. I didn't notice the chage to_EJx definition. Maybe we need to
    > check ACPI version in pci_slot driver.


    I did some experiments on HP low-end ia64 (ACPI 1.0b only) and
    our mid-range and high-end ia64 platforms (ACPI 2.0c). Checking
    for _STA before evaluating _SUN leads to the same result for me:
    we only detect populated slots.

    I think that the real issue is not 1.0 vs 2.0, but the semantics
    that our different firmware teams have placed on _STA. Again,

    - HP firmware thinks _STA should give status of the card
    - Fujitsu firmware thinks _STA should give status of the slot

    So we are at an impasse.

    >> But as long as we are quoting the spec...
    >> _SUN evaluates to a DWORD that is the number to be used
    >> in the user interface. This number is required to be
    >> unique among the slots of the same type. It is also
    >> recommended that this number match the slot number
    >> printed on the physical slot whenever possible.
    >> section 6.1.6 of ACPI 2.0c
    >> My question is, why is your firmware returning multiple values of
    >> 1023 then? This seems to be the real reason why my patch is
    >> breaking on your machine.
    >> While depending on ACPI 1.0b behavior might be somewhat risky,
    >> returning the same value for _SUN multiple times, for slots of
    >> the same type, definitely seems non-compliant.

    >
    > The reason is very simple. The reason is your patch is evaluating
    > invalid _SUN method. We must skip non-existing device object. This
    > is what your patch is already doing for pci root bridges.
    > In addition, even if those _SUN method were changed to return unique
    > number, none of the problems would be solved. Maybe pci_slot driver
    > would detect many unknown slots.


    I did some experiments to see if I could write a quick hack, like

    - do not register a slot if we've already seen this _SUN

    But it broke my reference counting, and I don't think it would be
    robust enough if users wanted to modprobe/rmmod pci_slot and
    acpiphp/pciehp in random order.

    I do not agree that _SUN should return a repeating, non-compliant
    value even if _STA says the device is not present. A truly
    non-existent device is one that does not appear in the namespace.
    If you cannot hotplug an I/O unit on your machine, then your
    firmware shouldn't be presenting those devices in the namespace.

    I think the best option would be for your firmware to return the
    actual value of _SUN if I/O unit A and/or B was plugged in. That
    way, nothing special has to happen when the units are hotplugged
    -- they'll already have correct entries in sysfs. If they are
    never hotplugged, it shouldn't be confusing for the user, since
    they will never actually get any devices in those slots anyway.

    What happens if you try to load acpiphp on your system today?
    Does it work, or do you get the same messages about trying to
    create multiple entries in sysfs with the same name?

    Do you have any better ideas for detecting all physical slots in
    a system, regardless of whether they are populated?

    Thanks.

    /ac

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [PATCH 0/4, v3] Physical PCI slot objects

    On Thu, 29 Nov 2007 16:47:14 +0900
    Kenji Kaneshige wrote:

    > > Hi Gary, Kenji-san, et. al,
    > >
    > > * Gary Hade :
    > >> Alex, What I was trying to suggest is a boot-time kernel
    > >> option, not a kernel configuration option. The basic idea is
    > >> to give the user (with a single binary kernel) the ability to
    > >> include your ACPI-PCI slot driver feature changes only when
    > >> they are really needed. In addition to reducing the number of
    > >> system/PCI hotplug driver combinations where your changes would
    > >> need to be validated, I believe would also help alleviate other
    > >> worries (e.g. Andi Kleen's memory consumption concern). I
    > >> believe this goal could also be achieved with the kernel config
    > >> option by making the pci_slot module runtime loadable with the
    > >> PCI hotplug drivers only visiting your new code when the
    > >> pci_slot driver is loaded, although I think this would be more
    > >> difficult to implement.

    > >
    > > I have modified my patch series so that the final patch that
    > > introduces my ACPI-PCI slot driver is a full-fledged module, that
    > > has a tristate Kconfig option.
    > >

    >
    > Thank you for your good job.
    >
    > I tested shpchp and pciehp both with and without pci_slot module.
    > There seems no regression from shpchp and pciehp's point of view.
    > (I had a little concern about the hotplug slots' name that vary
    > depending on whether pci_slot functionality is enabled or disabled.
    > But, now that we can build pci_slot driver as a kernel module, I
    > don't think it is a big problem).
    >
    > Only the problems is that I got Call Traces with the following error
    > messages when pci_slot driver was loaded, and one strange slot named
    > '1023' was registered (other slots are fine). This is the same problem
    > I reported before.
    >
    > sysfs: duplicate filename '1023' can not be created
    > WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
    >
    > kobject_add failed for 1023 with -EEXIST, don't try to register
    > things with the same name in the same directory.
    >
    > On my system, hotplug slots themselves can be added, removed and
    > replaced with the ohter type of I/O box. The ACPI firmware tells OS
    > the presence of those slots using _STA method (That is, it doesn't
    > use 'LoadTable()' AML operator). On the other hand, current pci_slot
    > driver doesn't check _STA. As a result, pci_slot driver tryied to
    > register the invalid (non-existing) slots. The ACPI firmware of my
    > system returns '1023' if the invalid slot's _SUN is evaluated. This
    > is the cause of Call Traces mentioned above. To fix this problem,
    > pci_slot driver need to check _STA when scanning ACPI Namespace.
    >
    > I'm sorry for reporting this so late. I'm attaching the patch to fix
    > the problem. This is against 2.6.24-rc3 with your patches applied.
    > Could you try it?
    >
    > BTW, acpiphp also seems to have the same problem...
    >
    > Thanks,
    > Kenji Kaneshige


    Hello - back to this mail for a moment. I wanted to talk about acpiphp
    again. You say that acpiphp has this same problem as Alex's driver wrt
    evalating _SUN on a non existant object. If we skip registering the
    slot in acpiphp's register_slot function if it returns 0 for _STA, we
    will not detect any hotplug slots, since leaf nodes will return 0 for
    _STA if a device is not present. Should we only skip it if it's parent
    is a p2p bridge that has _EJ0 and the parent's _STA says empty?

    Your input is appreciated,
    Kristen

    >
    > ---
    > drivers/acpi/pci_slot.c | 13 +++++++++++++
    > 1 file changed, 13 insertions(+)
    >
    > Index: linux-2.6.24-rc3/drivers/acpi/pci_slot.c
    > ================================================== =================
    > --- linux-2.6.24-rc3.orig/drivers/acpi/pci_slot.c
    > +++ linux-2.6.24-rc3/drivers/acpi/pci_slot.c
    > @@ -113,10 +113,17 @@ register_slot(acpi_handle handle, u32 lv
    > int device;
    > unsigned long sun;
    > char name[KOBJ_NAME_LEN];
    > + acpi_status status;
    > + struct acpi_device *dummy_device;
    >
    > struct pci_slot *pci_slot;
    > struct pci_bus *pci_bus = context;
    >
    > + /* Skip non-existing device object. */
    > + status = acpi_bus_get_device(handle, &dummy_device);
    > + if (ACPI_FAILURE(status))
    > + return AE_OK;
    > +
    > if (check_slot(handle, &device, &sun))
    > return AE_OK;
    >
    > @@ -150,12 +157,18 @@ walk_p2p_bridge(acpi_handle handle, u32
    > acpi_status status;
    > acpi_handle dummy_handle;
    > acpi_walk_callback user_function;
    > + struct acpi_device *dummy_device;
    >
    > struct pci_dev *dev;
    > struct pci_bus *pci_bus;
    > struct p2p_bridge_context child_context;
    > struct p2p_bridge_context *parent_context = context;
    >
    > + /* Skip non-existing device object. */
    > + status = acpi_bus_get_device(handle, &dummy_device);
    > + if (ACPI_FAILURE(status))
    > + return AE_OK;
    > +
    > pci_bus = parent_context->pci_bus;
    > user_function = parent_context->user_function;
    >
    >

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: [PATCH 0/4, v3] Physical PCI slot objects

    Hi Kristen,

    > Hello - back to this mail for a moment. I wanted to talk about acpiphp
    > again. You say that acpiphp has this same problem as Alex's driver wrt
    > evalating _SUN on a non existant object. If we skip registering the
    > slot in acpiphp's register_slot function if it returns 0 for _STA, we
    > will not detect any hotplug slots, since leaf nodes will return 0 for
    > _STA if a device is not present. Should we only skip it if it's parent
    > is a p2p bridge that has _EJ0 and the parent's _STA says empty?


    I've been not looking at acpiphp code for a long time. So I need refresh
    my memory before answering. Sorry.

    But I want to say one thing. We can detect all the hotplug slots if ACPI
    firmware doesn't provide _STA. We don't need _STA for PCI hotplug slots
    and should not provide _STA for PCI hotplug slots if the system is based
    on ACPI2.0 or later, because those are _ADR based devices (PCI root bridge,
    which is _HID based device, is a exception). For _ADR based devices, we
    should check the presence of the device using PCI bus drivers instead of
    evaluating _STA. Please see the section of "_EJx (Eject)" in ACPI2.0b or
    later spec. It says "For _HID devices, OSPM evaluates the _STA method. For
    _ADR devices, OSPM checks with the bus driver for that device.".

    BTW, acpiphp_glue.c says _STA is "optionally". I don't know what it means.

    Thanks,
    Kenji Kaneshige


    Kristen Carlson Accardi wrote:
    > On Thu, 29 Nov 2007 16:47:14 +0900
    > Kenji Kaneshige wrote:
    >
    >>> Hi Gary, Kenji-san, et. al,
    >>>
    >>> * Gary Hade :
    >>>> Alex, What I was trying to suggest is a boot-time kernel
    >>>> option, not a kernel configuration option. The basic idea is
    >>>> to give the user (with a single binary kernel) the ability to
    >>>> include your ACPI-PCI slot driver feature changes only when
    >>>> they are really needed. In addition to reducing the number of
    >>>> system/PCI hotplug driver combinations where your changes would
    >>>> need to be validated, I believe would also help alleviate other
    >>>> worries (e.g. Andi Kleen's memory consumption concern). I
    >>>> believe this goal could also be achieved with the kernel config
    >>>> option by making the pci_slot module runtime loadable with the
    >>>> PCI hotplug drivers only visiting your new code when the
    >>>> pci_slot driver is loaded, although I think this would be more
    >>>> difficult to implement.
    >>> I have modified my patch series so that the final patch that
    >>> introduces my ACPI-PCI slot driver is a full-fledged module, that
    >>> has a tristate Kconfig option.
    >>>

    >> Thank you for your good job.
    >>
    >> I tested shpchp and pciehp both with and without pci_slot module.
    >> There seems no regression from shpchp and pciehp's point of view.
    >> (I had a little concern about the hotplug slots' name that vary
    >> depending on whether pci_slot functionality is enabled or disabled.
    >> But, now that we can build pci_slot driver as a kernel module, I
    >> don't think it is a big problem).
    >>
    >> Only the problems is that I got Call Traces with the following error
    >> messages when pci_slot driver was loaded, and one strange slot named
    >> '1023' was registered (other slots are fine). This is the same problem
    >> I reported before.
    >>
    >> sysfs: duplicate filename '1023' can not be created
    >> WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
    >>
    >> kobject_add failed for 1023 with -EEXIST, don't try to register
    >> things with the same name in the same directory.
    >>
    >> On my system, hotplug slots themselves can be added, removed and
    >> replaced with the ohter type of I/O box. The ACPI firmware tells OS
    >> the presence of those slots using _STA method (That is, it doesn't
    >> use 'LoadTable()' AML operator). On the other hand, current pci_slot
    >> driver doesn't check _STA. As a result, pci_slot driver tryied to
    >> register the invalid (non-existing) slots. The ACPI firmware of my
    >> system returns '1023' if the invalid slot's _SUN is evaluated. This
    >> is the cause of Call Traces mentioned above. To fix this problem,
    >> pci_slot driver need to check _STA when scanning ACPI Namespace.
    >>
    >> I'm sorry for reporting this so late. I'm attaching the patch to fix
    >> the problem. This is against 2.6.24-rc3 with your patches applied.
    >> Could you try it?
    >>
    >> BTW, acpiphp also seems to have the same problem...
    >>
    >> Thanks,
    >> Kenji Kaneshige

    >
    > Hello - back to this mail for a moment. I wanted to talk about acpiphp
    > again. You say that acpiphp has this same problem as Alex's driver wrt
    > evalating _SUN on a non existant object. If we skip registering the
    > slot in acpiphp's register_slot function if it returns 0 for _STA, we
    > will not detect any hotplug slots, since leaf nodes will return 0 for
    > _STA if a device is not present. Should we only skip it if it's parent
    > is a p2p bridge that has _EJ0 and the parent's _STA says empty?
    >
    > Your input is appreciated,
    > Kristen
    >
    >> ---
    >> drivers/acpi/pci_slot.c | 13 +++++++++++++
    >> 1 file changed, 13 insertions(+)
    >>
    >> Index: linux-2.6.24-rc3/drivers/acpi/pci_slot.c
    >> ================================================== =================
    >> --- linux-2.6.24-rc3.orig/drivers/acpi/pci_slot.c
    >> +++ linux-2.6.24-rc3/drivers/acpi/pci_slot.c
    >> @@ -113,10 +113,17 @@ register_slot(acpi_handle handle, u32 lv
    >> int device;
    >> unsigned long sun;
    >> char name[KOBJ_NAME_LEN];
    >> + acpi_status status;
    >> + struct acpi_device *dummy_device;
    >>
    >> struct pci_slot *pci_slot;
    >> struct pci_bus *pci_bus = context;
    >>
    >> + /* Skip non-existing device object. */
    >> + status = acpi_bus_get_device(handle, &dummy_device);
    >> + if (ACPI_FAILURE(status))
    >> + return AE_OK;
    >> +
    >> if (check_slot(handle, &device, &sun))
    >> return AE_OK;
    >>
    >> @@ -150,12 +157,18 @@ walk_p2p_bridge(acpi_handle handle, u32
    >> acpi_status status;
    >> acpi_handle dummy_handle;
    >> acpi_walk_callback user_function;
    >> + struct acpi_device *dummy_device;
    >>
    >> struct pci_dev *dev;
    >> struct pci_bus *pci_bus;
    >> struct p2p_bridge_context child_context;
    >> struct p2p_bridge_context *parent_context = context;
    >>
    >> + /* Skip non-existing device object. */
    >> + status = acpi_bus_get_device(handle, &dummy_device);
    >> + if (ACPI_FAILURE(status))
    >> + return AE_OK;
    >> +
    >> pci_bus = parent_context->pci_bus;
    >> user_function = parent_context->user_function;
    >>
    >>

    >
    >



    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 2 of 2 FirstFirst 1 2