[PATCH 1/2] pci: add misrouted interrupt error handling - Kernel

This is a discussion on [PATCH 1/2] pci: add misrouted interrupt error handling - Kernel ; We're getting a lot of storage drivers blamed for interrupt misrouting issues. This patch provides a standard way of reporting the problem .... and, if possible, correcting it. Signed-off-by: James Bottomley --- drivers/pci/Makefile | 3 +- drivers/pci/irq.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++ ...

+ Reply to Thread
Results 1 to 20 of 20

Thread: [PATCH 1/2] pci: add misrouted interrupt error handling

  1. [PATCH 1/2] pci: add misrouted interrupt error handling

    We're getting a lot of storage drivers blamed for interrupt misrouting
    issues. This patch provides a standard way of reporting the problem
    .... and, if possible, correcting it.

    Signed-off-by: James Bottomley
    ---
    drivers/pci/Makefile | 3 +-
    drivers/pci/irq.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++
    include/linux/pci.h | 7 +++++
    3 files changed, 69 insertions(+), 1 deletions(-)
    create mode 100644 drivers/pci/irq.c

    diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
    index 7d63f8c..19dacb8 100644
    --- a/drivers/pci/Makefile
    +++ b/drivers/pci/Makefile
    @@ -3,7 +3,8 @@
    #

    obj-y += access.o bus.o probe.o remove.o pci.o quirks.o slot.o \
    - pci-driver.o search.o pci-sysfs.o rom.o setup-res.o
    + pci-driver.o search.o pci-sysfs.o rom.o setup-res.o \
    + irq.o
    obj-$(CONFIG_PROC_FS) += proc.o

    # Build PCI Express stuff if needed
    diff --git a/drivers/pci/irq.c b/drivers/pci/irq.c
    new file mode 100644
    index 0000000..6441dfa
    --- /dev/null
    +++ b/drivers/pci/irq.c
    @@ -0,0 +1,60 @@
    +/*
    + * PCI IRQ failure handing code
    + *
    + * Copyright (c) 2008 James Bottomley
    + */
    +
    +#include
    +#include
    +#include
    +#include
    +
    +static void pci_note_irq_problem(struct pci_dev *pdev, const char *reason)
    +{
    + struct pci_dev *parent = to_pci_dev(pdev->dev.parent);
    +
    + dev_printk(KERN_ERR, &pdev->dev,
    + "Potentially misrouted IRQ (Bridge %s %04x:%04x)\n",
    + parent->dev.bus_id, parent->vendor, parent->device);
    + dev_printk(KERN_ERR, &pdev->dev, "%s\n", reason);
    + dev_printk(KERN_ERR, &pdev->dev, "Please report to linux-kernel@vger.kernel.org\n");
    + WARN_ON(1);
    +}
    +
    +/**
    + * pci_lost_interrupt - reports a lost PCI interrupt
    + * @pdev: device whose interrupt is lost
    + *
    + * The primary function of this routine is to report a lost interrupt
    + * in a standard way which users can recognise (instead of blaming the
    + * driver).
    + *
    + * Returns:
    + * a suggestion for fixing it (although the driver is not required to
    + * act on this).
    + */
    +enum pci_lost_interrupt_reason pci_lost_interrupt(struct pci_dev *pdev)
    +{
    + if (pdev->msi_enabled || pdev->msix_enabled) {
    + enum pci_lost_interrupt_reason ret;
    +
    + if (pdev->msix_enabled) {
    + pci_note_irq_problem(pdev, "MSIX routing failure");
    + ret = PCI_LOST_IRQ_DISABLE_MSIX;
    + } else {
    + pci_note_irq_problem(pdev, "MSI routing failure");
    + ret = PCI_LOST_IRQ_DISABLE_MSI;
    + }
    + return ret;
    + }
    +#ifdef CONFIG_ACPI
    + if (!(acpi_disabled || acpi_noirq)) {
    + pci_note_irq_problem(pdev, "Potential ACPI misrouting please reboot with acpi=noirq");
    + /* currently no way to fix acpi on the fly */
    + return PCI_LOST_IRQ_DISABLE_ACPI;
    + }
    +#endif
    + pci_note_irq_problem(pdev, "unknown cause (not MSI or ACPI)");
    + return PCI_LOST_IRQ_NO_INFORMATION;
    +}
    +EXPORT_SYMBOL(pci_lost_interrupt);
    diff --git a/include/linux/pci.h b/include/linux/pci.h
    index 825be38..121698a 100644
    --- a/include/linux/pci.h
    +++ b/include/linux/pci.h
    @@ -539,6 +539,13 @@ struct pci_dev __deprecated *pci_find_slot(unsigned int bus,
    unsigned int devfn);
    #endif /* CONFIG_PCI_LEGACY */

    +enum pci_lost_interrupt_reason {
    + PCI_LOST_IRQ_NO_INFORMATION = 0,
    + PCI_LOST_IRQ_DISABLE_MSI,
    + PCI_LOST_IRQ_DISABLE_MSIX,
    + PCI_LOST_IRQ_DISABLE_ACPI,
    +};
    +enum pci_lost_interrupt_reason pci_lost_interrupt(struct pci_dev *dev);
    int pci_find_capability(struct pci_dev *dev, int cap);
    int pci_find_next_capability(struct pci_dev *dev, u8 pos, int cap);
    int pci_find_ext_capability(struct pci_dev *dev, int cap);
    --
    1.5.6.3



    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Sun, Aug 03, 2008 at 01:02:12PM -0500, James Bottomley wrote:
    > +static void pci_note_irq_problem(struct pci_dev *pdev, const char *reason)
    > +{
    > + struct pci_dev *parent = to_pci_dev(pdev->dev.parent);
    > +
    > + dev_printk(KERN_ERR, &pdev->dev,
    > + "Potentially misrouted IRQ (Bridge %s %04x:%04x)\n",
    > + parent->dev.bus_id, parent->vendor, parent->device);
    > + dev_printk(KERN_ERR, &pdev->dev, "%s\n", reason);
    > + dev_printk(KERN_ERR, &pdev->dev, "Please report to linux-kernel@vger.kernel.org\n");
    > + WARN_ON(1);
    > +}


    Will the dev_printk() strings be included in the kerneloops report? And
    what if there is no parent of the device? Consider device 00:02.0 on my
    laptop:

    00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)
    Subsystem: Fujitsu Limited. Device 13fe
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- Latency: 0
    Interrupt: pin A routed to IRQ 16

    > +enum pci_lost_interrupt_reason pci_lost_interrupt(struct pci_dev *pdev)
    > +{
    > + if (pdev->msi_enabled || pdev->msix_enabled) {
    > + enum pci_lost_interrupt_reason ret;
    > +
    > + if (pdev->msix_enabled) {
    > + pci_note_irq_problem(pdev, "MSIX routing failure");
    > + ret = PCI_LOST_IRQ_DISABLE_MSIX;
    > + } else {
    > + pci_note_irq_problem(pdev, "MSI routing failure");
    > + ret = PCI_LOST_IRQ_DISABLE_MSI;
    > + }
    > + return ret;
    > + }


    Couldn't this be written more concisely as:

    if (pdev->msix_enabled) {
    pci_note_irq_problem(pdev, "MSIX routing failure");
    return PCI_LOST_IRQ_DISABLE_MSIX;
    }
    if (pdev->msi_enabled) {
    pci_note_irq_problem(pdev, "MSI routing failure");
    return PCI_LOST_IRQ_DISABLE_MSI;
    }

    > +#ifdef CONFIG_ACPI
    > + if (!(acpi_disabled || acpi_noirq)) {
    > + pci_note_irq_problem(pdev, "Potential ACPI misrouting please reboot with acpi=noirq");
    > + /* currently no way to fix acpi on the fly */
    > + return PCI_LOST_IRQ_DISABLE_ACPI;
    > + }
    > +#endif
    > + pci_note_irq_problem(pdev, "unknown cause (not MSI or ACPI)");
    > + return PCI_LOST_IRQ_NO_INFORMATION;
    > +}
    > +EXPORT_SYMBOL(pci_lost_interrupt);

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Sun, 2008-08-03 at 20:51 -0600, Matthew Wilcox wrote:
    > On Sun, Aug 03, 2008 at 01:02:12PM -0500, James Bottomley wrote:
    > > +static void pci_note_irq_problem(struct pci_dev *pdev, const char *reason)
    > > +{
    > > + struct pci_dev *parent = to_pci_dev(pdev->dev.parent);
    > > +
    > > + dev_printk(KERN_ERR, &pdev->dev,
    > > + "Potentially misrouted IRQ (Bridge %s %04x:%04x)\n",
    > > + parent->dev.bus_id, parent->vendor, parent->device);
    > > + dev_printk(KERN_ERR, &pdev->dev, "%s\n", reason);
    > > + dev_printk(KERN_ERR, &pdev->dev, "Please report to linux-kernel@vger.kernel.org\n");
    > > + WARN_ON(1);
    > > +}

    >
    > Will the dev_printk() strings be included in the kerneloops report? And
    > what if there is no parent of the device? Consider device 00:02.0 on my
    > laptop:


    No, but some of them take the prior strings (depending on the
    implementation).

    > 00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)
    > Subsystem: Fujitsu Limited. Device 13fe
    > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    > Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- > Latency: 0
    > Interrupt: pin A routed to IRQ 16


    There is always a parent device ... it might be the PCI root device, in
    which case the information will be blank, but there is always one.

    > > +enum pci_lost_interrupt_reason pci_lost_interrupt(struct pci_dev *pdev)
    > > +{
    > > + if (pdev->msi_enabled || pdev->msix_enabled) {
    > > + enum pci_lost_interrupt_reason ret;
    > > +
    > > + if (pdev->msix_enabled) {
    > > + pci_note_irq_problem(pdev, "MSIX routing failure");
    > > + ret = PCI_LOST_IRQ_DISABLE_MSIX;
    > > + } else {
    > > + pci_note_irq_problem(pdev, "MSI routing failure");
    > > + ret = PCI_LOST_IRQ_DISABLE_MSI;
    > > + }
    > > + return ret;
    > > + }

    >
    > Couldn't this be written more concisely as:
    >
    > if (pdev->msix_enabled) {
    > pci_note_irq_problem(pdev, "MSIX routing failure");
    > return PCI_LOST_IRQ_DISABLE_MSIX;
    > }
    > if (pdev->msi_enabled) {
    > pci_note_irq_problem(pdev, "MSI routing failure");
    > return PCI_LOST_IRQ_DISABLE_MSI;
    > }


    The idea was to separate the cases in case something extra needs be
    done. I think it's pretty much identical as far as the compiler
    optimises, and therefore probably not worth worrying about much.

    James


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Sun, Aug 03, 2008 at 01:02:12PM -0500, James Bottomley wrote:
    > We're getting a lot of storage drivers blamed for interrupt misrouting
    > issues. This patch provides a standard way of reporting the problem
    > ... and, if possible, correcting it.


    James,
    Can you add a paragraph to Documentation/pci.txt about the usage
    of the new API?

    thanks,
    grant

    >
    > Signed-off-by: James Bottomley
    > ---
    > drivers/pci/Makefile | 3 +-
    > drivers/pci/irq.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++
    > include/linux/pci.h | 7 +++++
    > 3 files changed, 69 insertions(+), 1 deletions(-)
    > create mode 100644 drivers/pci/irq.c
    >
    > diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
    > index 7d63f8c..19dacb8 100644
    > --- a/drivers/pci/Makefile
    > +++ b/drivers/pci/Makefile
    > @@ -3,7 +3,8 @@
    > #
    >
    > obj-y += access.o bus.o probe.o remove.o pci.o quirks.o slot.o \
    > - pci-driver.o search.o pci-sysfs.o rom.o setup-res.o
    > + pci-driver.o search.o pci-sysfs.o rom.o setup-res.o \
    > + irq.o
    > obj-$(CONFIG_PROC_FS) += proc.o
    >
    > # Build PCI Express stuff if needed
    > diff --git a/drivers/pci/irq.c b/drivers/pci/irq.c
    > new file mode 100644
    > index 0000000..6441dfa
    > --- /dev/null
    > +++ b/drivers/pci/irq.c
    > @@ -0,0 +1,60 @@
    > +/*
    > + * PCI IRQ failure handing code
    > + *
    > + * Copyright (c) 2008 James Bottomley
    > + */
    > +
    > +#include
    > +#include
    > +#include
    > +#include
    > +
    > +static void pci_note_irq_problem(struct pci_dev *pdev, const char *reason)
    > +{
    > + struct pci_dev *parent = to_pci_dev(pdev->dev.parent);
    > +
    > + dev_printk(KERN_ERR, &pdev->dev,
    > + "Potentially misrouted IRQ (Bridge %s %04x:%04x)\n",
    > + parent->dev.bus_id, parent->vendor, parent->device);
    > + dev_printk(KERN_ERR, &pdev->dev, "%s\n", reason);
    > + dev_printk(KERN_ERR, &pdev->dev, "Please report to linux-kernel@vger.kernel.org\n");
    > + WARN_ON(1);
    > +}
    > +
    > +/**
    > + * pci_lost_interrupt - reports a lost PCI interrupt
    > + * @pdev: device whose interrupt is lost
    > + *
    > + * The primary function of this routine is to report a lost interrupt
    > + * in a standard way which users can recognise (instead of blaming the
    > + * driver).
    > + *
    > + * Returns:
    > + * a suggestion for fixing it (although the driver is not required to
    > + * act on this).
    > + */
    > +enum pci_lost_interrupt_reason pci_lost_interrupt(struct pci_dev *pdev)
    > +{
    > + if (pdev->msi_enabled || pdev->msix_enabled) {
    > + enum pci_lost_interrupt_reason ret;
    > +
    > + if (pdev->msix_enabled) {
    > + pci_note_irq_problem(pdev, "MSIX routing failure");
    > + ret = PCI_LOST_IRQ_DISABLE_MSIX;
    > + } else {
    > + pci_note_irq_problem(pdev, "MSI routing failure");
    > + ret = PCI_LOST_IRQ_DISABLE_MSI;
    > + }
    > + return ret;
    > + }
    > +#ifdef CONFIG_ACPI
    > + if (!(acpi_disabled || acpi_noirq)) {
    > + pci_note_irq_problem(pdev, "Potential ACPI misrouting please reboot with acpi=noirq");
    > + /* currently no way to fix acpi on the fly */
    > + return PCI_LOST_IRQ_DISABLE_ACPI;
    > + }
    > +#endif
    > + pci_note_irq_problem(pdev, "unknown cause (not MSI or ACPI)");
    > + return PCI_LOST_IRQ_NO_INFORMATION;
    > +}
    > +EXPORT_SYMBOL(pci_lost_interrupt);
    > diff --git a/include/linux/pci.h b/include/linux/pci.h
    > index 825be38..121698a 100644
    > --- a/include/linux/pci.h
    > +++ b/include/linux/pci.h
    > @@ -539,6 +539,13 @@ struct pci_dev __deprecated *pci_find_slot(unsigned int bus,
    > unsigned int devfn);
    > #endif /* CONFIG_PCI_LEGACY */
    >
    > +enum pci_lost_interrupt_reason {
    > + PCI_LOST_IRQ_NO_INFORMATION = 0,
    > + PCI_LOST_IRQ_DISABLE_MSI,
    > + PCI_LOST_IRQ_DISABLE_MSIX,
    > + PCI_LOST_IRQ_DISABLE_ACPI,
    > +};
    > +enum pci_lost_interrupt_reason pci_lost_interrupt(struct pci_dev *dev);
    > int pci_find_capability(struct pci_dev *dev, int cap);
    > int pci_find_next_capability(struct pci_dev *dev, u8 pos, int cap);
    > int pci_find_ext_capability(struct pci_dev *dev, int cap);
    > --
    > 1.5.6.3
    >
    >
    >
    > --
    > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
    > the body of a message to majordomo@vger.kernel.org
    > More majordomo info at http://vger.kernel.org/majordomo-info.html

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Sun, 2008-08-03 at 22:30 -0600, Grant Grundler wrote:
    > On Sun, Aug 03, 2008 at 01:02:12PM -0500, James Bottomley wrote:
    > > We're getting a lot of storage drivers blamed for interrupt misrouting
    > > issues. This patch provides a standard way of reporting the problem
    > > ... and, if possible, correcting it.

    >
    > James,
    > Can you add a paragraph to Documentation/pci.txt about the usage
    > of the new API?


    When there actually is a new API, sure.

    James


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Sunday 03 August 2008 12:02:12 pm James Bottomley wrote:
    > +static void pci_note_irq_problem(struct pci_dev *pdev, const char *reason)
    > +{
    > + struct pci_dev *parent = to_pci_dev(pdev->dev.parent);
    > +
    > + dev_printk(KERN_ERR, &pdev->dev,
    > + "Potentially misrouted IRQ (Bridge %s %04x:%04x)\n",
    > + parent->dev.bus_id, parent->vendor, parent->device);


    Do you prefer "dev_printk(KERN_ERR, ...)" over "dev_err(...)"? Easier
    to grep for the former, maybe? If so, should we deprecate "dev_err()"
    and friends? When I converted most of the PCI core to use dev_printk(),
    (80ccba1186d48f ...) I used dev_err(), but I don't really care one way
    or the other.

    Maybe use pci_name(parent)?

    I tried to standardize the PCI core on "[%04x/%04x]" for vendor/device ID.

    Bjorn

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Mon, Aug 04, 2008 at 02:43:20PM -0600, Bjorn Helgaas wrote:
    > I tried to standardize the PCI core on "[%04x/%04x]" for vendor/device ID.


    That's unfortunately different from lspci -nn:

    00:00.0 Host bridge [0600]: Intel Corporation Mobile PM965/GM965/GL960
    Memory Controller Hub [8086:2a00] (rev 03)

    Sorry, I should have reviewed your patches ;-(

    --
    Intel are signing my paycheques ... these opinions are still mine
    "Bill, look, we understand that you're interested in selling us this
    operating system, but compare it to ours. We can't possibly take such
    a retrograde step."
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Monday 04 August 2008 03:35:35 pm Matthew Wilcox wrote:
    > On Mon, Aug 04, 2008 at 02:43:20PM -0600, Bjorn Helgaas wrote:
    > > I tried to standardize the PCI core on "[%04x/%04x]" for vendor/device ID.

    >
    > That's unfortunately different from lspci -nn:
    >
    > 00:00.0 Host bridge [0600]: Intel Corporation Mobile PM965/GM965/GL960
    > Memory Controller Hub [8086:2a00] (rev 03)
    >
    > Sorry, I should have reviewed your patches ;-(


    I can change those. I just copied what seemed to be the most
    prevalent. But lspci is harder to change, so we should probably
    follow that.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Mon, 2008-08-04 at 14:43 -0600, Bjorn Helgaas wrote:
    > On Sunday 03 August 2008 12:02:12 pm James Bottomley wrote:
    > > +static void pci_note_irq_problem(struct pci_dev *pdev, const char *reason)
    > > +{
    > > + struct pci_dev *parent = to_pci_dev(pdev->dev.parent);
    > > +
    > > + dev_printk(KERN_ERR, &pdev->dev,
    > > + "Potentially misrouted IRQ (Bridge %s %04x:%04x)\n",
    > > + parent->dev.bus_id, parent->vendor, parent->device);

    >
    > Do you prefer "dev_printk(KERN_ERR, ...)" over "dev_err(...)"? Easier
    > to grep for the former, maybe? If so, should we deprecate "dev_err()"
    > and friends? When I converted most of the PCI core to use dev_printk(),
    > (80ccba1186d48f ...) I used dev_err(), but I don't really care one way
    > or the other.
    >
    > Maybe use pci_name(parent)?
    >
    > I tried to standardize the PCI core on "[%04x/%04x]" for vendor/device ID.


    To be honest I'm not really interested too much in the various API
    preferences ... they can be fixed up later by the people who care.

    What I am interested in is whether we can get this interface to give
    sufficient information to blacklist the offending motherboard if we can
    identify it as the source of the problem. Since the usual culprit is
    the bridge, that's why I'm doing parent vendor/device pairs. However,
    better suggestions for the information that should be displayed would be
    gratefully received.

    James


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Monday 04 August 2008 06:02:27 pm James Bottomley wrote:
    > On Mon, 2008-08-04 at 14:43 -0600, Bjorn Helgaas wrote:
    > > On Sunday 03 August 2008 12:02:12 pm James Bottomley wrote:
    > > > +static void pci_note_irq_problem(struct pci_dev *pdev, const char *reason)
    > > > +{
    > > > + struct pci_dev *parent = to_pci_dev(pdev->dev.parent);
    > > > +
    > > > + dev_printk(KERN_ERR, &pdev->dev,
    > > > + "Potentially misrouted IRQ (Bridge %s %04x:%04x)\n",
    > > > + parent->dev.bus_id, parent->vendor, parent->device);

    > >
    > > Do you prefer "dev_printk(KERN_ERR, ...)" over "dev_err(...)"? Easier
    > > to grep for the former, maybe? If so, should we deprecate "dev_err()"
    > > and friends? When I converted most of the PCI core to use dev_printk(),
    > > (80ccba1186d48f ...) I used dev_err(), but I don't really care one way
    > > or the other.
    > >
    > > Maybe use pci_name(parent)?
    > >
    > > I tried to standardize the PCI core on "[%04x/%04x]" for vendor/device ID.

    >
    > To be honest I'm not really interested too much in the various API
    > preferences ... they can be fixed up later by the people who care.


    I'm happy to fix it up later if you prefer. I only mentioned it
    because doing it later adds churn and risk of breakage.

    Bjorn
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Sunday, August 03, 2008 11:02 am James Bottomley wrote:
    > +/**
    > + * pci_lost_interrupt - reports a lost PCI interrupt
    > + * @pdev: device whose interrupt is lost
    > + *
    > + * The primary function of this routine is to report a lost interrupt
    > + * in a standard way which users can recognise (instead of blaming the
    > + * driver).
    > + *
    > + * Returns:
    > + * a suggestion for fixing it (although the driver is not required to
    > + * act on this).
    > + */
    > +enum pci_lost_interrupt_reason pci_lost_interrupt(struct pci_dev *pdev)
    > +{
    > + if (pdev->msi_enabled || pdev->msix_enabled) {
    > + enum pci_lost_interrupt_reason ret;
    > +
    > + if (pdev->msix_enabled) {
    > + pci_note_irq_problem(pdev, "MSIX routing failure");
    > + ret = PCI_LOST_IRQ_DISABLE_MSIX;
    > + } else {
    > + pci_note_irq_problem(pdev, "MSI routing failure");
    > + ret = PCI_LOST_IRQ_DISABLE_MSI;
    > + }
    > + return ret;
    > + }
    > +#ifdef CONFIG_ACPI
    > + if (!(acpi_disabled || acpi_noirq)) {
    > + pci_note_irq_problem(pdev, "Potential ACPI misrouting please reboot with
    > acpi=noirq"); + /* currently no way to fix acpi on the fly */
    > + return PCI_LOST_IRQ_DISABLE_ACPI;
    > + }
    > +#endif
    > + pci_note_irq_problem(pdev, "unknown cause (not MSI or ACPI)");
    > + return PCI_LOST_IRQ_NO_INFORMATION;
    > +}


    This seems to be a function that just returns what type of IRQ you're using or
    how it's routed and it isn't necessarily "lost interrupt" specific.

    This information is clearly useful to drivers both for informational purposes
    and for debugging problems, so on that front it looks good. However, I think
    it should probably be called pci_interrupt_type(struct pci_dev *dev) or
    something instead, and just return an enum of either MSIX, MSI, or LINE
    (though I'm open to better names). From that, the driver can infer what
    might be going wrong, though in the case of a LINE interrupt, all you can
    really do is report that there's probably a platform IRQ routing problem.

    Jesse
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Tue, 2008-08-05 at 10:03 -0700, Jesse Barnes wrote:
    > On Sunday, August 03, 2008 11:02 am James Bottomley wrote:
    > > +/**
    > > + * pci_lost_interrupt - reports a lost PCI interrupt
    > > + * @pdev: device whose interrupt is lost
    > > + *
    > > + * The primary function of this routine is to report a lost interrupt
    > > + * in a standard way which users can recognise (instead of blaming the
    > > + * driver).
    > > + *
    > > + * Returns:
    > > + * a suggestion for fixing it (although the driver is not required to
    > > + * act on this).
    > > + */
    > > +enum pci_lost_interrupt_reason pci_lost_interrupt(struct pci_dev *pdev)
    > > +{
    > > + if (pdev->msi_enabled || pdev->msix_enabled) {
    > > + enum pci_lost_interrupt_reason ret;
    > > +
    > > + if (pdev->msix_enabled) {
    > > + pci_note_irq_problem(pdev, "MSIX routing failure");
    > > + ret = PCI_LOST_IRQ_DISABLE_MSIX;
    > > + } else {
    > > + pci_note_irq_problem(pdev, "MSI routing failure");
    > > + ret = PCI_LOST_IRQ_DISABLE_MSI;
    > > + }
    > > + return ret;
    > > + }
    > > +#ifdef CONFIG_ACPI
    > > + if (!(acpi_disabled || acpi_noirq)) {
    > > + pci_note_irq_problem(pdev, "Potential ACPI misrouting please reboot with
    > > acpi=noirq"); + /* currently no way to fix acpi on the fly */
    > > + return PCI_LOST_IRQ_DISABLE_ACPI;
    > > + }
    > > +#endif
    > > + pci_note_irq_problem(pdev, "unknown cause (not MSI or ACPI)");
    > > + return PCI_LOST_IRQ_NO_INFORMATION;
    > > +}

    >
    > This seems to be a function that just returns what type of IRQ you're using or
    > how it's routed and it isn't necessarily "lost interrupt" specific.


    So perhaps this routine should only note but not advise? The drivers
    can then just call pci_interrupt_type to see if they can do anything.

    > This information is clearly useful to drivers both for informational purposes
    > and for debugging problems, so on that front it looks good. However, I think
    > it should probably be called pci_interrupt_type(struct pci_dev *dev) or
    > something instead, and just return an enum of either MSIX, MSI, or LINE
    > (though I'm open to better names). From that, the driver can infer what
    > might be going wrong, though in the case of a LINE interrupt, all you can
    > really do is report that there's probably a platform IRQ routing problem.


    The only thing that this can't do is ACPI ... but on the other hand once
    the IRQ routing is set by ACPI I'm not sure the driver can do anything
    to fix it.

    So ... depending on how the feedback goes, I'll plan to reroll this as a
    note but not advise function.

    James



    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Tue, 2008-08-05 at 13:53 -0700, Jesse Barnes wrote:
    > On Tuesday, August 05, 2008 1:44 pm James Bottomley wrote:
    > > On Tue, 2008-08-05 at 10:03 -0700, Jesse Barnes wrote:
    > > > This seems to be a function that just returns what type of IRQ you're
    > > > using or how it's routed and it isn't necessarily "lost interrupt"
    > > > specific.

    > >
    > > So perhaps this routine should only note but not advise? The drivers
    > > can then just call pci_interrupt_type to see if they can do anything.

    >
    > If it's just a pci_irq_type function then it probably shouldn't print
    > anything, and leave that to the caller, since it might be used for other
    > purposes too (e.g. a driver load printk or something). In the lost interrupt
    > case you already have to disable MSI or MSIX in the Fusion driver, so you may
    > as well put the printk there, right? I guess I'm saying it should neither
    > note nor advise, just return the IRQ type.


    Well, no; the object was to have the layer that knew (PCI) print
    information which could be used to identify the problem. Likely what
    the driver will say is something like "MSI isn't working and it's not my
    fault". What I want is for PCI to print something that may be helpful
    to people trying to diagnose the problem. Driver writers aren't going
    to get that right.

    > > > This information is clearly useful to drivers both for informational
    > > > purposes and for debugging problems, so on that front it looks good.
    > > > However, I think it should probably be called pci_interrupt_type(struct
    > > > pci_dev *dev) or something instead, and just return an enum of either
    > > > MSIX, MSI, or LINE (though I'm open to better names). From that, the
    > > > driver can infer what might be going wrong, though in the case of a LINE
    > > > interrupt, all you can really do is report that there's probably a
    > > > platform IRQ routing problem.

    > >
    > > The only thing that this can't do is ACPI ... but on the other hand once
    > > the IRQ routing is set by ACPI I'm not sure the driver can do anything
    > > to fix it.

    >
    > Yeah, ACPI may or may not be the problem, all you'll really know is that
    > you've got a line interrupt that you failed to get... The driver won't be
    > able to do much in that case aside from complain loudly.


    Yes ... it's just that when line interrupts fail (especially if they
    worked previously) it's usually ACPI to blame.

    James


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Tuesday, August 05, 2008 1:44 pm James Bottomley wrote:
    > On Tue, 2008-08-05 at 10:03 -0700, Jesse Barnes wrote:
    > > This seems to be a function that just returns what type of IRQ you're
    > > using or how it's routed and it isn't necessarily "lost interrupt"
    > > specific.

    >
    > So perhaps this routine should only note but not advise? The drivers
    > can then just call pci_interrupt_type to see if they can do anything.


    If it's just a pci_irq_type function then it probably shouldn't print
    anything, and leave that to the caller, since it might be used for other
    purposes too (e.g. a driver load printk or something). In the lost interrupt
    case you already have to disable MSI or MSIX in the Fusion driver, so you may
    as well put the printk there, right? I guess I'm saying it should neither
    note nor advise, just return the IRQ type.

    > > This information is clearly useful to drivers both for informational
    > > purposes and for debugging problems, so on that front it looks good.
    > > However, I think it should probably be called pci_interrupt_type(struct
    > > pci_dev *dev) or something instead, and just return an enum of either
    > > MSIX, MSI, or LINE (though I'm open to better names). From that, the
    > > driver can infer what might be going wrong, though in the case of a LINE
    > > interrupt, all you can really do is report that there's probably a
    > > platform IRQ routing problem.

    >
    > The only thing that this can't do is ACPI ... but on the other hand once
    > the IRQ routing is set by ACPI I'm not sure the driver can do anything
    > to fix it.


    Yeah, ACPI may or may not be the problem, all you'll really know is that
    you've got a line interrupt that you failed to get... The driver won't be
    able to do much in that case aside from complain loudly.

    Thanks,
    Jesse
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Tuesday, August 05, 2008 1:56 pm James Bottomley wrote:
    > On Tue, 2008-08-05 at 13:53 -0700, Jesse Barnes wrote:
    > > On Tuesday, August 05, 2008 1:44 pm James Bottomley wrote:
    > > > On Tue, 2008-08-05 at 10:03 -0700, Jesse Barnes wrote:
    > > > > This seems to be a function that just returns what type of IRQ you're
    > > > > using or how it's routed and it isn't necessarily "lost interrupt"
    > > > > specific.
    > > >
    > > > So perhaps this routine should only note but not advise? The drivers
    > > > can then just call pci_interrupt_type to see if they can do anything.

    > >
    > > If it's just a pci_irq_type function then it probably shouldn't print
    > > anything, and leave that to the caller, since it might be used for other
    > > purposes too (e.g. a driver load printk or something). In the lost
    > > interrupt case you already have to disable MSI or MSIX in the Fusion
    > > driver, so you may as well put the printk there, right? I guess I'm
    > > saying it should neither note nor advise, just return the IRQ type.

    >
    > Well, no; the object was to have the layer that knew (PCI) print
    > information which could be used to identify the problem. Likely what
    > the driver will say is something like "MSI isn't working and it's not my
    > fault". What I want is for PCI to print something that may be helpful
    > to people trying to diagnose the problem. Driver writers aren't going
    > to get that right.


    Yeah, I understand what you're trying to do, and given that there's only one
    user of this function (your fusion patch), I don't have a strong preference.

    However, the function is really just telling the driver what type of IRQ a
    given PCI device currently has; it's not really a "lost interrupt" function
    at all. So to me, it makes sense to just generalize it into a pci_irq_type
    function and let drivers do what they will with it. It looks like this is
    the real lost interrupt detection:

    + /* May fail becuase of IRQ misrouting */
    + rc = mpt_get_manufacturing_pg_0(ioc);
    + if (rc) {
    + ...
    + printk(MYIOC_s_ERR_FMT "Cannot recover IRQ routing\n",
    + ioc->name);
    + return -1;
    + }

    not the pci_lost_interrupt() function. So it could just as easily be written
    as such:

    + /* May fail becuase of IRQ misrouting */
    + rc = mpt_get_manufacturing_pg_0(ioc);
    + if (rc) {
    + /* Lost an IRQ, see what type... */
    + int irq_type = pci_irq_type(dev);
    +
    + if (type == PCI_IRQ_MSI) {
    + /* Lost an MSI interrupt, try re-config w/o MSI */
    + free_irq(ioc->pci_irq, ioc);
    + ioc->msi_enable = 0;
    + pci_disable_msi(ioc->pcidev);
    + goto retry;
    + }
    + /* This platform is just broken... */
    + printk(MYIOC_s_ERR_FMT "Cannot recover from %s IRQ
    + routing error\n", pci_irq_type_name(type), ioc->name);
    + return -1;
    + }

    or somesuch. That seems just as simple for driver writers as your initial
    patch, and the function is named in accordance with what it actually does,
    rather than what it's used for...

    Jesse
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Tue, 2008-08-05 at 14:15 -0700, Jesse Barnes wrote:
    > On Tuesday, August 05, 2008 1:56 pm James Bottomley wrote:
    > > On Tue, 2008-08-05 at 13:53 -0700, Jesse Barnes wrote:
    > > > On Tuesday, August 05, 2008 1:44 pm James Bottomley wrote:
    > > > > On Tue, 2008-08-05 at 10:03 -0700, Jesse Barnes wrote:
    > > > > > This seems to be a function that just returns what type of IRQ you're
    > > > > > using or how it's routed and it isn't necessarily "lost interrupt"
    > > > > > specific.
    > > > >
    > > > > So perhaps this routine should only note but not advise? The drivers
    > > > > can then just call pci_interrupt_type to see if they can do anything.
    > > >
    > > > If it's just a pci_irq_type function then it probably shouldn't print
    > > > anything, and leave that to the caller, since it might be used for other
    > > > purposes too (e.g. a driver load printk or something). In the lost
    > > > interrupt case you already have to disable MSI or MSIX in the Fusion
    > > > driver, so you may as well put the printk there, right? I guess I'm
    > > > saying it should neither note nor advise, just return the IRQ type.

    > >
    > > Well, no; the object was to have the layer that knew (PCI) print
    > > information which could be used to identify the problem. Likely what
    > > the driver will say is something like "MSI isn't working and it's not my
    > > fault". What I want is for PCI to print something that may be helpful
    > > to people trying to diagnose the problem. Driver writers aren't going
    > > to get that right.

    >
    > Yeah, I understand what you're trying to do, and given that there's only one
    > user of this function (your fusion patch), I don't have a strong preference.
    >
    > However, the function is really just telling the driver what type of IRQ a
    > given PCI device currently has; it's not really a "lost interrupt" function
    > at all. So to me, it makes sense to just generalize it into a pci_irq_type
    > function and let drivers do what they will with it. It looks like this is
    > the real lost interrupt detection:
    >
    > + /* May fail becuase of IRQ misrouting */
    > + rc = mpt_get_manufacturing_pg_0(ioc);
    > + if (rc) {
    > + ...
    > + printk(MYIOC_s_ERR_FMT "Cannot recover IRQ routing\n",
    > + ioc->name);
    > + return -1;
    > + }
    >
    > not the pci_lost_interrupt() function. So it could just as easily be written
    > as such:
    >
    > + /* May fail becuase of IRQ misrouting */
    > + rc = mpt_get_manufacturing_pg_0(ioc);
    > + if (rc) {
    > + /* Lost an IRQ, see what type... */
    > + int irq_type = pci_irq_type(dev);
    > +
    > + if (type == PCI_IRQ_MSI) {
    > + /* Lost an MSI interrupt, try re-config w/o MSI */
    > + free_irq(ioc->pci_irq, ioc);
    > + ioc->msi_enable = 0;
    > + pci_disable_msi(ioc->pcidev);
    > + goto retry;
    > + }
    > + /* This platform is just broken... */
    > + printk(MYIOC_s_ERR_FMT "Cannot recover from %s IRQ
    > + routing error\n", pci_irq_type_name(type), ioc->name);
    > + return -1;
    > + }
    >
    > or somesuch. That seems just as simple for driver writers as your initial
    > patch, and the function is named in accordance with what it actually does,
    > rather than what it's used for...


    It could, but if the bridge is the culprit (as it usually is for MSI
    problems), this print won't help identify it.

    Therefore, rather than give driver writers a recipe for "print this and
    this and go to the bridge and print this", I'd rather have a single PCI
    callback that prints all the (hopefully) relevant information that will
    allow either fixing or blacklisting.

    James

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Tuesday, August 5, 2008 2:54 pm James Bottomley wrote:
    > > or somesuch. That seems just as simple for driver writers as your
    > > initial patch, and the function is named in accordance with what it
    > > actually does, rather than what it's used for...

    >
    > It could, but if the bridge is the culprit (as it usually is for MSI
    > problems), this print won't help identify it.
    >
    > Therefore, rather than give driver writers a recipe for "print this and
    > this and go to the bridge and print this", I'd rather have a single PCI
    > callback that prints all the (hopefully) relevant information that will
    > allow either fixing or blacklisting.


    So in addition to the IRQ type check we need to dump some device topology
    information... yeah that makes sense. I wonder if the driver core should
    provide something like this. Greg? In the meantime we can definitely add
    the IRQ type function.

    Thanks,
    Jesse
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Thu, Aug 07, 2008 at 09:03:22AM -0700, Jesse Barnes wrote:
    > On Tuesday, August 5, 2008 2:54 pm James Bottomley wrote:
    > > > or somesuch. That seems just as simple for driver writers as your
    > > > initial patch, and the function is named in accordance with what it
    > > > actually does, rather than what it's used for...

    > >
    > > It could, but if the bridge is the culprit (as it usually is for MSI
    > > problems), this print won't help identify it.
    > >
    > > Therefore, rather than give driver writers a recipe for "print this and
    > > this and go to the bridge and print this", I'd rather have a single PCI
    > > callback that prints all the (hopefully) relevant information that will
    > > allow either fixing or blacklisting.

    >
    > So in addition to the IRQ type check we need to dump some device topology
    > information... yeah that makes sense. I wonder if the driver core should
    > provide something like this. Greg?


    What kind of topology do you need that is not already provided?

    thanks,

    greg k-h
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Thu, 2008-08-07 at 10:20 -0700, Greg KH wrote:
    > On Thu, Aug 07, 2008 at 09:03:22AM -0700, Jesse Barnes wrote:
    > > On Tuesday, August 5, 2008 2:54 pm James Bottomley wrote:
    > > > > or somesuch. That seems just as simple for driver writers as your
    > > > > initial patch, and the function is named in accordance with what it
    > > > > actually does, rather than what it's used for...
    > > >
    > > > It could, but if the bridge is the culprit (as it usually is for MSI
    > > > problems), this print won't help identify it.
    > > >
    > > > Therefore, rather than give driver writers a recipe for "print this and
    > > > this and go to the bridge and print this", I'd rather have a single PCI
    > > > callback that prints all the (hopefully) relevant information that will
    > > > allow either fixing or blacklisting.

    > >
    > > So in addition to the IRQ type check we need to dump some device topology
    > > information... yeah that makes sense. I wonder if the driver core should
    > > provide something like this. Greg?

    >
    > What kind of topology do you need that is not already provided?


    We really need to print how the device interrupt is routed. But,
    unfortunately, we need to identify the devices in the layout (bridges
    and the like by say device/vendor numbers) so we know what they are ...
    this is the bit that the generic device model can't do because it's
    embedded in the bus specific part.

    James


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [PATCH 1/2] pci: add misrouted interrupt error handling

    On Sunday, August 3, 2008 11:02 am James Bottomley wrote:
    > We're getting a lot of storage drivers blamed for interrupt misrouting
    > issues. This patch provides a standard way of reporting the problem
    > ... and, if possible, correcting it.
    >
    > Signed-off-by: James Bottomley


    Ok, per our discussion at the PCI BoF I applied this, so now you can apply the
    fusion and other bits to help debug issues.

    Thanks,
    Jesse
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread