RE: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd) - Kernel

This is a discussion on RE: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd) - Kernel ; Hi Tejun, The messages I saw on the screen. Thank you, Christian -----Original Message----- From: Tejun Heo [mailto:htejun@gmail.com] Sent: Tuesday, October 07, 2008 5:14 PM To: Christian Mueller Subject: Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, ...

+ Reply to Thread
Results 1 to 11 of 11

Thread: RE: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

  1. RE: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

    Hi Tejun,

    The messages I saw on the screen.


    Thank you,
    Christian

    -----Original Message-----
    From: Tejun Heo [mailto:htejun@gmail.com]
    Sent: Tuesday, October 07, 2008 5:14 PM
    To: Christian Mueller
    Subject: Re: [smartmontools-support] inactive SATA drives won't stay in
    standby or sleep, PATA models did. (fwd)

    Christian Mueller wrote:
    > Hi Tejun,
    >
    > I am using an LSI 3080-X. I have tried dmesg and it does not show

    smartmon
    > output.


    Then, what made you believe you're having the same problem? Also, can
    you please revive the cc list?

    Thanks.

    --
    tejun
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

    Tejun Heo writes:
    > (cc'ing linux-ide and Mikael Pettersson)
    >
    > Christian Mueller wrote:
    > > The messages I saw on the screen.

    >
    > If the drive doesn't wake up and times out the smart command while
    > suspended (which probably is a bug in the firmware but it also can be
    > the controller's fault), then after the timeout, the driver should kick
    > in and reset the drive which will wake up the device and retry the
    > command. It's not the prettiest picture but it's still gonna work. In
    > Linda's case, it looks like the controller (sata_promise) went bonkers
    > on hardreset and requires power cycle to get back to working state.
    > That's why I asked Linda to try another controller.
    >
    > Anyways, if you're seeing a similar problem, the driver or controller
    > probably can't do proper reset after timeout and I can't really help
    > with SAS driver on FreeBSD. :-P
    >
    > Mikael, the original thread is the following.
    >
    > http://article.gmane.org/gmane.linux...tmontools/5842
    >
    > Any ideas why hardreset doesn't work after SMART command timed out?


    I've looked at the above posting and tried to reproduce the problem.

    Linda wrote that "smartctl -n standby -A " sometimes wakes
    the device up, even though it shouldn't. On my Promise test box
    (FC6 user-space with kernel 2.6.27 and smartmontools-5.37-1.2.fc6,
    I put my test disks in standby with "hdparm -y /dev/sdb", and then
    ran numerous "smartctl -n standby -A /dev/sdb -d ata" commands.
    smartctl always noticed the standby state and never woke any disk.

    I then dropped the "-n standby" to observe the wakeup behaviour.
    On one disk (Seagate Barracuda 7200.9) the wakeups were completely
    reliable with no signs of timeouts or libata EH activity.
    On another disk (Hitachi Deskstar HDS722525VLSA80) the first one or
    two times I ran "smartctl -A /dev/sdb -d ata" when the disk was in
    standby it would wake up ok, but the next time it would suffer from
    timeouts, failed COMRESETs, and eventually libata would disable the port.
    When reloading sata_promise that port would fail detection but other
    ports would be ok.

    So I suspect two issues:
    - smartctl -n standby waking Linda's disks from standby is probably
    a disk firmware issue or a smartctl issue. I see no evidence that
    sata_promise is to blame for that.
    - hardreset in sata_promise seems broken. I'll take a closer look
    at that in about a week's time (I'll be busy with other work the
    next couple of days).

    /Mikael
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

    Mikael Pettersson wrote:
    > - smartctl -n standby waking Linda's disks from standby is probably
    > a disk firmware issue or a smartctl issue. I see no evidence that
    > sata_promise is to blame for that.


    /me agrees. I think it's most likely a problem in the firmware.

    > - hardreset in sata_promise seems broken. I'll take a closer look
    > at that in about a week's time (I'll be busy with other work the
    > next couple of days).


    This looks like a rather serious problem, so please take a look at
    this. Also, after the drive went down, does unplugging and replugging
    the signal cable fix the problem, if not does doing the same thing
    with the power cable make any difference?

    Thanks.

    --
    tejun
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

    Tejun Heo writes:
    > Mikael Pettersson wrote:
    > > - smartctl -n standby waking Linda's disks from standby is probably
    > > a disk firmware issue or a smartctl issue. I see no evidence that
    > > sata_promise is to blame for that.

    >
    > /me agrees. I think it's most likely a problem in the firmware.
    >
    > > - hardreset in sata_promise seems broken. I'll take a closer look
    > > at that in about a week's time (I'll be busy with other work the
    > > next couple of days).

    >
    > This looks like a rather serious problem, so please take a look at
    > this. Also, after the drive went down, does unplugging and replugging
    > the signal cable fix the problem, if not does doing the same thing
    > with the power cable make any difference?


    I couldn't bring the port back up with any combination of signal/power
    cable hotplugging. Even rmmod sata_promise; modprobe sata_promise failed
    to bring the port back up. Hence my suspicion that hardreset is borked
    or doesn't kick in.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

    Mikael Pettersson wrote:
    > Tejun Heo writes:
    > > > - hardreset in sata_promise seems broken. I'll take a closer look
    > > > at that in about a week's time (I'll be busy with other work the
    > > > next couple of days).

    > >
    > > This looks like a rather serious problem, so please take a look at
    > > this. Also, after the drive went down, does unplugging and replugging
    > > the signal cable fix the problem, if not does doing the same thing
    > > with the power cable make any difference?

    >
    > I couldn't bring the port back up with any combination of signal/power
    > cable hotplugging.


    Heh.. that rules out drive firmware stuck in wonderland.

    > Even rmmod sata_promise; modprobe sata_promise failed to bring the
    > port back up. Hence my suspicion that hardreset is borked or doesn't
    > kick in.


    It looks like the controller requires harder or more proper kick in
    the ass to return to sane state. :-)

    --
    tejun
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

    On Mon, 13 Oct 2008 14:16:24 +0900, Tejun Heo wrote:
    >Mikael Pettersson wrote:
    >> - hardreset in sata_promise seems broken. I'll take a closer look
    >> at that in about a week's time (I'll be busy with other work the
    >> next couple of days).

    >
    >This looks like a rather serious problem, so please take a look at
    >this.


    I've done more tests now, and the problem is that errors detected
    outside of sata_promise itself, typically timeouts, don't trigger
    the pdc_reset_port() call needed to bring the ATA engine behind the
    port back to sanity.

    And the reason no Promise-specific reset is done on timeouts is
    that libata-eh freezes the port before calling ->error_handler.
    sata_promise's error_handler only does a reset if the port is
    non-frozen, and I think that's because we don't want to destroy
    error status bits needed by EH autopsy.

    The solution I've been testing is the straightforward one of
    overriding ->prereset with code which calls pdc_reset_port()
    before calling the default prereset. (See patch below.)
    (Promise's own driver issues resets whenever there's a sign
    of a problem.)

    One of my test disks will often trigger a timeout if smartctl
    accesses it when it's spun down. Previously the port would not
    recover from that, but now it's just a brief reset/detect and
    then it's up again.

    /Mikael

    --- linux-2.6.27/drivers/ata/sata_promise.c.~1~ 2008-07-14 10:22:36.000000000 +0200
    +++ linux-2.6.27/drivers/ata/sata_promise.c 2008-10-20 00:20:58.000000000 +0200
    @@ -153,6 +153,7 @@ static void pdc_freeze(struct ata_port *
    static void pdc_sata_freeze(struct ata_port *ap);
    static void pdc_thaw(struct ata_port *ap);
    static void pdc_sata_thaw(struct ata_port *ap);
    +static int pdc_prereset(struct ata_link *link, unsigned long deadline);
    static void pdc_error_handler(struct ata_port *ap);
    static void pdc_post_internal_cmd(struct ata_queued_cmd *qc);
    static int pdc_pata_cable_detect(struct ata_port *ap);
    @@ -175,6 +176,7 @@ static const struct ata_port_operations
    .sff_irq_clear = pdc_irq_clear,

    .post_internal_cmd = pdc_post_internal_cmd,
    + .prereset = pdc_prereset,
    .error_handler = pdc_error_handler,
    };

    @@ -691,6 +693,12 @@ static void pdc_sata_thaw(struct ata_por
    readl(host_mmio + hotplug_offset); /* flush */
    }

    +static int pdc_prereset(struct ata_link *link, unsigned long deadline)
    +{
    + pdc_reset_port(link->ap);
    + return ata_sff_prereset(link, deadline);
    +}
    +
    static void pdc_error_handler(struct ata_port *ap)
    {
    if (!(ap->pflags & ATA_PFLAG_FROZEN))
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

    Hello, Mikael.

    > --- linux-2.6.27/drivers/ata/sata_promise.c.~1~ 2008-07-14 10:22:36.000000000 +0200
    > +++ linux-2.6.27/drivers/ata/sata_promise.c 2008-10-20 00:20:58.000000000 +0200
    > @@ -153,6 +153,7 @@ static void pdc_freeze(struct ata_port *
    > static void pdc_sata_freeze(struct ata_port *ap);
    > static void pdc_thaw(struct ata_port *ap);
    > static void pdc_sata_thaw(struct ata_port *ap);
    > +static int pdc_prereset(struct ata_link *link, unsigned long deadline);
    > static void pdc_error_handler(struct ata_port *ap);
    > static void pdc_post_internal_cmd(struct ata_queued_cmd *qc);
    > static int pdc_pata_cable_detect(struct ata_port *ap);
    > @@ -175,6 +176,7 @@ static const struct ata_port_operations
    > .sff_irq_clear = pdc_irq_clear,
    >
    > .post_internal_cmd = pdc_post_internal_cmd,
    > + .prereset = pdc_prereset,
    > .error_handler = pdc_error_handler,
    > };
    >
    > @@ -691,6 +693,12 @@ static void pdc_sata_thaw(struct ata_por
    > readl(host_mmio + hotplug_offset); /* flush */
    > }
    >
    > +static int pdc_prereset(struct ata_link *link, unsigned long deadline)
    > +{
    > + pdc_reset_port(link->ap);


    I would put this into ->hardreset itself as the controller can also
    get out of sync with reality during reset. Other than that, looks
    like the correct approach.

    Thanks.

    --
    tejun
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

    Tejun Heo writes:
    > Hello, Mikael.
    >
    > > --- linux-2.6.27/drivers/ata/sata_promise.c.~1~ 2008-07-14 10:22:36.000000000 +0200
    > > +++ linux-2.6.27/drivers/ata/sata_promise.c 2008-10-20 00:20:58.000000000 +0200
    > > @@ -153,6 +153,7 @@ static void pdc_freeze(struct ata_port *
    > > static void pdc_sata_freeze(struct ata_port *ap);
    > > static void pdc_thaw(struct ata_port *ap);
    > > static void pdc_sata_thaw(struct ata_port *ap);
    > > +static int pdc_prereset(struct ata_link *link, unsigned long deadline);
    > > static void pdc_error_handler(struct ata_port *ap);
    > > static void pdc_post_internal_cmd(struct ata_queued_cmd *qc);
    > > static int pdc_pata_cable_detect(struct ata_port *ap);
    > > @@ -175,6 +176,7 @@ static const struct ata_port_operations
    > > .sff_irq_clear = pdc_irq_clear,
    > >
    > > .post_internal_cmd = pdc_post_internal_cmd,
    > > + .prereset = pdc_prereset,
    > > .error_handler = pdc_error_handler,
    > > };
    > >
    > > @@ -691,6 +693,12 @@ static void pdc_sata_thaw(struct ata_por
    > > readl(host_mmio + hotplug_offset); /* flush */
    > > }
    > >
    > > +static int pdc_prereset(struct ata_link *link, unsigned long deadline)
    > > +{
    > > + pdc_reset_port(link->ap);

    >
    > I would put this into ->hardreset itself as the controller can also
    > get out of sync with reality during reset.


    The only thing I see going on between prereset and (hard/soft)reset
    is an optional freeze, so I don't see why moving the pdc_reset_port()
    into the beginning of hardreset() would make any difference.

    sata_promise currently uses the ->hardreset and ->softreset inherited
    from ata_sff_port_ops, so it would need to override both to ensure that
    we always do pdc_reset_port() before libata does its thing. That's why
    I felt doing that in ->prereset would be the right solution.

    /Mikael
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

    On Mon, 20 Oct 2008 01:40:21 +0200, Mikael Pettersson wrote:
    >On Mon, 13 Oct 2008 14:16:24 +0900, Tejun Heo wrote:
    >>Mikael Pettersson wrote:
    >>> - hardreset in sata_promise seems broken. I'll take a closer look
    >>> at that in about a week's time (I'll be busy with other work the
    >>> next couple of days).

    >>
    >>This looks like a rather serious problem, so please take a look at
    >>this.

    >
    >I've done more tests now, and the problem is that errors detected
    >outside of sata_promise itself, typically timeouts, don't trigger
    >the pdc_reset_port() call needed to bring the ATA engine behind the
    >port back to sanity.


    I did a round of regression tests which identified another
    related but different problem.

    In kernels 2.6.24 and 2.6.25 sata_promise would actually recover
    from the timeouts, while in kernels 2.6.26 and 2.6.27 it would not.
    Before 2.6.26 sata_promise explicitly used sata_std_hardreset, but
    in the "make reset related methods proper port operations" commit
    (a1efdaba2dbd6fb89e23a87b66d3f4dd92c9f5af), Tejun changed sata_promise
    to use the hardreset it now inherits from ata_sff_port_ops, namely
    sata_sff_hardreset. This change looks accidental. The main difference
    between these two procedures is that the sff version will poll the
    legacy status register until the port becomes ready.

    Changing sata_promise to use sata_std_hardreset in 2.6.26/.27
    makes the EH after the timeouts much more reliable. Not as
    tidy as with the previous ->prereset fix, but still working.

    /Mikael
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

    Hello,

    Mikael Pettersson wrote:
    > Tejun Heo writes:
    > > Hello, Mikael.
    > > I would put this into ->hardreset itself as the controller can also
    > > get out of sync with reality during reset.

    >
    > The only thing I see going on between prereset and (hard/soft)reset
    > is an optional freeze, so I don't see why moving the pdc_reset_port()
    > into the beginning of hardreset() would make any difference.
    >
    > sata_promise currently uses the ->hardreset and ->softreset inherited
    > from ata_sff_port_ops, so it would need to override both to ensure that
    > we always do pdc_reset_port() before libata does its thing. That's why
    > I felt doing that in ->prereset would be the right solution.


    Hmm.. reset sequence goes on like the following.

    1. prereset
    2. hardreset, if fail, retry
    3. follow-up softreset if requested, if fail, goto #2
    4. postreset, if successful

    So, if some PHY event happens while the reset is waiting for device
    readiness and makes the controller state go out of sync with the
    drive. ->prereset() will NOT be called for the following retry.

    As a rule, ->hardreset should be able to reset the controller from all
    possible situations. ->prereset can be used to smooth out initial
    reset tries (ie. during initial probing, waiting for device readiness
    before SRST for SFF controllers w/o hardreset) but at best its
    function is advisory. When things go wrong, ->hardreset should be
    able to provide solution whatever state the controller is in.

    If both hard and soft resets work better with the controller reset
    added, I think it would be best to override both.

    Thanks.

    --
    tejun
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [smartmontools-support] inactive SATA drives won't stay in standby or sleep, PATA models did. (fwd)

    Tejun Heo writes:
    > Hello,
    >
    > Mikael Pettersson wrote:
    > > Tejun Heo writes:
    > > > Hello, Mikael.
    > > > I would put this into ->hardreset itself as the controller can also
    > > > get out of sync with reality during reset.

    > >
    > > The only thing I see going on between prereset and (hard/soft)reset
    > > is an optional freeze, so I don't see why moving the pdc_reset_port()
    > > into the beginning of hardreset() would make any difference.
    > >
    > > sata_promise currently uses the ->hardreset and ->softreset inherited
    > > from ata_sff_port_ops, so it would need to override both to ensure that
    > > we always do pdc_reset_port() before libata does its thing. That's why
    > > I felt doing that in ->prereset would be the right solution.

    >
    > Hmm.. reset sequence goes on like the following.
    >
    > 1. prereset
    > 2. hardreset, if fail, retry
    > 3. follow-up softreset if requested, if fail, goto #2
    > 4. postreset, if successful
    >
    > So, if some PHY event happens while the reset is waiting for device
    > readiness and makes the controller state go out of sync with the
    > drive. ->prereset() will NOT be called for the following retry.


    I see. Ok, then I'll forget about ->prereset and bind ->hardreset to
    code which does pdc_reset_port() before invoking sata_sff_hardreset().

    Thanks for your input.

    /Mikael
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread