[PATCH 1/2] mmc_block: print better data error message after timeout - Kernel

This is a discussion on [PATCH 1/2] mmc_block: print better data error message after timeout - Kernel ; In particular, if the card gets an ECC error it will timeout, in which case it is much more helpful to see an ECC error rather than a timeout error. Signed-off-by: Adrian Hunter --- drivers/mmc/card/block.c | 47 ++++++++++++++++++++++++++++++++++++++++++--- 1 files ...

+ Reply to Thread
Results 1 to 6 of 6

Thread: [PATCH 1/2] mmc_block: print better data error message after timeout

  1. [PATCH 1/2] mmc_block: print better data error message after timeout

    In particular, if the card gets an ECC error it will
    timeout, in which case it is much more helpful to see
    an ECC error rather than a timeout error.

    Signed-off-by: Adrian Hunter
    ---
    drivers/mmc/card/block.c | 47 ++++++++++++++++++++++++++++++++++++++++++---
    1 files changed, 43 insertions(+), 4 deletions(-)

    diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
    index 24c97d3..d121462 100644
    --- a/drivers/mmc/card/block.c
    +++ b/drivers/mmc/card/block.c
    @@ -210,6 +210,47 @@ static u32 mmc_sd_num_wr_blocks(struct mmc_card *card)
    return blocks;
    }

    +static void print_data_error(struct mmc_blk_request *brq, struct mmc_card *card,
    + struct request *req)
    +{
    + struct mmc_command cmd;
    + char *emsg;
    + u32 status;
    + int status_err = 0;
    +
    + if (brq->data.error != -ETIMEDOUT || mmc_host_is_spi(card->host))
    + goto out_print;
    +
    + if (brq->mrq.stop)
    + /* 'Stop' response contains card status */
    + status = brq->mrq.stop->resp[0];
    + else {
    + cmd.opcode = MMC_SEND_STATUS;
    + cmd.arg = card->rca << 16;
    + cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
    + status_err = mmc_wait_for_cmd(card->host, &cmd, 0);
    + if (status_err)
    + goto out_print;
    + status = cmd.resp[0];
    + }
    +
    + emsg = (status & R1_CARD_ECC_FAILED) ? "ECC" : "I/O";
    +
    + printk(KERN_ERR "%s: %s error transferring data, sector %u, "
    + "card status %#x\n", req->rq_disk->disk_name, emsg,
    + (unsigned)req->sector, status);
    +
    + return;
    +
    +out_print:
    + printk(KERN_ERR "%s: error %d transferring data, sector %u, nr %u\n",
    + req->rq_disk->disk_name, brq->data.error, (unsigned)req->sector,
    + (unsigned)req->nr_sectors);
    + if (status_err)
    + printk(KERN_ERR "%s: error %d requesting card status\n",
    + req->rq_disk->disk_name, status_err);
    +}
    +
    static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
    {
    struct mmc_blk_data *md = mq->data;
    @@ -281,10 +322,8 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
    req->rq_disk->disk_name, brq.cmd.error);
    }

    - if (brq.data.error) {
    - printk(KERN_ERR "%s: error %d transferring data\n",
    - req->rq_disk->disk_name, brq.data.error);
    - }
    + if (brq.data.error)
    + print_data_error(&brq, card, req);

    if (brq.stop.error) {
    printk(KERN_ERR "%s: error %d sending stop command\n",
    --
    1.5.4.3
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH 1/2] mmc_block: print better data error message after timeout

    On Thu, 16 Oct 2008 16:26:55 +0300
    Adrian Hunter wrote:

    > In particular, if the card gets an ECC error it will
    > timeout, in which case it is much more helpful to see
    > an ECC error rather than a timeout error.
    >
    > Signed-off-by: Adrian Hunter
    > ---


    Woo. I think you're the first I've seen that has been able to trigger
    an actual card error.

    As for the patch, I like the idea but I'm not entirely satisfied with
    the implementation.

    > +static void print_data_error(struct mmc_blk_request *brq, struct mmc_card *card,
    > + struct request *req)
    > +{
    > + struct mmc_command cmd;
    > + char *emsg;
    > + u32 status;
    > + int status_err = 0;
    > +
    > + if (brq->data.error != -ETIMEDOUT || mmc_host_is_spi(card->host))
    > + goto out_print;
    > +


    The error codes are more of a hint than anything else, so you should
    check the status for all non-zero codes. You should also not just check
    data.error, but all of them.

    And why exclude spi?

    > + if (brq->mrq.stop)
    > + /* 'Stop' response contains card status */
    > + status = brq->mrq.stop->resp[0];
    > + else {
    > + cmd.opcode = MMC_SEND_STATUS;
    > + cmd.arg = card->rca << 16;
    > + cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
    > + status_err = mmc_wait_for_cmd(card->host, &cmd, 0);
    > + if (status_err)
    > + goto out_print;
    > + status = cmd.resp[0];
    > + }


    Errors can occur on writes as well, so I suggest accumulating the
    status bits instead of trying to get the entire set in one go. I.e.:

    status = 0;
    if (mrq.stop)
    status |= mrq.stop.resp[0];
    while (card not ready)
    status |= send_status();

    IOW, you should do this in the main handler (that already has the
    status loop for writes).

    > +
    > + emsg = (status & R1_CARD_ECC_FAILED) ? "ECC" : "I/O";
    > +


    There are also some other error codes that can be of interest.
    "Internal controller error" for example.

    (it's probably also better to state "Unknown" error and not "I/O" for
    the fallback)

    > @@ -281,10 +322,8 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
    > req->rq_disk->disk_name, brq.cmd.error);
    > }
    >
    > - if (brq.data.error) {
    > - printk(KERN_ERR "%s: error %d transferring data\n",
    > - req->rq_disk->disk_name, brq.data.error);
    > - }
    > + if (brq.data.error)
    > + print_data_error(&brq, card, req);
    >


    Please keep the old message and add this as a new extra piece of
    information. It is helpful for debugging to see both what the driver
    and the card reported.

    Rgds
    --
    -- Pierre Ossman

    WARNING: This correspondence is being monitored by the
    Swedish government. Make sure your server uses encryption
    for SMTP traffic and consider using PGP for end-to-end
    encryption.

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v2.0.9 (GNU/Linux)

    iEUEARECAAYFAkkEUIEACgkQ7b8eESbyJLjRsgCg5fiWRwB/w3eDrWrniguOZnO2
    OeAAljRlhu+jHexSqepst5CC7SCnQh0=
    =nLGh
    -----END PGP SIGNATURE-----


  3. Re: [PATCH 1/2] mmc_block: print better data error message after timeout

    Pierre Ossman wrote:
    > On Thu, 16 Oct 2008 16:26:55 +0300
    > Adrian Hunter wrote:
    >
    >> In particular, if the card gets an ECC error it will
    >> timeout, in which case it is much more helpful to see
    >> an ECC error rather than a timeout error.
    >>
    >> Signed-off-by: Adrian Hunter
    >> ---


    Thank you for looking at the patch.

    > Woo. I think you're the first I've seen that has been able to trigger
    > an actual card error.


    Well, my impression is that they are quite easy to break

    > As for the patch, I like the idea but I'm not entirely satisfied with
    > the implementation.
    >
    >> +static void print_data_error(struct mmc_blk_request *brq, struct mmc_card *card,
    >> + struct request *req)
    >> +{
    >> + struct mmc_command cmd;
    >> + char *emsg;
    >> + u32 status;
    >> + int status_err = 0;
    >> +
    >> + if (brq->data.error != -ETIMEDOUT || mmc_host_is_spi(card->host))
    >> + goto out_print;
    >> +

    >
    > The error codes are more of a hint than anything else, so you should
    > check the status for all non-zero codes. You should also not just check
    > data.error, but all of them.


    OK but the error bits are cleared as soon as the status has been reported,
    which is in the command response or, in the case of timeout, the response
    to the next command. Reading the status after that shows nothing. The
    status would have to be fished out of brq->mrq.cmd->resp[0].

    ETIMEDOUT is a special case because it is not a hint for the error, it is
    rather the normal response to an error - so it seemed very misleading to me.

    > And why exclude spi?


    I don't have SPI, so I can't test it. AFAICT timeouts are not meant to happen
    with SPI so it is a genuine error.

    The SPI error reporting is a little different. The non-SPI code that I wrote
    certainly won't work for SPI. I may look at doing something for SPI.

    >> + if (brq->mrq.stop)
    >> + /* 'Stop' response contains card status */
    >> + status = brq->mrq.stop->resp[0];
    >> + else {
    >> + cmd.opcode = MMC_SEND_STATUS;
    >> + cmd.arg = card->rca << 16;
    >> + cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
    >> + status_err = mmc_wait_for_cmd(card->host, &cmd, 0);
    >> + if (status_err)
    >> + goto out_print;
    >> + status = cmd.resp[0];
    >> + }

    >
    > Errors can occur on writes as well, so I suggest accumulating the
    > status bits instead of trying to get the entire set in one go. I.e.:
    >
    > status = 0;
    > if (mrq.stop)
    > status |= mrq.stop.resp[0];
    > while (card not ready)
    > status |= send_status();
    >
    > IOW, you should do this in the main handler (that already has the
    > status loop for writes).
    >


    The nesting may get too deep and the control paths too complicated
    if it is done in the main handler, but I will see what I can do.
    Ideally the loop needs some limit to prevent a misbehaving card
    locking up the driver.

    >> +
    >> + emsg = (status & R1_CARD_ECC_FAILED) ? "ECC" : "I/O";
    >> +

    >
    > There are also some other error codes that can be of interest.
    > "Internal controller error" for example.


    Do you mean CC_ERROR? I picked out ECC errors because NAND will get
    ECC errors if the NAND page was not written cleanly (e.g. if the power
    went off during the write) I guess a good card should recover the
    old data and never report ECC errors, but that is a separate issue.

    > (it's probably also better to state "Unknown" error and not "I/O" for
    > the fallback)


    "Unknown" is for unknown error codes, so I don't think it is appropriate.
    In this case we are simply not bothering to break down all the errors bits.

    >> @@ -281,10 +322,8 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
    >> req->rq_disk->disk_name, brq.cmd.error);
    >> }
    >>
    >> - if (brq.data.error) {
    >> - printk(KERN_ERR "%s: error %d transferring data\n",
    >> - req->rq_disk->disk_name, brq.data.error);
    >> - }
    >> + if (brq.data.error)
    >> + print_data_error(&brq, card, req);
    >>

    >
    > Please keep the old message and add this as a new extra piece of
    > information. It is helpful for debugging to see both what the driver
    > and the card reported.


    It already does that.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH 1/2] mmc_block: print better data error message after timeout

    Add command response and card status to error
    messages.

    Signed-off-by: Adrian Hunter
    ---
    drivers/mmc/card/block.c | 56 ++++++++++++++++++++++++++++++++++++++++-----
    1 files changed, 49 insertions(+), 7 deletions(-)


    Pierre

    This patch amends error messages for all error codes and is independent of SPI.
    Status values are not OR'd together from different commands as that would
    muddy the waters for anyone trying to understand the nature of the error.
    Waiting for the card to become ready after an error is also not included,
    as it is not in the standards and is a separate issue anyway.

    Regards
    Adrian


    diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
    index 3d067c3..9998718 100644
    --- a/drivers/mmc/card/block.c
    +++ b/drivers/mmc/card/block.c
    @@ -209,6 +209,27 @@ static u32 mmc_sd_num_wr_blocks(struct mmc_card *card)
    return blocks;
    }

    +static u32 get_card_status(struct mmc_card *card, struct request *req)
    +{
    + struct mmc_command cmd;
    + int err;
    +
    + /* SEND STATUS command is not supported by SDIO */
    + if (mmc_card_sdio(card))
    + return 0;
    +
    + memset(&cmd, 0, sizeof(struct mmc_command));
    + cmd.opcode = MMC_SEND_STATUS;
    + if (!mmc_host_is_spi(card->host))
    + cmd.arg = card->rca << 16;
    + cmd.flags = MMC_RSP_SPI_R2 | MMC_RSP_R1 | MMC_CMD_AC;
    + err = mmc_wait_for_cmd(card->host, &cmd, 0);
    + if (err)
    + printk(KERN_ERR "%s: error %d sending status comand",
    + req->rq_disk->disk_name, err);
    + return cmd.resp[0];
    +}
    +
    static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
    {
    struct mmc_blk_data *md = mq->data;
    @@ -220,7 +241,7 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)

    do {
    struct mmc_command cmd;
    - u32 readcmd, writecmd;
    + u32 readcmd, writecmd, status = 0;

    memset(&brq, 0, sizeof(struct mmc_blk_request));
    brq.mrq.cmd = &brq.cmd;
    @@ -275,19 +296,40 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
    * until later as we need to wait for the card to leave
    * programming mode even when things go wrong.
    */
    + if (brq.cmd.error || brq.data.error || brq.stop.error)
    + status = get_card_status(card, req);
    +
    if (brq.cmd.error) {
    - printk(KERN_ERR "%s: error %d sending read/write command\n",
    - req->rq_disk->disk_name, brq.cmd.error);
    + printk(KERN_ERR "%s: error %d sending read/write "
    + "command, response %#x, card status %#x\n",
    + req->rq_disk->disk_name, brq.cmd.error,
    + brq.cmd.resp[0], status);
    }

    if (brq.data.error) {
    - printk(KERN_ERR "%s: error %d transferring data\n",
    - req->rq_disk->disk_name, brq.data.error);
    + if (brq.data.error == -ETIMEDOUT) {
    + /* 'Stop' response contains card status */
    + if (brq.mrq.stop)
    + status = brq.mrq.stop->resp[0];
    + printk(KERN_ERR "%s: error transferring data,"
    + " sector %u, nr %u, card status %#x\n",
    + req->rq_disk->disk_name,
    + (unsigned)req->sector,
    + (unsigned)req->nr_sectors, status);
    + } else {
    + printk(KERN_ERR "%s: error %d transferring data,"
    + " sector %u, nr %u, card status %#x\n",
    + req->rq_disk->disk_name, brq.data.error,
    + (unsigned)req->sector,
    + (unsigned)req->nr_sectors, status);
    + }
    }

    if (brq.stop.error) {
    - printk(KERN_ERR "%s: error %d sending stop command\n",
    - req->rq_disk->disk_name, brq.stop.error);
    + printk(KERN_ERR "%s: error %d sending stop command, "
    + "response %#x, card status %#x\n",
    + req->rq_disk->disk_name, brq.stop.error,
    + brq.stop.resp[0], status);
    }

    if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
    --
    1.5.4.3

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH 1/2] mmc_block: print better data error message after timeout

    Adrian Hunter wrote:
    > Add command response and card status to error
    > messages.
    >
    > Signed-off-by: Adrian Hunter
    > ---
    > drivers/mmc/card/block.c | 56
    > ++++++++++++++++++++++++++++++++++++++++-----
    > 1 files changed, 49 insertions(+), 7 deletions(-)
    >
    >
    > Pierre
    >
    > This patch amends error messages for all error codes and is independent
    > of SPI.
    > Status values are not OR'd together from different commands as that would
    > muddy the waters for anyone trying to understand the nature of the error.
    > Waiting for the card to become ready after an error is also not included,
    > as it is not in the standards and is a separate issue anyway.
    >
    > Regards
    > Adrian


    What is the status of this patch?

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH 1/2] mmc_block: print better data error message after timeout

    On Mon, 10 Nov 2008 10:20:23 +0200
    Adrian Hunter wrote:

    >
    > What is the status of this patch?
    >


    Waiting for me to have time to look closer at it.

    Rgds
    --
    -- Pierre Ossman

    WARNING: This correspondence is being monitored by the
    Swedish government. Make sure your server uses encryption
    for SMTP traffic and consider using PGP for end-to-end
    encryption.

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v2.0.9 (GNU/Linux)

    iEYEARECAAYFAkkYBLAACgkQ7b8eESbyJLiKBACgrIl4cuULqg v34jFj+2v74Zfp
    NuEAmwYYxp43gCXMmBQMt87uEZGpJyVe
    =xJ5t
    -----END PGP SIGNATURE-----


+ Reply to Thread