Problems with the block-layer timeouts - Kernel


Thread: Problems with the block-layer timeouts

  1. Problems with the block-layer timeouts

    James and Jens:

    I spent most of the day yesterday debugging some tricky problems in the
    new block-layer timeout scheme. Clearly it is in need of more work.

    A major reason for these problems was that there doesn't seem to be a
    clear idea of when the timeout period should begin. In
    blk_add_timer() a comment says:

    * Each request has its own timer, and as it is added to the queue, we
    * set up the timer.

    On the other hand, elv_next_request() says:

    * We are now handing the request to the hardware,
    * add the timeout handler

    (Note that this comment is wrong for an additional reason.
    elv_next_request() doesn't hand the request to the hardware; it hands
    the request to a driver. What the driver chooses to do with the
    request is its own affair.)

    So when should the timeout begin? The most logical time is when the
    driver does send the request to the hardware. Of course, the block
    core has no way to know when that happens, so a suitable proxy might be
    when the request is removed from the block queue. (On the other hand,
    are there drivers which don't bother to dequeue a request until it has
    completed?) Either way, both the comments above and the actual code
    should be changed.


    The real source of the problems I encountered is in the SCSI midlayer.
    scsi_request_fn() can call elv_next_request() long before it is ready
    to begin executing the request. In particular, it does so before
    checking scsi_dev_queue_ready(). So if the lower-level driver can
    handle up to N simultaneous requests, the midlayer will call
    elv_next_request() N+1 times. The last request has to wait until one
    of the first N completes. Surely this waiting period doesn't deserve
    to be counted as part of the last request's timeout.

    In my case N was 1, and request #1 ended up being requeued. Requeuing
    restarts the timer; as a result, the timer for request #2 expired
    before request #1's second incarnation timed out. And this was before
    request #2 had even begun!

    That's not so terribly bad in itself. However the scsi_times_out()
    routine is completely unprepared to handle timeouts for requests that
    haven't yet been dispatched to the LLD. Or, for that matter, requests
    which have been returned by the LLD but were requeued and have not yet
    been sent back down.

    So what happened was that the midlayer tried to abort a request which
    hadn't started yet. Or, depending on the exact timing, it found itself
    being asked to abort two requests when only one was running, so it gave
    up and did nothing. One way leads to processes hanging, the other way
    leads to a system crash.

    How should this be fixed? It would help to call scsi_dev_queue_ready()
    before elv_next_request(), but that's not sufficient.
    scsi_times_out() needs to recognize that a timeout for a non-running
    request can be handled by directly returning BLK_EH_HANDLED. Right?


    While I'm on the subject, there are a few related items that could be
    improved. In my tests, I was generating I/O requests simply by doing

    dd if=/dev/sda ...

    I don't know where the timeouts for these requests are determined, but
    they were set to 60 seconds. That seems much too long.

    In blk_del_timer(), there's no reason to test q->rq_timed_out_fn. If
    the method pointer is NULL then req->deadline would be 0 anyway. In
    addition, req->deadline should be set to 0 at the end of the routine,
    just in case req gets requeued.

    In blk_add_timer(), the line

    expiry = round_jiffies(req->deadline);

    is not optimal. round_jiffies() will sometimes round a value _down_ to
    the nearest second. But blk_rq_timed_out_timer() tests whether
    req->deadline is in the past -- and if the deadline was rounded down
    then this won't be true the first time through. You wind up getting an
    unnecessary timer interrupt. Instead there should be a
    round_jiffies_up() utility routine, and it should be used in both
    blk_add_timer() and blk_rq_timed_out_timer().

    Alan Stern

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: Problems with the block-layer timeouts

    Alan Stern wrote:
    > I spent most of the day yesterday debugging some tricky problems in the
    > new block-layer timeout scheme. Clearly it is in need of more work.
    >
    > A major reason for these problems was that there doesn't seem to be a
    > clear idea of when the timeout period should begin. In
    > blk_add_timer() a comment says:


    > How should this be fixed? It would help to call scsi_dev_queue_ready()
    > before elv_next_request(), but that's not sufficient.
    > scsi_times_out() needs to recognize that a timeout for a non-running
    > request can be handled by directly returning BLK_EH_HANDLED. Right?
    >
    >


    Tejun described a similar issue here.
    http://thread.gmane.org/gmane.linux.ide/35603

    And a fix to address the issue here.
    http://thread.gmane.org/gmane.linux.ide/35725

    Does the patch posted by Tejun address your issue?

    > While I'm on the subject, there are a few related items that could be
    > improved. In my tests, I was generating I/O requests simply by doing
    >
    > dd if=/dev/sda ...
    >
    > I don't know where the timeouts for these requests are determined, but
    > they were set to 60 seconds. That seems much too long.
    >


    It is set by a udev rule and the value is historical.
    http://thread.gmane.org/gmane.linux....31/focus=45646


    -andmike
    --
    Michael Anderson
    andmike@linux.vnet.ibm.com

  3. Re: Problems with the block-layer timeouts

    On Sat, Nov 01 2008, Alan Stern wrote:
    > James and Jens:
    >
    > I spent most of the day yesterday debugging some tricky problems in the
    > new block-layer timeout scheme. Clearly it is in need of more work.


    It is.

    > A major reason for these problems was that there doesn't seem to be a
    > clear idea of when the timeout period should begin. In
    > blk_add_timer() a comment says:
    >
    > * Each request has its own timer, and as it is added to the queue, we
    > * set up the timer.
    >
    > On the other hand, elv_next_request() says:
    >
    > * We are now handing the request to the hardware,
    > * add the timeout handler
    >
    > (Note that this comment is wrong for an additional reason.
    > elv_next_request() doesn't hand the request to the hardware; it hands
    > the request to a driver. What the driver chooses to do with the
    > request is its own affair.)
    >
    > So when should the timeout begin? The most logical time is when the
    > driver does send the request to the hardware. Of course, the block
    > core has no way to know when that happens, so a suitable proxy might be
    > when the request is removed from the block queue. (On the other hand,
    > are there drivers which don't bother to dequeue a request until it has
    > completed?) Either way, both the comments above and the actual code
    > should be changed.
    >
    >
    > The real source of the problems I encountered is in the SCSI midlayer.
    > scsi_request_fn() can call elv_next_request() long before it is ready
    > to begin executing the request. In particular, it does so before
    > checking scsi_dev_queue_ready(). So if the lower-level driver can
    > handle up to N simultaneous requests, the midlayer will call
    > elv_next_request() N+1 times. The last request has to wait until one
    > of the first N completes. Surely this waiting period doesn't deserve
    > to be counted as part of the last request's timeout.
    >
    > In my case N was 1, and request #1 ended up being requeued. Requeuing
    > restarts the timer; as a result, the timer for request #2 expired
    > before request #1's second incarnation timed out. And this was before
    > request #2 had even begun!
    >
    > That's not so terribly bad in itself. However the scsi_times_out()
    > routine is completely unprepared to handle timeouts for requests that
    > haven't yet been dispatched to the LLD. Or, for that matter, requests
    > which have been returned by the LLD but were requeued and have not yet
    > been sent back down.
    >
    > So what happened was that the midlayer tried to abort a request which
    > hadn't started yet. Or, depending on the exact timing, it found itself
    > being asked to abort two requests when only one was running, so it gave
    > up and did nothing. One way leads to processes hanging, the other way
    > leads to a system crash.
    >
    > How should this be fixed? It would help to call scsi_dev_queue_ready()
    > before elv_next_request(), but that's not sufficient.
    > scsi_times_out() needs to recognize that a timeout for a non-running
    > request can be handled by directly returning BLK_EH_HANDLED. Right?


    We already discussed this issue with Tejun. There's a hack in my tree
    now that just moves the activate call to dequeue time, which works ok
    for SCSI (but won't work for e.g. IDE). The real fix is to have a peek
    and fetch interface for retrieving requests. We've actually wanted that
    for some time, since the current 'peek and mark active' approach doesn't
    even work well now, since it both forces pushing of requests to the
    dispatch list and marks them unmergeable, because the block layer does
    not know whether the driver has started handling the request or not.

    So, in summary, a short term fix will be merged soon and a longer term
    fix will be right after.

    > While I'm on the subject, there are a few related items that could be
    > improved. In my tests, I was generating I/O requests simply by doing
    >
    > dd if=/dev/sda ...
    >
    > I don't know where the timeouts for these requests are determined, but
    > they were set to 60 seconds. That seems much too long.


    Fully agreed, as Mike mentioned this actually looks like a dumb udev
    rule that didn't have any effect until this generic timeout work. For
    normal IO, something in the 10 second range is a lot more appropriate.

    > In blk_del_timer(), there's no reason to test q->rq_timed_out_fn. If
    > the method pointer is NULL then req->deadline would be 0 anyway. In
    > addition, req->deadline should be set to 0 at the end of the routine,
    > just in case req gets requeued.
    >
    > In blk_add_timer(), the line
    >
    > expiry = round_jiffies(req->deadline);
    >
    > is not optimal. round_jiffies() will sometimes round a value _down_ to
    > the nearest second. But blk_rq_timed_out_timer() tests whether
    > req->deadline is in the past -- and if the deadline was rounded down
    > then this won't be true the first time through. You wind up getting an
    > unnecessary timer interrupt. Instead there should be a
    > round_jiffies_up() utility routine, and it should be used in both
    > blk_add_timer() and blk_rq_timed_out_timer().


    Very good point, we do indeed want a round_jiffies_up() for this!

    --
    Jens Axboe


  4. Re: Problems with the block-layer timeouts

    Jens Axboe wrote:
    >> While I'm on the subject, there are a few related items that could be
    >> improved. In my tests, I was generating I/O requests simply by doing
    >>
    >> dd if=/dev/sda ...
    >>
    >> I don't know where the timeouts for these requests are determined, but
    >> they were set to 60 seconds. That seems much too long.

    >
    > Fully agreed, as Mike mentioned this actually looks like a dumb udev
    > rule that didn't have any effect until this generic timeout work. For
    > normal IO, something in the 10 second range is a lot more appropriate.


    Yes and no. For direct-attach storage with no other initiators, ok. But
    for larger arrays, potentially with multiple initiators - no. I can
    name several arrays that depend on a 30 second timeout, and a few that,
    under load, require 60 seconds. I assume there are usually "best
    practices" guides for the integrators to ensure the defaults are set right.

    -- james s

  5. Re: Problems with the block-layer timeouts

    Hi, Tejun! I ran across the same bug as you, but about a day later.

    On Mon, 3 Nov 2008, Jens Axboe wrote:

    > > So when should the timeout begin? The most logical time is when the
    > > driver does send the request to the hardware. Of course, the block
    > > core has no way to know when that happens, so a suitable proxy might be
    > > when the request is removed from the block queue. (On the other hand,
    > > are there drivers which don't bother to dequeue a request until it has
    > > completed?) Either way, both the comments above and the actual code
    > > should be changed.


    > We already discussed this issue with Tejun. There's a hack in my tree
    > now that just moves the activate call to dequeue time, which works ok
    > for SCSI (but won't work for e.g. IDE). The real fix is to have a peek
    > and fetch interface for retrieving requests. We've actually wanted that
    > for some time, since the current 'peek and mark active' approach doesn't
    > even work well now, since it both forces pushing of requests to the
    > dispatch list and marks them unmergeable, because the block layer does
    > not know whether the driver has started handling the request or not.
    >
    > So, in summary, a short term fix will be merged soon and a longer term
    > fix will be right after.


    Even a "peek and fetch" interface might not be best, at least as far as
    timer issues are concerned. Ideally, the timer shouldn't be started
    until the SCSI midlayer knows that the request has successfully been
    sent to the lower-level driver.

    Therefore the best approach would be to EXPORT blk_add_timer(). It
    should be called at the end of scsi_dispatch_cmd(), when the return
    value from the queuecommand method is known to be 0.

    With something like this, Mike's fix to end_that_request_last()
    wouldn't be needed, since blkdev_dequeue_request() wouldn't
    automatically start the timer. It seems silly to start the timer when
    you know you're just going to stop it immediately afterwards.

    Alan Stern


  6. Re: Problems with the block-layer timeouts

    Hello, Alan Stern! :-)

    Alan Stern wrote:
    > Even a "peek and fetch" interface might not be best, at least as far as
    > timer issues are concerned. Ideally, the timer shouldn't be started
    > until the SCSI midlayer knows that the request has successfully been
    > sent to the lower-level driver.
    >
    > Therefore the best approach would be to EXPORT blk_add_timer(). It
    > should be called at the end of scsi_dispatch_cmd(), when the return
    > value from the queuecommand method is known to be 0.
    >
    > With something like this, Mike's fix to end_that_request_last()
    > wouldn't be needed, since blkdev_dequeue_request() wouldn't
    > automatically start the timer. It seems silly to start the timer when
    > you know you're just going to stop it immediately afterwards.


    The block layer currently doesn't know when a request is actually being
    issued. For timeouts, blk_add_timer() can be exported, but I think that
    would only aggravate the already highly fragmented block-layer interface
    (different users use it differently, to the point of scary chaos). As a
    minor example, block tracing considers elv_next_request() the command
    issue point, which isn't quite true for SCSI and many other drivers. For
    that too, we could export the tracing interface, but I don't think that's
    the right direction. More functionality is scheduled to move into the
    block layer, and exporting more and more implementation details to
    block-layer users will not scale.

    I'm trying to convert all drivers to use the same command issue model -
    elv_next_request() -> blkdev_dequeue_request() on actual issue ->
    blk_end_request(). I have a first draft of the conversion patchset but
    it's gonna take me a few more days to review and test what I can as
    several drivers (mostly legacy ones) are a bit tricky.

    For the time being, the SCSI layer is the only block-layer timeout user,
    and completion without dequeueing happens only in error cases in SCSI,
    so the inefficiency there shouldn't matter too much.

    Jens, what do you think?

    Thanks.

    --
    tejun

  7. Re: Problems with the block-layer timeouts

    On Tue, 4 Nov 2008, Tejun Heo wrote:

    > Hello, Alan Stern! :-)
    >
    > Alan Stern wrote:
    > > Even a "peek and fetch" interface might not be best, at least as far as
    > > timer issues are concerned. Ideally, the timer shouldn't be started
    > > until the SCSI midlayer knows that the request has successfully been
    > > sent to the lower-level driver.
    > >
    > > Therefore the best approach would be to EXPORT blk_add_timer(). It
    > > should be called at the end of scsi_dispatch_cmd(), when the return
    > > value from the queuecommand method is known to be 0.
    > >
    > > With something like this, Mike's fix to end_that_request_last()
    > > wouldn't be needed, since blkdev_dequeue_request() wouldn't
    > > automatically start the timer. It seems silly to start the timer when
    > > you know you're just going to stop it immediately afterwards.

    >
    > The block layer currently doesn't know when a request is actually being
    > issued. For timeouts, blk_add_timer() can be exported, but I think that
    > would only aggravate the already highly fragmented block-layer interface
    > (different users use it differently, to the point of scary chaos). As a
    > minor example, block tracing considers elv_next_request() the command
    > issue point, which isn't quite true for SCSI and many other drivers. For
    > that too, we could export the tracing interface, but I don't think that's
    > the right direction. More functionality is scheduled to move into the
    > block layer, and exporting more and more implementation details to
    > block-layer users will not scale.
    >
    > I'm trying to convert all drivers to use the same command issue model -
    > elv_next_request() -> blkdev_dequeue_request() on actual issue ->
    > blk_end_request(). I have a first draft of the conversion patchset but
    > it's gonna take me a few more days to review and test what I can as
    > several drivers (mostly legacy ones) are a bit tricky.
    >
    > For the time being, the SCSI layer is the only block-layer timeout user,
    > and completion without dequeueing happens only in error cases in SCSI,
    > so the inefficiency there shouldn't matter too much.


    In fact, I have changed my mind. Starting the timer after the command
    has been sent to the low-level driver would mean that the command might
    finish before the timer was started!

    So never mind. I did confirm at least that your patch together with
    Mike's fixed the problem I encountered last week.

    I have a couple of small fixes for the block timer routines. They'll
    get posted separately later on.

    Alan Stern


  8. Re: Problems with the block-layer timeouts

    On Mon, Nov 03 2008, James Smart wrote:
    > Jens Axboe wrote:
    > >>While I'm on the subject, there are a few related items that could be
    > >>improved. In my tests, I was generating I/O requests simply by doing
    > >>
    > >> dd if=/dev/sda ...
    > >>
    > >>I don't know where the timeouts for these requests are determined, but
    > >>they were set to 60 seconds. That seems much too long.

    > >
    > >Fully agreed, as Mike mentioned this actually looks like a dumb udev
    > >rule that didn't have any effect until this generic timeout work. For
    > >normal IO, something in the 10 second range is a lot more appropriate.

    >
    > Yes and no. For direct-attach storage with no other initiators, ok.
    > But for larger arrays, potentially with multiple initiators - no. I
    > can name several arrays that depend on a 30 second timeout, and a few
    > that, under load, require 60 seconds. I assume there are usually
    > "best practices" guides for the integrators to ensure the defaults are
    > set right.


    Sure I agree, it depends on what kind of storage you have. What I mean
    is that for a normal disk you want something like 10 seconds.

    --
    Jens Axboe


  9. Re: Problems with the block-layer timeouts

    On Tue, Nov 04 2008, Tejun Heo wrote:
    > I'm trying to convert all drivers to use the same command issue model -
    > elv_next_request() -> blkdev_dequeue_request() on actual issue ->
    > blk_end_request(). I have a first draft of the conversion patchset but
    > it's gonna take me a few more days to review and test what I can as
    > several drivers (mostly legacy ones) are a bit tricky.


    Don't do that, please. What we need to do is ensure that the model fits
    with whatever the driver wants to do there; for some drivers it's
    actually a lot easier to rely on elv_next_request() returning the same
    request again instead of keeping track of it yourself. So any patch
    that enforces the model that you must dequeue before starting the
    request will get a big nak from me.

    What we need to do is add a 'peek' mode only, which doesn't start the
    request, along with something the driver can use to mark it started;
    or it can just use elv_next_request(), of course.

    > For the time being, the SCSI layer is the only block-layer timeout user,
    > and completion without dequeueing happens only in error cases in SCSI,
    > so the inefficiency there shouldn't matter too much.


    Agree, for now we are ok since it's just SCSI.

    --
    Jens Axboe


  10. Re: Problems with the block-layer timeouts

    Jens Axboe wrote:
    > On Tue, Nov 04 2008, Tejun Heo wrote:
    >> I'm trying to convert all drivers to use the same command issue model -
    >> elv_next_request() -> blkdev_dequeue_request() on actual issue ->
    >> blk_end_request(). I have a first draft of the conversion patchset but
    >> it's gonna take me a few more days to review and test what I can as
    >> several drivers (mostly legacy ones) are a bit tricky.

    >
    > Don't do that, please. What we need to do is ensure that the model
    > fits with whatever the driver wants to do there; for some
    > drivers it's actually a lot easier to rely on elv_next_request()
    > returning the same request again instead of keeping track of it
    > yourself. So any patch that enforces the model that you must
    > dequeue before starting the request will get a big nak from me.


    I audited every user of elv_next_request(), and converting them to
    follow a single model isn't that difficult. Most drivers which peek at
    the top of the queue already have a current-rq pointer, and they are
    not difficult to convert, as it's guaranteed that they process one
    request at any given time and nothing else until the current request
    is finished. Usually, the only thing required is to check whether
    the previous request is completely done before fetching the next one.

    A model which fits different drivers' usage patterns is a good goal,
    but as with everything else it needs to be balanced against other
    goals, and while enforcing a single fetch/completion model on drivers
    doesn't cost its users much, the lack of one does hurt the block-layer
    abstraction. I think a preemptive NACK is a bit too hasty. With
    sensible helper routines, I think we can satisfy both the block layer
    and its legacy users.

    > What we need to do is add a 'peek' mode only, which doesn't start the
    > request. Along with something to mark it started that the driver can
    > use, or it can just use elv_next_request() of course.


    I don't agree that adding more interfaces is a good idea when the
    problem is a lack of uniformity. There is no inherent convenience or
    advantage to having two different operation modes, but there are quite
    a number of rough edges added by code that assumes only one model.
    BLK_TA_ISSUE was one minor example. Another would be the prep
    function and the DONTPREP flag, and now the timer. Elevator accounting
    becomes different for different drivers. Padding adjustment should be
    done right before the low-level driver issues the command and undone
    on requeue, but it can't be done that way now, and so on.

    Thanks.

    --
    tejun

  11. Re: Problems with the block-layer timeouts

    On Mon, 3 Nov 2008 09:52:48 +0100
    Jens Axboe wrote:

    > > In blk_del_timer(), there's no reason to test q->rq_timed_out_fn. If
    > > the method pointer is NULL then req->deadline would be 0 anyway. In
    > > addition, req->deadline should be set to 0 at the end of the routine,
    > > just in case req gets requeued.
    > >
    > > In blk_add_timer(), the line
    > >
    > > expiry = round_jiffies(req->deadline);
    > >
    > > is not optimal. round_jiffies() will sometimes round a value _down_ to
    > > the nearest second. But blk_rq_timed_out_timer() tests whether
    > > req->deadline is in the past -- and if the deadline was rounded down
    > > then this won't be true the first time through. You wind up getting an
    > > unnecessary timer interrupt. Instead there should be a
    > > round_jiffies_up() utility routine, and it should be used in both
    > > blk_add_timer() and blk_rq_timed_out_timer().

    >
    > Very good point, we do indeed want a round_jiffies_up() for this!


    Just out of curiosity, why do we need to use round_jiffies here? We
    didn't do that for SCSI, right?

  12. Re: Problems with the block-layer timeouts

    On Thu, Nov 06 2008, FUJITA Tomonori wrote:
    > On Mon, 3 Nov 2008 09:52:48 +0100
    > Jens Axboe wrote:
    >
    > > > In blk_del_timer(), there's no reason to test q->rq_timed_out_fn. If
    > > > the method pointer is NULL then req->deadline would be 0 anyway. In
    > > > addition, req->deadline should be set to 0 at the end of the routine,
    > > > just in case req gets requeued.
    > > >
    > > > In blk_add_timer(), the line
    > > >
    > > > expiry = round_jiffies(req->deadline);
    > > >
    > > > is not optimal. round_jiffies() will sometimes round a value _down_ to
    > > > the nearest second. But blk_rq_timed_out_timer() tests whether
    > > > req->deadline is in the past -- and if the deadline was rounded down
    > > > then this won't be true the first time through. You wind up getting an
    > > > unnecessary timer interrupt. Instead there should be a
    > > > round_jiffies_up() utility routine, and it should be used in both
    > > > blk_add_timer() and blk_rq_timed_out_timer().

    > >
    > > Very good point, we do indeed want a round_jiffies_up() for this!

    >
    > Just out of curiosity, why do we need to use round_jiffies here? We
    > didn't do that for SCSI, right?


    We don't have to, but given that we don't care about exact timeouts, we
    may as well. It's not a new thing, we've done that since pretty much the
    beginning of the generic timeout development.

    --
    Jens Axboe


  13. Re: Problems with the block-layer timeouts

    On Thu, 6 Nov 2008 08:23:54 +0100
    Jens Axboe wrote:

    > On Thu, Nov 06 2008, FUJITA Tomonori wrote:
    > > On Mon, 3 Nov 2008 09:52:48 +0100
    > > Jens Axboe wrote:
    > >
    > > > > In blk_del_timer(), there's no reason to test q->rq_timed_out_fn. If
    > > > > the method pointer is NULL then req->deadline would be 0 anyway. In
    > > > > addition, req->deadline should be set to 0 at the end of the routine,
    > > > > just in case req gets requeued.
    > > > >
    > > > > In blk_add_timer(), the line
    > > > >
    > > > > expiry = round_jiffies(req->deadline);
    > > > >
    > > > > is not optimal. round_jiffies() will sometimes round a value _down_ to
    > > > > the nearest second. But blk_rq_timed_out_timer() tests whether
    > > > > req->deadline is in the past -- and if the deadline was rounded down
    > > > > then this won't be true the first time through. You wind up getting an
    > > > > unnecessary timer interrupt. Instead there should be a
    > > > > round_jiffies_up() utility routine, and it should be used in both
    > > > > blk_add_timer() and blk_rq_timed_out_timer().
    > > >
    > > > Very good point, we do indeed want a round_jiffies_up() for this!

    > >
    > > Just out of curiosity, why do we need to use round_jiffies here? We
    > > didn't do that for SCSI, right?

    >
    > We don't have to, but given that we don't care about exact timeouts, we
    > may as well. It's not a new thing, we've done that since pretty much the
    > beginning of the generic timeout development.


    I'm not sure that the users of the timeout feature can control exact
    timeouts, because the block layer doesn't let them touch the timer
    directly. elv_dequeue_request() is not the exact time at which the
    users want to start the timer; instead, the block layer hides the
    details behind the elevator (which, as I said before, I think is the
    right thing). So the round_jiffies() calls in the block layer don't
    make sense to me. I'd prefer removing them to adding a bunch of
    round_jiffies_up_* variants (I bet some of them will never be used).

  14. Re: Problems with the block-layer timeouts

    On Fri, Nov 07 2008, FUJITA Tomonori wrote:
    > On Thu, 6 Nov 2008 08:23:54 +0100
    > Jens Axboe wrote:
    >
    > > On Thu, Nov 06 2008, FUJITA Tomonori wrote:
    > > > On Mon, 3 Nov 2008 09:52:48 +0100
    > > > Jens Axboe wrote:
    > > >
    > > > > > In blk_del_timer(), there's no reason to test q->rq_timed_out_fn. If
    > > > > > the method pointer is NULL then req->deadline would be 0 anyway. In
    > > > > > addition, req->deadline should be set to 0 at the end of the routine,
    > > > > > just in case req gets requeued.
    > > > > >
    > > > > > In blk_add_timer(), the line
    > > > > >
    > > > > > expiry = round_jiffies(req->deadline);
    > > > > >
    > > > > > is not optimal. round_jiffies() will sometimes round a value _down_ to
    > > > > > the nearest second. But blk_rq_timed_out_timer() tests whether
    > > > > > req->deadline is in the past -- and if the deadline was rounded down
    > > > > > then this won't be true the first time through. You wind up getting an
    > > > > > unnecessary timer interrupt. Instead there should be a
    > > > > > round_jiffies_up() utility routine, and it should be used in both
    > > > > > blk_add_timer() and blk_rq_timed_out_timer().
    > > > >
    > > > > Very good point, we do indeed want a round_jiffies_up() for this!
    > > >
    > > > Just out of curiosity, why do we need to use round_jiffies here? We
    > > > didn't do that for SCSI, right?

    > >
    > > We don't have to, but given that we don't care about exact timeouts, we
    > > may as well. It's not a new thing, we've done that since pretty much the
    > > beginning of the generic timeout development.

    >
    > I'm not sure that the users of the timeout feature can control exact
    > timeouts, because the block layer doesn't let them touch the timer
    > directly. elv_dequeue_request() is not the exact time at which the
    > users want to start the timer; instead, the block layer hides the
    > details behind the elevator (which, as I said before, I think is the
    > right thing). So the round_jiffies() calls in the block layer don't
    > make sense to me. I'd prefer removing them to adding a bunch of
    > round_jiffies_up_* variants (I bet some of them will never be used).


    I don't understand your concern, to be honest. We only need to round up
    once, and that is when we add/mod the timer. And we do that simply to
    play nice and group the timeout with other timers, to save a bit of
    power.

    --
    Jens Axboe


  15. Re: Problems with the block-layer timeouts

    On Fri, 7 Nov 2008 12:24:50 +0100
    Jens Axboe wrote:

    > On Fri, Nov 07 2008, FUJITA Tomonori wrote:
    > > On Thu, 6 Nov 2008 08:23:54 +0100
    > > Jens Axboe wrote:
    > >
    > > > On Thu, Nov 06 2008, FUJITA Tomonori wrote:
    > > > > On Mon, 3 Nov 2008 09:52:48 +0100
    > > > > Jens Axboe wrote:
    > > > >
    > > > > > > In blk_del_timer(), there's no reason to test q->rq_timed_out_fn. If
    > > > > > > the method pointer is NULL then req->deadline would be 0 anyway. In
    > > > > > > addition, req->deadline should be set to 0 at the end of the routine,
    > > > > > > just in case req gets requeued.
    > > > > > >
    > > > > > > In blk_add_timer(), the line
    > > > > > >
    > > > > > > expiry = round_jiffies(req->deadline);
    > > > > > >
    > > > > > > is not optimal. round_jiffies() will sometimes round a value _down_ to
    > > > > > > the nearest second. But blk_rq_timed_out_timer() tests whether
    > > > > > > req->deadline is in the past -- and if the deadline was rounded down
    > > > > > > then this won't be true the first time through. You wind up getting an
    > > > > > > unnecessary timer interrupt. Instead there should be a
    > > > > > > round_jiffies_up() utility routine, and it should be used in both
    > > > > > > blk_add_timer() and blk_rq_timed_out_timer().
    > > > > >
    > > > > > Very good point, we do indeed want a round_jiffies_up() for this!
    > > > >
    > > > > Just out of curiosity, why do we need to use round_jiffies here? We
    > > > > didn't do that for SCSI, right?
    > > >
    > > > We don't have to, but given that we don't care about exact timeouts, we
    > > > may as well. It's not a new thing, we've done that since pretty much the
    > > > beginning of the generic timeout development.

    > >
    > > I'm not sure that the users of the timeout feature can control exact
    > > timeouts, because the block layer doesn't let them touch the timer
    > > directly. elv_dequeue_request() is not the exact time at which the
    > > users want to start the timer; instead, the block layer hides the
    > > details behind the elevator (which, as I said before, I think is the
    > > right thing). So the round_jiffies() calls in the block layer don't
    > > make sense to me. I'd prefer removing them to adding a bunch of
    > > round_jiffies_up_* variants (I bet some of them will never be used).

    >
    > I don't understand your concern, to be honest. We only need to round up
    > once, and that is when we add/mod the timer. And we do that simply to
    > play nice and group the timeout with other timers, to save a bit of
    > power.


    I'm not worried about anything. I just think that these
    round_jiffies_up() calls are pointless: they were added for
    block-layer users that care about exact timeouts, yet the block layer
    doesn't export blk_add_timer(), so those users can't control the exact
    time at which the timer starts. Doing the round_jiffies_up()
    calculation for every request therefore doesn't make sense to me.

    Anyway, it's trivial. If you like these round_jiffies_up() calls, it's
    fine by me.

  16. Re: Problems with the block-layer timeouts

    On Tue, 11 Nov 2008, FUJITA Tomonori wrote:

    > I'm not worried about anything. I just think that these
    > round_jiffies_up() calls are pointless: they were added for
    > block-layer users that care about exact timeouts, yet the block layer
    > doesn't export blk_add_timer(), so those users can't control the exact
    > time at which the timer starts. Doing the round_jiffies_up()
    > calculation for every request therefore doesn't make sense to me.


    In fact, the round_jiffies_up() routines were added for other users as
    well as for the block layer. However, none of the others could be
    changed until the routines were merged. Now that the routines are in
    the mainline, you should see them start to be called in multiple
    places.

    Also, the users of the block layer _don't_ care about exact timeouts.
    That's an important aspect of round_jiffies() and round_jiffies_up() --
    you don't use them if you want an exact timeout.

    The reason for using round_jiffies() is to ensure that the timeout
    will occur at a 1-second boundary. If several timeouts are set for
    about the same time and they all use round_jiffies() or
    round_jiffies_up(), then they will all occur at the same tick instead
    of being spread out among several different ticks during the course of
    that 1-second interval. As a result, the system needs to wake up only
    once to service all those timeouts, instead of waking up several
    different times. It is a power-saving scheme.

    Alan Stern


  17. Re: Problems with the block-layer timeouts

    On Tue, Nov 11 2008, Alan Stern wrote:
    > On Tue, 11 Nov 2008, FUJITA Tomonori wrote:
    >
    > > I'm not worried about anything. I just think that these
    > > round_jiffies_up() calls are pointless: they were added for
    > > block-layer users that care about exact timeouts, yet the block layer
    > > doesn't export blk_add_timer(), so those users can't control the exact
    > > time at which the timer starts. Doing the round_jiffies_up()
    > > calculation for every request therefore doesn't make sense to me.

    >
    > In fact the round_jiffies_up() routines were added for other users as
    > well as the block layer. However none of the others could be changed
    > until the routines were merged. Now that the routines are in the
    > mainline, you should see them start to be called in multiple places.
    >
    > Also, the users of the block layer _don't_ care about exact timeouts.
    > That's an important aspect of round_jiffies() and round_jiffies_up() --
    > you don't use them if you want an exact timeout.
    >
    > The reason for using round_jiffies() is to ensure that the timeout
    > will occur at a 1-second boundary. If several timeouts are set for
    > about the same time and they all use round_jiffies() or
    > round_jiffies_up(), then they will all occur at the same tick instead
    > of being spread out among several different ticks during the course of
    > that 1-second interval. As a result, the system needs to wake up only
    > once to service all those timeouts, instead of waking up several
    > different times. It is a power-saving scheme.


    I can't add anything else, and can't say it any better either. The
    main point of using round_jiffies_up() is to align with other timers.
    I don't understand why you (Tomo) think that timeouts are exact; they
    really are not, and firing within the same second is quite adequate
    here.

    --
    Jens Axboe


  18. Re: Problems with the block-layer timeouts

    On Tue, 11 Nov 2008 20:19:36 +0100
    Jens Axboe wrote:

    > On Tue, Nov 11 2008, Alan Stern wrote:
    > > On Tue, 11 Nov 2008, FUJITA Tomonori wrote:
    > >
    > > > I'm not worried about anything. I just think that these
    > > > round_jiffies_up() calls are pointless: they were added for
    > > > block-layer users that care about exact timeouts, yet the block layer
    > > > doesn't export blk_add_timer(), so those users can't control the exact
    > > > time at which the timer starts. Doing the round_jiffies_up()
    > > > calculation for every request therefore doesn't make sense to me.

    > >
    > > In fact the round_jiffies_up() routines were added for other users as
    > > well as the block layer. However none of the others could be changed
    > > until the routines were merged. Now that the routines are in the
    > > mainline, you should see them start to be called in multiple places.
    > >
    > > Also, the users of the block layer _don't_ care about exact timeouts.
    > > That's an important aspect of round_jiffies() and round_jiffies_up() --
    > > you don't use them if you want an exact timeout.
    > >
    > > The reason for using round_jiffies() is to ensure that the timeout
    > > will occur at a 1-second boundary. If several timeouts are set for
    > > about the same time and they all use round_jiffies() or
    > > round_jiffies_up(), then they will all occur at the same tick instead
    > > of being spread out among several different ticks during the course of
    > > that 1-second interval. As a result, the system needs to wake up only
    > > once to service all those timeouts, instead of waking up several
    > > different times. It is a power-saving scheme.


    Hmm, but in 99.9% of cases the block-layer timeout doesn't expire;
    timeouts rarely happen. The power-saving scheme applies only to the
    remaining 0.1%, yet we pay the round_jiffies() overhead for every
    request.

    If I understand correctly, round_jiffies() is designed for timers that
    will actually expire, such as periodic checks. The power-saving scheme
    works nicely for such usage.


    > I can't add anything else, and can't say it any better either. The
    > main point of using round_jiffies_up() is to align with other timers.
    > I don't understand why you (Tomo) think that timeouts are exact; they
    > really are not, and firing within the same second is quite adequate
    > here.


    My "exact" argument was about switching from round_jiffies() to
    round_jiffies_up(). But as I wrote above, the round_jiffies() calls
    didn't make sense to me in the first place.

+ Reply to Thread