[patch 0/4] firewire: keeping accesses to fw_transaction in check - Kernel

This is a discussion on [patch 0/4] firewire: keeping accesses to fw_transaction in check - Kernel ; Jay Fenlason pointed me to a race condition in firewire-core; here are some patches for it: [1/4] firewire: fix race of bus reset with request transmission [2/4] firewire: fully initialize fw_transaction before marking it pending [3/4] firewire: small fw_fill_request cleanup ...

+ Reply to Thread
Results 1 to 6 of 6

Thread: [patch 0/4] firewire: keeping accesses to fw_transaction in check

  1. [patch 0/4] firewire: keeping accesses to fw_transaction in check

    Jay Fenlason pointed me to a race condition in firewire-core; here are
    some patches for it:

    [1/4] firewire: fix race of bus reset with request transmission
    [2/4] firewire: fully initialize fw_transaction before marking it pending
    [3/4] firewire: small fw_fill_request cleanup
    [4/4] firewire: warn on unfinished transactions during card removal

    drivers/firewire/fw-card.c | 2 +-
    drivers/firewire/fw-topology.c | 2 --
    drivers/firewire/fw-transaction.c | 21 +++++++++------------
    3 files changed, 10 insertions(+), 15 deletions(-)

    Both the diffstat's +/- balance and initial testing look fine so far.
    --
    Stefan Richter
    -=====-==--- -=== -==--
    http://arcgraph.de/sr/

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. [patch 1/4] firewire: fix race of bus reset with request transmission

    Reported by Jay Fenlason: A bus reset tasklet may call
    fw_flush_transactions and touch transactions (call their callback which
    will free them) while the context which submitted the transaction is
    still inserting it into the transmission queue.

    A simple solution to this problem is to _not_ "flush" the transactions
    because of a bus reset (complete the transcations as 'cancelled'). They
    will now simply time out (completed as 'cancelled' by the split-timeout
    timer).

    Jay Fenlason thought of this fix too but I was quicker to type it out.
    :-)

    Background:
    Contexts which access an instance of struct fw_transaction are:
    1. the submitter, until it inserted the packet which is embedded in the
    transaction into the AT req DMA,
    2. the AsReqTrContext tasklet when the request packet was acked by the
    responder node or transmission to the responder failed,
    3. the AsRspRcvContext tasklet when it found a request which matched
    an incoming response,
    4. the card->flush_timer when it picks up timed-out transactions to
    cancel them,
    5. the bus reset tasklet when it cancels transactions (this access is
    eliminated by this patch),
    6. a process which shuts down an fw_card (unregisters it from fw-core
    when the controller is unbound from fw-ohci) --- although in this
    case there shouldn't really be any transactions anymore because we
    wait until all card users finished their business with the card.

    All of these contexts run concurrently (except for the 6th, presumably).
    The 1st is safe against the 2nd and 3rd because of the way how a request
    packet is carefully submitted to the hardware. A race between 2nd and
    3rd has been fixed a while ago (bug 9617). The 4th is almost safe
    against 1st, 2nd, 3rd; there are issues with it if huge scheduling
    latencies occur, to be fixed separately. The 5th looks safe against
    2nd, 3rd, and 4th but is unsafe against 1st. Maybe this could be fixed
    with an explicit state variable in struct fw_transaction. But this
    would require fw_transaction to be rewritten as only dynamically
    allocatable object with reference counting --- not a good solution if we
    also can simply kill this 5th accessing context (replace it by the 4th).

    Signed-off-by: Stefan Richter
    ---
    drivers/firewire/fw-topology.c | 2 --
    1 file changed, 2 deletions(-)

    Index: linux/drivers/firewire/fw-topology.c
    ================================================== =================
    --- linux.orig/drivers/firewire/fw-topology.c
    +++ linux/drivers/firewire/fw-topology.c
    @@ -510,8 +510,6 @@ fw_core_handle_bus_reset(struct fw_card
    struct fw_node *local_node;
    unsigned long flags;

    - fw_flush_transactions(card);
    -
    /*
    * If the selfID buffer is not the immediate successor of the
    * previously processed one, we cannot reliably compare the

    --
    Stefan Richter
    -=====-==--- -=== -==--
    http://arcgraph.de/sr/

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. [patch 3/4] firewire: small fw_fill_request cleanup

    - better name for a function argument
    - removal of a local variable which became unnecessary after
    "fully initialize fw_transaction before marking it pending"

    Signed-off-by: Stefan Richter
    ---
    drivers/firewire/fw-transaction.c | 12 +++++-------
    1 file changed, 5 insertions(+), 7 deletions(-)

    Index: linux/drivers/firewire/fw-transaction.c
    ================================================== =================
    --- linux.orig/drivers/firewire/fw-transaction.c
    +++ linux/drivers/firewire/fw-transaction.c
    @@ -151,7 +151,7 @@ transmit_complete_callback(struct fw_pac

    static void
    fw_fill_request(struct fw_packet *packet, int tcode, int tlabel,
    - int node_id, int source_id, int generation, int speed,
    + int destination_id, int source_id, int generation, int speed,
    unsigned long long offset, void *payload, size_t length)
    {
    int ext_tcode;
    @@ -166,7 +166,7 @@ fw_fill_request(struct fw_packet *packet
    HEADER_RETRY(RETRY_X) |
    HEADER_TLABEL(tlabel) |
    HEADER_TCODE(tcode) |
    - HEADER_DESTINATION(node_id);
    + HEADER_DESTINATION(destination_id);
    packet->header[1] =
    HEADER_OFFSET_HIGH(offset >> 32) | HEADER_SOURCE(source_id);
    packet->header[2] =
    @@ -252,7 +252,7 @@ fw_send_request(struct fw_card *card, st
    fw_transaction_callback_t callback, void *callback_data)
    {
    unsigned long flags;
    - int tlabel, source;
    + int tlabel;

    /*
    * Bump the flush timer up 100ms first of all so we
    @@ -268,7 +268,6 @@ fw_send_request(struct fw_card *card, st

    spin_lock_irqsave(&card->lock, flags);

    - source = card->node_id;
    tlabel = card->current_tlabel;
    if (card->tlabel_mask & (1 << tlabel)) {
    spin_unlock_irqrestore(&card->lock, flags);
    @@ -284,9 +283,8 @@ fw_send_request(struct fw_card *card, st
    t->callback = callback;
    t->callback_data = callback_data;

    - fw_fill_request(&t->packet, tcode, t->tlabel,
    - node_id, source, generation,
    - speed, offset, payload, length);
    + fw_fill_request(&t->packet, tcode, t->tlabel, node_id, card->node_id,
    + generation, speed, offset, payload, length);
    t->packet.callback = transmit_complete_callback;

    list_add_tail(&t->link, &card->transaction_list);

    --
    Stefan Richter
    -=====-==--- -=== -==--
    http://arcgraph.de/sr/

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. [patch 4/4] firewire: warn on unfinished transactions during card removal

    After card->done and card->work are completed, any remaining pending
    request would be a bug. We cannot safely complete a transaction at
    that point anymore.

    IOW card users must not drop their last fw_card reference (usually
    indirect references through fw_device references) before their last
    outbound transaction through that card was finished.

    Signed-off-by: Stefan Richter
    ---
    drivers/firewire/fw-card.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

    Index: linux/drivers/firewire/fw-card.c
    ================================================== =================
    --- linux.orig/drivers/firewire/fw-card.c
    +++ linux/drivers/firewire/fw-card.c
    @@ -539,7 +539,7 @@ fw_core_remove_card(struct fw_card *card
    wait_for_completion(&card->done);

    cancel_delayed_work_sync(&card->work);
    - fw_flush_transactions(card);
    + WARN_ON(!list_empty(&card->transaction_list));
    del_timer_sync(&card->flush_timer);
    }
    EXPORT_SYMBOL(fw_core_remove_card);

    --
    Stefan Richter
    -=====-==--- -=== -==--
    http://arcgraph.de/sr/

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. [patch 2/4] firewire: fully initialize fw_transaction before marking it pending

    In theory, card->flush_timer could already access a transaction between
    fw_send_request()'s spin_unlock_irqrestore and the rest of what happens
    in fw_send_request(). This would happen if the process which sends the
    request is preempted and put to sleep right after spin_unlock_irqrestore
    for longer than 100ms.

    Therefore we fill in everything in struct fw_transaction at which the
    flush_timer might look at before we lift the lock.

    To do: Ensure that the timer does not pick up the transaction before
    the time of the AT request event plus split transaction timeout.

    Signed-off-by: Stefan Richter
    ---
    drivers/firewire/fw-transaction.c | 9 ++++-----
    1 file changed, 4 insertions(+), 5 deletions(-)

    Index: linux/drivers/firewire/fw-transaction.c
    ================================================== =================
    --- linux.orig/drivers/firewire/fw-transaction.c
    +++ linux/drivers/firewire/fw-transaction.c
    @@ -279,11 +279,6 @@ fw_send_request(struct fw_card *card, st
    card->current_tlabel = (card->current_tlabel + 1) & 0x1f;
    card->tlabel_mask |= (1 << tlabel);

    - list_add_tail(&t->link, &card->transaction_list);
    -
    - spin_unlock_irqrestore(&card->lock, flags);
    -
    - /* Initialize rest of transaction, fill out packet and send it. */
    t->node_id = node_id;
    t->tlabel = tlabel;
    t->callback = callback;
    @@ -294,6 +289,10 @@ fw_send_request(struct fw_card *card, st
    speed, offset, payload, length);
    t->packet.callback = transmit_complete_callback;

    + list_add_tail(&t->link, &card->transaction_list);
    +
    + spin_unlock_irqrestore(&card->lock, flags);
    +
    card->driver->send_request(card, &t->packet);
    }
    EXPORT_SYMBOL(fw_send_request);

    --
    Stefan Richter
    -=====-==--- -=== -==--
    http://arcgraph.de/sr/

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [patch 1/4] firewire: fix race of bus reset with request transmission

    I wrote:
    > Contexts which access an instance of struct fw_transaction are:
    > 1. the submitter, until it inserted the packet which is embedded in the
    > transaction into the AT req DMA,
    > 2. the AsReqTrContext tasklet when the request packet was acked by the
    > responder node or transmission to the responder failed,
    > 3. the AsRspRcvContext tasklet when it found a request which matched
    > an incoming response,
    > 4. the card->flush_timer when it picks up timed-out transactions to
    > cancel them,
    > 5. the bus reset tasklet when it cancels transactions (this access is
    > eliminated by this patch),
    > 6. a process which shuts down an fw_card (unregisters it from fw-core
    > when the controller is unbound from fw-ohci) --- although in this
    > case there shouldn't really be any transactions anymore because we
    > wait until all card users finished their business with the card.
    >
    > All of these contexts run concurrently (except for the 6th, presumably).


    PS: Actually the three tasklets (2nd, 3rd, 5th context) are all
    scheduled on the same CPU, hence are serialized. Therefore

    > The 5th looks safe against 2nd, 3rd [...] but is unsafe against 1st.


    5th is serialized against 4th and 6th by lock protection of
    card->transaction_list accesses.
    --
    Stefan Richter
    -=====-==--- -=== -==--
    http://arcgraph.de/sr/

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread