From: Yu Kuai
mainline inclusion
from mainline-next-20211018
commit d14b304f558f8c8f53da3a8d0c0b671f14a9c2f4
category: bugfix
issue: #I4NRS5
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Signed-off-by: Yu Changchun
---------------------------
commit cddce0116058 ("nbd: Aovid double completion of a request")
try to fix that nbd_clear_que() and recv_work() can complete a
request concurrently. However, the problem still exists:
t1 t2 t3
nbd_disconnect_and_put
flush_workqueue
recv_work
blk_mq_complete_request
blk_mq_complete_request_remote -> this is true
WRITE_ONCE(rq->state, MQ_RQ_COMPLETE)
blk_mq_raise_softirq
blk_done_softirq
blk_complete_reqs
nbd_complete_rq
blk_mq_end_request
blk_mq_free_request
WRITE_ONCE(rq->state, MQ_RQ_IDLE)
nbd_clear_que
blk_mq_tagset_busy_iter
nbd_clear_req
__blk_mq_free_request
blk_mq_put_tag
blk_mq_complete_request -> complete again
There are three places where request can be completed in nbd:
recv_work(), nbd_clear_que() and nbd_xmit_timeout(). Since they
all hold cmd->lock before completing the request, it's easy to
avoid the problem by setting and checking a cmd flag.
Signed-off-by: Yu Kuai
Reviewed-by: Ming Lei
Reviewed-by: Josef Bacik
Link: https://lore.kernel.org/r/20210916093350.1410403-3-yukuai3@huawei.com
Signed-off-by: Jens Axboe
Reviewed-by: Jason Yan
Signed-off-by: Chen Jun
Signed-off-by: Zheng Zengkai
Signed-off-by: Yu Changchun
---
drivers/block/nbd.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 1b3718238588..7271541558b6 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -395,7 +395,11 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
if (!mutex_trylock(&cmd->lock))
return BLK_EH_RESET_TIMER;
- __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
+ if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
+ mutex_unlock(&cmd->lock);
+ return BLK_EH_DONE;
+ }
+
if (!refcount_inc_not_zero(&nbd->config_refs)) {
cmd->status = BLK_STS_TIMEOUT;
mutex_unlock(&cmd->lock);
@@ -830,7 +834,10 @@ static bool nbd_clear_req(struct request *req, void *data, bool reserved)
return true;
mutex_lock(&cmd->lock);
- __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags);
+ if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) {
+ mutex_unlock(&cmd->lock);
+ return true;
+ }
cmd->status = BLK_STS_IOERR;
mutex_unlock(&cmd->lock);
--
2.25.1