From: Xiongfeng Wang
ohos inclusion
category: bugfix
issue: #I3ZXZF
CVE: NA
---------------------------------
When we print system information by echo 't' into 'sysrq-trigger' on
several cores at the same time, we got the following calltrace.
[ 1352.854632] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
[ 1352.854633] Modules linked in: nf_log_arp nf_log_ipv6 nf_log_ipv4 nf_log_common binfmt_misc salsa20_generic camellia_generic cast6_generic cast_common rfkill serpent_generic twofish_generic twofish_common xts lrw tgr192 wp512 rmd320 rmd256 rmd160 rmd128 md4 sha512_generic loop jprob(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vfat fat hns_roce_hw_v2 hns_roce ib_core aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce ipmi_ssif ofpart sha256_arm64 sha1_ce cmdlinepart
[ 1352.854649] hi_sfc ses enclosure mtd sg sbsa_gwdt ipmi_si ipmi_devintf ipmi_msghandler spi_dw_mmio sch_fq_codel ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod realtek hclge hisi_sas_v3_hw hisi_sas_main ahci libsas libahci hns3 hinic libata usb_storage hnae3 megaraid_sas scsi_transport_sas i2c_designware_platform i2c_designware_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_vs]
[ 1352.854658] CPU: 6 PID: 220569 Comm: sh Kdump: loaded Tainted: G OEL 4.19.90-vhulk2001.1.0.0026.aarch64 #1
[ 1352.854659] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.06 10/29/2019
[ 1352.854659] pstate: 80400089 (Nzcv daIf +PAN -UAO)
[ 1352.854660] pc : queued_spin_lock_slowpath+0x1d8/0x2e0
[ 1352.854660] lr : print_cpu+0x414/0x690
[ 1352.854660] sp : ffff0001743afb80
[ 1352.854661] x29: ffff0001743afb80 x28: ffff805fcef6e880
[ 1352.854662] x27: 0000000000000000 x26: 0000000000000000
[ 1352.854662] x25: ffff000008cab000 x24: ffff000008cab000
[ 1352.854663] x23: 0000000000000000 x22: 0000000000000000
[ 1352.854664] x21: ffff000009478000 x20: 0000000000900001
[ 1352.854664] x19: ffff000009478d20 x18: ffffffffffffffff
[ 1352.854665] x17: 0000000000000000 x16: 0000000000000000
[ 1352.854666] x15: ffff000009273708 x14: ffff00000947af60
[ 1352.854667] x13: ffff00000947abab x12: ffff00000929d000
[ 1352.854668] x11: 0000000000006fc8 x10: ffff00000947a1c0
[ 1352.854668] x9 : 0000000000000001 x8 : 0000000000000000
[ 1352.854669] x7 : ffff0000092737c8 x6 : ffff803fffc9e1c0
[ 1352.854670] x5 : 0000000000000000 x4 : ffff803fffc9e1c0
[ 1352.854671] x3 : ffff000008f5e000 x2 : 00000000001c0000
[ 1352.854671] x1 : 0000000000000000 x0 : ffff803fffc9e1c8
[ 1352.854672] Call trace:
[ 1352.854673] queued_spin_lock_slowpath+0x1d8/0x2e0
[ 1352.854673] print_cpu+0x414/0x690
[ 1352.854673] sysrq_sched_debug_show+0x50/0x80
[ 1352.854674] show_state_filter+0xc0/0xd0
[ 1352.854674] sysrq_handle_showstate+0x18/0x28
[ 1352.854674] __handle_sysrq+0xa0/0x190
[ 1352.854675] write_sysrq_trigger+0x70/0x88
[ 1352.854675] proc_reg_write+0x80/0xd8
[ 1352.854675] __vfs_write+0x60/0x190
[ 1352.854676] vfs_write+0xac/0x1c0
[ 1352.854676] ksys_write+0x74/0xf0
[ 1352.854676] __arm64_sys_write+0x24/0x30
[ 1352.854677] el0_svc_common+0x78/0x130
[ 1352.854677] el0_svc_handler+0x38/0x78
[ 1352.854677] el0_svc+0x8/0xc
[ 1352.854678] Kernel panic - not syncing: Hard LOCKUP
[ 1352.854679] CPU: 6 PID: 220569 Comm: sh Kdump: loaded Tainted: G OEL 4.19.90-vhulk2001.1.0.0026.aarch64 #1
[ 1352.854679] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.06 10/29/2019
[ 1352.854679] Call trace:
[ 1352.854680] dump_backtrace+0x0/0x198
[ 1352.854680] show_stack+0x24/0x30
[ 1352.854681] dump_stack+0xa4/0xc4
[ 1352.854681] panic+0x130/0x304
[ 1352.854681] __stack_chk_fail+0x0/0x28
[ 1352.854682] watchdog_hardlockup_check+0x138/0x140
[ 1352.854682] sdei_watchdog_callback+0x20/0x30
[ 1352.854682] sdei_event_handler+0x50/0xf0
[ 1352.854683] __sdei_handler+0xd8/0x228
[ 1352.854683] __sdei_asm_handler+0xbc/0x134
[ 1352.854683] queued_spin_lock_slowpath+0x1d8/0x2e0
[ 1352.854684] print_cpu+0x414/0x690
[ 1352.854684] sysrq_sched_debug_show+0x50/0x80
[ 1352.854684] show_state_filter+0xc0/0xd0
[ 1352.854685] sysrq_handle_showstate+0x18/0x28
[ 1352.854685] __handle_sysrq+0xa0/0x190
[ 1352.854685] write_sysrq_trigger+0x70/0x88
[ 1352.854686] proc_reg_write+0x80/0xd8
[ 1352.854686] __vfs_write+0x60/0x190
[ 1352.854686] vfs_write+0xac/0x1c0
[ 1352.854687] ksys_write+0x74/0xf0
[ 1352.854687] __arm64_sys_write+0x24/0x30
[ 1352.854687] el0_svc_common+0x78/0x130
[ 1352.854688] el0_svc_handler+0x38/0x78
[ 1352.854688] el0_svc+0x8/0xc
It is because there are many processes in the system. 'print_cpu()'
aquires 'sched_debug_lock', print some information, and releases
'sched_debug_lock'. This procedure takes about 4 seconds in our
testcase. When four cores concurrently print system info by sysrq, it
will takes the last core 12 seconds to get the spinlock. This will
cause a hardlockup.
Signed-off-by: Kai Shen
Signed-off-by: Xiongfeng Wang
Reviewed-By: Xie XiuQi
Signed-off-by: Yang Yingliang
Signed-off-by: Xiongfeng Wang
Signed-off-by: Chen Jun
Signed-off-by: Yu Changchun
---
drivers/tty/sysrq.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 959f9e121cc6..8d63e8fc0c25 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -1143,6 +1143,9 @@ int unregister_sysrq_key(int key, const struct sysrq_key_op *op_p)
EXPORT_SYMBOL(unregister_sysrq_key);
#ifdef CONFIG_PROC_FS
+
+static DEFINE_MUTEX(sysrq_mutex);
+
/*
* writing 'C' to /proc/sysrq-trigger is like sysrq-C
*/
@@ -1154,7 +1157,10 @@ static ssize_t write_sysrq_trigger(struct file *file, const char __user *buf,
if (get_user(c, buf))
return -EFAULT;
+
+ mutex_lock(&sysrq_mutex);
__handle_sysrq(c, false);
+ mutex_unlock(&sysrq_mutex);
}
return count;
--
2.22.0