[PATCH OpenHarmony-5.10 00/94] OpenHarmony-5.10 baseline patch 0806

The following patches are for the OpenHarmony 5.10 kernel. They contain a
number of bugfixes (ext4/sched/mm, etc.) and two major features (ARM KASAN
and KASLR, together with their dependent patches).

-------------------------------------------------------------

Ard Biesheuvel (33):
  ARM: 9028/1: disable KASAN in call stack capturing routines
  ARM: module: add support for place relative relocations
  ARM: p2v: move patching code to separate assembler source file
  ARM: p2v: factor out shared loop processing
  ARM: p2v: factor out BE8 handling
  ARM: p2v: drop redundant 'type' argument from __pv_stub
  ARM: p2v: use relative references in patch site arrays
  ARM: p2v: simplify __fixup_pv_table()
  ARM: p2v: switch to MOVW for Thumb2 and ARM/LPAE
  ARM: p2v: reduce p2v alignment requirement to 2 MiB
  ARM: head-common.S: use PC-relative insn sequence for __proc_info
  ARM: head-common.S: use PC-relative insn sequence for idmap creation
  ARM: head.S: use PC-relative insn sequence for secondary_data
  ARM: kernel: use relative references for UP/SMP alternatives
  ARM: head: use PC-relative insn sequence for __smp_alt
  ARM: sleep.S: use PC-relative insn sequence for sleep_save_sp/mpidr_hash
  ARM: head.S: use PC relative insn sequence to calculate PHYS_OFFSET
  ARM: kvm: replace open coded VA->PA calculations with adr_l call
  ARM: 9031/1: hyp-stub: remove unused .L__boot_cpu_mode_offset symbol
  asm-generic: add .data.rel.ro sections to __ro_after_init
  arm-soc: exynos: replace open coded VA->PA conversions
  arm-soc: mvebu: replace open coded VA->PA conversion
  arm-soc: various: replace open coded VA->PA calculation of pen_release
  ARM: kernel: switch to relative exception tables
  ARM: kernel: make vmlinux buildable as a PIE executable
  ARM: kernel: use PC-relative symbol references in MMU switch code
  ARM: kernel: use PC relative symbol references in suspend/resume code
  ARM: mm: export default vmalloc base address
  ARM: kernel: refer to swapper_pg_dir via its symbol
  arm: vectors: use local symbol names for vector entry points
  ARM: kernel: implement randomization of the kernel load address
  ARM: decompressor: explicitly map decompressor binary cacheable
  ARM: decompressor: add KASLR support

Arnd Bergmann (1):
  ARM: kprobes: fix gcc-7 build warning

Chen Jun (1):
  rockchip: Make cdn_dp_resume depend on CONFIG_PM_SLEEP

Cheng Jian (1):
  sched, rt: fix isolated CPUs leaving task_group indefinitely throttled

Cui GaoSheng (6):
  arm32: kaslr: Add missing sections about relocatable
  arm32: kaslr: Fix the bug of module install failure
  arm32: kaslr: Fix the bug of hidden symbols when decompressing code is compiled
  arm32: kaslr: Adapt dts files of multiple memory nodes
  arm32: kaslr: Fix the bug of symbols relocation
  arm32: kaslr: Print the real kaslr offset when kernel panic

Fangrui Song (1):
  ARM: 9022/1: Change arch/arm/lib/mem*.S to use WEAK instead of .weak

Gaosheng Cui (2):
  arm32: kaslr: Bugfix of fiq when enabled kaslr
  arm32: kaslr: Bugfix of BSS size calculation when enabled kaslr

Gu Zheng (1):
  mtd: avoid blktrans_open/release race and avoid insmod ftl.ko deadlock

Jan Kara (13):
  ext4: remove redundant sb checksum recomputation
  ext4: standardize error message in ext4_protect_reserved_inode()
  ext4: make ext4_abort() use __ext4_error()
  ext4: move functions in super.c
  ext4: simplify ext4 error translation
  ext4: defer saving error info from atomic context
  ext4: combine ext4_handle_error() and save_error_info()
  ext4: drop sync argument of ext4_commit_super()
  ext4: protect superblock modifications with a buffer lock
  ext4: save error info to sb through journal if available
  ext4: use sbi instead of EXT4_SB(sb) in ext4_update_super()
  ext4: drop ext4_handle_dirty_super()
  ext4: fix timer use-after-free on failed mount

Jens Axboe (1):
  io_uring: convert io_buffer_idr to XArray

JeongHyeon Lee (1):
  dm verity: allow only one error handling mode

Kefeng Wang (2):
  mm, page_alloc: avoid page_to_pfn() in move_freepages()
  mm: Move HOLES_IN_ZONE into mm

Li Bin (1):
  sched/deadline.c: pick and check task if double_lock_balance() unlock the rq

Linus Walleij (5):
  ARM: 9013/2: Disable KASan instrumentation for some code
  ARM: 9014/2: Replace string mem* functions for KASan
  ARM: 9015/2: Define the virtual space of KASan's shadow region
  ARM: 9016/2: Initialize the mapping of KASan shadow memory
  ARM: 9017/2: Enable KASan for ARM

Liu Shixin (1):
  mm/page_alloc: fix counting of managed_pages

Lorenzo Stoakes (1):
  mm: page_alloc: refactor setup_per_zone_lowmem_reserve()

Lu Jialin (1):
  cgroup: Return ERSCH when add Z process into task

Matthew Wilcox (Oracle) (1):
  io_uring: Convert personality_idr to XArray

Mauricio Faria de Oliveira (1):
  loop: fix I/O error on fsync() in detached loop devices

Oscar Salvador (1):
  mm,hwpoison: return -EBUSY when migration fails

Xie XiuQi (1):
  Document: add guideline to submitting patches to OpenHarmony kernel

Xiongfeng Wang (3):
  PCI: add a member in 'struct pci_bus' to record the original 'pci_ops'
  PCI/AER: increments pci bus reference count in aer-inject process
  sysrq: avoid concurrently info printing by 'sysrq-trigger'

Yang Jihong (1):
  arm_pmu: Fix write counter error in ARMv7 big-endian mode

Yang Yingliang (1):
  drm/radeon: check the alloc_workqueue return value

Ye Bin (6):
  arm32: kaslr: Add missing sections about relocatable
  arm: kaslr: Fix memtop calculate, when there is no memory top info, we can't use zero instead it.
  arm32: kaslr: When boot with vxboot, we must adjust dtb address before kaslr_early_init, and store dtb address after init.
  arm32: kaslr: pop visibility when compile decompress boot code as we need relocate BSS by GOT.
  arm32: kaslr: print kaslr offset when kernel panic
  arm32: kaslr: Fix clock_gettime and gettimeofday performance degradation when configure CONFIG_RANDOMIZE_BASE

Yejune Deng (1):
  io_uring: simplify io_remove_personalities()

Zefan Li (2):
  cgroup: check if cgroup root is alive in cgroupstats_show()
  cgroup: wait for cgroup destruction to complete when umount

Zhou Chengming (1):
  sched/rt.c: pick and check task if double_lock_balance() unlock the rq

liujian (1):
  driver: input: fix UBSAN warning in input_defuzz_abs_event

yangerkun (1):
  proc: fix ubsan warning in mem_lseek

zhangyi (F) (1):
  aio: add timeout validity check for io_[p]getevents

 Documentation/arm/memory.rst | 5 +
 Documentation/dev-tools/kasan.rst | 4 +-
 .../features/debug/KASAN/arch-support.txt | 2 +-
 README | 252 +++++++++
 arch/arm/Kconfig | 29 +-
 arch/arm/Makefile | 6 +
 arch/arm/boot/compressed/Makefile | 19 +-
 arch/arm/boot/compressed/head.S | 154 +++++-
 arch/arm/boot/compressed/kaslr.c | 453 ++++++++++++++++
 arch/arm/boot/compressed/misc.c | 1 -
 arch/arm/boot/compressed/string.c | 19 +
 arch/arm/include/asm/Kbuild | 1 -
 arch/arm/include/asm/assembler.h | 22 +-
 arch/arm/include/asm/elf.h | 5 +
 arch/arm/include/asm/extable.h | 47 ++
 arch/arm/include/asm/futex.h | 6 +-
 arch/arm/include/asm/kasan.h | 33 ++
 arch/arm/include/asm/kasan_def.h | 81 +++
 arch/arm/include/asm/memory.h | 76 ++-
 arch/arm/include/asm/pgalloc.h | 8 +-
 arch/arm/include/asm/pgtable.h | 1 +
 arch/arm/include/asm/processor.h | 2 +-
 arch/arm/include/asm/string.h | 26 +
 arch/arm/include/asm/thread_info.h | 8 +
 arch/arm/include/asm/uaccess-asm.h | 2 +-
 arch/arm/include/asm/uaccess.h | 17 +-
 arch/arm/include/asm/vmlinux.lds.h | 13 +-
 arch/arm/include/asm/word-at-a-time.h | 6 +-
 arch/arm/kernel/Makefile | 4 +
 arch/arm/kernel/entry-armv.S | 44 +-
 arch/arm/kernel/entry-common.S | 9 +-
 arch/arm/kernel/fiq.c | 7 +
 arch/arm/kernel/head-common.S | 88 +---
 arch/arm/kernel/head.S | 308 ++++-------
 arch/arm/kernel/hyp-stub.S | 33 +-
 arch/arm/kernel/module.c | 20 +-
 arch/arm/kernel/perf_event_v7.c | 4 +-
 arch/arm/kernel/phys2virt.S | 238 +++++++++
 arch/arm/kernel/setup.c | 32 ++
 arch/arm/kernel/sleep.S | 26 +-
 arch/arm/kernel/swp_emulate.c | 8 +-
 arch/arm/kernel/unwind.c | 6 +-
 arch/arm/kernel/vmlinux.lds.S | 19 +
 arch/arm/lib/backtrace.S | 13 +-
 arch/arm/lib/getuser.S | 24 +-
 arch/arm/lib/memcpy.S | 4 +-
 arch/arm/lib/memmove.S | 6 +-
 arch/arm/lib/memset.S | 4 +-
 arch/arm/lib/putuser.S | 15 +-
 arch/arm/mach-exynos/headsmp.S | 9 +-
 arch/arm/mach-exynos/sleep.S | 26 +-
 arch/arm/mach-mvebu/coherency_ll.S | 8 +-
 arch/arm/mach-prima2/headsmp.S | 11 +-
 arch/arm/mach-spear/headsmp.S | 11 +-
 arch/arm/mm/Makefile | 5 +
 arch/arm/mm/alignment.c | 24 +-
 arch/arm/mm/extable.c | 2 +-
 arch/arm/mm/kasan_init.c | 291 +++++++++++
 arch/arm/mm/mmu.c | 21 +-
 arch/arm/mm/pgd.c | 16 +-
 arch/arm/nwfpe/entry.S | 6 +-
 arch/arm/plat-versatile/headsmp.S | 9 +-
 arch/arm/probes/kprobes/Makefile | 4 +
 arch/arm/vdso/Makefile | 2 +
 arch/arm/vdso/vgettimeofday.c | 5 +
 arch/arm64/Kconfig | 4 +-
 arch/ia64/Kconfig | 5 +-
 arch/mips/Kconfig | 3 -
 drivers/block/loop.c | 3 +
 drivers/gpu/drm/radeon/radeon_display.c | 7 +-
 drivers/gpu/drm/rockchip/cdn-dp-core.c | 2 +
 drivers/input/input.c | 13 +-
 drivers/md/dm-verity-target.c | 40 +-
 drivers/mtd/mtd_blkdevs.c | 31 +-
 drivers/mtd/mtdcore.c | 75 ++-
 drivers/mtd/mtdcore.h | 4 +-
 drivers/pci/bus.c | 2 +
 drivers/pci/pcie/aer_inject.c | 9 +
 drivers/pci/probe.c | 12 +-
 drivers/tty/sysrq.c | 6 +
 fs/aio.c | 11 +-
 fs/ext4/block_validity.c | 10 +-
 fs/ext4/ext4.h | 50 +-
 fs/ext4/ext4_jbd2.c | 21 +-
 fs/ext4/ext4_jbd2.h | 5 -
 fs/ext4/file.c | 5 +-
 fs/ext4/inode.c | 8 +-
 fs/ext4/namei.c | 10 +-
 fs/ext4/resize.c | 20 +-
 fs/ext4/super.c | 482 ++++++++++--------
 fs/ext4/xattr.c | 5 +-
 fs/io_uring.c | 116 ++---
 fs/proc/base.c | 32 +-
 fs/read_write.c | 5 -
 include/asm-generic/vmlinux.lds.h | 2 +-
 include/linux/cgroup-defs.h | 3 +
 include/linux/fs.h | 5 +
 include/linux/pci.h | 1 +
 kernel/cgroup/cgroup-v1.c | 8 +-
 kernel/cgroup/cgroup.c | 24 +-
 kernel/sched/deadline.c | 55 +-
 kernel/sched/rt.c | 66 ++-
 mm/Kconfig | 3 +
 mm/memory-failure.c | 6 +-
 mm/page_alloc.c | 61 +--
 scripts/module.lds.S | 1 +
 scripts/sorttable.c | 2 +-
 107 files changed, 2814 insertions(+), 1079 deletions(-)
 create mode 100644 arch/arm/boot/compressed/kaslr.c
 create mode 100644 arch/arm/include/asm/extable.h
 create mode 100644 arch/arm/include/asm/kasan.h
 create mode 100644 arch/arm/include/asm/kasan_def.h
 create mode 100644 arch/arm/kernel/phys2virt.S
 create mode 100644 arch/arm/mm/kasan_init.c

--
2.22.0

From: Xie XiuQi <xiexiuqi@huawei.com>

ohos inclusion
category: doc
issue: #I3ZXZF
CVE: NA

------------------------------------------------------------------------

We need to document our rules for submitting patches to OpenHarmony. This
document covers the basic steps of submitting patches and the unified
patch format.

This guideline is important for daily OpenHarmony work, so I prefer to
place it in the kernel root directory, where it is easy to access and
reference.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
[add ohos patch template and tips for developers]
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com>
---
 README | 252 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 252 insertions(+)

diff --git a/README b/README
index 669ac7c32292..4c1ce4b34a21 100644
--- a/README
+++ b/README
@@ -1,3 +1,255 @@
+Contributions to OpenHarmony linux kernel project
+=========================================
+
+Sign DCO
+--------
+
+Before submitting any contributions to the OpenHarmony kernel, you have to
+sign the DCO.
+
+See:
+  https://dco.openharmony.io/sign-dco
+
+Steps of submitting patches
+---------------------------
+
+1. Compile and test your patches successfully.
+   You should test your patch on OpenHarmony supported boards
+   (hi3516dv300, etc.).
+
+2. Generate patches
+   Your patches should be based on top of the latest OpenHarmony branch,
+   and should be generated with git-format-patch. If it's a patchset, it's
+   better to use the --cover-letter option to describe what the patchset
+   does.
+
+   Use scripts/checkpatch.pl to make sure there are no coding style
+   issues.
+
+   And make sure your patch follows the unified OpenHarmony patch format
+   described below.
+
+   *Tips*: Learn more about the linux kernel coding style
+   en: https://gitee.com/openharmony/kernel_linux/blob/master/Documentation/process...
+   zh: https://gitee.com/openharmony/kernel_linux/blob/master/Documentation/transla...
+
+3. Send patch to the OpenHarmony mailing list
+   Use this command to send patches to the OpenHarmony kernel mailing
+   list:
+
+   git send-email *.patch -to="kernel@openharmony.io" --suppress-cc=all
+
+   *NOTE*: You must add --suppress-cc=all if you use git send-email;
+   otherwise the email will be cc'ed to the people in the upstream
+   community and mailing lists.
+
+   *Tips*: Subscribe to the mailing list
+   https://lists.openatom.io/postorius/lists/kernel.openharmony.io/
+
+   *See*: How to send patches using git-send-email
+   https://git-scm.com/docs/git-send-email
+
+4. Mark "v1, v2, v3 ..." in your patch subject if you have multiple
+   versions to send out.
+
+   Use the --subject-prefix="PATCH v2" option to add the v2 tag to a
+   patchset:
+   git format-patch --subject-prefix="PATCH v2" -1
+
+   Subject examples:
+   Subject: [PATCH v2 01/27] fork: fix some -Wmissing-prototypes warnings
+   Subject: [PATCH v3] ext2: improve scalability of bitmap searching
+
+5. Upstreaming your kernel patch to the kernel community is strongly
+   recommended. OpenHarmony will sync up with the kernel master in a
+   timely manner.
+
+6. Sign your work - the Developer's Certificate of Origin
+   As in the upstream kernel community, you also need to sign your patch.
+
+   See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html
+
+   The sign-off is a simple line at the end of the explanation for the
+   patch, which certifies that you wrote it or otherwise have the right
+   to pass it on as an open-source patch. The rules are pretty simple:
+   if you can certify the below:
+
+   Developer's Certificate of Origin 1.1
+   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   By making a contribution to this project, I certify that:
+
+   (a) The contribution was created in whole or in part by me and I have
+       the right to submit it under the open source license indicated in
+       the file; or
+
+   (b) The contribution is based upon previous work that, to the best of
+       my knowledge, is covered under an appropriate open source license
+       and I have the right under that license to submit that work with
+       modifications, whether created in whole or in part by me, under
+       the same open source license (unless I am permitted to submit
+       under a different license), as indicated in the file; or
+
+   (c) The contribution was provided directly to me by some other person
+       who certified (a), (b) or (c) and I have not modified it.
+
+   (d) I understand and agree that this project and the contribution are
+       public and that a record of the contribution (including all
+       personal information I submit with it, including my sign-off) is
+       maintained indefinitely and may be redistributed consistent with
+       this project or the open source license(s) involved.
+
+   then you just add a line saying:
+
+   Signed-off-by: Random J Developer <random@developer.example.org>
+
+   using your real name (sorry, no pseudonyms or anonymous
+   contributions.)
+
+Use unified patch format
+------------------------
+
+Reasons:
+
+1. long term maintainability
+   OpenHarmony will merge massive patches. If all patches were merged
+   with a casual changelog format rather than a unified one, the git log
+   would be messy, and it would be hard to trace back to the original
+   patch.
+
+2. kernel upgrade
+   We will definitely upgrade our OpenHarmony kernel someday; strict
+   patch management will ease the pain of migrating patches during a big
+   upgrade.
+
+3. easy for script parsing
+   Keyword highlighting is necessary for script parsing.
+
+Patch format definition
+-----------------------
+
+[M] stands for "mandatory"
+[O] stands for "optional"
+$category can be: bug preparation, bugfix, perf, feature, doc, other...
+
+If the category is feature, then we also need to add the feature name
+like below:
+   category: feature
+   feature: YYY (the feature name)
+
+If the patch is related to a CVE or an issue/bugzilla, then we need to
+add the corresponding tag like below (in general, it should include at
+least one of the following):
+   CVE: $cve-id or NA
+   issue: $issue-id or NA
+   bugzilla: $bug-id or NA
+
+issue: https://gitee.com/openharmony/kernel_linux/issues
+
+The additional changelog should include at least one of the following:
+   1) Why we should apply this patch
+   2) What real problem in the product this patch resolves
+   3) How we can reproduce this bug, or how to test it
+   4) Other useful information that helps to understand this patch or
+      the problem
+
+Detailed information is very useful for porting the patch to another
+kernel branch.
+
+1. stable patch
+   stable inclusion       [M]
+   from $stable-version   [M]
+   commit $id             [M]
+   bugzilla: $bug-id      [O]
+   issue: $issue-id       [O]
+   CVE: $cve-id           [O]
+
+   additional changelog   [O]
+
+   --------------------------------
+
+   original changelog
+
+   Signed-off-by: $your_name <$your_mail>   [M]
+
+   ($stable-version would be stable-4.19.156, stable-4.19.157, etc...
+    $id would be the stable commit)
+
+2. mainline patch:
+   mainline inclusion     [M]
+   from $mainline-version [M]
+   commit $id             [M]
+   category: $category    [M]
+   bugzilla: $bug-id      [O]
+   issue: $issue-id       [O]
+   CVE: $cve-id           [O]
+
+   additional changelog   [O]
+
+   --------------------------------
+
+   original changelog
+
+   Signed-off-by: $your_name <$your_mail>   [M]
+
+   ($mainline-version could be mainline-5.6, mainline-5.10, etc...
+    $id would be the mainline commit)
+
+3. ohos patch
+   ohos inclusion         [M]
+   category: $category    [M]
+   bugzilla: $bug-id or NA [O]
+   issue: $issue-id or NA [O]
+   CVE: $cve-id or NA     [O]
+
+   --------------------------------
+
+   changelog
+
+   Signed-off-by: $your_name <$your_mail>   [M]
+
+Examples
+--------
+
+mainline inclusion
+from mainline-4.10
+commit 0becc0ae5b42828785b589f686725ff5bc3b9b25
+category: bugfix
+issue: 1000
+CVE: NA
+
+The patch fixes a BUG_ON in the product: injecting a single-bit ECC
+error into memory before system boot using hardware injection tools
+causes a large amount of CMCI during system booting.
+
+[    1.146580] mce: [Hardware Error]: Machine check events logged
+[    1.152908] ------------[ cut here ]------------
+[    1.157751] kernel BUG at kernel/timer.c:951!
+[    1.162321] invalid opcode: 0000 [#1] SMP
+...
+
+-------------------------------------------------
+
+original changelog
+
+<original S-O-B>
+Signed-off-by: Zhang San <zhangsan@163.com>
+Tested-by: Li Si <lisi@163.com>
+
+Email Client - Thunderbird Settings
+-----------------------------------
+
+If you are a new developer in the kernel community, it is highly
+recommended to use the thunderbird mail client.
+
+1. Thunderbird Installation
+   Get the English version of Thunderbird from http://www.mozilla.org/
+   and install it on your system.
+
+   Download url: https://www.thunderbird.net/en-US/thunderbird/all/
+
+2. Settings
+   2.1 Use plain text format instead of HTML format
+       Options -> Account Settings -> Composition & Addressing, do *NOT*
+       select "Compose message in HTML format".
+
+   2.2 Editor Settings
+       Tools->Options->Advanced->Config editor.
+
+       - To bring up thunderbird's registry editor, set:
+         "mailnews.send_plaintext_flowed" to "false".
+       - Disable HTML format: Set "mail.identity.id1.compose_html" to
+         "false".
+       - Enable UTF-8: Set "prefs.converted-to-utf8" to "true".
+       - View messages in UTF-8: Set "mailnews.view_default_charset" to
+         "UTF-8".
+       - Set "mailnews.wraplength" to 9999 to avoid auto-wrapping
+
 Linux kernel
 ============
--
2.22.0

From: Kefeng Wang <wangkefeng.wang@huawei.com>

mainline inclusion
from mainline-v5.13-rc1
commit 39ddb991fc45abcdcddbec7fcdfe28795d0133d7
category: bugfix
issue: #I3ZXZF
CVE: NA

--------------------------------

The start_pfn and end_pfn are already available in move_freepages_block();
there is no need to go back and forth between page and pfn in
move_freepages() and move_freepages_block(), and pfn_valid_within() should
validate the pfn first, before touching the page.

Link: https://lkml.kernel.org/r/20210323131215.934472-1-liushixin2@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com>
---
 mm/page_alloc.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 81cc7fdc9c8f..423f97a1120c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2363,19 +2363,21 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
  * boundary. If alignment is required, use move_freepages_block()
  */
 static int move_freepages(struct zone *zone,
-			  struct page *start_page, struct page *end_page,
+			  unsigned long start_pfn, unsigned long end_pfn,
 			  int migratetype, int *num_movable)
 {
 	struct page *page;
+	unsigned long pfn;
 	unsigned int order;
 	int pages_moved = 0;
 
-	for (page = start_page; page <= end_page;) {
-		if (!pfn_valid_within(page_to_pfn(page))) {
-			page++;
+	for (pfn = start_pfn; pfn <= end_pfn;) {
+		if (!pfn_valid_within(pfn)) {
+			pfn++;
 			continue;
 		}
 
+		page = pfn_to_page(pfn);
 		if (!PageBuddy(page)) {
 			/*
 			 * We assume that pages that could be isolated for
@@ -2385,8 +2387,7 @@ static int move_freepages(struct zone *zone,
 			if (num_movable &&
 					(PageLRU(page) || __PageMovable(page)))
 				(*num_movable)++;
-
-			page++;
+			pfn++;
 			continue;
 		}
 
@@ -2396,7 +2397,7 @@ static int move_freepages(struct zone *zone,
 		order = buddy_order(page);
 		move_to_free_list(page, zone, order, migratetype);
-		page += 1 << order;
+		pfn += 1 << order;
 		pages_moved += 1 << order;
 	}
 
@@ -2406,25 +2407,22 @@ static int move_freepages(struct zone *zone,
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable)
 {
-	unsigned long start_pfn, end_pfn;
-	struct page *start_page, *end_page;
+	unsigned long start_pfn, end_pfn, pfn;
 
 	if (num_movable)
 		*num_movable = 0;
 
-	start_pfn = page_to_pfn(page);
-	start_pfn = start_pfn & ~(pageblock_nr_pages-1);
-	start_page = pfn_to_page(start_pfn);
-	end_page = start_page + pageblock_nr_pages - 1;
+	pfn = page_to_pfn(page);
+	start_pfn = pfn & ~(pageblock_nr_pages - 1);
 	end_pfn = start_pfn + pageblock_nr_pages - 1;
 
 	/* Do not cross zone boundaries */
 	if (!zone_spans_pfn(zone, start_pfn))
-		start_page = page;
+		start_pfn = pfn;
 	if (!zone_spans_pfn(zone, end_pfn))
 		return 0;
 
-	return move_freepages(zone, start_page, end_page, migratetype,
+	return move_freepages(zone, start_pfn, end_pfn, migratetype,
 			      num_movable);
 }
--
2.22.0
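A minimal userspace sketch of the idea behind this patch; the names
(pfn_valid_within, pfn_to_page, memmap) model the kernel's, and the data
is invented for illustration. Iterating by frame number means a hole is
rejected before any struct page is touched, and the pfn-to-page
translation happens exactly once per valid frame:

#include <stdbool.h>
#include <stdio.h>

#define NPAGES 16UL

struct page { bool buddy; unsigned int order; };

static struct page memmap[NPAGES];
static bool valid[NPAGES];          /* models pfn_valid_within() */

/* models pfn_to_page(): only legal for a valid pfn */
static struct page *pfn_to_page(unsigned long pfn) { return &memmap[pfn]; }

/* Iterate by pfn, as the patched move_freepages() does: validate the
 * pfn first, and only then translate it to a struct page pointer. */
static int move_freepages(unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn;
	int moved = 0;

	for (pfn = start_pfn; pfn <= end_pfn;) {
		struct page *page;

		if (!valid[pfn]) {          /* hole: never touch memmap */
			pfn++;
			continue;
		}
		page = pfn_to_page(pfn);
		if (!page->buddy) {
			pfn++;
			continue;
		}
		pfn += 1UL << page->order;  /* skip the whole buddy block */
		moved += 1 << page->order;
	}
	return moved;
}

int main(void)
{
	for (unsigned long i = 0; i < NPAGES; i++)
		valid[i] = (i != 5);        /* one hole at pfn 5 */
	memmap[0] = (struct page){ .buddy = true, .order = 2 };
	memmap[8] = (struct page){ .buddy = true, .order = 3 };
	printf("pages moved: %d\n", move_freepages(0, NPAGES - 1));
	return 0;
}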

From: Zhou Chengming <zhouchengming1@huawei.com>

ohos inclusion
category: bugfix
bugzilla: NA
issue: #I3ZXZF
CVE: NA

---------------------------

push_rt_task() picks the first pushable task and finds an eligible
lowest_rq, then double_lock_balance(rq, lowest_rq). So if
double_lock_balance() unlocks the rq (when double_lock_balance() returns
1), we have to check if this task is still on the rq.

The problem is that the check conditions are not sufficient:

if (unlikely(task_rq(task) != rq ||
	     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_allowed) ||
	     task_running(rq, task) ||
	     !rt_task(task) ||
	     !task_on_rq_queued(task))) {

cpu2                            cpu1                    cpu0
push_rt_task(rq1)
  pick task_A on rq1
  find rq0
  double_lock_balance(rq1, rq0)
    unlock(rq1)
                                rq1 __schedule
                                  pick task_A run
                                task_A sleep (dequeued)
    lock(rq0)
    lock(rq1)
  do_above_check(task_A)
    task_rq(task_A) == rq1
    cpus_allowed unchanged
    task_running == false
    rt_task(task_A) == true
                                                        try_to_wake_up(task_A)
                                                          select_cpu = cpu3
                                                          enqueue(rq3, task_A)
                                                          task_A->on_rq = 1
    task_on_rq_queued(task_A)
  above_check passed, return rq0
  ...
  migrate task_A from rq1 to rq0

So we can't rely on these checks of task_A to make sure task_A is still
on rq1, even though we hold rq1->lock. This patch re-picks the first
pushable task to be sure the task is still on the rq.

Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
[XQ: - backport to 4.4, adjusted context,
     - use tsk_cpus_allowed() instead of task->cpus_allowed]
Link: https://lkml.org/lkml/2017/9/11/26
Link: https://lkml.org/lkml/2018/4/12/246
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
(cherry picked from commit 6e770352b482bc48c3393530ba36ce2c113c5086)
Signed-off-by: Li Ming <limingming.li@huawei.com>
Conflicts:
	kernel/sched/rt.c
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com>
---
 kernel/sched/rt.c | 50 ++++++++++++++++++++++++---------------------------
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 49ec096a8aa1..40f1183f3e94 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1777,6 +1777,26 @@ static int find_lowest_rq(struct task_struct *task)
 	return -1;
 }
 
+static struct task_struct *pick_next_pushable_task(struct rq *rq)
+{
+	struct task_struct *p;
+
+	if (!has_pushable_tasks(rq))
+		return NULL;
+
+	p = plist_first_entry(&rq->rt.pushable_tasks,
+			      struct task_struct, pushable_tasks);
+
+	BUG_ON(rq->cpu != task_cpu(p));
+	BUG_ON(task_current(rq, p));
+	BUG_ON(p->nr_cpus_allowed <= 1);
+
+	BUG_ON(!task_on_rq_queued(p));
+	BUG_ON(!rt_task(p));
+
+	return p;
+}
+
 /* Will lock the rq it finds */
 static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 {
@@ -1808,14 +1828,10 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 			 * We had to unlock the run queue. In
 			 * the mean time, task could have
 			 * migrated already or had its affinity changed.
-			 * Also make sure that it wasn't scheduled on its rq.
 			 */
-			if (unlikely(task_rq(task) != rq ||
-				     !cpumask_test_cpu(lowest_rq->cpu, task->cpus_ptr) ||
-				     task_running(rq, task) ||
-				     !rt_task(task) ||
-				     !task_on_rq_queued(task))) {
-
+			struct task_struct *next_task = pick_next_pushable_task(rq);
+			if (unlikely(next_task != task ||
+				     !cpumask_test_cpu(lowest_rq->cpu, task->cpus_ptr))) {
 				double_unlock_balance(rq, lowest_rq);
 				lowest_rq = NULL;
 				break;
@@ -1834,26 +1850,6 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 	return lowest_rq;
 }
 
-static struct task_struct *pick_next_pushable_task(struct rq *rq)
-{
-	struct task_struct *p;
-
-	if (!has_pushable_tasks(rq))
-		return NULL;
-
-	p = plist_first_entry(&rq->rt.pushable_tasks,
-			      struct task_struct, pushable_tasks);
-
-	BUG_ON(rq->cpu != task_cpu(p));
-	BUG_ON(task_current(rq, p));
-	BUG_ON(p->nr_cpus_allowed <= 1);
-
-	BUG_ON(!task_on_rq_queued(p));
-	BUG_ON(!rt_task(p));
-
-	return p;
-}
-
 /*
  * If the current CPU has more than one RT task, see if the non
  * running task can migrate over to a CPU that is running a task
--
2.22.0

From: Li Bin <huawei.libin@huawei.com>

ohos inclusion
category: bugfix
bugzilla: NA
issue: #I3ZXZF
CVE: NA

---------------------------

push_dl_task() picks the first pushable task and finds an eligible
lowest_rq, then double_lock_balance(rq, lowest_rq). So if
double_lock_balance() unlocks the rq (when double_lock_balance() returns
1), we have to check if this task is still on the rq.

The problem is that the check conditions are not sufficient:

if (unlikely(task_rq(task) != rq ||
	     !cpumask_test_cpu(later_rq->cpu, &task->cpus_allowed) ||
	     task_running(rq, task) ||
	     !dl_task(task) ||
	     !task_on_rq_queued(task))) {

cpu2                            cpu1                    cpu0
push_dl_task(rq1)
  pick task_A on rq1
  find rq0
  double_lock_balance(rq1, rq0)
    unlock(rq1)
                                rq1 __schedule
                                  pick task_A run
                                task_A sleep (dequeued)
    lock(rq0)
    lock(rq1)
  do_above_check(task_A)
    task_rq(task_A) == rq1
    cpus_allowed unchanged
    task_running == false
    dl_task(task_A) == true
                                                        try_to_wake_up(task_A)
                                                          select_cpu = cpu3
                                                          enqueue(rq3, task_A)
                                                          task_A->on_rq = 1
    task_on_rq_queued(task_A)
  above_check passed, return rq0
  ...
  migrate task_A from rq1 to rq0

So we can't rely on these checks of task_A to make sure task_A is still
on rq1, even though we hold rq1->lock. This patch re-picks the first
pushable task to be sure the task is still on the rq.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[XQ: - backport to 4.4, adjusted context,
     - use tsk_cpus_allowed() instead of task->cpus_allowed]
Link: https://lkml.org/lkml/2017/9/11/26
Link: https://lkml.org/lkml/2018/4/12/246
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
(cherry picked from commit bf781024558706a7097840a18ca246f29506c89a)
Signed-off-by: Li Ming <limingming.li@huawei.com>
Conflicts:
	kernel/sched/deadline.c
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com>
---
 kernel/sched/deadline.c | 55 ++++++++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 8d06d1f4e2f7..576b4f6d9f07 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2038,6 +2038,26 @@ static int find_later_rq(struct task_struct *task)
 	return -1;
 }
 
+static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
+{
+	struct task_struct *p;
+
+	if (!has_pushable_dl_tasks(rq))
+		return NULL;
+
+	p = rb_entry(rq->dl.pushable_dl_tasks_root.rb_leftmost,
+		     struct task_struct, pushable_dl_tasks);
+
+	BUG_ON(rq->cpu != task_cpu(p));
+	BUG_ON(task_current(rq, p));
+	BUG_ON(p->nr_cpus_allowed <= 1);
+
+	BUG_ON(!task_on_rq_queued(p));
+	BUG_ON(!dl_task(p));
+
+	return p;
+}
+
 /* Locks the rq it finds */
 static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 {
@@ -2067,11 +2087,16 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 
 		/* Retry if something changed. */
 		if (double_lock_balance(rq, later_rq)) {
-			if (unlikely(task_rq(task) != rq ||
-				     !cpumask_test_cpu(later_rq->cpu, task->cpus_ptr) ||
-				     task_running(rq, task) ||
-				     !dl_task(task) ||
-				     !task_on_rq_queued(task))) {
+			struct task_struct *next_task;
+			/*
+			 * We had to unlock the run queue. In
+			 * the mean time, task could have
+			 * migrated already or had its affinity changed.
+			 * Also make sure that it wasn't scheduled on its rq.
+			 */
+			next_task = pick_next_pushable_dl_task(rq);
+			if (unlikely(next_task != task ||
+				     !cpumask_test_cpu(later_rq->cpu, task->cpus_ptr))) {
 				double_unlock_balance(rq, later_rq);
 				later_rq = NULL;
 				break;
@@ -2096,26 +2121,6 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 	return later_rq;
 }
 
-static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
-{
-	struct task_struct *p;
-
-	if (!has_pushable_dl_tasks(rq))
-		return NULL;
-
-	p = rb_entry(rq->dl.pushable_dl_tasks_root.rb_leftmost,
-		     struct task_struct, pushable_dl_tasks);
-
-	BUG_ON(rq->cpu != task_cpu(p));
-	BUG_ON(task_current(rq, p));
-	BUG_ON(p->nr_cpus_allowed <= 1);
-
-	BUG_ON(!task_on_rq_queued(p));
-	BUG_ON(!dl_task(p));
-
-	return p;
-}
-
 /*
  * See if the non running -deadline tasks on this rq
  * can be sent to some other CPU where they can preempt
--
2.22.0
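Both this patch and the rt.c one above rely on the same pattern: after a
lock has been dropped and retaken, attribute checks on a remembered task
pointer prove nothing, so the head of the protected queue is re-picked
and compared by identity. A toy pthread model of that pattern (all names
here are illustrative, not kernel APIs):

#include <pthread.h>
#include <stdio.h>

/* The queue and its head are only meaningful under q->lock. */
struct queue {
	pthread_mutex_t lock;
	void *head;             /* first pushable task, or NULL */
};

static void *pick_first(struct queue *q) { return q->head; }

/* Re-lock and re-derive state from the queue itself; a pointer
 * identity check is the only reliable test after re-locking. */
static int still_first(struct queue *q, void *task)
{
	int ok;

	pthread_mutex_lock(&q->lock);
	ok = (pick_first(q) == task);
	pthread_mutex_unlock(&q->lock);
	return ok;
}

int main(void)
{
	int task_a = 0;
	struct queue q = { PTHREAD_MUTEX_INITIALIZER, &task_a };
	void *remembered = q.head;

	q.head = NULL;          /* task migrated while the lock was dropped */
	printf("safe to push: %d\n", still_first(&q, remembered));
	return 0;
}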

From: yangerkun <yangerkun@huawei.com>

ohos inclusion
category: bugfix
issue: #I3ZXZF
CVE: NA

---------------------------

UBSAN has reported an overflow in mem_lseek. This is fine when mem_open
sets the file mode with FMODE_UNSIGNED_OFFSET (memory_lseek). However,
other files that use mem_lseek for lseek may not have
FMODE_UNSIGNED_OFFSET set (proc_kpagecount_operations/
proc_pagemap_operations); fix it by checking for overflow and
FMODE_UNSIGNED_OFFSET.

Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>

==================================================================
UBSAN: Undefined behaviour in ../fs/proc/base.c:941:15
signed integer overflow:
4611686018427387904 + 4611686018427387904 cannot be represented in type 'long long int'
CPU: 4 PID: 4762 Comm: syz-executor.1 Not tainted 4.4.189 #3
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
[<ffffff90080a5f28>] dump_backtrace+0x0/0x590 arch/arm64/kernel/traps.c:91
[<ffffff90080a64f0>] show_stack+0x38/0x60 arch/arm64/kernel/traps.c:234
[<ffffff9008986a34>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffff9008986a34>] dump_stack+0x128/0x184 lib/dump_stack.c:51
[<ffffff9008a2d120>] ubsan_epilogue+0x34/0x9c lib/ubsan.c:166
[<ffffff9008a2d8b8>] handle_overflow+0x228/0x280 lib/ubsan.c:197
[<ffffff9008a2da2c>] __ubsan_handle_add_overflow+0x4c/0x68 lib/ubsan.c:204
[<ffffff900862b9f4>] mem_lseek+0x12c/0x130 fs/proc/base.c:941
[<ffffff90084ef78c>] vfs_llseek fs/read_write.c:260 [inline]
[<ffffff90084ef78c>] SYSC_lseek fs/read_write.c:285 [inline]
[<ffffff90084ef78c>] SyS_lseek+0x164/0x1f0 fs/read_write.c:276
[<ffffff9008093c80>] el0_svc_naked+0x30/0x34
==================================================================

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
(cherry picked from commit a422358aa04c53a08b215b8dcd6814d916ef5cf1)
Conflicts:
	fs/read_write.c
Signed-off-by: Li Ming <limingming.li@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com>
---
 fs/proc/base.c     | 32 ++++++++++++++++++++++++--------
 fs/read_write.c    |  5 -----
 include/linux/fs.h |  5 +++++
 3 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index df9b17dd92cb..e1f75bdc72f6 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -901,18 +901,34 @@ static ssize_t mem_write(struct file *file, const char __user *buf,
 
 loff_t mem_lseek(struct file *file, loff_t offset, int orig)
 {
+	loff_t ret = 0;
+
+	spin_lock(&file->f_lock);
 	switch (orig) {
-	case 0:
-		file->f_pos = offset;
-		break;
-	case 1:
-		file->f_pos += offset;
+	case SEEK_CUR:
+		offset += file->f_pos;
+		/* fall through */
+	case SEEK_SET:
+		/* to avoid userland mistaking f_pos=-9 as -EBADF=-9 */
+		if ((unsigned long long)offset >= -MAX_ERRNO)
+			ret = -EOVERFLOW;
 		break;
 	default:
-		return -EINVAL;
+		ret = -EINVAL;
+	}
+
+	if (!ret) {
+		if (offset < 0 && !(unsigned_offsets(file))) {
+			ret = -EINVAL;
+		} else {
+			file->f_pos = offset;
+			ret = file->f_pos;
+			force_successful_syscall_return();
+		}
 	}
-	force_successful_syscall_return();
-	return file->f_pos;
+
+	spin_unlock(&file->f_lock);
+	return ret;
 }
 
 static int mem_release(struct inode *inode, struct file *file)
diff --git a/fs/read_write.c b/fs/read_write.c
index 75f764b43418..88f445da7515 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -34,11 +34,6 @@ const struct file_operations generic_ro_fops = {
 
 EXPORT_SYMBOL(generic_ro_fops);
 
-static inline bool unsigned_offsets(struct file *file)
-{
-	return file->f_mode & FMODE_UNSIGNED_OFFSET;
-}
-
 /**
  * vfs_setpos - update the file offset for lseek
  * @file:	file structure in question
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8bde32cf9711..5724f099cf70 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3003,6 +3003,11 @@ extern void file_ra_state_init(struct file_ra_state *ra,
 			       struct address_space *mapping);
 extern loff_t noop_llseek(struct file *file, loff_t offset, int whence);
 extern loff_t no_llseek(struct file *file, loff_t offset, int whence);
+static inline bool unsigned_offsets(struct file *file)
+{
+	return file->f_mode & FMODE_UNSIGNED_OFFSET;
+}
+
 extern loff_t vfs_setpos(struct file *file, loff_t offset, loff_t maxsize);
 extern loff_t generic_file_llseek(struct file *file, loff_t offset, int whence);
 extern loff_t generic_file_llseek_size(struct file *file, loff_t offset,
--
2.22.0
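The `(unsigned long long)offset >= -MAX_ERRNO` test in the patch may look
odd: the cast maps every negative offset (and nothing else) into the top
4095 values of the unsigned range, which is exactly the band the syscall
return path reserves for errno values. A standalone demonstration
(MAX_ERRNO is 4095 in the kernel; the rest is illustrative):

#include <stdio.h>

#define MAX_ERRNO 4095ULL

/* Returns 1 if this offset, stored in f_pos and then returned from
 * lseek(), would be indistinguishable from a -errno return value. */
static int looks_like_errno(long long offset)
{
	return (unsigned long long)offset >= -MAX_ERRNO;
}

int main(void)
{
	/* -9 cast to unsigned is 0xfff...ff7, above -MAX_ERRNO, so a
	 * caller would read a "successful" lseek() to -9 as -EBADF. */
	printf("offset -9:   %d\n", looks_like_errno(-9));
	printf("offset 4096: %d\n", looks_like_errno(4096));
	printf("offset 0:    %d\n", looks_like_errno(0));
	return 0;
}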

From: Arnd Bergmann <arnd@arndb.de>

maillist inclusion
category: bugfix
issue: #I3ZXZF
CVE: NA

Reference: https://lore.kernel.org/patchwork/patch/877882/

-------------------------------------------------

Recent versions of binutils always warn about test-arm.s:

arch/arm/probes/kprobes/test-arm.s:18262: Warning: using r15 results in unpredictable behaviour
arch/arm/probes/kprobes/test-arm.s:18337: Warning: using r15 results in unpredictable behaviour

We could work around this using the __inst_arm() macro for passing the
two instructions as hexadecimal literal numbers, but as Ard pointed out,
there is no reason to leave the warnings enabled for this file in
general: we intentionally test an instruction that is not recommended
to be used.

For consistency, this turns off the warning in both the ARM and Thumb2
versions of this file.

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[KF: from https://lore.kernel.org/patchwork/patch/877882/]
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com>
---
 arch/arm/probes/kprobes/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/probes/kprobes/Makefile b/arch/arm/probes/kprobes/Makefile
index 14db56f49f0a..6432578fc74b 100644
--- a/arch/arm/probes/kprobes/Makefile
+++ b/arch/arm/probes/kprobes/Makefile
@@ -11,3 +11,7 @@ obj-$(CONFIG_KPROBES) += actions-arm.o checkers-arm.o
 obj-$(CONFIG_OPTPROBES) += opt-arm.o
 test-kprobes-objs += test-arm.o
 endif
+
+# don't warn about intentionally bogus instructions
+CFLAGS_test-arm.o += -Wa,--no-warn
+CFLAGS_test-thumb.o += -Wa,--no-warn
--
2.22.0

From: "zhangyi (F)" <yi.zhang@huawei.com> maillist inclusion category: bugfix issue: #I3ZXZF CVE: NA Reference: https://lore.kernel.org/lkml/1564451504-27906-1-git-send-email-yi.zhang@huaw... --------------------------- io_[p]getevents syscall should return -EINVAL if timeout is out of range, add this validity check. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/aio.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 6a21d8919409..bd182bcca23a 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -2050,10 +2050,17 @@ static long do_io_getevents(aio_context_t ctx_id, struct io_event __user *events, struct timespec64 *ts) { - ktime_t until = ts ? timespec64_to_ktime(*ts) : KTIME_MAX; - struct kioctx *ioctx = lookup_ioctx(ctx_id); + ktime_t until = KTIME_MAX; + struct kioctx *ioctx = NULL; long ret = -EINVAL; + if (ts) { + if (!timespec64_valid(ts)) + return ret; + until = timespec64_to_ktime(*ts); + } + + ioctx = lookup_ioctx(ctx_id); if (likely(ioctx)) { if (likely(min_nr <= nr && min_nr >= 0)) ret = read_events(ioctx, min_nr, nr, events, until); -- 2.22.0

From: Gu Zheng <guzheng1@huawei.com>

ohos inclusion
category: bugfix
issue: #I3ZXZF
CVE: NA

-----------------------------------------------

Add new functions mtd_table_mutex_lock/unlock to replace
mutex_lock(&mtd_table_mutex)/mutex_unlock(&mtd_table_mutex). This
modification avoids the deadlock when insmoding ftl.ko.

The deadlock is caused by commit 857814ee65db ("mtd: fix: avoid race
condition when accessing mtd->usecount"); the process is as follows:

init_ftl
  register_mtd_blktrans
    mutex_lock(&mtd_table_mutex)        //mtd_table_mutex locked
    ftl_add_mtd
      add_mtd_blktrans_dev
        device_add_disk
          register_disk
            blkdev_get
              __blkdev_get
                blktrans_open
                  mutex_lock(&mtd_table_mutex)  //dead lock

So we add mtd_table_mutex_owner to record the current process. If the
lock is already held by the current process, we can skip the lock
acquisition that would otherwise deadlock. This solves the above issue
and can also prevent some as-yet-undiscovered mtd_table_mutex deadlocks.

Signed-off-by: Gu Zheng <guzheng1@huawei.com>
Acked-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Hou Tao <houta1@huawei.com>
conflict:
	drivers/mtd/mtdcore.c
	drivers/mtd/mtdcore.h
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com>
---
 drivers/mtd/mtd_blkdevs.c | 31 +++++++---------
 drivers/mtd/mtdcore.c     | 75 ++++++++++++++++++++++++++++-----------
 drivers/mtd/mtdcore.h     |  4 ++-
 3 files changed, 70 insertions(+), 40 deletions(-)

diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 0c05f77f9b21..37f4e169d305 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -209,7 +209,7 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
 	if (!dev)
 		return -ERESTARTSYS; /* FIXME: busy loop! -arnd*/
 
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 	mutex_lock(&dev->lock);
 
 	if (dev->open)
@@ -235,7 +235,7 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
 unlock:
 	dev->open++;
 	mutex_unlock(&dev->lock);
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	blktrans_dev_put(dev);
 	return ret;
 
@@ -246,7 +246,7 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
 	module_put(dev->tr->owner);
 	kref_put(&dev->ref, blktrans_dev_release);
 	mutex_unlock(&dev->lock);
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	blktrans_dev_put(dev);
 	return ret;
 }
@@ -258,7 +258,7 @@ static void blktrans_release(struct gendisk *disk, fmode_t mode)
 	if (!dev)
 		return;
 
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 	mutex_lock(&dev->lock);
 
 	if (--dev->open)
@@ -274,7 +274,7 @@ static void blktrans_release(struct gendisk *disk, fmode_t mode)
 	}
 unlock:
 	mutex_unlock(&dev->lock);
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	blktrans_dev_put(dev);
 }
 
@@ -345,10 +345,7 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
 	struct gendisk *gd;
 	int ret;
 
-	if (mutex_trylock(&mtd_table_mutex)) {
-		mutex_unlock(&mtd_table_mutex);
-		BUG();
-	}
+	mtd_table_assert_mutex_locked();
 
 	mutex_lock(&blktrans_ref_mutex);
 	list_for_each_entry(d, &tr->devs, list) {
@@ -479,11 +476,7 @@ int del_mtd_blktrans_dev(struct mtd_blktrans_dev *old)
 {
 	unsigned long flags;
 
-	if (mutex_trylock(&mtd_table_mutex)) {
-		mutex_unlock(&mtd_table_mutex);
-		BUG();
-	}
-
+	mtd_table_assert_mutex_locked();
 	if (old->disk_attributes)
 		sysfs_remove_group(&disk_to_dev(old->disk)->kobj,
 						old->disk_attributes);
@@ -557,13 +550,13 @@ int register_mtd_blktrans(struct mtd_blktrans_ops *tr)
 
 	register_mtd_user(&blktrans_notifier);
 
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 
 	ret = register_blkdev(tr->major, tr->name);
 	if (ret < 0) {
 		printk(KERN_WARNING "Unable to register %s block device on major %d: %d\n",
 		       tr->name, tr->major, ret);
-		mutex_unlock(&mtd_table_mutex);
+		mtd_table_mutex_unlock();
 		return ret;
 	}
 
@@ -579,7 +572,7 @@ int register_mtd_blktrans(struct mtd_blktrans_ops *tr)
 		if (mtd->type != MTD_ABSENT)
 			tr->add_mtd(tr, mtd);
 
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	return 0;
 }
 
@@ -587,7 +580,7 @@ int deregister_mtd_blktrans(struct mtd_blktrans_ops *tr)
 {
 	struct mtd_blktrans_dev *dev, *next;
 
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 
 	/* Remove it from the list of active majors */
 	list_del(&tr->list);
@@ -596,7 +589,7 @@ int deregister_mtd_blktrans(struct mtd_blktrans_ops *tr)
 		tr->remove_dev(dev);
 
 	unregister_blkdev(tr->major, tr->name);
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 
 	BUG_ON(!list_empty(&tr->devs));
 	return 0;
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 1c8c40728678..37bd3171157a 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -70,8 +70,9 @@ static DEFINE_IDR(mtd_idr);
 
 /* These are exported solely for the purpose of mtd_blkdevs.c. You should not
    use them for _anything_ else */
-DEFINE_MUTEX(mtd_table_mutex);
-EXPORT_SYMBOL_GPL(mtd_table_mutex);
+static DEFINE_MUTEX(mtd_table_mutex);
+static int mtd_table_mutex_depth;
+static struct task_struct *mtd_table_mutex_owner;
 
 struct mtd_info *__mtd_next_device(int i)
 {
@@ -84,6 +85,40 @@ static LIST_HEAD(mtd_notifiers);
 
 #define MTD_DEVT(index) MKDEV(MTD_CHAR_MAJOR, (index)*2)
 
+void mtd_table_mutex_lock(void)
+{
+	if (mtd_table_mutex_owner != current) {
+		mutex_lock(&mtd_table_mutex);
+		mtd_table_mutex_owner = current;
+	}
+	mtd_table_mutex_depth++;
+}
+EXPORT_SYMBOL_GPL(mtd_table_mutex_lock);
+
+
+void mtd_table_mutex_unlock(void)
+{
+	if (mtd_table_mutex_owner != current) {
+		pr_err("MTD:lock_owner is %s, but current is %s\n",
+		       mtd_table_mutex_owner->comm, current->comm);
+		BUG();
+	}
+	if (--mtd_table_mutex_depth == 0) {
+		mtd_table_mutex_owner = NULL;
+		mutex_unlock(&mtd_table_mutex);
+	}
+}
+EXPORT_SYMBOL_GPL(mtd_table_mutex_unlock);
+
+void mtd_table_assert_mutex_locked(void)
+{
+	if (mtd_table_mutex_owner != current) {
+		pr_err("MTD:lock_owner is %s, but current is %s\n",
+		       mtd_table_mutex_owner->comm, current->comm);
+		BUG();
+	}
+}
+EXPORT_SYMBOL_GPL(mtd_table_assert_mutex_locked);
+
 /* REVISIT once MTD uses the driver model better, whoever allocates
  * the mtd_info will probably want to use the release() hook...
  */
@@ -610,7 +645,7 @@ int add_mtd_device(struct mtd_info *mtd)
 	    !master->pairing || master->_writev))
 		return -EINVAL;
 
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 
 	i = idr_alloc(&mtd_idr, mtd, 0, 0, GFP_KERNEL);
 	if (i < 0) {
@@ -686,7 +721,7 @@ int add_mtd_device(struct mtd_info *mtd)
 	list_for_each_entry(not, &mtd_notifiers, list)
 		not->add(mtd);
 
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	/* We _know_ we aren't being removed, because
 	   our caller is still holding us here. So none
 	   of this try_ nonsense, and no bitching about it
@@ -700,7 +735,7 @@ int add_mtd_device(struct mtd_info *mtd)
 	of_node_put(mtd_get_of_node(mtd));
 	idr_remove(&mtd_idr, i);
 fail_locked:
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	return error;
 }
 
@@ -719,7 +754,7 @@ int del_mtd_device(struct mtd_info *mtd)
 	int ret;
 	struct mtd_notifier *not;
 
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 
 	debugfs_remove_recursive(mtd->dbg.dfs_dir);
 
@@ -752,7 +787,7 @@ int del_mtd_device(struct mtd_info *mtd)
 	}
 
 out_error:
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	return ret;
 }
 
@@ -894,7 +929,7 @@ void register_mtd_user (struct mtd_notifier *new)
 {
 	struct mtd_info *mtd;
 
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 
 	list_add(&new->list, &mtd_notifiers);
 
@@ -903,7 +938,7 @@ void register_mtd_user (struct mtd_notifier *new)
 	mtd_for_each_device(mtd)
 		new->add(mtd);
 
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 }
 EXPORT_SYMBOL_GPL(register_mtd_user);
 
@@ -920,7 +955,7 @@ int unregister_mtd_user (struct mtd_notifier *old)
 {
 	struct mtd_info *mtd;
 
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 
 	module_put(THIS_MODULE);
 
@@ -928,7 +963,7 @@ int unregister_mtd_user (struct mtd_notifier *old)
 		old->remove(mtd);
 
 	list_del(&old->list);
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	return 0;
 }
 EXPORT_SYMBOL_GPL(unregister_mtd_user);
 
@@ -949,7 +984,7 @@ struct mtd_info *get_mtd_device(struct mtd_info *mtd, int num)
 	struct mtd_info *ret = NULL, *other;
 	int err = -ENODEV;
 
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 
 	if (num == -1) {
 		mtd_for_each_device(other) {
@@ -973,7 +1008,7 @@ struct mtd_info *get_mtd_device(struct mtd_info *mtd, int num)
 	if (err)
 		ret = ERR_PTR(err);
 out:
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(get_mtd_device);
 
@@ -1020,7 +1055,7 @@ struct mtd_info *get_mtd_device_nm(const char *name)
 	int err = -ENODEV;
 	struct mtd_info *mtd = NULL, *other;
 
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 
 	mtd_for_each_device(other) {
 		if (!strcmp(name, other->name)) {
@@ -1036,20 +1071,20 @@ struct mtd_info *get_mtd_device_nm(const char *name)
 	if (err)
 		goto out_unlock;
 
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	return mtd;
 
 out_unlock:
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	return ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(get_mtd_device_nm);
 
 void put_mtd_device(struct mtd_info *mtd)
 {
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 	__put_mtd_device(mtd);
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 }
 EXPORT_SYMBOL_GPL(put_mtd_device);
 
@@ -2161,13 +2196,13 @@ static int mtd_proc_show(struct seq_file *m, void *v)
 	struct mtd_info *mtd;
 
 	seq_puts(m, "dev:    size   erasesize  name\n");
-	mutex_lock(&mtd_table_mutex);
+	mtd_table_mutex_lock();
 	mtd_for_each_device(mtd) {
 		seq_printf(m, "mtd%d: %8.8llx %8.8x \"%s\"\n",
 			   mtd->index, (unsigned long long)mtd->size,
 			   mtd->erasesize, mtd->name);
 	}
-	mutex_unlock(&mtd_table_mutex);
+	mtd_table_mutex_unlock();
 	return 0;
 }
 #endif /* CONFIG_PROC_FS */
diff --git a/drivers/mtd/mtdcore.h b/drivers/mtd/mtdcore.h
index b5eefeabf310..40c5f104794c 100644
--- a/drivers/mtd/mtdcore.h
+++ b/drivers/mtd/mtdcore.h
@@ -4,7 +4,6 @@
  * You should not use them for _anything_ else.
  */
 
-extern struct mutex mtd_table_mutex;
 extern struct backing_dev_info *mtd_bdi;
 
 struct mtd_info *__mtd_next_device(int i);
@@ -22,6 +21,9 @@ void mtd_part_parser_cleanup(struct mtd_partitions *parts);
 int __init init_mtdchar(void);
 void __exit cleanup_mtdchar(void);
 
+extern void mtd_table_mutex_lock(void);
+extern void mtd_table_mutex_unlock(void);
+extern void mtd_table_assert_mutex_locked(void);
 
 #define mtd_for_each_device(mtd)		\
 	for ((mtd) = __mtd_next_device(0);	\
--
2.22.0
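A userspace pthread sketch of the owner-tracking lock introduced here for
mtd_table_mutex: re-acquisition by the owning thread only bumps a depth
counter, so a re-entrant path (blktrans_open called back from under
register_mtd_blktrans) no longer self-deadlocks. The names and the
single-threaded demo are illustrative; like the kernel patch, the owner
comparison is read without the lock held, which is safe there because a
thread only ever races against its own writes:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t table_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_t owner;
static int owner_set;
static int depth;

static void table_lock(void)
{
	if (!owner_set || !pthread_equal(owner, pthread_self())) {
		pthread_mutex_lock(&table_mutex);
		owner = pthread_self();
		owner_set = 1;
	}
	depth++;
}

static void table_unlock(void)
{
	if (--depth == 0) {
		owner_set = 0;
		pthread_mutex_unlock(&table_mutex);
	}
}

static void open_device(void)   /* models blktrans_open() */
{
	table_lock();
	printf("open at depth %d\n", depth);
	table_unlock();
}

int main(void)
{
	table_lock();           /* models register_mtd_blktrans() */
	open_device();          /* nested: would deadlock on a plain mutex */
	table_unlock();
	return 0;
}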

From: Zefan Li <lizefan@huawei.com>

ohos inclusion
category: bugfix
issue: #I3ZXZF
CVE: NA

-------------------------------------------------

If a cgroup root is dying, show its hierarchy_id and num_cgroups as 0.

Signed-off-by: Zefan Li <lizefan@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Changchun Yu <yuchangchun1@huawei.com>
Reviewed-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: xiu jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com>
---
 kernel/cgroup/cgroup-v1.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index f6dddb3a8f4a..c7bd9d634e8e 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -655,6 +655,7 @@ int proc_cgroupstats_show(struct seq_file *m, void *v)
 {
 	struct cgroup_subsys *ss;
 	int i;
+	bool dead;
 
 	seq_puts(m, "#subsys_name\thierarchy\tnum_cgroups\tenabled\n");
 	/*
@@ -665,10 +666,13 @@ int proc_cgroupstats_show(struct seq_file *m, void *v)
 	mutex_lock(&cgroup_mutex);
 
 	for_each_subsys(ss, i)
+	for_each_subsys(ss, i) {
+		dead = percpu_ref_is_dying(&ss->root->cgrp.self.refcnt);
 		seq_printf(m, "%s\t%d\t%d\t%d\n",
-			   ss->legacy_name, ss->root->hierarchy_id,
-			   atomic_read(&ss->root->nr_cgrps),
+			   ss->legacy_name, dead ? 0 : ss->root->hierarchy_id,
+			   dead ? 0 : atomic_read(&ss->root->nr_cgrps),
 			   cgroup_ssid_enabled(i));
+	}
 
 	mutex_unlock(&cgroup_mutex);
 	return 0;
--
2.22.0

From: Zefan Li <lizefan@huawei.com>

ohos inclusion
category: bugfix
issue: #I3ZXZF
CVE: NA

-------------------------------------------------

Since commit 3c606d35fe97 ("cgroup: prevent mount hang due to memory
controller lifetime"), a cgroup root won't be destroyed if there are any
child cgroups, dead or alive. This introduced a small regression.

# cat test.sh
mount -t cgroup -o cpuset xxx /cgroup
mkdir /cgroup/tmp
rmdir /cgroup/tmp
umount /cgroup

After running this script, you'll probably find the cgroup hierarchy is
still active.

# cat /proc/cgroups | grep cpuset
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  1       1       1
...

Fix this by waiting for a while when unmounting. Now run the script
again and you'll see:

# cat /proc/cgroups | grep cpuset
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  0       1       1
...

Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Zefan Li <lizefan@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Changchun Yu <yuchangchun1@huawei.com>
Reviewed-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
conflict:
	kernel/cgroup/cgroup.c
Reviewed-by: xiu jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com>
---
 include/linux/cgroup-defs.h |  3 +++
 kernel/cgroup/cgroup.c      | 16 +++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index fee0b5547cd0..ffec16930b00 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -509,6 +509,9 @@ struct cgroup_root {
 	/* Number of cgroups in the hierarchy, used only for /proc/cgroups */
 	atomic_t nr_cgrps;
 
+	/* Wait while cgroups are being destroyed */
+	wait_queue_head_t wait;
+
 	/* A list running through the active hierarchies */
 	struct list_head root_list;
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c8b811e039cc..416bec1428d0 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1914,6 +1914,7 @@ void init_cgroup_root(struct cgroup_fs_context *ctx)
 	atomic_set(&root->nr_cgrps, 1);
 	cgrp->root = root;
 	init_cgroup_housekeeping(cgrp);
+	init_waitqueue_head(&root->wait);
 
 	root->flags = ctx->flags;
 	if (ctx->release_agent)
@@ -2139,6 +2140,17 @@ static void cgroup_kill_sb(struct super_block *sb)
 	struct kernfs_root *kf_root = kernfs_root_from_sb(sb);
 	struct cgroup_root *root = cgroup_root_from_kf(kf_root);
 
+	/*
+	 * Wait if there are cgroups being destroyed, because the destruction
+	 * is asynchronous. On the other hand some controllers like memcg
+	 * may pin cgroups for a very long time, so don't wait forever.
+	 */
+	if (root != &cgrp_dfl_root) {
+		wait_event_timeout(root->wait,
+				   list_empty(&root->cgrp.self.children),
+				   msecs_to_jiffies(500));
+	}
+
 	/*
 	 * If @root doesn't have any children, start killing it.
 	 * This prevents new mounts by disabling percpu_ref_tryget_live().
@@ -5025,8 +5037,10 @@ static void css_release_work_fn(struct work_struct *work)
 		if (cgrp->kn)
 			RCU_INIT_POINTER(*(void __rcu __force **)&cgrp->kn->priv,
 					 NULL);
+		if (css->parent && !css->parent->parent &&
+		    list_empty(&css->parent->children))
+			wake_up(&cgrp->root->wait);
 	}
-
 	mutex_unlock(&cgroup_mutex);
 
 	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
--
2.22.0
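The design choice in this patch is a bounded wait: the unmount path waits
for the "children gone" condition but gives up after 500ms so that a
controller pinning the hierarchy cannot hang umount forever. A userspace
model of that pattern using pthread_cond_timedwait (all names invented;
the demo deliberately times out):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int children;   /* models !list_empty(&root->cgrp.self.children) */

/* models wait_event_timeout(): returns 1 if the condition became true,
 * 0 if the timeout expired first */
static int wait_children_gone(int timeout_ms)
{
	struct timespec deadline;
	int err = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&lock);
	while (children && err == 0)
		err = pthread_cond_timedwait(&cond, &lock, &deadline);
	pthread_mutex_unlock(&lock);
	return err == 0;
}

int main(void)
{
	children = 1;   /* destruction never completes in this demo */
	printf("children gone: %d\n", wait_children_gone(100));
	return 0;
}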

From: Cheng Jian <cj.chengjian@huawei.com>

ohos inclusion
category: bugfix
issue: #I3ZXZF
CVE: NA

----------------------------------------

Commit e221d028bb ("sched,rt: fix isolated CPUs leaving root_task_group
indefinitely throttled") only fixes isolated CPUs leaving the
root_task_group; it does not fix all the other ordinary task_groups. In
scenarios where we need to attach tasks bound to isolated CPUs to a
task_group, the same problem occurs.

Isolated CPUs and non-isolated CPUs are not in the same root_domain, and
the hrtimer handler only checks the cpumask of this_rq's root_domain. So
when the RT_BANDWIDTH hrtimer handler runs on an isolated CPU, it leaves
the non-isolated CPUs indefinitely throttled, because the bandwidth
period hrtimer can't resume them, and vice versa.

Let the bandwidth timer check all the rt_rqs of cpu_online_mask.

Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: xiu jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Yu Changchun <yuchangchun1@huawei.com>
---
 kernel/sched/rt.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 40f1183f3e94..71a04c491508 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -858,16 +858,14 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 	span = sched_rt_period_mask();
 #ifdef CONFIG_RT_GROUP_SCHED
 	/*
-	 * FIXME: isolated CPUs should really leave the root task group,
-	 * whether they are isolcpus or were isolated via cpusets, lest
-	 * the timer run on a CPU which does not service all runqueues,
-	 * potentially leaving other CPUs indefinitely throttled.  If
-	 * isolation is really required, the user will turn the throttle
-	 * off to kill the perturbations it causes anyway.  Meanwhile,
-	 * this maintains functionality for boot and/or troubleshooting.
+	 * When the tasks in the task_group run on either isolated
+	 * CPUs or non-isolated CPUs, whether they are isolcpus or
+	 * were isolated via cpusets, check all the online rt_rqs,
+	 * lest the timer run on a CPU which does not service
+	 * all runqueues, potentially leaving other CPUs indefinitely
+	 * throttled.
 	 */
-	if (rt_b == &root_task_group.rt_bandwidth)
-		span = cpu_online_mask;
+	span = cpu_online_mask;
 #endif
 	for_each_cpu(i, span) {
 		int enqueue = 0;
--
2.22.0

From: Xiongfeng Wang <wangxiongfeng2@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------- When I test 'aer-inject' with the following procedures: 1. inject a fatal error into an upstream PCI bridge 2. remove the upstream bridge by sysfs 3. rescan the PCI tree by 'echo 1 > /sys/bus/pci/rescan' 4. execute command 'rmmod aer-inject' 5. remove the upstream bridge by sysfs again I came across the following Oops. [ 799.713238] Internal error: Oops: 96000007 [#1] SMP [ 799.718099] Process bash (pid: 10683, stack limit = 0x00000000125a3b1b) [ 799.724686] CPU: 108 PID: 10683 Comm: bash Kdump: loaded Not tainted 4.19.36 #2 [ 799.731962] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.05 09/18/2019 [ 799.739325] pstate: 40400009 (nZcv daif +PAN -UAO) [ 799.744104] pc : pci_remove_bus+0xc0/0x1c0 [ 799.748182] lr : pci_remove_bus+0x94/0x1c0 [ 799.752260] sp : ffffa02e335df940 [ 799.755560] x29: ffffa02e335df940 x28: ffff2000088216a8 [ 799.760849] x27: 1ffff405c66bbfbc x26: ffff20000a9518c0 [ 799.766139] x25: ffffa02dea6ec418 x24: 1ffff405bd4dd883 [ 799.771427] x23: ffffa02e72576628 x22: 1ffff405ce4aecc0 [ 799.776715] x21: ffffa02e72576608 x20: ffff200002e75080 [ 799.782003] x19: ffffa02e72576600 x18: 0000000000000000 [ 799.787291] x17: 0000000000000000 x16: 0000000000000000 [ 799.792578] x15: 0000000000000001 x14: dfff200000000000 [ 799.797866] x13: ffff20000a6dfaf0 x12: 0000000000000000 [ 799.803154] x11: 1fffe4000159b217 x10: ffff04000159b217 [ 799.808442] x9 : dfff200000000000 x8 : ffff20000acd90bf [ 799.813730] x7 : 0000000000000000 x6 : 0000000000000000 [ 799.819017] x5 : 0000000000000001 x4 : 0000000000000000 [ 799.824306] x3 : 1ffff405dbe62603 x2 : 1fffe400005cea11 [ 799.829593] x1 : dfff200000000000 x0 : ffff200002e75088 [ 799.834882] Call trace: [ 799.837323] pci_remove_bus+0xc0/0x1c0 [ 799.841056] pci_remove_bus_device+0xd0/0x2f0 [ 799.845392] pci_stop_and_remove_bus_device_locked+0x2c/0x40 [ 799.851028] remove_store+0x1b8/0x1d0 [ 799.854679] dev_attr_store+0x60/0x80 [ 799.858330] sysfs_kf_write+0x104/0x170 [ 799.862149] kernfs_fop_write+0x23c/0x430 [ 799.866143] __vfs_write+0xec/0x4e0 [ 799.869615] vfs_write+0x12c/0x3d0 [ 799.873001] ksys_write+0xd0/0x190 [ 799.876389] __arm64_sys_write+0x70/0xa0 [ 799.880298] el0_svc_common+0xfc/0x278 [ 799.884030] el0_svc_handler+0x50/0xc0 [ 799.887764] el0_svc+0x8/0xc [ 799.890634] Code: d2c40001 f2fbffe1 91002280 d343fc02 (38e16841) [ 799.896700] kernel fault(0x1) notification starting on CPU 108 This happens because, when we allocate a new bus in the rescanning process, the 'pci_ops' of the newly allocated 'pci_bus' is inherited from its parent pci bus. However, the 'pci_ops' of the parent bus may have been changed to 'aer_inj_pci_ops' in 'aer_inject()'. When we unload the module 'aer_inject', we only restore the 'pci_ops' for the pci bus of the error-injected device and the root port in 'aer_inject_exit'. After the module has been unloaded, the 'pci_ops' of the newly allocated pci bus is still 'aer_inj_pci_ops'. When we access it, an Oops happens. This patch adds a member 'backup_ops' to 'struct pci_bus' to record the original 'ops'. When we allocate a child pci bus, we assign the 'backup_ops' of the parent bus to the 'ops' of the child bus. The best way would probably be to not modify the 'pci_ops' in 'struct pci_bus' at all, but that would require refactoring the 'aer_inject' framework a lot. I haven't found a better way to handle it.
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: yangerkun <yangerkun@huawei.com> Conflicts: drivers/pci/probe.c include/linux/pci.h Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- drivers/pci/probe.c | 12 +++++++++--- include/linux/pci.h | 1 + 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index ece90a23936d..0187fe60b7d8 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -896,6 +896,7 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge) bus->sysdata = bridge->sysdata; bus->msi = bridge->msi; bus->ops = bridge->ops; + bus->backup_ops = bus->ops; bus->number = bus->busn_res.start = bridge->busnr; #ifdef CONFIG_PCI_DOMAINS_GENERIC bus->domain_nr = pci_bus_find_domain_nr(bus, parent); @@ -1057,10 +1058,15 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent, child->bus_flags = parent->bus_flags; host = pci_find_host_bridge(parent); - if (host->child_ops) + if (host->child_ops) { child->ops = host->child_ops; - else - child->ops = parent->ops; + } else { + if (parent->backup_ops) + child->ops = parent->backup_ops; + else + child->ops = parent->ops; + } + child->backup_ops = child->ops; /* * Initialize some portions of the bus device, but don't register diff --git a/include/linux/pci.h b/include/linux/pci.h index 22207a79762c..936da8925d68 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -615,6 +615,7 @@ struct pci_bus { struct resource busn_res; /* Bus numbers routed to this bus */ struct pci_ops *ops; /* Configuration access functions */ + struct pci_ops *backup_ops; struct msi_controller *msi; /* MSI controller */ void *sysdata; /* Hook for sys-specific extension */ struct proc_dir_entry *procdir; /* Directory entry in /proc/bus/pci */ -- 2.22.0
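A condensed sketch of the inheritance rule this patch introduces (the field names follow the patch; the helper itself is hypothetical and drops the host->child_ops case for brevity):

#include <linux/pci.h>

static void demo_inherit_ops(struct pci_bus *child, struct pci_bus *parent)
{
	/* Prefer the pristine ops saved at enumeration time, so a child
	 * allocated while aer_inject has patched parent->ops does not
	 * inherit the injected ops ... */
	if (parent->backup_ops)
		child->ops = parent->backup_ops;
	else
		child->ops = parent->ops;

	/* ... and save our own pristine ops for future children */
	child->backup_ops = child->ops;
}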

From: Xiongfeng Wang <wangxiongfeng2@huawei.com> ohos inclusion category: feature issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------- When I test 'aer-inject' with the following procedures: 1. inject a fatal error into a PCI device 2. remove the parent device by sysfs 3. execute command 'rmmod aer-inject' I came across the following use-after-free. [ 297.581524] ================================================================== [ 297.581543] BUG: KASAN: use-after-free in pci_bus_set_ops+0xb4/0xb8 [ 297.581545] Read of size 8 at addr ffff802edbde80e0 by task rmmod/21839 [ 297.581552] CPU: 119 PID: 21839 Comm: rmmod Kdump: loaded Not tainted 4.19.36 #1 [ 297.581554] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.05 09/18/2019 [ 297.581556] Call trace: [ 297.581561] dump_backtrace+0x0/0x360 [ 297.581563] show_stack+0x24/0x30 [ 297.581569] dump_stack+0xd8/0x104 [ 297.581576] print_address_description+0x68/0x278 [ 297.581578] kasan_report+0x204/0x330 [ 297.581580] __asan_report_load8_noabort+0x30/0x40 [ 297.581582] pci_bus_set_ops+0xb4/0xb8 [ 297.581591] aer_inject_exit+0x198/0x334 [aer_inject] [ 297.581595] __arm64_sys_delete_module+0x310/0x490 [ 297.581601] el0_svc_common+0xfc/0x278 [ 297.581603] el0_svc_handler+0x50/0xc0 [ 297.581605] el0_svc+0x8/0xc [ 297.581608] Allocated by task 1: [ 297.581611] kasan_kmalloc+0xe0/0x190 [ 297.581614] kmem_cache_alloc_trace+0x104/0x218 [ 297.581616] pci_alloc_bus+0x50/0x2e0 [ 297.581618] pci_add_new_bus+0xa8/0xe08 [ 297.581620] pci_scan_bridge_extend+0x884/0xb28 [ 297.581623] pci_scan_child_bus_extend+0x350/0x628 [ 297.581625] pci_scan_child_bus+0x24/0x30 [ 297.581627] pci_scan_bridge_extend+0x3b8/0xb28 [ 297.581629] pci_scan_child_bus_extend+0x350/0x628 [ 297.581631] pci_scan_child_bus+0x24/0x30 [ 297.581635] acpi_pci_root_create+0x558/0x888 [ 297.581640] pci_acpi_scan_root+0x198/0x330 [ 297.581641] acpi_pci_root_add+0x7bc/0xbb0 [ 297.581646] acpi_bus_attach+0x2f4/0x728 [ 297.581647] acpi_bus_attach+0x1b0/0x728 [ 297.581649] acpi_bus_attach+0x1b0/0x728 [ 297.581651] acpi_bus_scan+0xa0/0x110 [ 297.581657] acpi_scan_init+0x20c/0x500 [ 297.581659] acpi_init+0x54c/0x5d4 [ 297.581661] do_one_initcall+0xbc/0x480 [ 297.581665] kernel_init_freeable+0x5fc/0x6ac [ 297.581670] kernel_init+0x18/0x128 [ 297.581671] ret_from_fork+0x10/0x18 [ 297.581673] Freed by task 19270: [ 297.581675] __kasan_slab_free+0x120/0x228 [ 297.581677] kasan_slab_free+0x10/0x18 [ 297.581678] kfree+0x80/0x1f8 [ 297.581680] release_pcibus_dev+0x54/0x68 [ 297.581686] device_release+0xd4/0x1c0 [ 297.581689] kobject_put+0x12c/0x400 [ 297.581691] device_unregister+0x30/0xc0 [ 297.581693] pci_remove_bus+0xe8/0x1c0 [ 297.581695] pci_remove_bus_device+0xd0/0x2f0 [ 297.581697] pci_stop_and_remove_bus_device_locked+0x2c/0x40 [ 297.581701] remove_store+0x1b8/0x1d0 [ 297.581703] dev_attr_store+0x60/0x80 [ 297.581708] sysfs_kf_write+0x104/0x170 [ 297.581710] kernfs_fop_write+0x23c/0x430 [ 297.581713] __vfs_write+0xec/0x4e0 [ 297.581714] vfs_write+0x12c/0x3d0 [ 297.581715] ksys_write+0xd0/0x190 [ 297.581716] __arm64_sys_write+0x70/0xa0 [ 297.581718] el0_svc_common+0xfc/0x278 [ 297.581720] el0_svc_handler+0x50/0xc0 [ 297.581721] el0_svc+0x8/0xc [ 297.581724] The buggy address belongs to the object at ffff802edbde8000 which belongs to the cache kmalloc-2048 of size 2048 [ 297.581726] The buggy address is located 224 bytes inside of 2048-byte region [ffff802edbde8000, ffff802edbde8800) [ 297.581727] The buggy address belongs to the page: [ 297.581730] 
page:ffff7e00bb6f7a00 count:1 mapcount:0 mapping:ffff8026de810780 index:0x0 compound_mapcount: 0 [ 297.591520] flags: 0x2ffffe0000008100(slab|head) [ 297.596121] raw: 2ffffe0000008100 ffff7e00bb6f5008 ffff7e00bb6ff608 ffff8026de810780 [ 297.596123] raw: 0000000000000000 00000000000f000f 00000001ffffffff 0000000000000000 [ 297.596124] page dumped because: kasan: bad access detected [ 297.596126] Memory state around the buggy address: [ 297.596128] ffff802edbde7f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 297.596129] ffff802edbde8000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 297.596131] >ffff802edbde8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 297.596132] ^ [ 297.596133] ffff802edbde8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 297.596135] ffff802edbde8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 297.596135] ================================================================== This happens because, by the time we unload the module and restore the 'pci_ops' member of 'pci_bus', the 'pci_bus' has already been freed. This patch increments the reference count of 'pci_bus' when we modify its 'pci_ops' member and decrements it after the member has been restored. Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: yangerkun <yangerkun@huawei.com> Conflicts: drivers/pci/pcie/aer/aer_inject.c Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- drivers/pci/bus.c | 2 ++ drivers/pci/pcie/aer_inject.c | 9 +++++++++ 2 files changed, 11 insertions(+) diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c index 3cef835b375f..ff1170dd172d 100644 --- a/drivers/pci/bus.c +++ b/drivers/pci/bus.c @@ -413,9 +413,11 @@ struct pci_bus *pci_bus_get(struct pci_bus *bus) get_device(&bus->dev); return bus; } +EXPORT_SYMBOL(pci_bus_get); void pci_bus_put(struct pci_bus *bus) { if (bus) put_device(&bus->dev); } +EXPORT_SYMBOL(pci_bus_put); diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c index c2cbf425afc5..4dc1d95f085b 100644 --- a/drivers/pci/pcie/aer_inject.c +++ b/drivers/pci/pcie/aer_inject.c @@ -26,6 +26,7 @@ #include <linux/device.h> #include "portdrv.h" +#include "../pci.h" /* Override the existing corrected and uncorrected error masks */ static bool aer_mask_override; @@ -307,6 +308,13 @@ static int pci_bus_set_aer_ops(struct pci_bus *bus) spin_lock_irqsave(&inject_lock, flags); if (ops == &aer_inj_pci_ops) goto out; + /* + * Increment the reference count of the pci bus. Otherwise, when we + * restore the 'pci_ops' in 'aer_inject_exit', the 'pci_bus' may have + * been freed. + */ + pci_bus_get(bus); + pci_bus_ops_init(bus_ops, bus, ops); list_add(&bus_ops->list, &pci_bus_ops_list); bus_ops = NULL; @@ -527,6 +535,7 @@ static void __exit aer_inject_exit(void) while ((bus_ops = pci_bus_ops_pop())) { pci_bus_set_ops(bus_ops->bus, bus_ops->ops); + pci_bus_put(bus_ops->bus); kfree(bus_ops); } -- 2.22.0
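The ownership rule behind the fix is the usual get/put discipline: any code that stashes a pci_bus pointer beyond the current call must pin it with a reference. A sketch under that assumption (the demo struct and functions are hypothetical; pci_bus_get(), pci_bus_put() and pci_bus_set_ops() are the real helpers):

#include <linux/pci.h>

struct demo_saved_ops {
	struct pci_bus *bus;
	struct pci_ops *ops;	/* ops to restore at module exit */
};

static void demo_stash(struct demo_saved_ops *s, struct pci_bus *bus,
		       struct pci_ops *ops)
{
	s->bus = pci_bus_get(bus);	/* pin the bus while we hold it */
	s->ops = ops;
}

static void demo_restore(struct demo_saved_ops *s)
{
	pci_bus_set_ops(s->bus, s->ops);
	pci_bus_put(s->bus);	/* drop the pin; the bus may be freed now */
}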

From: Yang Yingliang <yangyingliang@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: CVE-2019-16230 ------------------------------------------------- Check the alloc_workqueue() return value in radeon_crtc_init() to avoid a null-ptr-deref. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- drivers/gpu/drm/radeon/radeon_display.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index e0ae911ef427..b709836fe3d3 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -689,11 +689,16 @@ static void radeon_crtc_init(struct drm_device *dev, int index) if (radeon_crtc == NULL) return; + radeon_crtc->flip_queue = alloc_workqueue("radeon-crtc", WQ_HIGHPRI, 0); + if (!radeon_crtc->flip_queue) { + kfree(radeon_crtc); + return; + } + drm_crtc_init(dev, &radeon_crtc->base, &radeon_crtc_funcs); drm_mode_crtc_set_gamma_size(&radeon_crtc->base, 256); radeon_crtc->crtc_id = index; - radeon_crtc->flip_queue = alloc_workqueue("radeon-crtc", WQ_HIGHPRI, 0); rdev->mode_info.crtcs[index] = radeon_crtc; if (rdev->family >= CHIP_BONAIRE) { -- 2.22.0
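The shape of the fix is the standard allocate-check-unwind ordering: the workqueue is created, and checked, before the CRTC is registered anywhere, so the error path only has to free what has been allocated so far. A minimal sketch with hypothetical names:

#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo_crtc {
	struct workqueue_struct *flip_queue;
};

static struct demo_crtc *demo_crtc_create(void)
{
	struct demo_crtc *c = kzalloc(sizeof(*c), GFP_KERNEL);

	if (!c)
		return NULL;

	/* alloc_workqueue() returns NULL on failure; check it before the
	 * object is published to anything that might dereference it */
	c->flip_queue = alloc_workqueue("demo-crtc", WQ_HIGHPRI, 0);
	if (!c->flip_queue) {
		kfree(c);
		return NULL;
	}
	return c;
}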

From: Lu Jialin <lujialin4@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------- When echoing a Z (zombie) process into tasks, the write should return -ESRCH instead of 0. Signed-off-by: Lu Jialin <lujialin4@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- kernel/cgroup/cgroup.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 416bec1428d0..adfc3bc04c0c 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -2697,6 +2697,7 @@ int cgroup_migrate_prepare_dst(struct cgroup_mgctx *mgctx) int cgroup_migrate(struct task_struct *leader, bool threadgroup, struct cgroup_mgctx *mgctx) { + int err = 0; struct task_struct *task; /* @@ -2709,13 +2710,16 @@ int cgroup_migrate(struct task_struct *leader, bool threadgroup, task = leader; do { cgroup_migrate_add_task(task, mgctx); - if (!threadgroup) + if (!threadgroup) { + if (task->flags & PF_EXITING) + err = -ESRCH; break; + } } while_each_thread(leader, task); rcu_read_unlock(); spin_unlock_irq(&css_set_lock); - return cgroup_migrate_execute(mgctx); + return err ? err : cgroup_migrate_execute(mgctx); } /** -- 2.22.0
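A reduced sketch of the added check (PF_EXITING is the real flag; the helper is hypothetical): when a single task rather than a whole threadgroup is being attached, an exiting task now yields -ESRCH instead of a silent no-op success:

#include <linux/sched.h>
#include <linux/errno.h>

static int demo_check_migratable(struct task_struct *task, bool threadgroup)
{
	/* A zombie can never be attached; surface the error instead of
	 * returning 0 after migrating nothing. */
	if (!threadgroup && (task->flags & PF_EXITING))
		return -ESRCH;
	return 0;
}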

From: liujian <liujian56@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA --------------------------- syzkaller triggered a UBSAN warning: [ 196.188950] UBSAN: Undefined behaviour in drivers/input/input.c:62:23 [ 196.188958] signed integer overflow: [ 196.188964] -2147483647 - 104 cannot be represented in type 'int [2]' [ 196.188973] CPU: 7 PID: 4763 Comm: syz-executor Not tainted 4.19.0-514.55.6.9.x86_64+ #7 [ 196.188977] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 196.188979] Call Trace: [ 196.189001] dump_stack+0x91/0xeb [ 196.189014] ubsan_epilogue+0x9/0x7c [ 196.189020] handle_overflow+0x1d7/0x22c [ 196.189028] ? __ubsan_handle_negate_overflow+0x18f/0x18f [ 196.189038] ? __mutex_lock+0x213/0x13f0 [ 196.189053] ? drop_futex_key_refs+0xa0/0xa0 [ 196.189070] ? __might_fault+0xef/0x1b0 [ 196.189096] input_handle_event+0xe1b/0x1290 [ 196.189108] input_inject_event+0x1d7/0x27e [ 196.189119] evdev_write+0x2cf/0x3f0 [ 196.189129] ? evdev_pass_values+0xd40/0xd40 [ 196.189157] ? mark_held_locks+0x160/0x160 [ 196.189171] ? __vfs_write+0xe0/0x6c0 [ 196.189175] ? evdev_pass_values+0xd40/0xd40 [ 196.189179] __vfs_write+0xe0/0x6c0 [ 196.189186] ? kernel_read+0x130/0x130 [ 196.189204] ? _cond_resched+0x15/0x30 [ 196.189214] ? __inode_security_revalidate+0xb8/0xe0 [ 196.189222] ? selinux_file_permission+0x354/0x430 [ 196.189233] vfs_write+0x160/0x440 [ 196.189242] ksys_write+0xc1/0x190 [ 196.189248] ? __ia32_sys_read+0xb0/0xb0 [ 196.189259] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 196.189267] ? do_syscall_64+0x22/0x4a0 [ 196.189276] do_syscall_64+0xa5/0x4a0 [ 196.189287] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 196.189293] RIP: 0033:0x44e7c9 [ 196.189299] Code: fc ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 The syzkaller reproducer script (it does not reproduce the issue every time): r0 = syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1) write$binfmt_elf64(r0, &(0x7f0000000240)={{0x7f, 0x45, 0x4c, 0x46, 0x40, 0x2, 0x2, 0xffffffff, 0xffffffffffff374c, 0x3, 0x0, 0x80000001, 0x103, 0x40, 0x22e, 0x26, 0x1, 0x38, 0x2, 0xa23, 0x1, 0x2}, [{0x6474e557, 0x5, 0x6, 0x2, 0x9, 0x9, 0x6c3, 0x1ff}], "", [[], [], [], []]}, 0x478) ioctl$EVIOCGSW(0xffffffffffffffff, 0x8040451b, &(0x7f0000000040)=""/7) syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1) r1 = syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1) openat$smack_task_current(0xffffffffffffff9c, &(0x7f0000000040)='/proc/self/attr/current\x00', 0x2, 0x0) ioctl$EVIOCSABS0(r1, 0x401845c0, &(0x7f0000000000)={0x4, 0x10000, 0x4, 0xd1, 0x81, 0x3}) eventfd(0x1ff) syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x200) syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1) syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1) syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1) syz_open_dev$evdev(&(0x7f0000000100)='/dev/input/event#\x00', 0x2, 0x1) Cast the int operands to long to fix the issue.
Signed-off-by: liujian <liujian56@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Xiang Yang <xiangyang3@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- drivers/input/input.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/input/input.c b/drivers/input/input.c index 3cfd2c18eebd..cf6f557428c0 100644 --- a/drivers/input/input.c +++ b/drivers/input/input.c @@ -56,14 +56,17 @@ static inline int is_event_supported(unsigned int code, static int input_defuzz_abs_event(int value, int old_val, int fuzz) { if (fuzz) { - if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2) + if (value > (long)old_val - fuzz / 2 && + value < (long)old_val + fuzz / 2) return old_val; - if (value > old_val - fuzz && value < old_val + fuzz) - return (old_val * 3 + value) / 4; + if (value > (long)old_val - fuzz && + value < (long)old_val + fuzz) + return ((long)old_val * 3 + value) / 4; - if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2) - return (old_val + value) / 2; + if (value > (long)old_val - fuzz * 2 && + value < (long)old_val + fuzz * 2) + return ((long)old_val + value) / 2; } return value; -- 2.22.0
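To see the arithmetic, this small user-space program (an illustration, not kernel code) reproduces the exact values from the UBSAN report; on an LP64 target the cast makes the subtraction happen in 64-bit arithmetic, where it cannot overflow:

#include <stdio.h>

int main(void)
{
	int old_val = -2147483647;	/* value from the UBSAN report */
	int fuzz = 208;			/* so fuzz / 2 == 104 */

	/* old_val - fuzz / 2 in plain int is signed overflow (UB);
	 * casting one operand to long promotes the whole expression */
	long lo = (long)old_val - fuzz / 2;

	printf("%ld\n", lo);		/* prints -2147483751 */
	return 0;
}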

From: Chen Jun <chenjun102@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------- If the Image is built without CONFIG_PM_SLEEP, there is a compile warning: warning: ‘cdn_dp_resume’ defined but not used [-Wunused-function] This is because SET_SYSTEM_SLEEP_PM_OPS does nothing without CONFIG_PM_SLEEP. Make cdn_dp_resume depend on CONFIG_PM_SLEEP. Signed-off-by: Chen Jun <chenjun102@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- drivers/gpu/drm/rockchip/cdn-dp-core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/rockchip/cdn-dp-core.c b/drivers/gpu/drm/rockchip/cdn-dp-core.c index a4a45daf93f2..063a60d213ba 100644 --- a/drivers/gpu/drm/rockchip/cdn-dp-core.c +++ b/drivers/gpu/drm/rockchip/cdn-dp-core.c @@ -1121,6 +1121,7 @@ static int cdn_dp_suspend(struct device *dev) return ret; } +#ifdef CONFIG_PM_SLEEP static int cdn_dp_resume(struct device *dev) { struct cdn_dp_device *dp = dev_get_drvdata(dev); @@ -1133,6 +1134,7 @@ static int cdn_dp_resume(struct device *dev) return 0; } +#endif static int cdn_dp_probe(struct platform_device *pdev) { -- 2.22.0
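The pattern is standard dev_pm_ops plumbing: SET_SYSTEM_SLEEP_PM_OPS() expands to nothing without CONFIG_PM_SLEEP, so a callback referenced only there must be compiled out under the same condition. A minimal sketch with hypothetical names (an alternative sometimes seen in mainline is tagging the callback __maybe_unused instead of using #ifdef):

#include <linux/device.h>
#include <linux/pm.h>

#ifdef CONFIG_PM_SLEEP
static int demo_resume(struct device *dev)
{
	/* referenced only via SET_SYSTEM_SLEEP_PM_OPS below, which is
	 * empty without CONFIG_PM_SLEEP, hence the matching #ifdef */
	return 0;
}
#endif

static const struct dev_pm_ops demo_pm_ops = {
	SET_SYSTEM_SLEEP_PM_OPS(NULL, demo_resume)
};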

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc1 commit 81414b4dd48f596bf33e1b32c2e43e2047150ca6 issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- Superblock is written out either through ext4_commit_super() or through ext4_handle_dirty_super(). In both cases we recompute the checksum so it is not necessary to recompute it after updating superblock free inodes & blocks counters. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/super.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 21c4ba2513ce..6e211dc615b3 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5007,13 +5007,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) block = ext4_count_free_clusters(sb); ext4_free_blocks_count_set(sbi->s_es, EXT4_C2B(sbi, block)); - ext4_superblock_csum_set(sb); err = percpu_counter_init(&sbi->s_freeclusters_counter, block, GFP_KERNEL); if (!err) { unsigned long freei = ext4_count_free_inodes(sb); sbi->s_es->s_free_inodes_count = cpu_to_le32(freei); - ext4_superblock_csum_set(sb); err = percpu_counter_init(&sbi->s_freeinodes_counter, freei, GFP_KERNEL); } -- 2.22.0

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc1 commit 93c20bc3eafba52c134cf5183f18833b9bd22bf8 category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- We use __ext4_error() when ext4_protect_reserved_inode() finds filesystem corruption. However EXT4_ERROR_INODE_ERR() is perfectly capable of reporting all the needed information. So just use that. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/block_validity.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/ext4/block_validity.c b/fs/ext4/block_validity.c index 8e6ca23ed172..13ffda871227 100644 --- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -176,12 +176,10 @@ static int ext4_protect_reserved_inode(struct super_block *sb, err = add_system_zone(system_blks, map.m_pblk, n, ino); if (err < 0) { if (err == -EFSCORRUPTED) { - __ext4_error(sb, __func__, __LINE__, - -err, map.m_pblk, - "blocks %llu-%llu from inode %u overlap system zone", - map.m_pblk, - map.m_pblk + map.m_len - 1, - ino); + EXT4_ERROR_INODE_ERR(inode, -err, + "blocks %llu-%llu from inode overlap system zone", + map.m_pblk, + map.m_pblk + map.m_len - 1); } break; } -- 2.22.0

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc1 commit 014c9caa29d3a44e0de695c99ef18bec3e887d52 category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- The only difference between __ext4_abort() and __ext4_error() is that the former one ignores errors=continue mount option. Unify the code to reduce duplication. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/ext4.h | 29 +++++++--------- fs/ext4/ext4_jbd2.c | 4 +-- fs/ext4/inode.c | 2 +- fs/ext4/super.c | 84 ++++++++++++--------------------------------- 4 files changed, 37 insertions(+), 82 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 7cae226b000f..5d7f2995fac6 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2955,9 +2955,9 @@ extern void ext4_mark_group_bitmap_corrupted(struct super_block *sb, ext4_group_t block_group, unsigned int flags); -extern __printf(6, 7) -void __ext4_error(struct super_block *, const char *, unsigned int, int, __u64, - const char *, ...); +extern __printf(7, 8) +void __ext4_error(struct super_block *, const char *, unsigned int, bool, + int, __u64, const char *, ...); extern __printf(6, 7) void __ext4_error_inode(struct inode *, const char *, unsigned int, ext4_fsblk_t, int, const char *, ...); @@ -2966,9 +2966,6 @@ void __ext4_error_file(struct file *, const char *, unsigned int, ext4_fsblk_t, const char *, ...); extern void __ext4_std_error(struct super_block *, const char *, unsigned int, int); -extern __printf(5, 6) -void __ext4_abort(struct super_block *, const char *, unsigned int, int, - const char *, ...); extern __printf(4, 5) void __ext4_warning(struct super_block *, const char *, unsigned int, const char *, ...); @@ -2998,6 +2995,9 @@ void __ext4_grp_locked_error(const char *, unsigned int, #define EXT4_ERROR_FILE(file, block, fmt, a...) \ ext4_error_file((file), __func__, __LINE__, (block), (fmt), ## a) +#define ext4_abort(sb, err, fmt, a...) \ + __ext4_error((sb), __func__, __LINE__, true, (err), 0, (fmt), ## a) + #ifdef CONFIG_PRINTK #define ext4_error_inode(inode, func, line, block, fmt, ...) \ @@ -3008,11 +3008,11 @@ void __ext4_grp_locked_error(const char *, unsigned int, #define ext4_error_file(file, func, line, block, fmt, ...) \ __ext4_error_file(file, func, line, block, fmt, ##__VA_ARGS__) #define ext4_error(sb, fmt, ...) \ - __ext4_error((sb), __func__, __LINE__, 0, 0, (fmt), ##__VA_ARGS__) + __ext4_error((sb), __func__, __LINE__, false, 0, 0, (fmt), \ + ##__VA_ARGS__) #define ext4_error_err(sb, err, fmt, ...) \ - __ext4_error((sb), __func__, __LINE__, (err), 0, (fmt), ##__VA_ARGS__) -#define ext4_abort(sb, err, fmt, ...) \ - __ext4_abort((sb), __func__, __LINE__, (err), (fmt), ##__VA_ARGS__) + __ext4_error((sb), __func__, __LINE__, false, (err), 0, (fmt), \ + ##__VA_ARGS__) #define ext4_warning(sb, fmt, ...) \ __ext4_warning(sb, __func__, __LINE__, fmt, ##__VA_ARGS__) #define ext4_warning_inode(inode, fmt, ...) \ @@ -3045,17 +3045,12 @@ do { \ #define ext4_error(sb, fmt, ...) 
\ do { \ no_printk(fmt, ##__VA_ARGS__); \ - __ext4_error(sb, "", 0, 0, 0, " "); \ + __ext4_error(sb, "", 0, false, 0, 0, " "); \ } while (0) #define ext4_error_err(sb, err, fmt, ...) \ do { \ no_printk(fmt, ##__VA_ARGS__); \ - __ext4_error(sb, "", 0, err, 0, " "); \ -} while (0) -#define ext4_abort(sb, err, fmt, ...) \ -do { \ - no_printk(fmt, ##__VA_ARGS__); \ - __ext4_abort(sb, "", 0, err, " "); \ + __ext4_error(sb, "", 0, false, err, 0, " "); \ } while (0) #define ext4_warning(sb, fmt, ...) \ do { \ diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 0fd0c42a4f7d..1a0a827a7f34 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -296,8 +296,8 @@ int __ext4_forget(const char *where, unsigned int line, handle_t *handle, if (err) { ext4_journal_abort_handle(where, line, __func__, bh, handle, err); - __ext4_abort(inode->i_sb, where, line, -err, - "error %d when attempting revoke", err); + __ext4_error(inode->i_sb, where, line, true, -err, 0, + "error %d when attempting revoke", err); } BUFFER_TRACE(bh, "exit"); return err; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 3f11c948feb0..e6bed50e1786 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4619,7 +4619,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, (ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count))) { if (flags & EXT4_IGET_HANDLE) return ERR_PTR(-ESTALE); - __ext4_error(sb, function, line, EFSCORRUPTED, 0, + __ext4_error(sb, function, line, false, EFSCORRUPTED, 0, "inode #%lu: comm %s: iget: illegal inode #", ino, current->comm); return ERR_PTR(-EFSCORRUPTED); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 6e211dc615b3..a39098cf5e7d 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -662,16 +662,21 @@ static bool system_going_down(void) * We'll just use the jbd2_journal_abort() error code to record an error in * the journal instead. On recovery, the journal will complain about * that error until we've noted it down and cleared it. + * + * If force_ro is set, we unconditionally force the filesystem into an + * ABORT|READONLY state, unless the error response on the fs has been set to + * panic in which case we take the easy way out and panic immediately. This is + * used to deal with unrecoverable failures such as journal IO errors or ENOMEM + * at a critical moment in log management. */ - -static void ext4_handle_error(struct super_block *sb) +static void ext4_handle_error(struct super_block *sb, bool force_ro) { journal_t *journal = EXT4_SB(sb)->s_journal; if (test_opt(sb, WARN_ON_ERROR)) WARN_ON_ONCE(1); - if (sb_rdonly(sb) || test_opt(sb, ERRORS_CONT)) + if (sb_rdonly(sb) || (!force_ro && test_opt(sb, ERRORS_CONT))) return; ext4_set_mount_flag(sb, EXT4_MF_FS_ABORTED); @@ -682,18 +687,17 @@ static void ext4_handle_error(struct super_block *sb) * could panic during 'reboot -f' as the underlying device got already * disabled. 
*/ - if (test_opt(sb, ERRORS_RO) || system_going_down()) { - ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only"); - /* - * Make sure updated value of ->s_mount_flags will be visible - * before ->s_flags update - */ - smp_wmb(); - sb->s_flags |= SB_RDONLY; - } else if (test_opt(sb, ERRORS_PANIC)) { + if (test_opt(sb, ERRORS_PANIC) && !system_going_down()) { panic("EXT4-fs (device %s): panic forced after error\n", sb->s_id); } + ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only"); + /* + * Make sure updated value of ->s_mount_flags will be visible before + * ->s_flags update + */ + smp_wmb(); + sb->s_flags |= SB_RDONLY; } #define ext4_error_ratelimit(sb) \ @@ -701,7 +705,7 @@ static void ext4_handle_error(struct super_block *sb) "EXT4-fs error") void __ext4_error(struct super_block *sb, const char *function, - unsigned int line, int error, __u64 block, + unsigned int line, bool force_ro, int error, __u64 block, const char *fmt, ...) { struct va_format vaf; @@ -721,7 +725,7 @@ void __ext4_error(struct super_block *sb, const char *function, va_end(args); } save_error_info(sb, error, 0, block, function, line); - ext4_handle_error(sb); + ext4_handle_error(sb, force_ro); } void __ext4_error_inode(struct inode *inode, const char *function, @@ -753,7 +757,7 @@ void __ext4_error_inode(struct inode *inode, const char *function, } save_error_info(inode->i_sb, error, inode->i_ino, block, function, line); - ext4_handle_error(inode->i_sb); + ext4_handle_error(inode->i_sb, false); } void __ext4_error_file(struct file *file, const char *function, @@ -792,7 +796,7 @@ void __ext4_error_file(struct file *file, const char *function, } save_error_info(inode->i_sb, EFSCORRUPTED, inode->i_ino, block, function, line); - ext4_handle_error(inode->i_sb); + ext4_handle_error(inode->i_sb, false); } const char *ext4_decode_error(struct super_block *sb, int errno, @@ -860,51 +864,7 @@ void __ext4_std_error(struct super_block *sb, const char *function, } save_error_info(sb, -errno, 0, 0, function, line); - ext4_handle_error(sb); -} - -/* - * ext4_abort is a much stronger failure handler than ext4_error. The - * abort function may be used to deal with unrecoverable failures such - * as journal IO errors or ENOMEM at a critical moment in log management. - * - * We unconditionally force the filesystem into an ABORT|READONLY state, - * unless the error response on the fs has been set to panic in which - * case we take the easy way out and panic immediately. - */ - -void __ext4_abort(struct super_block *sb, const char *function, - unsigned int line, int error, const char *fmt, ...) 
-{ - struct va_format vaf; - va_list args; - - if (unlikely(ext4_forced_shutdown(EXT4_SB(sb)))) - return; - - save_error_info(sb, error, 0, 0, function, line); - va_start(args, fmt); - vaf.fmt = fmt; - vaf.va = &args; - printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: %pV\n", - sb->s_id, function, line, &vaf); - va_end(args); - - if (sb_rdonly(sb) == 0) { - ext4_set_mount_flag(sb, EXT4_MF_FS_ABORTED); - if (EXT4_SB(sb)->s_journal) - jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO); - - ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only"); - /* - * Make sure updated value of ->s_mount_flags will be visible - * before ->s_flags update - */ - smp_wmb(); - sb->s_flags |= SB_RDONLY; - } - if (test_opt(sb, ERRORS_PANIC) && !system_going_down()) - panic("EXT4-fs panic from previous error\n"); + ext4_handle_error(sb, false); } void __ext4_msg(struct super_block *sb, @@ -1007,7 +967,7 @@ __acquires(bitlock) ext4_unlock_group(sb, grp); ext4_commit_super(sb, 1); - ext4_handle_error(sb); + ext4_handle_error(sb, false); /* * We only get here in the ERRORS_RO case; relocking the group * may be dangerous, but nothing bad will happen since the -- 2.22.0

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc1 commit 4067662388f97d0f360e568820d9d5bac6a3c9fa category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- Just move error info related functions in super.c close to ext4_handle_error(). We'll want to combine save_error_info() with ext4_handle_error() and this makes change more obvious and saves a forward declaration as well. No functional change. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-6-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/super.c | 196 ++++++++++++++++++++++++------------------------ 1 file changed, 98 insertions(+), 98 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a39098cf5e7d..0594b08da8f6 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -423,104 +423,6 @@ static time64_t __ext4_get_tstamp(__le32 *lo, __u8 *hi) #define ext4_get_tstamp(es, tstamp) \ __ext4_get_tstamp(&(es)->tstamp, &(es)->tstamp ## _hi) -static void __save_error_info(struct super_block *sb, int error, - __u32 ino, __u64 block, - const char *func, unsigned int line) -{ - struct ext4_super_block *es = EXT4_SB(sb)->s_es; - int err; - - EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; - if (bdev_read_only(sb->s_bdev)) - return; - es->s_state |= cpu_to_le16(EXT4_ERROR_FS); - ext4_update_tstamp(es, s_last_error_time); - strncpy(es->s_last_error_func, func, sizeof(es->s_last_error_func)); - es->s_last_error_line = cpu_to_le32(line); - es->s_last_error_ino = cpu_to_le32(ino); - es->s_last_error_block = cpu_to_le64(block); - switch (error) { - case EIO: - err = EXT4_ERR_EIO; - break; - case ENOMEM: - err = EXT4_ERR_ENOMEM; - break; - case EFSBADCRC: - err = EXT4_ERR_EFSBADCRC; - break; - case 0: - case EFSCORRUPTED: - err = EXT4_ERR_EFSCORRUPTED; - break; - case ENOSPC: - err = EXT4_ERR_ENOSPC; - break; - case ENOKEY: - err = EXT4_ERR_ENOKEY; - break; - case EROFS: - err = EXT4_ERR_EROFS; - break; - case EFBIG: - err = EXT4_ERR_EFBIG; - break; - case EEXIST: - err = EXT4_ERR_EEXIST; - break; - case ERANGE: - err = EXT4_ERR_ERANGE; - break; - case EOVERFLOW: - err = EXT4_ERR_EOVERFLOW; - break; - case EBUSY: - err = EXT4_ERR_EBUSY; - break; - case ENOTDIR: - err = EXT4_ERR_ENOTDIR; - break; - case ENOTEMPTY: - err = EXT4_ERR_ENOTEMPTY; - break; - case ESHUTDOWN: - err = EXT4_ERR_ESHUTDOWN; - break; - case EFAULT: - err = EXT4_ERR_EFAULT; - break; - default: - err = EXT4_ERR_UNKNOWN; - } - es->s_last_error_errcode = err; - if (!es->s_first_error_time) { - es->s_first_error_time = es->s_last_error_time; - es->s_first_error_time_hi = es->s_last_error_time_hi; - strncpy(es->s_first_error_func, func, - sizeof(es->s_first_error_func)); - es->s_first_error_line = cpu_to_le32(line); - es->s_first_error_ino = es->s_last_error_ino; - es->s_first_error_block = es->s_last_error_block; - es->s_first_error_errcode = es->s_last_error_errcode; - } - /* - * Start the daily error reporting function if it hasn't been - * started already - */ - if (!es->s_error_count) - mod_timer(&EXT4_SB(sb)->s_err_report, jiffies + 24*60*60*HZ); - le32_add_cpu(&es->s_error_count, 1); -} - -static void save_error_info(struct 
super_block *sb, int error, - __u32 ino, __u64 block, - const char *func, unsigned int line) -{ - __save_error_info(sb, error, ino, block, func, line); - if (!bdev_read_only(sb->s_bdev)) - ext4_commit_super(sb, 1); -} - /* * The del_gendisk() function uninitializes the disk-specific data * structures, including the bdi structure, without telling anyone @@ -649,6 +551,104 @@ static bool system_going_down(void) || system_state == SYSTEM_RESTART; } +static void __save_error_info(struct super_block *sb, int error, + __u32 ino, __u64 block, + const char *func, unsigned int line) +{ + struct ext4_super_block *es = EXT4_SB(sb)->s_es; + int err; + + EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; + if (bdev_read_only(sb->s_bdev)) + return; + es->s_state |= cpu_to_le16(EXT4_ERROR_FS); + ext4_update_tstamp(es, s_last_error_time); + strncpy(es->s_last_error_func, func, sizeof(es->s_last_error_func)); + es->s_last_error_line = cpu_to_le32(line); + es->s_last_error_ino = cpu_to_le32(ino); + es->s_last_error_block = cpu_to_le64(block); + switch (error) { + case EIO: + err = EXT4_ERR_EIO; + break; + case ENOMEM: + err = EXT4_ERR_ENOMEM; + break; + case EFSBADCRC: + err = EXT4_ERR_EFSBADCRC; + break; + case 0: + case EFSCORRUPTED: + err = EXT4_ERR_EFSCORRUPTED; + break; + case ENOSPC: + err = EXT4_ERR_ENOSPC; + break; + case ENOKEY: + err = EXT4_ERR_ENOKEY; + break; + case EROFS: + err = EXT4_ERR_EROFS; + break; + case EFBIG: + err = EXT4_ERR_EFBIG; + break; + case EEXIST: + err = EXT4_ERR_EEXIST; + break; + case ERANGE: + err = EXT4_ERR_ERANGE; + break; + case EOVERFLOW: + err = EXT4_ERR_EOVERFLOW; + break; + case EBUSY: + err = EXT4_ERR_EBUSY; + break; + case ENOTDIR: + err = EXT4_ERR_ENOTDIR; + break; + case ENOTEMPTY: + err = EXT4_ERR_ENOTEMPTY; + break; + case ESHUTDOWN: + err = EXT4_ERR_ESHUTDOWN; + break; + case EFAULT: + err = EXT4_ERR_EFAULT; + break; + default: + err = EXT4_ERR_UNKNOWN; + } + es->s_last_error_errcode = err; + if (!es->s_first_error_time) { + es->s_first_error_time = es->s_last_error_time; + es->s_first_error_time_hi = es->s_last_error_time_hi; + strncpy(es->s_first_error_func, func, + sizeof(es->s_first_error_func)); + es->s_first_error_line = cpu_to_le32(line); + es->s_first_error_ino = es->s_last_error_ino; + es->s_first_error_block = es->s_last_error_block; + es->s_first_error_errcode = es->s_last_error_errcode; + } + /* + * Start the daily error reporting function if it hasn't been + * started already + */ + if (!es->s_error_count) + mod_timer(&EXT4_SB(sb)->s_err_report, jiffies + 24*60*60*HZ); + le32_add_cpu(&es->s_error_count, 1); +} + +static void save_error_info(struct super_block *sb, int error, + __u32 ino, __u64 block, + const char *func, unsigned int line) +{ + __save_error_info(sb, error, ino, block, func, line); + if (!bdev_read_only(sb->s_bdev)) + ext4_commit_super(sb, 1); +} + /* Deal with the reporting of failure conditions on a filesystem such as * inconsistencies detected or read IO failures. * -- 2.22.0

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc1 commit 02a7780e4d2fcf438ac6773bc469e7ada2af56be category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- We convert errno's to ext4 on-disk format error codes in save_error_info(). Add a function and a bit of macro magic to make this simpler. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-7-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/super.c | 95 +++++++++++++++++++++---------------------------- 1 file changed, 40 insertions(+), 55 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 0594b08da8f6..c62f826a6893 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -551,76 +551,61 @@ static bool system_going_down(void) || system_state == SYSTEM_RESTART; } +struct ext4_err_translation { + int code; + int errno; +}; + +#define EXT4_ERR_TRANSLATE(err) { .code = EXT4_ERR_##err, .errno = err } + +static struct ext4_err_translation err_translation[] = { + EXT4_ERR_TRANSLATE(EIO), + EXT4_ERR_TRANSLATE(ENOMEM), + EXT4_ERR_TRANSLATE(EFSBADCRC), + EXT4_ERR_TRANSLATE(EFSCORRUPTED), + EXT4_ERR_TRANSLATE(ENOSPC), + EXT4_ERR_TRANSLATE(ENOKEY), + EXT4_ERR_TRANSLATE(EROFS), + EXT4_ERR_TRANSLATE(EFBIG), + EXT4_ERR_TRANSLATE(EEXIST), + EXT4_ERR_TRANSLATE(ERANGE), + EXT4_ERR_TRANSLATE(EOVERFLOW), + EXT4_ERR_TRANSLATE(EBUSY), + EXT4_ERR_TRANSLATE(ENOTDIR), + EXT4_ERR_TRANSLATE(ENOTEMPTY), + EXT4_ERR_TRANSLATE(ESHUTDOWN), + EXT4_ERR_TRANSLATE(EFAULT), +}; + +static int ext4_errno_to_code(int errno) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(err_translation); i++) + if (err_translation[i].errno == errno) + return err_translation[i].code; + return EXT4_ERR_UNKNOWN; +} + static void __save_error_info(struct super_block *sb, int error, __u32 ino, __u64 block, const char *func, unsigned int line) { struct ext4_super_block *es = EXT4_SB(sb)->s_es; - int err; EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; if (bdev_read_only(sb->s_bdev)) return; + /* We default to EFSCORRUPTED error... 
*/ + if (error == 0) + error = EFSCORRUPTED; es->s_state |= cpu_to_le16(EXT4_ERROR_FS); ext4_update_tstamp(es, s_last_error_time); strncpy(es->s_last_error_func, func, sizeof(es->s_last_error_func)); es->s_last_error_line = cpu_to_le32(line); es->s_last_error_ino = cpu_to_le32(ino); es->s_last_error_block = cpu_to_le64(block); - switch (error) { - case EIO: - err = EXT4_ERR_EIO; - break; - case ENOMEM: - err = EXT4_ERR_ENOMEM; - break; - case EFSBADCRC: - err = EXT4_ERR_EFSBADCRC; - break; - case 0: - case EFSCORRUPTED: - err = EXT4_ERR_EFSCORRUPTED; - break; - case ENOSPC: - err = EXT4_ERR_ENOSPC; - break; - case ENOKEY: - err = EXT4_ERR_ENOKEY; - break; - case EROFS: - err = EXT4_ERR_EROFS; - break; - case EFBIG: - err = EXT4_ERR_EFBIG; - break; - case EEXIST: - err = EXT4_ERR_EEXIST; - break; - case ERANGE: - err = EXT4_ERR_ERANGE; - break; - case EOVERFLOW: - err = EXT4_ERR_EOVERFLOW; - break; - case EBUSY: - err = EXT4_ERR_EBUSY; - break; - case ENOTDIR: - err = EXT4_ERR_ENOTDIR; - break; - case ENOTEMPTY: - err = EXT4_ERR_ENOTEMPTY; - break; - case ESHUTDOWN: - err = EXT4_ERR_ESHUTDOWN; - break; - case EFAULT: - err = EXT4_ERR_EFAULT; - break; - default: - err = EXT4_ERR_UNKNOWN; - } - es->s_last_error_errcode = err; + es->s_last_error_errcode = ext4_errno_to_code(error); if (!es->s_first_error_time) { es->s_first_error_time = es->s_last_error_time; es->s_first_error_time_hi = es->s_last_error_time_hi; -- 2.22.0
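The "macro magic" here is token pasting: the errno identifier is written once and the matching on-disk constant is derived from it, so the two columns of the table cannot drift apart. A toy user-space rendition of the same trick (all DEMO_* names are hypothetical; the field is named errno_val because errno itself is a macro in user space):

#include <errno.h>
#include <stdio.h>

#define DEMO_CODE_EIO		1
#define DEMO_CODE_ENOMEM	2
#define DEMO_CODE_EROFS		3
#define DEMO_CODE_UNKNOWN	0

struct demo_translation {
	int errno_val;
	int code;
};

/* write the errno name once; ## pastes it onto the DEMO_CODE_ prefix */
#define DEMO_TRANSLATE(e) { .errno_val = e, .code = DEMO_CODE_##e }

static const struct demo_translation table[] = {
	DEMO_TRANSLATE(EIO),
	DEMO_TRANSLATE(ENOMEM),
	DEMO_TRANSLATE(EROFS),
};

static int demo_errno_to_code(int err)
{
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i].errno_val == err)
			return table[i].code;
	return DEMO_CODE_UNKNOWN;
}

int main(void)
{
	printf("%d\n", demo_errno_to_code(ENOMEM));	/* prints 2 */
	return 0;
}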

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc1 commit c92dc856848f32781e37b88c1b7f875e274f5efb category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- When filesystem inconsistency is detected with group locked, we currently try to modify superblock to store error there without blocking. However this can cause superblock checksum failures (or DIF/DIX failure) when the superblock is just being written out. Make error handling code just store error information in ext4_sb_info structure and copy it to on-disk superblock only in ext4_commit_super(). In case of error happening with group locked, we just postpone the superblock flushing to a workqueue. [ Added fixup so that s_first_error_* does not get updated after the file system is remounted. Also added fix for syzbot failure. - Ted ] Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201127113405.26867-8-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Hillf Danton <hdanton@sina.com> Reported-by: syzbot+9043030c040ce1849a60@syzkaller.appspotmail.com Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/ext4.h | 21 +++++++++ fs/ext4/super.c | 120 +++++++++++++++++++++++++++++++++--------------- 2 files changed, 104 insertions(+), 37 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 5d7f2995fac6..c62dabee25a9 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1620,6 +1620,27 @@ struct ext4_sb_info { errseq_t s_bdev_wb_err; spinlock_t s_bdev_wb_lock; + /* Information about errors that happened during this mount */ + spinlock_t s_error_lock; + int s_add_error_count; + int s_first_error_code; + __u32 s_first_error_line; + __u32 s_first_error_ino; + __u64 s_first_error_block; + const char *s_first_error_func; + time64_t s_first_error_time; + int s_last_error_code; + __u32 s_last_error_line; + __u32 s_last_error_ino; + __u64 s_last_error_block; + const char *s_last_error_func; + time64_t s_last_error_time; + /* + * If we are in a context where we cannot update error information in + * the on-disk superblock, we queue this work to do it. 
+ */ + struct work_struct s_error_work; + /* Ext4 fast commit stuff */ atomic_t s_fc_subtid; atomic_t s_fc_ineligible_updates; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index c62f826a6893..9cd4d7453d18 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -404,10 +404,8 @@ void ext4_itable_unused_set(struct super_block *sb, bg->bg_itable_unused_hi = cpu_to_le16(count >> 16); } -static void __ext4_update_tstamp(__le32 *lo, __u8 *hi) +static void __ext4_update_tstamp(__le32 *lo, __u8 *hi, time64_t now) { - time64_t now = ktime_get_real_seconds(); - now = clamp_val(now, 0, (1ull << 40) - 1); *lo = cpu_to_le32(lower_32_bits(now)); @@ -419,7 +417,8 @@ static time64_t __ext4_get_tstamp(__le32 *lo, __u8 *hi) return ((time64_t)(*hi) << 32) + le32_to_cpu(*lo); } #define ext4_update_tstamp(es, tstamp) \ - __ext4_update_tstamp(&(es)->tstamp, &(es)->tstamp ## _hi) + __ext4_update_tstamp(&(es)->tstamp, &(es)->tstamp ## _hi, \ + ktime_get_real_seconds()) #define ext4_get_tstamp(es, tstamp) \ __ext4_get_tstamp(&(es)->tstamp, &(es)->tstamp ## _hi) @@ -591,7 +590,7 @@ static void __save_error_info(struct super_block *sb, int error, __u32 ino, __u64 block, const char *func, unsigned int line) { - struct ext4_super_block *es = EXT4_SB(sb)->s_es; + struct ext4_sb_info *sbi = EXT4_SB(sb); EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; if (bdev_read_only(sb->s_bdev)) @@ -599,30 +598,24 @@ static void __save_error_info(struct super_block *sb, int error, /* We default to EFSCORRUPTED error... */ if (error == 0) error = EFSCORRUPTED; - es->s_state |= cpu_to_le16(EXT4_ERROR_FS); - ext4_update_tstamp(es, s_last_error_time); - strncpy(es->s_last_error_func, func, sizeof(es->s_last_error_func)); - es->s_last_error_line = cpu_to_le32(line); - es->s_last_error_ino = cpu_to_le32(ino); - es->s_last_error_block = cpu_to_le64(block); - es->s_last_error_errcode = ext4_errno_to_code(error); - if (!es->s_first_error_time) { - es->s_first_error_time = es->s_last_error_time; - es->s_first_error_time_hi = es->s_last_error_time_hi; - strncpy(es->s_first_error_func, func, - sizeof(es->s_first_error_func)); - es->s_first_error_line = cpu_to_le32(line); - es->s_first_error_ino = es->s_last_error_ino; - es->s_first_error_block = es->s_last_error_block; - es->s_first_error_errcode = es->s_last_error_errcode; - } - /* - * Start the daily error reporting function if it hasn't been - * started already - */ - if (!es->s_error_count) - mod_timer(&EXT4_SB(sb)->s_err_report, jiffies + 24*60*60*HZ); - le32_add_cpu(&es->s_error_count, 1); + + spin_lock(&sbi->s_error_lock); + sbi->s_add_error_count++; + sbi->s_last_error_code = error; + sbi->s_last_error_line = line; + sbi->s_last_error_ino = ino; + sbi->s_last_error_block = block; + sbi->s_last_error_func = func; + sbi->s_last_error_time = ktime_get_real_seconds(); + if (!sbi->s_first_error_time) { + sbi->s_first_error_code = error; + sbi->s_first_error_line = line; + sbi->s_first_error_ino = ino; + sbi->s_first_error_block = block; + sbi->s_first_error_func = func; + sbi->s_first_error_time = sbi->s_last_error_time; + } + spin_unlock(&sbi->s_error_lock); } static void save_error_info(struct super_block *sb, int error, @@ -685,6 +678,14 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro) sb->s_flags |= SB_RDONLY; } +static void flush_stashed_error_work(struct work_struct *work) +{ + struct ext4_sb_info *sbi = container_of(work, struct ext4_sb_info, + s_error_work); + + ext4_commit_super(sbi->s_sb, 1); +} + #define ext4_error_ratelimit(sb) \ 
___ratelimit(&(EXT4_SB(sb)->s_err_ratelimit_state), \ "EXT4-fs error") @@ -925,8 +926,6 @@ __acquires(bitlock) return; trace_ext4_error(sb, function, line); - __save_error_info(sb, EFSCORRUPTED, ino, block, function, line); - if (ext4_error_ratelimit(sb)) { va_start(args, fmt); vaf.fmt = fmt; @@ -942,16 +941,15 @@ __acquires(bitlock) va_end(args); } - if (test_opt(sb, WARN_ON_ERROR)) - WARN_ON_ONCE(1); - if (test_opt(sb, ERRORS_CONT)) { - ext4_commit_super(sb, 0); + if (test_opt(sb, WARN_ON_ERROR)) + WARN_ON_ONCE(1); + __save_error_info(sb, EFSCORRUPTED, ino, block, function, line); + schedule_work(&EXT4_SB(sb)->s_error_work); return; } - ext4_unlock_group(sb, grp); - ext4_commit_super(sb, 1); + save_error_info(sb, EFSCORRUPTED, ino, block, function, line); ext4_handle_error(sb, false); /* * We only get here in the ERRORS_RO case; relocking the group @@ -1124,6 +1122,7 @@ static void ext4_put_super(struct super_block *sb) ext4_unregister_li_request(sb); ext4_quota_off_umount(sb); + flush_work(&sbi->s_error_work); destroy_workqueue(sbi->rsv_conversion_wq); /* @@ -4657,6 +4656,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) } timer_setup(&sbi->s_err_report, print_daily_error_info, 0); + spin_lock_init(&sbi->s_error_lock); + INIT_WORK(&sbi->s_error_work, flush_stashed_error_work); /* Register extent status tree shrinker */ if (ext4_es_register_shrinker(sbi)) @@ -5108,6 +5109,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) ext4_es_unregister_shrinker(sbi); failed_mount3: del_timer_sync(&sbi->s_err_report); + flush_work(&sbi->s_error_work); if (sbi->s_mmp_tsk) kthread_stop(sbi->s_mmp_tsk); failed_mount2: @@ -5435,6 +5437,7 @@ static int ext4_load_journal(struct super_block *sb, static int ext4_commit_super(struct super_block *sb, int sync) { + struct ext4_sb_info *sbi = EXT4_SB(sb); struct ext4_super_block *es = EXT4_SB(sb)->s_es; struct buffer_head *sbh = EXT4_SB(sb)->s_sbh; int error = 0; @@ -5473,6 +5476,46 @@ static int ext4_commit_super(struct super_block *sb, int sync) es->s_free_inodes_count = cpu_to_le32(percpu_counter_sum_positive( &EXT4_SB(sb)->s_freeinodes_counter)); + /* Copy error information to the on-disk superblock */ + spin_lock(&sbi->s_error_lock); + if (sbi->s_add_error_count > 0) { + es->s_state |= cpu_to_le16(EXT4_ERROR_FS); + if (!es->s_first_error_time && !es->s_first_error_time_hi) { + __ext4_update_tstamp(&es->s_first_error_time, + &es->s_first_error_time_hi, + sbi->s_first_error_time); + strncpy(es->s_first_error_func, sbi->s_first_error_func, + sizeof(es->s_first_error_func)); + es->s_first_error_line = + cpu_to_le32(sbi->s_first_error_line); + es->s_first_error_ino = + cpu_to_le32(sbi->s_first_error_ino); + es->s_first_error_block = + cpu_to_le64(sbi->s_first_error_block); + es->s_first_error_errcode = + ext4_errno_to_code(sbi->s_first_error_code); + } + __ext4_update_tstamp(&es->s_last_error_time, + &es->s_last_error_time_hi, + sbi->s_last_error_time); + strncpy(es->s_last_error_func, sbi->s_last_error_func, + sizeof(es->s_last_error_func)); + es->s_last_error_line = cpu_to_le32(sbi->s_last_error_line); + es->s_last_error_ino = cpu_to_le32(sbi->s_last_error_ino); + es->s_last_error_block = cpu_to_le64(sbi->s_last_error_block); + es->s_last_error_errcode = + ext4_errno_to_code(sbi->s_last_error_code); + /* + * Start the daily error reporting function if it hasn't been + * started already + */ + if (!es->s_error_count) + mod_timer(&sbi->s_err_report, jiffies + 24*60*60*HZ); + 
le32_add_cpu(&es->s_error_count, sbi->s_add_error_count); + sbi->s_add_error_count = 0; + } + spin_unlock(&sbi->s_error_lock); + BUFFER_TRACE(sbh, "marking dirty"); ext4_superblock_csum_set(sb); if (sync) @@ -5826,6 +5869,9 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data) set_task_ioprio(sbi->s_journal->j_task, journal_ioprio); } + /* Flush outstanding errors before changing fs state */ + flush_work(&sbi->s_error_work); + if ((bool)(*flags & SB_RDONLY) != sb_rdonly(sb)) { if (ext4_test_mount_flag(sb, EXT4_MF_FS_ABORTED)) { err = -EROFS; -- 2.22.0
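The core of this change is the defer-from-atomic pattern: contexts that cannot sleep only record the error under a spinlock and call schedule_work(); the work item, running in process context, does the sleeping superblock write; teardown and remount paths call flush_work() so nothing is left in flight. A stripped-down sketch with hypothetical names:

#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct demo_sb_info {
	spinlock_t lock;
	int add_error_count;	/* errors not yet written out */
	struct work_struct error_work;
};

static void demo_error_work_fn(struct work_struct *work)
{
	struct demo_sb_info *sbi =
		container_of(work, struct demo_sb_info, error_work);

	/* process context: safe to sleep and do the superblock I/O,
	 * copying the stashed fields out under sbi->lock first */
	(void)sbi;
}

static void demo_sb_init(struct demo_sb_info *sbi)
{
	spin_lock_init(&sbi->lock);
	sbi->add_error_count = 0;
	INIT_WORK(&sbi->error_work, demo_error_work_fn);
}

/* may be called with a group spinlock held: no sleeping allowed here */
static void demo_note_error(struct demo_sb_info *sbi)
{
	spin_lock(&sbi->lock);
	sbi->add_error_count++;
	spin_unlock(&sbi->lock);
	schedule_work(&sbi->error_work);
}

/* umount/remount: wait for any queued write-out to finish */
static void demo_quiesce(struct demo_sb_info *sbi)
{
	flush_work(&sbi->error_work);
}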

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc4 commit e789ca0cc1d51296832b8424fa4008ce6e9d1703 category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- save_error_info() is always called together with ext4_handle_error(). Combine them into a single call and move unconditional bits out of save_error_info() into ext4_handle_error(). Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/super.c | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 9cd4d7453d18..ea4c8f2c03a7 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -592,9 +592,6 @@ static void __save_error_info(struct super_block *sb, int error, { struct ext4_sb_info *sbi = EXT4_SB(sb); - EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; - if (bdev_read_only(sb->s_bdev)) - return; /* We default to EFSCORRUPTED error... */ if (error == 0) error = EFSCORRUPTED; @@ -647,13 +644,19 @@ static void save_error_info(struct super_block *sb, int error, * used to deal with unrecoverable failures such as journal IO errors or ENOMEM * at a critical moment in log management. */ -static void ext4_handle_error(struct super_block *sb, bool force_ro) +static void ext4_handle_error(struct super_block *sb, bool force_ro, int error, + __u32 ino, __u64 block, + const char *func, unsigned int line) { journal_t *journal = EXT4_SB(sb)->s_journal; + EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; if (test_opt(sb, WARN_ON_ERROR)) WARN_ON_ONCE(1); + if (!bdev_read_only(sb->s_bdev)) + save_error_info(sb, error, ino, block, func, line); + if (sb_rdonly(sb) || (!force_ro && test_opt(sb, ERRORS_CONT))) return; @@ -710,8 +713,7 @@ void __ext4_error(struct super_block *sb, const char *function, sb->s_id, function, line, current->comm, &vaf); va_end(args); } - save_error_info(sb, error, 0, block, function, line); - ext4_handle_error(sb, force_ro); + ext4_handle_error(sb, force_ro, error, 0, block, function, line); } void __ext4_error_inode(struct inode *inode, const char *function, @@ -741,9 +743,8 @@ void __ext4_error_inode(struct inode *inode, const char *function, current->comm, &vaf); va_end(args); } - save_error_info(inode->i_sb, error, inode->i_ino, block, - function, line); - ext4_handle_error(inode->i_sb, false); + ext4_handle_error(inode->i_sb, false, error, inode->i_ino, block, + function, line); } void __ext4_error_file(struct file *file, const char *function, @@ -780,9 +781,8 @@ void __ext4_error_file(struct file *file, const char *function, current->comm, path, &vaf); va_end(args); } - save_error_info(inode->i_sb, EFSCORRUPTED, inode->i_ino, block, - function, line); - ext4_handle_error(inode->i_sb, false); + ext4_handle_error(inode->i_sb, false, EFSCORRUPTED, inode->i_ino, block, + function, line); } const char *ext4_decode_error(struct super_block *sb, int errno, @@ -849,8 +849,7 @@ void __ext4_std_error(struct super_block *sb, const char *function, sb->s_id, function, line, errstr); } - save_error_info(sb, -errno, 0, 0, function, line); - ext4_handle_error(sb, false); + ext4_handle_error(sb, false, -errno, 0, 0, function, line); 
} void __ext4_msg(struct super_block *sb, @@ -944,13 +943,14 @@ __acquires(bitlock) if (test_opt(sb, ERRORS_CONT)) { if (test_opt(sb, WARN_ON_ERROR)) WARN_ON_ONCE(1); + EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; __save_error_info(sb, EFSCORRUPTED, ino, block, function, line); - schedule_work(&EXT4_SB(sb)->s_error_work); + if (!bdev_read_only(sb->s_bdev)) + schedule_work(&EXT4_SB(sb)->s_error_work); return; } ext4_unlock_group(sb, grp); - save_error_info(sb, EFSCORRUPTED, ino, block, function, line); - ext4_handle_error(sb, false); + ext4_handle_error(sb, false, EFSCORRUPTED, ino, block, function, line); /* * We only get here in the ERRORS_RO case; relocking the group * may be dangerous, but nothing bad will happen since the -- 2.22.0

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc4 commit 4392fbc4bab57db3760f0fb61258cb7089b37665 category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- Everybody passes 1 as sync argument of ext4_commit_super(). Just drop it. Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/super.c | 47 ++++++++++++++++++++++------------------------- 1 file changed, 22 insertions(+), 25 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index ea4c8f2c03a7..a1f7bf52ed2f 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -65,7 +65,7 @@ static struct ratelimit_state ext4_mount_msg_ratelimit; static int ext4_load_journal(struct super_block *, struct ext4_super_block *, unsigned long journal_devnum); static int ext4_show_options(struct seq_file *seq, struct dentry *root); -static int ext4_commit_super(struct super_block *sb, int sync); +static int ext4_commit_super(struct super_block *sb); static int ext4_mark_recovery_complete(struct super_block *sb, struct ext4_super_block *es); static int ext4_clear_journal_err(struct super_block *sb, @@ -621,7 +621,7 @@ static void save_error_info(struct super_block *sb, int error, { __save_error_info(sb, error, ino, block, func, line); if (!bdev_read_only(sb->s_bdev)) - ext4_commit_super(sb, 1); + ext4_commit_super(sb); } /* Deal with the reporting of failure conditions on a filesystem such as @@ -686,7 +686,7 @@ static void flush_stashed_error_work(struct work_struct *work) struct ext4_sb_info *sbi = container_of(work, struct ext4_sb_info, s_error_work); - ext4_commit_super(sbi->s_sb, 1); + ext4_commit_super(sbi->s_sb); } #define ext4_error_ratelimit(sb) \ @@ -1152,7 +1152,7 @@ static void ext4_put_super(struct super_block *sb) es->s_state = cpu_to_le16(sbi->s_mount_state); } if (!sb_rdonly(sb)) - ext4_commit_super(sb, 1); + ext4_commit_super(sb); rcu_read_lock(); group_desc = rcu_dereference(sbi->s_group_desc); @@ -2643,7 +2643,7 @@ static int ext4_setup_super(struct super_block *sb, struct ext4_super_block *es, if (sbi->s_journal) ext4_set_feature_journal_needs_recovery(sb); - err = ext4_commit_super(sb, 1); + err = ext4_commit_super(sb); done: if (test_opt(sb, DEBUG)) printk(KERN_INFO "[EXT4 FS bs=%lu, gc=%u, " @@ -4858,7 +4858,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) if (DUMMY_ENCRYPTION_ENABLED(sbi) && !sb_rdonly(sb) && !ext4_has_feature_encrypt(sb)) { ext4_set_feature_encrypt(sb); - ext4_commit_super(sb, 1); + ext4_commit_super(sb); } /* @@ -5425,7 +5425,7 @@ static int ext4_load_journal(struct super_block *sb, es->s_journal_dev = cpu_to_le32(journal_devnum); /* Make sure we flush the recovery flag to disk. 
*/ - ext4_commit_super(sb, 1); + ext4_commit_super(sb); } return 0; @@ -5435,7 +5435,7 @@ static int ext4_load_journal(struct super_block *sb, return err; } -static int ext4_commit_super(struct super_block *sb, int sync) +static int ext4_commit_super(struct super_block *sb) { struct ext4_sb_info *sbi = EXT4_SB(sb); struct ext4_super_block *es = EXT4_SB(sb)->s_es; @@ -5518,8 +5518,7 @@ static int ext4_commit_super(struct super_block *sb, int sync) BUFFER_TRACE(sbh, "marking dirty"); ext4_superblock_csum_set(sb); - if (sync) - lock_buffer(sbh); + lock_buffer(sbh); if (buffer_write_io_error(sbh) || !buffer_uptodate(sbh)) { /* * Oh, dear. A previous attempt to write the @@ -5535,16 +5534,14 @@ static int ext4_commit_super(struct super_block *sb, int sync) set_buffer_uptodate(sbh); } mark_buffer_dirty(sbh); - if (sync) { - unlock_buffer(sbh); - error = __sync_dirty_buffer(sbh, - REQ_SYNC | (test_opt(sb, BARRIER) ? REQ_FUA : 0)); - if (buffer_write_io_error(sbh)) { - ext4_msg(sb, KERN_ERR, "I/O error while writing " - "superblock"); - clear_buffer_write_io_error(sbh); - set_buffer_uptodate(sbh); - } + unlock_buffer(sbh); + error = __sync_dirty_buffer(sbh, + REQ_SYNC | (test_opt(sb, BARRIER) ? REQ_FUA : 0)); + if (buffer_write_io_error(sbh)) { + ext4_msg(sb, KERN_ERR, "I/O error while writing " + "superblock"); + clear_buffer_write_io_error(sbh); + set_buffer_uptodate(sbh); } return error; } @@ -5575,7 +5572,7 @@ static int ext4_mark_recovery_complete(struct super_block *sb, if (ext4_has_feature_journal_needs_recovery(sb) && sb_rdonly(sb)) { ext4_clear_feature_journal_needs_recovery(sb); - ext4_commit_super(sb, 1); + ext4_commit_super(sb); } out: jbd2_journal_unlock_updates(journal); @@ -5617,7 +5614,7 @@ static int ext4_clear_journal_err(struct super_block *sb, EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; es->s_state |= cpu_to_le16(EXT4_ERROR_FS); - ext4_commit_super(sb, 1); + ext4_commit_super(sb); jbd2_journal_clear_err(journal); jbd2_journal_update_sb_errno(journal); @@ -5719,7 +5716,7 @@ static int ext4_freeze(struct super_block *sb) ext4_clear_feature_journal_needs_recovery(sb); } - error = ext4_commit_super(sb, 1); + error = ext4_commit_super(sb); out: if (journal) /* we rely on upper layer to stop further updates */ @@ -5741,7 +5738,7 @@ static int ext4_unfreeze(struct super_block *sb) ext4_set_feature_journal_needs_recovery(sb); } - ext4_commit_super(sb, 1); + ext4_commit_super(sb); return 0; } @@ -6001,7 +5998,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data) } if (sbi->s_journal == NULL && !(old_sb_flags & SB_RDONLY)) { - err = ext4_commit_super(sb, 1); + err = ext4_commit_super(sb); if (err) goto restore_opts; } -- 2.22.0

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc4 commit 05c2c00f3769abb9e323fcaca70d2de0b48af7ba category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- Protect all superblock modifications (including checksum computation) with a superblock buffer lock. That way we are sure computed checksum matches current superblock contents (a mismatch could cause checksum failures in nojournal mode or if an unjournalled superblock update races with a journalled one). Also we avoid modifying superblock contents while it is being written out (which can cause DIF/DIX failures if we are running in nojournal mode). Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> conflicts: fs/ext4/file.c Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/ext4_jbd2.c | 1 - fs/ext4/file.c | 3 +++ fs/ext4/inode.c | 3 +++ fs/ext4/namei.c | 6 ++++++ fs/ext4/resize.c | 12 ++++++++++++ fs/ext4/super.c | 2 +- fs/ext4/xattr.c | 3 +++ 7 files changed, 28 insertions(+), 2 deletions(-) diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 1a0a827a7f34..c7e410c4ab7d 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -379,7 +379,6 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line, struct buffer_head *bh = EXT4_SB(sb)->s_sbh; int err = 0; - ext4_superblock_csum_set(sb); if (ext4_handle_valid(handle)) { err = jbd2_journal_dirty_metadata(handle, bh); if (err) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 7b28d44b0ddd..41e2eababdaf 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -826,8 +826,11 @@ static int ext4_sample_last_mounted(struct super_block *sb, err = ext4_journal_get_write_access(handle, sbi->s_sbh); if (err) goto out_journal; + lock_buffer(sbi->s_sbh); strncpy(sbi->s_es->s_last_mounted, cp, sizeof(sbi->s_es->s_last_mounted)); + ext4_superblock_csum_set(sb); + unlock_buffer(sbi->s_sbh); ext4_handle_dirty_super(handle, sb); out_journal: ext4_journal_stop(handle); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index e6bed50e1786..769d06ba95d3 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5150,7 +5150,10 @@ static int ext4_do_update_inode(handle_t *handle, err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh); if (err) goto out_brelse; + lock_buffer(EXT4_SB(sb)->s_sbh); ext4_set_feature_large_file(sb); + ext4_superblock_csum_set(sb); + unlock_buffer(EXT4_SB(sb)->s_sbh); ext4_handle_sync(handle); err = ext4_handle_dirty_super(handle, sb); } diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index ab7baf529917..8458079f61cc 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2984,7 +2984,10 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode) (le32_to_cpu(sbi->s_es->s_inodes_count))) { /* Insert this inode at the head of the on-disk orphan list */ NEXT_ORPHAN(inode) = le32_to_cpu(sbi->s_es->s_last_orphan); + lock_buffer(sbi->s_sbh); sbi->s_es->s_last_orphan = cpu_to_le32(inode->i_ino); + ext4_superblock_csum_set(sb); + unlock_buffer(sbi->s_sbh); dirty = true; } list_add(&EXT4_I(inode)->i_orphan, &sbi->s_orphan); @@ -3067,7 +3070,10 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode) mutex_unlock(&sbi->s_orphan_lock); goto out_brelse; } + 
lock_buffer(sbi->s_sbh); sbi->s_es->s_last_orphan = cpu_to_le32(ino_next); + ext4_superblock_csum_set(inode->i_sb); + unlock_buffer(sbi->s_sbh); mutex_unlock(&sbi->s_orphan_lock); err = ext4_handle_dirty_super(handle, inode->i_sb); } else { diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index 928700d57eb6..6155f2b9538c 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -899,7 +899,10 @@ static int add_new_gdb(handle_t *handle, struct inode *inode, EXT4_SB(sb)->s_gdb_count++; ext4_kvfree_array_rcu(o_group_desc); + lock_buffer(EXT4_SB(sb)->s_sbh); le16_add_cpu(&es->s_reserved_gdt_blocks, -1); + ext4_superblock_csum_set(sb); + unlock_buffer(EXT4_SB(sb)->s_sbh); err = ext4_handle_dirty_super(handle, sb); if (err) ext4_std_error(sb, err); @@ -1384,6 +1387,7 @@ static void ext4_update_super(struct super_block *sb, reserved_blocks *= blocks_count; do_div(reserved_blocks, 100); + lock_buffer(sbi->s_sbh); ext4_blocks_count_set(es, ext4_blocks_count(es) + blocks_count); ext4_free_blocks_count_set(es, ext4_free_blocks_count(es) + free_blocks); le32_add_cpu(&es->s_inodes_count, EXT4_INODES_PER_GROUP(sb) * @@ -1421,6 +1425,8 @@ static void ext4_update_super(struct super_block *sb, * active. */ ext4_r_blocks_count_set(es, ext4_r_blocks_count(es) + reserved_blocks); + ext4_superblock_csum_set(sb); + unlock_buffer(sbi->s_sbh); /* Update the free space counts */ percpu_counter_add(&sbi->s_freeclusters_counter, @@ -1717,8 +1723,11 @@ static int ext4_group_extend_no_check(struct super_block *sb, goto errout; } + lock_buffer(EXT4_SB(sb)->s_sbh); ext4_blocks_count_set(es, o_blocks_count + add); ext4_free_blocks_count_set(es, ext4_free_blocks_count(es) + add); + ext4_superblock_csum_set(sb); + unlock_buffer(EXT4_SB(sb)->s_sbh); ext4_debug("freeing blocks %llu through %llu\n", o_blocks_count, o_blocks_count + add); /* We add the blocks to the bitmap and set the group need init bit */ @@ -1874,10 +1883,13 @@ static int ext4_convert_meta_bg(struct super_block *sb, struct inode *inode) if (err) goto errout; + lock_buffer(sbi->s_sbh); ext4_clear_feature_resize_inode(sb); ext4_set_feature_meta_bg(sb); sbi->s_es->s_first_meta_bg = cpu_to_le32(num_desc_blocks(sb, sbi->s_groups_count)); + ext4_superblock_csum_set(sb); + unlock_buffer(sbi->s_sbh); err = ext4_handle_dirty_super(handle, sb); if (err) { diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a1f7bf52ed2f..3e7bbf4c5b19 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5447,6 +5447,7 @@ static int ext4_commit_super(struct super_block *sb) if (block_device_ejected(sb)) return -ENODEV; + lock_buffer(sbh); /* * If the file system is mounted read-only, don't update the * superblock write time. This avoids updating the superblock @@ -5518,7 +5519,6 @@ static int ext4_commit_super(struct super_block *sb) BUFFER_TRACE(sbh, "marking dirty"); ext4_superblock_csum_set(sb); - lock_buffer(sbh); if (buffer_write_io_error(sbh) || !buffer_uptodate(sbh)) { /* * Oh, dear. A previous attempt to write the diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 5462f26907c1..8b9ce0f9502f 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -792,7 +792,10 @@ static void ext4_xattr_update_super_block(handle_t *handle, BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "get_write_access"); if (ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh) == 0) { + lock_buffer(EXT4_SB(sb)->s_sbh); ext4_set_feature_xattr(sb); + ext4_superblock_csum_set(sb); + unlock_buffer(EXT4_SB(sb)->s_sbh); ext4_handle_dirty_super(handle, sb); } } -- 2.22.0

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc4 commit 2d01ddc86606564fb08c56e3bc93a0693895f710 category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- If journalling is still working at the moment we get to writing error information to the superblock, we cannot write directly to the superblock, as such a write could race with a journalled update of the superblock and cause journal checksum failures, inconsistent information being written to the journal, or other problems. We cannot journal the superblock directly from the error handling functions, as we are running in an uncertain context and could deadlock, so just punt the journalled superblock update to a workqueue. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Conflicts: fs/ext4/super.c [ycc: adjust context because commit 12905cf9e5c4 ("ext4: fix error code in ext4_commit_super") is already merged] Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/super.c | 105 +++++++++++++++++++++++++++++++++++------------- 1 file changed, 77 insertions(+), 28 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 3e7bbf4c5b19..1a871738cf4d 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -65,6 +65,7 @@ static struct ratelimit_state ext4_mount_msg_ratelimit; static int ext4_load_journal(struct super_block *, struct ext4_super_block *, unsigned long journal_devnum); static int ext4_show_options(struct seq_file *seq, struct dentry *root); +static void ext4_update_super(struct super_block *sb); static int ext4_commit_super(struct super_block *sb); static int ext4_mark_recovery_complete(struct super_block *sb, struct ext4_super_block *es); @@ -586,9 +587,9 @@ static int ext4_errno_to_code(int errno) return EXT4_ERR_UNKNOWN; } -static void __save_error_info(struct super_block *sb, int error, - __u32 ino, __u64 block, - const char *func, unsigned int line) +static void save_error_info(struct super_block *sb, int error, + __u32 ino, __u64 block, + const char *func, unsigned int line) { struct ext4_sb_info *sbi = EXT4_SB(sb); @@ -615,15 +616,6 @@ static void __save_error_info(struct super_block *sb, int error, spin_unlock(&sbi->s_error_lock); } -static void save_error_info(struct super_block *sb, int error, - __u32 ino, __u64 block, - const char *func, unsigned int line) -{ - __save_error_info(sb, error, ino, block, func, line); - if (!bdev_read_only(sb->s_bdev)) - ext4_commit_super(sb); -} - /* Deal with the reporting of failure conditions on a filesystem such as * inconsistencies detected or read IO failures.
* @@ -649,20 +641,35 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error, const char *func, unsigned int line) { journal_t *journal = EXT4_SB(sb)->s_journal; + bool continue_fs = !force_ro && test_opt(sb, ERRORS_CONT); EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; if (test_opt(sb, WARN_ON_ERROR)) WARN_ON_ONCE(1); - if (!bdev_read_only(sb->s_bdev)) + if (!continue_fs && !sb_rdonly(sb)) { + ext4_set_mount_flag(sb, EXT4_MF_FS_ABORTED); + if (journal) + jbd2_journal_abort(journal, -EIO); + } + + if (!bdev_read_only(sb->s_bdev)) { save_error_info(sb, error, ino, block, func, line); + /* + * In case the fs should keep running, we need to writeout + * superblock through the journal. Due to lock ordering + * constraints, it may not be safe to do it right here so we + * defer superblock flushing to a workqueue. + */ + if (continue_fs) + schedule_work(&EXT4_SB(sb)->s_error_work); + else + ext4_commit_super(sb); + } - if (sb_rdonly(sb) || (!force_ro && test_opt(sb, ERRORS_CONT))) + if (sb_rdonly(sb) || continue_fs) return; - ext4_set_mount_flag(sb, EXT4_MF_FS_ABORTED); - if (journal) - jbd2_journal_abort(journal, -EIO); /* * We force ERRORS_RO behavior when system is rebooting. Otherwise we * could panic during 'reboot -f' as the underlying device got already @@ -685,7 +692,38 @@ static void flush_stashed_error_work(struct work_struct *work) { struct ext4_sb_info *sbi = container_of(work, struct ext4_sb_info, s_error_work); + journal_t *journal = sbi->s_journal; + handle_t *handle; + /* + * If the journal is still running, we have to write out superblock + * through the journal to avoid collisions of other journalled sb + * updates. + * + * We use directly jbd2 functions here to avoid recursing back into + * ext4 error handling code during handling of previous errors. + */ + if (!sb_rdonly(sbi->s_sb) && journal) { + handle = jbd2_journal_start(journal, 1); + if (IS_ERR(handle)) + goto write_directly; + if (jbd2_journal_get_write_access(handle, sbi->s_sbh)) { + jbd2_journal_stop(handle); + goto write_directly; + } + ext4_update_super(sbi->s_sb); + if (jbd2_journal_dirty_metadata(handle, sbi->s_sbh)) { + jbd2_journal_stop(handle); + goto write_directly; + } + jbd2_journal_stop(handle); + return; + } +write_directly: + /* + * Write through journal failed. Write sb directly to get error info + * out and hope for the best. 
+ */ ext4_commit_super(sbi->s_sb); } @@ -944,9 +982,11 @@ __acquires(bitlock) if (test_opt(sb, WARN_ON_ERROR)) WARN_ON_ONCE(1); EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS; - __save_error_info(sb, EFSCORRUPTED, ino, block, function, line); - if (!bdev_read_only(sb->s_bdev)) + if (!bdev_read_only(sb->s_bdev)) { + save_error_info(sb, EFSCORRUPTED, ino, block, function, + line); schedule_work(&EXT4_SB(sb)->s_error_work); + } return; } ext4_unlock_group(sb, grp); @@ -5435,17 +5475,12 @@ static int ext4_load_journal(struct super_block *sb, return err; } -static int ext4_commit_super(struct super_block *sb) +/* Copy state of EXT4_SB(sb) into buffer for on-disk superblock */ +static void ext4_update_super(struct super_block *sb) { struct ext4_sb_info *sbi = EXT4_SB(sb); struct ext4_super_block *es = EXT4_SB(sb)->s_es; struct buffer_head *sbh = EXT4_SB(sb)->s_sbh; - int error = 0; - - if (!sbh) - return -EINVAL; - if (block_device_ejected(sb)) - return -ENODEV; lock_buffer(sbh); /* @@ -5517,8 +5552,22 @@ static int ext4_commit_super(struct super_block *sb) } spin_unlock(&sbi->s_error_lock); - BUFFER_TRACE(sbh, "marking dirty"); ext4_superblock_csum_set(sb); + unlock_buffer(sbh); +} + +static int ext4_commit_super(struct super_block *sb) +{ + struct buffer_head *sbh = EXT4_SB(sb)->s_sbh; + int error = 0; + + if (!sbh) + return -EINVAL; + if (block_device_ejected(sb)) + return -ENODEV; + + ext4_update_super(sb); + if (buffer_write_io_error(sbh) || !buffer_uptodate(sbh)) { /* * Oh, dear. A previous attempt to write the @@ -5533,8 +5582,8 @@ static int ext4_commit_super(struct super_block *sb) clear_buffer_write_io_error(sbh); set_buffer_uptodate(sbh); } + BUFFER_TRACE(sbh, "marking dirty"); mark_buffer_dirty(sbh); - unlock_buffer(sbh); error = __sync_dirty_buffer(sbh, REQ_SYNC | (test_opt(sb, BARRIER) ? REQ_FUA : 0)); if (buffer_write_io_error(sbh)) { -- 2.22.0
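As background for the pattern above: the error handlers may run in atomic or otherwise uncertain context, so the journalled superblock write is punted to process context. Below is a minimal, self-contained sketch of that punt-to-workqueue idiom; my_dev, my_dev_init, my_error_handler and my_flush_error_work are made-up names, while INIT_WORK(), schedule_work() and container_of() are the real kernel APIs.

#include <linux/workqueue.h>

struct my_dev {
	struct work_struct error_work;
	/* ... device/fs state ... */
};

/* Runs later in process context with no caller locks held, so it is
 * safe to start a journal handle or block here. */
static void my_flush_error_work(struct work_struct *work)
{
	struct my_dev *dev = container_of(work, struct my_dev, error_work);

	/* ... the journalled superblock update using @dev would go here ... */
	(void)dev;
}

static void my_dev_init(struct my_dev *dev)
{
	INIT_WORK(&dev->error_work, my_flush_error_work);
}

/* May be called from atomic context: do not journal here, just kick
 * the worker. schedule_work() is safe in atomic context and is a
 * no-op if the work is already queued. */
static void my_error_handler(struct my_dev *dev)
{
	schedule_work(&dev->error_work);
}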

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc4 commit e92ad03fa53498f12b3f5ecb8822adc3bf815b28 category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- No behavioral change. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-6-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/super.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 1a871738cf4d..713a2f210966 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5479,8 +5479,8 @@ static int ext4_load_journal(struct super_block *sb, static void ext4_update_super(struct super_block *sb) { struct ext4_sb_info *sbi = EXT4_SB(sb); - struct ext4_super_block *es = EXT4_SB(sb)->s_es; - struct buffer_head *sbh = EXT4_SB(sb)->s_sbh; + struct ext4_super_block *es = sbi->s_es; + struct buffer_head *sbh = sbi->s_sbh; lock_buffer(sbh); /* @@ -5497,21 +5497,20 @@ static void ext4_update_super(struct super_block *sb) ext4_update_tstamp(es, s_wtime); if (sb->s_bdev->bd_part) es->s_kbytes_written = - cpu_to_le64(EXT4_SB(sb)->s_kbytes_written + + cpu_to_le64(sbi->s_kbytes_written + ((part_stat_read(sb->s_bdev->bd_part, sectors[STAT_WRITE]) - - EXT4_SB(sb)->s_sectors_written_start) >> 1)); + sbi->s_sectors_written_start) >> 1)); else - es->s_kbytes_written = - cpu_to_le64(EXT4_SB(sb)->s_kbytes_written); - if (percpu_counter_initialized(&EXT4_SB(sb)->s_freeclusters_counter)) + es->s_kbytes_written = cpu_to_le64(sbi->s_kbytes_written); + if (percpu_counter_initialized(&sbi->s_freeclusters_counter)) ext4_free_blocks_count_set(es, - EXT4_C2B(EXT4_SB(sb), percpu_counter_sum_positive( - &EXT4_SB(sb)->s_freeclusters_counter))); - if (percpu_counter_initialized(&EXT4_SB(sb)->s_freeinodes_counter)) + EXT4_C2B(sbi, percpu_counter_sum_positive( + &sbi->s_freeclusters_counter))); + if (percpu_counter_initialized(&sbi->s_freeinodes_counter)) es->s_free_inodes_count = cpu_to_le32(percpu_counter_sum_positive( - &EXT4_SB(sb)->s_freeinodes_counter)); + &sbi->s_freeinodes_counter)); /* Copy error information to the on-disk superblock */ spin_lock(&sbi->s_error_lock); if (sbi->s_add_error_count > 0) { -- 2.22.0

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.11-rc4 commit a3f5cf14ff917d46a4d491cf86210fd639d1ff38 category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- The wrapper is now useless since it does what ext4_handle_dirty_metadata() does. Just remove it. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-9-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/ext4_jbd2.c | 16 ---------------- fs/ext4/ext4_jbd2.h | 5 ----- fs/ext4/file.c | 2 +- fs/ext4/inode.c | 3 ++- fs/ext4/namei.c | 4 ++-- fs/ext4/resize.c | 8 ++++---- fs/ext4/xattr.c | 2 +- 7 files changed, 10 insertions(+), 30 deletions(-) diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index c7e410c4ab7d..be799040a415 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -372,19 +372,3 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line, } return err; } - -int __ext4_handle_dirty_super(const char *where, unsigned int line, - handle_t *handle, struct super_block *sb) -{ - struct buffer_head *bh = EXT4_SB(sb)->s_sbh; - int err = 0; - - if (ext4_handle_valid(handle)) { - err = jbd2_journal_dirty_metadata(handle, bh); - if (err) - ext4_journal_abort_handle(where, line, __func__, - bh, handle, err); - } else - mark_buffer_dirty(bh); - return err; -} diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 00dc668e052b..b9881ee1bf93 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -247,9 +247,6 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line, handle_t *handle, struct inode *inode, struct buffer_head *bh); -int __ext4_handle_dirty_super(const char *where, unsigned int line, - handle_t *handle, struct super_block *sb); - #define ext4_journal_get_write_access(handle, bh) \ __ext4_journal_get_write_access(__func__, __LINE__, (handle), (bh)) #define ext4_forget(handle, is_metadata, inode, bh, block_nr) \ @@ -260,8 +257,6 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line, #define ext4_handle_dirty_metadata(handle, inode, bh) \ __ext4_handle_dirty_metadata(__func__, __LINE__, (handle), (inode), \ (bh)) -#define ext4_handle_dirty_super(handle, sb) \ - __ext4_handle_dirty_super(__func__, __LINE__, (handle), (sb)) handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, int type, int blocks, int rsv_blocks, diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 41e2eababdaf..3b09ddbe8970 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -831,7 +831,7 @@ static int ext4_sample_last_mounted(struct super_block *sb, sizeof(sbi->s_es->s_last_mounted)); ext4_superblock_csum_set(sb); unlock_buffer(sbi->s_sbh); - ext4_handle_dirty_super(handle, sb); + ext4_handle_dirty_metadata(handle, NULL, sbi->s_sbh); out_journal: ext4_journal_stop(handle); out: diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 769d06ba95d3..27e896ef3f0b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5155,7 +5155,8 @@ static int ext4_do_update_inode(handle_t *handle, ext4_superblock_csum_set(sb); unlock_buffer(EXT4_SB(sb)->s_sbh); ext4_handle_sync(handle); - err = ext4_handle_dirty_super(handle, sb); + err = ext4_handle_dirty_metadata(handle, NULL, + EXT4_SB(sb)->s_sbh); } 
ext4_update_inode_fsync_trans(handle, inode, need_datasync); out_brelse: diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 8458079f61cc..41504d9d07cf 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2994,7 +2994,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode) mutex_unlock(&sbi->s_orphan_lock); if (dirty) { - err = ext4_handle_dirty_super(handle, sb); + err = ext4_handle_dirty_metadata(handle, NULL, sbi->s_sbh); rc = ext4_mark_iloc_dirty(handle, inode, &iloc); if (!err) err = rc; @@ -3075,7 +3075,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode) ext4_superblock_csum_set(inode->i_sb); unlock_buffer(sbi->s_sbh); mutex_unlock(&sbi->s_orphan_lock); - err = ext4_handle_dirty_super(handle, inode->i_sb); + err = ext4_handle_dirty_metadata(handle, NULL, sbi->s_sbh); } else { struct ext4_iloc iloc2; struct inode *i_prev = diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index 6155f2b9538c..bd0d185654f3 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -903,7 +903,7 @@ static int add_new_gdb(handle_t *handle, struct inode *inode, le16_add_cpu(&es->s_reserved_gdt_blocks, -1); ext4_superblock_csum_set(sb); unlock_buffer(EXT4_SB(sb)->s_sbh); - err = ext4_handle_dirty_super(handle, sb); + err = ext4_handle_dirty_metadata(handle, NULL, EXT4_SB(sb)->s_sbh); if (err) ext4_std_error(sb, err); return err; @@ -1521,7 +1521,7 @@ static int ext4_flex_group_add(struct super_block *sb, ext4_update_super(sb, flex_gd); - err = ext4_handle_dirty_super(handle, sb); + err = ext4_handle_dirty_metadata(handle, NULL, sbi->s_sbh); exit_journal: err2 = ext4_journal_stop(handle); @@ -1734,7 +1734,7 @@ static int ext4_group_extend_no_check(struct super_block *sb, err = ext4_group_add_blocks(handle, sb, o_blocks_count, add); if (err) goto errout; - ext4_handle_dirty_super(handle, sb); + ext4_handle_dirty_metadata(handle, NULL, EXT4_SB(sb)->s_sbh); ext4_debug("freed blocks %llu through %llu\n", o_blocks_count, o_blocks_count + add); errout: @@ -1891,7 +1891,7 @@ static int ext4_convert_meta_bg(struct super_block *sb, struct inode *inode) ext4_superblock_csum_set(sb); unlock_buffer(sbi->s_sbh); - err = ext4_handle_dirty_super(handle, sb); + err = ext4_handle_dirty_metadata(handle, NULL, sbi->s_sbh); if (err) { ext4_std_error(sb, err); goto errout; diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 8b9ce0f9502f..2f93e8b90492 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -796,7 +796,7 @@ static void ext4_xattr_update_super_block(handle_t *handle, ext4_set_feature_xattr(sb); ext4_superblock_csum_set(sb); unlock_buffer(EXT4_SB(sb)->s_sbh); - ext4_handle_dirty_super(handle, sb); + ext4_handle_dirty_metadata(handle, NULL, EXT4_SB(sb)->s_sbh); } } -- 2.22.0

From: Jan Kara <jack@suse.cz> mainline inclusion from mainline-v5.12-rc4 commit 2a4ae3bcdf05b8639406eaa09a2939f3c6dd8e75 category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- When filesystem mount fails because of corrupted filesystem we first cancel the s_err_report timer reminding fs errors every day and only then we flush s_error_work. However s_error_work may report another fs error and re-arm timer thus resulting in timer use-after-free. Fix the problem by first flushing the work and only after that canceling the s_err_report timer. Reported-by: syzbot+628472a2aac693ab0fcd@syzkaller.appspotmail.com Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210315165906.2175-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 713a2f210966..18b724390a74 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5148,8 +5148,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) failed_mount3a: ext4_es_unregister_shrinker(sbi); failed_mount3: - del_timer_sync(&sbi->s_err_report); flush_work(&sbi->s_error_work); + del_timer_sync(&sbi->s_err_report); if (sbi->s_mmp_tsk) kthread_stop(sbi->s_mmp_tsk); failed_mount2: -- 2.22.0
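The ordering matters because the error work may re-arm the timer. Here is a minimal sketch of the hazard and of the corrected teardown order; my_sbi and the function names are hypothetical, while flush_work(), del_timer_sync() and mod_timer() are the real APIs (timer and work initialization omitted for brevity).

#include <linux/timer.h>
#include <linux/workqueue.h>

/* Hypothetical container mirroring the relevant ext4_sb_info fields. */
struct my_sbi {
	struct timer_list err_report;
	struct work_struct error_work;
};

static void my_error_work_fn(struct work_struct *work)
{
	struct my_sbi *sbi = container_of(work, struct my_sbi, error_work);

	/* ... report the error ... */
	/* May re-arm the daily reminder: */
	mod_timer(&sbi->err_report, jiffies + 24 * 60 * 60 * HZ);
}

static void my_teardown(struct my_sbi *sbi)
{
	/*
	 * Buggy order:
	 *   del_timer_sync(&sbi->err_report);  timer cancelled
	 *   flush_work(&sbi->error_work);      work runs, mod_timer()
	 *                                      re-arms the timer, which
	 *                                      later fires on freed memory
	 *
	 * Correct order (as in the patch): flush the work first, so any
	 * re-arm happens before the final cancellation.
	 */
	flush_work(&sbi->error_work);
	del_timer_sync(&sbi->err_report);
}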

From: Xiongfeng Wang <wangxiongfeng2@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA --------------------------------- When we print system information by echoing 't' into 'sysrq-trigger' on several cores at the same time, we get the following calltrace. [ 1352.854632] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6 [ 1352.854633] Modules linked in: nf_log_arp nf_log_ipv6 nf_log_ipv4 nf_log_common binfmt_misc salsa20_generic camellia_generic cast6_generic cast_common rfkill serpent_generic twofish_generic twofish_common xts lrw tgr192 wp512 rmd320 rmd256 rmd160 rmd128 md4 sha512_generic loop jprob(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vfat fat hns_roce_hw_v2 hns_roce ib_core aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce ipmi_ssif ofpart sha256_arm64 sha1_ce cmdlinepart [ 1352.854649] hi_sfc ses enclosure mtd sg sbsa_gwdt ipmi_si ipmi_devintf ipmi_msghandler spi_dw_mmio sch_fq_codel ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod realtek hclge hisi_sas_v3_hw hisi_sas_main ahci libsas libahci hns3 hinic libata usb_storage hnae3 megaraid_sas scsi_transport_sas i2c_designware_platform i2c_designware_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_vs] [ 1352.854658] CPU: 6 PID: 220569 Comm: sh Kdump: loaded Tainted: G OEL 4.19.90-vhulk2001.1.0.0026.aarch64 #1 [ 1352.854659] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.06 10/29/2019 [ 1352.854659] pstate: 80400089 (Nzcv daIf +PAN -UAO) [ 1352.854660] pc : queued_spin_lock_slowpath+0x1d8/0x2e0 [ 1352.854660] lr : print_cpu+0x414/0x690 [ 1352.854660] sp : ffff0001743afb80 [ 1352.854661] x29: ffff0001743afb80 x28: ffff805fcef6e880 [ 1352.854662] x27: 0000000000000000 x26: 0000000000000000 [ 1352.854662] x25: ffff000008cab000 x24: ffff000008cab000 [ 1352.854663] x23: 0000000000000000 x22: 0000000000000000 [ 1352.854664] x21: ffff000009478000 x20: 0000000000900001 [ 1352.854664] x19: ffff000009478d20 x18: ffffffffffffffff [ 1352.854665] x17: 0000000000000000 x16: 0000000000000000 [ 1352.854666] x15: ffff000009273708 x14: ffff00000947af60 [ 1352.854667] x13: ffff00000947abab x12: ffff00000929d000 [ 1352.854668] x11: 0000000000006fc8 x10: ffff00000947a1c0 [ 1352.854668] x9 : 0000000000000001 x8 : 0000000000000000 [ 1352.854669] x7 : ffff0000092737c8 x6 : ffff803fffc9e1c0 [ 1352.854670] x5 : 0000000000000000 x4 : ffff803fffc9e1c0 [ 1352.854671] x3 : ffff000008f5e000 x2 : 00000000001c0000 [ 1352.854671] x1 : 0000000000000000 x0 : ffff803fffc9e1c8 [ 1352.854672] Call trace: [ 1352.854673] queued_spin_lock_slowpath+0x1d8/0x2e0 [ 1352.854673] print_cpu+0x414/0x690 [ 1352.854673] sysrq_sched_debug_show+0x50/0x80 [ 1352.854674] show_state_filter+0xc0/0xd0 [ 1352.854674] sysrq_handle_showstate+0x18/0x28 [ 1352.854674] __handle_sysrq+0xa0/0x190 [ 1352.854675] write_sysrq_trigger+0x70/0x88 [ 1352.854675] proc_reg_write+0x80/0xd8 [ 1352.854675] __vfs_write+0x60/0x190 [ 1352.854676] vfs_write+0xac/0x1c0 [ 1352.854676] ksys_write+0x74/0xf0 [ 1352.854676] __arm64_sys_write+0x24/0x30 [ 1352.854677] el0_svc_common+0x78/0x130 [ 1352.854677] el0_svc_handler+0x38/0x78 [ 1352.854677] el0_svc+0x8/0xc [ 1352.854678] Kernel panic - not syncing: Hard LOCKUP [
1352.854679] CPU: 6 PID: 220569 Comm: sh Kdump: loaded Tainted: G OEL 4.19.90-vhulk2001.1.0.0026.aarch64 #1 [ 1352.854679] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.06 10/29/2019 [ 1352.854679] Call trace: [ 1352.854680] dump_backtrace+0x0/0x198 [ 1352.854680] show_stack+0x24/0x30 [ 1352.854681] dump_stack+0xa4/0xc4 [ 1352.854681] panic+0x130/0x304 [ 1352.854681] __stack_chk_fail+0x0/0x28 [ 1352.854682] watchdog_hardlockup_check+0x138/0x140 [ 1352.854682] sdei_watchdog_callback+0x20/0x30 [ 1352.854682] sdei_event_handler+0x50/0xf0 [ 1352.854683] __sdei_handler+0xd8/0x228 [ 1352.854683] __sdei_asm_handler+0xbc/0x134 [ 1352.854683] queued_spin_lock_slowpath+0x1d8/0x2e0 [ 1352.854684] print_cpu+0x414/0x690 [ 1352.854684] sysrq_sched_debug_show+0x50/0x80 [ 1352.854684] show_state_filter+0xc0/0xd0 [ 1352.854685] sysrq_handle_showstate+0x18/0x28 [ 1352.854685] __handle_sysrq+0xa0/0x190 [ 1352.854685] write_sysrq_trigger+0x70/0x88 [ 1352.854686] proc_reg_write+0x80/0xd8 [ 1352.854686] __vfs_write+0x60/0x190 [ 1352.854686] vfs_write+0xac/0x1c0 [ 1352.854687] ksys_write+0x74/0xf0 [ 1352.854687] __arm64_sys_write+0x24/0x30 [ 1352.854687] el0_svc_common+0x78/0x130 [ 1352.854688] el0_svc_handler+0x38/0x78 [ 1352.854688] el0_svc+0x8/0xc This is because there are many processes in the system. 'print_cpu()' acquires 'sched_debug_lock', prints some information, and releases 'sched_debug_lock'. This procedure takes about 4 seconds in our testcase. When four cores concurrently print system info by sysrq, it takes the last core 12 seconds to get the spinlock. This causes a hard lockup. Signed-off-by: Kai Shen <shenkai8@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- drivers/tty/sysrq.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c index 959f9e121cc6..8d63e8fc0c25 100644 --- a/drivers/tty/sysrq.c +++ b/drivers/tty/sysrq.c @@ -1143,6 +1143,9 @@ int unregister_sysrq_key(int key, const struct sysrq_key_op *op_p) EXPORT_SYMBOL(unregister_sysrq_key); #ifdef CONFIG_PROC_FS + +static DEFINE_MUTEX(sysrq_mutex); + /* * writing 'C' to /proc/sysrq-trigger is like sysrq-C */ @@ -1154,7 +1157,10 @@ static ssize_t write_sysrq_trigger(struct file *file, const char __user *buf, if (get_user(c, buf)) return -EFAULT; + + mutex_lock(&sysrq_mutex); __handle_sysrq(c, false); + mutex_unlock(&sysrq_mutex); } return count; -- 2.22.0

From: Mauricio Faria de Oliveira <mfo@canonical.com> mainline inclusion from mainline-5.12-rc1 commit 4ceddce55eb35d15b0f87f5dcf6f0058fd15d3a4 category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... --------------------------- There's an I/O error on fsync() in a detached loop device if it has been previously attached. The issue is write cache is enabled in the attach path in loop_configure() but it isn't disabled in the detach path; thus it remains enabled in the block device regardless of whether it is attached or not. Now fsync() can get an I/O request that will just be failed later in loop_queue_rq() as device's state is not 'Lo_bound'. So, disable write cache in the detach path. Do so based on the queue flag, not the loop device flag for read-only (used to enable) as the queue flag can be changed via sysfs even on read-only loop devices (e.g., losetup -r.) Test-case: # DEV=/dev/loop7 # IMG=/tmp/image # truncate --size 1M $IMG # losetup $DEV $IMG # losetup -d $DEV Before: # strace -e fsync parted -s $DEV print 2>&1 | grep fsync fsync(3) = -1 EIO (Input/output error) Warning: Error fsyncing/closing /dev/loop7: Input/output error [ 982.529929] blk_update_request: I/O error, dev loop7, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0 After: # strace -e fsync parted -s $DEV print 2>&1 | grep fsync fsync(3) = 0 Co-developed-by: Eric Desrochers <eric.desrochers@canonical.com> Signed-off-by: Eric Desrochers <eric.desrochers@canonical.com> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Tested-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- drivers/block/loop.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index a58084c2ed7c..cd13c0eb3f63 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1223,6 +1223,9 @@ static int __loop_clr_fd(struct loop_device *lo, bool release) goto out_unlock; } + if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags)) + blk_queue_write_cache(lo->lo_queue, false, false); + /* freeze request queue during the transition */ blk_mq_freeze_queue(lo->lo_queue); -- 2.22.0

From: Yejune Deng <yejune.deng@gmail.com> mainline inclusion from mainline-v5.12-rc1 commit 0bead8cd39b9c9c7c4e902018ccf129107ac50ef category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... --------------------------- The function io_remove_personalities() is very similar to io_unregister_personality(), so implement io_remove_personalities() by calling io_unregister_personality(). Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/io_uring.c | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index fdbaaf579cc6..706ecf330edc 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8505,9 +8505,8 @@ static int io_uring_fasync(int fd, struct file *file, int on) return fasync_helper(fd, file, on, &ctx->cq_fasync); } -static int io_remove_personalities(int id, void *p, void *data) +static int io_unregister_personality(struct io_ring_ctx *ctx, unsigned id) { - struct io_ring_ctx *ctx = data; struct io_identity *iod; iod = idr_remove(&ctx->personality_idr, id); @@ -8515,7 +8514,17 @@ static int io_remove_personalities(int id, void *p, void *data) put_cred(iod->creds); if (refcount_dec_and_test(&iod->count)) kfree(iod); + return 0; } + + return -EINVAL; +} + +static int io_remove_personalities(int id, void *p, void *data) +{ + struct io_ring_ctx *ctx = data; + + io_unregister_personality(ctx, id); return 0; } @@ -9606,21 +9615,6 @@ static int io_register_personality(struct io_ring_ctx *ctx) return ret; } -static int io_unregister_personality(struct io_ring_ctx *ctx, unsigned id) -{ - struct io_identity *iod; - - iod = idr_remove(&ctx->personality_idr, id); - if (iod) { - put_cred(iod->creds); - if (refcount_dec_and_test(&iod->count)) - kfree(iod); - return 0; - } - - return -EINVAL; -} - static int io_register_restrictions(struct io_ring_ctx *ctx, void __user *arg, unsigned int nr_args) { -- 2.22.0

From: "Matthew Wilcox (Oracle)" <willy@infradead.org> mainline inclusion from mainline-v5.12-rc3 commit 61cf93700fe6359552848ed5e3becba6cd760efa category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... --------------------------- You can't call idr_remove() from within a idr_for_each() callback, but you can call xa_erase() from an xa_for_each() loop, so switch the entire personality_idr from the IDR to the XArray. This manifests as a use-after-free as idr_for_each() attempts to walk the rest of the node after removing the last entry from it. Fixes: 071698e13ac6 ("io_uring: allow registering credentials") Cc: stable@vger.kernel.org # 5.6+ Reported-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> [Pavel: rebased (creds load was moved into io_init_req())] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7ccff36e1375f2b0ebf73d957f037b43becc0dde.161521280... Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: fs/io_uring.c Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/io_uring.c | 59 ++++++++++++++++++++++++++------------------------- 1 file changed, 30 insertions(+), 29 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 706ecf330edc..01e1108c0d87 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -346,7 +346,8 @@ struct io_ring_ctx { struct idr io_buffer_idr; - struct idr personality_idr; + struct xarray personalities; + u32 pers_next; struct { unsigned cached_cq_tail; @@ -1212,7 +1213,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) init_completion(&ctx->ref_comp); init_completion(&ctx->sq_thread_comp); idr_init(&ctx->io_buffer_idr); - idr_init(&ctx->personality_idr); + xa_init_flags(&ctx->personalities, XA_FLAGS_ALLOC1); mutex_init(&ctx->uring_lock); init_waitqueue_head(&ctx->wait); spin_lock_init(&ctx->completion_lock); @@ -6629,7 +6630,7 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, if (id) { struct io_identity *iod; - iod = idr_find(&ctx->personality_idr, id); + iod = xa_load(&ctx->personalities, id); if (unlikely(!iod)) return -EINVAL; refcount_inc(&iod->count); @@ -8445,7 +8446,6 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx) io_sqe_files_unregister(ctx); io_eventfd_unregister(ctx); io_destroy_buffers(ctx); - idr_destroy(&ctx->personality_idr); #if defined(CONFIG_UNIX) if (ctx->ring_sock) { @@ -8509,7 +8509,7 @@ static int io_unregister_personality(struct io_ring_ctx *ctx, unsigned id) { struct io_identity *iod; - iod = idr_remove(&ctx->personality_idr, id); + iod = xa_erase(&ctx->personalities, id); if (iod) { put_cred(iod->creds); if (refcount_dec_and_test(&iod->count)) @@ -8520,14 +8520,6 @@ static int io_unregister_personality(struct io_ring_ctx *ctx, unsigned id) return -EINVAL; } -static int io_remove_personalities(int id, void *p, void *data) -{ - struct io_ring_ctx *ctx = data; - - io_unregister_personality(ctx, id); - return 0; -} - static void io_ring_exit_work(struct work_struct *work) { struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, @@ -8554,6 +8546,9 @@ static bool io_cancel_ctx_cb(struct io_wq_work *work, void *data) static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx) { + unsigned long index; + struct io_identify *iod; + mutex_lock(&ctx->uring_lock); 
percpu_ref_kill(&ctx->refs); /* if force is set, the ring is going away. always drop after that */ @@ -8574,7 +8569,8 @@ static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx) /* if we failed setting up the ctx, we might not have any rings */ io_iopoll_try_reap_events(ctx); - idr_for_each(&ctx->personality_idr, io_remove_personalities, ctx); + xa_for_each(&ctx->personalities, index, iod) + io_unregister_personality(ctx, index); /* * Do this upfront, so we won't have a grace period where the ring @@ -9137,11 +9133,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, } #ifdef CONFIG_PROC_FS -static int io_uring_show_cred(int id, void *p, void *data) +static int io_uring_show_cred(struct seq_file *m, unsigned int id, + const struct io_identity *iod) { - struct io_identity *iod = p; const struct cred *cred = iod->creds; - struct seq_file *m = data; struct user_namespace *uns = seq_user_ns(m); struct group_info *gi; kernel_cap_t cap; @@ -9209,9 +9204,13 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m) seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->ubuf, (unsigned int) buf->len); } - if (has_lock && !idr_is_empty(&ctx->personality_idr)) { + if (has_lock && !xa_empty(&ctx->personalities)) { + unsigned long index; + const struct io_identity *iod; + seq_printf(m, "Personalities:\n"); - idr_for_each(&ctx->personality_idr, io_uring_show_cred, m); + xa_for_each(&ctx->personalities, index, iod) + io_uring_show_cred(m, index, iod); } seq_printf(m, "PollList:\n"); spin_lock_irq(&ctx->completion_lock); @@ -9597,21 +9596,23 @@ static int io_probe(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args) static int io_register_personality(struct io_ring_ctx *ctx) { - struct io_identity *id; + struct io_identity *iod; + u32 id; int ret; - id = kmalloc(sizeof(*id), GFP_KERNEL); - if (unlikely(!id)) + iod = kmalloc(sizeof(*iod), GFP_KERNEL); + if (unlikely(!iod)) return -ENOMEM; - io_init_identity(id); - id->creds = get_current_cred(); + io_init_identity(iod); + iod->creds = get_current_cred(); - ret = idr_alloc_cyclic(&ctx->personality_idr, id, 1, USHRT_MAX, GFP_KERNEL); - if (ret < 0) { - put_cred(id->creds); - kfree(id); - } + ret = xa_alloc_cyclic(&ctx->personalities, &id, (void *)iod, + XA_LIMIT(0, USHRT_MAX), &ctx->pers_next, GFP_KERNEL); + if (!ret) + return id; + put_cred(iod->creds); + kfree(iod); return ret; } -- 2.22.0
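For background on why the XArray succeeds where the IDR failed: xa_erase() is safe to call while an xa_for_each() walk is in progress, because the walk re-looks-up the next entry from its index on every step. Below is a self-contained sketch of the replacement pattern, using a hypothetical registry but the same XArray calls as the patch.

#include <linux/xarray.h>
#include <linux/slab.h>

static DEFINE_XARRAY_ALLOC1(registry);	/* IDs allocated from 1, as with XA_FLAGS_ALLOC1 */
static u32 registry_next;

static int registry_add(void *entry, u32 *id)
{
	/* Cyclic allocation mirrors the old idr_alloc_cyclic(): freed
	 * IDs are not reused immediately. Limit matches the patch. */
	return xa_alloc_cyclic(&registry, id, entry,
			       XA_LIMIT(0, USHRT_MAX), &registry_next,
			       GFP_KERNEL);
}

static void registry_destroy_all(void)
{
	unsigned long index;
	void *entry;

	/* Erasing during the walk is safe here, unlike calling
	 * idr_remove() from an idr_for_each() callback. */
	xa_for_each(&registry, index, entry) {
		xa_erase(&registry, index);
		kfree(entry);
	}
}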

From: Jens Axboe <axboe@kernel.dk> mainline inclusion from mainline-v5.12-rc4 commit 9e15c3a0ced5a61f320b989072c24983cb1620c1 category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... --------------------------- Like we did for the personality idr, convert the IO buffer idr to use XArray. This avoids a use-after-free on removal of entries, since idr doesn't like doing so from inside an iterator, and it nicely reduces the amount of code we need to support this feature. Fixes: 5a2e745d4d43 ("io_uring: buffer registration infrastructure") Cc: stable@vger.kernel.org Cc: Matthew Wilcox <willy@infradead.org> Cc: yangerkun <yangerkun@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: fs/io_uring.c Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- fs/io_uring.c | 43 +++++++++++++++---------------------------- 1 file changed, 15 insertions(+), 28 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 01e1108c0d87..72d1a34e1a35 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -344,7 +344,7 @@ struct io_ring_ctx { struct socket *ring_sock; #endif - struct idr io_buffer_idr; + struct xarray io_buffers; struct xarray personalities; u32 pers_next; @@ -1212,7 +1212,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) INIT_LIST_HEAD(&ctx->cq_overflow_list); init_completion(&ctx->ref_comp); init_completion(&ctx->sq_thread_comp); - idr_init(&ctx->io_buffer_idr); + xa_init_flags(&ctx->io_buffers, XA_FLAGS_ALLOC1); xa_init_flags(&ctx->personalities, XA_FLAGS_ALLOC1); mutex_init(&ctx->uring_lock); init_waitqueue_head(&ctx->wait); @@ -2990,7 +2990,7 @@ static struct io_buffer *io_buffer_select(struct io_kiocb *req, size_t *len, lockdep_assert_held(&req->ctx->uring_lock); - head = idr_find(&req->ctx->io_buffer_idr, bgid); + head = xa_load(&req->ctx->io_buffers, bgid); if (head) { if (!list_empty(&head->list)) { kbuf = list_last_entry(&head->list, struct io_buffer, @@ -2998,7 +2998,7 @@ static struct io_buffer *io_buffer_select(struct io_kiocb *req, size_t *len, list_del(&kbuf->list); } else { kbuf = head; - idr_remove(&req->ctx->io_buffer_idr, bgid); + xa_erase(&req->ctx->io_buffers, bgid); } if (*len > kbuf->len) *len = kbuf->len; @@ -3960,7 +3960,7 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx, struct io_buffer *buf, } i++; kfree(buf); - idr_remove(&ctx->io_buffer_idr, bgid); + xa_erase(&ctx->io_buffers, bgid); return i; } @@ -3978,7 +3978,7 @@ static int io_remove_buffers(struct io_kiocb *req, bool force_nonblock, lockdep_assert_held(&ctx->uring_lock); ret = -ENOENT; - head = idr_find(&ctx->io_buffer_idr, p->bgid); + head = xa_load(&ctx->io_buffers, p->bgid); if (head) ret = __io_remove_buffers(ctx, head, p->bgid, p->nbufs); if (ret < 0) @@ -4069,21 +4069,14 @@ static int io_provide_buffers(struct io_kiocb *req, bool force_nonblock, lockdep_assert_held(&ctx->uring_lock); - list = head = idr_find(&ctx->io_buffer_idr, p->bgid); + list = head = xa_load(&ctx->io_buffers, p->bgid); ret = io_add_buffers(p, &head); - if (ret < 0) - goto out; - - if (!list) { - ret = idr_alloc(&ctx->io_buffer_idr, head, p->bgid, p->bgid + 1, - GFP_KERNEL); - if (ret < 0) { + if (ret >= 0 && !list) { + ret = xa_insert(&ctx->io_buffers, p->bgid, head, GFP_KERNEL); + if (ret < 0) __io_remove_buffers(ctx, head, 
p->bgid, -1U); - goto out; - } } -out: if (ret < 0) req_set_fail_links(req); @@ -8411,19 +8404,13 @@ static int io_eventfd_unregister(struct io_ring_ctx *ctx) return -ENXIO; } -static int __io_destroy_buffers(int id, void *p, void *data) -{ - struct io_ring_ctx *ctx = data; - struct io_buffer *buf = p; - - __io_remove_buffers(ctx, buf, id, -1U); - return 0; -} - static void io_destroy_buffers(struct io_ring_ctx *ctx) { - idr_for_each(&ctx->io_buffer_idr, __io_destroy_buffers, ctx); - idr_destroy(&ctx->io_buffer_idr); + struct io_buffer *buf; + unsigned long index; + + xa_for_each(&ctx->io_buffers, index, buf) + __io_remove_buffers(ctx, buf, index, -1U); } static void io_ring_ctx_free(struct io_ring_ctx *ctx) -- 2.22.0

From: Yang Jihong <yangjihong1@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA -------------------------------- Commit 3a95200d3f89a ("arm_pmu: Change API to support 64bit counter values") changes the input "value" type from 32-bit to 64-bit, which introduces the following problem: ARMv7 PMU counters are 32 bits wide; in big-endian mode, writing a counter uses the high 32 bits, which stores an incorrect value. Before: Performance counter stats for 'ls': 2.22 msec task-clock # 0.675 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 49 page-faults # 0.022 M/sec 2150476593 cycles # 966.663 GHz 2148588788 instructions # 1.00 insn per cycle 2147745484 branches # 965435.074 M/sec 2147508540 branch-misses # 99.99% of all branches None of the above hw event counters are correct. After: Performance counter stats for 'ls': 2.09 msec task-clock # 0.681 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 46 page-faults # 0.022 M/sec 2807301 cycles # 1.344 GHz 1060159 instructions # 0.38 insn per cycle 250496 branches # 119.914 M/sec 23192 branch-misses # 9.26% of all branches Solution: "value" is explicitly truncated to 32 bits before being written to the PMU register. Fixes: 3a95200d3f89a ("arm_pmu: Change API to support 64bit counter values") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Reviewed-by: Kuohai Xu <xukuohai@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/perf_event_v7.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c index 2924d7910b10..eb2190477da1 100644 --- a/arch/arm/kernel/perf_event_v7.c +++ b/arch/arm/kernel/perf_event_v7.c @@ -773,10 +773,10 @@ static inline void armv7pmu_write_counter(struct perf_event *event, u64 value) pr_err("CPU%u writing wrong counter %d\n", smp_processor_id(), idx); } else if (idx == ARMV7_IDX_CYCLE_COUNTER) { - asm volatile("mcr p15, 0, %0, c9, c13, 0" : : "r" (value)); + asm volatile("mcr p15, 0, %0, c9, c13, 0" : : "r" ((u32)value)); } else { armv7_pmnc_select_counter(idx); - asm volatile("mcr p15, 0, %0, c9, c13, 2" : : "r" (value)); + asm volatile("mcr p15, 0, %0, c9, c13, 2" : : "r" ((u32)value)); } } -- 2.22.0
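The (u32) cast works because integer truncation in C keeps the numerically low 32 bits regardless of byte order; the bug was that a 64-bit variable bound to a 32-bit "r" asm operand resolves, on big-endian, to the half of the register pair that holds the high word, as described above. A trivial userspace illustration of the truncation, with a made-up value:

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
	/* e.g. a negative counter preload as programmed by the perf core */
	uint64_t value = 0xffffffff80001234ULL;
	uint32_t low = (uint32_t)value;

	/* Prints 80001234 on little- and big-endian hosts alike:
	 * truncation acts on the value, not on its memory layout. */
	printf("%08" PRIx32 "\n", low);
	return 0;
}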

From: JeongHyeon Lee <jhs2.lee@samsung.com> mainline inclusion from mainline-v5.13-rc1 commit 219a9b5e738b75a6a5e9effe1d72f60037a2f131 category: bugfix issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ----------------------------------------------- If more than one error handling mode is requested during DM verity table load, the last requested mode will be used. Change this to impose stricter checking so that the table load will fail if more than one error handling mode is requested. Signed-off-by: JeongHyeon Lee <jhs2.lee@samsung.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Luo Meng <luomeng12@huawei.com> Reviewed-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- drivers/md/dm-verity-target.c | 40 +++++++++++++++++++++++++---------- 1 file changed, 29 insertions(+), 11 deletions(-) diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c index 808a98ef624c..d3e76aefc1a6 100644 --- a/drivers/md/dm-verity-target.c +++ b/drivers/md/dm-verity-target.c @@ -893,6 +893,28 @@ static int verity_alloc_zero_digest(struct dm_verity *v) return r; } +static inline bool verity_is_verity_mode(const char *arg_name) +{ + return (!strcasecmp(arg_name, DM_VERITY_OPT_LOGGING) || + !strcasecmp(arg_name, DM_VERITY_OPT_RESTART) || + !strcasecmp(arg_name, DM_VERITY_OPT_PANIC)); +} + +static int verity_parse_verity_mode(struct dm_verity *v, const char *arg_name) +{ + if (v->mode) + return -EINVAL; + + if (!strcasecmp(arg_name, DM_VERITY_OPT_LOGGING)) + v->mode = DM_VERITY_MODE_LOGGING; + else if (!strcasecmp(arg_name, DM_VERITY_OPT_RESTART)) + v->mode = DM_VERITY_MODE_RESTART; + else if (!strcasecmp(arg_name, DM_VERITY_OPT_PANIC)) + v->mode = DM_VERITY_MODE_PANIC; + + return 0; +} + static int verity_parse_opt_args(struct dm_arg_set *as, struct dm_verity *v, struct dm_verity_sig_opts *verify_args) { @@ -916,16 +938,12 @@ static int verity_parse_opt_args(struct dm_arg_set *as, struct dm_verity *v, arg_name = dm_shift_arg(as); argc--; - if (!strcasecmp(arg_name, DM_VERITY_OPT_LOGGING)) { - v->mode = DM_VERITY_MODE_LOGGING; - continue; - - } else if (!strcasecmp(arg_name, DM_VERITY_OPT_RESTART)) { - v->mode = DM_VERITY_MODE_RESTART; - continue; - - } else if (!strcasecmp(arg_name, DM_VERITY_OPT_PANIC)) { - v->mode = DM_VERITY_MODE_PANIC; + if (verity_is_verity_mode(arg_name)) { + r = verity_parse_verity_mode(v, arg_name); + if (r) { + ti->error = "Conflicting error handling parameters"; + return r; + } continue; } else if (!strcasecmp(arg_name, DM_VERITY_OPT_IGN_ZEROES)) { @@ -1242,7 +1260,7 @@ static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv) static struct target_type verity_target = { .name = "verity", - .version = {1, 7, 0}, + .version = {1, 8, 0}, .module = THIS_MODULE, .ctr = verity_ctr, .dtr = verity_dtr, -- 2.22.0

From: Lorenzo Stoakes <lstoakes@gmail.com> mainline inclusion from mainline-v5.11-rc1 commit 470c61d70299b1826f56ff5fede10786798e3c14 category: bugfix issue: #I3ZXZF CVE: NA Refactor to adapt to commit ("mm/page_alloc: fix counting of managed_pages") Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- setup_per_zone_lowmem_reserve() iterates through each zone setting zone->lowmem_reserve[j] = 0 (where j is the zone's index) then iterates backwards through all preceding zones, setting lower_zone->lowmem_reserve[j] = sum(managed pages of higher zones) / lowmem_reserve_ratio[idx] for each (where idx is the lower zone's index). If the lower zone has no managed pages or its ratio is 0 then all of its lowmem_reserve[] entries are effectively zeroed. As these arrays are only assigned here and all lowmem_reserve[] entries for index < this zone's index are implicitly assumed to be 0 (as these are specifically output in show_free_areas() and zoneinfo_show_print() for example), there is no need to additionally zero index == this zone's index too. This patch avoids zeroing unnecessarily. Rather than iterating through zones and setting lowmem_reserve[j] for each lower zone, this patch reverses the process and populates each zone's lowmem_reserve[] values in ascending order. This clarifies what is going on, especially in the case of zero managed pages or ratio, which is now explicitly shown to clear these values. Link: https://lkml.kernel.org/r/20201129162758.115907-1-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- mm/page_alloc.c | 35 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 21 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 423f97a1120c..dc4cc71235aa 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -7786,31 +7786,24 @@ static void calculate_totalreserve_pages(void) static void setup_per_zone_lowmem_reserve(void) { struct pglist_data *pgdat; - enum zone_type j, idx; + enum zone_type i, j; for_each_online_pgdat(pgdat) { - for (j = 0; j < MAX_NR_ZONES; j++) { - struct zone *zone = pgdat->node_zones + j; - unsigned long managed_pages = zone_managed_pages(zone); - - zone->lowmem_reserve[j] = 0; - - idx = j; - while (idx) { - struct zone *lower_zone; - - idx--; - lower_zone = pgdat->node_zones + idx; - - if (!sysctl_lowmem_reserve_ratio[idx] || - !zone_managed_pages(lower_zone)) { - lower_zone->lowmem_reserve[j] = 0; - continue; + for (i = 0; i < MAX_NR_ZONES - 1; i++) { + struct zone *zone = &pgdat->node_zones[i]; + int ratio = sysctl_lowmem_reserve_ratio[i]; + bool clear = !ratio || !zone_managed_pages(zone); + unsigned long managed_pages = 0; + + for (j = i + 1; j < MAX_NR_ZONES; j++) { + if (clear) { + zone->lowmem_reserve[j] = 0; } else { - lower_zone->lowmem_reserve[j] = - managed_pages / sysctl_lowmem_reserve_ratio[idx]; + struct zone *upper_zone = &pgdat->node_zones[j]; + + managed_pages += zone_managed_pages(upper_zone); + zone->lowmem_reserve[j] = managed_pages / ratio; } - managed_pages += zone_managed_pages(lower_zone); } } } -- 2.22.0
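
A concrete reading of the formula (numbers invented for illustration): with 180000 managed HighMem pages above Normal and sysctl_lowmem_reserve_ratio[Normal] = 32, Normal.lowmem_reserve[HighMem] = 180000 / 32 = 5625 pages, i.e. an allocation that could have used HighMem may only fall back to Normal while Normal still has at least that many free pages.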

From: Liu Shixin <liushixin2@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------- Commit f63661566fad ("mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty") clears out zone->lowmem_reserve[] if the zone is empty. But when the zone is not empty and sysctl_lowmem_reserve_ratio[i] is set to zero, zone_managed_pages(zone) is not counted in managed_pages either. This is inconsistent with the description of lowmem_reserve, so fix it. Fixes: f63661566fad ("mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty") Reported-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- mm/page_alloc.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index dc4cc71235aa..c72ecf5c9e36 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -7796,14 +7796,14 @@ static void setup_per_zone_lowmem_reserve(void) unsigned long managed_pages = 0; for (j = i + 1; j < MAX_NR_ZONES; j++) { - if (clear) { - zone->lowmem_reserve[j] = 0; - } else { - struct zone *upper_zone = &pgdat->node_zones[j]; + struct zone *upper_zone = &pgdat->node_zones[j]; + + managed_pages += zone_managed_pages(upper_zone); - managed_pages += zone_managed_pages(upper_zone); + if (clear) + zone->lowmem_reserve[j] = 0; + else zone->lowmem_reserve[j] = managed_pages / ratio; - } } } } -- 2.22.0
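
With both changes applied, the loop is small enough to model in user space. A compilable sketch (zone names and page counts invented) for checking the arithmetic:

    #include <stdio.h>

    #define NR_ZONES 3
    static const char *name[NR_ZONES]      = { "DMA", "Normal", "HighMem" };
    static unsigned long managed[NR_ZONES] = { 4000, 120000, 60000 };
    static int ratio[NR_ZONES]             = { 256, 32, 0 };  /* sysctl_lowmem_reserve_ratio */
    static unsigned long reserve[NR_ZONES][NR_ZONES];

    int main(void)
    {
    	for (int i = 0; i < NR_ZONES - 1; i++) {
    		int clear = !ratio[i] || !managed[i];
    		unsigned long pages = 0;

    		for (int j = i + 1; j < NR_ZONES; j++) {
    			pages += managed[j];	/* always accumulate, as in the fix */
    			reserve[i][j] = clear ? 0 : pages / ratio[i];
    			printf("%s.lowmem_reserve[%s] = %lu\n",
    			       name[i], name[j], reserve[i][j]);
    		}
    	}
    	return 0;
    	/* prints: DMA->Normal 468, DMA->HighMem 703, Normal->HighMem 1875 */
    }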

From: Oscar Salvador <osalvador@suse.de> mainline inclusion from mainline-v5.11-rc1 commit 3f4b815a439adfb8f238335612c4b28bc10084d8 category: bugfix issue: #I3ZXZF CVE: NA During LTP test, testcase 'move_pages12' failed. This is because __soft_offline_page() will return -EIO when it fails to migrate the page. This is not an error for 'move_pages12' but LTP treats 'return -EIO' as an error. Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Currently, we return -EIO when we fail to migrate the page. Migrations' failures are rather transient as they can happen due to several reasons, e.g: high page refcount bump, mapping->migrate_page failing etc. All meaning that at that time the page could not be migrated, but that has nothing to do with an EIO error. Let us return -EBUSY instead, as we do in case we failed to isolate the page. While at it, let us remove the "ret" print as its value does not change. Link: https://lkml.kernel.org/r/20201209092818.30417-1-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Shixin Liu <liushixin2@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- mm/memory-failure.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 25fb82320e3d..01445ddff58d 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1856,11 +1856,11 @@ static int __soft_offline_page(struct page *page) pr_info("soft offline: %#lx: %s migration failed %d, type %lx (%pGp)\n", pfn, msg_page[huge], ret, page->flags, &page->flags); if (ret > 0) - ret = -EIO; + ret = -EBUSY; } } else { - pr_info("soft offline: %#lx: %s isolation failed: %d, page count %d, type %lx (%pGp)\n", - pfn, msg_page[huge], ret, page_count(page), page->flags, &page->flags); + pr_info("soft offline: %#lx: %s isolation failed, page count %d, type %lx (%pGp)\n", + pfn, msg_page[huge], page_count(page), page->flags, &page->flags); ret = -EBUSY; } return ret; -- 2.22.0
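
The distinction matters to callers: -EBUSY signals a transient condition worth retrying, -EIO a hard failure. A hedged user-space sketch of a retry policy (the stub stands in for writing a PFN to /sys/devices/system/memory/soft_offline_page; names and policy are illustrative):

    #include <errno.h>
    #include <stdio.h>

    /* stand-in for the real sysfs write, which returns -errno */
    static int pretend_soft_offline(unsigned long pfn) { (void)pfn; return -EBUSY; }

    static int soft_offline_with_retry(unsigned long pfn, int max_tries)
    {
    	int ret = -EBUSY;

    	for (int i = 0; i < max_tries && ret == -EBUSY; i++)
    		ret = pretend_soft_offline(pfn);	/* -EBUSY: transient, retry */

    	return ret;	/* anything else (e.g. the old -EIO) would be final */
    }

    int main(void)
    {
    	printf("%d\n", soft_offline_with_retry(0x1234, 3));
    	return 0;
    }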

From: Kefeng Wang <wangkefeng.wang@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA Reference: https://lore.kernel.org/linux-mm/20210417075946.181402-1-wangkefeng.wang@hua... -------------------------------- Move HOLES_IN_ZONE into mm/Kconfig, select it if architecture needs this feature. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm64/Kconfig | 4 +--- arch/ia64/Kconfig | 5 +---- arch/mips/Kconfig | 3 --- mm/Kconfig | 3 +++ 4 files changed, 5 insertions(+), 10 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 5e5cf3af6351..c505c3b5c97e 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -183,6 +183,7 @@ config ARM64 select HAVE_KPROBES select HAVE_KRETPROBES select HAVE_GENERIC_VDSO + select HOLES_IN_ZONE select IOMMU_DMA if IOMMU_SUPPORT select IRQ_DOMAIN select IRQ_FORCED_THREADING @@ -1022,9 +1023,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK def_bool y depends on NUMA -config HOLES_IN_ZONE - def_bool y - source "kernel/Kconfig.hz" config ARCH_SUPPORTS_DEBUG_PAGEALLOC diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 39b25a5a591b..cc0d4ce7a045 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -330,6 +330,7 @@ config NODES_SHIFT config VIRTUAL_MEM_MAP bool "Virtual mem map" depends on !SPARSEMEM + select HOLES_IN_ZONE default y help Say Y to compile the kernel with support for a virtual mem map. @@ -338,10 +339,6 @@ config VIRTUAL_MEM_MAP require the DISCONTIGMEM option for your machine. If you are unsure, say Y. -config HOLES_IN_ZONE - bool - default y if VIRTUAL_MEM_MAP - config HAVE_ARCH_EARLY_PFN_TO_NID def_bool NUMA && SPARSEMEM diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 1917ccd39256..7e6b0168f9a5 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -1193,9 +1193,6 @@ config HAVE_PLAT_MEMCPY config ISA_DMA_API bool -config HOLES_IN_ZONE - bool - config SYS_SUPPORTS_RELOCATABLE bool help diff --git a/mm/Kconfig b/mm/Kconfig index 390165ffbb0f..9d606d258ab4 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -129,6 +129,9 @@ config HAVE_FAST_GUP depends on MMU bool +config HOLES_IN_ZONE + bool + # Don't discard allocated memory used to track "memory" and "reserved" memblocks # after early boot, so it can still be used to test for validity of memory. # Also, memblocks are updated with memory hot(un)plug. -- 2.22.0
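
With the symbol defined once in mm/Kconfig, an architecture (or an arch feature, as the ia64 hunk shows) opts in with a plain select. A hypothetical example:

    config ARCH_FOO
    	bool "Foo platform"
    	select HOLES_IN_ZONE	# zones on this platform may contain pfn holes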

From: Linus Walleij <linus.walleij@linaro.org> mainline inclusion from mainline-5.11-rc1 commit d5d44e7e3507b0ad868f68e0c5bca6a57afa1b8b category: feature feature: ARM KASAN support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Disable instrumentation for arch/arm/boot/compressed/* since that code is executed before the kernel has even set up its mappings and is definitely out of scope for KASan. Disable instrumentation of arch/arm/vdso/* because that code is not linked with the kernel image, so the KASan management code would fail to link. Disable instrumentation of arch/arm/mm/physaddr.c. See commit ec6d06efb0ba ("arm64: Add support for CONFIG_DEBUG_VIRTUAL") for more details. Disable the kasan check in the function unwind_pop_register() because it does not matter if kasan checks fail when unwind_pop_register() reads the stack memory of a task. Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: kasan-dev@googlegroups.com Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q Reported-by: Florian Fainelli <f.fainelli@gmail.com> Reported-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Abbott Liu <liuwenliang@huawei.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> (cherry picked from commit d5d44e7e3507b0ad868f68e0c5bca6a57afa1b8b) Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/Makefile | 1 + arch/arm/kernel/unwind.c | 6 +++++- arch/arm/mm/Makefile | 2 ++ arch/arm/vdso/Makefile | 2 ++ 4 files changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index 0d6ee56f5831..54307db7854d 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -24,6 +24,7 @@ OBJS += hyp-stub.o endif GCOV_PROFILE := n +KASAN_SANITIZE := n # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in. KCOV_INSTRUMENT := n diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c index d2bd0df2318d..f35eb584a18a 100644 --- a/arch/arm/kernel/unwind.c +++ b/arch/arm/kernel/unwind.c @@ -236,7 +236,11 @@ static int unwind_pop_register(struct unwind_ctrl_block *ctrl, if (*vsp >= (unsigned long *)ctrl->sp_high) return -URC_FAILURE; - ctrl->vrs[reg] = *(*vsp)++; + /* Use READ_ONCE_NOCHECK here to avoid this memory access + * from being tracked by KASAN.
+ */ + ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp)); + (*vsp)++; return URC_OK; } diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile index 7cb1699fbfc4..99699c32d8a5 100644 --- a/arch/arm/mm/Makefile +++ b/arch/arm/mm/Makefile @@ -7,6 +7,7 @@ obj-y := extable.o fault.o init.o iomap.o obj-y += dma-mapping$(MMUEXT).o obj-$(CONFIG_MMU) += fault-armv.o flush.o idmap.o ioremap.o \ mmap.o pgd.o mmu.o pageattr.o +KASAN_SANITIZE_mmu.o := n ifneq ($(CONFIG_MMU),y) obj-y += nommu.o @@ -16,6 +17,7 @@ endif obj-$(CONFIG_ARM_PTDUMP_CORE) += dump.o obj-$(CONFIG_ARM_PTDUMP_DEBUGFS) += ptdump_debugfs.o obj-$(CONFIG_MODULES) += proc-syms.o +KASAN_SANITIZE_physaddr.o := n obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile index 150ce6e6a5d3..b558bee0e1f6 100644 --- a/arch/arm/vdso/Makefile +++ b/arch/arm/vdso/Makefile @@ -42,6 +42,8 @@ GCOV_PROFILE := n # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in. KCOV_INSTRUMENT := n +KASAN_SANITIZE := n + # Force dependency $(obj)/vdso.o : $(obj)/vdso.so -- 2.22.0
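
The two knobs in the diff generalize: KASAN_SANITIZE := n turns instrumentation off for every object built by that Makefile, while KASAN_SANITIZE_<object>.o := n targets a single file. A sketch (the file name early_foo.o is hypothetical):

    # keep one early-boot object uninstrumented
    KASAN_SANITIZE_early_foo.o := n

    # keep everything this Makefile builds uninstrumented
    KASAN_SANITIZE := n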

From: Linus Walleij <linus.walleij@linaro.org> mainline inclusion from mainline-5.11-rc1 commit d6d51a96c7d63b7450860a3037f2d62388286a52 category: feature feature: ARM KASAN support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Functions like memset()/memmove()/memcpy() do a lot of memory accesses. If a bad pointer is passed to one of these functions it is important to catch this. Compiler instrumentation cannot do this since these functions are written in assembly. KASan replaces these memory functions with instrumented variants. The original functions are declared as weak symbols so that the strong definitions in mm/kasan/kasan.c can replace them. The original functions have aliases with a '__' prefix in their name, so we can call the non-instrumented variant if needed. We must use __memcpy()/__memset() in place of memcpy()/memset() when we copy .data to RAM and when we clear .bss, because kasan_early_init cannot be called before the initialization of .data and .bss. For the kernel compression and EFI libstub's custom string libraries we need a special quirk: even if these are built without KASan enabled, they rely on the global headers for their custom string libraries, which means that e.g. memcpy() will be defined to __memcpy() and we get link failures. Since these implementations are written in C rather than assembly we use e.g. __alias(memcpy) to redirect any users back to the local implementation. Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: kasan-dev@googlegroups.com Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q Reported-by: Russell King - ARM Linux <rmk+kernel@armlinux.org.uk> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Abbott Liu <liuwenliang@huawei.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> (cherry picked from commit d6d51a96c7d63b7450860a3037f2d62388286a52) Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/string.c | 19 +++++++++++++++++++ arch/arm/include/asm/string.h | 26 ++++++++++++++++++++++++++ arch/arm/kernel/head-common.S | 4 ++-- arch/arm/lib/memcpy.S | 3 +++ arch/arm/lib/memmove.S | 5 ++++- arch/arm/lib/memset.S | 3 +++ 6 files changed, 57 insertions(+), 3 deletions(-) diff --git a/arch/arm/boot/compressed/string.c b/arch/arm/boot/compressed/string.c index ade5079bebbf..8c0fa276d994 100644 --- a/arch/arm/boot/compressed/string.c +++ b/arch/arm/boot/compressed/string.c @@ -7,6 +7,25 @@ #include <linux/string.h> +/* + * The decompressor is built without KASan but uses the same redirects as the + * rest of the kernel when CONFIG_KASAN is enabled, defining e.g. memcpy() + * to __memcpy() but since we are not linking with the main kernel string + * library in the decompressor, that will lead to link failures. + * + * Undefine KASan's versions, define the wrapped functions and alias them to + * the right names so that when e.g.
__memcpy() appear in the code, it will + * still be linked to this local version of memcpy(). + */ +#ifdef CONFIG_KASAN +#undef memcpy +#undef memmove +#undef memset +void *__memcpy(void *__dest, __const void *__src, size_t __n) __alias(memcpy); +void *__memmove(void *__dest, __const void *__src, size_t count) __alias(memmove); +void *__memset(void *s, int c, size_t count) __alias(memset); +#endif + void *memcpy(void *__dest, __const void *__src, size_t __n) { int i = 0; diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h index 111a1d8a41dd..6c607c68f3ad 100644 --- a/arch/arm/include/asm/string.h +++ b/arch/arm/include/asm/string.h @@ -5,6 +5,9 @@ /* * We don't do inline string functions, since the * optimised inline asm versions are not small. + * + * The __underscore versions of some functions are for KASan to be able + * to replace them with instrumented versions. */ #define __HAVE_ARCH_STRRCHR @@ -15,15 +18,18 @@ extern char * strchr(const char * s, int c); #define __HAVE_ARCH_MEMCPY extern void * memcpy(void *, const void *, __kernel_size_t); +extern void *__memcpy(void *dest, const void *src, __kernel_size_t n); #define __HAVE_ARCH_MEMMOVE extern void * memmove(void *, const void *, __kernel_size_t); +extern void *__memmove(void *dest, const void *src, __kernel_size_t n); #define __HAVE_ARCH_MEMCHR extern void * memchr(const void *, int, __kernel_size_t); #define __HAVE_ARCH_MEMSET extern void * memset(void *, int, __kernel_size_t); +extern void *__memset(void *s, int c, __kernel_size_t n); #define __HAVE_ARCH_MEMSET32 extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t); @@ -39,4 +45,24 @@ static inline void *memset64(uint64_t *p, uint64_t v, __kernel_size_t n) return __memset64(p, v, n * 8, v >> 32); } +/* + * For files that are not instrumented (e.g. mm/slub.c) we + * must use non-instrumented versions of the mem* + * functions named __memcpy() etc. All such kernel code has + * been tagged with KASAN_SANITIZE_file.o = n, which means + * that the address sanitization argument isn't passed to the + * compiler, and __SANITIZE_ADDRESS__ is not set. As a result + * these defines kick in. + */ +#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__) +#define memcpy(dst, src, len) __memcpy(dst, src, len) +#define memmove(dst, src, len) __memmove(dst, src, len) +#define memset(s, c, n) __memset(s, c, n) + +#ifndef __NO_FORTIFY +#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. 
*/ +#endif + +#endif + #endif diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S index 4a3982812a40..6840c7c60a85 100644 --- a/arch/arm/kernel/head-common.S +++ b/arch/arm/kernel/head-common.S @@ -95,7 +95,7 @@ __mmap_switched: THUMB( ldmia r4!, {r0, r1, r2, r3} ) THUMB( mov sp, r3 ) sub r2, r2, r1 - bl memcpy @ copy .data to RAM + bl __memcpy @ copy .data to RAM #endif ARM( ldmia r4!, {r0, r1, sp} ) @@ -103,7 +103,7 @@ __mmap_switched: THUMB( mov sp, r3 ) sub r2, r1, r0 mov r1, #0 - bl memset @ clear .bss + bl __memset @ clear .bss ldmia r4, {r0, r1, r2, r3} str r9, [r0] @ Save processor ID diff --git a/arch/arm/lib/memcpy.S b/arch/arm/lib/memcpy.S index 09a333153dc6..ad4625d16e11 100644 --- a/arch/arm/lib/memcpy.S +++ b/arch/arm/lib/memcpy.S @@ -58,6 +58,8 @@ /* Prototype: void *memcpy(void *dest, const void *src, size_t n); */ +.weak memcpy +ENTRY(__memcpy) ENTRY(mmiocpy) ENTRY(memcpy) @@ -65,3 +67,4 @@ ENTRY(memcpy) ENDPROC(memcpy) ENDPROC(mmiocpy) +ENDPROC(__memcpy) diff --git a/arch/arm/lib/memmove.S b/arch/arm/lib/memmove.S index b50e5770fb44..fd123ea5a5a4 100644 --- a/arch/arm/lib/memmove.S +++ b/arch/arm/lib/memmove.S @@ -24,12 +24,14 @@ * occurring in the opposite direction. */ +.weak memmove +ENTRY(__memmove) ENTRY(memmove) UNWIND( .fnstart ) subs ip, r0, r1 cmphi r2, ip - bls memcpy + bls __memcpy stmfd sp!, {r0, r4, lr} UNWIND( .fnend ) @@ -222,3 +224,4 @@ ENTRY(memmove) 18: backward_copy_shift push=24 pull=8 ENDPROC(memmove) +ENDPROC(__memmove) diff --git a/arch/arm/lib/memset.S b/arch/arm/lib/memset.S index 6ca4535c47fb..0e7ff0423f50 100644 --- a/arch/arm/lib/memset.S +++ b/arch/arm/lib/memset.S @@ -13,6 +13,8 @@ .text .align 5 +.weak memset +ENTRY(__memset) ENTRY(mmioset) ENTRY(memset) UNWIND( .fnstart ) @@ -132,6 +134,7 @@ UNWIND( .fnstart ) UNWIND( .fnend ) ENDPROC(memset) ENDPROC(mmioset) +ENDPROC(__memset) ENTRY(__memset32) UNWIND( .fnstart ) -- 2.22.0
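
The decompressor quirk boils down to an ordinary GCC aliasing pattern. A stand-alone model (simplified: the real file #undefs the redirect and keeps the name "memcpy"; it is renamed local_memcpy here only to keep the macro out of the way):

    #include <stddef.h>
    #include <stdio.h>

    /* what <asm/string.h> does under CONFIG_KASAN: route callers to the
     * uninstrumented double-underscore name */
    #define memcpy(d, s, n) __memcpy(d, s, n)

    void *local_memcpy(void *dest, const void *src, size_t n)
    {
    	char *d = dest;
    	const char *s = src;

    	while (n--)
    		*d++ = *s++;
    	return dest;
    }

    /* the alias that makes __memcpy() resolve to the local implementation */
    void *__memcpy(void *dest, const void *src, size_t n)
    	__attribute__((alias("local_memcpy")));

    int main(void)
    {
    	char buf[6];

    	memcpy(buf, "hello", 6);   /* expands to __memcpy(), lands in local_memcpy() */
    	printf("%s\n", buf);
    	return 0;
    }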

From: Linus Walleij <linus.walleij@linaro.org> mainline inclusion from mainline-5.11-rc1 commit c12366ba441da2f6f2b915410aca2b5b39c16514 category: feature feature: ARM KASAN support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Define KASAN_SHADOW_OFFSET, KASAN_SHADOW_START and KASAN_SHADOW_END for the Arm kernel address sanitizer. We are "stealing" lowmem (the 4GB addressable by a 32bit architecture) out of the virtual address space to use as shadow memory for KASan as follows: +----+ 0xffffffff | | | | |-> Static kernel image (vmlinux) BSS and page table | |/ +----+ PAGE_OFFSET | | | | |-> Loadable kernel modules virtual address space area | |/ +----+ MODULES_VADDR = KASAN_SHADOW_END | | | | |-> The shadow area of kernel virtual address. | |/ +----+-> TASK_SIZE (start of kernel space) = KASAN_SHADOW_START the | | shadow address of MODULES_VADDR | | | | | | | | |-> The user space area in lowmem. The kernel address | | | sanitizer do not use this space, nor does it map it. | | | | | | | | | | | | | |/ ------ 0 0 .. TASK_SIZE is the memory that can be used by shared userspace/kernelspace. It is used for userspace processes and for passing parameters and memory buffers in system calls etc. We do not need to shadow this area. KASAN_SHADOW_START: This value begins with the MODULES_VADDR's shadow address. It is the start of kernel virtual space. Since we have modules to load, we need to cover also that area with shadow memory so we can find memory bugs in modules. KASAN_SHADOW_END: This value is the 0x100000000's shadow address: the mapping that would be after the end of the kernel memory at 0xffffffff. It is the end of kernel address sanitizer shadow area. It is also the start of the module area. KASAN_SHADOW_OFFSET: This value is used to map an address to the corresponding shadow address by the following formula: shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET; As you would expect, >> 3 is equal to dividing by 8, meaning each byte in the shadow memory covers 8 bytes of kernel memory, so one bit shadow memory per byte of kernel memory is used. The KASAN_SHADOW_OFFSET is provided in a Kconfig option depending on the VMSPLIT layout of the system: the kernel and userspace can split up lowmem in different ways according to needs, so we calculate the shadow offset depending on this. When kasan is enabled, the definition of TASK_SIZE is not an 8-bit rotated constant, so we need to modify the TASK_SIZE access code in the *.s file. The kernel and modules may use different amounts of memory, according to the VMSPLIT configuration, which in turn determines the PAGE_OFFSET. We use the following KASAN_SHADOW_OFFSETs depending on how the virtual memory is split up: - 0x1f000000 if we have 1G userspace / 3G kernelspace split: - The kernel address space is 3G (0xc0000000) - PAGE_OFFSET is then set to 0x40000000 so the kernel static image (vmlinux) uses addresses 0x40000000 .. 0xffffffff - On top of that we have the MODULES_VADDR which under the worst case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) = 0x3f000000 so the modules use addresses 0x3f000000 .. 0x3fffffff - So the addresses 0x3f000000 .. 0xffffffff need to be covered with shadow memory. That is 0xc1000000 bytes of memory. - 1/8 of that is needed for its shadow memory, so 0x18200000 bytes of shadow memory is needed. We "steal" that from the remaining lowmem. 
- The KASAN_SHADOW_START becomes 0x26e00000, to KASAN_SHADOW_END at 0x3effffff. - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel address as 0x3f000000 needs to map to the first byte of shadow memory and 0xffffffff needs to map to the last byte of shadow memory. Since: SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET 0x26e00000 = (0x3f000000 >> 3) + KASAN_SHADOW_OFFSET KASAN_SHADOW_OFFSET = 0x26e00000 - (0x3f000000 >> 3) KASAN_SHADOW_OFFSET = 0x26e00000 - 0x07e00000 KASAN_SHADOW_OFFSET = 0x1f000000 - 0x5f000000 if we have 2G userspace / 2G kernelspace split: - The kernel space is 2G (0x80000000) - PAGE_OFFSET is set to 0x80000000 so the kernel static image uses 0x80000000 .. 0xffffffff. - On top of that we have the MODULES_VADDR which under the worst case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) = 0x7f000000 so the modules use addresses 0x7f000000 .. 0x7fffffff - So the addresses 0x7f000000 .. 0xffffffff need to be covered with shadow memory. That is 0x81000000 bytes of memory. - 1/8 of that is needed for its shadow memory, so 0x10200000 bytes of shadow memory is needed. We "steal" that from the remaining lowmem. - The KASAN_SHADOW_START becomes 0x6ee00000, to KASAN_SHADOW_END at 0x7effffff. - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel address as 0x7f000000 needs to map to the first byte of shadow memory and 0xffffffff needs to map to the last byte of shadow memory. Since: SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET 0x6ee00000 = (0x7f000000 >> 3) + KASAN_SHADOW_OFFSET KASAN_SHADOW_OFFSET = 0x6ee00000 - (0x7f000000 >> 3) KASAN_SHADOW_OFFSET = 0x6ee00000 - 0x0fe00000 KASAN_SHADOW_OFFSET = 0x5f000000 - 0x9f000000 if we have 3G userspace / 1G kernelspace split, and this is the default split for ARM: - The kernel address space is 1GB (0x40000000) - PAGE_OFFSET is set to 0xc0000000 so the kernel static image uses 0xc0000000 .. 0xffffffff. - On top of that we have the MODULES_VADDR which under the worst case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) = 0xbf000000 so the modules use addresses 0xbf000000 .. 0xbfffffff - So the addresses 0xbf000000 .. 0xffffffff need to be covered with shadow memory. That is 0x41000000 bytes of memory. - 1/8 of that is needed for its shadow memory, so 0x08200000 bytes of shadow memory is needed. We "steal" that from the remaining lowmem. - The KASAN_SHADOW_START becomes 0xb6e00000, to KASAN_SHADOW_END at 0xbeffffff. - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel address as 0xbf000000 needs to map to the first byte of shadow memory and 0xffffffff needs to map to the last byte of shadow memory. Since: SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET 0xb6e00000 = (0xbf000000 >> 3) + KASAN_SHADOW_OFFSET KASAN_SHADOW_OFFSET = 0xb6e00000 - (0xbf000000 >> 3) KASAN_SHADOW_OFFSET = 0xb6e00000 - 0x17e00000 KASAN_SHADOW_OFFSET = 0x9f000000 - 0x8f000000 if we have 3G userspace / 1G kernelspace with full 1 GB low memory (VMSPLIT_3G_OPT): - The kernel address space is 1GB (0x40000000) - PAGE_OFFSET is set to 0xb0000000 so the kernel static image uses 0xb0000000 .. 0xffffffff. - On top of that we have the MODULES_VADDR which under the worst case (using ARM instructions) is PAGE_OFFSET - 16M (0x01000000) = 0xaf000000 so the modules use addresses 0xaf000000 .. 0xafffffff - So the addresses 0xaf000000 .. 0xffffffff need to be covered with shadow memory. That is 0x51000000 bytes of memory. - 1/8 of that is needed for its shadow memory, so 0x0a200000 bytes of shadow memory is needed. 
We "steal" that from the remaining lowmem. - The KASAN_SHADOW_START becomes 0xa4e00000, to KASAN_SHADOW_END at 0xaeffffff. - Now we can calculate the KASAN_SHADOW_OFFSET for any kernel address as 0xaf000000 needs to map to the first byte of shadow memory and 0xffffffff needs to map to the last byte of shadow memory. Since: SHADOW_ADDR = (address >> 3) + KASAN_SHADOW_OFFSET 0xa4e00000 = (0xaf000000 >> 3) + KASAN_SHADOW_OFFSET KASAN_SHADOW_OFFSET = 0xa4e00000 - (0xaf000000 >> 3) KASAN_SHADOW_OFFSET = 0xa4e00000 - 0x15e00000 KASAN_SHADOW_OFFSET = 0x8f000000 - The default value of 0xffffffff for KASAN_SHADOW_OFFSET is an error value. We should always match one of the above shadow offsets. When we do this, TASK_SIZE will sometimes get a bit odd values that will not fit into immediate mov assembly instructions. To account for this, we need to rewrite some assembly using TASK_SIZE like this: - mov r1, #TASK_SIZE + ldr r1, =TASK_SIZE or - cmp r4, #TASK_SIZE + ldr r0, =TASK_SIZE + cmp r4, r0 this is done to avoid the immediate #TASK_SIZE that need to fit into a limited number of bits. Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: kasan-dev@googlegroups.com Cc: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q Reported-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Abbott Liu <liuwenliang@huawei.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> (cherry picked from commit c12366ba441da2f6f2b915410aca2b5b39c16514) Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- Documentation/arm/memory.rst | 5 ++ arch/arm/Kconfig | 9 ++++ arch/arm/include/asm/kasan_def.h | 81 ++++++++++++++++++++++++++++++ arch/arm/include/asm/memory.h | 5 ++ arch/arm/include/asm/uaccess-asm.h | 2 +- arch/arm/kernel/entry-armv.S | 3 +- arch/arm/kernel/entry-common.S | 9 ++-- arch/arm/mm/mmu.c | 18 +++++++ 8 files changed, 127 insertions(+), 5 deletions(-) create mode 100644 arch/arm/include/asm/kasan_def.h diff --git a/Documentation/arm/memory.rst b/Documentation/arm/memory.rst index 34bb23c44a71..0cb1e2938823 100644 --- a/Documentation/arm/memory.rst +++ b/Documentation/arm/memory.rst @@ -77,6 +77,11 @@ MODULES_VADDR MODULES_END-1 Kernel module space Kernel modules inserted via insmod are placed here using dynamic mappings. +TASK_SIZE MODULES_VADDR-1 KASAn shadow memory when KASan is in use. + The range from MODULES_VADDR to the top + of the memory is shadowed here with 1 bit + per byte of memory. + 00001000 TASK_SIZE-1 User space mappings Per-thread mappings are placed here via the mmap() system call. 
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 002e0cf025f5..fc530fad13b2 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1324,6 +1324,15 @@ config PAGE_OFFSET default 0xB0000000 if VMSPLIT_3G_OPT default 0xC0000000 +config KASAN_SHADOW_OFFSET + hex + depends on KASAN + default 0x1f000000 if PAGE_OFFSET=0x40000000 + default 0x5f000000 if PAGE_OFFSET=0x80000000 + default 0x9f000000 if PAGE_OFFSET=0xC0000000 + default 0x8f000000 if PAGE_OFFSET=0xB0000000 + default 0xffffffff + config NR_CPUS int "Maximum number of CPUs (2-32)" range 2 32 diff --git a/arch/arm/include/asm/kasan_def.h b/arch/arm/include/asm/kasan_def.h new file mode 100644 index 000000000000..5739605aa7cf --- /dev/null +++ b/arch/arm/include/asm/kasan_def.h @@ -0,0 +1,81 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * arch/arm/include/asm/kasan_def.h + * + * Copyright (c) 2018 Huawei Technologies Co., Ltd. + * + * Author: Abbott Liu <liuwenliang@huawei.com> + */ + +#ifndef __ASM_KASAN_DEF_H +#define __ASM_KASAN_DEF_H + +#ifdef CONFIG_KASAN + +/* + * Define KASAN_SHADOW_OFFSET,KASAN_SHADOW_START and KASAN_SHADOW_END for + * the Arm kernel address sanitizer. We are "stealing" lowmem (the 4GB + * addressable by a 32bit architecture) out of the virtual address + * space to use as shadow memory for KASan as follows: + * + * +----+ 0xffffffff + * | | \ + * | | |-> Static kernel image (vmlinux) BSS and page table + * | |/ + * +----+ PAGE_OFFSET + * | | \ + * | | |-> Loadable kernel modules virtual address space area + * | |/ + * +----+ MODULES_VADDR = KASAN_SHADOW_END + * | | \ + * | | |-> The shadow area of kernel virtual address. + * | |/ + * +----+-> TASK_SIZE (start of kernel space) = KASAN_SHADOW_START the + * | |\ shadow address of MODULES_VADDR + * | | | + * | | | + * | | |-> The user space area in lowmem. The kernel address + * | | | sanitizer do not use this space, nor does it map it. + * | | | + * | | | + * | | | + * | | | + * | |/ + * ------ 0 + * + * 1) KASAN_SHADOW_START + * This value begins with the MODULE_VADDR's shadow address. It is the + * start of kernel virtual space. Since we have modules to load, we need + * to cover also that area with shadow memory so we can find memory + * bugs in modules. + * + * 2) KASAN_SHADOW_END + * This value is the 0x100000000's shadow address: the mapping that would + * be after the end of the kernel memory at 0xffffffff. It is the end of + * kernel address sanitizer shadow area. It is also the start of the + * module area. + * + * 3) KASAN_SHADOW_OFFSET: + * This value is used to map an address to the corresponding shadow + * address by the following formula: + * + * shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET; + * + * As you would expect, >> 3 is equal to dividing by 8, meaning each + * byte in the shadow memory covers 8 bytes of kernel memory, so one + * bit shadow memory per byte of kernel memory is used. + * + * The KASAN_SHADOW_OFFSET is provided in a Kconfig option depending + * on the VMSPLIT layout of the system: the kernel and userspace can + * split up lowmem in different ways according to needs, so we calculate + * the shadow offset depending on this. 
+ */ + +#define KASAN_SHADOW_SCALE_SHIFT 3 +#define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL) +#define KASAN_SHADOW_END ((UL(1) << (32 - KASAN_SHADOW_SCALE_SHIFT)) \ + + KASAN_SHADOW_OFFSET) +#define KASAN_SHADOW_START ((KASAN_SHADOW_END >> 3) + KASAN_SHADOW_OFFSET) + +#endif +#endif diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index f717d7122d9d..38a163f50130 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -18,6 +18,7 @@ #ifdef CONFIG_NEED_MACH_MEMORY_H #include <mach/memory.h> #endif +#include <asm/kasan_def.h> /* PAGE_OFFSET - the virtual address of the start of the kernel image */ #define PAGE_OFFSET UL(CONFIG_PAGE_OFFSET) @@ -28,7 +29,11 @@ * TASK_SIZE - the maximum size of a user space task. * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area */ +#ifndef CONFIG_KASAN #define TASK_SIZE (UL(CONFIG_PAGE_OFFSET) - UL(SZ_16M)) +#else +#define TASK_SIZE (KASAN_SHADOW_START) +#endif #define TASK_UNMAPPED_BASE ALIGN(TASK_SIZE / 3, SZ_16M) /* diff --git a/arch/arm/include/asm/uaccess-asm.h b/arch/arm/include/asm/uaccess-asm.h index 907571fd05c6..e6eb7a2aaf1e 100644 --- a/arch/arm/include/asm/uaccess-asm.h +++ b/arch/arm/include/asm/uaccess-asm.h @@ -85,7 +85,7 @@ */ .macro uaccess_entry, tsk, tmp0, tmp1, tmp2, disable ldr \tmp1, [\tsk, #TI_ADDR_LIMIT] - mov \tmp2, #TASK_SIZE + ldr \tmp2, =TASK_SIZE str \tmp2, [\tsk, #TI_ADDR_LIMIT] DACR( mrc p15, 0, \tmp0, c3, c0, 0) DACR( str \tmp0, [sp, #SVC_DACR]) diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S index 1c9e6d1452c5..0ea8529a4872 100644 --- a/arch/arm/kernel/entry-armv.S +++ b/arch/arm/kernel/entry-armv.S @@ -406,7 +406,8 @@ ENDPROC(__fiq_abt) @ if it was interrupted in a critical region. Here we @ perform a quick test inline since it should be false @ 99.9999% of the time. The rest is done out of line. - cmp r4, #TASK_SIZE + ldr r0, =TASK_SIZE + cmp r4, r0 blhs kuser_cmpxchg64_fixup #endif #endif diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index 271cb8a1eba1..fee279e28a72 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -50,7 +50,8 @@ __ret_fast_syscall: UNWIND(.cantunwind ) disable_irq_notrace @ disable interrupts ldr r2, [tsk, #TI_ADDR_LIMIT] - cmp r2, #TASK_SIZE + ldr r1, =TASK_SIZE + cmp r2, r1 blne addr_limit_check_failed ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK @@ -87,7 +88,8 @@ __ret_fast_syscall: #endif disable_irq_notrace @ disable interrupts ldr r2, [tsk, #TI_ADDR_LIMIT] - cmp r2, #TASK_SIZE + ldr r1, =TASK_SIZE + cmp r2, r1 blne addr_limit_check_failed ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK @@ -128,7 +130,8 @@ ret_slow_syscall: disable_irq_notrace @ disable interrupts ENTRY(ret_to_user_from_irq) ldr r2, [tsk, #TI_ADDR_LIMIT] - cmp r2, #TASK_SIZE + ldr r1, =TASK_SIZE + cmp r2, r1 blne addr_limit_check_failed ldr r1, [tsk, #TI_FLAGS] tst r1, #_TIF_WORK_MASK diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index fa259825310c..c06ebfbc48c4 100644 --- a/arch/arm/mm/mmu.c +++ b/arch/arm/mm/mmu.c @@ -29,6 +29,7 @@ #include <asm/procinfo.h> #include <asm/memory.h> #include <asm/pgalloc.h> +#include <asm/kasan_def.h> #include <asm/mach/arch.h> #include <asm/mach/map.h> @@ -1255,8 +1256,25 @@ static inline void prepare_page_table(void) /* * Clear out all the mappings below the kernel image. 
*/ +#ifdef CONFIG_KASAN + /* + * KASan's shadow memory inserts itself between the TASK_SIZE + * and MODULES_VADDR. Do not clear the KASan shadow memory mappings. + */ + for (addr = 0; addr < KASAN_SHADOW_START; addr += PMD_SIZE) + pmd_clear(pmd_off_k(addr)); + /* + * Skip over the KASan shadow area. KASAN_SHADOW_END is sometimes + * equal to MODULES_VADDR and then we exit the pmd clearing. If we + * are using a thumb-compiled kernel, there there will be 8MB more + * to clear as KASan always offset to 16 MB below MODULES_VADDR. + */ + for (addr = KASAN_SHADOW_END; addr < MODULES_VADDR; addr += PMD_SIZE) + pmd_clear(pmd_off_k(addr)); +#else for (addr = 0; addr < MODULES_VADDR; addr += PMD_SIZE) pmd_clear(pmd_off_k(addr)); +#endif #ifdef CONFIG_XIP_KERNEL /* The XIP kernel is mapped in the module area -- skip over it */ -- 2.22.0
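
The offset arithmetic above is easy to verify on any host. A small check program for the default 3G/1G split (all values taken from the commit text):

    #include <stdint.h>
    #include <stdio.h>

    #define KASAN_SHADOW_SCALE_SHIFT 3
    #define KASAN_SHADOW_OFFSET 0x9f000000u	/* PAGE_OFFSET = 0xC0000000 */

    static uint32_t shadow(uint32_t addr)
    {
    	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
    }

    int main(void)
    {
    	/* first shadow byte: the shadow of MODULES_VADDR */
    	printf("shadow(0xbf000000) = 0x%08x\n", shadow(0xbf000000u)); /* 0xb6e00000 */
    	/* last shadow byte: the shadow of the last kernel address */
    	printf("shadow(0xffffffff) = 0x%08x\n", shadow(0xffffffffu)); /* 0xbeffffff */
    	return 0;
    }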

From: Linus Walleij <linus.walleij@linaro.org> mainline inclusion from mainline-5.11-rc1 commit 5615f69bc2097452ecc954f5264d784e158d6801 category: feature feature: ARM KASAN support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- This patch initializes KASan shadow region's page table and memory. There are two stages to KASan initialization: 1. At early boot stage the whole shadow region is mapped to just one physical page (kasan_zero_page). This is finished by the function kasan_early_init, which is called by __mmap_switched (arch/arm/kernel/head-common.S) 2. After the call to paging_init, we use kasan_zero_page as the zero shadow for some memory that KASan does not need to track, and we allocate a new shadow space for the other memory that KASan needs to track. These steps are finished by the function kasan_init, which is called by setup_arch. When using KASan we also need to increase the THREAD_SIZE_ORDER from 1 to 2 as the extra calls for shadow memory use quite a bit of stack. As we need to make a temporary copy of the PGD when setting up shadow memory we create a helpful PGD_SIZE definition for both LPAE and non-LPAE setups. The KASan core code unconditionally calls pud_populate() so this needs to be changed from BUG() to do {} while (0) when building with KASan enabled. After the initial development by Andrey Ryabinin several modifications have been made to this code: Abbott Liu <liuwenliang@huawei.com> - Add support for ARM LPAE: if LPAE is enabled, the KASan shadow region's mapping table needs to be copied in the pgd_alloc() function. - Change kasan_pte_populate, kasan_pmd_populate, kasan_pud_populate, kasan_pgd_populate from the .meminit.text section to the .init.text section. Reported by Florian Fainelli <f.fainelli@gmail.com> Linus Walleij <linus.walleij@linaro.org>: - Drop the custom manipulation of TTBR0 and just use cpu_switch_mm() to switch the pgd table. - Adapt to handle 4th level page table folding. - Rewrite the entire page directory and page entry initialization sequence to be recursive based on arm64's kasan_init.c. Ard Biesheuvel <ardb@kernel.org>: - Necessary underlying fixes. - Crucial bug fixes to the memory set-up code. 
Co-developed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Co-developed-by: Abbott Liu <liuwenliang@huawei.com> Co-developed-by: Ard Biesheuvel <ardb@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: kasan-dev@googlegroups.com Cc: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q Reported-by: Russell King - ARM Linux <rmk+kernel@armlinux.org.uk> Reported-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Abbott Liu <liuwenliang@huawei.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> (cherry picked from commit 5615f69bc2097452ecc954f5264d784e158d6801) Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/include/asm/kasan.h | 33 ++++ arch/arm/include/asm/pgalloc.h | 8 +- arch/arm/include/asm/thread_info.h | 8 + arch/arm/kernel/head-common.S | 3 + arch/arm/kernel/setup.c | 2 + arch/arm/mm/Makefile | 3 + arch/arm/mm/kasan_init.c | 291 +++++++++++++++++++++++++++++ arch/arm/mm/pgd.c | 16 +- 8 files changed, 362 insertions(+), 2 deletions(-) create mode 100644 arch/arm/include/asm/kasan.h create mode 100644 arch/arm/mm/kasan_init.c diff --git a/arch/arm/include/asm/kasan.h b/arch/arm/include/asm/kasan.h new file mode 100644 index 000000000000..303c35df3135 --- /dev/null +++ b/arch/arm/include/asm/kasan.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * arch/arm/include/asm/kasan.h + * + * Copyright (c) 2015 Samsung Electronics Co., Ltd. + * Author: Andrey Ryabinin <ryabinin.a.a@gmail.com> + * + */ + +#ifndef __ASM_KASAN_H +#define __ASM_KASAN_H + +#ifdef CONFIG_KASAN + +#include <asm/kasan_def.h> + +#define KASAN_SHADOW_SCALE_SHIFT 3 + +/* + * The compiler uses a shadow offset assuming that addresses start + * from 0. 
Kernel addresses don't start from 0, so shadow + * for kernel really starts from 'compiler's shadow offset' + + * ('kernel address space start' >> KASAN_SHADOW_SCALE_SHIFT) + */ + +asmlinkage void kasan_early_init(void); +extern void kasan_init(void); + +#else +static inline void kasan_init(void) { } +#endif + +#endif diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h index 15f4674715f8..fdee1f04f4f3 100644 --- a/arch/arm/include/asm/pgalloc.h +++ b/arch/arm/include/asm/pgalloc.h @@ -21,6 +21,7 @@ #define _PAGE_KERNEL_TABLE (PMD_TYPE_TABLE | PMD_BIT4 | PMD_DOMAIN(DOMAIN_KERNEL)) #ifdef CONFIG_ARM_LPAE +#define PGD_SIZE (PTRS_PER_PGD * sizeof(pgd_t)) static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) { @@ -28,14 +29,19 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) } #else /* !CONFIG_ARM_LPAE */ +#define PGD_SIZE (PAGE_SIZE << 2) /* * Since we have only two-level page tables, these are trivial */ #define pmd_alloc_one(mm,addr) ({ BUG(); ((pmd_t *)2); }) #define pmd_free(mm, pmd) do { } while (0) +#ifdef CONFIG_KASAN +/* The KASan core unconditionally calls pud_populate() on all architectures */ +#define pud_populate(mm,pmd,pte) do { } while (0) +#else #define pud_populate(mm,pmd,pte) BUG() - +#endif #endif /* CONFIG_ARM_LPAE */ extern pgd_t *pgd_alloc(struct mm_struct *mm); diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h index 536b6b979f63..56fae7861fd3 100644 --- a/arch/arm/include/asm/thread_info.h +++ b/arch/arm/include/asm/thread_info.h @@ -13,7 +13,15 @@ #include <asm/fpstate.h> #include <asm/page.h> +#ifdef CONFIG_KASAN +/* + * KASan uses a lot of extra stack space so the thread size order needs to + * be increased. + */ +#define THREAD_SIZE_ORDER 2 +#else #define THREAD_SIZE_ORDER 1 +#endif #define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER) #define THREAD_START_SP (THREAD_SIZE - 8) diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S index 6840c7c60a85..89c80154b9ef 100644 --- a/arch/arm/kernel/head-common.S +++ b/arch/arm/kernel/head-common.S @@ -111,6 +111,9 @@ __mmap_switched: str r8, [r2] @ Save atags pointer cmp r3, #0 strne r10, [r3] @ Save control register values +#ifdef CONFIG_KASAN + bl kasan_early_init +#endif mov lr, #0 b start_kernel ENDPROC(__mmap_switched) diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index f90479d8b50c..d32f7652a5bf 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -59,6 +59,7 @@ #include <asm/unwind.h> #include <asm/memblock.h> #include <asm/virt.h> +#include <asm/kasan.h> #include "atags.h" @@ -1145,6 +1146,7 @@ void __init setup_arch(char **cmdline_p) early_ioremap_reset(); paging_init(mdesc); + kasan_init(); request_standard_resources(mdesc); if (mdesc->restart) diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile index 99699c32d8a5..4536159bc8fa 100644 --- a/arch/arm/mm/Makefile +++ b/arch/arm/mm/Makefile @@ -113,3 +113,6 @@ obj-$(CONFIG_CACHE_L2X0_PMU) += cache-l2x0-pmu.o obj-$(CONFIG_CACHE_XSC3L2) += cache-xsc3l2.o obj-$(CONFIG_CACHE_TAUROS2) += cache-tauros2.o obj-$(CONFIG_CACHE_UNIPHIER) += cache-uniphier.o + +KASAN_SANITIZE_kasan_init.o := n +obj-$(CONFIG_KASAN) += kasan_init.o diff --git a/arch/arm/mm/kasan_init.c b/arch/arm/mm/kasan_init.c new file mode 100644 index 000000000000..9c348042a724 --- /dev/null +++ b/arch/arm/mm/kasan_init.c @@ -0,0 +1,291 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * This file contains kasan initialization code for 
ARM. + * + * Copyright (c) 2018 Samsung Electronics Co., Ltd. + * Author: Andrey Ryabinin <ryabinin.a.a@gmail.com> + * Author: Linus Walleij <linus.walleij@linaro.org> + */ + +#define pr_fmt(fmt) "kasan: " fmt +#include <linux/kasan.h> +#include <linux/kernel.h> +#include <linux/memblock.h> +#include <linux/sched/task.h> +#include <linux/start_kernel.h> +#include <linux/pgtable.h> +#include <asm/cputype.h> +#include <asm/highmem.h> +#include <asm/mach/map.h> +#include <asm/memory.h> +#include <asm/page.h> +#include <asm/pgalloc.h> +#include <asm/procinfo.h> +#include <asm/proc-fns.h> + +#include "mm.h" + +static pgd_t tmp_pgd_table[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE); + +pmd_t tmp_pmd_table[PTRS_PER_PMD] __page_aligned_bss; + +static __init void *kasan_alloc_block(size_t size) +{ + return memblock_alloc_try_nid(size, size, __pa(MAX_DMA_ADDRESS), + MEMBLOCK_ALLOC_KASAN, NUMA_NO_NODE); +} + +static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr, + unsigned long end, bool early) +{ + unsigned long next; + pte_t *ptep = pte_offset_kernel(pmdp, addr); + + do { + pte_t entry; + void *p; + + next = addr + PAGE_SIZE; + + if (!early) { + if (!pte_none(READ_ONCE(*ptep))) + continue; + + p = kasan_alloc_block(PAGE_SIZE); + if (!p) { + panic("%s failed to allocate shadow page for address 0x%lx\n", + __func__, addr); + return; + } + memset(p, KASAN_SHADOW_INIT, PAGE_SIZE); + entry = pfn_pte(virt_to_pfn(p), + __pgprot(pgprot_val(PAGE_KERNEL))); + } else if (pte_none(READ_ONCE(*ptep))) { + /* + * The early shadow memory is mapping all KASan + * operations to one and the same page in memory, + * "kasan_early_shadow_page" so that the instrumentation + * will work on a scratch area until we can set up the + * proper KASan shadow memory. + */ + entry = pfn_pte(virt_to_pfn(kasan_early_shadow_page), + __pgprot(_L_PTE_DEFAULT | L_PTE_DIRTY | L_PTE_XN)); + } else { + /* + * Early shadow mappings are PMD_SIZE aligned, so if the + * first entry is already set, they must all be set. + */ + return; + } + + set_pte_at(&init_mm, addr, ptep, entry); + } while (ptep++, addr = next, addr != end); +} + +/* + * The pmd (page middle directory) is only used on LPAE + */ +static void __init kasan_pmd_populate(pud_t *pudp, unsigned long addr, + unsigned long end, bool early) +{ + unsigned long next; + pmd_t *pmdp = pmd_offset(pudp, addr); + + do { + if (pmd_none(*pmdp)) { + /* + * We attempt to allocate a shadow block for the PMDs + * used by the PTEs for this address if it isn't already + * allocated. + */ + void *p = early ? 
kasan_early_shadow_pte : + kasan_alloc_block(PAGE_SIZE); + + if (!p) { + panic("%s failed to allocate shadow block for address 0x%lx\n", + __func__, addr); + return; + } + pmd_populate_kernel(&init_mm, pmdp, p); + flush_pmd_entry(pmdp); + } + + next = pmd_addr_end(addr, end); + kasan_pte_populate(pmdp, addr, next, early); + } while (pmdp++, addr = next, addr != end); +} + +static void __init kasan_pgd_populate(unsigned long addr, unsigned long end, + bool early) +{ + unsigned long next; + pgd_t *pgdp; + p4d_t *p4dp; + pud_t *pudp; + + pgdp = pgd_offset_k(addr); + + do { + /* + * Allocate and populate the shadow block of p4d folded into + * pud folded into pmd if it doesn't already exist + */ + if (!early && pgd_none(*pgdp)) { + void *p = kasan_alloc_block(PAGE_SIZE); + + if (!p) { + panic("%s failed to allocate shadow block for address 0x%lx\n", + __func__, addr); + return; + } + pgd_populate(&init_mm, pgdp, p); + } + + next = pgd_addr_end(addr, end); + /* + * We just immediately jump over the p4d and pud page + * directories since we believe ARM32 will never gain four + * nor five level page tables. + */ + p4dp = p4d_offset(pgdp, addr); + pudp = pud_offset(p4dp, addr); + + kasan_pmd_populate(pudp, addr, next, early); + } while (pgdp++, addr = next, addr != end); +} + +extern struct proc_info_list *lookup_processor_type(unsigned int); + +void __init kasan_early_init(void) +{ + struct proc_info_list *list; + + /* + * locate processor in the list of supported processor + * types. The linker builds this table for us from the + * entries in arch/arm/mm/proc-*.S + */ + list = lookup_processor_type(read_cpuid_id()); + if (list) { +#ifdef MULTI_CPU + processor = *list->proc; +#endif + } + + BUILD_BUG_ON((KASAN_SHADOW_END - (1UL << 29)) != KASAN_SHADOW_OFFSET); + /* + * We walk the page table and set all of the shadow memory to point + * to the scratch page. + */ + kasan_pgd_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, true); +} + +static void __init clear_pgds(unsigned long start, + unsigned long end) +{ + for (; start && start < end; start += PMD_SIZE) + pmd_clear(pmd_off_k(start)); +} + +static int __init create_mapping(void *start, void *end) +{ + void *shadow_start, *shadow_end; + + shadow_start = kasan_mem_to_shadow(start); + shadow_end = kasan_mem_to_shadow(end); + + pr_info("Mapping kernel virtual memory block: %px-%px at shadow: %px-%px\n", + start, end, shadow_start, shadow_end); + + kasan_pgd_populate((unsigned long)shadow_start & PAGE_MASK, + PAGE_ALIGN((unsigned long)shadow_end), false); + return 0; +} + +void __init kasan_init(void) +{ + phys_addr_t pa_start, pa_end; + u64 i; + + /* + * We are going to perform proper setup of shadow memory. + * + * At first we should unmap early shadow (clear_pgds() call bellow). + * However, instrumented code can't execute without shadow memory. + * + * To keep the early shadow memory MMU tables around while setting up + * the proper shadow memory, we copy swapper_pg_dir (the initial page + * table) to tmp_pgd_table and use that to keep the early shadow memory + * mapped until the full shadow setup is finished. Then we swap back + * to the proper swapper_pg_dir. 
+ */ + + memcpy(tmp_pgd_table, swapper_pg_dir, sizeof(tmp_pgd_table)); +#ifdef CONFIG_ARM_LPAE + /* We need to be in the same PGD or this won't work */ + BUILD_BUG_ON(pgd_index(KASAN_SHADOW_START) != + pgd_index(KASAN_SHADOW_END)); + memcpy(tmp_pmd_table, + pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_START)), + sizeof(tmp_pmd_table)); + set_pgd(&tmp_pgd_table[pgd_index(KASAN_SHADOW_START)], + __pgd(__pa(tmp_pmd_table) | PMD_TYPE_TABLE | L_PGD_SWAPPER)); +#endif + cpu_switch_mm(tmp_pgd_table, &init_mm); + local_flush_tlb_all(); + + clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END); + + kasan_populate_early_shadow(kasan_mem_to_shadow((void *)VMALLOC_START), + kasan_mem_to_shadow((void *)-1UL) + 1); + + for_each_mem_range(i, &pa_start, &pa_end) { + void *start = __va(pa_start); + void *end = __va(pa_end); + + /* Do not attempt to shadow highmem */ + if (pa_start >= arm_lowmem_limit) { + pr_info("Skip highmem block at %pa-%pa\n", &pa_start, &pa_end); + continue; + } + if (pa_end > arm_lowmem_limit) { + pr_info("Truncating shadow for memory block at %pa-%pa to lowmem region at %pa\n", + &pa_start, &pa_end, &arm_lowmem_limit); + end = __va(arm_lowmem_limit); + } + if (start >= end) { + pr_info("Skipping invalid memory block %pa-%pa (virtual %p-%p)\n", + &pa_start, &pa_end, start, end); + continue; + } + + create_mapping(start, end); + } + + /* + * 1. The module global variables are in MODULES_VADDR ~ MODULES_END, + * so we need to map this area. + * 2. PKMAP_BASE ~ PKMAP_BASE+PMD_SIZE's shadow and MODULES_VADDR + * ~ MODULES_END's shadow is in the same PMD_SIZE, so we can't + * use kasan_populate_zero_shadow. + */ + create_mapping((void *)MODULES_VADDR, (void *)(PKMAP_BASE + PMD_SIZE)); + + /* + * KAsan may reuse the contents of kasan_early_shadow_pte directly, so + * we should make sure that it maps the zero page read-only. + */ + for (i = 0; i < PTRS_PER_PTE; i++) + set_pte_at(&init_mm, KASAN_SHADOW_START + i*PAGE_SIZE, + &kasan_early_shadow_pte[i], + pfn_pte(virt_to_pfn(kasan_early_shadow_page), + __pgprot(pgprot_val(PAGE_KERNEL) + | L_PTE_RDONLY))); + + cpu_switch_mm(swapper_pg_dir, &init_mm); + local_flush_tlb_all(); + + memset(kasan_early_shadow_page, 0, PAGE_SIZE); + pr_info("Kernel address sanitizer initialized\n"); + init_task.kasan_depth = 0; +} diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c index c5e1b27046a8..f8e9bc58a84f 100644 --- a/arch/arm/mm/pgd.c +++ b/arch/arm/mm/pgd.c @@ -66,7 +66,21 @@ pgd_t *pgd_alloc(struct mm_struct *mm) new_pmd = pmd_alloc(mm, new_pud, 0); if (!new_pmd) goto no_pmd; -#endif +#ifdef CONFIG_KASAN + /* + * Copy PMD table for KASAN shadow mappings. + */ + init_pgd = pgd_offset_k(TASK_SIZE); + init_p4d = p4d_offset(init_pgd, TASK_SIZE); + init_pud = pud_offset(init_p4d, TASK_SIZE); + init_pmd = pmd_offset(init_pud, TASK_SIZE); + new_pmd = pmd_offset(new_pud, TASK_SIZE); + memcpy(new_pmd, init_pmd, + (pmd_index(MODULES_VADDR) - pmd_index(TASK_SIZE)) + * sizeof(pmd_t)); + clean_dcache_area(new_pmd, PTRS_PER_PMD * sizeof(pmd_t)); +#endif /* CONFIG_KASAN */ +#endif /* CONFIG_LPAE */ if (!vectors_high()) { /* -- 2.22.0
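
Two size consequences of the patch, worked out for the usual 4 KiB page (the LPAE figure assumes the standard 4-entry first level):

    THREAD_SIZE = PAGE_SIZE << THREAD_SIZE_ORDER
                = 4096 << 1 =  8 KiB   (without KASan)
                = 4096 << 2 = 16 KiB   (with KASan)

    PGD_SIZE    = PAGE_SIZE << 2 = 16 KiB              (non-LPAE: 4096 4-byte entries)
    PGD_SIZE    = PTRS_PER_PGD * sizeof(pgd_t) = 32 B  (LPAE: 4 x 8-byte entries)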

From: Linus Walleij <linus.walleij@linaro.org> mainline inclusion from mainline-5.11-rc1 commit 421015713b306e47af95d4d61cdfbd96d462e4cb category: feature feature: ARM KASAN support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- This patch enables the kernel address sanitizer for ARM. XIP_KERNEL has not been tested and is therefore not allowed for now. Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: kasan-dev@googlegroups.com Acked-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q Signed-off-by: Abbott Liu <liuwenliang@huawei.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> (cherry picked from commit 421015713b306e47af95d4d61cdfbd96d462e4cb) Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- Documentation/dev-tools/kasan.rst | 4 ++-- Documentation/features/debug/KASAN/arch-support.txt | 2 +- arch/arm/Kconfig | 1 + 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index 2b68addaadcd..b3e489064a18 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -18,8 +18,8 @@ out-of-bounds accesses for global variables is only supported since Clang 11. Tag-based KASAN is only supported in Clang. -Currently generic KASAN is supported for the x86_64, arm64, xtensa, s390 and -riscv architectures, and tag-based KASAN is supported only for arm64. +Currently generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390 +and riscv architectures, and tag-based KASAN is supported only for arm64. Usage ----- diff --git a/Documentation/features/debug/KASAN/arch-support.txt b/Documentation/features/debug/KASAN/arch-support.txt index c3fe9b266e7b..b2288dc14b72 100644 --- a/Documentation/features/debug/KASAN/arch-support.txt +++ b/Documentation/features/debug/KASAN/arch-support.txt @@ -8,7 +8,7 @@ ----------------------- | alpha: | TODO | | arc: | TODO | - | arm: | TODO | + | arm: | ok | | arm64: | ok | | c6x: | TODO | | csky: | TODO | diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index fc530fad13b2..5516dd83409e 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -68,6 +68,7 @@ config ARM select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6 select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU + select HAVE_ARCH_KASAN if MMU && !XIP_KERNEL select HAVE_ARCH_MMAP_RND_BITS if MMU select HAVE_ARCH_SECCOMP select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT -- 2.22.0
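For readers new to KASAN: once this option is selected, the compiler instruments loads and stores with a shadow-memory check before each access. The following is a simplified illustration of the single-byte fast path, not the exact mm/kasan code; kasan_shadow_offset again stands in for the Kconfig KASAN_SHADOW_OFFSET:

    #include <stdint.h>

    /*
     * Illustrative fast path of the check behind __asan_load1() and
     * friends under generic KASAN. Real code also handles multi-byte
     * accesses, reporting and quarantine.
     */
    static uintptr_t kasan_shadow_offset;

    static int kasan_byte_accessible_sketch(uintptr_t addr)
    {
    	int8_t shadow = *(int8_t *)((addr >> 3) + kasan_shadow_offset);

    	/* 0: whole granule valid; 1..7: that many leading bytes valid */
    	return shadow == 0 || (int8_t)(addr & 7) < shadow;
    }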

From: Fangrui Song <maskray@google.com> mainline inclusion from mainline-5.11-rc1 commit 735e8d93dc2b107f7891a9c2b1c4cfbea1fcbbbc category: feature feature: ARM KASAN support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Commit d6d51a96c7d6 ("ARM: 9014/2: Replace string mem* functions for KASan") added .weak directives to memcpy/memmove/memset to avoid collision with KASAN interceptors. This does not work with LLVM's integrated assembler (the assembly snippet `.weak memcpy ... .globl memcpy` produces a STB_GLOBAL memcpy while GNU as produces a STB_WEAK memcpy). LLVM 12 (since https://reviews.llvm.org/D90108) will error on such an overridden symbol binding. Use the appropriate WEAK macro instead. Link: https://github.com/ClangBuiltLinux/linux/issues/1190 Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/lib/memcpy.S | 3 +-- arch/arm/lib/memmove.S | 3 +-- arch/arm/lib/memset.S | 3 +-- 3 files changed, 3 insertions(+), 6 deletions(-) diff --git a/arch/arm/lib/memcpy.S b/arch/arm/lib/memcpy.S index ad4625d16e11..e4caf48c089f 100644 --- a/arch/arm/lib/memcpy.S +++ b/arch/arm/lib/memcpy.S @@ -58,10 +58,9 @@ /* Prototype: void *memcpy(void *dest, const void *src, size_t n); */ -.weak memcpy ENTRY(__memcpy) ENTRY(mmiocpy) -ENTRY(memcpy) +WEAK(memcpy) #include "copy_template.S" diff --git a/arch/arm/lib/memmove.S b/arch/arm/lib/memmove.S index fd123ea5a5a4..6fecc12a1f51 100644 --- a/arch/arm/lib/memmove.S +++ b/arch/arm/lib/memmove.S @@ -24,9 +24,8 @@ * occurring in the opposite direction. */ -.weak memmove ENTRY(__memmove) -ENTRY(memmove) +WEAK(memmove) UNWIND( .fnstart ) subs ip, r0, r1 diff --git a/arch/arm/lib/memset.S b/arch/arm/lib/memset.S index 0e7ff0423f50..9817cb258c1a 100644 --- a/arch/arm/lib/memset.S +++ b/arch/arm/lib/memset.S @@ -13,10 +13,9 @@ .text .align 5 -.weak memset ENTRY(__memset) ENTRY(mmioset) -ENTRY(memset) +WEAK(memset) UNWIND( .fnstart ) ands r3, r0, #3 @ 1 unaligned? mov ip, r0 @ preserve r0 as return value -- 2.22.0
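The binding problem is easier to see in C terms: a weak definition may be overridden by a strong one at link time without a multiple-definition error, which is exactly how the KASan interceptor takes over memcpy(). A hedged sketch outside the kernel build:

    #include <stddef.h>

    /*
     * C equivalent of the WEAK() asm macro: this definition gets
     * STB_WEAK binding, so a strong memcpy() linked in elsewhere
     * (e.g. a sanitizer interceptor) silently wins at link time.
     */
    __attribute__((weak)) void *memcpy(void *dest, const void *src, size_t n)
    {
    	char *d = dest;
    	const char *s = src;

    	while (n--)
    		*d++ = *s++;
    	return dest;
    }

The bug fixed here was that GNU as honoured the `.weak` directive despite the later `.globl` emitted by ENTRY(), while LLVM's integrated assembler let the `.globl` win, leaving the symbol STB_GLOBAL and clashing with the interceptor.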

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 4d576cab16f57e1f87978f6997a725179398341e category: feature feature: ARM KASAN support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- KASAN uses the routines in stacktrace.c to capture the call stack each time memory gets allocated or freed. Some of these routines are also used to log CPU and memory context when exceptions are taken, and so in some cases, memory accesses may be made that are not strictly in line with the KASAN constraints, and may therefore trigger false KASAN positives. So follow the example set by other architectures, and simply disable KASAN instrumentation for these routines. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> (cherry picked from commit 4d576cab16f57e1f87978f6997a725179398341e) Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/Makefile | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile index 89e5d864e923..f6431b46866e 100644 --- a/arch/arm/kernel/Makefile +++ b/arch/arm/kernel/Makefile @@ -21,6 +21,9 @@ obj-y := elf.o entry-common.o irq.o opcodes.o \ setup.o signal.o sigreturn_codes.o \ stacktrace.o sys_arm.o time.o traps.o +KASAN_SANITIZE_stacktrace.o := n +KASAN_SANITIZE_traps.o := n + ifneq ($(CONFIG_ARM_UNWIND),y) obj-$(CONFIG_FRAME_POINTER) += return_address.o endif -- 2.22.0
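The Makefile knobs above opt whole objects out of instrumentation. The same is possible per function through the compiler attribute that the kernel wraps as __no_sanitize_address; a sketch with a made-up helper:

    /*
     * Per-function opt-out, the finer-grained cousin of the
     * KASAN_SANITIZE_foo.o := n lines above. The function name is
     * illustrative only.
     */
    #define __no_sanitize_address __attribute__((no_sanitize_address))

    static __no_sanitize_address unsigned long
    read_frame_word(unsigned long *fp, int idx)
    {
    	/* may deliberately read memory that KASAN considers poisoned */
    	return fp[idx];
    }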

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 22f2d23098f7d34fc5142531cffb241d14611684 category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- When using the new adr_l/ldr_l/str_l macros to refer to external symbols from modules, the linker may emit place relative ELF relocations that need to be fixed up by the module loader. So add support for these. Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 22f2d23098f7d34fc5142531cffb241d14611684) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/include/asm/elf.h | 5 +++++ arch/arm/kernel/module.c | 20 ++++++++++++++++++-- 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h index b078d992414b..0ac62a54b73c 100644 --- a/arch/arm/include/asm/elf.h +++ b/arch/arm/include/asm/elf.h @@ -51,6 +51,7 @@ typedef struct user_fp elf_fpregset_t; #define R_ARM_NONE 0 #define R_ARM_PC24 1 #define R_ARM_ABS32 2 +#define R_ARM_REL32 3 #define R_ARM_CALL 28 #define R_ARM_JUMP24 29 #define R_ARM_TARGET1 38 @@ -58,11 +59,15 @@ typedef struct user_fp elf_fpregset_t; #define R_ARM_PREL31 42 #define R_ARM_MOVW_ABS_NC 43 #define R_ARM_MOVT_ABS 44 +#define R_ARM_MOVW_PREL_NC 45 +#define R_ARM_MOVT_PREL 46 #define R_ARM_THM_CALL 10 #define R_ARM_THM_JUMP24 30 #define R_ARM_THM_MOVW_ABS_NC 47 #define R_ARM_THM_MOVT_ABS 48 +#define R_ARM_THM_MOVW_PREL_NC 49 +#define R_ARM_THM_MOVT_PREL 50 /* * These are used to set parameters in the core dumps. 
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c index e15444b25ca0..beac45e89ba6 100644 --- a/arch/arm/kernel/module.c +++ b/arch/arm/kernel/module.c @@ -185,14 +185,24 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex, *(u32 *)loc |= offset & 0x7fffffff; break; + case R_ARM_REL32: + *(u32 *)loc += sym->st_value - loc; + break; + case R_ARM_MOVW_ABS_NC: case R_ARM_MOVT_ABS: + case R_ARM_MOVW_PREL_NC: + case R_ARM_MOVT_PREL: offset = tmp = __mem_to_opcode_arm(*(u32 *)loc); offset = ((offset & 0xf0000) >> 4) | (offset & 0xfff); offset = (offset ^ 0x8000) - 0x8000; offset += sym->st_value; - if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS) + if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_PREL || + ELF32_R_TYPE(rel->r_info) == R_ARM_MOVW_PREL_NC) + offset -= loc; + if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS || + ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_PREL) offset >>= 16; tmp &= 0xfff0f000; @@ -283,6 +293,8 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex, case R_ARM_THM_MOVW_ABS_NC: case R_ARM_THM_MOVT_ABS: + case R_ARM_THM_MOVW_PREL_NC: + case R_ARM_THM_MOVT_PREL: upper = __mem_to_opcode_thumb16(*(u16 *)loc); lower = __mem_to_opcode_thumb16(*(u16 *)(loc + 2)); @@ -302,7 +314,11 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex, offset = (offset ^ 0x8000) - 0x8000; offset += sym->st_value; - if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS) + if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_PREL || + ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVW_PREL_NC) + offset -= loc; + if (ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_ABS || + ELF32_R_TYPE(rel->r_info) == R_ARM_THM_MOVT_PREL) offset >>= 16; upper = (u16)((upper & 0xfbf0) | -- 2.22.0
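For the new relocation types the value written is place relative, i.e. S + A - P (symbol value, plus the addend recovered from the instruction, minus the address of the patch site), as the R_ARM_REL32 case shows most directly; the MOVW/MOVT PREL pairs apply the same subtraction and then split the result into 16-bit halves. A hedged sketch of that arithmetic:

    #include <stdint.h>

    /*
     * Sketch of the place-relative arithmetic in apply_relocate()
     * above: value = S + A - P. MOVW takes the low half, MOVT the
     * high half.
     */
    static uint32_t arm_prel_value(uint32_t sym,    /* S: sym->st_value */
                                   int32_t addend,  /* A: from the insn */
                                   uint32_t loc,    /* P: patch site    */
                                   int is_movt)
    {
    	uint32_t offset = sym + addend - loc;

    	return is_movt ? offset >> 16 : offset & 0xffff;
    }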

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit eae78e1a97201a81a851342ad9659b60f61a3951 category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Move the phys2virt patching code into a separate .S file before doing some work on it. Suggested-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit eae78e1a97201a81a851342ad9659b60f61a3951) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/Makefile | 1 + arch/arm/kernel/head.S | 138 -------------------------------- arch/arm/kernel/phys2virt.S | 151 ++++++++++++++++++++++++++++++++++++ 3 files changed, 152 insertions(+), 138 deletions(-) create mode 100644 arch/arm/kernel/phys2virt.S diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile index f6431b46866e..c38c8b019d24 100644 --- a/arch/arm/kernel/Makefile +++ b/arch/arm/kernel/Makefile @@ -95,6 +95,7 @@ obj-$(CONFIG_PARAVIRT) += paravirt.o head-y := head$(MMUEXT).o obj-$(CONFIG_DEBUG_LL) += debug.o obj-$(CONFIG_EARLY_PRINTK) += early_printk.o +obj-$(CONFIG_ARM_PATCH_PHYS_VIRT) += phys2virt.o # This is executed very early using a temporary stack when no memory allocator # nor global data is available. Everything has to be allocated on the stack. diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index 4af5c7679624..dbc409a66555 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -585,142 +585,4 @@ ENTRY(fixup_smp) ldmfd sp!, {r4 - r6, pc} ENDPROC(fixup_smp) -#ifdef __ARMEB__ -#define LOW_OFFSET 0x4 -#define HIGH_OFFSET 0x0 -#else -#define LOW_OFFSET 0x0 -#define HIGH_OFFSET 0x4 -#endif - -#ifdef CONFIG_ARM_PATCH_PHYS_VIRT - -/* __fixup_pv_table - patch the stub instructions with the delta between - * PHYS_OFFSET and PAGE_OFFSET, which is assumed to be 16MiB aligned and - * can be expressed by an immediate shifter operand. The stub instruction - * has a form of '(add|sub) rd, rn, #imm'. - */ - __HEAD -__fixup_pv_table: - adr r0, 1f - ldmia r0, {r3-r7} - mvn ip, #0 - subs r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET - add r4, r4, r3 @ adjust table start address - add r5, r5, r3 @ adjust table end address - add r6, r6, r3 @ adjust __pv_phys_pfn_offset address - add r7, r7, r3 @ adjust __pv_offset address - mov r0, r8, lsr #PAGE_SHIFT @ convert to PFN - str r0, [r6] @ save computed PHYS_OFFSET to __pv_phys_pfn_offset - strcc ip, [r7, #HIGH_OFFSET] @ save to __pv_offset high bits - mov r6, r3, lsr #24 @ constant for add/sub instructions - teq r3, r6, lsl #24 @ must be 16MiB aligned -THUMB( it ne @ cross section branch ) - bne __error - str r3, [r7, #LOW_OFFSET] @ save to __pv_offset low bits - b __fixup_a_pv_table -ENDPROC(__fixup_pv_table) - - .align -1: .long . 
- .long __pv_table_begin - .long __pv_table_end -2: .long __pv_phys_pfn_offset - .long __pv_offset - - .text -__fixup_a_pv_table: - adr r0, 3f - ldr r6, [r0] - add r6, r6, r3 - ldr r0, [r6, #HIGH_OFFSET] @ pv_offset high word - ldr r6, [r6, #LOW_OFFSET] @ pv_offset low word - mov r6, r6, lsr #24 - cmn r0, #1 -#ifdef CONFIG_THUMB2_KERNEL - moveq r0, #0x200000 @ set bit 21, mov to mvn instruction - lsls r6, #24 - beq 2f - clz r7, r6 - lsr r6, #24 - lsl r6, r7 - bic r6, #0x0080 - lsrs r7, #1 - orrcs r6, #0x0080 - orr r6, r6, r7, lsl #12 - orr r6, #0x4000 - b 2f -1: add r7, r3 - ldrh ip, [r7, #2] -ARM_BE8(rev16 ip, ip) - tst ip, #0x4000 - and ip, #0x8f00 - orrne ip, r6 @ mask in offset bits 31-24 - orreq ip, r0 @ mask in offset bits 7-0 -ARM_BE8(rev16 ip, ip) - strh ip, [r7, #2] - bne 2f - ldrh ip, [r7] -ARM_BE8(rev16 ip, ip) - bic ip, #0x20 - orr ip, ip, r0, lsr #16 -ARM_BE8(rev16 ip, ip) - strh ip, [r7] -2: cmp r4, r5 - ldrcc r7, [r4], #4 @ use branch for delay slot - bcc 1b - bx lr -#else - moveq r0, #0x400000 @ set bit 22, mov to mvn instruction - b 2f -1: ldr ip, [r7, r3] -#ifdef CONFIG_CPU_ENDIAN_BE8 - @ in BE8, we load data in BE, but instructions still in LE - bic ip, ip, #0xff000000 - tst ip, #0x000f0000 @ check the rotation field - orrne ip, ip, r6, lsl #24 @ mask in offset bits 31-24 - biceq ip, ip, #0x00004000 @ clear bit 22 - orreq ip, ip, r0, ror #8 @ mask in offset bits 7-0 -#else - bic ip, ip, #0x000000ff - tst ip, #0xf00 @ check the rotation field - orrne ip, ip, r6 @ mask in offset bits 31-24 - biceq ip, ip, #0x400000 @ clear bit 22 - orreq ip, ip, r0 @ mask in offset bits 7-0 -#endif - str ip, [r7, r3] -2: cmp r4, r5 - ldrcc r7, [r4], #4 @ use branch for delay slot - bcc 1b - ret lr -#endif -ENDPROC(__fixup_a_pv_table) - - .align -3: .long __pv_offset - -ENTRY(fixup_pv_table) - stmfd sp!, {r4 - r7, lr} - mov r3, #0 @ no offset - mov r4, r0 @ r0 = table start - add r5, r0, r1 @ r1 = table size - bl __fixup_a_pv_table - ldmfd sp!, {r4 - r7, pc} -ENDPROC(fixup_pv_table) - - .data - .align 2 - .globl __pv_phys_pfn_offset - .type __pv_phys_pfn_offset, %object -__pv_phys_pfn_offset: - .word 0 - .size __pv_phys_pfn_offset, . -__pv_phys_pfn_offset - - .globl __pv_offset - .type __pv_offset, %object -__pv_offset: - .quad 0 - .size __pv_offset, . -__pv_offset -#endif - #include "head-common.S" diff --git a/arch/arm/kernel/phys2virt.S b/arch/arm/kernel/phys2virt.S new file mode 100644 index 000000000000..7c17fbfeeedd --- /dev/null +++ b/arch/arm/kernel/phys2virt.S @@ -0,0 +1,151 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 1994-2002 Russell King + * Copyright (c) 2003 ARM Limited + * All Rights Reserved + */ + +#include <linux/init.h> +#include <linux/linkage.h> +#include <asm/assembler.h> +#include <asm/page.h> + +#ifdef __ARMEB__ +#define LOW_OFFSET 0x4 +#define HIGH_OFFSET 0x0 +#else +#define LOW_OFFSET 0x0 +#define HIGH_OFFSET 0x4 +#endif + +/* + * __fixup_pv_table - patch the stub instructions with the delta between + * PHYS_OFFSET and PAGE_OFFSET, which is assumed to be + * 16MiB aligned. 
+ * + * Called from head.S, which expects the following registers to be preserved: + * r1 = machine no, r2 = atags or dtb, + * r8 = phys_offset, r9 = cpuid, r10 = procinfo + */ + __HEAD +ENTRY(__fixup_pv_table) + adr r0, 1f + ldmia r0, {r3-r7} + mvn ip, #0 + subs r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET + add r4, r4, r3 @ adjust table start address + add r5, r5, r3 @ adjust table end address + add r6, r6, r3 @ adjust __pv_phys_pfn_offset address + add r7, r7, r3 @ adjust __pv_offset address + mov r0, r8, lsr #PAGE_SHIFT @ convert to PFN + str r0, [r6] @ save computed PHYS_OFFSET to __pv_phys_pfn_offset + strcc ip, [r7, #HIGH_OFFSET] @ save to __pv_offset high bits + mov r6, r3, lsr #24 @ constant for add/sub instructions + teq r3, r6, lsl #24 @ must be 16MiB aligned + bne 0f + str r3, [r7, #LOW_OFFSET] @ save to __pv_offset low bits + b __fixup_a_pv_table +0: mov r0, r0 @ deadloop on error + b 0b +ENDPROC(__fixup_pv_table) + + .align +1: .long . + .long __pv_table_begin + .long __pv_table_end +2: .long __pv_phys_pfn_offset + .long __pv_offset + + .text +__fixup_a_pv_table: + adr r0, 3f + ldr r6, [r0] + add r6, r6, r3 + ldr r0, [r6, #HIGH_OFFSET] @ pv_offset high word + ldr r6, [r6, #LOW_OFFSET] @ pv_offset low word + mov r6, r6, lsr #24 + cmn r0, #1 +#ifdef CONFIG_THUMB2_KERNEL + moveq r0, #0x200000 @ set bit 21, mov to mvn instruction + lsls r6, #24 + beq 2f + clz r7, r6 + lsr r6, #24 + lsl r6, r7 + bic r6, #0x0080 + lsrs r7, #1 + orrcs r6, #0x0080 + orr r6, r6, r7, lsl #12 + orr r6, #0x4000 + b 2f +1: add r7, r3 + ldrh ip, [r7, #2] +ARM_BE8(rev16 ip, ip) + tst ip, #0x4000 + and ip, #0x8f00 + orrne ip, r6 @ mask in offset bits 31-24 + orreq ip, r0 @ mask in offset bits 7-0 +ARM_BE8(rev16 ip, ip) + strh ip, [r7, #2] + bne 2f + ldrh ip, [r7] +ARM_BE8(rev16 ip, ip) + bic ip, #0x20 + orr ip, ip, r0, lsr #16 +ARM_BE8(rev16 ip, ip) + strh ip, [r7] +2: cmp r4, r5 + ldrcc r7, [r4], #4 @ use branch for delay slot + bcc 1b + bx lr +#else + moveq r0, #0x400000 @ set bit 22, mov to mvn instruction + b 2f +1: ldr ip, [r7, r3] +#ifdef CONFIG_CPU_ENDIAN_BE8 + @ in BE8, we load data in BE, but instructions still in LE + bic ip, ip, #0xff000000 + tst ip, #0x000f0000 @ check the rotation field + orrne ip, ip, r6, lsl #24 @ mask in offset bits 31-24 + biceq ip, ip, #0x00004000 @ clear bit 22 + orreq ip, ip, r0, ror #8 @ mask in offset bits 7-0 +#else + bic ip, ip, #0x000000ff + tst ip, #0xf00 @ check the rotation field + orrne ip, ip, r6 @ mask in offset bits 31-24 + biceq ip, ip, #0x400000 @ clear bit 22 + orreq ip, ip, r0 @ mask in offset bits 7-0 +#endif + str ip, [r7, r3] +2: cmp r4, r5 + ldrcc r7, [r4], #4 @ use branch for delay slot + bcc 1b + ret lr +#endif +ENDPROC(__fixup_a_pv_table) + + .align +3: .long __pv_offset + +ENTRY(fixup_pv_table) + stmfd sp!, {r4 - r7, lr} + mov r3, #0 @ no offset + mov r4, r0 @ r0 = table start + add r5, r0, r1 @ r1 = table size + bl __fixup_a_pv_table + ldmfd sp!, {r4 - r7, pc} +ENDPROC(fixup_pv_table) + + .data + .align 2 + .globl __pv_phys_pfn_offset + .type __pv_phys_pfn_offset, %object +__pv_phys_pfn_offset: + .word 0 + .size __pv_phys_pfn_offset, . -__pv_phys_pfn_offset + + .globl __pv_offset + .type __pv_offset, %object +__pv_offset: + .quad 0 + .size __pv_offset, . -__pv_offset -- 2.22.0

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 4b16421c3e955f440eb45546db6ce33d47f29c78 category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- The ARM and Thumb2 versions of the p2v patching loop have some overlap at the end of the loop, so factor that out. As numeric labels are not required to be unique, and may therefore be ambiguous, use named local labels for the start and end of the loop instead. Acked-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 4b16421c3e955f440eb45546db6ce33d47f29c78) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/phys2virt.S | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/arch/arm/kernel/phys2virt.S b/arch/arm/kernel/phys2virt.S index 7c17fbfeeedd..8fb1f7bcc720 100644 --- a/arch/arm/kernel/phys2virt.S +++ b/arch/arm/kernel/phys2virt.S @@ -68,7 +68,7 @@ __fixup_a_pv_table: #ifdef CONFIG_THUMB2_KERNEL moveq r0, #0x200000 @ set bit 21, mov to mvn instruction lsls r6, #24 - beq 2f + beq .Lnext clz r7, r6 lsr r6, #24 lsl r6, r7 @@ -77,8 +77,8 @@ __fixup_a_pv_table: orrcs r6, #0x0080 orr r6, r6, r7, lsl #12 orr r6, #0x4000 - b 2f -1: add r7, r3 + b .Lnext +.Lloop: add r7, r3 ldrh ip, [r7, #2] ARM_BE8(rev16 ip, ip) tst ip, #0x4000 @@ -87,21 +87,17 @@ ARM_BE8(rev16 ip, ip) orreq ip, r0 @ mask in offset bits 7-0 ARM_BE8(rev16 ip, ip) strh ip, [r7, #2] - bne 2f + bne .Lnext ldrh ip, [r7] ARM_BE8(rev16 ip, ip) bic ip, #0x20 orr ip, ip, r0, lsr #16 ARM_BE8(rev16 ip, ip) strh ip, [r7] -2: cmp r4, r5 - ldrcc r7, [r4], #4 @ use branch for delay slot - bcc 1b - bx lr #else moveq r0, #0x400000 @ set bit 22, mov to mvn instruction - b 2f -1: ldr ip, [r7, r3] + b .Lnext +.Lloop: ldr ip, [r7, r3] #ifdef CONFIG_CPU_ENDIAN_BE8 @ in BE8, we load data in BE, but instructions still in LE bic ip, ip, #0xff000000 @@ -117,11 +113,13 @@ ARM_BE8(rev16 ip, ip) orreq ip, ip, r0 @ mask in offset bits 7-0 #endif str ip, [r7, r3] -2: cmp r4, r5 +#endif + +.Lnext: + cmp r4, r5 ldrcc r7, [r4], #4 @ use branch for delay slot - bcc 1b + bcc .Lloop ret lr -#endif ENDPROC(__fixup_a_pv_table) .align -- 2.22.0

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 7a94849e81b5c10e71f0a555300313c2789d9b0d category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- The big and little endian versions of the ARM p2v patching routine only differ in the values of the constants, so factor those out into macros so that we only have one version of the logic sequence to maintain. Acked-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 7a94849e81b5c10e71f0a555300313c2789d9b0d) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/phys2virt.S | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/arch/arm/kernel/phys2virt.S b/arch/arm/kernel/phys2virt.S index 8fb1f7bcc720..5031e5a2e78b 100644 --- a/arch/arm/kernel/phys2virt.S +++ b/arch/arm/kernel/phys2virt.S @@ -95,23 +95,25 @@ ARM_BE8(rev16 ip, ip) ARM_BE8(rev16 ip, ip) strh ip, [r7] #else - moveq r0, #0x400000 @ set bit 22, mov to mvn instruction - b .Lnext -.Lloop: ldr ip, [r7, r3] #ifdef CONFIG_CPU_ENDIAN_BE8 - @ in BE8, we load data in BE, but instructions still in LE - bic ip, ip, #0xff000000 - tst ip, #0x000f0000 @ check the rotation field - orrne ip, ip, r6, lsl #24 @ mask in offset bits 31-24 - biceq ip, ip, #0x00004000 @ clear bit 22 - orreq ip, ip, r0, ror #8 @ mask in offset bits 7-0 +@ in BE8, we load data in BE, but instructions still in LE +#define PV_BIT22 0x00004000 +#define PV_IMM8_MASK 0xff000000 +#define PV_ROT_MASK 0x000f0000 #else - bic ip, ip, #0x000000ff - tst ip, #0xf00 @ check the rotation field - orrne ip, ip, r6 @ mask in offset bits 31-24 - biceq ip, ip, #0x400000 @ clear bit 22 - orreq ip, ip, r0 @ mask in offset bits 7-0 +#define PV_BIT22 0x00400000 +#define PV_IMM8_MASK 0x000000ff +#define PV_ROT_MASK 0xf00 #endif + + moveq r0, #0x400000 @ set bit 22, mov to mvn instruction + b .Lnext +.Lloop: ldr ip, [r7, r3] + bic ip, ip, #PV_IMM8_MASK + tst ip, #PV_ROT_MASK @ check the rotation field + orrne ip, ip, r6 ARM_BE8(, lsl #24) @ mask in offset bits 31-24 + biceq ip, ip, #PV_BIT22 @ clear bit 22 + orreq ip, ip, r0 ARM_BE8(, ror #8) @ mask in offset bits 7-0 (or bit 22) str ip, [r7, r3] #endif -- 2.22.0

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 0869f3b9da38889faef2ccafcf675c713d4a3aa8 category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- We always pass the same value for 'type' so pull it into the __pv_stub macro itself. Acked-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 0869f3b9da38889faef2ccafcf675c713d4a3aa8) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/include/asm/memory.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index 38a163f50130..4201c10325d0 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -193,14 +193,14 @@ extern const void *__pv_table_begin, *__pv_table_end; #define PHYS_OFFSET ((phys_addr_t)__pv_phys_pfn_offset << PAGE_SHIFT) #define PHYS_PFN_OFFSET (__pv_phys_pfn_offset) -#define __pv_stub(from,to,instr,type) \ +#define __pv_stub(from,to,instr) \ __asm__("@ __pv_stub\n" \ "1: " instr " %0, %1, %2\n" \ " .pushsection .pv_table,\"a\"\n" \ " .long 1b\n" \ " .popsection\n" \ : "=r" (to) \ - : "r" (from), "I" (type)) + : "r" (from), "I" (__PV_BITS_31_24)) #define __pv_stub_mov_hi(t) \ __asm__ volatile("@ __pv_stub_mov\n" \ @@ -227,7 +227,7 @@ static inline phys_addr_t __virt_to_phys_nodebug(unsigned long x) phys_addr_t t; if (sizeof(phys_addr_t) == 4) { - __pv_stub(x, t, "add", __PV_BITS_31_24); + __pv_stub(x, t, "add"); } else { __pv_stub_mov_hi(t); __pv_add_carry_stub(x, t); @@ -245,7 +245,7 @@ static inline unsigned long __phys_to_virt(phys_addr_t x) * assembler expression receives 32 bit argument * in place where 'r' 32 bit operand is expected. */ - __pv_stub((unsigned long) x, t, "sub", __PV_BITS_31_24); + __pv_stub((unsigned long) x, t, "sub"); return t; } -- 2.22.0

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 2730e8eaa4f2baccc03296e0c5ee109c0673fe5f category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Free up a register in the p2v patching code by switching to relative references, which don't require keeping the phys-to-virt displacement live in a register. Acked-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 2730e8eaa4f2baccc03296e0c5ee109c0673fe5f) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/include/asm/memory.h | 6 +++--- arch/arm/kernel/phys2virt.S | 18 +++++++----------- 2 files changed, 10 insertions(+), 14 deletions(-) diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index 4201c10325d0..96814304a797 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -197,7 +197,7 @@ extern const void *__pv_table_begin, *__pv_table_end; __asm__("@ __pv_stub\n" \ "1: " instr " %0, %1, %2\n" \ " .pushsection .pv_table,\"a\"\n" \ - " .long 1b\n" \ + " .long 1b - .\n" \ " .popsection\n" \ : "=r" (to) \ : "r" (from), "I" (__PV_BITS_31_24)) @@ -206,7 +206,7 @@ extern const void *__pv_table_begin, *__pv_table_end; __asm__ volatile("@ __pv_stub_mov\n" \ "1: mov %R0, %1\n" \ " .pushsection .pv_table,\"a\"\n" \ - " .long 1b\n" \ + " .long 1b - .\n" \ " .popsection\n" \ : "=r" (t) \ : "I" (__PV_BITS_7_0)) @@ -216,7 +216,7 @@ extern const void *__pv_table_begin, *__pv_table_end; "1: adds %Q0, %1, %2\n" \ " adc %R0, %R0, #0\n" \ " .pushsection .pv_table,\"a\"\n" \ - " .long 1b\n" \ + " .long 1b - .\n" \ " .popsection\n" \ : "+r" (y) \ : "r" (x), "I" (__PV_BITS_31_24) \ diff --git a/arch/arm/kernel/phys2virt.S b/arch/arm/kernel/phys2virt.S index 5031e5a2e78b..8e4be15e1559 100644 --- a/arch/arm/kernel/phys2virt.S +++ b/arch/arm/kernel/phys2virt.S @@ -58,9 +58,7 @@ ENDPROC(__fixup_pv_table) .text __fixup_a_pv_table: - adr r0, 3f - ldr r6, [r0] - add r6, r6, r3 + adr_l r6, __pv_offset ldr r0, [r6, #HIGH_OFFSET] @ pv_offset high word ldr r6, [r6, #LOW_OFFSET] @ pv_offset low word mov r6, r6, lsr #24 @@ -78,7 +76,8 @@ __fixup_a_pv_table: orr r6, r6, r7, lsl #12 orr r6, #0x4000 b .Lnext -.Lloop: add r7, r3 +.Lloop: add r7, r4 + adds r4, #4 ldrh ip, [r7, #2] ARM_BE8(rev16 ip, ip) tst ip, #0x4000 @@ -108,28 +107,25 @@ ARM_BE8(rev16 ip, ip) moveq r0, #0x400000 @ set bit 22, mov to mvn instruction b .Lnext -.Lloop: ldr ip, [r7, r3] +.Lloop: ldr ip, [r7, r4] bic ip, ip, #PV_IMM8_MASK tst ip, #PV_ROT_MASK @ check the rotation field orrne ip, ip, r6 ARM_BE8(, lsl #24) @ mask in offset bits 31-24 biceq ip, ip, #PV_BIT22 @ clear bit 22 orreq ip, ip, r0 ARM_BE8(, ror #8) @ mask in offset bits 7-0 (or bit 22) - str ip, [r7, r3] + str ip, [r7, r4] + add r4, r4, #4 #endif .Lnext: cmp r4, r5 - ldrcc r7, [r4], #4 @ use branch for delay slot + ldrcc r7, [r4] @ use branch for delay slot bcc .Lloop ret lr ENDPROC(__fixup_a_pv_table) - .align -3: .long __pv_offset - ENTRY(fixup_pv_table) stmfd sp!, {r4 - r7, lr} - mov r3, #0 @ no offset mov r4, r0 @ r0 = table start add r5, r0, r1 @ r1 = table size bl __fixup_a_pv_table -- 2.22.0
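The payoff of storing `1b - .` instead of `1b` is that each .pv_table word now holds the offset from the entry to its patch site, so the site address can be reconstructed by plain addition regardless of whether the code runs at its link-time address. A minimal sketch in C terms:

    #include <stdint.h>

    /*
     * Sketch of resolving one place-relative .pv_table entry: the
     * word stores (patch_site - &entry), so adding the entry's own
     * runtime address recovers the site. No phys/virt delta needed.
     */
    static inline uint32_t *pv_patch_site(const int32_t *entry)
    {
    	return (uint32_t *)((uintptr_t)entry + *entry);
    }

This mirrors how the patching loop now adds the table cursor (the entry's address in r4) to the loaded offset, which is what frees the register that used to carry the phys-to-virt displacement.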

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 0e3db6c9d7f6fd0ee263325027e8d3fdac5a4c9e category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Declutter the code in __fixup_pv_table() by using the new adr_l/str_l macros to take PC relative references to external symbols, and by using the value of PHYS_OFFSET passed in r8 to calculate the p2v offset. Acked-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 0e3db6c9d7f6fd0ee263325027e8d3fdac5a4c9e) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/phys2virt.S | 34 ++++++++++++++-------------------- 1 file changed, 14 insertions(+), 20 deletions(-) diff --git a/arch/arm/kernel/phys2virt.S b/arch/arm/kernel/phys2virt.S index 8e4be15e1559..be8fb0d89877 100644 --- a/arch/arm/kernel/phys2virt.S +++ b/arch/arm/kernel/phys2virt.S @@ -29,33 +29,27 @@ */ __HEAD ENTRY(__fixup_pv_table) - adr r0, 1f - ldmia r0, {r3-r7} - mvn ip, #0 - subs r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET - add r4, r4, r3 @ adjust table start address - add r5, r5, r3 @ adjust table end address - add r6, r6, r3 @ adjust __pv_phys_pfn_offset address - add r7, r7, r3 @ adjust __pv_offset address mov r0, r8, lsr #PAGE_SHIFT @ convert to PFN - str r0, [r6] @ save computed PHYS_OFFSET to __pv_phys_pfn_offset - strcc ip, [r7, #HIGH_OFFSET] @ save to __pv_offset high bits - mov r6, r3, lsr #24 @ constant for add/sub instructions - teq r3, r6, lsl #24 @ must be 16MiB aligned + str_l r0, __pv_phys_pfn_offset, r3 + + adr_l r0, __pv_offset + subs r3, r8, #PAGE_OFFSET @ PHYS_OFFSET - PAGE_OFFSET + mvn ip, #0 + strcc ip, [r0, #HIGH_OFFSET] @ save to __pv_offset high bits + str r3, [r0, #LOW_OFFSET] @ save to __pv_offset low bits + + mov r0, r3, lsr #24 @ constant for add/sub instructions + teq r3, r0, lsl #24 @ must be 16MiB aligned bne 0f - str r3, [r7, #LOW_OFFSET] @ save to __pv_offset low bits + + adr_l r4, __pv_table_begin + adr_l r5, __pv_table_end b __fixup_a_pv_table + 0: mov r0, r0 @ deadloop on error b 0b ENDPROC(__fixup_pv_table) - .align -1: .long . - .long __pv_table_begin - .long __pv_table_end -2: .long __pv_phys_pfn_offset - .long __pv_offset - .text __fixup_a_pv_table: adr_l r6, __pv_offset -- 2.22.0

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit e8e00f5afb087912fb3edb225ee373aa6499bb79 category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- In preparation for reducing the phys-to-virt minimum relative alignment from 16 MiB to 2 MiB, switch to patchable sequences involving MOVW instructions that can more easily be manipulated to carry a 12-bit immediate. Note that the non-LPAE ARM sequence is not updated: MOVW may not be supported on non-LPAE platforms, and the sequence itself can be updated more easily to apply the 12 bits of displacement. For Thumb2, which has many more versions of opcodes, switch to a sequence that can be patched by the same patching code for both versions. Note that the Thumb2 opcodes for MOVW and MVN are unambiguous, and have no rotation bits in their immediate fields, so there is no need to use placeholder constants in the asm blocks. While at it, drop the 'volatile' qualifiers from the asm blocks: the code does not have any side effects that are invisible to the compiler, so it is free to omit these sequences if the outputs are not used. Suggested-by: Russell King <linux@armlinux.org.uk> Acked-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit e8e00f5afb087912fb3edb225ee373aa6499bb79) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/include/asm/memory.h | 44 +++++++--- arch/arm/kernel/phys2virt.S | 147 +++++++++++++++++++++++++++------- 2 files changed, 148 insertions(+), 43 deletions(-) diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index 96814304a797..47fb189d6234 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -193,6 +193,7 @@ extern const void *__pv_table_begin, *__pv_table_end; #define PHYS_OFFSET ((phys_addr_t)__pv_phys_pfn_offset << PAGE_SHIFT) #define PHYS_PFN_OFFSET (__pv_phys_pfn_offset) +#ifndef CONFIG_THUMB2_KERNEL #define __pv_stub(from,to,instr) \ __asm__("@ __pv_stub\n" \ "1: " instr " %0, %1, %2\n" \ @@ -202,25 +203,45 @@ extern const void *__pv_table_begin, *__pv_table_end; : "=r" (to) \ : "r" (from), "I" (__PV_BITS_31_24)) -#define __pv_stub_mov_hi(t) \ - __asm__ volatile("@ __pv_stub_mov\n" \ - "1: mov %R0, %1\n" \ +#define __pv_add_carry_stub(x, y) \ + __asm__("@ __pv_add_carry_stub\n" \ + "0: movw %R0, #0\n" \ + " adds %Q0, %1, %R0, lsl #24\n" \ + "1: mov %R0, %2\n" \ + " adc %R0, %R0, #0\n" \ " .pushsection .pv_table,\"a\"\n" \ - " .long 1b - .\n" \ + " .long 0b - ., 1b - .\n" \ " .popsection\n" \ - : "=r" (t) \ - : "I" (__PV_BITS_7_0)) + : "=&r" (y) \ + : "r" (x), "I" (__PV_BITS_7_0) \ + : "cc") + +#else +#define __pv_stub(from,to,instr) \ + __asm__("@ __pv_stub\n" \ + "0: movw %0, #0\n" \ + " lsl %0, #24\n" \ + " " instr " %0, %1, %0\n" \ + " .pushsection .pv_table,\"a\"\n" \ + " .long 0b - .\n" \ + " .popsection\n" \ + : "=&r" (to) \ + : "r" (from)) #define __pv_add_carry_stub(x, y) \ - __asm__ volatile("@ __pv_add_carry_stub\n" \ - "1: adds %Q0, %1, %2\n" \ + __asm__("@ __pv_add_carry_stub\n" \ + "0: movw %R0, #0\n" \ + " lsls %R0, #24\n" \ + " adds %Q0, %1, %R0\n" \ + "1: mvn %R0, #0\n" \ " adc %R0, %R0, #0\n" \ " .pushsection 
.pv_table,\"a\"\n" \ - " .long 1b - .\n" \ + " .long 0b - ., 1b - .\n" \ " .popsection\n" \ - : "+r" (y) \ - : "r" (x), "I" (__PV_BITS_31_24) \ + : "=&r" (y) \ + : "r" (x) \ : "cc") +#endif static inline phys_addr_t __virt_to_phys_nodebug(unsigned long x) { @@ -229,7 +250,6 @@ static inline phys_addr_t __virt_to_phys_nodebug(unsigned long x) if (sizeof(phys_addr_t) == 4) { __pv_stub(x, t, "add"); } else { - __pv_stub_mov_hi(t); __pv_add_carry_stub(x, t); } return t; diff --git a/arch/arm/kernel/phys2virt.S b/arch/arm/kernel/phys2virt.S index be8fb0d89877..a4e364689663 100644 --- a/arch/arm/kernel/phys2virt.S +++ b/arch/arm/kernel/phys2virt.S @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (C) 1994-2002 Russell King - * Copyright (c) 2003 ARM Limited + * Copyright (c) 2003, 2020 ARM Limited * All Rights Reserved */ @@ -58,55 +58,140 @@ __fixup_a_pv_table: mov r6, r6, lsr #24 cmn r0, #1 #ifdef CONFIG_THUMB2_KERNEL + @ + @ The Thumb-2 versions of the patchable sequences are + @ + @ phys-to-virt: movw <reg>, #offset<31:24> + @ lsl <reg>, #24 + @ sub <VA>, <PA>, <reg> + @ + @ virt-to-phys (non-LPAE): movw <reg>, #offset<31:24> + @ lsl <reg>, #24 + @ add <PA>, <VA>, <reg> + @ + @ virt-to-phys (LPAE): movw <reg>, #offset<31:24> + @ lsl <reg>, #24 + @ adds <PAlo>, <VA>, <reg> + @ mov <PAhi>, #offset<39:32> + @ adc <PAhi>, <PAhi>, #0 + @ + @ In the non-LPAE case, all patchable instructions are MOVW + @ instructions, where we need to patch in the offset into the + @ second halfword of the opcode (the 16-bit immediate is encoded + @ as imm4:i:imm3:imm8) + @ + @ 15 11 10 9 4 3 0 15 14 12 11 8 7 0 + @ +-----------+---+-------------+------++---+------+----+------+ + @ MOVW | 1 1 1 1 0 | i | 1 0 0 1 0 0 | imm4 || 0 | imm3 | Rd | imm8 | + @ +-----------+---+-------------+------++---+------+----+------+ + @ + @ In the LPAE case, we also need to patch in the high word of the + @ offset into the immediate field of the MOV instruction, or patch it + @ to a MVN instruction if the offset is negative. In this case, we + @ need to inspect the first halfword of the opcode, to check whether + @ it is MOVW or MOV/MVN, and to perform the MOV to MVN patching if + @ needed. 
The encoding of the immediate is rather complex for values + @ of i:imm3 != 0b0000, but fortunately, we never need more than 8 lower + @ order bits, which can be patched into imm8 directly (and i:imm3 + @ cleared) + @ + @ 15 11 10 9 5 0 15 14 12 11 8 7 0 + @ +-----------+---+---------------------++---+------+----+------+ + @ MOV | 1 1 1 1 0 | i | 0 0 0 1 0 0 1 1 1 1 || 0 | imm3 | Rd | imm8 | + @ MVN | 1 1 1 1 0 | i | 0 0 0 1 1 0 1 1 1 1 || 0 | imm3 | Rd | imm8 | + @ +-----------+---+---------------------++---+------+----+------+ + @ moveq r0, #0x200000 @ set bit 21, mov to mvn instruction - lsls r6, #24 - beq .Lnext - clz r7, r6 - lsr r6, #24 - lsl r6, r7 - bic r6, #0x0080 - lsrs r7, #1 - orrcs r6, #0x0080 - orr r6, r6, r7, lsl #12 - orr r6, #0x4000 b .Lnext .Lloop: add r7, r4 - adds r4, #4 - ldrh ip, [r7, #2] -ARM_BE8(rev16 ip, ip) - tst ip, #0x4000 - and ip, #0x8f00 - orrne ip, r6 @ mask in offset bits 31-24 - orreq ip, r0 @ mask in offset bits 7-0 -ARM_BE8(rev16 ip, ip) - strh ip, [r7, #2] - bne .Lnext + adds r4, #4 @ clears Z flag +#ifdef CONFIG_ARM_LPAE ldrh ip, [r7] ARM_BE8(rev16 ip, ip) - bic ip, #0x20 - orr ip, ip, r0, lsr #16 + tst ip, #0x200 @ MOVW has bit 9 set, MVN has it clear + bne 0f @ skip to MOVW handling (Z flag is clear) + bic ip, #0x20 @ clear bit 5 (MVN -> MOV) + orr ip, ip, r0, lsr #16 @ MOV -> MVN if offset < 0 ARM_BE8(rev16 ip, ip) strh ip, [r7] + @ Z flag is set +0: +#endif + ldrh ip, [r7, #2] +ARM_BE8(rev16 ip, ip) + and ip, #0xf00 @ clear everything except Rd field + orreq ip, r0 @ Z flag set -> MOV/MVN -> patch in high bits + orrne ip, r6 @ Z flag clear -> MOVW -> patch in low bits +ARM_BE8(rev16 ip, ip) + strh ip, [r7, #2] #else #ifdef CONFIG_CPU_ENDIAN_BE8 @ in BE8, we load data in BE, but instructions still in LE -#define PV_BIT22 0x00004000 +#define PV_BIT24 0x00000001 #define PV_IMM8_MASK 0xff000000 -#define PV_ROT_MASK 0x000f0000 #else -#define PV_BIT22 0x00400000 +#define PV_BIT24 0x01000000 #define PV_IMM8_MASK 0x000000ff -#define PV_ROT_MASK 0xf00 #endif + @ + @ The ARM versions of the patchable sequences are + @ + @ phys-to-virt: sub <VA>, <PA>, #offset<31:24>, lsl #24 + @ + @ virt-to-phys (non-LPAE): add <PA>, <VA>, #offset<31:24>, lsl #24 + @ + @ virt-to-phys (LPAE): movw <reg>, #offset<31:24> + @ adds <PAlo>, <VA>, <reg>, lsl #24 + @ mov <PAhi>, #offset<39:32> + @ adc <PAhi>, <PAhi>, #0 + @ + @ In the non-LPAE case, all patchable instructions are ADD or SUB + @ instructions, where we need to patch in the offset into the + @ immediate field of the opcode, which is emitted with the correct + @ rotation value. (The effective value of the immediate is imm12<7:0> + @ rotated right by [2 * imm12<11:8>] bits) + @ + @ 31 28 27 23 22 20 19 16 15 12 11 0 + @ +------+-----------------+------+------+-------+ + @ ADD | cond | 0 0 1 0 1 0 0 0 | Rn | Rd | imm12 | + @ SUB | cond | 0 0 1 0 0 1 0 0 | Rn | Rd | imm12 | + @ MOV | cond | 0 0 1 1 1 0 1 0 | Rn | Rd | imm12 | + @ MVN | cond | 0 0 1 1 1 1 1 0 | Rn | Rd | imm12 | + @ +------+-----------------+------+------+-------+ + @ + @ In the LPAE case, we use a MOVW instruction to carry the low offset + @ word, and patch in the high word of the offset into the immediate + @ field of the subsequent MOV instruction, or patch it to a MVN + @ instruction if the offset is negative. We can distinguish MOVW + @ instructions based on bits 23:22 of the opcode, and ADD/SUB can be + @ distinguished from MOV/MVN (all using the encodings above) using + @ bit 24. 
+ @ + @ 31 28 27 23 22 20 19 16 15 12 11 0 + @ +------+-----------------+------+------+-------+ + @ MOVW | cond | 0 0 1 1 0 0 0 0 | imm4 | Rd | imm12 | + @ +------+-----------------+------+------+-------+ + @ moveq r0, #0x400000 @ set bit 22, mov to mvn instruction b .Lnext .Lloop: ldr ip, [r7, r4] +#ifdef CONFIG_ARM_LPAE + tst ip, #PV_BIT24 @ ADD/SUB have bit 24 clear + beq 1f +ARM_BE8(rev ip, ip) + tst ip, #0xc00000 @ MOVW has bits 23:22 clear + bic ip, ip, #0x400000 @ clear bit 22 + bfc ip, #0, #12 @ clear imm12 field of MOV[W] instruction + orreq ip, ip, r6 @ MOVW -> mask in offset bits 31-24 + orrne ip, ip, r0 @ MOV -> mask in offset bits 7-0 (or bit 22) +ARM_BE8(rev ip, ip) + b 2f +1: +#endif bic ip, ip, #PV_IMM8_MASK - tst ip, #PV_ROT_MASK @ check the rotation field - orrne ip, ip, r6 ARM_BE8(, lsl #24) @ mask in offset bits 31-24 - biceq ip, ip, #PV_BIT22 @ clear bit 22 - orreq ip, ip, r0 ARM_BE8(, ror #8) @ mask in offset bits 7-0 (or bit 22) + orr ip, ip, r6 ARM_BE8(, lsl #24) @ mask in offset bits 31-24 +2: str ip, [r7, r4] add r4, r4, #4 #endif -- 2.22.0
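Since the patchable Thumb-2 opcodes funnel their immediates through the MOVW layout described above, the halfword surgery can be sketched in C. Here hw1/hw2 are the two native-order halfwords of the opcode, and the helper name is made up; the real patching code only touches the subfields it needs, whereas this sets a full 16-bit immediate:

    #include <stdint.h>

    /*
     * Sketch of patching a 16-bit immediate into a Thumb-2 MOVW,
     * which encodes it as imm4:i:imm3:imm8 split across the two
     * halfwords (see the layout diagram above).
     */
    static void thumb2_movw_set_imm16(uint16_t *hw1, uint16_t *hw2,
                                      uint16_t imm)
    {
    	*hw1 = (*hw1 & ~0x040f) |          /* clear i and imm4        */
    	       ((imm >> 12) & 0x000f) |    /* imm4 = imm[15:12]       */
    	       ((imm >> 1) & 0x0400);      /* i    = imm[11]          */
    	*hw2 = (*hw2 & ~0x70ff) |          /* clear imm3/imm8, keep Rd */
    	       ((imm << 4) & 0x7000) |     /* imm3 = imm[10:8]        */
    	       (imm & 0x00ff);             /* imm8 = imm[7:0]         */
    }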

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 9443076e4330a14ae2c6114307668b98a8293b77 category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- The ARM kernel's linear map starts at PAGE_OFFSET, which maps to a physical address (PHYS_OFFSET) that is platform specific, and is discovered at boot. Since we don't want to slow down translations between physical and virtual addresses by keeping the offset in a variable in memory, we implement this by patching the code performing the translation, and putting the offset between PAGE_OFFSET and the start of physical RAM directly into the instruction opcodes. As we only patch up to 8 bits of offset, yielding 4 GiB >> 8 == 16 MiB of granularity, we have to round up PHYS_OFFSET to the next multiple if the start of physical RAM is not a multiple of 16 MiB. This wastes some physical RAM, since the memory that was skipped will now live below PAGE_OFFSET, making it inaccessible to the kernel. We can improve this by changing the patchable sequences and the patching logic to carry more bits of offset: 11 bits gives us 4 GiB >> 11 == 2 MiB of granularity, and so we will never waste more than that amount by rounding up the physical start of DRAM to the next multiple of 2 MiB. (Note that 2 MiB granularity guarantees that the linear mapping can be created efficiently, whereas less than 2 MiB may result in the linear mapping needing another level of page tables) This helps Zhen Lei's scenario, where the start of DRAM is known to be occupied. It also helps EFI boot, which relies on the firmware's page allocator to allocate space for the decompressed kernel as low as possible. And if the KASLR patches ever land for 32-bit, it will give us 3 more bits of randomization of the placement of the kernel inside the linear region. For the ARM code path, it simply comes down to using two add/sub instructions instead of one for the carryless version, and patching each of them with the correct immediate depending on the rotation field. For the LPAE calculation, which has to deal with a carry, it patches the MOVW instruction with up to 12 bits of offset (but we only need 11 bits anyway) For the Thumb2 code path, patching more than 11 bits of displacement would be somewhat cumbersome, but the 11 bits we need fit nicely into the second word of the u16[2] opcode, so we simply update the immediate assignment and the left shift to create an addend of the right magnitude. Suggested-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 9443076e4330a14ae2c6114307668b98a8293b77) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/Kconfig | 2 +- arch/arm/include/asm/memory.h | 13 +++++++----- arch/arm/kernel/phys2virt.S | 40 +++++++++++++++++++++++------------ 3 files changed, 35 insertions(+), 20 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 5516dd83409e..97f9d004e6d0 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -245,7 +245,7 @@ config ARM_PATCH_PHYS_VIRT kernel in system memory. 
This can only be used with non-XIP MMU kernels where the base - of physical memory is at a 16MB boundary. + of physical memory is at a 2 MiB boundary. Only disable this option if you know that you do not require this feature (eg, building a kernel for a single machine) and diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index 47fb189d6234..2f841cb65c30 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -183,6 +183,7 @@ extern unsigned long vectors_base; * so that all we need to do is modify the 8-bit constant field. */ #define __PV_BITS_31_24 0x81000000 +#define __PV_BITS_23_16 0x810000 #define __PV_BITS_7_0 0x81 extern unsigned long __pv_phys_pfn_offset; @@ -197,16 +198,18 @@ extern const void *__pv_table_begin, *__pv_table_end; #define __pv_stub(from,to,instr) \ __asm__("@ __pv_stub\n" \ "1: " instr " %0, %1, %2\n" \ + "2: " instr " %0, %0, %3\n" \ " .pushsection .pv_table,\"a\"\n" \ - " .long 1b - .\n" \ + " .long 1b - ., 2b - .\n" \ " .popsection\n" \ : "=r" (to) \ - : "r" (from), "I" (__PV_BITS_31_24)) + : "r" (from), "I" (__PV_BITS_31_24), \ + "I"(__PV_BITS_23_16)) #define __pv_add_carry_stub(x, y) \ __asm__("@ __pv_add_carry_stub\n" \ "0: movw %R0, #0\n" \ - " adds %Q0, %1, %R0, lsl #24\n" \ + " adds %Q0, %1, %R0, lsl #20\n" \ "1: mov %R0, %2\n" \ " adc %R0, %R0, #0\n" \ " .pushsection .pv_table,\"a\"\n" \ @@ -220,7 +223,7 @@ extern const void *__pv_table_begin, *__pv_table_end; #define __pv_stub(from,to,instr) \ __asm__("@ __pv_stub\n" \ "0: movw %0, #0\n" \ - " lsl %0, #24\n" \ + " lsl %0, #21\n" \ " " instr " %0, %1, %0\n" \ " .pushsection .pv_table,\"a\"\n" \ " .long 0b - .\n" \ @@ -231,7 +234,7 @@ extern const void *__pv_table_begin, *__pv_table_end; #define __pv_add_carry_stub(x, y) \ __asm__("@ __pv_add_carry_stub\n" \ "0: movw %R0, #0\n" \ - " lsls %R0, #24\n" \ + " lsls %R0, #21\n" \ " adds %Q0, %1, %R0\n" \ "1: mvn %R0, #0\n" \ " adc %R0, %R0, #0\n" \ diff --git a/arch/arm/kernel/phys2virt.S b/arch/arm/kernel/phys2virt.S index a4e364689663..fb53db78fe78 100644 --- a/arch/arm/kernel/phys2virt.S +++ b/arch/arm/kernel/phys2virt.S @@ -21,7 +21,7 @@ /* * __fixup_pv_table - patch the stub instructions with the delta between * PHYS_OFFSET and PAGE_OFFSET, which is assumed to be - * 16MiB aligned. + * 2 MiB aligned. 
* * Called from head.S, which expects the following registers to be preserved: * r1 = machine no, r2 = atags or dtb, @@ -38,8 +38,8 @@ ENTRY(__fixup_pv_table) strcc ip, [r0, #HIGH_OFFSET] @ save to __pv_offset high bits str r3, [r0, #LOW_OFFSET] @ save to __pv_offset low bits - mov r0, r3, lsr #24 @ constant for add/sub instructions - teq r3, r0, lsl #24 @ must be 16MiB aligned + mov r0, r3, lsr #21 @ constant for add/sub instructions + teq r3, r0, lsl #21 @ must be 2 MiB aligned bne 0f adr_l r4, __pv_table_begin @@ -55,22 +55,21 @@ __fixup_a_pv_table: adr_l r6, __pv_offset ldr r0, [r6, #HIGH_OFFSET] @ pv_offset high word ldr r6, [r6, #LOW_OFFSET] @ pv_offset low word - mov r6, r6, lsr #24 cmn r0, #1 #ifdef CONFIG_THUMB2_KERNEL @ @ The Thumb-2 versions of the patchable sequences are @ - @ phys-to-virt: movw <reg>, #offset<31:24> - @ lsl <reg>, #24 + @ phys-to-virt: movw <reg>, #offset<31:21> + @ lsl <reg>, #21 @ sub <VA>, <PA>, <reg> @ - @ virt-to-phys (non-LPAE): movw <reg>, #offset<31:24> - @ lsl <reg>, #24 + @ virt-to-phys (non-LPAE): movw <reg>, #offset<31:21> + @ lsl <reg>, #21 @ add <PA>, <VA>, <reg> @ - @ virt-to-phys (LPAE): movw <reg>, #offset<31:24> - @ lsl <reg>, #24 + @ virt-to-phys (LPAE): movw <reg>, #offset<31:21> + @ lsl <reg>, #21 @ adds <PAlo>, <VA>, <reg> @ mov <PAhi>, #offset<39:32> @ adc <PAhi>, <PAhi>, #0 @@ -102,6 +101,9 @@ __fixup_a_pv_table: @ +-----------+---+---------------------++---+------+----+------+ @ moveq r0, #0x200000 @ set bit 21, mov to mvn instruction + lsrs r3, r6, #29 @ isolate top 3 bits of displacement + ubfx r6, r6, #21, #8 @ put bits 28:21 into the MOVW imm8 field + bfi r6, r3, #12, #3 @ put bits 31:29 into the MOVW imm3 field b .Lnext .Lloop: add r7, r4 adds r4, #4 @ clears Z flag @@ -129,20 +131,24 @@ ARM_BE8(rev16 ip, ip) @ in BE8, we load data in BE, but instructions still in LE #define PV_BIT24 0x00000001 #define PV_IMM8_MASK 0xff000000 +#define PV_IMMR_MSB 0x00080000 #else #define PV_BIT24 0x01000000 #define PV_IMM8_MASK 0x000000ff +#define PV_IMMR_MSB 0x00000800 #endif @ @ The ARM versions of the patchable sequences are @ @ phys-to-virt: sub <VA>, <PA>, #offset<31:24>, lsl #24 + @ sub <VA>, <PA>, #offset<23:16>, lsl #16 @ @ virt-to-phys (non-LPAE): add <PA>, <VA>, #offset<31:24>, lsl #24 + @ add <PA>, <VA>, #offset<23:16>, lsl #16 @ - @ virt-to-phys (LPAE): movw <reg>, #offset<31:24> - @ adds <PAlo>, <VA>, <reg>, lsl #24 + @ virt-to-phys (LPAE): movw <reg>, #offset<31:20> + @ adds <PAlo>, <VA>, <reg>, lsl #20 @ mov <PAhi>, #offset<39:32> @ adc <PAhi>, <PAhi>, #0 @ @@ -174,6 +180,9 @@ ARM_BE8(rev16 ip, ip) @ +------+-----------------+------+------+-------+ @ moveq r0, #0x400000 @ set bit 22, mov to mvn instruction + mov r3, r6, lsr #16 @ put offset bits 31-16 into r3 + mov r6, r6, lsr #24 @ put offset bits 31-24 into r6 + and r3, r3, #0xf0 @ only keep offset bits 23-20 in r3 b .Lnext .Lloop: ldr ip, [r7, r4] #ifdef CONFIG_ARM_LPAE @@ -183,14 +192,17 @@ ARM_BE8(rev ip, ip) tst ip, #0xc00000 @ MOVW has bits 23:22 clear bic ip, ip, #0x400000 @ clear bit 22 bfc ip, #0, #12 @ clear imm12 field of MOV[W] instruction - orreq ip, ip, r6 @ MOVW -> mask in offset bits 31-24 + orreq ip, ip, r6, lsl #4 @ MOVW -> mask in offset bits 31-24 + orreq ip, ip, r3, lsr #4 @ MOVW -> mask in offset bits 23-20 orrne ip, ip, r0 @ MOV -> mask in offset bits 7-0 (or bit 22) ARM_BE8(rev ip, ip) b 2f 1: #endif + tst ip, #PV_IMMR_MSB @ rotation value >= 16 ? 
bic ip, ip, #PV_IMM8_MASK - orr ip, ip, r6 ARM_BE8(, lsl #24) @ mask in offset bits 31-24 + orreq ip, ip, r6 ARM_BE8(, lsl #24) @ mask in offset bits 31-24 + orrne ip, ip, r3 ARM_BE8(, lsl #24) @ mask in offset bits 23-20 2: str ip, [r7, r4] add r4, r4, #4 -- 2.22.0
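The granularity claim is easy to check numerically: with n patchable offset bits the linear map may start at any multiple of 4 GiB >> n, and the RAM lost to rounding is the distance from the DRAM start up to the next granule boundary. A small sketch:

    #include <stdint.h>

    /*
     * Rounding-waste arithmetic from the commit message: 8 offset
     * bits give 4 GiB >> 8 == 16 MiB granules, 11 bits give 2 MiB.
     */
    static uint32_t p2v_rounding_waste(uint32_t dram_start, unsigned int bits)
    {
    	uint32_t granule = (uint32_t)(0x100000000ULL >> bits);

    	return (granule - dram_start) & (granule - 1);
    }

For example, DRAM starting at 0x00800000 (8 MiB) wastes 8 MiB when rounded up to a 16 MiB granule, and nothing with 2 MiB granules, since 8 MiB is already a multiple of 2 MiB.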

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 62c4a2e202b18e1d7176875b7e7af240f340596b category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Replace the open coded PC relative offset calculations with a pair of adr_l invocations. This removes some open coded arithmetic involving virtual addresses, avoids literal pools on v7+, and slightly reduces the footprint of the code. Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 62c4a2e202b18e1d7176875b7e7af240f340596b) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/head-common.S | 22 ++++++---------------- 1 file changed, 6 insertions(+), 16 deletions(-) diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S index 89c80154b9ef..29b2eda136bb 100644 --- a/arch/arm/kernel/head-common.S +++ b/arch/arm/kernel/head-common.S @@ -173,11 +173,12 @@ ENDPROC(lookup_processor_type) * r9 = cpuid (preserved) */ __lookup_processor_type: - adr r3, __lookup_processor_type_data - ldmia r3, {r4 - r6} - sub r3, r3, r4 @ get offset between virt&phys - add r5, r5, r3 @ convert virt addresses to - add r6, r6, r3 @ physical address space + /* + * Look in <asm/procinfo.h> for information about the __proc_info + * structure. + */ + adr_l r5, __proc_info_begin + adr_l r6, __proc_info_end 1: ldmia r5, {r3, r4} @ value, mask and r4, r4, r9 @ mask wanted bits teq r3, r4 @@ -189,17 +190,6 @@ __lookup_processor_type: 2: ret lr ENDPROC(__lookup_processor_type) -/* - * Look in <asm/procinfo.h> for information about the __proc_info structure. - */ - .align 2 - .type __lookup_processor_type_data, %object -__lookup_processor_type_data: - .long . - .long __proc_info_begin - .long __proc_info_end - .size __lookup_processor_type_data, . - __lookup_processor_type_data - __error_lpae: #ifdef CONFIG_DEBUG_LL adr r0, str_lpae -- 2.22.0
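For contrast, the removed literal-pool pattern works by storing the pool's own link-time address alongside the symbol addresses; subtracting the stored value from the pool's runtime address yields the delta to apply. A hedged C rendering of the old idiom (struct name invented):

    #include <stdint.h>

    /*
     * C rendering of the removed __lookup_processor_type_data idiom:
     * .long . / .long __proc_info_begin / .long __proc_info_end.
     * The runtime address of the literals minus the stored self
     * address gives the delta between link-time and runtime views.
     */
    struct proc_type_literals {
    	uint32_t self_vaddr;   /* .long .                  */
    	uint32_t info_begin;   /* .long __proc_info_begin  */
    	uint32_t info_end;     /* .long __proc_info_end    */
    };

    static void resolve_proc_info(const struct proc_type_literals *d,
                                  uint32_t *begin, uint32_t *end)
    {
    	uint32_t delta = (uint32_t)(uintptr_t)d - d->self_vaddr;

    	*begin = d->info_begin + delta;   /* now a runtime address */
    	*end   = d->info_end + delta;
    }

adr_l reaches the same result in two instructions and without the data words, which is also what the following patches do for the remaining *_loc literals in head.S.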

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 172c34c9ff0144c3e1d96a9b54d6fecfe5d17c3c category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Replace the open coded PC relative offset calculations involving __turn_mmu_on and __turn_mmu_on_end with a pair of adr_l invocations. This removes some open coded arithmetic involving virtual addresses, avoids literal pools on v7+, and slightly reduces the footprint of the code. Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 172c34c9ff0144c3e1d96a9b54d6fecfe5d17c3c) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/head.S | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index dbc409a66555..205881bb0919 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -224,11 +224,8 @@ __create_page_tables: * Create identity mapping to cater for __enable_mmu. * This identity mapping will be removed by paging_init(). */ - adr r0, __turn_mmu_on_loc - ldmia r0, {r3, r5, r6} - sub r0, r0, r3 @ virt->phys offset - add r5, r5, r0 @ phys __turn_mmu_on - add r6, r6, r0 @ phys __turn_mmu_on_end + adr_l r5, __turn_mmu_on @ _pa(__turn_mmu_on) + adr_l r6, __turn_mmu_on_end @ _pa(__turn_mmu_on_end) mov r5, r5, lsr #SECTION_SHIFT mov r6, r6, lsr #SECTION_SHIFT @@ -350,11 +347,6 @@ __create_page_tables: ret lr ENDPROC(__create_page_tables) .ltorg - .align -__turn_mmu_on_loc: - .long . - .long __turn_mmu_on - .long __turn_mmu_on_end #if defined(CONFIG_SMP) .text -- 2.22.0

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 91580f0dbf24c6d616091526a900213bc7aa48fe category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Replace the open coded PC relative offset calculations with adr_l and ldr_l invocations. This removes some open coded arithmetic involving virtual addresses, avoids literal pools on v7+, and slightly reduces the footprint of the code. Note that it also removes a stale comment about the contents of r6. Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 91580f0dbf24c6d616091526a900213bc7aa48fe) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/head.S | 19 ++++--------------- 1 file changed, 4 insertions(+), 15 deletions(-) diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index 205881bb0919..fcd998ba9c9a 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -382,10 +382,8 @@ ENTRY(secondary_startup) /* * Use the page tables supplied from __cpu_up. */ - adr r4, __secondary_data - ldmia r4, {r5, r7, r12} @ address to jump to after - sub lr, r4, r5 @ mmu has been enabled - add r3, r7, lr + adr_l r3, secondary_data + mov_l r12, __secondary_switched ldrd r4, r5, [r3, #0] @ get secondary_data.pgdir ARM_BE8(eor r4, r4, r5) @ Swap r5 and r4 in BE: ARM_BE8(eor r5, r4, r5) @ it can be done in 3 steps @@ -400,22 +398,13 @@ ARM_BE8(eor r4, r4, r5) @ without using a temp reg. ENDPROC(secondary_startup) ENDPROC(secondary_startup_arm) - /* - * r6 = &secondary_data - */ ENTRY(__secondary_switched) - ldr sp, [r7, #12] @ get secondary_data.stack + ldr_l r7, secondary_data + 12 @ get secondary_data.stack + mov sp, r7 mov fp, #0 b secondary_start_kernel ENDPROC(__secondary_switched) - .align - - .type __secondary_data, %object -__secondary_data: - .long . - .long secondary_data - .long __secondary_switched #endif /* defined(CONFIG_SMP) */ -- 2.22.0

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 450abd38fe6c6313ce9bdd9dce81c1dd604f6fb0 category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Currently, the .alt.smp.init section contains the virtual addresses of the patch sites. Since patching may occur both before and after switching into virtual mode, this requires some manual handling of the address when applying the UP alternative. Let's simplify this by using relative offsets in the table entries: this allows us to simply add each entry's address to its contents, regardless of whether we are running in virtual mode or not. Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 450abd38fe6c6313ce9bdd9dce81c1dd604f6fb0) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/include/asm/assembler.h | 4 ++-- arch/arm/include/asm/processor.h | 2 +- arch/arm/kernel/head.S | 10 +++++----- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h index 72627c5fb3b2..6ed30421f697 100644 --- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -259,7 +259,7 @@ */ #define ALT_UP(instr...) \ .pushsection ".alt.smp.init", "a" ;\ - .long 9998b ;\ + .long 9998b - . ;\ 9997: instr ;\ .if . - 9997b == 2 ;\ nop ;\ @@ -270,7 +270,7 @@ .popsection #define ALT_UP_B(label) \ .pushsection ".alt.smp.init", "a" ;\ - .long 9998b ;\ + .long 9998b - . ;\ W(b) . + (label - 9998b) ;\ .popsection #else diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h index b9241051e5cb..9e6b97286307 100644 --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -96,7 +96,7 @@ unsigned long get_wchan(struct task_struct *p); #define __ALT_SMP_ASM(smp, up) \ "9998: " smp "\n" \ " .pushsection \".alt.smp.init\", \"a\"\n" \ - " .long 9998b\n" \ + " .long 9998b - .\n" \ " " up "\n" \ " .popsection\n" #else diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index fcd998ba9c9a..e87e0878b8f2 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -545,14 +545,15 @@ smp_on_up: __do_fixup_smp_on_up: cmp r4, r5 reths lr - ldmia r4!, {r0, r6} - ARM( str r6, [r0, r3] ) - THUMB( add r0, r0, r3 ) + ldmia r4, {r0, r6} + ARM( str r6, [r0, r4] ) + THUMB( add r0, r0, r4 ) + add r4, r4, #8 #ifdef __ARMEB__ THUMB( mov r6, r6, ror #16 ) @ Convert word order for big-endian. #endif THUMB( strh r6, [r0], #2 ) @ For Thumb-2, store as two halfwords - THUMB( mov r6, r6, lsr #16 ) @ to be robust against misaligned r3. + THUMB( mov r6, r6, lsr #16 ) @ to be robust against misaligned r0. THUMB( strh r6, [r0] ) b __do_fixup_smp_on_up ENDPROC(__do_fixup_smp_on_up) @@ -561,7 +562,6 @@ ENTRY(fixup_smp) stmfd sp!, {r4 - r6, lr} mov r4, r0 add r5, r0, r1 - mov r3, #0 bl __do_fixup_smp_on_up ldmfd sp!, {r4 - r6, pc} ENDPROC(fixup_smp) -- 2.22.0
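The new table format can be modelled in C: each 8-byte entry pairs a self-relative offset to the patch site with the replacement instruction, so the fixup loop adds the entry's own address instead of carrying a separate phys/virt delta in r3. A hedged sketch (struct and function names invented for illustration):

#include <stdint.h>

struct alt_entry {              /* one .alt.smp.init record           */
        int32_t  site;          /* ".long 9998b - ." -- self-relative */
        uint32_t up_insn;       /* replacement instruction for UP     */
};

static void fixup_smp_on_up(struct alt_entry *begin, struct alt_entry *end)
{
        struct alt_entry *e;

        for (e = begin; e < end; e++) {
                /* entry address + stored offset = patch site address */
                uint32_t *site = (uint32_t *)((uintptr_t)e + e->site);

                *site = e->up_insn;
        }
}

Because the table and the patch sites move together between the physical and virtual views of the image, the same loop works both before and after the MMU is enabled, which is the point of the change.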

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 59d2f2827dfdccf8911d5e51465136b52ba623c4 category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Now that calling __do_fixup_smp_on_up() can be done without passing the physical-to-virtual offset in r3, we can replace the open coded PC relative offset calculations with a pair of adr_l invocations. This removes some open coded arithmetic involving virtual addresses, avoids literal pools on v7+, and slightly reduces the footprint of the code. Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 59d2f2827dfdccf8911d5e51465136b52ba623c4) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/head.S | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index e87e0878b8f2..7adab03a31bc 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -519,19 +519,11 @@ ARM_BE8(rev r0, r0) @ byteswap if big endian retne lr __fixup_smp_on_up: - adr r0, 1f - ldmia r0, {r3 - r5} - sub r3, r0, r3 - add r4, r4, r3 - add r5, r5, r3 + adr_l r4, __smpalt_begin + adr_l r5, __smpalt_end b __do_fixup_smp_on_up ENDPROC(__fixup_smp) - .align -1: .word . - .word __smpalt_begin - .word __smpalt_end - .pushsection .data .align 2 .globl smp_on_up -- 2.22.0

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit d74d2b225018baa0e04e080ee9e80b21667ba3a2 category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Replace the open coded PC relative offset calculations with adr_l and ldr_l invocations. This removes some open coded PC relative arithmetic, avoids literal pools on v7+, and slightly reduces the footprint of the code. Note that ALT_SMP() expects a single instruction so move the macro invocation after it. Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit d74d2b225018baa0e04e080ee9e80b21667ba3a2) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/sleep.S | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/arch/arm/kernel/sleep.S b/arch/arm/kernel/sleep.S index 5dc8b80bb693..43077e11dafd 100644 --- a/arch/arm/kernel/sleep.S +++ b/arch/arm/kernel/sleep.S @@ -72,8 +72,9 @@ ENTRY(__cpu_suspend) ldr r3, =sleep_save_sp stmfd sp!, {r0, r1} @ save suspend func arg and pointer ldr r3, [r3, #SLEEP_SAVE_SP_VIRT] - ALT_SMP(ldr r0, =mpidr_hash) + ALT_SMP(W(nop)) @ don't use adr_l inside ALT_SMP() ALT_UP_B(1f) + adr_l r0, mpidr_hash /* This ldmia relies on the memory layout of the mpidr_hash struct */ ldmia r0, {r1, r6-r8} @ r1 = mpidr mask (r6,r7,r8) = l[0,1,2] shifts compute_mpidr_hash r0, r6, r7, r8, r2, r1 @@ -147,9 +148,8 @@ no_hyp: mov r1, #0 ALT_SMP(mrc p15, 0, r0, c0, c0, 5) ALT_UP_B(1f) - adr r2, mpidr_hash_ptr - ldr r3, [r2] - add r2, r2, r3 @ r2 = struct mpidr_hash phys address + adr_l r2, mpidr_hash @ r2 = struct mpidr_hash phys address + /* * This ldmia relies on the memory layout of the mpidr_hash * struct mpidr_hash. @@ -157,10 +157,7 @@ no_hyp: ldmia r2, { r3-r6 } @ r3 = mpidr mask (r4,r5,r6) = l[0,1,2] shifts compute_mpidr_hash r1, r4, r5, r6, r0, r3 1: - adr r0, _sleep_save_sp - ldr r2, [r0] - add r0, r0, r2 - ldr r0, [r0, #SLEEP_SAVE_SP_PHYS] + ldr_l r0, sleep_save_sp + SLEEP_SAVE_SP_PHYS ldr r0, [r0, r1, lsl #2] @ load phys pgd, stack, resume fn @@ -177,12 +174,6 @@ ENDPROC(cpu_resume_arm) ENDPROC(cpu_resume_no_hyp) #endif - .align 2 -_sleep_save_sp: - .long sleep_save_sp - . -mpidr_hash_ptr: - .long mpidr_hash - . @ mpidr_hash struct offset - .data .align 2 .type sleep_save_sp, #object -- 2.22.0

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 3bcf906b194cebb6817cbb2f07b69e12aa5d7f51 category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Replace the open coded arithmetic with a simple adr_l/sub pair. This removes some open coded arithmetic involving virtual addresses, avoids literal pools on v7+, and slightly reduces the footprint of the code. Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit 3bcf906b194cebb6817cbb2f07b69e12aa5d7f51) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/head.S | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index 7adab03a31bc..7f62c5eccdf3 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -103,10 +103,8 @@ ENTRY(stext) #endif #ifndef CONFIG_XIP_KERNEL - adr r3, 2f - ldmia r3, {r4, r8} - sub r4, r3, r4 @ (PHYS_OFFSET - PAGE_OFFSET) - add r8, r8, r4 @ PHYS_OFFSET + adr_l r8, _text @ __pa(_text) + sub r8, r8, #TEXT_OFFSET @ PHYS_OFFSET #else ldr r8, =PLAT_PHYS_OFFSET @ always constant in this case #endif @@ -158,10 +156,6 @@ ENTRY(stext) 1: b __enable_mmu ENDPROC(stext) .ltorg -#ifndef CONFIG_XIP_KERNEL -2: .long . - .long PAGE_OFFSET -#endif /* * Setup the initial page tables. We only setup the barest -- 2.22.0
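In C terms, the new two-instruction sequence derives PHYS_OFFSET purely from the kernel's own load address. An illustrative sketch (the TEXT_OFFSET value shown is only the common default; it is platform-configured):

#include <stdint.h>

#define TEXT_OFFSET 0x8000      /* common default; set per platform */

/* pa_text comes from "adr_l r8, _text", i.e. __pa(_text) */
static inline uintptr_t phys_offset(uintptr_t pa_text)
{
        /* the image always sits TEXT_OFFSET bytes above the RAM base */
        return pa_text - TEXT_OFFSET;   /* e.g. 0x60008000 -> 0x60000000 */
}

Nothing here depends on where the image was linked, so the result remains correct when the kernel runs relocated.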

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit aaac3733171fca948c4fb66b78257620e3885339 category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ------------------------------------------------- Replace the open coded calculations of the actual physical address of the KVM stub vector table with a single adr_l invocation. Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> (cherry picked from commit aaac3733171fca948c4fb66b78257620e3885339) Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/head.S | 15 ++------------- arch/arm/kernel/hyp-stub.S | 27 ++++++++++++--------------- 2 files changed, 14 insertions(+), 28 deletions(-) diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index 247ce9055990..538c2b08b7e5 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -468,15 +468,10 @@ dtb_check_done: /* * Compute the address of the hyp vectors after relocation. - * This requires some arithmetic since we cannot directly - * reference __hyp_stub_vectors in a PC-relative way. * Call __hyp_set_vectors with the new address so that we * can HVC again after the copy. */ -0: adr r0, 0b - movw r1, #:lower16:__hyp_stub_vectors - 0b - movt r1, #:upper16:__hyp_stub_vectors - 0b - add r0, r0, r1 + adr_l r0, __hyp_stub_vectors sub r0, r0, r5 add r0, r0, r10 bl __hyp_set_vectors @@ -627,17 +622,11 @@ not_relocated: mov r0, #0 cmp r0, #HYP_MODE @ if not booted in HYP mode... bne __enter_kernel @ boot kernel directly - adr r12, .L__hyp_reentry_vectors_offset - ldr r0, [r12] - add r0, r0, r12 - + adr_l r0, __hyp_reentry_vectors bl __hyp_set_vectors __HVC(0) @ otherwise bounce to hyp mode b . @ should never be reached - - .align 2 -.L__hyp_reentry_vectors_offset: .long __hyp_reentry_vectors - . #else b __enter_kernel #endif diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S index 26d8e03b1dd3..103d0bdb2b7e 100644 --- a/arch/arm/kernel/hyp-stub.S +++ b/arch/arm/kernel/hyp-stub.S @@ -24,41 +24,38 @@ ENTRY(__boot_cpu_mode) .text /* - * Save the primary CPU boot mode. Requires 3 scratch registers. + * Save the primary CPU boot mode. Requires 2 scratch registers. */ - .macro store_primary_cpu_mode reg1, reg2, reg3 + .macro store_primary_cpu_mode reg1, reg2 mrs \reg1, cpsr and \reg1, \reg1, #MODE_MASK - adr \reg2, .L__boot_cpu_mode_offset - ldr \reg3, [\reg2] - str \reg1, [\reg2, \reg3] + str_l \reg1, __boot_cpu_mode, \reg2 .endm /* * Compare the current mode with the one saved on the primary CPU. * If they don't match, record that fact. The Z bit indicates * if there's a match or not. - * Requires 3 additionnal scratch registers. + * Requires 2 additional scratch registers. */ - .macro compare_cpu_mode_with_primary mode, reg1, reg2, reg3 - adr \reg2, .L__boot_cpu_mode_offset - ldr \reg3, [\reg2] - ldr \reg1, [\reg2, \reg3] + .macro compare_cpu_mode_with_primary mode, reg1, reg2 + adr_l \reg2, __boot_cpu_mode + ldr \reg1, [\reg2] cmp \mode, \reg1 @ matches primary CPU boot mode? 
orrne \reg1, \reg1, #BOOT_CPU_MODE_MISMATCH - strne \reg1, [\reg2, \reg3] @ record what happened and give up + strne \reg1, [\reg2] @ record what happened and give up .endm #else /* ZIMAGE */ - .macro store_primary_cpu_mode reg1:req, reg2:req, reg3:req + .macro store_primary_cpu_mode reg1:req, reg2:req .endm /* * The zImage loader only runs on one CPU, so we don't bother with mult-CPU * consistency checking: */ - .macro compare_cpu_mode_with_primary mode, reg1, reg2, reg3 + .macro compare_cpu_mode_with_primary mode, reg1, reg2 cmp \mode, \mode .endm @@ -73,7 +70,7 @@ ENTRY(__boot_cpu_mode) */ @ Call this from the primary CPU ENTRY(__hyp_stub_install) - store_primary_cpu_mode r4, r5, r6 + store_primary_cpu_mode r4, r5 ENDPROC(__hyp_stub_install) @ fall through... @@ -87,7 +84,7 @@ ENTRY(__hyp_stub_install_secondary) * If the secondary has booted with a different mode, give up * immediately. */ - compare_cpu_mode_with_primary r4, r5, r6, r7 + compare_cpu_mode_with_primary r4, r5, r6 retne lr /* -- 2.22.0

From: Ard Biesheuvel <ardb@kernel.org> mainline inclusion from mainline-5.11-rc1 commit 6c7a6d22fcef9181239ea7248c6b0c4117b9325e category: feature issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Commit aaac3733171fca94 ("ARM: kvm: replace open coded VA->PA calculations with adr_l call") removed all uses of .L__boot_cpu_mode_offset, so there is no longer a need to define it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Chen Jun <chenjun102@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/hyp-stub.S | 6 ------ 1 file changed, 6 deletions(-) diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S index 103d0bdb2b7e..b699b22a4db1 100644 --- a/arch/arm/kernel/hyp-stub.S +++ b/arch/arm/kernel/hyp-stub.S @@ -225,12 +225,6 @@ ENTRY(__hyp_soft_restart) ret lr ENDPROC(__hyp_soft_restart) -#ifndef ZIMAGE -.align 2 -.L__boot_cpu_mode_offset: - .long __boot_cpu_mode - . -#endif - .align 5 ENTRY(__hyp_stub_vectors) __hyp_stub_reset: W(b) . -- 2.22.0

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit 857ddf520d76d6516d5cdca396461141b7ca921b category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=857ddf520d76d6516d5cdca396461141b7ca921b ------------------------------------------------- When running in PIC mode, the compiler will emit const structures containing runtime relocatable quantities into .data.rel.ro.* sections, so that the linker can be smart about placing them together in a segment that is read-write initially, and is remapped read-only afterwards. This is exactly what __ro_after_init aims to provide, so move these sections together. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- include/asm-generic/vmlinux.lds.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 18468b46c450..e6cab3f86b5e 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -406,7 +406,7 @@ #define RO_AFTER_INIT_DATA \ . = ALIGN(8); \ __start_ro_after_init = .; \ - *(.data..ro_after_init) \ + *(.data..ro_after_init .data.rel.ro.*) \ JUMP_TABLE_DATA \ STATIC_CALL_DATA \ __end_ro_after_init = .; -- 2.22.0
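As a small C illustration of what lands in these sections: under -fpic, a const object that embeds an address still needs one write at load time to apply the relocation, so the compiler cannot place it in .rodata. A sketch one can try with arm-linux-gnueabi-gcc -fpic -S (names invented):

static int impl(void) { return 0; }

struct ops {
        int (*handler)(void);
};

/*
 * The initializer stores an absolute address, so with -fpic GCC emits
 * this object into .data.rel.ro(.local) rather than .rodata. Folding
 * such sections into the __ro_after_init window keeps them writable
 * while relocations are applied, and read-only afterwards.
 */
const struct ops my_ops = { .handler = impl };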

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit 6b12f315331362a8ec9e8fe3f97d9ae09e43fd28 category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=6b12f315331362a8ec9e8fe3f97d9ae09e43fd28 ------------------------------------------------- This replaces a couple of open coded calculations to obtain the physical address of a far symbol with calls to the new adr_l etc macros. Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/mach-exynos/headsmp.S | 9 +-------- arch/arm/mach-exynos/sleep.S | 26 +++++--------------------- 2 files changed, 6 insertions(+), 29 deletions(-) diff --git a/arch/arm/mach-exynos/headsmp.S b/arch/arm/mach-exynos/headsmp.S index 0ac2cb9a7355..be7cd0eebe1d 100644 --- a/arch/arm/mach-exynos/headsmp.S +++ b/arch/arm/mach-exynos/headsmp.S @@ -19,10 +19,7 @@ ENTRY(exynos4_secondary_startup) ARM_BE8(setend be) mrc p15, 0, r0, c0, c0, 5 and r0, r0, #15 - adr r4, 1f - ldmia r4, {r5, r6} - sub r4, r4, r5 - add r6, r6, r4 + adr_l r6, exynos_pen_release pen: ldr r7, [r6] cmp r7, r0 bne pen @@ -33,7 +30,3 @@ pen: ldr r7, [r6] */ b secondary_startup ENDPROC(exynos4_secondary_startup) - - .align 2 -1: .long . - .long exynos_pen_release diff --git a/arch/arm/mach-exynos/sleep.S b/arch/arm/mach-exynos/sleep.S index ed93f91853b8..ed27515a4458 100644 --- a/arch/arm/mach-exynos/sleep.S +++ b/arch/arm/mach-exynos/sleep.S @@ -8,6 +8,7 @@ #include <linux/linkage.h> #include <asm/asm-offsets.h> +#include <asm/assembler.h> #include <asm/hardware/cache-l2x0.h> #include "smc.h" @@ -54,19 +55,13 @@ ENTRY(exynos_cpu_resume_ns) cmp r0, r1 bne skip_cp15 - adr r0, _cp15_save_power - ldr r1, [r0] - ldr r1, [r0, r1] - adr r0, _cp15_save_diag - ldr r2, [r0] - ldr r2, [r0, r2] + ldr_l r1, cp15_save_power + ldr_l r2, cp15_save_diag mov r0, #SMC_CMD_C15RESUME dsb smc #0 #ifdef CONFIG_CACHE_L2X0 - adr r0, 1f - ldr r2, [r0] - add r0, r2, r0 + adr_l r0, l2x0_saved_regs /* Check that the address has been initialised. */ ldr r1, [r0, #L2X0_R_PHY_BASE] @@ -85,9 +80,7 @@ ENTRY(exynos_cpu_resume_ns) smc #0 /* Reload saved regs pointer because smc corrupts registers. */ - adr r0, 1f - ldr r2, [r0] - add r0, r2, r0 + adr_l r0, l2x0_saved_regs ldr r1, [r0, #L2X0_R_PWR_CTRL] ldr r2, [r0, #L2X0_R_AUX_CTRL] @@ -106,15 +99,6 @@ skip_cp15: b cpu_resume ENDPROC(exynos_cpu_resume_ns) - .align -_cp15_save_power: - .long cp15_save_power - . -_cp15_save_diag: - .long cp15_save_diag - . -#ifdef CONFIG_CACHE_L2X0 -1: .long l2x0_saved_regs - . -#endif /* CONFIG_CACHE_L2X0 */ - .data .align 2 .globl cp15_save_diag -- 2.22.0

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit 59dee05a68727a7fc3c62240542f8753797e38d6 category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=59dee05a68727a7fc3c62240542f8753797e38d6 ------------------------------------------------- This replaces an open coded calculation to obtain the physical address of a far symbol with a call to the new ldr_l etc macro. Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/mach-mvebu/coherency_ll.S | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/arch/arm/mach-mvebu/coherency_ll.S b/arch/arm/mach-mvebu/coherency_ll.S index a3a64bf97250..9ae65b1e9745 100644 --- a/arch/arm/mach-mvebu/coherency_ll.S +++ b/arch/arm/mach-mvebu/coherency_ll.S @@ -37,9 +37,7 @@ ENTRY(ll_get_coherency_base) * MMU is disabled, use the physical address of the coherency * base address, (or 0x0 if the coherency fabric is not mapped) */ - adr r1, 3f - ldr r3, [r1] - ldr r1, [r1, r3] + ldr_l r1, coherency_phys_base b 2f 1: /* @@ -155,7 +153,3 @@ ENTRY(ll_disable_coherency) dsb ret lr ENDPROC(ll_disable_coherency) - - .align 2 -3: - .long coherency_phys_base - . -- 2.22.0

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit e2aa765c4eb9bbcdd3046744e6f73050d1175138 category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=e2aa765c4eb9bbcdd3046744e6f73050d1175138 ------------------------------------------------- This replaces a few copies of the open coded calculations of the physical address of 'pen_release' in the secondary startup code of a couple of platforms. This ensures these quantities are invariant under runtime relocation. Cc: Russell King <linux@armlinux.org.uk> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/mach-prima2/headsmp.S | 11 +++-------- arch/arm/mach-spear/headsmp.S | 11 +++-------- arch/arm/plat-versatile/headsmp.S | 9 +-------- 3 files changed, 7 insertions(+), 24 deletions(-) diff --git a/arch/arm/mach-prima2/headsmp.S b/arch/arm/mach-prima2/headsmp.S index 88ea1243942a..e09dcaca320e 100644 --- a/arch/arm/mach-prima2/headsmp.S +++ b/arch/arm/mach-prima2/headsmp.S @@ -8,6 +8,8 @@ #include <linux/linkage.h> #include <linux/init.h> +#include <asm/assembler.h> + /* * SIRFSOC specific entry point for secondary CPUs. This provides * a "holding pen" into which all secondary cores are held until we're @@ -16,10 +18,7 @@ ENTRY(sirfsoc_secondary_startup) mrc p15, 0, r0, c0, c0, 5 and r0, r0, #15 - adr r4, 1f - ldmia r4, {r5, r6} - sub r4, r4, r5 - add r6, r6, r4 + adr_l r6, prima2_pen_release pen: ldr r7, [r6] cmp r7, r0 bne pen @@ -30,7 +29,3 @@ pen: ldr r7, [r6] */ b secondary_startup ENDPROC(sirfsoc_secondary_startup) - - .align -1: .long . - .long prima2_pen_release diff --git a/arch/arm/mach-spear/headsmp.S b/arch/arm/mach-spear/headsmp.S index 96f89436ccf6..32ffc75ff332 100644 --- a/arch/arm/mach-spear/headsmp.S +++ b/arch/arm/mach-spear/headsmp.S @@ -10,6 +10,8 @@ #include <linux/linkage.h> #include <linux/init.h> +#include <asm/assembler.h> + __INIT /* @@ -20,10 +22,7 @@ ENTRY(spear13xx_secondary_startup) mrc p15, 0, r0, c0, c0, 5 and r0, r0, #15 - adr r4, 1f - ldmia r4, {r5, r6} - sub r4, r4, r5 - add r6, r6, r4 + adr_l r6, spear_pen_release pen: ldr r7, [r6] cmp r7, r0 bne pen @@ -37,8 +36,4 @@ pen: ldr r7, [r6] * should now contain the SVC stack for this core */ b secondary_startup - - .align -1: .long . - .long spear_pen_release ENDPROC(spear13xx_secondary_startup) diff --git a/arch/arm/plat-versatile/headsmp.S b/arch/arm/plat-versatile/headsmp.S index 09d9fc30c8ca..cec71853b0b3 100644 --- a/arch/arm/plat-versatile/headsmp.S +++ b/arch/arm/plat-versatile/headsmp.S @@ -18,10 +18,7 @@ ENTRY(versatile_secondary_startup) ARM_BE8(setend be) mrc p15, 0, r0, c0, c0, 5 bic r0, #0xff000000 - adr r4, 1f - ldmia r4, {r5, r6} - sub r4, r4, r5 - add r6, r6, r4 + adr_l r6, versatile_cpu_release pen: ldr r7, [r6] cmp r7, r0 bne pen @@ -31,8 +28,4 @@ pen: ldr r7, [r6] * should now contain the SVC stack for this core */ b secondary_startup - - .align -1: .long . - .long versatile_cpu_release ENDPROC(versatile_secondary_startup) -- 2.22.0

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit ccb456783dd71f474e5783a81d7f18c2cd4dda81 category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=ccb456783dd71f474e5783a81d7f18c2cd4dda81 ------------------------------------------------- To avoid having to relocate the contents of extable entries at runtime when running with KASLR enabled, wire up the existing support for emitting them as relative references. This ensures these quantities are invariant under runtime relocation. Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/include/asm/Kbuild | 1 - arch/arm/include/asm/assembler.h | 16 +++------ arch/arm/include/asm/extable.h | 47 +++++++++++++++++++++++++++ arch/arm/include/asm/futex.h | 6 ++-- arch/arm/include/asm/uaccess.h | 17 +++------- arch/arm/include/asm/word-at-a-time.h | 6 ++-- arch/arm/kernel/entry-armv.S | 9 +++-- arch/arm/kernel/swp_emulate.c | 8 ++--- arch/arm/lib/backtrace.S | 13 +++----- arch/arm/lib/getuser.S | 24 +++++++------- arch/arm/lib/putuser.S | 15 ++++----- arch/arm/mm/alignment.c | 24 +++++--------- arch/arm/mm/extable.c | 2 +- arch/arm/nwfpe/entry.S | 6 ++-- scripts/sorttable.c | 2 +- 15 files changed, 102 insertions(+), 94 deletions(-) create mode 100644 arch/arm/include/asm/extable.h diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index f1398b9267c0..e243acbd850f 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 generic-y += early_ioremap.h -generic-y += extable.h generic-y += flat.h generic-y += parport.h generic-y += seccomp.h diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h index 6ed30421f697..b6d5c0d83674 100644 --- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -18,6 +18,7 @@ #endif #include <asm/ptrace.h> +#include <asm/extable.h> #include <asm/opcodes-virt.h> #include <asm/asm-offsets.h> #include <asm/page.h> @@ -242,10 +243,7 @@ #define USERL(l, x...) \ 9999: x; \ - .pushsection __ex_table,"a"; \ - .align 3; \ - .long 9999b,l; \ - .popsection + ex_entry 9999b,l; #define USER(x...) USERL(9001f, x) @@ -379,10 +377,7 @@ THUMB( orr \reg , \reg , #PSR_T_BIT ) .error "Unsupported inc macro argument" .endif - .pushsection __ex_table,"a" - .align 3 - .long 9999b, \abort - .popsection + ex_entry 9999b, \abort .endm .macro usracc, instr, reg, ptr, inc, cond, rept, abort @@ -420,10 +415,7 @@ THUMB( orr \reg , \reg , #PSR_T_BIT ) .error "Unsupported inc macro argument" .endif - .pushsection __ex_table,"a" - .align 3 - .long 9999b, \abort - .popsection + ex_entry 9999b, \abort .endr .endm diff --git a/arch/arm/include/asm/extable.h b/arch/arm/include/asm/extable.h new file mode 100644 index 000000000000..175d42247b96 --- /dev/null +++ b/arch/arm/include/asm/extable.h @@ -0,0 +1,47 @@ +#ifndef __ASM_EXTABLE_H +#define __ASM_EXTABLE_H + +#ifndef __ASSEMBLY__ + +/* + * The exception table consists of pairs of relative offsets: the first + * is the relative offset to an instruction that is allowed to fault, + * and the second is the relative offset at which the program should + * continue. 
No registers are modified, so it is entirely up to the + * continuation code to figure out what to do. + */ + +struct exception_table_entry { + int insn, fixup; +}; + +#define ARCH_HAS_RELATIVE_EXTABLE + +extern int fixup_exception(struct pt_regs *regs); + + /* + * ex_entry - place-relative extable entry + */ +asm( ".macro ex_entry, insn, fixup \n" + ".pushsection __ex_table, \"a\", %progbits \n" + ".align 3 \n" + ".long \\insn - . \n" + ".long \\fixup - . \n" + ".popsection \n" + ".endm \n"); + +#else + + /* + * ex_entry - place-relative extable entry + */ + .macro ex_entry, insn, fixup + .pushsection __ex_table, "a", %progbits + .align 3 + .long \insn - . + .long \fixup - . + .popsection + .endm + +#endif +#endif diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h index a9151884bc85..6921c58c6c7c 100644 --- a/arch/arm/include/asm/futex.h +++ b/arch/arm/include/asm/futex.h @@ -10,10 +10,8 @@ #define __futex_atomic_ex_table(err_reg) \ "3:\n" \ - " .pushsection __ex_table,\"a\"\n" \ - " .align 3\n" \ - " .long 1b, 4f, 2b, 4f\n" \ - " .popsection\n" \ + " ex_entry 1b, 4f\n" \ + " ex_entry 2b, 4f\n" \ " .pushsection .text.fixup,\"ax\"\n" \ " .align 2\n" \ "4: mov %0, " err_reg "\n" \ diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h index a13d90206472..476d1a15e669 100644 --- a/arch/arm/include/asm/uaccess.h +++ b/arch/arm/include/asm/uaccess.h @@ -340,10 +340,7 @@ do { \ " mov %1, #0\n" \ " b 2b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ - " .align 3\n" \ - " .long 1b, 3b\n" \ - " .popsection" \ + " ex_entry 1b, 3b\n" \ : "+r" (err), "=&r" (x) \ : "r" (addr), "i" (-EFAULT) \ : "cc") @@ -442,10 +439,7 @@ do { \ "3: mov %0, %3\n" \ " b 2b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ - " .align 3\n" \ - " .long 1b, 3b\n" \ - " .popsection" \ + " ex_entry 1b, 3b\n" \ : "+r" (err) \ : "r" (x), "r" (__pu_addr), "i" (-EFAULT) \ : "cc") @@ -501,11 +495,8 @@ do { \ "4: mov %0, %3\n" \ " b 3b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ - " .align 3\n" \ - " .long 1b, 4b\n" \ - " .long 2b, 4b\n" \ - " .popsection" \ + " ex_entry 1b, 4b\n" \ + " ex_entry 2b, 4b\n" \ : "+r" (err), "+r" (__pu_addr) \ : "r" (x), "i" (-EFAULT) \ : "cc") diff --git a/arch/arm/include/asm/word-at-a-time.h b/arch/arm/include/asm/word-at-a-time.h index 352ab213520d..a440ec1cd85b 100644 --- a/arch/arm/include/asm/word-at-a-time.h +++ b/arch/arm/include/asm/word-at-a-time.h @@ -9,6 +9,7 @@ * Heavily based on the x86 algorithm. 
*/ #include <linux/kernel.h> +#include <asm/extable.h> struct word_at_a_time { const unsigned long one_bits, high_bits; @@ -85,10 +86,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr) #endif " b 2b\n" " .popsection\n" - " .pushsection __ex_table,\"a\"\n" - " .align 3\n" - " .long 1b, 3b\n" - " .popsection" + " ex_entry 1b, 3b\n" : "=&r" (ret), "=&r" (offset) : "r" (addr), "Qo" (*(unsigned long *)addr)); diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S index 0ea8529a4872..c9af7e2338eb 100644 --- a/arch/arm/kernel/entry-armv.S +++ b/arch/arm/kernel/entry-armv.S @@ -15,6 +15,7 @@ #include <linux/init.h> #include <asm/assembler.h> +#include <asm/extable.h> #include <asm/memory.h> #include <asm/glue-df.h> #include <asm/glue-pf.h> @@ -535,13 +536,11 @@ ENDPROC(__und_usr) 4: str r4, [sp, #S_PC] @ retry current instruction ret r9 .popsection - .pushsection __ex_table,"a" - .long 1b, 4b + ex_entry 1b, 4b #if CONFIG_ARM_THUMB && __LINUX_ARM_ARCH__ >= 6 && CONFIG_CPU_V7 - .long 2b, 4b - .long 3b, 4b + ex_entry 2b, 4b + ex_entry 3b, 4b #endif - .popsection /* * Check whether the instruction is a co-processor instruction. diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c index 6166ba38bf99..f80230610753 100644 --- a/arch/arm/kernel/swp_emulate.c +++ b/arch/arm/kernel/swp_emulate.c @@ -24,6 +24,7 @@ #include <linux/syscalls.h> #include <linux/perf_event.h> +#include <asm/extable.h> #include <asm/opcodes.h> #include <asm/system_info.h> #include <asm/traps.h> @@ -45,11 +46,8 @@ "3: mov %0, %5\n" \ " b 2b\n" \ " .previous\n" \ - " .section __ex_table,\"a\"\n" \ - " .align 3\n" \ - " .long 0b, 3b\n" \ - " .long 1b, 3b\n" \ - " .previous" \ + " ex_entry 0b, 3b\n" \ + " ex_entry 1b, 3b\n" \ : "=&r" (res), "+r" (data), "=&r" (temp) \ : "r" (addr), "i" (-EAGAIN), "i" (-EFAULT) \ : "cc", "memory") diff --git a/arch/arm/lib/backtrace.S b/arch/arm/lib/backtrace.S index 872f658638d9..baccd30cf23f 100644 --- a/arch/arm/lib/backtrace.S +++ b/arch/arm/lib/backtrace.S @@ -106,14 +106,11 @@ for_each_frame: tst frame, mask @ Check for address exceptions bl printk no_frame: ldmfd sp!, {r4 - r9, pc} ENDPROC(c_backtrace) - - .pushsection __ex_table,"a" - .align 3 - .long 1001b, 1006b - .long 1002b, 1006b - .long 1003b, 1006b - .long 1004b, 1006b - .popsection + + ex_entry 1001b, 1006b + ex_entry 1002b, 1006b + ex_entry 1003b, 1006b + ex_entry 1004b, 1006b .Lbad: .asciz "%sBacktrace aborted due to bad frame pointer <%p>\n" .align diff --git a/arch/arm/lib/getuser.S b/arch/arm/lib/getuser.S index c5e420750c48..d120e0223a8c 100644 --- a/arch/arm/lib/getuser.S +++ b/arch/arm/lib/getuser.S @@ -27,6 +27,7 @@ #include <linux/linkage.h> #include <asm/assembler.h> #include <asm/errno.h> +#include <asm/extable.h> #include <asm/domain.h> ENTRY(__get_user_1) @@ -149,19 +150,18 @@ _ASM_NOKPROBE(__get_user_bad) _ASM_NOKPROBE(__get_user_bad8) .pushsection __ex_table, "a" - .long 1b, __get_user_bad - .long 2b, __get_user_bad + ex_entry 1b, __get_user_bad + ex_entry 2b, __get_user_bad #if __LINUX_ARM_ARCH__ < 6 - .long 3b, __get_user_bad + ex_entry 3b, __get_user_bad #endif - .long 4b, __get_user_bad - .long 5b, __get_user_bad8 - .long 6b, __get_user_bad8 + ex_entry 4b, __get_user_bad + ex_entry 5b, __get_user_bad8 + ex_entry 6b, __get_user_bad8 #ifdef __ARMEB__ - .long 7b, __get_user_bad - .long 8b, __get_user_bad8 - .long 9b, __get_user_bad8 - .long 10b, __get_user_bad8 - .long 11b, __get_user_bad8 + ex_entry 7b, __get_user_bad + ex_entry 8b, __get_user_bad8 + 
ex_entry 9b, __get_user_bad8 + ex_entry 10b, __get_user_bad8 + ex_entry 11b, __get_user_bad8 #endif -.popsection diff --git a/arch/arm/lib/putuser.S b/arch/arm/lib/putuser.S index bdd8836dc5c2..1bb85192ba60 100644 --- a/arch/arm/lib/putuser.S +++ b/arch/arm/lib/putuser.S @@ -27,6 +27,7 @@ #include <linux/linkage.h> #include <asm/assembler.h> #include <asm/errno.h> +#include <asm/extable.h> #include <asm/domain.h> ENTRY(__put_user_1) @@ -83,13 +84,11 @@ __put_user_bad: ret lr ENDPROC(__put_user_bad) -.pushsection __ex_table, "a" - .long 1b, __put_user_bad - .long 2b, __put_user_bad + ex_entry 1b, __put_user_bad + ex_entry 2b, __put_user_bad #if __LINUX_ARM_ARCH__ < 6 - .long 3b, __put_user_bad + ex_entry 3b, __put_user_bad #endif - .long 4b, __put_user_bad - .long 5b, __put_user_bad - .long 6b, __put_user_bad -.popsection + ex_entry 4b, __put_user_bad + ex_entry 5b, __put_user_bad + ex_entry 6b, __put_user_bad diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c index ea81e89e7740..f26a748fe041 100644 --- a/arch/arm/mm/alignment.c +++ b/arch/arm/mm/alignment.c @@ -21,6 +21,7 @@ #include <linux/uaccess.h> #include <asm/cp15.h> +#include <asm/extable.h> #include <asm/system_info.h> #include <asm/unaligned.h> #include <asm/opcodes.h> @@ -204,10 +205,7 @@ union offset_union { "3: mov %0, #1\n" \ " b 2b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ - " .align 3\n" \ - " .long 1b, 3b\n" \ - " .popsection\n" \ + " ex_entry 1b, 3b\n" \ : "=r" (err), "=&r" (val), "=r" (addr) \ : "0" (err), "2" (addr)) @@ -264,11 +262,8 @@ union offset_union { "4: mov %0, #1\n" \ " b 3b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ - " .align 3\n" \ - " .long 1b, 4b\n" \ - " .long 2b, 4b\n" \ - " .popsection\n" \ + " ex_entry 1b, 4b\n" \ + " ex_entry 2b, 4b\n" \ : "=r" (err), "=&r" (v), "=&r" (a) \ : "0" (err), "1" (v), "2" (a)); \ if (err) \ @@ -304,13 +299,10 @@ union offset_union { "6: mov %0, #1\n" \ " b 5b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ - " .align 3\n" \ - " .long 1b, 6b\n" \ - " .long 2b, 6b\n" \ - " .long 3b, 6b\n" \ - " .long 4b, 6b\n" \ - " .popsection\n" \ + " ex_entry 1b, 6b\n" \ + " ex_entry 2b, 6b\n" \ + " ex_entry 3b, 6b\n" \ + " ex_entry 4b, 6b\n" \ : "=r" (err), "=&r" (v), "=&r" (a) \ : "0" (err), "1" (v), "2" (a)); \ if (err) \ diff --git a/arch/arm/mm/extable.c b/arch/arm/mm/extable.c index fc33564597b8..46c4a8a7f5da 100644 --- a/arch/arm/mm/extable.c +++ b/arch/arm/mm/extable.c @@ -11,7 +11,7 @@ int fixup_exception(struct pt_regs *regs) fixup = search_exception_tables(instruction_pointer(regs)); if (fixup) { - regs->ARM_pc = fixup->fixup; + regs->ARM_pc = (unsigned long)&fixup->fixup + fixup->fixup; #ifdef CONFIG_THUMB2_KERNEL /* Clear the IT state to avoid nasty surprises in the fixup */ regs->ARM_cpsr &= ~PSR_IT_MASK; diff --git a/arch/arm/nwfpe/entry.S b/arch/arm/nwfpe/entry.S index d8f9915566e1..7816b3df00aa 100644 --- a/arch/arm/nwfpe/entry.S +++ b/arch/arm/nwfpe/entry.S @@ -8,6 +8,7 @@ */ #include <asm/assembler.h> +#include <asm/extable.h> #include <asm/opcodes.h> /* This is the kernel's entry point into the floating point emulator. 
@@ -107,7 +108,4 @@ next: .Lfix: ret r9 @ let the user eat segfaults .popsection - .pushsection __ex_table,"a" - .align 3 - .long .Lx1, .Lfix - .popsection + ex_entry .Lx1, .Lfix diff --git a/scripts/sorttable.c b/scripts/sorttable.c index 0ef3abfc4a51..ac93e033b7cb 100644 --- a/scripts/sorttable.c +++ b/scripts/sorttable.c @@ -341,12 +341,12 @@ static int do_file(char const *const fname, void *addr) case EM_AARCH64: case EM_PARISC: case EM_PPC: + case EM_ARM: case EM_PPC64: custom_sort = sort_relative_table; break; case EM_ARCOMPACT: case EM_ARCV2: - case EM_ARM: case EM_MICROBLAZE: case EM_MIPS: case EM_XTENSA: -- 2.22.0
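The resolution rule for the new relative entries can be written out in C, mirroring the fixup_exception() hunk above (helper names invented for illustration):

#include <stdint.h>

struct exception_table_entry {
        int insn, fixup;        /* both are self-relative offsets */
};

static inline uintptr_t ex_insn_addr(const struct exception_table_entry *e)
{
        return (uintptr_t)&e->insn + e->insn;     /* faulting insn  */
}

static inline uintptr_t ex_fixup_addr(const struct exception_table_entry *e)
{
        return (uintptr_t)&e->fixup + e->fixup;   /* resume address */
}

Since each entry encodes offsets from its own location, relocating the whole image moves entries and their targets by the same amount, so the table needs no runtime fixups under KASLR. The scripts/sorttable.c hunk moves EM_ARM to sort_relative_table for the same reason: the offsets must be rewritten when entries are reordered at build time.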

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit 04be01192973461cdd00ab47908a78f0e2f55ef8 category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=04be01192973461cdd00ab47908a78f0e2f55ef8 Update the Kconfig so that RELOCATABLE depends on !JUMP_LABEL, to resolve compilation conflicts between -fpic and JUMP_LABEL. ------------------------------------------------- Update the build flags and linker script to allow vmlinux to be built as a PIE binary, which retains relocation information about absolute symbol references so that they can be fixed up at runtime. This will be used for implementing KASLR. Cc: Russell King <linux@armlinux.org.uk> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/Kconfig | 5 +++++ arch/arm/Makefile | 5 +++++ arch/arm/include/asm/assembler.h | 2 +- arch/arm/include/asm/vmlinux.lds.h | 6 +++++- arch/arm/kernel/vmlinux.lds.S | 6 ++++++ scripts/module.lds.S | 1 + 6 files changed, 23 insertions(+), 2 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 97f9d004e6d0..c856542174a7 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1678,6 +1678,11 @@ config STACKPROTECTOR_PER_TASK Enable this option to switch to a different method that uses a different canary value for each task. +config RELOCATABLE + bool + depends on !XIP_KERNEL && !JUMP_LABEL + select HAVE_ARCH_PREL32_RELOCATIONS + endmenu menu "Boot options" diff --git a/arch/arm/Makefile b/arch/arm/Makefile index e15f76ca2887..4f550fc0e811 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -48,6 +48,11 @@ CHECKFLAGS += -D__ARMEL__ KBUILD_LDFLAGS += -EL endif +ifeq ($(CONFIG_RELOCATABLE),y) +KBUILD_CFLAGS += -fpic -include $(srctree)/include/linux/hidden.h +LDFLAGS_vmlinux += -pie -shared -Bsymbolic +endif + # # The Scalar Replacement of Aggregates (SRA) optimization pass in GCC 4.9 and # later may result in code being generated that handles signed short and signed diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h index b6d5c0d83674..20993615087a 100644 --- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -528,7 +528,7 @@ THUMB( orr \reg , \reg , #PSR_T_BIT ) * mov_l - move a constant value or [relocated] address into a register */ .macro mov_l, dst:req, imm:req - .if __LINUX_ARM_ARCH__ < 7 + .if CONFIG_RELOCATABLE == 1 || __LINUX_ARM_ARCH__ < 7 ldr \dst, =\imm .else movw \dst, #:lower16:\imm diff --git a/arch/arm/include/asm/vmlinux.lds.h b/arch/arm/include/asm/vmlinux.lds.h index 4a91428c324d..c1ced85cd8e5 100644 --- a/arch/arm/include/asm/vmlinux.lds.h +++ b/arch/arm/include/asm/vmlinux.lds.h @@ -50,7 +50,11 @@ EXIT_CALL \ ARM_MMU_DISCARD(*(.text.fixup)) \ ARM_MMU_DISCARD(*(__ex_table)) \ - COMMON_DISCARDS + COMMON_DISCARDS \ + *(.ARM.exidx.discard.text) \ + *(.interp .dynamic) \ + *(.dynsym .dynstr .hash) + /* * Sections that should stay zero sized, which is safer to explicitly diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S index f7f4620d59c3..262ba776d32d 100644 --- a/arch/arm/kernel/vmlinux.lds.S +++ b/arch/arm/kernel/vmlinux.lds.S @@ -115,6 +115,12 @@ SECTIONS __smpalt_end = .; } #endif + .rel.dyn : ALIGN(8)
{ + __rel_begin = .; + *(.rel .rel.* .rel.dyn) + } + __rel_end = ADDR(.rel.dyn) + SIZEOF(.rel.dyn); + .init.pv_table : { __pv_table_begin = .; *(.pv_table) diff --git a/scripts/module.lds.S b/scripts/module.lds.S index 69b9b71a6a47..088a5a2c446d 100644 --- a/scripts/module.lds.S +++ b/scripts/module.lds.S @@ -7,6 +7,7 @@ SECTIONS { /DISCARD/ : { *(.discard) *(.discard.*) + *(*.discard.*) } __ksymtab 0 : { *(SORT(___ksymtab+*)) } -- 2.22.0
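To make the new __rel_begin/__rel_end window concrete, here is a simplified C model of how such a table is consumed; the actual consumer added later in this series is an assembly loop in head.S, and the layout below simply follows the ELF32 REL format:

#include <stdint.h>

#define R_ARM_RELATIVE 23       /* reloc type left behind by -pie linking */

struct elf32_rel {
        uint32_t r_offset;      /* link-time VA of the word to patch */
        uint32_t r_info;        /* low 8 bits hold the reloc type    */
};

/* shift every absolute address in the image by the runtime VA delta */
static void apply_relative_relocs(struct elf32_rel *r,
                                  struct elf32_rel *end, uint32_t delta)
{
        for (; r < end; r++) {
                if ((r->r_info & 0xff) != R_ARM_RELATIVE)
                        continue;
                /* the word itself now lives at its link VA + delta */
                *(uint32_t *)(uintptr_t)(r->r_offset + delta) += delta;
        }
}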

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit 7e279c05992a88d0517df371a48e72d060b2ca21 category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=7e279c05992a88d0517df371a48e72d060b2ca21 ------------------------------------------------- To prepare for adding support for KASLR, which relocates all absolute symbol references at runtime after the caches have been enabled, update the MMU switch code to avoid using absolute symbol references where possible. This ensures these quantities are invariant under runtime relocation. Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/head-common.S | 61 +++++++++++------------------------ 1 file changed, 18 insertions(+), 43 deletions(-) diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S index 29b2eda136bb..69fcbf8876b3 100644 --- a/arch/arm/kernel/head-common.S +++ b/arch/arm/kernel/head-common.S @@ -80,70 +80,45 @@ __mmap_switched: mov r8, r2 mov r10, r0 - adr r4, __mmap_switched_data mov fp, #0 #if defined(CONFIG_XIP_DEFLATED_DATA) - ARM( ldr sp, [r4], #4 ) - THUMB( ldr sp, [r4] ) - THUMB( add r4, #4 ) + adr_l r4, __bss_stop + mov sp, r4 @ sp (temporary stack in .bss) bl __inflate_kernel_data @ decompress .data to RAM teq r0, #0 bne __error #elif defined(CONFIG_XIP_KERNEL) - ARM( ldmia r4!, {r0, r1, r2, sp} ) - THUMB( ldmia r4!, {r0, r1, r2, r3} ) - THUMB( mov sp, r3 ) + adr_l r0, _sdata + adr_l r1, __data_loc + adr_l r2, _edata_loc + adr_l r3, __bss_stop + mov sp, r3 @ sp (temporary stack in .bss) sub r2, r2, r1 bl __memcpy @ copy .data to RAM #endif - ARM( ldmia r4!, {r0, r1, sp} ) - THUMB( ldmia r4!, {r0, r1, r3} ) - THUMB( mov sp, r3 ) + adr_l r0, __bss_start + adr_l r1, __bss_stop + adr_l r3, init_thread_union + THREAD_START_SP + mov sp, r3 sub r2, r1, r0 mov r1, #0 bl __memset @ clear .bss - ldmia r4, {r0, r1, r2, r3} - str r9, [r0] @ Save processor ID - str r7, [r1] @ Save machine type - str r8, [r2] @ Save atags pointer - cmp r3, #0 - strne r10, [r3] @ Save control register values #ifdef CONFIG_KASAN bl kasan_early_init #endif - mov lr, #0 - b start_kernel -ENDPROC(__mmap_switched) - - .align 2 - .type __mmap_switched_data, %object -__mmap_switched_data: -#ifdef CONFIG_XIP_KERNEL -#ifndef CONFIG_XIP_DEFLATED_DATA - .long _sdata @ r0 - .long __data_loc @ r1 - .long _edata_loc @ r2 -#endif - .long __bss_stop @ sp (temporary stack in .bss) -#endif - .long __bss_start @ r0 - .long __bss_stop @ r1 - .long init_thread_union + THREAD_START_SP @ sp - - .long processor_id @ r0 - .long __machine_arch_type @ r1 - .long __atags_pointer @ r2 + str_l r9, processor_id, r4 @ Save processor ID + str_l r7, __machine_arch_type, r4 @ Save machine type + str_l r8, __atags_pointer, r4 @ Save atags pointer #ifdef CONFIG_CPU_CP15 - .long cr_alignment @ r3 -#else -M_CLASS(.long exc_ret) @ r3 -AR_CLASS(.long 0) @ r3 + str_l r10, cr_alignment, r4 @ Save control register values #endif - .size __mmap_switched_data, . - __mmap_switched_data + mov lr, #0 + b start_kernel +ENDPROC(__mmap_switched) __FINIT .text -- 2.22.0

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit 2c7e6b4d7cbff417ff96a24c243508e16168f90c category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=2c7e6b4d7cbff417ff96a24c243508e16168f90c ------------------------------------------------- Replace some unnecessary absolute references with relative ones. Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/sleep.S | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/arm/kernel/sleep.S b/arch/arm/kernel/sleep.S index 43077e11dafd..7e64648654ae 100644 --- a/arch/arm/kernel/sleep.S +++ b/arch/arm/kernel/sleep.S @@ -61,15 +61,14 @@ ENTRY(__cpu_suspend) stmfd sp!, {r4 - r11, lr} #ifdef MULTI_CPU - ldr r10, =processor - ldr r4, [r10, #CPU_SLEEP_SIZE] @ size of CPU sleep state + ldr_l r4, processor + CPU_SLEEP_SIZE @ size of CPU sleep state #else - ldr r4, =cpu_suspend_size + adr_l r4, cpu_suspend_size #endif mov r5, sp @ current virtual SP add r4, r4, #12 @ Space for pgd, virt sp, phys resume fn sub sp, sp, r4 @ allocate CPU state on stack - ldr r3, =sleep_save_sp + adr_l r3, sleep_save_sp stmfd sp!, {r0, r1} @ save suspend func arg and pointer ldr r3, [r3, #SLEEP_SAVE_SP_VIRT] ALT_SMP(W(nop)) @ don't use adr_l inside ALT_SMP() -- 2.22.0

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit c3ae0029ea41f4a26a40f592062155412d1b6d07 category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=c3ae0029ea41f4a26a40f592062155412d1b6d07 ------------------------------------------------- In order for the EFI stub to be able to decide over what range to randomize the load address of the kernel, expose the definition of the default vmalloc base address as VMALLOC_DEFAULT_BASE. Cc: Russell King <linux@armlinux.org.uk> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/include/asm/pgtable.h | 1 + arch/arm/mm/mmu.c | 3 +-- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index c02f24400369..340d3306c5e4 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -41,6 +41,7 @@ #define VMALLOC_OFFSET (8*1024*1024) #define VMALLOC_START (((unsigned long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1)) #define VMALLOC_END 0xff800000UL +#define VMALLOC_DEFAULT_BASE (VMALLOC_END - (240 << 20) - VMALLOC_OFFSET) #define LIBRARY_TEXT_START 0x0c000000 diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index c06ebfbc48c4..d98fa83f53dd 100644 --- a/arch/arm/mm/mmu.c +++ b/arch/arm/mm/mmu.c @@ -1123,8 +1123,7 @@ void __init debug_ll_io_init(void) } #endif -static void * __initdata vmalloc_min = - (void *)(VMALLOC_END - (240 << 20) - VMALLOC_OFFSET); +static void * __initdata vmalloc_min = (void *)VMALLOC_DEFAULT_BASE; /* * vmalloc=size forces the vmalloc area to be exactly 'size' -- 2.22.0
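The resulting constant is easy to sanity-check; with the definitions above it evaluates to 0xf0000000:

#include <assert.h>

#define VMALLOC_OFFSET       (8 * 1024 * 1024)
#define VMALLOC_END          0xff800000UL
#define VMALLOC_DEFAULT_BASE (VMALLOC_END - (240 << 20) - VMALLOC_OFFSET)

int main(void)
{
        /* 0xff800000 - 0x0f000000 - 0x00800000 */
        assert(VMALLOC_DEFAULT_BASE == 0xf0000000UL);
        return 0;
}

This keeps the default 240 MiB vmalloc area (plus the 8 MiB guard) unchanged while giving the EFI stub a named lower bound for the randomization range.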

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit fe64d7efe89877bc52454f9f2bc9ab0ce01ae8fc category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=fe64d7efe89877bc52454f9f2bc9ab0ce01ae8fc ------------------------------------------------- The location of swapper_pg_dir is relative to the kernel, not to PAGE_OFFSET or PHYS_OFFSET. So define the symbol relative to the start of the kernel image, and refer to it via its name. Cc: Russell King <linux@armlinux.org.uk> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/head.S | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index 7f62c5eccdf3..cd850e6c6b1a 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -45,14 +45,6 @@ #define PMD_ORDER 2 #endif - .globl swapper_pg_dir - .equ swapper_pg_dir, KERNEL_RAM_VADDR - PG_DIR_SIZE - - .macro pgtbl, rd, phys - add \rd, \phys, #TEXT_OFFSET - sub \rd, \rd, #PG_DIR_SIZE - .endm - /* * Kernel startup entry point. * --------------------------- @@ -74,6 +66,9 @@ .arm __HEAD + .globl swapper_pg_dir + .equ swapper_pg_dir, . - PG_DIR_SIZE + ENTRY(stext) ARM_BE8(setend be ) @ ensure we are in BE8 mode @@ -169,7 +164,7 @@ ENDPROC(stext) * r4 = physical page table address */ __create_page_tables: - pgtbl r4, r8 @ page table address + adr_l r4, swapper_pg_dir @ page table address /* * Clear the swapper page table -- 2.22.0
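With the symbol now defined relative to the image itself, the page-table address is plain arithmetic on the load address. A C model of what "adr_l r4, swapper_pg_dir" yields (the PG_DIR_SIZE shown is the classic non-LPAE value):

#include <stdint.h>

#define PG_DIR_SIZE 0x4000      /* 16 KiB of initial page tables */

/* swapper_pg_dir sits immediately below the kernel image */
static inline uintptr_t pa_swapper_pg_dir(uintptr_t pa_stext)
{
        return pa_stext - PG_DIR_SIZE;
}

Neither PAGE_OFFSET nor PHYS_OFFSET appears in the calculation, which is what makes the reference safe once the image can be loaded at a randomized address.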

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit 11f8bbc5b0d4d76b3d7114bf9af1805607a20372 category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=11f8bbc5b0d4d76b3d7114bf9af1805607a20372 ------------------------------------------------- The location of the ARM vector table in virtual memory is not a compile time constant, and so the virtual addresses of the various entry points are rather meaningless (although they are most likely to reside at the offsets below) ffff1004 t vector_rst ffff1020 t vector_irq ffff10a0 t vector_dabt ffff1120 t vector_pabt ffff11a0 t vector_und ffff1220 t vector_addrexcptn ffff1240 T vector_fiq However, when running with KASLR enabled, the virtual addresses are subject to runtime relocation, which means we should avoid taking absolute references to these symbols, not only directly (by taking the address in C code), but also via /proc/kallsyms or other kernel facilities that deal with ELF symbols. For instance, /proc/kallsyms will list their addresses as 0abf1004 t vector_rst 0abf1020 t vector_irq 0abf10a0 t vector_dabt 0abf1120 t vector_pabt 0abf11a0 t vector_und 0abf1220 t vector_addrexcptn 0abf1240 T vector_fiq when running randomized, which may confuse tools like perf that may use /proc/kallsyms to annotate stack traces. So use .L prefixes for these symbols. This will prevent them from being visible at all outside the assembler source. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/include/asm/vmlinux.lds.h | 4 +--- arch/arm/kernel/entry-armv.S | 32 ++++++++++++++++-------------- 2 files changed, 18 insertions(+), 18 deletions(-) diff --git a/arch/arm/include/asm/vmlinux.lds.h b/arch/arm/include/asm/vmlinux.lds.h index c1ced85cd8e5..68e1fc0b7175 100644 --- a/arch/arm/include/asm/vmlinux.lds.h +++ b/arch/arm/include/asm/vmlinux.lds.h @@ -126,10 +126,8 @@ *(.stubs) \ } \ . = __stubs_start + SIZEOF(.stubs); \ - __stubs_end = .; \ + __stubs_end = .; \ - PROVIDE(vector_fiq_offset = vector_fiq - ADDR(.vectors)); - #define ARM_TCM \ __itcm_start = ALIGN(4); \ .text_itcm ITCM_OFFSET : AT(__itcm_start - LOAD_OFFSET) { \ diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S index c9af7e2338eb..cd0a00032b22 100644 --- a/arch/arm/kernel/entry-armv.S +++ b/arch/arm/kernel/entry-armv.S @@ -1002,7 +1002,7 @@ __kuser_helper_end: .macro vector_stub, name, mode, correction=0 .align 5 -vector_\name: +.Lvector_\name: .if \correction sub lr, lr, #\correction .endif @@ -1031,7 +1031,7 @@ vector_\name: mov r0, sp ARM( ldr lr, [pc, lr, lsl #2] ) movs pc, lr @ branch to handler in SVC mode -ENDPROC(vector_\name) +ENDPROC(.Lvector_\name) .align 2 @ handler addresses follow this label @@ -1039,14 +1039,18 @@ ENDPROC(vector_\name) .endm .section .stubs, "ax", %progbits +#ifdef CONFIG_FIQ + .global vector_fiq_offset + .set vector_fiq_offset, .Lvector_fiq - . + 0x1000 +#endif @ This must be the first word .word vector_swi -vector_rst: +.Lvector_rst: ARM( swi SYS_ERROR0 ) THUMB( svc #0 ) THUMB( nop ) - b vector_und + b .Lvector_und /* * Interrupt dispatcher @@ -1148,8 +1152,8 @@ vector_rst: * (they're not supposed to happen, and won't happen in 32-bit data mode). 
*/ -vector_addrexcptn: - b vector_addrexcptn +.Lvector_addrexcptn: + b .Lvector_addrexcptn /*============================================================================= * FIQ "NMI" handler @@ -1176,18 +1180,16 @@ vector_addrexcptn: .long __fiq_svc @ e .long __fiq_svc @ f - .globl vector_fiq - .section .vectors, "ax", %progbits .L__vectors_start: - W(b) vector_rst - W(b) vector_und + W(b) .Lvector_rst + W(b) .Lvector_und W(ldr) pc, .L__vectors_start + 0x1000 - W(b) vector_pabt - W(b) vector_dabt - W(b) vector_addrexcptn - W(b) vector_irq - W(b) vector_fiq + W(b) .Lvector_pabt + W(b) .Lvector_dabt + W(b) .Lvector_addrexcptn + W(b) .Lvector_irq + W(b) .Lvector_fiq .data .align 2 -- 2.22.0

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit c11744cd7b351b0fbc5233c04c32822544c96fc1 category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=c11744cd7b351b0fbc5233c04c32822544c96fc1 Update the Kconfig so that RANDOMIZE_BASE depends on !JUMP_LABEL, to resolve compilation conflicts between -fpic and JUMP_LABEL. ------------------------------------------------- This implements randomization of the placement of the kernel image inside the lowmem region. It is intended to work together with the decompressor to place the kernel at an offset in physical memory that is a multiple of 2 MB, and to take the same offset into account when creating the virtual mapping. This uses runtime relocation of the kernel built as a PIE binary, to fix up all absolute symbol references to refer to their runtime virtual address. The physical-to-virtual mapping remains unchanged. In order to allow the decompressor to hand over to the core kernel without making assumptions that are not guaranteed to hold when invoking the core kernel directly using bootloaders that are not KASLR aware, the KASLR offset is expected to be placed in r3 when entering the kernel 4 bytes past the entry point, skipping the first instruction. Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/Kconfig | 12 +++++ arch/arm/kernel/head.S | 102 +++++++++++++++++++++++++++++++++++++---- 2 files changed, 105 insertions(+), 9 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index c856542174a7..2f7a32b77282 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1683,6 +1683,18 @@ config RELOCATABLE depends on !XIP_KERNEL && !JUMP_LABEL select HAVE_ARCH_PREL32_RELOCATIONS +config RANDOMIZE_BASE + bool "Randomize the address of the kernel image" + depends on MMU && AUTO_ZRELADDR + depends on !XIP_KERNEL && !ZBOOT_ROM && !JUMP_LABEL + select RELOCATABLE + select ARM_MODULE_PLTS if MODULES + select MODULE_REL_CRCS if MODVERSIONS + help + Randomizes the virtual and physical address at which the kernel + image is loaded, as a security feature that deters exploit attempts + relying on knowledge of the location of kernel internals. + endmenu menu "Boot options" diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index cd850e6c6b1a..26f3e9a0ffe9 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -45,6 +45,28 @@ #define PMD_ORDER 2 #endif + .macro get_kaslr_offset, reg +#ifdef CONFIG_RANDOMIZE_BASE + ldr_l \reg, __kaslr_offset +#else + mov \reg, #0 +#endif + .endm + + .macro add_kaslr_offset, reg, tmp +#ifdef CONFIG_RANDOMIZE_BASE + get_kaslr_offset \tmp + add \reg, \reg, \tmp +#endif + .endm + + .macro sub_kaslr_offset, reg, tmp +#ifdef CONFIG_RANDOMIZE_BASE + get_kaslr_offset \tmp + sub \reg, \reg, \tmp +#endif + .endm + /* * Kernel startup entry point. * --------------------------- @@ -70,6 +92,7 @@ .equ swapper_pg_dir, . - PG_DIR_SIZE ENTRY(stext) + mov r3, #0 @ normal entry point - clear r3 ARM_BE8(setend be ) @ ensure we are in BE8 mode THUMB( badr r9, 1f ) @ Kernel is always entered in ARM. @@ -77,6 +100,16 @@ ENTRY(stext) THUMB( .thumb ) @ switch to Thumb now.
THUMB(1: ) +#ifdef CONFIG_RANDOMIZE_BASE + str_l r3, __kaslr_offset, r9 @ offset in r3 if entered via kaslr ep + + .section ".bss", "aw", %nobits + .align 2 +__kaslr_offset: + .long 0 @ will be wiped before entering C code + .previous +#endif + #ifdef CONFIG_ARM_VIRT_EXT bl __hyp_stub_install #endif @@ -100,6 +133,7 @@ ENTRY(stext) #ifndef CONFIG_XIP_KERNEL adr_l r8, _text @ __pa(_text) sub r8, r8, #TEXT_OFFSET @ PHYS_OFFSET + sub_kaslr_offset r8, r12 #else ldr r8, =PLAT_PHYS_OFFSET @ always constant in this case #endif @@ -136,8 +170,8 @@ ENTRY(stext) * r0 will hold the CPU control register value, r1, r2, r4, and * r9 will be preserved. r5 will also be preserved if LPAE. */ - ldr r13, =__mmap_switched @ address to jump to after - @ mmu has been enabled + adr_l lr, __primary_switch @ address to jump to after + mov r13, lr @ mmu has been enabled badr lr, 1f @ return (PIC) address #ifdef CONFIG_ARM_LPAE mov r5, #0 @ high TTBR0 @@ -148,7 +182,8 @@ ENTRY(stext) ldr r12, [r10, #PROCINFO_INITFUNC] add r12, r12, r10 ret r12 -1: b __enable_mmu +1: get_kaslr_offset r12 @ get before turning MMU on + b __enable_mmu ENDPROC(stext) .ltorg @@ -227,9 +262,14 @@ __create_page_tables: /* * Map our RAM from the start to the end of the kernel .bss section. */ - add r0, r4, #PAGE_OFFSET >> (SECTION_SHIFT - PMD_ORDER) - ldr r6, =(_end - 1) - orr r3, r8, r7 + get_kaslr_offset r3 + add r0, r3, #PAGE_OFFSET + add r0, r4, r0, lsr #(SECTION_SHIFT - PMD_ORDER) + adr_l r6, _end - 1 + sub r6, r6, r8 + add r6, r6, #PAGE_OFFSET + add r3, r3, r8 + orr r3, r3, r7 add r6, r4, r6, lsr #(SECTION_SHIFT - PMD_ORDER) 1: str r3, [r0], #1 << PMD_ORDER add r3, r3, #1 << SECTION_SHIFT @@ -372,7 +412,7 @@ ENTRY(secondary_startup) * Use the page tables supplied from __cpu_up. */ adr_l r3, secondary_data - mov_l r12, __secondary_switched + mov_l r12, __secondary_switch ldrd r4, r5, [r3, #0] @ get secondary_data.pgdir ARM_BE8(eor r4, r4, r5) @ Swap r5 and r4 in BE: ARM_BE8(eor r5, r4, r5) @ it can be done in 3 steps @@ -410,6 +450,7 @@ ENDPROC(__secondary_switched) * r4 = TTBR pointer (low word) * r5 = TTBR pointer (high word if LPAE) * r9 = processor ID + * r12 = KASLR offset * r13 = *virtual* address to jump to upon completion */ __enable_mmu: @@ -447,6 +488,7 @@ ENDPROC(__enable_mmu) * r1 = machine ID * r2 = atags or dtb pointer * r9 = processor ID + * r12 = KASLR offset * r13 = *virtual* address to jump to upon completion * * other registers depend on the function called upon completion @@ -462,10 +504,52 @@ ENTRY(__turn_mmu_on) mov r3, r3 mov r3, r13 ret r3 -__turn_mmu_on_end: ENDPROC(__turn_mmu_on) - .popsection +__primary_switch: +#ifdef CONFIG_RELOCATABLE + adr_l r7, _text @ r7 := __pa(_text) + sub r7, r7, #TEXT_OFFSET @ r7 := PHYS_OFFSET + + adr_l r5, __rel_begin + adr_l r6, __rel_end + sub r5, r5, r7 + sub r6, r6, r7 + + add r5, r5, #PAGE_OFFSET + add r6, r6, #PAGE_OFFSET + add r5, r5, r12 + add r6, r6, r12 + + adr_l r3, __stubs_start @ __pa(__stubs_start) + sub r3, r3, r7 @ offset of __stubs_start + add r3, r3, #PAGE_OFFSET @ __va(__stubs_start) + sub r3, r3, #0xffff1000 @ subtract VA of stubs section + +0: cmp r5, r6 + bge 1f + ldm r5!, {r7, r8} @ load next relocation entry + cmp r8, #23 @ R_ARM_RELATIVE + bne 0b + cmp r7, #0xff000000 @ vector page? 
+ addgt r7, r7, r3 @ fix up VA offset + ldr r8, [r7, r12] + add r8, r8, r12 + str r8, [r7, r12] + b 0b +1: +#endif + ldr pc, =__mmap_switched +ENDPROC(__primary_switch) + +#ifdef CONFIG_SMP +__secondary_switch: + ldr pc, =__secondary_switched +ENDPROC(__secondary_switch) +#endif + .ltorg +__turn_mmu_on_end: + .popsection #ifdef CONFIG_SMP_ON_UP __HEAD -- 2.22.0
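[Editorial note] For readers who find the __primary_switch relocation loop above dense, it corresponds roughly to the following C sketch. This is an illustration, not the kernel's code: the struct layout and constant are standard ELF32, the names are mine, and the special handling of entries pointing into the vector page is reduced to a comment.

#include <stdint.h>

#define R_ARM_RELATIVE 23	/* relocation type checked in the loop */

struct elf32_rel {
	uint32_t r_offset;	/* link-time VA of the word to patch */
	uint32_t r_info;	/* (symbol << 8) | type; plain 23 for RELATIVE */
};

/*
 * Slide every R_ARM_RELATIVE target by the KASLR offset. The word to
 * patch lives at its link-time VA plus that same offset, since the
 * whole image has been moved. (The real loop additionally remaps
 * entries whose r_offset points into the vector page near 0xffff1000
 * back into the .stubs section before dereferencing them.)
 */
static void apply_relative_relocs(struct elf32_rel *rel,
				  struct elf32_rel *end,
				  uint32_t kaslr_offset)
{
	for (; rel < end; rel++) {
		if (rel->r_info != R_ARM_RELATIVE)
			continue;
		*(uint32_t *)(uintptr_t)(rel->r_offset + kaslr_offset) +=
			kaslr_offset;
	}
}

The same pass is what makes the PIE-linked vmlinux self-relocating: no dynamic loader is involved, so the early boot code has to walk __rel_begin..__rel_end itself before any absolute address can be trusted.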

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit a58cdcfbee11974669a651e3ce049ef729e81411 category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=a58cdcfbee11974669a651e3ce049ef729e81411 ------------------------------------------------- When randomizing the kernel load address, there may be a large distance in memory between the decompressor binary and its payload and the destination area in memory. Ensure that the decompressor itself is mapped cacheable in this case, by tweaking the existing routine that takes care of this for XIP decompressors. Cc: Russell King <linux@armlinux.org.uk> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/head.S | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index 538c2b08b7e5..1b92d8a66036 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -790,20 +790,24 @@ __setup_mmu: sub r3, r4, #16384 @ Page directory size teq r0, r2 bne 1b /* - * If ever we are running from Flash, then we surely want the cache - * to be enabled also for our execution instance... We map 2MB of it - * so there is no map overlap problem for up to 1 MB compressed kernel. - * If the execution is in RAM then we would only be duplicating the above. + * Make sure our entire executable image (including payload) is mapped + * cacheable, in case it is located outside the region we covered above. + * (This may be the case if running from flash or with randomization enabled) + * If the regions happen to overlap, we just duplicate some of the above. */ orr r1, r6, #0x04 @ ensure B is set for this orr r1, r1, #3 << 10 mov r2, pc + adr_l r9, _end mov r2, r2, lsr #20 + mov r9, r9, lsr #20 orr r1, r1, r2, lsl #20 add r0, r3, r2, lsl #2 - str r1, [r0], #4 + add r9, r3, r9, lsl #2 +0: str r1, [r0], #4 add r1, r1, #1048576 - str r1, [r0] + cmp r0, r9 + bls 0b mov pc, lr ENDPROC(__setup_mmu) -- 2.22.0
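[Editorial note] As a cross-check of the loop above, here is what it does restated in C, assuming the short-descriptor format the decompressor uses (one page-directory entry per 1 MiB section; function and parameter names are illustrative, and 'prot' stands for the B bit plus AP bits that r1 is given above):

#include <stdint.h>

#define SECTION_SHIFT 20	/* 1 MiB sections */

/*
 * Identity-map every 1 MiB section from 'start' through 'end' with
 * the given attributes, so the whole decompressor image (code and
 * payload) is covered even when KASLR has placed it far from the
 * region mapped earlier.
 */
static void map_image_cacheable(uint32_t *pgdir, uint32_t start,
				uint32_t end, uint32_t prot)
{
	uint32_t sect;

	for (sect = start >> SECTION_SHIFT; sect <= end >> SECTION_SHIFT; sect++)
		pgdir[sect] = (sect << SECTION_SHIFT) | prot;
}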

From: Ard Biesheuvel <ard.biesheuvel@linaro.org> maillist inclusion commit b152e5c5054c3937211a541be50d8a7c98a59974 category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=arm-kaslr-latest&id=b152e5c5054c3937211a541be50d8a7c98a59974 ------------------------------------------------- Add support to the decompressor to load the kernel at a randomized offset, and invoke the kernel proper while passing on the information about the offset at which the kernel was loaded. This implementation will extract some pseudo-randomness from the low bits of the generic timer (if available), and use CRC-16 to combine it with the build ID string and the device tree binary (which ideally has a /chosen/kaslr-seed property, but may also have other properties that differ between boots). This seed is used to select one of the candidate offsets in the lowmem region that don't overlap the zImage itself, the DTB, the initrd and /memreserve/s and/or /reserved-memory nodes that should be left alone. When booting via the UEFI stub, it is left up to the firmware to supply a suitable seed and select an offset. Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> [ycc: remove __efi_kaslr_offset for UEFI is not supported for now] Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/Makefile | 9 +- arch/arm/boot/compressed/head.S | 88 ++++++ arch/arm/boot/compressed/kaslr.c | 434 ++++++++++++++++++++++++++++++ 3 files changed, 530 insertions(+), 1 deletion(-) create mode 100644 arch/arm/boot/compressed/kaslr.c diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index 54307db7854d..7d6d00416411 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -84,8 +84,15 @@ compress-$(CONFIG_KERNEL_LZ4) = lz4 libfdt_objs := fdt_rw.o fdt_ro.o fdt_wip.o fdt.o +ifneq ($(CONFIG_ARM_ATAG_DTB_COMPAT)$(CONFIG_RANDOMIZE_BASE),) +OBJS += $(libfdt_objs) ifeq ($(CONFIG_ARM_ATAG_DTB_COMPAT),y) -OBJS += $(libfdt_objs) atags_to_fdt.o +OBJS += atags_to_fdt.o +endif +ifeq ($(CONFIG_RANDOMIZE_BASE),y) +OBJS += kaslr.o +CFLAGS_kaslr.o := -I $(srctree)/scripts/dtc/libfdt +endif endif # -fstack-protector-strong triggers protection checks in this code, diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index 1b92d8a66036..adc17fce4b00 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -165,6 +165,25 @@ orr \res, \res, \tmp1, lsl #24 .endm + .macro record_seed +#ifdef CONFIG_RANDOMIZE_BASE + sub ip, r1, ip, ror #1 @ poor man's kaslr seed, will + sub ip, r2, ip, ror #2 @ be superseded by kaslr-seed + sub ip, r3, ip, ror #3 @ from /chosen if present + sub ip, r4, ip, ror #5 + sub ip, r5, ip, ror #8 + sub ip, r6, ip, ror #13 + sub ip, r7, ip, ror #21 + sub ip, r8, ip, ror #3 + sub ip, r9, ip, ror #24 + sub ip, r10, ip, ror #27 + sub ip, r11, ip, ror #19 + sub ip, r13, ip, ror #14 + sub ip, r14, ip, ror #2 + str_l ip, __kaslr_seed, r9 +#endif + .endm + .section ".start", "ax" /* * sort out different calling conventions @@ -212,6 +231,7 @@ start: __EFI_HEADER 1: ARM_BE8( setend be ) @ go BE8 if compiled for BE8 + record_seed AR_CLASS( mrs r9, cpsr ) #ifdef CONFIG_ARM_VIRT_EXT bl __hyp_stub_install @ get into SVC 
mode, reversibly @@ -422,6 +442,38 @@ restart: adr r0, LC1 dtb_check_done: #endif +#ifdef CONFIG_RANDOMIZE_BASE + ldr r1, __kaslr_offset @ check if the kaslr_offset is + cmp r1, #0 @ already set + bne 1f + + stmfd sp!, {r0-r3, ip, lr} + adr_l r2, _text @ start of zImage + stmfd sp!, {r2, r8, r10} @ pass stack arguments + + ldr_l r3, __kaslr_seed +#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K) || defined(CONFIG_CPU_V7) + /* + * Get some pseudo-entropy from the low bits of the generic + * timer if it is implemented. + */ + mrc p15, 0, r1, c0, c1, 1 @ read ID_PFR1 register + tst r1, #0x10000 @ have generic timer? + mrrcne p15, 1, r3, r1, c14 @ read CNTVCT +#endif + adr_l r0, __kaslr_offset @ pass &__kaslr_offset in r0 + mov r1, r4 @ pass base address + mov r2, r9 @ pass decompressed image size + eor r3, r3, r3, ror #16 @ pass pseudorandom seed + bl kaslr_early_init + add sp, sp, #12 + cmp r0, #0 + addne r4, r4, r0 @ add offset to base address + ldmfd sp!, {r0-r3, ip, lr} + bne restart +1: +#endif + /* * Check to see if we will overwrite ourselves. * r4 = final kernel address (possibly with LSB set) @@ -1415,10 +1467,46 @@ __enter_kernel: mov r0, #0 @ must be 0 mov r1, r7 @ restore architecture number mov r2, r8 @ restore atags pointer +#ifdef CONFIG_RANDOMIZE_BASE + ldr r3, __kaslr_offset + add r4, r4, #4 @ skip first instruction +#endif ARM( mov pc, r4 ) @ call kernel M_CLASS( add r4, r4, #1 ) @ enter in Thumb mode for M class THUMB( bx r4 ) @ entry point is always ARM for A/R classes +#ifdef CONFIG_RANDOMIZE_BASE + /* + * Minimal implementation of CRC-16 that does not use a + * lookup table and uses 32-bit wide loads, so it still + * performs reasonably well with the D-cache off. Equivalent + * to lib/crc16.c for input sizes that are 4 byte multiples. + */ +ENTRY(__crc16) + push {r4, lr} + ldr r3, =0xa001 @ CRC-16 polynomial +0: subs r2, r2, #4 + popmi {r4, pc} + ldr r4, [r1], #4 +#ifdef __ARMEB__ + eor ip, r4, r4, ror #16 @ endian swap + bic ip, ip, #0x00ff0000 + mov r4, r4, ror #8 + eor r4, r4, ip, lsr #8 +#endif + eor r0, r0, r4 + .rept 32 + lsrs r0, r0, #1 + eorcs r0, r0, r3 + .endr + b 0b +ENDPROC(__crc16) + + .align 2 +__kaslr_seed: .long 0 +__kaslr_offset: .long 0 +#endif + reloc_code_end: #ifdef CONFIG_EFI_STUB diff --git a/arch/arm/boot/compressed/kaslr.c b/arch/arm/boot/compressed/kaslr.c new file mode 100644 index 000000000000..6ac666f3d8be --- /dev/null +++ b/arch/arm/boot/compressed/kaslr.c @@ -0,0 +1,434 @@ +/* + * Copyright (C) 2017 Linaro Ltd; <ard.biesheuvel@linaro.org> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ * + */ + +#include <linux/libfdt_env.h> +#include <libfdt.h> +#include <linux/types.h> +#include <generated/compile.h> +#include <generated/utsrelease.h> +#include <asm/pgtable.h> + +#include CONFIG_UNCOMPRESS_INCLUDE + +struct regions { + u32 pa_start; + u32 pa_end; + u32 image_size; + u32 zimage_start; + u32 zimage_size; + u32 dtb_start; + u32 dtb_size; + u32 initrd_start; + u32 initrd_size; + int reserved_mem; + int reserved_mem_addr_cells; + int reserved_mem_size_cells; +}; + +extern u32 __crc16(u32 crc, u32 const input[], int byte_count); + +static u32 __memparse(const char *val, const char **retptr) +{ + int base = 10; + u32 ret = 0; + + if (*val == '0') { + val++; + if (*val == 'x' || *val == 'X') { + val++; + base = 16; + } else { + base = 8; + } + } + + while (*val != ',' && *val != ' ' && *val != '\0') { + char c = *val++; + + switch (c) { + case '0' ... '9': + ret = ret * base + (c - '0'); + continue; + case 'a' ... 'f': + ret = ret * base + (c - 'a' + 10); + continue; + case 'A' ... 'F': + ret = ret * base + (c - 'A' + 10); + continue; + case 'g': + case 'G': + ret <<= 10; + case 'm': + case 'M': + ret <<= 10; + case 'k': + case 'K': + ret <<= 10; + break; + default: + if (retptr) + *retptr = NULL; + return 0; + } + } + if (retptr) + *retptr = val; + return ret; +} + +static bool regions_intersect(u32 s1, u32 e1, u32 s2, u32 e2) +{ + return e1 >= s2 && e2 >= s1; +} + +static bool intersects_reserved_region(const void *fdt, u32 start, + u32 end, struct regions *regions) +{ + int subnode, len, i; + u64 base, size; + + /* check for overlap with /memreserve/ entries */ + for (i = 0; i < fdt_num_mem_rsv(fdt); i++) { + if (fdt_get_mem_rsv(fdt, i, &base, &size) < 0) + continue; + if (regions_intersect(start, end, base, base + size)) + return true; + } + + if (regions->reserved_mem < 0) + return false; + + /* check for overlap with static reservations in /reserved-memory */ + for (subnode = fdt_first_subnode(fdt, regions->reserved_mem); + subnode >= 0; + subnode = fdt_next_subnode(fdt, subnode)) { + const fdt32_t *reg; + + len = 0; + reg = fdt_getprop(fdt, subnode, "reg", &len); + while (len >= (regions->reserved_mem_addr_cells + + regions->reserved_mem_size_cells)) { + + base = fdt32_to_cpu(reg[0]); + if (regions->reserved_mem_addr_cells == 2) + base = (base << 32) | fdt32_to_cpu(reg[1]); + + reg += regions->reserved_mem_addr_cells; + len -= 4 * regions->reserved_mem_addr_cells; + + size = fdt32_to_cpu(reg[0]); + if (regions->reserved_mem_size_cells == 2) + size = (size << 32) | fdt32_to_cpu(reg[1]); + + reg += regions->reserved_mem_size_cells; + len -= 4 * regions->reserved_mem_size_cells; + + if (base >= regions->pa_end) + continue; + + if (regions_intersect(start, end, base, + min(base + size, (u64)U32_MAX))) + return true; + } + } + return false; +} + +static bool intersects_occupied_region(const void *fdt, u32 start, + u32 end, struct regions *regions) +{ + if (regions_intersect(start, end, regions->zimage_start, + regions->zimage_start + regions->zimage_size)) + return true; + + if (regions_intersect(start, end, regions->initrd_start, + regions->initrd_start + regions->initrd_size)) + return true; + + if (regions_intersect(start, end, regions->dtb_start, + regions->dtb_start + regions->dtb_size)) + return true; + + return intersects_reserved_region(fdt, start, end, regions); +} + +static u32 count_suitable_regions(const void *fdt, struct regions *regions, + u32 *bitmap) +{ + u32 pa, i = 0, ret = 0; + + for (pa = regions->pa_start; pa < regions->pa_end; pa += SZ_2M, i++) { + if 
(!intersects_occupied_region(fdt, pa, + pa + regions->image_size, + regions)) { + ret++; + } else { + /* set 'occupied' bit */ + bitmap[i >> 5] |= BIT(i & 0x1f); + } + } + return ret; +} + +static u32 get_region_number(u32 num, u32 *bitmap) +{ + u32 i; + + for (i = 0; num > 0; i++) + if (!(bitmap[i >> 5] & BIT(i & 0x1f))) + num--; + return i; +} + +static void get_cell_sizes(const void *fdt, int node, int *addr_cells, + int *size_cells) +{ + const int *prop; + int len; + + /* + * Retrieve the #address-cells and #size-cells properties + * from the 'node', or use the default if not provided. + */ + *addr_cells = *size_cells = 1; + + prop = fdt_getprop(fdt, node, "#address-cells", &len); + if (len == 4) + *addr_cells = fdt32_to_cpu(*prop); + prop = fdt_getprop(fdt, node, "#size-cells", &len); + if (len == 4) + *size_cells = fdt32_to_cpu(*prop); +} + +static u32 get_memory_end(const void *fdt) +{ + int mem_node, address_cells, size_cells, len; + const fdt32_t *reg; + u64 memory_end = 0; + + /* Look for a node called "memory" at the lowest level of the tree */ + mem_node = fdt_path_offset(fdt, "/memory"); + if (mem_node <= 0) + return 0; + + get_cell_sizes(fdt, 0, &address_cells, &size_cells); + + /* + * Now find the 'reg' property of the /memory node, and iterate over + * the base/size pairs. + */ + len = 0; + reg = fdt_getprop(fdt, mem_node, "reg", &len); + while (len >= 4 * (address_cells + size_cells)) { + u64 base, size; + + base = fdt32_to_cpu(reg[0]); + if (address_cells == 2) + base = (base << 32) | fdt32_to_cpu(reg[1]); + + reg += address_cells; + len -= 4 * address_cells; + + size = fdt32_to_cpu(reg[0]); + if (size_cells == 2) + size = (size << 32) | fdt32_to_cpu(reg[1]); + + reg += size_cells; + len -= 4 * size_cells; + + memory_end = max(memory_end, base + size); + } + return min(memory_end, (u64)U32_MAX); +} + +static char *__strstr(const char *s1, const char *s2, int l2) +{ + int l1; + + l1 = strlen(s1); + while (l1 >= l2) { + l1--; + if (!memcmp(s1, s2, l2)) + return (char *)s1; + s1++; + } + return NULL; +} + +static const char *get_cmdline_param(const char *cmdline, const char *param, + int param_size) +{ + static const char default_cmdline[] = CONFIG_CMDLINE; + const char *p; + + if (!IS_ENABLED(CONFIG_CMDLINE_FORCE) && cmdline != NULL) { + p = __strstr(cmdline, param, param_size); + if (p == cmdline || + (p > cmdline && *(p - 1) == ' ')) + return p; + } + + if (IS_ENABLED(CONFIG_CMDLINE_FORCE) || + IS_ENABLED(CONFIG_CMDLINE_EXTEND)) { + p = __strstr(default_cmdline, param, param_size); + if (p == default_cmdline || + (p > default_cmdline && *(p - 1) == ' ')) + return p; + } + return NULL; +} + +static void __puthex32(const char *name, u32 val) +{ + int i; + + while (*name) + putc(*name++); + putc(':'); + for (i = 28; i >= 0; i -= 4) { + char c = (val >> i) & 0xf; + + if (c < 10) + putc(c + '0'); + else + putc(c + 'a' - 10); + } + putc('\r'); + putc('\n'); +} +#define puthex32(val) __puthex32(#val, (val)) + +u32 kaslr_early_init(u32 *kaslr_offset, u32 image_base, u32 image_size, + u32 seed, u32 zimage_start, const void *fdt, + u32 zimage_end) +{ + static const char __aligned(4) build_id[] = UTS_VERSION UTS_RELEASE; + u32 bitmap[(VMALLOC_END - PAGE_OFFSET) / SZ_2M / 32] = {}; + struct regions regions; + const char *command_line; + const char *p; + int chosen, len; + u32 lowmem_top, count, num; + + if (fdt_check_header(fdt)) + return 0; + + chosen = fdt_path_offset(fdt, "/chosen"); + if (chosen < 0) + return 0; + + command_line = fdt_getprop(fdt, chosen, "bootargs", &len); + + 
/* check the command line for the presence of 'nokaslr' */ + p = get_cmdline_param(command_line, "nokaslr", sizeof("nokaslr") - 1); + if (p != NULL) + return 0; + + /* check the command line for the presence of 'vmalloc=' */ + p = get_cmdline_param(command_line, "vmalloc=", sizeof("vmalloc=") - 1); + if (p != NULL) + lowmem_top = VMALLOC_END - __memparse(p + 8, NULL) - + VMALLOC_OFFSET; + else + lowmem_top = VMALLOC_DEFAULT_BASE; + + regions.image_size = image_base % SZ_128M + round_up(image_size, SZ_2M); + regions.pa_start = round_down(image_base, SZ_128M); + regions.pa_end = lowmem_top - PAGE_OFFSET + regions.pa_start; + regions.zimage_start = zimage_start; + regions.zimage_size = zimage_end - zimage_start; + regions.dtb_start = (u32)fdt; + regions.dtb_size = fdt_totalsize(fdt); + + /* + * Stir up the seed a bit by taking the CRC of the DTB: + * hopefully there's a /chosen/kaslr-seed in there. + */ + seed = __crc16(seed, fdt, regions.dtb_size); + + /* stir a bit more using data that changes between builds */ + seed = __crc16(seed, (u32 *)build_id, sizeof(build_id)); + + /* check for initrd on the command line */ + regions.initrd_start = regions.initrd_size = 0; + p = get_cmdline_param(command_line, "initrd=", sizeof("initrd=") - 1); + if (p != NULL) { + regions.initrd_start = __memparse(p + 7, &p); + if (*p++ == ',') + regions.initrd_size = __memparse(p, NULL); + if (regions.initrd_size == 0) + regions.initrd_start = 0; + } + + /* ... or in /chosen */ + if (regions.initrd_size == 0) { + const fdt32_t *prop; + u64 start = 0, end = 0; + + prop = fdt_getprop(fdt, chosen, "linux,initrd-start", &len); + if (prop) { + start = fdt32_to_cpu(prop[0]); + if (len == 8) + start = (start << 32) | fdt32_to_cpu(prop[1]); + } + + prop = fdt_getprop(fdt, chosen, "linux,initrd-end", &len); + if (prop) { + end = fdt32_to_cpu(prop[0]); + if (len == 8) + end = (end << 32) | fdt32_to_cpu(prop[1]); + } + if (start != 0 && end != 0 && start < U32_MAX) { + regions.initrd_start = start; + regions.initrd_size = max_t(u64, end, U32_MAX) - start; + } + } + + /* check the memory nodes for the size of the lowmem region */ + regions.pa_end = min(regions.pa_end, get_memory_end(fdt)) - + regions.image_size; + + puthex32(regions.image_size); + puthex32(regions.pa_start); + puthex32(regions.pa_end); + puthex32(regions.zimage_start); + puthex32(regions.zimage_size); + puthex32(regions.dtb_start); + puthex32(regions.dtb_size); + puthex32(regions.initrd_start); + puthex32(regions.initrd_size); + + /* check for a reserved-memory node and record its cell sizes */ + regions.reserved_mem = fdt_path_offset(fdt, "/reserved-memory"); + if (regions.reserved_mem >= 0) + get_cell_sizes(fdt, regions.reserved_mem, + ®ions.reserved_mem_addr_cells, + ®ions.reserved_mem_size_cells); + + /* + * Iterate over the physical memory range covered by the lowmem region + * in 2 MB increments, and count each offset at which we don't overlap + * with any of the reserved regions for the zImage itself, the DTB, + * the initrd and any regions described as reserved in the device tree. + * If the region does overlap, set the respective bit in the bitmap[]. + * Using this random value, we go over the bitmap and count zero bits + * until we counted enough iterations, and return the offset we ended + * up at. 
+ */ + count = count_suitable_regions(fdt, ®ions, bitmap); + puthex32(count); + + num = ((u16)seed * count) >> 16; + puthex32(num); + + *kaslr_offset = get_region_number(num, bitmap) * SZ_2M; + puthex32(*kaslr_offset); + + return *kaslr_offset; +} -- 2.22.0
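[Editorial note] Two pieces of kaslr_early_init() are worth restating in isolation: the CRC-16 used to stir the seed, and the divisionless mapping of that seed onto a free slot. A minimal C sketch of both follows; the names are mine, the real __crc16 processes a 32-bit word per iteration in assembly, and the real code tracks free slots in a bitmap rather than an index.

#include <stdint.h>

/* Bitwise CRC-16, polynomial 0xa001, one byte at a time; this is the
 * lib/crc16.c algorithm that the assembly __crc16 matches for inputs
 * that are multiples of 4 bytes (little-endian byte order). */
static uint16_t crc16_byte(uint16_t crc, uint8_t data)
{
	int i;

	crc ^= data;
	for (i = 0; i < 8; i++)
		crc = (crc & 1) ? (crc >> 1) ^ 0xa001 : crc >> 1;
	return crc;
}

/* Map a 16-bit seed uniformly onto [0, count) without a division:
 * the 'num = ((u16)seed * count) >> 16' step above. */
static unsigned int pick_slot(uint16_t seed, unsigned int count)
{
	return ((uint32_t)seed * count) >> 16;
}

For example, with 40 free 2 MiB slots and seed 0x8000, pick_slot() returns 20; get_region_number() then walks the bitmap past that many free slots, and the final KASLR offset is the resulting slot index times SZ_2M.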

From: Ye Bin <yebin10@huawei.com> maillist inclusion from mainline-v5.7-rc1 commit 32830a0534700f86366f371b150b17f0f0d140d7 category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------ Fix the following warnings: arm-linux-gnueabihf-ld: warning: orphan section `.data.rel.local' from `net/sunrpc/xprt.o' being placed in section `.data.rel.local'. ...... arm-linux-gnueabihf-ld: warning: orphan section `.got.plt' from `arch/arm/kernel/head.o' being placed in section `.got.plt'. arm-linux-gnueabihf-ld: warning: orphan section `.plt' from `arch/arm/kernel/head.o' being placed in section `.plt'. arm-linux-gnueabihf-ld: warning: orphan section `.data.rel.ro' from `arch/arm/kernel/head.o' being placed in section `.data.rel.ro'. ...... Fixes: ("ARM: kernel: make vmlinux buildable as a PIE executable") Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/kaslr.c | 4 +++- arch/arm/include/asm/vmlinux.lds.h | 3 ++- arch/arm/kernel/vmlinux.lds.S | 11 ++++++++++- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/arch/arm/boot/compressed/kaslr.c b/arch/arm/boot/compressed/kaslr.c index 6ac666f3d8be..6a4c60b22908 100644 --- a/arch/arm/boot/compressed/kaslr.c +++ b/arch/arm/boot/compressed/kaslr.c @@ -12,7 +12,7 @@ #include <linux/types.h> #include <generated/compile.h> #include <generated/utsrelease.h> -#include <asm/pgtable.h> +#include <linux/pgtable.h> #include CONFIG_UNCOMPRESS_INCLUDE @@ -64,9 +64,11 @@ static u32 __memparse(const char *val, const char **retptr) case 'g': case 'G': ret <<= 10; + /* fall through */ case 'm': case 'M': ret <<= 10; + /* fall through */ case 'k': case 'K': ret <<= 10; diff --git a/arch/arm/include/asm/vmlinux.lds.h b/arch/arm/include/asm/vmlinux.lds.h index 68e1fc0b7175..be04f5b5056f 100644 --- a/arch/arm/include/asm/vmlinux.lds.h +++ b/arch/arm/include/asm/vmlinux.lds.h @@ -62,7 +62,7 @@ */ #define ARM_ASSERTS \ .plt : { \ - *(.iplt) *(.rel.iplt) *(.iplt) *(.igot.plt) \ + *(.iplt) *(.rel.iplt) *(.iplt) *(.igot.plt) *(.plt) \ } \ ASSERT(SIZEOF(.plt) == 0, \ "Unexpected run-time procedure linkages detected!") @@ -93,6 +93,7 @@ ARM_STUBS_TEXT \ . = ALIGN(4); \ *(.got) /* Global offset table */ \ + *(.got.plt) \ ARM_CPU_KEEP(PROC_INFO) /* Stack unwinding tables */ diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S index 262ba776d32d..1a9849968ce6 100644 --- a/arch/arm/kernel/vmlinux.lds.S +++ b/arch/arm/kernel/vmlinux.lds.S @@ -117,7 +117,7 @@ SECTIONS #endif .rel.dyn : ALIGN(8) { __rel_begin = .; - *(.rel .rel.* .rel.dyn) + *(.rel .rel.* .rel.dyn .rel*) } __rel_end = ADDR(.rel.dyn) + SIZEOF(.rel.dyn); @@ -150,6 +150,15 @@ SECTIONS _sdata = .; RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE) + + .data.rel.local : { + *(.data.rel.local) + } + + .data.rel.ro : { + *(.data.rel.ro) + } + _edata = .; BSS_SECTION(0, 0, 0) -- 2.22.0

From: Ye Bin <yebin10@huawei.com> ohos inclusion category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA ------------------------------------------------- Fix the memtop calculation: when there is no memory-top information, we can't use zero in its place. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/kaslr.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/arm/boot/compressed/kaslr.c b/arch/arm/boot/compressed/kaslr.c index 6a4c60b22908..a4446202332f 100644 --- a/arch/arm/boot/compressed/kaslr.c +++ b/arch/arm/boot/compressed/kaslr.c @@ -316,7 +316,7 @@ u32 kaslr_early_init(u32 *kaslr_offset, u32 image_base, u32 image_size, const char *command_line; const char *p; int chosen, len; - u32 lowmem_top, count, num; + u32 lowmem_top, count, num, mem_fdt; if (fdt_check_header(fdt)) return 0; @@ -393,8 +393,11 @@ u32 kaslr_early_init(u32 *kaslr_offset, u32 image_base, u32 image_size, } } /* check the memory nodes for the size of the lowmem region */ - regions.pa_end = min(regions.pa_end, get_memory_end(fdt)) - - regions.image_size; + mem_fdt = get_memory_end(fdt); + if (mem_fdt) + regions.pa_end = min(regions.pa_end, mem_fdt) - regions.image_size; + else + regions.pa_end = regions.pa_end - regions.image_size; puthex32(regions.image_size); puthex32(regions.pa_start); -- 2.22.0

From: Ye Bin <yebin10@huawei.com> ohos inclusion category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA ------------------------------------------------- When booting with vxboot, we must adjust the dtb address before kaslr_early_init(), and store the dtb address after init. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/head.S | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index adc17fce4b00..4b95a9268aba 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -448,6 +448,29 @@ dtb_check_done: bne 1f stmfd sp!, {r0-r3, ip, lr} +#ifdef CONFIG_ARCH_HISI +#ifdef CONFIG_ARM_APPENDED_DTB +#ifdef CONFIG_START_MEM_2M_ALIGN + mov r0, r4 +#ifdef CONFIG_CORTEX_A9 + lsr r0, r0, #20 + lsl r0, r0, #20 +#else + lsr r0, r0, #21 + lsl r0, r0, #21 +#endif + add r0, r0, #0x1000 + ldr r1, [r0] +#ifndef __ARMEB__ + ldr r2, =0xedfe0dd0 @ sig is 0xd00dfeed big endian +#else + ldr r2, =0xd00dfeed +#endif + cmp r1, r2 + moveq r8, r0 +#endif +#endif +#endif adr_l r2, _text @ start of zImage stmfd sp!, {r2, r8, r10} @ pass stack arguments @@ -469,6 +492,14 @@ dtb_check_done: add sp, sp, #12 cmp r0, #0 addne r4, r4, r0 @ add offset to base address +#ifdef CONFIG_VXBOOT +#ifdef CONFIG_START_MEM_2M_ALIGN +#ifdef CONFIG_CORTEX_A9 + adr r1, vx_edata + strne r6, [r1] +#endif +#endif +#endif ldmfd sp!, {r0-r3, ip, lr} bne restart 1: -- 2.22.0
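[Editorial note] The magic-number probe added above can be summarized in C as follows, under the same assumptions the patch makes: the zImage load address is rounded down to the platform alignment (2 MiB, or 1 MiB on Cortex-A9 per the config options used), and the DTB, if present, sits 4 KiB above that point. Names are illustrative.

#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedU	/* DTB magic; stored big-endian in memory */

/* Returns the address of an appended DTB, or 0 if the magic is absent. */
static uintptr_t probe_appended_dtb(uintptr_t load_addr, uintptr_t align)
{
	uintptr_t p = (load_addr & ~(align - 1)) + 0x1000;
	uint32_t word = *(const uint32_t *)p;

#ifndef __ARMEB__
	word = __builtin_bswap32(word);	/* swap on a little-endian CPU */
#endif
	return word == FDT_MAGIC ? p : 0;
}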

From: Ye Bin <yebin10@huawei.com> ohos inclusion category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA ------------------------------------------------- If we do not hide the GOT, inserting a module that references a global variable fails with the error "Unknown symbol _GLOBAL_OFFSET_TABLE_ (err 0)". Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/decompress.c | 4 ++++ arch/arm/boot/compressed/misc.c | 3 +++ 2 files changed, 7 insertions(+) diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c index aa075d8372ea..38a5dd847e03 100644 --- a/arch/arm/boot/compressed/decompress.c +++ b/arch/arm/boot/compressed/decompress.c @@ -1,6 +1,10 @@ // SPDX-License-Identifier: GPL-2.0 #define _LINUX_STRING_H_ +#ifdef CONFIG_RANDOMIZE_BASE +#pragma GCC visibility pop +#endif + #include <linux/compiler.h> /* for inline */ #include <linux/types.h> /* for size_t */ #include <linux/stddef.h> /* for NULL */ diff --git a/arch/arm/boot/compressed/misc.c b/arch/arm/boot/compressed/misc.c index e1e9a5dde853..974a08df7c7a 100644 --- a/arch/arm/boot/compressed/misc.c +++ b/arch/arm/boot/compressed/misc.c @@ -16,6 +16,9 @@ * which should point to addresses in RAM and cleared to 0 on start. * This allows for a much quicker boot time. */ +#ifdef CONFIG_RANDOMIZE_BASE +#pragma GCC visibility pop +#endif unsigned int __machine_arch_type; -- 2.22.0

From: Ye Bin <yebin10@huawei.com> ohos inclusion category: feature feature: ARM kaslr support issue: #I3ZXZF CVE: NA ------------------------------------------------- Export the KASLR offset through a kaslr_offset() helper and register a panic notifier that prints the kernel offset, so that the randomized load address can be recovered when decoding a panic. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/include/asm/memory.h | 14 ++++++++++++++ arch/arm/kernel/head.S | 2 +- arch/arm/kernel/setup.c | 30 ++++++++++++++++++++++++++++++ 3 files changed, 45 insertions(+), 1 deletion(-) diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index 2f841cb65c30..a7a22bf5ca7e 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -167,6 +167,20 @@ extern unsigned long vectors_base; #ifndef __ASSEMBLY__ +#ifdef CONFIG_RANDOMIZE_BASE +extern unsigned long __kaslr_offset; + +static inline unsigned long kaslr_offset(void) +{ + return __kaslr_offset; +} +#else +static inline unsigned long kaslr_offset(void) +{ + return 0; +} +#endif + /* * Physical vs virtual RAM address space conversion. These are * private definitions which should NOT be used outside memory.h diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index 26f3e9a0ffe9..b6b82387289b 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -105,7 +105,7 @@ ENTRY(stext) .section ".bss", "aw", %nobits .align 2 -__kaslr_offset: +ENTRY(__kaslr_offset) .long 0 @ will be wiped before entering C code .previous #endif diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index d32f7652a5bf..e37ebb7f1b0b 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -1338,3 +1338,33 @@ const struct seq_operations cpuinfo_op = { .stop = c_stop, .show = c_show }; + +/* + * Dump out kernel offset information on panic. + */ +static int dump_kernel_offset(struct notifier_block *self, unsigned long v, + void *p) +{ + const unsigned long offset = kaslr_offset(); + + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && offset > 0) { + pr_emerg("Kernel Offset: 0x%lx from 0x%lx\n", + offset, PAGE_OFFSET); + + } else { + pr_emerg("Kernel Offset: disabled\n"); + } + return 0; +} + +static struct notifier_block kernel_offset_notifier = { + .notifier_call = dump_kernel_offset +}; + +static int __init register_kernel_offset_dumper(void) +{ + atomic_notifier_chain_register(&panic_notifier_list, + &kernel_offset_notifier); + return 0; +} +__initcall(register_kernel_offset_dumper); -- 2.22.0

From: Cui GaoSheng <cuigaosheng1@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------ Fix the following warnings: armeb-linux-gnueabi-ld: warning: orphan section `.gnu.hash' from `arch/arm/kernel/head.o' being placed in section `.gnu.hash' Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/vmlinux.lds.S | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S index 1a9849968ce6..8af654bd16bf 100644 --- a/arch/arm/kernel/vmlinux.lds.S +++ b/arch/arm/kernel/vmlinux.lds.S @@ -70,6 +70,10 @@ SECTIONS #endif _etext = .; /* End of text section */ + .gnu.hash : { + *(.gnu.hash) + } + RO_DATA(PAGE_SIZE) . = ALIGN(4); -- 2.22.0

From: Cui GaoSheng <cuigaosheng1@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------ The kernel can't compile modules with -fpic, because modules have their own relocation tables and can't use the GOT for symbol addressing. Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/Makefile b/arch/arm/Makefile index 4f550fc0e811..cb7c6b02fede 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -50,6 +50,7 @@ endif ifeq ($(CONFIG_RELOCATABLE),y) KBUILD_CFLAGS += -fpic -include $(srctree)/include/linux/hidden.h +CFLAGS_MODULE += -fno-pic LDFLAGS_vmlinux += -pie -shared -Bsymbolic endif -- 2.22.0

From: Cui GaoSheng <cuigaosheng1@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------ Fix the bug of hidden symbols when the decompressor code is compiled: we can't enable the hidden cflags there, because the decompressor code needs to support symbol relocation. Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/Makefile | 5 +++++ arch/arm/boot/compressed/decompress.c | 4 ---- arch/arm/boot/compressed/misc.c | 4 ---- 3 files changed, 5 insertions(+), 8 deletions(-) diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index 7d6d00416411..4b554eaf4c38 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -113,6 +113,11 @@ targets := vmlinux vmlinux.lds piggy_data piggy.o \ clean-files += piggy_data lib1funcs.S ashldi3.S bswapsdi2.S hyp-stub.S +ifeq ($(CONFIG_RELOCATABLE),y) +HIDDEN_STR := -include $(srctree)/include/linux/hidden.h +KBUILD_CFLAGS := $(subst $(HIDDEN_STR), , $(KBUILD_CFLAGS)) +endif + KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING ccflags-y := -fpic $(call cc-option,-mno-single-pic-base,) -fno-builtin \ diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c index 38a5dd847e03..aa075d8372ea 100644 --- a/arch/arm/boot/compressed/decompress.c +++ b/arch/arm/boot/compressed/decompress.c @@ -1,10 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #define _LINUX_STRING_H_ -#ifdef CONFIG_RANDOMIZE_BASE -#pragma GCC visibility pop -#endif - #include <linux/compiler.h> /* for inline */ #include <linux/types.h> /* for size_t */ #include <linux/stddef.h> /* for NULL */ diff --git a/arch/arm/boot/compressed/misc.c b/arch/arm/boot/compressed/misc.c index 974a08df7c7a..abc083b0db96 100644 --- a/arch/arm/boot/compressed/misc.c +++ b/arch/arm/boot/compressed/misc.c @@ -16,10 +16,6 @@ * which should point to addresses in RAM and cleared to 0 on start. * This allows for a much quicker boot time. */ -#ifdef CONFIG_RANDOMIZE_BASE -#pragma GCC visibility pop -#endif - unsigned int __machine_arch_type; -- 2.22.0

From: Cui GaoSheng <cuigaosheng1@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------ The dts files of some boards may have multiple memory nodes, so when calculating the offset value of kaslr, we need to consider the memory layout, and choose the memory node where the zImage is located. Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/kaslr.c | 64 +++++++++++++++++++------------- 1 file changed, 39 insertions(+), 25 deletions(-) diff --git a/arch/arm/boot/compressed/kaslr.c b/arch/arm/boot/compressed/kaslr.c index a4446202332f..cd54b49eec2f 100644 --- a/arch/arm/boot/compressed/kaslr.c +++ b/arch/arm/boot/compressed/kaslr.c @@ -208,11 +208,15 @@ static void get_cell_sizes(const void *fdt, int node, int *addr_cells, *size_cells = fdt32_to_cpu(*prop); } -static u32 get_memory_end(const void *fdt) +/* + * The original method only considered the first memory node in the dtb, + * but there may be more than one memory node; we only consider the + * memory node in which the zImage resides. + */ +static u32 get_memory_end(const void *fdt, u32 zimage_start) { int mem_node, address_cells, size_cells, len; const fdt32_t *reg; - u64 memory_end = 0; /* Look for a node called "memory" at the lowest level of the tree */ mem_node = fdt_path_offset(fdt, "/memory"); @@ -221,32 +225,38 @@ static u32 get_memory_end(const void *fdt) get_cell_sizes(fdt, 0, &address_cells, &size_cells); - /* - * Now find the 'reg' property of the /memory node, and iterate over - * the base/size pairs. - */ - len = 0; - reg = fdt_getprop(fdt, mem_node, "reg", &len); - while (len >= 4 * (address_cells + size_cells)) { - u64 base, size; - - base = fdt32_to_cpu(reg[0]); - if (address_cells == 2) - base = (base << 32) | fdt32_to_cpu(reg[1]); + while(mem_node >= 0) { + /* + * Now find the 'reg' property of the /memory node, and iterate over + * the base/size pairs. + */ + len = 0; + reg = fdt_getprop(fdt, mem_node, "reg", &len); + while (len >= 4 * (address_cells + size_cells)) { + u64 base, size; + base = fdt32_to_cpu(reg[0]); + if (address_cells == 2) + base = (base << 32) | fdt32_to_cpu(reg[1]); - reg += address_cells; - len -= 4 * address_cells; + reg += address_cells; + len -= 4 * address_cells; - size = fdt32_to_cpu(reg[0]); - if (size_cells == 2) - size = (size << 32) | fdt32_to_cpu(reg[1]); + size = fdt32_to_cpu(reg[0]); + if (size_cells == 2) + size = (size << 32) | fdt32_to_cpu(reg[1]); - reg += size_cells; - len -= 4 * size_cells; + reg += size_cells; + len -= 4 * size_cells; - memory_end = max(memory_end, base + size); + /* Get the base and size of the zimage memory node */ + if (zimage_start >= base && zimage_start < base + size) + return base + size; + } + /* If the current memory node is not the one in which the zImage + * resides, traverse the next memory node. + */ + mem_node = fdt_node_offset_by_prop_value(fdt, mem_node, "device_type", "memory", sizeof("memory")); } - return min(memory_end, (u64)U32_MAX); + + return 0; } static char *__strstr(const char *s1, const char *s2, int l2) @@ -392,8 +402,12 @@ u32 kaslr_early_init(u32 *kaslr_offset, u32 image_base, u32 image_size, } } - /* check the memory nodes for the size of the lowmem region */ - mem_fdt = get_memory_end(fdt); + /* + * Check the memory nodes for the size of the lowmem region: traverse + * all memory nodes to find the node in which the zImage resides, and + * randomize the kernel only within that node. + */ + mem_fdt = get_memory_end(fdt, zimage_start); if (mem_fdt) regions.pa_end = min(regions.pa_end, mem_fdt) - regions.image_size; else -- 2.22.0
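[Editorial note] The traversal relies on libfdt's lookup-by-property-value helper. Stripped of the reg parsing, the iteration pattern is the following (a sketch with error handling elided; the function name is mine):

#include <libfdt.h>

/* Visit every node whose device_type property is "memory". */
static void for_each_memory_node(const void *fdt)
{
	int node = fdt_path_offset(fdt, "/memory");

	while (node >= 0) {
		/* ... parse this node's "reg" base/size pairs ... */
		node = fdt_node_offset_by_prop_value(fdt, node,
						     "device_type", "memory",
						     sizeof("memory"));
	}
}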

From: Cui GaoSheng <cuigaosheng1@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------ Use adr_l instead of the adr macro for symbol references, because the plain adr instruction can only reference symbols within a limited range and scope. Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/head.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index 4b95a9268aba..1ba21b868ea3 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -152,7 +152,7 @@ * in little-endian form. */ .macro get_inflated_image_size, res:req, tmp1:req, tmp2:req - adr \res, .Linflated_image_size_offset + adr_l \res, .Linflated_image_size_offset ldr \tmp1, [\res] add \tmp1, \tmp1, \res @ address of inflated image size @@ -308,7 +308,7 @@ not_angel: orrcc r4, r4, #1 @ remember we skipped cache_on blcs cache_on -restart: adr r0, LC1 +restart: adr_l r0, LC1 ldr sp, [r0] ldr r6, [r0, #4] add sp, sp, r0 -- 2.22.0

From: Cui GaoSheng <cuigaosheng1@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------ The .bss section is cleared when the kernel starts, and since the __kaslr_offset variable was located in .bss, it was reset to zero. Move __kaslr_offset from the .bss section to the .data section. Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/head.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index b6b82387289b..21185e9b033c 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -103,11 +103,11 @@ ENTRY(stext) #ifdef CONFIG_RANDOMIZE_BASE str_l r3, __kaslr_offset, r9 @ offset in r3 if entered via kaslr ep - .section ".bss", "aw", %nobits + .pushsection .data @ data in bss will be cleared .align 2 ENTRY(__kaslr_offset) .long 0 @ will be wiped before entering C code - .previous + .popsection #endif #ifdef CONFIG_ARM_VIRT_EXT -- 2.22.0

From: Ye Bin <yebin10@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ----------------------------------------------- When we configure CONFIG_RANDOMIZE_BASE we find that: [XX]$arm-linux-gnueabihf-readelf -s ./arch/arm/vdso/vdso.so Symbol table '.dynsym' contains 5 entries: Num: Value Size Type Bind Vis Ndx Name 0: 00000000 0 NOTYPE LOCAL DEFAULT UND 1: 00000278 0 SECTION LOCAL DEFAULT 8 2: 00000000 0 OBJECT GLOBAL DEFAULT ABS LINUX_2.6 We can't find the __vdso_gettimeofday and __vdso_clock_gettime symbols, so calls to clock_gettime() and gettimeofday() fall back to system calls. This results in performance degradation. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/vdso/vgettimeofday.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/arm/vdso/vgettimeofday.c b/arch/arm/vdso/vgettimeofday.c index 1976c6f325a4..425e4f2458ec 100644 --- a/arch/arm/vdso/vgettimeofday.c +++ b/arch/arm/vdso/vgettimeofday.c @@ -4,6 +4,11 @@ * * Copyright 2015 Mentor Graphics Corporation. */ + +#ifdef CONFIG_RANDOMIZE_BASE +#pragma GCC visibility pop +#endif + #include <linux/time.h> #include <linux/types.h> -- 2.22.0

From: Gaosheng Cui <cuigaosheng1@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------ Fix the vector fiq offset when kaslr is enabled: we need to get the real symbol address by subtracting __kaslr_offset, otherwise fiq interrupt registration will fail. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/kernel/fiq.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/arm/kernel/fiq.c b/arch/arm/kernel/fiq.c index 98ca3e3fa847..91e37dfe0396 100644 --- a/arch/arm/kernel/fiq.c +++ b/arch/arm/kernel/fiq.c @@ -48,10 +48,17 @@ #include <asm/irq.h> #include <asm/traps.h> +#ifdef CONFIG_RANDOMIZE_BASE +#define FIQ_OFFSET ({ \ + extern void *vector_fiq_offset; \ + (unsigned)&vector_fiq_offset - kaslr_offset(); \ + }) +#else #define FIQ_OFFSET ({ \ extern void *vector_fiq_offset; \ (unsigned)&vector_fiq_offset; \ }) +#endif static unsigned long dfl_fiq_insn; static struct pt_regs dfl_fiq_regs; -- 2.22.0

From: Gaosheng Cui <cuigaosheng1@huawei.com> ohos inclusion category: bugfix issue: #I3ZXZF CVE: NA ------------------------------------------------------------------------ The gcc flag '-fvisibility=hidden' specifies the visibility attribute for external-linkage entities in object files. You can also selectively set visibility attributes for entities by using pairs of the #pragma GCC visibility push and #pragma GCC visibility pop compiler directives throughout your source program. When we include hidden.h, __bss_start and __bss_stop go from global symbols to local symbols, so we need to modify the regular expression to accommodate this change. Fixes: 27fdadbf34d2 ("[Backport] ARM: 9056/1: decompressor: fix BSS size calculation for LLVM ld.lld") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Yu Changchun <yuchangchun1@huawei.com> --- arch/arm/boot/compressed/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index 4b554eaf4c38..7a972136af38 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -127,8 +127,8 @@ asflags-y := -DZIMAGE # Supply kernel BSS size to the decompressor via a linker symbol. KBSS_SZ = $(shell echo $$(($$($(NM) $(obj)/../../../../vmlinux | \ - sed -n -e 's/^\([^ ]*\) [ABD] __bss_start$$/-0x\1/p' \ - -e 's/^\([^ ]*\) [ABD] __bss_stop$$/+0x\1/p') )) ) + sed -n -e 's/^\([^ ]*\) [ABbD] __bss_start$$/-0x\1/p' \ + -e 's/^\([^ ]*\) [ABbD] __bss_stop$$/+0x\1/p') )) ) LDFLAGS_vmlinux = --defsym _kernel_bss_size=$(KBSS_SZ) # Supply ZRELADDR to the decompressor via a linker symbol. ifneq ($(CONFIG_AUTO_ZRELADDR),y) -- 2.22.0