[PATCH OpenHarmony-4.19 00/49] pstore/blk: Introduce backend for block devices
The following patches are for OpenHarmony 4.19 kernel. These patches refer to pstore/bl from upstream. ------------------------------------------------------------- Al Viro (1): pstore: switch to copy_from_user() Christoph Hellwig (1): mark pstore-blk as broken Dmitry Osipenko (1): pstore/ram: Rate-limit "uncorrectable error in header" message Douglas Anderson (1): pstore/ram: Improve backward compatibility with older Chromebooks Greg Kroah-Hartman (1): pstore: no need to check return value of debugfs_create functions Gustavo A. R. Silva (1): pstore/ram: Replace zero-length array with flexible-array member Jiri Bohac (1): pstore: Fix typo in compression option name Joel Fernandes (Google) (2): pstore: Map PSTORE_TYPE_* to strings pstore/ram: Simplify ramoops_get_next_prz() arguments Kees Cook (19): pstore/ram: Clarify resource reservation labels pstore/ram: Standardize module name in ramoops pstore: Improve and update some comments and status output pstore/ram: Report backend assignments with finer granularity pstore: Replace open-coded << with BIT() pstore/ram: Avoid needless alloc during header write pstore/ram: Regularize prz label allocation lifetime pstore: Rename "allpstore" to "records_list" pstore: Convert "records_list" locking to mutex pstore: Refactor pstorefs record list removal pstore: Add locking around superblock changes pstore/platform: Switch pstore_info::name to const pstore/platform: Move module params after declarations pstore/ram: Adjust module param permissions to reflect reality pstore/ram: Refactor DT size parsing pstore/ram: Refactor ftrace buffer merging pstore/ftrace: Provide ftrace log merging routine pstore/ram: Introduce max_reason and convert dump_oops pstore/blk: Introduce "best_effort" mode Mukesh Ojha (1): pstore: Add mem_type property DT parsing support Pavel Tatashin (1): pstore/platform: Pass max_reason to kmesg dump Peng Wang (1): pstore: Avoid duplicate call of persistent_ram_zap() Sai Prakash Ranjan (1): pstore/ram: Fix console ramoops to show the previous boot logs Tetsuo Handa (1): pstore: Fix warning in pstore_kill_sb() Thomas Meyer (1): pstore: Fix bool initialization/comparison Vasile-Laurentiu Stanimir (1): pstore: Move kmsg_bytes default into Kconfig WeiXiong Liao (9): pstore/zone: Introduce common layer to manage storage zones pstore/blk: Introduce backend for block devices pstore/zone,blk: Add support for pmsg frontend pstore/zone,blk: Add console frontend support pstore/zone,blk: Add ftrace frontend support Documentation: Add details for pstore/blk pstore/zone: Provide way to skip "broken" zone for MTD devices pstore/blk: Provide way to query pstore configuration pstore/blk: Support non-block storage devices Yue Hu (4): pstore/ram: Replace dummy_data heap memory with stack memory pstore/ram: Move initialization earlier pstore: Avoid writing records with zero size pstore/ram: Add kmsg hlen zero check to ramoops_pstore_write() chenqiwu (1): pstore/ram: remove unnecessary ramoops_unregister_dummy() Documentation/admin-guide/pstore-blk.rst | 238 +++ Documentation/admin-guide/ramoops.rst | 18 +- .../bindings/reserved-memory/ramoops.txt | 10 +- MAINTAINERS | 5 + drivers/acpi/apei/erst.c | 2 +- drivers/platform/chrome/chromeos_pstore.c | 2 +- fs/pstore/Kconfig | 118 ++ fs/pstore/Makefile | 6 + fs/pstore/blk.c | 517 ++++++ fs/pstore/ftrace.c | 74 +- fs/pstore/inode.c | 152 +- fs/pstore/internal.h | 11 +- fs/pstore/platform.c | 78 +- fs/pstore/ram.c | 323 ++-- fs/pstore/ram_core.c | 67 +- fs/pstore/zone.c | 1469 +++++++++++++++++ include/linux/pstore.h | 37 +- include/linux/pstore_blk.h | 118 ++ include/linux/pstore_ram.h | 46 +- include/linux/pstore_zone.h | 60 + 20 files changed, 3026 insertions(+), 325 deletions(-) create mode 100644 Documentation/admin-guide/pstore-blk.rst create mode 100644 fs/pstore/blk.c create mode 100644 fs/pstore/zone.c create mode 100644 include/linux/pstore_blk.h create mode 100644 include/linux/pstore_zone.h -- 2.17.1
From: Kees Cook
From: Kees Cook
From: Peng Wang
From: Kees Cook
From: Kees Cook
From: Kees Cook
From: "Joel Fernandes (Google)"
From: "Joel Fernandes (Google)"
stable inclusion from stable-5.10 category: feature commit:f0f23e5469dc80b482d985898a930be0e249a162 issue: #I4919J
--------------------------------
In later patches we will need to map types to names, so create a constant table for that which can also be used in different parts of old and new code. This saves the type in the PRZ which will be useful in later patches.
Instead of having an explicit PSTORE_TYPE_UNKNOWN, just use ..._MAX.
This includes removing the now redundant filename templates which can use a single format string. Also, there's no reason to limit the "is it still compressed?" test to only PSTORE_TYPE_DMESG when building the pstorefs filename. Records are zero-initialized, so a backend would need to have explicitly set compressed=1.
Signed-off-by: Joel Fernandes (Google)
Co-developed-by: Kees Cook Signed-off-by: Kees Cook (cherry picked from commit f0f23e5469dc80b482d985898a930be0e249a162) Signed-off-by: roger --- drivers/acpi/apei/erst.c | 2 +- fs/pstore/inode.c | 51 +++----------------------------------- fs/pstore/platform.c | 37 +++++++++++++++++++++++++++ fs/pstore/ram.c | 4 ++- include/linux/pstore.h | 17 ++++++++++--- include/linux/pstore_ram.h | 40 ++++++++++++++++++++++++++++++ 6 files changed, 99 insertions(+), 52 deletions(-) diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c index ab8faa6d6616..9953e50667ec 100644 --- a/drivers/acpi/apei/erst.c +++ b/drivers/acpi/apei/erst.c @@ -1035,7 +1035,7 @@ static ssize_t erst_reader(struct pstore_record *record) CPER_SECTION_TYPE_MCE) == 0) record->type = PSTORE_TYPE_MCE; else - record->type = PSTORE_TYPE_UNKNOWN; + record->type = PSTORE_TYPE_MAX;
if (rcd->hdr.validation_bits & CPER_VALID_TIMESTAMP) record->time.tv_sec = rcd->hdr.timestamp; diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index 6f51c6d7b965..d49fb7a8b664 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -334,53 +334,10 @@ int pstore_mkfile(struct dentry *root, struct pstore_record *record) inode->i_mode = S_IFREG | 0444; inode->i_fop = &pstore_file_operations;
- switch (record->type) { - case PSTORE_TYPE_DMESG: - scnprintf(name, sizeof(name), "dmesg-%s-%llu%s", - record->psi->name, record->id, - record->compressed ? ".enc.z" : ""); - break; - case PSTORE_TYPE_CONSOLE: - scnprintf(name, sizeof(name), "console-%s-%llu", - record->psi->name, record->id); - break; - case PSTORE_TYPE_FTRACE: - scnprintf(name, sizeof(name), "ftrace-%s-%llu", - record->psi->name, record->id); - break; - case PSTORE_TYPE_MCE: - scnprintf(name, sizeof(name), "mce-%s-%llu", - record->psi->name, record->id); - break; - case PSTORE_TYPE_PPC_RTAS: - scnprintf(name, sizeof(name), "rtas-%s-%llu", - record->psi->name, record->id); - break; - case PSTORE_TYPE_PPC_OF: - scnprintf(name, sizeof(name), "powerpc-ofw-%s-%llu", - record->psi->name, record->id); - break; - case PSTORE_TYPE_PPC_COMMON: - scnprintf(name, sizeof(name), "powerpc-common-%s-%llu", - record->psi->name, record->id); - break; - case PSTORE_TYPE_PMSG: - scnprintf(name, sizeof(name), "pmsg-%s-%llu", - record->psi->name, record->id); - break; - case PSTORE_TYPE_PPC_OPAL: - scnprintf(name, sizeof(name), "powerpc-opal-%s-%llu", - record->psi->name, record->id); - break; - case PSTORE_TYPE_UNKNOWN: - scnprintf(name, sizeof(name), "unknown-%s-%llu", - record->psi->name, record->id); - break; - default: - scnprintf(name, sizeof(name), "type%d-%s-%llu", - record->type, record->psi->name, record->id); - break; - } + scnprintf(name, sizeof(name), "%s-%s-%llu%s", + pstore_type_to_name(record->type), + record->psi->name, record->id, + record->compressed ? ".enc.z" : "");
private = kzalloc(sizeof(*private), GFP_KERNEL); if (!private) diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index ce610b0fb669..7a873006d525 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -59,6 +59,19 @@ MODULE_PARM_DESC(update_ms, "milliseconds before pstore updates its content " "enabling this option is not safe, it may lead to further " "corruption on Oopses)");
+/* Names should be in the same order as the enum pstore_type_id */ +static const char * const pstore_type_names[] = { + "dmesg", + "mce", + "console", + "ftrace", + "rtas", + "powerpc-ofw", + "powerpc-common", + "pmsg", + "powerpc-opal", +}; + static int pstore_new_entry;
static void pstore_timefunc(struct timer_list *); @@ -104,6 +117,30 @@ void pstore_set_kmsg_bytes(int bytes) /* Tag each group of saved records with a sequence number */ static int oopscount;
+const char *pstore_type_to_name(enum pstore_type_id type) +{ + BUILD_BUG_ON(ARRAY_SIZE(pstore_type_names) != PSTORE_TYPE_MAX); + + if (WARN_ON_ONCE(type >= PSTORE_TYPE_MAX)) + return "unknown"; + + return pstore_type_names[type]; +} +EXPORT_SYMBOL_GPL(pstore_type_to_name); + +enum pstore_type_id pstore_name_to_type(const char *name) +{ + int i; + + for (i = 0; i < PSTORE_TYPE_MAX; i++) { + if (!strcmp(pstore_type_names[i], name)) + return i; + } + + return PSTORE_TYPE_MAX; +} +EXPORT_SYMBOL_GPL(pstore_name_to_type); + static const char *get_reason_str(enum kmsg_dump_reason reason) { switch (reason) { diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index e2fb0ce04d74..f7a6b40dbe0f 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -624,6 +624,7 @@ static int ramoops_init_przs(const char *name, goto fail; } *paddr += zone_sz; + prz_ar[i]->type = pstore_name_to_type(name); }
*przs = prz_ar; @@ -663,6 +664,7 @@ static int ramoops_init_prz(const char *name, }
*paddr += sz; + (*prz)->type = pstore_name_to_type(name);
return 0; } @@ -795,7 +797,7 @@ static int ramoops_probe(struct platform_device *pdev)
dump_mem_sz = cxt->size - cxt->console_size - cxt->ftrace_size - cxt->pmsg_size; - err = ramoops_init_przs("dump", dev, cxt, &cxt->dprzs, &paddr, + err = ramoops_init_przs("dmesg", dev, cxt, &cxt->dprzs, &paddr, dump_mem_sz, cxt->record_size, &cxt->max_dump_cnt, 0, 0); if (err) diff --git a/include/linux/pstore.h b/include/linux/pstore.h index dd8e55742bf9..b146181e8709 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -32,21 +32,32 @@
struct module;
-/* pstore record types (see fs/pstore/inode.c for filename templates) */ +/* + * pstore record types (see fs/pstore/platform.c for pstore_type_names[]) + * These values may be written to storage (see EFI vars backend), so + * they are kind of an ABI. Be careful changing the mappings. + */ enum pstore_type_id { + /* Frontend storage types */ PSTORE_TYPE_DMESG = 0, PSTORE_TYPE_MCE = 1, PSTORE_TYPE_CONSOLE = 2, PSTORE_TYPE_FTRACE = 3, - /* PPC64 partition types */ + + /* PPC64-specific partition types */ PSTORE_TYPE_PPC_RTAS = 4, PSTORE_TYPE_PPC_OF = 5, PSTORE_TYPE_PPC_COMMON = 6, PSTORE_TYPE_PMSG = 7, PSTORE_TYPE_PPC_OPAL = 8, - PSTORE_TYPE_UNKNOWN = 255 + + /* End of the list */ + PSTORE_TYPE_MAX };
+const char *pstore_type_to_name(enum pstore_type_id type); +enum pstore_type_id pstore_name_to_type(const char *name); + struct pstore_info; /** * struct pstore_record - details of a pstore record entry diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h index 6e94980357d2..dc4e1423ee27 100644 --- a/include/linux/pstore_ram.h +++ b/include/linux/pstore_ram.h @@ -22,6 +22,7 @@ #include
#include #include +#include #include /* @@ -43,11 +44,50 @@ struct persistent_ram_ecc_info { uint16_t *par; };
+/** + * struct persistent_ram_zone - Details of a persistent RAM zone (PRZ) + * used as a pstore backend + * + * @paddr: physical address of the mapped RAM area + * @size: size of mapping + * @label: unique name of this PRZ + * @type: frontend type for this PRZ + * @flags: holds PRZ_FLAGS_* bits + * + * @buffer_lock: + * locks access to @buffer "size" bytes and "start" offset + * @buffer: + * pointer to actual RAM area managed by this PRZ + * @buffer_size: + * bytes in @buffer->data (not including any trailing ECC bytes) + * + * @par_buffer: + * pointer into @buffer->data containing ECC bytes for @buffer->data + * @par_header: + * pointer into @buffer->data containing ECC bytes for @buffer header + * (i.e. all fields up to @data) + * @rs_decoder: + * RSLIB instance for doing ECC calculations + * @corrected_bytes: + * ECC corrected bytes accounting since boot + * @bad_blocks: + * ECC uncorrectable bytes accounting since boot + * @ecc_info: + * ECC configuration details + * + * @old_log: + * saved copy of @buffer->data prior to most recent wipe + * @old_log_size: + * bytes contained in @old_log + * + */ 注释部分放在本来的补丁中:c208f7d4b037(pstore/ram: Add kern-doc for struct
在 2021/10/12 9:58, roger 写道: persistent_ram_zone) 或者就不要注释部分,同时commit信息中可以保留conflicts文件信息:fs/pstore/inode.c等。
struct persistent_ram_zone { phys_addr_t paddr; size_t size; void *vaddr; char *label; + enum pstore_type_id type; + struct persistent_ram_buffer *buffer; size_t buffer_size; u32 flags;
From: "Joel Fernandes (Google)"
From: Thomas Meyer
From: Sai Prakash Ranjan
From: Yue Hu
From: Yue Hu
From: Yue Hu
From: Yue Hu
From: Kees Cook
From: Douglas Anderson
From: Greg Kroah-Hartman
From: Kees Cook
在 2021/10/12 9:58, roger 写道:
From: Kees Cook
stable inclusion from stable-5.10 category: feature commit:e163fdb3f7f8c62dccf194f3f37a7bcb3c333aa8 issue: #I4919J
--------------------------------
In my attempt to fix a memory leak, I introduced a double-free in the pstore error path. Instead of trying to manage the allocation lifetime between persistent_ram_new() and its callers, adjust the logic so persistent_ram_new() always takes a kstrdup() copy, and leaves the caller's allocation lifetime up to the caller. Therefore callers are _always_ responsible for freeing their label. Before, it only needed freeing when the prz itself failed to allocate, and not in any of the other prz failure cases, which callers would have no visibility into, which is the root design problem that lead to both the leak and now double-free bugs.
Reported-by: Cengiz Can
Link: https://lore.kernel.org/lkml/d4ec59002ede4aaf9928c7f7526da87c@kernel.wtf Fixes: 8df955a32a73 ("pstore/ram: Fix error-path memory leak in persistent_ram_new() callers")
补丁修改适配没啥问题,为保持完整性,应该把该补丁解决的问题补丁也合进来。 8df955a32a73 ("pstore/ram: Fix error-path memory leak in persistent_ram_new() callers")
Cc: stable@vger.kernel.org Signed-off-by: Kees Cook
(cherry picked from commit e163fdb3f7f8c62dccf194f3f37a7bcb3c333aa8) Signed-off-by: roger --- fs/pstore/ram.c | 2 ++ fs/pstore/ram_core.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index f83639feb7dc..7a61743177ee 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -597,6 +597,7 @@ static int ramoops_init_przs(const char *name, prz_ar[i] = persistent_ram_new(*paddr, zone_sz, sig, &cxt->ecc_info, cxt->memtype, flags, label); + kfree(label); if (IS_ERR(prz_ar[i])) { err = PTR_ERR(prz_ar[i]); dev_err(dev, "failed to request %s mem region (0x%zx@0x%llx): %d\n", @@ -642,6 +643,7 @@ static int ramoops_init_prz(const char *name, label = kasprintf(GFP_KERNEL, "ramoops:%s", name); *prz = persistent_ram_new(*paddr, sz, sig, &cxt->ecc_info, cxt->memtype, PRZ_FLAG_ZAP_OLD, label); + kfree(label); if (IS_ERR(*prz)) { int err = PTR_ERR(*prz);
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c index 8200debeb6b7..a908ed97a24c 100644 --- a/fs/pstore/ram_core.c +++ b/fs/pstore/ram_core.c @@ -573,7 +573,7 @@ struct persistent_ram_zone *persistent_ram_new(phys_addr_t start, size_t size, /* Initialize general buffer state. */ raw_spin_lock_init(&prz->buffer_lock); prz->flags = flags; - prz->label = label; + prz->label = kstrdup(label, GFP_KERNEL);
ret = persistent_ram_buffer_map(start, size, prz, memtype); if (ret)
From: chenqiwu
From: Al Viro
From: "Gustavo A. R. Silva"
From: Kees Cook
From: Kees Cook
From: Kees Cook
From: Kees Cook
From: Pavel Tatashin
From: Kees Cook
From: Kees Cook
From: Kees Cook
From: Kees Cook
From: Kees Cook
From: Kees Cook
From: Kees Cook
From: WeiXiong Liao
在 2021/10/12 9:58, roger 写道:
From: WeiXiong Liao
stable inclusion from stable-5.10 category: feature commit:d26c3321fe18dc74517dc1f518d584aa33b0a851 issue: #I4919J
--------------------------------
Implement a common set of APIs needed to support pstore storage zones, based on how ramoops is designed. This will be used by pstore/blk with the intention of migrating pstore/ram in the future.
Signed-off-by: WeiXiong Liao
Link: https://lore.kernel.org/lkml/20200511233229.27745-2-keescook@chromium.org/ Co-developed-by: Kees Cook Signed-off-by: Kees Cook (cherry picked from commit d26c3321fe18dc74517dc1f518d584aa33b0a851) Signed-off-by: roger --- fs/pstore/Kconfig | 7 + fs/pstore/Makefile | 3 + fs/pstore/platform.c | 2 +- fs/pstore/zone.c | 989 ++++++++++++++++++++++++++++++++++++ include/linux/pstore.h | 1 + include/linux/pstore_zone.h | 44 ++ 6 files changed, 1045 insertions(+), 1 deletion(-) create mode 100644 fs/pstore/zone.c create mode 100644 include/linux/pstore_zone.h diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig index 503086f7f7c1..4c01efa459b3 100644 --- a/fs/pstore/Kconfig +++ b/fs/pstore/Kconfig @@ -153,3 +153,10 @@ config PSTORE_RAM "ramoops.ko".
For more information, see Documentation/admin-guide/ramoops.rst. + +config PSTORE_ZONE + tristate + depends on PSTORE + help + The common layer for pstore/blk (and pstore/ram in the future) + to manage storage in zones. diff --git a/fs/pstore/Makefile b/fs/pstore/Makefile index 967b5891f325..58a967cbe4af 100644 --- a/fs/pstore/Makefile +++ b/fs/pstore/Makefile @@ -12,3 +12,6 @@ pstore-$(CONFIG_PSTORE_PMSG) += pmsg.o
ramoops-objs += ram.o ram_core.o obj-$(CONFIG_PSTORE_RAM) += ramoops.o + +pstore_zone-objs += zone.o +obj-$(CONFIG_PSTORE_ZONE) += pstore_zone.o diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 310034809594..216c3247a2c6 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -146,7 +146,7 @@ enum pstore_type_id pstore_name_to_type(const char *name) } EXPORT_SYMBOL_GPL(pstore_name_to_type);
-static const char *get_reason_str(enum kmsg_dump_reason reason) +const char *get_reason_str(enum kmsg_dump_reason reason)
commit信息中可以补一下因为需要get_reason_str新引用了linux/pstore.h
{ switch (reason) { case KMSG_DUMP_PANIC: diff --git a/fs/pstore/zone.c b/fs/pstore/zone.c new file mode 100644 index 000000000000..31c7ce48c0c8 --- /dev/null +++ b/fs/pstore/zone.c @@ -0,0 +1,989 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Provide a pstore intermediate backend, organized into kernel memory + * allocated zones that are then mapped and flushed into a single + * contiguous region on a storage backend of some kind (block, mtd, etc). + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include
+#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "internal.h" + +#ifndef SECTOR_SIZE +#define SECTOR_SIZE 512 +#endif + +/** + * struct psz_head - header of zone to flush to storage + * + * @sig: signature to indicate header (PSZ_SIG xor PSZONE-type value) + * @datalen: length of data in @data + * @data: zone data. + */ +struct psz_buffer { +#define PSZ_SIG (0x43474244) /* DBGC */ + uint32_t sig; + atomic_t datalen; + uint8_t data[]; +}; + +/** + * struct psz_kmsg_header - kmsg dump-specific header to flush to storage + * + * @magic: magic num for kmsg dump header + * @time: kmsg dump trigger time + * @compressed: whether conpressed + * @counter: kmsg dump counter + * @reason: the kmsg dump reason (e.g. oops, panic, etc) + * @data: pointer to log data + * + * This is a sub-header for a kmsg dump, trailing after &psz_buffer. + */ +struct psz_kmsg_header { +#define PSTORE_KMSG_HEADER_MAGIC 0x4dfc3ae5 /* Just a random number */ + uint32_t magic; + struct timespec64 time; + bool compressed; + uint32_t counter; + enum kmsg_dump_reason reason; + uint8_t data[]; +}; + +/** + * struct pstore_zone - single stored buffer + * + * @off: zone offset of storage + * @type: front-end type for this zone + * @name: front-end name for this zone + * @buffer: pointer to data buffer managed by this zone + * @oldbuf: pointer to old data buffer + * @buffer_size: bytes in @buffer->data + * @should_recover: whether this zone should recover from storage + * @dirty: whether the data in @buffer dirty + * + * zone structure in memory. + */ +struct pstore_zone { + loff_t off; + const char *name; + enum pstore_type_id type; + + struct psz_buffer *buffer; + struct psz_buffer *oldbuf; + size_t buffer_size; + bool should_recover; + atomic_t dirty; +}; + +/** + * struct psz_context - all about running state of pstore/zone + * + * @kpszs: kmsg dump storage zones + * @kmsg_max_cnt: max count of @kpszs + * @kmsg_read_cnt: counter of total read kmsg dumps + * @kmsg_write_cnt: counter of total kmsg dump writes + * @oops_counter: counter of oops dumps + * @panic_counter: counter of panic dumps + * @recovered: whether finished recovering data from storage + * @on_panic: whether panic is happening + * @pstore_zone_info_lock: lock to @pstore_zone_info + * @pstore_zone_info: information from backend + * @pstore: structure for pstore + */ +struct psz_context { + struct pstore_zone **kpszs; + unsigned int kmsg_max_cnt; + unsigned int kmsg_read_cnt; + unsigned int kmsg_write_cnt; + /* + * These counters should be calculated during recovery. + * It records the oops/panic times after crashes rather than boots. + */ + unsigned int oops_counter; + unsigned int panic_counter; + atomic_t recovered; + atomic_t on_panic; + + /* + * pstore_zone_info_lock protects this entire structure during calls + * to register_pstore_zone()/unregister_pstore_zone(). + */ + struct mutex pstore_zone_info_lock; + struct pstore_zone_info *pstore_zone_info; + struct pstore_info pstore; +}; +static struct psz_context pstore_zone_cxt; + +/** + * enum psz_flush_mode - flush mode for psz_zone_write() + * + * @FLUSH_NONE: do not flush to storage but update data on memory + * @FLUSH_PART: just flush part of data including meta data to storage + * @FLUSH_META: just flush meta data of zone to storage + * @FLUSH_ALL: flush all of zone + */ +enum psz_flush_mode { + FLUSH_NONE = 0, + FLUSH_PART, + FLUSH_META, + FLUSH_ALL, +}; + +static inline int buffer_datalen(struct pstore_zone *zone) +{ + return atomic_read(&zone->buffer->datalen); +} + +static inline bool is_on_panic(void) +{ + return atomic_read(&pstore_zone_cxt.on_panic); +} + +static ssize_t psz_zone_read(struct pstore_zone *zone, char *buf, + size_t len, unsigned long off) +{ + if (!buf || !zone->buffer) + return -EINVAL; + if (off > zone->buffer_size) + return -EINVAL; + len = min_t(size_t, len, zone->buffer_size - off); + memcpy(buf, zone->buffer->data + off, len); + return len; +} + +static int psz_zone_write(struct pstore_zone *zone, + enum psz_flush_mode flush_mode, const char *buf, + size_t len, unsigned long off) +{ + struct pstore_zone_info *info = pstore_zone_cxt.pstore_zone_info; + ssize_t wcnt = 0; + ssize_t (*writeop)(const char *buf, size_t bytes, loff_t pos); + size_t wlen; + + if (off > zone->buffer_size) + return -EINVAL; + + wlen = min_t(size_t, len, zone->buffer_size - off); + if (buf && wlen) { + memcpy(zone->buffer->data + off, buf, wlen); + atomic_set(&zone->buffer->datalen, wlen + off); + } + + /* avoid to damage old records */ + if (!is_on_panic() && !atomic_read(&pstore_zone_cxt.recovered)) + goto dirty; + + writeop = is_on_panic() ? info->panic_write : info->write; + if (!writeop) + goto dirty; + + switch (flush_mode) { + case FLUSH_NONE: + if (unlikely(buf && wlen)) + goto dirty; + return 0; + case FLUSH_PART: + wcnt = writeop((const char *)zone->buffer->data + off, wlen, + zone->off + sizeof(*zone->buffer) + off); + if (wcnt != wlen) + goto dirty; + /* fallthrough */ + case FLUSH_META: + wlen = sizeof(struct psz_buffer); + wcnt = writeop((const char *)zone->buffer, wlen, zone->off); + if (wcnt != wlen) + goto dirty; + break; + case FLUSH_ALL: + wlen = zone->buffer_size + sizeof(*zone->buffer); + wcnt = writeop((const char *)zone->buffer, wlen, zone->off); + if (wcnt != wlen) + goto dirty; + break; + } + + return 0; +dirty: + atomic_set(&zone->dirty, true); + return -EBUSY; +} + +static int psz_flush_dirty_zone(struct pstore_zone *zone) +{ + int ret; + + if (unlikely(!zone)) + return -EINVAL; + + if (unlikely(!atomic_read(&pstore_zone_cxt.recovered))) + return -EBUSY; + + if (!atomic_xchg(&zone->dirty, false)) + return 0; + + ret = psz_zone_write(zone, FLUSH_ALL, NULL, 0, 0); + if (ret) + atomic_set(&zone->dirty, true); + return ret; +} + +static int psz_flush_dirty_zones(struct pstore_zone **zones, unsigned int cnt) +{ + int i, ret; + struct pstore_zone *zone; + + if (!zones) + return -EINVAL; + + for (i = 0; i < cnt; i++) { + zone = zones[i]; + if (!zone) + return -EINVAL; + ret = psz_flush_dirty_zone(zone); + if (ret) + return ret; + } + return 0; +} + +static int psz_move_zone(struct pstore_zone *old, struct pstore_zone *new) +{ + const char *data = (const char *)old->buffer->data; + int ret; + + ret = psz_zone_write(new, FLUSH_ALL, data, buffer_datalen(old), 0); + if (ret) { + atomic_set(&new->buffer->datalen, 0); + atomic_set(&new->dirty, false); + return ret; + } + atomic_set(&old->buffer->datalen, 0); + return 0; +} + +static int psz_kmsg_recover_data(struct psz_context *cxt) +{ + struct pstore_zone_info *info = cxt->pstore_zone_info; + struct pstore_zone *zone = NULL; + struct psz_buffer *buf; + unsigned long i; + ssize_t rcnt; + + if (!info->read) + return -EINVAL; + + for (i = 0; i < cxt->kmsg_max_cnt; i++) { + zone = cxt->kpszs[i]; + if (unlikely(!zone)) + return -EINVAL; + if (atomic_read(&zone->dirty)) { + unsigned int wcnt = cxt->kmsg_write_cnt; + struct pstore_zone *new = cxt->kpszs[wcnt]; + int ret; + + ret = psz_move_zone(zone, new); + if (ret) { + pr_err("move zone from %lu to %d failed\n", + i, wcnt); + return ret; + } + cxt->kmsg_write_cnt = (wcnt + 1) % cxt->kmsg_max_cnt; + } + if (!zone->should_recover) + continue; + buf = zone->buffer; + rcnt = info->read((char *)buf, zone->buffer_size + sizeof(*buf), + zone->off); + if (rcnt != zone->buffer_size + sizeof(*buf)) + return (int)rcnt < 0 ? (int)rcnt : -EIO; + } + return 0; +} + +static int psz_kmsg_recover_meta(struct psz_context *cxt) +{ + struct pstore_zone_info *info = cxt->pstore_zone_info; + struct pstore_zone *zone; + size_t rcnt, len; + struct psz_buffer *buf; + struct psz_kmsg_header *hdr; + struct timespec64 time = { }; + unsigned long i; + /* + * Recover may on panic, we can't allocate any memory by kmalloc. + * So, we use local array instead. + */ + char buffer_header[sizeof(*buf) + sizeof(*hdr)] = {0}; + + if (!info->read) + return -EINVAL; + + len = sizeof(*buf) + sizeof(*hdr); + buf = (struct psz_buffer *)buffer_header; + for (i = 0; i < cxt->kmsg_max_cnt; i++) { + zone = cxt->kpszs[i]; + if (unlikely(!zone)) + return -EINVAL; + + rcnt = info->read((char *)buf, len, zone->off); + if (rcnt != len) { + pr_err("read %s with id %lu failed\n", zone->name, i); + return (int)rcnt < 0 ? (int)rcnt : -EIO; + } + + if (buf->sig != zone->buffer->sig) { + pr_debug("no valid data in kmsg dump zone %lu\n", i); + continue; + } + + if (zone->buffer_size < atomic_read(&buf->datalen)) { + pr_info("found overtop zone: %s: id %lu, off %lld, size %zu\n", + zone->name, i, zone->off, + zone->buffer_size); + continue; + } + + hdr = (struct psz_kmsg_header *)buf->data; + if (hdr->magic != PSTORE_KMSG_HEADER_MAGIC) { + pr_info("found invalid zone: %s: id %lu, off %lld, size %zu\n", + zone->name, i, zone->off, + zone->buffer_size); + continue; + } + + /* + * we get the newest zone, and the next one must be the oldest + * or unused zone, because we do write one by one like a circle. + */ + if (hdr->time.tv_sec >= time.tv_sec) { + time.tv_sec = hdr->time.tv_sec; + cxt->kmsg_write_cnt = (i + 1) % cxt->kmsg_max_cnt; + } + + if (hdr->reason == KMSG_DUMP_OOPS) + cxt->oops_counter = + max(cxt->oops_counter, hdr->counter); + else if (hdr->reason == KMSG_DUMP_PANIC) + cxt->panic_counter = + max(cxt->panic_counter, hdr->counter); + + if (!atomic_read(&buf->datalen)) { + pr_debug("found erased zone: %s: id %lu, off %lld, size %zu, datalen %d\n", + zone->name, i, zone->off, + zone->buffer_size, + atomic_read(&buf->datalen)); + continue; + } + + if (!is_on_panic()) + zone->should_recover = true; + pr_debug("found nice zone: %s: id %lu, off %lld, size %zu, datalen %d\n", + zone->name, i, zone->off, + zone->buffer_size, atomic_read(&buf->datalen)); + } + + return 0; +} + +static int psz_kmsg_recover(struct psz_context *cxt) +{ + int ret; + + if (!cxt->kpszs) + return 0; + + ret = psz_kmsg_recover_meta(cxt); + if (ret) + goto recover_fail; + + ret = psz_kmsg_recover_data(cxt); + if (ret) + goto recover_fail; + + return 0; +recover_fail: + pr_debug("psz_recover_kmsg failed\n"); + return ret; +} + +/** + * psz_recovery() - recover data from storage + * @cxt: the context of pstore/zone + * + * recovery means reading data back from storage after rebooting + * + * Return: 0 on success, others on failure. + */ +static inline int psz_recovery(struct psz_context *cxt) +{ + int ret; + + if (atomic_read(&cxt->recovered)) + return 0; + + ret = psz_kmsg_recover(cxt); + + if (unlikely(ret)) + pr_err("recover failed\n"); + else { + pr_debug("recover end!\n"); + atomic_set(&cxt->recovered, 1); + } + return ret; +} + +static int psz_pstore_open(struct pstore_info *psi) +{ + struct psz_context *cxt = psi->data; + + cxt->kmsg_read_cnt = 0; + return 0; +} + +static inline bool psz_ok(struct pstore_zone *zone) +{ + if (zone && zone->buffer && buffer_datalen(zone)) + return true; + return false; +} + +static inline int psz_kmsg_erase(struct psz_context *cxt, + struct pstore_zone *zone, struct pstore_record *record) +{ + struct psz_buffer *buffer = zone->buffer; + struct psz_kmsg_header *hdr = + (struct psz_kmsg_header *)buffer->data; + + if (unlikely(!psz_ok(zone))) + return 0; + /* this zone is already updated, no need to erase */ + if (record->count != hdr->counter) + return 0; + + atomic_set(&zone->buffer->datalen, 0); + return psz_zone_write(zone, FLUSH_META, NULL, 0, 0); +} + +static int psz_pstore_erase(struct pstore_record *record) +{ + struct psz_context *cxt = record->psi->data; + + switch (record->type) { + case PSTORE_TYPE_DMESG: + if (record->id >= cxt->kmsg_max_cnt) + return -EINVAL; + return psz_kmsg_erase(cxt, cxt->kpszs[record->id], record); + default: + return -EINVAL; + } +} + +static void psz_write_kmsg_hdr(struct pstore_zone *zone, + struct pstore_record *record) +{ + struct psz_context *cxt = record->psi->data; + struct psz_buffer *buffer = zone->buffer; + struct psz_kmsg_header *hdr = + (struct psz_kmsg_header *)buffer->data; + + hdr->magic = PSTORE_KMSG_HEADER_MAGIC; + hdr->compressed = record->compressed; + hdr->time.tv_sec = record->time.tv_sec; + hdr->time.tv_nsec = record->time.tv_nsec; + hdr->reason = record->reason; + if (hdr->reason == KMSG_DUMP_OOPS) + hdr->counter = ++cxt->oops_counter; + else if (hdr->reason == KMSG_DUMP_PANIC) + hdr->counter = ++cxt->panic_counter; + else + hdr->counter = 0; +} + +static inline int notrace psz_kmsg_write_record(struct psz_context *cxt, + struct pstore_record *record) +{ + size_t size, hlen; + struct pstore_zone *zone; + unsigned int zonenum; + + zonenum = cxt->kmsg_write_cnt; + zone = cxt->kpszs[zonenum]; + if (unlikely(!zone)) + return -ENOSPC; + cxt->kmsg_write_cnt = (zonenum + 1) % cxt->kmsg_max_cnt; + + pr_debug("write %s to zone id %d\n", zone->name, zonenum); + psz_write_kmsg_hdr(zone, record); + hlen = sizeof(struct psz_kmsg_header); + size = min_t(size_t, record->size, zone->buffer_size - hlen); + return psz_zone_write(zone, FLUSH_ALL, record->buf, size, hlen); +} + +static int notrace psz_kmsg_write(struct psz_context *cxt, + struct pstore_record *record) +{ + int ret; + + /* + * Explicitly only take the first part of any new crash. + * If our buffer is larger than kmsg_bytes, this can never happen, + * and if our buffer is smaller than kmsg_bytes, we don't want the + * report split across multiple records. + */ + if (record->part != 1) + return -ENOSPC; + + if (!cxt->kpszs) + return -ENOSPC; + + ret = psz_kmsg_write_record(cxt, record); + if (!ret) { + pr_debug("try to flush other dirty zones\n"); + psz_flush_dirty_zones(cxt->kpszs, cxt->kmsg_max_cnt); + } + + /* always return 0 as we had handled it on buffer */ + return 0; +} + +static int notrace psz_pstore_write(struct pstore_record *record) +{ + struct psz_context *cxt = record->psi->data; + + if (record->type == PSTORE_TYPE_DMESG && + record->reason == KMSG_DUMP_PANIC) + atomic_set(&cxt->on_panic, 1); + + switch (record->type) { + case PSTORE_TYPE_DMESG: + return psz_kmsg_write(cxt, record); + default: + return -EINVAL; + } +} + +static struct pstore_zone *psz_read_next_zone(struct psz_context *cxt) +{ + struct pstore_zone *zone = NULL; + + while (cxt->kmsg_read_cnt < cxt->kmsg_max_cnt) { + zone = cxt->kpszs[cxt->kmsg_read_cnt++]; + if (psz_ok(zone)) + return zone; + } + + return NULL; +} + +static int psz_kmsg_read_hdr(struct pstore_zone *zone, + struct pstore_record *record) +{ + struct psz_buffer *buffer = zone->buffer; + struct psz_kmsg_header *hdr = + (struct psz_kmsg_header *)buffer->data; + + if (hdr->magic != PSTORE_KMSG_HEADER_MAGIC) + return -EINVAL; + record->compressed = hdr->compressed; + record->time.tv_sec = hdr->time.tv_sec; + record->time.tv_nsec = hdr->time.tv_nsec; + record->reason = hdr->reason; + record->count = hdr->counter; + return 0; +} + +static ssize_t psz_kmsg_read(struct pstore_zone *zone, + struct pstore_record *record) +{ + ssize_t size, hlen = 0; + + size = buffer_datalen(zone); + /* Clear and skip this kmsg dump record if it has no valid header */ + if (psz_kmsg_read_hdr(zone, record)) { + atomic_set(&zone->buffer->datalen, 0); + atomic_set(&zone->dirty, 0); + return -ENOMSG; + } + size -= sizeof(struct psz_kmsg_header); + + if (!record->compressed) { + char *buf = kasprintf(GFP_KERNEL, "%s: Total %d times\n", + get_reason_str(record->reason), + record->count); + hlen = strlen(buf); + record->buf = krealloc(buf, hlen + size, GFP_KERNEL); + if (!record->buf) { + kfree(buf); + return -ENOMEM; + } + } else { + record->buf = kmalloc(size, GFP_KERNEL); + if (!record->buf) + return -ENOMEM; + } + + size = psz_zone_read(zone, record->buf + hlen, size, + sizeof(struct psz_kmsg_header)); + if (unlikely(size < 0)) { + kfree(record->buf); + return -ENOMSG; + } + + return size + hlen; +} + +static ssize_t psz_pstore_read(struct pstore_record *record) +{ + struct psz_context *cxt = record->psi->data; + ssize_t (*readop)(struct pstore_zone *zone, + struct pstore_record *record); + struct pstore_zone *zone; + ssize_t ret; + + /* before read, we must recover from storage */ + ret = psz_recovery(cxt); + if (ret) + return ret; + +next_zone: + zone = psz_read_next_zone(cxt); + if (!zone) + return 0; + + record->type = zone->type; + switch (record->type) { + case PSTORE_TYPE_DMESG: + readop = psz_kmsg_read; + record->id = cxt->kmsg_read_cnt - 1; + break; + default: + goto next_zone; + } + + ret = readop(zone, record); + if (ret == -ENOMSG) + goto next_zone; + return ret; +} + +static struct psz_context pstore_zone_cxt = { + .pstore_zone_info_lock = + __MUTEX_INITIALIZER(pstore_zone_cxt.pstore_zone_info_lock), + .recovered = ATOMIC_INIT(0), + .on_panic = ATOMIC_INIT(0), + .pstore = { + .owner = THIS_MODULE, + .open = psz_pstore_open, + .read = psz_pstore_read, + .write = psz_pstore_write, + .erase = psz_pstore_erase, + }, +}; + +static void psz_free_zone(struct pstore_zone **pszone) +{ + struct pstore_zone *zone = *pszone; + + if (!zone) + return; + + kfree(zone->buffer); + kfree(zone); + *pszone = NULL; +} + +static void psz_free_zones(struct pstore_zone ***pszones, unsigned int *cnt) +{ + struct pstore_zone **zones = *pszones; + + if (!zones) + return; + + while (*cnt > 0) { + (*cnt)--; + psz_free_zone(&(zones[*cnt])); + } + kfree(zones); + *pszones = NULL; +} + +static void psz_free_all_zones(struct psz_context *cxt) +{ + if (cxt->kpszs) + psz_free_zones(&cxt->kpszs, &cxt->kmsg_max_cnt); +} + +static struct pstore_zone *psz_init_zone(enum pstore_type_id type, + loff_t *off, size_t size) +{ + struct pstore_zone_info *info = pstore_zone_cxt.pstore_zone_info; + struct pstore_zone *zone; + const char *name = pstore_type_to_name(type); + + if (!size) + return NULL; + + if (*off + size > info->total_size) { + pr_err("no room for %s (0x%zx@0x%llx over 0x%lx)\n", + name, size, *off, info->total_size); + return ERR_PTR(-ENOMEM); + } + + zone = kzalloc(sizeof(struct pstore_zone), GFP_KERNEL); + if (!zone) + return ERR_PTR(-ENOMEM); + + zone->buffer = kmalloc(size, GFP_KERNEL); + if (!zone->buffer) { + kfree(zone); + return ERR_PTR(-ENOMEM); + } + memset(zone->buffer, 0xFF, size); + zone->off = *off; + zone->name = name; + zone->type = type; + zone->buffer_size = size - sizeof(struct psz_buffer); + zone->buffer->sig = type ^ PSZ_SIG; + atomic_set(&zone->dirty, 0); + atomic_set(&zone->buffer->datalen, 0); + + *off += size; + + pr_debug("pszone %s: off 0x%llx, %zu header, %zu data\n", zone->name, + zone->off, sizeof(*zone->buffer), zone->buffer_size); + return zone; +} + +static struct pstore_zone **psz_init_zones(enum pstore_type_id type, + loff_t *off, size_t total_size, ssize_t record_size, + unsigned int *cnt) +{ + struct pstore_zone_info *info = pstore_zone_cxt.pstore_zone_info; + struct pstore_zone **zones, *zone; + const char *name = pstore_type_to_name(type); + int c, i; + + *cnt = 0; + if (!total_size || !record_size) + return NULL; + + if (*off + total_size > info->total_size) { + pr_err("no room for zones %s (0x%zx@0x%llx over 0x%lx)\n", + name, total_size, *off, info->total_size); + return ERR_PTR(-ENOMEM); + } + + c = total_size / record_size; + zones = kcalloc(c, sizeof(*zones), GFP_KERNEL); + if (!zones) { + pr_err("allocate for zones %s failed\n", name); + return ERR_PTR(-ENOMEM); + } + memset(zones, 0, c * sizeof(*zones)); + + for (i = 0; i < c; i++) { + zone = psz_init_zone(type, off, record_size); + if (!zone || IS_ERR(zone)) { + pr_err("initialize zones %s failed\n", name); + psz_free_zones(&zones, &i); + return (void *)zone; + } + zones[i] = zone; + } + + *cnt = c; + return zones; +} + +static int psz_alloc_zones(struct psz_context *cxt) +{ + struct pstore_zone_info *info = cxt->pstore_zone_info; + loff_t off = 0; + int err; + size_t size; + + size = info->total_size; + cxt->kpszs = psz_init_zones(PSTORE_TYPE_DMESG, &off, size, + info->kmsg_size, &cxt->kmsg_max_cnt); + if (IS_ERR(cxt->kpszs)) { + err = PTR_ERR(cxt->kpszs); + cxt->kpszs = NULL; + goto fail_out; + } + + return 0; +fail_out: + return err; +} + +/** + * register_pstore_zone() - register to pstore/zone + * + * @info: back-end driver information. See &struct pstore_zone_info. + * + * Only one back-end at one time. + * + * Return: 0 on success, others on failure. + */ +int register_pstore_zone(struct pstore_zone_info *info) +{ + int err = -EINVAL; + struct psz_context *cxt = &pstore_zone_cxt; + + if (info->total_size < 4096) { + pr_warn("total_size must be >= 4096\n"); + return -EINVAL; + } + + if (!info->kmsg_size) { + pr_warn("at least one record size must be non-zero\n"); + return -EINVAL; + } + + if (!info->name || !info->name[0]) + return -EINVAL; + +#define check_size(name, size) { \ + if (info->name > 0 && info->name < (size)) { \ + pr_err(#name " must be over %d\n", (size)); \ + return -EINVAL; \ + } \ + if (info->name & (size - 1)) { \ + pr_err(#name " must be a multiple of %d\n", \ + (size)); \ + return -EINVAL; \ + } \ + } + + check_size(total_size, 4096); + check_size(kmsg_size, SECTOR_SIZE); + +#undef check_size + + /* + * the @read and @write must be applied. + * if no @read, pstore may mount failed. + * if no @write, pstore do not support to remove record file. + */ + if (!info->read || !info->write) { + pr_err("no valid general read/write interface\n"); + return -EINVAL; + } + + mutex_lock(&cxt->pstore_zone_info_lock); + if (cxt->pstore_zone_info) { + pr_warn("'%s' already loaded: ignoring '%s'\n", + cxt->pstore_zone_info->name, info->name); + mutex_unlock(&cxt->pstore_zone_info_lock); + return -EBUSY; + } + cxt->pstore_zone_info = info; + + pr_debug("register %s with properties:\n", info->name); + pr_debug("\ttotal size : %ld Bytes\n", info->total_size); + pr_debug("\tkmsg size : %ld Bytes\n", info->kmsg_size); + + err = psz_alloc_zones(cxt); + if (err) { + pr_err("alloc zones failed\n"); + goto fail_out; + } + + if (info->kmsg_size) { + cxt->pstore.bufsize = cxt->kpszs[0]->buffer_size - + sizeof(struct psz_kmsg_header); + cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL); + if (!cxt->pstore.buf) { + err = -ENOMEM; + goto fail_free; + } + } + cxt->pstore.data = cxt; + + pr_info("registered %s as backend for", info->name); + cxt->pstore.max_reason = info->max_reason; + cxt->pstore.name = info->name; + if (info->kmsg_size) { + cxt->pstore.flags |= PSTORE_FLAGS_DMESG; + pr_cont(" kmsg(%s", get_reason_str(cxt->pstore.max_reason)); + if (cxt->pstore_zone_info->panic_write) + pr_cont(",panic_write"); + pr_cont(")"); + } + pr_cont("\n"); + + err = pstore_register(&cxt->pstore); + if (err) { + pr_err("registering with pstore failed\n"); + goto fail_free; + } + mutex_unlock(&pstore_zone_cxt.pstore_zone_info_lock); + + return 0; + +fail_free: + kfree(cxt->pstore.buf); + cxt->pstore.buf = NULL; + cxt->pstore.bufsize = 0; + psz_free_all_zones(cxt); +fail_out: + pstore_zone_cxt.pstore_zone_info = NULL; + mutex_unlock(&pstore_zone_cxt.pstore_zone_info_lock); + return err; +} +EXPORT_SYMBOL_GPL(register_pstore_zone); + +/** + * unregister_pstore_zone() - unregister to pstore/zone + * + * @info: back-end driver information. See struct pstore_zone_info. + */ +void unregister_pstore_zone(struct pstore_zone_info *info) +{ + struct psz_context *cxt = &pstore_zone_cxt; + + mutex_lock(&cxt->pstore_zone_info_lock); + if (!cxt->pstore_zone_info) { + mutex_unlock(&cxt->pstore_zone_info_lock); + return; + } + + /* Stop incoming writes from pstore. */ + pstore_unregister(&cxt->pstore); + + /* Clean up allocations. */ + kfree(cxt->pstore.buf); + cxt->pstore.buf = NULL; + cxt->pstore.bufsize = 0; + cxt->pstore_zone_info = NULL; + + psz_free_all_zones(cxt); + + /* Clear counters and zone state. */ + cxt->oops_counter = 0; + cxt->panic_counter = 0; + atomic_set(&cxt->recovered, 0); + atomic_set(&cxt->on_panic, 0); + + mutex_unlock(&cxt->pstore_zone_info_lock); +} +EXPORT_SYMBOL_GPL(unregister_pstore_zone); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("WeiXiong Liao "); +MODULE_AUTHOR("Kees Cook "); +MODULE_DESCRIPTION("Storage Manager for pstore/blk"); diff --git a/include/linux/pstore.h b/include/linux/pstore.h index a2755fdd260d..4bc3a674eeeb 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -57,6 +57,7 @@ enum pstore_type_id { const char *pstore_type_to_name(enum pstore_type_id type); enum pstore_type_id pstore_name_to_type(const char *name); +const char *get_reason_str(enum kmsg_dump_reason reason);
struct pstore_info; /** diff --git a/include/linux/pstore_zone.h b/include/linux/pstore_zone.h new file mode 100644 index 000000000000..eb005d9ae40c --- /dev/null +++ b/include/linux/pstore_zone.h @@ -0,0 +1,44 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __PSTORE_ZONE_H_ +#define __PSTORE_ZONE_H_ + +#include
+ +typedef ssize_t (*pstore_zone_read_op)(char *, size_t, loff_t); +typedef ssize_t (*pstore_zone_write_op)(const char *, size_t, loff_t); +/** + * struct pstore_zone_info - pstore/zone back-end driver structure + * + * @owner: Module which is responsible for this back-end driver. + * @name: Name of the back-end driver. + * @total_size: The total size in bytes pstore/zone can use. It must be greater + * than 4096 and be multiple of 4096. + * @kmsg_size: The size of oops/panic zone. Zero means disabled, otherwise, + * it must be multiple of SECTOR_SIZE(512 Bytes). + * @max_reason: Maximum kmsg dump reason to store. + * @read: The general read operation. Both of the function parameters + * @size and @offset are relative value to storage. + * On success, the number of bytes should be returned, others + * means error. + * @write: The same as @read. + * @panic_write:The write operation only used for panic case. It's optional + * if you do not care panic log. The parameters and return value + * are the same as @read. + */ +struct pstore_zone_info { + struct module *owner; + const char *name; + + unsigned long total_size; + unsigned long kmsg_size; + int max_reason; + pstore_zone_read_op read; + pstore_zone_write_op write; + pstore_zone_write_op panic_write; +}; + +extern int register_pstore_zone(struct pstore_zone_info *info); +extern void unregister_pstore_zone(struct pstore_zone_info *info); + +#endif
From: WeiXiong Liao
From: WeiXiong Liao
From: WeiXiong Liao
From: WeiXiong Liao
From: WeiXiong Liao
在 2021/10/12 9:58, roger 写道:
From: WeiXiong Liao
stable inclusion from stable-5.10 category: feature commit:649304c936cd4d2a2128bb877044772416c7d4f5 issue: #I4919J
--------------------------------
Add details on using pstore/blk, the new backend of pstore to record dumps to block devices, in Documentation/admin-guide/pstore-blk.rst
Signed-off-by: WeiXiong Liao
Link: https://lore.kernel.org/lkml/20200511233229.27745-7-keescook@chromium.org/ Signed-off-by: Kees Cook (cherry picked from commit 649304c936cd4d2a2128bb877044772416c7d4f5) Signed-off-by: roger --- Documentation/admin-guide/pstore-blk.rst | 229 +++++++++++++++++++++++ MAINTAINERS | 5 + fs/pstore/Kconfig | 2 + 3 files changed, 236 insertions(+) create mode 100644 Documentation/admin-guide/pstore-blk.rst
diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst new file mode 100644 index 000000000000..bef8c7436721 --- /dev/null +++ b/Documentation/admin-guide/pstore-blk.rst @@ -0,0 +1,229 @@ +.. SPDX-License-Identifier: GPL-2.0 + +pstore block oops/panic logger +============================== + +Introduction +------------ + +pstore block (pstore/blk) is an oops/panic logger that writes its logs to a +block device before the system crashes. You can get these log files by +mounting pstore filesystem like:: + + mount -t pstore pstore /sys/fs/pstore + + +pstore block concepts +--------------------- + +pstore/blk provides efficient configuration method for pstore/blk, which +divides all configurations into two parts, configurations for user and +configurations for driver. + +Configurations for user determine how pstore/blk works, such as pmsg_size, +kmsg_size and so on. All of them support both Kconfig and module parameters, +but module parameters have priority over Kconfig. + +Configurations for driver are all about block device, such as total_size +of block device and read/write operations. + +Configurations for user +----------------------- + +All of these configurations support both Kconfig and module parameters, but +module parameters have priority over Kconfig. + +Here is an example for module parameters:: + + pstore_blk.blkdev=179:7 pstore_blk.kmsg_size=64 + +The detail of each configurations may be of interest to you. + +blkdev +~~~~~~ + +The block device to use. Most of the time, it is a partition of block device. +It's required for pstore/blk. + +It accepts the following variants: + +1.
device number in hexadecimal represents itself; no + leading 0x, for example b302. +#. /dev/ represents the device number of disk +#. /dev/ <decimal> represents the device number of partition - device + number of disk plus the partition number +#. /dev/ p<decimal> - same as the above; this form is used when disk + name of partitioned disk ends with a digit. +#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of + a partition if the partition table provides it. The UUID may be either an + EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP, + where SSSSSSSS is a zero-filled hex representation of the 32-bit + "NT disk signature", and PP is a zero-filled hex representation of the + 1-based partition number. +#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a + partition with a known unique id. +#. <major>:<minor> major and minor number of the device separated by a colon. + +kmsg_size +~~~~~~~~~ + +The chunk size in KB for oops/panic front-end. It **MUST** be a multiple of 4. +It's optional if you do not care oops/panic log. + +There are multiple chunks for oops/panic front-end depending on the remaining +space except other pstore front-ends. + +pstore/blk will log to oops/panic chunks one by one, and always overwrite the +oldest chunk if there is no more free chunk. + +pmsg_size +~~~~~~~~~ + +The chunk size in KB for pmsg front-end. It **MUST** be a multiple of 4. +It's optional if you do not care pmsg log. + +Unlike oops/panic front-end, there is only one chunk for pmsg front-end. + +Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are +appended to the chunk. On reboot the contents are available in +*/sys/fs/pstore/pmsg-pstore-blk-0*. + +console_size +~~~~~~~~~~~~ + +The chunk size in KB for console front-end. It **MUST** be a multiple of 4. +It's optional if you do not care console log. + +Similar to pmsg front-end, there is only one chunk for console front-end. + +All log of console will be appended to the chunk. On reboot the contents are +available in */sys/fs/pstore/console-pstore-blk-0*. + +ftrace_size +~~~~~~~~~~~ + +The chunk size in KB for ftrace front-end. It **MUST** be a multiple of 4. +It's optional if you do not care console log. + +Similar to oops front-end, there are multiple chunks for ftrace front-end +depending on the count of cpu processors. Each chunk size is equal to +ftrace_size / processors_count. + +All log of ftrace will be appended to the chunk. On reboot the contents are +combined and available in */sys/fs/pstore/ftrace-pstore-blk-0*. + +Persistent function tracing might be useful for debugging software or hardware +related hangs. Here is an example of usage:: + + # mount -t pstore pstore /sys/fs/pstore + # mount -t debugfs debugfs /sys/kernel/debug/ + # echo 1 > /sys/kernel/debug/pstore/record_ftrace + # reboot -f + [...] + # mount -t pstore pstore /sys/fs/pstore + # tail /sys/fs/pstore/ftrace-pstore-blk-0 + CPU:0 ts:5914676 c0063828 c0063b94 call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0 + CPU:0 ts:5914678 c039ecdc c006385c cpuidle_enter_state <- call_cpuidle+0x44/0x48 + CPU:0 ts:5914680 c039e9a0 c039ecf0 cpuidle_enter_freeze <- cpuidle_enter_state+0x304/0x314 + CPU:0 ts:5914681 c0063870 c039ea30 sched_idle_set_state <- cpuidle_enter_state+0x44/0x314 + CPU:1 ts:5916720 c0160f59 c015ee04 kernfs_unmap_bin_file <- __kernfs_remove+0x140/0x204 + CPU:1 ts:5916721 c05ca625 c015ee0c __mutex_lock_slowpath <- __kernfs_remove+0x148/0x204 + CPU:1 ts:5916723 c05c813d c05ca630 yield_to <- __mutex_lock_slowpath+0x314/0x358 + CPU:1 ts:5916724 c05ca2d1 c05ca638 __ww_mutex_lock <- __mutex_lock_slowpath+0x31c/0x358 + +max_reason +~~~~~~~~~~ + +Limiting which kinds of kmsg dumps are stored can be controlled via +the ``max_reason`` value, as defined in include/linux/kmsg_dump.h's +``enum kmsg_dump_reason``. For example, to store both Oopses and Panics, +``max_reason`` should be set to 2 (KMSG_DUMP_OOPS), to store only Panics +``max_reason`` should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0 +(KMSG_DUMP_UNDEF), means the reason filtering will be controlled by the +``printk.always_kmsg_dump`` boot param: if unset, it'll be KMSG_DUMP_OOPS, +otherwise KMSG_DUMP_MAX. + +Configurations for driver +------------------------- + +Only a block device driver cares about these configurations. A block device +driver uses ``register_pstore_blk`` to register to pstore/blk. + +.. kernel-doc:: fs/pstore/blk.c + :identifiers: register_pstore_blk + +Compression and header +---------------------- + +Block device is large enough for uncompressed oops data. Actually we do not +recommend data compression because pstore/blk will insert some information into +the first line of oops/panic data. For example:: + + Panic: Total 16 times + +It means that it's OOPS|Panic for the 16th time since the first booting. +Sometimes the number of occurrences of oops|panic since the first booting is +important to judge whether the system is stable. + +The following line is inserted by pstore filesystem. For example:: + + Oops#2 Part1 + +It means that it's OOPS for the 2nd time on the last boot. + +Reading the data +---------------- + +The dump data can be read from the pstore filesystem. The format for these +files is ``dmesg-pstore-blk-[N]`` for oops/panic front-end, +``pmsg-pstore-blk-0`` for pmsg front-end and so on. The timestamp of the +dump file records the trigger time. To delete a stored record from block +device, simply unlink the respective pstore file. + +Attentions in panic read/write APIs +----------------------------------- + +If on panic, the kernel is not going to run for much longer, the tasks will not +be scheduled and most kernel resources will be out of service. It +looks like a single-threaded program running on a single-core computer. + +The following points require special attention for panic read/write APIs: + +1. Can **NOT** allocate any memory. + If you need memory, just allocate while the block driver is initializing + rather than waiting until the panic. +#. Must be polled, **NOT** interrupt driven. + No task schedule any more. The block driver should delay to ensure the write + succeeds, but NOT sleep. +#. Can **NOT** take any lock. + There is no other task, nor any shared resource; you are safe to break all + locks. +#. Just use CPU to transfer. + Do not use DMA to transfer unless you are sure that DMA will not keep lock. +#. Control registers directly. + Please control registers directly rather than use Linux kernel resources. + Do I/O map while initializing rather than wait until a panic occurs. +#. Reset your block device and controller if necessary. + If you are not sure of the state of your block device and controller when + a panic occurs, you are safe to stop and reset them. + +pstore/blk supports psblk_blkdev_info(), which is defined in +*linux/pstore_blk.h*, to get information of using block device, such as the +device number, sector count and start sector of the whole disk. + +pstore block internals +---------------------- + +For developer reference, here are all the important structures and APIs: + +.. kernel-doc:: fs/pstore/zone.c + :internal: + +.. kernel-doc:: include/linux/pstore_zone.h + :internal: + +.. kernel-doc:: fs/pstore/blk.c + :export: + +.. kernel-doc:: include/linux/pstore_blk.h + :internal: diff --git a/MAINTAINERS b/MAINTAINERS index 1061db6fbc32..032e1d0f681b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11801,6 +11801,11 @@ M: Colin Cross M: Tony Luck S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore +F: Documentation/admin-guide/ramoops.rst +F: Documentation/admin-guide/pstore-blk.rst +F: Documentation/devicetree/bindings/reserved-memory/ramoops.txt +F: drivers/acpi/apei/erst.c +F: drivers/firmware/efi/efi-pstore.c 注意保持原补丁内容,如上有重复文件ramoops.rst ramoops.txt efi-pstore.c等......... F: fs/pstore/ F: include/linux/pstore* F: drivers/firmware/efi/efi-pstore.c diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig index 34e769a06b86..0f5a4bdb3b34 100644 --- a/fs/pstore/Kconfig +++ b/fs/pstore/Kconfig @@ -171,6 +171,8 @@ config PSTORE_BLK This enables panic and oops message to be logged to a block dev where it can be read back at some later point. + For more information, see Documentation/admin-guide/pstore-blk.rst + If unsure, say N.
config PSTORE_BLK_BLKDEV
From: WeiXiong Liao
From: WeiXiong Liao
From: WeiXiong Liao
From: Kees Cook
From: Vasile-Laurentiu Stanimir
From: Jiri Bohac
From: Tetsuo Handa
From: Dmitry Osipenko
From: Mukesh Ojha
From: Christoph Hellwig
在 2021/10/12 9:59, roger 写道:
From: Christoph Hellwig
stable inclusion from stable-5.10 category: feature commit:0317b728d8aee5cb065b5fbe9f99c97954bf2b4a issue: #I4919J
--------------------------------
[ Upstream commit d07f3b081ee632268786601f55e1334d1f68b997 ]
pstore-blk just pokes directly into the pagecache for the block device without going through the file operations for that by faking up it's own file operations that do not match the block device ones.
As this breaks the control of the block layer of it's page cache, and even now just works by accident only the best thing is to just disable this driver.
Fixes: 17639f67c1d6 ("pstore/blk: Introduce backend for block devices") Signed-off-by: Christoph Hellwig
Link: https://lore.kernel.org/r/20210608161327.1537919-1-hch@lst.de Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin (cherry picked from commit 0317b728d8aee5cb065b5fbe9f99c97954bf2b4a) Signed-off-by: roger --- fs/pstore/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig index e0ac6601a928..eae480dcd4b7 100644 --- a/fs/pstore/Kconfig +++ b/fs/pstore/Kconfig @@ -173,6 +173,7 @@ config PSTORE_BLK tristate "Log panic/oops to a block device" depends on PSTORE depends on BLOCK + depends on BROKEN
既然由于存在会破坏块层的pagecache等问题,pstore_blk还是broken状态,且主线也没使能,那暂时*没必要*使用pstore_blk。 Thanks.
select PSTORE_ZONE default n help
participants (2)
-
roger
-
yuchangchun (C)