Merge tag 'v6.15-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix multichannel decryption UAF
- Fix regression mounting to onedrive shares
- Fix missing mount option check for posix vs. noposix
- Fix version field in WSL symlinks
- Three minor cleanups to reparse point handling
- SMB1 fix for WSL special files
- SMB1 Kerberos fix
- Add SMB3 defines for two new FS attributes
* tag 'v6.15-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: Add defines for two new FileSystemAttributes
cifs: Fix querying of WSL CHR and BLK reparse points over SMB1
cifs: Split parse_reparse_point callback to functions: get buffer and parse buffer
cifs: Improve handling of name surrogate reparse points in reparse.c
cifs: Remove explicit handling of IO_REPARSE_TAG_MOUNT_POINT in inode.c
cifs: Fix encoding of SMB1 Session Setup Kerberos Request in non-UNICODE mode
smb: client: fix UAF in decryption with multichannel
cifs: Fix support for WSL-style symlinks
smb311 client: fix missing tcon check when mounting with linux/posix extensions
cifs: Ensure that all non-client-specific reparse points are processed by the server
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index ff0b440..451874b 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -22,3 +22,4 @@
srso
gather_data_sampling
reg-file-data-sampling
+ rsb
diff --git a/Documentation/admin-guide/hw-vuln/rsb.rst b/Documentation/admin-guide/hw-vuln/rsb.rst
new file mode 100644
index 0000000..21dbf9c
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/rsb.rst
@@ -0,0 +1,268 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+RSB-related mitigations
+=======================
+
+.. warning::
+ Please keep this document up-to-date, otherwise you will be
+ volunteered to update it and convert it to a very long comment in
+ bugs.c!
+
+Since 2018 there have been many Spectre CVEs related to the Return Stack
+Buffer (RSB) (sometimes referred to as the Return Address Stack (RAS) or
+Return Address Predictor (RAP) on AMD).
+
+Information about these CVEs and how to mitigate them is scattered
+amongst a myriad of microarchitecture-specific documents.
+
+This document attempts to consolidate all the relevant information in
+one place and clarify the reasoning behind the current RSB-related
+mitigations. It's meant to be as concise as possible, focused only on
+the current kernel mitigations: what are the RSB-related attack vectors
+and how are they currently being mitigated?
+
+It's *not* meant to describe how the RSB mechanism operates or how the
+exploits work. More details about those can be found in the references
+below.
+
+Rather, this is basically a glorified comment, but too long to actually
+be one. So when the next CVE comes along, a kernel developer can
+quickly refer to this as a refresher to see what we're actually doing
+and why.
+
+At a high level, there are two classes of RSB attacks: RSB poisoning
+(Intel and AMD) and RSB underflow (Intel only). They must each be
+considered individually for each attack vector (and microarchitecture
+where applicable).
+
+----
+
+RSB poisoning (Intel and AMD)
+=============================
+
+SpectreRSB
+~~~~~~~~~~
+
+RSB poisoning is a technique used by SpectreRSB [#spectre-rsb]_ where
+an attacker poisons an RSB entry to cause a victim's return instruction
+to speculate to an attacker-controlled address. This can happen when
+there are unbalanced CALLs/RETs after a context switch or VMEXIT.
+
+* All attack vectors can potentially be mitigated by flushing out any
+ poisoned RSB entries using an RSB filling sequence
+ [#intel-rsb-filling]_ [#amd-rsb-filling]_ when transitioning between
+ untrusted and trusted domains. But this has a performance impact and
+ should be avoided whenever possible.
+
+ .. DANGER::
+ **FIXME**: Currently we're flushing 32 entries. However, some CPU
+ models have more than 32 entries. The loop count needs to be
+ increased for those. More detailed information is needed about RSB
+ sizes.
+
+* On context switch, the user->user mitigation requires ensuring the
+ RSB gets filled or cleared whenever IBPB gets written [#cond-ibpb]_
+ during a context switch:
+
+ * AMD:
+ On Zen 4+, IBPB (or SBPB [#amd-sbpb]_ if used) clears the RSB.
+ This is indicated by IBPB_RET in CPUID [#amd-ibpb-rsb]_.
+
+ On Zen < 4, the RSB filling sequence [#amd-rsb-filling]_ must
+ always be done in addition to IBPB [#amd-ibpb-no-rsb]_. This is
+ indicated by X86_BUG_IBPB_NO_RET.
+
+ * Intel:
+ IBPB always clears the RSB:
+
+ "Software that executed before the IBPB command cannot control
+ the predicted targets of indirect branches executed after the
+ command on the same logical processor. The term indirect branch
+ in this context includes near return instructions, so these
+ predicted targets may come from the RSB." [#intel-ibpb-rsb]_
+
+* On context switch, user->kernel attacks are prevented by SMEP. User
+ space can only insert user space addresses into the RSB. Even
+ non-canonical addresses can't be inserted due to the page gap at the
+ end of the user canonical address space reserved by TASK_SIZE_MAX.
+ A SMEP #PF at instruction fetch prevents the kernel from speculatively
+ executing user space.
+
+ * AMD:
+ "Finally, branches that are predicted as 'ret' instructions get
+ their predicted targets from the Return Address Predictor (RAP).
+ AMD recommends software use a RAP stuffing sequence (mitigation
+ V2-3 in [2]) and/or Supervisor Mode Execution Protection (SMEP)
+ to ensure that the addresses in the RAP are safe for
+ speculation. Collectively, we refer to these mitigations as "RAP
+ Protection"." [#amd-smep-rsb]_
+
+ * Intel:
+ "On processors with enhanced IBRS, an RSB overwrite sequence may
+ not suffice to prevent the predicted target of a near return
+ from using an RSB entry created in a less privileged predictor
+ mode. Software can prevent this by enabling SMEP (for
+ transitions from user mode to supervisor mode) and by having
+ IA32_SPEC_CTRL.IBRS set during VM exits." [#intel-smep-rsb]_
+
+* On VMEXIT, guest->host attacks are mitigated by eIBRS (and PBRSB
+ mitigation if needed):
+
+ * AMD:
+ "When Automatic IBRS is enabled, the internal return address
+ stack used for return address predictions is cleared on VMEXIT."
+ [#amd-eibrs-vmexit]_
+
+ * Intel:
+ "On processors with enhanced IBRS, an RSB overwrite sequence may
+ not suffice to prevent the predicted target of a near return
+ from using an RSB entry created in a less privileged predictor
+ mode. Software can prevent this by enabling SMEP (for
+ transitions from user mode to supervisor mode) and by having
+ IA32_SPEC_CTRL.IBRS set during VM exits. Processors with
+ enhanced IBRS still support the usage model where IBRS is set
+ only in the OS/VMM for OSes that enable SMEP. To do this, such
+ processors will ensure that guest behavior cannot control the
+ RSB after a VM exit once IBRS is set, even if IBRS was not set
+ at the time of the VM exit." [#intel-eibrs-vmexit]_
+
+ Note that some Intel CPUs are susceptible to Post-barrier Return
+ Stack Buffer Predictions (PBRSB) [#intel-pbrsb]_, where the last
+ CALL from the guest can be used to predict the first unbalanced RET.
+ In this case the PBRSB mitigation is needed in addition to eIBRS.
+
+AMD RETBleed / SRSO / Branch Type Confusion
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On AMD, poisoned RSB entries can also be created by the AMD RETBleed
+variant [#retbleed-paper]_ [#amd-btc]_ or by Speculative Return Stack
+Overflow [#amd-srso]_ (Inception [#inception-paper]_). The kernel
+protects itself by replacing every RET in the kernel with a branch to a
+single safe RET.
+
+----
+
+RSB underflow (Intel only)
+==========================
+
+RSB Alternate (RSBA) ("Intel Retbleed")
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some Intel Skylake-generation CPUs are susceptible to the Intel variant
+of RETBleed [#retbleed-paper]_ (Return Stack Buffer Underflow
+[#intel-rsbu]_). If a RET is executed when the RSB buffer is empty due
+to mismatched CALLs/RETs or returning from a deep call stack, the branch
+predictor can fall back to using the Branch Target Buffer (BTB). If a
+user forces a BTB collision then the RET can speculatively branch to a
+user-controlled address.
+
+* Note that RSB filling doesn't fully mitigate this issue. If there
+ are enough unbalanced RETs, the RSB may still underflow and fall back
+ to using a poisoned BTB entry.
+
+* On context switch, user->user underflow attacks are mitigated by the
+ conditional IBPB [#cond-ibpb]_ on context switch which effectively
+ clears the BTB:
+
+ * "The indirect branch predictor barrier (IBPB) is an indirect branch
+ control mechanism that establishes a barrier, preventing software
+ that executed before the barrier from controlling the predicted
+ targets of indirect branches executed after the barrier on the same
+ logical processor." [#intel-ibpb-btb]_
+
+* On context switch and VMEXIT, user->kernel and guest->host RSB
+ underflows are mitigated by IBRS or eIBRS:
+
+ * "Enabling IBRS (including enhanced IBRS) will mitigate the "RSBU"
+ attack demonstrated by the researchers. As previously documented,
+ Intel recommends the use of enhanced IBRS, where supported. This
+ includes any processor that enumerates RRSBA but not RRSBA_DIS_S."
+ [#intel-rsbu]_
+
+ However, note that eIBRS and IBRS do not mitigate intra-mode attacks.
+ Like RRSBA below, this is mitigated by clearing the BHB on kernel
+ entry.
+
+ As an alternative to classic IBRS, call depth tracking (combined with
+ retpolines) can be used to track kernel returns and fill the RSB when
+ it gets close to being empty.
+
+Restricted RSB Alternate (RRSBA)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some newer Intel CPUs have Restricted RSB Alternate (RRSBA) behavior,
+which, similar to RSBA described above, also falls back to using the BTB
+on RSB underflow. The only difference is that the predicted targets are
+restricted to the current domain when eIBRS is enabled:
+
+* "Restricted RSB Alternate (RRSBA) behavior allows alternate branch
+ predictors to be used by near RET instructions when the RSB is
+ empty. When eIBRS is enabled, the predicted targets of these
+ alternate predictors are restricted to those belonging to the
+ indirect branch predictor entries of the current prediction domain."
+ [#intel-eibrs-rrsba]_
+
+When a CPU with RRSBA is vulnerable to Branch History Injection
+[#bhi-paper]_ [#intel-bhi]_, an RSB underflow could be used for an
+intra-mode BTI attack. This is mitigated by clearing the BHB on
+kernel entry.
+
+However, if the kernel uses retpolines instead of eIBRS, it needs to
+disable RRSBA:
+
+* "Where software is using retpoline as a mitigation for BHI or
+ intra-mode BTI, and the processor both enumerates RRSBA and
+ enumerates RRSBA_DIS controls, it should disable this behavior."
+ [#intel-retpoline-rrsba]_
+
+----
+
+References
+==========
+
+.. [#spectre-rsb] `Spectre Returns! Speculation Attacks using the Return Stack Buffer <https://arxiv.org/pdf/1807.07940.pdf>`_
+
+.. [#intel-rsb-filling] "Empty RSB Mitigation on Skylake-generation" in `Retpoline: A Branch Target Injection Mitigation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/retpoline-branch-target-injection-mitigation.html#inpage-nav-5-1>`_
+
+.. [#amd-rsb-filling] "Mitigation V2-3" in `Software Techniques for Managing Speculation <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf>`_
+
+.. [#cond-ibpb] Whether IBPB is written depends on whether the prev and/or next task is protected from Spectre attacks. It typically requires opting in per task or system-wide. For more details see the documentation for the ``spectre_v2_user`` cmdline option in Documentation/admin-guide/kernel-parameters.txt.
+
+.. [#amd-sbpb] IBPB without flushing of branch type predictions. Only exists for AMD.
+
+.. [#amd-ibpb-rsb] "Function 8000_0008h -- Processor Capacity Parameters and Extended Feature Identification" in `AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf>`_. SBPB behaves the same way according to `this email <https://lore.kernel.org/5175b163a3736ca5fd01cedf406735636c99a>`_.
+
+.. [#amd-ibpb-no-rsb] `Breaking the Barrier: Post-Barrier Spectre Attacks <https://comsec.ethz.ch/wp-content/files/ibpb_sp25.pdf>`_
+
+.. [#intel-ibpb-rsb] "Introduction" in `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html>`_
+
+.. [#amd-smep-rsb] "Existing Mitigations" in `Technical Guidance for Mitigating Branch Type Confusion <https://www.amd.com/content/dam/amd/en/documents/resources/technical-guidance-for-mitigating-branch-type-confusion.pdf>`_
+
+.. [#intel-smep-rsb] "Enhanced IBRS" in `Indirect Branch Restricted Speculation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html>`_
+
+.. [#amd-eibrs-vmexit] "Extended Feature Enable Register (EFER)" in `AMD64 Architecture Programmer's Manual Volume 2: System Programming <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf>`_
+
+.. [#intel-eibrs-vmexit] "Enhanced IBRS" in `Indirect Branch Restricted Speculation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html>`_
+
+.. [#intel-pbrsb] `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html>`_
+
+.. [#retbleed-paper] `RETBleed: Arbitrary Speculative Code Execution with Return Instruction <https://comsec.ethz.ch/wp-content/files/retbleed_sec22.pdf>`_
+
+.. [#amd-btc] `Technical Guidance for Mitigating Branch Type Confusion <https://www.amd.com/content/dam/amd/en/documents/resources/technical-guidance-for-mitigating-branch-type-confusion.pdf>`_
+
+.. [#amd-srso] `Technical Update Regarding Speculative Return Stack Overflow <https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf>`_
+
+.. [#inception-paper] `Inception: Exposing New Attack Surfaces with Training in Transient Execution <https://comsec.ethz.ch/wp-content/files/inception_sec23.pdf>`_
+
+.. [#intel-rsbu] `Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html>`_
+
+.. [#intel-ibpb-btb] `Indirect Branch Predictor Barrier <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-predictor-barrier.html>`_
+
+.. [#intel-eibrs-rrsba] "Guidance for RSBU" in `Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html>`_
+
+.. [#bhi-paper] `Branch History Injection: On the Effectiveness of Hardware Mitigations Against Cross-Privilege Spectre-v2 Attacks <http://download.vusec.net/papers/bhi-spectre-bhb_sec22.pdf>`_
+
+.. [#intel-bhi] `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html>`_
+
+.. [#intel-retpoline-rrsba] "Retpoline" in `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html>`_
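The user->user context-switch rule in the "RSB poisoning" section of rsb.rst reduces to a small decision. The Python sketch below models it. The function name and its parameters are hypothetical stand-ins, not kernel interfaces. It encodes only the facts stated in that section: Intel IBPB always clears the RSB, and AMD IBPB clears it only when CPUID enumerates IBPB_RET.

```python
def needs_rsb_fill_after_ibpb(vendor: str, ibpb_ret: bool) -> bool:
    """Return True if an RSB filling sequence must accompany IBPB.

    Hypothetical model of the user->user rule described in rsb.rst:
    - Intel: IBPB always clears the RSB, so no extra fill is needed.
    - AMD: IBPB clears the RSB only when CPUID enumerates IBPB_RET
      (Zen 4+); otherwise (X86_BUG_IBPB_NO_RET) the fill is required.
    """
    if vendor == "intel":
        return False
    if vendor == "amd":
        return not ibpb_ret
    raise ValueError(f"unknown vendor: {vendor!r}")
```

This covers only the IBPB case on context switch. The user->kernel and guest->host vectors depend on SMEP, eIBRS and the PBRSB mitigation and are not modeled here.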
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 76e538c..d9fd26b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1407,18 +1407,15 @@
earlyprintk=serial[,0x...[,baudrate]]
earlyprintk=ttySn[,baudrate]
earlyprintk=dbgp[debugController#]
+ earlyprintk=mmio32,membase[,{nocfg|baudrate}]
earlyprintk=pciserial[,force],bus:device.function[,{nocfg|baudrate}]
earlyprintk=xdbc[xhciController#]
earlyprintk=bios
- earlyprintk=mmio,membase[,{nocfg|baudrate}]
earlyprintk is useful when the kernel crashes before
the normal console is initialized. It is not enabled by
default because it has some cosmetic problems.
- Only 32-bit memory addresses are supported for "mmio"
- and "pciserial" devices.
-
Use "nocfg" to skip UART configuration, assume
BIOS/firmware has configured UART correctly.
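To illustrate the shape of the new ``earlyprintk=mmio32,membase[,{nocfg|baudrate}]`` syntax, here is a minimal Python sketch that splits such a value into its fields. The function name and the returned dictionary layout are assumptions made for illustration. It does not mirror the kernel's own parser and handles only the fields shown in the syntax line above.

```python
def parse_earlyprintk_mmio32(value: str) -> dict:
    """Split 'mmio32,membase[,{nocfg|baudrate}]' into its fields.

    Illustrative only: the kernel's parser is written in C and may
    accept forms this sketch rejects.
    """
    parts = value.split(",")
    if len(parts) not in (2, 3) or parts[0] != "mmio32":
        raise ValueError(f"not an mmio32 earlyprintk value: {value!r}")

    # Base 0 lets int() accept both 0x-prefixed hex and plain decimal.
    result = {"membase": int(parts[1], 0), "nocfg": False, "baudrate": None}

    if len(parts) == 3:
        if parts[2] == "nocfg":
            # Skip UART configuration; firmware is assumed to have set it up.
            result["nocfg"] = True
        else:
            result["baudrate"] = int(parts[2], 10)
    return result
```

A value such as ``mmio32,0xfe032000,nocfg`` (an arbitrary example address) yields the base address with ``nocfg`` set and no baud rate.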
diff --git a/Documentation/arch/x86/cpuinfo.rst b/Documentation/arch/x86/cpuinfo.rst
index 6ef426a..f80e2a5 100644
--- a/Documentation/arch/x86/cpuinfo.rst
+++ b/Documentation/arch/x86/cpuinfo.rst
@@ -79,8 +79,9 @@
How are feature flags created?
==============================
-a: Feature flags can be derived from the contents of CPUID leaves.
-------------------------------------------------------------------
+Feature flags can be derived from the contents of CPUID leaves
+--------------------------------------------------------------
+
These feature definitions are organized mirroring the layout of CPUID
leaves and grouped in words with offsets as mapped in enum cpuid_leafs
in cpufeatures.h (see arch/x86/include/asm/cpufeatures.h for details).
@@ -89,8 +90,9 @@
displayed accordingly in /proc/cpuinfo. For example, the flag "avx2"
comes from X86_FEATURE_AVX2 in cpufeatures.h.
-b: Flags can be from scattered CPUID-based features.
-----------------------------------------------------
+Flags can be from scattered CPUID-based features
+------------------------------------------------
+
Hardware features enumerated in sparsely populated CPUID leaves get
software-defined values. Still, CPUID needs to be queried to determine
if a given feature is present. This is done in init_scattered_cpuid_features().
@@ -104,8 +106,9 @@
array. Since there is a struct cpuinfo_x86 for each possible CPU, the wasted
memory is not trivial.
-c: Flags can be created synthetically under certain conditions for hardware features.
--------------------------------------------------------------------------------------
+Flags can be created synthetically under certain conditions for hardware features
+---------------------------------------------------------------------------------
+
Examples of conditions include whether certain features are present in
MSR_IA32_CORE_CAPS or specific CPU models are identified. If the needed
conditions are met, the features are enabled by the set_cpu_cap or
@@ -114,8 +117,8 @@
"split_lock_detect" will be displayed. The flag "ring3mwait" will be
displayed only when running on INTEL_XEON_PHI_[KNL|KNM] processors.
-d: Flags can represent purely software features.
-------------------------------------------------
+Flags can represent purely software features
+--------------------------------------------
These flags do not represent hardware features. Instead, they represent a
software feature implemented in the kernel. For example, Kernel Page Table
Isolation is purely software feature and its feature flag X86_FEATURE_PTI is
@@ -130,14 +133,18 @@
resulting x86_cap/bug_flags[] are used to populate /proc/cpuinfo. The naming
of flags in the x86_cap/bug_flags[] are as follows:
-a: The name of the flag is from the string in X86_FEATURE_<name> by default.
-----------------------------------------------------------------------------
-By default, the flag <name> in /proc/cpuinfo is extracted from the respective
-X86_FEATURE_<name> in cpufeatures.h. For example, the flag "avx2" is from
-X86_FEATURE_AVX2.
+Flags do not appear by default in /proc/cpuinfo
+-----------------------------------------------
-b: The naming can be overridden.
---------------------------------
+Feature flags are omitted by default from /proc/cpuinfo as it does not make
+sense for the feature to be exposed to userspace in most cases. For example,
+X86_FEATURE_ALWAYS is defined in cpufeatures.h but that flag is an internal
+kernel feature used in the alternative runtime patching functionality. So the
+flag does not appear in /proc/cpuinfo.
+
+Specify a flag name if absolutely needed
+----------------------------------------
+
If the comment on the line for the #define X86_FEATURE_* starts with a
double-quote character (""), the string inside the double-quote characters
will be the name of the flags. For example, the flag "sse4_1" comes from
@@ -148,36 +155,31 @@
constant. If, for some reason, the naming of X86_FEATURE_<name> changes, one
shall override the new naming with the name already used in /proc/cpuinfo.
-c: The naming override can be "", which means it will not appear in /proc/cpuinfo.
-----------------------------------------------------------------------------------
-The feature shall be omitted from /proc/cpuinfo if it does not make sense for
-the feature to be exposed to userspace. For example, X86_FEATURE_ALWAYS is
-defined in cpufeatures.h but that flag is an internal kernel feature used
-in the alternative runtime patching functionality. So, its name is overridden
-with "". Its flag will not appear in /proc/cpuinfo.
-
Flags are missing when one or more of these happen
==================================================
-a: The hardware does not enumerate support for it.
---------------------------------------------------
+The hardware does not enumerate support for it
+----------------------------------------------
+
For example, when a new kernel is running on old hardware or the feature is
not enabled by boot firmware. Even if the hardware is new, there might be a
problem enabling the feature at run time, the flag will not be displayed.
-b: The kernel does not know about the flag.
--------------------------------------------
+The kernel does not know about the flag
+---------------------------------------
+
For example, when an old kernel is running on new hardware.
-c: The kernel disabled support for it at compile-time.
-------------------------------------------------------
+The kernel disabled support for it at compile-time
+--------------------------------------------------
+
For example, if 5-level-paging is not enabled when building (i.e.,
CONFIG_X86_5LEVEL is not selected) the flag "la57" will not show up [#f1]_.
Even though the feature will still be detected via CPUID, the kernel disables
it by clearing via setup_clear_cpu_cap(X86_FEATURE_LA57).
-d: The feature is disabled at boot-time.
-----------------------------------------
+The feature is disabled at boot-time
+------------------------------------
A feature can be disabled either using a command-line parameter or because
it failed to be enabled. The command-line parameter clearcpuid= can be used
to disable features using the feature number as defined in
@@ -190,8 +192,9 @@
to, nofsgsbase, nosgx, noxsave, etc. 5-level paging can also be disabled using
"no5lvl".
-e: The feature was known to be non-functional.
-----------------------------------------------
+The feature was known to be non-functional
+------------------------------------------
+
The feature was known to be non-functional because a dependency was
missing at runtime. For example, AVX flags will not show up if XSAVE feature
is disabled since they depend on XSAVE feature. Another example would be broken
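The naming rule described in cpuinfo.rst above (a flag appears in /proc/cpuinfo only when the comment on its ``#define X86_FEATURE_*`` line starts with a double-quoted name) can be sketched in Python as follows. The function name and regular expression are hypothetical, and the sample define lines in the usage note are illustrative inputs rather than verbatim copies of cpufeatures.h.

```python
import re

# Matches: #define X86_FEATURE_<NAME> (<word*32+bit>) /* <comment> */
_DEFINE_RE = re.compile(
    r'#define\s+X86_FEATURE_(\w+)\s+\(.*?\)\s*/\*\s*(.*?)\s*\*/'
)


def cpuinfo_flag_name(define_line: str):
    """Return the /proc/cpuinfo flag name, or None if the flag is hidden.

    Models the documented rule only: a leading "quoted" string in the
    comment names the flag; without one, the flag is omitted by default.
    """
    m = _DEFINE_RE.search(define_line)
    if m is None:
        raise ValueError("not an X86_FEATURE_* define with a comment")
    quoted = re.match(r'"([^"]+)"', m.group(2))
    return quoted.group(1) if quoted else None
```

Under these assumptions, a line like ``#define X86_FEATURE_XMM4_1 ( 4*32+19) /* "sse4_1" SSE-4.1 */`` yields ``sse4_1``, while a line for X86_FEATURE_ALWAYS whose comment carries no quoted name yields ``None``.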
diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 6c2d894..eab601a 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -338,10 +338,11 @@
Devices drivers are encouraged to rely on the instance lock where possible.
For the (mostly software) drivers that need to interact with the core stack,
-there are two sets of interfaces: ``dev_xxx`` and ``netif_xxx`` (e.g.,
-``dev_set_mtu`` and ``netif_set_mtu``). The ``dev_xxx`` functions handle
-acquiring the instance lock themselves, while the ``netif_xxx`` functions
-assume that the driver has already acquired the instance lock.
+there are two sets of interfaces: ``dev_xxx``/``netdev_xxx`` and ``netif_xxx``
+(e.g., ``dev_set_mtu`` and ``netif_set_mtu``). The ``dev_xxx``/``netdev_xxx``
+functions handle acquiring the instance lock themselves, while the
+``netif_xxx`` functions assume that the driver has already acquired
+the instance lock.
Notifiers and netdev instance lock
==================================
@@ -354,6 +355,7 @@
running under the lock:
* ``NETDEV_REGISTER``
* ``NETDEV_UP``
+* ``NETDEV_CHANGE``
The following notifiers are running without the lock:
* ``NETDEV_UNREGISTER``
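The split between the lock-taking ``dev_xxx``/``netdev_xxx`` helpers and the lock-assuming ``netif_xxx`` helpers in netdevices.rst can be modeled in a few lines of Python. This is a toy model, not kernel code: the ``NetDevice`` class and the function bodies are assumptions, and only the names ``dev_set_mtu`` and ``netif_set_mtu`` come from the text.

```python
import threading


class NetDevice:
    """Toy stand-in for a net device with a per-instance lock."""

    def __init__(self, mtu: int = 1500):
        self.lock = threading.Lock()
        self.mtu = mtu


def netif_set_mtu(dev: NetDevice, new_mtu: int) -> None:
    """netif_xxx convention: the caller must already hold dev.lock.

    Lock.locked() only reports that *someone* holds the lock, so this
    check is a rough approximation of a lock-held assertion.
    """
    if not dev.lock.locked():
        raise RuntimeError("instance lock not held")
    dev.mtu = new_mtu


def dev_set_mtu(dev: NetDevice, new_mtu: int) -> None:
    """dev_xxx convention: take the instance lock, then do the work."""
    with dev.lock:
        netif_set_mtu(dev, new_mtu)
```

A driver that already runs under the instance lock would call the ``netif_`` variant directly. Calling the ``dev_`` variant from that context would try to take the same lock twice.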
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 1f8625b..47c7c3f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7447,6 +7447,75 @@
This capability connects the vcpu to an in-kernel XIVE device.
+6.76 KVM_CAP_HYPERV_SYNIC
+-------------------------
+
+:Architectures: x86
+:Target: vcpu
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that the kernel has an implementation of the
+Hyper-V Synthetic interrupt controller (SynIC). Hyper-V SynIC is
+used to support Windows Hyper-V based guest paravirt drivers (VMBus).
+
+In order to use SynIC, it has to be activated by setting this
+capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
+will disable the use of APIC hardware virtualization even if supported
+by the CPU, as it's incompatible with SynIC auto-EOI behavior.
+
+6.77 KVM_CAP_HYPERV_SYNIC2
+--------------------------
+
+:Architectures: x86
+:Target: vcpu
+
+This capability enables a newer version of Hyper-V Synthetic interrupt
+controller (SynIC). The only difference with KVM_CAP_HYPERV_SYNIC is that KVM
+doesn't clear SynIC message and event flags pages when they are enabled by
+writing to the respective MSRs.
+
+6.78 KVM_CAP_HYPERV_DIRECT_TLBFLUSH
+-----------------------------------
+
+:Architectures: x86
+:Target: vcpu
+
+This capability indicates that KVM running on top of Hyper-V hypervisor
+enables Direct TLB flush for its guests meaning that TLB flush
+hypercalls are handled by Level 0 hypervisor (Hyper-V) bypassing KVM.
+Due to the different ABI for hypercall parameters between Hyper-V and
+KVM, enabling this capability effectively disables all hypercall
+handling by KVM (as some KVM hypercalls may be mistakenly treated as TLB
+flush hypercalls by Hyper-V), so userspace should disable KVM identification
+in CPUID and only expose Hyper-V identification. In this case, the guest
+thinks it's running on Hyper-V and only uses Hyper-V hypercalls.
+
+6.79 KVM_CAP_HYPERV_ENFORCE_CPUID
+---------------------------------
+
+:Architectures: x86
+:Target: vcpu
+
+When enabled, KVM will disable emulated Hyper-V features provided to the
+guest according to the bits in the Hyper-V CPUID feature leaves. Otherwise,
+all currently implemented Hyper-V features are provided unconditionally when
+Hyper-V identification is set in the HYPERV_CPUID_INTERFACE (0x40000001)
+leaf.
+
+6.80 KVM_CAP_ENFORCE_PV_FEATURE_CPUID
+-------------------------------------
+
+:Architectures: x86
+:Target: vcpu
+
+When enabled, KVM will disable paravirtual features provided to the
+guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
+(0x40000001). Otherwise, a guest may use the paravirtual features
+regardless of what has actually been exposed through the CPUID leaf.
+
+.. _KVM_CAP_DIRTY_LOG_RING:
+
+
.. _cap_enable_vm:
7. Capabilities that can be enabled on VMs
@@ -7927,10 +7996,10 @@
7.24 KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
-------------------------------------
-Architectures: x86 SEV enabled
-Type: vm
-Parameters: args[0] is the fd of the source vm
-Returns: 0 on success; ENOTTY on error
+:Architectures: x86 SEV enabled
+:Type: vm
+:Parameters: args[0] is the fd of the source vm
+:Returns: 0 on success; ENOTTY on error
This capability enables userspace to copy encryption context from the vm
indicated by the fd to the vm this is called on.
@@ -7963,24 +8032,6 @@
See Documentation/arch/x86/sgx.rst for more details.
-7.26 KVM_CAP_PPC_RPT_INVALIDATE
--------------------------------
-
-:Capability: KVM_CAP_PPC_RPT_INVALIDATE
-:Architectures: ppc
-:Type: vm
-
-This capability indicates that the kernel is capable of handling
-H_RPT_INVALIDATE hcall.
-
-In order to enable the use of H_RPT_INVALIDATE in the guest,
-user space might have to advertise it for the guest. For example,
-IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
-present in the "ibm,hypertas-functions" device-tree property.
-
-This capability is enabled for hypervisors on platforms like POWER9
-that support radix MMU.
-
7.27 KVM_CAP_EXIT_ON_EMULATION_FAILURE
--------------------------------------
@@ -8038,24 +8089,9 @@
This is intended to support intra-host migration of VMs between userspace VMMs,
upgrading the VMM process without interrupting the guest.
-7.30 KVM_CAP_PPC_AIL_MODE_3
--------------------------------
-
-:Capability: KVM_CAP_PPC_AIL_MODE_3
-:Architectures: ppc
-:Type: vm
-
-This capability indicates that the kernel supports the mode 3 setting for the
-"Address Translation Mode on Interrupt" aka "Alternate Interrupt Location"
-resource that is controlled with the H_SET_MODE hypercall.
-
-This capability allows a guest kernel to use a better-performance mode for
-handling interrupts and system calls.
-
7.31 KVM_CAP_DISABLE_QUIRKS2
----------------------------
-:Capability: KVM_CAP_DISABLE_QUIRKS2
:Parameters: args[0] - set of KVM quirks to disable
:Architectures: x86
:Type: vm
@@ -8210,27 +8246,6 @@
cause CPU stuck (due to event windows don't open up) and make the CPU
unavailable to host or other VMs.
-7.34 KVM_CAP_MEMORY_FAULT_INFO
-------------------------------
-
-:Architectures: x86
-:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
-
-The presence of this capability indicates that KVM_RUN will fill
-kvm_run.memory_fault if KVM cannot resolve a guest page fault VM-Exit, e.g. if
-there is a valid memslot but no backing VMA for the corresponding host virtual
-address.
-
-The information in kvm_run.memory_fault is valid if and only if KVM_RUN returns
-an error with errno=EFAULT or errno=EHWPOISON *and* kvm_run.exit_reason is set
-to KVM_EXIT_MEMORY_FAULT.
-
-Note: Userspaces which attempt to resolve memory faults so that they can retry
-KVM_RUN are encouraged to guard against repeatedly receiving the same
-error/annotated fault.
-
-See KVM_EXIT_MEMORY_FAULT for more information.
-
7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
-----------------------------------
@@ -8248,421 +8263,11 @@
Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
-7.36 KVM_CAP_X86_GUEST_MODE
-------------------------------
-
-:Architectures: x86
-:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
-
-The presence of this capability indicates that KVM_RUN will update the
-KVM_RUN_X86_GUEST_MODE bit in kvm_run.flags to indicate whether the
-vCPU was executing nested guest code when it exited.
-
-KVM exits with the register state of either the L1 or L2 guest
-depending on which executed at the time of an exit. Userspace must
-take care to differentiate between these cases.
-
-7.37 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
--------------------------------------
-
-:Architectures: arm64
-:Target: VM
-:Parameters: None
-:Returns: 0 on success, -EINVAL if vCPUs have been created before enabling this
- capability.
-
-This capability changes the behavior of the registers that identify a PE
-implementation of the Arm architecture: MIDR_EL1, REVIDR_EL1, and AIDR_EL1.
-By default, these registers are visible to userspace but treated as invariant.
-
-When this capability is enabled, KVM allows userspace to change the
-aforementioned registers before the first KVM_RUN. These registers are VM
-scoped, meaning that the same set of values are presented on all vCPUs in a
-given VM.
-
-8. Other capabilities.
-======================
-
-This section lists capabilities that give information about other
-features of the KVM implementation.
-
-8.1 KVM_CAP_PPC_HWRNG
----------------------
-
-:Architectures: ppc
-
-This capability, if KVM_CHECK_EXTENSION indicates that it is
-available, means that the kernel has an implementation of the
-H_RANDOM hypercall backed by a hardware random-number generator.
-If present, the kernel H_RANDOM handler can be enabled for guest use
-with the KVM_CAP_PPC_ENABLE_HCALL capability.
-
-8.2 KVM_CAP_HYPERV_SYNIC
-------------------------
-
-:Architectures: x86
-
-This capability, if KVM_CHECK_EXTENSION indicates that it is
-available, means that the kernel has an implementation of the
-Hyper-V Synthetic interrupt controller(SynIC). Hyper-V SynIC is
-used to support Windows Hyper-V based guest paravirt drivers(VMBus).
-
-In order to use SynIC, it has to be activated by setting this
-capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
-will disable the use of APIC hardware virtualization even if supported
-by the CPU, as it's incompatible with SynIC auto-EOI behavior.
-
-8.3 KVM_CAP_PPC_MMU_RADIX
--------------------------
-
-:Architectures: ppc
-
-This capability, if KVM_CHECK_EXTENSION indicates that it is
-available, means that the kernel can support guests using the
-radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
-processor).
-
-8.4 KVM_CAP_PPC_MMU_HASH_V3
----------------------------
-
-:Architectures: ppc
-
-This capability, if KVM_CHECK_EXTENSION indicates that it is
-available, means that the kernel can support guests using the
-hashed page table MMU defined in Power ISA V3.00 (as implemented in
-the POWER9 processor), including in-memory segment tables.
-
-8.5 KVM_CAP_MIPS_VZ
--------------------
-
-:Architectures: mips
-
-This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
-it is available, means that full hardware assisted virtualization capabilities
-of the hardware are available for use through KVM. An appropriate
-KVM_VM_MIPS_* type must be passed to KVM_CREATE_VM to create a VM which
-utilises it.
-
-If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
-available, it means that the VM is using full hardware assisted virtualization
-capabilities of the hardware. This is useful to check after creating a VM with
-KVM_VM_MIPS_DEFAULT.
-
-The value returned by KVM_CHECK_EXTENSION should be compared against known
-values (see below). All other values are reserved. This is to allow for the
-possibility of other hardware assisted virtualization implementations which
-may be incompatible with the MIPS VZ ASE.
-
-== ==========================================================================
- 0 The trap & emulate implementation is in use to run guest code in user
- mode. Guest virtual memory segments are rearranged to fit the guest in the
- user mode address space.
-
- 1 The MIPS VZ ASE is in use, providing full hardware assisted
- virtualization, including standard guest virtual memory segments.
-== ==========================================================================
-
-8.6 KVM_CAP_MIPS_TE
--------------------
-
-:Architectures: mips
-
-This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
-it is available, means that the trap & emulate implementation is available to
-run guest code in user mode, even if KVM_CAP_MIPS_VZ indicates that hardware
-assisted virtualisation is also available. KVM_VM_MIPS_TE (0) must be passed
-to KVM_CREATE_VM to create a VM which utilises it.
-
-If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
-available, it means that the VM is using trap & emulate.
-
-8.7 KVM_CAP_MIPS_64BIT
-----------------------
-
-:Architectures: mips
-
-This capability indicates the supported architecture type of the guest, i.e. the
-supported register and address width.
-
-The values returned when this capability is checked by KVM_CHECK_EXTENSION on a
-kvm VM handle correspond roughly to the CP0_Config.AT register field, and should
-be checked specifically against known values (see below). All other values are
-reserved.
-
-== ========================================================================
- 0 MIPS32 or microMIPS32.
- Both registers and addresses are 32-bits wide.
- It will only be possible to run 32-bit guest code.
-
- 1 MIPS64 or microMIPS64 with access only to 32-bit compatibility segments.
- Registers are 64-bits wide, but addresses are 32-bits wide.
- 64-bit guest code may run but cannot access MIPS64 memory segments.
- It will also be possible to run 32-bit guest code.
-
- 2 MIPS64 or microMIPS64 with access to all address segments.
- Both registers and addresses are 64-bits wide.
- It will be possible to run 64-bit or 32-bit guest code.
-== ========================================================================
-
-8.9 KVM_CAP_ARM_USER_IRQ
-------------------------
-
-:Architectures: arm64
-
-This capability, if KVM_CHECK_EXTENSION indicates that it is available, means
-that if userspace creates a VM without an in-kernel interrupt controller, it
-will be notified of changes to the output level of in-kernel emulated devices,
-which can generate virtual interrupts, presented to the VM.
-For such VMs, on every return to userspace, the kernel
-updates the vcpu's run->s.regs.device_irq_level field to represent the actual
-output level of the device.
-
-Whenever kvm detects a change in the device output level, kvm guarantees at
-least one return to userspace before running the VM. This exit could either
-be a KVM_EXIT_INTR or any other exit event, like KVM_EXIT_MMIO. This way,
-userspace can always sample the device output level and re-compute the state of
-the userspace interrupt controller. Userspace should always check the state
-of run->s.regs.device_irq_level on every kvm exit.
-The value in run->s.regs.device_irq_level can represent both level and edge
-triggered interrupt signals, depending on the device. Edge triggered interrupt
-signals will exit to userspace with the bit in run->s.regs.device_irq_level
-set exactly once per edge signal.
-
-The field run->s.regs.device_irq_level is available independent of
-run->kvm_valid_regs or run->kvm_dirty_regs bits.
-
-If KVM_CAP_ARM_USER_IRQ is supported, the KVM_CHECK_EXTENSION ioctl returns a
-number larger than 0 indicating the version of this capability is implemented
-and thereby which bits in run->s.regs.device_irq_level can signal values.
-
-Currently the following bits are defined for the device_irq_level bitmap::
-
- KVM_CAP_ARM_USER_IRQ >= 1:
-
- KVM_ARM_DEV_EL1_VTIMER - EL1 virtual timer
- KVM_ARM_DEV_EL1_PTIMER - EL1 physical timer
- KVM_ARM_DEV_PMU - ARM PMU overflow interrupt signal
-
-Future versions of kvm may implement additional events. These will get
-indicated by returning a higher number from KVM_CHECK_EXTENSION and will be
-listed above.
-
-8.10 KVM_CAP_PPC_SMT_POSSIBLE
------------------------------
-
-:Architectures: ppc
-
-Querying this capability returns a bitmap indicating the possible
-virtual SMT modes that can be set using KVM_CAP_PPC_SMT. If bit N
-(counting from the right) is set, then a virtual SMT mode of 2^N is
-available.
-
-8.11 KVM_CAP_HYPERV_SYNIC2
---------------------------
-
-:Architectures: x86
-
-This capability enables a newer version of Hyper-V Synthetic interrupt
-controller (SynIC). The only difference with KVM_CAP_HYPERV_SYNIC is that KVM
-doesn't clear SynIC message and event flags pages when they are enabled by
-writing to the respective MSRs.
-
-8.12 KVM_CAP_HYPERV_VP_INDEX
-----------------------------
-
-:Architectures: x86
-
-This capability indicates that userspace can load HV_X64_MSR_VP_INDEX msr. Its
-value is used to denote the target vcpu for a SynIC interrupt. For
-compatibility, KVM initializes this msr to KVM's internal vcpu index. When this
-capability is absent, userspace can still query this msr's value.
-
-8.13 KVM_CAP_S390_AIS_MIGRATION
--------------------------------
-
-:Architectures: s390
-:Parameters: none
-
-This capability indicates if the flic device will be able to get/set the
-AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
-to discover this without having to create a flic device.
-
-8.14 KVM_CAP_S390_PSW
----------------------
-
-:Architectures: s390
-
-This capability indicates that the PSW is exposed via the kvm_run structure.
-
-8.15 KVM_CAP_S390_GMAP
-----------------------
-
-:Architectures: s390
-
-This capability indicates that the user space memory used as guest mapping can
-be anywhere in the user memory address space, as long as the memory slots are
-aligned and sized to a segment (1MB) boundary.
-
-8.16 KVM_CAP_S390_COW
----------------------
-
-:Architectures: s390
-
-This capability indicates that the user space memory used as guest mapping can
-use copy-on-write semantics as well as dirty pages tracking via read-only page
-tables.
-
-8.17 KVM_CAP_S390_BPB
----------------------
-
-:Architectures: s390
-
-This capability indicates that kvm will implement the interfaces to handle
-reset, migration and nested KVM for branch prediction blocking. The stfle
-facility 82 should not be provided to the guest without this capability.
-
-8.18 KVM_CAP_HYPERV_TLBFLUSH
-----------------------------
-
-:Architectures: x86
-
-This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush
-hypercalls:
-HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx,
-HvFlushVirtualAddressList, HvFlushVirtualAddressListEx.
-
-8.19 KVM_CAP_ARM_INJECT_SERROR_ESR
-----------------------------------
-
-:Architectures: arm64
-
-This capability indicates that userspace can specify (via the
-KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it
-takes a virtual SError interrupt exception.
-If KVM advertises this capability, userspace can only specify the ISS field for
-the ESR syndrome. Other parts of the ESR, such as the EC are generated by the
-CPU when the exception is taken. If this virtual SError is taken to EL1 using
-AArch64, this value will be reported in the ISS field of ESR_ELx.
-
-See KVM_CAP_VCPU_EVENTS for more details.
-
-8.20 KVM_CAP_HYPERV_SEND_IPI
-----------------------------
-
-:Architectures: x86
-
-This capability indicates that KVM supports paravirtualized Hyper-V IPI send
-hypercalls:
-HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
-
-8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH
------------------------------------
-
-:Architectures: x86
-
-This capability indicates that KVM running on top of Hyper-V hypervisor
-enables Direct TLB flush for its guests meaning that TLB flush
-hypercalls are handled by Level 0 hypervisor (Hyper-V) bypassing KVM.
-Due to the different ABI for hypercall parameters between Hyper-V and
-KVM, enabling this capability effectively disables all hypercall
-handling by KVM (as some KVM hypercall may be mistakenly treated as TLB
-flush hypercalls by Hyper-V) so userspace should disable KVM identification
-in CPUID and only exposes Hyper-V identification. In this case, guest
-thinks it's running on Hyper-V and only use Hyper-V hypercalls.
-
-8.22 KVM_CAP_S390_VCPU_RESETS
------------------------------
-
-:Architectures: s390
-
-This capability indicates that the KVM_S390_NORMAL_RESET and
-KVM_S390_CLEAR_RESET ioctls are available.
-
-8.23 KVM_CAP_S390_PROTECTED
----------------------------
-
-:Architectures: s390
-
-This capability indicates that the Ultravisor has been initialized and
-KVM can therefore start protected VMs.
-This capability governs the KVM_S390_PV_COMMAND ioctl and the
-KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
-guests when the state change is invalid.
-
-8.24 KVM_CAP_STEAL_TIME
------------------------
-
-:Architectures: arm64, x86
-
-This capability indicates that KVM supports steal time accounting.
-When steal time accounting is supported it may be enabled with
-architecture-specific interfaces. This capability and the architecture-
-specific interfaces must be consistent, i.e. if one says the feature
-is supported, than the other should as well and vice versa. For arm64
-see Documentation/virt/kvm/devices/vcpu.rst "KVM_ARM_VCPU_PVTIME_CTRL".
-For x86 see Documentation/virt/kvm/x86/msr.rst "MSR_KVM_STEAL_TIME".
-
-8.25 KVM_CAP_S390_DIAG318
--------------------------
-
-:Architectures: s390
-
-This capability enables a guest to set information about its control program
-(i.e. guest kernel type and version). The information is helpful during
-system/firmware service events, providing additional data about the guest
-environments running on the machine.
-
-The information is associated with the DIAGNOSE 0x318 instruction, which sets
-an 8-byte value consisting of a one-byte Control Program Name Code (CPNC) and
-a 7-byte Control Program Version Code (CPVC). The CPNC determines what
-environment the control program is running in (e.g. Linux, z/VM...), and the
-CPVC is used for information specific to OS (e.g. Linux version, Linux
-distribution...)
-
-If this capability is available, then the CPNC and CPVC can be synchronized
-between KVM and userspace via the sync regs mechanism (KVM_SYNC_DIAG318).
-
-8.26 KVM_CAP_X86_USER_SPACE_MSR
--------------------------------
-
-:Architectures: x86
-
-This capability indicates that KVM supports deflection of MSR reads and
-writes to user space. It can be enabled on a VM level. If enabled, MSR
-accesses that would usually trigger a #GP by KVM into the guest will
-instead get bounced to user space through the KVM_EXIT_X86_RDMSR and
-KVM_EXIT_X86_WRMSR exit notifications.
-
-8.27 KVM_CAP_X86_MSR_FILTER
----------------------------
-
-:Architectures: x86
-
-This capability indicates that KVM supports that accesses to user defined MSRs
-may be rejected. With this capability exposed, KVM exports new VM ioctl
-KVM_X86_SET_MSR_FILTER which user space can call to specify bitmaps of MSR
-ranges that KVM should deny access to.
-
-In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
-trap and emulate MSRs that are outside of the scope of KVM as well as
-limit the attack surface on KVM's MSR emulation code.
-
-8.28 KVM_CAP_ENFORCE_PV_FEATURE_CPUID
--------------------------------------
-
-Architectures: x86
-
-When enabled, KVM will disable paravirtual features provided to the
-guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
-(0x40000001). Otherwise, a guest may use the paravirtual features
-regardless of what has actually been exposed through the CPUID leaf.
-
-.. _KVM_CAP_DIRTY_LOG_RING:
-
-8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
+7.36 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
----------------------------------------------------------
:Architectures: x86, arm64
+:Type: vm
:Parameters: args[0] - size of the dirty log ring
KVM is capable of tracking dirty memory using ring buffers that are
@@ -8783,6 +8388,426 @@
vgic3 pending table through KVM_DEV_ARM_VGIC_{GRP_CTRL, SAVE_PENDING_TABLES}
command on KVM device "kvm-arm-vgic-v3".
+7.37 KVM_CAP_PMU_CAPABILITY
+---------------------------
+
+:Architectures: x86
+:Type: vm
+:Parameters: args[0] is a bitmask of PMU virtualization capabilities.
+:Returns: 0 on success, -EINVAL when args[0] contains invalid bits
+
+This capability alters PMU virtualization in KVM.
+
+Calling KVM_CHECK_EXTENSION for this capability returns a bitmask of
+PMU virtualization capabilities that can be adjusted on a VM.
+
+The argument to KVM_ENABLE_CAP is also a bitmask and selects specific
+PMU virtualization capabilities to be applied to the VM. This can
+only be invoked on a VM prior to the creation of VCPUs.
+
+At this time, KVM_PMU_CAP_DISABLE is the only capability. Setting
+this capability will disable PMU virtualization for that VM. Usermode
+should adjust CPUID leaf 0xA to reflect that the PMU is disabled.
+
+7.38 KVM_CAP_VM_DISABLE_NX_HUGE_PAGES
+-------------------------------------
+
+:Architectures: x86
+:Type: vm
+:Parameters: args[0] must be 0.
+:Returns: 0 on success, -EPERM if the userspace process does not
+ have CAP_SYS_BOOT, -EINVAL if args[0] is not 0 or any vCPUs have been
+ created.
+
+This capability disables the NX huge pages mitigation for iTLB MULTIHIT.
+
+The capability has no effect if the nx_huge_pages module parameter is not set.
+
+This capability may only be set before any vCPUs are created.
+
+7.39 KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE
+---------------------------------------
+
+:Architectures: arm64
+:Type: vm
+:Parameters: args[0] is the new split chunk size.
+:Returns: 0 on success, -EINVAL if any memslot was already created.
+
+This capability sets the chunk size used in Eager Page Splitting.
+
+Eager Page Splitting improves the performance of dirty-logging (used
+in live migrations) when guest memory is backed by huge-pages. It
+avoids splitting huge-pages (into PAGE_SIZE pages) on fault, by doing
+it eagerly when enabling dirty logging (with the
+KVM_MEM_LOG_DIRTY_PAGES flag for a memory region), or when using
+KVM_CLEAR_DIRTY_LOG.
+
+The chunk size specifies how many pages to break at a time, using a
+single allocation for each chunk. The bigger the chunk size, the more
+pages need to be allocated ahead of time.
+
+The chunk size needs to be a valid block size. The list of acceptable
+block sizes is exposed in KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES as a
+64-bit bitmap (each bit describing a block size). The default value is
+0, to disable the eager page splitting.
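
A minimal sketch of the validity rule described above, written as a userspace pre-check. The helper names are hypothetical, and the sketch assumes that bit N of the KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES bitmap stands for a block of 2^N bytes; the text only says that each bit describes a block size, so verify this against your kernel before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if exactly one bit of x is set. */
static bool is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper: decide whether chunk_size is acceptable given the
 * bitmap reported by KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES.
 * Assumption: bit N set in supported_sizes means a block of 2^N bytes.
 * A chunk size of 0 is always accepted because it disables eager splitting.
 */
static bool chunk_size_is_valid(uint64_t supported_sizes, uint64_t chunk_size)
{
    if (chunk_size == 0)
        return true;
    return is_power_of_two(chunk_size) && (supported_sizes & chunk_size) != 0;
}
```

Checking the value in userspace first lets a VMM report a clear configuration error instead of interpreting a bare -EINVAL, which the capability also returns when a memslot already exists.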
+
+7.40 KVM_CAP_EXIT_HYPERCALL
+---------------------------
+
+:Architectures: x86
+:Type: vm
+
+This capability, if enabled, will cause KVM to exit to userspace
+with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
+
+Calling KVM_CHECK_EXTENSION for this capability will return a bitmask
+of hypercalls that can be configured to exit to userspace.
+Right now, the only such hypercall is KVM_HC_MAP_GPA_RANGE.
+
+The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
+of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace
+the hypercalls whose corresponding bit is in the argument, and return
+ENOSYS for the others.
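
The subset requirement above can be sketched as a small pre-check that a VMM might run before calling KVM_ENABLE_CAP. The function name and the choice of -EINVAL as its error value are assumptions of this sketch, not something the text specifies.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical pre-check before KVM_ENABLE_CAP(KVM_CAP_EXIT_HYPERCALL).
 * supported: bitmask returned by KVM_CHECK_EXTENSION.
 * requested: bitmask userspace intends to pass as the argument.
 * Returns 0 if requested is a subset of supported, -EINVAL otherwise
 * (the error value is this sketch's own choice).
 */
static int check_hypercall_mask(uint64_t supported, uint64_t requested)
{
    return (requested & ~supported) ? -EINVAL : 0;
}
```

An empty request is trivially a subset, so passing 0 is accepted by this check.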
+
+7.41 KVM_CAP_ARM_SYSTEM_SUSPEND
+-------------------------------
+
+:Architectures: arm64
+:Type: vm
+
+When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
+type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
+
+7.42 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
+-------------------------------------
+
+:Architectures: arm64
+:Type: vm
+:Parameters: None
+:Returns: 0 on success, -EINVAL if vCPUs have been created before enabling this
+ capability.
+
+This capability changes the behavior of the registers that identify a PE
+implementation of the Arm architecture: MIDR_EL1, REVIDR_EL1, and AIDR_EL1.
+By default, these registers are visible to userspace but treated as invariant.
+
+When this capability is enabled, KVM allows userspace to change the
+aforementioned registers before the first KVM_RUN. These registers are VM
+scoped, meaning that the same set of values are presented on all vCPUs in a
+given VM.
+
+8. Other capabilities.
+======================
+
+This section lists capabilities that give information about other
+features of the KVM implementation.
+
+8.1 KVM_CAP_PPC_HWRNG
+---------------------
+
+:Architectures: ppc
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that the kernel has an implementation of the
+H_RANDOM hypercall backed by a hardware random-number generator.
+If present, the kernel H_RANDOM handler can be enabled for guest use
+with the KVM_CAP_PPC_ENABLE_HCALL capability.
+
+8.3 KVM_CAP_PPC_MMU_RADIX
+-------------------------
+
+:Architectures: ppc
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that the kernel can support guests using the
+radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
+processor).
+
+8.4 KVM_CAP_PPC_MMU_HASH_V3
+---------------------------
+
+:Architectures: ppc
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that the kernel can support guests using the
+hashed page table MMU defined in Power ISA V3.00 (as implemented in
+the POWER9 processor), including in-memory segment tables.
+
+8.5 KVM_CAP_MIPS_VZ
+-------------------
+
+:Architectures: mips
+
+This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
+it is available, means that full hardware assisted virtualization capabilities
+of the hardware are available for use through KVM. An appropriate
+KVM_VM_MIPS_* type must be passed to KVM_CREATE_VM to create a VM which
+utilises it.
+
+If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
+available, it means that the VM is using full hardware assisted virtualization
+capabilities of the hardware. This is useful to check after creating a VM with
+KVM_VM_MIPS_DEFAULT.
+
+The value returned by KVM_CHECK_EXTENSION should be compared against known
+values (see below). All other values are reserved. This is to allow for the
+possibility of other hardware assisted virtualization implementations which
+may be incompatible with the MIPS VZ ASE.
+
+== ==========================================================================
+ 0 The trap & emulate implementation is in use to run guest code in user
+ mode. Guest virtual memory segments are rearranged to fit the guest in the
+ user mode address space.
+
+ 1 The MIPS VZ ASE is in use, providing full hardware assisted
+ virtualization, including standard guest virtual memory segments.
+== ==========================================================================
+
+8.7 KVM_CAP_MIPS_64BIT
+----------------------
+
+:Architectures: mips
+
+This capability indicates the supported architecture type of the guest, i.e. the
+supported register and address width.
+
+The values returned when this capability is checked by KVM_CHECK_EXTENSION on a
+kvm VM handle correspond roughly to the CP0_Config.AT register field, and should
+be checked specifically against known values (see below). All other values are
+reserved.
+
+== ========================================================================
+ 0 MIPS32 or microMIPS32.
+ Both registers and addresses are 32-bits wide.
+ It will only be possible to run 32-bit guest code.
+
+ 1 MIPS64 or microMIPS64 with access only to 32-bit compatibility segments.
+ Registers are 64-bits wide, but addresses are 32-bits wide.
+ 64-bit guest code may run but cannot access MIPS64 memory segments.
+ It will also be possible to run 32-bit guest code.
+
+ 2 MIPS64 or microMIPS64 with access to all address segments.
+ Both registers and addresses are 64-bits wide.
+ It will be possible to run 64-bit or 32-bit guest code.
+== ========================================================================
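
One way to act on the table above is a lookup that rejects reserved values instead of guessing. This is only a sketch; the struct and function names are hypothetical.

```c
#include <stddef.h>

struct mips_guest_widths {
    int reg_bits;   /* guest register width in bits */
    int addr_bits;  /* guest address width in bits */
};

/*
 * Hypothetical decoder for the value KVM_CHECK_EXTENSION returns for
 * KVM_CAP_MIPS_64BIT on a VM handle. Returns 0 and fills *out for the
 * documented values 0..2, or -1 for reserved values.
 */
static int mips_decode_64bit_cap(int value, struct mips_guest_widths *out)
{
    static const struct mips_guest_widths table[] = {
        { 32, 32 },  /* 0: MIPS32 or microMIPS32 */
        { 64, 32 },  /* 1: MIPS64, 32-bit compatibility segments only */
        { 64, 64 },  /* 2: MIPS64, all address segments */
    };

    if (value < 0 || (size_t)value >= sizeof(table) / sizeof(table[0]))
        return -1;
    *out = table[value];
    return 0;
}
```

Treating anything outside the table as an error follows the text's statement that all other values are reserved.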
+
+8.9 KVM_CAP_ARM_USER_IRQ
+------------------------
+
+:Architectures: arm64
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is available, means
+that if userspace creates a VM without an in-kernel interrupt controller, it
+will be notified of changes to the output level of in-kernel emulated devices,
+which can generate virtual interrupts, presented to the VM.
+For such VMs, on every return to userspace, the kernel
+updates the vcpu's run->s.regs.device_irq_level field to represent the actual
+output level of the device.
+
+Whenever kvm detects a change in the device output level, kvm guarantees at
+least one return to userspace before running the VM. This exit could either
+be a KVM_EXIT_INTR or any other exit event, like KVM_EXIT_MMIO. This way,
+userspace can always sample the device output level and re-compute the state of
+the userspace interrupt controller. Userspace should always check the state
+of run->s.regs.device_irq_level on every kvm exit.
+The value in run->s.regs.device_irq_level can represent both level and edge
+triggered interrupt signals, depending on the device. Edge triggered interrupt
+signals will exit to userspace with the bit in run->s.regs.device_irq_level
+set exactly once per edge signal.
+
+The field run->s.regs.device_irq_level is available independent of
+run->kvm_valid_regs or run->kvm_dirty_regs bits.
+
+If KVM_CAP_ARM_USER_IRQ is supported, the KVM_CHECK_EXTENSION ioctl returns a
+number larger than 0 indicating which version of this capability is implemented
+and thereby which bits in run->s.regs.device_irq_level can signal values.
+
+Currently the following bits are defined for the device_irq_level bitmap::
+
+ KVM_CAP_ARM_USER_IRQ >= 1:
+
+ KVM_ARM_DEV_EL1_VTIMER - EL1 virtual timer
+ KVM_ARM_DEV_EL1_PTIMER - EL1 physical timer
+ KVM_ARM_DEV_PMU - ARM PMU overflow interrupt signal
+
+Future versions of kvm may implement additional events. These will get
+indicated by returning a higher number from KVM_CHECK_EXTENSION and will be
+listed above.
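
A minimal sketch of how userspace might sample the bitmap on each exit and work out which lines changed. The local macro names are hypothetical stand-ins, and their bit positions are assumed to mirror the arm64 uapi definitions of KVM_ARM_DEV_EL1_VTIMER, KVM_ARM_DEV_EL1_PTIMER and KVM_ARM_DEV_PMU; check them against your headers.

```c
#include <stdint.h>

/* Assumed to mirror the arm64 uapi bit definitions; verify before use. */
#define DEV_EL1_VTIMER (1u << 0)
#define DEV_EL1_PTIMER (1u << 1)
#define DEV_PMU        (1u << 2)
#define DEV_KNOWN_MASK (DEV_EL1_VTIMER | DEV_EL1_PTIMER | DEV_PMU)

/*
 * Hypothetical helper: given the device_irq_level sampled on the previous
 * exit and on the current exit, return the known lines whose level changed.
 * Bits outside DEV_KNOWN_MASK are ignored.
 */
static uint64_t irq_lines_changed(uint64_t prev_level, uint64_t cur_level)
{
    return (prev_level ^ cur_level) & DEV_KNOWN_MASK;
}
```

Masking with the known bits keeps a userspace interrupt controller from reacting to events defined by a later version of the capability that it does not understand.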
+
+8.10 KVM_CAP_PPC_SMT_POSSIBLE
+-----------------------------
+
+:Architectures: ppc
+
+Querying this capability returns a bitmap indicating the possible
+virtual SMT modes that can be set using KVM_CAP_PPC_SMT. If bit N
+(counting from the right) is set, then a virtual SMT mode of 2^N is
+available.
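
Because bit N stands for mode 2^N, a mode is available exactly when it is a power of two and its own value is set in the bitmap. A sketch with a hypothetical helper name:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: smt_bitmap is the value returned when querying
 * KVM_CAP_PPC_SMT_POSSIBLE. Bit N set means virtual SMT mode 2^N is
 * available, so a mode is possible only if it is a power of two whose
 * own bit is set in the bitmap.
 */
static bool smt_mode_possible(unsigned int smt_bitmap, unsigned int mode)
{
    bool power_of_two = mode != 0 && (mode & (mode - 1)) == 0;

    return power_of_two && (smt_bitmap & mode) != 0;
}
```

For example, a bitmap of 0xb (bits 0, 1 and 3) makes modes 1, 2 and 8 available, but not mode 4.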
+
+8.12 KVM_CAP_HYPERV_VP_INDEX
+----------------------------
+
+:Architectures: x86
+
+This capability indicates that userspace can load the HV_X64_MSR_VP_INDEX MSR.
+Its value is used to denote the target vcpu for a SynIC interrupt. For
+compatibility, KVM initializes this MSR to KVM's internal vcpu index. When this
+capability is absent, userspace can still query this MSR's value.
+
+8.13 KVM_CAP_S390_AIS_MIGRATION
+-------------------------------
+
+:Architectures: s390
+
+This capability indicates whether the flic device will be able to get/set the
+AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute, and allows
+userspace to discover this without having to create a flic device.
+
+8.14 KVM_CAP_S390_PSW
+---------------------
+
+:Architectures: s390
+
+This capability indicates that the PSW is exposed via the kvm_run structure.
+
+8.15 KVM_CAP_S390_GMAP
+----------------------
+
+:Architectures: s390
+
+This capability indicates that the user space memory used as guest mapping can
+be anywhere in the user memory address space, as long as the memory slots are
+aligned and sized to a segment (1MB) boundary.
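
The alignment constraint can be sketched as a single mask test. The helper name is hypothetical, and rejecting zero-sized slots is a choice of this sketch rather than something the text states.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_SIZE (1ULL << 20)  /* 1MB segment, as stated above */

/*
 * Hypothetical check: a memory slot qualifies when its start address is
 * segment aligned and its size is a non-zero multiple of the segment size.
 */
static bool slot_is_segment_aligned(uint64_t start_addr, uint64_t size)
{
    return size != 0 && ((start_addr | size) & (SEGMENT_SIZE - 1)) == 0;
}
```

OR-ing the address and the size before masking tests both values against the segment boundary in one step.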
+
+8.16 KVM_CAP_S390_COW
+---------------------
+
+:Architectures: s390
+
+This capability indicates that the user space memory used as guest mapping can
+use copy-on-write semantics as well as dirty pages tracking via read-only page
+tables.
+
+8.17 KVM_CAP_S390_BPB
+---------------------
+
+:Architectures: s390
+
+This capability indicates that kvm will implement the interfaces to handle
+reset, migration and nested KVM for branch prediction blocking. The stfle
+facility 82 should not be provided to the guest without this capability.
+
+8.18 KVM_CAP_HYPERV_TLBFLUSH
+----------------------------
+
+:Architectures: x86
+
+This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush
+hypercalls:
+HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx,
+HvFlushVirtualAddressList, HvFlushVirtualAddressListEx.
+
+8.19 KVM_CAP_ARM_INJECT_SERROR_ESR
+----------------------------------
+
+:Architectures: arm64
+
+This capability indicates that userspace can specify (via the
+KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it
+takes a virtual SError interrupt exception.
+If KVM advertises this capability, userspace can only specify the ISS field for
+the ESR syndrome. Other parts of the ESR, such as the EC, are generated by the
+CPU when the exception is taken. If this virtual SError is taken to EL1 using
+AArch64, this value will be reported in the ISS field of ESR_ELx.
+
+See KVM_CAP_VCPU_EVENTS for more details.
+
+8.20 KVM_CAP_HYPERV_SEND_IPI
+----------------------------
+
+:Architectures: x86
+
+This capability indicates that KVM supports paravirtualized Hyper-V IPI send
+hypercalls:
+HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
+
+8.22 KVM_CAP_S390_VCPU_RESETS
+-----------------------------
+
+:Architectures: s390
+
+This capability indicates that the KVM_S390_NORMAL_RESET and
+KVM_S390_CLEAR_RESET ioctls are available.
+
+8.23 KVM_CAP_S390_PROTECTED
+---------------------------
+
+:Architectures: s390
+
+This capability indicates that the Ultravisor has been initialized and
+KVM can therefore start protected VMs.
+This capability governs the KVM_S390_PV_COMMAND ioctl and the
+KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
+guests when the state change is invalid.
+
+8.24 KVM_CAP_STEAL_TIME
+-----------------------
+
+:Architectures: arm64, x86
+
+This capability indicates that KVM supports steal time accounting.
+When steal time accounting is supported it may be enabled with
+architecture-specific interfaces. This capability and the architecture-
+specific interfaces must be consistent, i.e. if one says the feature
+is supported, then the other should as well and vice versa. For arm64
+see Documentation/virt/kvm/devices/vcpu.rst "KVM_ARM_VCPU_PVTIME_CTRL".
+For x86 see Documentation/virt/kvm/x86/msr.rst "MSR_KVM_STEAL_TIME".
+
+8.25 KVM_CAP_S390_DIAG318
+-------------------------
+
+:Architectures: s390
+
+This capability enables a guest to set information about its control program
+(i.e. guest kernel type and version). The information is helpful during
+system/firmware service events, providing additional data about the guest
+environments running on the machine.
+
+The information is associated with the DIAGNOSE 0x318 instruction, which sets
+an 8-byte value consisting of a one-byte Control Program Name Code (CPNC) and
+a 7-byte Control Program Version Code (CPVC). The CPNC determines what
+environment the control program is running in (e.g. Linux, z/VM...), and the
+CPVC is used for information specific to the OS (e.g. Linux version, Linux
+distribution...).
+
+If this capability is available, then the CPNC and CPVC can be synchronized
+between KVM and userspace via the sync regs mechanism (KVM_SYNC_DIAG318).
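
A sketch of packing and unpacking the 8-byte value. The function names are hypothetical, and the layout is an assumption: the text gives the field sizes but not their order, so this sketch places the CPNC in the most significant byte and the CPVC in the remaining seven.

```c
#include <stdint.h>

#define CPVC_MASK ((1ULL << 56) - 1)  /* low 7 bytes */

/* Assumption: CPNC occupies the most significant byte of the value. */
static uint64_t diag318_pack(uint8_t cpnc, uint64_t cpvc)
{
    return ((uint64_t)cpnc << 56) | (cpvc & CPVC_MASK);
}

/* Extract the one-byte Control Program Name Code. */
static uint8_t diag318_cpnc(uint64_t value)
{
    return (uint8_t)(value >> 56);
}

/* Extract the 7-byte Control Program Version Code. */
static uint64_t diag318_cpvc(uint64_t value)
{
    return value & CPVC_MASK;
}
```

Masking the CPVC on the way in keeps a value wider than seven bytes from overwriting the CPNC.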
+
+8.26 KVM_CAP_X86_USER_SPACE_MSR
+-------------------------------
+
+:Architectures: x86
+
+This capability indicates that KVM supports deflection of MSR reads and
+writes to user space. It can be enabled on a VM level. If enabled, MSR
+accesses that would usually cause KVM to inject a #GP into the guest are
+instead bounced to user space through the KVM_EXIT_X86_RDMSR and
+KVM_EXIT_X86_WRMSR exit notifications.
+
+8.27 KVM_CAP_X86_MSR_FILTER
+---------------------------
+
+:Architectures: x86
+
+This capability indicates that KVM supports rejecting accesses to user-defined
+MSRs. With this capability exposed, KVM exports the new VM ioctl
+KVM_X86_SET_MSR_FILTER, which user space can call to specify bitmaps of MSR
+ranges that KVM should deny access to.
+
+In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
+trap and emulate MSRs that are outside of the scope of KVM as well as
+limit the attack surface on KVM's MSR emulation code.
+
8.30 KVM_CAP_XEN_HVM
--------------------
@@ -8847,10 +8872,9 @@
done when the KVM_CAP_XEN_HVM ioctl sets the
KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE flag.
-8.31 KVM_CAP_PPC_MULTITCE
--------------------------
+8.31 KVM_CAP_SPAPR_MULTITCE
+---------------------------
-:Capability: KVM_CAP_PPC_MULTITCE
:Architectures: ppc
:Type: vm
@@ -8882,72 +8906,9 @@
supported in the host. A VMM can check whether the service is
available to the guest on migration.
-8.33 KVM_CAP_HYPERV_ENFORCE_CPUID
----------------------------------
-
-Architectures: x86
-
-When enabled, KVM will disable emulated Hyper-V features provided to the
-guest according to the bits Hyper-V CPUID feature leaves. Otherwise, all
-currently implemented Hyper-V features are provided unconditionally when
-Hyper-V identification is set in the HYPERV_CPUID_INTERFACE (0x40000001)
-leaf.
-
-8.34 KVM_CAP_EXIT_HYPERCALL
----------------------------
-
-:Capability: KVM_CAP_EXIT_HYPERCALL
-:Architectures: x86
-:Type: vm
-
-This capability, if enabled, will cause KVM to exit to userspace
-with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
-
-Calling KVM_CHECK_EXTENSION for this capability will return a bitmask
-of hypercalls that can be configured to exit to userspace.
-Right now, the only such hypercall is KVM_HC_MAP_GPA_RANGE.
-
-The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
-of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace
-the hypercalls whose corresponding bit is in the argument, and return
-ENOSYS for the others.
-
-8.35 KVM_CAP_PMU_CAPABILITY
----------------------------
-
-:Capability: KVM_CAP_PMU_CAPABILITY
-:Architectures: x86
-:Type: vm
-:Parameters: arg[0] is bitmask of PMU virtualization capabilities.
-:Returns: 0 on success, -EINVAL when arg[0] contains invalid bits
-
-This capability alters PMU virtualization in KVM.
-
-Calling KVM_CHECK_EXTENSION for this capability returns a bitmask of
-PMU virtualization capabilities that can be adjusted on a VM.
-
-The argument to KVM_ENABLE_CAP is also a bitmask and selects specific
-PMU virtualization capabilities to be applied to the VM. This can
-only be invoked on a VM prior to the creation of VCPUs.
-
-At this time, KVM_PMU_CAP_DISABLE is the only capability. Setting
-this capability will disable PMU virtualization for that VM. Usermode
-should adjust CPUID leaf 0xA to reflect that the PMU is disabled.
-
-8.36 KVM_CAP_ARM_SYSTEM_SUSPEND
--------------------------------
-
-:Capability: KVM_CAP_ARM_SYSTEM_SUSPEND
-:Architectures: arm64
-:Type: vm
-
-When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
-type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
-
8.37 KVM_CAP_S390_PROTECTED_DUMP
--------------------------------
-:Capability: KVM_CAP_S390_PROTECTED_DUMP
:Architectures: s390
:Type: vm
@@ -8957,27 +8918,9 @@
dump related UV data. Also the vcpu ioctl `KVM_S390_PV_CPU_COMMAND` is
available and supports the `KVM_PV_DUMP_CPU` subcommand.
-8.38 KVM_CAP_VM_DISABLE_NX_HUGE_PAGES
--------------------------------------
-
-:Capability: KVM_CAP_VM_DISABLE_NX_HUGE_PAGES
-:Architectures: x86
-:Type: vm
-:Parameters: arg[0] must be 0.
-:Returns: 0 on success, -EPERM if the userspace process does not
- have CAP_SYS_BOOT, -EINVAL if args[0] is not 0 or any vCPUs have been
- created.
-
-This capability disables the NX huge pages mitigation for iTLB MULTIHIT.
-
-The capability has no effect if the nx_huge_pages module parameter is not set.
-
-This capability may only be set before any vCPUs are created.
-
8.39 KVM_CAP_S390_CPU_TOPOLOGY
------------------------------
-:Capability: KVM_CAP_S390_CPU_TOPOLOGY
:Architectures: s390
:Type: vm
@@ -8999,37 +8942,9 @@
When getting the Modified Change Topology Report value, the attr->addr
must point to a byte where the value will be stored or retrieved from.
-8.40 KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE
----------------------------------------
-
-:Capability: KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE
-:Architectures: arm64
-:Type: vm
-:Parameters: arg[0] is the new split chunk size.
-:Returns: 0 on success, -EINVAL if any memslot was already created.
-
-This capability sets the chunk size used in Eager Page Splitting.
-
-Eager Page Splitting improves the performance of dirty-logging (used
-in live migrations) when guest memory is backed by huge-pages. It
-avoids splitting huge-pages (into PAGE_SIZE pages) on fault, by doing
-it eagerly when enabling dirty logging (with the
-KVM_MEM_LOG_DIRTY_PAGES flag for a memory region), or when using
-KVM_CLEAR_DIRTY_LOG.
-
-The chunk size specifies how many pages to break at a time, using a
-single allocation for each chunk. Bigger the chunk size, more pages
-need to be allocated ahead of time.
-
-The chunk size needs to be a valid block size. The list of acceptable
-block sizes is exposed in KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES as a
-64-bit bitmap (each bit describing a block size). The default value is
-0, to disable the eager page splitting.
-
8.41 KVM_CAP_VM_TYPES
---------------------
-:Capability: KVM_CAP_MEMORY_ATTRIBUTES
:Architectures: x86
:Type: system ioctl
@@ -9046,6 +8961,67 @@
production. The behavior and effective ABI for software-protected VMs is
unstable.
+8.42 KVM_CAP_PPC_RPT_INVALIDATE
+-------------------------------
+
+:Architectures: ppc
+
+This capability indicates that the kernel is capable of handling the
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+an IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is enabled for hypervisors on platforms like POWER9
+that support radix MMU.
+
+8.43 KVM_CAP_PPC_AIL_MODE_3
+---------------------------
+
+:Architectures: ppc
+
+This capability indicates that the kernel supports the mode 3 setting for the
+"Address Translation Mode on Interrupt" aka "Alternate Interrupt Location"
+resource that is controlled with the H_SET_MODE hypercall.
+
+This capability allows a guest kernel to use a better-performance mode for
+handling interrupts and system calls.
+
+8.44 KVM_CAP_MEMORY_FAULT_INFO
+------------------------------
+
+:Architectures: x86
+
+The presence of this capability indicates that KVM_RUN will fill
+kvm_run.memory_fault if KVM cannot resolve a guest page fault VM-Exit, e.g. if
+there is a valid memslot but no backing VMA for the corresponding host virtual
+address.
+
+The information in kvm_run.memory_fault is valid if and only if KVM_RUN returns
+an error with errno=EFAULT or errno=EHWPOISON *and* kvm_run.exit_reason is set
+to KVM_EXIT_MEMORY_FAULT.
+
+Note: Userspaces which attempt to resolve memory faults so that they can retry
+KVM_RUN are encouraged to guard against repeatedly receiving the same
+error/annotated fault.
+
+See KVM_EXIT_MEMORY_FAULT for more information.
+
+8.45 KVM_CAP_X86_GUEST_MODE
+---------------------------
+
+:Architectures: x86
+
+The presence of this capability indicates that KVM_RUN will update the
+KVM_RUN_X86_GUEST_MODE bit in kvm_run.flags to indicate whether the
+vCPU was executing nested guest code when it exited.
+
+KVM exits with the register state of either the L1 or L2 guest
+depending on which executed at the time of an exit. Userspace must
+take care to differentiate between these cases.
+
9. Known KVM API problems
=========================
@@ -9076,9 +9052,10 @@
The same is true for the ``KVM_FEATURE_PV_UNHALT`` paravirtualized feature.
-CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``.
-It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
-has enabled in-kernel emulation of the local APIC.
+On older versions of Linux, CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by
+``KVM_GET_SUPPORTED_CPUID``, but it can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER``
+is present and the kernel has enabled in-kernel emulation of the local APIC.
+On newer versions, ``KVM_GET_SUPPORTED_CPUID`` does report the bit as available.
CPU topology
~~~~~~~~~~~~
diff --git a/MAINTAINERS b/MAINTAINERS
index 96b8270..c593161 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10151,6 +10151,8 @@
F: include/linux/gpio/
F: include/linux/of_gpio.h
K: (devm_)?gpio_(request|free|direction|get|set)
+K: GPIOD_FLAGS_BIT_NONEXCLUSIVE
+K: devm_gpiod_unhinge
GPIO UAPI
M: Bartosz Golaszewski <brgl@bgdev.pl>
diff --git a/Makefile b/Makefile
index 38689a0..f424185 100644
--- a/Makefile
+++ b/Makefile
@@ -1068,6 +1068,9 @@
KBUILD_CFLAGS += -fconserve-stack
endif
+# Ensure compilers do not transform certain loops into calls to wcslen()
+KBUILD_CFLAGS += -fno-builtin-wcslen
+
# change __FILE__ to the relative path to the source directory
ifdef building_out_of_srctree
KBUILD_CPPFLAGS += $(call cc-option,-ffile-prefix-map=$(srcroot)/=)
diff --git a/arch/arm/configs/at91_dt_defconfig b/arch/arm/configs/at91_dt_defconfig
index f2596a1..ff13e1e 100644
--- a/arch/arm/configs/at91_dt_defconfig
+++ b/arch/arm/configs/at91_dt_defconfig
@@ -232,7 +232,6 @@
CONFIG_CRYPTO_DEV_ATMEL_AES=y
CONFIG_CRYPTO_DEV_ATMEL_TDES=y
CONFIG_CRYPTO_DEV_ATMEL_SHA=y
-CONFIG_CRC_CCITT=y
CONFIG_FONTS=y
CONFIG_FONT_8x8=y
CONFIG_FONT_ACORN_8x8=y
diff --git a/arch/arm/configs/collie_defconfig b/arch/arm/configs/collie_defconfig
index 42cb1c8..578c6a4 100644
--- a/arch/arm/configs/collie_defconfig
+++ b/arch/arm/configs/collie_defconfig
@@ -78,7 +78,6 @@
CONFIG_NLS_DEFAULT="cp437"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
-CONFIG_CRC_CCITT=y
CONFIG_FONTS=y
CONFIG_FONT_MINI_4x6=y
# CONFIG_DEBUG_BUGVERBOSE is not set
diff --git a/arch/arm/configs/davinci_all_defconfig b/arch/arm/configs/davinci_all_defconfig
index 3474e47..70b8c78 100644
--- a/arch/arm/configs/davinci_all_defconfig
+++ b/arch/arm/configs/davinci_all_defconfig
@@ -249,7 +249,6 @@
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_UTF8=m
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC_T10DIF=m
CONFIG_DMA_CMA=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_RT_MUTEXES=y
diff --git a/arch/arm/configs/dove_defconfig b/arch/arm/configs/dove_defconfig
index b382a2e..d76eb12 100644
--- a/arch/arm/configs/dove_defconfig
+++ b/arch/arm/configs/dove_defconfig
@@ -128,7 +128,6 @@
CONFIG_CRYPTO_LZO=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRYPTO_DEV_MARVELL_CESA=y
-CONFIG_CRC_CCITT=y
CONFIG_PRINTK_TIME=y
# CONFIG_DEBUG_BUGVERBOSE is not set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
diff --git a/arch/arm/configs/exynos_defconfig b/arch/arm/configs/exynos_defconfig
index 7ad48fd..e81a5d6 100644
--- a/arch/arm/configs/exynos_defconfig
+++ b/arch/arm/configs/exynos_defconfig
@@ -370,7 +370,6 @@
CONFIG_CRYPTO_CHACHA20_NEON=m
CONFIG_CRYPTO_DEV_EXYNOS_RNG=y
CONFIG_CRYPTO_DEV_S5P=y
-CONFIG_CRC_CCITT=y
CONFIG_DMA_CMA=y
CONFIG_CMA_SIZE_MBYTES=96
CONFIG_FONTS=y
diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig
index 297c6a7b..062c1eb 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -481,8 +481,6 @@
CONFIG_CRYPTO_DEV_FSL_CAAM=y
CONFIG_CRYPTO_DEV_SAHARA=y
CONFIG_CRYPTO_DEV_MXS_DCP=y
-CONFIG_CRC_CCITT=m
-CONFIG_CRC_T10DIF=y
CONFIG_CMA_SIZE_MBYTES=64
CONFIG_FONTS=y
CONFIG_FONT_8x8=y
diff --git a/arch/arm/configs/lpc18xx_defconfig b/arch/arm/configs/lpc18xx_defconfig
index 2aa2ac8..2d48918 100644
--- a/arch/arm/configs/lpc18xx_defconfig
+++ b/arch/arm/configs/lpc18xx_defconfig
@@ -147,7 +147,6 @@
# CONFIG_INOTIFY_USER is not set
CONFIG_JFFS2_FS=y
# CONFIG_NETWORK_FILESYSTEMS is not set
-CONFIG_CRC_ITU_T=y
CONFIG_PRINTK_TIME=y
# CONFIG_ENABLE_MUST_CHECK is not set
# CONFIG_DEBUG_BUGVERBOSE is not set
diff --git a/arch/arm/configs/lpc32xx_defconfig b/arch/arm/configs/lpc32xx_defconfig
index 98e2672..9afccd7 100644
--- a/arch/arm/configs/lpc32xx_defconfig
+++ b/arch/arm/configs/lpc32xx_defconfig
@@ -179,7 +179,6 @@
CONFIG_NLS_UTF8=y
CONFIG_CRYPTO_ANSI_CPRNG=y
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC_CCITT=y
CONFIG_PRINTK_TIME=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
diff --git a/arch/arm/configs/milbeaut_m10v_defconfig b/arch/arm/configs/milbeaut_m10v_defconfig
index acd1620..275ddf7 100644
--- a/arch/arm/configs/milbeaut_m10v_defconfig
+++ b/arch/arm/configs/milbeaut_m10v_defconfig
@@ -108,8 +108,6 @@
CONFIG_CRYPTO_AES_ARM_CE=m
CONFIG_CRYPTO_CHACHA20_NEON=m
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC_CCITT=m
-CONFIG_CRC_ITU_T=m
CONFIG_DMA_CMA=y
CONFIG_CMA_SIZE_MBYTES=64
CONFIG_PRINTK_TIME=y
diff --git a/arch/arm/configs/mmp2_defconfig b/arch/arm/configs/mmp2_defconfig
index f6f9e13..842a989 100644
--- a/arch/arm/configs/mmp2_defconfig
+++ b/arch/arm/configs/mmp2_defconfig
@@ -67,7 +67,6 @@
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_ROOT_NFS=y
-CONFIG_CRC_CCITT=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
diff --git a/arch/arm/configs/multi_v4t_defconfig b/arch/arm/configs/multi_v4t_defconfig
index 27d6506..1a86dc3 100644
--- a/arch/arm/configs/multi_v4t_defconfig
+++ b/arch/arm/configs/multi_v4t_defconfig
@@ -91,6 +91,5 @@
CONFIG_VFAT_FS=y
CONFIG_CRAMFS=y
CONFIG_MINIX_FS=y
-CONFIG_CRC_CCITT=y
# CONFIG_FTRACE is not set
CONFIG_DEBUG_USER=y
diff --git a/arch/arm/configs/multi_v5_defconfig b/arch/arm/configs/multi_v5_defconfig
index db81862..cf6180b 100644
--- a/arch/arm/configs/multi_v5_defconfig
+++ b/arch/arm/configs/multi_v5_defconfig
@@ -289,7 +289,6 @@
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_DEV_MARVELL_CESA=y
-CONFIG_CRC_CCITT=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
diff --git a/arch/arm/configs/mvebu_v5_defconfig b/arch/arm/configs/mvebu_v5_defconfig
index a518d4a..23dbb80 100644
--- a/arch/arm/configs/mvebu_v5_defconfig
+++ b/arch/arm/configs/mvebu_v5_defconfig
@@ -187,7 +187,6 @@
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_DEV_MARVELL_CESA=y
-CONFIG_CRC_CCITT=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
diff --git a/arch/arm/configs/mxs_defconfig b/arch/arm/configs/mxs_defconfig
index d8a6e43..c76d661 100644
--- a/arch/arm/configs/mxs_defconfig
+++ b/arch/arm/configs/mxs_defconfig
@@ -160,7 +160,6 @@
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_CRYPTO_DEV_MXS_DCP=y
-CONFIG_CRC_ITU_T=m
CONFIG_FONTS=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_KERNEL=y
diff --git a/arch/arm/configs/omap2plus_defconfig b/arch/arm/configs/omap2plus_defconfig
index 113d6df..75b326b 100644
--- a/arch/arm/configs/omap2plus_defconfig
+++ b/arch/arm/configs/omap2plus_defconfig
@@ -706,9 +706,6 @@
CONFIG_CRYPTO_DEV_OMAP_SHAM=m
CONFIG_CRYPTO_DEV_OMAP_AES=m
CONFIG_CRYPTO_DEV_OMAP_DES=m
-CONFIG_CRC_CCITT=y
-CONFIG_CRC_T10DIF=y
-CONFIG_CRC_ITU_T=y
CONFIG_DMA_CMA=y
CONFIG_FONTS=y
CONFIG_FONT_8x8=y
diff --git a/arch/arm/configs/orion5x_defconfig b/arch/arm/configs/orion5x_defconfig
index 0629b08..62b9c61 100644
--- a/arch/arm/configs/orion5x_defconfig
+++ b/arch/arm/configs/orion5x_defconfig
@@ -136,7 +136,6 @@
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_DEV_MARVELL_CESA=y
-CONFIG_CRC_T10DIF=y
# CONFIG_DEBUG_BUGVERBOSE is not set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
diff --git a/arch/arm/configs/pxa168_defconfig b/arch/arm/configs/pxa168_defconfig
index ce10fe2..4748c7d 100644
--- a/arch/arm/configs/pxa168_defconfig
+++ b/arch/arm/configs/pxa168_defconfig
@@ -41,7 +41,6 @@
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_ROOT_NFS=y
-CONFIG_CRC_CCITT=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
diff --git a/arch/arm/configs/pxa910_defconfig b/arch/arm/configs/pxa910_defconfig
index 1f28aea..49b59c6 100644
--- a/arch/arm/configs/pxa910_defconfig
+++ b/arch/arm/configs/pxa910_defconfig
@@ -50,7 +50,6 @@
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_ROOT_NFS=y
-CONFIG_CRC_CCITT=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
diff --git a/arch/arm/configs/pxa_defconfig b/arch/arm/configs/pxa_defconfig
index de0ac8f..24fca86 100644
--- a/arch/arm/configs/pxa_defconfig
+++ b/arch/arm/configs/pxa_defconfig
@@ -663,8 +663,6 @@
CONFIG_CRYPTO_SHA256_ARM=m
CONFIG_CRYPTO_SHA512_ARM=m
CONFIG_CRYPTO_AES_ARM=m
-CONFIG_CRC_CCITT=y
-CONFIG_CRC_T10DIF=m
CONFIG_FONTS=y
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
diff --git a/arch/arm/configs/s5pv210_defconfig b/arch/arm/configs/s5pv210_defconfig
index 5dbe85c..02121ee 100644
--- a/arch/arm/configs/s5pv210_defconfig
+++ b/arch/arm/configs/s5pv210_defconfig
@@ -113,7 +113,6 @@
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_UTF8=y
-CONFIG_CRC_CCITT=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
diff --git a/arch/arm/configs/sama7_defconfig b/arch/arm/configs/sama7_defconfig
index ea7ddf6..e14720a 100644
--- a/arch/arm/configs/sama7_defconfig
+++ b/arch/arm/configs/sama7_defconfig
@@ -227,8 +227,6 @@
CONFIG_CRYPTO_DEV_ATMEL_AES=y
CONFIG_CRYPTO_DEV_ATMEL_TDES=y
CONFIG_CRYPTO_DEV_ATMEL_SHA=y
-CONFIG_CRC_CCITT=y
-CONFIG_CRC_ITU_T=y
CONFIG_DMA_CMA=y
CONFIG_CMA_SIZE_MBYTES=32
CONFIG_CMA_ALIGNMENT=9
diff --git a/arch/arm/configs/spitz_defconfig b/arch/arm/configs/spitz_defconfig
index ac5b7a5..ffec59e 100644
--- a/arch/arm/configs/spitz_defconfig
+++ b/arch/arm/configs/spitz_defconfig
@@ -234,7 +234,6 @@
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_WP512=m
-CONFIG_CRC_CCITT=y
CONFIG_FONTS=y
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
diff --git a/arch/arm/configs/stm32_defconfig b/arch/arm/configs/stm32_defconfig
index 423bb41..dcd9c3160 100644
--- a/arch/arm/configs/stm32_defconfig
+++ b/arch/arm/configs/stm32_defconfig
@@ -74,7 +74,6 @@
# CONFIG_DNOTIFY is not set
# CONFIG_INOTIFY_USER is not set
CONFIG_NLS=y
-CONFIG_CRC_ITU_T=y
CONFIG_PRINTK_TIME=y
# CONFIG_DEBUG_BUGVERBOSE is not set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
diff --git a/arch/arm/configs/wpcm450_defconfig b/arch/arm/configs/wpcm450_defconfig
index 5e4397f..cd4b3e7 100644
--- a/arch/arm/configs/wpcm450_defconfig
+++ b/arch/arm/configs/wpcm450_defconfig
@@ -191,8 +191,6 @@
CONFIG_X509_CERTIFICATE_PARSER=y
CONFIG_PKCS7_MESSAGE_PARSER=y
CONFIG_SYSTEM_TRUSTED_KEYRING=y
-CONFIG_CRC_CCITT=y
-CONFIG_CRC_ITU_T=m
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index d1b1a33..e4f7775 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -121,6 +121,15 @@
#define ESR_ELx_FSC_SEA_TTW(n) (0x14 + (n))
#define ESR_ELx_FSC_SECC (0x18)
#define ESR_ELx_FSC_SECC_TTW(n) (0x1c + (n))
+#define ESR_ELx_FSC_ADDRSZ (0x00)
+
+/*
+ * Annoyingly, the negative levels for Address size faults aren't laid out
+ * contiguously (or in the desired order)
+ */
+#define ESR_ELx_FSC_ADDRSZ_nL(n) ((n) == -1 ? 0x25 : 0x2C)
+#define ESR_ELx_FSC_ADDRSZ_L(n) ((n) < 0 ? ESR_ELx_FSC_ADDRSZ_nL(n) : \
+ (ESR_ELx_FSC_ADDRSZ + (n)))
/* Status codes for individual page table levels */
#define ESR_ELx_FSC_ACCESS_L(n) (ESR_ELx_FSC_ACCESS + (n))
@@ -161,8 +170,6 @@
#define ESR_ELx_Xs_MASK (GENMASK_ULL(4, 0))
/* ISS field definitions for exceptions taken in to Hyp */
-#define ESR_ELx_FSC_ADDRSZ (0x00)
-#define ESR_ELx_FSC_ADDRSZ_L(n) (ESR_ELx_FSC_ADDRSZ + (n))
#define ESR_ELx_CV (UL(1) << 24)
#define ESR_ELx_COND_SHIFT (20)
#define ESR_ELx_COND_MASK (UL(0xF) << ESR_ELx_COND_SHIFT)
@@ -464,6 +471,39 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
(esr == ESR_ELx_FSC_ACCESS_L(0));
}
+static inline bool esr_fsc_is_addr_sz_fault(unsigned long esr)
+{
+ esr &= ESR_ELx_FSC;
+
+ return (esr == ESR_ELx_FSC_ADDRSZ_L(3)) ||
+ (esr == ESR_ELx_FSC_ADDRSZ_L(2)) ||
+ (esr == ESR_ELx_FSC_ADDRSZ_L(1)) ||
+ (esr == ESR_ELx_FSC_ADDRSZ_L(0)) ||
+ (esr == ESR_ELx_FSC_ADDRSZ_L(-1));
+}
+
+static inline bool esr_fsc_is_sea_ttw(unsigned long esr)
+{
+ esr = esr & ESR_ELx_FSC;
+
+ return (esr == ESR_ELx_FSC_SEA_TTW(3)) ||
+ (esr == ESR_ELx_FSC_SEA_TTW(2)) ||
+ (esr == ESR_ELx_FSC_SEA_TTW(1)) ||
+ (esr == ESR_ELx_FSC_SEA_TTW(0)) ||
+ (esr == ESR_ELx_FSC_SEA_TTW(-1));
+}
+
+static inline bool esr_fsc_is_secc_ttw(unsigned long esr)
+{
+ esr = esr & ESR_ELx_FSC;
+
+ return (esr == ESR_ELx_FSC_SECC_TTW(3)) ||
+ (esr == ESR_ELx_FSC_SECC_TTW(2)) ||
+ (esr == ESR_ELx_FSC_SECC_TTW(1)) ||
+ (esr == ESR_ELx_FSC_SECC_TTW(0)) ||
+ (esr == ESR_ELx_FSC_SECC_TTW(-1));
+}
+
/* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
static inline bool esr_iss_is_eretax(unsigned long esr)
{
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index d7cf665..bd020fc2 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -305,7 +305,12 @@ static __always_inline unsigned long kvm_vcpu_get_hfar(const struct kvm_vcpu *vc
static __always_inline phys_addr_t kvm_vcpu_get_fault_ipa(const struct kvm_vcpu *vcpu)
{
- return ((phys_addr_t)vcpu->arch.fault.hpfar_el2 & HPFAR_MASK) << 8;
+ u64 hpfar = vcpu->arch.fault.hpfar_el2;
+
+ if (unlikely(!(hpfar & HPFAR_EL2_NS)))
+ return INVALID_GPA;
+
+ return FIELD_GET(HPFAR_EL2_FIPA, hpfar) << 12;
}
static inline u64 kvm_vcpu_get_disr(const struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
index 87e10d9..9398ade 100644
--- a/arch/arm64/include/asm/kvm_ras.h
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -14,7 +14,7 @@
* Was this synchronous external abort a RAS notification?
* Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
*/
-static inline int kvm_handle_guest_sea(phys_addr_t addr, u64 esr)
+static inline int kvm_handle_guest_sea(void)
{
/* apei_claim_sea(NULL) expects to mask interrupts itself */
lockdep_assert_irqs_enabled();
diff --git a/arch/arm64/kvm/hyp/include/hyp/fault.h b/arch/arm64/kvm/hyp/include/hyp/fault.h
index 17df945..fc573fc 100644
--- a/arch/arm64/kvm/hyp/include/hyp/fault.h
+++ b/arch/arm64/kvm/hyp/include/hyp/fault.h
@@ -12,6 +12,16 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
+static inline bool __fault_safe_to_translate(u64 esr)
+{
+ u64 fsc = esr & ESR_ELx_FSC;
+
+ if (esr_fsc_is_sea_ttw(esr) || esr_fsc_is_secc_ttw(esr))
+ return false;
+
+ return !(fsc == ESR_ELx_FSC_EXTABT && (esr & ESR_ELx_FnV));
+}
+
static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
{
int ret;
@@ -44,34 +54,50 @@ static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
return true;
}
+/*
+ * Checks for the conditions when HPFAR_EL2 is written, per ARM ARM R_FKLWR.
+ */
+static inline bool __hpfar_valid(u64 esr)
+{
+ /*
+ * CPUs affected by ARM erratum #834220 may incorrectly report a
+ * stage-2 translation fault when a stage-1 permission fault occurs.
+ *
+ * Re-walk the page tables to determine if a stage-1 fault actually
+ * occurred.
+ */
+ if (cpus_have_final_cap(ARM64_WORKAROUND_834220) &&
+ esr_fsc_is_translation_fault(esr))
+ return false;
+
+ if (esr_fsc_is_translation_fault(esr) || esr_fsc_is_access_flag_fault(esr))
+ return true;
+
+ if ((esr & ESR_ELx_S1PTW) && esr_fsc_is_permission_fault(esr))
+ return true;
+
+ return esr_fsc_is_addr_sz_fault(esr);
+}
+
static inline bool __get_fault_info(u64 esr, struct kvm_vcpu_fault_info *fault)
{
- u64 hpfar, far;
+ u64 hpfar;
- far = read_sysreg_el2(SYS_FAR);
+ fault->far_el2 = read_sysreg_el2(SYS_FAR);
+ fault->hpfar_el2 = 0;
+
+ if (__hpfar_valid(esr))
+ hpfar = read_sysreg(hpfar_el2);
+ else if (unlikely(!__fault_safe_to_translate(esr)))
+ return true;
+ else if (!__translate_far_to_hpfar(fault->far_el2, &hpfar))
+ return false;
/*
- * The HPFAR can be invalid if the stage 2 fault did not
- * happen during a stage 1 page table walk (the ESR_EL2.S1PTW
- * bit is clear) and one of the two following cases are true:
- * 1. The fault was due to a permission fault
- * 2. The processor carries errata 834220
- *
- * Therefore, for all non S1PTW faults where we either have a
- * permission fault or the errata workaround is enabled, we
- * resolve the IPA using the AT instruction.
+ * Hijack HPFAR_EL2.NS (RES0 in Non-secure) to indicate a valid
+ * HPFAR value.
*/
- if (!(esr & ESR_ELx_S1PTW) &&
- (cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
- esr_fsc_is_permission_fault(esr))) {
- if (!__translate_far_to_hpfar(far, &hpfar))
- return false;
- } else {
- hpfar = read_sysreg(hpfar_el2);
- }
-
- fault->far_el2 = far;
- fault->hpfar_el2 = hpfar;
+ fault->hpfar_el2 = hpfar | HPFAR_EL2_NS;
return true;
}
diff --git a/arch/arm64/kvm/hyp/nvhe/ffa.c b/arch/arm64/kvm/hyp/nvhe/ffa.c
index e433dfa..3369dd0 100644
--- a/arch/arm64/kvm/hyp/nvhe/ffa.c
+++ b/arch/arm64/kvm/hyp/nvhe/ffa.c
@@ -730,10 +730,10 @@ static void do_ffa_version(struct arm_smccc_res *res,
hyp_ffa_version = ffa_req_version;
}
- if (hyp_ffa_post_init())
+ if (hyp_ffa_post_init()) {
res->a0 = FFA_RET_NOT_SUPPORTED;
- else {
- has_version_negotiated = true;
+ } else {
+ smp_store_release(&has_version_negotiated, true);
res->a0 = hyp_ffa_version;
}
unlock:
@@ -809,7 +809,8 @@ bool kvm_host_ffa_handler(struct kvm_cpu_context *host_ctxt, u32 func_id)
if (!is_ffa_call(func_id))
return false;
- if (!has_version_negotiated && func_id != FFA_VERSION) {
+ if (func_id != FFA_VERSION &&
+ !smp_load_acquire(&has_version_negotiated)) {
ffa_to_smccc_error(&res, FFA_RET_INVALID_PARAMETERS);
goto out_handled;
}
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index f34f11c..2a5284f 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -578,7 +578,14 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
return;
}
- addr = (fault.hpfar_el2 & HPFAR_MASK) << 8;
+
+ /*
+ * Yikes, we couldn't resolve the fault IPA. This should reinject an
+ * abort into the host when we figure out how to do that.
+ */
+ BUG_ON(!(fault.hpfar_el2 & HPFAR_EL2_NS));
+ addr = FIELD_GET(HPFAR_EL2_FIPA, fault.hpfar_el2) << 12;
+
ret = host_stage2_idmap(addr);
BUG_ON(ret && ret != -EAGAIN);
}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2feb6c6..754f2fe 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1794,9 +1794,28 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
gfn_t gfn;
int ret, idx;
+ /* Synchronous External Abort? */
+ if (kvm_vcpu_abt_issea(vcpu)) {
+ /*
+ * For RAS the host kernel may handle this abort.
+ * There is no need to pass the error into the guest.
+ */
+ if (kvm_handle_guest_sea())
+ kvm_inject_vabt(vcpu);
+
+ return 1;
+ }
+
esr = kvm_vcpu_get_esr(vcpu);
+ /*
+ * The fault IPA should be reliable at this point as we're not dealing
+ * with an SEA.
+ */
ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+ if (KVM_BUG_ON(ipa == INVALID_GPA, vcpu->kvm))
+ return -EFAULT;
+
is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
if (esr_fsc_is_translation_fault(esr)) {
@@ -1818,18 +1837,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
}
}
- /* Synchronous External Abort? */
- if (kvm_vcpu_abt_issea(vcpu)) {
- /*
- * For RAS the host kernel may handle this abort.
- * There is no need to pass the error into the guest.
- */
- if (kvm_handle_guest_sea(fault_ipa, kvm_vcpu_get_esr(vcpu)))
- kvm_inject_vabt(vcpu);
-
- return 1;
- }
-
trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_esr(vcpu),
kvm_vcpu_get_hfar(vcpu), fault_ipa);
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index f947684..bdf044c 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -3536,3 +3536,10 @@
Field 4 P
Field 3:0 Align
EndSysreg
+
+Sysreg HPFAR_EL2 3 4 6 0 4
+Field 63 NS
+Res0 62:48
+Field 47:4 FIPA
+Res0 3:0
+EndSysreg
diff --git a/arch/hexagon/configs/comet_defconfig b/arch/hexagon/configs/comet_defconfig
index 469c025..c6108f0 100644
--- a/arch/hexagon/configs/comet_defconfig
+++ b/arch/hexagon/configs/comet_defconfig
@@ -72,9 +72,6 @@
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC_CCITT=y
-CONFIG_CRC16=y
-CONFIG_CRC_T10DIF=y
CONFIG_FRAME_WARN=0
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_FS=y
diff --git a/arch/m68k/configs/amcore_defconfig b/arch/m68k/configs/amcore_defconfig
index 67a0d15..110279a 100644
--- a/arch/m68k/configs/amcore_defconfig
+++ b/arch/m68k/configs/amcore_defconfig
@@ -89,4 +89,3 @@
# CONFIG_CRYPTO_ECHAINIV is not set
CONFIG_CRYPTO_ANSI_CPRNG=y
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC16=y
diff --git a/arch/mips/configs/ath79_defconfig b/arch/mips/configs/ath79_defconfig
index 8caa03a..cba0b85 100644
--- a/arch/mips/configs/ath79_defconfig
+++ b/arch/mips/configs/ath79_defconfig
@@ -82,7 +82,6 @@
# CONFIG_IOMMU_SUPPORT is not set
# CONFIG_DNOTIFY is not set
# CONFIG_PROC_PAGE_MONITOR is not set
-CONFIG_CRC_ITU_T=m
CONFIG_STRIP_ASM_SYMS=y
CONFIG_DEBUG_FS=y
# CONFIG_SCHED_DEBUG is not set
diff --git a/arch/mips/configs/bigsur_defconfig b/arch/mips/configs/bigsur_defconfig
index fe28263..8f7c368 100644
--- a/arch/mips/configs/bigsur_defconfig
+++ b/arch/mips/configs/bigsur_defconfig
@@ -238,7 +238,6 @@
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
-CONFIG_CRC_T10DIF=m
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DETECT_HUNG_TASK=y
diff --git a/arch/mips/configs/fuloong2e_defconfig b/arch/mips/configs/fuloong2e_defconfig
index 5ab149c..114fcd6 100644
--- a/arch/mips/configs/fuloong2e_defconfig
+++ b/arch/mips/configs/fuloong2e_defconfig
@@ -218,4 +218,3 @@
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_LZO=m
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC_CCITT=y
diff --git a/arch/mips/configs/ip22_defconfig b/arch/mips/configs/ip22_defconfig
index 31ca93d..f1a8ccf 100644
--- a/arch/mips/configs/ip22_defconfig
+++ b/arch/mips/configs/ip22_defconfig
@@ -326,5 +326,4 @@
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC_T10DIF=m
CONFIG_DEBUG_MEMORY_INIT=y
diff --git a/arch/mips/configs/ip27_defconfig b/arch/mips/configs/ip27_defconfig
index b8907b3..5d07994 100644
--- a/arch/mips/configs/ip27_defconfig
+++ b/arch/mips/configs/ip27_defconfig
@@ -317,4 +317,3 @@
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_LZO=m
-CONFIG_CRC_T10DIF=m
diff --git a/arch/mips/configs/ip30_defconfig b/arch/mips/configs/ip30_defconfig
index 270181a..a4524e7 100644
--- a/arch/mips/configs/ip30_defconfig
+++ b/arch/mips/configs/ip30_defconfig
@@ -179,4 +179,3 @@
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_LZO=m
-CONFIG_CRC_T10DIF=m
diff --git a/arch/mips/configs/ip32_defconfig b/arch/mips/configs/ip32_defconfig
index 121e7e4..d8ac114 100644
--- a/arch/mips/configs/ip32_defconfig
+++ b/arch/mips/configs/ip32_defconfig
@@ -177,7 +177,6 @@
CONFIG_CRYPTO_TEA=y
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_DEFLATE=y
-CONFIG_CRC_T10DIF=y
CONFIG_FONTS=y
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
diff --git a/arch/mips/configs/omega2p_defconfig b/arch/mips/configs/omega2p_defconfig
index 128f9ab..e2bcdfd 100644
--- a/arch/mips/configs/omega2p_defconfig
+++ b/arch/mips/configs/omega2p_defconfig
@@ -111,7 +111,6 @@
CONFIG_NLS_UTF8=y
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=y
-CONFIG_CRC16=y
CONFIG_XZ_DEC=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
diff --git a/arch/mips/configs/rb532_defconfig b/arch/mips/configs/rb532_defconfig
index 0261969..42b161d 100644
--- a/arch/mips/configs/rb532_defconfig
+++ b/arch/mips/configs/rb532_defconfig
@@ -155,5 +155,4 @@
CONFIG_SQUASHFS=y
CONFIG_CRYPTO_TEST=m
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC16=m
CONFIG_STRIP_ASM_SYMS=y
diff --git a/arch/mips/configs/rt305x_defconfig b/arch/mips/configs/rt305x_defconfig
index 8404e0a..8f9701e 100644
--- a/arch/mips/configs/rt305x_defconfig
+++ b/arch/mips/configs/rt305x_defconfig
@@ -128,7 +128,6 @@
# CONFIG_SQUASHFS_ZLIB is not set
CONFIG_SQUASHFS_XZ=y
CONFIG_CRYPTO_ARC4=m
-CONFIG_CRC_ITU_T=m
# CONFIG_XZ_DEC_X86 is not set
# CONFIG_XZ_DEC_POWERPC is not set
# CONFIG_XZ_DEC_IA64 is not set
diff --git a/arch/mips/configs/sb1250_swarm_defconfig b/arch/mips/configs/sb1250_swarm_defconfig
index ce855b6..ae2afff 100644
--- a/arch/mips/configs/sb1250_swarm_defconfig
+++ b/arch/mips/configs/sb1250_swarm_defconfig
@@ -99,4 +99,3 @@
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_LZO=m
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC16=m
diff --git a/arch/mips/configs/vocore2_defconfig b/arch/mips/configs/vocore2_defconfig
index 917967f..2a9a9b1 100644
--- a/arch/mips/configs/vocore2_defconfig
+++ b/arch/mips/configs/vocore2_defconfig
@@ -111,7 +111,6 @@
CONFIG_NLS_UTF8=y
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=y
-CONFIG_CRC16=y
CONFIG_XZ_DEC=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
diff --git a/arch/mips/configs/xway_defconfig b/arch/mips/configs/xway_defconfig
index 7b91edf..aae8497 100644
--- a/arch/mips/configs/xway_defconfig
+++ b/arch/mips/configs/xway_defconfig
@@ -140,7 +140,6 @@
# CONFIG_SQUASHFS_ZLIB is not set
CONFIG_SQUASHFS_XZ=y
CONFIG_CRYPTO_ARC4=m
-CONFIG_CRC_ITU_T=m
CONFIG_PRINTK_TIME=y
CONFIG_STRIP_ASM_SYMS=y
CONFIG_DEBUG_FS=y
diff --git a/arch/parisc/configs/generic-32bit_defconfig b/arch/parisc/configs/generic-32bit_defconfig
index f5fffc2..5b65c98 100644
--- a/arch/parisc/configs/generic-32bit_defconfig
+++ b/arch/parisc/configs/generic-32bit_defconfig
@@ -264,8 +264,6 @@
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_DEFLATE=y
-CONFIG_CRC_CCITT=m
-CONFIG_CRC_T10DIF=y
CONFIG_FONTS=y
CONFIG_PRINTK_TIME=y
CONFIG_MAGIC_SYSRQ=y
diff --git a/arch/parisc/configs/generic-64bit_defconfig b/arch/parisc/configs/generic-64bit_defconfig
index 2487765..ecc9ffc 100644
--- a/arch/parisc/configs/generic-64bit_defconfig
+++ b/arch/parisc/configs/generic-64bit_defconfig
@@ -292,7 +292,6 @@
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_DEFLATE=m
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC_CCITT=m
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_KERNEL=y
CONFIG_STRIP_ASM_SYMS=y
diff --git a/arch/powerpc/configs/44x/sam440ep_defconfig b/arch/powerpc/configs/44x/sam440ep_defconfig
index 2479ab6..98221bd 100644
--- a/arch/powerpc/configs/44x/sam440ep_defconfig
+++ b/arch/powerpc/configs/44x/sam440ep_defconfig
@@ -91,5 +91,4 @@
# CONFIG_NETWORK_FILESYSTEMS is not set
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
-CONFIG_CRC_T10DIF=y
CONFIG_MAGIC_SYSRQ=y
diff --git a/arch/powerpc/configs/44x/warp_defconfig b/arch/powerpc/configs/44x/warp_defconfig
index 20891c4..5757625 100644
--- a/arch/powerpc/configs/44x/warp_defconfig
+++ b/arch/powerpc/configs/44x/warp_defconfig
@@ -85,8 +85,6 @@
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_UTF8=y
-CONFIG_CRC_CCITT=y
-CONFIG_CRC_T10DIF=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_FS=y
diff --git a/arch/powerpc/configs/83xx/mpc832x_rdb_defconfig b/arch/powerpc/configs/83xx/mpc832x_rdb_defconfig
index 1715ff5..b99caba 100644
--- a/arch/powerpc/configs/83xx/mpc832x_rdb_defconfig
+++ b/arch/powerpc/configs/83xx/mpc832x_rdb_defconfig
@@ -73,6 +73,5 @@
CONFIG_NLS_CODEPAGE_932=y
CONFIG_NLS_ISO8859_8=y
CONFIG_NLS_ISO8859_1=y
-CONFIG_CRC_T10DIF=y
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_PCBC=m
diff --git a/arch/powerpc/configs/83xx/mpc834x_itx_defconfig b/arch/powerpc/configs/83xx/mpc834x_itx_defconfig
index e65c0057..1116305 100644
--- a/arch/powerpc/configs/83xx/mpc834x_itx_defconfig
+++ b/arch/powerpc/configs/83xx/mpc834x_itx_defconfig
@@ -80,5 +80,4 @@
CONFIG_NFS_FS=y
CONFIG_NFS_V4=y
CONFIG_ROOT_NFS=y
-CONFIG_CRC_T10DIF=y
CONFIG_CRYPTO_PCBC=m
diff --git a/arch/powerpc/configs/83xx/mpc834x_itxgp_defconfig b/arch/powerpc/configs/83xx/mpc834x_itxgp_defconfig
index 17714bf..312d39e 100644
--- a/arch/powerpc/configs/83xx/mpc834x_itxgp_defconfig
+++ b/arch/powerpc/configs/83xx/mpc834x_itxgp_defconfig
@@ -72,5 +72,4 @@
CONFIG_NFS_FS=y
CONFIG_NFS_V4=y
CONFIG_ROOT_NFS=y
-CONFIG_CRC_T10DIF=y
CONFIG_CRYPTO_PCBC=m
diff --git a/arch/powerpc/configs/83xx/mpc837x_rdb_defconfig b/arch/powerpc/configs/83xx/mpc837x_rdb_defconfig
index 58fae51..ac27f99 100644
--- a/arch/powerpc/configs/83xx/mpc837x_rdb_defconfig
+++ b/arch/powerpc/configs/83xx/mpc837x_rdb_defconfig
@@ -75,6 +75,5 @@
CONFIG_NFS_FS=y
CONFIG_NFS_V4=y
CONFIG_ROOT_NFS=y
-CONFIG_CRC_T10DIF=y
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_PCBC=m
diff --git a/arch/powerpc/configs/85xx/ge_imp3a_defconfig b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
index 6f58ee1..7beb36a 100644
--- a/arch/powerpc/configs/85xx/ge_imp3a_defconfig
+++ b/arch/powerpc/configs/85xx/ge_imp3a_defconfig
@@ -221,8 +221,6 @@
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_UTF8=y
-CONFIG_CRC_CCITT=y
-CONFIG_CRC_T10DIF=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_MD5=y
diff --git a/arch/powerpc/configs/85xx/stx_gp3_defconfig b/arch/powerpc/configs/85xx/stx_gp3_defconfig
index e708049..0a42072 100644
--- a/arch/powerpc/configs/85xx/stx_gp3_defconfig
+++ b/arch/powerpc/configs/85xx/stx_gp3_defconfig
@@ -60,8 +60,6 @@
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
CONFIG_NLS=y
-CONFIG_CRC_CCITT=y
-CONFIG_CRC_T10DIF=m
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_DEBUG_BUGVERBOSE is not set
CONFIG_BDI_SWITCH=y
diff --git a/arch/powerpc/configs/85xx/xes_mpc85xx_defconfig b/arch/powerpc/configs/85xx/xes_mpc85xx_defconfig
index 3a6381a..488d03a 100644
--- a/arch/powerpc/configs/85xx/xes_mpc85xx_defconfig
+++ b/arch/powerpc/configs/85xx/xes_mpc85xx_defconfig
@@ -132,7 +132,6 @@
CONFIG_NFSD=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
-CONFIG_CRC_T10DIF=y
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_DEBUG_BUGVERBOSE is not set
CONFIG_CRYPTO_HMAC=y
diff --git a/arch/powerpc/configs/86xx-hw.config b/arch/powerpc/configs/86xx-hw.config
index 0cb24b3..e7bd265 100644
--- a/arch/powerpc/configs/86xx-hw.config
+++ b/arch/powerpc/configs/86xx-hw.config
@@ -5,7 +5,6 @@
# CONFIG_CARDBUS is not set
CONFIG_CHR_DEV_SG=y
CONFIG_CHR_DEV_ST=y
-CONFIG_CRC_T10DIF=y
CONFIG_CRYPTO_HMAC=y
CONFIG_DS1682=y
CONFIG_EEPROM_LEGACY=y
diff --git a/arch/powerpc/configs/amigaone_defconfig b/arch/powerpc/configs/amigaone_defconfig
index 200bb1e..69ef3dc 100644
--- a/arch/powerpc/configs/amigaone_defconfig
+++ b/arch/powerpc/configs/amigaone_defconfig
@@ -106,7 +106,6 @@
CONFIG_AFFS_FS=m
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=m
-CONFIG_CRC_T10DIF=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MUTEXES=y
diff --git a/arch/powerpc/configs/chrp32_defconfig b/arch/powerpc/configs/chrp32_defconfig
index fb314f7..b799c95 100644
--- a/arch/powerpc/configs/chrp32_defconfig
+++ b/arch/powerpc/configs/chrp32_defconfig
@@ -110,7 +110,6 @@
CONFIG_TMPFS=y
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=m
-CONFIG_CRC_T10DIF=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MUTEXES=y
diff --git a/arch/powerpc/configs/fsl-emb-nonhw.config b/arch/powerpc/configs/fsl-emb-nonhw.config
index d6d2a45..2f81bc2 100644
--- a/arch/powerpc/configs/fsl-emb-nonhw.config
+++ b/arch/powerpc/configs/fsl-emb-nonhw.config
@@ -15,7 +15,6 @@
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUPS=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
-CONFIG_CRC_T10DIF=y
CONFIG_CPUSETS=y
CONFIG_CRAMFS=y
CONFIG_CRYPTO_MD4=y
diff --git a/arch/powerpc/configs/g5_defconfig b/arch/powerpc/configs/g5_defconfig
index 9215bed..7e58f3e 100644
--- a/arch/powerpc/configs/g5_defconfig
+++ b/arch/powerpc/configs/g5_defconfig
@@ -231,7 +231,6 @@
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_UTF8=y
-CONFIG_CRC_T10DIF=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MUTEXES=y
diff --git a/arch/powerpc/configs/gamecube_defconfig b/arch/powerpc/configs/gamecube_defconfig
index d77eeb5..cdd9965 100644
--- a/arch/powerpc/configs/gamecube_defconfig
+++ b/arch/powerpc/configs/gamecube_defconfig
@@ -82,7 +82,6 @@
CONFIG_CIFS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
-CONFIG_CRC_CCITT=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
diff --git a/arch/powerpc/configs/linkstation_defconfig b/arch/powerpc/configs/linkstation_defconfig
index fa707de..b564f9e 100644
--- a/arch/powerpc/configs/linkstation_defconfig
+++ b/arch/powerpc/configs/linkstation_defconfig
@@ -125,8 +125,6 @@
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_UTF8=m
-CONFIG_CRC_CCITT=m
-CONFIG_CRC_T10DIF=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
diff --git a/arch/powerpc/configs/mpc83xx_defconfig b/arch/powerpc/configs/mpc83xx_defconfig
index 83c4710..a815d9e 100644
--- a/arch/powerpc/configs/mpc83xx_defconfig
+++ b/arch/powerpc/configs/mpc83xx_defconfig
@@ -97,7 +97,6 @@
CONFIG_NFS_FS=y
CONFIG_NFS_V4=y
CONFIG_ROOT_NFS=y
-CONFIG_CRC_T10DIF=y
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_SHA512=y
diff --git a/arch/powerpc/configs/mpc866_ads_defconfig b/arch/powerpc/configs/mpc866_ads_defconfig
index a0d27c5..dfbdd5e 100644
--- a/arch/powerpc/configs/mpc866_ads_defconfig
+++ b/arch/powerpc/configs/mpc866_ads_defconfig
@@ -38,4 +38,3 @@
CONFIG_CRAMFS=y
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
-CONFIG_CRC_CCITT=y
diff --git a/arch/powerpc/configs/mvme5100_defconfig b/arch/powerpc/configs/mvme5100_defconfig
index d1c7fd5..fa2b3b9c 100644
--- a/arch/powerpc/configs/mvme5100_defconfig
+++ b/arch/powerpc/configs/mvme5100_defconfig
@@ -107,8 +107,6 @@
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_UTF8=m
-CONFIG_CRC_CCITT=m
-CONFIG_CRC_T10DIF=y
CONFIG_XZ_DEC=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
diff --git a/arch/powerpc/configs/pasemi_defconfig b/arch/powerpc/configs/pasemi_defconfig
index 6199394..8bbf51b3 100644
--- a/arch/powerpc/configs/pasemi_defconfig
+++ b/arch/powerpc/configs/pasemi_defconfig
@@ -159,7 +159,6 @@
CONFIG_NFSD_V4=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
-CONFIG_CRC_CCITT=y
CONFIG_PRINTK_TIME=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
diff --git a/arch/powerpc/configs/pmac32_defconfig b/arch/powerpc/configs/pmac32_defconfig
index e8b3f67..1bc3466 100644
--- a/arch/powerpc/configs/pmac32_defconfig
+++ b/arch/powerpc/configs/pmac32_defconfig
@@ -276,7 +276,6 @@
CONFIG_NFSD_V4=y
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_ISO8859_1=m
-CONFIG_CRC_T10DIF=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
diff --git a/arch/powerpc/configs/ppc44x_defconfig b/arch/powerpc/configs/ppc44x_defconfig
index 8b595f6..41c930f 100644
--- a/arch/powerpc/configs/ppc44x_defconfig
+++ b/arch/powerpc/configs/ppc44x_defconfig
@@ -90,7 +90,6 @@
CONFIG_ROOT_NFS=y
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_ISO8859_1=m
-CONFIG_CRC_T10DIF=m
CONFIG_MAGIC_SYSRQ=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_CRYPTO_ECB=y
diff --git a/arch/powerpc/configs/ppc64e_defconfig b/arch/powerpc/configs/ppc64e_defconfig
index 4c05f4e..d2e659a 100644
--- a/arch/powerpc/configs/ppc64e_defconfig
+++ b/arch/powerpc/configs/ppc64e_defconfig
@@ -207,7 +207,6 @@
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_UTF8=y
-CONFIG_CRC_T10DIF=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_STACK_USAGE=y
diff --git a/arch/powerpc/configs/ps3_defconfig b/arch/powerpc/configs/ps3_defconfig
index 2b175dd..0b48d2b 100644
--- a/arch/powerpc/configs/ps3_defconfig
+++ b/arch/powerpc/configs/ps3_defconfig
@@ -148,8 +148,6 @@
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_LZO=m
-CONFIG_CRC_CCITT=m
-CONFIG_CRC_T10DIF=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_MAGIC_SYSRQ=y
diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig
index 3086c4a..2b71a6d 100644
--- a/arch/powerpc/configs/skiroot_defconfig
+++ b/arch/powerpc/configs/skiroot_defconfig
@@ -278,8 +278,6 @@
# CONFIG_INTEGRITY is not set
CONFIG_LSM="yama,loadpin,safesetid,integrity"
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC16=y
-CONFIG_CRC_ITU_T=y
# CONFIG_XZ_DEC_X86 is not set
# CONFIG_XZ_DEC_IA64 is not set
# CONFIG_XZ_DEC_ARM is not set
diff --git a/arch/powerpc/configs/storcenter_defconfig b/arch/powerpc/configs/storcenter_defconfig
index 7a978d3..e415222 100644
--- a/arch/powerpc/configs/storcenter_defconfig
+++ b/arch/powerpc/configs/storcenter_defconfig
@@ -75,4 +75,3 @@
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_UTF8=y
-CONFIG_CRC_T10DIF=y
diff --git a/arch/powerpc/configs/wii_defconfig b/arch/powerpc/configs/wii_defconfig
index 5017a69..7c714a1 100644
--- a/arch/powerpc/configs/wii_defconfig
+++ b/arch/powerpc/configs/wii_defconfig
@@ -114,7 +114,6 @@
CONFIG_CIFS=m
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
-CONFIG_CRC_CCITT=y
CONFIG_PRINTK_TIME=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_SPINLOCK=y
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index db8161e..99fb986 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -332,6 +332,10 @@
def_bool n
select HAVE_MARCH_Z15_FEATURES
+config HAVE_MARCH_Z17_FEATURES
+ def_bool n
+ select HAVE_MARCH_Z16_FEATURES
+
choice
prompt "Processor type"
default MARCH_Z196
@@ -397,6 +401,14 @@
Select this to enable optimizations for IBM z16 (3931 and
3932 series).
+config MARCH_Z17
+ bool "IBM z17"
+ select HAVE_MARCH_Z17_FEATURES
+ depends on $(cc-option,-march=z17)
+ help
+ Select this to enable optimizations for IBM z17 (9175 and
+ 9176 series).
+
endchoice
config MARCH_Z10_TUNE
@@ -420,6 +432,9 @@
config MARCH_Z16_TUNE
def_bool TUNE_Z16 || MARCH_Z16 && TUNE_DEFAULT
+config MARCH_Z17_TUNE
+ def_bool TUNE_Z17 || MARCH_Z17 && TUNE_DEFAULT
+
choice
prompt "Tune code generation"
default TUNE_DEFAULT
@@ -464,6 +479,10 @@
bool "IBM z16"
depends on $(cc-option,-mtune=z16)
+config TUNE_Z17
+ bool "IBM z17"
+ depends on $(cc-option,-mtune=z17)
+
endchoice
config 64BIT
diff --git a/arch/s390/Makefile b/arch/s390/Makefile
index b06dc53..7679bc1 100644
--- a/arch/s390/Makefile
+++ b/arch/s390/Makefile
@@ -48,6 +48,7 @@
mflags-$(CONFIG_MARCH_Z14) := -march=z14
mflags-$(CONFIG_MARCH_Z15) := -march=z15
mflags-$(CONFIG_MARCH_Z16) := -march=z16
+mflags-$(CONFIG_MARCH_Z17) := -march=z17
export CC_FLAGS_MARCH := $(mflags-y)
@@ -61,6 +62,7 @@
cflags-$(CONFIG_MARCH_Z14_TUNE) += -mtune=z14
cflags-$(CONFIG_MARCH_Z15_TUNE) += -mtune=z15
cflags-$(CONFIG_MARCH_Z16_TUNE) += -mtune=z16
+cflags-$(CONFIG_MARCH_Z17_TUNE) += -mtune=z17
cflags-y += -Wa,-I$(srctree)/arch/$(ARCH)/include
diff --git a/arch/s390/include/asm/march.h b/arch/s390/include/asm/march.h
index fd9eef3..11a71bd 100644
--- a/arch/s390/include/asm/march.h
+++ b/arch/s390/include/asm/march.h
@@ -33,6 +33,10 @@
#define MARCH_HAS_Z16_FEATURES 1
#endif
+#ifdef CONFIG_HAVE_MARCH_Z17_FEATURES
+#define MARCH_HAS_Z17_FEATURES 1
+#endif
+
#endif /* __DECOMPRESSOR */
#endif /* __ASM_S390_MARCH_H */
diff --git a/arch/s390/kernel/perf_cpum_cf.c b/arch/s390/kernel/perf_cpum_cf.c
index 33205dd..e657fad 100644
--- a/arch/s390/kernel/perf_cpum_cf.c
+++ b/arch/s390/kernel/perf_cpum_cf.c
@@ -442,7 +442,7 @@ static void cpum_cf_make_setsize(enum cpumf_ctr_set ctrset)
ctrset_size = 48;
else if (cpumf_ctr_info.csvn >= 3 && cpumf_ctr_info.csvn <= 5)
ctrset_size = 128;
- else if (cpumf_ctr_info.csvn == 6 || cpumf_ctr_info.csvn == 7)
+ else if (cpumf_ctr_info.csvn >= 6 && cpumf_ctr_info.csvn <= 8)
ctrset_size = 160;
break;
case CPUMF_CTR_SET_MT_DIAG:
@@ -858,18 +858,13 @@ static int cpumf_pmu_event_type(struct perf_event *event)
static int cpumf_pmu_event_init(struct perf_event *event)
{
unsigned int type = event->attr.type;
- int err;
+ int err = -ENOENT;
if (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_RAW)
err = __hw_perf_event_init(event, type);
else if (event->pmu->type == type)
/* Registered as unknown PMU */
err = __hw_perf_event_init(event, cpumf_pmu_event_type(event));
- else
- return -ENOENT;
-
- if (unlikely(err) && event->destroy)
- event->destroy(event);
return err;
}
@@ -1819,8 +1814,6 @@ static int cfdiag_event_init(struct perf_event *event)
event->destroy = hw_perf_event_destroy;
err = cfdiag_event_init2(event);
- if (unlikely(err))
- event->destroy(event);
out:
return err;
}
diff --git a/arch/s390/kernel/perf_cpum_cf_events.c b/arch/s390/kernel/perf_cpum_cf_events.c
index e4a6bfc..690a293 100644
--- a/arch/s390/kernel/perf_cpum_cf_events.c
+++ b/arch/s390/kernel/perf_cpum_cf_events.c
@@ -237,7 +237,6 @@ CPUMF_EVENT_ATTR(cf_z14, TX_C_TABORT_NO_SPECIAL, 0x00f4);
CPUMF_EVENT_ATTR(cf_z14, TX_C_TABORT_SPECIAL, 0x00f5);
CPUMF_EVENT_ATTR(cf_z14, MT_DIAG_CYCLES_ONE_THR_ACTIVE, 0x01c0);
CPUMF_EVENT_ATTR(cf_z14, MT_DIAG_CYCLES_TWO_THR_ACTIVE, 0x01c1);
-
CPUMF_EVENT_ATTR(cf_z15, L1D_RO_EXCL_WRITES, 0x0080);
CPUMF_EVENT_ATTR(cf_z15, DTLB2_WRITES, 0x0081);
CPUMF_EVENT_ATTR(cf_z15, DTLB2_MISSES, 0x0082);
@@ -365,6 +364,83 @@ CPUMF_EVENT_ATTR(cf_z16, NNPA_WAIT_LOCK, 0x010d);
CPUMF_EVENT_ATTR(cf_z16, NNPA_HOLD_LOCK, 0x010e);
CPUMF_EVENT_ATTR(cf_z16, MT_DIAG_CYCLES_ONE_THR_ACTIVE, 0x01c0);
CPUMF_EVENT_ATTR(cf_z16, MT_DIAG_CYCLES_TWO_THR_ACTIVE, 0x01c1);
+CPUMF_EVENT_ATTR(cf_z17, L1D_RO_EXCL_WRITES, 0x0080);
+CPUMF_EVENT_ATTR(cf_z17, DTLB2_WRITES, 0x0081);
+CPUMF_EVENT_ATTR(cf_z17, DTLB2_MISSES, 0x0082);
+CPUMF_EVENT_ATTR(cf_z17, CRSTE_1MB_WRITES, 0x0083);
+CPUMF_EVENT_ATTR(cf_z17, DTLB2_GPAGE_WRITES, 0x0084);
+CPUMF_EVENT_ATTR(cf_z17, ITLB2_WRITES, 0x0086);
+CPUMF_EVENT_ATTR(cf_z17, ITLB2_MISSES, 0x0087);
+CPUMF_EVENT_ATTR(cf_z17, TLB2_PTE_WRITES, 0x0089);
+CPUMF_EVENT_ATTR(cf_z17, TLB2_CRSTE_WRITES, 0x008a);
+CPUMF_EVENT_ATTR(cf_z17, TLB2_ENGINES_BUSY, 0x008b);
+CPUMF_EVENT_ATTR(cf_z17, TX_C_TEND, 0x008c);
+CPUMF_EVENT_ATTR(cf_z17, TX_NC_TEND, 0x008d);
+CPUMF_EVENT_ATTR(cf_z17, L1C_TLB2_MISSES, 0x008f);
+CPUMF_EVENT_ATTR(cf_z17, DCW_REQ, 0x0091);
+CPUMF_EVENT_ATTR(cf_z17, DCW_REQ_IV, 0x0092);
+CPUMF_EVENT_ATTR(cf_z17, DCW_REQ_CHIP_HIT, 0x0093);
+CPUMF_EVENT_ATTR(cf_z17, DCW_REQ_DRAWER_HIT, 0x0094);
+CPUMF_EVENT_ATTR(cf_z17, DCW_ON_CHIP, 0x0095);
+CPUMF_EVENT_ATTR(cf_z17, DCW_ON_CHIP_IV, 0x0096);
+CPUMF_EVENT_ATTR(cf_z17, DCW_ON_CHIP_CHIP_HIT, 0x0097);
+CPUMF_EVENT_ATTR(cf_z17, DCW_ON_CHIP_DRAWER_HIT, 0x0098);
+CPUMF_EVENT_ATTR(cf_z17, DCW_ON_MODULE, 0x0099);
+CPUMF_EVENT_ATTR(cf_z17, DCW_ON_DRAWER, 0x009a);
+CPUMF_EVENT_ATTR(cf_z17, DCW_OFF_DRAWER, 0x009b);
+CPUMF_EVENT_ATTR(cf_z17, DCW_ON_CHIP_MEMORY, 0x009c);
+CPUMF_EVENT_ATTR(cf_z17, DCW_ON_MODULE_MEMORY, 0x009d);
+CPUMF_EVENT_ATTR(cf_z17, DCW_ON_DRAWER_MEMORY, 0x009e);
+CPUMF_EVENT_ATTR(cf_z17, DCW_OFF_DRAWER_MEMORY, 0x009f);
+CPUMF_EVENT_ATTR(cf_z17, IDCW_ON_MODULE_IV, 0x00a0);
+CPUMF_EVENT_ATTR(cf_z17, IDCW_ON_MODULE_CHIP_HIT, 0x00a1);
+CPUMF_EVENT_ATTR(cf_z17, IDCW_ON_MODULE_DRAWER_HIT, 0x00a2);
+CPUMF_EVENT_ATTR(cf_z17, IDCW_ON_DRAWER_IV, 0x00a3);
+CPUMF_EVENT_ATTR(cf_z17, IDCW_ON_DRAWER_CHIP_HIT, 0x00a4);
+CPUMF_EVENT_ATTR(cf_z17, IDCW_ON_DRAWER_DRAWER_HIT, 0x00a5);
+CPUMF_EVENT_ATTR(cf_z17, IDCW_OFF_DRAWER_IV, 0x00a6);
+CPUMF_EVENT_ATTR(cf_z17, IDCW_OFF_DRAWER_CHIP_HIT, 0x00a7);
+CPUMF_EVENT_ATTR(cf_z17, IDCW_OFF_DRAWER_DRAWER_HIT, 0x00a8);
+CPUMF_EVENT_ATTR(cf_z17, ICW_REQ, 0x00a9);
+CPUMF_EVENT_ATTR(cf_z17, ICW_REQ_IV, 0x00aa);
+CPUMF_EVENT_ATTR(cf_z17, ICW_REQ_CHIP_HIT, 0x00ab);
+CPUMF_EVENT_ATTR(cf_z17, ICW_REQ_DRAWER_HIT, 0x00ac);
+CPUMF_EVENT_ATTR(cf_z17, ICW_ON_CHIP, 0x00ad);
+CPUMF_EVENT_ATTR(cf_z17, ICW_ON_CHIP_IV, 0x00ae);
+CPUMF_EVENT_ATTR(cf_z17, ICW_ON_CHIP_CHIP_HIT, 0x00af);
+CPUMF_EVENT_ATTR(cf_z17, ICW_ON_CHIP_DRAWER_HIT, 0x00b0);
+CPUMF_EVENT_ATTR(cf_z17, ICW_ON_MODULE, 0x00b1);
+CPUMF_EVENT_ATTR(cf_z17, ICW_ON_DRAWER, 0x00b2);
+CPUMF_EVENT_ATTR(cf_z17, ICW_OFF_DRAWER, 0x00b3);
+CPUMF_EVENT_ATTR(cf_z17, CYCLES_SAMETHRD, 0x00ca);
+CPUMF_EVENT_ATTR(cf_z17, CYCLES_DIFFTHRD, 0x00cb);
+CPUMF_EVENT_ATTR(cf_z17, INST_SAMETHRD, 0x00cc);
+CPUMF_EVENT_ATTR(cf_z17, INST_DIFFTHRD, 0x00cd);
+CPUMF_EVENT_ATTR(cf_z17, WRONG_BRANCH_PREDICTION, 0x00ce);
+CPUMF_EVENT_ATTR(cf_z17, VX_BCD_EXECUTION_SLOTS, 0x00e1);
+CPUMF_EVENT_ATTR(cf_z17, DECIMAL_INSTRUCTIONS, 0x00e2);
+CPUMF_EVENT_ATTR(cf_z17, LAST_HOST_TRANSLATIONS, 0x00e8);
+CPUMF_EVENT_ATTR(cf_z17, TX_NC_TABORT, 0x00f4);
+CPUMF_EVENT_ATTR(cf_z17, TX_C_TABORT_NO_SPECIAL, 0x00f5);
+CPUMF_EVENT_ATTR(cf_z17, TX_C_TABORT_SPECIAL, 0x00f6);
+CPUMF_EVENT_ATTR(cf_z17, DFLT_ACCESS, 0x00f8);
+CPUMF_EVENT_ATTR(cf_z17, DFLT_CYCLES, 0x00fd);
+CPUMF_EVENT_ATTR(cf_z17, SORTL, 0x0100);
+CPUMF_EVENT_ATTR(cf_z17, DFLT_CC, 0x0109);
+CPUMF_EVENT_ATTR(cf_z17, DFLT_CCFINISH, 0x010a);
+CPUMF_EVENT_ATTR(cf_z17, NNPA_INVOCATIONS, 0x010b);
+CPUMF_EVENT_ATTR(cf_z17, NNPA_COMPLETIONS, 0x010c);
+CPUMF_EVENT_ATTR(cf_z17, NNPA_WAIT_LOCK, 0x010d);
+CPUMF_EVENT_ATTR(cf_z17, NNPA_HOLD_LOCK, 0x010e);
+CPUMF_EVENT_ATTR(cf_z17, NNPA_INST_ONCHIP, 0x0110);
+CPUMF_EVENT_ATTR(cf_z17, NNPA_INST_OFFCHIP, 0x0111);
+CPUMF_EVENT_ATTR(cf_z17, NNPA_INST_DIFF, 0x0112);
+CPUMF_EVENT_ATTR(cf_z17, NNPA_4K_PREFETCH, 0x0114);
+CPUMF_EVENT_ATTR(cf_z17, NNPA_COMPL_LOCK, 0x0115);
+CPUMF_EVENT_ATTR(cf_z17, NNPA_RETRY_LOCK, 0x0116);
+CPUMF_EVENT_ATTR(cf_z17, NNPA_RETRY_LOCK_WITH_PLO, 0x0117);
+CPUMF_EVENT_ATTR(cf_z17, MT_DIAG_CYCLES_ONE_THR_ACTIVE, 0x01c0);
+CPUMF_EVENT_ATTR(cf_z17, MT_DIAG_CYCLES_TWO_THR_ACTIVE, 0x01c1);
static struct attribute *cpumcf_fvn1_pmu_event_attr[] __initdata = {
CPUMF_EVENT_PTR(cf_fvn1, CPU_CYCLES),
@@ -414,7 +490,7 @@ static struct attribute *cpumcf_svn_12345_pmu_event_attr[] __initdata = {
NULL,
};
-static struct attribute *cpumcf_svn_67_pmu_event_attr[] __initdata = {
+static struct attribute *cpumcf_svn_678_pmu_event_attr[] __initdata = {
CPUMF_EVENT_PTR(cf_svn_12345, PRNG_FUNCTIONS),
CPUMF_EVENT_PTR(cf_svn_12345, PRNG_CYCLES),
CPUMF_EVENT_PTR(cf_svn_12345, PRNG_BLOCKED_FUNCTIONS),
@@ -779,6 +855,87 @@ static struct attribute *cpumcf_z16_pmu_event_attr[] __initdata = {
NULL,
};
+static struct attribute *cpumcf_z17_pmu_event_attr[] __initdata = {
+ CPUMF_EVENT_PTR(cf_z17, L1D_RO_EXCL_WRITES),
+ CPUMF_EVENT_PTR(cf_z17, DTLB2_WRITES),
+ CPUMF_EVENT_PTR(cf_z17, DTLB2_MISSES),
+ CPUMF_EVENT_PTR(cf_z17, CRSTE_1MB_WRITES),
+ CPUMF_EVENT_PTR(cf_z17, DTLB2_GPAGE_WRITES),
+ CPUMF_EVENT_PTR(cf_z17, ITLB2_WRITES),
+ CPUMF_EVENT_PTR(cf_z17, ITLB2_MISSES),
+ CPUMF_EVENT_PTR(cf_z17, TLB2_PTE_WRITES),
+ CPUMF_EVENT_PTR(cf_z17, TLB2_CRSTE_WRITES),
+ CPUMF_EVENT_PTR(cf_z17, TLB2_ENGINES_BUSY),
+ CPUMF_EVENT_PTR(cf_z17, TX_C_TEND),
+ CPUMF_EVENT_PTR(cf_z17, TX_NC_TEND),
+ CPUMF_EVENT_PTR(cf_z17, L1C_TLB2_MISSES),
+ CPUMF_EVENT_PTR(cf_z17, DCW_REQ),
+ CPUMF_EVENT_PTR(cf_z17, DCW_REQ_IV),
+ CPUMF_EVENT_PTR(cf_z17, DCW_REQ_CHIP_HIT),
+ CPUMF_EVENT_PTR(cf_z17, DCW_REQ_DRAWER_HIT),
+ CPUMF_EVENT_PTR(cf_z17, DCW_ON_CHIP),
+ CPUMF_EVENT_PTR(cf_z17, DCW_ON_CHIP_IV),
+ CPUMF_EVENT_PTR(cf_z17, DCW_ON_CHIP_CHIP_HIT),
+ CPUMF_EVENT_PTR(cf_z17, DCW_ON_CHIP_DRAWER_HIT),
+ CPUMF_EVENT_PTR(cf_z17, DCW_ON_MODULE),
+ CPUMF_EVENT_PTR(cf_z17, DCW_ON_DRAWER),
+ CPUMF_EVENT_PTR(cf_z17, DCW_OFF_DRAWER),
+ CPUMF_EVENT_PTR(cf_z17, DCW_ON_CHIP_MEMORY),
+ CPUMF_EVENT_PTR(cf_z17, DCW_ON_MODULE_MEMORY),
+ CPUMF_EVENT_PTR(cf_z17, DCW_ON_DRAWER_MEMORY),
+ CPUMF_EVENT_PTR(cf_z17, DCW_OFF_DRAWER_MEMORY),
+ CPUMF_EVENT_PTR(cf_z17, IDCW_ON_MODULE_IV),
+ CPUMF_EVENT_PTR(cf_z17, IDCW_ON_MODULE_CHIP_HIT),
+ CPUMF_EVENT_PTR(cf_z17, IDCW_ON_MODULE_DRAWER_HIT),
+ CPUMF_EVENT_PTR(cf_z17, IDCW_ON_DRAWER_IV),
+ CPUMF_EVENT_PTR(cf_z17, IDCW_ON_DRAWER_CHIP_HIT),
+ CPUMF_EVENT_PTR(cf_z17, IDCW_ON_DRAWER_DRAWER_HIT),
+ CPUMF_EVENT_PTR(cf_z17, IDCW_OFF_DRAWER_IV),
+ CPUMF_EVENT_PTR(cf_z17, IDCW_OFF_DRAWER_CHIP_HIT),
+ CPUMF_EVENT_PTR(cf_z17, IDCW_OFF_DRAWER_DRAWER_HIT),
+ CPUMF_EVENT_PTR(cf_z17, ICW_REQ),
+ CPUMF_EVENT_PTR(cf_z17, ICW_REQ_IV),
+ CPUMF_EVENT_PTR(cf_z17, ICW_REQ_CHIP_HIT),
+ CPUMF_EVENT_PTR(cf_z17, ICW_REQ_DRAWER_HIT),
+ CPUMF_EVENT_PTR(cf_z17, ICW_ON_CHIP),
+ CPUMF_EVENT_PTR(cf_z17, ICW_ON_CHIP_IV),
+ CPUMF_EVENT_PTR(cf_z17, ICW_ON_CHIP_CHIP_HIT),
+ CPUMF_EVENT_PTR(cf_z17, ICW_ON_CHIP_DRAWER_HIT),
+ CPUMF_EVENT_PTR(cf_z17, ICW_ON_MODULE),
+ CPUMF_EVENT_PTR(cf_z17, ICW_ON_DRAWER),
+ CPUMF_EVENT_PTR(cf_z17, ICW_OFF_DRAWER),
+ CPUMF_EVENT_PTR(cf_z17, CYCLES_SAMETHRD),
+ CPUMF_EVENT_PTR(cf_z17, CYCLES_DIFFTHRD),
+ CPUMF_EVENT_PTR(cf_z17, INST_SAMETHRD),
+ CPUMF_EVENT_PTR(cf_z17, INST_DIFFTHRD),
+ CPUMF_EVENT_PTR(cf_z17, WRONG_BRANCH_PREDICTION),
+ CPUMF_EVENT_PTR(cf_z17, VX_BCD_EXECUTION_SLOTS),
+ CPUMF_EVENT_PTR(cf_z17, DECIMAL_INSTRUCTIONS),
+ CPUMF_EVENT_PTR(cf_z17, LAST_HOST_TRANSLATIONS),
+ CPUMF_EVENT_PTR(cf_z17, TX_NC_TABORT),
+ CPUMF_EVENT_PTR(cf_z17, TX_C_TABORT_NO_SPECIAL),
+ CPUMF_EVENT_PTR(cf_z17, TX_C_TABORT_SPECIAL),
+ CPUMF_EVENT_PTR(cf_z17, DFLT_ACCESS),
+ CPUMF_EVENT_PTR(cf_z17, DFLT_CYCLES),
+ CPUMF_EVENT_PTR(cf_z17, SORTL),
+ CPUMF_EVENT_PTR(cf_z17, DFLT_CC),
+ CPUMF_EVENT_PTR(cf_z17, DFLT_CCFINISH),
+ CPUMF_EVENT_PTR(cf_z17, NNPA_INVOCATIONS),
+ CPUMF_EVENT_PTR(cf_z17, NNPA_COMPLETIONS),
+ CPUMF_EVENT_PTR(cf_z17, NNPA_WAIT_LOCK),
+ CPUMF_EVENT_PTR(cf_z17, NNPA_HOLD_LOCK),
+ CPUMF_EVENT_PTR(cf_z17, NNPA_INST_ONCHIP),
+ CPUMF_EVENT_PTR(cf_z17, NNPA_INST_OFFCHIP),
+ CPUMF_EVENT_PTR(cf_z17, NNPA_INST_DIFF),
+ CPUMF_EVENT_PTR(cf_z17, NNPA_4K_PREFETCH),
+ CPUMF_EVENT_PTR(cf_z17, NNPA_COMPL_LOCK),
+ CPUMF_EVENT_PTR(cf_z17, NNPA_RETRY_LOCK),
+ CPUMF_EVENT_PTR(cf_z17, NNPA_RETRY_LOCK_WITH_PLO),
+ CPUMF_EVENT_PTR(cf_z17, MT_DIAG_CYCLES_ONE_THR_ACTIVE),
+ CPUMF_EVENT_PTR(cf_z17, MT_DIAG_CYCLES_TWO_THR_ACTIVE),
+ NULL,
+};
+
/* END: CPUM_CF COUNTER DEFINITIONS ===================================== */
static struct attribute_group cpumcf_pmu_events_group = {
@@ -859,7 +1016,7 @@ __init const struct attribute_group **cpumf_cf_event_group(void)
if (ci.csvn >= 1 && ci.csvn <= 5)
csvn = cpumcf_svn_12345_pmu_event_attr;
else if (ci.csvn >= 6)
- csvn = cpumcf_svn_67_pmu_event_attr;
+ csvn = cpumcf_svn_678_pmu_event_attr;
/* Determine model-specific counter set(s) */
get_cpu_id(&cpu_id);
@@ -892,6 +1049,10 @@ __init const struct attribute_group **cpumf_cf_event_group(void)
case 0x3932:
model = cpumcf_z16_pmu_event_attr;
break;
+ case 0x9175:
+ case 0x9176:
+ model = cpumcf_z17_pmu_event_attr;
+ break;
default:
model = none;
break;
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 5f60248..ad22799 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -885,9 +885,6 @@ static int cpumsf_pmu_event_init(struct perf_event *event)
event->attr.exclude_idle = 0;
err = __hw_perf_event_init(event);
- if (unlikely(err))
- if (event->destroy)
- event->destroy(event);
return err;
}
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 54e2814..80b1f7a 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -294,6 +294,10 @@ static int __init setup_elf_platform(void)
case 0x3932:
strcpy(elf_platform, "z16");
break;
+ case 0x9175:
+ case 0x9176:
+ strcpy(elf_platform, "z17");
+ break;
}
return 0;
}
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 610dd44..a06a000 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -95,7 +95,7 @@ static int handle_validity(struct kvm_vcpu *vcpu)
vcpu->stat.exit_validity++;
trace_kvm_s390_intercept_validity(vcpu, viwhy);
- KVM_EVENT(3, "validity intercept 0x%x for pid %u (kvm 0x%pK)", viwhy,
+ KVM_EVENT(3, "validity intercept 0x%x for pid %u (kvm 0x%p)", viwhy,
current->pid, vcpu->kvm);
/* do not warn on invalid runtime instrumentation mode */
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 2811a6c..60c360c 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -3161,7 +3161,7 @@ void kvm_s390_gisa_clear(struct kvm *kvm)
if (!gi->origin)
return;
gisa_clear_ipm(gi->origin);
- VM_EVENT(kvm, 3, "gisa 0x%pK cleared", gi->origin);
+ VM_EVENT(kvm, 3, "gisa 0x%p cleared", gi->origin);
}
void kvm_s390_gisa_init(struct kvm *kvm)
@@ -3177,7 +3177,7 @@ void kvm_s390_gisa_init(struct kvm *kvm)
hrtimer_setup(&gi->timer, gisa_vcpu_kicker, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
memset(gi->origin, 0, sizeof(struct kvm_s390_gisa));
gi->origin->next_alert = (u32)virt_to_phys(gi->origin);
- VM_EVENT(kvm, 3, "gisa 0x%pK initialized", gi->origin);
+ VM_EVENT(kvm, 3, "gisa 0x%p initialized", gi->origin);
}
void kvm_s390_gisa_enable(struct kvm *kvm)
@@ -3218,7 +3218,7 @@ void kvm_s390_gisa_destroy(struct kvm *kvm)
process_gib_alert_list();
hrtimer_cancel(&gi->timer);
gi->origin = NULL;
- VM_EVENT(kvm, 3, "gisa 0x%pK destroyed", gisa);
+ VM_EVENT(kvm, 3, "gisa 0x%p destroyed", gisa);
}
void kvm_s390_gisa_disable(struct kvm *kvm)
@@ -3467,7 +3467,7 @@ int __init kvm_s390_gib_init(u8 nisc)
}
}
- KVM_EVENT(3, "gib 0x%pK (nisc=%d) initialized", gib, gib->nisc);
+ KVM_EVENT(3, "gib 0x%p (nisc=%d) initialized", gib, gib->nisc);
goto out;
out_unreg_gal:
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index fff8637..3f31751 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1022,7 +1022,7 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
}
mutex_unlock(&kvm->lock);
VM_EVENT(kvm, 3, "SET: max guest address: %lu", new_limit);
- VM_EVENT(kvm, 3, "New guest asce: 0x%pK",
+ VM_EVENT(kvm, 3, "New guest asce: 0x%p",
(void *) kvm->arch.gmap->asce);
break;
}
@@ -3466,7 +3466,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm_s390_gisa_init(kvm);
INIT_LIST_HEAD(&kvm->arch.pv.need_cleanup);
kvm->arch.pv.set_aside = NULL;
- KVM_EVENT(3, "vm 0x%pK created by pid %u", kvm, current->pid);
+ KVM_EVENT(3, "vm 0x%p created by pid %u", kvm, current->pid);
return 0;
out_err:
@@ -3529,7 +3529,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_s390_destroy_adapters(kvm);
kvm_s390_clear_float_irqs(kvm);
kvm_s390_vsie_destroy(kvm);
- KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
+ KVM_EVENT(3, "vm 0x%p destroyed", kvm);
}
/* Section: vcpu related */
@@ -3650,7 +3650,7 @@ static int sca_switch_to_extended(struct kvm *kvm)
free_page((unsigned long)old_sca);
- VM_EVENT(kvm, 2, "Switched to ESCA (0x%pK -> 0x%pK)",
+ VM_EVENT(kvm, 2, "Switched to ESCA (0x%p -> 0x%p)",
old_sca, kvm->arch.sca);
return 0;
}
@@ -4027,7 +4027,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
goto out_free_sie_block;
}
- VM_EVENT(vcpu->kvm, 3, "create cpu %d at 0x%pK, sie block at 0x%pK",
+ VM_EVENT(vcpu->kvm, 3, "create cpu %d at 0x%p, sie block at 0x%p",
vcpu->vcpu_id, vcpu, vcpu->arch.sie_block);
trace_kvm_s390_create_vcpu(vcpu->vcpu_id, vcpu, vcpu->arch.sie_block);
diff --git a/arch/s390/kvm/trace-s390.h b/arch/s390/kvm/trace-s390.h
index 9ac92db..9e28f16 100644
--- a/arch/s390/kvm/trace-s390.h
+++ b/arch/s390/kvm/trace-s390.h
@@ -56,7 +56,7 @@ TRACE_EVENT(kvm_s390_create_vcpu,
__entry->sie_block = sie_block;
),
- TP_printk("create cpu %d at 0x%pK, sie block at 0x%pK",
+ TP_printk("create cpu %d at 0x%p, sie block at 0x%p",
__entry->id, __entry->vcpu, __entry->sie_block)
);
@@ -255,7 +255,7 @@ TRACE_EVENT(kvm_s390_enable_css,
__entry->kvm = kvm;
),
- TP_printk("enabling channel I/O support (kvm @ %pK)\n",
+ TP_printk("enabling channel I/O support (kvm @ %p)\n",
__entry->kvm)
);
diff --git a/arch/s390/tools/gen_facilities.c b/arch/s390/tools/gen_facilities.c
index 855f818..d5c68ad 100644
--- a/arch/s390/tools/gen_facilities.c
+++ b/arch/s390/tools/gen_facilities.c
@@ -54,6 +54,9 @@ static struct facility_def facility_defs[] = {
#ifdef CONFIG_HAVE_MARCH_Z15_FEATURES
61, /* miscellaneous-instruction-extension 3 */
#endif
+#ifdef CONFIG_HAVE_MARCH_Z17_FEATURES
+ 84, /* miscellaneous-instruction-extension 4 */
+#endif
-1 /* END */
}
},
diff --git a/arch/sh/configs/ap325rxa_defconfig b/arch/sh/configs/ap325rxa_defconfig
index 4464a2a..b6f36c9 100644
--- a/arch/sh/configs/ap325rxa_defconfig
+++ b/arch/sh/configs/ap325rxa_defconfig
@@ -99,4 +99,3 @@
CONFIG_CRYPTO=y
CONFIG_CRYPTO_CBC=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/ecovec24_defconfig b/arch/sh/configs/ecovec24_defconfig
index ee1b366..e76694a 100644
--- a/arch/sh/configs/ecovec24_defconfig
+++ b/arch/sh/configs/ecovec24_defconfig
@@ -128,4 +128,3 @@
CONFIG_CRYPTO=y
CONFIG_CRYPTO_CBC=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/edosk7705_defconfig b/arch/sh/configs/edosk7705_defconfig
index 296ed76..ee3f6db 100644
--- a/arch/sh/configs/edosk7705_defconfig
+++ b/arch/sh/configs/edosk7705_defconfig
@@ -33,4 +33,3 @@
# CONFIG_PROC_FS is not set
# CONFIG_SYSFS is not set
# CONFIG_ENABLE_MUST_CHECK is not set
-# CONFIG_CRC32 is not set
diff --git a/arch/sh/configs/espt_defconfig b/arch/sh/configs/espt_defconfig
index 67716a4..da176f1 100644
--- a/arch/sh/configs/espt_defconfig
+++ b/arch/sh/configs/espt_defconfig
@@ -110,4 +110,3 @@
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_DEBUG_FS=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/hp6xx_defconfig b/arch/sh/configs/hp6xx_defconfig
index 77e3185..3582af1 100644
--- a/arch/sh/configs/hp6xx_defconfig
+++ b/arch/sh/configs/hp6xx_defconfig
@@ -56,5 +56,3 @@
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC16=y
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/kfr2r09-romimage_defconfig b/arch/sh/configs/kfr2r09-romimage_defconfig
index 42bf341..88fbb65 100644
--- a/arch/sh/configs/kfr2r09-romimage_defconfig
+++ b/arch/sh/configs/kfr2r09-romimage_defconfig
@@ -49,4 +49,3 @@
# CONFIG_NETWORK_FILESYSTEMS is not set
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_DEBUG_FS=y
-# CONFIG_CRC32 is not set
diff --git a/arch/sh/configs/landisk_defconfig b/arch/sh/configs/landisk_defconfig
index d871623..924bb32 100644
--- a/arch/sh/configs/landisk_defconfig
+++ b/arch/sh/configs/landisk_defconfig
@@ -111,4 +111,3 @@
CONFIG_NLS_CODEPAGE_932=y
CONFIG_SH_STANDARD_BIOS=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/lboxre2_defconfig b/arch/sh/configs/lboxre2_defconfig
index 6a23476..0307bb2 100644
--- a/arch/sh/configs/lboxre2_defconfig
+++ b/arch/sh/configs/lboxre2_defconfig
@@ -58,4 +58,3 @@
CONFIG_NLS_CODEPAGE_437=y
CONFIG_SH_STANDARD_BIOS=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/magicpanelr2_defconfig b/arch/sh/configs/magicpanelr2_defconfig
index 8d44374..93b9aa3 100644
--- a/arch/sh/configs/magicpanelr2_defconfig
+++ b/arch/sh/configs/magicpanelr2_defconfig
@@ -86,5 +86,3 @@
CONFIG_DEBUG_KOBJECT=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_FRAME_POINTER=y
-CONFIG_CRC_CCITT=m
-CONFIG_CRC16=m
diff --git a/arch/sh/configs/migor_defconfig b/arch/sh/configs/migor_defconfig
index 2d1e65c..fc2010c 100644
--- a/arch/sh/configs/migor_defconfig
+++ b/arch/sh/configs/migor_defconfig
@@ -90,4 +90,3 @@
CONFIG_CRYPTO_MANAGER=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/r7780mp_defconfig b/arch/sh/configs/r7780mp_defconfig
index 6bd6c0a..f28b8c4 100644
--- a/arch/sh/configs/r7780mp_defconfig
+++ b/arch/sh/configs/r7780mp_defconfig
@@ -105,4 +105,3 @@
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/r7785rp_defconfig b/arch/sh/configs/r7785rp_defconfig
index cde6685..3a4239f2 100644
--- a/arch/sh/configs/r7785rp_defconfig
+++ b/arch/sh/configs/r7785rp_defconfig
@@ -103,4 +103,3 @@
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/rts7751r2d1_defconfig b/arch/sh/configs/rts7751r2d1_defconfig
index c863a11..69568cc 100644
--- a/arch/sh/configs/rts7751r2d1_defconfig
+++ b/arch/sh/configs/rts7751r2d1_defconfig
@@ -87,4 +87,3 @@
CONFIG_NLS_CODEPAGE_932=y
CONFIG_DEBUG_FS=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/rts7751r2dplus_defconfig b/arch/sh/configs/rts7751r2dplus_defconfig
index 7e4f710..ecb4bdb 100644
--- a/arch/sh/configs/rts7751r2dplus_defconfig
+++ b/arch/sh/configs/rts7751r2dplus_defconfig
@@ -92,4 +92,3 @@
CONFIG_NLS_CODEPAGE_932=y
CONFIG_DEBUG_FS=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/sdk7780_defconfig b/arch/sh/configs/sdk7780_defconfig
index cd24cf0..9870d16 100644
--- a/arch/sh/configs/sdk7780_defconfig
+++ b/arch/sh/configs/sdk7780_defconfig
@@ -136,4 +136,3 @@
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_DES=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/se7206_defconfig b/arch/sh/configs/se7206_defconfig
index 472fdf3..64f9308 100644
--- a/arch/sh/configs/se7206_defconfig
+++ b/arch/sh/configs/se7206_defconfig
@@ -101,6 +101,3 @@
CONFIG_CRYPTO_LZO=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC_CCITT=y
-CONFIG_CRC16=y
-CONFIG_CRC_ITU_T=y
diff --git a/arch/sh/configs/se7712_defconfig b/arch/sh/configs/se7712_defconfig
index 49a4961..8770a72 100644
--- a/arch/sh/configs/se7712_defconfig
+++ b/arch/sh/configs/se7712_defconfig
@@ -96,4 +96,3 @@
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_PCBC=m
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_CCITT=y
diff --git a/arch/sh/configs/se7721_defconfig b/arch/sh/configs/se7721_defconfig
index de29379..b15c640 100644
--- a/arch/sh/configs/se7721_defconfig
+++ b/arch/sh/configs/se7721_defconfig
@@ -122,4 +122,3 @@
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_FRAME_POINTER=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_CCITT=y
diff --git a/arch/sh/configs/se7724_defconfig b/arch/sh/configs/se7724_defconfig
index 9652127..9501e69 100644
--- a/arch/sh/configs/se7724_defconfig
+++ b/arch/sh/configs/se7724_defconfig
@@ -128,4 +128,3 @@
CONFIG_CRYPTO=y
CONFIG_CRYPTO_CBC=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/sh03_defconfig b/arch/sh/configs/sh03_defconfig
index 48f38ec..4d75c92 100644
--- a/arch/sh/configs/sh03_defconfig
+++ b/arch/sh/configs/sh03_defconfig
@@ -120,6 +120,5 @@
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_DEFLATE=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_CCITT=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_GENERIC=y
diff --git a/arch/sh/configs/sh2007_defconfig b/arch/sh/configs/sh2007_defconfig
index 1b1174a..cc6292b 100644
--- a/arch/sh/configs/sh2007_defconfig
+++ b/arch/sh/configs/sh2007_defconfig
@@ -193,5 +193,3 @@
CONFIG_CRYPTO_LZO=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_HW is not set
-CONFIG_CRC_CCITT=y
-CONFIG_CRC16=y
diff --git a/arch/sh/configs/sh7724_generic_defconfig b/arch/sh/configs/sh7724_generic_defconfig
index 5440bd0..e6298f2 100644
--- a/arch/sh/configs/sh7724_generic_defconfig
+++ b/arch/sh/configs/sh7724_generic_defconfig
@@ -39,4 +39,3 @@
# CONFIG_SYSFS is not set
# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_ENABLE_MUST_CHECK is not set
-# CONFIG_CRC32 is not set
diff --git a/arch/sh/configs/sh7763rdp_defconfig b/arch/sh/configs/sh7763rdp_defconfig
index 57923c3..b77b331 100644
--- a/arch/sh/configs/sh7763rdp_defconfig
+++ b/arch/sh/configs/sh7763rdp_defconfig
@@ -112,4 +112,3 @@
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_DEBUG_FS=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC_T10DIF=y
diff --git a/arch/sh/configs/sh7770_generic_defconfig b/arch/sh/configs/sh7770_generic_defconfig
index 4338af8..2e2b4698 100644
--- a/arch/sh/configs/sh7770_generic_defconfig
+++ b/arch/sh/configs/sh7770_generic_defconfig
@@ -41,4 +41,3 @@
# CONFIG_SYSFS is not set
# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_ENABLE_MUST_CHECK is not set
-# CONFIG_CRC32 is not set
diff --git a/arch/sh/configs/titan_defconfig b/arch/sh/configs/titan_defconfig
index 8e85f20..f022ada 100644
--- a/arch/sh/configs/titan_defconfig
+++ b/arch/sh/configs/titan_defconfig
@@ -264,4 +264,3 @@
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC16=m
diff --git a/arch/sparc/configs/sparc64_defconfig b/arch/sparc/configs/sparc64_defconfig
index 01b2bdf..f1ba0fe 100644
--- a/arch/sparc/configs/sparc64_defconfig
+++ b/arch/sparc/configs/sparc64_defconfig
@@ -229,7 +229,6 @@
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
# CONFIG_CRYPTO_ANSI_CPRNG is not set
-CONFIG_CRC16=m
CONFIG_VCC=m
CONFIG_PATA_CMD64X=y
CONFIG_IP_PNP=y
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index d3caa31..175958b 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -17,19 +17,20 @@
.pushsection .noinstr.text, "ax"
-SYM_FUNC_START(entry_ibpb)
+/* Clobbers AX, CX, DX */
+SYM_FUNC_START(write_ibpb)
ANNOTATE_NOENDBR
movl $MSR_IA32_PRED_CMD, %ecx
- movl $PRED_CMD_IBPB, %eax
+ movl _ASM_RIP(x86_pred_cmd), %eax
xorl %edx, %edx
wrmsr
/* Make sure IBPB clears return stack predictions too. */
FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_BUG_IBPB_NO_RET
RET
-SYM_FUNC_END(entry_ibpb)
+SYM_FUNC_END(write_ibpb)
/* For KVM */
-EXPORT_SYMBOL_GPL(entry_ibpb);
+EXPORT_SYMBOL_GPL(write_ibpb);
.popsection
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a884ab5..3bdae45 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1472,8 +1472,13 @@ struct kvm_arch {
struct once nx_once;
#ifdef CONFIG_X86_64
- /* The number of TDP MMU pages across all roots. */
+#ifdef CONFIG_KVM_PROVE_MMU
+ /*
+ * The number of TDP MMU pages across all roots. Used only to sanity
+ * check that KVM isn't leaking TDP MMU pages.
+ */
atomic64_t tdp_mmu_pages;
+#endif
/*
* List of struct kvm_mmu_pages being used as roots.
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 8a5cc8e..5c43f14 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -269,7 +269,7 @@
* typically has NO_MELTDOWN).
*
* While retbleed_untrain_ret() doesn't clobber anything but requires stack,
- * entry_ibpb() will clobber AX, CX, DX.
+ * write_ibpb() will clobber AX, CX, DX.
*
* As such, this must be placed after every *SWITCH_TO_KERNEL_CR3 at a point
* where we have a stack but before any RET instruction.
@@ -279,7 +279,7 @@
VALIDATE_UNRET_END
CALL_UNTRAIN_RET
ALTERNATIVE_2 "", \
- "call entry_ibpb", \ibpb_feature, \
+ "call write_ibpb", \ibpb_feature, \
__stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH
#endif
.endm
@@ -368,7 +368,7 @@ extern void srso_return_thunk(void);
extern void srso_alias_return_thunk(void);
extern void entry_untrain_ret(void);
-extern void entry_ibpb(void);
+extern void write_ibpb(void);
#ifdef CONFIG_X86_64
extern void clear_bhb_loop(void);
@@ -514,11 +514,11 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
: "memory");
}
-extern u64 x86_pred_cmd;
-
static inline void indirect_branch_prediction_barrier(void)
{
- alternative_msr_write(MSR_IA32_PRED_CMD, x86_pred_cmd, X86_FEATURE_IBPB);
+ asm_inline volatile(ALTERNATIVE("", "call write_ibpb", X86_FEATURE_IBPB)
+ : ASM_CALL_CONSTRAINT
+ :: "rax", "rcx", "rdx", "memory");
}
/* The Intel SPEC CTRL MSR base value cache */
diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 55a5e65..4f84d42 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -16,23 +16,23 @@
#ifdef __ASSEMBLER__
#define ASM_CLAC \
- ALTERNATIVE __stringify(ANNOTATE_IGNORE_ALTERNATIVE), "clac", X86_FEATURE_SMAP
+ ALTERNATIVE "", "clac", X86_FEATURE_SMAP
#define ASM_STAC \
- ALTERNATIVE __stringify(ANNOTATE_IGNORE_ALTERNATIVE), "stac", X86_FEATURE_SMAP
+ ALTERNATIVE "", "stac", X86_FEATURE_SMAP
#else /* __ASSEMBLER__ */
static __always_inline void clac(void)
{
/* Note: a barrier is implicit in alternative() */
- alternative(ANNOTATE_IGNORE_ALTERNATIVE "", "clac", X86_FEATURE_SMAP);
+ alternative("", "clac", X86_FEATURE_SMAP);
}
static __always_inline void stac(void)
{
/* Note: a barrier is implicit in alternative() */
- alternative(ANNOTATE_IGNORE_ALTERNATIVE "", "stac", X86_FEATURE_SMAP);
+ alternative("", "stac", X86_FEATURE_SMAP);
}
static __always_inline unsigned long smap_save(void)
@@ -59,9 +59,9 @@ static __always_inline void smap_restore(unsigned long flags)
/* These macros can be used in asm() statements */
#define ASM_CLAC \
- ALTERNATIVE(ANNOTATE_IGNORE_ALTERNATIVE "", "clac", X86_FEATURE_SMAP)
+ ALTERNATIVE("", "clac", X86_FEATURE_SMAP)
#define ASM_STAC \
- ALTERNATIVE(ANNOTATE_IGNORE_ALTERNATIVE "", "stac", X86_FEATURE_SMAP)
+ ALTERNATIVE("", "stac", X86_FEATURE_SMAP)
#define ASM_CLAC_UNSAFE \
ALTERNATIVE("", ANNOTATE_IGNORE_ALTERNATIVE "clac", X86_FEATURE_SMAP)
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index dae6a73..9fa321a 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -23,6 +23,8 @@
#include <linux/serial_core.h>
#include <linux/pgtable.h>
+#include <xen/xen.h>
+
#include <asm/e820/api.h>
#include <asm/irqdomain.h>
#include <asm/pci_x86.h>
@@ -1729,6 +1731,15 @@ int __init acpi_mps_check(void)
{
#if defined(CONFIG_X86_LOCAL_APIC) && !defined(CONFIG_X86_MPPARSE)
/* mptable code is not built-in*/
+
+ /*
+ * Xen disables ACPI in PV DomU guests but it still emulates APIC and
+ * supports SMP. Returning early here ensures that APIC is not disabled
+ * unnecessarily and the guest is not limited to a single vCPU.
+ */
+ if (xen_pv_domain() && !xen_initial_domain())
+ return 0;
+
if (acpi_disabled || acpi_noirq) {
pr_warn("MPS support code is not built-in, using acpi=off or acpi=noirq or pci=noacpi may have problem\n");
return 1;
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 79569f7..a839ff50 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -805,6 +805,7 @@ static void init_amd_bd(struct cpuinfo_x86 *c)
static const struct x86_cpu_id erratum_1386_microcode[] = {
X86_MATCH_VFM_STEPS(VFM_MAKE(X86_VENDOR_AMD, 0x17, 0x01), 0x2, 0x2, 0x0800126e),
X86_MATCH_VFM_STEPS(VFM_MAKE(X86_VENDOR_AMD, 0x17, 0x31), 0x0, 0x0, 0x08301052),
+ {}
};
static void fix_erratum_1386(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 4386aa6..362602b 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -59,7 +59,6 @@ DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
-EXPORT_SYMBOL_GPL(x86_pred_cmd);
static u64 __ro_after_init x86_arch_cap_msr;
@@ -1142,7 +1141,7 @@ static void __init retbleed_select_mitigation(void)
setup_clear_cpu_cap(X86_FEATURE_RETHUNK);
/*
- * There is no need for RSB filling: entry_ibpb() ensures
+ * There is no need for RSB filling: write_ibpb() ensures
* all predictions, including the RSB, are invalidated,
* regardless of IBPB implementation.
*/
@@ -1592,51 +1591,54 @@ static void __init spec_ctrl_disable_kernel_rrsba(void)
rrsba_disabled = true;
}
-static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode)
+static void __init spectre_v2_select_rsb_mitigation(enum spectre_v2_mitigation mode)
{
/*
- * Similar to context switches, there are two types of RSB attacks
- * after VM exit:
+ * WARNING! There are many subtleties to consider when changing *any*
+ * code related to RSB-related mitigations. Before doing so, carefully
+ * read the following document, and update if necessary:
*
- * 1) RSB underflow
+ * Documentation/admin-guide/hw-vuln/rsb.rst
*
- * 2) Poisoned RSB entry
+ * In an overly simplified nutshell:
*
- * When retpoline is enabled, both are mitigated by filling/clearing
- * the RSB.
+ * - User->user RSB attacks are conditionally mitigated during
+ * context switches by cond_mitigation -> write_ibpb().
*
- * When IBRS is enabled, while #1 would be mitigated by the IBRS branch
- * prediction isolation protections, RSB still needs to be cleared
- * because of #2. Note that SMEP provides no protection here, unlike
- * user-space-poisoned RSB entries.
+ * - User->kernel and guest->host attacks are mitigated by eIBRS or
+ * RSB filling.
*
- * eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB
- * bug is present then a LITE version of RSB protection is required,
- * just a single call needs to retire before a RET is executed.
+ * Though, depending on config, note that other alternative
+ * mitigations may end up getting used instead, e.g., IBPB on
+ * entry/vmexit, call depth tracking, or return thunks.
*/
+
switch (mode) {
case SPECTRE_V2_NONE:
- return;
+ break;
- case SPECTRE_V2_EIBRS_LFENCE:
case SPECTRE_V2_EIBRS:
- if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
- setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
- pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
- }
- return;
-
+ case SPECTRE_V2_EIBRS_LFENCE:
case SPECTRE_V2_EIBRS_RETPOLINE:
+ if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
+ pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
+ setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
+ }
+ break;
+
case SPECTRE_V2_RETPOLINE:
case SPECTRE_V2_LFENCE:
case SPECTRE_V2_IBRS:
+ pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n");
+ setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
- pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
- return;
- }
+ break;
- pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation at VM exit");
- dump_stack();
+ default:
+ pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation\n");
+ dump_stack();
+ break;
+ }
}
/*
@@ -1830,48 +1832,7 @@ static void __init spectre_v2_select_mitigation(void)
spectre_v2_enabled = mode;
pr_info("%s\n", spectre_v2_strings[mode]);
- /*
- * If Spectre v2 protection has been enabled, fill the RSB during a
- * context switch. In general there are two types of RSB attacks
- * across context switches, for which the CALLs/RETs may be unbalanced.
- *
- * 1) RSB underflow
- *
- * Some Intel parts have "bottomless RSB". When the RSB is empty,
- * speculated return targets may come from the branch predictor,
- * which could have a user-poisoned BTB or BHB entry.
- *
- * AMD has it even worse: *all* returns are speculated from the BTB,
- * regardless of the state of the RSB.
- *
- * When IBRS or eIBRS is enabled, the "user -> kernel" attack
- * scenario is mitigated by the IBRS branch prediction isolation
- * properties, so the RSB buffer filling wouldn't be necessary to
- * protect against this type of attack.
- *
- * The "user -> user" attack scenario is mitigated by RSB filling.
- *
- * 2) Poisoned RSB entry
- *
- * If the 'next' in-kernel return stack is shorter than 'prev',
- * 'next' could be tricked into speculating with a user-poisoned RSB
- * entry.
- *
- * The "user -> kernel" attack scenario is mitigated by SMEP and
- * eIBRS.
- *
- * The "user -> user" scenario, also known as SpectreBHB, requires
- * RSB clearing.
- *
- * So to mitigate all cases, unconditionally fill RSB on context
- * switches.
- *
- * FIXME: Is this pointless for retbleed-affected AMD?
- */
- setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
- pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
-
- spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
+ spectre_v2_select_rsb_mitigation(mode);
/*
* Retpoline protects the kernel, but doesn't protect firmware. IBRS
@@ -2676,7 +2637,7 @@ static void __init srso_select_mitigation(void)
setup_clear_cpu_cap(X86_FEATURE_RETHUNK);
/*
- * There is no need for RSB filling: entry_ibpb() ensures
+ * There is no need for RSB filling: write_ibpb() ensures
* all predictions, including the RSB, are invalidated,
* regardless of IBPB implementation.
*/
@@ -2701,7 +2662,7 @@ static void __init srso_select_mitigation(void)
srso_mitigation = SRSO_MITIGATION_IBPB_ON_VMEXIT;
/*
- * There is no need for RSB filling: entry_ibpb() ensures
+ * There is no need for RSB filling: write_ibpb() ensures
* all predictions, including the RSB, are invalidated,
* regardless of IBPB implementation.
*/
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 93ec829..cc4a541 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3553,6 +3553,22 @@ static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
free_rmid(rgrp->closid, rgrp->mon.rmid);
}
+/*
+ * We allow creating mon groups only within a directory called "mon_groups"
+ * which is present in every ctrl_mon group. Check if this is a valid
+ * "mon_groups" directory.
+ *
+ * 1. The directory should be named "mon_groups".
+ * 2. The mon group itself should "not" be named "mon_groups".
+ * This makes sure "mon_groups" directory always has a ctrl_mon group
+ * as parent.
+ */
+static bool is_mon_groups(struct kernfs_node *kn, const char *name)
+{
+ return (!strcmp(rdt_kn_name(kn), "mon_groups") &&
+ strcmp(name, "mon_groups"));
+}
+
static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
const char *name, umode_t mode,
enum rdt_group_type rtype, struct rdtgroup **r)
@@ -3568,6 +3584,15 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
goto out_unlock;
}
+ /*
+ * Check that the parent directory for a monitor group is a "mon_groups"
+ * directory.
+ */
+ if (rtype == RDTMON_GROUP && !is_mon_groups(parent_kn, name)) {
+ ret = -EPERM;
+ goto out_unlock;
+ }
+
if (rtype == RDTMON_GROUP &&
(prdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
prdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)) {
@@ -3751,22 +3776,6 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
return ret;
}
-/*
- * We allow creating mon groups only with in a directory called "mon_groups"
- * which is present in every ctrl_mon group. Check if this is a valid
- * "mon_groups" directory.
- *
- * 1. The directory should be named "mon_groups".
- * 2. The mon group itself should "not" be named "mon_groups".
- * This makes sure "mon_groups" directory always has a ctrl_mon group
- * as parent.
- */
-static bool is_mon_groups(struct kernfs_node *kn, const char *name)
-{
- return (!strcmp(rdt_kn_name(kn), "mon_groups") &&
- strcmp(name, "mon_groups"));
-}
-
static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
umode_t mode)
{
@@ -3782,11 +3791,8 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
if (resctrl_arch_alloc_capable() && parent_kn == rdtgroup_default.kn)
return rdtgroup_mkdir_ctrl_mon(parent_kn, name, mode);
- /*
- * If RDT monitoring is supported and the parent directory is a valid
- * "mon_groups" directory, add a monitoring subdirectory.
- */
- if (resctrl_arch_mon_capable() && is_mon_groups(parent_kn, name))
+ /* Else, attempt to add a monitoring subdirectory. */
+ if (resctrl_arch_mon_capable())
return rdtgroup_mkdir_mon(parent_kn, name, mode);
return -EPERM;
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 57120f0..9d8dd8d 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -753,22 +753,21 @@ void __init e820__memory_setup_extended(u64 phys_addr, u32 data_len)
void __init e820__register_nosave_regions(unsigned long limit_pfn)
{
int i;
- unsigned long pfn = 0;
+ u64 last_addr = 0;
for (i = 0; i < e820_table->nr_entries; i++) {
struct e820_entry *entry = &e820_table->entries[i];
- if (pfn < PFN_UP(entry->addr))
- register_nosave_region(pfn, PFN_UP(entry->addr));
-
- pfn = PFN_DOWN(entry->addr + entry->size);
-
if (entry->type != E820_TYPE_RAM)
- register_nosave_region(PFN_UP(entry->addr), pfn);
+ continue;
- if (pfn >= limit_pfn)
- break;
+ if (last_addr < entry->addr)
+ register_nosave_region(PFN_DOWN(last_addr), PFN_UP(entry->addr));
+
+ last_addr = entry->addr + entry->size;
}
+
+ register_nosave_region(PFN_DOWN(last_addr), limit_pfn);
}
#ifdef CONFIG_ACPI
diff --git a/arch/x86/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
index 611f27e..3aad78b 100644
--- a/arch/x86/kernel/early_printk.c
+++ b/arch/x86/kernel/early_printk.c
@@ -389,10 +389,10 @@ static int __init setup_early_printk(char *buf)
keep = (strstr(buf, "keep") != NULL);
while (*buf != '\0') {
- if (!strncmp(buf, "mmio", 4)) {
- early_mmio_serial_init(buf + 4);
+ if (!strncmp(buf, "mmio32", 6)) {
+ buf += 6;
+ early_mmio_serial_init(buf);
early_console_register(&early_serial_console, keep);
- buf += 4;
}
if (!strncmp(buf, "serial", 6)) {
buf += 6;
@@ -407,9 +407,9 @@ static int __init setup_early_printk(char *buf)
}
#ifdef CONFIG_PCI
if (!strncmp(buf, "pciserial", 9)) {
- early_pci_serial_init(buf + 9);
+ buf += 9; /* Keep from matching the above "pciserial" */
+ early_pci_serial_init(buf);
early_console_register(&early_serial_console, keep);
- buf += 9; /* Keep from match the above "serial" */
}
#endif
if (!strncmp(buf, "vga", 3) &&
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5e4d493..571c906 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1427,8 +1427,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
}
break;
case 0xa: { /* Architectural Performance Monitoring */
- union cpuid10_eax eax;
- union cpuid10_edx edx;
+ union cpuid10_eax eax = { };
+ union cpuid10_edx edx = { };
if (!enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
@@ -1444,8 +1444,6 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
if (kvm_pmu_cap.version)
edx.split.anythread_deprecated = 1;
- edx.split.reserved1 = 0;
- edx.split.reserved2 = 0;
entry->eax = eax.full;
entry->ebx = kvm_pmu_cap.events_mask;
@@ -1763,7 +1761,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
break;
/* AMD Extended Performance Monitoring and Debug */
case 0x80000022: {
- union cpuid_0x80000022_ebx ebx;
+ union cpuid_0x80000022_ebx ebx = { };
entry->ecx = entry->edx = 0;
if (!enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2)) {
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7cc0564..21a3b81 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -40,7 +40,9 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
kvm_tdp_mmu_invalidate_roots(kvm, KVM_VALID_ROOTS);
kvm_tdp_mmu_zap_invalidated_roots(kvm, false);
- WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));
+#ifdef CONFIG_KVM_PROVE_MMU
+ KVM_MMU_WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));
+#endif
WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots));
/*
@@ -325,13 +327,17 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
kvm_account_pgtable_pages((void *)sp->spt, +1);
+#ifdef CONFIG_KVM_PROVE_MMU
atomic64_inc(&kvm->arch.tdp_mmu_pages);
+#endif
}
static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
kvm_account_pgtable_pages((void *)sp->spt, -1);
+#ifdef CONFIG_KVM_PROVE_MMU
atomic64_dec(&kvm->arch.tdp_mmu_pages);
+#endif
}
/**
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index ec08fa3..51116fe 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -31,6 +31,8 @@ static DEFINE_PER_CPU(struct list_head, wakeup_vcpus_on_cpu);
*/
static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock);
+#define PI_LOCK_SCHED_OUT SINGLE_DEPTH_NESTING
+
static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
{
return &(to_vmx(vcpu)->pi_desc);
@@ -89,9 +91,20 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
* current pCPU if the task was migrated.
*/
if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
- raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+ raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu);
+
+ /*
+ * In addition to taking the wakeup lock for the regular/IRQ
+ * context, tell lockdep it is being taken for the "sched out"
+ * context as well. vCPU loads happen in task context, and
+ * this is taking the lock of the *previous* CPU, i.e. can race
+ * with both the scheduler and the wakeup handler.
+ */
+ raw_spin_lock(spinlock);
+ spin_acquire(&spinlock->dep_map, PI_LOCK_SCHED_OUT, 0, _RET_IP_);
list_del(&vmx->pi_wakeup_list);
- raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+ spin_release(&spinlock->dep_map, _RET_IP_);
+ raw_spin_unlock(spinlock);
}
dest = cpu_physical_id(cpu);
@@ -148,11 +161,23 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct pi_desc old, new;
- unsigned long flags;
- local_irq_save(flags);
+ lockdep_assert_irqs_disabled();
- raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+ /*
+ * Acquire the wakeup lock using the "sched out" context to work around
+ * a lockdep false positive. When this is called, schedule() holds
+ * various per-CPU scheduler locks. When the wakeup handler runs, it
+ * holds this CPU's wakeup lock while calling try_to_wake_up(), which
+ * can eventually take the aforementioned scheduler locks, which causes
+ * lockdep to assume there is deadlock.
+ *
+ * Deadlock can't actually occur because IRQs are disabled for the
+ * entirety of the sched_out critical section, i.e. the wakeup handler
+ * can't run while the scheduler locks are held.
+ */
+ raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu),
+ PI_LOCK_SCHED_OUT);
list_add_tail(&vmx->pi_wakeup_list,
&per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu));
raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
@@ -176,8 +201,6 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
*/
if (pi_test_on(&new))
__apic_send_IPI_self(POSTED_INTR_WAKEUP_VECTOR);
-
- local_irq_restore(flags);
}
static bool vmx_needs_pi_wakeup(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c841817a..3712dde 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11786,6 +11786,8 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
if (kvm_mpx_supported())
kvm_load_guest_fpu(vcpu);
+ kvm_vcpu_srcu_read_lock(vcpu);
+
r = kvm_apic_accept_events(vcpu);
if (r < 0)
goto out;
@@ -11799,6 +11801,8 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
mp_state->mp_state = vcpu->arch.mp_state;
out:
+ kvm_vcpu_srcu_read_unlock(vcpu);
+
if (kvm_mpx_supported())
kvm_put_guest_fpu(vcpu);
vcpu_put(vcpu);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e459d97..eb83348f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -667,9 +667,9 @@ static void cond_mitigation(struct task_struct *next)
prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec);
/*
- * Avoid user/user BTB poisoning by flushing the branch predictor
- * when switching between processes. This stops one process from
- * doing Spectre-v2 attacks on another.
+ * Avoid user->user BTB/RSB poisoning by flushing them when switching
+ * between processes. This stops one process from doing Spectre-v2
+ * attacks on another.
*
* Both, the conditional and the always IBPB mode use the mm
* pointer to avoid the IBPB when switching between tasks of the
diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index 8c534c3..66f066b 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -26,7 +26,7 @@
/* code below belongs to the image kernel */
.align PAGE_SIZE
SYM_FUNC_START(restore_registers)
- ANNOTATE_NOENDBR
+ ENDBR
/* go back to the original page tables */
movq %r9, %cr3
@@ -120,7 +120,7 @@
/* code below has been relocated to a safe page */
SYM_FUNC_START(core_restore_code)
- ANNOTATE_NOENDBR
+ ENDBR
/* switch to temporary page tables */
movq %rax, %cr3
/* flush TLB */
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 43dcd8c..1b7710b 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -70,6 +70,9 @@ EXPORT_SYMBOL(xen_start_flags);
*/
struct shared_info *HYPERVISOR_shared_info = &xen_dummy_shared_info;
+/* Number of pages released from the initial allocation. */
+unsigned long xen_released_pages;
+
static __ref void xen_get_vendor(void)
{
init_cpu_devs();
@@ -466,6 +469,13 @@ int __init arch_xen_unpopulated_init(struct resource **res)
xen_free_unpopulated_pages(1, &pg);
}
+ /*
+ * Account for the region being in the physmap but unpopulated.
+ * The value in xen_released_pages is used by the balloon
+ * driver to know how much of the physmap is unpopulated and
+ * set an accurate initial memory target.
+ */
+ xen_released_pages += xen_extra_mem[i].n_pfns;
/* Zero so region is not also added to the balloon driver. */
xen_extra_mem[i].n_pfns = 0;
}
diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 0e3d930..9d25d93 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -1,5 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/acpi.h>
+#include <linux/cpufreq.h>
+#include <linux/cpuidle.h>
#include <linux/export.h>
#include <linux/mm.h>
@@ -123,8 +125,23 @@ static void __init pvh_arch_setup(void)
{
pvh_reserve_extra_memory();
- if (xen_initial_domain())
+ if (xen_initial_domain()) {
xen_add_preferred_consoles();
+
+ /*
+ * Disable usage of CPU idle and frequency drivers: when
+ * running as hardware domain the exposed native ACPI tables
+ * causes idle and/or frequency drivers to attach and
+ * malfunction. It's Xen the entity that controls the idle and
+ * frequency states.
+ *
+ * For unprivileged domains the exposed ACPI tables are
+ * fabricated and don't contain such data.
+ */
+ disable_cpuidle();
+ disable_cpufreq();
+ WARN_ON(xen_set_default_idle());
+ }
}
void __init xen_pvh_init(struct boot_params *boot_params)
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index c3db71d..3823e52 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -37,9 +37,6 @@
#define GB(x) ((uint64_t)(x) * 1024 * 1024 * 1024)
-/* Number of pages released from the initial allocation. */
-unsigned long xen_released_pages;
-
/* Memory map would allow PCI passthrough. */
bool xen_pv_pci_possible;
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 109af12..461bb15 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -226,9 +226,7 @@
push %rax
mov $__HYPERVISOR_iret, %eax
syscall /* Do the IRET. */
-#ifdef CONFIG_MITIGATION_SLS
- int3
-#endif
+ ud2 /* The SYSCALL should never return. */
.endm
SYM_CODE_START(xen_iret)
diff --git a/drivers/accel/ivpu/ivpu_debugfs.c b/drivers/accel/ivpu/ivpu_debugfs.c
index 0825851..f0dad0c 100644
--- a/drivers/accel/ivpu/ivpu_debugfs.c
+++ b/drivers/accel/ivpu/ivpu_debugfs.c
@@ -332,7 +332,7 @@ ivpu_force_recovery_fn(struct file *file, const char __user *user_buf, size_t si
return -EINVAL;
ret = ivpu_rpm_get(vdev);
- if (ret)
+ if (ret < 0)
return ret;
ivpu_pm_trigger_recovery(vdev, "debugfs");
@@ -383,7 +383,7 @@ static int dct_active_set(void *data, u64 active_percent)
return -EINVAL;
ret = ivpu_rpm_get(vdev);
- if (ret)
+ if (ret < 0)
return ret;
if (active_percent)
diff --git a/drivers/accel/ivpu/ivpu_ipc.c b/drivers/accel/ivpu/ivpu_ipc.c
index 0e096fd..39f83225c 100644
--- a/drivers/accel/ivpu/ivpu_ipc.c
+++ b/drivers/accel/ivpu/ivpu_ipc.c
@@ -302,7 +302,8 @@ ivpu_ipc_send_receive_internal(struct ivpu_device *vdev, struct vpu_jsm_msg *req
struct ivpu_ipc_consumer cons;
int ret;
- drm_WARN_ON(&vdev->drm, pm_runtime_status_suspended(vdev->drm.dev));
+ drm_WARN_ON(&vdev->drm, pm_runtime_status_suspended(vdev->drm.dev) &&
+ pm_runtime_enabled(vdev->drm.dev));
ivpu_ipc_consumer_add(vdev, &cons, channel, NULL);
diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c
index ffe7b10..2a043ba 100644
--- a/drivers/accel/ivpu/ivpu_ms.c
+++ b/drivers/accel/ivpu/ivpu_ms.c
@@ -4,6 +4,7 @@
*/
#include <drm/drm_file.h>
+#include <linux/pm_runtime.h>
#include "ivpu_drv.h"
#include "ivpu_gem.h"
@@ -44,6 +45,10 @@ int ivpu_ms_start_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
args->sampling_period_ns < MS_MIN_SAMPLE_PERIOD_NS)
return -EINVAL;
+ ret = ivpu_rpm_get(vdev);
+ if (ret < 0)
+ return ret;
+
mutex_lock(&file_priv->ms_lock);
if (get_instance_by_mask(file_priv, args->metric_group_mask)) {
@@ -96,6 +101,8 @@ int ivpu_ms_start_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
kfree(ms);
unlock:
mutex_unlock(&file_priv->ms_lock);
+
+ ivpu_rpm_put(vdev);
return ret;
}
@@ -160,6 +167,10 @@ int ivpu_ms_get_data_ioctl(struct drm_device *dev, void *data, struct drm_file *
if (!args->metric_group_mask)
return -EINVAL;
+ ret = ivpu_rpm_get(vdev);
+ if (ret < 0)
+ return ret;
+
mutex_lock(&file_priv->ms_lock);
ms = get_instance_by_mask(file_priv, args->metric_group_mask);
@@ -187,6 +198,7 @@ int ivpu_ms_get_data_ioctl(struct drm_device *dev, void *data, struct drm_file *
unlock:
mutex_unlock(&file_priv->ms_lock);
+ ivpu_rpm_put(vdev);
return ret;
}
@@ -204,11 +216,17 @@ int ivpu_ms_stop_ioctl(struct drm_device *dev, void *data, struct drm_file *file
{
struct ivpu_file_priv *file_priv = file->driver_priv;
struct drm_ivpu_metric_streamer_stop *args = data;
+ struct ivpu_device *vdev = file_priv->vdev;
struct ivpu_ms_instance *ms;
+ int ret;
if (!args->metric_group_mask)
return -EINVAL;
+ ret = ivpu_rpm_get(vdev);
+ if (ret < 0)
+ return ret;
+
mutex_lock(&file_priv->ms_lock);
ms = get_instance_by_mask(file_priv, args->metric_group_mask);
@@ -217,6 +235,7 @@ int ivpu_ms_stop_ioctl(struct drm_device *dev, void *data, struct drm_file *file
mutex_unlock(&file_priv->ms_lock);
+ ivpu_rpm_put(vdev);
return ms ? 0 : -EINVAL;
}
@@ -281,6 +300,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file *
void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
{
struct ivpu_ms_instance *ms, *tmp;
+ struct ivpu_device *vdev = file_priv->vdev;
+
+ pm_runtime_get_sync(vdev->drm.dev);
mutex_lock(&file_priv->ms_lock);
@@ -293,6 +315,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv)
free_instance(file_priv, ms);
mutex_unlock(&file_priv->ms_lock);
+
+ pm_runtime_put_autosuspend(vdev->drm.dev);
}
void ivpu_ms_cleanup_all(struct ivpu_device *vdev)
diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 90b0984..0a70260 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -458,7 +458,7 @@ static void acpi_button_notify(acpi_handle handle, u32 event, void *data)
acpi_pm_wakeup_event(&device->dev);
button = acpi_driver_data(device);
- if (button->suspended)
+ if (button->suspended || event == ACPI_BUTTON_NOTIFY_WAKE)
return;
input = button->input;
diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 8db09d8..3c5f348 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -2301,6 +2301,34 @@ static const struct dmi_system_id acpi_ec_no_wakeup[] = {
DMI_MATCH(DMI_PRODUCT_FAMILY, "103C_5336AN HP ZHAN 66 Pro"),
},
},
+ /*
+ * Lenovo Legion Go S; touchscreen blocks HW sleep when woken up from EC
+ * https://gitlab.freedesktop.org/drm/amd/-/issues/3929
+ */
+ {
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "83L3"),
+ }
+ },
+ {
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "83N6"),
+ }
+ },
+ {
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "83Q2"),
+ }
+ },
+ {
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"),
+ }
+ },
{ },
};
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index a35dd0e4..f73ce6e 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -229,7 +229,7 @@ static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
node_entry = ACPI_PTR_DIFF(node, table_hdr);
entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
sizeof(struct acpi_table_pptt));
- proc_sz = sizeof(struct acpi_pptt_processor *);
+ proc_sz = sizeof(struct acpi_pptt_processor);
while ((unsigned long)entry + proc_sz < table_end) {
cpu_node = (struct acpi_pptt_processor *)entry;
@@ -270,7 +270,7 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
table_end = (unsigned long)table_hdr + table_hdr->length;
entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
sizeof(struct acpi_table_pptt));
- proc_sz = sizeof(struct acpi_pptt_processor *);
+ proc_sz = sizeof(struct acpi_pptt_processor);
/* find the processor structure associated with this cpuid */
while ((unsigned long)entry + proc_sz < table_end) {
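Note on the two pptt.c hunks above: the change replaces `sizeof(struct acpi_pptt_processor *)` (the size of a pointer) with the size of the structure itself, so the `entry + proc_sz < table_end` bound covers a whole record. The sketch below illustrates that difference only. `struct demo_record`, its fields, and the helper names are hypothetical stand-ins, not the real ACPI layout or kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size record; NOT the real struct acpi_pptt_processor. */
struct demo_record {
	uint8_t  type;
	uint8_t  length;
	uint16_t reserved;
	uint32_t flags;
	uint32_t parent;
	uint32_t id;
	uint32_t num_resources;
};

/* Pre-fix style: measures a pointer (4 or 8 bytes), not the record. */
static size_t record_size_buggy(void)
{
	return sizeof(struct demo_record *);
}

/* Post-fix style: measures the record itself. */
static size_t record_size_fixed(void)
{
	return sizeof(struct demo_record);
}

/*
 * Walk a table of back-to-back records and count how many pass the
 * `offset + rec_sz < table_len` bound, the same shape as the loop
 * condition in the hunks above. The walk always advances by one full
 * record; only the bound being tested varies.
 */
static size_t count_accepted_records(size_t table_len, size_t rec_sz)
{
	size_t off = 0, n = 0;

	while (off + rec_sz < table_len) {
		n++;
		off += sizeof(struct demo_record);
	}
	return n;
}
```

With a 30-byte table and 20-byte records, the fixed bound accepts one record, while the pointer-sized bound also accepts a second record that would extend past the end of the table.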
diff --git a/drivers/ata/pata_pxa.c b/drivers/ata/pata_pxa.c
index 434f380..03dbaf4 100644
--- a/drivers/ata/pata_pxa.c
+++ b/drivers/ata/pata_pxa.c
@@ -223,10 +223,16 @@ static int pxa_ata_probe(struct platform_device *pdev)
ap->ioaddr.cmd_addr = devm_ioremap(&pdev->dev, cmd_res->start,
resource_size(cmd_res));
+ if (!ap->ioaddr.cmd_addr)
+ return -ENOMEM;
ap->ioaddr.ctl_addr = devm_ioremap(&pdev->dev, ctl_res->start,
resource_size(ctl_res));
+ if (!ap->ioaddr.ctl_addr)
+ return -ENOMEM;
ap->ioaddr.bmdma_addr = devm_ioremap(&pdev->dev, dma_res->start,
resource_size(dma_res));
+ if (!ap->ioaddr.bmdma_addr)
+ return -ENOMEM;
/*
* Adjust register offsets
diff --git a/drivers/ata/sata_sx4.c b/drivers/ata/sata_sx4.c
index a482741..c3042ec 100644
--- a/drivers/ata/sata_sx4.c
+++ b/drivers/ata/sata_sx4.c
@@ -1117,9 +1117,14 @@ static int pdc20621_prog_dimm0(struct ata_host *host)
mmio += PDC_CHIP0_OFS;
for (i = 0; i < ARRAY_SIZE(pdc_i2c_read_data); i++)
- pdc20621_i2c_read(host, PDC_DIMM0_SPD_DEV_ADDRESS,
- pdc_i2c_read_data[i].reg,
- &spd0[pdc_i2c_read_data[i].ofs]);
+ if (!pdc20621_i2c_read(host, PDC_DIMM0_SPD_DEV_ADDRESS,
+ pdc_i2c_read_data[i].reg,
+ &spd0[pdc_i2c_read_data[i].ofs])) {
+ dev_err(host->dev,
+ "Failed in i2c read at index %d: device=%#x, reg=%#x\n",
+ i, PDC_DIMM0_SPD_DEV_ADDRESS, pdc_i2c_read_data[i].reg);
+ return -EIO;
+ }
data |= (spd0[4] - 8) | ((spd0[21] != 0) << 3) | ((spd0[3]-11) << 4);
data |= ((spd0[17] / 4) << 6) | ((spd0[5] / 2) << 7) |
@@ -1284,6 +1289,8 @@ static unsigned int pdc20621_dimm_init(struct ata_host *host)
/* Programming DIMM0 Module Control Register (index_CID0:80h) */
size = pdc20621_prog_dimm0(host);
+ if (size < 0)
+ return size;
dev_dbg(host->dev, "Local DIMM Size = %dMB\n", size);
/* Programming DIMM Module Global Control Register (index_CID0:88h) */
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index a97f2c4..2551ebf 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -367,7 +367,7 @@
tristate "Rados block device (RBD)"
depends on INET && BLOCK
select CEPH_LIB
- select LIBCRC32C
+ select CRC32
select CRYPTO_AES
select CRYPTO
help
diff --git a/drivers/block/drbd/Kconfig b/drivers/block/drbd/Kconfig
index 6fb4e38..495a72d 100644
--- a/drivers/block/drbd/Kconfig
+++ b/drivers/block/drbd/Kconfig
@@ -10,7 +10,7 @@
tristate "DRBD Distributed Replicated Block Device support"
depends on PROC_FS && INET
select LRU_CACHE
- select LIBCRC32C
+ select CRC32
help
NOTE: In order to authenticate connections you have to select
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 3bb9cee..aa163ae 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -2031,7 +2031,7 @@ static int null_add_dev(struct nullb_device *dev)
nullb->disk->minors = 1;
nullb->disk->fops = &null_ops;
nullb->disk->private_data = nullb;
- strscpy_pad(nullb->disk->disk_name, nullb->disk_name, DISK_NAME_LEN);
+ strscpy(nullb->disk->disk_name, nullb->disk_name);
if (nullb->dev->zoned) {
rv = null_register_zoned_dev(nullb);
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 2fd05c1..cdb1543f 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1140,6 +1140,25 @@ static void ublk_complete_rq(struct kref *ref)
__ublk_complete_rq(req);
}
+static void ublk_do_fail_rq(struct request *req)
+{
+ struct ublk_queue *ubq = req->mq_hctx->driver_data;
+
+ if (ublk_nosrv_should_reissue_outstanding(ubq->dev))
+ blk_mq_requeue_request(req, false);
+ else
+ __ublk_complete_rq(req);
+}
+
+static void ublk_fail_rq_fn(struct kref *ref)
+{
+ struct ublk_rq_data *data = container_of(ref, struct ublk_rq_data,
+ ref);
+ struct request *req = blk_mq_rq_from_pdu(data);
+
+ ublk_do_fail_rq(req);
+}
+
/*
* Since ublk_rq_task_work_cb always fails requests immediately during
* exiting, __ublk_fail_req() is only called from abort context during
@@ -1153,10 +1172,13 @@ static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
{
WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
- if (ublk_nosrv_should_reissue_outstanding(ubq->dev))
- blk_mq_requeue_request(req, false);
- else
- ublk_put_req_ref(ubq, req);
+ if (ublk_need_req_ref(ubq)) {
+ struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
+
+ kref_put(&data->ref, ublk_fail_rq_fn);
+ } else {
+ ublk_do_fail_rq(req);
+ }
}
static void ubq_complete_io_cmd(struct ublk_io *io, int res,
@@ -1349,7 +1371,8 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
return BLK_EH_RESET_TIMER;
}
-static blk_status_t ublk_prep_req(struct ublk_queue *ubq, struct request *rq)
+static blk_status_t ublk_prep_req(struct ublk_queue *ubq, struct request *rq,
+ bool check_cancel)
{
blk_status_t res;
@@ -1368,7 +1391,7 @@ static blk_status_t ublk_prep_req(struct ublk_queue *ubq, struct request *rq)
if (ublk_nosrv_should_queue_io(ubq) && unlikely(ubq->force_abort))
return BLK_STS_IOERR;
- if (unlikely(ubq->canceling))
+ if (check_cancel && unlikely(ubq->canceling))
return BLK_STS_IOERR;
/* fill iod to slot in io cmd buffer */
@@ -1387,7 +1410,7 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
struct request *rq = bd->rq;
blk_status_t res;
- res = ublk_prep_req(ubq, rq);
+ res = ublk_prep_req(ubq, rq, false);
if (res != BLK_STS_OK)
return res;
@@ -1419,7 +1442,7 @@ static void ublk_queue_rqs(struct rq_list *rqlist)
ublk_queue_cmd_list(ubq, &submit_list);
ubq = this_q;
- if (ublk_prep_req(ubq, req) == BLK_STS_OK)
+ if (ublk_prep_req(ubq, req, true) == BLK_STS_OK)
rq_list_add_tail(&submit_list, req);
else
rq_list_add_tail(&requeue_list, req);
@@ -2413,9 +2436,9 @@ static struct ublk_device *ublk_get_device_from_id(int idx)
return ub;
}
-static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
+static int ublk_ctrl_start_dev(struct ublk_device *ub,
+ const struct ublksrv_ctrl_cmd *header)
{
- const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
const struct ublk_param_basic *p = &ub->params.basic;
int ublksrv_pid = (int)header->data[0];
struct queue_limits lim = {
@@ -2534,9 +2557,8 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
}
static int ublk_ctrl_get_queue_affinity(struct ublk_device *ub,
- struct io_uring_cmd *cmd)
+ const struct ublksrv_ctrl_cmd *header)
{
- const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
void __user *argp = (void __user *)(unsigned long)header->addr;
cpumask_var_t cpumask;
unsigned long queue;
@@ -2585,9 +2607,8 @@ static inline void ublk_dump_dev_info(struct ublksrv_ctrl_dev_info *info)
info->nr_hw_queues, info->queue_depth);
}
-static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
+static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
{
- const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
void __user *argp = (void __user *)(unsigned long)header->addr;
struct ublksrv_ctrl_dev_info info;
struct ublk_device *ub;
@@ -2812,9 +2833,8 @@ static int ublk_ctrl_stop_dev(struct ublk_device *ub)
}
static int ublk_ctrl_get_dev_info(struct ublk_device *ub,
- struct io_uring_cmd *cmd)
+ const struct ublksrv_ctrl_cmd *header)
{
- const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
void __user *argp = (void __user *)(unsigned long)header->addr;
if (header->len < sizeof(struct ublksrv_ctrl_dev_info) || !header->addr)
@@ -2843,9 +2863,8 @@ static void ublk_ctrl_fill_params_devt(struct ublk_device *ub)
}
static int ublk_ctrl_get_params(struct ublk_device *ub,
- struct io_uring_cmd *cmd)
+ const struct ublksrv_ctrl_cmd *header)
{
- const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
void __user *argp = (void __user *)(unsigned long)header->addr;
struct ublk_params_header ph;
int ret;
@@ -2874,9 +2893,8 @@ static int ublk_ctrl_get_params(struct ublk_device *ub,
}
static int ublk_ctrl_set_params(struct ublk_device *ub,
- struct io_uring_cmd *cmd)
+ const struct ublksrv_ctrl_cmd *header)
{
- const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
void __user *argp = (void __user *)(unsigned long)header->addr;
struct ublk_params_header ph;
int ret = -EFAULT;
@@ -2940,9 +2958,8 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
}
static int ublk_ctrl_start_recovery(struct ublk_device *ub,
- struct io_uring_cmd *cmd)
+ const struct ublksrv_ctrl_cmd *header)
{
- const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
int ret = -EINVAL;
int i;
@@ -2988,9 +3005,8 @@ static int ublk_ctrl_start_recovery(struct ublk_device *ub,
}
static int ublk_ctrl_end_recovery(struct ublk_device *ub,
- struct io_uring_cmd *cmd)
+ const struct ublksrv_ctrl_cmd *header)
{
- const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
int ublksrv_pid = (int)header->data[0];
int ret = -EINVAL;
int i;
@@ -3037,9 +3053,8 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
return ret;
}
-static int ublk_ctrl_get_features(struct io_uring_cmd *cmd)
+static int ublk_ctrl_get_features(const struct ublksrv_ctrl_cmd *header)
{
- const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
void __user *argp = (void __user *)(unsigned long)header->addr;
u64 features = UBLK_F_ALL;
@@ -3178,7 +3193,7 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
goto out;
if (cmd_op == UBLK_U_CMD_GET_FEATURES) {
- ret = ublk_ctrl_get_features(cmd);
+ ret = ublk_ctrl_get_features(header);
goto out;
}
@@ -3195,17 +3210,17 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
switch (_IOC_NR(cmd_op)) {
case UBLK_CMD_START_DEV:
- ret = ublk_ctrl_start_dev(ub, cmd);
+ ret = ublk_ctrl_start_dev(ub, header);
break;
case UBLK_CMD_STOP_DEV:
ret = ublk_ctrl_stop_dev(ub);
break;
case UBLK_CMD_GET_DEV_INFO:
case UBLK_CMD_GET_DEV_INFO2:
- ret = ublk_ctrl_get_dev_info(ub, cmd);
+ ret = ublk_ctrl_get_dev_info(ub, header);
break;
case UBLK_CMD_ADD_DEV:
- ret = ublk_ctrl_add_dev(cmd);
+ ret = ublk_ctrl_add_dev(header);
break;
case UBLK_CMD_DEL_DEV:
ret = ublk_ctrl_del_dev(&ub, true);
@@ -3214,19 +3229,19 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
ret = ublk_ctrl_del_dev(&ub, false);
break;
case UBLK_CMD_GET_QUEUE_AFFINITY:
- ret = ublk_ctrl_get_queue_affinity(ub, cmd);
+ ret = ublk_ctrl_get_queue_affinity(ub, header);
break;
case UBLK_CMD_GET_PARAMS:
- ret = ublk_ctrl_get_params(ub, cmd);
+ ret = ublk_ctrl_get_params(ub, header);
break;
case UBLK_CMD_SET_PARAMS:
- ret = ublk_ctrl_set_params(ub, cmd);
+ ret = ublk_ctrl_set_params(ub, header);
break;
case UBLK_CMD_START_USER_RECOVERY:
- ret = ublk_ctrl_start_recovery(ub, cmd);
+ ret = ublk_ctrl_start_recovery(ub, header);
break;
case UBLK_CMD_END_USER_RECOVERY:
- ret = ublk_ctrl_end_recovery(ub, cmd);
+ ret = ublk_ctrl_end_recovery(ub, header);
break;
default:
ret = -EOPNOTSUPP;
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index cc7398c..e74e36a 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -393,7 +393,7 @@ static long udmabuf_create(struct miscdevice *device,
if (!ubuf)
return -ENOMEM;
- pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
+ pglimit = ((u64)size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
for (i = 0; i < head->count; i++) {
pgoff_t subpgcnt;
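Note on the udmabuf.c hunk above: casting `size_limit_mb` to `u64` before multiplying keeps the byte count from overflowing a 32-bit intermediate. The sketch below shows the effect under stated assumptions: it uses unsigned 32-bit arithmetic so the wraparound is well defined in a userspace demo (the kernel variable's actual type is not shown in this hunk), and `DEMO_PAGE_SHIFT` assumes 4 KiB pages. The function names are hypothetical.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Narrow form: the product is computed in 32 bits and wraps before the shift. */
static uint64_t page_limit_narrow(uint32_t size_limit_mb)
{
	uint32_t bytes = size_limit_mb * 1024u * 1024u;

	return bytes >> DEMO_PAGE_SHIFT;
}

/* Widened form: the cast makes the whole product 64-bit, as in the patch. */
static uint64_t page_limit_wide(uint32_t size_limit_mb)
{
	return ((uint64_t)size_limit_mb * 1024 * 1024) >> DEMO_PAGE_SHIFT;
}
```

For small limits such as 64 MB both forms agree. At 4096 MB the narrow product wraps to zero, so the computed page limit collapses, while the widened form yields the intended value.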
diff --git a/drivers/firmware/smccc/kvm_guest.c b/drivers/firmware/smccc/kvm_guest.c
index 5767aed..a123c05 100644
--- a/drivers/firmware/smccc/kvm_guest.c
+++ b/drivers/firmware/smccc/kvm_guest.c
@@ -95,7 +95,7 @@ void __init kvm_arm_target_impl_cpu_init(void)
for (i = 0; i < max_cpus; i++) {
arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_DISCOVER_IMPL_CPUS_FUNC_ID,
- i, &res);
+ i, 0, 0, &res);
if (res.a0 != SMCCC_RET_SUCCESS) {
pr_warn("Discovering target implementation CPUs failed\n");
goto mem_free;
@@ -103,7 +103,7 @@ void __init kvm_arm_target_impl_cpu_init(void)
target[i].midr = res.a1;
target[i].revidr = res.a2;
target[i].aidr = res.a3;
- };
+ }
if (!cpu_errata_set_target_impl(max_cpus, target)) {
pr_warn("Failed to set target implementation CPUs\n");
diff --git a/drivers/gpio/TODO b/drivers/gpio/TODO
index b5f0a7a..4b70cba 100644
--- a/drivers/gpio/TODO
+++ b/drivers/gpio/TODO
@@ -186,3 +186,37 @@
Encourage users to switch to using them and eventually remove the existing
global export/unexport attribues.
+
+-------------------------------------------------------------------------------
+
+Remove GPIOD_FLAGS_BIT_NONEXCLUSIVE
+
+GPIOs in the linux kernel are meant to be an exclusive resource. This means
+that the GPIO descriptors (the software representation of the hardware concept)
+are not reference counted and - in general - only one user at a time can
+request a GPIO line and control its settings. The consumer API is designed
+around full control of the line's state as evidenced by the fact that, for
+instance, gpiod_set_value() does indeed drive the line as requested, instead
+of bumping an enable counter of some sort.
+
+A problematic use-case for GPIOs is when two consumers want to use the same
+descriptor independently. An example of such a user is the regulator subsystem
+which may instantiate several struct regulator_dev instances containing
+a struct device but using the same enable GPIO line.
+
+A workaround was introduced in the form of the GPIOD_FLAGS_BIT_NONEXCLUSIVE
+flag but its implementation is problematic: it does not provide any
+synchronization of usage nor did it introduce any enable count meaning the
+non-exclusive users of the same descriptor will in fact "fight" for the
+control over it. This flag should be removed and replaced with a better
+solution, possibly based on the new power sequencing subsystem.
+
+-------------------------------------------------------------------------------
+
+Remove devm_gpiod_unhinge()
+
+devm_gpiod_unhinge() is provided as a way to transfer the ownership of managed
+enable GPIOs to the regulator core. Rather than doing that however, we should
+make it possible for the regulator subsystem to deal with GPIO resources the
+lifetime of which it doesn't control as logically, a GPIO obtained by a caller
+should also be freed by it.
diff --git a/drivers/gpio/gpio-mpc8xxx.c b/drivers/gpio/gpio-mpc8xxx.c
index 0cd4c36..5415175 100644
--- a/drivers/gpio/gpio-mpc8xxx.c
+++ b/drivers/gpio/gpio-mpc8xxx.c
@@ -410,7 +410,9 @@ static int mpc8xxx_probe(struct platform_device *pdev)
goto err;
}
- device_init_wakeup(dev, true);
+ ret = devm_device_init_wakeup(dev);
+ if (ret)
+ return dev_err_probe(dev, ret, "Failed to init wakeup\n");
return 0;
err:
diff --git a/drivers/gpio/gpio-tegra186.c b/drivers/gpio/gpio-tegra186.c
index 6895b65..d27bfac 100644
--- a/drivers/gpio/gpio-tegra186.c
+++ b/drivers/gpio/gpio-tegra186.c
@@ -823,6 +823,7 @@ static int tegra186_gpio_probe(struct platform_device *pdev)
struct gpio_irq_chip *irq;
struct tegra_gpio *gpio;
struct device_node *np;
+ struct resource *res;
char **names;
int err;
@@ -842,19 +843,19 @@ static int tegra186_gpio_probe(struct platform_device *pdev)
gpio->num_banks++;
/* get register apertures */
- gpio->secure = devm_platform_ioremap_resource_byname(pdev, "security");
- if (IS_ERR(gpio->secure)) {
- gpio->secure = devm_platform_ioremap_resource(pdev, 0);
- if (IS_ERR(gpio->secure))
- return PTR_ERR(gpio->secure);
- }
+ res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "security");
+ if (!res)
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ gpio->secure = devm_ioremap_resource(&pdev->dev, res);
+ if (IS_ERR(gpio->secure))
+ return PTR_ERR(gpio->secure);
- gpio->base = devm_platform_ioremap_resource_byname(pdev, "gpio");
- if (IS_ERR(gpio->base)) {
- gpio->base = devm_platform_ioremap_resource(pdev, 1);
- if (IS_ERR(gpio->base))
- return PTR_ERR(gpio->base);
- }
+ res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "gpio");
+ if (!res)
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+ gpio->base = devm_ioremap_resource(&pdev->dev, res);
+ if (IS_ERR(gpio->base))
+ return PTR_ERR(gpio->base);
err = platform_irq_count(pdev);
if (err < 0)
diff --git a/drivers/gpio/gpio-zynq.c b/drivers/gpio/gpio-zynq.c
index be81fa2..3dae63f 100644
--- a/drivers/gpio/gpio-zynq.c
+++ b/drivers/gpio/gpio-zynq.c
@@ -1011,6 +1011,7 @@ static void zynq_gpio_remove(struct platform_device *pdev)
ret = pm_runtime_get_sync(&pdev->dev);
if (ret < 0)
dev_warn(&pdev->dev, "pm_runtime_get_sync() Failed\n");
+ device_init_wakeup(&pdev->dev, 0);
gpiochip_remove(&gpio->chip);
device_set_wakeup_capable(&pdev->dev, 0);
pm_runtime_disable(&pdev->dev);
diff --git a/drivers/gpio/gpiolib-devres.c b/drivers/gpio/gpiolib-devres.c
index 08205f3..120d1ec 100644
--- a/drivers/gpio/gpiolib-devres.c
+++ b/drivers/gpio/gpiolib-devres.c
@@ -317,11 +317,15 @@ EXPORT_SYMBOL_GPL(devm_gpiod_put);
* @dev: GPIO consumer
* @desc: GPIO descriptor to remove resource management from
*
+ * *DEPRECATED*
+ * This function should not be used. It's been provided as a workaround for
+ * resource ownership issues in the regulator framework and should be replaced
+ * with a better solution.
+ *
* Remove resource management from a GPIO descriptor. This is needed when
* you want to hand over lifecycle management of a descriptor to another
* mechanism.
*/
-
void devm_gpiod_unhinge(struct device *dev, struct gpio_desc *desc)
{
int ret;
diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c
index eb667f8..65f6a71 100644
--- a/drivers/gpio/gpiolib-of.c
+++ b/drivers/gpio/gpiolib-of.c
@@ -193,6 +193,8 @@ static void of_gpio_try_fixup_polarity(const struct device_node *np,
*/
{ "himax,hx8357", "gpios-reset", false },
{ "himax,hx8369", "gpios-reset", false },
+#endif
+#if IS_ENABLED(CONFIG_MTD_NAND_JZ4780)
/*
* The rb-gpios semantics was undocumented and qi,lb60 (along with
* the ingenic driver) got it wrong. The active state encodes the
@@ -266,6 +268,9 @@ static void of_gpio_set_polarity_by_property(const struct device_node *np,
{ "fsl,imx8qm-fec", "phy-reset-gpios", "phy-reset-active-high" },
{ "fsl,s32v234-fec", "phy-reset-gpios", "phy-reset-active-high" },
#endif
+#if IS_ENABLED(CONFIG_MMC_ATMELMCI)
+ { "atmel,hsmci", "cd-gpios", "cd-inverted" },
+#endif
#if IS_ENABLED(CONFIG_PCI_IMX6)
{ "fsl,imx6q-pcie", "reset-gpio", "reset-gpio-active-high" },
{ "fsl,imx6sx-pcie", "reset-gpio", "reset-gpio-active-high" },
@@ -292,9 +297,6 @@ static void of_gpio_set_polarity_by_property(const struct device_node *np,
{ "regulator-gpio", "enable-gpio", "enable-active-high" },
{ "regulator-gpio", "enable-gpios", "enable-active-high" },
#endif
-#if IS_ENABLED(CONFIG_MMC_ATMELMCI)
- { "atmel,hsmci", "cd-gpios", "cd-inverted" },
-#endif
};
unsigned int i;
bool active_high;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6d83ccfa..2c04ae1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -353,7 +353,6 @@ enum amdgpu_kiq_irq {
AMDGPU_CP_KIQ_IRQ_DRIVER0 = 0,
AMDGPU_CP_KIQ_IRQ_LAST
};
-#define SRIOV_USEC_TIMEOUT 1200000 /* wait 12 * 100ms for SRIOV */
#define MAX_KIQ_REG_WAIT 5000 /* in usecs, 5ms */
#define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in msecs, 5ms */
#define MAX_KIQ_REG_TRY 1000
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a30111d..b34b915 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3643,6 +3643,13 @@ static int amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev)
adev, adev->ip_blocks[i].version->type))
continue;
+ /* Since we skip suspend for S0i3, we need to cancel the delayed
+ * idle work here as the suspend callback never gets called.
+ */
+ if (adev->in_s0ix &&
+ adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GFX &&
+ amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(10, 0, 0))
+ cancel_delayed_work_sync(&adev->gfx.idle_work);
/* skip suspend of gfx/mes and psp for S0ix
* gfx is in gfxoff state, so on resume it will exit gfxoff just
* like at runtime. PSP is also part of the always on hardware
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index dc2713e..9e738fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -120,6 +120,8 @@ MODULE_FIRMWARE("amdgpu/vega20_ip_discovery.bin");
MODULE_FIRMWARE("amdgpu/raven_ip_discovery.bin");
MODULE_FIRMWARE("amdgpu/raven2_ip_discovery.bin");
MODULE_FIRMWARE("amdgpu/picasso_ip_discovery.bin");
+MODULE_FIRMWARE("amdgpu/arcturus_ip_discovery.bin");
+MODULE_FIRMWARE("amdgpu/aldebaran_ip_discovery.bin");
#define mmIP_DISCOVERY_VERSION 0x16A00
#define mmRCC_CONFIG_MEMSIZE 0xde3
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 9f627ca..667080c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -75,11 +75,25 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
*/
static int amdgpu_dma_buf_pin(struct dma_buf_attachment *attach)
{
- struct drm_gem_object *obj = attach->dmabuf->priv;
- struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+ struct dma_buf *dmabuf = attach->dmabuf;
+ struct amdgpu_bo *bo = gem_to_amdgpu_bo(dmabuf->priv);
+ u32 domains = bo->preferred_domains;
- /* pin buffer into GTT */
- return amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
+ dma_resv_assert_held(dmabuf->resv);
+
+ /*
+ * Try pinning into VRAM to allow P2P with RDMA NICs without ODP
+ * support if all attachments can do P2P. If any attachment can't do
+ * P2P just pin into GTT instead.
+ */
+ list_for_each_entry(attach, &dmabuf->attachments, node)
+ if (!attach->peer2peer)
+ domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
+
+ if (domains & AMDGPU_GEM_DOMAIN_VRAM)
+ bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+
+ return amdgpu_bo_pin(bo, domains);
}
/**
@@ -134,9 +148,6 @@ static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment *attach,
r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
if (r)
return ERR_PTR(r);
-
- } else if (bo->tbo.resource->mem_type != TTM_PL_TT) {
- return ERR_PTR(-EBUSY);
}
switch (bo->tbo.resource->mem_type) {
@@ -184,7 +195,7 @@ static void amdgpu_dma_buf_unmap(struct dma_buf_attachment *attach,
struct sg_table *sgt,
enum dma_data_direction dir)
{
- if (sgt->sgl->page_link) {
+ if (sg_page(sgt->sgl)) {
dma_unmap_sgtable(attach->dev, sgt, dir, 0);
sg_free_table(sgt);
kfree(sgt);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 4646252..ecb74cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -699,12 +699,10 @@ int amdgpu_gmc_flush_gpu_tlb_pasid(struct amdgpu_device *adev, uint16_t pasid,
uint32_t flush_type, bool all_hub,
uint32_t inst)
{
- u32 usec_timeout = amdgpu_sriov_vf(adev) ? SRIOV_USEC_TIMEOUT :
- adev->usec_timeout;
struct amdgpu_ring *ring = &adev->gfx.kiq[inst].ring;
struct amdgpu_kiq *kiq = &adev->gfx.kiq[inst];
unsigned int ndw;
- int r;
+ int r, cnt = 0;
uint32_t seq;
/*
@@ -761,10 +759,21 @@ int amdgpu_gmc_flush_gpu_tlb_pasid(struct amdgpu_device *adev, uint16_t pasid,
amdgpu_ring_commit(ring);
spin_unlock(&adev->gfx.kiq[inst].ring_lock);
- if (amdgpu_fence_wait_polling(ring, seq, usec_timeout) < 1) {
+
+ r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
+
+ might_sleep();
+ while (r < 1 && cnt++ < MAX_KIQ_REG_TRY &&
+ !amdgpu_reset_pending(adev->reset_domain)) {
+ msleep(MAX_KIQ_REG_BAILOUT_INTERVAL);
+ r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
+ }
+
+ if (cnt > MAX_KIQ_REG_TRY) {
dev_err(adev->dev, "timeout waiting for kiq fence\n");
r = -ETIME;
- }
+ } else
+ r = 0;
}
error_unlock_reset:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 80cd6f5..0b99877 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -163,8 +163,8 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
* When GTT is just an alternative to VRAM make sure that we
* only use it as fallback and still try to fill up VRAM first.
*/
- if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM &&
- !(adev->flags & AMD_IS_APU))
+ if (abo->tbo.resource && !(adev->flags & AMD_IS_APU) &&
+ domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM)
places[c].flags |= TTM_PL_FLAG_FALLBACK;
c++;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 6da8994..2d7f82e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -24,6 +24,7 @@
#include <linux/dma-mapping.h>
#include <drm/ttm/ttm_range_manager.h>
+#include <drm/drm_drv.h>
#include "amdgpu.h"
#include "amdgpu_vm.h"
@@ -907,6 +908,9 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
struct ttm_resource_manager *man = &mgr->manager;
int err;
+ man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->gmc.real_vram_size);
+ if (IS_ERR(man->cg))
+ return PTR_ERR(man->cg);
ttm_resource_manager_init(man, &adev->mman.bdev,
adev->gmc.real_vram_size);
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index e65916a..ef9538fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -894,6 +894,10 @@ static void mes_v11_0_get_fw_version(struct amdgpu_device *adev)
{
int pipe;
+ /* return early if we have already fetched these */
+ if (adev->mes.sched_version && adev->mes.kiq_version)
+ return;
+
/* get MES scheduler/KIQ versions */
mutex_lock(&adev->srbm_mutex);
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 183dd33..e6ab617 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -1392,17 +1392,20 @@ static int mes_v12_0_queue_init(struct amdgpu_device *adev,
mes_v12_0_queue_init_register(ring);
}
- /* get MES scheduler/KIQ versions */
- mutex_lock(&adev->srbm_mutex);
- soc21_grbm_select(adev, 3, pipe, 0, 0);
+ if (((pipe == AMDGPU_MES_SCHED_PIPE) && !adev->mes.sched_version) ||
+ ((pipe == AMDGPU_MES_KIQ_PIPE) && !adev->mes.kiq_version)) {
+ /* get MES scheduler/KIQ versions */
+ mutex_lock(&adev->srbm_mutex);
+ soc21_grbm_select(adev, 3, pipe, 0, 0);
- if (pipe == AMDGPU_MES_SCHED_PIPE)
- adev->mes.sched_version = RREG32_SOC15(GC, 0, regCP_MES_GP3_LO);
- else if (pipe == AMDGPU_MES_KIQ_PIPE && adev->enable_mes_kiq)
- adev->mes.kiq_version = RREG32_SOC15(GC, 0, regCP_MES_GP3_LO);
+ if (pipe == AMDGPU_MES_SCHED_PIPE)
+ adev->mes.sched_version = RREG32_SOC15(GC, 0, regCP_MES_GP3_LO);
+ else if (pipe == AMDGPU_MES_KIQ_PIPE && adev->enable_mes_kiq)
+ adev->mes.kiq_version = RREG32_SOC15(GC, 0, regCP_MES_GP3_LO);
- soc21_grbm_select(adev, 0, 0, 0, 0);
- mutex_unlock(&adev->srbm_mutex);
+ soc21_grbm_select(adev, 0, 0, 0, 0);
+ mutex_unlock(&adev->srbm_mutex);
+ }
return 0;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index e477d75..9bbee48 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1983,9 +1983,6 @@ static void kfd_topology_set_capabilities(struct kfd_topology_device *dev)
if (kfd_dbg_has_ttmps_always_setup(dev->gpu))
dev->node_props.debug_prop |= HSA_DBG_DISPATCH_INFO_ALWAYS_VALID;
- if (dev->gpu->adev->sdma.supported_reset & AMDGPU_RESET_TYPE_PER_QUEUE)
- dev->node_props.capability2 |= HSA_CAP2_PER_SDMA_QUEUE_RESET_SUPPORTED;
-
if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(10, 0, 0)) {
if (KFD_GC_VERSION(dev->gpu) == IP_VERSION(9, 4, 3) ||
KFD_GC_VERSION(dev->gpu) == IP_VERSION(9, 4, 4))
@@ -2001,7 +1998,11 @@ static void kfd_topology_set_capabilities(struct kfd_topology_device *dev)
dev->node_props.capability |=
HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED;
- dev->node_props.capability |= HSA_CAP_PER_QUEUE_RESET_SUPPORTED;
+ if (!amdgpu_sriov_vf(dev->gpu->adev))
+ dev->node_props.capability |= HSA_CAP_PER_QUEUE_RESET_SUPPORTED;
+
+ if (dev->gpu->adev->sdma.supported_reset & AMDGPU_RESET_TYPE_PER_QUEUE)
+ dev->node_props.capability2 |= HSA_CAP2_PER_SDMA_QUEUE_RESET_SUPPORTED;
} else {
dev->node_props.debug_prop |= HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX10 |
HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index d0d8ad5..9fed447 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1726,9 +1726,30 @@ static const struct dmi_system_id dmi_quirk_table[] = {
.callback = edp0_on_dp1_callback,
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "HP"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "HP EliteBook 645 14 inch G11 Notebook PC"),
+ },
+ },
+ {
+ .callback = edp0_on_dp1_callback,
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "HP"),
DMI_MATCH(DMI_PRODUCT_NAME, "HP EliteBook 665 16 inch G11 Notebook PC"),
},
},
+ {
+ .callback = edp0_on_dp1_callback,
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "HP"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "HP ProBook 445 14 inch G11 Notebook PC"),
+ },
+ },
+ {
+ .callback = edp0_on_dp1_callback,
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "HP"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "HP ProBook 465 16 inch G11 Notebook PC"),
+ },
+ },
{}
/* TODO: refactor this from a fixed table to a dynamic option */
};
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
index 36a830a..e8bdd7f 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
@@ -113,6 +113,7 @@ bool amdgpu_dm_crtc_vrr_active(const struct dm_crtc_state *dm_state)
*
* Panel Replay and PSR SU
* - Enable when:
+ * - VRR is disabled
* - vblank counter is disabled
* - entry is allowed: usermode demonstrates an adequate number of fast
* commits)
@@ -131,19 +132,20 @@ static void amdgpu_dm_crtc_set_panel_sr_feature(
bool is_sr_active = (link->replay_settings.replay_allow_active ||
link->psr_settings.psr_allow_active);
bool is_crc_window_active = false;
+ bool vrr_active = amdgpu_dm_crtc_vrr_active_irq(vblank_work->acrtc);
#ifdef CONFIG_DRM_AMD_SECURE_DISPLAY
is_crc_window_active =
amdgpu_dm_crc_window_is_activated(&vblank_work->acrtc->base);
#endif
- if (link->replay_settings.replay_feature_enabled &&
+ if (link->replay_settings.replay_feature_enabled && !vrr_active &&
allow_sr_entry && !is_sr_active && !is_crc_window_active) {
amdgpu_dm_replay_enable(vblank_work->stream, true);
} else if (vblank_enabled) {
if (link->psr_settings.psr_version < DC_PSR_VERSION_SU_1 && is_sr_active)
amdgpu_dm_psr_disable(vblank_work->stream, false);
- } else if (link->psr_settings.psr_feature_enabled &&
+ } else if (link->psr_settings.psr_feature_enabled && !vrr_active &&
allow_sr_entry && !is_sr_active && !is_crc_window_active) {
struct amdgpu_dm_connector *aconn =
@@ -244,6 +246,8 @@ static void amdgpu_dm_crtc_vblank_control_worker(struct work_struct *work)
struct vblank_control_work *vblank_work =
container_of(work, struct vblank_control_work, work);
struct amdgpu_display_manager *dm = vblank_work->dm;
+ struct amdgpu_device *adev = drm_to_adev(dm->ddev);
+ int r;
mutex_lock(&dm->dc_lock);
@@ -271,8 +275,15 @@ static void amdgpu_dm_crtc_vblank_control_worker(struct work_struct *work)
vblank_work->acrtc->dm_irq_params.allow_sr_entry);
}
- if (dm->active_vblank_irq_count == 0)
+ if (dm->active_vblank_irq_count == 0) {
+ r = amdgpu_dpm_pause_power_profile(adev, true);
+ if (r)
+ dev_warn(adev->dev, "failed to set default power profile mode\n");
dc_allow_idle_optimizations(dm->dc, true);
+ r = amdgpu_dpm_pause_power_profile(adev, false);
+ if (r)
+ dev_warn(adev->dev, "failed to restore the power profile mode\n");
+ }
mutex_unlock(&dm->dc_lock);
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c
index be54f0e..94e99e5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c
@@ -86,6 +86,8 @@ static void dml21_init(const struct dc *in_dc, struct dml2_context **dml_ctx, co
/* Store configuration options */
(*dml_ctx)->config = *config;
+ DC_FP_START();
+
/*Initialize SOCBB and DCNIP params */
dml21_initialize_soc_bb_params(&(*dml_ctx)->v21.dml_init, config, in_dc);
dml21_initialize_ip_params(&(*dml_ctx)->v21.dml_init, config, in_dc);
@@ -96,6 +98,8 @@ static void dml21_init(const struct dc *in_dc, struct dml2_context **dml_ctx, co
/*Initialize DML21 instance */
dml2_initialize_instance(&(*dml_ctx)->v21.dml_init);
+
+ DC_FP_END();
}
bool dml21_create(const struct dc *in_dc, struct dml2_context **dml_ctx, const struct dml2_configuration_options *config)
@@ -283,11 +287,16 @@ bool dml21_validate(const struct dc *in_dc, struct dc_state *context, struct dml
{
bool out = false;
+ DC_FP_START();
+
/* Use dml_validate_only for fast_validate path */
- if (fast_validate) {
+ if (fast_validate)
out = dml21_check_mode_support(in_dc, context, dml_ctx);
- } else
+ else
out = dml21_mode_check_and_programming(in_dc, context, dml_ctx);
+
+ DC_FP_END();
+
return out;
}
@@ -426,8 +435,12 @@ void dml21_copy(struct dml2_context *dst_dml_ctx,
dst_dml_ctx->v21.mode_programming.programming = dst_dml2_programming;
+ DC_FP_START();
+
/* need to initialize copied instance for internal references to be correct */
dml2_initialize_instance(&dst_dml_ctx->v21.dml_init);
+
+ DC_FP_END();
}
bool dml21_create_copy(struct dml2_context **dst_dml_ctx,
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c
index 939ee07..f549a77 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c
@@ -732,11 +732,16 @@ bool dml2_validate(const struct dc *in_dc, struct dc_state *context, struct dml2
return out;
}
+ DC_FP_START();
+
/* Use dml_validate_only for fast_validate path */
if (fast_validate)
out = dml2_validate_only(context);
else
out = dml2_validate_and_build_resource(in_dc, context);
+
+ DC_FP_END();
+
return out;
}
@@ -779,11 +784,15 @@ static void dml2_init(const struct dc *in_dc, const struct dml2_configuration_op
break;
}
+ DC_FP_START();
+
initialize_dml2_ip_params(*dml2, in_dc, &(*dml2)->v20.dml_core_ctx.ip);
initialize_dml2_soc_bbox(*dml2, in_dc, &(*dml2)->v20.dml_core_ctx.soc);
initialize_dml2_soc_states(*dml2, in_dc, &(*dml2)->v20.dml_core_ctx.soc, &(*dml2)->v20.dml_core_ctx.states);
+
+ DC_FP_END();
}
bool dml2_create(const struct dc *in_dc, const struct dml2_configuration_options *config, struct dml2_context **dml2)
diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 2a96061..21dc956 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -429,6 +429,7 @@ struct amd_pm_funcs {
int (*set_pp_table)(void *handle, const char *buf, size_t size);
void (*debugfs_print_current_performance_level)(void *handle, struct seq_file *m);
int (*switch_power_profile)(void *handle, enum PP_SMC_POWER_PROFILE type, bool en);
+ int (*pause_power_profile)(void *handle, bool pause);
/* export to amdgpu */
struct amd_vce_state *(*get_vce_clock_state)(void *handle, u32 idx);
int (*dispatch_tasks)(void *handle, enum amd_pp_task task_id,
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 81e9b44..3533d43 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -349,6 +349,25 @@ int amdgpu_dpm_switch_power_profile(struct amdgpu_device *adev,
return ret;
}
+int amdgpu_dpm_pause_power_profile(struct amdgpu_device *adev,
+ bool pause)
+{
+ const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+ int ret = 0;
+
+ if (amdgpu_sriov_vf(adev))
+ return 0;
+
+ if (pp_funcs && pp_funcs->pause_power_profile) {
+ mutex_lock(&adev->pm.mutex);
+ ret = pp_funcs->pause_power_profile(
+ adev->powerplay.pp_handle, pause);
+ mutex_unlock(&adev->pm.mutex);
+ }
+
+ return ret;
+}
+
int amdgpu_dpm_set_xgmi_pstate(struct amdgpu_device *adev,
uint32_t pstate)
{
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
index f93d287..4c0f7ad 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
@@ -410,6 +410,8 @@ int amdgpu_dpm_set_xgmi_pstate(struct amdgpu_device *adev,
int amdgpu_dpm_switch_power_profile(struct amdgpu_device *adev,
enum PP_SMC_POWER_PROFILE type,
bool en);
+int amdgpu_dpm_pause_power_profile(struct amdgpu_device *adev,
+ bool pause);
int amdgpu_dpm_baco_reset(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 033c322..46cce1d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -2398,7 +2398,11 @@ static int smu_switch_power_profile(void *handle,
smu_power_profile_mode_get(smu, type);
else
smu_power_profile_mode_put(smu, type);
- ret = smu_bump_power_profile_mode(smu, NULL, 0);
+ /* don't switch the active workload when paused */
+ if (smu->pause_workload)
+ ret = 0;
+ else
+ ret = smu_bump_power_profile_mode(smu, NULL, 0);
if (ret) {
if (enable)
smu_power_profile_mode_put(smu, type);
@@ -2411,6 +2415,35 @@ static int smu_switch_power_profile(void *handle,
return 0;
}
+static int smu_pause_power_profile(void *handle,
+ bool pause)
+{
+ struct smu_context *smu = handle;
+ struct smu_dpm_context *smu_dpm_ctx = &(smu->smu_dpm);
+ u32 workload_mask = 1 << PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
+ int ret;
+
+ if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+ return -EOPNOTSUPP;
+
+ if (smu_dpm_ctx->dpm_level != AMD_DPM_FORCED_LEVEL_MANUAL &&
+ smu_dpm_ctx->dpm_level != AMD_DPM_FORCED_LEVEL_PERF_DETERMINISM) {
+ smu->pause_workload = pause;
+
+ /* force to bootup default profile */
+ if (smu->pause_workload && smu->ppt_funcs->set_power_profile_mode)
+ ret = smu->ppt_funcs->set_power_profile_mode(smu,
+ workload_mask,
+ NULL,
+ 0);
+ else
+ ret = smu_bump_power_profile_mode(smu, NULL, 0);
+ return ret;
+ }
+
+ return 0;
+}
+
static enum amd_dpm_forced_level smu_get_performance_level(void *handle)
{
struct smu_context *smu = handle;
@@ -3733,6 +3766,7 @@ static const struct amd_pm_funcs swsmu_pm_funcs = {
.get_pp_table = smu_sys_get_pp_table,
.set_pp_table = smu_sys_set_pp_table,
.switch_power_profile = smu_switch_power_profile,
+ .pause_power_profile = smu_pause_power_profile,
/* export to amdgpu */
.dispatch_tasks = smu_handle_dpm_task,
.load_firmware = smu_load_microcode,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index 3ba1696..dd6d0e7 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -558,6 +558,7 @@ struct smu_context {
/* asic agnostic workload mask */
uint32_t workload_mask;
+ bool pause_workload;
/* default/user workload preference */
uint32_t power_profile_mode;
uint32_t workload_refcount[PP_SMC_POWER_PROFILE_COUNT];
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
index 78391d8..25fabf3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
@@ -1204,7 +1204,7 @@ int smu_v11_0_set_fan_speed_rpm(struct smu_context *smu,
uint32_t crystal_clock_freq = 2500;
uint32_t tach_period;
- if (speed == 0)
+ if (!speed || speed > UINT_MAX/8)
return -EINVAL;
/*
* To prevent from possible overheat, some ASICs may have requirement
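
Note on the hunk above: the new upper bound suggests that code outside this hunk multiplies speed by 8, which would wrap for large inputs. The sketch below shows the guard in isolation. The helper name fan_rpm_to_tach_period and the tach-period formula are assumptions made for illustration, and UINT32_MAX is used where the kernel hunk uses UINT_MAX.

```c
#include <errno.h>
#include <stdint.h>

/* Taken from the hunk's context line (crystal_clock_freq = 2500). */
#define CRYSTAL_CLOCK_FREQ 2500u

/*
 * Hypothetical helper: validate a requested fan speed (RPM) and derive a
 * tachometer period. Returns 0 on success, -EINVAL if the speed is rejected.
 */
static int fan_rpm_to_tach_period(uint32_t speed, uint32_t *tach_period)
{
	/*
	 * Same shape as the patched check: 0 would divide by zero, and any
	 * speed above UINT32_MAX / 8 would make 8 * speed wrap around.
	 */
	if (!speed || speed > UINT32_MAX / 8)
		return -EINVAL;

	/* Assumed formula; the divisor 8 * speed is why the bound matters. */
	*tach_period = 60u * CRYSTAL_CLOCK_FREQ * 10000u / (8u * speed);
	return 0;
}
```

Checking the input against UINT32_MAX / 8 before multiplying avoids having to detect the wrap after the fact.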
diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c
index 048be287..98b898a 100644
--- a/drivers/gpu/drm/i915/display/intel_bw.c
+++ b/drivers/gpu/drm/i915/display/intel_bw.c
@@ -244,6 +244,7 @@ static int icl_get_qgv_points(struct drm_i915_private *dev_priv,
qi->deinterleave = 4;
break;
case INTEL_DRAM_GDDR:
+ case INTEL_DRAM_GDDR_ECC:
qi->channel_width = 32;
break;
default:
@@ -398,6 +399,12 @@ static const struct intel_sa_info xe2_hpd_sa_info = {
/* Other values not used by simplified algorithm */
};
+static const struct intel_sa_info xe2_hpd_ecc_sa_info = {
+ .derating = 45,
+ .deprogbwlimit = 53,
+ /* Other values not used by simplified algorithm */
+};
+
static int icl_get_bw_info(struct drm_i915_private *dev_priv, const struct intel_sa_info *sa)
{
struct intel_qgv_info qi = {};
@@ -740,10 +747,15 @@ static unsigned int icl_qgv_bw(struct drm_i915_private *i915,
void intel_bw_init_hw(struct drm_i915_private *dev_priv)
{
+ const struct dram_info *dram_info = &dev_priv->dram_info;
+
if (!HAS_DISPLAY(dev_priv))
return;
- if (DISPLAY_VERx100(dev_priv) >= 1401 && IS_DGFX(dev_priv))
+ if (DISPLAY_VERx100(dev_priv) >= 1401 && IS_DGFX(dev_priv) &&
+ dram_info->type == INTEL_DRAM_GDDR_ECC)
+ xe2_hpd_get_bw_info(dev_priv, &xe2_hpd_ecc_sa_info);
+ else if (DISPLAY_VERx100(dev_priv) >= 1401 && IS_DGFX(dev_priv))
xe2_hpd_get_bw_info(dev_priv, &xe2_hpd_sa_info);
else if (DISPLAY_VER(dev_priv) >= 14)
tgl_get_bw_info(dev_priv, &mtl_sa_info);
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 3afb85f..3b509c7 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -968,7 +968,9 @@ static bool vrr_params_changed(const struct intel_crtc_state *old_crtc_state,
old_crtc_state->vrr.vmin != new_crtc_state->vrr.vmin ||
old_crtc_state->vrr.vmax != new_crtc_state->vrr.vmax ||
old_crtc_state->vrr.guardband != new_crtc_state->vrr.guardband ||
- old_crtc_state->vrr.pipeline_full != new_crtc_state->vrr.pipeline_full;
+ old_crtc_state->vrr.pipeline_full != new_crtc_state->vrr.pipeline_full ||
+ old_crtc_state->vrr.vsync_start != new_crtc_state->vrr.vsync_start ||
+ old_crtc_state->vrr.vsync_end != new_crtc_state->vrr.vsync_end;
}
static bool cmrr_params_changed(const struct intel_crtc_state *old_crtc_state,
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index a236b5f..9476aaa 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -172,10 +172,28 @@ int intel_dp_link_symbol_clock(int rate)
static int max_dprx_rate(struct intel_dp *intel_dp)
{
- if (intel_dp_tunnel_bw_alloc_is_enabled(intel_dp))
- return drm_dp_tunnel_max_dprx_rate(intel_dp->tunnel);
+ struct intel_display *display = to_intel_display(intel_dp);
+ struct intel_encoder *encoder = &dp_to_dig_port(intel_dp)->base;
+ int max_rate;
- return drm_dp_bw_code_to_link_rate(intel_dp->dpcd[DP_MAX_LINK_RATE]);
+ if (intel_dp_tunnel_bw_alloc_is_enabled(intel_dp))
+ max_rate = drm_dp_tunnel_max_dprx_rate(intel_dp->tunnel);
+ else
+ max_rate = drm_dp_bw_code_to_link_rate(intel_dp->dpcd[DP_MAX_LINK_RATE]);
+
+ /*
+ * Some broken eDP sinks illegally declare support for
+ * HBR3 without TPS4, and are unable to produce a stable
+ * output. Reject HBR3 when TPS4 is not available.
+ */
+ if (max_rate >= 810000 && !drm_dp_tps4_supported(intel_dp->dpcd)) {
+ drm_dbg_kms(display->drm,
+ "[ENCODER:%d:%s] Rejecting HBR3 due to missing TPS4 support\n",
+ encoder->base.base.id, encoder->base.name);
+ max_rate = 540000;
+ }
+
+ return max_rate;
}
static int max_dprx_lane_count(struct intel_dp *intel_dp)
@@ -4170,6 +4188,9 @@ static void intel_edp_mso_init(struct intel_dp *intel_dp)
static void
intel_edp_set_sink_rates(struct intel_dp *intel_dp)
{
+ struct intel_display *display = to_intel_display(intel_dp);
+ struct intel_encoder *encoder = &dp_to_dig_port(intel_dp)->base;
+
intel_dp->num_sink_rates = 0;
if (intel_dp->edp_dpcd[0] >= DP_EDP_14) {
@@ -4180,10 +4201,7 @@ intel_edp_set_sink_rates(struct intel_dp *intel_dp)
sink_rates, sizeof(sink_rates));
for (i = 0; i < ARRAY_SIZE(sink_rates); i++) {
- int val = le16_to_cpu(sink_rates[i]);
-
- if (val == 0)
- break;
+ int rate;
/* Value read multiplied by 200kHz gives the per-lane
* link rate in kHz. The source rates are, however,
@@ -4191,7 +4209,24 @@ intel_edp_set_sink_rates(struct intel_dp *intel_dp)
* back to symbols is
* (val * 200kHz)*(8/10 ch. encoding)*(1/8 bit to Byte)
*/
- intel_dp->sink_rates[i] = (val * 200) / 10;
+ rate = le16_to_cpu(sink_rates[i]) * 200 / 10;
+
+ if (rate == 0)
+ break;
+
+ /*
+ * Some broken eDP sinks illegally declare support for
+ * HBR3 without TPS4, and are unable to produce a stable
+ * output. Reject HBR3 when TPS4 is not available.
+ */
+ if (rate >= 810000 && !drm_dp_tps4_supported(intel_dp->dpcd)) {
+ drm_dbg_kms(display->drm,
+ "[ENCODER:%d:%s] Rejecting HBR3 due to missing TPS4 support\n",
+ encoder->base.base.id, encoder->base.name);
+ break;
+ }
+
+ intel_dp->sink_rates[i] = rate;
}
intel_dp->num_sink_rates = i;
}
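
Note on the hunks above: each raw table entry is in units of 200 kHz and is divided by 10 to give the rate used by the driver, so an entry of 40500 becomes 810000 (HBR3). The table is cut off at the first zero entry, and also at the first HBR3-or-faster entry when the sink lacks TPS4. A minimal sketch of that filtering follows. The function name filter_sink_rates and the tps4_supported flag are hypothetical, and the raw values are assumed to be in host byte order already (the driver converts them with le16_to_cpu).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rate at and above which the hunk requires TPS4 support (HBR3). */
#define HBR3_RATE 810000

/*
 * Hypothetical helper: convert raw sink rate entries and copy the usable
 * ones to out[]. Returns the number of rates kept. out[] must have room
 * for n entries.
 */
static size_t filter_sink_rates(const uint16_t *raw, size_t n,
				bool tps4_supported, int *out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* Raw value is in 200 kHz units; same arithmetic as the hunk. */
		int rate = raw[i] * 200 / 10;

		/* A zero entry terminates the table. */
		if (rate == 0)
			break;

		/* Sinks that claim HBR3 without TPS4 are not trusted. */
		if (rate >= HBR3_RATE && !tps4_supported)
			break;

		out[i] = rate;
	}

	return i;
}
```

Stopping at the first rejected entry, as the hunk does, relies on the table being sorted in ascending order, so everything after an HBR3 entry is at least as fast.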
diff --git a/drivers/gpu/drm/i915/display/intel_vblank.c b/drivers/gpu/drm/i915/display/intel_vblank.c
index 4efd4f7..7b240ce 100644
--- a/drivers/gpu/drm/i915/display/intel_vblank.c
+++ b/drivers/gpu/drm/i915/display/intel_vblank.c
@@ -222,7 +222,9 @@ int intel_crtc_scanline_offset(const struct intel_crtc_state *crtc_state)
* However if queried just before the start of vblank we'll get an
* answer that's slightly in the future.
*/
- if (DISPLAY_VER(display) == 2)
+ if (DISPLAY_VER(display) >= 20 || display->platform.battlemage)
+ return 1;
+ else if (DISPLAY_VER(display) == 2)
return -1;
else if (HAS_DDI(display) && intel_crtc_has_type(crtc_state, INTEL_OUTPUT_HDMI))
return 2;
diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c
index 9378d59..9ca4258 100644
--- a/drivers/gpu/drm/i915/gt/intel_rc6.c
+++ b/drivers/gpu/drm/i915/gt/intel_rc6.c
@@ -117,21 +117,10 @@ static void gen11_rc6_enable(struct intel_rc6 *rc6)
GEN6_RC_CTL_RC6_ENABLE |
GEN6_RC_CTL_EI_MODE(1);
- /*
- * BSpec 52698 - Render powergating must be off.
- * FIXME BSpec is outdated, disabling powergating for MTL is just
- * temporary wa and should be removed after fixing real cause
- * of forcewake timeouts.
- */
- if (IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 70), IP_VER(12, 74)))
- pg_enable =
- GEN9_MEDIA_PG_ENABLE |
- GEN11_MEDIA_SAMPLER_PG_ENABLE;
- else
- pg_enable =
- GEN9_RENDER_PG_ENABLE |
- GEN9_MEDIA_PG_ENABLE |
- GEN11_MEDIA_SAMPLER_PG_ENABLE;
+ pg_enable =
+ GEN9_RENDER_PG_ENABLE |
+ GEN9_MEDIA_PG_ENABLE |
+ GEN11_MEDIA_SAMPLER_PG_ENABLE;
if (GRAPHICS_VER(gt->i915) >= 12 && !IS_DG1(gt->i915)) {
for (i = 0; i < I915_MAX_VCS; i++)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
index d791f9b..456d337 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
@@ -317,6 +317,11 @@ void intel_huc_init_early(struct intel_huc *huc)
}
}
+void intel_huc_fini_late(struct intel_huc *huc)
+{
+ delayed_huc_load_fini(huc);
+}
+
#define HUC_LOAD_MODE_STRING(x) (x ? "GSC" : "legacy")
static int check_huc_loading_mode(struct intel_huc *huc)
{
@@ -414,12 +419,6 @@ int intel_huc_init(struct intel_huc *huc)
void intel_huc_fini(struct intel_huc *huc)
{
- /*
- * the fence is initialized in init_early, so we need to clean it up
- * even if HuC loading is off.
- */
- delayed_huc_load_fini(huc);
-
if (huc->heci_pkt)
i915_vma_unpin_and_release(&huc->heci_pkt, 0);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.h b/drivers/gpu/drm/i915/gt/uc/intel_huc.h
index d5e441b..921ad4b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_huc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.h
@@ -55,6 +55,7 @@ struct intel_huc {
int intel_huc_sanitize(struct intel_huc *huc);
void intel_huc_init_early(struct intel_huc *huc);
+void intel_huc_fini_late(struct intel_huc *huc);
int intel_huc_init(struct intel_huc *huc);
void intel_huc_fini(struct intel_huc *huc);
int intel_huc_auth(struct intel_huc *huc, enum intel_huc_authentication_type type);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 90ba1b0..4a3493e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -136,6 +136,7 @@ void intel_uc_init_late(struct intel_uc *uc)
void intel_uc_driver_late_release(struct intel_uc *uc)
{
+ intel_huc_fini_late(&uc->huc);
}
/**
diff --git a/drivers/gpu/drm/i915/gvt/opregion.c b/drivers/gpu/drm/i915/gvt/opregion.c
index 509f9cca..dbad4d8 100644
--- a/drivers/gpu/drm/i915/gvt/opregion.c
+++ b/drivers/gpu/drm/i915/gvt/opregion.c
@@ -222,7 +222,6 @@ int intel_vgpu_init_opregion(struct intel_vgpu *vgpu)
u8 *buf;
struct opregion_header *header;
struct vbt v;
- const char opregion_signature[16] = OPREGION_SIGNATURE;
gvt_dbg_core("init vgpu%d opregion\n", vgpu->id);
vgpu_opregion(vgpu)->va = (void *)__get_free_pages(GFP_KERNEL |
@@ -236,8 +235,10 @@ int intel_vgpu_init_opregion(struct intel_vgpu *vgpu)
/* emulated opregion with VBT mailbox only */
buf = (u8 *)vgpu_opregion(vgpu)->va;
header = (struct opregion_header *)buf;
- memcpy(header->signature, opregion_signature,
- sizeof(opregion_signature));
+
+ static_assert(sizeof(header->signature) == sizeof(OPREGION_SIGNATURE) - 1);
+ memcpy(header->signature, OPREGION_SIGNATURE, sizeof(header->signature));
+
header->size = 0x8;
header->opregion_ver = 0x02000000;
header->mboxes = MBOX_VBT;
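
Note on the hunk above: the signature field holds exactly the characters of the string literal with no terminating NUL, so the patch copies sizeof(header->signature) bytes straight from the literal and checks the sizes at compile time. A small C11 sketch of the same idiom follows. The struct layout and the names demo_header, DEMO_SIGNATURE and demo_header_set_signature are hypothetical, and the literal's value is an assumption about what OPREGION_SIGNATURE expands to.

```c
#include <assert.h>
#include <string.h>

/* Assumed value of the signature: 16 characters plus an implicit NUL. */
#define DEMO_SIGNATURE "IntelGraphicsMem"

/* Hypothetical cut-down header with a fixed-width, unterminated field. */
struct demo_header {
	char signature[16];
	unsigned int size;
};

/* sizeof a string literal counts the NUL, hence the "- 1". */
static_assert(sizeof(((struct demo_header *)0)->signature) ==
	      sizeof(DEMO_SIGNATURE) - 1,
	      "signature field must match the literal without its NUL");

static void demo_header_set_signature(struct demo_header *h)
{
	/* Copy only the field width, so the NUL is never written. */
	memcpy(h->signature, DEMO_SIGNATURE, sizeof(h->signature));
}
```

Compared with the removed code, this avoids declaring a 16-byte char array initialized from a 17-byte literal, a construct some compilers warn about.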
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ffc346379..54538b6f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -305,6 +305,7 @@ struct drm_i915_private {
INTEL_DRAM_DDR5,
INTEL_DRAM_LPDDR5,
INTEL_DRAM_GDDR,
+ INTEL_DRAM_GDDR_ECC,
} type;
u8 num_qgv_points;
u8 num_psf_gv_points;
diff --git a/drivers/gpu/drm/i915/selftests/i915_selftest.c b/drivers/gpu/drm/i915/selftests/i915_selftest.c
index fee76c1..8892818 100644
--- a/drivers/gpu/drm/i915/selftests/i915_selftest.c
+++ b/drivers/gpu/drm/i915/selftests/i915_selftest.c
@@ -23,7 +23,9 @@
#include <linux/random.h>
+#include "gt/intel_gt.h"
#include "gt/intel_gt_pm.h"
+#include "gt/intel_gt_regs.h"
#include "gt/uc/intel_gsc_fw.h"
#include "i915_driver.h"
@@ -253,11 +255,27 @@ int i915_mock_selftests(void)
int i915_live_selftests(struct pci_dev *pdev)
{
struct drm_i915_private *i915 = pdev_to_i915(pdev);
+ struct intel_uncore *uncore = &i915->uncore;
int err;
+ u32 pg_enable;
+ intel_wakeref_t wakeref;
if (!i915_selftest.live)
return 0;
+ /*
+ * FIXME Disable render powergating, this is temporary wa and should be removed
+ * after fixing real cause of forcewake timeouts.
+ */
+ with_intel_runtime_pm(uncore->rpm, wakeref) {
+ if (IS_GFX_GT_IP_RANGE(to_gt(i915), IP_VER(12, 00), IP_VER(12, 74))) {
+ pg_enable = intel_uncore_read(uncore, GEN9_PG_ENABLE);
+ if (pg_enable & GEN9_RENDER_PG_ENABLE)
+ intel_uncore_write_fw(uncore, GEN9_PG_ENABLE,
+ pg_enable & ~GEN9_RENDER_PG_ENABLE);
+ }
+ }
+
__wait_gsc_proxy_completed(i915);
__wait_gsc_huc_load_completed(i915);
diff --git a/drivers/gpu/drm/i915/soc/intel_dram.c b/drivers/gpu/drm/i915/soc/intel_dram.c
index 9e310f4..f60eedb 100644
--- a/drivers/gpu/drm/i915/soc/intel_dram.c
+++ b/drivers/gpu/drm/i915/soc/intel_dram.c
@@ -687,6 +687,10 @@ static int xelpdp_get_dram_info(struct drm_i915_private *i915)
drm_WARN_ON(&i915->drm, !IS_DGFX(i915));
dram_info->type = INTEL_DRAM_GDDR;
break;
+ case 9:
+ drm_WARN_ON(&i915->drm, !IS_DGFX(i915));
+ dram_info->type = INTEL_DRAM_GDDR_ECC;
+ break;
default:
MISSING_CASE(val);
return -EINVAL;
diff --git a/drivers/gpu/drm/imagination/pvr_fw.c b/drivers/gpu/drm/imagination/pvr_fw.c
index 3debc98..d09c4c6 100644
--- a/drivers/gpu/drm/imagination/pvr_fw.c
+++ b/drivers/gpu/drm/imagination/pvr_fw.c
@@ -732,7 +732,7 @@ pvr_fw_process(struct pvr_device *pvr_dev)
fw_mem->core_data, fw_mem->core_code_alloc_size);
if (err)
- goto err_free_fw_core_data_obj;
+ goto err_free_kdata;
memcpy(fw_code_ptr, fw_mem->code, fw_mem->code_alloc_size);
memcpy(fw_data_ptr, fw_mem->data, fw_mem->data_alloc_size);
@@ -742,10 +742,14 @@ pvr_fw_process(struct pvr_device *pvr_dev)
memcpy(fw_core_data_ptr, fw_mem->core_data, fw_mem->core_data_alloc_size);
/* We're finished with the firmware section memory on the CPU, unmap. */
- if (fw_core_data_ptr)
+ if (fw_core_data_ptr) {
pvr_fw_object_vunmap(fw_mem->core_data_obj);
- if (fw_core_code_ptr)
+ fw_core_data_ptr = NULL;
+ }
+ if (fw_core_code_ptr) {
pvr_fw_object_vunmap(fw_mem->core_code_obj);
+ fw_core_code_ptr = NULL;
+ }
pvr_fw_object_vunmap(fw_mem->data_obj);
fw_data_ptr = NULL;
pvr_fw_object_vunmap(fw_mem->code_obj);
@@ -753,7 +757,7 @@ pvr_fw_process(struct pvr_device *pvr_dev)
err = pvr_fw_create_fwif_connection_ctl(pvr_dev);
if (err)
- goto err_free_fw_core_data_obj;
+ goto err_free_kdata;
return 0;
@@ -763,13 +767,16 @@ pvr_fw_process(struct pvr_device *pvr_dev)
kfree(fw_mem->data);
kfree(fw_mem->code);
-err_free_fw_core_data_obj:
if (fw_core_data_ptr)
- pvr_fw_object_unmap_and_destroy(fw_mem->core_data_obj);
+ pvr_fw_object_vunmap(fw_mem->core_data_obj);
+ if (fw_mem->core_data_obj)
+ pvr_fw_object_destroy(fw_mem->core_data_obj);
err_free_fw_core_code_obj:
if (fw_core_code_ptr)
- pvr_fw_object_unmap_and_destroy(fw_mem->core_code_obj);
+ pvr_fw_object_vunmap(fw_mem->core_code_obj);
+ if (fw_mem->core_code_obj)
+ pvr_fw_object_destroy(fw_mem->core_code_obj);
err_free_fw_data_obj:
if (fw_data_ptr)
@@ -836,6 +843,12 @@ pvr_fw_cleanup(struct pvr_device *pvr_dev)
struct pvr_fw_mem *fw_mem = &pvr_dev->fw_dev.mem;
pvr_fw_fini_fwif_connection_ctl(pvr_dev);
+
+ kfree(fw_mem->core_data);
+ kfree(fw_mem->core_code);
+ kfree(fw_mem->data);
+ kfree(fw_mem->code);
+
if (fw_mem->core_code_obj)
pvr_fw_object_destroy(fw_mem->core_code_obj);
if (fw_mem->core_data_obj)
diff --git a/drivers/gpu/drm/imagination/pvr_job.c b/drivers/gpu/drm/imagination/pvr_job.c
index 1cdb3cf..59b334d 100644
--- a/drivers/gpu/drm/imagination/pvr_job.c
+++ b/drivers/gpu/drm/imagination/pvr_job.c
@@ -671,6 +671,13 @@ pvr_jobs_link_geom_frag(struct pvr_job_data *job_data, u32 *job_count)
geom_job->paired_job = frag_job;
frag_job->paired_job = geom_job;
+ /* The geometry job pvr_job structure is used when the fragment
+ * job is being prepared by the GPU scheduler. Have the fragment
+ * job hold a reference on the geometry job to prevent it being
+ * freed until the fragment job has finished with it.
+ */
+ pvr_job_get(geom_job);
+
/* Skip the fragment job we just paired to the geometry job. */
i++;
}
diff --git a/drivers/gpu/drm/imagination/pvr_queue.c b/drivers/gpu/drm/imagination/pvr_queue.c
index eba6930..5e9bc09 100644
--- a/drivers/gpu/drm/imagination/pvr_queue.c
+++ b/drivers/gpu/drm/imagination/pvr_queue.c
@@ -866,6 +866,10 @@ static void pvr_queue_free_job(struct drm_sched_job *sched_job)
struct pvr_job *job = container_of(sched_job, struct pvr_job, base);
drm_sched_job_cleanup(sched_job);
+
+ if (job->type == DRM_PVR_JOB_TYPE_FRAGMENT && job->paired_job)
+ pvr_job_put(job->paired_job);
+
job->paired_job = NULL;
pvr_job_put(job);
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index db961ea..2016c1e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -144,6 +144,9 @@ nouveau_bo_del_ttm(struct ttm_buffer_object *bo)
nouveau_bo_del_io_reserve_lru(bo);
nv10_bo_put_tile_region(dev, nvbo->tile, NULL);
+ if (bo->base.import_attach)
+ drm_prime_gem_destroy(&bo->base, bo->sg);
+
/*
* If nouveau_bo_new() allocated this buffer, the GEM object was never
* initialized, so don't attempt to release it.
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 9ae2cee..67e3c99 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -87,9 +87,6 @@ nouveau_gem_object_del(struct drm_gem_object *gem)
return;
}
- if (gem->import_attach)
- drm_prime_gem_destroy(gem, nvbo->bo.sg);
-
ttm_bo_put(&nvbo->bo);
pm_runtime_mark_last_busy(dev);
diff --git a/drivers/gpu/drm/rockchip/dw_hdmi_qp-rockchip.c b/drivers/gpu/drm/rockchip/dw_hdmi_qp-rockchip.c
index 3d1dddb..7d531b6 100644
--- a/drivers/gpu/drm/rockchip/dw_hdmi_qp-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw_hdmi_qp-rockchip.c
@@ -94,6 +94,7 @@ struct rockchip_hdmi_qp {
struct gpio_desc *enable_gpio;
struct delayed_work hpd_work;
int port_id;
+ const struct rockchip_hdmi_qp_ctrl_ops *ctrl_ops;
};
struct rockchip_hdmi_qp_ctrl_ops {
@@ -461,6 +462,7 @@ static int dw_hdmi_qp_rockchip_bind(struct device *dev, struct device *master,
return -ENODEV;
}
+ hdmi->ctrl_ops = cfg->ctrl_ops;
hdmi->dev = &pdev->dev;
hdmi->port_id = -ENODEV;
@@ -600,27 +602,8 @@ static void dw_hdmi_qp_rockchip_remove(struct platform_device *pdev)
static int __maybe_unused dw_hdmi_qp_rockchip_resume(struct device *dev)
{
struct rockchip_hdmi_qp *hdmi = dev_get_drvdata(dev);
- u32 val;
- val = HIWORD_UPDATE(RK3588_SCLIN_MASK, RK3588_SCLIN_MASK) |
- HIWORD_UPDATE(RK3588_SDAIN_MASK, RK3588_SDAIN_MASK) |
- HIWORD_UPDATE(RK3588_MODE_MASK, RK3588_MODE_MASK) |
- HIWORD_UPDATE(RK3588_I2S_SEL_MASK, RK3588_I2S_SEL_MASK);
- regmap_write(hdmi->vo_regmap,
- hdmi->port_id ? RK3588_GRF_VO1_CON6 : RK3588_GRF_VO1_CON3,
- val);
-
- val = HIWORD_UPDATE(RK3588_SET_HPD_PATH_MASK,
- RK3588_SET_HPD_PATH_MASK);
- regmap_write(hdmi->regmap, RK3588_GRF_SOC_CON7, val);
-
- if (hdmi->port_id)
- val = HIWORD_UPDATE(RK3588_HDMI1_GRANT_SEL,
- RK3588_HDMI1_GRANT_SEL);
- else
- val = HIWORD_UPDATE(RK3588_HDMI0_GRANT_SEL,
- RK3588_HDMI0_GRANT_SEL);
- regmap_write(hdmi->vo_regmap, RK3588_GRF_VO1_CON9, val);
+ hdmi->ctrl_ops->io_init(hdmi);
dw_hdmi_qp_resume(dev, hdmi->hdmi);
diff --git a/drivers/gpu/drm/rockchip/rockchip_vop2_reg.c b/drivers/gpu/drm/rockchip/rockchip_vop2_reg.c
index 14958d6..0a2840c 100644
--- a/drivers/gpu/drm/rockchip/rockchip_vop2_reg.c
+++ b/drivers/gpu/drm/rockchip/rockchip_vop2_reg.c
@@ -1754,9 +1754,9 @@ static unsigned long rk3588_set_intf_mux(struct vop2_video_port *vp, int id, u32
dip |= FIELD_PREP(RK3588_DSP_IF_POL__DP0_PIN_POL, polflags);
break;
case ROCKCHIP_VOP2_EP_DP1:
- die &= ~RK3588_SYS_DSP_INFACE_EN_MIPI1_MUX;
- die |= RK3588_SYS_DSP_INFACE_EN_MIPI1 |
- FIELD_PREP(RK3588_SYS_DSP_INFACE_EN_MIPI1_MUX, vp->id);
+ die &= ~RK3588_SYS_DSP_INFACE_EN_DP1_MUX;
+ die |= RK3588_SYS_DSP_INFACE_EN_DP1 |
+ FIELD_PREP(RK3588_SYS_DSP_INFACE_EN_DP1_MUX, vp->id);
dip &= ~RK3588_DSP_IF_POL__DP1_PIN_POL;
dip |= FIELD_PREP(RK3588_DSP_IF_POL__DP1_PIN_POL, polflags);
break;
diff --git a/drivers/gpu/drm/sti/Makefile b/drivers/gpu/drm/sti/Makefile
index f203ac5..f778a4e 100644
--- a/drivers/gpu/drm/sti/Makefile
+++ b/drivers/gpu/drm/sti/Makefile
@@ -7,8 +7,6 @@
sti_compositor.o \
sti_crtc.o \
sti_plane.o \
- sti_crtc.o \
- sti_plane.o \
sti_hdmi.o \
sti_hdmi_tx3g4c28phy.o \
sti_dvo.o \
diff --git a/drivers/gpu/drm/tests/drm_client_modeset_test.c b/drivers/gpu/drm/tests/drm_client_modeset_test.c
index 7516f6c..b2fdb1a 100644
--- a/drivers/gpu/drm/tests/drm_client_modeset_test.c
+++ b/drivers/gpu/drm/tests/drm_client_modeset_test.c
@@ -95,6 +95,9 @@ static void drm_test_pick_cmdline_res_1920_1080_60(struct kunit *test)
expected_mode = drm_mode_find_dmt(priv->drm, 1920, 1080, 60, false);
KUNIT_ASSERT_NOT_NULL(test, expected_mode);
+ ret = drm_kunit_add_mode_destroy_action(test, expected_mode);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
KUNIT_ASSERT_TRUE(test,
drm_mode_parse_command_line_for_connector(cmdline,
connector,
@@ -129,7 +132,8 @@ static void drm_test_pick_cmdline_named(struct kunit *test)
struct drm_device *drm = priv->drm;
struct drm_connector *connector = &priv->connector;
struct drm_cmdline_mode *cmdline_mode = &connector->cmdline_mode;
- const struct drm_display_mode *expected_mode, *mode;
+ const struct drm_display_mode *mode;
+ struct drm_display_mode *expected_mode;
const char *cmdline = params->cmdline;
int ret;
@@ -149,6 +153,9 @@ static void drm_test_pick_cmdline_named(struct kunit *test)
expected_mode = params->func(drm);
KUNIT_ASSERT_NOT_NULL(test, expected_mode);
+ ret = drm_kunit_add_mode_destroy_action(test, expected_mode);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
KUNIT_EXPECT_TRUE(test, drm_mode_equal(expected_mode, mode));
}
diff --git a/drivers/gpu/drm/tests/drm_cmdline_parser_test.c b/drivers/gpu/drm/tests/drm_cmdline_parser_test.c
index 59c8408..1cfcb59 100644
--- a/drivers/gpu/drm/tests/drm_cmdline_parser_test.c
+++ b/drivers/gpu/drm/tests/drm_cmdline_parser_test.c
@@ -7,6 +7,7 @@
#include <kunit/test.h>
#include <drm/drm_connector.h>
+#include <drm/drm_kunit_helpers.h>
#include <drm/drm_modes.h>
static const struct drm_connector no_connector = {};
@@ -955,8 +956,15 @@ struct drm_cmdline_tv_option_test {
static void drm_test_cmdline_tv_options(struct kunit *test)
{
const struct drm_cmdline_tv_option_test *params = test->param_value;
- const struct drm_display_mode *expected_mode = params->mode_fn(NULL);
+ struct drm_display_mode *expected_mode;
struct drm_cmdline_mode mode = { };
+ int ret;
+
+ expected_mode = params->mode_fn(NULL);
+ KUNIT_ASSERT_NOT_NULL(test, expected_mode);
+
+ ret = drm_kunit_add_mode_destroy_action(test, expected_mode);
+ KUNIT_ASSERT_EQ(test, ret, 0);
KUNIT_EXPECT_TRUE(test, drm_mode_parse_command_line_for_connector(params->cmdline,
&no_connector, &mode));
diff --git a/drivers/gpu/drm/tests/drm_kunit_helpers.c b/drivers/gpu/drm/tests/drm_kunit_helpers.c
index a4eb68f..6f6616c 100644
--- a/drivers/gpu/drm/tests/drm_kunit_helpers.c
+++ b/drivers/gpu/drm/tests/drm_kunit_helpers.c
@@ -279,6 +279,28 @@ static void kunit_action_drm_mode_destroy(void *ptr)
}
/**
+ * drm_kunit_add_mode_destroy_action() - Add a drm_mode_destroy kunit action
+ * @test: The test context object
+ * @mode: The drm_display_mode to destroy eventually
+ *
+ * Registers a kunit action that will destroy the drm_display_mode at
+ * the end of the test.
+ *
+ * If an error occurs, the drm_display_mode will be destroyed.
+ *
+ * Returns:
+ * 0 on success, an error code otherwise.
+ */
+int drm_kunit_add_mode_destroy_action(struct kunit *test,
+ struct drm_display_mode *mode)
+{
+ return kunit_add_action_or_reset(test,
+ kunit_action_drm_mode_destroy,
+ mode);
+}
+EXPORT_SYMBOL_GPL(drm_kunit_add_mode_destroy_action);
+
+/**
* drm_kunit_display_mode_from_cea_vic() - return a mode for CEA VIC for a KUnit test
* @test: The test context object
* @dev: DRM device
diff --git a/drivers/gpu/drm/tests/drm_modes_test.c b/drivers/gpu/drm/tests/drm_modes_test.c
index 6ed51f9..f5b20f9 100644
--- a/drivers/gpu/drm/tests/drm_modes_test.c
+++ b/drivers/gpu/drm/tests/drm_modes_test.c
@@ -40,6 +40,7 @@ static void drm_test_modes_analog_tv_ntsc_480i(struct kunit *test)
{
struct drm_test_modes_priv *priv = test->priv;
struct drm_display_mode *mode;
+ int ret;
mode = drm_analog_tv_mode(priv->drm,
DRM_MODE_TV_MODE_NTSC,
@@ -47,6 +48,9 @@ static void drm_test_modes_analog_tv_ntsc_480i(struct kunit *test)
true);
KUNIT_ASSERT_NOT_NULL(test, mode);
+ ret = drm_kunit_add_mode_destroy_action(test, mode);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
KUNIT_EXPECT_EQ(test, drm_mode_vrefresh(mode), 60);
KUNIT_EXPECT_EQ(test, mode->hdisplay, 720);
@@ -70,6 +74,7 @@ static void drm_test_modes_analog_tv_ntsc_480i_inlined(struct kunit *test)
{
struct drm_test_modes_priv *priv = test->priv;
struct drm_display_mode *expected, *mode;
+ int ret;
expected = drm_analog_tv_mode(priv->drm,
DRM_MODE_TV_MODE_NTSC,
@@ -77,9 +82,15 @@ static void drm_test_modes_analog_tv_ntsc_480i_inlined(struct kunit *test)
true);
KUNIT_ASSERT_NOT_NULL(test, expected);
+ ret = drm_kunit_add_mode_destroy_action(test, expected);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
mode = drm_mode_analog_ntsc_480i(priv->drm);
KUNIT_ASSERT_NOT_NULL(test, mode);
+ ret = drm_kunit_add_mode_destroy_action(test, mode);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
KUNIT_EXPECT_TRUE(test, drm_mode_equal(expected, mode));
}
@@ -87,6 +98,7 @@ static void drm_test_modes_analog_tv_pal_576i(struct kunit *test)
{
struct drm_test_modes_priv *priv = test->priv;
struct drm_display_mode *mode;
+ int ret;
mode = drm_analog_tv_mode(priv->drm,
DRM_MODE_TV_MODE_PAL,
@@ -94,6 +106,9 @@ static void drm_test_modes_analog_tv_pal_576i(struct kunit *test)
true);
KUNIT_ASSERT_NOT_NULL(test, mode);
+ ret = drm_kunit_add_mode_destroy_action(test, mode);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
KUNIT_EXPECT_EQ(test, drm_mode_vrefresh(mode), 50);
KUNIT_EXPECT_EQ(test, mode->hdisplay, 720);
@@ -117,6 +132,7 @@ static void drm_test_modes_analog_tv_pal_576i_inlined(struct kunit *test)
{
struct drm_test_modes_priv *priv = test->priv;
struct drm_display_mode *expected, *mode;
+ int ret;
expected = drm_analog_tv_mode(priv->drm,
DRM_MODE_TV_MODE_PAL,
@@ -124,9 +140,15 @@ static void drm_test_modes_analog_tv_pal_576i_inlined(struct kunit *test)
true);
KUNIT_ASSERT_NOT_NULL(test, expected);
+ ret = drm_kunit_add_mode_destroy_action(test, expected);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
mode = drm_mode_analog_pal_576i(priv->drm);
KUNIT_ASSERT_NOT_NULL(test, mode);
+ ret = drm_kunit_add_mode_destroy_action(test, mode);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
KUNIT_EXPECT_TRUE(test, drm_mode_equal(expected, mode));
}
@@ -134,6 +156,7 @@ static void drm_test_modes_analog_tv_mono_576i(struct kunit *test)
{
struct drm_test_modes_priv *priv = test->priv;
struct drm_display_mode *mode;
+ int ret;
mode = drm_analog_tv_mode(priv->drm,
DRM_MODE_TV_MODE_MONOCHROME,
@@ -141,6 +164,9 @@ static void drm_test_modes_analog_tv_mono_576i(struct kunit *test)
true);
KUNIT_ASSERT_NOT_NULL(test, mode);
+ ret = drm_kunit_add_mode_destroy_action(test, mode);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
KUNIT_EXPECT_EQ(test, drm_mode_vrefresh(mode), 50);
KUNIT_EXPECT_EQ(test, mode->hdisplay, 720);
diff --git a/drivers/gpu/drm/tests/drm_probe_helper_test.c b/drivers/gpu/drm/tests/drm_probe_helper_test.c
index bc09ff3..db0e4f5 100644
--- a/drivers/gpu/drm/tests/drm_probe_helper_test.c
+++ b/drivers/gpu/drm/tests/drm_probe_helper_test.c
@@ -98,7 +98,7 @@ drm_test_connector_helper_tv_get_modes_check(struct kunit *test)
struct drm_connector *connector = &priv->connector;
struct drm_cmdline_mode *cmdline = &connector->cmdline_mode;
struct drm_display_mode *mode;
- const struct drm_display_mode *expected;
+ struct drm_display_mode *expected;
size_t len;
int ret;
@@ -134,6 +134,9 @@ drm_test_connector_helper_tv_get_modes_check(struct kunit *test)
KUNIT_EXPECT_TRUE(test, drm_mode_equal(mode, expected));
KUNIT_EXPECT_TRUE(test, mode->type & DRM_MODE_TYPE_PREFERRED);
+
+ ret = drm_kunit_add_mode_destroy_action(test, expected);
+ KUNIT_ASSERT_EQ(test, ret, 0);
}
if (params->num_expected_modes >= 2) {
@@ -145,6 +148,9 @@ drm_test_connector_helper_tv_get_modes_check(struct kunit *test)
KUNIT_EXPECT_TRUE(test, drm_mode_equal(mode, expected));
KUNIT_EXPECT_FALSE(test, mode->type & DRM_MODE_TYPE_PREFERRED);
+
+ ret = drm_kunit_add_mode_destroy_action(test, expected);
+ KUNIT_ASSERT_EQ(test, ret, 0);
}
mutex_unlock(&priv->drm->mode_config.mutex);
diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c
index dde8fc1..90c99d8 100644
--- a/drivers/gpu/drm/virtio/virtgpu_gem.c
+++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
@@ -115,13 +115,14 @@ int virtio_gpu_gem_object_open(struct drm_gem_object *obj,
if (!vgdev->has_context_init)
virtio_gpu_create_context(obj->dev, file);
- objs = virtio_gpu_array_alloc(1);
- if (!objs)
- return -ENOMEM;
- virtio_gpu_array_add_obj(objs, obj);
+ if (vfpriv->context_created) {
+ objs = virtio_gpu_array_alloc(1);
+ if (!objs)
+ return -ENOMEM;
+ virtio_gpu_array_add_obj(objs, obj);
- if (vfpriv->ctx_id)
virtio_gpu_cmd_context_attach_resource(vgdev, vfpriv->ctx_id, objs);
+ }
out_notify:
virtio_gpu_notify(vgdev);
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index a6f5a78..87e584a 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -366,12 +366,6 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
return 0;
obj = new_state->fb->obj[0];
- if (obj->import_attach) {
- ret = virtio_gpu_prepare_imported_obj(plane, new_state, obj);
- if (ret)
- return ret;
- }
-
if (bo->dumb || obj->import_attach) {
vgplane_st->fence = virtio_gpu_fence_alloc(vgdev,
vgdev->fence_drv.context,
@@ -380,7 +374,21 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
return -ENOMEM;
}
+ if (obj->import_attach) {
+ ret = virtio_gpu_prepare_imported_obj(plane, new_state, obj);
+ if (ret)
+ goto err_fence;
+ }
+
return 0;
+
+err_fence:
+ if (vgplane_st->fence) {
+ dma_fence_put(&vgplane_st->fence->f);
+ vgplane_st->fence = NULL;
+ }
+
+ return ret;
}
static void virtio_gpu_cleanup_imported_obj(struct drm_gem_object *obj)
diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c b/drivers/gpu/drm/virtio/virtgpu_prime.c
index fe6a0b0..4de2a63 100644
--- a/drivers/gpu/drm/virtio/virtgpu_prime.c
+++ b/drivers/gpu/drm/virtio/virtgpu_prime.c
@@ -321,6 +321,7 @@ struct drm_gem_object *virtgpu_gem_prime_import(struct drm_device *dev,
return ERR_PTR(-ENOMEM);
obj = &bo->base.base;
+ obj->resv = buf->resv;
obj->funcs = &virtgpu_gem_dma_buf_funcs;
drm_gem_private_object_init(dev, obj, buf->size);
diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
index a255946..8cfcd33 100644
--- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
@@ -41,6 +41,7 @@
#define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
+#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* gen12 */
#define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /* gen12 */
#define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 72ef0b6..9f8667e 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -585,6 +585,7 @@ struct xe_device {
INTEL_DRAM_DDR5,
INTEL_DRAM_LPDDR5,
INTEL_DRAM_GDDR,
+ INTEL_DRAM_GDDR_ECC,
} type;
u8 num_qgv_points;
u8 num_psf_gv_points;
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 03072e0..084cbde 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -322,6 +322,13 @@ int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt)
return 0;
}
+/*
+ * Ensure that roundup_pow_of_two(length) doesn't overflow.
+ * Note that roundup_pow_of_two() operates on unsigned long,
+ * not on u64.
+ */
+#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
+
/**
* xe_gt_tlb_invalidation_range - Issue a TLB invalidation on this GT for an
* address range
@@ -346,6 +353,7 @@ int xe_gt_tlb_invalidation_range(struct xe_gt *gt,
struct xe_device *xe = gt_to_xe(gt);
#define MAX_TLB_INVALIDATION_LEN 7
u32 action[MAX_TLB_INVALIDATION_LEN];
+ u64 length = end - start;
int len = 0;
xe_gt_assert(gt, fence);
@@ -358,11 +366,11 @@ int xe_gt_tlb_invalidation_range(struct xe_gt *gt,
action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
action[len++] = 0; /* seqno, replaced in send_tlb_invalidation */
- if (!xe->info.has_range_tlb_invalidation) {
+ if (!xe->info.has_range_tlb_invalidation ||
+ length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
} else {
u64 orig_start = start;
- u64 length = end - start;
u64 align;
if (length < SZ_4K)
diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
index 8521531..43b1192 100644
--- a/drivers/gpu/drm/xe/xe_guc_pc.c
+++ b/drivers/gpu/drm/xe/xe_guc_pc.c
@@ -1070,6 +1070,7 @@ int xe_guc_pc_start(struct xe_guc_pc *pc)
if (wait_for_pc_state(pc, SLPC_GLOBAL_STATE_RUNNING,
SLPC_RESET_EXTENDED_TIMEOUT_MS)) {
xe_gt_err(gt, "GuC PC Start failed: Dynamic GT frequency control and GT sleep states are now disabled.\n");
+ ret = -EIO;
goto out;
}
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
index 8c05fd3..93241fd 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine.c
@@ -389,12 +389,6 @@ xe_hw_engine_setup_default_lrc_state(struct xe_hw_engine *hwe)
blit_cctl_val,
XE_RTP_ACTION_FLAG(ENGINE_BASE)))
},
- /* Use Fixed slice CCS mode */
- { XE_RTP_NAME("RCU_MODE_FIXED_SLICE_CCS_MODE"),
- XE_RTP_RULES(FUNC(xe_hw_engine_match_fixed_cslice_mode)),
- XE_RTP_ACTIONS(FIELD_SET(RCU_MODE, RCU_MODE_FIXED_SLICE_CCS_MODE,
- RCU_MODE_FIXED_SLICE_CCS_MODE))
- },
/* Disable WMTP if HW doesn't support it */
{ XE_RTP_NAME("DISABLE_WMTP_ON_UNSUPPORTED_HW"),
XE_RTP_RULES(FUNC(xe_rtp_cfeg_wmtp_disabled)),
@@ -461,6 +455,12 @@ hw_engine_setup_default_state(struct xe_hw_engine *hwe)
XE_RTP_ACTIONS(SET(CSFE_CHICKEN1(0), CS_PRIORITY_MEM_READ,
XE_RTP_ACTION_FLAG(ENGINE_BASE)))
},
+ /* Use Fixed slice CCS mode */
+ { XE_RTP_NAME("RCU_MODE_FIXED_SLICE_CCS_MODE"),
+ XE_RTP_RULES(FUNC(xe_hw_engine_match_fixed_cslice_mode)),
+ XE_RTP_ACTIONS(FIELD_SET(RCU_MODE, RCU_MODE_FIXED_SLICE_CCS_MODE,
+ RCU_MODE_FIXED_SLICE_CCS_MODE))
+ },
};
xe_rtp_process_to_sr(&ctx, engine_entries, ARRAY_SIZE(engine_entries), &hwe->reg_sr);
diff --git a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c
index b53e8d2..a440442 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c
@@ -32,14 +32,61 @@ bool xe_hw_engine_timeout_in_range(u64 timeout, u64 min, u64 max)
return timeout >= min && timeout <= max;
}
-static void kobj_xe_hw_engine_release(struct kobject *kobj)
+static void xe_hw_engine_sysfs_kobj_release(struct kobject *kobj)
{
kfree(kobj);
}
+static ssize_t xe_hw_engine_class_sysfs_attr_show(struct kobject *kobj,
+ struct attribute *attr,
+ char *buf)
+{
+ struct xe_device *xe = kobj_to_xe(kobj);
+ struct kobj_attribute *kattr;
+ ssize_t ret = -EIO;
+
+ kattr = container_of(attr, struct kobj_attribute, attr);
+ if (kattr->show) {
+ xe_pm_runtime_get(xe);
+ ret = kattr->show(kobj, kattr, buf);
+ xe_pm_runtime_put(xe);
+ }
+
+ return ret;
+}
+
+static ssize_t xe_hw_engine_class_sysfs_attr_store(struct kobject *kobj,
+ struct attribute *attr,
+ const char *buf,
+ size_t count)
+{
+ struct xe_device *xe = kobj_to_xe(kobj);
+ struct kobj_attribute *kattr;
+ ssize_t ret = -EIO;
+
+ kattr = container_of(attr, struct kobj_attribute, attr);
+ if (kattr->store) {
+ xe_pm_runtime_get(xe);
+ ret = kattr->store(kobj, kattr, buf, count);
+ xe_pm_runtime_put(xe);
+ }
+
+ return ret;
+}
+
+static const struct sysfs_ops xe_hw_engine_class_sysfs_ops = {
+ .show = xe_hw_engine_class_sysfs_attr_show,
+ .store = xe_hw_engine_class_sysfs_attr_store,
+};
+
static const struct kobj_type kobj_xe_hw_engine_type = {
- .release = kobj_xe_hw_engine_release,
- .sysfs_ops = &kobj_sysfs_ops
+ .release = xe_hw_engine_sysfs_kobj_release,
+ .sysfs_ops = &xe_hw_engine_class_sysfs_ops,
+};
+
+static const struct kobj_type kobj_xe_hw_engine_type_def = {
+ .release = xe_hw_engine_sysfs_kobj_release,
+ .sysfs_ops = &kobj_sysfs_ops,
};
static ssize_t job_timeout_max_store(struct kobject *kobj,
@@ -543,7 +590,7 @@ static int xe_add_hw_engine_class_defaults(struct xe_device *xe,
if (!kobj)
return -ENOMEM;
- kobject_init(kobj, &kobj_xe_hw_engine_type);
+ kobject_init(kobj, &kobj_xe_hw_engine_type_def);
err = kobject_add(kobj, parent, "%s", ".defaults");
if (err)
goto err_object;
@@ -559,57 +606,6 @@ static int xe_add_hw_engine_class_defaults(struct xe_device *xe,
return err;
}
-static void xe_hw_engine_sysfs_kobj_release(struct kobject *kobj)
-{
- kfree(kobj);
-}
-
-static ssize_t xe_hw_engine_class_sysfs_attr_show(struct kobject *kobj,
- struct attribute *attr,
- char *buf)
-{
- struct xe_device *xe = kobj_to_xe(kobj);
- struct kobj_attribute *kattr;
- ssize_t ret = -EIO;
-
- kattr = container_of(attr, struct kobj_attribute, attr);
- if (kattr->show) {
- xe_pm_runtime_get(xe);
- ret = kattr->show(kobj, kattr, buf);
- xe_pm_runtime_put(xe);
- }
-
- return ret;
-}
-
-static ssize_t xe_hw_engine_class_sysfs_attr_store(struct kobject *kobj,
- struct attribute *attr,
- const char *buf,
- size_t count)
-{
- struct xe_device *xe = kobj_to_xe(kobj);
- struct kobj_attribute *kattr;
- ssize_t ret = -EIO;
-
- kattr = container_of(attr, struct kobj_attribute, attr);
- if (kattr->store) {
- xe_pm_runtime_get(xe);
- ret = kattr->store(kobj, kattr, buf, count);
- xe_pm_runtime_put(xe);
- }
-
- return ret;
-}
-
-static const struct sysfs_ops xe_hw_engine_class_sysfs_ops = {
- .show = xe_hw_engine_class_sysfs_attr_show,
- .store = xe_hw_engine_class_sysfs_attr_store,
-};
-
-static const struct kobj_type xe_hw_engine_sysfs_kobj_type = {
- .release = xe_hw_engine_sysfs_kobj_release,
- .sysfs_ops = &xe_hw_engine_class_sysfs_ops,
-};
static void hw_engine_class_sysfs_fini(void *arg)
{
@@ -640,7 +636,7 @@ int xe_hw_engine_class_sysfs_init(struct xe_gt *gt)
if (!kobj)
return -ENOMEM;
- kobject_init(kobj, &xe_hw_engine_sysfs_kobj_type);
+ kobject_init(kobj, &kobj_xe_hw_engine_type);
err = kobject_add(kobj, gt->sysfs, "engines");
if (err)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index df4282c..5a3e890 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1177,7 +1177,7 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
err_sync:
/* Sync partial copies if any. FIXME: job_mutex? */
if (fence) {
- dma_fence_wait(m->fence, false);
+ dma_fence_wait(fence, false);
dma_fence_put(fence);
}
@@ -1547,7 +1547,7 @@ void xe_migrate_wait(struct xe_migrate *m)
static u32 pte_update_cmd_size(u64 size)
{
u32 num_dword;
- u64 entries = DIV_ROUND_UP(size, XE_PAGE_SIZE);
+ u64 entries = DIV_U64_ROUND_UP(size, XE_PAGE_SIZE);
XE_WARN_ON(size > MAX_PREEMPTDISABLE_TRANSFER);
/*
@@ -1558,7 +1558,7 @@ static u32 pte_update_cmd_size(u64 size)
* 2 dword for the page table's physical location
* 2*n dword for value of pte to fill (each pte entry is 2 dwords)
*/
- num_dword = (1 + 2) * DIV_ROUND_UP(entries, 0x1ff);
+ num_dword = (1 + 2) * DIV_U64_ROUND_UP(entries, 0x1ff);
num_dword += entries * 2;
return num_dword;
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 917fc16..a7582b0 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -137,7 +137,8 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
int i)
{
- u32 flags = PIPE_CONTROL_CS_STALL |
+ u32 flags0 = 0;
+ u32 flags1 = PIPE_CONTROL_CS_STALL |
PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
@@ -148,11 +149,15 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
PIPE_CONTROL_STORE_DATA_INDEX;
if (invalidate_tlb)
- flags |= PIPE_CONTROL_TLB_INVALIDATE;
+ flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
- flags &= ~mask_flags;
+ flags1 &= ~mask_flags;
- return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
+ if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
+ flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;
+
+ return emit_pipe_control(dw, i, flags0, flags1,
+ LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
}
static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 3e829c87..f8c1285 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -696,11 +696,14 @@ static int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
list_for_each_entry(block, blocks, link)
block->private = vr;
+ xe_bo_get(bo);
err = drm_gpusvm_migrate_to_devmem(&vm->svm.gpusvm, &range->base,
&bo->devmem_allocation, ctx);
- xe_bo_unlock(bo);
if (err)
- xe_bo_put(bo); /* Creation ref */
+ xe_svm_devmem_release(&bo->devmem_allocation);
+
+ xe_bo_unlock(bo);
+ xe_bo_put(bo);
unlock:
mmap_read_unlock(mm);
diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules
index 0c738af..9b9e176 100644
--- a/drivers/gpu/drm/xe/xe_wa_oob.rules
+++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
@@ -32,8 +32,10 @@
GRAPHICS_VERSION(3001)
14022293748 GRAPHICS_VERSION(2001)
GRAPHICS_VERSION(2004)
+ GRAPHICS_VERSION_RANGE(3000, 3001)
22019794406 GRAPHICS_VERSION(2001)
GRAPHICS_VERSION(2004)
+ GRAPHICS_VERSION_RANGE(3000, 3001)
22019338487 MEDIA_VERSION(2000)
GRAPHICS_VERSION(2001)
MEDIA_VERSION(3000), MEDIA_STEP(A0, B0), FUNC(xe_rtp_match_not_sriov_vf)
diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
index d525ab4..dd7d030 100644
--- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
+++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
@@ -487,17 +487,6 @@ static int tegra241_cmdqv_hw_reset(struct arm_smmu_device *smmu)
/* VCMDQ Resource Helpers */
-static void tegra241_vcmdq_free_smmu_cmdq(struct tegra241_vcmdq *vcmdq)
-{
- struct arm_smmu_queue *q = &vcmdq->cmdq.q;
- size_t nents = 1 << q->llq.max_n_shift;
- size_t qsz = nents << CMDQ_ENT_SZ_SHIFT;
-
- if (!q->base)
- return;
- dmam_free_coherent(vcmdq->cmdqv->smmu.dev, qsz, q->base, q->base_dma);
-}
-
static int tegra241_vcmdq_alloc_smmu_cmdq(struct tegra241_vcmdq *vcmdq)
{
struct arm_smmu_device *smmu = &vcmdq->cmdqv->smmu;
@@ -560,7 +549,8 @@ static void tegra241_vintf_free_lvcmdq(struct tegra241_vintf *vintf, u16 lidx)
struct tegra241_vcmdq *vcmdq = vintf->lvcmdqs[lidx];
char header[64];
- tegra241_vcmdq_free_smmu_cmdq(vcmdq);
+ /* Note that the lvcmdq queue memory space is managed by devres */
+
tegra241_vintf_deinit_lvcmdq(vintf, lidx);
dev_dbg(vintf->cmdqv->dev,
@@ -768,13 +758,13 @@ static int tegra241_cmdqv_init_structures(struct arm_smmu_device *smmu)
vintf = kzalloc(sizeof(*vintf), GFP_KERNEL);
if (!vintf)
- goto out_fallback;
+ return -ENOMEM;
/* Init VINTF0 for in-kernel use */
ret = tegra241_cmdqv_init_vintf(cmdqv, 0, vintf);
if (ret) {
dev_err(cmdqv->dev, "failed to init vintf0: %d\n", ret);
- goto free_vintf;
+ return ret;
}
/* Preallocate logical VCMDQs to VINTF0 */
@@ -783,24 +773,12 @@ static int tegra241_cmdqv_init_structures(struct arm_smmu_device *smmu)
vcmdq = tegra241_vintf_alloc_lvcmdq(vintf, lidx);
if (IS_ERR(vcmdq))
- goto free_lvcmdq;
+ return PTR_ERR(vcmdq);
}
/* Now, we are ready to run all the impl ops */
smmu->impl_ops = &tegra241_cmdqv_impl_ops;
return 0;
-
-free_lvcmdq:
- for (lidx--; lidx >= 0; lidx--)
- tegra241_vintf_free_lvcmdq(vintf, lidx);
- tegra241_cmdqv_deinit_vintf(cmdqv, vintf->idx);
-free_vintf:
- kfree(vintf);
-out_fallback:
- dev_info(smmu->impl_dev, "Falling back to standard SMMU CMDQ\n");
- smmu->options &= ~ARM_SMMU_OPT_TEGRA241_CMDQV;
- tegra241_cmdqv_remove(smmu);
- return 0;
}
#ifdef CONFIG_IOMMU_DEBUGFS
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index cb7e29d..a775e4d 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1754,7 +1754,7 @@ static size_t cookie_msi_granule(const struct iommu_domain *domain)
return PAGE_SIZE;
default:
BUG();
- };
+ }
}
static struct list_head *cookie_msi_pages(const struct iommu_domain *domain)
@@ -1766,7 +1766,7 @@ static struct list_head *cookie_msi_pages(const struct iommu_domain *domain)
return &domain->msi_cookie->msi_page_list;
default:
BUG();
- };
+ }
}
static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 69e23e0..317266a 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -832,7 +832,7 @@ static int __maybe_unused exynos_sysmmu_suspend(struct device *dev)
struct exynos_iommu_owner *owner = dev_iommu_priv_get(master);
mutex_lock(&owner->rpm_lock);
- if (&data->domain->domain != &exynos_identity_domain) {
+ if (data->domain) {
dev_dbg(data->sysmmu, "saving state\n");
__sysmmu_disable(data);
}
@@ -850,7 +850,7 @@ static int __maybe_unused exynos_sysmmu_resume(struct device *dev)
struct exynos_iommu_owner *owner = dev_iommu_priv_get(master);
mutex_lock(&owner->rpm_lock);
- if (&data->domain->domain != &exynos_identity_domain) {
+ if (data->domain) {
dev_dbg(data->sysmmu, "restoring state\n");
__sysmmu_enable(data);
}
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6e67cc6..b29da2d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3835,7 +3835,6 @@ static void intel_iommu_release_device(struct device *dev)
intel_pasid_free_table(dev);
intel_iommu_debugfs_remove_dev(info);
kfree(info);
- set_dma_ops(dev, NULL);
}
static void intel_iommu_get_resv_regions(struct device *device,
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index ea3ca52..3bc2a03 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1287,43 +1287,44 @@ static struct irq_chip intel_ir_chip = {
};
/*
- * With posted MSIs, all vectors are multiplexed into a single notification
- * vector. Devices MSIs are then dispatched in a demux loop where
- * EOIs can be coalesced as well.
+ * With posted MSIs, the MSI vectors are multiplexed into a single notification
+ * vector, and only the notification vector is sent to the APIC IRR. Device
+ * MSIs are then dispatched in a demux loop that harvests the MSIs from the
+ * CPU's Posted Interrupt Request bitmap. I.e. Posted MSIs never get sent to
+ * the APIC IRR, and thus do not need an EOI. The notification handler instead
+ * performs a single EOI after processing the PIR.
*
- * "INTEL-IR-POST" IRQ chip does not do EOI on ACK, thus the dummy irq_ack()
- * function. Instead EOI is performed by the posted interrupt notification
- * handler.
+ * Note! Pending SMP/CPU affinity changes, which are per MSI, must still be
+ * honored, only the APIC EOI is omitted.
*
* For the example below, 3 MSIs are coalesced into one CPU notification. Only
- * one apic_eoi() is needed.
+ * one apic_eoi() is needed, but each MSI needs to process pending changes to
+ * its CPU affinity.
*
* __sysvec_posted_msi_notification()
* irq_enter();
* handle_edge_irq()
* irq_chip_ack_parent()
- * dummy(); // No EOI
+ * irq_move_irq(); // No EOI
* handle_irq_event()
* driver_handler()
* handle_edge_irq()
* irq_chip_ack_parent()
- * dummy(); // No EOI
+ * irq_move_irq(); // No EOI
* handle_irq_event()
* driver_handler()
* handle_edge_irq()
* irq_chip_ack_parent()
- * dummy(); // No EOI
+ * irq_move_irq(); // No EOI
* handle_irq_event()
* driver_handler()
* apic_eoi()
* irq_exit()
+ *
*/
-
-static void dummy_ack(struct irq_data *d) { }
-
static struct irq_chip intel_ir_chip_post_msi = {
.name = "INTEL-IR-POST",
- .irq_ack = dummy_ack,
+ .irq_ack = irq_move_irq,
.irq_set_affinity = intel_ir_set_affinity,
.irq_compose_msi_msg = intel_ir_compose_msi_msg,
.irq_set_vcpu_affinity = intel_ir_set_vcpu_affinity,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c8033ca..4f91a74 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -538,6 +538,9 @@ static void iommu_deinit_device(struct device *dev)
dev->iommu_group = NULL;
module_put(ops->owner);
dev_iommu_free(dev);
+#ifdef CONFIG_IOMMU_DMA
+ dev->dma_iommu = false;
+#endif
}
static struct iommu_domain *pasid_array_entry_to_domain(void *entry)
@@ -2717,7 +2720,8 @@ int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
* if upper layers showed interest and installed a fault handler,
* invoke it.
*/
- if (domain->handler)
+ if (domain->cookie_type == IOMMU_COOKIE_FAULT_HANDLER &&
+ domain->handler)
ret = domain->handler(domain, dev, iova, flags,
domain->handler_token);
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 074daf1..e424b27 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -1081,31 +1081,24 @@ static int ipmmu_probe(struct platform_device *pdev)
}
}
+ platform_set_drvdata(pdev, mmu);
/*
* Register the IPMMU to the IOMMU subsystem in the following cases:
* - R-Car Gen2 IPMMU (all devices registered)
* - R-Car Gen3 IPMMU (leaf devices only - skip root IPMMU-MM device)
*/
- if (!mmu->features->has_cache_leaf_nodes || !ipmmu_is_root(mmu)) {
- ret = iommu_device_sysfs_add(&mmu->iommu, &pdev->dev, NULL,
- dev_name(&pdev->dev));
- if (ret)
- return ret;
+ if (mmu->features->has_cache_leaf_nodes && ipmmu_is_root(mmu))
+ return 0;
- ret = iommu_device_register(&mmu->iommu, &ipmmu_ops, &pdev->dev);
- if (ret)
- return ret;
- }
+ ret = iommu_device_sysfs_add(&mmu->iommu, &pdev->dev, NULL, dev_name(&pdev->dev));
+ if (ret)
+ return ret;
- /*
- * We can't create the ARM mapping here as it requires the bus to have
- * an IOMMU, which only happens when bus_set_iommu() is called in
- * ipmmu_init() after the probe function returns.
- */
+ ret = iommu_device_register(&mmu->iommu, &ipmmu_ops, &pdev->dev);
+ if (ret)
+ iommu_device_sysfs_remove(&mmu->iommu);
- platform_set_drvdata(pdev, mmu);
-
- return 0;
+ return ret;
}
static void ipmmu_remove(struct platform_device *pdev)
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 034b0e6..df98d0c 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1372,15 +1372,6 @@ static int mtk_iommu_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, data);
mutex_init(&data->mutex);
- ret = iommu_device_sysfs_add(&data->iommu, dev, NULL,
- "mtk-iommu.%pa", &ioaddr);
- if (ret)
- goto out_link_remove;
-
- ret = iommu_device_register(&data->iommu, &mtk_iommu_ops, dev);
- if (ret)
- goto out_sysfs_remove;
-
if (MTK_IOMMU_HAS_FLAG(data->plat_data, SHARE_PGTABLE)) {
list_add_tail(&data->list, data->plat_data->hw_list);
data->hw_list = data->plat_data->hw_list;
@@ -1390,19 +1381,28 @@ static int mtk_iommu_probe(struct platform_device *pdev)
data->hw_list = &data->hw_list_head;
}
+ ret = iommu_device_sysfs_add(&data->iommu, dev, NULL,
+ "mtk-iommu.%pa", &ioaddr);
+ if (ret)
+ goto out_list_del;
+
+ ret = iommu_device_register(&data->iommu, &mtk_iommu_ops, dev);
+ if (ret)
+ goto out_sysfs_remove;
+
if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
ret = component_master_add_with_match(dev, &mtk_iommu_com_ops, match);
if (ret)
- goto out_list_del;
+ goto out_device_unregister;
}
return ret;
-out_list_del:
- list_del(&data->list);
+out_device_unregister:
iommu_device_unregister(&data->iommu);
out_sysfs_remove:
iommu_device_sysfs_remove(&data->iommu);
-out_link_remove:
+out_list_del:
+ list_del(&data->list);
if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM))
device_link_remove(data->smicomm_dev, dev);
out_runtime_disable:
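Note on the mtk_iommu.c hunk above: the reordered labels restore the usual rule that each error label undoes only the steps that already succeeded, in reverse order. Below is a minimal userspace sketch of that pattern. All names (model_probe, the STEP_* flags, the fail_at parameter) are hypothetical and exist only for illustration; none of this is driver code.

```c
#include <assert.h>

/* Hypothetical flags, one per setup step in the reordered probe path. */
enum {
	STEP_LIST      = 1, /* list_add_tail() equivalent, cannot fail */
	STEP_SYSFS     = 2, /* sysfs add */
	STEP_REGISTER  = 4, /* device register */
	STEP_COMPONENT = 8, /* component master add */
};

/*
 * Run the steps in order. 'fail_at' names the step that should fail
 * (0 means none). 'live' records which steps are still in effect when
 * the function returns.
 */
static int model_probe(unsigned int fail_at, unsigned int *live)
{
	*live = STEP_LIST;

	if (fail_at == STEP_SYSFS)
		goto out_list_del;
	*live |= STEP_SYSFS;

	if (fail_at == STEP_REGISTER)
		goto out_sysfs_remove;
	*live |= STEP_REGISTER;

	if (fail_at == STEP_COMPONENT)
		goto out_device_unregister;
	*live |= STEP_COMPONENT;

	return 0;

	/* Labels fall through, so a later failure undoes every earlier step. */
out_device_unregister:
	*live &= ~STEP_REGISTER;
out_sysfs_remove:
	*live &= ~STEP_SYSFS;
out_list_del:
	*live &= ~STEP_LIST;
	return -1;
}
```

In this model a failure at any step leaves no step live, and no label touches a step that had not yet run.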
diff --git a/drivers/irqchip/irq-bcm2712-mip.c b/drivers/irqchip/irq-bcm2712-mip.c
index 49a19db..4cce242 100644
--- a/drivers/irqchip/irq-bcm2712-mip.c
+++ b/drivers/irqchip/irq-bcm2712-mip.c
@@ -163,6 +163,7 @@ static const struct irq_domain_ops mip_middle_domain_ops = {
static const struct msi_parent_ops mip_msi_parent_ops = {
.supported_flags = MIP_MSI_FLAGS_SUPPORTED,
.required_flags = MIP_MSI_FLAGS_REQUIRED,
+ .chip_flags = MSI_CHIP_FLAG_SET_EOI | MSI_CHIP_FLAG_SET_ACK,
.bus_select_token = DOMAIN_BUS_GENERIC_MSI,
.bus_select_mask = MATCH_PCI_MSI,
.prefix = "MIP-MSI-",
diff --git a/drivers/irqchip/irq-sg2042-msi.c b/drivers/irqchip/irq-sg2042-msi.c
index ee682e8..375b55a 100644
--- a/drivers/irqchip/irq-sg2042-msi.c
+++ b/drivers/irqchip/irq-sg2042-msi.c
@@ -151,6 +151,7 @@ static const struct irq_domain_ops sg2042_msi_middle_domain_ops = {
static const struct msi_parent_ops sg2042_msi_parent_ops = {
.required_flags = SG2042_MSI_FLAGS_REQUIRED,
.supported_flags = SG2042_MSI_FLAGS_SUPPORTED,
+ .chip_flags = MSI_CHIP_FLAG_SET_ACK,
.bus_select_mask = MATCH_PCI_MSI,
.bus_select_token = DOMAIN_BUS_NEXUS,
.prefix = "SG2042-",
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 06f809e..ddb37f6 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -139,7 +139,7 @@
tristate "RAID-4/RAID-5/RAID-6 mode"
depends on BLK_DEV_MD
select RAID6_PQ
- select LIBCRC32C
+ select CRC32
select ASYNC_MEMCPY
select ASYNC_XOR
select ASYNC_PQ
diff --git a/drivers/md/persistent-data/Kconfig b/drivers/md/persistent-data/Kconfig
index f4f948b..dbb97a7 100644
--- a/drivers/md/persistent-data/Kconfig
+++ b/drivers/md/persistent-data/Kconfig
@@ -2,7 +2,7 @@
config DM_PERSISTENT_DATA
tristate
depends on BLK_DEV_DM
- select LIBCRC32C
+ select CRC32
select DM_BUFIO
help
Library providing immutable on-disk data structure support for
diff --git a/drivers/mtd/inftlcore.c b/drivers/mtd/inftlcore.c
index 9739387..58c6e17 100644
--- a/drivers/mtd/inftlcore.c
+++ b/drivers/mtd/inftlcore.c
@@ -482,10 +482,11 @@ static inline u16 INFTL_findwriteunit(struct INFTLrecord *inftl, unsigned block)
silly = MAX_LOOPS;
while (thisEUN <= inftl->lastEUN) {
- inftl_read_oob(mtd, (thisEUN * inftl->EraseSize) +
- blockofs, 8, &retlen, (char *)&bci);
-
- status = bci.Status | bci.Status1;
+ if (inftl_read_oob(mtd, (thisEUN * inftl->EraseSize) +
+ blockofs, 8, &retlen, (char *)&bci) < 0)
+ status = SECTOR_IGNORE;
+ else
+ status = bci.Status | bci.Status1;
pr_debug("INFTL: status of block %d in EUN %d is %x\n",
block , writeEUN, status);
diff --git a/drivers/mtd/nand/Makefile b/drivers/mtd/nand/Makefile
index db516a4..44913ff 100644
--- a/drivers/mtd/nand/Makefile
+++ b/drivers/mtd/nand/Makefile
@@ -3,11 +3,8 @@
nandcore-objs := core.o bbt.o
obj-$(CONFIG_MTD_NAND_CORE) += nandcore.o
obj-$(CONFIG_MTD_NAND_ECC_MEDIATEK) += ecc-mtk.o
-ifeq ($(CONFIG_SPI_QPIC_SNAND),y)
obj-$(CONFIG_SPI_QPIC_SNAND) += qpic_common.o
-else
obj-$(CONFIG_MTD_NAND_QCOM) += qpic_common.o
-endif
obj-y += onenand/
obj-y += raw/
obj-y += spi/
diff --git a/drivers/mtd/nand/raw/r852.c b/drivers/mtd/nand/raw/r852.c
index b07c2f8..918974d 100644
--- a/drivers/mtd/nand/raw/r852.c
+++ b/drivers/mtd/nand/raw/r852.c
@@ -387,6 +387,9 @@ static int r852_wait(struct nand_chip *chip)
static int r852_ready(struct nand_chip *chip)
{
struct r852_device *dev = r852_get_dev(nand_to_mtd(chip));
+ if (dev->card_unstable)
+ return 0;
+
return !(r852_read_reg(dev, R852_CARD_STA) & R852_CARD_STA_BUSY);
}
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index eeec8bf..1bd4313 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -143,7 +143,7 @@
depends on PTP_1588_CLOCK_OPTIONAL
select FW_LOADER
select ZLIB_INFLATE
- select LIBCRC32C
+ select CRC32
select MDIO
help
This driver supports Broadcom NetXtremeII 10 gigabit Ethernet cards.
@@ -207,7 +207,7 @@
depends on PCI
depends on PTP_1588_CLOCK_OPTIONAL
select FW_LOADER
- select LIBCRC32C
+ select CRC32
select NET_DEVLINK
select PAGE_POOL
select DIMLIB
diff --git a/drivers/net/ethernet/cavium/Kconfig b/drivers/net/ethernet/cavium/Kconfig
index ca742cc..7dae5aa 100644
--- a/drivers/net/ethernet/cavium/Kconfig
+++ b/drivers/net/ethernet/cavium/Kconfig
@@ -70,8 +70,8 @@
depends on 64BIT && PCI
depends on PCI
depends on PTP_1588_CLOCK_OPTIONAL
+ select CRC32
select FW_LOADER
- select LIBCRC32C
select LIQUIDIO_CORE
select NET_DEVLINK
help
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/qos.c b/drivers/net/ethernet/marvell/octeontx2/nic/qos.c
index 0f844c1..35acc07 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/qos.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/qos.c
@@ -165,6 +165,11 @@ static void __otx2_qos_txschq_cfg(struct otx2_nic *pfvf,
otx2_config_sched_shaping(pfvf, node, cfg, &num_regs);
} else if (level == NIX_TXSCH_LVL_TL2) {
+ /* configure parent txschq */
+ cfg->reg[num_regs] = NIX_AF_TL2X_PARENT(node->schq);
+ cfg->regval[num_regs] = (u64)hw->tx_link << 16;
+ num_regs++;
+
/* configure link cfg */
if (level == pfvf->qos.link_cfg_lvl) {
cfg->reg[num_regs] = NIX_AF_TL3_TL2X_LINKX_CFG(node->schq, hw->tx_link);
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
index 00b0b31..e69eaa6 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
@@ -310,7 +310,8 @@ static bool wx_alloc_mapped_page(struct wx_ring *rx_ring,
return true;
page = page_pool_dev_alloc_pages(rx_ring->page_pool);
- WARN_ON(!page);
+ if (unlikely(!page))
+ return false;
dma = page_pool_get_dma_addr(page);
bi->page_dma = dma;
@@ -546,7 +547,8 @@ static void wx_rx_checksum(struct wx_ring *ring,
return;
/* Hardware can't guarantee csum if IPv6 Dest Header found */
- if (dptype.prot != WX_DEC_PTYPE_PROT_SCTP && WX_RXD_IPV6EX(rx_desc))
+ if (dptype.prot != WX_DEC_PTYPE_PROT_SCTP &&
+ wx_test_staterr(rx_desc, WX_RXD_STAT_IPV6EX))
return;
/* if L4 checksum error */
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
index 5b230ecb..4c545b2 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
@@ -513,6 +513,7 @@ enum WX_MSCA_CMD_value {
#define WX_RXD_STAT_L4CS BIT(7) /* L4 xsum calculated */
#define WX_RXD_STAT_IPCS BIT(8) /* IP xsum calculated */
#define WX_RXD_STAT_OUTERIPCS BIT(10) /* Cloud IP xsum calculated*/
+#define WX_RXD_STAT_IPV6EX BIT(12) /* IPv6 Dest Header */
#define WX_RXD_STAT_TS BIT(14) /* IEEE1588 Time Stamp */
#define WX_RXD_ERR_OUTERIPER BIT(26) /* CRC IP Header error */
@@ -589,8 +590,6 @@ enum wx_l2_ptypes {
#define WX_RXD_PKTTYPE(_rxd) \
((le32_to_cpu((_rxd)->wb.lower.lo_dword.data) >> 9) & 0xFF)
-#define WX_RXD_IPV6EX(_rxd) \
- ((le32_to_cpu((_rxd)->wb.lower.lo_dword.data) >> 6) & 0x1)
/*********************** Transmit Descriptor Config Masks ****************/
#define WX_TXD_STAT_DD BIT(0) /* Descriptor Done */
#define WX_TXD_DTYP_DATA 0 /* Adv Data Descriptor */
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 675fbd2..cc1bfd2 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -244,6 +244,46 @@ static bool phy_drv_wol_enabled(struct phy_device *phydev)
return wol.wolopts != 0;
}
+static void phy_link_change(struct phy_device *phydev, bool up)
+{
+ struct net_device *netdev = phydev->attached_dev;
+
+ if (up)
+ netif_carrier_on(netdev);
+ else
+ netif_carrier_off(netdev);
+ phydev->adjust_link(netdev);
+ if (phydev->mii_ts && phydev->mii_ts->link_state)
+ phydev->mii_ts->link_state(phydev->mii_ts, phydev);
+}
+
+/**
+ * phy_uses_state_machine - test whether consumer driver uses PAL state machine
+ * @phydev: the target PHY device structure
+ *
+ * Ultimately, this aims to indirectly determine whether the PHY is attached
+ * to a consumer which uses the state machine by calling phy_start() and
+ * phy_stop().
+ *
+ * When the PHY driver consumer uses phylib, it must have previously called
+ * phy_connect_direct() or one of its derivatives, so that phy_prepare_link()
+ * has set up a hook for monitoring state changes.
+ *
+ * When the PHY driver is used by the MAC driver consumer through phylink (the
+ * only other provider of a phy_link_change() method), using the PHY state
+ * machine is not optional.
+ *
+ * Return: true if consumer calls phy_start() and phy_stop(), false otherwise.
+ */
+static bool phy_uses_state_machine(struct phy_device *phydev)
+{
+ if (phydev->phy_link_change == phy_link_change)
+ return phydev->attached_dev && phydev->adjust_link;
+
+ /* phydev->phy_link_change is implicitly phylink_phy_change() */
+ return true;
+}
+
static bool mdio_bus_phy_may_suspend(struct phy_device *phydev)
{
struct device_driver *drv = phydev->mdio.dev.driver;
@@ -310,7 +350,7 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev)
* may call phy routines that try to grab the same lock, and that may
* lead to a deadlock.
*/
- if (phydev->attached_dev && phydev->adjust_link)
+ if (phy_uses_state_machine(phydev))
phy_stop_machine(phydev);
if (!mdio_bus_phy_may_suspend(phydev))
@@ -364,7 +404,7 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev)
}
}
- if (phydev->attached_dev && phydev->adjust_link)
+ if (phy_uses_state_machine(phydev))
phy_start_machine(phydev);
return 0;
@@ -1055,19 +1095,6 @@ struct phy_device *phy_find_first(struct mii_bus *bus)
}
EXPORT_SYMBOL(phy_find_first);
-static void phy_link_change(struct phy_device *phydev, bool up)
-{
- struct net_device *netdev = phydev->attached_dev;
-
- if (up)
- netif_carrier_on(netdev);
- else
- netif_carrier_off(netdev);
- phydev->adjust_link(netdev);
- if (phydev->mii_ts && phydev->mii_ts->link_state)
- phydev->mii_ts->link_state(phydev->mii_ts, phydev);
-}
-
/**
* phy_prepare_link - prepares the PHY layer to monitor link status
* @phydev: target phy_device struct
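Note on the phy_device.c hunks above: the new helper keys its decision on which link-change callback is installed. The sketch below models that decision in plain userspace C. The struct and function names (model_phy, model_uses_state_machine, and so on) are hypothetical stand-ins, not the kernel's types, and the second callback merely plays the role the patch comment attributes to phylink.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few phy_device fields the check reads. */
struct model_phy {
	void (*link_change)(struct model_phy *phy, bool up);
	void *attached_dev;                /* non-NULL once a netdev is attached */
	void (*adjust_link)(void *netdev); /* set when the consumer connects */
};

/* Plays the role of the default phylib callback. */
static void model_phylib_link_change(struct model_phy *phy, bool up)
{
	(void)phy;
	(void)up;
}

/* Plays the role of the only other provider (phylink in the patch). */
static void model_other_link_change(struct model_phy *phy, bool up)
{
	(void)phy;
	(void)up;
}

static void model_adjust_link(void *netdev)
{
	(void)netdev;
}

/*
 * With the default callback, the state machine is in use only when a
 * netdev is attached and an adjust_link hook is set. With any other
 * callback it is assumed to be always in use.
 */
static bool model_uses_state_machine(const struct model_phy *phy)
{
	if (phy->link_change == model_phylib_link_change)
		return phy->attached_dev != NULL && phy->adjust_link != NULL;

	return true;
}
```

The point of comparing the callback pointer first is that, in the non-default case, the adjust_link hook is never consulted, so an unset hook there no longer causes the suspend and resume paths to skip the state machine.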
diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c
index 644e99f..9c49321 100644
--- a/drivers/net/ppp/ppp_synctty.c
+++ b/drivers/net/ppp/ppp_synctty.c
@@ -506,6 +506,11 @@ ppp_sync_txmunge(struct syncppp *ap, struct sk_buff *skb)
unsigned char *data;
int islcp;
+ /* Ensure we can safely access protocol field and LCP code */
+ if (!pskb_may_pull(skb, 3)) {
+ kfree_skb(skb);
+ return NULL;
+ }
data = skb->data;
proto = get_unaligned_be16(data);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cc23035..b502ac0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4295,6 +4295,15 @@ static void nvme_scan_work(struct work_struct *work)
nvme_scan_ns_sequential(ctrl);
}
mutex_unlock(&ctrl->scan_lock);
+
+ /* Requeue if we have missed AENs */
+ if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
+ nvme_queue_scan(ctrl);
+#ifdef CONFIG_NVME_MULTIPATH
+ else
+ /* Re-read the ANA log page to not miss updates */
+ queue_work(nvme_wq, &ctrl->ana_work);
+#endif
}
/*
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 89be591..05eccd9 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -427,7 +427,7 @@ static bool nvme_available_path(struct nvme_ns_head *head)
struct nvme_ns *ns;
if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags))
- return NULL;
+ return false;
list_for_each_entry_srcu(ns, &head->list, siblings,
srcu_read_lock_held(&head->srcu)) {
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 26c459f..72d2602 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1803,6 +1803,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
ret = PTR_ERR(sock_file);
goto err_destroy_mutex;
}
+
+ sk_net_refcnt_upgrade(queue->sock->sk);
nvme_tcp_reclassify_socket(queue->sock);
/* Single syn retry */
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 7318b73..7b50130 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -995,16 +995,6 @@ nvmet_fc_hostport_get(struct nvmet_fc_hostport *hostport)
return kref_get_unless_zero(&hostport->ref);
}
-static void
-nvmet_fc_free_hostport(struct nvmet_fc_hostport *hostport)
-{
- /* if LLDD not implemented, leave as NULL */
- if (!hostport || !hostport->hosthandle)
- return;
-
- nvmet_fc_hostport_put(hostport);
-}
-
static struct nvmet_fc_hostport *
nvmet_fc_match_hostport(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
{
@@ -1028,33 +1018,24 @@ nvmet_fc_alloc_hostport(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
struct nvmet_fc_hostport *newhost, *match = NULL;
unsigned long flags;
+ /*
+ * Caller holds a reference on tgtport.
+ */
+
/* if LLDD not implemented, leave as NULL */
if (!hosthandle)
return NULL;
- /*
- * take reference for what will be the newly allocated hostport if
- * we end up using a new allocation
- */
- if (!nvmet_fc_tgtport_get(tgtport))
- return ERR_PTR(-EINVAL);
-
spin_lock_irqsave(&tgtport->lock, flags);
match = nvmet_fc_match_hostport(tgtport, hosthandle);
spin_unlock_irqrestore(&tgtport->lock, flags);
- if (match) {
- /* no new allocation - release reference */
- nvmet_fc_tgtport_put(tgtport);
+ if (match)
return match;
- }
newhost = kzalloc(sizeof(*newhost), GFP_KERNEL);
- if (!newhost) {
- /* no new allocation - release reference */
- nvmet_fc_tgtport_put(tgtport);
+ if (!newhost)
return ERR_PTR(-ENOMEM);
- }
spin_lock_irqsave(&tgtport->lock, flags);
match = nvmet_fc_match_hostport(tgtport, hosthandle);
@@ -1063,6 +1044,7 @@ nvmet_fc_alloc_hostport(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
kfree(newhost);
newhost = match;
} else {
+ nvmet_fc_tgtport_get(tgtport);
newhost->tgtport = tgtport;
newhost->hosthandle = hosthandle;
INIT_LIST_HEAD(&newhost->host_list);
@@ -1076,20 +1058,14 @@ nvmet_fc_alloc_hostport(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
}
static void
-nvmet_fc_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
-{
- nvmet_fc_delete_target_assoc(assoc);
- nvmet_fc_tgt_a_put(assoc);
-}
-
-static void
nvmet_fc_delete_assoc_work(struct work_struct *work)
{
struct nvmet_fc_tgt_assoc *assoc =
container_of(work, struct nvmet_fc_tgt_assoc, del_work);
struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
- nvmet_fc_delete_assoc(assoc);
+ nvmet_fc_delete_target_assoc(assoc);
+ nvmet_fc_tgt_a_put(assoc);
nvmet_fc_tgtport_put(tgtport);
}
@@ -1097,7 +1073,8 @@ static void
nvmet_fc_schedule_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
{
nvmet_fc_tgtport_get(assoc->tgtport);
- queue_work(nvmet_wq, &assoc->del_work);
+ if (!queue_work(nvmet_wq, &assoc->del_work))
+ nvmet_fc_tgtport_put(assoc->tgtport);
}
static bool
@@ -1143,6 +1120,7 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
goto out_ida;
assoc->tgtport = tgtport;
+ nvmet_fc_tgtport_get(tgtport);
assoc->a_id = idx;
INIT_LIST_HEAD(&assoc->a_list);
kref_init(&assoc->ref);
@@ -1190,7 +1168,7 @@ nvmet_fc_target_assoc_free(struct kref *ref)
/* Send Disconnect now that all i/o has completed */
nvmet_fc_xmt_disconnect_assoc(assoc);
- nvmet_fc_free_hostport(assoc->hostport);
+ nvmet_fc_hostport_put(assoc->hostport);
spin_lock_irqsave(&tgtport->lock, flags);
oldls = assoc->rcv_disconn;
spin_unlock_irqrestore(&tgtport->lock, flags);
@@ -1244,6 +1222,8 @@ nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
dev_info(tgtport->dev,
"{%d:%d} Association deleted\n",
tgtport->fc_target_port.port_num, assoc->a_id);
+
+ nvmet_fc_tgtport_put(tgtport);
}
static struct nvmet_fc_tgt_assoc *
@@ -1455,11 +1435,6 @@ nvmet_fc_free_tgtport(struct kref *ref)
struct nvmet_fc_tgtport *tgtport =
container_of(ref, struct nvmet_fc_tgtport, ref);
struct device *dev = tgtport->dev;
- unsigned long flags;
-
- spin_lock_irqsave(&nvmet_fc_tgtlock, flags);
- list_del(&tgtport->tgt_list);
- spin_unlock_irqrestore(&nvmet_fc_tgtlock, flags);
nvmet_fc_free_ls_iodlist(tgtport);
@@ -1620,6 +1595,11 @@ int
nvmet_fc_unregister_targetport(struct nvmet_fc_target_port *target_port)
{
struct nvmet_fc_tgtport *tgtport = targetport_to_tgtport(target_port);
+ unsigned long flags;
+
+ spin_lock_irqsave(&nvmet_fc_tgtlock, flags);
+ list_del(&tgtport->tgt_list);
+ spin_unlock_irqrestore(&nvmet_fc_tgtlock, flags);
nvmet_fc_portentry_unbind_tgt(tgtport);
diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index e1abb27..641201e 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -208,6 +208,7 @@ struct fcloop_lport {
struct nvme_fc_local_port *localport;
struct list_head lport_list;
struct completion unreg_done;
+ refcount_t ref;
};
struct fcloop_lport_priv {
@@ -239,7 +240,7 @@ struct fcloop_nport {
struct fcloop_tport *tport;
struct fcloop_lport *lport;
struct list_head nport_list;
- struct kref ref;
+ refcount_t ref;
u64 node_name;
u64 port_name;
u32 port_role;
@@ -274,7 +275,7 @@ struct fcloop_fcpreq {
u32 inistate;
bool active;
bool aborted;
- struct kref ref;
+ refcount_t ref;
struct work_struct fcp_rcv_work;
struct work_struct abort_rcv_work;
struct work_struct tio_done_work;
@@ -478,7 +479,7 @@ fcloop_t2h_xmt_ls_rsp(struct nvme_fc_local_port *localport,
if (targetport) {
tport = targetport->private;
spin_lock(&tport->lock);
- list_add_tail(&tport->ls_list, &tls_req->ls_list);
+ list_add_tail(&tls_req->ls_list, &tport->ls_list);
spin_unlock(&tport->lock);
queue_work(nvmet_wq, &tport->ls_work);
}
@@ -534,24 +535,18 @@ fcloop_tgt_discovery_evt(struct nvmet_fc_target_port *tgtport)
}
static void
-fcloop_tfcp_req_free(struct kref *ref)
-{
- struct fcloop_fcpreq *tfcp_req =
- container_of(ref, struct fcloop_fcpreq, ref);
-
- kfree(tfcp_req);
-}
-
-static void
fcloop_tfcp_req_put(struct fcloop_fcpreq *tfcp_req)
{
- kref_put(&tfcp_req->ref, fcloop_tfcp_req_free);
+ if (!refcount_dec_and_test(&tfcp_req->ref))
+ return;
+
+ kfree(tfcp_req);
}
static int
fcloop_tfcp_req_get(struct fcloop_fcpreq *tfcp_req)
{
- return kref_get_unless_zero(&tfcp_req->ref);
+ return refcount_inc_not_zero(&tfcp_req->ref);
}
static void
@@ -748,7 +743,7 @@ fcloop_fcp_req(struct nvme_fc_local_port *localport,
INIT_WORK(&tfcp_req->fcp_rcv_work, fcloop_fcp_recv_work);
INIT_WORK(&tfcp_req->abort_rcv_work, fcloop_fcp_abort_recv_work);
INIT_WORK(&tfcp_req->tio_done_work, fcloop_tgt_fcprqst_done_work);
- kref_init(&tfcp_req->ref);
+ refcount_set(&tfcp_req->ref, 1);
queue_work(nvmet_wq, &tfcp_req->fcp_rcv_work);
@@ -1001,24 +996,39 @@ fcloop_fcp_abort(struct nvme_fc_local_port *localport,
}
static void
-fcloop_nport_free(struct kref *ref)
+fcloop_lport_put(struct fcloop_lport *lport)
{
- struct fcloop_nport *nport =
- container_of(ref, struct fcloop_nport, ref);
+ unsigned long flags;
- kfree(nport);
+ if (!refcount_dec_and_test(&lport->ref))
+ return;
+
+ spin_lock_irqsave(&fcloop_lock, flags);
+ list_del(&lport->lport_list);
+ spin_unlock_irqrestore(&fcloop_lock, flags);
+
+ kfree(lport);
+}
+
+static int
+fcloop_lport_get(struct fcloop_lport *lport)
+{
+ return refcount_inc_not_zero(&lport->ref);
}
static void
fcloop_nport_put(struct fcloop_nport *nport)
{
- kref_put(&nport->ref, fcloop_nport_free);
+ if (!refcount_dec_and_test(&nport->ref))
+ return;
+
+ kfree(nport);
}
static int
fcloop_nport_get(struct fcloop_nport *nport)
{
- return kref_get_unless_zero(&nport->ref);
+ return refcount_inc_not_zero(&nport->ref);
}
static void
@@ -1029,6 +1039,8 @@ fcloop_localport_delete(struct nvme_fc_local_port *localport)
/* release any threads waiting for the unreg to complete */
complete(&lport->unreg_done);
+
+ fcloop_lport_put(lport);
}
static void
@@ -1140,6 +1152,7 @@ fcloop_create_local_port(struct device *dev, struct device_attribute *attr,
lport->localport = localport;
INIT_LIST_HEAD(&lport->lport_list);
+ refcount_set(&lport->ref, 1);
spin_lock_irqsave(&fcloop_lock, flags);
list_add_tail(&lport->lport_list, &fcloop_lports);
@@ -1156,13 +1169,6 @@ fcloop_create_local_port(struct device *dev, struct device_attribute *attr,
return ret ? ret : count;
}
-
-static void
-__unlink_local_port(struct fcloop_lport *lport)
-{
- list_del(&lport->lport_list);
-}
-
static int
__wait_localport_unreg(struct fcloop_lport *lport)
{
@@ -1175,8 +1181,6 @@ __wait_localport_unreg(struct fcloop_lport *lport)
if (!ret)
wait_for_completion(&lport->unreg_done);
- kfree(lport);
-
return ret;
}
@@ -1199,8 +1203,9 @@ fcloop_delete_local_port(struct device *dev, struct device_attribute *attr,
list_for_each_entry(tlport, &fcloop_lports, lport_list) {
if (tlport->localport->node_name == nodename &&
tlport->localport->port_name == portname) {
+ if (!fcloop_lport_get(tlport))
+ break;
lport = tlport;
- __unlink_local_port(lport);
break;
}
}
@@ -1210,6 +1215,7 @@ fcloop_delete_local_port(struct device *dev, struct device_attribute *attr,
return -ENOENT;
ret = __wait_localport_unreg(lport);
+ fcloop_lport_put(lport);
return ret ? ret : count;
}
@@ -1249,7 +1255,7 @@ fcloop_alloc_nport(const char *buf, size_t count, bool remoteport)
newnport->port_role = opts->roles;
if (opts->mask & NVMF_OPT_FCADDR)
newnport->port_id = opts->fcaddr;
- kref_init(&newnport->ref);
+ refcount_set(&newnport->ref, 1);
spin_lock_irqsave(&fcloop_lock, flags);
@@ -1637,17 +1643,17 @@ static void __exit fcloop_exit(void)
for (;;) {
lport = list_first_entry_or_null(&fcloop_lports,
typeof(*lport), lport_list);
- if (!lport)
+ if (!lport || !fcloop_lport_get(lport))
break;
- __unlink_local_port(lport);
-
spin_unlock_irqrestore(&fcloop_lock, flags);
ret = __wait_localport_unreg(lport);
if (ret)
pr_warn("%s: Failed deleting local port\n", __func__);
+ fcloop_lport_put(lport);
+
spin_lock_irqsave(&fcloop_lock, flags);
}
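Note on the fcloop.c hunks above: the kref helpers are replaced with open-coded get/put pairs built on refcount_inc_not_zero() and refcount_dec_and_test(). The following is a rough userspace model of those two semantics using C11 atomics. The names (model_port, model_port_get, model_port_put) are hypothetical, and unlike the kernel helpers, the put here returns a bool so the behaviour can be observed; it is a sketch of the pattern, not of the refcount_t implementation (which also saturates on overflow).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for an lport or nport. */
struct model_port {
	atomic_uint ref;
	int id;
};

/* Allocate with one reference held by the caller. */
static struct model_port *model_port_alloc(int id)
{
	struct model_port *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	atomic_init(&p->ref, 1);
	p->id = id;
	return p;
}

/* Take a reference unless the count has already dropped to zero. */
static bool model_port_get(struct model_port *p)
{
	unsigned int old = atomic_load(&p->ref);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; free on the last one. Returns true if freed. */
static bool model_port_put(struct model_port *p)
{
	if (atomic_fetch_sub(&p->ref, 1) != 1)
		return false;

	free(p);
	return true;
}
```

A lookup path that finds an object on a shared list would call the get first and treat a false return as "object is already going away", which is the shape of the fcloop_lport_get() checks in the hunks above.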
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8d610c1..94daca15a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1990,12 +1990,12 @@ static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
device_create_managed_software_node(&pdev->dev, properties, NULL))
pci_warn(pdev, "could not add stall property");
}
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa255, quirk_huawei_pcie_sva);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa256, quirk_huawei_pcie_sva);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa258, quirk_huawei_pcie_sva);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_HUAWEI, 0xa259, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HUAWEI, 0xa255, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HUAWEI, 0xa256, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HUAWEI, 0xa258, quirk_huawei_pcie_sva);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HUAWEI, 0xa259, quirk_huawei_pcie_sva);
/*
* It's possible for the MSI to get corrupted if SHPC and ACPI are used
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 21fa7ac..4904b83 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -302,11 +302,17 @@ static struct airq_info *new_airq_info(int index)
static unsigned long *get_airq_indicator(struct virtqueue *vqs[], int nvqs,
u64 *first, void **airq_info)
{
- int i, j;
+ int i, j, queue_idx, highest_queue_idx = -1;
struct airq_info *info;
unsigned long *indicator_addr = NULL;
unsigned long bit, flags;
+ /* Array entries without an actual queue pointer must be ignored. */
+ for (i = 0; i < nvqs; i++) {
+ if (vqs[i])
+ highest_queue_idx++;
+ }
+
for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) {
mutex_lock(&airq_areas_lock);
if (!airq_areas[i])
@@ -316,7 +322,7 @@ static unsigned long *get_airq_indicator(struct virtqueue *vqs[], int nvqs,
if (!info)
return NULL;
write_lock_irqsave(&info->lock, flags);
- bit = airq_iv_alloc(info->aiv, nvqs);
+ bit = airq_iv_alloc(info->aiv, highest_queue_idx + 1);
if (bit == -1UL) {
/* Not enough vacancies. */
write_unlock_irqrestore(&info->lock, flags);
@@ -325,8 +331,10 @@ static unsigned long *get_airq_indicator(struct virtqueue *vqs[], int nvqs,
*first = bit;
*airq_info = info;
indicator_addr = info->aiv->vector;
- for (j = 0; j < nvqs; j++) {
- airq_iv_set_ptr(info->aiv, bit + j,
+ for (j = 0, queue_idx = 0; j < nvqs; j++) {
+ if (!vqs[j])
+ continue;
+ airq_iv_set_ptr(info->aiv, bit + queue_idx++,
(unsigned long)vqs[j]);
}
write_unlock_irqrestore(&info->lock, flags);
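Note on the virtio_ccw.c hunks above: the fix treats the vqs[] array as sparse, sizing the indicator allocation by the number of populated entries and packing those entries into consecutive slots. A small userspace sketch of that counting and packing is given below. The function names and the plain slots[] array are hypothetical; the real code stores pointers through airq_iv_set_ptr().

```c
#include <assert.h>
#include <stddef.h>

/* Count the array entries that hold an actual queue pointer. */
static int model_count_queues(void *vqs[], int nvqs)
{
	int i, n = 0;

	for (i = 0; i < nvqs; i++) {
		if (vqs[i])
			n++;
	}
	return n;
}

/*
 * Store each populated entry into consecutive slots starting at 'first',
 * skipping NULL entries. Returns the number of slots used.
 */
static int model_assign_slots(void *vqs[], int nvqs, void *slots[], int first)
{
	int j, queue_idx = 0;

	for (j = 0; j < nvqs; j++) {
		if (!vqs[j])
			continue;
		slots[first + queue_idx++] = vqs[j];
	}
	return queue_idx;
}
```

With an input such as { q0, NULL, q1, NULL }, two slots are counted and q0 and q1 land in adjacent positions, so the slot index tracks the queue index rather than the array index.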
diff --git a/drivers/spi/spi-fsl-qspi.c b/drivers/spi/spi-fsl-qspi.c
index 5c59fdd..b5ecffc 100644
--- a/drivers/spi/spi-fsl-qspi.c
+++ b/drivers/spi/spi-fsl-qspi.c
@@ -949,24 +949,20 @@ static int fsl_qspi_probe(struct platform_device *pdev)
ret = devm_add_action_or_reset(dev, fsl_qspi_cleanup, q);
if (ret)
- goto err_destroy_mutex;
+ goto err_put_ctrl;
ret = devm_spi_register_controller(dev, ctlr);
if (ret)
- goto err_destroy_mutex;
+ goto err_put_ctrl;
return 0;
-err_destroy_mutex:
- mutex_destroy(&q->lock);
-
err_disable_clk:
fsl_qspi_clk_disable_unprep(q);
err_put_ctrl:
spi_controller_put(ctlr);
- dev_err(dev, "Freescale QuadSPI probe failed\n");
return ret;
}
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index f7d6f479..24f4858 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -278,7 +278,7 @@
config XEN_ACPI_PROCESSOR
tristate "Xen ACPI processor"
- depends on XEN && XEN_PV_DOM0 && X86 && ACPI_PROCESSOR && CPU_FREQ
+ depends on XEN && XEN_DOM0 && X86 && ACPI_PROCESSOR && CPU_FREQ
default m
help
This ACPI processor uploads Power Management information to the Xen
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 65d4e7f..8c85280 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -679,7 +679,7 @@ void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages)
}
EXPORT_SYMBOL(xen_free_ballooned_pages);
-static void __init balloon_add_regions(void)
+static int __init balloon_add_regions(void)
{
unsigned long start_pfn, pages;
unsigned long pfn, extra_pfn_end;
@@ -702,26 +702,38 @@ static void __init balloon_add_regions(void)
for (pfn = start_pfn; pfn < extra_pfn_end; pfn++)
balloon_append(pfn_to_page(pfn));
- balloon_stats.total_pages += extra_pfn_end - start_pfn;
+ /*
+ * Extra regions are accounted for in the physmap, but need
+ * decreasing from current_pages to balloon down the initial
+ * allocation, because they are already accounted for in
+ * total_pages.
+ */
+ if (extra_pfn_end - start_pfn >= balloon_stats.current_pages) {
+ WARN(1, "Extra pages underflow current target");
+ return -ERANGE;
+ }
+ balloon_stats.current_pages -= extra_pfn_end - start_pfn;
}
+
+ return 0;
}
static int __init balloon_init(void)
{
struct task_struct *task;
+ int rc;
if (!xen_domain())
return -ENODEV;
pr_info("Initialising balloon driver\n");
-#ifdef CONFIG_XEN_PV
- balloon_stats.current_pages = xen_pv_domain()
- ? min(xen_start_info->nr_pages - xen_released_pages, max_pfn)
- : get_num_physpages();
-#else
- balloon_stats.current_pages = get_num_physpages();
-#endif
+ if (xen_released_pages >= get_num_physpages()) {
+ WARN(1, "Released pages underflow current target");
+ return -ERANGE;
+ }
+
+ balloon_stats.current_pages = get_num_physpages() - xen_released_pages;
balloon_stats.target_pages = balloon_stats.current_pages;
balloon_stats.balloon_low = 0;
balloon_stats.balloon_high = 0;
@@ -738,7 +750,9 @@ static int __init balloon_init(void)
register_sysctl_init("xen/balloon", balloon_table);
#endif
- balloon_add_regions();
+ rc = balloon_add_regions();
+ if (rc)
+ return rc;
task = kthread_run(balloon_thread, NULL, "xen-balloon");
if (IS_ERR(task)) {
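Note on the balloon.c hunks above: both new checks guard an unsigned subtraction and return -ERANGE instead of letting the page count wrap. The helper below is a hypothetical userspace sketch of that guard (model_sub_pages is not a kernel function). It mirrors the patch's use of >=, which also rejects a subtraction that would leave exactly zero pages.

```c
#include <assert.h>
#include <errno.h>

/*
 * Subtract 'delta' from '*pages' only if the result stays above zero.
 * On rejection '*pages' is left untouched and -ERANGE is returned.
 */
static int model_sub_pages(unsigned long *pages, unsigned long delta)
{
	if (delta >= *pages)
		return -ERANGE;

	*pages -= delta;
	return 0;
}
```

Checking before subtracting matters because the operands are unsigned: performing the subtraction first and testing for a negative result afterwards would never fire.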
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index fcb335b..6d18192 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -513,4 +513,5 @@ static int __init boot_wait_for_devices(void)
late_initcall(boot_wait_for_devices);
#endif
+MODULE_DESCRIPTION("Xen PV-device frontend support");
MODULE_LICENSE("GPL");
diff --git a/fs/bcachefs/Kconfig b/fs/bcachefs/Kconfig
index bf1c94e..07709b0 100644
--- a/fs/bcachefs/Kconfig
+++ b/fs/bcachefs/Kconfig
@@ -4,7 +4,7 @@
depends on BLOCK
select EXPORTFS
select CLOSURES
- select LIBCRC32C
+ select CRC32
select CRC64
select FS_POSIX_ACL
select LZ4_COMPRESS
@@ -15,10 +15,9 @@
select ZLIB_INFLATE
select ZSTD_COMPRESS
select ZSTD_DECOMPRESS
- select CRYPTO
select CRYPTO_LIB_SHA256
- select CRYPTO_CHACHA20
- select CRYPTO_POLY1305
+ select CRYPTO_LIB_CHACHA
+ select CRYPTO_LIB_POLY1305
select KEYS
select RAID6_PQ
select XOR_BLOCKS
diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index 5d9f208..5cb0fc3 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -981,8 +981,8 @@ struct bch_fs {
mempool_t compress_workspace[BCH_COMPRESSION_OPT_NR];
size_t zstd_workspace_size;
- struct crypto_sync_skcipher *chacha20;
- struct crypto_shash *poly1305;
+ struct bch_key chacha20_key;
+ bool chacha20_key_set;
atomic64_t key_version;
diff --git a/fs/bcachefs/btree_journal_iter.c b/fs/bcachefs/btree_journal_iter.c
index d1ad1a7..7d6c971 100644
--- a/fs/bcachefs/btree_journal_iter.c
+++ b/fs/bcachefs/btree_journal_iter.c
@@ -644,8 +644,6 @@ void bch2_btree_and_journal_iter_init_node_iter(struct btree_trans *trans,
*/
static int journal_sort_key_cmp(const void *_l, const void *_r)
{
- cond_resched();
-
const struct journal_key *l = _l;
const struct journal_key *r = _r;
@@ -689,7 +687,8 @@ void bch2_journal_keys_put(struct bch_fs *c)
static void __journal_keys_sort(struct journal_keys *keys)
{
- sort(keys->data, keys->nr, sizeof(keys->data[0]), journal_sort_key_cmp, NULL);
+ sort_nonatomic(keys->data, keys->nr, sizeof(keys->data[0]),
+ journal_sort_key_cmp, NULL);
cond_resched();
diff --git a/fs/bcachefs/btree_node_scan.c b/fs/bcachefs/btree_node_scan.c
index 8c9fdb7..86acf03 100644
--- a/fs/bcachefs/btree_node_scan.c
+++ b/fs/bcachefs/btree_node_scan.c
@@ -183,7 +183,7 @@ static void try_read_btree_node(struct find_btree_nodes *f, struct bch_dev *ca,
return;
if (bch2_csum_type_is_encryption(BSET_CSUM_TYPE(&bn->keys))) {
- if (!c->chacha20)
+ if (!c->chacha20_key_set)
return;
struct nonce nonce = btree_nonce(&bn->keys, 0);
@@ -398,7 +398,7 @@ int bch2_scan_for_btree_nodes(struct bch_fs *c)
bch2_print_string_as_lines(KERN_INFO, buf.buf);
}
- sort(f->nodes.data, f->nodes.nr, sizeof(f->nodes.data[0]), found_btree_node_cmp_cookie, NULL);
+ sort_nonatomic(f->nodes.data, f->nodes.nr, sizeof(f->nodes.data[0]), found_btree_node_cmp_cookie, NULL);
dst = 0;
darray_for_each(f->nodes, i) {
@@ -418,7 +418,7 @@ int bch2_scan_for_btree_nodes(struct bch_fs *c)
}
f->nodes.nr = dst;
- sort(f->nodes.data, f->nodes.nr, sizeof(f->nodes.data[0]), found_btree_node_cmp_pos, NULL);
+ sort_nonatomic(f->nodes.data, f->nodes.nr, sizeof(f->nodes.data[0]), found_btree_node_cmp_pos, NULL);
if (0 && c->opts.verbose) {
printbuf_reset(&buf);
diff --git a/fs/bcachefs/btree_write_buffer.c b/fs/bcachefs/btree_write_buffer.c
index adbe576..0941fb2 100644
--- a/fs/bcachefs/btree_write_buffer.c
+++ b/fs/bcachefs/btree_write_buffer.c
@@ -428,10 +428,10 @@ static int bch2_btree_write_buffer_flush_locked(struct btree_trans *trans)
*/
trace_and_count(c, write_buffer_flush_slowpath, trans, slowpath, wb->flushing.keys.nr);
- sort(wb->flushing.keys.data,
- wb->flushing.keys.nr,
- sizeof(wb->flushing.keys.data[0]),
- wb_key_seq_cmp, NULL);
+ sort_nonatomic(wb->flushing.keys.data,
+ wb->flushing.keys.nr,
+ sizeof(wb->flushing.keys.data[0]),
+ wb_key_seq_cmp, NULL);
darray_for_each(wb->flushing.keys, i) {
if (!i->journal_seq)
diff --git a/fs/bcachefs/checksum.c b/fs/bcachefs/checksum.c
index 3726689..d0a34a0 100644
--- a/fs/bcachefs/checksum.c
+++ b/fs/bcachefs/checksum.c
@@ -7,17 +7,12 @@
#include "super-io.h"
#include <linux/crc32c.h>
-#include <linux/crypto.h>
#include <linux/xxhash.h>
#include <linux/key.h>
#include <linux/random.h>
#include <linux/ratelimit.h>
-#include <linux/scatterlist.h>
-#include <crypto/algapi.h>
#include <crypto/chacha.h>
-#include <crypto/hash.h>
#include <crypto/poly1305.h>
-#include <crypto/skcipher.h>
#include <keys/user-type.h>
/*
@@ -96,116 +91,40 @@ static void bch2_checksum_update(struct bch2_checksum_state *state, const void *
}
}
-static inline int do_encrypt_sg(struct crypto_sync_skcipher *tfm,
- struct nonce nonce,
- struct scatterlist *sg, size_t len)
+static void bch2_chacha20_init(u32 state[CHACHA_STATE_WORDS],
+ const struct bch_key *key, struct nonce nonce)
{
- SYNC_SKCIPHER_REQUEST_ON_STACK(req, tfm);
+ u32 key_words[CHACHA_KEY_SIZE / sizeof(u32)];
- skcipher_request_set_sync_tfm(req, tfm);
- skcipher_request_set_callback(req, 0, NULL, NULL);
- skcipher_request_set_crypt(req, sg, sg, len, nonce.d);
+ BUILD_BUG_ON(sizeof(key_words) != sizeof(*key));
+ memcpy(key_words, key, sizeof(key_words));
+ le32_to_cpu_array(key_words, ARRAY_SIZE(key_words));
- int ret = crypto_skcipher_encrypt(req);
- if (ret)
- pr_err("got error %i from crypto_skcipher_encrypt()", ret);
+ BUILD_BUG_ON(sizeof(nonce) != CHACHA_IV_SIZE);
+ chacha_init(state, key_words, (const u8 *)nonce.d);
- return ret;
+ memzero_explicit(key_words, sizeof(key_words));
}
-static inline int do_encrypt(struct crypto_sync_skcipher *tfm,
- struct nonce nonce,
- void *buf, size_t len)
+static void bch2_chacha20(const struct bch_key *key, struct nonce nonce,
+ void *data, size_t len)
{
- if (!is_vmalloc_addr(buf)) {
- struct scatterlist sg = {};
+ u32 state[CHACHA_STATE_WORDS];
- sg_mark_end(&sg);
- sg_set_page(&sg, virt_to_page(buf), len, offset_in_page(buf));
- return do_encrypt_sg(tfm, nonce, &sg, len);
- } else {
- DARRAY_PREALLOCATED(struct scatterlist, 4) sgl;
- size_t sgl_len = 0;
- int ret;
-
- darray_init(&sgl);
-
- while (len) {
- unsigned offset = offset_in_page(buf);
- struct scatterlist sg = {
- .page_link = (unsigned long) vmalloc_to_page(buf),
- .offset = offset,
- .length = min(len, PAGE_SIZE - offset),
- };
-
- if (darray_push(&sgl, sg)) {
- sg_mark_end(&darray_last(sgl));
- ret = do_encrypt_sg(tfm, nonce, sgl.data, sgl_len);
- if (ret)
- goto err;
-
- nonce = nonce_add(nonce, sgl_len);
- sgl_len = 0;
- sgl.nr = 0;
- BUG_ON(darray_push(&sgl, sg));
- }
-
- buf += sg.length;
- len -= sg.length;
- sgl_len += sg.length;
- }
-
- sg_mark_end(&darray_last(sgl));
- ret = do_encrypt_sg(tfm, nonce, sgl.data, sgl_len);
-err:
- darray_exit(&sgl);
- return ret;
- }
+ bch2_chacha20_init(state, key, nonce);
+ chacha20_crypt(state, data, data, len);
+ memzero_explicit(state, sizeof(state));
}
-int bch2_chacha_encrypt_key(struct bch_key *key, struct nonce nonce,
- void *buf, size_t len)
+static void bch2_poly1305_init(struct poly1305_desc_ctx *desc,
+ struct bch_fs *c, struct nonce nonce)
{
- struct crypto_sync_skcipher *chacha20 =
- crypto_alloc_sync_skcipher("chacha20", 0, 0);
- int ret;
-
- ret = PTR_ERR_OR_ZERO(chacha20);
- if (ret) {
- pr_err("error requesting chacha20 cipher: %s", bch2_err_str(ret));
- return ret;
- }
-
- ret = crypto_skcipher_setkey(&chacha20->base,
- (void *) key, sizeof(*key));
- if (ret) {
- pr_err("error from crypto_skcipher_setkey(): %s", bch2_err_str(ret));
- goto err;
- }
-
- ret = do_encrypt(chacha20, nonce, buf, len);
-err:
- crypto_free_sync_skcipher(chacha20);
- return ret;
-}
-
-static int gen_poly_key(struct bch_fs *c, struct shash_desc *desc,
- struct nonce nonce)
-{
- u8 key[POLY1305_KEY_SIZE];
- int ret;
+ u8 key[POLY1305_KEY_SIZE] = { 0 };
nonce.d[3] ^= BCH_NONCE_POLY;
- memset(key, 0, sizeof(key));
- ret = do_encrypt(c->chacha20, nonce, key, sizeof(key));
- if (ret)
- return ret;
-
- desc->tfm = c->poly1305;
- crypto_shash_init(desc);
- crypto_shash_update(desc, key, sizeof(key));
- return 0;
+ bch2_chacha20(&c->chacha20_key, nonce, key, sizeof(key));
+ poly1305_init(desc, key);
}
struct bch_csum bch2_checksum(struct bch_fs *c, unsigned type,
@@ -230,14 +149,13 @@ struct bch_csum bch2_checksum(struct bch_fs *c, unsigned type,
case BCH_CSUM_chacha20_poly1305_80:
case BCH_CSUM_chacha20_poly1305_128: {
- SHASH_DESC_ON_STACK(desc, c->poly1305);
+ struct poly1305_desc_ctx dctx;
u8 digest[POLY1305_DIGEST_SIZE];
struct bch_csum ret = { 0 };
- gen_poly_key(c, desc, nonce);
-
- crypto_shash_update(desc, data, len);
- crypto_shash_final(desc, digest);
+ bch2_poly1305_init(&dctx, c, nonce);
+ poly1305_update(&dctx, data, len);
+ poly1305_final(&dctx, digest);
memcpy(&ret, digest, bch_crc_bytes[type]);
return ret;
@@ -253,11 +171,12 @@ int bch2_encrypt(struct bch_fs *c, unsigned type,
if (!bch2_csum_type_is_encryption(type))
return 0;
- if (bch2_fs_inconsistent_on(!c->chacha20,
+ if (bch2_fs_inconsistent_on(!c->chacha20_key_set,
c, "attempting to encrypt without encryption key"))
return -BCH_ERR_no_encryption_key;
- return do_encrypt(c->chacha20, nonce, data, len);
+ bch2_chacha20(&c->chacha20_key, nonce, data, len);
+ return 0;
}
static struct bch_csum __bch2_checksum_bio(struct bch_fs *c, unsigned type,
@@ -296,26 +215,26 @@ static struct bch_csum __bch2_checksum_bio(struct bch_fs *c, unsigned type,
case BCH_CSUM_chacha20_poly1305_80:
case BCH_CSUM_chacha20_poly1305_128: {
- SHASH_DESC_ON_STACK(desc, c->poly1305);
+ struct poly1305_desc_ctx dctx;
u8 digest[POLY1305_DIGEST_SIZE];
struct bch_csum ret = { 0 };
- gen_poly_key(c, desc, nonce);
+ bch2_poly1305_init(&dctx, c, nonce);
#ifdef CONFIG_HIGHMEM
__bio_for_each_segment(bv, bio, *iter, *iter) {
void *p = kmap_local_page(bv.bv_page) + bv.bv_offset;
- crypto_shash_update(desc, p, bv.bv_len);
+ poly1305_update(&dctx, p, bv.bv_len);
kunmap_local(p);
}
#else
__bio_for_each_bvec(bv, bio, *iter, *iter)
- crypto_shash_update(desc,
+ poly1305_update(&dctx,
page_address(bv.bv_page) + bv.bv_offset,
bv.bv_len);
#endif
- crypto_shash_final(desc, digest);
+ poly1305_final(&dctx, digest);
memcpy(&ret, digest, bch_crc_bytes[type]);
return ret;
@@ -338,43 +257,33 @@ int __bch2_encrypt_bio(struct bch_fs *c, unsigned type,
{
struct bio_vec bv;
struct bvec_iter iter;
- DARRAY_PREALLOCATED(struct scatterlist, 4) sgl;
- size_t sgl_len = 0;
+ u32 chacha_state[CHACHA_STATE_WORDS];
int ret = 0;
- if (bch2_fs_inconsistent_on(!c->chacha20,
+ if (bch2_fs_inconsistent_on(!c->chacha20_key_set,
c, "attempting to encrypt without encryption key"))
return -BCH_ERR_no_encryption_key;
- darray_init(&sgl);
+ bch2_chacha20_init(chacha_state, &c->chacha20_key, nonce);
bio_for_each_segment(bv, bio, iter) {
- struct scatterlist sg = {
- .page_link = (unsigned long) bv.bv_page,
- .offset = bv.bv_offset,
- .length = bv.bv_len,
- };
+ void *p;
- if (darray_push(&sgl, sg)) {
- sg_mark_end(&darray_last(sgl));
- ret = do_encrypt_sg(c->chacha20, nonce, sgl.data, sgl_len);
- if (ret)
- goto err;
-
- nonce = nonce_add(nonce, sgl_len);
- sgl_len = 0;
- sgl.nr = 0;
-
- BUG_ON(darray_push(&sgl, sg));
+ /*
+ * chacha_crypt() assumes that the length is a multiple of
+ * CHACHA_BLOCK_SIZE on any non-final call.
+ */
+ if (!IS_ALIGNED(bv.bv_len, CHACHA_BLOCK_SIZE)) {
+ bch_err_ratelimited(c, "bio not aligned for encryption");
+ ret = -EIO;
+ break;
}
- sgl_len += sg.length;
+ p = bvec_kmap_local(&bv);
+ chacha20_crypt(chacha_state, p, p, bv.bv_len);
+ kunmap_local(p);
}
-
- sg_mark_end(&darray_last(sgl));
- ret = do_encrypt_sg(c->chacha20, nonce, sgl.data, sgl_len);
-err:
- darray_exit(&sgl);
+ memzero_explicit(chacha_state, sizeof(chacha_state));
return ret;
}
@@ -650,10 +559,7 @@ int bch2_decrypt_sb_key(struct bch_fs *c,
}
/* decrypt real key: */
- ret = bch2_chacha_encrypt_key(&user_key, bch2_sb_key_nonce(c),
- &sb_key, sizeof(sb_key));
- if (ret)
- goto err;
+ bch2_chacha20(&user_key, bch2_sb_key_nonce(c), &sb_key, sizeof(sb_key));
if (bch2_key_is_encrypted(&sb_key)) {
bch_err(c, "incorrect encryption key");
@@ -668,31 +574,6 @@ int bch2_decrypt_sb_key(struct bch_fs *c,
return ret;
}
-static int bch2_alloc_ciphers(struct bch_fs *c)
-{
- if (c->chacha20)
- return 0;
-
- struct crypto_sync_skcipher *chacha20 = crypto_alloc_sync_skcipher("chacha20", 0, 0);
- int ret = PTR_ERR_OR_ZERO(chacha20);
- if (ret) {
- bch_err(c, "error requesting chacha20 module: %s", bch2_err_str(ret));
- return ret;
- }
-
- struct crypto_shash *poly1305 = crypto_alloc_shash("poly1305", 0, 0);
- ret = PTR_ERR_OR_ZERO(poly1305);
- if (ret) {
- bch_err(c, "error requesting poly1305 module: %s", bch2_err_str(ret));
- crypto_free_sync_skcipher(chacha20);
- return ret;
- }
-
- c->chacha20 = chacha20;
- c->poly1305 = poly1305;
- return 0;
-}
-
#if 0
/*
@@ -797,35 +678,21 @@ int bch2_enable_encryption(struct bch_fs *c, bool keyed)
void bch2_fs_encryption_exit(struct bch_fs *c)
{
- if (c->poly1305)
- crypto_free_shash(c->poly1305);
- if (c->chacha20)
- crypto_free_sync_skcipher(c->chacha20);
+ memzero_explicit(&c->chacha20_key, sizeof(c->chacha20_key));
}
int bch2_fs_encryption_init(struct bch_fs *c)
{
struct bch_sb_field_crypt *crypt;
- struct bch_key key;
- int ret = 0;
+ int ret;
crypt = bch2_sb_field_get(c->disk_sb.sb, crypt);
if (!crypt)
- goto out;
+ return 0;
- ret = bch2_alloc_ciphers(c);
+ ret = bch2_decrypt_sb_key(c, crypt, &c->chacha20_key);
if (ret)
- goto out;
-
- ret = bch2_decrypt_sb_key(c, crypt, &key);
- if (ret)
- goto out;
-
- ret = crypto_skcipher_setkey(&c->chacha20->base,
- (void *) &key.key, sizeof(key.key));
- if (ret)
- goto out;
-out:
- memzero_explicit(&key, sizeof(key));
- return ret;
+ return ret;
+ c->chacha20_key_set = true;
+ return 0;
}
diff --git a/fs/bcachefs/checksum.h b/fs/bcachefs/checksum.h
index 4ac251c..1310782 100644
--- a/fs/bcachefs/checksum.h
+++ b/fs/bcachefs/checksum.h
@@ -69,7 +69,6 @@ static inline void bch2_csum_err_msg(struct printbuf *out,
bch2_csum_to_text(out, type, expected);
}
-int bch2_chacha_encrypt_key(struct bch_key *, struct nonce, void *, size_t);
int bch2_request_key(struct bch_sb *, struct bch_key *);
#ifndef __KERNEL__
int bch2_revoke_key(struct bch_sb *);
@@ -156,7 +155,7 @@ static inline bool bch2_checksum_type_valid(const struct bch_fs *c,
if (type >= BCH_CSUM_NR)
return false;
- if (bch2_csum_type_is_encryption(type) && !c->chacha20)
+ if (bch2_csum_type_is_encryption(type) && !c->chacha20_key_set)
return false;
return true;
diff --git a/fs/bcachefs/data_update.c b/fs/bcachefs/data_update.c
index de02ebf..b211c97 100644
--- a/fs/bcachefs/data_update.c
+++ b/fs/bcachefs/data_update.c
@@ -607,7 +607,7 @@ void bch2_data_update_inflight_to_text(struct printbuf *out, struct data_update
prt_newline(out);
printbuf_indent_add(out, 2);
bch2_data_update_opts_to_text(out, m->op.c, &m->op.opts, &m->data_opts);
- prt_printf(out, "read_done:\t\%u\n", m->read_done);
+ prt_printf(out, "read_done:\t%u\n", m->read_done);
bch2_write_op_to_text(out, &m->op);
printbuf_indent_sub(out, 2);
}
diff --git a/fs/bcachefs/dirent.c b/fs/bcachefs/dirent.c
index bf53a02..8488a75 100644
--- a/fs/bcachefs/dirent.c
+++ b/fs/bcachefs/dirent.c
@@ -287,8 +287,8 @@ static void dirent_init_casefolded_name(struct bkey_i_dirent *dirent,
EBUG_ON(!dirent->v.d_casefold);
EBUG_ON(!cf_name->len);
- dirent->v.d_cf_name_block.d_name_len = name->len;
- dirent->v.d_cf_name_block.d_cf_name_len = cf_name->len;
+ dirent->v.d_cf_name_block.d_name_len = cpu_to_le16(name->len);
+ dirent->v.d_cf_name_block.d_cf_name_len = cpu_to_le16(cf_name->len);
memcpy(&dirent->v.d_cf_name_block.d_names[0], name->name, name->len);
memcpy(&dirent->v.d_cf_name_block.d_names[name->len], cf_name->name, cf_name->len);
memset(&dirent->v.d_cf_name_block.d_names[name->len + cf_name->len], 0,
diff --git a/fs/bcachefs/fs-io-buffered.c b/fs/bcachefs/fs-io-buffered.c
index 19d4599..e3a75dc 100644
--- a/fs/bcachefs/fs-io-buffered.c
+++ b/fs/bcachefs/fs-io-buffered.c
@@ -225,11 +225,26 @@ static void bchfs_read(struct btree_trans *trans,
bch2_read_extent(trans, rbio, iter.pos,
data_btree, k, offset_into_extent, flags);
- swap(rbio->bio.bi_iter.bi_size, bytes);
+ /*
+ * Careful there's a landmine here if bch2_read_extent() ever
+ * starts returning transaction restarts here.
+ *
+ * We've changed rbio->bi_iter.bi_size to be "bytes we can read
+ * from this extent" with the swap call, and we restore it
+ * below. That restore needs to come before checking for
+ * errors.
+ *
+ * But unlike __bch2_read(), we use the rbio bvec iter, not one
+ * on the stack, so we can't do the restore right after the
+ * bch2_read_extent() call: we don't own that iterator anymore
+ * if BCH_READ_last_fragment is set, since we may have submitted
+ * that rbio instead of cloning it.
+ */
if (flags & BCH_READ_last_fragment)
break;
+ swap(rbio->bio.bi_iter.bi_size, bytes);
bio_advance(&rbio->bio, bytes);
err:
if (ret &&
diff --git a/fs/bcachefs/io_read.c b/fs/bcachefs/io_read.c
index 417bb0c..fd627c8 100644
--- a/fs/bcachefs/io_read.c
+++ b/fs/bcachefs/io_read.c
@@ -977,7 +977,8 @@ int __bch2_read_extent(struct btree_trans *trans, struct bch_read_bio *orig,
goto err;
}
- if (unlikely(bch2_csum_type_is_encryption(pick.crc.csum_type)) && !c->chacha20) {
+ if (unlikely(bch2_csum_type_is_encryption(pick.crc.csum_type)) &&
+ !c->chacha20_key_set) {
struct printbuf buf = PRINTBUF;
bch2_read_err_msg_trans(trans, &buf, orig, read_pos);
prt_printf(&buf, "attempting to read encrypted data without encryption key\n ");
diff --git a/fs/bcachefs/journal_io.c b/fs/bcachefs/journal_io.c
index 1b7961f..2a54ac7 100644
--- a/fs/bcachefs/journal_io.c
+++ b/fs/bcachefs/journal_io.c
@@ -1460,7 +1460,7 @@ int bch2_journal_read(struct bch_fs *c,
static void journal_advance_devs_to_next_bucket(struct journal *j,
struct dev_alloc_list *devs,
- unsigned sectors, u64 seq)
+ unsigned sectors, __le64 seq)
{
struct bch_fs *c = container_of(j, struct bch_fs, journal);
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index 79fd18a..d2b07f6 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -389,9 +389,9 @@ int bch2_journal_replay(struct bch_fs *c)
* Now, replay any remaining keys in the order in which they appear in
* the journal, unpinning those journal entries as we go:
*/
- sort(keys_sorted.data, keys_sorted.nr,
- sizeof(keys_sorted.data[0]),
- journal_sort_seq_cmp, NULL);
+ sort_nonatomic(keys_sorted.data, keys_sorted.nr,
+ sizeof(keys_sorted.data[0]),
+ journal_sort_seq_cmp, NULL);
darray_for_each(keys_sorted, kp) {
cond_resched();
diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index a58edde43..b79e80a 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -70,14 +70,10 @@
#include <linux/percpu.h>
#include <linux/random.h>
#include <linux/sysfs.h>
-#include <crypto/hash.h>
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Kent Overstreet <kent.overstreet@gmail.com>");
MODULE_DESCRIPTION("bcachefs filesystem");
-MODULE_SOFTDEP("pre: chacha20");
-MODULE_SOFTDEP("pre: poly1305");
-MODULE_SOFTDEP("pre: xxhash");
const char * const bch2_fs_flag_strs[] = {
#define x(n) #n,
@@ -1002,12 +998,6 @@ static void print_mount_opts(struct bch_fs *c)
prt_str(&p, "starting version ");
bch2_version_to_text(&p, c->sb.version);
- if (c->opts.read_only) {
- prt_str(&p, " opts=");
- first = false;
- prt_printf(&p, "ro");
- }
-
for (i = 0; i < bch2_opts_nr; i++) {
const struct bch_option *opt = &bch2_opt_table[i];
u64 v = bch2_opt_get_by_id(&c->opts, i);
diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index fa85155..73a2dfb 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -3,9 +3,9 @@
config BTRFS_FS
tristate "Btrfs filesystem support"
select BLK_CGROUP_PUNT_BIO
+ select CRC32
select CRYPTO
select CRYPTO_CRC32C
- select LIBCRC32C
select CRYPTO_XXHASH
select CRYPTO_SHA256
select CRYPTO_BLAKE2B
diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
index 7249d70..3e7def3 100644
--- a/fs/ceph/Kconfig
+++ b/fs/ceph/Kconfig
@@ -3,7 +3,7 @@
tristate "Ceph distributed file system"
depends on INET
select CEPH_LIB
- select LIBCRC32C
+ select CRC32
select CRYPTO_AES
select CRYPTO
select NETFS_SUPPORT
diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index 331e49c..8f68ec4 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -3,8 +3,8 @@
config EROFS_FS
tristate "EROFS filesystem support"
depends on BLOCK
+ select CRC32
select FS_IOMAP
- select LIBCRC32C
help
EROFS (Enhanced Read-Only File System) is a lightweight read-only
file system with modern designs (e.g. no buffer heads, inline
diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index be7f87a..7bd231d1 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -4,7 +4,6 @@
select BUFFER_HEAD
select FS_POSIX_ACL
select CRC32
- select LIBCRC32C
select QUOTACTL
select FS_IOMAP
help
diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index fffd6ff..ae0ca68 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -3,7 +3,7 @@
tristate "XFS filesystem support"
depends on BLOCK
select EXPORTFS
- select LIBCRC32C
+ select CRC32
select FS_IOMAP
help
XFS is a high performance journaling filesystem which originated
diff --git a/include/drm/drm_kunit_helpers.h b/include/drm/drm_kunit_helpers.h
index 11d59ce..1c62d1d 100644
--- a/include/drm/drm_kunit_helpers.h
+++ b/include/drm/drm_kunit_helpers.h
@@ -118,6 +118,9 @@ drm_kunit_helper_create_crtc(struct kunit *test,
const struct drm_crtc_funcs *funcs,
const struct drm_crtc_helper_funcs *helper_funcs);
+int drm_kunit_add_mode_destroy_action(struct kunit *test,
+ struct drm_display_mode *mode);
+
struct drm_display_mode *
drm_kunit_display_mode_from_cea_vic(struct kunit *test, struct drm_device *dev,
u8 video_code);
diff --git a/include/drm/intel/pciids.h b/include/drm/intel/pciids.h
index 4736ea5..d212848 100644
--- a/include/drm/intel/pciids.h
+++ b/include/drm/intel/pciids.h
@@ -850,6 +850,7 @@
MACRO__(0xE20C, ## __VA_ARGS__), \
MACRO__(0xE20D, ## __VA_ARGS__), \
MACRO__(0xE210, ## __VA_ARGS__), \
+ MACRO__(0xE211, ## __VA_ARGS__), \
MACRO__(0xE212, ## __VA_ARGS__), \
MACRO__(0xE215, ## __VA_ARGS__), \
MACRO__(0xE216, ## __VA_ARGS__)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 0ffb97c..39c768f 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -67,7 +67,7 @@ enum kunit_status {
/*
* Speed Attribute is stored as an enum and separated into categories of
- * speed: very_slowm, slow, and normal. These speeds are relative to
+ * speed: very_slow, slow, and normal. These speeds are relative to
* other KUnit tests.
*
* Note: unset speed attribute acts as default of KUNIT_SPEED_NORMAL.
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 485b651..5bc8f55 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -710,6 +710,7 @@ struct cgroup_subsys {
void (*css_released)(struct cgroup_subsys_state *css);
void (*css_free)(struct cgroup_subsys_state *css);
void (*css_reset)(struct cgroup_subsys_state *css);
+ void (*css_killed)(struct cgroup_subsys_state *css);
void (*css_rstat_flush)(struct cgroup_subsys_state *css, int cpu);
int (*css_extra_stat_show)(struct seq_file *seq,
struct cgroup_subsys_state *css);
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 28e999f..e7da3c3 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -344,7 +344,7 @@ static inline u64 cgroup_id(const struct cgroup *cgrp)
*/
static inline bool css_is_dying(struct cgroup_subsys_state *css)
{
- return !(css->flags & CSS_NO_REF) && percpu_ref_is_dying(&css->refcnt);
+ return css->flags & CSS_DYING;
}
static inline void cgroup_get(struct cgroup *cgrp)
diff --git a/include/linux/gpio/consumer.h b/include/linux/gpio/consumer.h
index 45b651c..8adc8e9 100644
--- a/include/linux/gpio/consumer.h
+++ b/include/linux/gpio/consumer.h
@@ -31,6 +31,7 @@ struct gpio_descs {
#define GPIOD_FLAGS_BIT_DIR_OUT BIT(1)
#define GPIOD_FLAGS_BIT_DIR_VAL BIT(2)
#define GPIOD_FLAGS_BIT_OPEN_DRAIN BIT(3)
+/* GPIOD_FLAGS_BIT_NONEXCLUSIVE is DEPRECATED, don't use in new code. */
#define GPIOD_FLAGS_BIT_NONEXCLUSIVE BIT(4)
/**
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 1adcba3..1ef867b 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -345,7 +345,7 @@ static inline void hrtimer_update_function(struct hrtimer *timer,
if (WARN_ON_ONCE(!function))
return;
#endif
- timer->function = function;
+ ACCESS_PRIVATE(timer, function) = function;
}
/* Forward a hrtimer so it expires after now: */
diff --git a/include/linux/irqchip/irq-davinci-aintc.h b/include/linux/irqchip/irq-davinci-aintc.h
deleted file mode 100644
index ea4e087..0000000
--- a/include/linux/irqchip/irq-davinci-aintc.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * Copyright (C) 2019 Texas Instruments
- */
-
-#ifndef _LINUX_IRQ_DAVINCI_AINTC_
-#define _LINUX_IRQ_DAVINCI_AINTC_
-
-#include <linux/ioport.h>
-
-/**
- * struct davinci_aintc_config - configuration data for davinci-aintc driver.
- *
- * @reg: register range to map
- * @num_irqs: number of HW interrupts supported by the controller
- * @prios: an array of size num_irqs containing priority settings for
- * each interrupt
- */
-struct davinci_aintc_config {
- struct resource reg;
- unsigned int num_irqs;
- u8 *prios;
-};
-
-void davinci_aintc_init(const struct davinci_aintc_config *config);
-
-#endif /* _LINUX_IRQ_DAVINCI_AINTC_ */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5438a1b..291d49b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2382,7 +2382,7 @@ static inline bool kvm_is_visible_memslot(struct kvm_memory_slot *memslot)
struct kvm_vcpu *kvm_get_running_vcpu(void);
struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
-#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
bool kvm_arch_has_irq_bypass(void);
int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *,
struct irq_bypass_producer *);
diff --git a/include/linux/mtd/spinand.h b/include/linux/mtd/spinand.h
index 1e74895..311f145 100644
--- a/include/linux/mtd/spinand.h
+++ b/include/linux/mtd/spinand.h
@@ -67,7 +67,7 @@
SPI_MEM_OP_ADDR(2, addr, 1), \
SPI_MEM_OP_DUMMY(ndummy, 1), \
SPI_MEM_OP_DATA_IN(len, buf, 1), \
- __VA_OPT__(SPI_MEM_OP_MAX_FREQ(__VA_ARGS__)))
+ SPI_MEM_OP_MAX_FREQ(__VA_ARGS__ + 0))
#define SPINAND_PAGE_READ_FROM_CACHE_FAST_OP(addr, ndummy, buf, len) \
SPI_MEM_OP(SPI_MEM_OP_CMD(0x0b, 1), \
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cf3b644..2d11d01 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4429,6 +4429,7 @@ void linkwatch_fire_event(struct net_device *dev);
* pending work list (if queued).
*/
void linkwatch_sync_dev(struct net_device *dev);
+void __linkwatch_sync_dev(struct net_device *dev);
/**
* netif_carrier_ok - test if carrier present
@@ -4974,6 +4975,7 @@ void dev_set_rx_mode(struct net_device *dev);
int dev_set_promiscuity(struct net_device *dev, int inc);
int netif_set_allmulti(struct net_device *dev, int inc, bool notify);
int dev_set_allmulti(struct net_device *dev, int inc);
+void netif_state_change(struct net_device *dev);
void netdev_state_change(struct net_device *dev);
void __netdev_notify_peers(struct net_device *dev);
void netdev_notify_peers(struct net_device *dev);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5a9bf15..0069ba6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -823,7 +823,6 @@ struct perf_event {
struct irq_work pending_disable_irq;
struct callback_head pending_task;
unsigned int pending_work;
- struct rcuwait pending_work_wait;
atomic_t event_limit;
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index ccaaf4c..ea39dd2 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -240,6 +240,6 @@ rtnl_notify_needed(const struct net *net, u16 nlflags, u32 group)
return (nlflags & NLM_F_ECHO) || rtnl_has_listeners(net, group);
}
-void netdev_set_operstate(struct net_device *dev, int newstate);
+void netif_set_operstate(struct net_device *dev, int newstate);
#endif /* __LINUX_RTNETLINK_H */
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 31248cf..dcd288f 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -775,6 +775,7 @@ struct sctp_transport {
/* Reference counting. */
refcount_t refcnt;
+ __u32 dead:1,
/* RTO-Pending : A flag used to track if one of the DATA
* chunks sent to this address is currently being
* used to compute a RTT. If this flag is 0,
@@ -784,7 +785,7 @@ struct sctp_transport {
* calculation completes (i.e. the DATA chunk
* is SACK'd) clear this flag.
*/
- __u32 rto_pending:1,
+ rto_pending:1,
/*
* hb_sent : a flag that signals that we have a pending
diff --git a/include/net/sock.h b/include/net/sock.h
index 8daf1b3..694f954 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -339,6 +339,8 @@ struct sk_filter;
* @sk_txtime_unused: unused txtime flags
* @ns_tracker: tracker for netns reference
* @sk_user_frags: xarray of pages the user is holding a reference on.
+ * @sk_owner: reference to the real owner of the socket that calls
+ * sock_lock_init_class_and_name().
*/
struct sock {
/*
@@ -547,6 +549,10 @@ struct sock {
struct rcu_head sk_rcu;
netns_tracker ns_tracker;
struct xarray sk_user_frags;
+
+#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
+ struct module *sk_owner;
+#endif
};
struct sock_bh_locked {
@@ -1583,6 +1589,35 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
sk_mem_reclaim(sk);
}
+#if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
+static inline void sk_owner_set(struct sock *sk, struct module *owner)
+{
+ __module_get(owner);
+ sk->sk_owner = owner;
+}
+
+static inline void sk_owner_clear(struct sock *sk)
+{
+ sk->sk_owner = NULL;
+}
+
+static inline void sk_owner_put(struct sock *sk)
+{
+ module_put(sk->sk_owner);
+}
+#else
+static inline void sk_owner_set(struct sock *sk, struct module *owner)
+{
+}
+
+static inline void sk_owner_clear(struct sock *sk)
+{
+}
+
+static inline void sk_owner_put(struct sock *sk)
+{
+}
+#endif
/*
* Macro so as to not evaluate some arguments when
* lockdep is not enabled.
@@ -1592,13 +1627,14 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
*/
#define sock_lock_init_class_and_name(sk, sname, skey, name, key) \
do { \
+ sk_owner_set(sk, THIS_MODULE); \
sk->sk_lock.owned = 0; \
init_waitqueue_head(&sk->sk_lock.wq); \
spin_lock_init(&(sk)->sk_lock.slock); \
debug_check_no_locks_freed((void *)&(sk)->sk_lock, \
- sizeof((sk)->sk_lock)); \
+ sizeof((sk)->sk_lock)); \
lockdep_set_class_and_name(&(sk)->sk_lock.slock, \
- (skey), (sname)); \
+ (skey), (sname)); \
lockdep_init_map(&(sk)->sk_lock.dep_map, (name), (key), 0); \
} while (0)
diff --git a/include/vdso/unaligned.h b/include/vdso/unaligned.h
index eee3d2a..ff0c06b 100644
--- a/include/vdso/unaligned.h
+++ b/include/vdso/unaligned.h
@@ -2,14 +2,14 @@
#ifndef __VDSO_UNALIGNED_H
#define __VDSO_UNALIGNED_H
-#define __get_unaligned_t(type, ptr) ({ \
- const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); \
- __pptr->x; \
+#define __get_unaligned_t(type, ptr) ({ \
+ const struct { type x; } __packed * __get_pptr = (typeof(__get_pptr))(ptr); \
+ __get_pptr->x; \
})
-#define __put_unaligned_t(type, val, ptr) do { \
- struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); \
- __pptr->x = (val); \
+#define __put_unaligned_t(type, val, ptr) do { \
+ struct { type x; } __packed * __put_pptr = (typeof(__put_pptr))(ptr); \
+ __put_pptr->x = (val); \
} while (0)
#endif /* __VDSO_UNALIGNED_H */
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 0981092..953d5e7 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -504,6 +504,8 @@ int io_provide_buffers_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
p->nbufs = tmp;
p->addr = READ_ONCE(sqe->addr);
p->len = READ_ONCE(sqe->len);
+ if (!p->len)
+ return -EINVAL;
if (check_mul_overflow((unsigned long)p->len, (unsigned long)p->nbufs,
&size))
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 5e64a8b..b36c882 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -175,6 +175,18 @@ void io_rsrc_cache_free(struct io_ring_ctx *ctx)
io_alloc_cache_free(&ctx->imu_cache, kfree);
}
+static void io_clear_table_tags(struct io_rsrc_data *data)
+{
+ int i;
+
+ for (i = 0; i < data->nr; i++) {
+ struct io_rsrc_node *node = data->nodes[i];
+
+ if (node)
+ node->tag = 0;
+ }
+}
+
__cold void io_rsrc_data_free(struct io_ring_ctx *ctx,
struct io_rsrc_data *data)
{
@@ -583,6 +595,7 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
io_file_table_set_alloc_range(ctx, 0, ctx->file_table.data.nr);
return 0;
fail:
+ io_clear_table_tags(&ctx->file_table.data);
io_sqe_files_unregister(ctx);
return ret;
}
@@ -902,8 +915,10 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
}
ctx->buf_table = data;
- if (ret)
+ if (ret) {
+ io_clear_table_tags(&ctx->buf_table);
io_sqe_buffers_unregister(ctx);
+ }
return ret;
}
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 80d4a6f..0f46e04 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -181,7 +181,7 @@ static void io_zcrx_free_area(struct io_zcrx_area *area)
kvfree(area->nia.niovs);
kvfree(area->user_refs);
if (area->pages) {
- unpin_user_pages(area->pages, area->nia.num_niovs);
+ unpin_user_pages(area->pages, area->nr_folios);
kvfree(area->pages);
}
kfree(area);
@@ -192,7 +192,7 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
struct io_uring_zcrx_area_reg *area_reg)
{
struct io_zcrx_area *area;
- int i, ret, nr_pages;
+ int i, ret, nr_pages, nr_iovs;
struct iovec iov;
if (area_reg->flags || area_reg->rq_area_token)
@@ -220,27 +220,28 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
area->pages = NULL;
goto err;
}
- area->nia.num_niovs = nr_pages;
+ area->nr_folios = nr_iovs = nr_pages;
+ area->nia.num_niovs = nr_iovs;
- area->nia.niovs = kvmalloc_array(nr_pages, sizeof(area->nia.niovs[0]),
+ area->nia.niovs = kvmalloc_array(nr_iovs, sizeof(area->nia.niovs[0]),
GFP_KERNEL | __GFP_ZERO);
if (!area->nia.niovs)
goto err;
- area->freelist = kvmalloc_array(nr_pages, sizeof(area->freelist[0]),
+ area->freelist = kvmalloc_array(nr_iovs, sizeof(area->freelist[0]),
GFP_KERNEL | __GFP_ZERO);
if (!area->freelist)
goto err;
- for (i = 0; i < nr_pages; i++)
+ for (i = 0; i < nr_iovs; i++)
area->freelist[i] = i;
- area->user_refs = kvmalloc_array(nr_pages, sizeof(area->user_refs[0]),
+ area->user_refs = kvmalloc_array(nr_iovs, sizeof(area->user_refs[0]),
GFP_KERNEL | __GFP_ZERO);
if (!area->user_refs)
goto err;
- for (i = 0; i < nr_pages; i++) {
+ for (i = 0; i < nr_iovs; i++) {
struct net_iov *niov = &area->nia.niovs[i];
niov->owner = &area->nia;
@@ -248,7 +249,7 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
atomic_set(&area->user_refs[i], 0);
}
- area->free_count = nr_pages;
+ area->free_count = nr_iovs;
area->ifq = ifq;
/* we're only supporting one area per ifq for now */
area->area_id = 0;
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 706cc73..47f1c0e 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -15,6 +15,7 @@ struct io_zcrx_area {
bool is_mapped;
u16 area_id;
struct page **pages;
+ unsigned long nr_folios;
/* freelist */
spinlock_t freelist_lock ____cacheline_aligned_in_smp;
@@ -26,11 +27,11 @@ struct io_zcrx_ifq {
struct io_ring_ctx *ctx;
struct io_zcrx_area *area;
+ spinlock_t rq_lock ____cacheline_aligned_in_smp;
struct io_uring *rq_ring;
struct io_uring_zcrx_rqe *rqes;
- u32 rq_entries;
u32 cached_rq_head;
- spinlock_t rq_lock;
+ u32 rq_entries;
u32 if_rxq;
struct device *dev;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 27f08aa..3caf2cd 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5923,6 +5923,12 @@ static void kill_css(struct cgroup_subsys_state *css)
if (css->flags & CSS_DYING)
return;
+ /*
+ * Call css_killed(), if defined, before setting the CSS_DYING flag
+ */
+ if (css->ss->css_killed)
+ css->ss->css_killed(css);
+
css->flags |= CSS_DYING;
/*
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 976a8bc..383963e 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -33,6 +33,7 @@ enum prs_errcode {
PERR_CPUSEMPTY,
PERR_HKEEPING,
PERR_ACCESS,
+ PERR_REMOTE,
};
/* bits in struct cpuset flags field */
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 39c1fc6..306b604 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -61,10 +61,17 @@ static const char * const perr_strings[] = {
[PERR_CPUSEMPTY] = "cpuset.cpus and cpuset.cpus.exclusive are empty",
[PERR_HKEEPING] = "partition config conflicts with housekeeping setup",
[PERR_ACCESS] = "Enable partition not permitted",
+ [PERR_REMOTE] = "Have remote partition underneath",
};
/*
- * Exclusive CPUs distributed out to sub-partitions of top_cpuset
+ * For local partitions, update to subpartitions_cpus & isolated_cpus is done
+ * in update_parent_effective_cpumask(). For remote partitions, it is done in
+ * the remote_partition_*() and remote_cpus_update() helpers.
+ */
+/*
+ * Exclusive CPUs distributed out to local or remote sub-partitions of
+ * top_cpuset
*/
static cpumask_var_t subpartitions_cpus;
@@ -86,7 +93,6 @@ static struct list_head remote_children;
* A flag to force sched domain rebuild at the end of an operation.
* It can be set in
* - update_partition_sd_lb()
- * - remote_partition_check()
* - update_cpumasks_hier()
* - cpuset_update_flag()
* - cpuset_hotplug_update_tasks()
@@ -1089,9 +1095,14 @@ void cpuset_reset_sched_domains(void)
*
* Iterate through each task of @cs updating its cpus_allowed to the
* effective cpuset's. As this function is called with cpuset_mutex held,
- * cpuset membership stays stable. For top_cpuset, task_cpu_possible_mask()
- * is used instead of effective_cpus to make sure all offline CPUs are also
- * included as hotplug code won't update cpumasks for tasks in top_cpuset.
+ * cpuset membership stays stable.
+ *
+ * For top_cpuset, task_cpu_possible_mask() is used instead of effective_cpus
+ * to make sure all offline CPUs are also included as hotplug code won't
+ * update cpumasks for tasks in top_cpuset.
+ *
+ * As task_cpu_possible_mask() can be task dependent in arm64, we have to
+ * do cpu masking per task instead of doing it once for all.
*/
void cpuset_update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
{
@@ -1151,7 +1162,7 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
*
* Return: 0 if successful, an error code otherwise
*/
-static int update_partition_exclusive(struct cpuset *cs, int new_prs)
+static int update_partition_exclusive_flag(struct cpuset *cs, int new_prs)
{
bool exclusive = (new_prs > PRS_MEMBER);
@@ -1234,12 +1245,12 @@ static void reset_partition_data(struct cpuset *cs)
}
/*
- * partition_xcpus_newstate - Exclusive CPUs state change
+ * isolated_cpus_update - Update the isolated_cpus mask
* @old_prs: old partition_root_state
* @new_prs: new partition_root_state
* @xcpus: exclusive CPUs with state change
*/
-static void partition_xcpus_newstate(int old_prs, int new_prs, struct cpumask *xcpus)
+static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus)
{
WARN_ON_ONCE(old_prs == new_prs);
if (new_prs == PRS_ISOLATED)
@@ -1273,8 +1284,8 @@ static bool partition_xcpus_add(int new_prs, struct cpuset *parent,
isolcpus_updated = (new_prs != parent->partition_root_state);
if (isolcpus_updated)
- partition_xcpus_newstate(parent->partition_root_state, new_prs,
- xcpus);
+ isolated_cpus_update(parent->partition_root_state, new_prs,
+ xcpus);
cpumask_andnot(parent->effective_cpus, parent->effective_cpus, xcpus);
return isolcpus_updated;
@@ -1304,8 +1315,8 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
isolcpus_updated = (old_prs != parent->partition_root_state);
if (isolcpus_updated)
- partition_xcpus_newstate(old_prs, parent->partition_root_state,
- xcpus);
+ isolated_cpus_update(old_prs, parent->partition_root_state,
+ xcpus);
cpumask_and(xcpus, xcpus, cpu_active_mask);
cpumask_or(parent->effective_cpus, parent->effective_cpus, xcpus);
@@ -1340,20 +1351,57 @@ EXPORT_SYMBOL_GPL(cpuset_cpu_is_isolated);
* compute_effective_exclusive_cpumask - compute effective exclusive CPUs
* @cs: cpuset
* @xcpus: effective exclusive CPUs value to be set
- * Return: true if xcpus is not empty, false otherwise.
+ * @real_cs: the real cpuset (can be NULL)
+ * Return: 0 if there is no sibling conflict, > 0 otherwise
*
- * Starting with exclusive_cpus (cpus_allowed if exclusive_cpus is not set),
- * it must be a subset of parent's effective_xcpus.
+ * If exclusive_cpus isn't explicitly set or a real_cs is provided, we have to
+ * scan the sibling cpusets and exclude their exclusive_cpus or effective_xcpus
+ * as well. The provision of real_cs means that a cpumask is being changed and
+ * the given cs is a trial one.
*/
-static bool compute_effective_exclusive_cpumask(struct cpuset *cs,
- struct cpumask *xcpus)
+static int compute_effective_exclusive_cpumask(struct cpuset *cs,
+ struct cpumask *xcpus,
+ struct cpuset *real_cs)
{
+ struct cgroup_subsys_state *css;
struct cpuset *parent = parent_cs(cs);
+ struct cpuset *sibling;
+ int retval = 0;
if (!xcpus)
xcpus = cs->effective_xcpus;
- return cpumask_and(xcpus, user_xcpus(cs), parent->effective_xcpus);
+ cpumask_and(xcpus, user_xcpus(cs), parent->effective_xcpus);
+
+ if (!real_cs) {
+ if (!cpumask_empty(cs->exclusive_cpus))
+ return 0;
+ } else {
+ cs = real_cs;
+ }
+
+ /*
+ * Exclude exclusive CPUs from siblings
+ */
+ rcu_read_lock();
+ cpuset_for_each_child(sibling, css, parent) {
+ if (sibling == cs)
+ continue;
+
+ if (!cpumask_empty(sibling->exclusive_cpus) &&
+ cpumask_intersects(xcpus, sibling->exclusive_cpus)) {
+ cpumask_andnot(xcpus, xcpus, sibling->exclusive_cpus);
+ retval++;
+ continue;
+ }
+ if (!cpumask_empty(sibling->effective_xcpus) &&
+ cpumask_intersects(xcpus, sibling->effective_xcpus)) {
+ cpumask_andnot(xcpus, xcpus, sibling->effective_xcpus);
+ retval++;
+ }
+ }
+ rcu_read_unlock();
+ return retval;
}
static inline bool is_remote_partition(struct cpuset *cs)
@@ -1395,7 +1443,7 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
* remote partition root underneath it, its exclusive_cpus must
* have overlapped with subpartitions_cpus.
*/
- compute_effective_exclusive_cpumask(cs, tmp->new_cpus);
+ compute_effective_exclusive_cpumask(cs, tmp->new_cpus, NULL);
if (cpumask_empty(tmp->new_cpus) ||
cpumask_intersects(tmp->new_cpus, subpartitions_cpus) ||
cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
@@ -1404,8 +1452,11 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
spin_lock_irq(&callback_lock);
isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
list_add(&cs->remote_sibling, &remote_children);
+ cpumask_copy(cs->effective_xcpus, tmp->new_cpus);
spin_unlock_irq(&callback_lock);
update_unbound_workqueue_cpumask(isolcpus_updated);
+ cpuset_force_rebuild();
+ cs->prs_err = 0;
/*
* Propagate changes in top_cpuset's effective_cpus down the hierarchy.
@@ -1428,20 +1479,24 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
{
bool isolcpus_updated;
- compute_effective_exclusive_cpumask(cs, tmp->new_cpus);
WARN_ON_ONCE(!is_remote_partition(cs));
- WARN_ON_ONCE(!cpumask_subset(tmp->new_cpus, subpartitions_cpus));
+ WARN_ON_ONCE(!cpumask_subset(cs->effective_xcpus, subpartitions_cpus));
spin_lock_irq(&callback_lock);
list_del_init(&cs->remote_sibling);
isolcpus_updated = partition_xcpus_del(cs->partition_root_state,
- NULL, tmp->new_cpus);
- cs->partition_root_state = -cs->partition_root_state;
- if (!cs->prs_err)
- cs->prs_err = PERR_INVCPUS;
+ NULL, cs->effective_xcpus);
+ if (cs->prs_err)
+ cs->partition_root_state = -cs->partition_root_state;
+ else
+ cs->partition_root_state = PRS_MEMBER;
+
+ /* effective_xcpus may need to be changed */
+ compute_effective_exclusive_cpumask(cs, NULL, NULL);
reset_partition_data(cs);
spin_unlock_irq(&callback_lock);
update_unbound_workqueue_cpumask(isolcpus_updated);
+ cpuset_force_rebuild();
/*
* Propagate changes in top_cpuset's effective_cpus down the hierarchy.
@@ -1453,14 +1508,15 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
/*
* remote_cpus_update - cpus_exclusive change of remote partition
* @cs: the cpuset to be updated
- * @newmask: the new effective_xcpus mask
+ * @xcpus: the new exclusive_cpus mask, if non-NULL
+ * @excpus: the new effective_xcpus mask
* @tmp: temporary masks
*
* top_cpuset and subpartitions_cpus will be updated or partition can be
* invalidated.
*/
-static void remote_cpus_update(struct cpuset *cs, struct cpumask *newmask,
- struct tmpmasks *tmp)
+static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
+ struct cpumask *excpus, struct tmpmasks *tmp)
{
bool adding, deleting;
int prs = cs->partition_root_state;
@@ -1471,29 +1527,45 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *newmask,
WARN_ON_ONCE(!cpumask_subset(cs->effective_xcpus, subpartitions_cpus));
- if (cpumask_empty(newmask))
+ if (cpumask_empty(excpus)) {
+ cs->prs_err = PERR_CPUSEMPTY;
goto invalidate;
+ }
- adding = cpumask_andnot(tmp->addmask, newmask, cs->effective_xcpus);
- deleting = cpumask_andnot(tmp->delmask, cs->effective_xcpus, newmask);
+ adding = cpumask_andnot(tmp->addmask, excpus, cs->effective_xcpus);
+ deleting = cpumask_andnot(tmp->delmask, cs->effective_xcpus, excpus);
/*
* Additions of remote CPUs is only allowed if those CPUs are
* not allocated to other partitions and there are effective_cpus
* left in the top cpuset.
*/
- if (adding && (!capable(CAP_SYS_ADMIN) ||
- cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
- cpumask_subset(top_cpuset.effective_cpus, tmp->addmask)))
- goto invalidate;
+ if (adding) {
+ if (!capable(CAP_SYS_ADMIN))
+ cs->prs_err = PERR_ACCESS;
+ else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
+ cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
+ cs->prs_err = PERR_NOCPUS;
+ if (cs->prs_err)
+ goto invalidate;
+ }
spin_lock_irq(&callback_lock);
if (adding)
isolcpus_updated += partition_xcpus_add(prs, NULL, tmp->addmask);
if (deleting)
isolcpus_updated += partition_xcpus_del(prs, NULL, tmp->delmask);
+ /*
+ * Need to update effective_xcpus and exclusive_cpus now as
+ * update_sibling_cpumasks() below may iterate back to the same cs.
+ */
+ cpumask_copy(cs->effective_xcpus, excpus);
+ if (xcpus)
+ cpumask_copy(cs->exclusive_cpus, xcpus);
spin_unlock_irq(&callback_lock);
update_unbound_workqueue_cpumask(isolcpus_updated);
+ if (adding || deleting)
+ cpuset_force_rebuild();
/*
* Propagate changes in top_cpuset's effective_cpus down the hierarchy.
@@ -1507,47 +1579,6 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *newmask,
}
/*
- * remote_partition_check - check if a child remote partition needs update
- * @cs: the cpuset to be updated
- * @newmask: the new effective_xcpus mask
- * @delmask: temporary mask for deletion (not in tmp)
- * @tmp: temporary masks
- *
- * This should be called before the given cs has updated its cpus_allowed
- * and/or effective_xcpus.
- */
-static void remote_partition_check(struct cpuset *cs, struct cpumask *newmask,
- struct cpumask *delmask, struct tmpmasks *tmp)
-{
- struct cpuset *child, *next;
- int disable_cnt = 0;
-
- /*
- * Compute the effective exclusive CPUs that will be deleted.
- */
- if (!cpumask_andnot(delmask, cs->effective_xcpus, newmask) ||
- !cpumask_intersects(delmask, subpartitions_cpus))
- return; /* No deletion of exclusive CPUs in partitions */
-
- /*
- * Searching the remote children list to look for those that will
- * be impacted by the deletion of exclusive CPUs.
- *
- * Since a cpuset must be removed from the remote children list
- * before it can go offline and holding cpuset_mutex will prevent
- * any change in cpuset status. RCU read lock isn't needed.
- */
- lockdep_assert_held(&cpuset_mutex);
- list_for_each_entry_safe(child, next, &remote_children, remote_sibling)
- if (cpumask_intersects(child->effective_cpus, delmask)) {
- remote_partition_disable(child, tmp);
- disable_cnt++;
- }
- if (disable_cnt)
- cpuset_force_rebuild();
-}
-
-/*
* prstate_housekeeping_conflict - check for partition & housekeeping conflicts
* @prstate: partition root state to be checked
* @new_cpus: cpu mask
@@ -1601,7 +1632,7 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
* The partcmd_update command is used by update_cpumasks_hier() with newmask
* NULL and update_cpumask() with newmask set. The partcmd_invalidate is used
* by update_cpumask() with NULL newmask. In both cases, the callers won't
- * check for error and so partition_root_state and prs_error will be updated
+ * check for error and so partition_root_state and prs_err will be updated
* directly.
*/
static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
@@ -1614,11 +1645,12 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
int old_prs, new_prs;
int part_error = PERR_NONE; /* Partition error? */
int subparts_delta = 0;
- struct cpumask *xcpus; /* cs effective_xcpus */
int isolcpus_updated = 0;
+ struct cpumask *xcpus = user_xcpus(cs);
bool nocpu;
lockdep_assert_held(&cpuset_mutex);
+ WARN_ON_ONCE(is_remote_partition(cs));
/*
* new_prs will only be changed for the partcmd_update and
@@ -1626,7 +1658,6 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
*/
adding = deleting = false;
old_prs = new_prs = cs->partition_root_state;
- xcpus = user_xcpus(cs);
if (cmd == partcmd_invalidate) {
if (is_prs_invalid(old_prs))
@@ -1661,12 +1692,19 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
if ((cmd == partcmd_enable) || (cmd == partcmd_enablei)) {
/*
- * Enabling partition root is not allowed if its
- * effective_xcpus is empty or doesn't overlap with
- * parent's effective_xcpus.
+ * Need to call compute_effective_exclusive_cpumask() in case
+ * exclusive_cpus not set. Sibling conflict should only happen
+ * if exclusive_cpus isn't set.
*/
- if (cpumask_empty(xcpus) ||
- !cpumask_intersects(xcpus, parent->effective_xcpus))
+ xcpus = tmp->new_cpus;
+ if (compute_effective_exclusive_cpumask(cs, xcpus, NULL))
+ WARN_ON_ONCE(!cpumask_empty(cs->exclusive_cpus));
+
+ /*
+ * Enabling partition root is not allowed if its
+ * effective_xcpus is empty.
+ */
+ if (cpumask_empty(xcpus))
return PERR_INVCPUS;
if (prstate_housekeeping_conflict(new_prs, xcpus))
@@ -1679,19 +1717,22 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
if (nocpu)
return PERR_NOCPUS;
- cpumask_copy(tmp->delmask, xcpus);
- deleting = true;
- subparts_delta++;
+ deleting = cpumask_and(tmp->delmask, xcpus, parent->effective_xcpus);
+ if (deleting)
+ subparts_delta++;
new_prs = (cmd == partcmd_enable) ? PRS_ROOT : PRS_ISOLATED;
} else if (cmd == partcmd_disable) {
/*
- * May need to add cpus to parent's effective_cpus for
- * valid partition root.
+ * May need to add cpus back to parent's effective_cpus
+ * (and maybe removed from subpartitions_cpus/isolated_cpus)
+ * for valid partition root. xcpus may contain CPUs that
+ * shouldn't be removed from the two global cpumasks.
*/
- adding = !is_prs_invalid(old_prs) &&
- cpumask_and(tmp->addmask, xcpus, parent->effective_xcpus);
- if (adding)
+ if (is_partition_valid(cs)) {
+ cpumask_copy(tmp->addmask, cs->effective_xcpus);
+ adding = true;
subparts_delta--;
+ }
new_prs = PRS_MEMBER;
} else if (newmask) {
/*
@@ -1701,6 +1742,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
part_error = PERR_CPUSEMPTY;
goto write_error;
}
+
/* Check newmask again, whether cpus are available for parent/cs */
nocpu |= tasks_nocpu_error(parent, cs, newmask);
@@ -1829,7 +1871,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
* CPU lists in cs haven't been updated yet. So defer it to later.
*/
if ((old_prs != new_prs) && (cmd != partcmd_update)) {
- int err = update_partition_exclusive(cs, new_prs);
+ int err = update_partition_exclusive_flag(cs, new_prs);
if (err)
return err;
@@ -1867,7 +1909,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
update_unbound_workqueue_cpumask(isolcpus_updated);
if ((old_prs != new_prs) && (cmd == partcmd_update))
- update_partition_exclusive(cs, new_prs);
+ update_partition_exclusive_flag(cs, new_prs);
if (adding || deleting) {
cpuset_update_tasks_cpumask(parent, tmp->addmask);
@@ -1917,7 +1959,7 @@ static void compute_partition_effective_cpumask(struct cpuset *cs,
* 2) All the effective_cpus will be used up and cp
* has tasks
*/
- compute_effective_exclusive_cpumask(cs, new_ecpus);
+ compute_effective_exclusive_cpumask(cs, new_ecpus, NULL);
cpumask_and(new_ecpus, new_ecpus, cpu_active_mask);
rcu_read_lock();
@@ -1925,6 +1967,11 @@ static void compute_partition_effective_cpumask(struct cpuset *cs,
if (!is_partition_valid(child))
continue;
+ /*
+ * There shouldn't be a remote partition underneath another
+ * partition root.
+ */
+ WARN_ON_ONCE(is_remote_partition(child));
child->prs_err = 0;
if (!cpumask_subset(child->effective_xcpus,
cs->effective_xcpus))
@@ -1980,32 +2027,39 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
bool remote = is_remote_partition(cp);
bool update_parent = false;
+ old_prs = new_prs = cp->partition_root_state;
+
/*
- * Skip descendent remote partition that acquires CPUs
- * directly from top cpuset unless it is cs.
+ * For child remote partition root (!= cs), we need to call
+ * remote_cpus_update() if effective_xcpus will be changed.
+ * Otherwise, we can skip the whole subtree.
+ *
+ * remote_cpus_update() will reuse tmp->new_cpus only after
+ * its value is being processed.
*/
if (remote && (cp != cs)) {
- pos_css = css_rightmost_descendant(pos_css);
- continue;
+ compute_effective_exclusive_cpumask(cp, tmp->new_cpus, NULL);
+ if (cpumask_equal(cp->effective_xcpus, tmp->new_cpus)) {
+ pos_css = css_rightmost_descendant(pos_css);
+ continue;
+ }
+ rcu_read_unlock();
+ remote_cpus_update(cp, NULL, tmp->new_cpus, tmp);
+ rcu_read_lock();
+
+ /* Remote partition may be invalidated */
+ new_prs = cp->partition_root_state;
+ remote = (new_prs == old_prs);
}
- /*
- * Update effective_xcpus if exclusive_cpus set.
- * The case when exclusive_cpus isn't set is handled later.
- */
- if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs)) {
- spin_lock_irq(&callback_lock);
- compute_effective_exclusive_cpumask(cp, NULL);
- spin_unlock_irq(&callback_lock);
- }
-
- old_prs = new_prs = cp->partition_root_state;
- if (remote || (is_partition_valid(parent) &&
- is_partition_valid(cp)))
+ if (remote || (is_partition_valid(parent) && is_partition_valid(cp)))
compute_partition_effective_cpumask(cp, tmp->new_cpus);
else
compute_effective_cpumask(tmp->new_cpus, cp, parent);
+ if (remote)
+ goto get_css; /* Ready to update cpuset data */
+
/*
* A partition with no effective_cpus is allowed as long as
* there is no task associated with it. Call
@@ -2025,9 +2079,6 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
if (is_in_v2_mode() && !remote && cpumask_empty(tmp->new_cpus))
cpumask_copy(tmp->new_cpus, parent->effective_cpus);
- if (remote)
- goto get_css;
-
/*
* Skip the whole subtree if
* 1) the cpumask remains the same,
@@ -2088,6 +2139,9 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
spin_lock_irq(&callback_lock);
cpumask_copy(cp->effective_cpus, tmp->new_cpus);
cp->partition_root_state = new_prs;
+ if (!cpumask_empty(cp->exclusive_cpus) && (cp != cs))
+ compute_effective_exclusive_cpumask(cp, NULL, NULL);
+
/*
* Make sure effective_xcpus is properly set for a valid
* partition root.
@@ -2174,7 +2228,14 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
parent);
if (cpumask_equal(tmp->new_cpus, sibling->effective_cpus))
continue;
+ } else if (is_remote_partition(sibling)) {
+ /*
+ * Change in a sibling cpuset won't affect a remote
+ * partition root.
+ */
+ continue;
}
+
if (!css_tryget_online(&sibling->css))
continue;
@@ -2231,8 +2292,9 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
* trialcs->effective_xcpus is used as a temporary cpumask
* for checking validity of the partition root.
*/
+ trialcs->partition_root_state = PRS_MEMBER;
if (!cpumask_empty(trialcs->exclusive_cpus) || is_partition_valid(cs))
- compute_effective_exclusive_cpumask(trialcs, NULL);
+ compute_effective_exclusive_cpumask(trialcs, NULL, cs);
}
/* Nothing to do if the cpus didn't change */
@@ -2305,19 +2367,13 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
* Call remote_cpus_update() to handle valid remote partition
*/
if (is_remote_partition(cs))
- remote_cpus_update(cs, xcpus, &tmp);
+ remote_cpus_update(cs, NULL, xcpus, &tmp);
else if (invalidate)
update_parent_effective_cpumask(cs, partcmd_invalidate,
NULL, &tmp);
else
update_parent_effective_cpumask(cs, partcmd_update,
xcpus, &tmp);
- } else if (!cpumask_empty(cs->exclusive_cpus)) {
- /*
- * Use trialcs->effective_cpus as a temp cpumask
- */
- remote_partition_check(cs, trialcs->effective_xcpus,
- trialcs->effective_cpus, &tmp);
}
spin_lock_irq(&callback_lock);
@@ -2369,8 +2425,15 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
if (cpumask_equal(cs->exclusive_cpus, trialcs->exclusive_cpus))
return 0;
- if (*buf)
- compute_effective_exclusive_cpumask(trialcs, NULL);
+ if (*buf) {
+ trialcs->partition_root_state = PRS_MEMBER;
+ /*
+ * Reject the change if there is an exclusive CPUs conflict with
+ * the siblings.
+ */
+ if (compute_effective_exclusive_cpumask(trialcs, NULL, cs))
+ return -EINVAL;
+ }
/*
* Check all the descendants in update_cpumasks_hier() if
@@ -2401,8 +2464,8 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
if (invalidate)
remote_partition_disable(cs, &tmp);
else
- remote_cpus_update(cs, trialcs->effective_xcpus,
- &tmp);
+ remote_cpus_update(cs, trialcs->exclusive_cpus,
+ trialcs->effective_xcpus, &tmp);
} else if (invalidate) {
update_parent_effective_cpumask(cs, partcmd_invalidate,
NULL, &tmp);
@@ -2410,12 +2473,6 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
update_parent_effective_cpumask(cs, partcmd_update,
trialcs->effective_xcpus, &tmp);
}
- } else if (!cpumask_empty(trialcs->exclusive_cpus)) {
- /*
- * Use trialcs->effective_cpus as a temp cpumask
- */
- remote_partition_check(cs, trialcs->effective_xcpus,
- trialcs->effective_cpus, &tmp);
}
spin_lock_irq(&callback_lock);
cpumask_copy(cs->exclusive_cpus, trialcs->exclusive_cpus);
@@ -2782,7 +2839,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
int err = PERR_NONE, old_prs = cs->partition_root_state;
struct cpuset *parent = parent_cs(cs);
struct tmpmasks tmpmask;
- bool new_xcpus_state = false;
+ bool isolcpus_updated = false;
if (old_prs == new_prs)
return 0;
@@ -2796,18 +2853,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (alloc_cpumasks(NULL, &tmpmask))
return -ENOMEM;
- /*
- * Setup effective_xcpus if not properly set yet, it will be cleared
- * later if partition becomes invalid.
- */
- if ((new_prs > 0) && cpumask_empty(cs->exclusive_cpus)) {
- spin_lock_irq(&callback_lock);
- cpumask_and(cs->effective_xcpus,
- cs->cpus_allowed, parent->effective_xcpus);
- spin_unlock_irq(&callback_lock);
- }
-
- err = update_partition_exclusive(cs, new_prs);
+ err = update_partition_exclusive_flag(cs, new_prs);
if (err)
goto out;
@@ -2821,6 +2867,19 @@ static int update_prstate(struct cpuset *cs, int new_prs)
}
/*
+ * We don't support the creation of a new local partition with
+ * a remote partition underneath it. This unsupported
+ * setting can happen only if parent is the top_cpuset because
+ * a remote partition cannot be created underneath an existing
+ * local or remote partition.
+ */
+ if ((parent == &top_cpuset) &&
+ cpumask_intersects(cs->exclusive_cpus, subpartitions_cpus)) {
+ err = PERR_REMOTE;
+ goto out;
+ }
+
+ /*
* If parent is valid partition, enable local partition.
* Otherwise, enable a remote partition.
*/
@@ -2835,8 +2894,9 @@ static int update_prstate(struct cpuset *cs, int new_prs)
} else if (old_prs && new_prs) {
/*
* A change in load balance state only, no change in cpumasks.
+ * Need to update isolated_cpus.
*/
- new_xcpus_state = true;
+ isolcpus_updated = true;
} else {
/*
* Switching back to member is always allowed even if it
@@ -2860,7 +2920,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
*/
if (err) {
new_prs = -new_prs;
- update_partition_exclusive(cs, new_prs);
+ update_partition_exclusive_flag(cs, new_prs);
}
spin_lock_irq(&callback_lock);
@@ -2868,14 +2928,18 @@ static int update_prstate(struct cpuset *cs, int new_prs)
WRITE_ONCE(cs->prs_err, err);
if (!is_partition_valid(cs))
reset_partition_data(cs);
- else if (new_xcpus_state)
- partition_xcpus_newstate(old_prs, new_prs, cs->effective_xcpus);
+ else if (isolcpus_updated)
+ isolated_cpus_update(old_prs, new_prs, cs->effective_xcpus);
spin_unlock_irq(&callback_lock);
- update_unbound_workqueue_cpumask(new_xcpus_state);
+ update_unbound_workqueue_cpumask(isolcpus_updated);
- /* Force update if switching back to member */
+ /* Force update if switching back to member & update effective_xcpus */
update_cpumasks_hier(cs, &tmpmask, !new_prs);
+ /* A newly created partition must have effective_xcpus set */
+ WARN_ON_ONCE(!old_prs && (new_prs > 0)
+ && cpumask_empty(cs->effective_xcpus));
+
/* Update sched domains and load balance flag */
update_partition_sd_lb(cs, old_prs);
@@ -3208,7 +3272,7 @@ int cpuset_common_seq_show(struct seq_file *sf, void *v)
return ret;
}
-static int sched_partition_show(struct seq_file *seq, void *v)
+static int cpuset_partition_show(struct seq_file *seq, void *v)
{
struct cpuset *cs = css_cs(seq_css(seq));
const char *err, *type = NULL;
@@ -3239,7 +3303,7 @@ static int sched_partition_show(struct seq_file *seq, void *v)
return 0;
}
-static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
+static ssize_t cpuset_partition_write(struct kernfs_open_file *of, char *buf,
size_t nbytes, loff_t off)
{
struct cpuset *cs = css_cs(of_css(of));
@@ -3260,11 +3324,8 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
css_get(&cs->css);
cpus_read_lock();
mutex_lock(&cpuset_mutex);
- if (!is_cpuset_online(cs))
- goto out_unlock;
-
- retval = update_prstate(cs, val);
-out_unlock:
+ if (is_cpuset_online(cs))
+ retval = update_prstate(cs, val);
mutex_unlock(&cpuset_mutex);
cpus_read_unlock();
css_put(&cs->css);
@@ -3308,8 +3369,8 @@ static struct cftype dfl_files[] = {
{
.name = "cpus.partition",
- .seq_show = sched_partition_show,
- .write = sched_partition_write,
+ .seq_show = cpuset_partition_show,
+ .write = cpuset_partition_write,
.private = FILE_PARTITION_ROOT,
.flags = CFTYPE_NOT_ON_ROOT,
.file_offset = offsetof(struct cpuset, partition_file),
@@ -3475,9 +3536,6 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
cpus_read_lock();
mutex_lock(&cpuset_mutex);
- if (is_partition_valid(cs))
- update_prstate(cs, 0);
-
if (!cpuset_v2() && is_sched_load_balance(cs))
cpuset_update_flag(CS_SCHED_LOAD_BALANCE, cs, 0);
@@ -3488,6 +3546,22 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
cpus_read_unlock();
}
+static void cpuset_css_killed(struct cgroup_subsys_state *css)
+{
+ struct cpuset *cs = css_cs(css);
+
+ cpus_read_lock();
+ mutex_lock(&cpuset_mutex);
+
+ /* Reset valid partition back to member */
+ if (is_partition_valid(cs))
+ update_prstate(cs, PRS_MEMBER);
+
+ mutex_unlock(&cpuset_mutex);
+ cpus_read_unlock();
+
+}
+
static void cpuset_css_free(struct cgroup_subsys_state *css)
{
struct cpuset *cs = css_cs(css);
@@ -3609,6 +3683,7 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
.css_alloc = cpuset_css_alloc,
.css_online = cpuset_css_online,
.css_offline = cpuset_css_offline,
+ .css_killed = cpuset_css_killed,
.css_free = cpuset_css_free,
.can_attach = cpuset_can_attach,
.cancel_attach = cpuset_cancel_attach,
@@ -3739,10 +3814,10 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
if (remote && cpumask_empty(&new_cpus) &&
partition_is_populated(cs, NULL)) {
+ cs->prs_err = PERR_HOTPLUG;
remote_partition_disable(cs, tmp);
compute_effective_cpumask(&new_cpus, cs, parent);
remote = false;
- cpuset_force_rebuild();
}
/*
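Note on the cpuset.c changes above: the new sibling scan in compute_effective_exclusive_cpumask() can be easier to follow with plain bitmasks. The following is only a minimal userspace sketch, assuming one bit per CPU in a uint64_t. The names toy_cpuset and toy_exclude_siblings are hypothetical and are not kernel APIs, and the sketch omits RCU, locking, and the parent masking step that the real function performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpuset: one bit per CPU. */
struct toy_cpuset {
	uint64_t exclusive_cpus;   /* user-requested exclusive CPUs, 0 if unset */
	uint64_t effective_xcpus;  /* exclusive CPUs actually granted */
};

/*
 * Remove from *xcpus every CPU already claimed by a sibling, mirroring
 * the loop in the patch: a sibling's exclusive_cpus is checked first,
 * and its effective_xcpus only if exclusive_cpus did not overlap.
 * Returns the number of conflicting siblings (0 means no conflict).
 */
static int toy_exclude_siblings(uint64_t *xcpus,
				const struct toy_cpuset *siblings,
				size_t nr_siblings,
				const struct toy_cpuset *self)
{
	int conflicts = 0;

	assert(xcpus != NULL);

	for (size_t i = 0; i < nr_siblings; i++) {
		const struct toy_cpuset *sib = &siblings[i];

		if (sib == self)
			continue;	/* never exclude our own CPUs */

		if (*xcpus & sib->exclusive_cpus) {
			*xcpus &= ~sib->exclusive_cpus;
			conflicts++;
			continue;
		}
		if (*xcpus & sib->effective_xcpus) {
			*xcpus &= ~sib->effective_xcpus;
			conflicts++;
		}
	}
	return conflicts;
}
```

In the patch, update_exclusive_cpumask() treats a non-zero return as grounds for -EINVAL, while the partition-enable path only expects a conflict when exclusive_cpus is unset. A caller of this sketch could make the same distinction on the returned count.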
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 4bb587d..b223915 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -318,10 +318,11 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
might_sleep();
for_each_possible_cpu(cpu) {
- struct cgroup *pos = cgroup_rstat_updated_list(cgrp, cpu);
+ struct cgroup *pos;
/* Reacquire for each CPU to avoid disabling IRQs too long */
__cgroup_rstat_lock(cgrp, cpu);
+ pos = cgroup_rstat_updated_list(cgrp, cpu);
for (; pos; pos = pos->rstat_flush_next) {
struct cgroup_subsys_state *css;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 128db74..e93c1956 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5518,30 +5518,6 @@ static bool exclusive_event_installable(struct perf_event *event,
static void perf_free_addr_filters(struct perf_event *event);
-static void perf_pending_task_sync(struct perf_event *event)
-{
- struct callback_head *head = &event->pending_task;
-
- if (!event->pending_work)
- return;
- /*
- * If the task is queued to the current task's queue, we
- * obviously can't wait for it to complete. Simply cancel it.
- */
- if (task_work_cancel(current, head)) {
- event->pending_work = 0;
- local_dec(&event->ctx->nr_no_switch_fast);
- return;
- }
-
- /*
- * All accesses related to the event are within the same RCU section in
- * perf_pending_task(). The RCU grace period before the event is freed
- * will make sure all those accesses are complete by then.
- */
- rcuwait_wait_event(&event->pending_work_wait, !event->pending_work, TASK_UNINTERRUPTIBLE);
-}
-
/* vs perf_event_alloc() error */
static void __free_event(struct perf_event *event)
{
@@ -5599,7 +5575,6 @@ static void _free_event(struct perf_event *event)
{
irq_work_sync(&event->pending_irq);
irq_work_sync(&event->pending_disable_irq);
- perf_pending_task_sync(event);
unaccount_event(event);
@@ -5692,10 +5667,17 @@ static void perf_remove_from_owner(struct perf_event *event)
static void put_event(struct perf_event *event)
{
+ struct perf_event *parent;
+
if (!atomic_long_dec_and_test(&event->refcount))
return;
+ parent = event->parent;
_free_event(event);
+
+ /* Matches the refcount bump in inherit_event() */
+ if (parent)
+ put_event(parent);
}
/*
@@ -5779,11 +5761,6 @@ int perf_event_release_kernel(struct perf_event *event)
if (tmp == child) {
perf_remove_from_context(child, DETACH_GROUP);
list_move(&child->child_list, &free_list);
- /*
- * This matches the refcount bump in inherit_event();
- * this can't be the last reference.
- */
- put_event(event);
} else {
var = &ctx->refcount;
}
@@ -5809,7 +5786,8 @@ int perf_event_release_kernel(struct perf_event *event)
void *var = &child->ctx->refcount;
list_del(&child->child_list);
- free_event(child);
+ /* Last reference unless ->pending_task work is pending */
+ put_event(child);
/*
* Wake any perf_event_free_task() waiting for this event to be
@@ -5820,7 +5798,11 @@ int perf_event_release_kernel(struct perf_event *event)
}
no_ctx:
- put_event(event); /* Must be the 'last' reference */
+ /*
+ * Last reference unless ->pending_task work is pending on this event
+ * or any of its children.
+ */
+ put_event(event);
return 0;
}
EXPORT_SYMBOL_GPL(perf_event_release_kernel);
@@ -7236,12 +7218,6 @@ static void perf_pending_task(struct callback_head *head)
int rctx;
/*
- * All accesses to the event must belong to the same implicit RCU read-side
- * critical section as the ->pending_work reset. See comment in
- * perf_pending_task_sync().
- */
- rcu_read_lock();
- /*
* If we 'fail' here, that's OK, it means recursion is already disabled
* and we won't recurse 'further'.
*/
@@ -7251,9 +7227,8 @@ static void perf_pending_task(struct callback_head *head)
event->pending_work = 0;
perf_sigtrap(event);
local_dec(&event->ctx->nr_no_switch_fast);
- rcuwait_wake_up(&event->pending_work_wait);
}
- rcu_read_unlock();
+ put_event(event);
if (rctx >= 0)
perf_swevent_put_recursion_context(rctx);
@@ -10248,6 +10223,7 @@ static int __perf_event_overflow(struct perf_event *event,
!task_work_add(current, &event->pending_task, notify_mode)) {
event->pending_work = pending_id;
local_inc(&event->ctx->nr_no_switch_fast);
+ WARN_ON_ONCE(!atomic_long_inc_not_zero(&event->refcount));
event->pending_addr = 0;
if (valid_sample && (data->sample_flags & PERF_SAMPLE_ADDR))
@@ -12610,7 +12586,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
init_irq_work(&event->pending_irq, perf_pending_irq);
event->pending_disable_irq = IRQ_WORK_INIT_HARD(perf_pending_disable);
init_task_work(&event->pending_task, perf_pending_task);
- rcuwait_init(&event->pending_work_wait);
mutex_init(&event->mmap_mutex);
raw_spin_lock_init(&event->addr_filters.lock);
@@ -13747,8 +13722,7 @@ perf_event_exit_event(struct perf_event *event, struct perf_event_context *ctx)
* Kick perf_poll() for is_event_hup();
*/
perf_event_wakeup(parent_event);
- free_event(event);
- put_event(parent_event);
+ put_event(event);
return;
}
@@ -13872,13 +13846,11 @@ static void perf_free_event(struct perf_event *event,
list_del_init(&event->child_list);
mutex_unlock(&parent->child_mutex);
- put_event(parent);
-
raw_spin_lock_irq(&ctx->lock);
perf_group_detach(event);
list_del_event(event, ctx);
raw_spin_unlock_irq(&ctx->lock);
- free_event(event);
+ put_event(event);
}
/*
@@ -14016,6 +13988,9 @@ inherit_event(struct perf_event *parent_event,
if (IS_ERR(child_event))
return child_event;
+ get_ctx(child_ctx);
+ child_event->ctx = child_ctx;
+
pmu_ctx = find_get_pmu_context(child_event->pmu, child_ctx, child_event);
if (IS_ERR(pmu_ctx)) {
free_event(child_event);
@@ -14037,8 +14012,6 @@ inherit_event(struct perf_event *parent_event,
return NULL;
}
- get_ctx(child_ctx);
-
/*
* Make the child state follow the state of the parent event,
* not its attr.disabled bit. We hold the parent's mutex,
@@ -14059,7 +14032,6 @@ inherit_event(struct perf_event *parent_event,
local64_set(&hwc->period_left, sample_period);
}
- child_event->ctx = child_ctx;
child_event->overflow_handler = parent_event->overflow_handler;
child_event->overflow_handler_context
= parent_event->overflow_handler_context;
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 615b4e6d..8d783b5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1956,6 +1956,9 @@ static void free_ret_instance(struct uprobe_task *utask,
* to-be-reused return instances for future uretprobes. If ri_timer()
* happens to be running right now, though, we fallback to safety and
* just perform RCU-delated freeing of ri.
+ * Admittedly, this is a rather simple use of seqcount, but it nicely
+ * abstracts away all the necessary memory barriers, so we use
+ * a well-supported kernel primitive here.
*/
if (raw_seqcount_try_begin(&utask->ri_seqcount, seq)) {
/* immediate reuse of ri without RCU GP is OK */
@@ -2016,12 +2019,20 @@ static void ri_timer(struct timer_list *timer)
/* RCU protects return_instance from freeing. */
guard(rcu)();
- write_seqcount_begin(&utask->ri_seqcount);
+ /*
+ * See free_ret_instance() for notes on seqcount use.
+ * We also employ raw API variants to avoid lockdep false-positive
+ * warning complaining about enabled preemption. The timer can only be
+ * invoked once for a uprobe_task. Therefore there can only be one
+ * writer. The reader does not require an even sequence count to make
+ * progress, so it is OK to remain preemptible on PREEMPT_RT.
+ */
+ raw_write_seqcount_begin(&utask->ri_seqcount);
for_each_ret_instance_rcu(ri, utask->return_instances)
hprobe_expire(&ri->hprobe, false);
- write_seqcount_end(&utask->ri_seqcount);
+ raw_write_seqcount_end(&utask->ri_seqcount);
}
static struct uprobe_task *alloc_utask(void)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 517ee25..30899a8 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -366,7 +366,7 @@ static const struct debug_obj_descr hrtimer_debug_descr;
static void *hrtimer_debug_hint(void *addr)
{
- return ((struct hrtimer *) addr)->function;
+ return ACCESS_PRIVATE((struct hrtimer *)addr, function);
}
/*
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index a47bcf7..9a38594 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -509,6 +509,7 @@ void tick_resume(void)
#ifdef CONFIG_SUSPEND
static DEFINE_RAW_SPINLOCK(tick_freeze_lock);
+static DEFINE_WAIT_OVERRIDE_MAP(tick_freeze_map, LD_WAIT_SLEEP);
static unsigned int tick_freeze_depth;
/**
@@ -528,9 +529,22 @@ void tick_freeze(void)
if (tick_freeze_depth == num_online_cpus()) {
trace_suspend_resume(TPS("timekeeping_freeze"),
smp_processor_id(), true);
+ /*
+ * All other CPUs have their interrupts disabled and are
+ * suspended to idle. Other tasks have been frozen so there
+ * is no scheduling happening. This means that there is no
+ * concurrency in the system at this point. Therefore it is
+ * okay to acquire a sleeping lock on PREEMPT_RT, such as a
+ * spinlock, because the lock cannot be held by other CPUs
+ * or threads and acquiring it cannot block.
+ *
+ * Inform lockdep about the situation.
+ */
+ lock_map_acquire_try(&tick_freeze_map);
system_state = SYSTEM_SUSPEND;
sched_clock_suspend();
timekeeping_suspend();
+ lock_map_release(&tick_freeze_map);
} else {
tick_suspend_local();
}
@@ -552,8 +566,16 @@ void tick_unfreeze(void)
raw_spin_lock(&tick_freeze_lock);
if (tick_freeze_depth == num_online_cpus()) {
+ /*
+ * Similar to tick_freeze(). On resumption the first CPU may
+ * acquire uncontended sleeping locks while other CPUs block on
+ * tick_freeze_lock.
+ */
+ lock_map_acquire_try(&tick_freeze_map);
timekeeping_resume();
sched_clock_resume();
+ lock_map_release(&tick_freeze_map);
+
system_state = SYSTEM_RUNNING;
trace_suspend_resume(TPS("timekeeping_freeze"),
smp_processor_id(), false);
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 33082c4..95c6e34 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -89,8 +89,11 @@ static bool delete_fprobe_node(struct fprobe_hlist_node *node)
{
lockdep_assert_held(&fprobe_mutex);
- WRITE_ONCE(node->fp, NULL);
- hlist_del_rcu(&node->hlist);
+ /* Avoid double deleting */
+ if (READ_ONCE(node->fp) != NULL) {
+ WRITE_ONCE(node->fp, NULL);
+ hlist_del_rcu(&node->hlist);
+ }
return !!find_first_fprobe_node(node->addr);
}
@@ -411,6 +414,102 @@ static void fprobe_graph_remove_ips(unsigned long *addrs, int num)
ftrace_set_filter_ips(&fprobe_graph_ops.ops, addrs, num, 1, 0);
}
+#ifdef CONFIG_MODULES
+
+#define FPROBE_IPS_BATCH_INIT 8
+/* instruction pointer address list */
+struct fprobe_addr_list {
+ int index;
+ int size;
+ unsigned long *addrs;
+};
+
+static int fprobe_addr_list_add(struct fprobe_addr_list *alist, unsigned long addr)
+{
+ unsigned long *addrs;
+
+ if (alist->index >= alist->size)
+ return -ENOMEM;
+
+ alist->addrs[alist->index++] = addr;
+ if (alist->index < alist->size)
+ return 0;
+
+ /* Expand the address list */
+ addrs = kcalloc(alist->size * 2, sizeof(*addrs), GFP_KERNEL);
+ if (!addrs)
+ return -ENOMEM;
+
+ memcpy(addrs, alist->addrs, alist->size * sizeof(*addrs));
+ alist->size *= 2;
+ kfree(alist->addrs);
+ alist->addrs = addrs;
+
+ return 0;
+}
+
+static void fprobe_remove_node_in_module(struct module *mod, struct hlist_head *head,
+ struct fprobe_addr_list *alist)
+{
+ struct fprobe_hlist_node *node;
+ int ret = 0;
+
+ hlist_for_each_entry_rcu(node, head, hlist) {
+ if (!within_module(node->addr, mod))
+ continue;
+ if (delete_fprobe_node(node))
+ continue;
+ /*
+ * If updating alist fails, just continue updating the hlist.
+ * Therefore, at least the user handler will not be hit anymore.
+ */
+ if (!ret)
+ ret = fprobe_addr_list_add(alist, node->addr);
+ }
+}
+
+/* Handle module unloading to manage fprobe_ip_table. */
+static int fprobe_module_callback(struct notifier_block *nb,
+ unsigned long val, void *data)
+{
+ struct fprobe_addr_list alist = {.size = FPROBE_IPS_BATCH_INIT};
+ struct module *mod = data;
+ int i;
+
+ if (val != MODULE_STATE_GOING)
+ return NOTIFY_DONE;
+
+ alist.addrs = kcalloc(alist.size, sizeof(*alist.addrs), GFP_KERNEL);
+ /* If memory allocation fails, we cannot remove the IPs from the hash. */
+ if (!alist.addrs)
+ return NOTIFY_DONE;
+
+ mutex_lock(&fprobe_mutex);
+ for (i = 0; i < FPROBE_IP_TABLE_SIZE; i++)
+ fprobe_remove_node_in_module(mod, &fprobe_ip_table[i], &alist);
+
+ if (alist.index < alist.size && alist.index > 0)
+ ftrace_set_filter_ips(&fprobe_graph_ops.ops,
+ alist.addrs, alist.index, 1, 0);
+ mutex_unlock(&fprobe_mutex);
+
+ kfree(alist.addrs);
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block fprobe_module_nb = {
+ .notifier_call = fprobe_module_callback,
+ .priority = 0,
+};
+
+static int __init init_fprobe_module(void)
+{
+ return register_module_notifier(&fprobe_module_nb);
+}
+early_initcall(init_fprobe_module);
+#endif
+
static int symbols_cmp(const void *a, const void *b)
{
const char **str_a = (const char **) a;
@@ -445,6 +544,7 @@ struct filter_match_data {
size_t index;
size_t size;
unsigned long *addrs;
+ struct module **mods;
};
static int filter_match_callback(void *data, const char *name, unsigned long addr)
@@ -458,30 +558,47 @@ static int filter_match_callback(void *data, const char *name, unsigned long add
if (!ftrace_location(addr))
return 0;
- if (match->addrs)
- match->addrs[match->index] = addr;
+ if (match->addrs) {
+ struct module *mod = __module_text_address(addr);
+ if (mod && !try_module_get(mod))
+ return 0;
+
+ match->mods[match->index] = mod;
+ match->addrs[match->index] = addr;
+ }
match->index++;
return match->index == match->size;
}
/*
* Make IP list from the filter/no-filter glob patterns.
- * Return the number of matched symbols, or -ENOENT.
+ * Return the number of matched symbols, or errno.
+ * If @addrs == NULL, this just counts the number of matched symbols. If @addrs
+ * is passed with an array, we need to pass an @mods array of the same size
+ * to increment the module refcount for each symbol.
+ * This means we also need to call `module_put` for each element of @mods after
+ * using the @addrs.
*/
-static int ip_list_from_filter(const char *filter, const char *notfilter,
- unsigned long *addrs, size_t size)
+static int get_ips_from_filter(const char *filter, const char *notfilter,
+ unsigned long *addrs, struct module **mods,
+ size_t size)
{
struct filter_match_data match = { .filter = filter, .notfilter = notfilter,
- .index = 0, .size = size, .addrs = addrs};
+ .index = 0, .size = size, .addrs = addrs, .mods = mods};
int ret;
+ if (addrs && !mods)
+ return -EINVAL;
+
ret = kallsyms_on_each_symbol(filter_match_callback, &match);
if (ret < 0)
return ret;
- ret = module_kallsyms_on_each_symbol(NULL, filter_match_callback, &match);
- if (ret < 0)
- return ret;
+ if (IS_ENABLED(CONFIG_MODULES)) {
+ ret = module_kallsyms_on_each_symbol(NULL, filter_match_callback, &match);
+ if (ret < 0)
+ return ret;
+ }
return match.index ?: -ENOENT;
}
@@ -543,24 +660,35 @@ static int fprobe_init(struct fprobe *fp, unsigned long *addrs, int num)
*/
int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
{
- unsigned long *addrs;
- int ret;
+ unsigned long *addrs __free(kfree) = NULL;
+ struct module **mods __free(kfree) = NULL;
+ int ret, num;
if (!fp || !filter)
return -EINVAL;
- ret = ip_list_from_filter(filter, notfilter, NULL, FPROBE_IPS_MAX);
+ num = get_ips_from_filter(filter, notfilter, NULL, NULL, FPROBE_IPS_MAX);
+ if (num < 0)
+ return num;
+
+ addrs = kcalloc(num, sizeof(*addrs), GFP_KERNEL);
+ if (!addrs)
+ return -ENOMEM;
+
+ mods = kcalloc(num, sizeof(*mods), GFP_KERNEL);
+ if (!mods)
+ return -ENOMEM;
+
+ ret = get_ips_from_filter(filter, notfilter, addrs, mods, num);
if (ret < 0)
return ret;
- addrs = kcalloc(ret, sizeof(unsigned long), GFP_KERNEL);
- if (!addrs)
- return -ENOMEM;
- ret = ip_list_from_filter(filter, notfilter, addrs, ret);
- if (ret > 0)
- ret = register_fprobe_ips(fp, addrs, ret);
+ ret = register_fprobe_ips(fp, addrs, ret);
- kfree(addrs);
+ for (int i = 0; i < num; i++) {
+ if (mods[i])
+ module_put(mods[i]);
+ }
return ret;
}
EXPORT_SYMBOL_GPL(register_fprobe);
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 5d7ca80..b40fa59 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -919,9 +919,15 @@ static void __find_tracepoint_module_cb(struct tracepoint *tp, struct module *mo
struct __find_tracepoint_cb_data *data = priv;
if (!data->tpoint && !strcmp(data->tp_name, tp->name)) {
- data->tpoint = tp;
- if (!data->mod)
+ /* If module is not specified, try getting module refcount. */
+ if (!data->mod && mod) {
+ /* If failed to get refcount, ignore this tracepoint. */
+ if (!try_module_get(mod))
+ return;
+
data->mod = mod;
+ }
+ data->tpoint = tp;
}
}
@@ -933,7 +939,11 @@ static void __find_tracepoint_cb(struct tracepoint *tp, void *priv)
data->tpoint = tp;
}
-/* Find a tracepoint from kernel and module. */
+/*
+ * Find a tracepoint from kernel and module. If the tracepoint is on the module,
+ * the module's refcount is incremented and returned as *@tp_mod. Thus, if it is
+ * not NULL, caller must call module_put(*tp_mod) after using the tracepoint.
+ */
static struct tracepoint *find_tracepoint(const char *tp_name,
struct module **tp_mod)
{
@@ -962,7 +972,10 @@ static void reenable_trace_fprobe(struct trace_fprobe *tf)
}
}
-/* Find a tracepoint from specified module. */
+/*
+ * Find a tracepoint from specified module. In this case, this does not get the
+ * module's refcount. The caller must ensure the module is not freed.
+ */
static struct tracepoint *find_tracepoint_in_module(struct module *mod,
const char *tp_name)
{
@@ -1169,11 +1182,6 @@ static int trace_fprobe_create_internal(int argc, const char *argv[],
if (is_tracepoint) {
ctx->flags |= TPARG_FL_TPOINT;
tpoint = find_tracepoint(symbol, &tp_mod);
- /* lock module until register this tprobe. */
- if (tp_mod && !try_module_get(tp_mod)) {
- tpoint = NULL;
- tp_mod = NULL;
- }
if (tpoint) {
ctx->funcname = kallsyms_lookup(
(unsigned long)tpoint->probestub,
diff --git a/lib/Kconfig b/lib/Kconfig
index 61cce06..6c1b8f1 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -139,27 +139,22 @@
source "lib/crypto/Kconfig"
config CRC_CCITT
- tristate "CRC-CCITT functions"
+ tristate
help
- This option is provided for the case where no in-kernel-tree
- modules require CRC-CCITT functions, but a module built outside
- the kernel tree does. Such modules that use library CRC-CCITT
- functions require M here.
+ The CRC-CCITT library functions. Select this if your module uses any
+ of the functions from <linux/crc-ccitt.h>.
config CRC16
- tristate "CRC16 functions"
+ tristate
help
- This option is provided for the case where no in-kernel-tree
- modules require CRC16 functions, but a module built outside
- the kernel tree does. Such modules that use library CRC16
- functions require M here.
+ The CRC16 library functions. Select this if your module uses any of
+ the functions from <linux/crc16.h>.
config CRC_T10DIF
- tristate "CRC calculation for the T10 Data Integrity Field"
+ tristate
help
- This option is only needed if a module that's not in the
- kernel tree needs to calculate CRC checks for use with the
- SCSI data integrity subsystem.
+ The CRC-T10DIF library functions. Select this if your module uses
+ any of the functions from <linux/crc-t10dif.h>.
config ARCH_HAS_CRC_T10DIF
bool
@@ -169,22 +164,17 @@
default CRC_T10DIF if ARCH_HAS_CRC_T10DIF && CRC_OPTIMIZATIONS
config CRC_ITU_T
- tristate "CRC ITU-T V.41 functions"
+ tristate
help
- This option is provided for the case where no in-kernel-tree
- modules require CRC ITU-T V.41 functions, but a module built outside
- the kernel tree does. Such modules that use library CRC ITU-T V.41
- functions require M here.
+ The CRC-ITU-T library functions. Select this if your module uses
+ any of the functions from <linux/crc-itu-t.h>.
config CRC32
- tristate "CRC32/CRC32c functions"
- default y
+ tristate
select BITREVERSE
help
- This option is provided for the case where no in-kernel-tree
- modules require CRC32/CRC32c functions, but a module built outside
- the kernel tree does. Such modules that use library CRC32/CRC32c
- functions require M here.
+ The CRC32 library functions. Select this if your module uses any of
+ the functions from <linux/crc32.h> or <linux/crc32c.h>.
config ARCH_HAS_CRC32
bool
@@ -195,6 +185,9 @@
config CRC64
tristate
+ help
+ The CRC64 library functions. Select this if your module uses any of
+ the functions from <linux/crc64.h>.
config ARCH_HAS_CRC64
bool
@@ -205,19 +198,21 @@
config CRC4
tristate
+ help
+ The CRC4 library functions. Select this if your module uses any of
+ the functions from <linux/crc4.h>.
config CRC7
tristate
-
-config LIBCRC32C
- tristate
- select CRC32
help
- This option just selects CRC32 and is provided for compatibility
- purposes until the users are updated to select CRC32 directly.
+ The CRC7 library functions. Select this if your module uses any of
+ the functions from <linux/crc7.h>.
config CRC8
tristate
+ help
+ The CRC8 library functions. Select this if your module uses any of
+ the functions from <linux/crc8.h>.
config CRC_OPTIMIZATIONS
bool "Enable optimized CRC implementations" if EXPERT
diff --git a/net/batman-adv/Kconfig b/net/batman-adv/Kconfig
index 860a078..20b3162 100644
--- a/net/batman-adv/Kconfig
+++ b/net/batman-adv/Kconfig
@@ -9,7 +9,7 @@
config BATMAN_ADV
tristate "B.A.T.M.A.N. Advanced Meshing Protocol"
- select LIBCRC32C
+ select CRC32
help
B.A.T.M.A.N. (better approach to mobile ad-hoc networking) is
a routing protocol for multi-hop ad-hoc mesh networks. The
diff --git a/net/ceph/Kconfig b/net/ceph/Kconfig
index c5c4eef..0aa21fc 100644
--- a/net/ceph/Kconfig
+++ b/net/ceph/Kconfig
@@ -2,7 +2,7 @@
config CEPH_LIB
tristate "Ceph core library"
depends on INET
- select LIBCRC32C
+ select CRC32
select CRYPTO_AES
select CRYPTO_CBC
select CRYPTO_GCM
diff --git a/net/core/dev.c b/net/core/dev.c
index 0608605..75e1043 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1518,15 +1518,7 @@ void netdev_features_change(struct net_device *dev)
}
EXPORT_SYMBOL(netdev_features_change);
-/**
- * netdev_state_change - device changes state
- * @dev: device to cause notification
- *
- * Called to indicate a device has changed state. This function calls
- * the notifier chains for netdev_chain and sends a NEWLINK message
- * to the routing socket.
- */
-void netdev_state_change(struct net_device *dev)
+void netif_state_change(struct net_device *dev)
{
if (dev->flags & IFF_UP) {
struct netdev_notifier_change_info change_info = {
@@ -1538,7 +1530,6 @@ void netdev_state_change(struct net_device *dev)
rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL, 0, NULL);
}
}
-EXPORT_SYMBOL(netdev_state_change);
/**
* __netdev_notify_peers - notify network peers about existence of @dev,
diff --git a/net/core/dev_api.c b/net/core/dev_api.c
index 90bafb0..90898cd 100644
--- a/net/core/dev_api.c
+++ b/net/core/dev_api.c
@@ -327,3 +327,19 @@ int dev_xdp_propagate(struct net_device *dev, struct netdev_bpf *bpf)
return ret;
}
EXPORT_SYMBOL_GPL(dev_xdp_propagate);
+
+/**
+ * netdev_state_change() - device changes state
+ * @dev: device to cause notification
+ *
+ * Called to indicate a device has changed state. This function calls
+ * the notifier chains for netdev_chain and sends a NEWLINK message
+ * to the routing socket.
+ */
+void netdev_state_change(struct net_device *dev)
+{
+ netdev_lock_ops(dev);
+ netif_state_change(dev);
+ netdev_unlock_ops(dev);
+}
+EXPORT_SYMBOL(netdev_state_change);
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index cb04ef2..864f3bb 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -183,7 +183,7 @@ static void linkwatch_do_dev(struct net_device *dev)
else
dev_deactivate(dev);
- netdev_state_change(dev);
+ netif_state_change(dev);
}
/* Note: our callers are responsible for calling netdev_tracker_free().
* This is the reason we use __dev_put() instead of dev_put().
@@ -240,7 +240,9 @@ static void __linkwatch_run_queue(int urgent_only)
*/
netdev_tracker_free(dev, &dev->linkwatch_dev_tracker);
spin_unlock_irq(&lweventlist_lock);
+ netdev_lock_ops(dev);
linkwatch_do_dev(dev);
+ netdev_unlock_ops(dev);
do_dev--;
spin_lock_irq(&lweventlist_lock);
}
@@ -253,25 +255,41 @@ static void __linkwatch_run_queue(int urgent_only)
spin_unlock_irq(&lweventlist_lock);
}
-void linkwatch_sync_dev(struct net_device *dev)
+static bool linkwatch_clean_dev(struct net_device *dev)
{
unsigned long flags;
- int clean = 0;
+ bool clean = false;
spin_lock_irqsave(&lweventlist_lock, flags);
if (!list_empty(&dev->link_watch_list)) {
list_del_init(&dev->link_watch_list);
- clean = 1;
+ clean = true;
/* We must release netdev tracker under
* the spinlock protection.
*/
netdev_tracker_free(dev, &dev->linkwatch_dev_tracker);
}
spin_unlock_irqrestore(&lweventlist_lock, flags);
- if (clean)
+
+ return clean;
+}
+
+void __linkwatch_sync_dev(struct net_device *dev)
+{
+ netdev_ops_assert_locked(dev);
+
+ if (linkwatch_clean_dev(dev))
linkwatch_do_dev(dev);
}
+void linkwatch_sync_dev(struct net_device *dev)
+{
+ if (linkwatch_clean_dev(dev)) {
+ netdev_lock_ops(dev);
+ linkwatch_do_dev(dev);
+ netdev_unlock_ops(dev);
+ }
+}
/* Must be called with the rtnl semaphore held */
void linkwatch_run_queue(void)
diff --git a/net/core/lock_debug.c b/net/core/lock_debug.c
index b7f22dc..941e26c 100644
--- a/net/core/lock_debug.c
+++ b/net/core/lock_debug.c
@@ -20,11 +20,11 @@ int netdev_debug_event(struct notifier_block *nb, unsigned long event,
switch (cmd) {
case NETDEV_REGISTER:
case NETDEV_UP:
+ case NETDEV_CHANGE:
netdev_ops_assert_locked(dev);
fallthrough;
case NETDEV_DOWN:
case NETDEV_REBOOT:
- case NETDEV_CHANGE:
case NETDEV_UNREGISTER:
case NETDEV_CHANGEMTU:
case NETDEV_CHANGEADDR:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c238528..39a5b72 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1043,7 +1043,7 @@ int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst, u32 id,
}
EXPORT_SYMBOL_GPL(rtnl_put_cacheinfo);
-void netdev_set_operstate(struct net_device *dev, int newstate)
+void netif_set_operstate(struct net_device *dev, int newstate)
{
unsigned int old = READ_ONCE(dev->operstate);
@@ -1052,9 +1052,9 @@ void netdev_set_operstate(struct net_device *dev, int newstate)
return;
} while (!try_cmpxchg(&dev->operstate, &old, newstate));
- netdev_state_change(dev);
+ netif_state_change(dev);
}
-EXPORT_SYMBOL(netdev_set_operstate);
+EXPORT_SYMBOL(netif_set_operstate);
static void set_operstate(struct net_device *dev, unsigned char transition)
{
@@ -1080,7 +1080,7 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
break;
}
- netdev_set_operstate(dev, operstate);
+ netif_set_operstate(dev, operstate);
}
static unsigned int rtnl_dev_get_flags(const struct net_device *dev)
@@ -3027,7 +3027,7 @@ static int do_setlink(const struct sk_buff *skb, struct net_device *dev,
err = validate_linkmsg(dev, tb, extack);
if (err < 0)
- goto errout;
+ return err;
if (tb[IFLA_IFNAME])
nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
@@ -3396,7 +3396,7 @@ static int do_setlink(const struct sk_buff *skb, struct net_device *dev,
errout:
if (status & DO_SETLINK_MODIFIED) {
if ((status & DO_SETLINK_NOTIFY) == DO_SETLINK_NOTIFY)
- netdev_state_change(dev);
+ netif_state_change(dev);
if (err < 0)
net_warn_ratelimited("A link change request failed with some changes committed already. Interface %s may have been left with an inconsistent configuration, please check.\n",
@@ -3676,8 +3676,11 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname,
nla_len(tb[IFLA_BROADCAST]));
if (tb[IFLA_TXQLEN])
dev->tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]);
- if (tb[IFLA_OPERSTATE])
+ if (tb[IFLA_OPERSTATE]) {
+ netdev_lock_ops(dev);
set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]));
+ netdev_unlock_ops(dev);
+ }
if (tb[IFLA_LINKMODE])
dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
if (tb[IFLA_GROUP])
diff --git a/net/core/sock.c b/net/core/sock.c
index f67a3c5b..e54449c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2130,6 +2130,8 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
*/
static inline void sock_lock_init(struct sock *sk)
{
+ sk_owner_clear(sk);
+
if (sk->sk_kern_sock)
sock_lock_init_class_and_name(
sk,
@@ -2226,6 +2228,9 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
cgroup_sk_free(&sk->sk_cgrp_data);
mem_cgroup_sk_free(sk);
security_sk_free(sk);
+
+ sk_owner_put(sk);
+
if (slab != NULL)
kmem_cache_free(slab, sk);
else
diff --git a/net/ethtool/cmis.h b/net/ethtool/cmis.h
index 1e79041..4a9a946 100644
--- a/net/ethtool/cmis.h
+++ b/net/ethtool/cmis.h
@@ -101,7 +101,6 @@ struct ethtool_cmis_cdb_rpl {
};
u32 ethtool_cmis_get_max_lpl_size(u8 num_of_byte_octs);
-u32 ethtool_cmis_get_max_epl_size(u8 num_of_byte_octs);
void ethtool_cmis_cdb_compose_args(struct ethtool_cmis_cdb_cmd_args *args,
enum ethtool_cmis_cdb_cmd_id cmd, u8 *lpl,
diff --git a/net/ethtool/cmis_cdb.c b/net/ethtool/cmis_cdb.c
index d159dc1..0e2691c 100644
--- a/net/ethtool/cmis_cdb.c
+++ b/net/ethtool/cmis_cdb.c
@@ -16,15 +16,6 @@ u32 ethtool_cmis_get_max_lpl_size(u8 num_of_byte_octs)
return 8 * (1 + min_t(u8, num_of_byte_octs, 15));
}
-/* For accessing the EPL field on page 9Fh, the allowable length extension is
- * min(i, 255) byte octets where i specifies the allowable additional number of
- * byte octets in a READ or a WRITE.
- */
-u32 ethtool_cmis_get_max_epl_size(u8 num_of_byte_octs)
-{
- return 8 * (1 + min_t(u8, num_of_byte_octs, 255));
-}
-
void ethtool_cmis_cdb_compose_args(struct ethtool_cmis_cdb_cmd_args *args,
enum ethtool_cmis_cdb_cmd_id cmd, u8 *lpl,
u8 lpl_len, u8 *epl, u16 epl_len,
@@ -33,19 +24,16 @@ void ethtool_cmis_cdb_compose_args(struct ethtool_cmis_cdb_cmd_args *args,
{
args->req.id = cpu_to_be16(cmd);
args->req.lpl_len = lpl_len;
- if (lpl) {
+ if (lpl)
memcpy(args->req.payload, lpl, args->req.lpl_len);
- args->read_write_len_ext =
- ethtool_cmis_get_max_lpl_size(read_write_len_ext);
- }
if (epl) {
args->req.epl_len = cpu_to_be16(epl_len);
args->req.epl = epl;
- args->read_write_len_ext =
- ethtool_cmis_get_max_epl_size(read_write_len_ext);
}
args->max_duration = max_duration;
+ args->read_write_len_ext =
+ ethtool_cmis_get_max_lpl_size(read_write_len_ext);
args->msleep_pre_rpl = msleep_pre_rpl;
args->rpl_exp_len = rpl_exp_len;
args->flags = flags;
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 0cb6da1..49bea6b4 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -830,6 +830,7 @@ void ethtool_ringparam_get_cfg(struct net_device *dev,
/* Driver gives us current state, we want to return current config */
kparam->tcp_data_split = dev->cfg->hds_config;
+ kparam->hds_thresh = dev->cfg->hds_thresh;
}
static void ethtool_init_tsinfo(struct kernel_ethtool_ts_info *info)
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 2216394..8262cc1 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -60,7 +60,7 @@ static struct devlink *netdev_to_devlink_get(struct net_device *dev)
u32 ethtool_op_get_link(struct net_device *dev)
{
/* Synchronize carrier state with link watch, see also rtnl_getlink() */
- linkwatch_sync_dev(dev);
+ __linkwatch_sync_dev(dev);
return netif_carrier_ok(dev) ? 1 : 0;
}
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index a163d40..977beea 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -500,7 +500,7 @@ static int ethnl_default_doit(struct sk_buff *skb, struct genl_info *info)
netdev_unlock_ops(req_info->dev);
rtnl_unlock();
if (ret < 0)
- goto err_cleanup;
+ goto err_dev;
ret = ops->reply_size(req_info, reply_data);
if (ret < 0)
goto err_cleanup;
@@ -560,7 +560,7 @@ static int ethnl_default_dump_one(struct sk_buff *skb, struct net_device *dev,
netdev_unlock_ops(dev);
rtnl_unlock();
if (ret < 0)
- goto out;
+ goto out_cancel;
ret = ethnl_fill_reply_header(skb, dev, ctx->ops->hdr_attr);
if (ret < 0)
goto out;
@@ -569,6 +569,7 @@ static int ethnl_default_dump_one(struct sk_buff *skb, struct net_device *dev,
out:
if (ctx->ops->cleanup_data)
ctx->ops->cleanup_data(ctx->reply_data);
+out_cancel:
ctx->reply_data->dev = NULL;
if (ret < 0)
genlmsg_cancel(skb, ehdr);
@@ -793,7 +794,7 @@ static void ethnl_default_notify(struct net_device *dev, unsigned int cmd,
ethnl_init_reply_data(reply_data, ops, dev);
ret = ops->prepare_data(req_info, reply_data, &info);
if (ret < 0)
- goto err_cleanup;
+ goto err_rep;
ret = ops->reply_size(req_info, reply_data);
if (ret < 0)
goto err_cleanup;
@@ -828,6 +829,7 @@ static void ethnl_default_notify(struct net_device *dev, unsigned int cmd,
err_cleanup:
if (ops->cleanup_data)
ops->cleanup_data(reply_data);
+err_rep:
kfree(reply_data);
kfree(req_info);
return;
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index 439cfb7a..1b1b700 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -33,14 +33,14 @@ static void hsr_set_operstate(struct hsr_port *master, bool has_carrier)
struct net_device *dev = master->dev;
if (!is_admin_up(dev)) {
- netdev_set_operstate(dev, IF_OPER_DOWN);
+ netif_set_operstate(dev, IF_OPER_DOWN);
return;
}
if (has_carrier)
- netdev_set_operstate(dev, IF_OPER_UP);
+ netif_set_operstate(dev, IF_OPER_UP);
else
- netdev_set_operstate(dev, IF_OPER_LOWERLAYERDOWN);
+ netif_set_operstate(dev, IF_OPER_LOWERLAYERDOWN);
}
static bool hsr_check_carrier(struct hsr_port *master)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2cffb8f..9ba83f0 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3154,12 +3154,13 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
rtnl_net_lock(net);
dev = __dev_get_by_index(net, ireq.ifr6_ifindex);
- netdev_lock_ops(dev);
- if (dev)
+ if (dev) {
+ netdev_lock_ops(dev);
err = inet6_addr_add(net, dev, &cfg, 0, 0, NULL);
- else
+ netdev_unlock_ops(dev);
+ } else {
err = -ENODEV;
- netdev_unlock_ops(dev);
+ }
rtnl_net_unlock(net);
return err;
}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index ab12b81..210b84c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -470,10 +470,10 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
goto out;
hash = fl6->mp_hash;
- if (hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
- rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
- strict) >= 0) {
- match = first;
+ if (hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound)) {
+ if (rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
+ strict) >= 0)
+ match = first;
goto out;
}
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 409bd41..24c2de1 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -899,13 +899,17 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
goto dispose_child;
}
- if (!subflow_hmac_valid(req, &mp_opt) ||
- !mptcp_can_accept_new_subflow(subflow_req->msk)) {
+ if (!subflow_hmac_valid(req, &mp_opt)) {
SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC);
subflow_add_reset_reason(skb, MPTCP_RST_EPROHIBIT);
goto dispose_child;
}
+ if (!mptcp_can_accept_new_subflow(owner)) {
+ subflow_add_reset_reason(skb, MPTCP_RST_EPROHIBIT);
+ goto dispose_child;
+ }
+
/* move the msk reference ownership to the subflow */
subflow_req->msk = NULL;
ctx->conn = (struct sock *)owner;
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index df2dc21..047ba81 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -212,7 +212,7 @@
bool 'SCTP protocol connection tracking support'
depends on NETFILTER_ADVANCED
default y
- select LIBCRC32C
+ select CRC32
help
With this option enabled, the layer 3 independent connection
tracking code will be able to do state tracking on SCTP connections.
@@ -475,7 +475,7 @@
config NF_TABLES
select NETFILTER_NETLINK
- select LIBCRC32C
+ select CRC32
tristate "Netfilter nf_tables support"
help
nftables is the new packet classification framework that intends to
diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index 2a3017b..8c5b1fe 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -105,7 +105,7 @@
config IP_VS_PROTO_SCTP
bool "SCTP load balancing support"
- select LIBCRC32C
+ select CRC32
help
This option enables support for load balancing SCTP transport
protocol. Say Y if unsure.
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index b8d3c32..c15db28 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -994,8 +994,9 @@ static int nft_pipapo_avx2_lookup_8b_16(unsigned long *map, unsigned long *fill,
NFT_PIPAPO_AVX2_BUCKET_LOAD8(5, lt, 8, pkt[8], bsize);
NFT_PIPAPO_AVX2_AND(6, 2, 3);
+ NFT_PIPAPO_AVX2_AND(3, 4, 7);
NFT_PIPAPO_AVX2_BUCKET_LOAD8(7, lt, 9, pkt[9], bsize);
- NFT_PIPAPO_AVX2_AND(0, 4, 5);
+ NFT_PIPAPO_AVX2_AND(0, 3, 5);
NFT_PIPAPO_AVX2_BUCKET_LOAD8(1, lt, 10, pkt[10], bsize);
NFT_PIPAPO_AVX2_AND(2, 6, 7);
NFT_PIPAPO_AVX2_BUCKET_LOAD8(3, lt, 11, pkt[11], bsize);
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 2535f3f..5481bd56 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -11,7 +11,7 @@
(!NF_NAT || NF_NAT) && \
(!NETFILTER_CONNCOUNT || NETFILTER_CONNCOUNT)))
depends on PSAMPLE || !PSAMPLE
- select LIBCRC32C
+ select CRC32
select MPLS
select NET_MPLS_GSO
select DST_CACHE
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 8180d0c1..a800127 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -784,7 +784,7 @@
config NET_ACT_CSUM
tristate "Checksum Updating"
depends on NET_CLS_ACT && INET
- select LIBCRC32C
+ select CRC32
help
Say Y here to update some common checksum after some direct
packet alterations.
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 4f648af..ecec0a1 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2057,6 +2057,7 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
struct tcmsg *tcm;
struct nlmsghdr *nlh;
unsigned char *b = skb_tail_pointer(skb);
+ int ret = -EMSGSIZE;
nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
if (!nlh)
@@ -2101,11 +2102,45 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
return skb->len;
+cls_op_not_supp:
+ ret = -EOPNOTSUPP;
out_nlmsg_trim:
nla_put_failure:
-cls_op_not_supp:
nlmsg_trim(skb, b);
- return -1;
+ return ret;
+}
+
+static struct sk_buff *tfilter_notify_prep(struct net *net,
+ struct sk_buff *oskb,
+ struct nlmsghdr *n,
+ struct tcf_proto *tp,
+ struct tcf_block *block,
+ struct Qdisc *q, u32 parent,
+ void *fh, int event,
+ u32 portid, bool rtnl_held,
+ struct netlink_ext_ack *extack)
+{
+ unsigned int size = oskb ? max(NLMSG_GOODSIZE, oskb->len) : NLMSG_GOODSIZE;
+ struct sk_buff *skb;
+ int ret;
+
+retry:
+ skb = alloc_skb(size, GFP_KERNEL);
+ if (!skb)
+ return ERR_PTR(-ENOBUFS);
+
+ ret = tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
+ n->nlmsg_seq, n->nlmsg_flags, event, false,
+ rtnl_held, extack);
+ if (ret <= 0) {
+ kfree_skb(skb);
+ if (ret == -EMSGSIZE) {
+ size += NLMSG_GOODSIZE;
+ goto retry;
+ }
+ return ERR_PTR(-EINVAL);
+ }
+ return skb;
}
static int tfilter_notify(struct net *net, struct sk_buff *oskb,
@@ -2121,16 +2156,10 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb,
if (!unicast && !rtnl_notify_needed(net, n->nlmsg_flags, RTNLGRP_TC))
return 0;
- skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
- if (!skb)
- return -ENOBUFS;
-
- if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
- n->nlmsg_seq, n->nlmsg_flags, event,
- false, rtnl_held, extack) <= 0) {
- kfree_skb(skb);
- return -EINVAL;
- }
+ skb = tfilter_notify_prep(net, oskb, n, tp, block, q, parent, fh, event,
+ portid, rtnl_held, extack);
+ if (IS_ERR(skb))
+ return PTR_ERR(skb);
if (unicast)
err = rtnl_unicast(skb, net, portid);
@@ -2153,16 +2182,11 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
if (!rtnl_notify_needed(net, n->nlmsg_flags, RTNLGRP_TC))
return tp->ops->delete(tp, fh, last, rtnl_held, extack);
- skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
- if (!skb)
- return -ENOBUFS;
-
- if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
- n->nlmsg_seq, n->nlmsg_flags, RTM_DELTFILTER,
- false, rtnl_held, extack) <= 0) {
+ skb = tfilter_notify_prep(net, oskb, n, tp, block, q, parent, fh,
+ RTM_DELTFILTER, portid, rtnl_held, extack);
+ if (IS_ERR(skb)) {
NL_SET_ERR_MSG(extack, "Failed to build del event notification");
- kfree_skb(skb);
- return -EINVAL;
+ return PTR_ERR(skb);
}
err = tp->ops->delete(tp, fh, last, rtnl_held, extack);
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index 81189d0..12dd711 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -65,10 +65,7 @@ static struct sk_buff *codel_qdisc_dequeue(struct Qdisc *sch)
&q->stats, qdisc_pkt_len, codel_get_enqueue_time,
drop_func, dequeue_func);
- /* We cant call qdisc_tree_reduce_backlog() if our qlen is 0,
- * or HTB crashes. Defer it for next round.
- */
- if (q->stats.drop_count && sch->q.qlen) {
+ if (q->stats.drop_count) {
qdisc_tree_reduce_backlog(sch, q->stats.drop_count, q->stats.drop_len);
q->stats.drop_count = 0;
q->stats.drop_len = 0;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index c69b999..e0a81d3 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -105,6 +105,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
return -ENOBUFS;
gnet_stats_basic_sync_init(&cl->bstats);
+ INIT_LIST_HEAD(&cl->alist);
cl->common.classid = classid;
cl->quantum = quantum;
cl->qdisc = qdisc_create_dflt(sch->dev_queue,
@@ -229,7 +230,7 @@ static void drr_qlen_notify(struct Qdisc *csh, unsigned long arg)
{
struct drr_class *cl = (struct drr_class *)arg;
- list_del(&cl->alist);
+ list_del_init(&cl->alist);
}
static int drr_dump_class(struct Qdisc *sch, unsigned long arg,
@@ -390,7 +391,7 @@ static struct sk_buff *drr_dequeue(struct Qdisc *sch)
if (unlikely(skb == NULL))
goto out;
if (cl->qdisc->q.qlen == 0)
- list_del(&cl->alist);
+ list_del_init(&cl->alist);
bstats_update(&cl->bstats, skb);
qdisc_bstats_update(sch, skb);
@@ -431,7 +432,7 @@ static void drr_reset_qdisc(struct Qdisc *sch)
for (i = 0; i < q->clhash.hashsize; i++) {
hlist_for_each_entry(cl, &q->clhash.hash[i], common.hnode) {
if (cl->qdisc->q.qlen)
- list_del(&cl->alist);
+ list_del_init(&cl->alist);
qdisc_reset(cl->qdisc);
}
}
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index 516038a..c3bdeb1 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -293,7 +293,7 @@ static void ets_class_qlen_notify(struct Qdisc *sch, unsigned long arg)
* to remove them.
*/
if (!ets_class_is_strict(q, cl) && sch->q.qlen)
- list_del(&cl->alist);
+ list_del_init(&cl->alist);
}
static int ets_class_dump(struct Qdisc *sch, unsigned long arg,
@@ -488,7 +488,7 @@ static struct sk_buff *ets_qdisc_dequeue(struct Qdisc *sch)
if (unlikely(!skb))
goto out;
if (cl->qdisc->q.qlen == 0)
- list_del(&cl->alist);
+ list_del_init(&cl->alist);
return ets_qdisc_dequeue_skb(sch, skb);
}
@@ -657,7 +657,7 @@ static int ets_qdisc_change(struct Qdisc *sch, struct nlattr *opt,
}
for (i = q->nbands; i < oldbands; i++) {
if (i >= q->nstrict && q->classes[i].qdisc->q.qlen)
- list_del(&q->classes[i].alist);
+ list_del_init(&q->classes[i].alist);
qdisc_tree_flush_backlog(q->classes[i].qdisc);
}
WRITE_ONCE(q->nstrict, nstrict);
@@ -713,7 +713,7 @@ static void ets_qdisc_reset(struct Qdisc *sch)
for (band = q->nstrict; band < q->nbands; band++) {
if (q->classes[band].qdisc->q.qlen)
- list_del(&q->classes[band].alist);
+ list_del_init(&q->classes[band].alist);
}
for (band = 0; band < q->nbands; band++)
qdisc_reset(q->classes[band].qdisc);
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 799f539..6c9029f 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -315,10 +315,8 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
}
qdisc_bstats_update(sch, skb);
flow->deficit -= qdisc_pkt_len(skb);
- /* We cant call qdisc_tree_reduce_backlog() if our qlen is 0,
- * or HTB crashes. Defer it for next round.
- */
- if (q->cstats.drop_count && sch->q.qlen) {
+
+ if (q->cstats.drop_count) {
qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
q->cstats.drop_len);
q->cstats.drop_count = 0;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index c287bf8..ce5045e 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -203,7 +203,10 @@ eltree_insert(struct hfsc_class *cl)
static inline void
eltree_remove(struct hfsc_class *cl)
{
- rb_erase(&cl->el_node, &cl->sched->eligible);
+ if (!RB_EMPTY_NODE(&cl->el_node)) {
+ rb_erase(&cl->el_node, &cl->sched->eligible);
+ RB_CLEAR_NODE(&cl->el_node);
+ }
}
static inline void
@@ -1220,7 +1223,8 @@ hfsc_qlen_notify(struct Qdisc *sch, unsigned long arg)
/* vttree is now handled in update_vf() so that update_vf(cl, 0, 0)
* needs to be called explicitly to remove a class from vttree.
*/
- update_vf(cl, 0, 0);
+ if (cl->cl_nactive)
+ update_vf(cl, 0, 0);
if (cl->cl_flags & HFSC_RSC)
eltree_remove(cl);
}
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index c31bc54..4b9a639 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1485,6 +1485,8 @@ static void htb_qlen_notify(struct Qdisc *sch, unsigned long arg)
{
struct htb_class *cl = (struct htb_class *)arg;
+ if (!cl->prio_activity)
+ return;
htb_deactivate(qdisc_priv(sch), cl);
}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 2cfbc97..687a932 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -347,7 +347,7 @@ static void qfq_deactivate_class(struct qfq_sched *q, struct qfq_class *cl)
struct qfq_aggregate *agg = cl->agg;
- list_del(&cl->alist); /* remove from RR queue of the aggregate */
+ list_del_init(&cl->alist); /* remove from RR queue of the aggregate */
if (list_empty(&agg->active)) /* agg is now inactive */
qfq_deactivate_agg(q, agg);
}
@@ -474,6 +474,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
gnet_stats_basic_sync_init(&cl->bstats);
cl->common.classid = classid;
cl->deficit = lmax;
+ INIT_LIST_HEAD(&cl->alist);
cl->qdisc = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops,
classid, NULL);
@@ -982,7 +983,7 @@ static struct sk_buff *agg_dequeue(struct qfq_aggregate *agg,
cl->deficit -= (int) len;
if (cl->qdisc->q.qlen == 0) /* no more packets, remove from list */
- list_del(&cl->alist);
+ list_del_init(&cl->alist);
else if (cl->deficit < qdisc_pkt_len(cl->qdisc->ops->peek(cl->qdisc))) {
cl->deficit += agg->lmax;
list_move_tail(&cl->alist, &agg->active);
@@ -1415,6 +1416,8 @@ static void qfq_qlen_notify(struct Qdisc *sch, unsigned long arg)
struct qfq_sched *q = qdisc_priv(sch);
struct qfq_class *cl = (struct qfq_class *)arg;
+ if (list_empty(&cl->alist))
+ return;
qfq_deactivate_class(q, cl);
}
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 9ed197e..b912ad9 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -631,6 +631,15 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt,
struct red_parms *p = NULL;
struct sk_buff *to_free = NULL;
struct sk_buff *tail = NULL;
+ unsigned int maxflows;
+ unsigned int quantum;
+ unsigned int divisor;
+ int perturb_period;
+ u8 headdrop;
+ u8 maxdepth;
+ int limit;
+ u8 flags;
+
if (opt->nla_len < nla_attr_size(sizeof(*ctl)))
return -EINVAL;
@@ -652,39 +661,64 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt,
if (!p)
return -ENOMEM;
}
- if (ctl->limit == 1) {
- NL_SET_ERR_MSG_MOD(extack, "invalid limit");
- return -EINVAL;
- }
+
sch_tree_lock(sch);
+
+ limit = q->limit;
+ divisor = q->divisor;
+ headdrop = q->headdrop;
+ maxdepth = q->maxdepth;
+ maxflows = q->maxflows;
+ perturb_period = q->perturb_period;
+ quantum = q->quantum;
+ flags = q->flags;
+
+ /* update and validate configuration */
if (ctl->quantum)
- q->quantum = ctl->quantum;
- WRITE_ONCE(q->perturb_period, ctl->perturb_period * HZ);
+ quantum = ctl->quantum;
+ perturb_period = ctl->perturb_period * HZ;
if (ctl->flows)
- q->maxflows = min_t(u32, ctl->flows, SFQ_MAX_FLOWS);
+ maxflows = min_t(u32, ctl->flows, SFQ_MAX_FLOWS);
if (ctl->divisor) {
- q->divisor = ctl->divisor;
- q->maxflows = min_t(u32, q->maxflows, q->divisor);
+ divisor = ctl->divisor;
+ maxflows = min_t(u32, maxflows, divisor);
}
if (ctl_v1) {
if (ctl_v1->depth)
- q->maxdepth = min_t(u32, ctl_v1->depth, SFQ_MAX_DEPTH);
+ maxdepth = min_t(u32, ctl_v1->depth, SFQ_MAX_DEPTH);
if (p) {
- swap(q->red_parms, p);
- red_set_parms(q->red_parms,
+ red_set_parms(p,
ctl_v1->qth_min, ctl_v1->qth_max,
ctl_v1->Wlog,
ctl_v1->Plog, ctl_v1->Scell_log,
NULL,
ctl_v1->max_P);
}
- q->flags = ctl_v1->flags;
- q->headdrop = ctl_v1->headdrop;
+ flags = ctl_v1->flags;
+ headdrop = ctl_v1->headdrop;
}
if (ctl->limit) {
- q->limit = min_t(u32, ctl->limit, q->maxdepth * q->maxflows);
- q->maxflows = min_t(u32, q->maxflows, q->limit);
+ limit = min_t(u32, ctl->limit, maxdepth * maxflows);
+ maxflows = min_t(u32, maxflows, limit);
}
+ if (limit == 1) {
+ sch_tree_unlock(sch);
+ kfree(p);
+ NL_SET_ERR_MSG_MOD(extack, "invalid limit");
+ return -EINVAL;
+ }
+
+ /* commit configuration */
+ q->limit = limit;
+ q->divisor = divisor;
+ q->headdrop = headdrop;
+ q->maxdepth = maxdepth;
+ q->maxflows = maxflows;
+ WRITE_ONCE(q->perturb_period, perturb_period);
+ q->quantum = quantum;
+ q->flags = flags;
+ if (p)
+ swap(q->red_parms, p);
qlen = sch->q.qlen;
while (sch->q.qlen > q->limit) {
diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig
index 5da599f..d18a72d 100644
--- a/net/sctp/Kconfig
+++ b/net/sctp/Kconfig
@@ -7,10 +7,10 @@
tristate "The SCTP Protocol"
depends on INET
depends on IPV6 || IPV6=n
+ select CRC32
select CRYPTO
select CRYPTO_HMAC
select CRYPTO_SHA1
- select LIBCRC32C
select NET_UDP_TUNNEL
help
Stream Control Transmission Protocol
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 36ee34f..53725ee 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -72,8 +72,9 @@
/* Forward declarations for internal helper functions. */
static bool sctp_writeable(const struct sock *sk);
static void sctp_wfree(struct sk_buff *skb);
-static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
- size_t msg_len);
+static int sctp_wait_for_sndbuf(struct sctp_association *asoc,
+ struct sctp_transport *transport,
+ long *timeo_p, size_t msg_len);
static int sctp_wait_for_packet(struct sock *sk, int *err, long *timeo_p);
static int sctp_wait_for_connect(struct sctp_association *, long *timeo_p);
static int sctp_wait_for_accept(struct sock *sk, long timeo);
@@ -1828,7 +1829,7 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
if (sctp_wspace(asoc) <= 0 || !sk_wmem_schedule(sk, msg_len)) {
timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
- err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len);
+ err = sctp_wait_for_sndbuf(asoc, transport, &timeo, msg_len);
if (err)
goto err;
if (unlikely(sinfo->sinfo_stream >= asoc->stream.outcnt)) {
@@ -9214,8 +9215,9 @@ void sctp_sock_rfree(struct sk_buff *skb)
/* Helper function to wait for space in the sndbuf. */
-static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
- size_t msg_len)
+static int sctp_wait_for_sndbuf(struct sctp_association *asoc,
+ struct sctp_transport *transport,
+ long *timeo_p, size_t msg_len)
{
struct sock *sk = asoc->base.sk;
long current_timeo = *timeo_p;
@@ -9225,7 +9227,9 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
pr_debug("%s: asoc:%p, timeo:%ld, msg_len:%zu\n", __func__, asoc,
*timeo_p, msg_len);
- /* Increment the association's refcnt. */
+ /* Increment the transport and association's refcnt. */
+ if (transport)
+ sctp_transport_hold(transport);
sctp_association_hold(asoc);
/* Wait on the association specific sndbuf space. */
@@ -9234,7 +9238,7 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
TASK_INTERRUPTIBLE);
if (asoc->base.dead)
goto do_dead;
- if (!*timeo_p)
+ if ((!*timeo_p) || (transport && transport->dead))
goto do_nonblock;
if (sk->sk_err || asoc->state >= SCTP_STATE_SHUTDOWN_PENDING)
goto do_error;
@@ -9259,7 +9263,9 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
out:
finish_wait(&asoc->wait, &wait);
- /* Release the association's refcnt. */
+ /* Release the transport and association's refcnt. */
+ if (transport)
+ sctp_transport_put(transport);
sctp_association_put(asoc);
return err;
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 59675f6..6946c14 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -117,6 +117,8 @@ struct sctp_transport *sctp_transport_new(struct net *net,
*/
void sctp_transport_free(struct sctp_transport *transport)
{
+ transport->dead = 1;
+
/* Try to delete the heartbeat timer. */
if (timer_delete(&transport->hb_timer))
sctp_transport_put(transport);
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 50c2e08..18be6ff 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1046,6 +1046,7 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
if (unlikely(l->backlog[imp].len >= l->backlog[imp].limit)) {
if (imp == TIPC_SYSTEM_IMPORTANCE) {
pr_warn("%s<%s>, link overflow", link_rst_msg, l->name);
+ __skb_queue_purge(list);
return -ENOBUFS;
}
rc = link_schedule_user(l, hdr);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index cb86b0b..a3ccb313 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -852,6 +852,11 @@ static int tls_setsockopt(struct sock *sk, int level, int optname,
return do_tls_setsockopt(sk, optname, optval, optlen);
}
+static int tls_disconnect(struct sock *sk, int flags)
+{
+ return -EOPNOTSUPP;
+}
+
struct tls_context *tls_ctx_create(struct sock *sk)
{
struct inet_connection_sock *icsk = inet_csk(sk);
@@ -947,6 +952,7 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
prot[TLS_BASE][TLS_BASE] = *base;
prot[TLS_BASE][TLS_BASE].setsockopt = tls_setsockopt;
prot[TLS_BASE][TLS_BASE].getsockopt = tls_getsockopt;
+ prot[TLS_BASE][TLS_BASE].disconnect = tls_disconnect;
prot[TLS_BASE][TLS_BASE].close = tls_sk_proto_close;
prot[TLS_SW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 33d861c..3ce7b54 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -522,7 +522,7 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
case INAT_PFX_REPNE:
if (modrm == 0xca)
/* eretu/erets */
- insn->type = INSN_CONTEXT_SWITCH;
+ insn->type = INSN_SYSRET;
break;
default:
if (modrm == 0xca)
@@ -535,11 +535,15 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
insn->type = INSN_JUMP_CONDITIONAL;
- } else if (op2 == 0x05 || op2 == 0x07 || op2 == 0x34 ||
- op2 == 0x35) {
+ } else if (op2 == 0x05 || op2 == 0x34) {
- /* sysenter, sysret */
- insn->type = INSN_CONTEXT_SWITCH;
+ /* syscall, sysenter */
+ insn->type = INSN_SYSCALL;
+
+ } else if (op2 == 0x07 || op2 == 0x35) {
+
+ /* sysret, sysexit */
+ insn->type = INSN_SYSRET;
} else if (op2 == 0x0b || op2 == 0xb9) {
@@ -676,7 +680,7 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
case 0xca: /* retf */
case 0xcb: /* retf */
- insn->type = INSN_CONTEXT_SWITCH;
+ insn->type = INSN_SYSRET;
break;
case 0xe0: /* loopne */
@@ -721,7 +725,7 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
} else if (modrm_reg == 5) {
/* jmpf */
- insn->type = INSN_CONTEXT_SWITCH;
+ insn->type = INSN_SYSRET;
} else if (modrm_reg == 6) {
diff --git a/tools/objtool/arch/x86/special.c b/tools/objtool/arch/x86/special.c
index 403e587..06ca4a2 100644
--- a/tools/objtool/arch/x86/special.c
+++ b/tools/objtool/arch/x86/special.c
@@ -126,7 +126,7 @@ struct reloc *arch_find_switch_table(struct objtool_file *file,
* indicates a rare GCC quirk/bug which can leave dead
* code behind.
*/
- if (reloc_type(text_reloc) == R_X86_64_PC32) {
+ if (!file->ignore_unreachables && reloc_type(text_reloc) == R_X86_64_PC32) {
WARN_INSN(insn, "ignoring unreachables due to jump table quirk");
file->ignore_unreachables = true;
}
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 4a1f6c3..b649049 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3505,6 +3505,34 @@ static struct instruction *next_insn_to_validate(struct objtool_file *file,
return next_insn_same_sec(file, alt_group->orig_group->last_insn);
}
+static bool skip_alt_group(struct instruction *insn)
+{
+ struct instruction *alt_insn = insn->alts ? insn->alts->insn : NULL;
+
+ /* ANNOTATE_IGNORE_ALTERNATIVE */
+ if (insn->alt_group && insn->alt_group->ignore)
+ return true;
+
+ /*
+ * For NOP patched with CLAC/STAC, only follow the latter to avoid
+ * impossible code paths combining patched CLAC with unpatched STAC
+ * or vice versa.
+ *
+ * ANNOTATE_IGNORE_ALTERNATIVE could have been used here, but Linus
+ * requested not to do that to avoid hurting .s file readability
+ * around CLAC/STAC alternative sites.
+ */
+
+ if (!alt_insn)
+ return false;
+
+ /* Don't override ASM_{CLAC,STAC}_UNSAFE */
+ if (alt_insn->alt_group && alt_insn->alt_group->ignore)
+ return false;
+
+ return alt_insn->type == INSN_CLAC || alt_insn->type == INSN_STAC;
+}
+
/*
* Follow the branch starting at the given instruction, and recursively follow
* any other branches (jumps). Meanwhile, track the frame pointer state at
@@ -3625,7 +3653,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
}
}
- if (insn->alt_group && insn->alt_group->ignore)
+ if (skip_alt_group(insn))
return 0;
if (handle_insn_ops(insn, next_insn, &state))
@@ -3684,14 +3712,20 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
break;
- case INSN_CONTEXT_SWITCH:
- if (func) {
- if (!next_insn || !next_insn->hint) {
- WARN_INSN(insn, "unsupported instruction in callable function");
- return 1;
- }
- break;
+ case INSN_SYSCALL:
+ if (func && (!next_insn || !next_insn->hint)) {
+ WARN_INSN(insn, "unsupported instruction in callable function");
+ return 1;
}
+
+ break;
+
+ case INSN_SYSRET:
+ if (func && (!next_insn || !next_insn->hint)) {
+ WARN_INSN(insn, "unsupported instruction in callable function");
+ return 1;
+ }
+
return 0;
case INSN_STAC:
@@ -3886,6 +3920,12 @@ static int validate_unret(struct objtool_file *file, struct instruction *insn)
WARN_INSN(insn, "RET before UNTRAIN");
return 1;
+ case INSN_SYSCALL:
+ break;
+
+ case INSN_SYSRET:
+ return 0;
+
case INSN_NOP:
if (insn->retpoline_safe)
return 0;
@@ -3895,6 +3935,9 @@ static int validate_unret(struct objtool_file *file, struct instruction *insn)
break;
}
+ if (insn->dead_end)
+ return 0;
+
if (!next) {
WARN_INSN(insn, "teh end!");
return 1;
diff --git a/tools/objtool/include/objtool/arch.h b/tools/objtool/include/objtool/arch.h
index 089a1ac..01ef6f4 100644
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -19,7 +19,8 @@ enum insn_type {
INSN_CALL,
INSN_CALL_DYNAMIC,
INSN_RETURN,
- INSN_CONTEXT_SWITCH,
+ INSN_SYSCALL,
+ INSN_SYSRET,
INSN_BUG,
INSN_NOP,
INSN_STAC,
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index da53a70..c176487 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -809,6 +809,10 @@
test.log.extend(parse_diagnostic(lines))
if test.name != "" and not peek_test_name_match(lines, test):
test.add_error(printer, 'missing subtest result line!')
+ elif not lines:
+ print_log(test.log, printer)
+ test.status = TestStatus.NO_TESTS
+ test.add_error(printer, 'No more test results!')
else:
parse_test_result(lines, test, expected_num, printer)
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 5ff4f6f..bbba921 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -371,8 +371,8 @@
"""
result = kunit_parser.parse_run_tests(output.splitlines(), stdout)
# Missing test results after test plan should alert a suspected test crash.
- self.assertEqual(kunit_parser.TestStatus.TEST_CRASHED, result.status)
- self.assertEqual(result.counts, kunit_parser.TestCounts(passed=1, crashed=1, errors=1))
+ self.assertEqual(kunit_parser.TestStatus.SUCCESS, result.status)
+ self.assertEqual(result.counts, kunit_parser.TestCounts(passed=1, errors=2))
def line_stream_from_strs(strs: Iterable[str]) -> kunit_parser.LineStream:
return kunit_parser.LineStream(enumerate(strs, start=1))
diff --git a/tools/testing/selftests/.gitignore b/tools/testing/selftests/.gitignore
index cb24124..674aaa02 100644
--- a/tools/testing/selftests/.gitignore
+++ b/tools/testing/selftests/.gitignore
@@ -4,7 +4,6 @@
gpioinclude/
gpiolsgpio
kselftest_install/
-tpm2/SpaceTest.log
# Python bytecode and cache
__pycache__/
diff --git a/tools/testing/selftests/bpf/config.x86_64 b/tools/testing/selftests/bpf/config.x86_64
index 5680bef..5e713ef 100644
--- a/tools/testing/selftests/bpf/config.x86_64
+++ b/tools/testing/selftests/bpf/config.x86_64
@@ -39,7 +39,6 @@
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPUSETS=y
-CONFIG_CRC_T10DIF=y
CONFIG_CRYPTO_BLAKE2B=y
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_XXHASH=y
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index 400a696..a17256d 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -88,22 +88,32 @@
# If isolated CPUs have been reserved at boot time (as shown in
# cpuset.cpus.isolated), these isolated CPUs should be outside of CPUs 0-8
# that will be used by this script for testing purpose. If not, some of
-# the tests may fail incorrectly. These pre-isolated CPUs should stay in
-# an isolated state throughout the testing process for now.
+# the tests may fail incorrectly. Wait a bit and retry again just in case
+# these isolated CPUs are leftover from previous run and have just been
+# cleaned up earlier in this script.
+#
+# These pre-isolated CPUs should stay in an isolated state throughout the
+# testing process for now.
#
BOOT_ISOLCPUS=$(cat $CGROUP2/cpuset.cpus.isolated)
+[[ -n "$BOOT_ISOLCPUS" ]] && {
+ sleep 0.5
+ BOOT_ISOLCPUS=$(cat $CGROUP2/cpuset.cpus.isolated)
+}
if [[ -n "$BOOT_ISOLCPUS" ]]
then
[[ $(echo $BOOT_ISOLCPUS | sed -e "s/[,-].*//") -le 8 ]] &&
skip_test "Pre-isolated CPUs ($BOOT_ISOLCPUS) overlap CPUs to be tested"
echo "Pre-isolated CPUs: $BOOT_ISOLCPUS"
fi
+
cleanup()
{
online_cpus
cd $CGROUP2
- rmdir A1/A2/A3 A1/A2 A1 B1 > /dev/null 2>&1
- rmdir test > /dev/null 2>&1
+ rmdir A1/A2/A3 A1/A2 A1 B1 test/A1 test/B1 test > /dev/null 2>&1
+ rmdir rtest/p1/c11 rtest/p1/c12 rtest/p2/c21 \
+ rtest/p2/c22 rtest/p1 rtest/p2 rtest > /dev/null 2>&1
[[ -n "$SCHED_DEBUG" ]] &&
echo "$SCHED_DEBUG" > /sys/kernel/debug/sched/verbose
}
@@ -173,14 +183,22 @@
#
# Cgroup test hierarchy
#
-# root -- A1 -- A2 -- A3
-# +- B1
+# root
+# |
+# +------+------+
+# | |
+# A1 B1
+# |
+# A2
+# |
+# A3
#
# P<v> = set cpus.partition (0:member, 1:root, 2:isolated)
# C<l> = add cpu-list to cpuset.cpus
# X<l> = add cpu-list to cpuset.cpus.exclusive
# S<p> = use prefix in subtree_control
# T = put a task into cgroup
+# CX<l> = add cpu-list to both cpuset.cpus and cpuset.cpus.exclusive
# O<c>=<v> = Write <v> to CPU online file of <c>
#
# ECPUs - effective CPUs of cpusets
@@ -207,130 +225,129 @@
" C0-1:P1 . . C2-3 S+:C4-5 . . . 0 A1:4-5"
" C0-1 . . C2-3:P1 . . . C2 0 "
" C0-1 . . C2-3:P1 . . . C4-5 0 B1:4-5"
- "C0-3:P1:S+ C2-3:P1 . . . . . . 0 A1:0-1,A2:2-3"
- "C0-3:P1:S+ C2-3:P1 . . C1-3 . . . 0 A1:1,A2:2-3"
- "C2-3:P1:S+ C3:P1 . . C3 . . . 0 A1:,A2:3 A1:P1,A2:P1"
- "C2-3:P1:S+ C3:P1 . . C3 P0 . . 0 A1:3,A2:3 A1:P1,A2:P0"
- "C2-3:P1:S+ C2:P1 . . C2-4 . . . 0 A1:3-4,A2:2"
- "C2-3:P1:S+ C3:P1 . . C3 . . C0-2 0 A1:,B1:0-2 A1:P1,A2:P1"
- "$SETUP_A123_PARTITIONS . C2-3 . . . 0 A1:,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
+ "C0-3:P1:S+ C2-3:P1 . . . . . . 0 A1:0-1|A2:2-3|XA2:2-3"
+ "C0-3:P1:S+ C2-3:P1 . . C1-3 . . . 0 A1:1|A2:2-3|XA2:2-3"
+ "C2-3:P1:S+ C3:P1 . . C3 . . . 0 A1:|A2:3|XA2:3 A1:P1|A2:P1"
+ "C2-3:P1:S+ C3:P1 . . C3 P0 . . 0 A1:3|A2:3 A1:P1|A2:P0"
+ "C2-3:P1:S+ C2:P1 . . C2-4 . . . 0 A1:3-4|A2:2"
+ "C2-3:P1:S+ C3:P1 . . C3 . . C0-2 0 A1:|B1:0-2 A1:P1|A2:P1"
+ "$SETUP_A123_PARTITIONS . C2-3 . . . 0 A1:|A2:2|A3:3 A1:P1|A2:P1|A3:P1"
# CPU offlining cases:
- " C0-1 . . C2-3 S+ C4-5 . O2=0 0 A1:0-1,B1:3"
- "C0-3:P1:S+ C2-3:P1 . . O2=0 . . . 0 A1:0-1,A2:3"
- "C0-3:P1:S+ C2-3:P1 . . O2=0 O2=1 . . 0 A1:0-1,A2:2-3"
- "C0-3:P1:S+ C2-3:P1 . . O1=0 . . . 0 A1:0,A2:2-3"
- "C0-3:P1:S+ C2-3:P1 . . O1=0 O1=1 . . 0 A1:0-1,A2:2-3"
- "C2-3:P1:S+ C3:P1 . . O3=0 O3=1 . . 0 A1:2,A2:3 A1:P1,A2:P1"
- "C2-3:P1:S+ C3:P2 . . O3=0 O3=1 . . 0 A1:2,A2:3 A1:P1,A2:P2"
- "C2-3:P1:S+ C3:P1 . . O2=0 O2=1 . . 0 A1:2,A2:3 A1:P1,A2:P1"
- "C2-3:P1:S+ C3:P2 . . O2=0 O2=1 . . 0 A1:2,A2:3 A1:P1,A2:P2"
- "C2-3:P1:S+ C3:P1 . . O2=0 . . . 0 A1:,A2:3 A1:P1,A2:P1"
- "C2-3:P1:S+ C3:P1 . . O3=0 . . . 0 A1:2,A2: A1:P1,A2:P1"
- "C2-3:P1:S+ C3:P1 . . T:O2=0 . . . 0 A1:3,A2:3 A1:P1,A2:P-1"
- "C2-3:P1:S+ C3:P1 . . . T:O3=0 . . 0 A1:2,A2:2 A1:P1,A2:P-1"
- "$SETUP_A123_PARTITIONS . O1=0 . . . 0 A1:,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
- "$SETUP_A123_PARTITIONS . O2=0 . . . 0 A1:1,A2:,A3:3 A1:P1,A2:P1,A3:P1"
- "$SETUP_A123_PARTITIONS . O3=0 . . . 0 A1:1,A2:2,A3: A1:P1,A2:P1,A3:P1"
- "$SETUP_A123_PARTITIONS . T:O1=0 . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1"
- "$SETUP_A123_PARTITIONS . . T:O2=0 . . 0 A1:1,A2:3,A3:3 A1:P1,A2:P1,A3:P-1"
- "$SETUP_A123_PARTITIONS . . . T:O3=0 . 0 A1:1,A2:2,A3:2 A1:P1,A2:P1,A3:P-1"
- "$SETUP_A123_PARTITIONS . T:O1=0 O1=1 . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
- "$SETUP_A123_PARTITIONS . . T:O2=0 O2=1 . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
- "$SETUP_A123_PARTITIONS . . . T:O3=0 O3=1 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
- "$SETUP_A123_PARTITIONS . T:O1=0 O2=0 O1=1 . 0 A1:1,A2:,A3:3 A1:P1,A2:P1,A3:P1"
- "$SETUP_A123_PARTITIONS . T:O1=0 O2=0 O2=1 . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1"
+ " C0-1 . . C2-3 S+ C4-5 . O2=0 0 A1:0-1|B1:3"
+ "C0-3:P1:S+ C2-3:P1 . . O2=0 . . . 0 A1:0-1|A2:3"
+ "C0-3:P1:S+ C2-3:P1 . . O2=0 O2=1 . . 0 A1:0-1|A2:2-3"
+ "C0-3:P1:S+ C2-3:P1 . . O1=0 . . . 0 A1:0|A2:2-3"
+ "C0-3:P1:S+ C2-3:P1 . . O1=0 O1=1 . . 0 A1:0-1|A2:2-3"
+ "C2-3:P1:S+ C3:P1 . . O3=0 O3=1 . . 0 A1:2|A2:3 A1:P1|A2:P1"
+ "C2-3:P1:S+ C3:P2 . . O3=0 O3=1 . . 0 A1:2|A2:3 A1:P1|A2:P2"
+ "C2-3:P1:S+ C3:P1 . . O2=0 O2=1 . . 0 A1:2|A2:3 A1:P1|A2:P1"
+ "C2-3:P1:S+ C3:P2 . . O2=0 O2=1 . . 0 A1:2|A2:3 A1:P1|A2:P2"
+ "C2-3:P1:S+ C3:P1 . . O2=0 . . . 0 A1:|A2:3 A1:P1|A2:P1"
+ "C2-3:P1:S+ C3:P1 . . O3=0 . . . 0 A1:2|A2: A1:P1|A2:P1"
+ "C2-3:P1:S+ C3:P1 . . T:O2=0 . . . 0 A1:3|A2:3 A1:P1|A2:P-1"
+ "C2-3:P1:S+ C3:P1 . . . T:O3=0 . . 0 A1:2|A2:2 A1:P1|A2:P-1"
+ "$SETUP_A123_PARTITIONS . O1=0 . . . 0 A1:|A2:2|A3:3 A1:P1|A2:P1|A3:P1"
+ "$SETUP_A123_PARTITIONS . O2=0 . . . 0 A1:1|A2:|A3:3 A1:P1|A2:P1|A3:P1"
+ "$SETUP_A123_PARTITIONS . O3=0 . . . 0 A1:1|A2:2|A3: A1:P1|A2:P1|A3:P1"
+ "$SETUP_A123_PARTITIONS . T:O1=0 . . . 0 A1:2-3|A2:2-3|A3:3 A1:P1|A2:P-1|A3:P-1"
+ "$SETUP_A123_PARTITIONS . . T:O2=0 . . 0 A1:1|A2:3|A3:3 A1:P1|A2:P1|A3:P-1"
+ "$SETUP_A123_PARTITIONS . . . T:O3=0 . 0 A1:1|A2:2|A3:2 A1:P1|A2:P1|A3:P-1"
+ "$SETUP_A123_PARTITIONS . T:O1=0 O1=1 . . 0 A1:1|A2:2|A3:3 A1:P1|A2:P1|A3:P1"
+ "$SETUP_A123_PARTITIONS . . T:O2=0 O2=1 . 0 A1:1|A2:2|A3:3 A1:P1|A2:P1|A3:P1"
+ "$SETUP_A123_PARTITIONS . . . T:O3=0 O3=1 0 A1:1|A2:2|A3:3 A1:P1|A2:P1|A3:P1"
+ "$SETUP_A123_PARTITIONS . T:O1=0 O2=0 O1=1 . 0 A1:1|A2:|A3:3 A1:P1|A2:P1|A3:P1"
+ "$SETUP_A123_PARTITIONS . T:O1=0 O2=0 O2=1 . 0 A1:2-3|A2:2-3|A3:3 A1:P1|A2:P-1|A3:P-1"
# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
# ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
#
# Remote partition and cpuset.cpus.exclusive tests
#
- " C0-3:S+ C1-3:S+ C2-3 . X2-3 . . . 0 A1:0-3,A2:1-3,A3:2-3,XA1:2-3"
- " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3:P2 . . 0 A1:0-1,A2:2-3,A3:2-3 A1:P0,A2:P2 2-3"
- " C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2,A2:3,A3:3 A1:P0,A2:P2 3"
- " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1,A2:1,A3:2-3 A1:P0,A3:P2 2-3"
- " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1,A2:1,A3:2-3 A1:P0,A3:P2 2-3"
- " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3,A2:1-3,A3:2-3,B1:2-3 A1:P0,A3:P0,B1:P-2"
+ " C0-3:S+ C1-3:S+ C2-3 . X2-3 . . . 0 A1:0-3|A2:1-3|A3:2-3|XA1:2-3"
+ " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3:P2 . . 0 A1:0-1|A2:2-3|A3:2-3 A1:P0|A2:P2 2-3"
+ " C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
+ " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
+ " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
+ " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
" C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2 4-5"
- " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3,B1:4 A3:P2,B1:P2 2-4"
- " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3,B1:4 A3:P2,B1:P2 2-4"
- " C0-3:S+ C1-3:S+ C2-3 C4 X1-3 X1-3:P2 P2 . 0 A2:1,A3:2-3 A2:P2,A3:P2 1-3"
- " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2:C4-5 0 A3:2-3,B1:4-5 A3:P2,B1:P2 2-5"
- " C4:X0-3:S+ X1-3:S+ X2-3 . . P2 . . 0 A1:4,A2:1-3,A3:1-3 A2:P2 1-3"
- " C4:X0-3:S+ X1-3:S+ X2-3 . . . P2 . 0 A1:4,A2:4,A3:2-3 A3:P2 2-3"
+ " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
+ " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
+ " C0-3:S+ C1-3:S+ C2-3 C4 X1-3 X1-3:P2 P2 . 0 A2:1|A3:2-3 A2:P2|A3:P2 1-3"
+ " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2:C4-5 0 A3:2-3|B1:4-5 A3:P2|B1:P2 2-5"
+ " C4:X0-3:S+ X1-3:S+ X2-3 . . P2 . . 0 A1:4|A2:1-3|A3:1-3 A2:P2 1-3"
+ " C4:X0-3:S+ X1-3:S+ X2-3 . . . P2 . 0 A1:4|A2:4|A3:2-3 A3:P2 2-3"
# Nested remote/local partition tests
- " C0-3:S+ C1-3:S+ C2-3 C4-5 X2-3 X2-3:P1 P2 P1 0 A1:0-1,A2:,A3:2-3,B1:4-5 \
- A1:P0,A2:P1,A3:P2,B1:P1 2-3"
- " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3:P1 P2 P1 0 A1:0-1,A2:,A3:2-3,B1:4 \
- A1:P0,A2:P1,A3:P2,B1:P1 2-4,2-3"
- " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3:P1 . P1 0 A1:0-1,A2:2-3,A3:2-3,B1:4 \
- A1:P0,A2:P1,A3:P0,B1:P1"
- " C0-3:S+ C1-3:S+ C3 C4 X2-3 X2-3:P1 P2 P1 0 A1:0-1,A2:2,A3:3,B1:4 \
- A1:P0,A2:P1,A3:P2,B1:P1 2-4,3"
- " C0-4:S+ C1-4:S+ C2-4 . X2-4 X2-4:P2 X4:P1 . 0 A1:0-1,A2:2-3,A3:4 \
- A1:P0,A2:P2,A3:P1 2-4,2-3"
- " C0-4:S+ C1-4:S+ C2-4 . X2-4 X2-4:P2 X3-4:P1 . 0 A1:0-1,A2:2,A3:3-4 \
- A1:P0,A2:P2,A3:P1 2"
+ " C0-3:S+ C1-3:S+ C2-3 C4-5 X2-3 X2-3:P1 P2 P1 0 A1:0-1|A2:|A3:2-3|B1:4-5 \
+ A1:P0|A2:P1|A3:P2|B1:P1 2-3"
+ " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3:P1 P2 P1 0 A1:0-1|A2:|A3:2-3|B1:4 \
+ A1:P0|A2:P1|A3:P2|B1:P1 2-4|2-3"
+ " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3:P1 . P1 0 A1:0-1|A2:2-3|A3:2-3|B1:4 \
+ A1:P0|A2:P1|A3:P0|B1:P1"
+ " C0-3:S+ C1-3:S+ C3 C4 X2-3 X2-3:P1 P2 P1 0 A1:0-1|A2:2|A3:3|B1:4 \
+ A1:P0|A2:P1|A3:P2|B1:P1 2-4|3"
+ " C0-4:S+ C1-4:S+ C2-4 . X2-4 X2-4:P2 X4:P1 . 0 A1:0-1|A2:2-3|A3:4 \
+ A1:P0|A2:P2|A3:P1 2-4|2-3"
+ " C0-4:S+ C1-4:S+ C2-4 . X2-4 X2-4:P2 X3-4:P1 . 0 A1:0-1|A2:2|A3:3-4 \
+ A1:P0|A2:P2|A3:P1 2"
" C0-4:X2-4:S+ C1-4:X2-4:S+:P2 C2-4:X4:P1 \
- . . X5 . . 0 A1:0-4,A2:1-4,A3:2-4 \
- A1:P0,A2:P-2,A3:P-1"
+ . . X5 . . 0 A1:0-4|A2:1-4|A3:2-4 \
+ A1:P0|A2:P-2|A3:P-1 ."
" C0-4:X2-4:S+ C1-4:X2-4:S+:P2 C2-4:X4:P1 \
- . . . X1 . 0 A1:0-1,A2:2-4,A3:2-4 \
- A1:P0,A2:P2,A3:P-1 2-4"
+ . . . X1 . 0 A1:0-1|A2:2-4|A3:2-4 \
+ A1:P0|A2:P2|A3:P-1 2-4"
# Remote partition offline tests
- " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:O2=0 . 0 A1:0-1,A2:1,A3:3 A1:P0,A3:P2 2-3"
- " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:O2=0 O2=1 0 A1:0-1,A2:1,A3:2-3 A1:P0,A3:P2 2-3"
- " C0-3:S+ C1-3:S+ C3 . X2-3 X2-3 P2:O3=0 . 0 A1:0-2,A2:1-2,A3: A1:P0,A3:P2 3"
- " C0-3:S+ C1-3:S+ C3 . X2-3 X2-3 T:P2:O3=0 . 0 A1:0-2,A2:1-2,A3:1-2 A1:P0,A3:P-2 3,"
+ " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:O2=0 . 0 A1:0-1|A2:1|A3:3 A1:P0|A3:P2 2-3"
+ " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:O2=0 O2=1 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
+ " C0-3:S+ C1-3:S+ C3 . X2-3 X2-3 P2:O3=0 . 0 A1:0-2|A2:1-2|A3: A1:P0|A3:P2 3"
+ " C0-3:S+ C1-3:S+ C3 . X2-3 X2-3 T:P2:O3=0 . 0 A1:0-2|A2:1-2|A3:1-2 A1:P0|A3:P-2 3|"
# An invalidated remote partition cannot self-recover from hotplug
- " C0-3:S+ C1-3:S+ C2 . X2-3 X2-3 T:P2:O2=0 O2=1 0 A1:0-3,A2:1-3,A3:2 A1:P0,A3:P-2"
+ " C0-3:S+ C1-3:S+ C2 . X2-3 X2-3 T:P2:O2=0 O2=1 0 A1:0-3|A2:1-3|A3:2 A1:P0|A3:P-2 ."
# cpus.exclusive.effective clearing test
- " C0-3:S+ C1-3:S+ C2 . X2-3:X . . . 0 A1:0-3,A2:1-3,A3:2,XA1:"
+ " C0-3:S+ C1-3:S+ C2 . X2-3:X . . . 0 A1:0-3|A2:1-3|A3:2|XA1:"
# Invalid to valid remote partition transition test
- " C0-3:S+ C1-3 . . . X3:P2 . . 0 A1:0-3,A2:1-3,XA2: A2:P-2"
+ " C0-3:S+ C1-3 . . . X3:P2 . . 0 A1:0-3|A2:1-3|XA2: A2:P-2 ."
" C0-3:S+ C1-3:X3:P2
- . . X2-3 P2 . . 0 A1:0-2,A2:3,XA2:3 A2:P2 3"
+ . . X2-3 P2 . . 0 A1:0-2|A2:3|XA2:3 A2:P2 3"
# Invalid to valid local partition direct transition tests
- " C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3,XA1:1-3,A2:1-3:XA2: A1:P2,A2:P-2 1-3"
- " C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2,XA1:1-3,A2:3:XA2:3 A1:P2,A2:P2 1-3"
- " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4,B1:4-6 A1:P-2,B1:P0"
- " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3,B1:4-6 A1:P2,B1:P0 0-3"
- " C0-3:P2 . . C3-5:C4-5 . . . . 0 A1:0-3,B1:4-5 A1:P2,B1:P0 0-3"
+ " C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
+ " C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
+ " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
+ " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
# Local partition invalidation tests
" C0-3:X1-3:S+:P2 C1-3:X2-3:S+:P2 C2-3:X3:P2 \
- . . . . . 0 A1:1,A2:2,A3:3 A1:P2,A2:P2,A3:P2 1-3"
+ . . . . . 0 A1:1|A2:2|A3:3 A1:P2|A2:P2|A3:P2 1-3"
" C0-3:X1-3:S+:P2 C1-3:X2-3:S+:P2 C2-3:X3:P2 \
- . . X4 . . 0 A1:1-3,A2:1-3,A3:2-3,XA2:,XA3: A1:P2,A2:P-2,A3:P-2 1-3"
+ . . X4 . . 0 A1:1-3|A2:1-3|A3:2-3|XA2:|XA3: A1:P2|A2:P-2|A3:P-2 1-3"
" C0-3:X1-3:S+:P2 C1-3:X2-3:S+:P2 C2-3:X3:P2 \
- . . C4:X . . 0 A1:1-3,A2:1-3,A3:2-3,XA2:,XA3: A1:P2,A2:P-2,A3:P-2 1-3"
+ . . C4:X . . 0 A1:1-3|A2:1-3|A3:2-3|XA2:|XA3: A1:P2|A2:P-2|A3:P-2 1-3"
# Local partition CPU change tests
- " C0-5:S+:P2 C4-5:S+:P1 . . . C3-5 . . 0 A1:0-2,A2:3-5 A1:P2,A2:P1 0-2"
- " C0-5:S+:P2 C4-5:S+:P1 . . C1-5 . . . 0 A1:1-3,A2:4-5 A1:P2,A2:P1 1-3"
+ " C0-5:S+:P2 C4-5:S+:P1 . . . C3-5 . . 0 A1:0-2|A2:3-5 A1:P2|A2:P1 0-2"
+ " C0-5:S+:P2 C4-5:S+:P1 . . C1-5 . . . 0 A1:1-3|A2:4-5 A1:P2|A2:P1 1-3"
# cpus_allowed/exclusive_cpus update tests
" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3 \
- . X:C4 . P2 . 0 A1:4,A2:4,XA2:,XA3:,A3:4 \
- A1:P0,A3:P-2"
+ . X:C4 . P2 . 0 A1:4|A2:4|XA2:|XA3:|A3:4 \
+ A1:P0|A3:P-2 ."
" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3 \
- . X1 . P2 . 0 A1:0-3,A2:1-3,XA1:1,XA2:,XA3:,A3:2-3 \
- A1:P0,A3:P-2"
+ . X1 . P2 . 0 A1:0-3|A2:1-3|XA1:1|XA2:|XA3:|A3:2-3 \
+ A1:P0|A3:P-2 ."
" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3 \
- . . X3 P2 . 0 A1:0-2,A2:1-2,XA2:3,XA3:3,A3:3 \
- A1:P0,A3:P2 3"
+ . . X3 P2 . 0 A1:0-2|A2:1-2|XA2:3|XA3:3|A3:3 \
+ A1:P0|A3:P2 3"
" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3:P2 \
- . . X3 . . 0 A1:0-3,A2:1-3,XA2:3,XA3:3,A3:2-3 \
- A1:P0,A3:P-2"
+ . . X3 . . 0 A1:0-2|A2:1-2|XA2:3|XA3:3|A3:3|XA3:3 \
+ A1:P0|A3:P2 3"
" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3:P2 \
- . X4 . . . 0 A1:0-3,A2:1-3,A3:2-3,XA1:4,XA2:,XA3 \
- A1:P0,A3:P-2"
+ . X4 . . . 0 A1:0-3|A2:1-3|A3:2-3|XA1:4|XA2:|XA3: \
+ A1:P0|A3:P-2"
# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
# ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
@@ -339,68 +356,127 @@
#
# Adding CPUs to partition root that are not in parent's
# cpuset.cpus is allowed, but those extra CPUs are ignored.
- "C2-3:P1:S+ C3:P1 . . . C2-4 . . 0 A1:,A2:2-3 A1:P1,A2:P1"
+ "C2-3:P1:S+ C3:P1 . . . C2-4 . . 0 A1:|A2:2-3 A1:P1|A2:P1"
# Taking away all CPUs from parent or itself if there are tasks
# will make the partition invalid.
- "C2-3:P1:S+ C3:P1 . . T C2-3 . . 0 A1:2-3,A2:2-3 A1:P1,A2:P-1"
- " C3:P1:S+ C3 . . T P1 . . 0 A1:3,A2:3 A1:P1,A2:P-1"
- "$SETUP_A123_PARTITIONS . T:C2-3 . . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P-1,A3:P-1"
- "$SETUP_A123_PARTITIONS . T:C2-3:C1-3 . . . 0 A1:1,A2:2,A3:3 A1:P1,A2:P1,A3:P1"
+ "C2-3:P1:S+ C3:P1 . . T C2-3 . . 0 A1:2-3|A2:2-3 A1:P1|A2:P-1"
+ " C3:P1:S+ C3 . . T P1 . . 0 A1:3|A2:3 A1:P1|A2:P-1"
+ "$SETUP_A123_PARTITIONS . T:C2-3 . . . 0 A1:2-3|A2:2-3|A3:3 A1:P1|A2:P-1|A3:P-1"
+ "$SETUP_A123_PARTITIONS . T:C2-3:C1-3 . . . 0 A1:1|A2:2|A3:3 A1:P1|A2:P1|A3:P1"
# Changing a partition root to member makes child partitions invalid
- "C2-3:P1:S+ C3:P1 . . P0 . . . 0 A1:2-3,A2:3 A1:P0,A2:P-1"
- "$SETUP_A123_PARTITIONS . C2-3 P0 . . 0 A1:2-3,A2:2-3,A3:3 A1:P1,A2:P0,A3:P-1"
+ "C2-3:P1:S+ C3:P1 . . P0 . . . 0 A1:2-3|A2:3 A1:P0|A2:P-1"
+ "$SETUP_A123_PARTITIONS . C2-3 P0 . . 0 A1:2-3|A2:2-3|A3:3 A1:P1|A2:P0|A3:P-1"
# cpuset.cpus can contains cpus not in parent's cpuset.cpus as long
# as they overlap.
- "C2-3:P1:S+ . . . . C3-4:P1 . . 0 A1:2,A2:3 A1:P1,A2:P1"
+ "C2-3:P1:S+ . . . . C3-4:P1 . . 0 A1:2|A2:3 A1:P1|A2:P1"
# Deletion of CPUs distributed to child cgroup is allowed.
- "C0-1:P1:S+ C1 . C2-3 C4-5 . . . 0 A1:4-5,A2:4-5"
+ "C0-1:P1:S+ C1 . C2-3 C4-5 . . . 0 A1:4-5|A2:4-5"
# To become a valid partition root, cpuset.cpus must overlap parent's
# cpuset.cpus.
- " C0-1:P1 . . C2-3 S+ C4-5:P1 . . 0 A1:0-1,A2:0-1 A1:P1,A2:P-1"
+ " C0-1:P1 . . C2-3 S+ C4-5:P1 . . 0 A1:0-1|A2:0-1 A1:P1|A2:P-1"
# Enabling partition with child cpusets is allowed
- " C0-1:S+ C1 . C2-3 P1 . . . 0 A1:0-1,A2:1 A1:P1"
+ " C0-1:S+ C1 . C2-3 P1 . . . 0 A1:0-1|A2:1 A1:P1"
# A partition root with non-partition root parent is invalid, but it
# can be made valid if its parent becomes a partition root too.
- " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1,A2:1 A1:P0,A2:P-2"
- " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0,A2:1 A1:P1,A2:P2"
+ " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2"
+ " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
- " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2,B1:2-3 A1:P-1,B1:P0"
- " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2,B1:2-3 A1:P-1,B1:P-1"
- " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2,B1:2-3 A1:P0,B1:P-1"
+ " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
+ " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
+ " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
- " C0-3 . . C4-5 X5 . . . 0 A1:0-3,B1:4-5"
+ " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"
# Child partition root that try to take all CPUs from parent partition
# with tasks will remain invalid.
- " C1-4:P1:S+ P1 . . . . . . 0 A1:1-4,A2:1-4 A1:P1,A2:P-1"
- " C1-4:P1:S+ P1 . . . C1-4 . . 0 A1,A2:1-4 A1:P1,A2:P1"
- " C1-4:P1:S+ P1 . . T C1-4 . . 0 A1:1-4,A2:1-4 A1:P1,A2:P-1"
+ " C1-4:P1:S+ P1 . . . . . . 0 A1:1-4|A2:1-4 A1:P1|A2:P-1"
+ " C1-4:P1:S+ P1 . . . C1-4 . . 0 A1:|A2:1-4 A1:P1|A2:P1"
+ " C1-4:P1:S+ P1 . . T C1-4 . . 0 A1:1-4|A2:1-4 A1:P1|A2:P-1"
# Clearing of cpuset.cpus with a preset cpuset.cpus.exclusive shouldn't
# affect cpuset.cpus.exclusive.effective.
- " C1-4:X3:S+ C1:X3 . . . C . . 0 A2:1-4,XA2:3"
+ " C1-4:X3:S+ C1:X3 . . . C . . 0 A2:1-4|XA2:3"
+
+ # cpuset.cpus can contain CPUs that overlap a sibling cpuset with cpus.exclusive
+ # but creating a local partition out of it is not allowed. Similarly, any change
+ # in cpuset.cpus of a local partition that overlaps sibling exclusive CPUs will
+ # invalidate it.
+ " CX1-4:S+ CX2-4:P2 . C5-6 . . . P1 0 A1:1|A2:2-4|B1:5-6|XB1:5-6 \
+ A1:P0|A2:P2|B1:P1 2-4"
+ " CX1-4:S+ CX2-4:P2 . C3-6 . . . P1 0 A1:1|A2:2-4|B1:5-6 \
+ A1:P0|A2:P2|B1:P-1 2-4"
+ " CX1-4:S+ CX2-4:P2 . C5-6 . . . P1:C3-6 0 A1:1|A2:2-4|B1:5-6 \
+ A1:P0|A2:P2|B1:P-1 2-4"
# old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
# ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
# Failure cases:
# A task cannot be added to a partition with no cpu
- "C2-3:P1:S+ C3:P1 . . O2=0:T . . . 1 A1:,A2:3 A1:P1,A2:P1"
+ "C2-3:P1:S+ C3:P1 . . O2=0:T . . . 1 A1:|A2:3 A1:P1|A2:P1"
# Changes to cpuset.cpus.exclusive that violate exclusivity rule is rejected
- " C0-3 . . C4-5 X0-3 . . X3-5 1 A1:0-3,B1:4-5"
+ " C0-3 . . C4-5 X0-3 . . X3-5 1 A1:0-3|B1:4-5"
# cpuset.cpus cannot be a subset of sibling cpuset.cpus.exclusive
- " C0-3 . . C4-5 X3-5 . . . 1 A1:0-3,B1:4-5"
+ " C0-3 . . C4-5 X3-5 . . . 1 A1:0-3|B1:4-5"
+)
+
+#
+# Cpuset controller remote partition test matrix.
+#
+# Cgroup test hierarchy
+#
+# root
+# |
+# rtest (cpuset.cpus.exclusive=1-7)
+# |
+# +------+------+
+# | |
+# p1 p2
+# +--+--+ +--+--+
+# | | | |
+# c11 c12 c21 c22
+#
+# REMOTE_TEST_MATRIX uses the same notational convention as TEST_MATRIX.
+# Only CPUs 1-7 should be used.
+#
+REMOTE_TEST_MATRIX=(
+ # old-p1 old-p2 old-c11 old-c12 old-c21 old-c22
+ # new-p1 new-p2 new-c11 new-c12 new-c21 new-c22 ECPUs Pstate ISOLCPUS
+ # ------ ------ ------- ------- ------- ------- ----- ------ --------
+ " X1-3:S+ X4-6:S+ X1-2 X3 X4-5 X6 \
+ . . P2 P2 P2 P2 c11:1-2|c12:3|c21:4-5|c22:6 \
+ c11:P2|c12:P2|c21:P2|c22:P2 1-6"
+ " CX1-4:S+ . X1-2:P2 C3 . . \
+ . . . C3-4 . . p1:3-4|c11:1-2|c12:3-4 \
+ p1:P0|c11:P2|c12:P0 1-2"
+ " CX1-4:S+ . X1-2:P2 . . . \
+ X2-4 . . . . . p1:1,3-4|c11:2 \
+ p1:P0|c11:P2 2"
+ " CX1-5:S+ . X1-2:P2 X3-5:P1 . . \
+ X2-4 . . . . . p1:1,5|c11:2|c12:3-4 \
+ p1:P0|c11:P2|c12:P1 2"
+ " CX1-4:S+ . X1-2:P2 X3-4:P1 . . \
+ . . X2 . . . p1:1|c11:2|c12:3-4 \
+ p1:P0|c11:P2|c12:P1 2"
+ # p1 as member, will get its effective CPUs from its parent rtest
+ " CX1-4:S+ . X1-2:P2 X3-4:P1 . . \
+ . . X1 CX2-4 . . p1:5-7|c11:1|c12:2-4 \
+ p1:P0|c11:P2|c12:P1 1"
+ " CX1-4:S+ X5-6:P1:S+ . . . . \
+ . . X1-2:P2 X4-5:P1 . X1-7:P2 p1:3|c11:1-2|c12:4|c22:5-6 \
+ p1:P0|p2:P1|c11:P2|c12:P1|c22:P2 \
+ 1-2,4-6|1-2,5-6"
)
#
@@ -453,25 +529,26 @@
PFILE=$CGRP/cpuset.cpus.partition
CFILE=$CGRP/cpuset.cpus
XFILE=$CGRP/cpuset.cpus.exclusive
- S=$(expr substr $CMD 1 1)
- if [[ $S = S ]]
- then
- PREFIX=${CMD#?}
+ case $CMD in
+ S*) PREFIX=${CMD#?}
COMM="echo ${PREFIX}${CTRL} > $SFILE"
eval $COMM $REDIRECT
- elif [[ $S = X ]]
- then
+ ;;
+ X*)
CPUS=${CMD#?}
COMM="echo $CPUS > $XFILE"
eval $COMM $REDIRECT
- elif [[ $S = C ]]
- then
- CPUS=${CMD#?}
+ ;;
+ CX*)
+ CPUS=${CMD#??}
+ COMM="echo $CPUS > $CFILE; echo $CPUS > $XFILE"
+ eval $COMM $REDIRECT
+ ;;
+ C*) CPUS=${CMD#?}
COMM="echo $CPUS > $CFILE"
eval $COMM $REDIRECT
- elif [[ $S = P ]]
- then
- VAL=${CMD#?}
+ ;;
+ P*) VAL=${CMD#?}
case $VAL in
0) VAL=member
;;
@@ -486,15 +563,17 @@
esac
COMM="echo $VAL > $PFILE"
eval $COMM $REDIRECT
- elif [[ $S = O ]]
- then
- VAL=${CMD#?}
+ ;;
+ O*) VAL=${CMD#?}
write_cpu_online $VAL
- elif [[ $S = T ]]
- then
- COMM="echo 0 > $TFILE"
+ ;;
+ T*) COMM="echo 0 > $TFILE"
eval $COMM $REDIRECT
- fi
+ ;;
+ *) echo "Unknown command: $CMD"
+ exit 1
+ ;;
+ esac
RET=$?
[[ $RET -ne 0 ]] && {
[[ -n "$SHOWERR" ]] && {
@@ -532,21 +611,18 @@
}
#
-# Return 1 if the list of effective cpus isn't the same as the initial list.
+# Remove all the test cgroup directories
#
reset_cgroup_states()
{
echo 0 > $CGROUP2/cgroup.procs
online_cpus
- rmdir A1/A2/A3 A1/A2 A1 B1 > /dev/null 2>&1
- pause 0.02
- set_ctrl_state . R-
- pause 0.01
+ rmdir $RESET_LIST > /dev/null 2>&1
}
dump_states()
{
- for DIR in . A1 A1/A2 A1/A2/A3 B1
+ for DIR in $CGROUP_LIST
do
CPUS=$DIR/cpuset.cpus
ECPUS=$DIR/cpuset.cpus.effective
@@ -566,17 +642,33 @@
}
#
+# Set the actual cgroup directory into $CGRP_DIR
+# $1 - cgroup name
+#
+set_cgroup_dir()
+{
+ CGRP_DIR=$1
+ [[ $CGRP_DIR = A2 ]] && CGRP_DIR=A1/A2
+ [[ $CGRP_DIR = A3 ]] && CGRP_DIR=A1/A2/A3
+ [[ $CGRP_DIR = c11 ]] && CGRP_DIR=p1/c11
+ [[ $CGRP_DIR = c12 ]] && CGRP_DIR=p1/c12
+ [[ $CGRP_DIR = c21 ]] && CGRP_DIR=p2/c21
+ [[ $CGRP_DIR = c22 ]] && CGRP_DIR=p2/c22
+}
+
+#
# Check effective cpus
-# $1 - check string, format: <cgroup>:<cpu-list>[,<cgroup>:<cpu-list>]*
+# $1 - check string, format: <cgroup>:<cpu-list>[|<cgroup>:<cpu-list>]*
#
check_effective_cpus()
{
CHK_STR=$1
- for CHK in $(echo $CHK_STR | sed -e "s/,/ /g")
+ for CHK in $(echo $CHK_STR | sed -e "s/|/ /g")
do
set -- $(echo $CHK | sed -e "s/:/ /g")
CGRP=$1
- CPUS=$2
+ EXPECTED_CPUS=$2
+ ACTUAL_CPUS=
if [[ $CGRP = X* ]]
then
CGRP=${CGRP#X}
@@ -584,41 +676,39 @@
else
FILE=cpuset.cpus.effective
fi
- [[ $CGRP = A2 ]] && CGRP=A1/A2
- [[ $CGRP = A3 ]] && CGRP=A1/A2/A3
- [[ -e $CGRP/$FILE ]] || return 1
- [[ $CPUS = $(cat $CGRP/$FILE) ]] || return 1
+ set_cgroup_dir $CGRP
+ [[ -e $CGRP_DIR/$FILE ]] || return 1
+ ACTUAL_CPUS=$(cat $CGRP_DIR/$FILE)
+ [[ $EXPECTED_CPUS = $ACTUAL_CPUS ]] || return 1
done
}
#
# Check cgroup states
-# $1 - check string, format: <cgroup>:<state>[,<cgroup>:<state>]*
+# $1 - check string, format: <cgroup>:<state>[|<cgroup>:<state>]*
#
check_cgroup_states()
{
CHK_STR=$1
- for CHK in $(echo $CHK_STR | sed -e "s/,/ /g")
+ for CHK in $(echo $CHK_STR | sed -e "s/|/ /g")
do
set -- $(echo $CHK | sed -e "s/:/ /g")
CGRP=$1
- CGRP_DIR=$CGRP
- STATE=$2
+ EXPECTED_STATE=$2
FILE=
- EVAL=$(expr substr $STATE 2 2)
- [[ $CGRP = A2 ]] && CGRP_DIR=A1/A2
- [[ $CGRP = A3 ]] && CGRP_DIR=A1/A2/A3
+ EVAL=$(expr substr $EXPECTED_STATE 2 2)
- case $STATE in
+ set_cgroup_dir $CGRP
+ case $EXPECTED_STATE in
P*) FILE=$CGRP_DIR/cpuset.cpus.partition
;;
- *) echo "Unknown state: $STATE!"
+ *) echo "Unknown state: $EXPECTED_STATE!"
exit 1
;;
esac
- VAL=$(cat $FILE)
+ ACTUAL_STATE=$(cat $FILE)
- case "$VAL" in
+ case "$ACTUAL_STATE" in
member) VAL=0
;;
root) VAL=1
@@ -642,7 +732,7 @@
[[ $VAL -eq 1 && $VERBOSE -gt 0 ]] && {
DOMS=$(cat $CGRP_DIR/cpuset.cpus.effective)
[[ -n "$DOMS" ]] &&
- echo " [$CGRP] sched-domain: $DOMS" > $CONSOLE
+ echo " [$CGRP_DIR] sched-domain: $DOMS" > $CONSOLE
}
done
return 0
@@ -665,22 +755,22 @@
#
check_isolcpus()
{
- EXPECT_VAL=$1
- ISOLCPUS=
+ EXPECTED_ISOLCPUS=$1
+ ISCPUS=${CGROUP2}/cpuset.cpus.isolated
+ ISOLCPUS=$(cat $ISCPUS)
LASTISOLCPU=
SCHED_DOMAINS=/sys/kernel/debug/sched/domains
- ISCPUS=${CGROUP2}/cpuset.cpus.isolated
- if [[ $EXPECT_VAL = . ]]
+ if [[ $EXPECTED_ISOLCPUS = . ]]
then
- EXPECT_VAL=
- EXPECT_VAL2=
- elif [[ $(expr $EXPECT_VAL : ".*,.*") > 0 ]]
+ EXPECTED_ISOLCPUS=
+ EXPECTED_SDOMAIN=
+ elif [[ $(expr $EXPECTED_ISOLCPUS : ".*|.*") > 0 ]]
then
- set -- $(echo $EXPECT_VAL | sed -e "s/,/ /g")
- EXPECT_VAL=$1
- EXPECT_VAL2=$2
+ set -- $(echo $EXPECTED_ISOLCPUS | sed -e "s/|/ /g")
+ EXPECTED_ISOLCPUS=$2
+ EXPECTED_SDOMAIN=$1
else
- EXPECT_VAL2=$EXPECT_VAL
+ EXPECTED_SDOMAIN=$EXPECTED_ISOLCPUS
fi
#
@@ -689,20 +779,21 @@
# to make appending those CPUs easier.
#
[[ -n "$BOOT_ISOLCPUS" ]] && {
- EXPECT_VAL=${EXPECT_VAL:+${EXPECT_VAL},}${BOOT_ISOLCPUS}
- EXPECT_VAL2=${EXPECT_VAL2:+${EXPECT_VAL2},}${BOOT_ISOLCPUS}
+ EXPECTED_ISOLCPUS=${EXPECTED_ISOLCPUS:+${EXPECTED_ISOLCPUS},}${BOOT_ISOLCPUS}
+ EXPECTED_SDOMAIN=${EXPECTED_SDOMAIN:+${EXPECTED_SDOMAIN},}${BOOT_ISOLCPUS}
}
#
# Check cpuset.cpus.isolated cpumask
#
- [[ "$EXPECT_VAL2" != "$ISOLCPUS" ]] && {
+ [[ "$EXPECTED_ISOLCPUS" != "$ISOLCPUS" ]] && {
# Take a 50ms pause and try again
pause 0.05
ISOLCPUS=$(cat $ISCPUS)
}
- [[ "$EXPECT_VAL2" != "$ISOLCPUS" ]] && return 1
+ [[ "$EXPECTED_ISOLCPUS" != "$ISOLCPUS" ]] && return 1
ISOLCPUS=
+ EXPECTED_ISOLCPUS=$EXPECTED_SDOMAIN
#
# Use the sched domain in debugfs to check isolated CPUs, if available
@@ -736,7 +827,7 @@
done
[[ "$ISOLCPUS" = *- ]] && ISOLCPUS=${ISOLCPUS}$LASTISOLCPU
- [[ "$EXPECT_VAL" = "$ISOLCPUS" ]]
+ [[ "$EXPECTED_SDOMAIN" = "$ISOLCPUS" ]]
}
test_fail()
@@ -774,6 +865,63 @@
}
#
+# Check state transition test result
+# $1 - Test number
+# $2 - Expected effective CPU values
+# $3 - Expected partition states
+# $4 - Expected isolated CPUs
+#
+check_test_results()
+{
+ _NR=$1
+ _ECPUS="$2"
+ _PSTATES="$3"
+ _ISOLCPUS="$4"
+
+ [[ -n "$_ECPUS" && "$_ECPUS" != . ]] && {
+ check_effective_cpus $_ECPUS
+ [[ $? -ne 0 ]] && test_fail $_NR "effective CPU" \
+ "Cgroup $CGRP: expected $EXPECTED_CPUS, got $ACTUAL_CPUS"
+ }
+
+ [[ -n "$_PSTATES" && "$_PSTATES" != . ]] && {
+ check_cgroup_states $_PSTATES
+ [[ $? -ne 0 ]] && test_fail $_NR states \
+ "Cgroup $CGRP: expected $EXPECTED_STATE, got $ACTUAL_STATE"
+ }
+
+ # Compare the expected isolated CPUs with the actual ones,
+ # if available
+ [[ -n "$_ISOLCPUS" ]] && {
+ check_isolcpus $_ISOLCPUS
+ [[ $? -ne 0 ]] && {
+ [[ -n "$BOOT_ISOLCPUS" ]] && _ISOLCPUS=${_ISOLCPUS},${BOOT_ISOLCPUS}
+ test_fail $_NR "isolated CPU" \
+ "Expect $_ISOLCPUS, get $ISOLCPUS instead"
+ }
+ }
+ reset_cgroup_states
+ #
+ # Check to see if effective cpu list changes
+ #
+ _NEWLIST=$(cat $CGROUP2/cpuset.cpus.effective)
+ RETRY=0
+ while [[ $_NEWLIST != $CPULIST && $RETRY -lt 8 ]]
+ do
+ # Wait a bit longer & recheck a few times
+ pause 0.02
+ ((RETRY++))
+ _NEWLIST=$(cat $CGROUP2/cpuset.cpus.effective)
+ done
+ [[ $_NEWLIST != $CPULIST ]] && {
+ echo "Effective cpus changed to $_NEWLIST after test $_NR!"
+ exit 1
+ }
+ null_isolcpus_check
+ [[ $VERBOSE -gt 0 ]] && echo "Test $_NR done."
+}
+
+#
# Run cpuset state transition test
# $1 - test matrix name
#
@@ -785,6 +933,8 @@
{
TEST=$1
CONTROLLER=cpuset
+ CGROUP_LIST=". A1 A1/A2 A1/A2/A3 B1"
+ RESET_LIST="A1/A2/A3 A1/A2 A1 B1"
I=0
eval CNT="\${#$TEST[@]}"
@@ -812,10 +962,11 @@
STATES=${11}
ICPUS=${12}
- set_ctrl_state_noerr B1 $OLD_B1
set_ctrl_state_noerr A1 $OLD_A1
set_ctrl_state_noerr A1/A2 $OLD_A2
set_ctrl_state_noerr A1/A2/A3 $OLD_A3
+ set_ctrl_state_noerr B1 $OLD_B1
+
RETVAL=0
set_ctrl_state A1 $NEW_A1; ((RETVAL += $?))
set_ctrl_state A1/A2 $NEW_A2; ((RETVAL += $?))
@@ -824,51 +975,83 @@
[[ $RETVAL -ne $RESULT ]] && test_fail $I result
- [[ -n "$ECPUS" && "$ECPUS" != . ]] && {
- check_effective_cpus $ECPUS
- [[ $? -ne 0 ]] && test_fail $I "effective CPU"
- }
-
- [[ -n "$STATES" && "$STATES" != . ]] && {
- check_cgroup_states $STATES
- [[ $? -ne 0 ]] && test_fail $I states
- }
-
- # Compare the expected isolated CPUs with the actual ones,
- # if available
- [[ -n "$ICPUS" ]] && {
- check_isolcpus $ICPUS
- [[ $? -ne 0 ]] && {
- [[ -n "$BOOT_ISOLCPUS" ]] && ICPUS=${ICPUS},${BOOT_ISOLCPUS}
- test_fail $I "isolated CPU" \
- "Expect $ICPUS, get $ISOLCPUS instead"
- }
- }
- reset_cgroup_states
- #
- # Check to see if effective cpu list changes
- #
- NEWLIST=$(cat cpuset.cpus.effective)
- RETRY=0
- while [[ $NEWLIST != $CPULIST && $RETRY -lt 8 ]]
- do
- # Wait a bit longer & recheck a few times
- pause 0.02
- ((RETRY++))
- NEWLIST=$(cat cpuset.cpus.effective)
- done
- [[ $NEWLIST != $CPULIST ]] && {
- echo "Effective cpus changed to $NEWLIST after test $I!"
- exit 1
- }
- null_isolcpus_check
- [[ $VERBOSE -gt 0 ]] && echo "Test $I done."
+ check_test_results $I "$ECPUS" "$STATES" "$ICPUS"
((I++))
done
echo "All $I tests of $TEST PASSED."
}
#
+# Run cpuset remote partition state transition test
+# $1 - test matrix name
+#
+run_remote_state_test()
+{
+ TEST=$1
+ CONTROLLER=cpuset
+ [[ -d rtest ]] || mkdir rtest
+ cd rtest
+ echo +cpuset > cgroup.subtree_control
+ echo "1-7" > cpuset.cpus
+ echo "1-7" > cpuset.cpus.exclusive
+ CGROUP_LIST=".. . p1 p2 p1/c11 p1/c12 p2/c21 p2/c22"
+ RESET_LIST="p1/c11 p1/c12 p2/c21 p2/c22 p1 p2"
+ I=0
+ eval CNT="\${#$TEST[@]}"
+
+ reset_cgroup_states
+ console_msg "Running remote partition state transition test ..."
+
+ while [[ $I -lt $CNT ]]
+ do
+ echo "Running test $I ..." > $CONSOLE
+ [[ $VERBOSE -gt 1 ]] && {
+ echo ""
+ eval echo \${$TEST[$I]}
+ }
+ eval set -- "\${$TEST[$I]}"
+ OLD_p1=$1
+ OLD_p2=$2
+ OLD_c11=$3
+ OLD_c12=$4
+ OLD_c21=$5
+ OLD_c22=$6
+ NEW_p1=$7
+ NEW_p2=$8
+ NEW_c11=$9
+ NEW_c12=${10}
+ NEW_c21=${11}
+ NEW_c22=${12}
+ ECPUS=${13}
+ STATES=${14}
+ ICPUS=${15}
+
+ set_ctrl_state_noerr p1 $OLD_p1
+ set_ctrl_state_noerr p2 $OLD_p2
+ set_ctrl_state_noerr p1/c11 $OLD_c11
+ set_ctrl_state_noerr p1/c12 $OLD_c12
+ set_ctrl_state_noerr p2/c21 $OLD_c21
+ set_ctrl_state_noerr p2/c22 $OLD_c22
+
+ RETVAL=0
+ set_ctrl_state p1 $NEW_p1 ; ((RETVAL += $?))
+ set_ctrl_state p2 $NEW_p2 ; ((RETVAL += $?))
+ set_ctrl_state p1/c11 $NEW_c11; ((RETVAL += $?))
+ set_ctrl_state p1/c12 $NEW_c12; ((RETVAL += $?))
+ set_ctrl_state p2/c21 $NEW_c21; ((RETVAL += $?))
+ set_ctrl_state p2/c22 $NEW_c22; ((RETVAL += $?))
+
+ [[ $RETVAL -ne 0 ]] && test_fail $I result
+
+ check_test_results $I "$ECPUS" "$STATES" "$ICPUS"
+ ((I++))
+ done
+ cd ..
+ rmdir rtest
+ echo "All $I tests of $TEST PASSED."
+}
+
+#
# Testing the new "isolated" partition root type
#
test_isolated()
@@ -932,6 +1115,7 @@
echo $$ > $CGROUP2/cgroup.procs
[[ -d A1 ]] && rmdir A1
null_isolcpus_check
+ pause 0.05
}
#
@@ -997,10 +1181,13 @@
else
echo "Inotify test PASSED"
fi
+ echo member > cpuset.cpus.partition
+ echo "" > cpuset.cpus
}
trap cleanup 0 2 3 6
run_state_test TEST_MATRIX
+run_remote_state_test REMOTE_TEST_MATRIX
test_isolated
test_inotify
echo "All tests PASSED."
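Note on the check-string notation used by the test matrices above: the entry
separator moves from "," to "|" so that a CPU list can itself contain commas
(for example "p1:1,3-4" in REMOTE_TEST_MATRIX). The sketch below is a rough
Python model of how check_effective_cpus, set_cgroup_dir and check_isolcpus
split those strings. It is not part of the patch; the function names are
hypothetical, and the file name used for the "X" prefix is an assumption taken
from the matrix comments, because the hunk above elides that line.

```python
# Mirror of the name-to-directory mapping done by set_cgroup_dir().
CGROUP_DIRS = {
    "A2": "A1/A2",
    "A3": "A1/A2/A3",
    "c11": "p1/c11",
    "c12": "p1/c12",
    "c21": "p2/c21",
    "c22": "p2/c22",
}


def parse_check_string(check):
    """Split '<cgroup>:<value>[|<cgroup>:<value>]*' into (dir, file, value)."""
    entries = []
    for item in check.split("|"):
        # Only the first ':' separates the name from the value.
        name, _, value = item.partition(":")
        filename = "cpuset.cpus.effective"
        if name.startswith("X"):
            # Assumption: an 'X' prefix selects cpuset.cpus.exclusive.effective.
            name = name[1:]
            filename = "cpuset.cpus.exclusive.effective"
        entries.append((CGROUP_DIRS.get(name, name), filename, value))
    return entries


def split_isolcpus(expected):
    """Return (sched_domain_cpus, cpuset_cpus_isolated) for an ISOLCPUS field."""
    if expected == ".":
        return "", ""
    if "|" in expected:
        # First part is checked against the sched domains,
        # second part against cpuset.cpus.isolated.
        sdomain, isolated = expected.split("|", 1)
        return sdomain, isolated
    return expected, expected


if __name__ == "__main__":
    print(parse_check_string("p1:1,3-4|c11:2"))
    print(split_isolcpus("2-4|2-3"))
```

With "," as the separator, "p1:1,3-4|c11:2" could not be expressed at all,
since "3-4" would be read as a separate (malformed) entry.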
diff --git a/tools/testing/selftests/drivers/net/hds.py b/tools/testing/selftests/drivers/net/hds.py
index 8b7f6ac..7c90a04 100755
--- a/tools/testing/selftests/drivers/net/hds.py
+++ b/tools/testing/selftests/drivers/net/hds.py
@@ -6,7 +6,7 @@
from lib.py import ksft_run, ksft_exit, ksft_eq, ksft_raises, KsftSkipEx
from lib.py import CmdExitFailure, EthtoolFamily, NlError
from lib.py import NetDrvEnv
-from lib.py import defer, ethtool, ip
+from lib.py import defer, ethtool, ip, random
def _get_hds_mode(cfg, netnl) -> str:
@@ -109,6 +109,36 @@
ksft_eq(0, rings['hds-thresh'])
+def set_hds_thresh_random(cfg, netnl) -> None:
+ try:
+ rings = netnl.rings_get({'header': {'dev-index': cfg.ifindex}})
+ except NlError as e:
+ raise KsftSkipEx('ring-get not supported by device')
+ if 'hds-thresh' not in rings:
+ raise KsftSkipEx('hds-thresh not supported by device')
+ if 'hds-thresh-max' not in rings:
+ raise KsftSkipEx('hds-thresh-max not defined by device')
+
+ if rings['hds-thresh-max'] < 2:
+ raise KsftSkipEx('hds-thresh-max is too small')
+ elif rings['hds-thresh-max'] == 2:
+ hds_thresh = 1
+ else:
+ while True:
+ hds_thresh = random.randint(1, rings['hds-thresh-max'] - 1)
+ if hds_thresh != rings['hds-thresh']:
+ break
+
+ try:
+ netnl.rings_set({'header': {'dev-index': cfg.ifindex}, 'hds-thresh': hds_thresh})
+ except NlError as e:
+ if e.error == errno.EINVAL:
+ raise KsftSkipEx("hds-thresh-set not supported by the device")
+ elif e.error == errno.EOPNOTSUPP:
+ raise KsftSkipEx("ring-set not supported by the device")
+ rings = netnl.rings_get({'header': {'dev-index': cfg.ifindex}})
+ ksft_eq(hds_thresh, rings['hds-thresh'])
+
def set_hds_thresh_max(cfg, netnl) -> None:
try:
rings = netnl.rings_get({'header': {'dev-index': cfg.ifindex}})
@@ -243,6 +273,7 @@
get_hds_thresh,
set_hds_disable,
set_hds_enable,
+ set_hds_thresh_random,
set_hds_thresh_zero,
set_hds_thresh_max,
set_hds_thresh_gt,
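Note on set_hds_thresh_random above: the value selection can be read in
isolation as "pick a threshold in [1, hds-thresh-max - 1] that differs from
the current one". The sketch below restates just that step as a standalone
function. The name pick_hds_thresh and the rng parameter are hypothetical and
exist only to make the logic testable without a device; the selftest itself
calls random.randint directly and raises KsftSkipEx where this sketch returns
None.

```python
import random


def pick_hds_thresh(current, thresh_max, rng=random):
    """Return a threshold in [1, thresh_max - 1] that differs from current.

    Returns None when thresh_max < 2 (the selftest skips in that case).
    """
    if thresh_max < 2:
        return None
    if thresh_max == 2:
        # Only one candidate exists, so no random draw is needed.
        return 1
    while True:
        # At least two candidates exist here, so the loop terminates
        # even when current itself lies inside the range.
        candidate = rng.randint(1, thresh_max - 1)
        if candidate != current:
            return candidate


if __name__ == "__main__":
    print(pick_hds_thresh(current=0, thresh_max=1024))
```

The explicit thresh_max == 2 branch matters: without it, a device whose
current threshold is already 1 would leave the retry loop with no other value
to draw.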
diff --git a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
index 7d7a6a0..2d8230d 100644
--- a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
+++ b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
@@ -98,7 +98,7 @@ int main(int argc, char *argv[])
info("Calling futex_waitv on f1: %u @ %p with val=%u\n", f1, &f1, f1+1);
res = futex_waitv(&waitv, 1, 0, &to, CLOCK_MONOTONIC);
if (!res || errno != EWOULDBLOCK) {
- ksft_test_result_pass("futex_waitv returned: %d %s\n",
+ ksft_test_result_fail("futex_waitv returned: %d %s\n",
res ? errno : res,
res ? strerror(errno) : "");
ret = RET_FAIL;
diff --git a/tools/testing/selftests/hid/config.common b/tools/testing/selftests/hid/config.common
index 45b5570..b1f4085 100644
--- a/tools/testing/selftests/hid/config.common
+++ b/tools/testing/selftests/hid/config.common
@@ -39,7 +39,6 @@
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPUSETS=y
-CONFIG_CRC_T10DIF=y
CONFIG_CRYPTO_BLAKE2B=y
CONFIG_CRYPTO_DEV_VIRTIO=y
CONFIG_CRYPTO_SEQIV=y
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f773f8f..f62b0a5 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -50,8 +50,18 @@
# Non-compiled test targets
TEST_PROGS_x86 += x86/nx_huge_pages_test.sh
+# Compiled test targets valid on all architectures with libkvm support
+TEST_GEN_PROGS_COMMON = demand_paging_test
+TEST_GEN_PROGS_COMMON += dirty_log_test
+TEST_GEN_PROGS_COMMON += guest_print_test
+TEST_GEN_PROGS_COMMON += kvm_binary_stats_test
+TEST_GEN_PROGS_COMMON += kvm_create_max_vcpus
+TEST_GEN_PROGS_COMMON += kvm_page_table_test
+TEST_GEN_PROGS_COMMON += set_memory_region_test
+
# Compiled test targets
-TEST_GEN_PROGS_x86 = x86/cpuid_test
+TEST_GEN_PROGS_x86 = $(TEST_GEN_PROGS_COMMON)
+TEST_GEN_PROGS_x86 += x86/cpuid_test
TEST_GEN_PROGS_x86 += x86/cr4_cpuid_sync_test
TEST_GEN_PROGS_x86 += x86/dirty_log_page_splitting_test
TEST_GEN_PROGS_x86 += x86/feature_msrs_test
@@ -119,27 +129,21 @@
TEST_GEN_PROGS_x86 += x86/recalc_apic_map_test
TEST_GEN_PROGS_x86 += access_tracking_perf_test
TEST_GEN_PROGS_x86 += coalesced_io_test
-TEST_GEN_PROGS_x86 += demand_paging_test
-TEST_GEN_PROGS_x86 += dirty_log_test
TEST_GEN_PROGS_x86 += dirty_log_perf_test
TEST_GEN_PROGS_x86 += guest_memfd_test
-TEST_GEN_PROGS_x86 += guest_print_test
TEST_GEN_PROGS_x86 += hardware_disable_test
-TEST_GEN_PROGS_x86 += kvm_create_max_vcpus
-TEST_GEN_PROGS_x86 += kvm_page_table_test
TEST_GEN_PROGS_x86 += memslot_modification_stress_test
TEST_GEN_PROGS_x86 += memslot_perf_test
TEST_GEN_PROGS_x86 += mmu_stress_test
TEST_GEN_PROGS_x86 += rseq_test
-TEST_GEN_PROGS_x86 += set_memory_region_test
TEST_GEN_PROGS_x86 += steal_time
-TEST_GEN_PROGS_x86 += kvm_binary_stats_test
TEST_GEN_PROGS_x86 += system_counter_offset_test
TEST_GEN_PROGS_x86 += pre_fault_memory_test
# Compiled outputs used by test targets
TEST_GEN_PROGS_EXTENDED_x86 += x86/nx_huge_pages_test
+TEST_GEN_PROGS_arm64 = $(TEST_GEN_PROGS_COMMON)
TEST_GEN_PROGS_arm64 += arm64/aarch32_id_regs
TEST_GEN_PROGS_arm64 += arm64/arch_timer_edge_cases
TEST_GEN_PROGS_arm64 += arm64/debug-exceptions
@@ -158,22 +162,16 @@
TEST_GEN_PROGS_arm64 += access_tracking_perf_test
TEST_GEN_PROGS_arm64 += arch_timer
TEST_GEN_PROGS_arm64 += coalesced_io_test
-TEST_GEN_PROGS_arm64 += demand_paging_test
-TEST_GEN_PROGS_arm64 += dirty_log_test
TEST_GEN_PROGS_arm64 += dirty_log_perf_test
-TEST_GEN_PROGS_arm64 += guest_print_test
TEST_GEN_PROGS_arm64 += get-reg-list
-TEST_GEN_PROGS_arm64 += kvm_create_max_vcpus
-TEST_GEN_PROGS_arm64 += kvm_page_table_test
TEST_GEN_PROGS_arm64 += memslot_modification_stress_test
TEST_GEN_PROGS_arm64 += memslot_perf_test
TEST_GEN_PROGS_arm64 += mmu_stress_test
TEST_GEN_PROGS_arm64 += rseq_test
-TEST_GEN_PROGS_arm64 += set_memory_region_test
TEST_GEN_PROGS_arm64 += steal_time
-TEST_GEN_PROGS_arm64 += kvm_binary_stats_test
-TEST_GEN_PROGS_s390 = s390/memop
+TEST_GEN_PROGS_s390 = $(TEST_GEN_PROGS_COMMON)
+TEST_GEN_PROGS_s390 += s390/memop
TEST_GEN_PROGS_s390 += s390/resets
TEST_GEN_PROGS_s390 += s390/sync_regs_test
TEST_GEN_PROGS_s390 += s390/tprot
@@ -182,27 +180,14 @@
TEST_GEN_PROGS_s390 += s390/cpumodel_subfuncs_test
TEST_GEN_PROGS_s390 += s390/shared_zeropage_test
TEST_GEN_PROGS_s390 += s390/ucontrol_test
-TEST_GEN_PROGS_s390 += demand_paging_test
-TEST_GEN_PROGS_s390 += dirty_log_test
-TEST_GEN_PROGS_s390 += guest_print_test
-TEST_GEN_PROGS_s390 += kvm_create_max_vcpus
-TEST_GEN_PROGS_s390 += kvm_page_table_test
TEST_GEN_PROGS_s390 += rseq_test
-TEST_GEN_PROGS_s390 += set_memory_region_test
-TEST_GEN_PROGS_s390 += kvm_binary_stats_test
+TEST_GEN_PROGS_riscv = $(TEST_GEN_PROGS_COMMON)
TEST_GEN_PROGS_riscv += riscv/sbi_pmu_test
TEST_GEN_PROGS_riscv += riscv/ebreak_test
TEST_GEN_PROGS_riscv += arch_timer
TEST_GEN_PROGS_riscv += coalesced_io_test
-TEST_GEN_PROGS_riscv += demand_paging_test
-TEST_GEN_PROGS_riscv += dirty_log_test
TEST_GEN_PROGS_riscv += get-reg-list
-TEST_GEN_PROGS_riscv += guest_print_test
-TEST_GEN_PROGS_riscv += kvm_binary_stats_test
-TEST_GEN_PROGS_riscv += kvm_create_max_vcpus
-TEST_GEN_PROGS_riscv += kvm_page_table_test
-TEST_GEN_PROGS_riscv += set_memory_region_test
TEST_GEN_PROGS_riscv += steal_time
SPLIT_TESTS += arch_timer
diff --git a/tools/testing/selftests/kvm/arm64/page_fault_test.c b/tools/testing/selftests/kvm/arm64/page_fault_test.c
index ec33a8f..dc6559d 100644
--- a/tools/testing/selftests/kvm/arm64/page_fault_test.c
+++ b/tools/testing/selftests/kvm/arm64/page_fault_test.c
@@ -199,7 +199,7 @@ static bool guest_set_ha(void)
if (hadbs == 0)
return false;
- tcr = read_sysreg(tcr_el1) | TCR_EL1_HA;
+ tcr = read_sysreg(tcr_el1) | TCR_HA;
write_sysreg(tcr, tcr_el1);
isb();
diff --git a/tools/testing/selftests/kvm/include/arm64/processor.h b/tools/testing/selftests/kvm/include/arm64/processor.h
index 1e8d0d5..b0fc0f9 100644
--- a/tools/testing/selftests/kvm/include/arm64/processor.h
+++ b/tools/testing/selftests/kvm/include/arm64/processor.h
@@ -62,6 +62,67 @@
MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT))
+/* TCR_EL1 specific flags */
+#define TCR_T0SZ_OFFSET 0
+#define TCR_T0SZ(x) ((UL(64) - (x)) << TCR_T0SZ_OFFSET)
+
+#define TCR_IRGN0_SHIFT 8
+#define TCR_IRGN0_MASK (UL(3) << TCR_IRGN0_SHIFT)
+#define TCR_IRGN0_NC (UL(0) << TCR_IRGN0_SHIFT)
+#define TCR_IRGN0_WBWA (UL(1) << TCR_IRGN0_SHIFT)
+#define TCR_IRGN0_WT (UL(2) << TCR_IRGN0_SHIFT)
+#define TCR_IRGN0_WBnWA (UL(3) << TCR_IRGN0_SHIFT)
+
+#define TCR_ORGN0_SHIFT 10
+#define TCR_ORGN0_MASK (UL(3) << TCR_ORGN0_SHIFT)
+#define TCR_ORGN0_NC (UL(0) << TCR_ORGN0_SHIFT)
+#define TCR_ORGN0_WBWA (UL(1) << TCR_ORGN0_SHIFT)
+#define TCR_ORGN0_WT (UL(2) << TCR_ORGN0_SHIFT)
+#define TCR_ORGN0_WBnWA (UL(3) << TCR_ORGN0_SHIFT)
+
+#define TCR_SH0_SHIFT 12
+#define TCR_SH0_MASK (UL(3) << TCR_SH0_SHIFT)
+#define TCR_SH0_INNER (UL(3) << TCR_SH0_SHIFT)
+
+#define TCR_TG0_SHIFT 14
+#define TCR_TG0_MASK (UL(3) << TCR_TG0_SHIFT)
+#define TCR_TG0_4K (UL(0) << TCR_TG0_SHIFT)
+#define TCR_TG0_64K (UL(1) << TCR_TG0_SHIFT)
+#define TCR_TG0_16K (UL(2) << TCR_TG0_SHIFT)
+
+#define TCR_IPS_SHIFT 32
+#define TCR_IPS_MASK (UL(7) << TCR_IPS_SHIFT)
+#define TCR_IPS_52_BITS (UL(6) << TCR_IPS_SHIFT)
+#define TCR_IPS_48_BITS (UL(5) << TCR_IPS_SHIFT)
+#define TCR_IPS_40_BITS (UL(2) << TCR_IPS_SHIFT)
+#define TCR_IPS_36_BITS (UL(1) << TCR_IPS_SHIFT)
+
+#define TCR_HA (UL(1) << 39)
+#define TCR_DS (UL(1) << 59)
+
+/*
+ * AttrIndx[2:0] encoding (mapping attributes defined in the MAIR* registers).
+ */
+#define PTE_ATTRINDX(t) ((t) << 2)
+#define PTE_ATTRINDX_MASK GENMASK(4, 2)
+#define PTE_ATTRINDX_SHIFT 2
+
+#define PTE_VALID BIT(0)
+#define PGD_TYPE_TABLE BIT(1)
+#define PUD_TYPE_TABLE BIT(1)
+#define PMD_TYPE_TABLE BIT(1)
+#define PTE_TYPE_PAGE BIT(1)
+
+#define PTE_SHARED (UL(3) << 8) /* SH[1:0], inner shareable */
+#define PTE_AF BIT(10)
+
+#define PTE_ADDR_MASK(page_shift) GENMASK(47, (page_shift))
+#define PTE_ADDR_51_48 GENMASK(15, 12)
+#define PTE_ADDR_51_48_SHIFT 12
+#define PTE_ADDR_MASK_LPA2(page_shift) GENMASK(49, (page_shift))
+#define PTE_ADDR_51_50_LPA2 GENMASK(9, 8)
+#define PTE_ADDR_51_50_LPA2_SHIFT 8
+
void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init);
struct kvm_vcpu *aarch64_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
struct kvm_vcpu_init *init, void *guest_code);
@@ -102,12 +163,6 @@ enum {
(v) == VECTOR_SYNC_LOWER_64 || \
(v) == VECTOR_SYNC_LOWER_32)
-/* Access flag */
-#define PTE_AF (1ULL << 10)
-
-/* Access flag update enable/disable */
-#define TCR_EL1_HA (1ULL << 39)
-
void aarch64_get_supported_page_sizes(uint32_t ipa, uint32_t *ipa4k,
uint32_t *ipa16k, uint32_t *ipa64k);
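Note on the PTE_ADDR_* definitions above: they name the bit ranges that addr_pte() and pte_addr() in lib/arm64/processor.c use to fold the upper physical address bits into a descriptor. The following is a minimal standalone sketch of that packing, not the selftest code itself. GENMASK64 and the sketch_* helper names are stand-ins assumed here so the snippet builds outside the kernel tree.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's GENMASK(): bits h..l set, 64-bit wide. */
#define GENMASK64(h, l) \
	(((~UINT64_C(0)) >> (63 - (h))) & ((~UINT64_C(0)) << (l)))

/* Hypothetical mirror of addr_pte(): pack a physical address into a PTE. */
static uint64_t sketch_addr_pte(uint64_t pa, unsigned int page_shift, int lpa2)
{
	uint64_t pte;

	assert(page_shift >= 12 && page_shift <= 16);

	if (lpa2) {
		/* PA[49:page_shift] stay in place, PA[51:50] go to PTE[9:8]. */
		pte = pa & GENMASK64(49, page_shift);
		pte |= ((pa & GENMASK64(51, 50)) >> 50) << 8;
	} else {
		/* PA[47:page_shift] stay in place. */
		pte = pa & GENMASK64(47, page_shift);
		/* Only the 64K granule carries PA[51:48], in PTE[15:12]. */
		if (page_shift == 16)
			pte |= ((pa & GENMASK64(51, 48)) >> 48) << 12;
	}
	return pte;
}

/* Hypothetical mirror of pte_addr(): recover the physical address. */
static uint64_t sketch_pte_addr(uint64_t pte, unsigned int page_shift, int lpa2)
{
	uint64_t pa;

	if (lpa2) {
		pa = pte & GENMASK64(49, page_shift);
		pa |= ((pte & GENMASK64(9, 8)) >> 8) << 50;
	} else {
		pa = pte & GENMASK64(47, page_shift);
		if (page_shift == 16)
			pa |= ((pte & GENMASK64(15, 12)) >> 12) << 48;
	}
	return pa;
}
```

In the LPA2 layout PTE[9:8] carry address bits rather than the SH field, which appears to be why _virt_pg_map() only ORs in PTE_SHARED when use_lpa2_pte_format() is false.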
diff --git a/tools/testing/selftests/kvm/lib/arm64/processor.c b/tools/testing/selftests/kvm/lib/arm64/processor.c
index 7ba3aa3..9d69904 100644
--- a/tools/testing/selftests/kvm/lib/arm64/processor.c
+++ b/tools/testing/selftests/kvm/lib/arm64/processor.c
@@ -72,13 +72,13 @@ static uint64_t addr_pte(struct kvm_vm *vm, uint64_t pa, uint64_t attrs)
uint64_t pte;
if (use_lpa2_pte_format(vm)) {
- pte = pa & GENMASK(49, vm->page_shift);
- pte |= FIELD_GET(GENMASK(51, 50), pa) << 8;
- attrs &= ~GENMASK(9, 8);
+ pte = pa & PTE_ADDR_MASK_LPA2(vm->page_shift);
+ pte |= FIELD_GET(GENMASK(51, 50), pa) << PTE_ADDR_51_50_LPA2_SHIFT;
+ attrs &= ~PTE_ADDR_51_50_LPA2;
} else {
- pte = pa & GENMASK(47, vm->page_shift);
+ pte = pa & PTE_ADDR_MASK(vm->page_shift);
if (vm->page_shift == 16)
- pte |= FIELD_GET(GENMASK(51, 48), pa) << 12;
+ pte |= FIELD_GET(GENMASK(51, 48), pa) << PTE_ADDR_51_48_SHIFT;
}
pte |= attrs;
@@ -90,12 +90,12 @@ static uint64_t pte_addr(struct kvm_vm *vm, uint64_t pte)
uint64_t pa;
if (use_lpa2_pte_format(vm)) {
- pa = pte & GENMASK(49, vm->page_shift);
- pa |= FIELD_GET(GENMASK(9, 8), pte) << 50;
+ pa = pte & PTE_ADDR_MASK_LPA2(vm->page_shift);
+ pa |= FIELD_GET(PTE_ADDR_51_50_LPA2, pte) << 50;
} else {
- pa = pte & GENMASK(47, vm->page_shift);
+ pa = pte & PTE_ADDR_MASK(vm->page_shift);
if (vm->page_shift == 16)
- pa |= FIELD_GET(GENMASK(15, 12), pte) << 48;
+ pa |= FIELD_GET(PTE_ADDR_51_48, pte) << 48;
}
return pa;
@@ -128,7 +128,8 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
static void _virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
uint64_t flags)
{
- uint8_t attr_idx = flags & 7;
+ uint8_t attr_idx = flags & (PTE_ATTRINDX_MASK >> PTE_ATTRINDX_SHIFT);
+ uint64_t pg_attr;
uint64_t *ptep;
TEST_ASSERT((vaddr % vm->page_size) == 0,
@@ -147,18 +148,21 @@ static void _virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
ptep = addr_gpa2hva(vm, vm->pgd) + pgd_index(vm, vaddr) * 8;
if (!*ptep)
- *ptep = addr_pte(vm, vm_alloc_page_table(vm), 3);
+ *ptep = addr_pte(vm, vm_alloc_page_table(vm),
+ PGD_TYPE_TABLE | PTE_VALID);
switch (vm->pgtable_levels) {
case 4:
ptep = addr_gpa2hva(vm, pte_addr(vm, *ptep)) + pud_index(vm, vaddr) * 8;
if (!*ptep)
- *ptep = addr_pte(vm, vm_alloc_page_table(vm), 3);
+ *ptep = addr_pte(vm, vm_alloc_page_table(vm),
+ PUD_TYPE_TABLE | PTE_VALID);
/* fall through */
case 3:
ptep = addr_gpa2hva(vm, pte_addr(vm, *ptep)) + pmd_index(vm, vaddr) * 8;
if (!*ptep)
- *ptep = addr_pte(vm, vm_alloc_page_table(vm), 3);
+ *ptep = addr_pte(vm, vm_alloc_page_table(vm),
+ PMD_TYPE_TABLE | PTE_VALID);
/* fall through */
case 2:
ptep = addr_gpa2hva(vm, pte_addr(vm, *ptep)) + pte_index(vm, vaddr) * 8;
@@ -167,7 +171,11 @@ static void _virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
TEST_FAIL("Page table levels must be 2, 3, or 4");
}
- *ptep = addr_pte(vm, paddr, (attr_idx << 2) | (1 << 10) | 3); /* AF */
+ pg_attr = PTE_AF | PTE_ATTRINDX(attr_idx) | PTE_TYPE_PAGE | PTE_VALID;
+ if (!use_lpa2_pte_format(vm))
+ pg_attr |= PTE_SHARED;
+
+ *ptep = addr_pte(vm, paddr, pg_attr);
}
void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
@@ -293,20 +301,20 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
case VM_MODE_P48V48_64K:
case VM_MODE_P40V48_64K:
case VM_MODE_P36V48_64K:
- tcr_el1 |= 1ul << 14; /* TG0 = 64KB */
+ tcr_el1 |= TCR_TG0_64K;
break;
case VM_MODE_P52V48_16K:
case VM_MODE_P48V48_16K:
case VM_MODE_P40V48_16K:
case VM_MODE_P36V48_16K:
case VM_MODE_P36V47_16K:
- tcr_el1 |= 2ul << 14; /* TG0 = 16KB */
+ tcr_el1 |= TCR_TG0_16K;
break;
case VM_MODE_P52V48_4K:
case VM_MODE_P48V48_4K:
case VM_MODE_P40V48_4K:
case VM_MODE_P36V48_4K:
- tcr_el1 |= 0ul << 14; /* TG0 = 4KB */
+ tcr_el1 |= TCR_TG0_4K;
break;
default:
TEST_FAIL("Unknown guest mode, mode: 0x%x", vm->mode);
@@ -319,35 +327,35 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
case VM_MODE_P52V48_4K:
case VM_MODE_P52V48_16K:
case VM_MODE_P52V48_64K:
- tcr_el1 |= 6ul << 32; /* IPS = 52 bits */
+ tcr_el1 |= TCR_IPS_52_BITS;
ttbr0_el1 |= FIELD_GET(GENMASK(51, 48), vm->pgd) << 2;
break;
case VM_MODE_P48V48_4K:
case VM_MODE_P48V48_16K:
case VM_MODE_P48V48_64K:
- tcr_el1 |= 5ul << 32; /* IPS = 48 bits */
+ tcr_el1 |= TCR_IPS_48_BITS;
break;
case VM_MODE_P40V48_4K:
case VM_MODE_P40V48_16K:
case VM_MODE_P40V48_64K:
- tcr_el1 |= 2ul << 32; /* IPS = 40 bits */
+ tcr_el1 |= TCR_IPS_40_BITS;
break;
case VM_MODE_P36V48_4K:
case VM_MODE_P36V48_16K:
case VM_MODE_P36V48_64K:
case VM_MODE_P36V47_16K:
- tcr_el1 |= 1ul << 32; /* IPS = 36 bits */
+ tcr_el1 |= TCR_IPS_36_BITS;
break;
default:
TEST_FAIL("Unknown guest mode, mode: 0x%x", vm->mode);
}
- sctlr_el1 |= (1 << 0) | (1 << 2) | (1 << 12) /* M | C | I */;
- /* TCR_EL1 |= IRGN0:WBWA | ORGN0:WBWA | SH0:Inner-Shareable */;
- tcr_el1 |= (1 << 8) | (1 << 10) | (3 << 12);
- tcr_el1 |= (64 - vm->va_bits) /* T0SZ */;
+ sctlr_el1 |= SCTLR_ELx_M | SCTLR_ELx_C | SCTLR_ELx_I;
+
+ tcr_el1 |= TCR_IRGN0_WBWA | TCR_ORGN0_WBWA | TCR_SH0_INNER;
+ tcr_el1 |= TCR_T0SZ(vm->va_bits);
if (use_lpa2_pte_format(vm))
- tcr_el1 |= (1ul << 59) /* DS */;
+ tcr_el1 |= TCR_DS;
vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_SCTLR_EL1), sctlr_el1);
vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_TCR_EL1), tcr_el1);
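Note on the TCR_EL1 changes above: the named fields replace open-coded shifts without changing the value written to the register. As a rough check, the sketch below rebuilds the register value from a subset of the definitions added to processor.h. UL() is redefined locally and sketch_tcr_el1() is a hypothetical helper, both assumed only so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

#define UL(x)			((uint64_t)(x))	/* local stand-in for the kernel's UL() */

/* Field encodings copied from the processor.h hunk in this diff. */
#define TCR_T0SZ(x)		((UL(64) - (x)) << 0)
#define TCR_IRGN0_WBWA		(UL(1) << 8)
#define TCR_ORGN0_WBWA		(UL(1) << 10)
#define TCR_SH0_INNER		(UL(3) << 12)
#define TCR_TG0_4K		(UL(0) << 14)
#define TCR_TG0_64K		(UL(1) << 14)
#define TCR_TG0_16K		(UL(2) << 14)
#define TCR_IPS_52_BITS		(UL(6) << 32)
#define TCR_IPS_48_BITS		(UL(5) << 32)
#define TCR_IPS_40_BITS		(UL(2) << 32)
#define TCR_DS			(UL(1) << 59)

/*
 * Hypothetical helper: compose TCR_EL1 the way aarch64_vcpu_setup() does
 * after this change, given the granule and IPS fields for the VM mode.
 */
static uint64_t sketch_tcr_el1(unsigned int va_bits, uint64_t tg0,
			       uint64_t ips, int lpa2)
{
	uint64_t tcr = tg0 | ips;

	assert(va_bits > 0 && va_bits < 64);

	tcr |= TCR_IRGN0_WBWA | TCR_ORGN0_WBWA | TCR_SH0_INNER;
	tcr |= TCR_T0SZ(va_bits);
	if (lpa2)
		tcr |= TCR_DS;
	return tcr;
}
```

For a 48-bit VA, 4K granule and 48-bit IPS this gives 0x500003510, the same value the removed expression (1 << 8) | (1 << 10) | (3 << 12) | (64 - 48) | (5ul << 32) produces.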
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 279ad89..815bc45 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2019,9 +2019,8 @@ static struct exit_reason {
KVM_EXIT_STRING(RISCV_SBI),
KVM_EXIT_STRING(RISCV_CSR),
KVM_EXIT_STRING(NOTIFY),
-#ifdef KVM_EXIT_MEMORY_NOT_PRESENT
- KVM_EXIT_STRING(MEMORY_NOT_PRESENT),
-#endif
+ KVM_EXIT_STRING(LOONGARCH_IOCSR),
+ KVM_EXIT_STRING(MEMORY_FAULT),
};
/*
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index e589867..1375fca 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -196,25 +196,28 @@ static void calc_min_max_cpu(void)
static void help(const char *name)
{
puts("");
- printf("usage: %s [-h] [-u]\n", name);
+ printf("usage: %s [-h] [-u] [-l latency]\n", name);
printf(" -u: Don't sanity check the number of successful KVM_RUNs\n");
+ printf(" -l: Set /dev/cpu_dma_latency to suppress deep sleep states\n");
puts("");
exit(0);
}
int main(int argc, char *argv[])
{
+ int r, i, snapshot, opt, fd = -1, latency = -1;
bool skip_sanity_check = false;
- int r, i, snapshot;
struct kvm_vm *vm;
struct kvm_vcpu *vcpu;
u32 cpu, rseq_cpu;
- int opt;
- while ((opt = getopt(argc, argv, "hu")) != -1) {
+ while ((opt = getopt(argc, argv, "hl:u")) != -1) {
switch (opt) {
case 'u':
skip_sanity_check = true;
+ break;
+ case 'l':
+ latency = atoi_paranoid(optarg);
break;
case 'h':
default:
@@ -243,6 +245,20 @@ int main(int argc, char *argv[])
pthread_create(&migration_thread, NULL, migration_worker,
(void *)(unsigned long)syscall(SYS_gettid));
+ if (latency >= 0) {
+ /*
+ * Writes to cpu_dma_latency persist only while the file is
+ * open, i.e. it allows userspace to provide guaranteed latency
+ * while running a workload. Keep the file open until the test
+ * completes, otherwise writing cpu_dma_latency is meaningless.
+ */
+ fd = open("/dev/cpu_dma_latency", O_RDWR);
+ TEST_ASSERT(fd >= 0, __KVM_SYSCALL_ERROR("open() /dev/cpu_dma_latency", fd));
+
+ r = write(fd, &latency, 4);
+ TEST_ASSERT(r >= 1, "Error setting /dev/cpu_dma_latency");
+ }
+
for (i = 0; !done; i++) {
vcpu_run(vcpu);
TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
@@ -278,6 +294,9 @@ int main(int argc, char *argv[])
"rseq CPU = %d, sched CPU = %d", rseq_cpu, cpu);
}
+ if (fd >= 0)
+ close(fd);
+
/*
* Sanity check that the test was able to enter the guest a reasonable
* number of times, e.g. didn't get stalled too often/long waiting for
@@ -293,8 +312,8 @@ int main(int argc, char *argv[])
TEST_ASSERT(skip_sanity_check || i > (NR_TASK_MIGRATIONS / 2),
"Only performed %d KVM_RUNs, task stalled too much?\n\n"
" Try disabling deep sleep states to reduce CPU wakeup latency,\n"
- " e.g. via cpuidle.off=1 or setting /dev/cpu_dma_latency to '0',\n"
- " or run with -u to disable this sanity check.", i);
+ " e.g. via cpuidle.off=1 or via -l <latency>, or run with -u to\n"
+ " disable this sanity check.", i);
pthread_join(migration_thread, NULL);
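Note on the -l option above: as the comment in the hunk says, a value written to /dev/cpu_dma_latency only applies while the file descriptor stays open. A minimal sketch of that hold-and-release pattern follows. The helper names and the path parameter are assumptions made here (the path is a parameter only so the helper can be exercised without root), and the value is taken to be a 32-bit latency bound in microseconds.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a latency bound and return the still-open fd.
 * The caller keeps the fd open for as long as the bound should apply.
 * Returns -1 if the file cannot be opened or the write is short.
 */
static int hold_cpu_dma_latency(const char *path, int32_t latency_us)
{
	ssize_t written;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	written = write(fd, &latency_us, sizeof(latency_us));
	if (written != (ssize_t)sizeof(latency_us)) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Closing the descriptor drops the request again. */
static void release_cpu_dma_latency(int fd)
{
	if (fd >= 0)
		close(fd);
}
```

A caller would pass "/dev/cpu_dma_latency" and a small bound such as 0 before starting the workload, then call release_cpu_dma_latency() once the workload has finished.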
diff --git a/tools/testing/selftests/kvm/x86/monitor_mwait_test.c b/tools/testing/selftests/kvm/x86/monitor_mwait_test.c
index 2b550ef..390ae2d 100644
--- a/tools/testing/selftests/kvm/x86/monitor_mwait_test.c
+++ b/tools/testing/selftests/kvm/x86/monitor_mwait_test.c
@@ -7,6 +7,7 @@
#include "kvm_util.h"
#include "processor.h"
+#include "kselftest.h"
#define CPUID_MWAIT (1u << 3)
@@ -14,6 +15,8 @@ enum monitor_mwait_testcases {
MWAIT_QUIRK_DISABLED = BIT(0),
MISC_ENABLES_QUIRK_DISABLED = BIT(1),
MWAIT_DISABLED = BIT(2),
+ CPUID_DISABLED = BIT(3),
+ TEST_MAX = CPUID_DISABLED * 2 - 1,
};
/*
@@ -35,11 +38,19 @@ do { \
testcase, vector); \
} while (0)
-static void guest_monitor_wait(int testcase)
+static void guest_monitor_wait(void *arg)
{
+ int testcase = (int) (long) arg;
u8 vector;
- GUEST_SYNC(testcase);
+ u64 val = rdmsr(MSR_IA32_MISC_ENABLE) & ~MSR_IA32_MISC_ENABLE_MWAIT;
+ if (!(testcase & MWAIT_DISABLED))
+ val |= MSR_IA32_MISC_ENABLE_MWAIT;
+ wrmsr(MSR_IA32_MISC_ENABLE, val);
+
+ __GUEST_ASSERT(this_cpu_has(X86_FEATURE_MWAIT) == !(testcase & MWAIT_DISABLED),
+ "Expected CPUID.MWAIT %s\n",
+ (testcase & MWAIT_DISABLED) ? "cleared" : "set");
/*
* Arbitrarily MONITOR this function, SVM performs fault checks before
@@ -50,19 +61,6 @@ static void guest_monitor_wait(int testcase)
vector = kvm_asm_safe("mwait", "a"(guest_monitor_wait), "c"(0), "d"(0));
GUEST_ASSERT_MONITOR_MWAIT("MWAIT", testcase, vector);
-}
-
-static void guest_code(void)
-{
- guest_monitor_wait(MWAIT_DISABLED);
-
- guest_monitor_wait(MWAIT_QUIRK_DISABLED | MWAIT_DISABLED);
-
- guest_monitor_wait(MISC_ENABLES_QUIRK_DISABLED | MWAIT_DISABLED);
- guest_monitor_wait(MISC_ENABLES_QUIRK_DISABLED);
-
- guest_monitor_wait(MISC_ENABLES_QUIRK_DISABLED | MWAIT_QUIRK_DISABLED | MWAIT_DISABLED);
- guest_monitor_wait(MISC_ENABLES_QUIRK_DISABLED | MWAIT_QUIRK_DISABLED);
GUEST_DONE();
}
@@ -74,56 +72,64 @@ int main(int argc, char *argv[])
struct kvm_vm *vm;
struct ucall uc;
int testcase;
+ char test[80];
- TEST_REQUIRE(this_cpu_has(X86_FEATURE_MWAIT));
TEST_REQUIRE(kvm_has_cap(KVM_CAP_DISABLE_QUIRKS2));
- vm = vm_create_with_one_vcpu(&vcpu, guest_code);
- vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_MWAIT);
+ ksft_print_header();
+ ksft_set_plan(12);
+ for (testcase = 0; testcase <= TEST_MAX; testcase++) {
+ vm = vm_create_with_one_vcpu(&vcpu, guest_monitor_wait);
+ vcpu_args_set(vcpu, 1, (void *)(long)testcase);
- while (1) {
+ disabled_quirks = 0;
+ if (testcase & MWAIT_QUIRK_DISABLED) {
+ disabled_quirks |= KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS;
+ strcpy(test, "MWAIT can fault");
+ } else {
+ strcpy(test, "MWAIT never faults");
+ }
+ if (testcase & MISC_ENABLES_QUIRK_DISABLED) {
+ disabled_quirks |= KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT;
+ strcat(test, ", MISC_ENABLE updates CPUID");
+ } else {
+ strcat(test, ", no CPUID updates");
+ }
+
+ vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, disabled_quirks);
+
+ if (!(testcase & MISC_ENABLES_QUIRK_DISABLED) &&
+ (!!(testcase & CPUID_DISABLED) ^ !!(testcase & MWAIT_DISABLED)))
+ continue;
+
+ if (testcase & CPUID_DISABLED) {
+ strcat(test, ", CPUID clear");
+ vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_MWAIT);
+ } else {
+ strcat(test, ", CPUID set");
+ vcpu_set_cpuid_feature(vcpu, X86_FEATURE_MWAIT);
+ }
+
+ if (testcase & MWAIT_DISABLED)
+ strcat(test, ", MWAIT disabled");
+
vcpu_run(vcpu);
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
switch (get_ucall(vcpu, &uc)) {
- case UCALL_SYNC:
- testcase = uc.args[1];
- break;
case UCALL_ABORT:
- REPORT_GUEST_ASSERT(uc);
- goto done;
+ /* Detected in vcpu_run */
+ break;
case UCALL_DONE:
- goto done;
+ ksft_test_result_pass("%s\n", test);
+ break;
default:
TEST_FAIL("Unknown ucall %lu", uc.cmd);
- goto done;
+ break;
}
-
- disabled_quirks = 0;
- if (testcase & MWAIT_QUIRK_DISABLED)
- disabled_quirks |= KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS;
- if (testcase & MISC_ENABLES_QUIRK_DISABLED)
- disabled_quirks |= KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT;
- vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, disabled_quirks);
-
- /*
- * If the MISC_ENABLES quirk (KVM neglects to update CPUID to
- * enable/disable MWAIT) is disabled, toggle the ENABLE_MWAIT
- * bit in MISC_ENABLES accordingly. If the quirk is enabled,
- * the only valid configuration is MWAIT disabled, as CPUID
- * can't be manually changed after running the vCPU.
- */
- if (!(testcase & MISC_ENABLES_QUIRK_DISABLED)) {
- TEST_ASSERT(testcase & MWAIT_DISABLED,
- "Can't toggle CPUID features after running vCPU");
- continue;
- }
-
- vcpu_set_msr(vcpu, MSR_IA32_MISC_ENABLE,
- (testcase & MWAIT_DISABLED) ? 0 : MSR_IA32_MISC_ENABLE_MWAIT);
+ kvm_vm_free(vm);
}
+ ksft_finished();
-done:
- kvm_vm_free(vm);
return 0;
}
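Note on ksft_set_plan(12) above: the loop walks 16 testcase bitmasks (0 through TEST_MAX) but skips those where the MISC_ENABLES quirk is left enabled and the CPUID and MISC_ENABLE views of MWAIT disagree. The sketch below repeats that filter outside the test to show where 12 comes from. The enum values are copied from the diff; testcase_is_skipped() and count_planned_testcases() are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit assignments copied from enum monitor_mwait_testcases in the diff. */
enum {
	MWAIT_QUIRK_DISABLED		= 1 << 0,
	MISC_ENABLES_QUIRK_DISABLED	= 1 << 1,
	MWAIT_DISABLED			= 1 << 2,
	CPUID_DISABLED			= 1 << 3,
	TEST_MAX			= CPUID_DISABLED * 2 - 1,
};

/*
 * Mirrors the "continue" condition in main(): with the MISC_ENABLES quirk
 * still enabled, KVM does not update CPUID on MISC_ENABLE writes, so only
 * combinations where CPUID already matches the MWAIT enable bit are run.
 */
static bool testcase_is_skipped(int testcase)
{
	bool cpuid_off = testcase & CPUID_DISABLED;
	bool mwait_off = testcase & MWAIT_DISABLED;

	return !(testcase & MISC_ENABLES_QUIRK_DISABLED) &&
	       (cpuid_off != mwait_off);
}

/* Count the combinations that actually reach vcpu_run(). */
static int count_planned_testcases(void)
{
	int testcase, planned = 0;

	for (testcase = 0; testcase <= TEST_MAX; testcase++)
		if (!testcase_is_skipped(testcase))
			planned++;

	assert(planned <= TEST_MAX + 1);
	return planned;
}
```

Half of the 8 quirk-enabled combinations have mismatched CPUID and MWAIT bits, so 4 of the 16 masks are skipped and 12 remain.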
diff --git a/tools/testing/selftests/mincore/mincore_selftest.c b/tools/testing/selftests/mincore/mincore_selftest.c
index e949a43..efabfcb 100644
--- a/tools/testing/selftests/mincore/mincore_selftest.c
+++ b/tools/testing/selftests/mincore/mincore_selftest.c
@@ -261,9 +261,6 @@ TEST(check_file_mmap)
TH_LOG("No read-ahead pages found in memory");
}
- EXPECT_LT(i, vec_size) {
- TH_LOG("Read-ahead pages reached the end of the file");
- }
/*
* End of the readahead window. The rest of the pages shouldn't
* be in memory.
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 13a3b68..befa66f 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -1441,6 +1441,15 @@
fi
fi
+ count=$(mptcp_lib_get_counter ${ns2} "MPTcpExtMPJoinSynAckHMacFailure")
+ if [ -z "$count" ]; then
+ rc=${KSFT_SKIP}
+ elif [ "$count" != "0" ]; then
+ rc=${KSFT_FAIL}
+ print_check "synack HMAC"
+ fail_test "got $count JOIN[s] synack HMAC failure expected 0"
+ fi
+
count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtMPJoinAckRx")
if [ -z "$count" ]; then
rc=${KSFT_SKIP}
@@ -1450,6 +1459,15 @@
fail_test "got $count JOIN[s] ack rx expected $ack_nr"
fi
+ count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtMPJoinAckHMacFailure")
+ if [ -z "$count" ]; then
+ rc=${KSFT_SKIP}
+ elif [ "$count" != "0" ]; then
+ rc=${KSFT_FAIL}
+ print_check "ack HMAC"
+ fail_test "got $count JOIN[s] ack HMAC failure expected 0"
+ fi
+
print_results "join Rx" ${rc}
join_syn_tx="${join_syn_tx:-${syn_nr}}" \
diff --git a/tools/testing/selftests/net/netfilter/nft_concat_range.sh b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
index 47088b0..1f5979c 100755
--- a/tools/testing/selftests/net/netfilter/nft_concat_range.sh
+++ b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
@@ -27,7 +27,7 @@
net6_port_net6_port net_port_mac_proto_net"
# Reported bugs, also described by TYPE_ variables below
-BUGS="flush_remove_add reload net_port_proto_match"
+BUGS="flush_remove_add reload net_port_proto_match avx2_mismatch"
# List of possible paths to pktgen script from kernel tree for performance tests
PKTGEN_SCRIPT_PATHS="
@@ -387,6 +387,25 @@
perf_duration 0
"
+
+TYPE_avx2_mismatch="
+display avx2 false match
+type_spec inet_proto . ipv6_addr
+chain_spec meta l4proto . ip6 daddr
+dst proto addr6
+src
+start 1
+count 1
+src_delta 1
+tools ping
+proto icmp6
+
+race_repeat 0
+
+perf_duration 0
+"
+
+
# Set template for all tests, types and rules are filled in depending on test
set_template='
flush ruleset
@@ -1629,6 +1648,24 @@
nft flush ruleset
}
+test_bug_avx2_mismatch()
+{
+ setup veth send_"${proto}" set || return ${ksft_skip}
+
+ local a1="fe80:dead:01ff:0a02:0b03:6007:8009:a001"
+ local a2="fe80:dead:01fe:0a02:0b03:6007:8009:a001"
+
+ nft "add element inet filter test { icmpv6 . $a1 }"
+
+ dst_addr6="$a2"
+ send_icmp6
+
+ if [ "$(count_packets)" -gt "0" ]; then
+ err "False match for $a2"
+ return 1
+ fi
+}
+
test_reported_issues() {
eval test_bug_"${subtest}"
}
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 9a85f93..5ded3b3a 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -1753,6 +1753,42 @@ TEST_F(tls_basic, rekey_tx)
EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
}
+TEST_F(tls_basic, disconnect)
+{
+ char const *test_str = "test_message";
+ int send_len = strlen(test_str) + 1;
+ struct tls_crypto_info_keys key;
+ struct sockaddr_in addr;
+ char buf[20];
+ int ret;
+
+ if (self->notls)
+ return;
+
+ tls_crypto_info_init(TLS_1_3_VERSION, TLS_CIPHER_AES_GCM_128,
+ &key, 0);
+
+ ret = setsockopt(self->fd, SOL_TLS, TLS_TX, &key, key.len);
+ ASSERT_EQ(ret, 0);
+
+ /* Pre-queue the data so that setsockopt parses it but doesn't
+ * dequeue it from the TCP socket. recvmsg would dequeue.
+ */
+ EXPECT_EQ(send(self->fd, test_str, send_len, 0), send_len);
+
+ ret = setsockopt(self->cfd, SOL_TLS, TLS_RX, &key, key.len);
+ ASSERT_EQ(ret, 0);
+
+ addr.sin_family = AF_UNSPEC;
+ addr.sin_addr.s_addr = htonl(INADDR_ANY);
+ addr.sin_port = 0;
+ ret = connect(self->cfd, &addr, sizeof(addr));
+ EXPECT_EQ(ret, -1);
+ EXPECT_EQ(errno, EOPNOTSUPP);
+
+ EXPECT_EQ(recv(self->cfd, buf, send_len, 0), send_len);
+}
+
TEST_F(tls, rekey)
{
char const *test_str_1 = "test_message_before_rekey";
diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
index 25454fd..d4ea9cd 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
@@ -158,5 +158,160 @@
"$TC qdisc del dev $DUMMY handle 1: root",
"$IP addr del 10.10.10.10/24 dev $DUMMY || true"
]
+ },
+ {
+ "id": "a4bb",
+ "name": "Test FQ_CODEL with HTB parent - force packet drop with empty queue",
+ "category": [
+ "qdisc",
+ "fq_codel",
+ "htb"
+ ],
+ "plugins": {
+ "requires": "nsPlugin"
+ },
+ "setup": [
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.10.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY handle 1: root htb default 10",
+ "$TC class add dev $DUMMY parent 1: classid 1:10 htb rate 1kbit",
+ "$TC qdisc add dev $DUMMY parent 1:10 handle 10: fq_codel memory_limit 1 flows 1 target 0.1ms interval 1ms",
+ "$TC filter add dev $DUMMY parent 1: protocol ip prio 1 u32 match ip protocol 1 0xff flowid 1:10",
+ "ping -c 5 -f -I $DUMMY 10.10.10.1 > /dev/null || true",
+ "sleep 0.1"
+ ],
+ "cmdUnderTest": "$TC -s qdisc show dev $DUMMY",
+ "expExitCode": "0",
+ "verifyCmd": "$TC -s qdisc show dev $DUMMY | grep -A 5 'qdisc fq_codel'",
+ "matchPattern": "dropped [1-9][0-9]*",
+ "matchCount": "1",
+ "teardown": [
+ "$TC qdisc del dev $DUMMY handle 1: root",
+ "$IP addr del 10.10.10.10/24 dev $DUMMY || true"
+ ]
+ },
+ {
+ "id": "a4be",
+ "name": "Test FQ_CODEL with QFQ parent - force packet drop with empty queue",
+ "category": [
+ "qdisc",
+ "fq_codel",
+ "qfq"
+ ],
+ "plugins": {
+ "requires": "nsPlugin"
+ },
+ "setup": [
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.10.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY handle 1: root qfq",
+ "$TC class add dev $DUMMY parent 1: classid 1:10 qfq weight 1 maxpkt 1000",
+ "$TC qdisc add dev $DUMMY parent 1:10 handle 10: fq_codel memory_limit 1 flows 1 target 0.1ms interval 1ms",
+ "$TC filter add dev $DUMMY parent 1: protocol ip prio 1 u32 match ip protocol 1 0xff flowid 1:10",
+ "ping -c 10 -s 1000 -f -I $DUMMY 10.10.10.1 > /dev/null || true",
+ "sleep 0.1"
+ ],
+ "cmdUnderTest": "$TC -s qdisc show dev $DUMMY",
+ "expExitCode": "0",
+ "verifyCmd": "$TC -s qdisc show dev $DUMMY | grep -A 5 'qdisc fq_codel'",
+ "matchPattern": "dropped [1-9][0-9]*",
+ "matchCount": "1",
+ "teardown": [
+ "$TC qdisc del dev $DUMMY handle 1: root",
+ "$IP addr del 10.10.10.10/24 dev $DUMMY || true"
+ ]
+ },
+ {
+ "id": "a4bf",
+ "name": "Test FQ_CODEL with HFSC parent - force packet drop with empty queue",
+ "category": [
+ "qdisc",
+ "fq_codel",
+ "hfsc"
+ ],
+ "plugins": {
+ "requires": "nsPlugin"
+ },
+ "setup": [
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.10.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY handle 1: root hfsc default 10",
+ "$TC class add dev $DUMMY parent 1: classid 1:10 hfsc sc rate 1kbit ul rate 1kbit",
+ "$TC qdisc add dev $DUMMY parent 1:10 handle 10: fq_codel memory_limit 1 flows 1 target 0.1ms interval 1ms",
+ "$TC filter add dev $DUMMY parent 1: protocol ip prio 1 u32 match ip protocol 1 0xff flowid 1:10",
+ "ping -c 5 -f -I $DUMMY 10.10.10.1 > /dev/null || true",
+ "sleep 0.1"
+ ],
+ "cmdUnderTest": "$TC -s qdisc show dev $DUMMY",
+ "expExitCode": "0",
+ "verifyCmd": "$TC -s qdisc show dev $DUMMY | grep -A 5 'qdisc fq_codel'",
+ "matchPattern": "dropped [1-9][0-9]*",
+ "matchCount": "1",
+ "teardown": [
+ "$TC qdisc del dev $DUMMY handle 1: root",
+ "$IP addr del 10.10.10.10/24 dev $DUMMY || true"
+ ]
+ },
+ {
+ "id": "a4c0",
+ "name": "Test FQ_CODEL with DRR parent - force packet drop with empty queue",
+ "category": [
+ "qdisc",
+ "fq_codel",
+ "drr"
+ ],
+ "plugins": {
+ "requires": "nsPlugin"
+ },
+ "setup": [
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.10.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY handle 1: root drr",
+ "$TC class add dev $DUMMY parent 1: classid 1:10 drr quantum 1500",
+ "$TC qdisc add dev $DUMMY parent 1:10 handle 10: fq_codel memory_limit 1 flows 1 target 0.1ms interval 1ms",
+ "$TC filter add dev $DUMMY parent 1: protocol ip prio 1 u32 match ip protocol 1 0xff flowid 1:10",
+ "ping -c 5 -f -I $DUMMY 10.10.10.1 > /dev/null || true",
+ "sleep 0.1"
+ ],
+ "cmdUnderTest": "$TC -s qdisc show dev $DUMMY",
+ "expExitCode": "0",
+ "verifyCmd": "$TC -s qdisc show dev $DUMMY | grep -A 5 'qdisc fq_codel'",
+ "matchPattern": "dropped [1-9][0-9]*",
+ "matchCount": "1",
+ "teardown": [
+ "$TC qdisc del dev $DUMMY handle 1: root",
+ "$IP addr del 10.10.10.10/24 dev $DUMMY || true"
+ ]
+ },
+ {
+ "id": "a4c1",
+ "name": "Test FQ_CODEL with ETS parent - force packet drop with empty queue",
+ "category": [
+ "qdisc",
+ "fq_codel",
+ "ets"
+ ],
+ "plugins": {
+ "requires": "nsPlugin"
+ },
+ "setup": [
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.10.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY handle 1: root ets bands 2 strict 1",
+ "$TC class change dev $DUMMY parent 1: classid 1:1 ets",
+ "$TC qdisc add dev $DUMMY parent 1:1 handle 10: fq_codel memory_limit 1 flows 1 target 0.1ms interval 1ms",
+ "$TC filter add dev $DUMMY parent 1: protocol ip prio 1 u32 match ip protocol 1 0xff flowid 1:1",
+ "ping -c 5 -f -I $DUMMY 10.10.10.1 > /dev/null || true",
+ "sleep 0.1"
+ ],
+ "cmdUnderTest": "$TC -s qdisc show dev $DUMMY",
+ "expExitCode": "0",
+ "verifyCmd": "$TC -s qdisc show dev $DUMMY | grep -A 5 'qdisc fq_codel'",
+ "matchPattern": "dropped [1-9][0-9]*",
+ "matchCount": "1",
+ "teardown": [
+ "$TC qdisc del dev $DUMMY handle 1: root",
+ "$IP addr del 10.10.10.10/24 dev $DUMMY || true"
+ ]
}
]
diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/sfq.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/sfq.json
index 50e8d72..28c6ce6 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/sfq.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/sfq.json
@@ -228,5 +228,41 @@
"matchCount": "0",
"teardown": [
]
+ },
+ {
+ "id": "7f8f",
+ "name": "Check that a derived limit of 1 is rejected (limit 2 depth 1 flows 1)",
+ "category": [
+ "qdisc",
+ "sfq"
+ ],
+ "plugins": {
+ "requires": "nsPlugin"
+ },
+ "setup": [],
+ "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root sfq limit 2 depth 1 flows 1",
+ "expExitCode": "2",
+ "verifyCmd": "$TC qdisc show dev $DUMMY",
+ "matchPattern": "sfq",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "5168",
+ "name": "Check that a derived limit of 1 is rejected (limit 2 depth 1 divisor 1)",
+ "category": [
+ "qdisc",
+ "sfq"
+ ],
+ "plugins": {
+ "requires": "nsPlugin"
+ },
+ "setup": [],
+ "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root sfq limit 2 depth 1 divisor 1",
+ "expExitCode": "2",
+ "verifyCmd": "$TC qdisc show dev $DUMMY",
+ "matchPattern": "sfq",
+ "matchCount": "0",
+ "teardown": []
}
]
diff --git a/tools/testing/selftests/tpm2/.gitignore b/tools/testing/selftests/tpm2/.gitignore
new file mode 100644
index 0000000..6d6165c5
--- /dev/null
+++ b/tools/testing/selftests/tpm2/.gitignore
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+AsyncTest.log
+SpaceTest.log
diff --git a/tools/testing/selftests/tpm2/test_smoke.sh b/tools/testing/selftests/tpm2/test_smoke.sh
index 168f4b1..3a60e6c 100755
--- a/tools/testing/selftests/tpm2/test_smoke.sh
+++ b/tools/testing/selftests/tpm2/test_smoke.sh
@@ -6,6 +6,6 @@
[ -e /dev/tpm0 ] || exit $ksft_skip
read tpm_version < /sys/class/tpm/tpm0/tpm_version_major
-[ "$tpm_version" == 2 ] || exit $ksft_skip
+[ "$tpm_version" = 2 ] || exit $ksft_skip
python3 -m unittest -v tpm2_tests.SmokeTest 2>&1
diff --git a/tools/testing/selftests/ublk/test_stripe_04.sh b/tools/testing/selftests/ublk/test_stripe_04.sh
new file mode 100755
index 0000000..1f2b6423
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_stripe_04.sh
@@ -0,0 +1,24 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+TID="stripe_04"
+ERR_CODE=0
+
+_prep_test "stripe" "mkfs & mount & umount on zero copy"
+
+backfile_0=$(_create_backfile 256M)
+backfile_1=$(_create_backfile 256M)
+dev_id=$(_add_ublk_dev -t stripe -z -q 2 "$backfile_0" "$backfile_1")
+_check_add_dev $TID $? "$backfile_0" "$backfile_1"
+
+_mkfs_mount_test /dev/ublkb"${dev_id}"
+ERR_CODE=$?
+
+_cleanup_test "stripe"
+
+_remove_backfile "$backfile_0"
+_remove_backfile "$backfile_1"
+
+_show_result $TID $ERR_CODE
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 746e1f4..727b542 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -75,7 +75,7 @@
depends on KVM && COMPAT && !(S390 || ARM64 || RISCV)
config HAVE_KVM_IRQ_BYPASS
- bool
+ tristate
select IRQ_BYPASS_MANAGER
config HAVE_KVM_VCPU_ASYNC_IOCTL
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 249ba5b..11e5d1e 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -149,7 +149,7 @@ irqfd_shutdown(struct work_struct *work)
/*
* It is now safe to release the object's resources
*/
-#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
irq_bypass_unregister_consumer(&irqfd->consumer);
#endif
eventfd_ctx_put(irqfd->eventfd);
@@ -274,7 +274,7 @@ static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
write_seqcount_end(&irqfd->irq_entry_sc);
}
-#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
void __attribute__((weak)) kvm_arch_irq_bypass_stop(
struct irq_bypass_consumer *cons)
{
@@ -424,7 +424,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
if (events & EPOLLIN)
schedule_work(&irqfd->inject);
-#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
if (kvm_arch_has_irq_bypass()) {
irqfd->consumer.token = (void *)irqfd->eventfd;
irqfd->consumer.add_producer = kvm_arch_irq_bypass_add_producer;
@@ -609,14 +609,14 @@ void kvm_irq_routing_update(struct kvm *kvm)
spin_lock_irq(&kvm->irqfds.lock);
list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
-#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
/* Under irqfds.lock, so can read irq_entry safely */
struct kvm_kernel_irq_routing_entry old = irqfd->irq_entry;
#endif
irqfd_update(kvm, irqfd);
-#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+#if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
if (irqfd->producer &&
kvm_arch_irqfd_route_changed(&old, &irqfd->irq_entry)) {
int ret = kvm_arch_update_irqfd_routing(