summaryrefslogtreecommitdiff
path: root/drivers/cxl/core/region.c
AgeCommit message (Collapse)AuthorFilesLines
2024-07-28Merge tag 'cxl-for-6.11' of ↵Linus Torvalds1-33/+70
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull CXL updates from Dave Jiang: "Core: - A CXL maturity map has been added to the documentation to detail the current state of CXL enabling. It provides the status of the current state of various CXL features to inform current and future contributors of where things are and which areas need contribution. - A notifier handler has been added in order for a newly created CXL memory region to trigger the abstract distance metrics calculation. This should bring parity for CXL memory to the same level vs hotplugged DRAM for NUMA abstract distance calculation. The abstract distance reflects relative performance used for memory tiering handling. - An addition for XOR math has been added to address the CXL DPA to SPA translation. CXL address translation did not support address interleave math with XOR prior to this change. Fixes: - Fix to address race condition in the CXL memory hotplug notifier - Add missing MODULE_DESCRIPTION() for CXL modules - Fix incorrect vendor debug UUID define Misc: - A warning has been added to inform users of an unsupported configuration when mixing CXL VH and RCH/RCD hierarchies - The ENXIO error code has been replaced with EBUSY for inject poison limit reached via debugfs and cxl-test support - Moving the PCI config read in cxl_dvsec_rr_decode() to avoid unnecessary PCI config reads - A refactor to a common struct for DRAM and general media CXL events" * tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/core/pci: Move reading of control register to immediately before usage cxl: Remove defunct code calculating host bridge target positions cxl/region: Verify target positions using the ordered target list cxl: Restore XOR'd position bits during address translation cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa() cxl/test: Replace ENXIO with EBUSY for inject poison limit reached cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached cxl/acpi: Warn on mixed CXL VH and RCH/RCD Hierarchy cxl/core: Fix incorrect vendor debug UUID define Documentation: CXL Maturity Map cxl/region: Simplify cxl_region_nid() cxl/region: Support to calculate memory tier abstract distance cxl/region: Fix a race condition in memory hotplug notifier cxl: add missing MODULE_DESCRIPTION() macros cxl/events: Use a common struct for DRAM and General Media events
2024-07-12Merge branch 'for-6.11/xor_fixes' into cxl-for-nextDave Jiang1-29/+30
Series to fix XOR math for DPA to SPA translation - Refactor and fold cxl_trace_hpa() into cxl_dpa_to_hpa() - Complete DPA->HPA->SPA translation and correct XOR translation issue - Add new method to verify a CXL target position - Remove old method of CXL target position verifiation
2024-07-12cxl/region: Verify target positions using the ordered target listAlison Schofield1-1/+4
When a root decoder is configured the interleave target list is read from the BIOS populated CFMWS structure. Per the CXL spec 3.1 Table 9-22 the target list is in interleave order. The CXL driver populates its decoder target list in the same order and stores it in 'struct cxl_switch_decoder' field "@target: active ordered target list in current decoder configuration" Given the promise of an ordered list, the driver can stop duplicating the work of BIOS and simply check target positions against the ordered list during region configuration. The simplified check against the ordered list is presented here. A follow-on patch will remove the unused code. For Modulo arithmetic this is not a fix, only a simplification. For XOR arithmetic this is a fix for HB IW of 3,6,12. Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/35d08d3aba08fee0f9b86ab1cef0c25116ca8a55.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-12cxl: Restore XOR'd position bits during address translationAlison Schofield1-9/+14
When a device reports a DPA in events like poison, general_media, and dram, the driver translates that DPA back to an HPA. Presently, the CXL driver translation only considers the Modulo position and will report the wrong HPA for XOR configured root decoders. Add a helper function that restores the XOR'd bits during DPA->HPA address translation. Plumb a root decoder callback to the new helper when XOR interleave arithmetic is in use. For Modulo arithmetic, just let the callback be NULL - as in no extra work required. Upon completion of a DPA->HPA translation a couple of checks are performed on the result. One simply confirms that the calculated HPA is within the address range of the region. That test is useful for both Modulo and XOR interleave arithmetic decodes. A second check confirms that the HPA is within an expected chunk based on the endpoints position in the region and the region granularity. An XOR decode disrupts the Modulo pattern making the chunk check useless. To align the checks with the proper decode, pull the region range check inline and use the helper to do the chunk check for Modulo decodes only. A cxl-test unit test is posted for upstream review here: https://lore.kernel.org/20240624210644.495563-1-alison.schofield@intel.com/ Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/1a1ac880d9f889bd6384e657e810431b9a0a72e5.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-12cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()Alison Schofield1-20/+13
Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA addresses the work it performs is a DPA to HPA translation not a trace. Tidy up this naming by moving the minimal work done in cxl_trace_hpa() into cxl_dpa_to_hpa() and use cxl_dpa_to_hpa() for trace event callbacks. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Robert Richter <rrichter@amd.com> Link: https://patch.msgid.link/452a9b0c525b774c72d9d5851515ffa928750132.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-02cxl/region: Simplify cxl_region_nid()Huang Ying1-6/+4
The node ID of the region can be gotten via resource start address directly. This simplifies the implementation of cxl_region_nid(). Signed-off-by: Huang Ying <ying.huang@intel.com> Suggested-by: Alison Schofield <alison.schofield@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Bharata B Rao <bharata@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240618084639.1419629-4-ying.huang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-02cxl/region: Support to calculate memory tier abstract distanceHuang Ying1-0/+27
An abstract distance value must be assigned by the driver that makes the memory available to the system. It reflects relative performance and is used to place memory nodes backed by CXL regions in the appropriate memory tiers allowing promotion/demotion within the existing memory tiering mechanism. The abstract distance is calculated based on the memory access latency and bandwidth of CXL regions. Signed-off-by: Huang, Ying <ying.huang@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Bharata B Rao <bharata@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240618084639.1419629-3-ying.huang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-02cxl/region: Fix a race condition in memory hotplug notifierHuang Ying1-4/+15
In the memory hotplug notifier function of the CXL region, cxl_region_perf_attrs_callback(), the node ID is obtained by checking the host address range of the region. However, the address range information is not available when the region is registered in devm_cxl_add_region(). Additionally, this information may be removed or added under the protection of cxl_region_rwsem during runtime. If the memory notifier is called for nodes other than that backed by the region, a race condition may occur, potentially leading to a NULL dereference or an invalid address range. The race condition is addressed by checking the availability of the address range information under the protection of cxl_region_rwsem. To enhance code readability and use guard(), the relevant code has been moved into a newly added function: cxl_region_nid(). Fixes: 067353a46d8c ("cxl/region: Add memory hotplug notifier for cxl region") Signed-off-by: Huang, Ying <ying.huang@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Bharata B Rao <bharata@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240618084639.1419629-2-ying.huang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-06-26cxl/region: check interleave capabilityYao Xingtao1-0/+82
Since interleave capability is not verified, if the interleave capability of a target does not match the region need, committing decoder should have failed at the device end. In order to checkout this error as quickly as possible, driver needs to check the interleave capability of target during attaching it to region. Per CXL specification r3.1(8.2.4.20.1 CXL HDM Decoder Capability Register), bits 11 and 12 indicate the capability to establish interleaving in 3, 6, 12 and 16 ways. If these bits are not set, the target cannot be attached to a region utilizing such interleave ways. Additionally, bits 8 and 9 represent the capability of the bits used for interleaving in the address, Linux tracks this in the cxl_port interleave_mask. Per CXL specification r3.1(8.2.4.20.13 Decoder Protection): eIW means encoded Interleave Ways. eIG means encoded Interleave Granularity. in HPA: if eIW is 0 or 8 (interleave ways: 1, 3), all the bits of HPA are used, the interleave bits are none, the following check is ignored. if eIW is less than 8 (interleave ways: 2, 4, 8, 16), the interleave bits start at bit position eIG + 8 and end at eIG + eIW + 8 - 1. if eIW is greater than 8 (interleave ways: 6, 12), the interleave bits start at bit position eIG + 8 and end at eIG + eIW - 1. if the interleave mask is insufficient to cover the required interleave bits, the target cannot be attached to the region. Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240614084755.59503-2-yaoxt.fnst@fujitsu.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-06-26cxl/region: Avoid null pointer dereference in region lookupAlison Schofield1-4/+15
cxl_dpa_to_region() looks up a region based on a memdev and DPA. It wrongly assumes an endpoint found mapping the DPA is also of a fully assembled region. When not true it leads to a null pointer dereference looking up the region name. This appears during testing of region lookup after a failure to assemble a BIOS defined region or if the lookup raced with the assembly of the BIOS defined region. Failure to clean up BIOS defined regions that fail assembly is an issue in itself and a fix to that problem will alleviate some of the impact. It will not alleviate the race condition so let's harden this path. The behavior change is that the kernel oops due to a null pointer dereference is replaced with a dev_dbg() message noting that an endpoint was mapped. Additional comments are added so that future users of this function can more clearly understand what it provides. Fixes: 0a105ab28a4d ("cxl/memdev: Warn of poison inject or clear to a mapped region") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240604003609.202682-1-alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-06-19cxl/mem: Fix no cxl_nvd during pmem region auto-assemblingLi Ming1-1/+1
When CXL subsystem is auto-assembling a pmem region during cxl endpoint port probing, always hit below calltrace. BUG: kernel NULL pointer dereference, address: 0000000000000078 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page RIP: 0010:cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem] Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x160 ? do_user_addr_fault+0x65/0x6b0 ? exc_page_fault+0x7d/0x170 ? asm_exc_page_fault+0x26/0x30 ? cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem] ? cxl_pmem_region_probe+0x1ac/0x360 [cxl_pmem] cxl_bus_probe+0x1b/0x60 [cxl_core] really_probe+0x173/0x410 ? __pfx___device_attach_driver+0x10/0x10 __driver_probe_device+0x80/0x170 driver_probe_device+0x1e/0x90 __device_attach_driver+0x90/0x120 bus_for_each_drv+0x84/0xe0 __device_attach+0xbc/0x1f0 bus_probe_device+0x90/0xa0 device_add+0x51c/0x710 devm_cxl_add_pmem_region+0x1b5/0x380 [cxl_core] cxl_bus_probe+0x1b/0x60 [cxl_core] The cxl_nvd of the memdev needs to be available during the pmem region probe. Currently the cxl_nvd is registered after the endpoint port probe. The endpoint probe, in the case of autoassembly of regions, can cause a pmem region probe requiring the not yet available cxl_nvd. Adjust the sequence so this dependency is met. This requires adding a port parameter to cxl_find_nvdimm_bridge() that can be used to query the ancestor root port. The endpoint port is not yet available, but will share a common ancestor with its parent, so start the query from there instead. Fixes: f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue") Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Li Ming <ming4.li@intel.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20240612064423.2567625-1-ming4.li@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-29cxl/region: Fix memregion leaks in devm_cxl_add_region()Li Zhijian1-9/+9
Move the mode verification to __create_region() before allocating the memregion to avoid the memregion leaks. Fixes: 6e099264185d ("cxl/region: Add volatile region creation support") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240507053421.456439-1-lizhijian@fujitsu.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-01cxl/region: Convert cxl_pmem_region_alloc to scope-based resource managementDan Williams1-26/+17
A recent bugfix to cxl_pmem_region_alloc() to fix an error-unwind-memleak [1], highlighted a use case for scope-based resource management. Delete the goto for releasing @cxl_region_rwsem, and return error codes directly from error condition paths. The caller, devm_cxl_add_pmem_region(), is no longer given @cxlr_pmem directly it must retrieve it from @cxlr->cxlr_pmem. This retrieval from @cxlr was already in place for @cxlr->cxl_nvb, and converting cxl_pmem_region_alloc() to return an int makes it less awkward to handle no_free_ptr(). Cc: Li Zhijian <lizhijian@fujitsu.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Closes: http://lore.kernel.org/r/20240430174540.000039ce@Huawei.com Link: http://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/171451430965.1147997.15782562063090960666.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-01cxl/region: Fix cxlr_pmem leaksLi Zhijian1-0/+1
Before this error path, cxlr_pmem pointed to a kzalloc() memory, free it to avoid this memory leaking. Fixes: f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/region: Move cxl_trace_hpa() work to the region driverAlison Schofield1-0/+91
This work belongs in the region driver as it is only useful with CONFIG_CXL_REGION. Add a stub in core.h for when the region driver is not built. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/183222631f11a43c5e6debc42ec22fe1bd4b818a.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/region: Move cxl_dpa_to_region() work to the region driverAlison Schofield1-0/+44
This helper belongs in the region driver as it is only useful with CONFIG_CXL_REGION. Add a stub in core.h for when the region driver is not built. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/05e30f788d62b3dd398aff2d2ea50a6aaa7c3313.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-03-13cxl/region: Deal with numa nodes not enumerated by SRATDave Jiang1-1/+6
For the numa nodes that are not created by SRAT, no memory_target is allocated and is not managed by the HMAT_REPORTING code. Therefore hmat_callback() memory hotplug notifier will exit early on those NUMA nodes. The CXL memory hotplug notifier will need to call node_set_perf_attrs() directly in order to setup the access sysfs attributes. In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is stored. Add a helper function acpi_node_backed_by_real_pxm() in order to check if a NUMA node id is defined by SRAT or created by CFMWS. node_set_perf_attrs() symbol is exported to allow update of perf attribs for a node. The sysfs path of /sys/devices/system/node/nodeX/access0/initiators/* is created by node_set_perf_attrs() for the various attributes where nodeX is matched to the NUMA node of the CXL region. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-13-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl/region: Add memory hotplug notifier for cxl regionDave Jiang1-0/+68
When the CXL region is formed, the driver computes the performance data for the region. However this data is not available at the node data collection that has been populated by the HMAT during kernel initialization. Add a memory hotplug notifier to update the access coordinates to the 'struct memory_target' context kept by the HMAT_REPORTING code. Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the priority number to be called before HMAT_CALLBACK_PRI. The CXL update must happen before hmat_callback(). A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in order to allow CXL to update the memory_target access coordinates. A new ext_updated member is added to the memory_target to indicate that the access coordinates within the memory_target has been updated by an external agent such as CXL. This prevents data being overwritten by the hmat_update_target_attrs() triggered by hmat_callback(). Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Huang, Ying <ying.huang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-12-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl/region: Add sysfs attribute for locality attributes of CXL regionsDave Jiang1-0/+94
Add read/write latencies and bandwidth sysfs attributes for the enabled CXL region. The bandwidth is the aggregated bandwidth of all devices that contribute to the CXL region. The latency is the worst latency of the device amongst all the devices that contribute to the CXL region. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-11-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl/region: Calculate performance data for a regionDave Jiang1-0/+2
Calculate and store the performance data for a CXL region. Find the worst read and write latency for all the included ranges from each of the devices that attributes to the region and designate that as the latency data. Sum all the read and write bandwidth data for each of the device region and that is the total bandwidth for the region. The perf list is expected to be constructed before the endpoint decoders are registered and thus there should be no early reading of the entries from the region assemble action. The calling of the region qos calculate function is under the protection of cxl_dpa_rwsem and will ensure that all DPA associated work has completed. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-10-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-17cxl/region: Allow out of order assembly of autodiscovered regionsAlison Schofield1-10/+38
Autodiscovered regions can fail to assemble if they are not discovered in HPA decode order. The user will see failure messages like: [] cxl region0: endpoint5: HPA order violation region1 [] cxl region0: endpoint5: failed to allocate region reference The check that is causing the failure helps the CXL driver enforce a CXL spec mandate that decoders be committed in HPA order. The check is needless for autodiscovered regions since their decoders are already programmed. Trying to enforce order in the assembly of these regions is useless because they are assembled once all their member endpoints arrive, and there is no guarantee on the order in which endpoints are discovered during probe. Keep the existing check, but for autodiscovered regions, allow the out of order assembly after a sanity check that the lesser numbered decoder has the lesser HPA starting address. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Wonjae Lee <wj28.lee@samsung.com> Link: https://lore.kernel.org/r/3dec69ee97524ab229a20c6739272c3000b18408.1706736863.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-17cxl/region: Handle endpoint decoders in cxl_region_find_decoder()Alison Schofield1-6/+8
In preparation for adding a new caller of cxl_region_find_decoders() teach it to find a decoder from a cxl_endpoint_decoder structure. Combining switch and endpoint decoder lookup in one function prevents code duplication in call sites. Update the existing caller. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Wonjae Lee <wj28.lee@samsung.com> Link: https://lore.kernel.org/r/79ae6d72978ef9f3ceec9722e1cb793820553c8e.1706736863.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-25cxl/region:Fix overflow issue in alloc_hpa()Quanquan Cao1-2/+2
Creating a region with 16 memory devices caused a problem. The div_u64_rem function, used for dividing an unsigned 64-bit number by a 32-bit one, faced an issue when SZ_256M * p->interleave_ways. The result surpassed the maximum limit of the 32-bit divisor (4G), leading to an overflow and a remainder of 0. note: At this point, p->interleave_ways is 16, meaning 16 * 256M = 4G To fix this issue, I replaced the div_u64_rem function with div64_u64_rem and adjusted the type of the remainder. Signed-off-by: Quanquan Cao <caoqq@fujitsu.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Fixes: 23a22cd1c98b ("cxl/region: Allocate HPA capacity to regions") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-06Merge branch 'for-6.8/cxl-misc' into for-6.8/cxlDan Williams1-1/+1
Pick up some miscellaneous fixups for v6.8.
2024-01-04cxl/region: fix x9 interleave typoJim Harris1-1/+1
CXL supports x3, x6 and x12 - not x9. Fixes: 80d10a6cee050 ("cxl/region: Add interleave geometry attributes") Signed-off-by: Jim Harris <jim.harris@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/169904271254.204936.8580772404462743630.stgit@ubuntu Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-02Merge branch 'for-6.8/cxl-cdat' into for-6.8/cxlDan Williams1-5/+0
Pick up the CDAT parsing and QOS class infrastructure for v6.8.
2024-01-02cxl/region: use %pap format to print resource_size_tRandy Dunlap1-2/+2
Use "%pap" to print a resource_size_t (phys_addr_t derived type) to prevent build warnings on 32-bit arches (seen on i386 and riscv-32). ../drivers/cxl/core/region.c: In function 'alloc_hpa': ../drivers/cxl/core/region.c:556:25: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type 'resource_size_t' {aka 'unsigned int'} [-Wformat=] 556 | "HPA allocation error (%ld) for size:%#llx in %s %pr\n", Fixes: 7984d22f1315 ("cxl/region: Add dev_dbg() detail on failure to allocate HPA space") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Fan Ni <fan.ni@samsung.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <linux-cxl@vger.kernel.org> Link: https://lore.kernel.org/r/20240102173917.19718-1-rdunlap@infradead.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-25cxl/region: Add dev_dbg() detail on failure to allocate HPA spaceAlison Schofield1-2/+3
When the region driver fails while allocating HPA space for a new region it can be because the parent resource, the CXL Window, has no more available space. In that case, the debug user sees this message: cxl_core:alloc_hpa:555: cxl region2: failed to allocate HPA: -34 Expand the message like this: cxl_core:alloc_hpa:555: cxl region8: HPA allocation error (-34) for size:0x20000000 in CXL Window 0 [mem 0xf010000000-0xf04fffffff flags 0x200] Now the debug user can examine /proc/iomem and consider actions like removing other allocations in that space or reducing the size of their region request. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/20231223004740.1401858-1-alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-19cxl: Fix unregister_region() callback parameter assignmentDave Jiang1-4/+4
In devm_cxl_add_region(), devm_add_action_or_reset() is called by passing in unregister_region() with data ptr of 'cxlr'. However, in unregister_region(), the passed in parameter is incorrectly assumed to be a 'struct device' rather than the 'cxlr' pointer. The code has been working because 'struct device' is the first member of 'struct cxl_region'. Issue found by inspection. Fix the assignment so that cxlr is pointing directly to the passed in parameter. Not flagged for -stable since there is no functional impact of this fix. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/170258123810.952211.3907381447996426480.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-11-30cxl/core: Always hold region_rwsem while reading poison listsAlison Schofield1-5/+0
A read of a device poison list is triggered via a sysfs attribute and the results are logged as kernel trace events of type cxl_poison. The work is managed by either: a) the region driver when one of more regions map the device, or by b) the memdev driver when no regions map the device. In the case of a) the region driver holds the region_rwsem while reading the poison by committed endpoint decoder mappings and for any unmapped resources. This makes sure that the cxl_poison trace event trace reports valid region info. (Region name, HPA, and UUID). In the case of b) the memdev driver holds the dpa_rwsem preventing new DPA resources from being attached to a region. However, it leaves a gap between region attach and decoder commit actions. If a DPA in the gap is in the poison list, the cxl_poison trace event will omit the region info. Close the gap by holding the region_rwsem and the dpa_rwsem when reading poison per memdev. Since both methods now hold both locks, down_read both from the caller. Doing so also addresses the lockdep assert that found this issue: Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/08e8e7ec9a3413b91d51de39e385653494b1eed0.1701041440.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-11-01cxl/hdm: Remove broken error pathDan Williams1-0/+8
Dan reports that cxl_decoder_commit() potentially leaks a hold of cxl_dpa_rwsem. The potential error case is a "should not" happen scenario, turn it into a "can not" happen scenario by adding the error check to cxl_port_setup_targets() where other setting validation occurs. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: http://lore.kernel.org/r/63295673-5d63-4919-b851-3b06d48734c0@moroto.mountain Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-31Merge branch 'for-6.7/cxl' into cxl/nextDan Williams1-1/+1
Pickup some misc. CXL updates for v6.7.
2023-10-27cxl/region: Fix x1 root-decoder granularity calculationsJim Harris1-1/+8
Root decoder granularity must match value from CFWMS, which may not be the region's granularity for non-interleaved root decoders. So when calculating granularities for host bridge decoders, use the region's granularity instead of the root decoder's granularity to ensure the correct granularities are set for the host bridge decoders and any downstream switch decoders. Test configuration is 1 host bridge * 2 switches * 2 endpoints per switch. Region created with 2048 granularity using following command line: cxl create-region -m -d decoder0.0 -w 4 mem0 mem2 mem1 mem3 \ -g 2048 -s 2048M Use "cxl list -PDE | grep granularity" to get a view of the granularity set at each level of the topology. Before this patch: "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":512, "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":512, "interleave_granularity":256, After: "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":4096, "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":4096, "interleave_granularity":2048, Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Cc: <stable@vger.kernel.org> Signed-off-by: Jim Harris <jim.harris@samsung.com> Link: https://lore.kernel.org/r/169824893473.1403938.16110924262989774582.stgit@bgt-140510-bm03.eng.stellus.in [djbw: fixup the prebuilt cxl_test region] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/region: Fix cxl_region_rwsem lock held when returning to user spaceLi Zhijian1-1/+1
Fix a missed "goto out" to unlock on error to cleanup this splat: WARNING: lock held when returning to user space! 6.6.0-rc3-lizhijian+ #213 Not tainted ------------------------------------------------ cxl/673 is leaving the kernel with locks still held! 1 lock held by cxl/673: #0: ffffffffa013b9d0 (cxl_region_rwsem){++++}-{3:3}, at: commit_store+0x7d/0x3e0 [cxl_core] In terms of user visible impact of this bug for backports: cxl_region_invalidate_memregion() on x86 invokes wbinvd which is a problematic instruction for virtualized environments. So, on virtualized x86, cxl_region_invalidate_memregion() returns an error. This failure case got missed because CXL memory-expander device passthrough is not a production use case, and emulation of CXL devices is typically limited to kernel development builds with CONFIG_CXL_REGION_INVALIDATION_TEST=y, that makes cxl_region_invalidate_memregion() succeed. In other words, the expected exposure of this bug is limited to CXL subsystem development environments using QEMU that neglected CONFIG_CXL_REGION_INVALIDATION_TEST=y. Fixes: d1257d098a5a ("cxl/region: Move cache invalidation before region teardown, and before setup") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231025085450.2514906-1-lizhijian@fujitsu.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/region: Use cxl_calc_interleave_pos() for auto-discoveryAlison Schofield1-112/+15
For auto-discovered regions the driver must assign each target to a valid position in the region interleave set based on the decoder topology. The current implementation fails to parse valid decode topologies, as it does not consider the child offset into a parent port. The sort put all targets of one port ahead of another port when an interleave was expected, causing the region assembly to fail. Replace the existing relative sort with cxl_calc_interleave_pos() that finds the exact position in a region interleave for an endpoint based on a walk up the ancestral tree from endpoint to root decoder. cxl_calc_interleave_pos() was introduced in a prior patch, so the work here is to use it in cxl_region_sort_targets(). Remove the obsoleted helper functions from the prior sort. Testing passes on pre-production hardware with BIOS defined regions that natively trigger this autodiscovery path of the region driver. Testing passes a CXL unit test using the dev_dbg() calculation test (see cxl_region_attach()) across an expanded set of region configs: 1, 1, 1+1, 1+1+1, 2, 2+2, 2+2+2, 2+2+2+2, 4, 4+4, where each number represents the count of endpoints per host bridge. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Reported-by: Dmytro Adamenko <dmytro.adamenko@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jim Harris <jim.harris@samsung.com> Link: https://lore.kernel.org/r/3946cc55ddc19678733eddc9de2c317749f43f3b.1698263080.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/region: Calculate a target position in a region interleaveAlison Schofield1-0/+127
Introduce a calculation to find a target's position in a region interleave. Perform a self-test of the calculation on user-defined regions. The region driver uses the kernel sort() function to put region targets in relative order. Positions are assigned based on each target's index in that sorted list. That relative sort doesn't consider the offset of a port into its parent port which causes some auto-discovered regions to fail creation. In one failure case, a 2 + 2 config (2 host bridges each with 2 endpoints), the sort puts all the targets of one port ahead of another port when they were expected to be interleaved. In preparation for repairing the autodiscovery region assembly, introduce a new method for discovering a target position in the region interleave. cxl_calc_interleave_pos() adds a method to find the target position by ascending from an endpoint to a root decoder. The calculation starts with the endpoint's local position and position in the parent port. It traverses towards the root decoder and examines both position and ways in order to allow the position to be refined all the way to the root decoder. This calculation: position = position * parent_ways + parent_pos; applied iteratively yields the correct position. Include a self-test that exercises this new position calculation against every successfully configured user-defined region. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/0ac32c75cf81dd8b86bf07d70ff139d33c2300bc.1698263080.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-26cxl/region: Prepare the decoder match range helper for reuseAlison Schofield1-6/+11
match_decoder_by_range() and decoder_match_range() both determine if an HPA range matches a decoder. The first does it for root decoders and the second one operates on switch decoders. Tidy these up with clear naming and make the switch helper more like the root decoder helper in style and functionality. Make it take the actual range, rather than an endpoint decoder from which it extracts the range. Require an exact match on switch decoders, because unlike a root decoder that maps an entire region, Linux only supports 1:1 mapping of switch to endpoint decoders. Note that root-decoders are a super-set of switch-decoders and the range they cover is a super-set of a region, hence the use of range_contains() for that case. Aside from aesthetics and maintainability, this is in preparation for reuse. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jim Harris <jim.harris@samsung.com> Link: https://lore.kernel.org/r/011b1f498e1758bb8df17c5951be00bd8d489e3b.1698263080.git.alison.schofield@intel.com [djbw: fixup root decoder vs switch decoder range checks] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-25cxl/region: Do not try to cleanup after cxl_region_setup_targets() failsJim Harris1-7/+7
Commit 5e42bcbc3fef ("cxl/region: decrement ->nr_targets on error in cxl_region_attach()") tried to avoid 'eiw' initialization errors when ->nr_targets exceeded 16, by just decrementing ->nr_targets when cxl_region_setup_targets() failed. Commit 86987c766276 ("cxl/region: Cleanup target list on attach error") extended that cleanup to also clear cxled->pos and p->targets[pos]. The initialization error was incidentally fixed separately by: Commit 8d4285425714 ("cxl/region: Fix port setup uninitialized variable warnings") which was merged a few days after 5e42bcbc3fef. But now the original cleanup when cxl_region_setup_targets() fails prevents endpoint and switch decoder resources from being reused: 1) the cleanup does not set the decoder's region to NULL, which results in future dpa_size_store() calls returning -EBUSY 2) the decoder is not properly freed, which results in future commit errors associated with the upstream switch Now that the initialization errors were fixed separately, the proper cleanup for this case is to just return immediately. Then the resources associated with this target get cleanup up as normal when the failed region is deleted. The ->nr_targets decrement in the error case also helped prevent a p->targets[] array overflow, so add a new check to prevent against that overflow. Tested by trying to create an invalid region for a 2 switch * 2 endpoint topology, and then following up with creating a valid region. Fixes: 5e42bcbc3fef ("cxl/region: decrement ->nr_targets on error in cxl_region_attach()") Cc: <stable@vger.kernel.org> Signed-off-by: Jim Harris <jim.harris@samsung.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169703589120.1202031.14696100866518083806.stgit@bgt-140510-bm03.eng.stellus.in Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06cxl/memdev: Fix sanitize vs decoder setup lockingDan Williams1-6/+0
The sanitize operation is destructive and the expectation is that the device is unmapped while in progress. The current implementation does a lockless check for decoders being active, but then does nothing to prevent decoders from racing to be committed. Introduce state tracking to resolve this race. This incidentally cleans up unpriveleged userspace from triggering mmio read cycles by spinning on reading the 'security/state' attribute. Which at a minimum is a waste since the kernel state machine can cache the completion result. Lastly cxl_mem_sanitize() was mistakenly marked EXPORT_SYMBOL() in the original implementation, but an export was never required. Fixes: 0c36b6ad436a ("cxl/mbox: Add sanitization handling machinery") Cc: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-16cxl/port: Quiet warning messages from the cxl_test environmentDan Williams1-1/+1
The cxl_test platform device CXL port hierarchy is useful for testing, but throws warning messages of the form: cxl_mem mem2: at cxl_root_port.1 no parent for dport: platform cxl_mem mem3: at cxl_root_port.2 no parent for dport: platform cxl_mem mem4: at cxl_root_port.3 no parent for dport: platform cxl_mem mem5: at cxl_root_port.0 no parent for dport: platform cxl_mem mem6: at cxl_root_port.1 no parent for dport: platform cxl_mem mem7: at cxl_root_port.2 no parent for dport: platform cxl_mem mem8: at cxl_root_port.3 no parent for dport: platform cxl_mem mem9: at cxl_root_port.4 no parent for dport: platform cxl_mem mem10: at cxl_root_port.4 no parent for dport: platform ...and this message when running testing in QEMU: cxl_region region4: Bypassing cpu_cache_invalidate_memregion() for testing! Noisy cxl_test warnings have caused other regressions to be missed. In the interest of using cxl_test for early detection of dev_err() and dev_warn() messages, silence platform device topology and cache-invalidation messages. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-15cxl/region: Refactor granularity select in cxl_port_setup_targets()Alison Schofield1-9/+8
In cxl_port_setup_targets() the region driver validates the configuration of auto-discovered region decoders, as well as decoders the driver is preparing to program. The existing calculations use the encoded interleave granularity value to create an interleave granularity that properly fans out when routing an x1 interleave to a greater than x1 interleave. That all worked well, until this config came along: Host Bridge: 2 way at 256 granularity Switch Decoder_A: 1 way at 512 Endpoint_X: 2 way at 256 Switch Decoder_B: 1 way at 512 Endpoint_Y: 2 way at 256 When the Host Bridge interleave is greater than 1 and the root decoder interleave is exactly 1, the region driver needs to consider the number of targets in the region when calculating the expected granularity. While examining the existing logic, and trying to cover the case above, a couple of simplifications appeared, hence this proposed refactoring. The first simplification is to apply the logic to the nominal values and use the existing helper function granularity_to_eig() to translate the desired granularity to the encoded form. This means the comment and code regarding setting address bits is discarded. Although that logic is not wrong, it adds a level of complexity that is not required in the granularity selection. The eig and eiw are indeed part of the routing instructions programmed into the decoders. Up-level the discussion to nominal ways and granularity for clearer analysis. The second simplification reduces the logic to a single granularity calculation that works for all cases. The new calculation doesn't care if parent_iw => 1 because parent_iw is used as a multiplier. The refactor cleans up a useless assignment of eiw made after the iw is already calculated. Regression testing included an examination of all of the ways and granularity selections made during a run of the cxl_test unit tests. There were no differences in selections before and after this patch. Fixes: ("27b3f8d13830 cxl/region: Program target lists") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230822180928.117596-1-alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-15cxl/region: Match auto-discovered region decoders by HPA rangeAlison Schofield1-1/+23
Currently, when the region driver attaches a region to a port, it selects the ports next available decoder to program. With the addition of auto-discovered regions, a port decoder has already been programmed so grabbing the next available decoder can be a mismatch when there is more than one region using the port. The failure appears like this with CXL DEBUG enabled: [] cxl_core:alloc_region_ref:754: cxl region0: endpoint9: HPA order violation region0:[mem 0x14780000000-0x1478fffffff flags 0x200] vs [mem 0x880000000-0x185fffffff flags 0x200] [] cxl_core:cxl_port_attach_region:972: cxl region0: endpoint9: failed to allocate region reference When CXL DEBUG is not enabled, there is no failure message. The region just never materializes. Users can suspect this issue if they know their firmware has programmed decoders so that more than one region is using a port. Note that the problem may appear intermittently, ie not on every reboot. Add a matching method for auto-discovered regions that finds a decoder based on an HPA range. The decoder range must exactly match the region resource parameter. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230905211007.256385-1-alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-26Merge branch 'for-6.5/cxl-rch-eh' into for-6.5/cxlDan Williams1-25/+27
Pick up the first half of the RCH error handling series. The back half needs some fixups for test regressions. Small conflicts with the PMU work around register enumeration and setup helpers.
2023-06-26Merge branch 'for-6.5/cxl-region-fixes' into for-6.5/cxlDan Williams1-39/+63
Pick up the recent fixes to how CPU caches are managed relative to region setup / teardown, and make sure that all decoders transition successfully before updating the region state from COMMIT => ACTIVE.
2023-06-26cxl/region: Manage decoder target_type at decoder-attach timeDan Williams1-0/+12
Switch-level (mid-level) decoders between the platform root and an endpoint can dynamically switch modes between HDM-H and HDM-D[B] depending on which region they target. Use the region type to fixup each decoder that gets allocated to map the given region. Note that endpoint decoders are meant to determine the region type, so warn if those ever need to be fixed up, but since it is possible to continue do so. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679262543.3436160.13053831955768440312.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-26cxl/port: Rename CXL_DECODER_{EXPANDER, ACCELERATOR} => {HOSTONLYMEM, DEVMEM}Dan Williams1-1/+1
In preparation for support for HDM-D and HDM-DB configuration (device-memory, and device-memory with back-invalidate). Rename the current type designators to use HOSTONLYMEM and DEVMEM as a suffix. HDM-DB can be supported by devices that are not accelerators, so DEVMEM is a more generic term for that case. Fixup one location where this type value was open coded. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679261369.3436160.7042443847605280593.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/region: Fix state transitions after reset failureDan Williams1-11/+15
Jonathan reports that failed attempts to reset a region (teardown its HDM decoder configuration) mistakenly advance the state of the region to "not committed". Revert to the previous state of the region on reset failure so that the reset can be re-attempted. Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Closes: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696507968.3590522.14484000711718573626.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/region: Flag partially torn down regions as unusableDan Williams1-0/+12
cxl_region_decode_reset() walks all the decoders associated with a given region and disables them. Due to decoder ordering rules it is possible that a switch in the topology notices that a given decoder can not be shutdown before another region with a higher HPA is shutdown first. That can leave the region in a partially committed state. Capture that state in a new CXL_REGION_F_NEEDS_RESET flag and require that a successful cxl_region_decode_reset() attempt must be completed before cxl_region_probe() accepts the region. This is a corollary for the bug that Jonathan identified in "CXL/region : commit reset of out of order region appears to succeed." [1]. Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Link: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com [1] Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696507423.3590522.16254212607926684429.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/region: Move cache invalidation before region teardown, and before setupDan Williams1-29/+37
Vikram raised a concern with the theoretical case of a CPU sending MemClnEvict to a device that is not prepared to receive. MemClnEvict is a message that is sent after a CPU has taken ownership of a cacheline from accelerator memory (HDM-DB). In the case of hotplug or HDM decoder reconfiguration it is possible that the CPU is holding old contents for a new device that has taken over the physical address range being cached by the CPU. To avoid this scenario, invalidate caches prior to tearing down an HDM decoder configuration. Now, this poses another problem that it is possible for something to speculate into that space while the decode configuration is still up, so to close that gap also invalidate prior to establish new contents behind a given physical address range. With this change the cache invalidation is now explicit and need not be checked in cxl_region_probe(), and that obviates the need for CXL_REGION_F_INCOHERENT. Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Fixes: d18bc74aced6 ("cxl/region: Manage CPU caches relative to DPA invalidation events") Reported-by: Vikram Sethi <vsethi@nvidia.com> Closes: http://lore.kernel.org/r/BYAPR12MB33364B5EB908BF7239BB996BBD53A@BYAPR12MB3336.namprd12.prod.outlook.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696506886.3590522.4597053660991916591.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl: Rename 'uport' to 'uport_dev'Dan Williams1-23/+25
For symmetry with the recent rename of ->dport_dev for a 'struct cxl_dport', add the "_dev" suffix to the ->uport property of a 'struct cxl_port'. These devices represent the downstream-port-device and upstream-port-device respectively in the CXL/PCIe topology. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-6-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>