summaryrefslogtreecommitdiff
path: root/drivers/cxl/cxl.h
AgeCommit message (Collapse)AuthorFilesLines
2022-11-05cxl/region: Fix 'distance' calculation with passthrough portsDan Williams1-0/+2
When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible. The @distance for the host-bridge's cxl_port in a simple dual-ported host-bridge configuration with 2 direct-attached devices is 1, i.e. An x2 region divided by 2 dports to reach 2 region targets. An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices). However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough port. Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance. Cc: <stable@vger.kernel.org> Reported-by: Bobo WL <lmw.bobo@gmail.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-05cxl/pmem: Fix cxl_pmem_region and cxl_memdev leakDan Williams1-1/+1
When a cxl_nvdimm object goes through a ->remove() event (device physically removed, nvdimm-bridge disabled, or nvdimm device disabled), then any associated regions must also be disabled. As highlighted by the cxl-create-region.sh test [1], a single device may host multiple regions, but the driver was only tracking one region at a time. This leads to a situation where only the last enabled region per nvdimm device is cleaned up properly. Other regions are leaked, and this also causes cxl_memdev reference leaks. Fix the tracking by allowing cxl_nvdimm objects to track multiple region associations. Cc: <stable@vger.kernel.org> Link: https://github.com/pmem/ndctl/blob/main/test/cxl-create-region.sh [1] Reported-by: Vishal Verma <vishal.l.verma@intel.com> Fixes: 04ad63f086d1 ("cxl/region: Introduce cxl_pmem_region objects") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752183647.947915.2045230911503793901.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: describe targets and nr_targets members of cxl_region_paramsBagas Sanjaya1-0/+2
Sphinx reported undescribed parameters in cxl_region_params struct: ./drivers/cxl/cxl.h:376: warning: Function parameter or member 'targets' not described in 'cxl_region_params' ./drivers/cxl/cxl.h:376: warning: Function parameter or member 'nr_targets' not described in 'cxl_region_params' Describe these members. Fixes: b9686e8c8e39 ("cxl/region: Enable the assignment of endpoint decoders to regions") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220804075448.98241-3-bagasdotme@gmail.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-02cxl/acpi: Minimize granularity for x1 interleavesDan Williams1-0/+2
The kernel enforces that region granularity is >= to the top-level interleave-granularity for the given CXL window. However, when the CXL window interleave is x1, i.e. non-interleaved at the host bridge level, then the specified granularity does not matter. Override the window specified granularity to the CXL minimum so that any valid region granularity is >= to the root granularity. Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/165853776917.2430596.16823264262010844458.stgit@dwillia2-xfh.jf.intel.com [djbw: add CXL_DECODER_MIN_GRANULARITY per vishal] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-01cxl/region: prevent underflow in ways_to_cxl()Dan Carpenter1-1/+1
The "ways" variable comes from the user. The ways_to_cxl() function has an upper bound but it doesn't check for negatives. Make the "ways" variable an unsigned int to fix this bug. Fixes: 80d10a6cee05 ("cxl/region: Add interleave geometry attributes") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/Yueo3NV2hFCXx1iV@kili [djbw: fixup interleave_ways_store() to only accept unsigned input] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-26cxl/region: Introduce cxl_pmem_region objectsDan Williams1-1/+35
The LIBNVDIMM subsystem is a platform agnostic representation of system NVDIMM / persistent memory resources. To date, the CXL subsystem's interaction with LIBNVDIMM has been to register an nvdimm-bridge device and cxl_nvdimm objects to proxy CXL capabilities into existing LIBNVDIMM subsystem mechanics. With regions the approach is the same. Create a new cxl_pmem_region object to proxy CXL region details into a LIBNVDIMM definition. With this enabling LIBNVDIMM can partition CXL persistent memory regions with legacy namespace labels. A follow-on patch will add CXL region label and CXL namespace label support to persist region configurations across driver reload / system-reset events. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784340111.1758207.3036498385188290968.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-26cxl/pmem: Fix offline_nvdimm_bus() to offline by bridgeDan Williams1-0/+1
Be careful to only disable cxl_pmem objects related to a given cxl_nvdimm_bridge. Otherwise, offline_nvdimm_bus() reaches across CXL domains and disables more than is expected. Fixes: 21083f51521f ("cxl/pmem: Register 'pmem' / cxl_nvdimm devices") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784339569.1758207.1557084545278004577.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-26cxl/region: Add region driver boiler plateDan Williams1-0/+1
The CXL region driver is responsible for routing fully formed CXL regions to one of libnvdimm, for persistent memory regions, device-dax for volatile memory regions, or just act as an enumeration placeholder if the region was setup and configuration locked by platform firmware. In the platform-firmware-setup case the expectation is that region is already accounted in the system memory map, i.e. already enabled as "System RAM". For now, just attach to CXL regions in the CXL_CONFIG_COMMIT state, and take no further action. Given this driver is just a small / simple router, include it in the core rather than its own module. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-18-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/hdm: Commit decoder state to hardwareDan Williams1-1/+12
After all the soft validation of the region has completed, convey the region configuration to hardware while being careful to commit decoders in specification mandated order. In addition to programming the endpoint decoder base-address, interleave ways and granularity, the switch decoder target lists are also established. While the kernel can enforce spec-mandated commit order, it can not enforce spec-mandated reset order. For example, the kernel can't stop someone from removing an endpoint device that is occupying decoderN in a switch decoder where decoderN+1 is also committed. To reset decoderN, decoderN+1 must be torn down first. That "tear down the world" implementation is saved for a follow-on patch. Callback operations are provided for the 'commit' and 'reset' operations. While those callbacks may prove useful for CXL accelerators (Type-2 devices with memory) the primary motivation is to enable a simple way for cxl_test to intercept those operations. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784338418.1758207.14659830845389904356.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Program target listsDan Williams1-0/+2
Once the region's interleave geometry (ways, granularity, size) is established and all the endpoint decoder targets are assigned, the next phase is to program all the intermediate decoders. Specifically, each CXL switch in the path between the endpoint and its CXL host-bridge (including the logical switch internal to the host-bridge) needs to have its decoders programmed and the target list order assigned. The difficulty in this implementation lies in determining which endpoint decoder ordering combinations are valid. Consider the cxl_test case of 2 host bridges, each of those host-bridges attached to 2 switches, and each of those switches attached to 2 endpoints for a potential 8-way interleave. The x2 interleave at the host-bridge level requires that all even numbered endpoint decoder positions be located on the "left" hand side of the topology tree, and the odd numbered positions on the other. The endpoints that are peers on the same switch need to have a position that can be routed with a dedicated address bit per-endpoint. See check_last_peer() for the details. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337827.1758207.132121746122685208.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Attach endpoint decodersDan Williams1-0/+20
CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/acpi: Add a host-bridge index lookup mechanismDan Williams1-0/+2
The ACPI CXL Fixed Memory Window Structure (CFMWS) defines multiple methods to determine which host bridge provides access to a given endpoint relative to that device's position in the interleave. The "Interleave Arithmetic" defines either a "standard modulo" / round-random algorithm, or "xormap" based algorithm which can be defined as a non-linear transform. Given that there are already more options beyond "standard modulo" and that "xormap" may turn out to be ACPI CXL specific, provide a callback for the region provisioning code to map endpoint positions back to expected host bridge id (cxl_dport target). For now just support the simple modulo math case and save the xormap for a follow-on change. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-14-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Enable the assignment of endpoint decoders to regionsDan Williams1-0/+11
The region provisioning process involves allocating DPA to a set of endpoint decoders, and HPA plus the region geometry to a region device. Then the decoder is assigned to the region. At this point several validation steps can be performed to validate that the decoder is suitable to participate in the region. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/165784336184.1758207.16403282029203949622.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Allocate HPA capacity to regionsDan Williams1-0/+2
After a region's interleave parameters (ways and granularity) are set, add a way for regions to allocate HPA (host physical address space) from the free capacity in their parent root-decoder. The allocator for this capacity reuses the 'struct resource' based allocator used for CONFIG_DEVICE_PRIVATE. Once the tuple of "ways, granularity, [uuid], and size" is set the region configuration transitions to the CXL_CONFIG_INTERLEAVE_ACTIVE state which is a precursor to allowing endpoint decoders to be added to a region. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784335630.1758207.420216490941955417.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Add interleave geometry attributesBen Widawsky1-0/+33
Add ABI to allow the number of devices that comprise a region to be set as well as the interleave granularity for the region. Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> [djbw: reword changelog] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-11-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Add a 'uuid' attributeBen Widawsky1-0/+25
The process of provisioning a region involves triggering the creation of a new region object, pouring in the configuration, and then binding that configured object to the region driver to start its operation. For persistent memory regions the CXL specification mandates that it identified by a uuid. Add an ABI for userspace to specify a region's uuid. Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> [djbw: simplify locking] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784334465.1758207.8224025435884752570.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-22cxl/region: Add region creation supportBen Widawsky1-0/+18
CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-22cxl/mem: Enumerate port targets before adding endpointsDan Williams1-0/+5
The port scanning algorithm in devm_cxl_enumerate_ports() walks up the topology and adds cxl_port objects starting from the root down to the endpoint. When those ports are initially created they know all their dports, but they do not know the downstream cxl_port instance that represents the next descendant in the topology. Rework create_endpoint() into devm_cxl_add_endpoint() that enumerates the downstream cxl_port topology into each port's 'struct cxl_ep' record for each endpoint it that the port is an ancestor. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-7-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-22cxl/port: Move dport tracking to an xarrayDan Williams1-5/+7
Reduce the complexity and the overhead of walking the topology to determine endpoint connectivity to root decoder interleave configurations. Note that cxl_detach_ep(), after it determines that the last @ep has departed and decides to delete the port, now needs to walk the dport array with the device_lock() held to remove entries. Previously list_splice_init() could be used atomically delete all dport entries at once and then perform entry tear down outside the lock. There is no list_splice_init() equivalent for the xarray. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784331647.1758207.6345820282285119339.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-22cxl/port: Move 'cxl_ep' references to an xarray per portDan Williams1-3/+1
In preparation for region provisioning that needs to walk the topology by endpoints, use an xarray to record endpoint interest in a given port. In addition to being more space and time efficient it also reduces the complexity of the implementation by moving locking internal to the xarray implementation. It also allows for a single cxl_ep reference to be recorded in multiple xarrays. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-2-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-22cxl/port: Record parent dport when adding portsDan Williams1-2/+5
At the time that cxl_port instances are being created, cache the dport from the parent port that points to this new child port. This will be useful for region provisioning when walking the tree to calculate decoder targets, and saves rewalking the dport list after the fact to build this information. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-1-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-22cxl/port: Record dport in endpoint referencesDan Williams1-0/+2
Recall that the primary role of the cxl_mem driver is to probe if the given endpoint is connected to a CXL port topology. In that process it walks its device ancestry to its PCI root port. If that root port is also a CXL root port then the probe process adds cxl_port object instances at switch in the path between to the root and the endpoint. As those cxl_port instances are added, or if a previous enumeration attempt already created the port, a 'struct cxl_ep' instance is registered with that port to track the endpoints interested in that port. At the time the cxl_ep is registered the downstream egress path from the port to the endpoint is known. Take the opportunity to record that information as it will be needed for dynamic programming of decoder targets during region provisioning. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784329944.1758207.15203961796832072116.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-22cxl/hdm: Track next decoder to allocateDan Williams1-0/+2
The CXL specification enforces that endpoint decoders are committed in hw instance id order. In preparation for adding dynamic DPA allocation, record the hw instance id in endpoint decoders, and enforce allocations to occur in hw instance id order. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784328827.1758207.9627538529944559954.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-22cxl/hdm: Add 'mode' attribute to decoder objectsDan Williams1-0/+9
Recall that the Device Physical Address (DPA) space of a CXL Memory Expander is potentially partitioned into a volatile and persistent portion. A decoder maps a Host Physical Address (HPA) range to a DPA range and that translation depends on the value of all previous (lower instance number) decoders before the current one. In preparation for allowing dynamic provisioning of regions, decoders need an ABI to indicate which DPA partition a decoder targets. This ABI needs to be prepared for the possibility that some other agent committed and locked a decoder that spans the partition boundary. Add 'decoderX.Y/mode' to endpoint decoders that indicates which partition 'ram' / 'pmem' the decoder targets, or 'mixed' if the decoder currently spans the partition boundary. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165603881967.551046.6007594190951596439.stgit@dwillia2-xfh Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-22cxl/hdm: Enumerate allocated DPADan Williams1-0/+2
In preparation for provisioning CXL regions, add accounting for the DPA space consumed by existing regions / decoders. Recall, a CXL region is a memory range comprised from one or more endpoint devices contributing a mapping of their DPA into HPA space through a decoder. Record the DPA ranges covered by committed decoders at initial probe of endpoint ports relative to a per-device resource tree of the DPA type (pmem or volatile-ram). The cxl_dpa_rwsem semaphore is introduced to globally synchronize DPA state across all endpoints and their decoders at once. The vast majority of DPA operations are reads as region creation is expected to be as rare as disk partitioning and volume creation. The device_lock() for this synchronization is specifically avoided for concern of entangling with sysfs attribute removal. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784327682.1758207.7914919426043855876.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/core: Define a 'struct cxl_endpoint_decoder'Dan Williams1-1/+14
Previously the target routing specifics of switch decoders and platform CXL window resource tracking of root decoders were factored out of 'struct cxl_decoder'. While switch decoders translate from SPA to downstream ports, endpoint decoders translate from SPA to DPA. This patch, 3 of 3, adds a 'struct cxl_endpoint_decoder' that tracks an endpoint-specific Device Physical Address (DPA) resource. For now this just defines ->dpa_res, a follow-on patch will handle requesting DPA resource ranges from a device-DPA resource tree. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784327088.1758207.15502834501671201192.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/core: Define a 'struct cxl_root_decoder'Dan Williams1-2/+13
Previously the target routing specifics of switch decoders were factored out of 'struct cxl_decoder' into 'struct cxl_switch_decoder'. This patch, 2 of 3, adds a 'struct cxl_root_decoder' as a superset of a switch decoder that also track the associated CXL window platform resource. Note that the reason the resource for a given root decoder needs to be looked up after the fact (i.e. after cxl_parse_cfmws() and add_cxl_resource()) is because add_cxl_resource() may have merged CXL windows in order to keep them at the top of the resource tree / decode hierarchy. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784326541.1758207.9915663937394448341.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/core: Define a 'struct cxl_switch_decoder'Dan Williams1-8/+22
Currently 'struct cxl_decoder' contains the superset of attributes needed for all decoder types. Before more type-specific attributes are added to the common definition, reorganize 'struct cxl_decoder' into type specific objects. This patch, the first of three, factors out a cxl_switch_decoder type. See the new kdoc for what a 'struct cxl_switch_decoder' represents in a CXL topology. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/165784325340.1758207.5064717153608954960.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-20cxl/port: Read CDAT tableIra Weiny1-0/+7
The per-device CDAT data provides performance data that is relevant for mapping which CXL devices can participate in which CXL ranges by QTG (QoS Throttling Group) (per ECN: CXL 2.0 CEDT CFMWS & QTG_DSM) [1]. The QTG association specified in the ECN is advisory. Until the cxl_acpi driver grows support for invoking the QTG _DSM method the CDAT data is only of interest to userspace that may need it for debug purposes. Search the DOE mailboxes available, query CDAT data, cache the data and make it available via a sysfs binary attribute per endpoint at: /sys/bus/cxl/devices/endpointX/CDAT ...similar to other ACPI-structured table data in /sys/firmware/ACPI/tables. The CDAT is relative to 'struct cxl_port' objects since switches in addition to endpoints can host a CDAT instance. Switch CDAT support is not implemented. This does not support table updates at runtime. It will always provide whatever was there when first cached. It is also the case that table updates are not expected outside of explicit DPA address map affecting commands like Set Partition with the immediate flag set. Given that the driver does not support Set Partition with the immediate flag set there is no current need for update support. Link: https://www.computeexpresslink.org/spec-landing [1] Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> [djbw: drop in-kernel parsing infra for now, and other minor fixups] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220719205249.566684-7-ira.weiny@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-11cxl/pmem: Delete unused nvdimm attributeDan Williams1-1/+0
While there is a need to go from a LIBNVDIMM 'struct nvdimm' to a CXL 'struct cxl_nvdimm', there is no use case to go the other direction. Likely this is a leftover from an early version of the referenced commit before it implemented devm for releasing the created nvdimm. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-19-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-10cxl/port: Cache CXL host bridge dataDan Williams1-0/+2
Region creation has need for checking host-bridge connectivity when adding endpoints to regions. Record, at port creation time, the host-bridge to provide a useful shortcut from any location in the topology to the most-significant ancestor. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-4-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-10cxl: Introduce cxl_to_{ways,granularity}Dan Williams1-0/+26
Interleave granularity and ways have CXL specification defined encodings. Promote the conversion helpers to a common header, and use them to replace other open-coded instances. Force caller to consider the error case of the conversion similarly to other conversion helpers like kstrto*(). Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165603875016.551046.17236943065932132355.stgit@dwillia2-xfh Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-10cxl/core: Drop is_cxl_decoder()Dan Williams1-1/+0
This helper was only used to identify the object type for lockdep purposes. Now that lockdep support is done with explicit lock classes, this helper can be dropped. Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Link: https://lore.kernel.org/r/165603874340.551046.15491766127759244728.stgit@dwillia2-xfh Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-10cxl/core: Drop ->platform_res attribute for root decodersDan Williams1-5/+1
Root decoders are responsible for hosting the available host address space for endpoints and regions to claim. The tracking of that available capacity can be done in iomem_resource directly. As a result, root decoders no longer need to host their own resource tree. The current ->platform_res attribute was added prematurely. Otherwise, ->hpa_range fills the role of conveying the current decode range of the decoder. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Link: https://lore.kernel.org/r/165603873619.551046.791596854070136223.stgit@dwillia2-xfh Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-10cxl/core: Rename ->decoder_range ->hpa_rangeDan Williams1-2/+2
In preparation for growing a ->dpa_range attribute for endpoint decoders, rename the current ->decoder_range to the more descriptive ->hpa_range. Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Link: https://lore.kernel.org/r/165603872867.551046.2170426227407458814.stgit@dwillia2-xfh Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-06-22cxl/core: Use is_endpoint_decoderBen Widawsky1-0/+1
Save some characters and directly check decoder type rather than port type. There's no need to check if the port is an endpoint port since, by this point, cxl_endpoint_decoder_alloc() has a specified type. Reviewed by: Adam Manzanares <a.manzanares@samsung.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-04-29cxl: Drop cxl_device_lock()Dan Williams1-78/+0
Now that all CXL subsystem locking is validated with custom lock classes, there is no need for the custom usage of the lockdep_mutex. Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Ben Widawsky <ben.widawsky@intel.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/165055520383.3745911.53447786039115271.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/core/port: Add endpoint decodersBen Widawsky1-0/+1
Recall that a CXL Port is any object that publishes a CXL HDM Decoder Capability structure. That is Host Bridge and Switches that have been enabled so far. Now, add decoder support to the 'endpoint' CXL Ports registered by the cxl_mem driver. They mostly share the same enumeration as Bridges and Switches, but witout a target list. The target of endpoint decode is device-internal DPA space, not another downstream port. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, hookup enumeration in the port driver] Link: https://lore.kernel.org/r/164386092069.765089.14895687988217608642.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/mem: Add the cxl_mem driverBen Widawsky1-0/+6
At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/core/port: Add switch port enumerationDan Williams1-0/+19
So far the platorm level CXL resources have been enumerated by the cxl_acpi driver, and cxl_pci has gathered all the pre-requisite information it needs to fire up a cxl_mem driver. However, the first thing the cxl_mem driver will be tasked to do is validate that all the PCIe Switches in its ancestry also have CXL capabilities and an CXL.mem link established. Provide a common mechanism for a CXL.mem endpoint driver to enumerate all the ancestor CXL ports in the topology and validate CXL.mem connectivity. Multiple endpoints may end up racing to establish a shared port in the topology. This race is resolved via taking the device-lock on a parent CXL Port before establishing a new child. The winner of the race establishes the port, the loser simply registers its interest in the port via 'struct cxl_ep' place-holder reference. At endpoint teardown the same parent port lock is taken as 'struct cxl_ep' references are deleted. Last endpoint to drop its reference unregisters the port. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/164398731146.902644.1029761300481366248.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/core/port: Remove @host argument for dport + decoder enumerationDan Williams1-4/+4
Now that dport and decoder enumeration is centralized in the port driver, the @host argument for these helpers can be made implicit. For the root port the host is the port's uport device (ACPI0017 for cxl_acpi), and for all other descendant ports the devm context is the parent of @port. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Link: https://lore.kernel.org/r/164375043390.484143.17617734732003230076.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/port: Add a driver for 'struct cxl_port' objectsBen Widawsky1-0/+4
The need for a CXL port driver and a dedicated cxl_bus_type is driven by a need to simultaneously support 2 independent physical memory decode domains (cache coherent CXL.mem and uncached PCI.mmio) that also intersect at a single PCIe device node. A CXL Port is a device that advertises a CXL Component Register block with an "HDM Decoder Capability Structure". >From Documentation/driver-api/cxl/memory-devices.rst: Similar to how a RAID driver takes disk objects and assembles them into a new logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and assemble them into a CXL.mem decode topology. The need for runtime configuration of the CXL.mem topology is also similar to RAID in that different environments with the same hardware configuration may decide to assemble the topology in contrasting ways. One may choose performance (RAID0) striping memory across multiple Host Bridges and endpoints while another may opt for fault tolerance and disable any striping in the CXL.mem topology. The port driver identifies whether an endpoint Memory Expander is connected to a CXL topology. If an active (bound to the 'cxl_port' driver) CXL Port is not found at every PCIe Switch Upstream port and an active "root" CXL Port then the device is just a plain PCIe endpoint only capable of participating in PCI.mmio and DMA cycles, not CXL.mem coherent interleave sets. The 'cxl_port' driver lets the CXL subsystem leverage driver-core infrastructure for setup and teardown of register resources and communicating device activation status to userspace. The cxl_bus_type can rendezvous the async arrival of platform level CXL resources (via the 'cxl_acpi' driver) with the asynchronous enumeration of Memory Expander endpoints, while also implementing a hierarchical locking model independent of the associated 'struct pci_dev' locking model. The locking for dport and decoder enumeration is now handled in the core rather than callers. For now the port driver only enumerates and registers CXL resources (downstream port metadata and decoder resources) later it will be used to take action on its decoders in response to CXL.mem region provisioning requests. Note1: cxlpci.h has long depended on pci.h, but port.c was the first to not include pci.h. Carry that dependency in cxlpci.h. Note2: cxl port enumeration and probing complicates CXL subsystem init to the point that it helps to have centralized debug logging of probe events in cxl_bus_probe(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/core/hdm: Add CXL standard decoder enumeration to the coreDan Williams1-6/+27
Unlike the decoder enumeration for "root decoders" described by platform firmware, standard decoders can be enumerated from the component registers space once the base address has been identified (via PCI, ACPI, or another mechanism). Add common infrastructure for HDM (Host-managed-Device-Memory) Decoder enumeration and share it between host-bridge, upstream switch port, and cxl_test defined decoders. The locking model for switch level decoders is to hold the port lock over the enumeration. This facilitates moving the dport and decoder enumeration to a 'port' driver. For now, the only enumerator of decoder resources is the cxl_acpi root driver. Co-developed-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/164374688404.395335.9239248252443123526.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/core: Generalize dport enumeration in the coreDan Williams1-12/+4
The core houses infrastructure for decoder resources. A CXL port's dports are more closely related to decoder infrastructure than topology enumeration. Implement generic PCI based dport enumeration in the core, i.e. arrange for existing root port enumeration from cxl_acpi to share code with switch port enumeration which just amounts to a small difference in a pci_walk_bus() invocation once the appropriate 'struct pci_bus' has been retrieved. Set the convention that decoder objects are registered after all dports are enumerated. This enables userspace to know when the CXL core is finished establishing 'dportX' links underneath the 'portX' object. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/164368114191.354031.5270501846455462665.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/pmem: Introduce a find_cxl_root() helperDan Williams1-0/+1
In preparation for switch port enumeration while also preserving the potential for multi-domain / multi-root CXL topologies. Introduce a 'struct device' generic mechanism for retrieving a root CXL port, if one is registered. Note that the only known multi-domain CXL configurations are running the cxl_test unit test on a system that also publishes an ACPI0017 device. With this in hand the nvdimm-bridge lookup can be with device_find_child() instead of bus_find_device() + custom mocked lookup infrastructure in cxl_test. The mechanism looks for a 2nd level port since the root level topology is platform-firmware specific and the 2nd level down follows standard PCIe topology expectations. The cxl_acpi 2nd level is associated with a PCIe Root Port. Reported-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/164367562182.225521.9488555616768096049.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/port: Introduce cxl_port_to_pci_bus()Dan Williams1-0/+3
Add a helper for converting a PCI enumerated cxl_port into the pci_bus that hosts its dports. For switch ports this is trivial, but for root ports there is no generic way to go from a platform defined host bridge device, like ACPI0016 to its corresponding pci_bus. Rather than spill ACPI goop outside of the cxl_acpi driver, just arrange for it to register an xarray translation from the uport device to the corresponding pci_bus. This is in preparation for centralizing dport enumeration in the core. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Link: https://lore.kernel.org/r/164364745633.85488.9744017377155103992.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/core/port: Use dedicated lock for decoder target listDan Williams1-0/+2
Lockdep reports: ====================================================== WARNING: possible circular locking dependency detected 5.16.0-rc1+ #142 Tainted: G OE ------------------------------------------------------ cxl/1220 is trying to acquire lock: ffff979b85475460 (kn->active#144){++++}-{0:0}, at: __kernfs_remove+0x1ab/0x1e0 but task is already holding lock: ffff979b87ab38e8 (&dev->lockdep_mutex#2/4){+.+.}-{3:3}, at: cxl_remove_ep+0x50c/0x5c0 [cxl_core] ...where cxl_remove_ep() is a helper that wants to delete ports while holding a lock on the host device for that port. That sets up a lockdep violation whereby target_list_show() can not rely holding the decoder's device lock while walking the target_list. Switch to a dedicated seqlock for this purpose. Reported-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/164367209095.208169.1171673319121271280.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl: Prove CXL lockingDan Williams1-0/+81
When CONFIG_PROVE_LOCKING is enabled the 'struct device' definition gets an additional mutex that is not clobbered by lockdep_set_novalidate_class() like the typical device_lock(). This allows for local annotation of subsystem locks with mutex_lock_nested() per the subsystem's object/lock hierarchy. For CXL, this primarily needs the ability to lock ports by depth and child objects of ports by their parent parent-port lock. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Link: https://lore.kernel.org/r/164365853422.99383.1052399160445197427.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/core: Track port depthBen Widawsky1-0/+2
In preparation for proving CXL subsystem usage of the device_lock() order track the depth of ports with the expectation that shallower port locks can be held over deeper port locks. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/164298419321.3018233.4469731547378993606.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-09cxl/core/port: Clarify decoder creationBen Widawsky1-1/+15
Add wrappers for the creation of decoder objects at the root level and switch level, and keep the core helper private to cxl/core/port.c. Root decoders are static descriptors conveyed from platform firmware (e.g. ACPI CFMWS). Switch decoders are CXL standard decoders enumerated via the HDM decoder capability structure. The base address for the HDM decoder capability structure may be conveyed either by PCIe or platform firmware (ACPI CEDT.CHBS). Additionally, the kdoc descriptions for these helpers and their dependencies is updated. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> [djbw: fixup changelog, clarify kdoc] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/164366463014.111117.9714595404002687111.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>