From 781885fdf09f55ff14172749258f7380f859f2ba Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Date: Mon, 15 Jun 2020 08:50:22 +0200 Subject: docs: sh: convert register-banks.txt to ReST - Add a SPDX header; - Adjust document title to follow ReST style; - Add blank lines to make ReST markup happy; - Add it to sh/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/adf117cf1edd7f43cb839ff2800f4315dfbcce13.1592203650.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> --- arch/sh/Kconfig.cpu | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/sh/Kconfig.cpu b/arch/sh/Kconfig.cpu index 97ca35f2cd37..fff419f3d757 100644 --- a/arch/sh/Kconfig.cpu +++ b/arch/sh/Kconfig.cpu @@ -85,7 +85,7 @@ config CPU_HAS_SR_RB that are lacking this bit must have another method in place for accomplishing what is taken care of by the banked registers. - See <file:Documentation/sh/register-banks.txt> for further + See <file:Documentation/sh/register-banks.rst> for further information on SR.RB and register banking in the kernel in general. config CPU_HAS_PTEAEX -- cgit v1.2.3 From 985098a05eee6bf5caca7e997e02a5b15242cfa0 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Date: Tue, 23 Jun 2020 09:09:10 +0200 Subject: docs: fix references for DMA*.txt files As we moved those files to core-api, fix references to point to their newer locations. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/37b2fd159fbc7655dbf33b3eb1215396a25f6344.1592895969.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> --- Documentation/PCI/pci.rst | 6 +++--- Documentation/block/biodoc.rst | 2 +- Documentation/bus-virt-phys-mapping.txt | 2 +- Documentation/core-api/dma-api.rst | 6 +++--- Documentation/core-api/dma-isa-lpc.rst | 2 +- Documentation/driver-api/usb/dma.rst | 6 +++--- Documentation/translations/ko_KR/memory-barriers.txt | 6 +++--- arch/ia64/hp/common/sba_iommu.c | 12 ++++++------ arch/parisc/kernel/pci-dma.c | 2 +- arch/x86/include/asm/dma-mapping.h | 4 ++-- arch/x86/kernel/amd_gart_64.c | 2 +- drivers/parisc/sba_iommu.c | 14 +++++++------- include/linux/dma-mapping.h | 2 +- include/media/videobuf-dma-sg.h | 2 +- kernel/dma/debug.c | 2 +- 15 files changed, 35 insertions(+), 35 deletions(-) (limited to 'arch') diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst index 8c016d8c9862..d10d3fe604c5 100644 --- a/Documentation/PCI/pci.rst +++ b/Documentation/PCI/pci.rst @@ -265,7 +265,7 @@ Set the DMA mask size --------------------- .. note:: If anything below doesn't make sense, please refer to - Documentation/DMA-API.txt. This section is just a reminder that + :doc:`/core-api/dma-api`. This section is just a reminder that drivers need to indicate DMA capabilities of the device and is not an authoritative source for DMA interfaces. @@ -291,7 +291,7 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are Setup shared control data ------------------------- Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared) -memory. See Documentation/DMA-API.txt for a full description of +memory. See :doc:`/core-api/dma-api` for a full description of the DMA APIs. This section is just a reminder that it needs to be done before enabling DMA on the device. @@ -421,7 +421,7 @@ owners if there is one. Then clean up "consistent" buffers which contain the control data. -See Documentation/DMA-API.txt for details on unmapping interfaces. +See :doc:`/core-api/dma-api` for details on unmapping interfaces.
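A minimal sketch of the teardown order that hunk describes, assuming a hypothetical my_priv layout and BUF_LEN size (neither comes from the patch)::

    #include <linux/pci.h>
    #include <linux/dma-mapping.h>

    #define BUF_LEN 4096                 /* hypothetical buffer size */

    struct my_priv {                     /* hypothetical driver state */
            void *ctrl;                  /* "consistent" control block */
            dma_addr_t ctrl_dma;
            size_t ctrl_size;
            dma_addr_t rx_dma;           /* a streaming mapping */
    };

    static void my_dma_teardown(struct pci_dev *pdev, struct my_priv *priv)
    {
            /* unmap streaming buffers first ... */
            dma_unmap_single(&pdev->dev, priv->rx_dma, BUF_LEN,
                             DMA_FROM_DEVICE);
            /* ... then free the "consistent" control data */
            dma_free_coherent(&pdev->dev, priv->ctrl_size,
                              priv->ctrl, priv->ctrl_dma);
    }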
Unregister from other subsystems diff --git a/Documentation/block/biodoc.rst b/Documentation/block/biodoc.rst index b964796ec9c7..ba7f45d0271c 100644 --- a/Documentation/block/biodoc.rst +++ b/Documentation/block/biodoc.rst @@ -196,7 +196,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address do not have a corresponding kernel virtual address space mapping) and low-memory pages. -Note: Please refer to Documentation/DMA-API-HOWTO.txt for a discussion +Note: Please refer to :doc:`/core-api/dma-api-howto` for a discussion on PCI high mem DMA aspects and mapping of scatter gather lists, and support for 64 bit PCI. diff --git a/Documentation/bus-virt-phys-mapping.txt b/Documentation/bus-virt-phys-mapping.txt index 4bb07c2f3e7d..c7bc99cd2e21 100644 --- a/Documentation/bus-virt-phys-mapping.txt +++ b/Documentation/bus-virt-phys-mapping.txt @@ -8,7 +8,7 @@ How to access I/O mapped memory from within device drivers The virt_to_bus() and bus_to_virt() functions have been superseded by the functionality provided by the PCI DMA interface - (see Documentation/DMA-API-HOWTO.txt). They continue + (see :doc:`/core-api/dma-api-howto`). They continue to be documented below for historical purposes, but new code must not use them. --davidm 00/12/12 diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst index 2d8d2fed7317..63b4a2f20867 100644 --- a/Documentation/core-api/dma-api.rst +++ b/Documentation/core-api/dma-api.rst @@ -5,7 +5,7 @@ Dynamic DMA mapping using the generic device :Author: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> This document describes the DMA API. For a more gentle introduction -of the API (and actual examples), see Documentation/DMA-API-HOWTO.txt. +of the API (and actual examples), see :doc:`/core-api/dma-api-howto`. This API is split into two pieces. Part I describes the basic API. Part II describes extensions for supporting non-consistent memory @@ -471,7 +471,7 @@ without the _attrs suffixes, except that they pass an optional dma_attrs. The interpretation of DMA attributes is architecture-specific, and -each attribute should be documented in Documentation/DMA-attributes.txt. +each attribute should be documented in :doc:`/core-api/dma-attributes`. If dma_attrs are 0, the semantics of each of these functions is identical to those of the corresponding function @@ -484,7 +484,7 @@ for DMA:: #include <linux/dma-mapping.h> /* DMA_ATTR_FOO should be defined in linux/dma-mapping.h and - * documented in Documentation/DMA-attributes.txt */ + * documented in Documentation/core-api/dma-attributes.rst */ ... unsigned long attr; diff --git a/Documentation/core-api/dma-isa-lpc.rst b/Documentation/core-api/dma-isa-lpc.rst index b1ec7b16c21f..e59a3d35a93d 100644 --- a/Documentation/core-api/dma-isa-lpc.rst +++ b/Documentation/core-api/dma-isa-lpc.rst @@ -17,7 +17,7 @@ To do ISA style DMA you need to include two headers:: #include <asm/dma.h> The first is the generic DMA API used to convert virtual addresses to -bus addresses (see Documentation/DMA-API.txt for details). +bus addresses (see :doc:`/core-api/dma-api` for details). The second contains the routines specific to ISA DMA transfers.
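A minimal sketch of claiming a channel with those two headers; the channel number and device name below are hypothetical, since ISA channels are fixed by board wiring::

    #include <linux/dma-mapping.h>
    #include <asm/dma.h>

    static int my_isa_dma_init(void)
    {
            /* channel 3 is hypothetical */
            if (request_dma(3, "mydriver"))
                    return -EBUSY;
            return 0;
    }

    static void my_isa_dma_exit(void)
    {
            free_dma(3);
    }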
Since this is not present on all platforms make sure you construct your diff --git a/Documentation/driver-api/usb/dma.rst b/Documentation/driver-api/usb/dma.rst index 59d5aee89e37..2b3dbd3265b4 100644 --- a/Documentation/driver-api/usb/dma.rst +++ b/Documentation/driver-api/usb/dma.rst @@ -10,7 +10,7 @@ API overview The big picture is that USB drivers can continue to ignore most DMA issues, though they still must provide DMA-ready buffers (see -``Documentation/DMA-API-HOWTO.txt``). That's how they've worked through +:doc:`/core-api/dma-api-howto`). That's how they've worked through the 2.4 (and earlier) kernels, or they can now be DMA-aware. DMA-aware usb drivers: @@ -60,7 +60,7 @@ and effects like cache-trashing can impose subtle penalties. force a consistent memory access ordering by using memory barriers. It's not using a streaming DMA mapping, so it's good for small transfers on systems where the I/O would otherwise thrash an IOMMU mapping. (See - ``Documentation/DMA-API-HOWTO.txt`` for definitions of "coherent" and + :doc:`/core-api/dma-api-howto` for definitions of "coherent" and "streaming" DMA mappings.) Asking for 1/Nth of a page (as well as asking for N pages) is reasonably @@ -91,7 +91,7 @@ Working with existing buffers Existing buffers aren't usable for DMA without first being mapped into the DMA address space of the device. However, most buffers passed to your driver can safely be used with such DMA mapping. (See the first section -of Documentation/DMA-API-HOWTO.txt, titled "What memory is DMA-able?") +of :doc:`/core-api/dma-api-howto`, titled "What memory is DMA-able?") - When you're using scatterlists, you can map everything at once. On some systems, this kicks in an IOMMU and turns the scatterlists into single diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt index 34d041d68f78..604cee350e53 100644 --- a/Documentation/translations/ko_KR/memory-barriers.txt +++ b/Documentation/translations/ko_KR/memory-barriers.txt @@ -570,8 +570,8 @@ ACQUIRE 는 해당 오퍼레이션의 로드 부분에만 적용되고 RELEASE [*] 버스 마스터링 DMA 와 일관성에 대해서는 다음을 참고하시기 바랍니다: Documentation/driver-api/pci/pci.rst - Documentation/DMA-API-HOWTO.txt - Documentation/DMA-API.txt + Documentation/core-api/dma-api-howto.rst + Documentation/core-api/dma-api.rst 데이터 의존성 배리어 (역사적) @@ -1907,7 +1907,7 @@ Mandatory 배리어들은 SMP 시스템에서도 UP 시스템에서도 SMP 효 writel_relaxed() 와 같은 완화된 I/O 접근자들에 대한 자세한 내용을 위해서는 "커널 I/O 배리어의 효과" 섹션을, consistent memory 에 대한 자세한 내용을 - 위해선 Documentation/DMA-API.txt 문서를 참고하세요. + 위해선 Documentation/core-api/dma-api.rst 문서를 참고하세요. ========================= diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c index a806227c1fad..656a4888c300 100644 --- a/arch/ia64/hp/common/sba_iommu.c +++ b/arch/ia64/hp/common/sba_iommu.c @@ -907,7 +907,7 @@ sba_mark_invalid(struct ioc *ioc, dma_addr_t iova, size_t byte_cnt) * @dir: dma direction * @attrs: optional dma attributes * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static dma_addr_t sba_map_page(struct device *dev, struct page *page, unsigned long poff, size_t size, @@ -1028,7 +1028,7 @@ sba_mark_clean(struct ioc *ioc, dma_addr_t iova, size_t size) * @dir: R/W or both. 
* @attrs: optional dma attributes * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static void sba_unmap_page(struct device *dev, dma_addr_t iova, size_t size, enum dma_data_direction dir, unsigned long attrs) @@ -1105,7 +1105,7 @@ static void sba_unmap_page(struct device *dev, dma_addr_t iova, size_t size, * @size: number of bytes mapped in driver buffer. * @dma_handle: IOVA of new buffer. * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static void * sba_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, @@ -1162,7 +1162,7 @@ sba_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, * @vaddr: virtual address IOVA of "consistent" buffer. * @dma_handler: IO virtual address of "consistent" buffer. * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static void sba_free_coherent(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle, unsigned long attrs) @@ -1425,7 +1425,7 @@ static void sba_unmap_sg_attrs(struct device *dev, struct scatterlist *sglist, * @dir: R/W or both. * @attrs: optional dma attributes * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction dir, @@ -1524,7 +1524,7 @@ static int sba_map_sg_attrs(struct device *dev, struct scatterlist *sglist, * @dir: R/W or both. * @attrs: optional dma attributes * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static void sba_unmap_sg_attrs(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction dir, diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index 70cd24bdcfec..4f1596bb1936 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -3,7 +3,7 @@ ** PARISC 1.1 Dynamic DMA mapping support. ** This implementation is for PA-RISC platforms that do not support ** I/O TLBs (aka DMA address translation hardware). -** See Documentation/DMA-API-HOWTO.txt for interface definitions. +** See Documentation/core-api/dma-api-howto.rst for interface definitions. ** ** (c) Copyright 1999,2000 Hewlett-Packard Company ** (c) Copyright 2000 Grant Grundler diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h index 6b15a24930e0..fed67eafcacc 100644 --- a/arch/x86/include/asm/dma-mapping.h +++ b/arch/x86/include/asm/dma-mapping.h @@ -3,8 +3,8 @@ #define _ASM_X86_DMA_MAPPING_H /* - * IOMMU interface. See Documentation/DMA-API-HOWTO.txt and - * Documentation/DMA-API.txt for documentation. + * IOMMU interface. See Documentation/core-api/dma-api-howto.rst and + * Documentation/core-api/dma-api.rst for documentation. */ #include <linux/scatterlist.h> diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c index 17cb5b933dcf..e89031e9c847 100644 --- a/arch/x86/kernel/amd_gart_64.c +++ b/arch/x86/kernel/amd_gart_64.c @@ -6,7 +6,7 @@ * This allows to use PCI devices that only support 32bit addresses on systems * with more than 4GB. * - * See Documentation/DMA-API-HOWTO.txt for the interface specification. + * See Documentation/core-api/dma-api-howto.rst for the interface specification. * * Copyright 2002 Andi Kleen, SuSE Labs.
*/ diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c index 7e112829d250..5368452eb5a6 100644 --- a/drivers/parisc/sba_iommu.c +++ b/drivers/parisc/sba_iommu.c @@ -666,7 +666,7 @@ sba_mark_invalid(struct ioc *ioc, dma_addr_t iova, size_t byte_cnt) * @dev: instance of PCI owned by the driver that's asking * @mask: number of address bits this PCI device can handle * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static int sba_dma_supported( struct device *dev, u64 mask) { @@ -698,7 +698,7 @@ static int sba_dma_supported( struct device *dev, u64 mask) * @size: number of bytes to map in driver buffer. * @direction: R/W or both. * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static dma_addr_t sba_map_single(struct device *dev, void *addr, size_t size, @@ -788,7 +788,7 @@ sba_map_page(struct device *dev, struct page *page, unsigned long offset, * @size: number of bytes mapped in driver buffer. * @direction: R/W or both. * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static void sba_unmap_page(struct device *dev, dma_addr_t iova, size_t size, @@ -867,7 +867,7 @@ sba_unmap_page(struct device *dev, dma_addr_t iova, size_t size, * @size: number of bytes mapped in driver buffer. * @dma_handle: IOVA of new buffer. * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static void *sba_alloc(struct device *hwdev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) @@ -898,7 +898,7 @@ static void *sba_alloc(struct device *hwdev, size_t size, dma_addr_t *dma_handle * @vaddr: virtual address IOVA of "consistent" buffer. * @dma_handler: IO virtual address of "consistent" buffer. * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static void sba_free(struct device *hwdev, size_t size, void *vaddr, @@ -933,7 +933,7 @@ int dump_run_sg = 0; * @nents: number of entries in list * @direction: R/W or both. * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static int sba_map_sg(struct device *dev, struct scatterlist *sglist, int nents, @@ -1017,7 +1017,7 @@ sba_map_sg(struct device *dev, struct scatterlist *sglist, int nents, * @nents: number of entries in list * @direction: R/W or both. * - * See Documentation/DMA-API-HOWTO.txt + * See Documentation/core-api/dma-api-howto.rst */ static void sba_unmap_sg(struct device *dev, struct scatterlist *sglist, int nents, diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 78f677cf45ab..ef2b153ddbd9 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -14,7 +14,7 @@ /** * List of possible attributes associated with a DMA mapping. The semantics - * of each attribute should be defined in Documentation/DMA-attributes.txt. + * of each attribute should be defined in Documentation/core-api/dma-attributes.rst. */ /* diff --git a/include/media/videobuf-dma-sg.h b/include/media/videobuf-dma-sg.h index b89d5e31f172..34450f7ad510 100644 --- a/include/media/videobuf-dma-sg.h +++ b/include/media/videobuf-dma-sg.h @@ -31,7 +31,7 @@ * does memory allocation too using vmalloc_32(). * * videobuf_dma_*() - * see Documentation/DMA-API-HOWTO.txt, these functions to + * see Documentation/core-api/dma-api-howto.rst, these functions to * basically the same. The map function does also build a * scatterlist for the buffer (and unmap frees it ...) 
* diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c index 36c962a86bf2..f97f088ace7e 100644 --- a/kernel/dma/debug.c +++ b/kernel/dma/debug.c @@ -1071,7 +1071,7 @@ static void check_unmap(struct dma_debug_entry *ref) /* * Drivers should use dma_mapping_error() to check the returned * addresses of dma_map_single() and dma_map_page(). - * If not, print this warning message. See Documentation/DMA-API.txt. + * If not, print this warning message. See Documentation/core-api/dma-api.rst. */ if (entry->map_err_type == MAP_ERR_NOT_CHECKED) { err_printk(ref->dev, entry, -- cgit v1.2.3 From c9b54d6f362c0846a11fedea20ec8b8da9b4c93d Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Date: Tue, 23 Jun 2020 15:31:38 +0200 Subject: docs: move other kAPI documents to core-api There are a number of random documents that seem to be describing some aspects of the core-api. Move them to such directory, adding them at the core-api/index.rst file. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/86d979ed183adb76af93a92f20189bccf97f0055.1592918949.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> --- Documentation/bus-virt-phys-mapping.txt | 220 ------------- Documentation/core-api/bus-virt-phys-mapping.rst | 220 +++++++++++++ Documentation/core-api/index.rst | 3 + Documentation/core-api/this_cpu_ops.rst | 339 +++++++++++++++++++++ Documentation/core-api/unaligned-memory-access.rst | 265 ++++++++++++++++ Documentation/process/unaligned-memory-access.rst | 265 ---------------- Documentation/this_cpu_ops.txt | 339 --------------------- arch/Kconfig | 2 +- 8 files changed, 828 insertions(+), 825 deletions(-) delete mode 100644 Documentation/bus-virt-phys-mapping.txt create mode 100644 Documentation/core-api/bus-virt-phys-mapping.rst create mode 100644 Documentation/core-api/this_cpu_ops.rst create mode 100644 Documentation/core-api/unaligned-memory-access.rst delete mode 100644 Documentation/process/unaligned-memory-access.rst delete mode 100644 Documentation/this_cpu_ops.txt (limited to 'arch') diff --git a/Documentation/bus-virt-phys-mapping.txt b/Documentation/bus-virt-phys-mapping.txt deleted file mode 100644 index c7bc99cd2e21..000000000000 --- a/Documentation/bus-virt-phys-mapping.txt +++ /dev/null @@ -1,220 +0,0 @@ -========================================================== -How to access I/O mapped memory from within device drivers -========================================================== - -:Author: Linus - -.. warning:: - - The virt_to_bus() and bus_to_virt() functions have been - superseded by the functionality provided by the PCI DMA interface - (see :doc:`/core-api/dma-api-howto`). They continue - to be documented below for historical purposes, but new code - must not use them. --davidm 00/12/12 - -:: - - [ This is a mail message in response to a query on IO mapping, thus the - strange format for a "document" ] - -The AHA-1542 is a bus-master device, and your patch makes the driver give the -controller the physical address of the buffers, which is correct on x86 -(because all bus master devices see the physical memory mappings directly). - -However, on many setups, there are actually **three** different ways of looking -at memory addresses, and in this case we actually want the third, the -so-called "bus address". - -Essentially, the three ways of addressing memory are (this is "real memory", -that is, normal RAM--see later about other details): - - - CPU untranslated. This is the "physical" address.
Physical address - 0 is what the CPU sees when it drives zeroes on the memory bus. - - - CPU translated address. This is the "virtual" address, and is - completely internal to the CPU itself with the CPU doing the appropriate - translations into "CPU untranslated". - - - bus address. This is the address of memory as seen by OTHER devices, - not the CPU. Now, in theory there could be many different bus - addresses, with each device seeing memory in some device-specific way, but - happily most hardware designers aren't actually actively trying to make - things any more complex than necessary, so you can assume that all - external hardware sees the memory the same way. - -Now, on normal PCs the bus address is exactly the same as the physical -address, and things are very simple indeed. However, they are that simple -because the memory and the devices share the same address space, and that is -not generally necessarily true on other PCI/ISA setups. - -Now, just as an example, on the PReP (PowerPC Reference Platform), the -CPU sees a memory map something like this (this is from memory):: - - 0-2 GB "real memory" - 2 GB-3 GB "system IO" (inb/out and similar accesses on x86) - 3 GB-4 GB "IO memory" (shared memory over the IO bus) - -Now, that looks simple enough. However, when you look at the same thing from -the viewpoint of the devices, you have the reverse, and the physical memory -address 0 actually shows up as address 2 GB for any IO master. - -So when the CPU wants any bus master to write to physical memory 0, it -has to give the master address 0x80000000 as the memory address. - -So, for example, depending on how the kernel is actually mapped on the -PPC, you can end up with a setup like this:: - - physical address: 0 - virtual address: 0xC0000000 - bus address: 0x80000000 - -where all the addresses actually point to the same thing. It's just seen -through different translations.. - -Similarly, on the Alpha, the normal translation is:: - - physical address: 0 - virtual address: 0xfffffc0000000000 - bus address: 0x40000000 - -(but there are also Alphas where the physical address and the bus address -are the same). - -Anyway, the way to look up all these translations, you do:: - - #include <asm/io.h> - - phys_addr = virt_to_phys(virt_addr); - virt_addr = phys_to_virt(phys_addr); - bus_addr = virt_to_bus(virt_addr); - virt_addr = bus_to_virt(bus_addr); - -Now, when do you need these? - -You want the **virtual** address when you are actually going to access that -pointer from the kernel. So you can have something like this:: - - /* - * this is the hardware "mailbox" we use to communicate with - * the controller. The controller sees this directly. - */ - struct mailbox { - __u32 status; - __u32 bufstart; - __u32 buflen; - .. - } mbox; - - unsigned char * retbuffer; - - /* get the address from the controller */ - retbuffer = bus_to_virt(mbox.bufstart); - switch (retbuffer[0]) { - case STATUS_OK: - ... - -on the other hand, you want the bus address when you have a buffer that -you want to give to the controller:: - - /* ask the controller to read the sense status into "sense_buffer" */ - mbox.bufstart = virt_to_bus(&sense_buffer); - mbox.buflen = sizeof(sense_buffer); - mbox.status = 0; - notify_controller(&mbox); - -And you generally **never** want to use the physical address, because you can't -use that from the CPU (the CPU only uses translated virtual addresses), and -you can't use it from the bus master. - -So why do we care about the physical address at all?
We do need the physical -address in some cases, it's just not very often in normal code. The physical -address is needed if you use memory mappings, for example, because the -"remap_pfn_range()" mm function wants the physical address of the memory to -be remapped as measured in units of pages, a.k.a. the pfn (the memory -management layer doesn't know about devices outside the CPU, so it -shouldn't need to know about "bus addresses" etc). - -.. note:: - - The above is only one part of the whole equation. The above - only talks about "real memory", that is, CPU memory (RAM). - -There is a completely different type of memory too, and that's the "shared -memory" on the PCI or ISA bus. That's generally not RAM (although in the case -of a video graphics card it can be normal DRAM that is just used for a frame -buffer), but can be things like a packet buffer in a network card etc. - -This memory is called "PCI memory" or "shared memory" or "IO memory" or -whatever, and there is only one way to access it: the readb/writeb and -related functions. You should never take the address of such memory, because -there is really nothing you can do with such an address: it's not -conceptually in the same memory space as "real memory" at all, so you cannot -just dereference a pointer. (Sadly, on x86 it **is** in the same memory space, -so on x86 it actually works to just deference a pointer, but it's not -portable). - -For such memory, you can do things like: - - - reading:: - - /* - * read first 32 bits from ISA memory at 0xC0000, aka - * C000:0000 in DOS terms - */ - unsigned int signature = isa_readl(0xC0000); - - - remapping and writing:: - - /* - * remap framebuffer PCI memory area at 0xFC000000, - * size 1MB, so that we can access it: We can directly - * access only the 640k-1MB area, so anything else - * has to be remapped. - */ - void __iomem *baseptr = ioremap(0xFC000000, 1024*1024); - - /* write a 'A' to the offset 10 of the area */ - writeb('A',baseptr+10); - - /* unmap when we unload the driver */ - iounmap(baseptr); - - - copying and clearing:: - - /* get the 6-byte Ethernet address at ISA address E000:0040 */ - memcpy_fromio(kernel_buffer, 0xE0040, 6); - /* write a packet to the driver */ - memcpy_toio(0xE1000, skb->data, skb->len); - /* clear the frame buffer */ - memset_io(0xA0000, 0, 0x10000); - -OK, that just about covers the basics of accessing IO portably. Questions? -Comments? You may think that all the above is overly complex, but one day you -might find yourself with a 500 MHz Alpha in front of you, and then you'll be -happy that your driver works ;) - -Note that kernel versions 2.0.x (and earlier) mistakenly called the -ioremap() function "vremap()". ioremap() is the proper name, but I -didn't think straight when I wrote it originally. People who have to -support both can do something like:: - - /* support old naming silliness */ - #if LINUX_VERSION_CODE < 0x020100 - #define ioremap vremap - #define iounmap vfree - #endif - -at the top of their source files, and then they can use the right names -even on 2.0.x systems. - -And the above sounds worse than it really is. Most real drivers really -don't do all that complex things (or rather: the complexity is not so -much in the actual IO accesses as in error handling and timeouts etc). 
-It's generally not hard to fix drivers, and in many cases the code -actually looks better afterwards:: - - unsigned long signature = *(unsigned int *) 0xC0000; - vs - unsigned long signature = readl(0xC0000); - -I think the second version actually is more readable, no? diff --git a/Documentation/core-api/bus-virt-phys-mapping.rst b/Documentation/core-api/bus-virt-phys-mapping.rst new file mode 100644 index 000000000000..c7bc99cd2e21 --- /dev/null +++ b/Documentation/core-api/bus-virt-phys-mapping.rst @@ -0,0 +1,220 @@ +========================================================== +How to access I/O mapped memory from within device drivers +========================================================== + +:Author: Linus + +.. warning:: + + The virt_to_bus() and bus_to_virt() functions have been + superseded by the functionality provided by the PCI DMA interface + (see :doc:`/core-api/dma-api-howto`). They continue + to be documented below for historical purposes, but new code + must not use them. --davidm 00/12/12 + +:: + + [ This is a mail message in response to a query on IO mapping, thus the + strange format for a "document" ] + +The AHA-1542 is a bus-master device, and your patch makes the driver give the +controller the physical address of the buffers, which is correct on x86 +(because all bus master devices see the physical memory mappings directly). + +However, on many setups, there are actually **three** different ways of looking +at memory addresses, and in this case we actually want the third, the +so-called "bus address". + +Essentially, the three ways of addressing memory are (this is "real memory", +that is, normal RAM--see later about other details): + + - CPU untranslated. This is the "physical" address. Physical address + 0 is what the CPU sees when it drives zeroes on the memory bus. + + - CPU translated address. This is the "virtual" address, and is + completely internal to the CPU itself with the CPU doing the appropriate + translations into "CPU untranslated". + + - bus address. This is the address of memory as seen by OTHER devices, + not the CPU. Now, in theory there could be many different bus + addresses, with each device seeing memory in some device-specific way, but + happily most hardware designers aren't actually actively trying to make + things any more complex than necessary, so you can assume that all + external hardware sees the memory the same way. + +Now, on normal PCs the bus address is exactly the same as the physical +address, and things are very simple indeed. However, they are that simple +because the memory and the devices share the same address space, and that is +not generally necessarily true on other PCI/ISA setups. + +Now, just as an example, on the PReP (PowerPC Reference Platform), the +CPU sees a memory map something like this (this is from memory):: + + 0-2 GB "real memory" + 2 GB-3 GB "system IO" (inb/out and similar accesses on x86) + 3 GB-4 GB "IO memory" (shared memory over the IO bus) + +Now, that looks simple enough. However, when you look at the same thing from +the viewpoint of the devices, you have the reverse, and the physical memory +address 0 actually shows up as address 2 GB for any IO master. + +So when the CPU wants any bus master to write to physical memory 0, it +has to give the master address 0x80000000 as the memory address. 
+ +So, for example, depending on how the kernel is actually mapped on the +PPC, you can end up with a setup like this:: + + physical address: 0 + virtual address: 0xC0000000 + bus address: 0x80000000 + +where all the addresses actually point to the same thing. It's just seen +through different translations.. + +Similarly, on the Alpha, the normal translation is:: + + physical address: 0 + virtual address: 0xfffffc0000000000 + bus address: 0x40000000 + +(but there are also Alphas where the physical address and the bus address +are the same). + +Anyway, the way to look up all these translations, you do:: + + #include <asm/io.h> + + phys_addr = virt_to_phys(virt_addr); + virt_addr = phys_to_virt(phys_addr); + bus_addr = virt_to_bus(virt_addr); + virt_addr = bus_to_virt(bus_addr); + +Now, when do you need these? + +You want the **virtual** address when you are actually going to access that +pointer from the kernel. So you can have something like this:: + + /* + * this is the hardware "mailbox" we use to communicate with + * the controller. The controller sees this directly. + */ + struct mailbox { + __u32 status; + __u32 bufstart; + __u32 buflen; + .. + } mbox; + + unsigned char * retbuffer; + + /* get the address from the controller */ + retbuffer = bus_to_virt(mbox.bufstart); + switch (retbuffer[0]) { + case STATUS_OK: + ... + +on the other hand, you want the bus address when you have a buffer that +you want to give to the controller:: + + /* ask the controller to read the sense status into "sense_buffer" */ + mbox.bufstart = virt_to_bus(&sense_buffer); + mbox.buflen = sizeof(sense_buffer); + mbox.status = 0; + notify_controller(&mbox); + +And you generally **never** want to use the physical address, because you can't +use that from the CPU (the CPU only uses translated virtual addresses), and +you can't use it from the bus master. + +So why do we care about the physical address at all? We do need the physical +address in some cases, it's just not very often in normal code. The physical +address is needed if you use memory mappings, for example, because the +"remap_pfn_range()" mm function wants the physical address of the memory to +be remapped as measured in units of pages, a.k.a. the pfn (the memory +management layer doesn't know about devices outside the CPU, so it +shouldn't need to know about "bus addresses" etc). + +.. note:: + + The above is only one part of the whole equation. The above + only talks about "real memory", that is, CPU memory (RAM). + +There is a completely different type of memory too, and that's the "shared +memory" on the PCI or ISA bus. That's generally not RAM (although in the case +of a video graphics card it can be normal DRAM that is just used for a frame +buffer), but can be things like a packet buffer in a network card etc. + +This memory is called "PCI memory" or "shared memory" or "IO memory" or +whatever, and there is only one way to access it: the readb/writeb and +related functions. You should never take the address of such memory, because +there is really nothing you can do with such an address: it's not +conceptually in the same memory space as "real memory" at all, so you cannot +just dereference a pointer. (Sadly, on x86 it **is** in the same memory space, +so on x86 it actually works to just deference a pointer, but it's not +portable).
+ +For such memory, you can do things like: + + - reading:: + + /* + * read first 32 bits from ISA memory at 0xC0000, aka + * C000:0000 in DOS terms + */ + unsigned int signature = isa_readl(0xC0000); + + - remapping and writing:: + + /* + * remap framebuffer PCI memory area at 0xFC000000, + * size 1MB, so that we can access it: We can directly + * access only the 640k-1MB area, so anything else + * has to be remapped. + */ + void __iomem *baseptr = ioremap(0xFC000000, 1024*1024); + + /* write a 'A' to the offset 10 of the area */ + writeb('A',baseptr+10); + + /* unmap when we unload the driver */ + iounmap(baseptr); + + - copying and clearing:: + + /* get the 6-byte Ethernet address at ISA address E000:0040 */ + memcpy_fromio(kernel_buffer, 0xE0040, 6); + /* write a packet to the driver */ + memcpy_toio(0xE1000, skb->data, skb->len); + /* clear the frame buffer */ + memset_io(0xA0000, 0, 0x10000); + +OK, that just about covers the basics of accessing IO portably. Questions? +Comments? You may think that all the above is overly complex, but one day you +might find yourself with a 500 MHz Alpha in front of you, and then you'll be +happy that your driver works ;) + +Note that kernel versions 2.0.x (and earlier) mistakenly called the +ioremap() function "vremap()". ioremap() is the proper name, but I +didn't think straight when I wrote it originally. People who have to +support both can do something like:: + + /* support old naming silliness */ + #if LINUX_VERSION_CODE < 0x020100 + #define ioremap vremap + #define iounmap vfree + #endif + +at the top of their source files, and then they can use the right names +even on 2.0.x systems. + +And the above sounds worse than it really is. Most real drivers really +don't do all that complex things (or rather: the complexity is not so +much in the actual IO accesses as in error handling and timeouts etc). +It's generally not hard to fix drivers, and in many cases the code +actually looks better afterwards:: + + unsigned long signature = *(unsigned int *) 0xC0000; + vs + unsigned long signature = readl(0xC0000); + +I think the second version actually is more readable, no? diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 15ab86112627..69171b1799f2 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -39,6 +39,8 @@ Library functionality that is used throughout the kernel. rbtree generic-radix-tree packing + bus-virt-phys-mapping + this_cpu_ops timekeeping errseq @@ -82,6 +84,7 @@ more memory-management documentation in :doc:`/vm/index`. :maxdepth: 1 memory-allocation + unaligned-memory-access dma-api dma-api-howto dma-attributes diff --git a/Documentation/core-api/this_cpu_ops.rst b/Documentation/core-api/this_cpu_ops.rst new file mode 100644 index 000000000000..5cb8b883ae83 --- /dev/null +++ b/Documentation/core-api/this_cpu_ops.rst @@ -0,0 +1,339 @@ +=================== +this_cpu operations +=================== + +:Author: Christoph Lameter, August 4th, 2014 +:Author: Pranith Kumar, Aug 2nd, 2014 + +this_cpu operations are a way of optimizing access to per cpu +variables associated with the *currently* executing processor. This is +done through the use of segment registers (or a dedicated register where +the cpu permanently stored the beginning of the per cpu area for a +specific processor). + +this_cpu operations add a per cpu variable offset to the processor +specific per cpu base and encode that operation in the instruction +operating on the per cpu variable. 
+ +This means that there are no atomicity issues between the calculation of +the offset and the operation on the data. Therefore it is not +necessary to disable preemption or interrupts to ensure that the +processor is not changed between the calculation of the address and +the operation on the data. + +Read-modify-write operations are of particular interest. Frequently +processors have special lower latency instructions that can operate +without the typical synchronization overhead, but still provide some +sort of relaxed atomicity guarantees. The x86, for example, can execute +RMW (Read Modify Write) instructions like inc/dec/cmpxchg without the +lock prefix and the associated latency penalty. + +Access to the variable without the lock prefix is not synchronized but +synchronization is not necessary since we are dealing with per cpu +data specific to the currently executing processor. Only the current +processor should be accessing that variable and therefore there are no +concurrency issues with other processors in the system. + +Please note that accesses by remote processors to a per cpu area are +exceptional situations and may impact performance and/or correctness +(remote write operations) of local RMW operations via this_cpu_*. + +The main use of the this_cpu operations has been to optimize counter +operations. + +The following this_cpu() operations with implied preemption protection +are defined. These operations can be used without worrying about +preemption and interrupts:: + + this_cpu_read(pcp) + this_cpu_write(pcp, val) + this_cpu_add(pcp, val) + this_cpu_and(pcp, val) + this_cpu_or(pcp, val) + this_cpu_add_return(pcp, val) + this_cpu_xchg(pcp, nval) + this_cpu_cmpxchg(pcp, oval, nval) + this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2) + this_cpu_sub(pcp, val) + this_cpu_inc(pcp) + this_cpu_dec(pcp) + this_cpu_sub_return(pcp, val) + this_cpu_inc_return(pcp) + this_cpu_dec_return(pcp) + + +Inner working of this_cpu operations +------------------------------------ + +On x86 the fs: or the gs: segment registers contain the base of the +per cpu area. It is then possible to simply use the segment override +to relocate a per cpu relative address to the proper per cpu area for +the processor. So the relocation to the per cpu base is encoded in the +instruction via a segment register prefix. + +For example:: + + DEFINE_PER_CPU(int, x); + int z; + + z = this_cpu_read(x); + +results in a single instruction:: + + mov ax, gs:[x] + +instead of a sequence of calculation of the address and then a fetch +from that address which occurs with the per cpu operations. Before +this_cpu_ops such sequence also required preempt disable/enable to +prevent the kernel from moving the thread to a different processor +while the calculation is performed. + +Consider the following this_cpu operation:: + + this_cpu_inc(x) + +The above results in the following single instruction (no lock prefix!):: + + inc gs:[x] + +instead of the following operations required if there is no segment +register:: + + int *y; + int cpu; + + cpu = get_cpu(); + y = per_cpu_ptr(&x, cpu); + (*y)++; + put_cpu(); + +Note that these operations can only be used on per cpu data that is +reserved for a specific processor. Without disabling preemption in the +surrounding code this_cpu_inc() will only guarantee that one of the +per cpu counters is correctly incremented. However, there is no +guarantee that the OS will not move the process directly before or +after the this_cpu instruction is executed. 
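The usual read side copes with that migration window by summing over all CPUs, since only the total is ever consumed; a minimal sketch, assuming a hypothetical my_hit_count counter::

    #include <linux/percpu.h>
    #include <linux/cpumask.h>

    DEFINE_PER_CPU(unsigned long, my_hit_count);    /* hypothetical */

    static unsigned long total_hits(void)
    {
            unsigned long sum = 0;
            int cpu;

            /* remote *reads* of per cpu counters are the safe case */
            for_each_possible_cpu(cpu)
                    sum += per_cpu(my_hit_count, cpu);
            return sum;
    }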
In general this means that +the value of the individual counters for each processor are +meaningless. The sum of all the per cpu counters is the only value +that is of interest. + +Per cpu variables are used for performance reasons. Bouncing cache +lines can be avoided if multiple processors concurrently go through +the same code paths. Since each processor has its own per cpu +variables no concurrent cache line updates take place. The price that +has to be paid for this optimization is the need to add up the per cpu +counters when the value of a counter is needed. + + +Special operations +------------------ + +:: + + y = this_cpu_ptr(&x) + +Takes the offset of a per cpu variable (&x !) and returns the address +of the per cpu variable that belongs to the currently executing +processor. this_cpu_ptr avoids multiple steps that the common +get_cpu/put_cpu sequence requires. No processor number is +available. Instead, the offset of the local per cpu area is simply +added to the per cpu offset. + +Note that this operation is usually used in a code segment when +preemption has been disabled. The pointer is then used to +access local per cpu data in a critical section. When preemption +is re-enabled this pointer is usually no longer useful since it may +no longer point to per cpu data of the current processor. + + +Per cpu variables and offsets +----------------------------- + +Per cpu variables have *offsets* to the beginning of the per cpu +area. They do not have addresses although they look like that in the +code. Offsets cannot be directly dereferenced. The offset must be +added to a base pointer of a per cpu area of a processor in order to +form a valid address. + +Therefore the use of x or &x outside of the context of per cpu +operations is invalid and will generally be treated like a NULL +pointer dereference. + +:: + + DEFINE_PER_CPU(int, x); + +In the context of per cpu operations the above implies that x is a per +cpu variable. Most this_cpu operations take a cpu variable. + +:: + + int __percpu *p = &x; + +&x and hence p is the *offset* of a per cpu variable. this_cpu_ptr() +takes the offset of a per cpu variable which makes this look a bit +strange. + + +Operations on a field of a per cpu structure +-------------------------------------------- + +Let's say we have a percpu structure:: + + struct s { + int n,m; + }; + + DEFINE_PER_CPU(struct s, p); + + +Operations on these fields are straightforward:: + + this_cpu_inc(p.m) + + z = this_cpu_cmpxchg(p.m, 0, 1); + + +If we have an offset to struct s:: + + struct s __percpu *ps = &p; + + this_cpu_dec(ps->m); + + z = this_cpu_inc_return(ps->n); + + +The calculation of the pointer may require the use of this_cpu_ptr() +if we do not make use of this_cpu ops later to manipulate fields:: + + struct s *pp; + + pp = this_cpu_ptr(&p); + + pp->m--; + + z = pp->n++; + + +Variants of this_cpu ops +------------------------ + +this_cpu ops are interrupt safe. Some architectures do not support +these per cpu local operations. In that case the operation must be +replaced by code that disables interrupts, then does the operations +that are guaranteed to be atomic and then re-enable interrupts. Doing +so is expensive. If there are other reasons why the scheduler cannot +change the processor we are executing on then there is no reason to +disable interrupts. For that purpose the following __this_cpu operations +are provided. + +These operations have no guarantee against concurrent interrupts or +preemption. 
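In most code that means wrapping them in an explicitly preemption-disabled region; a minimal sketch, again with a hypothetical counter::

    #include <linux/percpu.h>
    #include <linux/preempt.h>

    DEFINE_PER_CPU(unsigned long, my_stat);         /* hypothetical */

    static void my_stat_inc(void)
    {
            preempt_disable();              /* we cannot migrate ... */
            __this_cpu_inc(my_stat);        /* ... so the cheap op is safe */
            preempt_enable();
    }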
If a per cpu variable is not used in an interrupt context +and the scheduler cannot preempt, then they are safe. If any interrupts +still occur while an operation is in progress and if the interrupt too +modifies the variable, then RMW actions can not be guaranteed to be +safe:: + + __this_cpu_read(pcp) + __this_cpu_write(pcp, val) + __this_cpu_add(pcp, val) + __this_cpu_and(pcp, val) + __this_cpu_or(pcp, val) + __this_cpu_add_return(pcp, val) + __this_cpu_xchg(pcp, nval) + __this_cpu_cmpxchg(pcp, oval, nval) + __this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2) + __this_cpu_sub(pcp, val) + __this_cpu_inc(pcp) + __this_cpu_dec(pcp) + __this_cpu_sub_return(pcp, val) + __this_cpu_inc_return(pcp) + __this_cpu_dec_return(pcp) + + +Will increment x and will not fall-back to code that disables +interrupts on platforms that cannot accomplish atomicity through +address relocation and a Read-Modify-Write operation in the same +instruction. + + +&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n) +-------------------------------------------- + +The first operation takes the offset and forms an address and then +adds the offset of the n field. This may result in two add +instructions emitted by the compiler. + +The second one first adds the two offsets and then does the +relocation. IMHO the second form looks cleaner and has an easier time +with (). The second form also is consistent with the way +this_cpu_read() and friends are used. + + +Remote access to per cpu data +------------------------------ + +Per cpu data structures are designed to be used by one cpu exclusively. +If you use the variables as intended, this_cpu_ops() are guaranteed to +be "atomic" as no other CPU has access to these data structures. + +There are special cases where you might need to access per cpu data +structures remotely. It is usually safe to do a remote read access +and that is frequently done to summarize counters. Remote write access +something which could be problematic because this_cpu ops do not +have lock semantics. A remote write may interfere with a this_cpu +RMW operation. + +Remote write accesses to percpu data structures are highly discouraged +unless absolutely necessary. Please consider using an IPI to wake up +the remote CPU and perform the update to its per cpu area. + +To access per-cpu data structure remotely, typically the per_cpu_ptr() +function is used:: + + + DEFINE_PER_CPU(struct data, datap); + + struct data *p = per_cpu_ptr(&datap, cpu); + +This makes it explicit that we are getting ready to access a percpu +area remotely. + +You can also do the following to convert the datap offset to an address:: + + struct data *p = this_cpu_ptr(&datap); + +but, passing of pointers calculated via this_cpu_ptr to other cpus is +unusual and should be avoided. + +Remote access are typically only for reading the status of another cpus +per cpu data. Write accesses can cause unique problems due to the +relaxed synchronization requirements for this_cpu operations. + +One example that illustrates some concerns with write operations is +the following scenario that occurs because two per cpu variables +share a cache-line but the relaxed synchronization is applied to +only one process updating the cache-line. + +Consider the following example:: + + + struct test { + atomic_t a; + int b; + }; + + DEFINE_PER_CPU(struct test, onecacheline); + +There is some concern about what would happen if the field 'a' is updated +remotely from one processor and the local processor would use this_cpu ops +to update field b. 
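The IPI route recommended above can be sketched as follows, reusing the struct from this example; smp_call_function_single() runs the callback on the target CPU, so a plain this_cpu write suffices there::

    #include <linux/smp.h>
    #include <linux/percpu.h>

    struct test {
            atomic_t a;
            int b;
    };
    DEFINE_PER_CPU(struct test, onecacheline);

    static void update_b(void *info)        /* runs on the target cpu */
    {
            __this_cpu_write(onecacheline.b, *(int *)info);
    }

    static void set_b_on(int cpu, int val)
    {
            /* wait=1: 'val' is on our stack, so wait for the IPI to finish */
            smp_call_function_single(cpu, update_b, &val, 1);
    }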
Care should be taken that such simultaneous accesses to +data within the same cache line are avoided. Also costly synchronization +may be necessary. IPIs are generally recommended in such scenarios instead +of a remote write to the per cpu area of another processor. + +Even in cases where the remote writes are rare, please bear in +mind that a remote write will evict the cache line from the processor +that most likely will access it. If the processor wakes up and finds a +missing local cache line of a per cpu area, its performance and hence +the wake up times will be affected. diff --git a/Documentation/core-api/unaligned-memory-access.rst b/Documentation/core-api/unaligned-memory-access.rst new file mode 100644 index 000000000000..1ee82419d8aa --- /dev/null +++ b/Documentation/core-api/unaligned-memory-access.rst @@ -0,0 +1,265 @@ +========================= +Unaligned Memory Accesses +========================= + +:Author: Daniel Drake <dsd@gentoo.org>, +:Author: Johannes Berg <johannes@sipsolutions.net> + +:With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt, + Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz, + Vadim Lobanov + + +Linux runs on a wide variety of architectures which have varying behaviour +when it comes to memory access. This document presents some details about +unaligned accesses, why you need to write code that doesn't cause them, +and how to write such code! + + +The definition of an unaligned access +===================================== + +Unaligned memory accesses occur when you try to read N bytes of data starting +from an address that is not evenly divisible by N (i.e. addr % N != 0). +For example, reading 4 bytes of data from address 0x10004 is fine, but +reading 4 bytes of data from address 0x10005 would be an unaligned memory +access. + +The above may seem a little vague, as memory access can happen in different +ways. The context here is at the machine code level: certain instructions read +or write a number of bytes to or from memory (e.g. movb, movw, movl in x86 +assembly). As will become clear, it is relatively easy to spot C statements +which will compile to multiple-byte memory access instructions, namely when +dealing with types such as u16, u32 and u64. + + +Natural alignment +================= + +The rule mentioned above forms what we refer to as natural alignment: +When accessing N bytes of memory, the base memory address must be evenly +divisible by N, i.e. addr % N == 0. + +When writing code, assume the target architecture has natural alignment +requirements. + +In reality, only a few architectures require natural alignment on all sizes +of memory access. However, we must consider ALL supported architectures; +writing code that satisfies natural alignment requirements is the easiest way +to achieve full portability. + + +Why unaligned access is bad +=========================== + +The effects of performing an unaligned memory access vary from architecture +to architecture. It would be easy to write a whole document on the differences +here; a summary of the common scenarios is presented below: + + - Some architectures are able to perform unaligned memory accesses + transparently, but there is usually a significant performance cost. + - Some architectures raise processor exceptions when unaligned accesses + happen. The exception handler is able to correct the unaligned access, + at significant cost to performance.
+ - Some architectures raise processor exceptions when unaligned accesses + happen, but the exceptions do not contain enough information for the + unaligned access to be corrected. + - Some architectures are not capable of unaligned memory access, but will + silently perform a different memory access to the one that was requested, + resulting in a subtle code bug that is hard to detect! + +It should be obvious from the above that if your code causes unaligned +memory accesses to happen, your code will not work correctly on certain +platforms and will cause performance problems on others. + + +Code that does not cause unaligned access +========================================= + +At first, the concepts above may seem a little hard to relate to actual +coding practice. After all, you don't have a great deal of control over +memory addresses of certain variables, etc. + +Fortunately things are not too complex, as in most cases, the compiler +ensures that things will work for you. For example, take the following +structure:: + + struct foo { + u16 field1; + u32 field2; + u8 field3; + }; + +Let us assume that an instance of the above structure resides in memory +starting at address 0x10000. With a basic level of understanding, it would +not be unreasonable to expect that accessing field2 would cause an unaligned +access. You'd be expecting field2 to be located at offset 2 bytes into the +structure, i.e. address 0x10002, but that address is not evenly divisible +by 4 (remember, we're reading a 4 byte value here). + +Fortunately, the compiler understands the alignment constraints, so in the +above case it would insert 2 bytes of padding in between field1 and field2. +Therefore, for standard structure types you can always rely on the compiler +to pad structures so that accesses to fields are suitably aligned (assuming +you do not cast the field to a type of different length). + +Similarly, you can also rely on the compiler to align variables and function +parameters to a naturally aligned scheme, based on the size of the type of +the variable. + +At this point, it should be clear that accessing a single byte (u8 or char) +will never cause an unaligned access, because all memory addresses are evenly +divisible by one. + +On a related topic, with the above considerations in mind you may observe +that you could reorder the fields in the structure in order to place fields +where padding would otherwise be inserted, and hence reduce the overall +resident memory size of structure instances. The optimal layout of the +above example is:: + + struct foo { + u32 field2; + u16 field1; + u8 field3; + }; + +For a natural alignment scheme, the compiler would only have to add a single +byte of padding at the end of the structure. This padding is added in order +to satisfy alignment constraints for arrays of these structures. + +Another point worth mentioning is the use of __attribute__((packed)) on a +structure type. This GCC-specific attribute tells the compiler never to +insert any padding within structures, useful when you want to use a C struct +to represent some data that comes in a fixed arrangement 'off the wire'. + +You might be inclined to believe that usage of this attribute can easily +lead to unaligned accesses when accessing fields that do not satisfy +architectural alignment requirements. However, again, the compiler is aware +of the alignment constraints and will generate extra instructions to perform +the memory access in a way that does not cause unaligned access. 
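For instance, a field read through a hypothetical packed wire header like the one below compiles to safe, byte-wise access on strict-alignment targets; a minimal sketch::

    #include <linux/types.h>

    struct wire_hdr {                       /* hypothetical wire format */
            u8  type;
            u32 len;        /* at offset 1: never naturally aligned */
    } __attribute__((packed));

    static u32 wire_hdr_len(const struct wire_hdr *h)
    {
            return h->len;  /* compiler emits unaligned-safe access */
    }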
Of course, +the extra instructions obviously cause a loss in performance compared to the +non-packed case, so the packed attribute should only be used when avoiding +structure padding is of importance. + + +Code that causes unaligned access +================================= + +With the above in mind, let's move onto a real life example of a function +that can cause an unaligned memory access. The following function taken +from include/linux/etherdevice.h is an optimized routine to compare two +ethernet MAC addresses for equality:: + + bool ether_addr_equal(const u8 *addr1, const u8 *addr2) + { + #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS + u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) | + ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4))); + + return fold == 0; + #else + const u16 *a = (const u16 *)addr1; + const u16 *b = (const u16 *)addr2; + return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0; + #endif + } + +In the above function, when the hardware has efficient unaligned access +capability, there is no issue with this code. But when the hardware isn't +able to access memory on arbitrary boundaries, the reference to a[0] causes +2 bytes (16 bits) to be read from memory starting at address addr1. + +Think about what would happen if addr1 was an odd address such as 0x10003. +(Hint: it'd be an unaligned access.) + +Despite the potential unaligned access problems with the above function, it +is included in the kernel anyway but is understood to only work normally on +16-bit-aligned addresses. It is up to the caller to ensure this alignment or +not use this function at all. This alignment-unsafe function is still useful +as it is a decent optimization for the cases when you can ensure alignment, +which is true almost all of the time in ethernet networking context. + + +Here is another example of some code that could cause unaligned accesses:: + + void myfunc(u8 *data, u32 value) + { + [...] + *((u32 *) data) = cpu_to_le32(value); + [...] + } + +This code will cause unaligned accesses every time the data parameter points +to an address that is not evenly divisible by 4. + +In summary, the 2 main scenarios where you may run into unaligned access +problems involve: + + 1. Casting variables to types of different lengths + 2. Pointer arithmetic followed by access to at least 2 bytes of data + + +Avoiding unaligned accesses +=========================== + +The easiest way to avoid unaligned access is to use the get_unaligned() and +put_unaligned() macros provided by the <asm/unaligned.h> header file. + +Going back to an earlier example of code that potentially causes unaligned +access:: + + void myfunc(u8 *data, u32 value) + { + [...] + *((u32 *) data) = cpu_to_le32(value); + [...] + } + +To avoid the unaligned memory access, you would rewrite it as follows:: + + void myfunc(u8 *data, u32 value) + { + [...] + value = cpu_to_le32(value); + put_unaligned(value, (u32 *) data); + [...] + } + +The get_unaligned() macro works similarly. Assuming 'data' is a pointer to +memory and you wish to avoid unaligned access, its usage is as follows:: + + u32 value = get_unaligned((u32 *) data); + +These macros work for memory accesses of any length (not just 32 bits as +in the examples above). Be aware that when compared to standard access of +aligned memory, using these macros to access unaligned memory can be costly in +terms of performance. + +If use of such macros is not convenient, another option is to use memcpy(), +where the source or destination (or both) are of type u8* or unsigned char*.
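A sketch of that memcpy() form, mirroring the myfunc() example above::

    #include <linux/string.h>
    #include <linux/types.h>
    #include <asm/byteorder.h>

    void myfunc(u8 *data, u32 value)
    {
            __le32 le = cpu_to_le32(value);

            /* byte-wise copy: no aligned load or store is required */
            memcpy(data, &le, sizeof(le));
    }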
+
+
+Alignment vs. Networking
+========================
+
+On architectures that require aligned loads, networking requires that the IP
+header is aligned on a four-byte boundary to optimise the IP stack. For
+regular ethernet hardware, the constant NET_IP_ALIGN is used. On most
+architectures this constant has the value 2 because the normal ethernet
+header is 14 bytes long, so in order to get proper alignment one needs to
+DMA to an address which can be expressed as 4*n + 2. One notable exception
+here is powerpc which defines NET_IP_ALIGN to 0 because DMA to unaligned
+addresses can be very expensive and dwarf the cost of unaligned loads.
+
+For some ethernet hardware that cannot DMA to unaligned addresses like
+4*n+2, and for non-ethernet hardware, this can be a problem, and it is then
+required to copy the incoming frame into an aligned buffer. Because this is
+unnecessary on architectures that can do unaligned accesses, the code can be
+made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so::
+
+        #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+                skb = original skb
+        #else
+                skb = copy skb
+        #endif
diff --git a/Documentation/process/unaligned-memory-access.rst b/Documentation/process/unaligned-memory-access.rst
deleted file mode 100644
index 1ee82419d8aa..000000000000
--- a/Documentation/process/unaligned-memory-access.rst
+++ /dev/null
@@ -1,265 +0,0 @@
-=========================
-Unaligned Memory Accesses
-=========================
-
-:Author: Daniel Drake <dsd@gentoo.org>,
-:Author: Johannes Berg <johannes@sipsolutions.net>
-
-:With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
-  Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
-  Vadim Lobanov
-
-
-Linux runs on a wide variety of architectures which have varying behaviour
-when it comes to memory access. This document presents some details about
-unaligned accesses, why you need to write code that doesn't cause them,
-and how to write such code!
-
-
-The definition of an unaligned access
-=====================================
-
-Unaligned memory accesses occur when you try to read N bytes of data starting
-from an address that is not evenly divisible by N (i.e. addr % N != 0).
-For example, reading 4 bytes of data from address 0x10004 is fine, but
-reading 4 bytes of data from address 0x10005 would be an unaligned memory
-access.
-
-The above may seem a little vague, as memory access can happen in different
-ways. The context here is at the machine code level: certain instructions read
-or write a number of bytes to or from memory (e.g. movb, movw, movl in x86
-assembly). As will become clear, it is relatively easy to spot C statements
-which will compile to multiple-byte memory access instructions, namely when
-dealing with types such as u16, u32 and u64.
-
-
-Natural alignment
-=================
-
-The rule mentioned above forms what we refer to as natural alignment:
-When accessing N bytes of memory, the base memory address must be evenly
-divisible by N, i.e. addr % N == 0.
-
-When writing code, assume the target architecture has natural alignment
-requirements.
-
-In reality, only a few architectures require natural alignment on all sizes
-of memory access. However, we must consider ALL supported architectures;
-writing code that satisfies natural alignment requirements is the easiest way
-to achieve full portability.
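-
-For illustration only, the natural alignment rule boils down to a one-line
-check. This is a standalone sketch with a made-up function name; for
-power-of-two sizes the modulo can be replaced by a mask, which is how the
-kernel's IS_ALIGNED() macro expresses the same test::
-
-        #include <stdbool.h>
-        #include <stdint.h>
-
-        /* True if an access of 'size' bytes at 'addr' is naturally
-         * aligned, i.e. addr % size == 0. 'size' must be a power of
-         * two for the mask trick to be valid. */
-        static bool naturally_aligned(uintptr_t addr, uintptr_t size)
-        {
-                return (addr & (size - 1)) == 0;
-        }
-
-Plugging in the earlier numbers, naturally_aligned(0x10004, 4) is true
-while naturally_aligned(0x10005, 4) is false.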
- - -Why unaligned access is bad -=========================== - -The effects of performing an unaligned memory access vary from architecture -to architecture. It would be easy to write a whole document on the differences -here; a summary of the common scenarios is presented below: - - - Some architectures are able to perform unaligned memory accesses - transparently, but there is usually a significant performance cost. - - Some architectures raise processor exceptions when unaligned accesses - happen. The exception handler is able to correct the unaligned access, - at significant cost to performance. - - Some architectures raise processor exceptions when unaligned accesses - happen, but the exceptions do not contain enough information for the - unaligned access to be corrected. - - Some architectures are not capable of unaligned memory access, but will - silently perform a different memory access to the one that was requested, - resulting in a subtle code bug that is hard to detect! - -It should be obvious from the above that if your code causes unaligned -memory accesses to happen, your code will not work correctly on certain -platforms and will cause performance problems on others. - - -Code that does not cause unaligned access -========================================= - -At first, the concepts above may seem a little hard to relate to actual -coding practice. After all, you don't have a great deal of control over -memory addresses of certain variables, etc. - -Fortunately things are not too complex, as in most cases, the compiler -ensures that things will work for you. For example, take the following -structure:: - - struct foo { - u16 field1; - u32 field2; - u8 field3; - }; - -Let us assume that an instance of the above structure resides in memory -starting at address 0x10000. With a basic level of understanding, it would -not be unreasonable to expect that accessing field2 would cause an unaligned -access. You'd be expecting field2 to be located at offset 2 bytes into the -structure, i.e. address 0x10002, but that address is not evenly divisible -by 4 (remember, we're reading a 4 byte value here). - -Fortunately, the compiler understands the alignment constraints, so in the -above case it would insert 2 bytes of padding in between field1 and field2. -Therefore, for standard structure types you can always rely on the compiler -to pad structures so that accesses to fields are suitably aligned (assuming -you do not cast the field to a type of different length). - -Similarly, you can also rely on the compiler to align variables and function -parameters to a naturally aligned scheme, based on the size of the type of -the variable. - -At this point, it should be clear that accessing a single byte (u8 or char) -will never cause an unaligned access, because all memory addresses are evenly -divisible by one. - -On a related topic, with the above considerations in mind you may observe -that you could reorder the fields in the structure in order to place fields -where padding would otherwise be inserted, and hence reduce the overall -resident memory size of structure instances. The optimal layout of the -above example is:: - - struct foo { - u32 field2; - u16 field1; - u8 field3; - }; - -For a natural alignment scheme, the compiler would only have to add a single -byte of padding at the end of the structure. This padding is added in order -to satisfy alignment constraints for arrays of these structures. 
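-
-A standalone sketch (the reordered type name is made up here, and the
-stdint.h types stand in for the kernel's u8/u16/u32) makes the padding
-visible by printing field offsets and structure sizes::
-
-        #include <stdio.h>
-        #include <stddef.h>
-        #include <stdint.h>
-
-        struct foo {                    /* ordering from the first example */
-                uint16_t field1;        /* offset 0 */
-                uint32_t field2;        /* offset 4, after 2 padding bytes */
-                uint8_t field3;         /* offset 8 */
-        };                              /* sizeof == 12 on typical ABIs */
-
-        struct foo_reordered {          /* the optimal layout */
-                uint32_t field2;        /* offset 0 */
-                uint16_t field1;        /* offset 4 */
-                uint8_t field3;         /* offset 6 */
-        };                              /* sizeof == 8, one trailing pad byte */
-
-        int main(void)
-        {
-                printf("foo: field2 at %zu, sizeof %zu\n",
-                       offsetof(struct foo, field2), sizeof(struct foo));
-                printf("reordered: field2 at %zu, sizeof %zu\n",
-                       offsetof(struct foo_reordered, field2),
-                       sizeof(struct foo_reordered));
-                return 0;
-        }
-
-On a machine with natural alignment requirements this prints offsets 4
-and 0 and sizes 12 and 8, matching the padding described above.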
-
-Another point worth mentioning is the use of __attribute__((packed)) on a
-structure type. This GCC-specific attribute tells the compiler never to
-insert any padding within structures, useful when you want to use a C struct
-to represent some data that comes in a fixed arrangement 'off the wire'.
-
-You might be inclined to believe that usage of this attribute can easily
-lead to unaligned accesses when accessing fields that do not satisfy
-architectural alignment requirements. However, again, the compiler is aware
-of the alignment constraints and will generate extra instructions to perform
-the memory access in a way that does not cause unaligned access. Of course,
-the extra instructions obviously cause a loss in performance compared to the
-non-packed case, so the packed attribute should only be used when avoiding
-structure padding is of importance.
-
-
-Code that causes unaligned access
-=================================
-
-With the above in mind, let's move onto a real life example of a function
-that can cause an unaligned memory access. The following function taken
-from include/linux/etherdevice.h is an optimized routine to compare two
-ethernet MAC addresses for equality::
-
-        bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
-        {
-        #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-                u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
-                           ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
-
-                return fold == 0;
-        #else
-                const u16 *a = (const u16 *)addr1;
-                const u16 *b = (const u16 *)addr2;
-                return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
-        #endif
-        }
-
-In the above function, when the hardware has efficient unaligned access
-capability, there is no issue with this code. But when the hardware isn't
-able to access memory on arbitrary boundaries, the reference to a[0] causes
-2 bytes (16 bits) to be read from memory starting at address addr1.
-
-Think about what would happen if addr1 was an odd address such as 0x10003.
-(Hint: it'd be an unaligned access.)
-
-Despite the potential unaligned access problems with the above function, it
-is included in the kernel anyway but is understood to only work normally on
-16-bit-aligned addresses. It is up to the caller to ensure this alignment or
-not use this function at all. This alignment-unsafe function is still useful
-as it is a decent optimization for the cases when you can ensure alignment,
-which is true almost all of the time in ethernet networking context.
-
-
-Here is another example of some code that could cause unaligned accesses::
-
-        void myfunc(u8 *data, u32 value)
-        {
-                [...]
-                *((u32 *) data) = cpu_to_le32(value);
-                [...]
-        }
-
-This code will cause unaligned accesses every time the data parameter points
-to an address that is not evenly divisible by 4.
-
-In summary, the 2 main scenarios where you may run into unaligned access
-problems involve:
-
- 1. Casting variables to types of different lengths
- 2. Pointer arithmetic followed by access to at least 2 bytes of data
-
-
-Avoiding unaligned accesses
-===========================
-
-The easiest way to avoid unaligned access is to use the get_unaligned() and
-put_unaligned() macros provided by the <asm/unaligned.h> header file.
-
-Going back to an earlier example of code that potentially causes unaligned
-access::
-
-        void myfunc(u8 *data, u32 value)
-        {
-                [...]
-                *((u32 *) data) = cpu_to_le32(value);
-                [...]
-        }
-
-To avoid the unaligned memory access, you would rewrite it as follows::
-
-        void myfunc(u8 *data, u32 value)
-        {
-                [...]
- value = cpu_to_le32(value); - put_unaligned(value, (u32 *) data); - [...] - } - -The get_unaligned() macro works similarly. Assuming 'data' is a pointer to -memory and you wish to avoid unaligned access, its usage is as follows:: - - u32 value = get_unaligned((u32 *) data); - -These macros work for memory accesses of any length (not just 32 bits as -in the examples above). Be aware that when compared to standard access of -aligned memory, using these macros to access unaligned memory can be costly in -terms of performance. - -If use of such macros is not convenient, another option is to use memcpy(), -where the source or destination (or both) are of type u8* or unsigned char*. -Due to the byte-wise nature of this operation, unaligned accesses are avoided. - - -Alignment vs. Networking -======================== - -On architectures that require aligned loads, networking requires that the IP -header is aligned on a four-byte boundary to optimise the IP stack. For -regular ethernet hardware, the constant NET_IP_ALIGN is used. On most -architectures this constant has the value 2 because the normal ethernet -header is 14 bytes long, so in order to get proper alignment one needs to -DMA to an address which can be expressed as 4*n + 2. One notable exception -here is powerpc which defines NET_IP_ALIGN to 0 because DMA to unaligned -addresses can be very expensive and dwarf the cost of unaligned loads. - -For some ethernet hardware that cannot DMA to unaligned addresses like -4*n+2 or non-ethernet hardware, this can be a problem, and it is then -required to copy the incoming frame into an aligned buffer. Because this is -unnecessary on architectures that can do unaligned accesses, the code can be -made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so:: - - #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS - skb = original skb - #else - skb = copy skb - #endif diff --git a/Documentation/this_cpu_ops.txt b/Documentation/this_cpu_ops.txt deleted file mode 100644 index 5cb8b883ae83..000000000000 --- a/Documentation/this_cpu_ops.txt +++ /dev/null @@ -1,339 +0,0 @@ -=================== -this_cpu operations -=================== - -:Author: Christoph Lameter, August 4th, 2014 -:Author: Pranith Kumar, Aug 2nd, 2014 - -this_cpu operations are a way of optimizing access to per cpu -variables associated with the *currently* executing processor. This is -done through the use of segment registers (or a dedicated register where -the cpu permanently stored the beginning of the per cpu area for a -specific processor). - -this_cpu operations add a per cpu variable offset to the processor -specific per cpu base and encode that operation in the instruction -operating on the per cpu variable. - -This means that there are no atomicity issues between the calculation of -the offset and the operation on the data. Therefore it is not -necessary to disable preemption or interrupts to ensure that the -processor is not changed between the calculation of the address and -the operation on the data. - -Read-modify-write operations are of particular interest. Frequently -processors have special lower latency instructions that can operate -without the typical synchronization overhead, but still provide some -sort of relaxed atomicity guarantees. The x86, for example, can execute -RMW (Read Modify Write) instructions like inc/dec/cmpxchg without the -lock prefix and the associated latency penalty. 
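-
-As a sketch of that counter use case (kernel-style code; the 'hits'
-counter and the two helper functions are invented for this document)::
-
-        #include <linux/cpumask.h>
-        #include <linux/percpu.h>
-
-        DEFINE_PER_CPU(unsigned long, hits);    /* one counter per cpu */
-
-        static void count_hit(void)
-        {
-                /* A single RMW instruction on x86: no lock prefix and
-                 * no preempt_disable()/preempt_enable() pair needed. */
-                this_cpu_inc(hits);
-        }
-
-        static unsigned long total_hits(void)
-        {
-                unsigned long sum = 0;
-                int cpu;
-
-                /* Remote reads are the safe access pattern; the sum of
-                 * the per cpu counters is the meaningful value. */
-                for_each_possible_cpu(cpu)
-                        sum += per_cpu(hits, cpu);
-                return sum;
-        }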
- -Access to the variable without the lock prefix is not synchronized but -synchronization is not necessary since we are dealing with per cpu -data specific to the currently executing processor. Only the current -processor should be accessing that variable and therefore there are no -concurrency issues with other processors in the system. - -Please note that accesses by remote processors to a per cpu area are -exceptional situations and may impact performance and/or correctness -(remote write operations) of local RMW operations via this_cpu_*. - -The main use of the this_cpu operations has been to optimize counter -operations. - -The following this_cpu() operations with implied preemption protection -are defined. These operations can be used without worrying about -preemption and interrupts:: - - this_cpu_read(pcp) - this_cpu_write(pcp, val) - this_cpu_add(pcp, val) - this_cpu_and(pcp, val) - this_cpu_or(pcp, val) - this_cpu_add_return(pcp, val) - this_cpu_xchg(pcp, nval) - this_cpu_cmpxchg(pcp, oval, nval) - this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2) - this_cpu_sub(pcp, val) - this_cpu_inc(pcp) - this_cpu_dec(pcp) - this_cpu_sub_return(pcp, val) - this_cpu_inc_return(pcp) - this_cpu_dec_return(pcp) - - -Inner working of this_cpu operations ------------------------------------- - -On x86 the fs: or the gs: segment registers contain the base of the -per cpu area. It is then possible to simply use the segment override -to relocate a per cpu relative address to the proper per cpu area for -the processor. So the relocation to the per cpu base is encoded in the -instruction via a segment register prefix. - -For example:: - - DEFINE_PER_CPU(int, x); - int z; - - z = this_cpu_read(x); - -results in a single instruction:: - - mov ax, gs:[x] - -instead of a sequence of calculation of the address and then a fetch -from that address which occurs with the per cpu operations. Before -this_cpu_ops such sequence also required preempt disable/enable to -prevent the kernel from moving the thread to a different processor -while the calculation is performed. - -Consider the following this_cpu operation:: - - this_cpu_inc(x) - -The above results in the following single instruction (no lock prefix!):: - - inc gs:[x] - -instead of the following operations required if there is no segment -register:: - - int *y; - int cpu; - - cpu = get_cpu(); - y = per_cpu_ptr(&x, cpu); - (*y)++; - put_cpu(); - -Note that these operations can only be used on per cpu data that is -reserved for a specific processor. Without disabling preemption in the -surrounding code this_cpu_inc() will only guarantee that one of the -per cpu counters is correctly incremented. However, there is no -guarantee that the OS will not move the process directly before or -after the this_cpu instruction is executed. In general this means that -the value of the individual counters for each processor are -meaningless. The sum of all the per cpu counters is the only value -that is of interest. - -Per cpu variables are used for performance reasons. Bouncing cache -lines can be avoided if multiple processors concurrently go through -the same code paths. Since each processor has its own per cpu -variables no concurrent cache line updates take place. The price that -has to be paid for this optimization is the need to add up the per cpu -counters when the value of a counter is needed. - - -Special operations ------------------- - -:: - - y = this_cpu_ptr(&x) - -Takes the offset of a per cpu variable (&x !) 
and returns the address -of the per cpu variable that belongs to the currently executing -processor. this_cpu_ptr avoids multiple steps that the common -get_cpu/put_cpu sequence requires. No processor number is -available. Instead, the offset of the local per cpu area is simply -added to the per cpu offset. - -Note that this operation is usually used in a code segment when -preemption has been disabled. The pointer is then used to -access local per cpu data in a critical section. When preemption -is re-enabled this pointer is usually no longer useful since it may -no longer point to per cpu data of the current processor. - - -Per cpu variables and offsets ------------------------------ - -Per cpu variables have *offsets* to the beginning of the per cpu -area. They do not have addresses although they look like that in the -code. Offsets cannot be directly dereferenced. The offset must be -added to a base pointer of a per cpu area of a processor in order to -form a valid address. - -Therefore the use of x or &x outside of the context of per cpu -operations is invalid and will generally be treated like a NULL -pointer dereference. - -:: - - DEFINE_PER_CPU(int, x); - -In the context of per cpu operations the above implies that x is a per -cpu variable. Most this_cpu operations take a cpu variable. - -:: - - int __percpu *p = &x; - -&x and hence p is the *offset* of a per cpu variable. this_cpu_ptr() -takes the offset of a per cpu variable which makes this look a bit -strange. - - -Operations on a field of a per cpu structure --------------------------------------------- - -Let's say we have a percpu structure:: - - struct s { - int n,m; - }; - - DEFINE_PER_CPU(struct s, p); - - -Operations on these fields are straightforward:: - - this_cpu_inc(p.m) - - z = this_cpu_cmpxchg(p.m, 0, 1); - - -If we have an offset to struct s:: - - struct s __percpu *ps = &p; - - this_cpu_dec(ps->m); - - z = this_cpu_inc_return(ps->n); - - -The calculation of the pointer may require the use of this_cpu_ptr() -if we do not make use of this_cpu ops later to manipulate fields:: - - struct s *pp; - - pp = this_cpu_ptr(&p); - - pp->m--; - - z = pp->n++; - - -Variants of this_cpu ops ------------------------- - -this_cpu ops are interrupt safe. Some architectures do not support -these per cpu local operations. In that case the operation must be -replaced by code that disables interrupts, then does the operations -that are guaranteed to be atomic and then re-enable interrupts. Doing -so is expensive. If there are other reasons why the scheduler cannot -change the processor we are executing on then there is no reason to -disable interrupts. For that purpose the following __this_cpu operations -are provided. - -These operations have no guarantee against concurrent interrupts or -preemption. If a per cpu variable is not used in an interrupt context -and the scheduler cannot preempt, then they are safe. 
If any interrupts
-still occur while an operation is in progress and if the interrupt too
-modifies the variable, then RMW actions can not be guaranteed to be
-safe::
-
-	__this_cpu_read(pcp)
-	__this_cpu_write(pcp, val)
-	__this_cpu_add(pcp, val)
-	__this_cpu_and(pcp, val)
-	__this_cpu_or(pcp, val)
-	__this_cpu_add_return(pcp, val)
-	__this_cpu_xchg(pcp, nval)
-	__this_cpu_cmpxchg(pcp, oval, nval)
-	__this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
-	__this_cpu_sub(pcp, val)
-	__this_cpu_inc(pcp)
-	__this_cpu_dec(pcp)
-	__this_cpu_sub_return(pcp, val)
-	__this_cpu_inc_return(pcp)
-	__this_cpu_dec_return(pcp)
-
-
-__this_cpu_inc(x), for example, will increment x and will not fall back
-to code that disables interrupts on platforms that cannot accomplish
-atomicity through address relocation and a Read-Modify-Write operation
-in the same instruction.
-
-
-&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
---------------------------------------------
-
-The first operation takes the offset and forms an address and then
-adds the offset of the n field. This may result in two add
-instructions emitted by the compiler.
-
-The second one first adds the two offsets and then does the
-relocation. IMHO the second form looks cleaner and has an easier time
-with (). The second form also is consistent with the way
-this_cpu_read() and friends are used.
-
-
-Remote access to per cpu data
-------------------------------
-
-Per cpu data structures are designed to be used by one cpu exclusively.
-If you use the variables as intended, this_cpu_ops() are guaranteed to
-be "atomic" as no other CPU has access to these data structures.
-
-There are special cases where you might need to access per cpu data
-structures remotely. It is usually safe to do a remote read access
-and that is frequently done to summarize counters. Remote write access
-is something which could be problematic because this_cpu ops do not
-have lock semantics. A remote write may interfere with a this_cpu
-RMW operation.
-
-Remote write accesses to percpu data structures are highly discouraged
-unless absolutely necessary. Please consider using an IPI to wake up
-the remote CPU and perform the update to its per cpu area.
-
-To access a per-cpu data structure remotely, typically the per_cpu_ptr()
-function is used::
-
-
-	DEFINE_PER_CPU(struct data, datap);
-
-	struct data *p = per_cpu_ptr(&datap, cpu);
-
-This makes it explicit that we are getting ready to access a percpu
-area remotely.
-
-You can also do the following to convert the datap offset to an address::
-
-	struct data *p = this_cpu_ptr(&datap);
-
-but, passing of pointers calculated via this_cpu_ptr to other cpus is
-unusual and should be avoided.
-
-Remote accesses are typically only for reading the status of another cpu's
-per cpu data. Write accesses can cause unique problems due to the
-relaxed synchronization requirements for this_cpu operations.
-
-One example that illustrates some concerns with write operations is
-the following scenario that occurs because two per cpu variables
-share a cache-line but the relaxed synchronization is applied to
-only one process updating the cache-line.
-
-Consider the following example::
-
-
-	struct test {
-		atomic_t a;
-		int b;
-	};
-
-	DEFINE_PER_CPU(struct test, onecacheline);
-
-There is some concern about what would happen if the field 'a' is updated
-remotely from one processor and the local processor would use this_cpu ops
-to update field b. Care should be taken that such simultaneous accesses to
-data within the same cache line are avoided.
Also costly synchronization -may be necessary. IPIs are generally recommended in such scenarios instead -of a remote write to the per cpu area of another processor. - -Even in cases where the remote writes are rare, please bear in -mind that a remote write will evict the cache line from the processor -that most likely will access it. If the processor wakes up and finds a -missing local cache line of a per cpu area, its performance and hence -the wake up times will be affected. diff --git a/arch/Kconfig b/arch/Kconfig index 8cc35dc556c7..2a439fb8069e 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -147,7 +147,7 @@ config HAVE_EFFICIENT_UNALIGNED_ACCESS problems with received packets if doing so would not help much. - See Documentation/unaligned-memory-access.txt for more + See Documentation/core-api/unaligned-memory-access.rst for more information on the topic of unaligned memory accesses. config ARCH_USE_BUILTIN_BSWAP -- cgit v1.2.3