Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/RCU/checklist.rst | 55
-rw-r--r-- | Documentation/bpf/index.rst | 13
-rw-r--r-- | Documentation/bpf/libbpf/libbpf.rst | 14
-rw-r--r-- | Documentation/bpf/libbpf/libbpf_api.rst | 27
-rw-r--r-- | Documentation/bpf/libbpf/libbpf_build.rst | 37
-rw-r--r-- | Documentation/bpf/libbpf/libbpf_naming_convention.rst | 162
-rw-r--r-- | Documentation/networking/af_xdp.rst | 32
7 files changed, 303 insertions, 37 deletions
diff --git a/Documentation/RCU/checklist.rst b/Documentation/RCU/checklist.rst
index 1030119294d0..01cc21f17f7b 100644
--- a/Documentation/RCU/checklist.rst
+++ b/Documentation/RCU/checklist.rst
@@ -211,27 +211,40 @@ over a rather long period of time, but improvements are always welcome!
 of the system, especially to real-time workloads running on the rest of the system.
-7. As of v4.20, a given kernel implements only one RCU flavor,
- which is RCU-sched for PREEMPTION=n and RCU-preempt for PREEMPTION=y.
- If the updater uses call_rcu() or synchronize_rcu(),
- then the corresponding readers may use rcu_read_lock() and
- rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(),
- or any pair of primitives that disables and re-enables preemption,
- for example, rcu_read_lock_sched() and rcu_read_unlock_sched().
- If the updater uses synchronize_srcu() or call_srcu(),
- then the corresponding readers must use srcu_read_lock() and
- srcu_read_unlock(), and with the same srcu_struct. The rules for
- the expedited primitives are the same as for their non-expedited
- counterparts. Mixing things up will result in confusion and
- broken kernels, and has even resulted in an exploitable security
- issue.
-
- One exception to this rule: rcu_read_lock() and rcu_read_unlock()
- may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
- in cases where local bottom halves are already known to be
- disabled, for example, in irq or softirq context. Commenting
- such cases is a must, of course! And the jury is still out on
- whether the increased speed is worth it.
+7. As of v4.20, a given kernel implements only one RCU flavor, which
+ is RCU-sched for PREEMPTION=n and RCU-preempt for PREEMPTION=y.
+ If the updater uses call_rcu() or synchronize_rcu(), then
+ the corresponding readers may use: (1) rcu_read_lock() and
+ rcu_read_unlock(), (2) any pair of primitives that disables
+ and re-enables softirq, for example, rcu_read_lock_bh() and
+ rcu_read_unlock_bh(), or (3) any pair of primitives that disables
+ and re-enables preemption, for example, rcu_read_lock_sched() and
+ rcu_read_unlock_sched(). If the updater uses synchronize_srcu()
+ or call_srcu(), then the corresponding readers must use
+ srcu_read_lock() and srcu_read_unlock(), and with the same
+ srcu_struct. The rules for the expedited RCU grace-period-wait
+ primitives are the same as for their non-expedited counterparts.
+
+ If the updater uses call_rcu_tasks() or synchronize_rcu_tasks(),
+ then the readers must refrain from executing voluntary
+ context switches, that is, from blocking. If the updater uses
+ call_rcu_tasks_trace() or synchronize_rcu_tasks_trace(), then
+ the corresponding readers must use rcu_read_lock_trace() and
+ rcu_read_unlock_trace(). If an updater uses call_rcu_tasks_rude()
+ or synchronize_rcu_tasks_rude(), then the corresponding readers
+ must use anything that disables interrupts.
+
+ Mixing things up will result in confusion and broken kernels, and
+ has even resulted in an exploitable security issue. Therefore,
+ when using non-obvious pairs of primitives, commenting is
+ of course a must. One example of non-obvious pairing is
+ the XDP feature in networking, which calls BPF programs from
+ network-driver NAPI (softirq) context. BPF relies heavily on RCU
+ protection for its data structures, but because the BPF program
+ invocation happens entirely within a single local_bh_disable()
+ section in a NAPI poll cycle, this usage is safe. The reason
+ that this usage is safe is that readers can use anything that
+ disables BH when updaters use call_rcu() or synchronize_rcu().
 8. Although synchronize_rcu() is slower than is call_rcu(), it usually results in simpler code. So, unless update performance is
diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
index 93e8cf12a6d4..baea6c2abba5 100644
--- a/Documentation/bpf/index.rst
+++ b/Documentation/bpf/index.rst
@@ -12,6 +12,19 @@ BPF instruction-set.
 The Cilium project also maintains a `BPF and XDP Reference Guide`_ that goes into great technical depth about the BPF Architecture.
+libbpf
+======
+
+Libbpf is a userspace library for loading and interacting with bpf programs.
+
+.. toctree::
+ :maxdepth: 1
+
+ libbpf/libbpf
+ libbpf/libbpf_api
+ libbpf/libbpf_build
+ libbpf/libbpf_naming_convention
+
 BPF Type Format (BTF)
 =====================
diff --git a/Documentation/bpf/libbpf/libbpf.rst b/Documentation/bpf/libbpf/libbpf.rst
new file mode 100644
index 000000000000..1b1e61d5ead1
--- /dev/null
+++ b/Documentation/bpf/libbpf/libbpf.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+libbpf
+======
+
+This is documentation for libbpf, a userspace library for loading and
+interacting with bpf programs.
+
+All general BPF questions, including kernel functionality, libbpf APIs and
+their application, should be sent to bpf@vger.kernel.org mailing list.
+You can `subscribe <http://vger.kernel.org/vger-lists.html#bpf>`_ to the
+mailing list search its `archive <https://lore.kernel.org/bpf/>`_.
+Please search the archive before asking new questions. It very well might
+be that this was already addressed or answered before.
diff --git a/Documentation/bpf/libbpf/libbpf_api.rst b/Documentation/bpf/libbpf/libbpf_api.rst
new file mode 100644
index 000000000000..f07eecd054da
--- /dev/null
+++ b/Documentation/bpf/libbpf/libbpf_api.rst
@@ -0,0 +1,27 @@
+.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+API
+===
+
+This documentation is autogenerated from header files in libbpf, tools/lib/bpf
+
+.. kernel-doc:: tools/lib/bpf/libbpf.h
+ :internal:
+
+.. kernel-doc:: tools/lib/bpf/bpf.h
+ :internal:
+
+.. kernel-doc:: tools/lib/bpf/btf.h
+ :internal:
+
+.. kernel-doc:: tools/lib/bpf/xsk.h
+ :internal:
+
+.. kernel-doc:: tools/lib/bpf/bpf_tracing.h
+ :internal:
+
+.. kernel-doc:: tools/lib/bpf/bpf_core_read.h
+ :internal:
+
+.. kernel-doc:: tools/lib/bpf/bpf_endian.h
+ :internal:
\ No newline at end of file
diff --git a/Documentation/bpf/libbpf/libbpf_build.rst b/Documentation/bpf/libbpf/libbpf_build.rst
new file mode 100644
index 000000000000..8e8c23e8093d
--- /dev/null
+++ b/Documentation/bpf/libbpf/libbpf_build.rst
@@ -0,0 +1,37 @@
+.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+Building libbpf
+===============
+
+libelf and zlib are internal dependencies of libbpf and thus are required to link
+against and must be installed on the system for applications to work.
+pkg-config is used by default to find libelf, and the program called
+can be overridden with PKG_CONFIG.
+
+If using pkg-config at build time is not desired, it can be disabled by
+setting NO_PKG_CONFIG=1 when calling make.
+
+To build both static libbpf.a and shared libbpf.so:
+
+.. code-block:: bash
+
+ $ cd src
+ $ make
+
+To build only static libbpf.a library in directory build/ and install them
+together with libbpf headers in a staging directory root/:
+
+.. code-block:: bash
+
+ $ cd src
+ $ mkdir build root
+ $ BUILD_STATIC_ONLY=y OBJDIR=build DESTDIR=root make install
+
+To build both static libbpf.a and shared libbpf.so against a custom libelf
+dependency installed in /build/root/ and install them together with libbpf
+headers in a build directory /build/root/:
+
+.. code-block:: bash
+
+ $ cd src
+ $ PKG_CONFIG_PATH=/build/root/lib64/pkgconfig DESTDIR=/build/root make
\ No newline at end of file
diff --git a/Documentation/bpf/libbpf/libbpf_naming_convention.rst b/Documentation/bpf/libbpf/libbpf_naming_convention.rst
new file mode 100644
index 000000000000..3de1d51e41da
--- /dev/null
+++ b/Documentation/bpf/libbpf/libbpf_naming_convention.rst
@@ -0,0 +1,162 @@
+.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+API naming convention
+=====================
+
+libbpf API provides access to a few logically separated groups of
+functions and types. Every group has its own naming convention
+described here. It's recommended to follow these conventions whenever a
+new function or type is added to keep libbpf API clean and consistent.
+
+All types and functions provided by libbpf API should have one of the
+following prefixes: ``bpf_``, ``btf_``, ``libbpf_``, ``xsk_``,
+``btf_dump_``, ``ring_buffer_``, ``perf_buffer_``.
+
+System call wrappers
+--------------------
+
+System call wrappers are simple wrappers for commands supported by
+sys_bpf system call. These wrappers should go to ``bpf.h`` header file
+and map one to one to corresponding commands.
+
+For example ``bpf_map_lookup_elem`` wraps ``BPF_MAP_LOOKUP_ELEM``
+command of sys_bpf, ``bpf_prog_attach`` wraps ``BPF_PROG_ATTACH``, etc.
+
+Objects
+-------
+
+Another class of types and functions provided by libbpf API is "objects"
+and functions to work with them. Objects are high-level abstractions
+such as BPF program or BPF map. They're represented by corresponding
+structures such as ``struct bpf_object``, ``struct bpf_program``,
+``struct bpf_map``, etc.
+
+Structures are forward declared and access to their fields should be
+provided via corresponding getters and setters rather than directly.
+
+These objects are associated with corresponding parts of ELF object that
+contains compiled BPF programs.
+
+For example ``struct bpf_object`` represents ELF object itself created
+from an ELF file or from a buffer, ``struct bpf_program`` represents a
+program in ELF object and ``struct bpf_map`` is a map.
+
+Functions that work with an object have names built from object name,
+double underscore and part that describes function purpose.
+
+For example ``bpf_object__open`` consists of the name of corresponding
+object, ``bpf_object``, double underscore and ``open`` that defines the
+purpose of the function to open ELF file and create ``bpf_object`` from
+it.
+
+All objects and corresponding functions other than BTF related should go
+to ``libbpf.h``. BTF types and functions should go to ``btf.h``.
+
+Auxiliary functions
+-------------------
+
+Auxiliary functions and types that don't fit well in any of categories
+described above should have ``libbpf_`` prefix, e.g.
+``libbpf_get_error`` or ``libbpf_prog_type_by_name``.
+
+AF_XDP functions
+-------------------
+
+AF_XDP functions should have an ``xsk_`` prefix, e.g.
+``xsk_umem__get_data`` or ``xsk_umem__create``. The interface consists
+of both low-level ring access functions and high-level configuration
+functions. These can be mixed and matched. Note that these functions
+are not reentrant for performance reasons.
+
+ABI
+==========
+
+libbpf can be both linked statically or used as DSO. To avoid possible
+conflicts with other libraries an application is linked with, all
+non-static libbpf symbols should have one of the prefixes mentioned in
+API documentation above. See API naming convention to choose the right
+name for a new symbol.
+
+Symbol visibility
+-----------------
+
+libbpf follow the model when all global symbols have visibility "hidden"
+by default and to make a symbol visible it has to be explicitly
+attributed with ``LIBBPF_API`` macro. For example:
+
+.. code-block:: c
+
+ LIBBPF_API int bpf_prog_get_fd_by_id(__u32 id);
+
+This prevents from accidentally exporting a symbol, that is not supposed
+to be a part of ABI what, in turn, improves both libbpf developer- and
+user-experiences.
+
+ABI versionning
+---------------
+
+To make future ABI extensions possible libbpf ABI is versioned.
+Versioning is implemented by ``libbpf.map`` version script that is
+passed to linker.
+
+Version name is ``LIBBPF_`` prefix + three-component numeric version,
+starting from ``0.0.1``.
+
+Every time ABI is being changed, e.g. because a new symbol is added or
+semantic of existing symbol is changed, ABI version should be bumped.
+This bump in ABI version is at most once per kernel development cycle.
+
+For example, if current state of ``libbpf.map`` is:
+
+.. code-block:: c
+
+ LIBBPF_0.0.1 {
+ global:
+ bpf_func_a;
+ bpf_func_b;
+ local:
+ \*;
+ };
+
+, and a new symbol ``bpf_func_c`` is being introduced, then
+``libbpf.map`` should be changed like this:
+
+.. code-block:: c
+
+ LIBBPF_0.0.1 {
+ global:
+ bpf_func_a;
+ bpf_func_b;
+ local:
+ \*;
+ };
+ LIBBPF_0.0.2 {
+ global:
+ bpf_func_c;
+ } LIBBPF_0.0.1;
+
+, where new version ``LIBBPF_0.0.2`` depends on the previous
+``LIBBPF_0.0.1``.
+
+Format of version script and ways to handle ABI changes, including
+incompatible ones, described in details in [1].
+
+Stand-alone build
+-------------------
+
+Under https://github.com/libbpf/libbpf there is a (semi-)automated
+mirror of the mainline's version of libbpf for a stand-alone build.
+
+However, all changes to libbpf's code base must be upstreamed through
+the mainline kernel tree.
+
+License
+-------------------
+
+libbpf is dual-licensed under LGPL 2.1 and BSD 2-Clause.
+
+Links
+-------------------
+
+[1] https://www.akkadia.org/drepper/dsohowto.pdf
+ (Chapter 3. Maintaining APIs and ABIs).
diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index 2ccc5644cc98..42576880aa4a 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -290,19 +290,19 @@ round-robin example of distributing packets is shown below:
 #define MAX_SOCKS 16
 struct {
- __uint(type, BPF_MAP_TYPE_XSKMAP);
- __uint(max_entries, MAX_SOCKS);
- __uint(key_size, sizeof(int));
- __uint(value_size, sizeof(int));
+ __uint(type, BPF_MAP_TYPE_XSKMAP);
+ __uint(max_entries, MAX_SOCKS);
+ __uint(key_size, sizeof(int));
+ __uint(value_size, sizeof(int));
 } xsks_map SEC(".maps");
 static unsigned int rr;
 SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
 {
- rr = (rr + 1) & (MAX_SOCKS - 1);
+ rr = (rr + 1) & (MAX_SOCKS - 1);
- return bpf_redirect_map(&xsks_map, rr, XDP_DROP);
+ return bpf_redirect_map(&xsks_map, rr, XDP_DROP);
 }
 Note, that since there is only a single set of FILL and COMPLETION
@@ -379,7 +379,7 @@ would look like this for the TX path:
 .. code-block:: c
 if (xsk_ring_prod__needs_wakeup(&my_tx_ring))
- sendto(xsk_socket__fd(xsk_handle), NULL, 0, MSG_DONTWAIT, NULL, 0);
+ sendto(xsk_socket__fd(xsk_handle), NULL, 0, MSG_DONTWAIT, NULL, 0);
 I.e., only use the syscall if the flag is set.
@@ -442,9 +442,9 @@ purposes. The supported statistics are shown below:
 .. code-block:: c
 struct xdp_statistics {
- __u64 rx_dropped; /* Dropped for reasons other than invalid desc */
- __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
- __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
+ __u64 rx_dropped; /* Dropped for reasons other than invalid desc */
+ __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
+ __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
 };
 XDP_OPTIONS getsockopt
@@ -483,15 +483,15 @@ like this:
 .. code-block:: c
 // struct xdp_rxtx_ring {
- // __u32 *producer;
- // __u32 *consumer;
- // struct xdp_desc *desc;
+ // __u32 *producer;
+ // __u32 *consumer;
+ // struct xdp_desc *desc;
 // };
 // struct xdp_umem_ring {
- // __u32 *producer;
- // __u32 *consumer;
- // __u64 *desc;
+ // __u32 *producer;
+ // __u32 *consumer;
+ // __u64 *desc;
 // };
 // typedef struct xdp_rxtx_ring RING;
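
Note on the round-robin hunk in af_xdp.rst: the XDP program advances its index with `(rr + 1) & (MAX_SOCKS - 1)`, which only behaves like a modulo when MAX_SOCKS is a power of two. The userspace sketch below isolates that arithmetic so it can be checked outside the kernel. The helper name `next_slot` is hypothetical and does not appear in the patch, libbpf, or the kernel; the compile-time check is likewise an addition for illustration.

```c
#include <stdint.h>

/* Same value as in the af_xdp.rst example. The mask trick below
 * only equals "% MAX_SOCKS" when this is a power of two. */
#define MAX_SOCKS 16

/* Illustrative guard, not part of the patch: refuse to compile if
 * MAX_SOCKS is changed to a value that is not a power of two. */
_Static_assert((MAX_SOCKS & (MAX_SOCKS - 1)) == 0,
               "MAX_SOCKS must be a power of two");

/* Hypothetical helper: returns the slot that follows rr, wrapping
 * back to 0 after MAX_SOCKS - 1. Mirrors the expression used in
 * xdp_sock_prog() in the documentation example. */
static uint32_t next_slot(uint32_t rr)
{
    return (rr + 1) & (MAX_SOCKS - 1);
}
```

Starting from 15, `next_slot` wraps to 0. With a MAX_SOCKS that is not a power of two, the mask would skip some slots, which is why the sketch adds the static check.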