summaryrefslogtreecommitdiff
path: root/samples/bpf/map_perf_test_user.c
AgeCommit message (Collapse)AuthorFilesLines
2017-05-03samples/bpf: load_bpf.c make callback fixup more flexibleJesper Dangaard Brouer1-7/+7
Do this change before others start to use this callback. Change map_perf_test_user.c which seems to be the only user. This patch extends capabilities of commit 9fd63d05f3e8 ("bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def"). Give fixup callback access to struct bpf_map_data, instead of only stuct bpf_map_def. This add flexibility to allow userspace to reassign the map file descriptor. This is very useful when wanting to share maps between several bpf programs. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17bpf: lru: Add map-in-map LRU exampleMartin KaFai Lau1-0/+62
This patch adds a map-in-map LRU example. If we know only a subset of cores will use the LRU, we can allocate a common LRU list per targeting core and store it into an array-of-hashs. It allows using the common LRU map with map-update performance comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory on the unused cores that we know they will never access the LRU map. BPF_F_NO_COMMON_LRU: > map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}' 9234314 (9.23M/s) map-in-map LRU: > map_perf_test 512 8 1260000 80000000 | awk '{sum += $3}END{print sum}' 9962743 (9.96M/s) Notes that the max_entries for the map-in-map LRU test is 1260000 which is the max_entries for each inner LRU map. 8 processes have been started, so 8 * 1260000 = 10080000 (~10M) which is close to what is used in the BPF_F_NO_COMMON_LRU test. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17bpf: Allow bpf sample programs (*_user.c) to change bpf_map_defMartin KaFai Lau1-56/+92
The current bpf_map_def is statically defined during compile time. This patch allows the *_user.c program to change it during runtime. It is done by adding load_bpf_file_fixup_map() which takes a callback. The callback will be called before creating each map so that it has a chance to modify the bpf_map_def. The current usecase is to change max_entries in map_perf_test. It is interesting to test with a much bigger map size in some cases (e.g. the following patch on bpf_lru_map.c). However, it is hard to find one size to fit all testing environment. Hence, it is handy to take the max_entries as a cmdline arg and then configure the bpf_map_def during runtime. This patch adds two cmdline args. One is to configure the map's max_entries. Another is to configure the max_cnt which controls how many times a syscall is called. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17bpf: lru: Refactor LRU map tests in map_perf_testMartin KaFai Lau1-17/+36
One more LRU test will be added later in this patch series. In this patch, we first move all existing LRU map tests into a single syscall (connect) first so that the future new LRU test can be added without hunting another syscall. One of the map name is also changed from percpu_lru_hash_map to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-17samples/bpf: add map_lookup microbenchmarkAlexei Starovoitov1-0/+32
$ map_perf_test 128 speed of HASH bpf_map_lookup_elem() in lookups per second w/o JIT w/JIT before 46M 58M after 42M 74M perf report before: 54.23% map_perf_test [kernel.kallsyms] [k] __htab_map_lookup_elem 14.24% map_perf_test [kernel.kallsyms] [k] lookup_elem_raw 8.84% map_perf_test [kernel.kallsyms] [k] htab_map_lookup_elem 5.93% map_perf_test [kernel.kallsyms] [k] bpf_map_lookup_elem 2.30% map_perf_test [kernel.kallsyms] [k] bpf_prog_da4fc6a3f41761a2 1.49% map_perf_test [kernel.kallsyms] [k] kprobe_ftrace_handler after: 60.03% map_perf_test [kernel.kallsyms] [k] __htab_map_lookup_elem 18.07% map_perf_test [kernel.kallsyms] [k] lookup_elem_raw 2.91% map_perf_test [kernel.kallsyms] [k] bpf_prog_da4fc6a3f41761a2 1.94% map_perf_test [kernel.kallsyms] [k] _einittext 1.90% map_perf_test [kernel.kallsyms] [k] __audit_syscall_exit 1.72% map_perf_test [kernel.kallsyms] [k] kprobe_ftrace_handler Notice that bpf_map_lookup_elem() and htab_map_lookup_elem() are trivial functions, yet they take sizeable amount of cpu time. htab_map_gen_lookup() removes bpf_map_lookup_elem() and converts htab_map_lookup_elem() into three BPF insns which causing cpu time for bpf_prog_da4fc6a3f41761a2() slightly increase. $ map_perf_test 256 speed of ARRAY bpf_map_lookup_elem() in lookups per second w/o JIT w/JIT before 97M 174M after 64M 280M before: 37.33% map_perf_test [kernel.kallsyms] [k] array_map_lookup_elem 13.95% map_perf_test [kernel.kallsyms] [k] bpf_map_lookup_elem 6.54% map_perf_test [kernel.kallsyms] [k] bpf_prog_da4fc6a3f41761a2 4.57% map_perf_test [kernel.kallsyms] [k] kprobe_ftrace_handler after: 32.86% map_perf_test [kernel.kallsyms] [k] bpf_prog_da4fc6a3f41761a2 6.54% map_perf_test [kernel.kallsyms] [k] kprobe_ftrace_handler array_map_gen_lookup() removes calls to array_map_lookup_elem() and bpf_map_lookup_elem() and replaces them with 7 bpf insns. The performance without JIT is slower, since executing extra insns in the interpreter is slower than running native C code, but with JIT the performance gains are obvious, since native C->x86 code is replaced with fewer bpf->x86 instructions. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24samples/bpf: add lpm-trie benchmarkDavid Herrmann1-0/+49
Extend the map_perf_test_{user,kern}.c infrastructure to stress test lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure the latency depending on trie size and lookup count. On my Intel Haswell i7-6400U, a single gettid() syscall with an empty bpf program takes roughly 6.5us on my system. Lookups in empty tries take ~1.8us on first try, ~0.9us on retries. Lookups in tries with 8192 entries take ~7.1us (on the first _and_ any subsequent try). Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Reviewed-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15bpf: Add tests for the LRU bpf_htabMartin KaFai Lau1-0/+32
This patch has some unit tests and a test_lru_dist. The test_lru_dist reads in the numeric keys from a file. The files used here are generated by a modified fio-genzipf tool originated from the fio test suit. The sample data file can be found here: https://github.com/iamkafai/bpf-lru The zipf.* data files have 100k numeric keys and the key is also ranged from 1 to 100k. The test_lru_dist outputs the number of unique keys (nr_unique). F.e. The following means, 61239 of them is unique out of 100k keys. nr_misses means it cannot be found in the LRU map, so nr_misses must be >= nr_unique. test_lru_dist also simulates a perfect LRU map as a comparison: [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \ /root/zipf.100k.a1_01.out 4000 1 ... test_parallel_lru_dist (map_type:9 map_flags:0x0): task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000) task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000) .... test_parallel_lru_dist (map_type:9 map_flags:0x2): task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000) task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000) [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \ /root/zipf.100k.a0_01.out 40000 1 ... test_parallel_lru_dist (map_type:9 map_flags:0x0): task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000) task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000) ... test_parallel_lru_dist (map_type:9 map_flags:0x2): task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000) task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000) LRU map has also been added to map_perf_test: /* Global LRU */ [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \ ./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done 1 cpus: 2934082 updates 4 cpus: 7391434 updates 8 cpus: 6500576 updates /* Percpu LRU */ [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \ ./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done 1 cpus: 2896553 updates 4 cpus: 9766395 updates 8 cpus: 17460553 updates Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06samples/bpf: Fix build breakage with map_perf_test_user.cNaveen N. Rao1-0/+1
Building BPF samples is failing with the below error: samples/bpf/map_perf_test_user.c: In function ‘main’: samples/bpf/map_perf_test_user.c:134:9: error: variable ‘r’ has initializer but incomplete type struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; ^ samples/bpf/map_perf_test_user.c:134:21: error: ‘RLIM_INFINITY’ undeclared (first use in this function) struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; ^ samples/bpf/map_perf_test_user.c:134:21: note: each undeclared identifier is reported only once for each function it appears in samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in struct initializer [enabled by default] struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; ^ samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization for ‘r’) [enabled by default] samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in struct initializer [enabled by default] samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization for ‘r’) [enabled by default] samples/bpf/map_perf_test_user.c:134:16: error: storage size of ‘r’ isn’t known struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; ^ samples/bpf/map_perf_test_user.c:139:2: warning: implicit declaration of function ‘setrlimit’ [-Wimplicit-function-declaration] setrlimit(RLIMIT_MEMLOCK, &r); ^ samples/bpf/map_perf_test_user.c:139:12: error: ‘RLIMIT_MEMLOCK’ undeclared (first use in this function) setrlimit(RLIMIT_MEMLOCK, &r); ^ samples/bpf/map_perf_test_user.c:134:16: warning: unused variable ‘r’ [-Wunused-variable] struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; ^ make[2]: *** [samples/bpf/map_perf_test_user.o] Error 1 Fix this by including the necessary header file. Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09samples/bpf: add map performance testAlexei Starovoitov1-0/+155
performance tests for hash map and per-cpu hash map with and without pre-allocation Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>