summaryrefslogtreecommitdiff
path: root/net/netfilter/ipvs/ip_vs_conn.c
AgeCommit message (Collapse)AuthorFilesLines
2023-12-27Kill sched.h dependency on rcupdate.hKent Overstreet1-0/+1
by moving cond_resched_rcu() to rcupdate_wait.h, we can kill another big sched.h dependency. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-06-01ipvs: dynamically limit the connection hash tableJulian Anastasov1-9/+17
As we allow the hash table to be configured to rows above 2^20, we should limit it depending on the available memory to some sane values. Switch to kvmalloc allocation to better select the needed allocation type. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-01ipvs: increase ip_vs_conn_tab_bits range for 64BITAbhijeet Rastogi1-2/+2
Current range [8, 20] is set purely due to historical reasons because at the time, ~1M (2^20) was considered sufficient. With this change, 27 is the upper limit for 64-bit, 20 otherwise. Previous change regarding this limit is here. Link: https://lore.kernel.org/all/86eabeb9dd62aebf1e2533926fdd13fed48bab1f.1631289960.git.aclaudi@redhat.com/T/#u Signed-off-by: Abhijeet Rastogi <abhijeet.1989@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22ipvs: Consistently use array_size() in ip_vs_conn_init()Simon Horman1-6/+6
Consistently use array_size() to calculate the size of ip_vs_conn_tab in bytes. Flagged by Coccinelle: WARNING: array_size is already used (line 1498) to compute the same size No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-02ipvs: fix WARNING in __ip_vs_cleanup_batch()Zhengchao Shao1-5/+21
During the initialization of ip_vs_conn_net_init(), if file ip_vs_conn or ip_vs_conn_sync fails to be created, the initialization is successful by default. Therefore, the ip_vs_conn or ip_vs_conn_sync file doesn't be found during the remove. The following is the stack information: name 'ip_vs_conn_sync' WARNING: CPU: 3 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460 Modules linked in: Workqueue: netns cleanup_net RIP: 0010:remove_proc_entry+0x389/0x460 Call Trace: <TASK> __ip_vs_cleanup_batch+0x7d/0x120 ops_exit_list+0x125/0x170 cleanup_net+0x4ea/0xb00 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK> Fixes: 61b1ab4583e2 ("IPVS: netns, add basic init per netns.") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-02ipvs: use explicitly signed charsJason A. Donenfeld1-2/+2
The `char` type with no explicit sign is sometimes signed and sometimes unsigned. This code will break on platforms such as arm, where char is unsigned. So mark it here as explicitly signed, so that the todrop_counter decrement and subsequent comparison is correct. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-10-12treewide: use get_random_u32() when possibleJason A. Donenfeld1-1/+1
The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-19ipvs: correctly print the memory size of ip_vs_conn_tabPengcheng Yang1-1/+1
The memory size of ip_vs_conn_tab changed after we use hlist instead of list. Fixes: 731109e78415 ("ipvs: use hlist instead of list") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-14ipvs: check that ip_vs_conn_tab_bits is between 8 and 20Andrea Claudi1-0/+4
ip_vs_conn_tab_bits may be provided by the user through the conn_tab_bits module parameter. If this value is greater than 31, or less than 0, the shift operator used to derive tab_size causes undefined behaviour. Fix this checking ip_vs_conn_tab_bits value to be in the range specified in ipvs Kconfig. If not, simply use default value. Fixes: 6f7edb4881bf ("IPVS: Allow boot time change of hash size") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12ipvs: inspect reply packets from DR/TUN real serverslongguang.yue1-3/+15
Just like for MASQ, inspect the reply packets coming from DR/TUN real servers and alter the connection's state and timeout according to the protocol. It's ipvs's duty to do traffic statistic if packets get hit, no matter what mode it is. Signed-off-by: longguang.yue <bigclouds@163.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-22ipvs: queue delayed work to expire no destination connections if ↵Andrew Sy Kim1-0/+39
expire_nodest_conn=1 When expire_nodest_conn=1 and a destination is deleted, IPVS does not expire the existing connections until the next matching incoming packet. If there are many connection entries from a single client to a single destination, many packets may get dropped before all the connections are expired (more likely with lots of UDP traffic). An optimization can be made where upon deletion of a destination, IPVS queues up delayed work to immediately expire any connections with a deleted destination. This ensures any reused source ports from a client (within the IPVS timeouts) are scheduled to new real servers instead of silently dropped. Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-01ipvs: avoid expiring many connections from timerJulian Anastasov1-15/+38
Add new functions ip_vs_conn_del() and ip_vs_conn_del_put() to release many IPVS connections in process context. They are suitable for connections found in table when we do not want to overload the timers. Currently, the change is useful for the dropentry delayed work but it will be used also in following patch when flushing connections to failed destinations. Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner1-6/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-16ipvs: don't show negative times in ip_vs_connMatteo Croce1-8/+14
Since commit 500462a9de65 ("timers: Switch to a non-cascading wheel"), timers duration can last even 12.5% more than the scheduled interval. IPVS has two handlers, /proc/net/ip_vs_conn and /proc/net/ip_vs_conn_sync, which shows the remaining time before that a connection expires. The default expire time for a connection is 60 seconds, and the expiration timer can fire even 4 seconds later than the scheduled time. The expiration time is calculated subtracting jiffies to the scheduled expiration time, and it's shown as a huge number when the timer fires late, since both values are unsigned. This can confuse script and tools which relies on it, like ipvsadm: root@mcroce-redhat:~# while ipvsadm -lc |grep SYN_RECV; do sleep 1 ; done TCP 00:05 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:04 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:03 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:02 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:01 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:00 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:44 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:43 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:42 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:41 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:40 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:39 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18ipvs: drop conn templates under attackJulian Anastasov1-20/+39
Before now, connection templates were ignored by the random dropentry procedure. But Michal Koutný suggests that we should add exception for connections under SYN attack. He provided patch that implements it for TCP: <quote> IPVS includes protection against filling the ip_vs_conn_tab by dropping 1/32 of feasible entries every second. The template entries (for persistent services) are never directly deleted by this mechanism but when a picked TCP connection entry is being dropped (1), the respective template entry is dropped too (realized by expiring 60 seconds after the connection entry being dropped). There is another mechanism that removes connection entries when they time out (2), in this case the associated template entry is not deleted. Under SYN flood template entries would accumulate (due to their entry longer timeout). The accumulation takes place also with drop_entry being enabled. Roughly 15% ((31/32)^60) of SYN_RECV connections survive the dropping mechanism (1) and are removed by the timeout mechanism (2)(defaults to 60 seconds for SYN_RECV), thus template entries would still accumulate. The patch ensures that when a connection entry times out, we also remove the template entry from the table. To prevent breaking persistent services (since the connection may time out in already established state) we add a new entry flag to protect templates what spawned at least one established TCP connection. </quote> We already added ASSURED flag for the templates in previous patch, so that we can use it now to decide which connection templates should be dropped under attack. But we also have some cases that need special handling. We modify the dropentry procedure as follows: - Linux timers currently use LIFO ordering but we can not rely on this to drop controlling connections. So, set cp->timeout to 0 to indicate that connection was dropped and that on expiration we should try to drop our controlling connections. As result, we can now avoid the ip_vs_conn_expire_now call. - move the cp->n_control check above, so that it avoids restarting the timer for controlling connections when not needed. - drop unassured connection templates here if they are not referred by any connections. On connection expiration: if connection was dropped (cp->timeout=0) try to drop our controlling connection except if it is a template in assured state. In ip_vs_conn_flush change order of ip_vs_conn_expire_now calls according to the LIFO timer expiration order. It should work faster for controlling connections with single controlled one. Suggested-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18ipvs: provide just conn to ip_vs_state_nameJulian Anastasov1-4/+4
In preparation for followup patches, provide just the cp ptr to ip_vs_state_name. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-13treewide: Use array_size() in vmalloc()Kees Cook1-1/+2
The vmalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vmalloc(a * b) with: vmalloc(array_size(a, b)) as well as handling cases of: vmalloc(a * b * c) with: vmalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vmalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(char) * COUNT + COUNT , ...) | vmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vmalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vmalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vmalloc(C1 * C2 * C3, ...) | vmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vmalloc(C1 * C2, ...) | vmalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-04Merge branch 'hch.procfs' of ↵Linus Torvalds1-30/+5
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull procfs updates from Al Viro: "Christoph's proc_create_... cleanups series" * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits) xfs, proc: hide unused xfs procfs helpers isdn/gigaset: add back gigaset_procinfo assignment proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields tty: replace ->proc_fops with ->proc_show ide: replace ->proc_fops with ->proc_show ide: remove ide_driver_proc_write isdn: replace ->proc_fops with ->proc_show atm: switch to proc_create_seq_private atm: simplify procfs code bluetooth: switch to proc_create_seq_data netfilter/x_tables: switch to proc_create_seq_private netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data neigh: switch to proc_create_seq_data hostap: switch to proc_create_{seq,single}_data bonding: switch to proc_create_seq_data rtc/proc: switch to proc_create_single_data drbd: switch to proc_create_single resource: switch to proc_create_seq_data staging/rtl8192u: simplify procfs code jfs: simplify procfs code ...
2018-05-16proc: introduce proc_create_net{,_data}Christoph Hellwig1-30/+5
Variants of proc_create{,_data} that directly take a struct seq_operations and deal with network namespaces in ->open and ->release. All callers of proc_create + seq_open_net converted over, and seq_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-08ipvs: fix refcount usage for conns in ops modeJulian Anastasov1-11/+6
Connections in One-packet scheduling mode (-o, --ops) are removed with refcnt=0 because they are not hashed in conn table. To avoid refcount_dec reporting this as error, change them to be removed with refcount_dec_if_one as all other connections. refcount_t hit zero at ip_vs_conn_put+0x31/0x40 [ip_vs] in sh[15519], uid/euid: 497/497 WARNING: CPU: 0 PID: 15519 at ../kernel/panic.c:657 refcount_error_report+0x94/0x9e Modules linked in: ip_vs_rr cirrus ttm sb_edac edac_core drm_kms_helper crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc mousedev drm aesni_intel aes_x86_64 crypto_simd glue_helper cryptd psmouse evdev input_leds led_class intel_agp fb_sys_fops syscopyarea sysfillrect intel_rapl_perf mac_hid intel_gtt serio_raw sysimgblt agpgart i2c_piix4 i2c_core ata_generic pata_acpi floppy cfg80211 rfkill button loop macvlan ip_vs nf_conntrack libcrc32c crc32c_generic ip_tables x_tables ipv6 crc_ccitt autofs4 ext4 crc16 mbcache jbd2 fscrypto ata_piix libata atkbd libps2 scsi_mod crc32c_intel i8042 rtc_cmos serio af_packet dm_mod dax fuse xen_netfront xen_blkfront CPU: 0 PID: 15519 Comm: sh Tainted: G W 4.15.17 #1-NixOS Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006 RIP: 0010:refcount_error_report+0x94/0x9e RSP: 0000:ffffa344dde039c8 EFLAGS: 00010296 RAX: 0000000000000057 RBX: ffffffff92f20e06 RCX: 0000000000000006 RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffffa344dde165c0 RBP: ffffa344dde03b08 R08: 0000000000000218 R09: 0000000000000004 R10: ffffffff93006a80 R11: 0000000000000001 R12: ffffa344d68cd100 R13: 00000000000001f1 R14: ffffffff92f12fb0 R15: 0000000000000004 FS: 00007fc9d2040fc0(0000) GS:ffffa344dde00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000262a000 CR3: 0000000016a0c004 CR4: 00000000001606f0 Call Trace: <IRQ> ex_handler_refcount+0x4e/0x80 fixup_exception+0x33/0x40 do_trap+0x83/0x140 do_error_trap+0x83/0xf0 ? ip_vs_conn_drop_conntrack+0x120/0x1a5 [ip_vs] ? ip_finish_output2+0x29c/0x390 ? ip_finish_output2+0x1a2/0x390 invalid_op+0x1b/0x40 RIP: 0010:ip_vs_conn_put+0x31/0x40 [ip_vs] RSP: 0000:ffffa344dde03bb8 EFLAGS: 00010246 RAX: 0000000000000001 RBX: ffffa344df31cf00 RCX: ffffa344d7450198 RDX: 0000000000000003 RSI: 00000000fffffe01 RDI: ffffa344d7450140 RBP: 0000000000000002 R08: 0000000000000476 R09: 0000000000000000 R10: ffffa344dde03b28 R11: ffffa344df200000 R12: ffffa344d7d09000 R13: ffffa344def3a980 R14: ffffffffc04f6e20 R15: 0000000000000008 ip_vs_in.part.29.constprop.36+0x34f/0x640 [ip_vs] ? ip_vs_conn_out_get+0xe0/0xe0 [ip_vs] ip_vs_remote_request4+0x47/0xa0 [ip_vs] ? ip_vs_in.part.29.constprop.36+0x640/0x640 [ip_vs] nf_hook_slow+0x43/0xc0 ip_local_deliver+0xac/0xc0 ? ip_rcv_finish+0x400/0x400 ip_rcv+0x26c/0x380 __netif_receive_skb_core+0x3a0/0xb10 ? inet_gro_receive+0x23c/0x2b0 ? netif_receive_skb_internal+0x24/0xb0 netif_receive_skb_internal+0x24/0xb0 napi_gro_receive+0xb8/0xe0 xennet_poll+0x676/0xb40 [xen_netfront] net_rx_action+0x139/0x3a0 __do_softirq+0xde/0x2b4 irq_exit+0xae/0xb0 xen_evtchn_do_upcall+0x2c/0x40 xen_hvm_callback_vector+0x7d/0x90 </IRQ> RIP: 0033:0x7fc9d11c91f9 RSP: 002b:00007ffebe8a2ea0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c RAX: 00000000ffffffff RBX: 0000000002609808 RCX: 0000000000000054 RDX: 0000000000000001 RSI: 0000000002605440 RDI: 00000000025f940e RBP: 00000000025f940e R08: 000000000260213d R09: 1999999999999999 R10: 000000000262a808 R11: 00000000025f942d R12: 00000000025f940e R13: 00007fc9d1301e20 R14: 00000000025f9408 R15: 00007fc9d1302720 Code: 48 8b 95 80 00 00 00 41 55 49 8d 8c 24 e0 05 00 00 45 8b 84 24 38 04 00 00 41 89 c1 48 89 de 48 c7 c7 a8 2f f2 92 e8 7c fa ff ff <0f> 0b 58 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 55 48 89 e5 41 56 Reported-by: Net Filter <netfilternetfilter@gmail.com> Fixes: b54ab92b84b6 ("netfilter: refcounter conversions") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-19netfilter: delete /proc THIS_MODULE referencesAlexey Dobriyan1-2/+0
/proc has been ignoring struct file_operations::owner field for 10 years. Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where inode->i_fop is initialized with proxy struct file_operations for regular files: - if (de->proc_fops) - inode->i_fop = de->proc_fops; + if (de->proc_fops) { + if (S_ISREG(inode->i_mode)) + inode->i_fop = &proc_reg_file_ops; + else + inode->i_fop = de->proc_fops; + } VFS stopped pinning module at this point. # ipvs Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: ipvs: Remove useless ipvsh param of frag_safe_skb_hpGao Feng1-1/+1
The param of frag_safe_skb_hp, ipvsh, isn't used now. So remove it and update the callers' codes too. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-1/+1
Pull networking updates from David Miller: "Highlights: 1) Maintain the TCP retransmit queue using an rbtree, with 1GB windows at 100Gb this really has become necessary. From Eric Dumazet. 2) Multi-program support for cgroup+bpf, from Alexei Starovoitov. 3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew Lunn. 4) Add meter action support to openvswitch, from Andy Zhou. 5) Add a data meta pointer for BPF accessible packets, from Daniel Borkmann. 6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet. 7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli. 8) More work to move the RTNL mutex down, from Florian Westphal. 9) Add 'bpftool' utility, to help with bpf program introspection. From Jakub Kicinski. 10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper Dangaard Brouer. 11) Support 'blocks' of transformations in the packet scheduler which can span multiple network devices, from Jiri Pirko. 12) TC flower offload support in cxgb4, from Kumar Sanghvi. 13) Priority based stream scheduler for SCTP, from Marcelo Ricardo Leitner. 14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg. 15) Add RED qdisc offloadability, and use it in mlxsw driver. From Nogah Frankel. 16) eBPF based device controller for cgroup v2, from Roman Gushchin. 17) Add some fundamental tracepoints for TCP, from Song Liu. 18) Remove garbage collection from ipv6 route layer, this is a significant accomplishment. From Wei Wang. 19) Add multicast route offload support to mlxsw, from Yotam Gigi" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits) tcp: highest_sack fix geneve: fix fill_info when link down bpf: fix lockdep splat net: cdc_ncm: GetNtbFormat endian fix openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start netem: remove unnecessary 64 bit modulus netem: use 64 bit divide by rate tcp: Namespace-ify sysctl_tcp_default_congestion_control net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() ipv6: set all.accept_dad to 0 by default uapi: fix linux/tls.h userspace compilation error usbnet: ipheth: prevent TX queue timeouts when device not ready vhost_net: conditionally enable tx polling uapi: fix linux/rxrpc.h userspace compilation errors net: stmmac: fix LPI transitioning for dwmac4 atm: horizon: Fix irq release error net-sysfs: trigger netlink notification on ifalias change via sysfs openvswitch: Using kfree_rcu() to simplify the code openvswitch: Make local function ovs_nsh_key_attr_size() static openvswitch: Fix return value check in ovs_meter_cmd_features() ...
2017-11-09netfilter: ipvs: Convert timers to use timer_setup()Kees Cook1-5/+5
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Wensong Zhang <wensong@linux-vs.org> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: lvs-devel@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au>
2017-11-06netfilter: ipvs: Use %pS printk format for direct addressesHelge Deller1-1/+1
The debug and error printk functions in ipvs uses wrongly the %pF instead of the %pS printk format specifier for printing symbols for the address returned by _builtin_return_address(0). Fix it for the ia64, ppc64 and parisc64 architectures. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Cc: lvs-devel@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-17netfilter: refcounter conversionsReshetova, Elena1-12/+12
refcount_t type and corresponding API (see include/linux/refcount.h) should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-28lib/vsprintf.c: remove %Z supportAlexey Dobriyan1-1/+1
Now that %z is standartised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller. Use BUILD_BUG_ON in a couple ATM drivers. In case anyone didn't notice lib/vsprintf.o is about half of SLUB which is in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more. Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-06ipvs: update real-server binding of outgoing connections in SIP-peMarco Angaroni1-2/+3
Previous patch that introduced handling of outgoing packets in SIP persistent-engine did not call ip_vs_check_template() in case packet was matching a connection template. Assumption was that real-server was healthy, since it was sending a packet just in that moment. There are however real-server fault conditions requiring that association between call-id and real-server (represented by connection template) gets updated. Here is an example of the sequence of events: 1) RS1 is a back2back user agent that handled call-id1 and call-id2 2) RS1 is down and was marked as unavailable 3) new message from outside comes to IPVS with call-id1 4) IPVS reschedules the message to RS2, which becomes new call handler 5) RS2 forwards the message outside, translating call-id1 to call-id2 6) inside pe->conn_out() IPVS matches call-id2 with existing template 7) IPVS does not change association call-id2 <-> RS1 8) new message comes from client with call-id2 9) IPVS reschedules the message to a real-server potentially different from RS2, which is now the correct destination This patch introduces ip_vs_check_template() call in the handling of outgoing packets for SIP-pe. And also introduces a second optional argument for ip_vs_check_template() that allows to check if dest associated to a connection template is the same dest that was identified as the source of the packet. This is to change the real-server bound to a particular call-id independently from its availability status: the idea is that it's more reliable, for in->out direction (where internal network can be considered trusted), to always associate a call-id with the last real-server that used it in one of its messages. Think about above sequence of events where, just after step 5, RS1 returns instead to be available. Comparison of dests is done by simply comparing pointers to struct ip_vs_dest; there should be no cases where struct ip_vs_dest keeps its memory address, but represent a different real-server in terms of ip-address / port. Fixes: 39b972231536 ("ipvs: handle connections started by real-servers") Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2016-05-06ipvs: make drop_entry protection effective for SIP-peMarco Angaroni1-3/+19
DoS protection policy that deletes connections to avoid out of memory is currently not effective for SIP-pe plus OPS-mode for two reasons: 1) connection templates (holding SIP call-id) are always skipped in ip_vs_random_dropentry() 2) in_pkts counter (used by drop_entry algorithm) is not incremented for connection templates This patch addresses such problems with the following changes: a) connection templates associated (via their dest) to virtual-services configured in OPS mode are included in ip_vs_random_dropentry() monitoring. This applies to SIP-pe over UDP (which requires OPS mode), but is more general principle: when OPS is controlled by templates memory can be used only by templates themselves, since OPS conns are deleted after packet is forwarded. b) OPS connections, if controlled by a template, cause increment of in_pkts counter of their template. This is already happening but only in case director is in master-slave mode (see ip_vs_sync_conn()). Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2016-04-20ipvs: don't alter conntrack in OPS modeMarco Angaroni1-1/+2
When using OPS mode in conjunction with SIP persistent-engine, packets originating from the same ip-address/port could be balanced to different real servers, and (to properly handle SIP responses) OPS connections are created in the in-out direction too, where ip_vs_update_conntrack() is called to modify the reply tuple. As a result, there can be collision of conntrack tuples, causing random packet drops, as explained below: conntrack1: orig=CIP->VIP, reply=RIP1->CIP conntrack2: orig=RIP2->CIP, reply=CIP->VIP Tuple CIP->VIP is both in orig of conntrack1 and reply of conntrack2. The collision triggers packet drop inside nf_conntrack processing. In addition, the current implementation deletes the conntrack object at every expire of an OPS connection (once every forwarded packet), to have it recreated from scratch at next packet traversing IPVS. Since in OPS mode, by definition, we don't expect any associated response, the choices implemented in this patch are: a) don't call nf_conntrack_alter_reply() for OPS connections inside ip_vs_update_conntrack(). b) don't delete the conntrack object at OPS connection expire. The result is that created conntrack objects for each tuple CIP->VIP, RIP-N->CIP, etc. are left in UNREPLIED state and not modified by IPVS OPS connection management. This eliminates packet drops and leaves a single conntrack object for each tuple packets are sent from. Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2016-04-20ipvs: optimize release of connections in OPS modeMarco Angaroni1-3/+23
One-packet-scheduling is the most expensive mode in IPVS from performance point of view: for each packet to be processed a new connection data structure is created and, after packet is sent, deleted by starting a new timer set to expire immediately. SIP persistent-engine needs OPS mode to have Call-ID based load balancing, so OPS mode performance has negative impact in SIP protocol load balancing. This patch aims to improve performance of OPS mode by means of the following changes in the release mechanism of OPS connections: a) call expire callback ip_vs_conn_expire() directly instead of starting a timer programmed to fire immediately. b) avoid call_rcu() overhead inside expire callback, since OPS connection are not inserted in the hash-table and last just the time to process the packet, hence there is no concurrent access to such data structures. Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-10-07ipvs: Remove possibly unused variables from ip_vs_conn_net_{init,cleanup}Simon Horman1-8/+5
If CONFIG_PROC_FS is undefined then the arguments of proc_create() and remove_proc_entry() are unused. As a result the net variables of ip_vs_conn_net_{init,cleanup} are unused. net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_init’: net/netfilter/ipvs//ip_vs_conn.c:1350:14: warning: unused variable ‘net’ [-Wunused-variable] net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_cleanup’: net/netfilter/ipvs//ip_vs_conn.c:1361:14: warning: unused variable ‘net’ [-Wunused-variable] ... Resolve this by dereferencing net as needed rather than storing it in a variable. Fixes: 3d99376689ee ("ipvs: Pass ipvs not net into ip_vs_control_net_(init|cleanup)") Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2015-09-24ipvs: Pass ipvs not net into ip_vs_conn_net_init and ip_vs_conn_net_cleanupEric W. Biederman1-4/+4
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net into ip_vs_conn_net_flushEric W. Biederman1-3/+4
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_conn_hashkeyEric W. Biederman1-4/+4
Use the address of struct netns_ipvs in the hash not the address of struct net. Both addresses are equally valid candidates and by using the address of struct netns_ipvs there becomes no need deal with struct net in this part of the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs into conn_out_getEric W. Biederman1-2/+2
Move the hack of relying on "net_ipvs(skb_net(skb))" to derive the ipvs up a layer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs into .conn_in_get and ip_vs_conn_in_get_protoEric W. Biederman1-2/+2
Stop relying on "net_ipvs(skb_net(skb))" to derive the ipvs as skb_net is a hack. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs into ip_vs_conn_fill_param_protoEric W. Biederman1-4/+6
Move the ugly hack net_ipvs(skb_net(skb)) up a layer in the call stack so it is easier to remove. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_random_drop_entryEric W. Biederman1-2/+2
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_sync_connEric W. Biederman1-2/+1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_proto_data_getEric W. Biederman1-2/+2
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_find_destEric W. Biederman1-1/+1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_fill_connEric W. Biederman1-4/+4
ipvs is what is actually desired so change the parameter and the modify the callers to pass struct netns_ipvs. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Store ipvs not net in struct ip_vs_conn_paramEric W. Biederman1-7/+7
In practice struct netns_ipvs is as meaningful as struct net and more useful as it holds the ipvs specific data. So store a pointer to struct netns_ipvs. Update the accesses of param->net to access param->ipvs->net instead. When lookup up struct ip_vs_conn in a hash table replace comparisons of cp->net with comparisons of cp->ipvs which is possible now that ipvs is present in ip_vs_conn_param. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Store ipvs not net in struct ip_vs_connEric W. Biederman1-15/+15
In practice struct netns_ipvs is as meaningful as struct net and more useful as it holds the ipvs specific data. So store a pointer to struct netns_ipvs. Update the accesses of conn->net to access conn->ipvs->net instead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-01ipvs: drop inverse argument to conn_{in,out}_getAlex Gartrell1-6/+6
No longer necessary since the information is included in the ip_vs_iphdr itself. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18ipvs: use the new dest addr family fieldJulian Anastasov1-11/+38
Use the new address family field cp->daf when printing cp->daddr in logs or connection listing. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwardingAlex Gartrell1-2/+10
Pull the common logic for preparing an skb to prepend the header into a single function and then set fields such that they can be used in either case (generalize tos and tclass to dscp, hop_limit and ttl to ttl, etc) Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Supply destination address family to ip_vs_conn_newAlex Gartrell1-2/+3
The assumption that dest af is equal to service af is now unreliable, so we must specify it manually so as not to copy just the first 4 bytes of a v6 address or doing an illegal read of 16 butes on a v6 address. We "lie" in two places: for synchronization (which we will explicitly disallow from happening when we have heterogeneous pools) and for black hole addresses where there's no real dest. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}Alex Gartrell1-1/+7
We need to remove the assumption that virtual address family is the same as real address family in order to support heterogeneous services (that is, services with v4 vips and v6 backends or the opposite). Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>