How does VXLAN realize cross-node communication on the same network segment

created at 08-07-2021

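VXLAN is an encapsulation protocol that wraps Layer 2 Ethernet frames in UDP datagrams carried over a Layer 3 network. Its main purpose is to overcome the shortage of VLAN IDs: a VLAN ID is 12 bits wide, while a VXLAN network identifier (VNI) is 24 bits wide.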
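
As a quick check of the ID-space argument, the arithmetic below compares a 12-bit VLAN ID with a 24-bit VNI. The helper name id_space is made up for this illustration.

```shell
# Number of distinct values an n-bit identifier can take.
# id_space is a hypothetical helper, not part of any tool.
id_space() {
    echo $(( 1 << $1 ))
}

echo "VLAN IDs (12 bits):   $(id_space 12)"   # 4096
echo "VXLAN VNIs (24 bits): $(id_space 24)"   # 16777216
```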

To test how VXLAN provides same-subnet communication between nodes, the following environment is built with network namespaces (netns). The topology is shown below, and the setup commands are listed in the appendix at the end of the article.

+-----------------+                           +-----------------+
|node-1           |                           |node-2           |
| +---------+     |                           | +---------+     |
| |   br0   |     |                           | |   br0   |     |
| +---------+     |                           | +---------+     |
| |         |     |                           | |         |     |
| br0       |     |                           | br0       |     |
|4e:69:85:99:e0:04|                           |e6:39:1c:37:20:a2|
|192.168.101.2/24 |                           |192.168.101.3/24 |
|           |     |                           |           |     |
|         vxlan0  |                           |         vxlan0  |
|4e:69:85:99:e0:04|                           |e6:39:1c:37:20:a2|
|                 |          +-----+          |                 |
|node1-eth -------|----------| sw0 |----------|--------node2-eth|
|ae:93:c0:59:9e:a0|          +-----+          |4e:8b:ee:bd:58:b5|
|192.168.1.2/24   |                           |192.168.1.3/24   |
+-----------------+                           +-----------------+

node-1 and node-2 each have an IP address in 192.168.101.0/24. How does VXLAN let the two nodes communicate on this network segment?
On both nodes, the br0 bridge interface holds the 192.168.101.0/24 address, and a VXLAN tunnel device, vxlan0, is attached to the bridge. vxlan0 is configured with a remote address, a local address, and a VNI.
When node-1 pings 192.168.101.3, the packet takes the following path:

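1 The kernel looks up the routing table. The destination matches the 192.168.101.0/24 route on br0, so the packet leaves via br0 with sip=192.168.101.2 dip=192.168.101.3 smac=[MAC node-1 br0] dmac=[MAC node-2 br0]. The destination MAC is resolved by ARP, which is carried across the same tunnel.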

➜  ~ ip netns exec node-1 ip r | grep 192.168.101.0
192.168.101.0/24 dev br0 proto kernel scope link src 192.168.101.2

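2 br0 acts as a switch: it looks up the destination MAC in its FDB (forwarding database) and finds that the frame must go out of the vxlan0 port. In the output below, the first entry (master br0) belongs to the bridge; the second (self) belongs to the vxlan0 device and maps the MAC to the remote tunnel endpoint 192.168.1.3.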

➜  ~ ip netns exec node-1 bridge fdb | grep e6:39:1c:37:20:a2
e6:39:1c:37:20:a2 dev vxlan0 master br0
e6:39:1c:37:20:a2 dev vxlan0 dst 192.168.1.3 self
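
The lookup in step 2 can be sketched as a small filter over `bridge fdb` output. The function name vtep_for_mac is an assumption made for this illustration, and the sample input is copied from the output above so that the snippet runs without root access.

```shell
# vtep_for_mac is a hypothetical helper: it reads `bridge fdb` output on
# stdin and prints the remote tunnel endpoint IP recorded for a MAC, if any.
vtep_for_mac() {
    awk -v mac="$1" '
        $1 == mac {
            for (i = 2; i < NF; i++)
                if ($i == "dst") { print $(i + 1); exit }
        }'
}

# Sample input taken from the fdb output shown above.
sample='e6:39:1c:37:20:a2 dev vxlan0 master br0
e6:39:1c:37:20:a2 dev vxlan0 dst 192.168.1.3 self'

printf '%s\n' "$sample" | vtep_for_mac e6:39:1c:37:20:a2   # 192.168.1.3
```

On the live setup the same filter would be fed with `ip netns exec node-1 bridge fdb | vtep_for_mac e6:39:1c:37:20:a2`.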

3 When the Layer 2 frame arrives at the vxlan0 port, the kernel prepends a VXLAN header, a UDP header, and an outer IP header according to the vxlan0 configuration (VNI 1111, local 192.168.1.2, remote 192.168.1.3, UDP destination port 4789)

➜  ~ ip netns exec node-1 ip -d l show vxlan0
3: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 4e:69:85:99:e0:04 brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65535
    vxlan id 1111 remote 192.168.1.3 local 192.168.1.2 dev node1-eth srcport 0 0 dstport 4789 ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx
    bridge_slave state forwarding priority 32 cost 2 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.4e:69:85:99:e0:4 designated_root 8000.4e:69:85:99:e0:4 hold_timer    0.00 message_age_timer    0.00 forward_delay_timer    0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
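
To make step 3 more concrete, here is a minimal sketch of the 8-byte VXLAN header for VNI 1111, laid out as in RFC 7348, together with the encapsulation overhead that explains the mtu 1450 reported for vxlan0. The helper name vxlan_header_hex is hypothetical, and the calculation assumes node1-eth keeps an MTU of 1500.

```shell
# vxlan_header_hex is a hypothetical helper: it prints the 8-byte VXLAN
# header for a VNI as hex, following the RFC 7348 layout:
#   1 byte flags (0x08 = VNI present), 3 reserved bytes,
#   3 bytes VNI, 1 reserved byte.
vxlan_header_hex() {
    printf '08000000%06x00\n' "$1"
}

# Bytes placed in front of the inner IP packet by VXLAN over IPv4:
# outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet header (14).
VXLAN_OVERHEAD=$(( 20 + 8 + 8 + 14 ))

vxlan_header_hex 1111                              # 0800000000045700
echo "overhead:   $VXLAN_OVERHEAD bytes"           # 50
# Assumption: the underlying device node1-eth has an MTU of 1500.
echo "vxlan0 mtu: $(( 1500 - VXLAN_OVERHEAD ))"    # 1450
```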

4 The encapsulated Layer 3 packet is then routed like any other packet: the routing table selects node1-eth, an outer Ethernet header is added, and the packet is sent out of the node1-eth port

➜  ~ ip netns exec node-1 ip r | grep 192.168.1.0
192.168.1.0/24 dev node1-eth proto kernel scope link src 192.168.1.2

The packet capture on node-2 is as follows:

➜  ~ ip netns exec node-2 tcpdump -i any -nnev
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
14:33:49.402662 node2-eth In  ifindex 10 ae:93:c0:59:9e:a0 ethertype IPv4 (0x0800), length 154: (tos 0x0, ttl 64, id 10391, offset 0, flags [none], proto UDP (17), length 134)
    192.168.1.2.45186 > 192.168.1.3.4789: VXLAN, flags [I] (0x08), vni 1111
4e:69:85:99:e0:04 > e6:39:1c:37:20:a2, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 6168, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.101.2 > 192.168.101.3: ICMP echo request, id 64021, seq 1, length 64
14:33:49.402662 vxlan0 In  ifindex 3 4e:69:85:99:e0:04 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 6168, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.101.2 > 192.168.101.3: ICMP echo request, id 64021, seq 1, length 64
14:33:49.402662 br0   In  ifindex 2 4e:69:85:99:e0:04 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 6168, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.101.2 > 192.168.101.3: ICMP echo request, id 64021, seq 1, length 64
14:33:49.402678 br0   Out ifindex 2 e6:39:1c:37:20:a2 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 26146, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.101.3 > 192.168.101.2: ICMP echo reply, id 64021, seq 1, length 64
14:33:49.402679 vxlan0 Out ifindex 3 e6:39:1c:37:20:a2 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 26146, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.101.3 > 192.168.101.2: ICMP echo reply, id 64021, seq 1, length 64
14:33:49.402680 node2-eth Out ifindex 10 4e:8b:ee:bd:58:b5 ethertype IPv4 (0x0800), length 154: (tos 0x0, ttl 64, id 2944, offset 0, flags [none], proto UDP (17), length 134)
    192.168.1.3.45186 > 192.168.1.2.4789: VXLAN, flags [I] (0x08), vni 1111
e6:39:1c:37:20:a2 > 4e:69:85:99:e0:04, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 26146, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.101.3 > 192.168.101.2: ICMP echo reply, id 64021, seq 1, length 64

6 packets captured
6 packets received by filter
0 packets dropped by kernel
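
The lengths in the capture can be cross-checked with simple arithmetic. This is only a sketch: the variable names are made up, and it assumes IPv4 headers without options and the 20-byte Linux cooked v2 (LINUX_SLL2) pseudo-header that tcpdump uses with -i any.

```shell
INNER_IP=84    # inner IP length reported for the ICMP echo request
ETH_HDR=14     # Ethernet header of the inner frame
VXLAN_HDR=8    # VXLAN header
UDP_HDR=8      # outer UDP header
IP_HDR=20      # outer IPv4 header, no options (assumption)
SLL2_HDR=20    # Linux cooked v2 pseudo-header (assumption: capture on "any")

inner_frame=$(( INNER_IP + ETH_HDR ))
outer_ip=$(( inner_frame + VXLAN_HDR + UDP_HDR + IP_HDR ))

echo "inner frame:        $inner_frame"                   # 98
echo "outer IP length:    $outer_ip"                      # 134
echo "seen on node2-eth:  $(( outer_ip + SLL2_HDR ))"     # 154
echo "seen on vxlan0/br0: $(( INNER_IP + SLL2_HDR ))"     # 104
```

The four results match the length fields printed by tcpdump for the inner frame, the outer IP packet, and the per-interface records.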

Appendix

Environment setup script

brctl addbr sw0
ip l set sw0 up
ip netns add node-1
ip netns add node-2
ip l add node1-eth type veth peer name node1-tap
ip l add node2-eth type veth peer name node2-tap
ip l set node1-eth netns node-1
ip l set node2-eth netns node-2
brctl addif sw0 node1-tap
brctl addif sw0 node2-tap
ip l set node1-tap up
ip l set node2-tap up
ip netns exec node-1 ip l set lo up
ip netns exec node-2 ip l set lo up
ip netns exec node-1 ip l set node1-eth up
ip netns exec node-2 ip l set node2-eth up
ip netns exec node-1 ip a add 192.168.1.2/24 dev node1-eth
ip netns exec node-2 ip a add 192.168.1.3/24 dev node2-eth
ip netns exec node-1 brctl addbr br0
ip netns exec node-2 brctl addbr br0
ip netns exec node-1 ip l set br0 up
ip netns exec node-2 ip l set br0 up
ip netns exec node-1 ip a add 192.168.101.2/24 dev br0
ip netns exec node-2 ip a add 192.168.101.3/24 dev br0
# Set up unicast point-to-point VXLAN. For multicast, use this command instead: ip l add vxlan0 type vxlan id 1111 dstport 4789 group 239.1.1.1 dev node1-eth
ip netns exec node-1 ip l add vxlan0 type vxlan id 1111 dstport 4789 remote 192.168.1.3 local 192.168.1.2 dev node1-eth
ip netns exec node-2 ip l add vxlan0 type vxlan id 1111 dstport 4789 remote 192.168.1.2 local 192.168.1.3 dev node2-eth
ip netns exec node-1 ip l set vxlan0 up
ip netns exec node-2 ip l set vxlan0 up
ip netns exec node-1 brctl addif br0 vxlan0
ip netns exec node-2 brctl addif br0 vxlan0
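
To remove the test environment again, a teardown along the following lines should be enough. It is a sketch rather than part of the original setup: the run and teardown helpers and the DRY_RUN variable are invented here, and by default the commands are only printed so nothing changes until DRY_RUN=0 is set and the script runs as root.

```shell
# DRY_RUN defaults to 1, so the commands are only printed.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN=${DRY_RUN:-1}

# run is a hypothetical wrapper: print the command in dry-run mode,
# execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

teardown() {
    # Deleting a namespace should also destroy the virtual interfaces
    # inside it (br0, vxlan0, and the veth end together with its peer).
    run ip netns del node-1
    run ip netns del node-2
    # Finally remove the sw0 bridge created in the root namespace.
    run ip link set sw0 down
    run ip link del sw0
}

teardown
```
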
edited at: 08-07-2021