The most detailed illustrated guide to load balancing principles on the web

Last updated: 2021-02-26


The origin of load balancing

In the early stages of a business, we generally use a single server to provide services. As business traffic grows, there will always be a performance ceiling, no matter how well a single server is optimized or how good its hardware is. When a single server can no longer meet business needs, multiple servers must be combined into a cluster to improve overall processing performance.

To meet this requirement, we need a unified traffic entry point that serves external clients. In essence, we need a traffic scheduler that uses a balancing algorithm to distribute a large volume of user requests evenly across the servers in the cluster. This is exactly what we are discussing today: load balancing.

Load balancing brings several benefits:

It improves the overall performance of the system;
It improves system scalability;
It improves system availability.

Load balancing types

Broadly speaking, load balancers fall into three categories: DNS load balancing, hardware load balancing and software load balancing.

(1) DNS load balancing

DNS is the most basic and simple way to achieve load balancing. A domain name is resolved to multiple IPs through DNS, and each IP corresponds to a different server instance. This completes traffic scheduling. Although a conventional load balancer is not used, a simple load balancing function is implemented.
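The following minimal Python sketch shows what a round-robin DNS answer can look like. It assumes a common behavior: the server rotates the order of the A records on each query, and clients connect to the first address returned. `RoundRobinZone` is a hypothetical class and not part of any real DNS server. The zone name and the 192.0.2.x addresses are placeholders.

```python
from itertools import count

class RoundRobinZone:
    """Toy answer source that rotates the A-record order on every query."""

    def __init__(self, name: str, ips: list[str]) -> None:
        self.name = name
        self.ips = list(ips)
        self._queries = count()  # number of queries answered so far

    def resolve(self, qname: str) -> list[str]:
        if qname != self.name:
            return []  # this toy zone only knows one name
        shift = next(self._queries) % len(self.ips)
        # Rotate the list so each server takes a turn at the front of the answer
        return self.ips[shift:] + self.ips[:shift]

zone = RoundRobinZone("www.example.com", ["192.0.2.10", "192.0.2.11", "192.0.2.12"])
for _ in range(4):
    print(zone.resolve("www.example.com")[0])  # .10, .11, .12, .10 on successive lines
```

Note that this only spreads traffic evenly if resolvers and clients respect the rotated order and do not cache the answer. The disadvantages below are largely about cases where they do not.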

The biggest advantage of implementing load balancing through DNS is that it is simple to implement and low cost. There is no need to develop or maintain load balancing equipment yourself. However, there are some disadvantages:

Large failover delays and inconvenient server upgrades. There are layers of caches between DNS and users. Even if the faulty server is promptly removed from DNS after a failure, the change still has to pass through carriers' DNS caches, which may not honor TTLs. As a result, the change takes effect very slowly, and sometimes a trickle of requests still arrives a day later.
Unbalanced, coarse-grained traffic scheduling. How evenly DNS spreads traffic depends on each regional carrier's LocalDNS policy for returning IP lists, and some carriers do not rotate through multiple IP addresses. The number of users served by a given carrier's LocalDNS is another major cause of uneven scheduling.
Overly simple distribution strategy with few algorithms. DNS generally supports only round-robin (rr). It does not support scheduling algorithms such as weighted or hash-based scheduling.
Limited number of IPs per name. DNS traditionally carries responses in UDP packets, whose size is limited: classic DNS caps UDP messages at 512 bytes, and larger EDNS payloads are still bounded by the link MTU. Only a limited number of IP addresses therefore fit in one response. Alibaba's DNS system, for example, supports configuring 10 different IP addresses for the same domain name.
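Some quick arithmetic shows why the IP list is limited. This sketch assumes the classic 512-byte UDP limit with no EDNS0, a single question, and answers that use name compression (a 2-byte pointer back to the question name). It also assumes there are no CNAME, authority or additional records. The result is a theoretical ceiling only. Real responses usually carry extra records, and providers set their own, lower limits.

```python
DNS_UDP_LIMIT = 512  # classic DNS-over-UDP message limit (RFC 1035), no EDNS0
HEADER_BYTES = 12    # fixed DNS header
# One compressed A record: name pointer, TYPE, CLASS, TTL, RDLENGTH, 4-byte IPv4 RDATA
A_RECORD_BYTES = 2 + 2 + 2 + 4 + 2 + 4

def qname_bytes(name: str) -> int:
    """Wire length of a domain name: a length byte plus text per label, plus the root byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1

def max_a_records(name: str, limit: int = DNS_UDP_LIMIT) -> int:
    """Upper bound on A records in one answer, ignoring all other sections."""
    question = qname_bytes(name) + 2 + 2  # QNAME + QTYPE + QCLASS
    return (limit - HEADER_BYTES - question) // A_RECORD_BYTES

print(max_a_records("www.sina.com.cn"))  # 29
```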

In practice, this method is rarely used on its own for load balancing in production, because its shortcomings are obvious. DNS load balancing is described here mainly because it illustrates the concept of load balancing clearly.

Companies such as BAT (Baidu, Alibaba, Tencent) generally use DNS for geographic global load balancing. This directs users to nearby access points and improves access speed. It is typically the first layer of load balancing for ingress traffic, and more specialized load balancing equipment implements the layers beneath it.
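The "nearby access" idea can be sketched minimally as a lookup from the client's region to the VIPs of the closest data center. A dedicated load balancer behind those VIPs then takes over. The region names, the `REGION_VIPS` table and the documentation-range addresses are all hypothetical. A real global DNS load balancer usually infers the region from the LocalDNS resolver's IP, or from EDNS Client Subnet, rather than receiving it directly.

```python
# Hypothetical table: region -> VIPs of the nearest data center (documentation-range IPs)
REGION_VIPS: dict[str, list[str]] = {
    "north": ["198.51.100.10", "198.51.100.11"],
    "south": ["203.0.113.20", "203.0.113.21"],
}
DEFAULT_REGION = "north"  # fallback when the client's region is unknown

def geo_resolve(client_region: str) -> list[str]:
    """Answer a DNS query with the VIPs of the data center closest to the client."""
    return REGION_VIPS.get(client_region, REGION_VIPS[DEFAULT_REGION])
```

Each returned VIP would front a cluster of LVS or hardware load balancers, which matches the layered design described above.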

(2) Hardware load balancing

Hardware load balancing uses dedicated hardware appliances to implement the load balancing function. The industry currently has two typical hardware load balancers: F5 and A10.

This type of equipment offers strong performance and rich features, but it is very expensive, so generally only well-funded companies use it. Small and medium-sized companies usually cannot afford it, and their traffic is not large enough to justify it. Using such equipment would be quite wasteful for them.

Advantages of hardware load balancing:

Rich features: comprehensive support for load balancing at every layer and a full range of load balancing algorithms.
High performance: far exceeds that of common software load balancers.
High stability: commercial hardware load balancers have undergone rigorous testing and large-scale use, and are very stable.
Security protection: also provides security functions such as a firewall and anti-DDoS protection, and supports SNAT.

The disadvantages of hardware load balancing are also obvious:

Expensive;
Poor scalability, unable to be expanded and customized;
Debugging and maintenance are troublesome and require professionals;

(3) Software load balancing

Software load balancing runs load balancing software on ordinary servers. Common options today include Nginx, HAProxy and LVS. The differences are:

Nginx: Layer 7 load balancing that supports HTTP and e-mail protocols; it also supports Layer 4 load balancing.
HAProxy: supports Layer 7 rules with very good performance; it is the default load balancing software used by OpenStack.
LVS: runs in kernel mode and has the highest performance among software load balancers. Strictly speaking, it works at Layer 3, so it is more general-purpose and suits a wide range of application services.

Advantages of software load balancing:

Easy to operate: both deployment and maintenance are relatively simple;
Cheap: only the cost of the server is required, and the software is free;
Flexible: Layer 4 and Layer 7 load balancing can be selected according to business characteristics, making it easy to expand and customize functions.

LVS load balancing

Software load balancing mainly means Nginx, HAProxy and LVS, all three of which are widely used. For Layer 4 load balancing, LVS is almost always the choice. Major companies such as BAT are reportedly heavy users of LVS, because its excellent performance saves them enormous costs.

LVS (Linux Virtual Server) is an open-source project started by the Chinese developer Dr. Zhang Wensong. It is very popular in the community and is a high-performance Layer 4 reverse proxy server.

It is now part of the standard Linux kernel. It offers reliability, high performance, scalability and operability, delivering excellent performance at low cost.

Basic principles of Netfilter

LVS implements load balancing on top of the netfilter framework in the Linux kernel, so before learning LVS you should understand the basic working principle of netfilter. Netfilter is actually very complex. What we usually call the Linux firewall is netfilter, but what we usually operate is iptables. iptables is only a user-space tool for writing rules and passing them to the kernel; netfilter does the real work. The following figure gives a brief overview of how netfilter works:

Netfilter is a kernel-mode Linux firewall mechanism. As a general, abstract framework, it provides a complete set of hook function management mechanisms, providing functions such as packet filtering, network address translation, and connection tracking based on protocol types.

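Put simply, netfilter provides a mechanism for placing a number of checkpoints (hook functions) along the path of a packet. At each checkpoint, operations are performed according to rules. Netfilter defines 5 hook points in total: PREROUTING, INPUT, FORWARD, OUTPUT and POSTROUTING.

PREROUTING: packets that have just entered the network layer and have not yet gone through route lookup pass through here.
INPUT: packets that route lookup has determined are destined for the local host pass through here.
FORWARD: packets that route lookup has determined should be forwarded pass through here, before POSTROUTING.
OUTPUT: packets just sent by local processes pass through here.
POSTROUTING: packets that have gone through route lookup and are about to leave this device pass through here.

When a packet arrives at the NIC, it passes through the link layer into the network layer and reaches PREROUTING. Route lookup is then performed on the destination IP address. If the destination IP is local, the packet continues to INPUT, passes through the protocol stack, and is delivered to the corresponding application based on the port.

After the application processes the request, it sends the response packet to OUTPUT. The packet passes through POSTROUTING and finally leaves the NIC.

If the destination IP is not local and forwarding (the ip_forward parameter) is enabled on the server, the packet is delivered to FORWARD. It then passes through POSTROUTING and leaves the NIC.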
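The traversal rules above can be condensed into a small Python sketch. It models only the order of hook points and nothing that happens inside them. `hook_path` is a hypothetical helper. Its `ip_forward` flag stands in for the kernel's `net.ipv4.ip_forward` setting.

```python
def hook_path(dst_is_local: bool, ip_forward: bool, locally_generated: bool = False) -> list[str]:
    """Return the netfilter hook points a packet passes, following the rules above."""
    if locally_generated:
        # A response created by a local application starts at OUTPUT
        return ["OUTPUT", "POSTROUTING"]
    path = ["PREROUTING"]  # every received packet hits PREROUTING before route lookup
    if dst_is_local:
        path.append("INPUT")  # then up the protocol stack to the application
    elif ip_forward:
        path += ["FORWARD", "POSTROUTING"]  # routed out toward another host
    # Otherwise the packet is not local and forwarding is off, so it goes no further
    return path

print(hook_path(dst_is_local=True, ip_forward=False))   # ['PREROUTING', 'INPUT']
print(hook_path(dst_is_local=False, ip_forward=True))   # ['PREROUTING', 'FORWARD', 'POSTROUTING']
```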

Basic principles of LVS

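LVS is built on the netfilter framework and works mainly on the INPUT chain. It registers the ip_vs_in hook function on INPUT to run the main IPVS flow. The general principle is shown in the figure:

When a user visits www.sina.com.cn, the user's data crosses layer after layer of network, enters the LVS server's NIC through a switch, and goes up into the kernel network layer.
After entering PREROUTING, route lookup determines that the destination VIP is a local IP address, so the packet goes to the INPUT chain.
LVS works on the INPUT chain and uses the IP:Port being accessed to decide whether the request is for an LVS service. If it is, LVS runs its main flow, forcibly modifies the relevant fields of the packet, and sends the packet to the POSTROUTING chain.
When POSTROUTING receives the packet, it routes it based on the destination IP address (the backend real server) and finally sends the packet to the backend server.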
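The decision made on the INPUT chain can be sketched in a few lines of Python. This is an illustration under assumptions, not the kernel implementation. `VirtualService`, `SERVICES` and `input_hook` are hypothetical names, the addresses are made up, and the scheduler is plain round-robin.

```python
from itertools import cycle
from typing import Optional

class VirtualService:
    """Hypothetical stand-in for an IPVS virtual service with a round-robin scheduler."""

    def __init__(self, real_servers: list[str]) -> None:
        self.real_servers = real_servers
        self._rr = cycle(real_servers)  # plain round-robin over the real servers

    def schedule(self) -> str:
        return next(self._rr)

# Service table keyed by (protocol, VIP, port): the "is this an LVS service?" check
SERVICES = {
    ("tcp", "10.0.0.100", 80): VirtualService(["10.0.1.1", "10.0.1.2"]),
}

def input_hook(proto: str, dst_ip: str, dst_port: int) -> tuple[str, Optional[str]]:
    """Decide what happens to a packet that reached INPUT with this destination."""
    svc = SERVICES.get((proto, dst_ip, dst_port))
    if svc is None:
        return ("LOCAL", None)  # not an LVS service: continue up to local applications
    # LVS service: choose a real server; the packet is then rewritten and sent out
    return ("IPVS", svc.schedule())
```

Round-robin is only the simplest choice. LVS also provides other schedulers, such as wrr, lc, wlc and sh. How the packet is rewritten after scheduling depends on the working mode described next.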

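The open-source version of LVS has 3 working modes. Each works differently and has its own advantages, disadvantages and use cases:

DR mode
NAT mode
Tunnel mode

One more mode must be mentioned here: FullNAT, which the open-source version does not include. It originated at Baidu and was later developed further at Alibaba, whose team open-sourced it. The code is available at:

https://github.com/alibaba/lvs

The LVS official website also provides a download link, but FullNAT has not been merged into the mainline kernel.

FullNAT mode will be covered in detail in a dedicated section later. Below, the principles of the DR, NAT and Tunnel modes are described in turn.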

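How DR mode works

The basic LVS diagram is fairly simple and shows only the general flow. The following explains in detail how DR mode is implemented and how it works.

DR is in fact the most commonly used working mode, because of its strong performance. The following describes how DR mode works by tracing the data flow of one request and its response.

(1) Implementation process

① When the client requests the www.sina.com.cn home page, the request packet crosses the network and reaches the NIC of Sina's LVS server. The source IP is the client's IP address, CIP. The destination IP is Sina's public-facing server IP address, the VIP. The source MAC address is CMAC, which is actually the MAC address of the router connected to LVS (labeled CMAC for ease of understanding). The destination MAC address is the MAC corresponding to the VIP, labeled VMAC.

② The packet passes through the link layer and reaches PREROUTING (just entering the network layer). Route lookup finds that the destination IP is the LVS VIP, so the packet is delivered to the INPUT chain. At this point, the MAC, IP and port of the packet have not been modified.

③ The packet arrives at the INPUT chain, which is where LVS mainly works. LVS uses the destination IP and port to check whether this is a service defined in LVS. If it is a defined VIP service, LVS selects one server from the real server list according to the configuration, here RS1. It then looks up the outbound route with RS1 as the target to determine the next hop and which NIC the packet should leave through. Finally, the packet is delivered to the OUTPUT chain.

④ After passing the POSTROUTING chain, the packet moves from the network layer to the link layer. The destination MAC address is changed to the RealServer's MAC address, labeled RMAC. The source MAC address is changed to the MAC address of LVS's selfIP on the same subnet as the RS, labeled DMAC. The switch then forwards the packet to the RealServer (note: for simplicity, the switch is not shown in the figure).

⑤ When the request packet arrives at the backend real server, the link layer checks that the destination MAC is its own NIC address. At the network layer, route lookup finds that the destination IP is the VIP (the VIP is configured on lo), so the packet is treated as local. It passes through the protocol stack and is copied to the application, for example an nginx server. nginx handles the request and produces a response packet.

The RS then looks up the outbound route for CIP to determine the next hop and the outgoing device. The packet's source and destination IPs are now VIP and CIP. The source MAC is RS1's RMAC, and the destination MAC is the MAC of the next hop (the router), labeled CMAC for ease of understanding. The router connected to the RS then forwards the packet to the actual client, completing the full request-response cycle.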
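The header changes in steps ① to ⑤ can be traced with a small Python model. The `Frame` type and the two helper functions are hypothetical, and the address values are the labels used in the text. The sketch shows the key property of DR mode: LVS rewrites only the MAC addresses, and the RS replies from the VIP directly to the client.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str

def dr_forward(frame: Frame, dmac: str, rmac: str) -> Frame:
    """DR mode on LVS: rewrite only the MACs; CIP -> VIP is left untouched."""
    return replace(frame, src_mac=dmac, dst_mac=rmac)

def rs_reply(request: Frame, rmac: str, next_hop_mac: str) -> Frame:
    """The RS answers from the VIP (bound on lo) straight to the client, bypassing LVS."""
    return Frame(src_mac=rmac, dst_mac=next_hop_mac,
                 src_ip=request.dst_ip, dst_ip=request.src_ip)

req = Frame("CMAC", "VMAC", "CIP", "VIP")
fwd = dr_forward(req, dmac="DMAC", rmac="RMAC")
resp = rs_reply(fwd, rmac="RMAC", next_hop_mac="CMAC")
print(fwd)   # Frame(src_mac='DMAC', dst_mac='RMAC', src_ip='CIP', dst_ip='VIP')
print(resp)  # Frame(src_mac='RMAC', dst_mac='CMAC', src_ip='VIP', dst_ip='CIP')
```

Because only layer-2 addresses change, LVS and the RS must share a layer-2 network. This is where the same-network limitation listed below comes from.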

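The whole process shows that LVS logic in DR mode is fairly simple. Packets are forwarded to the backend server by direct routing, and the RS sends response packets directly to the client without passing through LVS.

Request packets are usually small while response packets are larger, so nearly all packets passing through LVS are small ones. This is the main reason LVS in DR mode performs so well.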

(2) Advantages, disadvantages and usage scenarios

Advantages of DR mode

The response data does not pass through LVS, so performance is high

Packets are modified only slightly and information is fully preserved (the client source IP is carried through)

Disadvantages of DR mode

LVS and the RS must be on the same physical network (cross-data-center deployment is not supported)

The VIP on lo and other kernel parameters must be configured on the RS

Port mapping is not supported

Usage scenarios of DR mode

If performance requirements are very high, DR mode should be preferred. It also passes the client source IP address through transparently.

How NAT mode works

The second working mode of LVS is NAT mode. The following figure details the entire process. The packet travels from the client into LVS and is forwarded to the RS. The RS sends the response back to LVS, and LVS replies to the client.

(1) Implementation process

① The user's request packet crosses the network and reaches the LVS NIC. At this point, the source IP of the packet is CIP and the destination IP is VIP.

② The packet enters the PREROUTING point of the network layer through the NIC. Route lookup on the destination IP confirms that it is a local IP, and the packet is forwarded to INPUT. The source and destination IPs are still unchanged.

③ On reaching LVS, the destination IP and destination port are checked to see whether this is an IPVS service. If it is, LVS selects an RS as the backend server and changes the packet's destination IP to RIP. It then looks up routing information with RIP as the destination IP to determine the next hop and egress, and forwards the packet to OUTPUT.

④ After POSTROUTING and link-layer processing, the modified packet reaches the RS server. At this point, the source IP of the packet is CIP and the destination IP is RIP.

⑤ The packet arriving at the RS server passes the link-layer and network-layer checks and is delivered to the user-space nginx program. After nginx processes it, a response packet is sent. Because the default gateway on the RS is configured as the LVS device's IP, the RS forwards the packet to the next hop, the LVS server. At this point, the source IP is RIP and the destination IP is CIP.

⑥ After receiving the RS response packet, the LVS server looks up the route and finds that the destination IP is not a local IP. Because forwarding is enabled on the LVS server, the packet is passed to the FORWARD chain. The packet is still unmodified at this point.

⑦ LVS then looks up its service and connection tables using the response packet's addresses and ports. It changes the source IP to VIP, determines the next hop and egress through route lookup, and sends the packet to the gateway. After traversing the network, the packet reaches the client, completing one request-response exchange.
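The two rewrites in NAT mode can be modeled in Python. One is a destination rewrite (DNAT) on the way in, and the other is a source rewrite (SNAT) on the way out, tied together by a connection table. `NatBalancer` is a hypothetical toy, not IPVS code. It assumes each client (IP, port) pair identifies one connection and uses round-robin scheduling. Because the real server's port can differ from the VIP port, it also shows port mapping.

```python
Addr = tuple[str, int]  # (ip, port)

class NatBalancer:
    """Toy LVS-NAT: DNAT requests to a real server, SNAT replies back to the VIP."""

    def __init__(self, vip: Addr, real_servers: list[Addr]) -> None:
        self.vip = vip
        self.real_servers = real_servers
        self._next = 0
        self.conns: dict[Addr, Addr] = {}  # client (ip, port) -> real server (ip, port)

    def on_request(self, src: Addr, dst: Addr) -> tuple[Addr, Addr]:
        if dst != self.vip:
            raise ValueError("not an IPVS service")
        rs = self.conns.get(src)
        if rs is None:  # new connection: pick a real server round-robin
            rs = self.real_servers[self._next % len(self.real_servers)]
            self._next += 1
            self.conns[src] = rs
        return src, rs  # destination IP and port rewritten (port mapping)

    def on_reply(self, src: Addr, dst: Addr) -> tuple[Addr, Addr]:
        if self.conns.get(dst) != src:
            raise ValueError("no matching connection")
        return self.vip, dst  # source rewritten back to VIP:vport

lb = NatBalancer(("VIP", 80), [("RIP1", 8080), ("RIP2", 8080)])
print(lb.on_request(("CIP", 50000), ("VIP", 80)))   # (('CIP', 50000), ('RIP1', 8080))
print(lb.on_reply(("RIP1", 8080), ("CIP", 50000)))  # (('VIP', 80), ('CIP', 50000))
```

The reply is only rewritten if it actually returns through LVS. This is why, in NAT mode, the RS's default gateway must point at LVS.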

In NAT mode, traffic in both directions passes through LVS, so NAT mode performance has a certain bottleneck. Unlike the other modes, however, NAT supports port mapping and supports Windows backend servers.

(2) Advantages, Disadvantages and Usage Scenarios

NAT mode advantages

Supports Windows backend servers

Supports port mapping: if the RS port differs from the vport, LVS modifies not only the destination IP but also the dport.

Disadvantages of NAT mode

The backend RS must have its default gateway set to LVS

Bidirectional traffic puts heavy load on LVS

Usage scenarios of NAT mode

If your backend servers run Windows and you use LVS, you must choose NAT mode.

How Tunnel mode works

Tunnel mode is rarely used in China, though Tencent is said to use it extensively. It is also a one-arm mode: only request data passes through LVS, and response data is sent directly from the backend server to the client. Its performance is likewise very high, and it supports cross-data-center deployment. The figure below continues the analysis.

(1) Implementation process

① The user's request packet crosses multiple networks and reaches the LVS NIC. At this point, the source IP of the packet is CIP and the destination IP is VIP.

② The packet enters the PREROUTING point of the network layer through the NIC. Route lookup on the destination IP confirms that it is a local IP, and the packet is forwarded to the INPUT chain, where it reaches LVS. The source and destination IPs are still unchanged.

③ On reaching LVS, the destination IP and destination port are checked to see whether this is an IPVS service. If it is, LVS selects an RS as the backend server and looks up routing information with RIP as the destination IP to determine the next hop, output device and so on. It then prepends an additional IP header (source DIP, destination RIP) in front of the original IP header and forwards the packet to OUTPUT.

④ Following the routing information, the packet leaves through the LVS NIC, is sent to the router gateway, and reaches the backend server across the network.

⑤ When the backend server receives the packet, the ipip module strips the tunnel header. What remains is the original packet, with source IP CIP and destination IP VIP. Because the VIP is configured on tunl0, route lookup identifies it as a local IP, and the packet is delivered to the application. After nginx responds normally, the response leaves the NIC with VIP as the source IP and CIP as the destination IP, and finally reaches the client.
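Step ③ prepends an outer IPv4 header with protocol number 4 (IP-in-IP) in front of the original packet. The user-space Python sketch below builds such a header byte by byte, purely to show its layout. LVS does this inside the kernel. `ipip_encapsulate` and `ipv4_checksum` are hypothetical helpers. The sketch assumes a 20-byte header with no options, a TTL of 64 and zeroed identification and flags.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IP protocol number for IP-in-IP encapsulation

def ipv4_checksum(header: bytes) -> int:
    """Standard one's-complement sum over 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ipip_encapsulate(inner: bytes, dip: str, rip: str, ttl: int = 64) -> bytes:
    """Prepend an outer IPv4 header (src=DIP, dst=RIP, proto=4) to an inner IP packet."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,              # version 4, IHL 5 (20-byte header, no options)
        0,                 # DSCP/ECN
        20 + len(inner),   # total length of outer header plus inner packet
        0,                 # identification
        0,                 # flags / fragment offset
        ttl,
        IPPROTO_IPIP,      # payload is another IP packet
        0,                 # checksum placeholder, filled in below
        socket.inet_aton(dip),
        socket.inet_aton(rip),
    )
    checksum = ipv4_checksum(header)
    header = header[:10] + struct.pack("!H", checksum) + header[12:]
    return header + inner
```

The inner packet (CIP to VIP) is carried unchanged. This is why the RS still sees the real client address once its ipip module strips the outer 20 bytes.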

Tunnel mode combines the high performance of DR mode with cross-data-center support, which sounds perfect. However, carriers in China have their own peculiarities. For example, the source IP of the RS's response packets is the VIP, but the VIP and the backend servers may sit on different carriers' networks, so the responses may be blocked by carrier policies. I have never used Tunnel mode in production, and it may be difficult to deploy in China.

(2) Advantages, Disadvantages and Usage Scenarios

Advantages of Tunnel Mode

One-arm mode, so there is less load on LVS

Packets are modified only slightly and information is fully preserved

Can span data centers (though this is difficult to deploy in China)

Disadvantages of Tunnel Mode

The ipip module must be installed and configured on the backend servers

The VIP must be configured on tunl0 of each backend server

The added tunnel header may cause fragmentation, which hurts server performance

The IP addresses in the outer tunnel header are fixed, so NIC hashing on the backend servers may be uneven

Port mapping is not supported
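The fragmentation issue in the list above comes down to simple arithmetic. The outer header adds 20 bytes, so a full-size inner packet no longer fits the path MTU. The sketch below assumes a 1500-byte Ethernet MTU and an outer IPv4 header without options. The helper names are hypothetical.

```python
IPV4_HEADER = 20  # outer header added by IP-in-IP, without options

def needs_fragmentation(inner_packet_len: int, path_mtu: int = 1500) -> bool:
    """True if the encapsulated packet no longer fits in one frame on the path."""
    return inner_packet_len + IPV4_HEADER > path_mtu

def max_inner_packet(path_mtu: int = 1500) -> int:
    """Largest inner IP packet that survives encapsulation without fragmenting."""
    return path_mtu - IPV4_HEADER

print(needs_fragmentation(1500))  # True: 1520 bytes > 1500
print(max_inner_packet())         # 1480
```

In practice, this usually means keeping inner packets at or below 1480 bytes (for example, by lowering the MTU or the TCP MSS along the path) or accepting the fragmentation cost. Which option fits depends on the deployment.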

Usage scenarios of Tunnel mode

In theory, if forwarding performance requirements are high and cross-data-center deployment is needed, Tunnel mode may be the better choice.

This completes the explanation of how LVS works. The material is fairly extensive, so it is worth reading twice. Since this article is already long, hands-on practice will be covered in the next article.

------------ END ------------
