Linux bridge source code framework analysis
http://www.chinaunix.net Author: Dugu Jiujian Published: 2006-05-24
I wrote this post while reading the code, so parts of it are messy and some places contain errors. I will post the organized and corrected version on my personal homepage: http://www.skynet.org.cn/forumdisplay.php?fid=12&page =. Corrections are welcome.
Today I ran into trouble with the bridge's STP while working on a problem. I have read a lot of theory about STP but never really understood how it works underneath, so I decided to look at the Linux implementation. I have no reference material at hand, and after two hours of reading I have only worked out the framework of the bridge code. I am posting that first, in the hope that people who study this area will join the discussion and help finish it, so that I can learn from it:
Version: Linux 2.4.18
1. The call site is in the soft interrupt function static void net_rx_action(struct softirq_action *h) in src/net/core/dev.c, line 1479:

#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
    if (skb->dev->br_port != NULL &&
        br_handle_frame_hook != NULL) {
        handle_bridge(skb, pt_prev);
        dev_put(rx_dev);
        continue;
    }
#endif

If the bridge is compiled in or built as a module, the frame is processed by the handle_bridge function when both conditions hold:
skb->dev->br_port != NULL: the port that received the data packet is a member of a bridge port group.
br_handle_frame_hook != NULL: a bridge processing function has been defined.
2. Initialization, in src/net/bridge/br.c:

static int __init br_init(void)
{
    printk(KERN_INFO "NET4: Ethernet Bridge 008 for NET4.0\n");

    return 0;
}

The initialization function sets the bridge processing function to br_handle_frame and the ioctl processing function to br_ioctl_deviceless_stub.
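The relationship between br_init and the check in net_rx_action can be pictured with a small user-space model. This is only a sketch under stated assumptions: struct fake_port, struct fake_dev, fake_frame_hook, fake_handle_frame, bridge_init_model, and rx_dispatch are hypothetical names, not kernel code. They only imitate how a function pointer that starts out NULL is filled in at initialization and then tested on the receive path.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device and its br_port member. */
struct fake_port { int port_no; };
struct fake_dev  { struct fake_port *br_port; };

/* Models br_handle_frame_hook: NULL until the bridge code is initialized. */
static int (*fake_frame_hook)(struct fake_dev *dev) = NULL;

/* Models br_handle_frame: here it only reports which port saw the frame. */
static int fake_handle_frame(struct fake_dev *dev)
{
    return dev->br_port->port_no;
}

/* Models what the post says br_init does: install the processing function. */
static void bridge_init_model(void)
{
    fake_frame_hook = fake_handle_frame;
}

/* Models the test in net_rx_action. Returns the port number when the
 * bridge path takes the frame, or -1 when the frame is left to the
 * normal protocol handlers. */
static int rx_dispatch(struct fake_dev *dev)
{
    if (dev->br_port != NULL && fake_frame_hook != NULL)
        return fake_frame_hook(dev);
    return -1;
}
```

Before bridge_init_model() runs, rx_dispatch() returns -1 even for a device that has a br_port, which matches the second half of the condition quoted above.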
    /* Get the destination MAC address */
    dest = skb->mac.ethernet->h_dest;

    /* skb->dev->br_port identifies the bridge port that received the
       packet. It is NULL if the device is not a port of any bridge */
    p = skb->dev->br_port;
    if (p == NULL)    /* the device is not in a bridge port group */
        goto err_nolock;

    /* The bridge group to which this port belongs */
    br = p->br;

    /* Take the read lock: the CAM table is read during forwarding, so the
       lock keeps other kernel control paths (such as a system call running
       on another CPU of a multiprocessor) from modifying the table in the
       meantime */
    read_lock(&br->lock);
    if (skb->dev->br_port == NULL)    /* same check as above, repeated under the lock */
        goto err;

    /* br->dev is the virtual network card of the bridge. The frame is
       dropped if that device is not UP or if the receiving port is
       DISABLED. p->state is the state of the current port as determined
       by the STP calculation */
    if (!(br->dev.flags & IFF_UP) ||
        p->state == BR_STATE_DISABLED)
        goto err;
    /* The lowest bit of the first byte of the source MAC is the group bit.
       If it is set, the source MAC is a multicast or broadcast address,
       which is not valid as a source, so the frame is discarded */
    if (skb->mac.ethernet->h_source[0] & 1)
        goto err;

    /* As we all know, what makes a bridge a bridge, and smarter than a
       hub, is its MAC-PORT table: forwarding is decided by a table lookup
       instead of broadcasting every frame. Each time a packet is received,
       the bridge learns its source MAC and adds it to this table. In Linux
       this table is called the CAM table (the name comes from other
       materials). If the port state is LEARNING or FORWARDING, the source
       address skb->mac.ethernet->h_source is learned and added to the CAM
       table; if it is already in the table, its timer is refreshed.
       br_fdb_insert does all of this */
    if (p->state == BR_STATE_LEARNING ||
        p->state == BR_STATE_FORWARDING)
        br_fdb_insert(br, p, skb->mac.ethernet->h_source, 0);

    /* BPDU packets of the STP protocol use a multicast destination MAC
       starting from 01-80-c2-00-00-00 (Bridge_group_addr: bridge group
       multicast address). So if STP is enabled and the destination matches
       the first five bytes of
       unsigned char bridge_ula[6] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
       the frame is handled by the special-frame code below */
    if (br->stp_enabled &&
        /* Only the first 5 bytes are compared here. I have not studied
           whether STP uses a whole multicast range (compare
           01:00:5e:00:00:00 to 01:00:5e:7f:ff:ff) or only part of one.
           It seems that only a part is used here. I didn't go into it. */
        !memcmp(dest, bridge_ula, 5) &&
        !(dest[5] & 0xF0))    /* What address is 01-80-c2-00-00-F0? Why do we
                                 need this test? (The mask test itself only
                                 requires the high 4 bits of the last byte
                                 to be zero.) */
        goto handle_special_frame;
    /* Run the hook function, then continue processing in
       br_handle_frame_finish */
    if (p->state == BR_STATE_FORWARDING) {
        NF_HOOK(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
                br_handle_frame_finish);
        read_unlock(&br->lock);
        return;
    }

handle_special_frame:
    if (!dest[5]) {
        br_stp_handle_bpdu(skb);
        return;
    }

    kfree_skb(skb);
}
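The three address tests used in this function can be tried out in isolation. The following is a minimal sketch, not kernel code: mac_is_group, mac_is_special, and mac_is_bpdu_dest are hypothetical helper names, and only the bridge_ula value is taken from the post. Running it shows that the 0xF0 mask accepts last bytes 0x00 through 0x0F only. As far as I know, IEEE 802.1D reserves exactly that block of addresses for link-local protocols that a bridge must not forward, which would explain the test, but treat that as an assumption to check against the standard.

```c
#include <string.h>

/* Same value as the bridge_ula array quoted in the post. */
static const unsigned char bridge_ula[6] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };

/* Nonzero if the group bit (lowest bit of the first byte) is set,
 * i.e. the address is multicast or broadcast. */
static int mac_is_group(const unsigned char *mac)
{
    return mac[0] & 1;
}

/* Mirrors the condition in the post: first five bytes equal to
 * bridge_ula and the high four bits of the last byte clear. */
static int mac_is_special(const unsigned char *dest)
{
    return !memcmp(dest, bridge_ula, 5) && !(dest[5] & 0xF0);
}

/* Mirrors the !dest[5] test under handle_special_frame: only the
 * address ending in 00 is passed to the BPDU handler. */
static int mac_is_bpdu_dest(const unsigned char *dest)
{
    return mac_is_special(dest) && dest[5] == 0;
}
```

With these helpers, 01-80-c2-00-00-0f counts as special but is not handed to the BPDU handler, while 01-80-c2-00-00-f0 is not special at all.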
4. br_handle_frame_finish
static int br_handle_frame_finish(struct sk_buff *skb)
{
    struct net_bridge *br;
    unsigned char *dest;
    struct net_bridge_fdb_entry *dst;
    struct net_bridge_port *p;
    int passedup;

    /* Basically the same as before */
    dest = skb->mac.ethernet->h_dest;

    p = skb->dev->br_port;
    if (p == NULL)
        goto err_nolock;

    br = p->br;
    read_lock(&br->lock);
    if (skb->dev->br_port == NULL)
        goto err;

    passedup = 0;

    /* If the virtual network card of the bridge is in promiscuous mode,
       every received data packet must be cloned and delivered to the
       AF_PACKET protocol handler (see the processing of the ptype_all
       chain in the network soft interrupt function net_rx_action) */
    if (br->dev.flags & IFF_PROMISC) {
        struct sk_buff *skb2;

    /* If the destination MAC is broadcast or multicast, the data packet
       must also be passed to the upper layer protocol stack of the local
       machine. The flag variable passedup records whether that has already
       been done; if it has, it is not done again */
    if (dest[0] & 1) {
        br_flood_forward(br, skb, !passedup);
        if (!passedup)
            br_pass_frame_up(br, skb);
        goto out;
    }
    /* The MAC-PORT table in Linux is the CAM table. Here the table is
       looked up by destination address to decide which interface forwards
       the packet. Each table entry is described by the structure
       struct net_bridge_fdb_entry:

       struct net_bridge_fdb_entry {
           struct net_bridge_fdb_entry *next_hash;    // next entry in the hash chain of the CAM table
           struct net_bridge_fdb_entry **pprev_hash;  // why pprev instead of prev? I haven't studied it carefully yet
           atomic_t use_count;                        // current reference counter of this entry
           mac_addr addr;                             // MAC address
           struct net_bridge_port *dst;               // bridge port corresponding to this entry
           unsigned long ageing_timer;                // handles MAC ageing (timeout)
           unsigned is_local:1;                       // is it a MAC address of this machine?
           unsigned is_static:1;                      // is it a static MAC address?
       }; */
    dst = br_fdb_get(br, dest);

    /* After the CAM table lookup, if an entry is found and the destination
       MAC belongs to the virtual network card of this machine, the packet
       must be passed to the upper layer protocol. This is what allows the
       bridge to be managed remotely through the address of its virtual
       network card */
    if (dst != NULL && dst->is_local) {
        if (!passedup)
            br_pass_frame_up(br, skb);
        else
            kfree_skb(skb);
        br_fdb_put(dst);
        goto out;
    }

    /* An entry was found and it is not the local virtual network card:
       forward the packet to that port */
    if (dst != NULL) {
        br_forward(dst->dst, skb);
        br_fdb_put(dst);
        goto out;
    }

    /* If no entry is found in the table, fall back to behaving like a hub
       and flood the packet... */
    br_flood_forward(br, skb, 0);
That is the basic framework, and it matches what the books on bridge principles say... What makes a bridge a bridge comes down mainly to two functions, br_fdb_insert and br_fdb_get: one learns, the other looks up the table. In addition, supporting STP and processing BPDUs requires the function br_stp_handle_bpdu. Does anyone have a detailed analysis of these three functions? If so, please send me a copy, so that I don't have to spend the afternoon struggling through the code...
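The order of the decisions in br_handle_frame_finish can be condensed into one small function. This is a simplified sketch, not the kernel code: enum fwd_action, struct cam_entry, and fwd_decide are hypothetical names, the entry is reduced to the two fields the decision needs, and the promiscuous-mode branch and the passedup flag are deliberately left out.

```c
#include <stddef.h>

enum fwd_action {
    FWD_FLOOD_AND_LOCAL,  /* group address: flood to the ports and pass up */
    FWD_LOCAL,            /* entry marked local: pass up only */
    FWD_PORT,             /* known unicast: forward to one port */
    FWD_FLOOD             /* unknown unicast: flood like a hub */
};

/* Hypothetical reduced CAM entry: only what the decision needs. */
struct cam_entry {
    int port_no;
    int is_local;
};

/* dst is the result of the table lookup for dest, or NULL if not found.
 * The tests run in the same order as in the listing above. */
static enum fwd_action fwd_decide(const unsigned char *dest,
                                  const struct cam_entry *dst)
{
    if (dest[0] & 1)
        return FWD_FLOOD_AND_LOCAL;
    if (dst != NULL && dst->is_local)
        return FWD_LOCAL;
    if (dst != NULL)
        return FWD_PORT;
    return FWD_FLOOD;
}
```

Note that the group-address test comes first, so a broadcast frame is flooded regardless of what the table lookup would return.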
I took a look at br_fdb_insert, and its structure is quite clear. If the address already exists in the hash table, the entry is updated (__fdb_possibly_replace). If it is new, it is inserted, which is really just doubly linked list maintenance (__hash_link):

void br_fdb_insert(struct net_bridge *br,
                   struct net_bridge_port *source,
                   unsigned char *addr,
                   int is_local)
{
    struct net_bridge_fdb_entry *fdb;
    int hash;

Similarly, the table lookup is a matter of traversing the linked list and matching the address:

struct net_bridge_fdb_entry *br_fdb_get(struct net_bridge *br, unsigned char *addr)
{
    struct net_bridge_fdb_entry *fdb;
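The linking and lookup described above, and the pprev_hash pointer in particular, can be illustrated with a stand-alone sketch. Everything here is an assumption made for illustration: struct fdb_entry is a cut-down entry, HASH_SIZE and the XOR hash in mac_hash are invented, and hash_link, hash_unlink, and fdb_lookup are hypothetical names rather than the kernel functions. The point it tries to show is that pprev_hash stores the address of whichever pointer refers to the entry (the bucket head or the previous entry's next_hash), so unlinking needs no special case for the first entry in a chain.

```c
#include <stddef.h>
#include <string.h>

#define HASH_SIZE 256   /* assumed table size for this sketch */

struct fdb_entry {
    struct fdb_entry *next_hash;    /* next entry in the bucket chain */
    struct fdb_entry **pprev_hash;  /* address of the pointer that points at us */
    unsigned char addr[6];
    int port_no;
};

static struct fdb_entry *hash_table[HASH_SIZE];

/* Assumed hash: XOR of the six address bytes (not the kernel's function). */
static unsigned mac_hash(const unsigned char *mac)
{
    unsigned h = 0;
    int i;

    for (i = 0; i < 6; i++)
        h ^= mac[i];
    return h % HASH_SIZE;
}

/* Insert at the head of the bucket chain. */
static void hash_link(struct fdb_entry *e)
{
    struct fdb_entry **head = &hash_table[mac_hash(e->addr)];

    e->next_hash = *head;
    if (e->next_hash != NULL)
        e->next_hash->pprev_hash = &e->next_hash;
    *head = e;
    e->pprev_hash = head;
}

/* Unlink without checking whether the entry is first in its chain:
 * pprev_hash already points at whichever pointer refers to it. */
static void hash_unlink(struct fdb_entry *e)
{
    *e->pprev_hash = e->next_hash;
    if (e->next_hash != NULL)
        e->next_hash->pprev_hash = e->pprev_hash;
    e->next_hash = NULL;
    e->pprev_hash = NULL;
}

/* Walk the bucket chain and match the address. */
static struct fdb_entry *fdb_lookup(const unsigned char *mac)
{
    struct fdb_entry *e = hash_table[mac_hash(mac)];

    while (e != NULL) {
        if (memcmp(e->addr, mac, 6) == 0)
            return e;
        e = e->next_hash;
    }
    return NULL;
}
```

With a plain prev pointer, hash_unlink would have to test for prev == NULL and update the bucket head separately; the pointer-to-pointer form folds both cases into one assignment.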
Keep going, give it a thumbs up. After reading it, everyone must give it a thumbs up, so that the author can give you more exciting content. Otherwise, who is willing to share knowledge with you? Even with such a small request, you are not willing to do it???
Quote: Originally posted by snow_insky on 2006-1-12 13:00 Go ahead, give it a thumbs up. After reading it, everyone must give it a thumbs up, so that the author can give you more exciting content. Otherwise, who is willing to share knowledge with you? Even with such a small request, you are not willing to do it???
I don't have any more exciting content. I am only reading one implementation in order to deal with the problem I ran into. I posted this in the hope that the experts who study this field will write better articles, so that the rest of us can learn a thing or two...
I looked at another function, the STP processing function, and will continue posting:

/* called under bridge lock */
void br_stp_handle_bpdu(struct sk_buff *skb)
{
    unsigned char *buf;
    struct net_bridge_port *p;

    /* BPDU packets fall into two categories, distinguished by the TYPE
       field: configuration and TCN (Topology Change Notification) */

    /* If it is the configuration type */
    if (buf[6] == BPDU_TYPE_CONFIG) {
        /* In the kernel, struct br_config_bpdu describes a configuration BPDU:

           struct br_config_bpdu {
               unsigned topology_change:1;      // topology change flag
               unsigned topology_change_ack:1;  // topology change acknowledgment flag
               bridge_id root;                  // root ID. In a converged bridged network this field has the
                                                // same value in all configuration BPDUs (same VLAN). It consists
                                                // of two BID subfields: bridge priority and bridge MAC address
               int root_path_cost;              // path cost: the accumulated cost of all links leading to the
                                                // root bridge
               bridge_id bridge_id;             // BID of the bridge that created the current BPDU. It is the same
                                                // for all BPDUs sent by a single switch (single VLAN) and differs
                                                // between BPDUs sent by different switches
               port_id port_id;                 // port ID, unique for each port. Port 1/1 has the value 0x8001,
                                                // port 1/2 has the value 0x8002
               int message_age;                 // time elapsed since the root bridge generated the information
                                                // carried in the current BPDU
               int max_age;                     // maximum time BPDU information is kept; it also reflects the
                                                // bridge table lifetime during Topology Change Notification
               int hello_time;                  // interval between periodic configuration BPDUs
               int forward_delay;               // time spent in the Listening and Learning states; it also
                                                // reflects the time used during Topology Change Notification
           };

           Three fields of the BPDU packet are not included in this structure:
           Protocol ID: protocol field, always 0.
           Version: version field, always 0.
           Type: selects between the two BPDU formats contained in the frame
           (configuration BPDU or TCN BPDU).

           Above, the type is read directly as buf[6]. This is because the
           BPDU is preceded by a three-byte LLC header, followed by the
           Protocol ID (2 bytes) and the Version (1 byte): 3 + 2 + 1, hence
           buf[6]. This is the standard 802.3 encapsulation, which differs
           slightly from Ethernet encapsulation.
           See the top picture on the second page of Chapter 2 of "TCP/IP Illustrated, Volume 1" (if I remember correctly) */
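The offset arithmetic behind buf[6] can be checked with a small parser. This is a sketch under stated assumptions, not the kernel function: bpdu_type is a hypothetical helper, the LLC bytes 0x42 0x42 0x03 and the type values 0x00 (configuration) and 0x80 (TCN) are what I understand the 802.1D encoding to be, and the buffer is assumed to start at the LLC header as in the comment above.

```c
#include <stddef.h>

#define BPDU_TYPE_CONFIG 0x00   /* assumed value, configuration BPDU */
#define BPDU_TYPE_TCN    0x80   /* assumed value, topology change notification */

/* Returns the BPDU type byte, or -1 if the buffer is too short or does
 * not look like an STP BPDU. Layout assumed: 3 bytes LLC, 2 bytes
 * protocol ID, 1 byte version, then the type at offset 6. */
static int bpdu_type(const unsigned char *buf, size_t len)
{
    if (len < 7)
        return -1;
    if (buf[0] != 0x42 || buf[1] != 0x42 || buf[2] != 0x03)
        return -1;              /* LLC header: DSAP, SSAP, control */
    if (buf[3] != 0 || buf[4] != 0)
        return -1;              /* protocol ID, always 0 */
    if (buf[5] != 0)
        return -1;              /* version, always 0 according to the post */
    return buf[6];
}
```

A caller would compare the result with BPDU_TYPE_CONFIG or BPDU_TYPE_TCN and treat -1 as "not a BPDU this code understands".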
to Dugu Jiujian: I have a question. Is there any simple way to obtain the corresponding table of arp and ip addresses and the routing table of the current system?
Quote: Original post by guotie on 2006-1-12 13:59 to Dugu Jiujian: I have a question. Is there any simple way to get the current system's arp and ip address correspondence table and routing table?
The MAC address table of the bridge that I analyzed above and the ARP table are two completely different concepts.
"Get the current system's arp and ip address correspondence table and routing table" can be done by reading /proc; see the source code of net-tools or busybox... But the simplest way is still system(......)
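For the "reading /proc" route, here is a minimal sketch in C. It assumes the usual column layout of /proc/net/arp on Linux (IP address, HW type, Flags, HW address, Mask, Device); struct arp_row, parse_arp_line, and dump_arp_table are hypothetical names made up for this example. The routing table can be read in the same way from /proc/net/route, whose columns are different and are not parsed here.

```c
#include <stdio.h>

/* One parsed row of /proc/net/arp (assumed column layout). */
struct arp_row {
    char ip[16];
    unsigned hw_type;
    unsigned flags;
    char mac[18];
    char dev[16];
};

/* Parses one data line; returns 1 on success, 0 otherwise.
 * The Mask column is read and ignored. */
static int parse_arp_line(const char *line, struct arp_row *row)
{
    return sscanf(line, "%15s 0x%x 0x%x %17s %*s %15s",
                  row->ip, &row->hw_type, &row->flags,
                  row->mac, row->dev) == 5;
}

/* Prints every ARP entry; returns the number of rows printed,
 * or -1 if the file cannot be opened. */
static int dump_arp_table(void)
{
    char line[256];
    struct arp_row row;
    int n = 0;
    FILE *f = fopen("/proc/net/arp", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_arp_line(line, &row)) {   /* the header line does not parse */
            printf("%s -> %s on %s\n", row.ip, row.mac, row.dev);
            n++;
        }
    }
    fclose(f);
    return n;
}
```

Calling dump_arp_table() from main is enough to list the table; the header line is skipped automatically because it does not match the format.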
The details of the STP protocol will not be posted here, since the standard (IEEE 802.1D) is already available. Let's continue to analyze the config BPDU:
Let's first briefly go over how STP operates. STP needs to determine the root bridge, the root ports, and the designated ports, so it needs a rule for deciding between candidates. The decision criteria are, in order:
1. The smallest root BID (the switch with the smallest BID among all switches becomes the root bridge)
2. The smallest path cost to the root bridge (determines the root port)
3. The smallest sender BID (determines the designated port)
4. The smallest port ID (if all other criteria are equal, the decision is made on the port ID, and the smaller one wins)
Therefore, every time the bridge receives a BPDU packet, it needs to compare these values in the packet with the values it has saved. The corresponding function is: br_supersedes_port_info
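The "smaller BID wins" rule can be sketched as follows. This assumes the 8-byte layout that the memcmp(..., 8) calls in the kernel listings suggest: two priority bytes, most significant first, followed by the six bytes of the bridge MAC. make_bridge_id and bridge_id_cmp are hypothetical helpers written for this example.

```c
#include <string.h>

/* Assumed layout: 2 bytes of priority followed by the bridge MAC. */
struct bridge_id {
    unsigned char prio[2];  /* priority, most significant byte first */
    unsigned char addr[6];  /* bridge MAC address */
};

/* Hypothetical constructor for the example. */
static struct bridge_id make_bridge_id(unsigned priority, const unsigned char *mac)
{
    struct bridge_id id;

    id.prio[0] = (priority >> 8) & 0xFF;
    id.prio[1] = priority & 0xFF;
    memcpy(id.addr, mac, 6);
    return id;
}

/* Negative if a is better (smaller) than b, 0 if equal, positive if worse.
 * Because the priority bytes come first and are stored most significant
 * byte first, a bytewise comparison orders by priority, then by MAC. */
static int bridge_id_cmp(const struct bridge_id *a, const struct bridge_id *b)
{
    return memcmp(a, b, sizeof(struct bridge_id));
}
```

This is why a bridge with a lower configured priority value becomes root even if its MAC address is numerically larger.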
After these values are determined, the root bridge, root ports, and designated ports are elected based on them. The process is:
1. Select the root bridge. The election scope is the entire network. The switches exchange BPDUs with each other, and the switch with the smaller BID (smaller priority first, then smaller bridge MAC) wins.
2. Select the root port. The election scope is the ports of each non-root bridge that connect to other switches (ports on the same switch). The port with the smaller path cost wins. Each non-root bridge has one root port, which can send and receive data.
3. Select the designated port. The election scope is the ports attached to each network segment (ports on different switches). The smaller path cost wins as well; if the costs are equal, the BIDs are compared. Each network segment has one designated port, which can send and receive data.
4. A port that has not been given any role by the steps above is called a non-designated port. It is put into the blocking state: it can receive data but does not forward it.
The first three steps are the election process; the corresponding function is br_configuration_update. The fourth step sets the state of each port based on the election result; the corresponding function is br_port_state_selection. A switch port with STP enabled can be in one of 5 states:
1. Blocking: receives data but does not forward it.
2. Listening: does not forward data, but can send and receive BPDUs and takes part in electing the root bridge, root port, designated port, and so on.
3. Learning: does not forward data, but starts learning MAC addresses in preparation for forwarding.
4. Forwarding: forwards data.
5. Disabled: neither participates in the STP calculation nor forwards data.
Before the election, the relevant values carried in the received BPDU must be used to update the values saved for the port. The corresponding function is: br_record_config_information
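What each state allows can be written down as a few predicates. This is a sketch under stated assumptions: enum port_state and the three functions are hypothetical names, and the enum values are arbitrary rather than the kernel's BR_STATE_* constants. The learning rule follows the br_handle_frame listing earlier in the thread (LEARNING or FORWARDING); the other two follow the state descriptions above.

```c
/* Hypothetical state names for this sketch. */
enum port_state {
    STATE_DISABLED,
    STATE_BLOCKING,
    STATE_LISTENING,
    STATE_LEARNING,
    STATE_FORWARDING
};

/* Source MACs are learned only in the Learning and Forwarding states. */
static int state_learns_mac(enum port_state s)
{
    return s == STATE_LEARNING || s == STATE_FORWARDING;
}

/* Data frames are forwarded only in the Forwarding state. */
static int state_forwards_data(enum port_state s)
{
    return s == STATE_FORWARDING;
}

/* Every state except Disabled still takes part in STP. */
static int state_participates_in_stp(enum port_state s)
{
    return s != STATE_DISABLED;
}
```

Read together, the predicates show the progression Blocking, Listening, Learning, Forwarding as a gradual widening of what the port may do.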
        if (!br_is_root_bridge(br) && was_root) {
            br_timer_clear(&br->hello_timer);
            if (br->topology_change_detected) {
                br_timer_clear(&br->topology_change_timer);
                br_transmit_tcn(br);
                br_timer_set(&br->tcn_timer, jiffies);
            }
        }

        /* I don't quite understand the purpose of this check. Advice is welcome... */
        if (p->port_no == br->root_port) {
            br_record_config_timeout_values(br, bpdu);
            br_config_bpdu_generation(br);
            if (bpdu->topology_change_ack)
                br_topology_change_acknowledged(br);
        }
    }
    /* If the current port is a designated port, generate a BPDU from the
       current configuration information and send it out */
    else if (br_is_designated_port(p)) {
        br_reply(p);
    }
    read_unlock(&br->lock);
}

The br_is_designated_port function checks whether this bridge is the designated bridge recorded for the port, and whether the port itself is the designated port:

/* called under bridge lock */
int br_is_designated_port(struct net_bridge_port *p)
{
    return !memcmp(&p->designated_bridge, &p->br->bridge_id, 8) &&
        (p->designated_port == p->port_id);
}

br_reply extracts the previously recorded information, then assembles and sends a packet.

br_supersedes_port_info compares the values in the received BPDU with the values saved for the port:

/* called under bridge lock */
static int br_supersedes_port_info(struct net_bridge_port *p, struct br_config_bpdu *bpdu)
{
    int t;

    t = memcmp(&bpdu->root, &p->designated_root, 8);
    if (t < 0)
        return 1;
    else if (t > 0)
        return 0;

    if (bpdu->root_path_cost < p->designated_cost)
        return 1;
    else if (bpdu->root_path_cost > p->designated_cost)
        return 0;

    t = memcmp(&bpdu->bridge_id, &p->designated_bridge, 8);
    if (t < 0)
        return 1;
    else if (t > 0)
        return 0;

    if (memcmp(&bpdu->bridge_id, &p->br->bridge_id, 8))
        return 1;

    if (bpdu->port_id <= p->designated_port)
        return 1;

    return 0;
}

Before updating, the corresponding values in the packet are copied:

/* called under bridge lock */
static void br_record_config_information(struct net_bridge_port *p, struct br_config_bpdu *bpdu)
{
    p->designated_root = bpdu->root;
    p->designated_cost = bpdu->root_path_cost;
    p->designated_bridge = bpdu->bridge_id;
    p->designated_port = bpdu->port_id;
    br_timer_set(&p->message_age_timer, jiffies - bpdu->message_age);
}

Then the STP election is carried out. The protocol meaning of each step has been described above:

/* called under bridge lock */
void br_configuration_update(struct net_bridge *br)
{
    br_root_selection(br);
    br_designated_port_selection(br);
}

Then the port states are set:

/* called under bridge lock */
void br_port_state_selection(struct net_bridge *br)
{
    struct net_bridge_port *p;

    p = br->port_list;
    while (p != NULL) {
        if (p->state != BR_STATE_DISABLED) {
            if (p->port_no == br->root_port) {
                p->config_pending = 0;
                p->topology_change_ack = 0;
                br_make_forwarding(p);
            } else if (br_is_designated_port(p)) {
                br_timer_clear(&p->message_age_timer);
                br_make_forwarding(p);
            } else {
                p->config_pending = 0;
                p->topology_change_ack = 0;
                br_make_blocking(p);
            }
        }

        p = p->next;
    }
}
If this bridge was the root bridge before but no longer is, the topology has changed, and it needs to send a TCN type BPDU packet to announce the change (the opposite case, where it was not the root before but has become the root now, is handled similarly inside the earlier br_configuration_update call):

    if (!br_is_root_bridge(br) && was_root) {
        br_timer_clear(&br->hello_timer);
        if (br->topology_change_detected) {
            br_timer_clear(&br->topology_change_timer);
            br_transmit_tcn(br);
            br_timer_set(&br->tcn_timer, jiffies);
        }
    }

The check that follows it is not very clear to me. I hope someone can explain it...
Jiujian, I have also been looking at this recently. My main work is on bridges under Linux. I previously implemented a simple bridge without spanning tree and without an operating system. But now I have a few stupid questions that I hope you can answer:
/*Get the destination MAC address*/ dest = skb->mac.ethernet->h_dest;
I think skb should be the received frame, but is it in a buffer? This statement should get the destination MAC address of the frame. Then why does the spanning tree code contain the statement
if (br->stp_enabled && !memcmp(dest, bridge_ula, 5) && !(dest[5] & 0xF0)) /*What address is 01-80-c2-00-00-F0? Why do we need to judge? */ goto handle_special_frame;
Must the high 4 bits of the last byte of the destination MAC address be 0? Why is there such a requirement? What is the address 01-80-c2-00-00-F0? This is also the part you did not find the answer to. Is this used to determine whether the frame is a BPDU? I don't quite understand.
Also, how does the system call the bridge processing function br_handle_frame(struct sk_buff *skb)?
Quote: Original post by Pagliuca at 2006-1-12 15:28 Jiujian, I have also been looking at this recently. My main work is the bridge under Linux. I have implemented a simple bridge without spanning tree and operating system before. But now I have a few stupid questions, hope to answer them:
/*Get the destination MAC address*/dest = skb->mac.ethernet- ...
Coincidentally, I was also stumped by STP while dealing with an engineering problem, and only started looking at the implementation of this protocol this morning. I am a beginner too, not an expert. Let's discuss it together in the future:
1. On the first question: the frame, after layer-2 decapsulation, is stored in skb...
2. "The high 4 bits of the destination MAC address should be 0? Why is there such a requirement?" I don't understand what you mean here, but in the STP protocol the destination MAC is the multicast address 01-80-c2-00-00-00 (Bridge_group_addr: bridge group multicast address).
3. As for the check !(dest[5] & 0xF0), I looked through the RFC documents and googled it, but could not find why this check is made. I am still looking for information...
4. As for how the function is called, I covered that at the beginning...
Do you mean that memcmp(dest, bridge_ula, 5) determines whether the destination address of the frame is one of the multicast destination MAC addresses starting at 01-80-c2-00-00-00? Also, do you think the 802.1d standard will have an explanation for !(dest[5] & 0xF0)?
Quote: Originally posted by Pagliuca at 2006-1-12 15:52 Do you mean that memcmp(dest, bridge_ula, 5) is to determine whether the destination address of the frame is one of the multicast destination MAC addresses: 01-80-c2-00-00-00? And !(dest[5] & 0xF0)), do you think there will be an explanation in the 802.1d protocol?
Did you misread the function? Look at the definition of bridge_ula: unsigned char bridge_ula[6] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 }; The memcmp over the first 5 bytes checks whether the current destination MAC lies in "01-80-c2-00-00-00 to 01-80-c2-00-00-FF", and the mask test then requires the high 4 bits of the last byte to be 0, which narrows the match to 01-80-c2-00-00-00 through 01-80-c2-00-00-0F. (In fact, the multicast address segment is from 01-80-c2-00-00-00 to 01-80-c2-7F-FF-FF, which seems to indicate that STP only uses a part of it.) Together the two tests determine whether the frame is an STP protocol packet.
I read the RFC document and did not see 0xF0. I am looking for the latest one.