PCIe basic concepts and device tree


Preface

The full name of PCIe is Peripheral Component Interconnect Express, which is a bus used to connect peripherals.

It was proposed in 2003 as a replacement for PCI and PCI-X, and has since become the standard, or cornerstone, for how a modern CPU talks to almost every other peripheral: GPUs, network cards, sound cards, and USB controllers are all connected through the PCIe bus, and the now very common M.2 SSDs also sit on PCIe, using the NVMe protocol. In addition, Thunderbolt 3 [2], USB4 [3], and even the latest CXL interconnect protocol [4] are all based on PCIe!

CXL (Compute Express Link) is an industry-supported cache-coherent interconnect protocol for communication between processors, memory extensions, and accelerators. CXL technology maintains coherence between CPU memory space and memory on attached devices, which allows resource sharing for higher performance, reduces software stack complexity, and reduces overall system cost.

So once you start moving towards device-related development, PCIe is an unavoidable hurdle. I have read through some PCIe material over the past few days, and here is a brief summary that I hope will be helpful. This article focuses mainly on the hardware side and is largely independent of the operating system; whether it is Windows or Linux, the underlying parts are very similar. It also touches on debugging methods, mainly on Linux.

Then let’s get started!

1. PCIe overall block diagram

First, let’s start with the basic concepts of PCIe. The PCIe architecture mainly consists of five parts:

  • Root Complex,
  • PCIe Bus,
  • Endpoint,
  • Port and Bridge,
  • Switch.

Its overall architecture presents a tree structure, as shown in the figure below:

2. Root Complex (RC)

Root Complex is the root node of the entire PCIe device tree. The CPU is connected to the PCIe bus through it, and is ultimately connected to all PCIe devices.

Historically, the two most important (and most expensive) chips on a motherboard were the Northbridge and the Southbridge. The Northbridge interfaces with the processor and contains the fast modules: the memory controller, the PCI-E controller, the integrated graphics, the front-side bus interface, and so on. The Southbridge handles the slower peripheral functions: the disk controller, network ports, expansion card slots, the audio module, I/O interfaces, etc.

Newer Intel and AMD processors have integrated most of the traditional Northbridge functions into the CPU itself. Intel's Clarkdale and Sandy Bridge processors fully integrate the Northbridge, and the chipsets paired with them (P55/H55/P67/H67, etc.) are effectively just a Southbridge.

Schematic diagram of Clarkdale's Northbridge (GPU) and CPU parts

The CPU part and the GPU part are independent dies, connected internally through the QPI bus but packaged together, with the same LGA1156 interface as Lynnfield. Overall, Clarkdale integrates not only the memory controller and PCI-E controller but also the graphics core, which looks like a step forward.

In fact, Clarkdale simply moved the Northbridge chip that used to sit on the motherboard under the CPU's heat spreader; essentially nothing (including the graphics and memory controller) was truly integrated into the CPU die. But the H55 chipset that goes with it really does have only a Southbridge left. The Northbridge generates far more heat than the Southbridge, and since it now sits in the CPU package, its cooling is handled by the CPU cooler and users no longer have to worry about motherboard heat dissipation.

The Core i7-8xx and Core i5-7xx processors, which were hot sellers at the time, are based on the Lynnfield core and truly integrate the Northbridge.

To summarize with a picture:

  • FSB bus: Front Side Bus, the link between the CPU and the Northbridge. All data exchanged between the CPU and the Northbridge must pass through the FSB, so the FSB frequency directly affects how fast the CPU can access memory.
  • Northbridge: Northbridge is the only bridge for data exchange between the CPU and memory, graphics card and other components. This means that the CPU must pass through the Northbridge if it wants to communicate with any other part. Northbridge chips are usually integrated with memory controllers, etc., to control communication with the memory. The Northbridge is no longer visible on current motherboards, and its functions have been integrated into the CPU.
  • PCI bus: The PCI bus is a high-performance local bus that is not restricted by the CPU and forms a high-speed channel between the CPU and peripherals. For example, graphics cards of that era generally used PCI slots; the PCI bus was fast enough to let the graphics card and CPU exchange data effectively.
  • Southbridge: Mainly responsible for communication between I/O devices. If the CPU wants to access peripherals, it must go through the Southbridge chip.

Back to topic

Since the Root Complex manages external I/O devices, on early platforms it actually lived in the Northbridge (MCH) [5]. With the development of technology, it has now been integrated into the CPU [8]. (Note the System Agent part in the picture below, which is where the PCIe Root Complex is located.)

In addition, although it is the root node, a system can have more than one Root Complex. As the number of PCIe lanes grows, the number of PCIe controllers and Root Complexes grows with it. For example, my desktop CPU is an i9-10980XE, which has 4 Root Complexes, while my laptop's i7-9750H has only one. On Windows we can check this through Device Manager:

It's similar on Linux. The picture below is a block diagram taken from my server's motherboard manual; the CPU is an EPYC 7742. You can clearly see PEG P0-3, which correspond to 4 PCIe Controllers and Root Complexes: [6]

And we can view all Root Complexes through the lspci command:

$ lspci -t -v
-+-[0000:c0]-+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
+-[0000:80]-+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
+-[0000:40]-+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
\-[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex

3. PCIe bus (Bus)

Devices on PCIe are connected to each other via the PCIe bus. Although PCIe evolved from PCI and is even compatible in many places, it has two particularly important differences from the old PCI and PCI-X:

The PCIe "bus" is not a bus in the traditional sense of shared wires; it is a point-to-point network. If we compare PCI to a hub in a network, then PCIe corresponds to a switch. In other words, when the Root Complex or devices on PCIe need to communicate with each other, they either connect to each other directly or perform point-to-point signal transfers through switching circuitry. [7]

PCI Bus
PCIe Bus

The old PCI bus used single-ended parallel signaling, but excessive interference made it impossible to keep raising the clock frequency, so PCIe switched to high-speed serial signaling. This also makes PCI devices and PCIe devices incompatible; they can only be connected through a PCI-to-PCIe bridge. Of course, we hardly need to care about this anymore, because PCI devices are rarely seen nowadays.

That is all for now on PCIe communication and packet routing/switching; we will go into more depth later.

4. PCIe Device

Devices connected to PCIe fall into two types:

  • Type 0: an end device on the PCIe bus, such as the graphics cards, sound cards, and network cards we see every day.
  • Type 1: a PCIe Switch or Root Port. Unlike end devices, its main job is to connect other PCIe devices; a PCIe Switch is similar to a switch in a network.
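
The Type 0 / Type 1 distinction is recorded in each function's configuration header. As a quick check (a sketch assuming the setpci tool from pciutils and the two BDFs used elsewhere in this article; adjust them for your machine), the Header Type register at offset 0x0E reports which header layout the function uses:

# HEADER_TYPE (config offset 0x0E): bits 6..0 = 0 for a Type 0 (endpoint) header,
# 1 for a Type 1 (bridge/port) header; bit 7 marks a multi-function device.
$ sudo setpci -s 00:14.0 HEADER_TYPE    # an endpoint function, expect a Type 0 header
$ sudo setpci -s 80:01.1 HEADER_TYPE    # a root port, expect a Type 1 header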

4.1. BDF(Bus Number, Device Number, Function Number)

Every device on PCIe, whether Type 0 or Type 1, is assigned a unique address when the system boots. The address consists of three parts:

  • Bus Number: 8 bits, i.e. at most 256 buses
  • Device Number: 5 bits, i.e. at most 32 devices
  • Function Number: 3 bits, i.e. at most 8 functions

This is the BDF we often talk about. It is similar to an IP address in a network and is usually written in the format BB:DD.F. On Linux we can check each device's BDF with the lspci command; for example, the FCH SMBus Controller below is 00:14.0:

$ lspci -t -v
# [Domain:Bus]
\-[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
# Device.Function
+-14.0 Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
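
These BDF addresses are also what Linux uses to name devices in sysfs: every function shows up under /sys/bus/pci/devices/ as Domain:Bus:Device.Function (a quick sketch; the exact entries depend on your machine):

# Each entry is named <domain>:<bus>:<device>.<function>, e.g. 0000:00:14.0
$ ls /sys/bus/pci/devices/
# Each entry holds the function's attributes and its config space:
$ ls /sys/bus/pci/devices/0000:00:14.0/   # config, vendor, device, class, resource, ...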

Once we know a device's BDF, we can use it to look up that device's details, as follows:

$ lspci -s 00:14.0 -vv 
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
Subsystem: Super Micro Computer Inc H12SSL-i
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
IOMMU group: 39
Kernel driver in use: piix4_smbus
Kernel modules: i2c_piix4, sp5100_tco
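
All of these details come from the device's configuration space. If you want to look at the raw bytes behind lspci's decoding, it can dump them directly, and sysfs exposes the same data (a sketch, using the same 00:14.0 as above):

# Dump the first 64 bytes (the standard header) of 00:14.0's config space;
# the first two 16-bit words are the Vendor ID and Device ID.
$ sudo lspci -s 00:14.0 -x
# The same bytes are available through sysfs:
$ hexdump -C /sys/bus/pci/devices/0000:00:14.0/config | head -4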

In addition, since the default BDF encoding supports at most 8 Functions, which may not be enough, PCIe also supports an alternative interpretation called ARI (Alternative Routing-ID Interpretation). It merges the Device Number and Function Number into a single 8-bit field that only represents the Function, so up to 256 Functions can be supported. ARI is optional and has to be enabled through device configuration [1].
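
To make the split concrete, here is a small sketch of how the same 16-bit Routing ID is interpreted with and without ARI (pure shell arithmetic on made-up numbers, not tied to any real device):

# Routing ID layout (16 bits): [ Bus(8) | Device(5) | Function(3) ]  traditional
#                              [ Bus(8) |      Function(8)        ]  ARI
$ rid=$(( (0x81 << 8) | (0x02 << 3) | 0x5 ))    # bus 0x81, device 2, function 5
$ printf 'traditional: bus %02x, device %02x, function %x\n' \
    $((rid >> 8)) $(((rid >> 3) & 0x1f)) $((rid & 0x7))
traditional: bus 81, device 02, function 5
$ printf 'ARI:         bus %02x, function %d (device number is implicitly 0)\n' \
    $((rid >> 8)) $((rid & 0xff))
ARI:         bus 81, function 21 (device number is implicitly 0)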

4.2. Type 0 Devices and Endpoints

All Type 0 devices (end devices) connected to the PCIe bus can implement PCIe Endpoints, which initiate or receive PCIe requests and messages. **Each device can implement one or more Endpoints, and each Endpoint corresponds to a specific function.** For example:

  • A dual-port network card can implement a separate Endpoint for each port;
  • A graphics card can implement 4 Endpoints: one for the GPU itself, one Audio Endpoint, one USB Endpoint, and one UCSI Endpoint. We can see all of these with lspci, or with Device Manager on Windows:
$ lspci -t -v
-+-[0000:c0]-+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
# A NIC card with 2 ports:
| +-01.1-[c1]--+-00.0 Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
| | \-00.1 Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
+-[0000:80]-+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
# A graphic card with 4 endpoints:
| +-01.1-[81]--+-00.0 NVIDIA Corporation TU104 [GeForce RTX 2080]
| | +-00.1 NVIDIA Corporation TU104 HD Audio Controller
| | +-00.2 NVIDIA Corporation TU104 USB 3.1 Host Controller
| | \-00.3 NVIDIA Corporation TU104 USB Type-C UCSI Controller

4.3. RCIE(Root Complex Integrated Endpoint)

When we talk about PCIe devices, the first thing that comes to mind is probably a PCIe slot with a graphics card or some other device plugged into it, like what we saw above. In fact, though, a large number of devices in a system are integrated on the motherboard, for example the memory controller, integrated graphics, Ethernet NIC, sound card, USB controllers, and so on.

When these devices connect to PCIe, they can attach directly to the Root Complex. Such a device is called an RCIE (Root Complex Integrated Endpoint). If we look them up, their Bus Number is 0, which represents the Root Complex.
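
On Linux we can get a quick look at these integrated endpoints by filtering lspci on Bus 0 (a sketch; what shows up depends entirely on your platform, and some functions on Bus 0 are Root Ports rather than RCIEs):

# List only the functions on bus 00, i.e. the bus directly below this Root Complex.
$ lspci -s 00:
# Equivalently, filter the plain listing:
$ lspci | grep '^00:'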

4.4. Port / Bridge

What about the other devices, the ones that connect through a slot? These devices connect through a PCIe Port.

The Root Complex contains a number of Root Ports, each of which can connect one PCIe device (Type 0 or Type 1).

Essentially, all of these components used to attach other devices are implemented by bridges (Bridges), and the two ends of each bridge connect two different PCIe buses (with different Bus Numbers).

For example, a Root Port is actually implemented with two bridges: a (shared) Host Bridge (upstream side connected to the CPU, downstream side connected to Bus 0) and a PCI Bridge that connects the downstream device (upstream side on Bus 0, i.e. the Root Complex, downstream side on the PCIe device, whose Bus Number is assigned automatically during boot) [1].

We can see these bridges with the lspci command (note the "Kernel driver in use: pcieport" in the device details):

 +-[0000:80]-+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
# This is the Host bridge that connects to the root port and CPU:
| +-01.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
# This is the PCI bridge that connects to the root port and device with a new bus - 0x81:
| +-01.1-[81]--+-00.0 NVIDIA Corporation TU104 [GeForce RTX 2080]
| | +-00.1 NVIDIA Corporation TU104 HD Audio Controller
| | +-00.2 NVIDIA Corporation TU104 USB 3.1 Host Controller
| | \-00.3 NVIDIA Corporation TU104 USB Type-C UCSI Controller

# Host bridge
$ sudo lspci -s 80:01.0 -v
80:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
Flags: fast devsel, IOMMU group 13

# PCI bridge
$ sudo lspci -s 80:01.1 -v
80:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 35, IOMMU group 13
Bus: primary=80, secondary=81, subordinate=81, sec-latency=0
I/O behind bridge: 0000b000-0000bfff [size=4K]
Memory behind bridge: f0000000-f10fffff [size=17M]
Prefetchable memory behind bridge: 0000020030000000-00000200420fffff [size=289M]
....
Kernel driver in use: pcieport
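
The "Bus: primary/secondary/subordinate" line above comes straight from the bridge's Type 1 configuration header, and we can read those registers directly as well (a sketch using setpci from pciutils, with the same 80:01.1 GPP bridge as above):

# Type 1 header bus number registers (offsets 0x18, 0x19, 0x1A):
$ sudo setpci -s 80:01.1 PRIMARY_BUS       # the bus the bridge itself sits on (0x80)
$ sudo setpci -s 80:01.1 SECONDARY_BUS     # the bus directly below the bridge (0x81)
$ sudo setpci -s 80:01.1 SUBORDINATE_BUS   # highest bus number reachable below it (0x81)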

Note: using a PCIe Bridge is not the same thing as being connected through a slot; it depends on how your system's hardware is implemented. For example, in the RCIE screenshot above the USB controller exists as an RCIE, while on the EPYC CPU below it is different: the USB controller is connected through a Root Port, even though it has no slot on the motherboard.

$ lspci -t -v
+-[0000:40]-+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
+-03.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
| +-03.3-[42]----00.0 ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller
# ^====== 40:03.3 here is a Bridge. And USB controller is connected
# to this Bridge with a new Bus Number 42.

4.5. Switch

What if we need to connect more than one device? This is where the PCIe Switch comes in.

A PCIe Switch mainly consists of three parts:

  • An Upstream Port and Bridge: used to connect to the upstream Port, e.g. a Root Port or the Downstream Port of an upstream Switch.
  • A set of Downstream Ports and Bridges: used to connect downstream devices, such as graphics cards, network cards, or the Upstream Port of a downstream Switch.
  • A virtual bus: used to connect all upstream and downstream ports, so that the upstream ports can access downstream devices

In addition, let me emphasize again: since PCIe signal transmission is point-to-point, the bus in the middle of the Switch is only a logical, virtual bus that does not physically exist. The real structure inside is a set of switching circuits for forwarding [9].

Finally, after seeing all this, you may be thinking that the Root Complex can itself be regarded as a Switch. I think it is best to keep the two concepts separate. Although they do look similar in many block diagrams, the Root Complex does not have an Upstream Port (the Host Bridge on its upstream side connects to the CPU), and internally a Root Complex is far more complex than a Switch: it does much more than simple packet forwarding, for example the generation and conversion of PCIe requests that will be mentioned later.
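
By the way, if your machine does contain a PCIe Switch, its ports show up in lspci as ordinary PCI bridges, and the PCI Express capability of each function tells you which role it plays (a sketch; what it prints depends entirely on your topology):

# Switch ports declare themselves as "Upstream Port" or "Downstream Port"
# in their PCI Express capability; Root Ports show up as "Root Port".
$ sudo lspci -vv | grep -E 'Express (\(v[0-9]\) )?(Root|Upstream|Downstream) Port'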

5. Summary

Okay, now we have introduced all the main components in the PCIe device tree. If we connect all these components together, the overall structure is like this [10]:

Okay, to keep the article from getting too long, we will stop here for now. When time permits, we will continue with other PCIe topics, such as configuration space and domains, messages and message routing, and so on.

PCIe system example

  • PCIe topology characteristics: The top of the diagram is a CPU. The point to note here is that the CPU is considered the top level of the PCIe hierarchy. PCIe only allows simple tree structures, which means no loops or other complex topologies are allowed. This is done to maintain backward compatibility with PCI software, which uses a simple configuration scheme to track the topology and does not support complex environments. To maintain this compatibility, software must be able to generate configuration cycles in the same way as before, and the bus topology must look the same as before, so all the configuration registers the software expects to find are still there and behave the way they always have.

In the PCIe system shown above, there are several device types, such as Root Complex, Switch, Bridge, Endpoint, etc. The concepts are introduced below.

  • Root Complex: RC for short, the interface between the CPU and the PCIe bus. It may contain several components (processor interface, DRAM interface, etc.) and may even consist of several chips. The RC sits at the "root" of the PCI inverted-tree topology and communicates with the rest of the system on behalf of the CPU. The specification does not define it precisely, but instead gives a list of required and optional features. Broadly speaking, the RC can be understood as the interface between the system CPU and the PCIe topology, and its PCIe ports are marked as "Root Ports" in configuration space.

  • Bridge: A bridge provides an interface to other buses, such as PCI or PCI-X, or even another PCIe bus. A bridge like the one shown in the diagram is sometimes called a "forward bridge" and allows older PCI or PCI-X cards to be plugged into a new system. The opposite type, a "reverse bridge", allows a new PCIe card to be plugged into an older PCI system.

  • Switch: Provides fan-out or aggregation capability and allows more devices to be connected to a single PCIe port. It acts as a packet router, deciding which path a given packet needs to take based on its address or other routing information. In essence it is a PCIe-to-PCIe bridge.

  • Endpoint: Located at the leaves of the PCIe topology, an Endpoint generally acts as the initiator (Requester, similar to a master on the PCI bus) or the target (Completer, similar to a slave on the PCI bus) of a bus operation. An Endpoint can only accept packets from, or send packets to, the topology above it. Endpoints are further divided into Legacy PCIe Endpoints and Native PCIe Endpoints. A Legacy PCIe Endpoint is a device that was originally designed with a PCI-X bus interface and later converted to a PCIe interface, while a Native PCIe Endpoint is a device designed for PCIe from the start. A Legacy PCIe Endpoint may use operations that are forbidden for Native PCIe Endpoints, such as I/O Space accesses and Locked Requests; Native PCIe Endpoints operate entirely through memory mapping, which is why they are also called Memory-Mapped Devices (MMIO Devices).
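
The memory-mapped nature of Native Endpoints is easy to observe on Linux: a device's BARs show up as "Memory at ..." regions in lspci and as resource files in sysfs, which user-space drivers can mmap() (a sketch, reusing the RTX 2080 at 81:00.0 from the earlier listing; adjust the BDF for your machine):

# The "Memory at ..." lines are the device's BARs, i.e. its MMIO windows.
$ lspci -s 81:00.0 -v | grep -E 'Memory at|I/O ports'
# sysfs exposes the same regions as resource0, resource1, ...; mapping one of
# them gives direct access to the device's registers.
$ ls /sys/bus/pci/devices/0000:81:00.0/ | grep resource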

The picture above shows a high-end server system with other networking interfaces built in, such as FC, Ethernet, SAS/SATA, etc.

An "Intel Processor" contains many components, as do most modern CPUs. This one includes a PCIe port for the graphics card and 2 DRAM channels, which means the memory controller and some routing logic have been integrated into the CPU. These resources are often collectively referred to as the "Uncore" logic, to distinguish them from the CPU cores and their associated logic in the package.

The Root Complex is described as the interface between the CPU and the PCIe topology, which means this part must sit inside the CPU package.

6. References

  • [1]: PCI Express Base Specification
  • [2]: Thunderbolt™ 3 Technology Brief
  • [3]: USB4™ Specification
  • [4]: Compute Express Link™ (CXL™) Specification
  • [5]: Intel® 3000 and 3010 Chipset Memory Controller Hub (MCH) datasheet
  • [6]: H12DSi-NT6 motherboard manual
  • [7]: fpga4fun - PCI Express 2 - Topology
  • [8]: White Paper: Introduction to Intel® Architecture
  • [9]: Crossbar Switch
  • [10]: Mindshare - An Introduction to PCI Express

  • https://r12f.com/posts/pcie-1-basics/
  • https://blog.csdn.net/u013253075/article/details/119045277
