Completely Understand Linux Interrupt Handling in One Article
What is an interrupt?
An interrupt is a mechanism for notifying the CPU after an external device completes some piece of work (for example, after the hard disk finishes a read or write operation, it uses an interrupt to tell the CPU that the operation is done). Early computers without an interrupt mechanism had to poll the status of external devices. Since polling is speculative (the device is not necessarily ready), many useless queries are made and efficiency is very low. With interrupts, the external device actively notifies the CPU, so the CPU does not need to poll, and efficiency improves greatly.
From a physical point of view, an interrupt is an electrical signal generated by a hardware device and sent to an input pin of the interrupt controller (such as the 8259A); the interrupt controller then sends a corresponding signal to the processor. Once the processor detects this signal, it interrupts its current work and switches to handling the interrupt. The processor then notifies the OS that an interrupt has occurred, so the OS can handle it appropriately. Different devices correspond to different interrupts, and each interrupt is identified by a unique number; these numbers are usually called interrupt request (IRQ) lines.
Interrupt controllers
The CPU of an x86 computer provides only two external pins for interrupts: NMI and INTR. NMI is the non-maskable interrupt, usually used for power failure and physical memory parity errors; INTR is the maskable interrupt, which can be masked by setting the interrupt mask bit and is mainly used to accept interrupt signals from external hardware. These signals are passed to the CPU by the interrupt controller.
There are two common interrupt controllers:
Programmable Interrupt Controller 8259A
The traditional PIC (Programmable Interrupt Controller) consists of two 8259A-style chips connected together in a "cascade". Each chip can handle up to 8 different IRQs. Because the INT output line of the slave PIC is connected to the IRQ2 pin of the master PIC, the number of available IRQ lines is 15, as shown in the figure below.
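The cascade shows up directly in how the two chips are programmed: during initialization the master is told that a slave hangs off its IRQ2 pin, and the slave is told its cascade identity. Below is a minimal bare-metal sketch of the classic 8259A initialization sequence (ICW1 to ICW4); outb() is assumed to be an I/O port write helper taking the value first and the port second, and the vector bases 0x20/0x28 are conventional values, not something this article's kernel code sets:
/* Minimal sketch of classic 8259A master/slave initialization.
 * outb() is an assumed I/O port write helper (value, port). */
#define PIC1_CMD  0x20   /* master 8259A command port */
#define PIC1_DATA 0x21   /* master 8259A data port    */
#define PIC2_CMD  0xA0   /* slave 8259A command port  */
#define PIC2_DATA 0xA1   /* slave 8259A data port     */

static void pic_init(void)
{
    outb(0x11, PIC1_CMD);  /* ICW1: begin init, expect ICW4 */
    outb(0x11, PIC2_CMD);
    outb(0x20, PIC1_DATA); /* ICW2: master vectors start at 0x20 (IRQ0-7) */
    outb(0x28, PIC2_DATA); /* ICW2: slave vectors start at 0x28 (IRQ8-15) */
    outb(0x04, PIC1_DATA); /* ICW3: a slave is cascaded on master IRQ2    */
    outb(0x02, PIC2_DATA); /* ICW3: the slave's cascade identity is 2     */
    outb(0x01, PIC1_DATA); /* ICW4: 8086/88 mode */
    outb(0x01, PIC2_DATA);
}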
Advanced Programmable Interrupt Controller (APIC)
The 8259A is only suitable for single-CPU systems. To fully exploit the parallelism of the SMP architecture, it is essential to be able to deliver interrupts to every CPU in the system. For this reason, Intel introduced a new component, the I/O Advanced Programmable Interrupt Controller (I/O APIC), to replace the old 8259A. It consists of two major parts. One is the "local APIC", which is mainly responsible for delivering interrupt signals to a specific processor; a machine with three processors, for example, has three local APICs. The other is the I/O APIC itself, which collects interrupt signals from I/O devices and sends a signal to the relevant local APIC when a device needs to raise an interrupt. A system can have up to 8 I/O APICs.
Each local APIC has 32-bit registers, an internal clock, a local timer, and two additional IRQ lines, LINT0 and LINT1, reserved for local interrupts. All local APICs are connected to the I/O APIC, forming a multi-APIC system, as shown in the figure below.
Most current single-processor systems include an I/O APIC chip, which can be configured in two ways:
- As a standard 8259A: the local APIC is disabled, the external I/O APIC is connected to the CPU, and the two pins LINT0 and LINT1 are configured as the INTR and NMI pins respectively.
- As a standard external I/O APIC: the local APIC is activated, and all external interrupts are received through the I/O APIC.
To identify whether a system is using an I/O APIC, enter the following command at the command line:
# cat /proc/interrupts
CPU0
0: 90504 IO-APIC-edge timer
1: 131 IO-APIC-edge i8042
8: 4 IO-APIC-edge rtc
9: 0 IO-APIC-level acpi
12: 111 IO-APIC-edge i8042
14: 1862 IO-APIC-edge ide0
15: 28 IO-APIC-edge ide1
177: 9 IO-APIC-level eth0
185: 0 IO-APIC-level via82cxxx
...
If IO-APIC is listed in the output, your system is using the APIC. If you see XT-PIC instead, your system is using the 8259A chip.
Interrupt classification
Interrupts can be divided into synchronous interrupts and asynchronous interrupts:
- Synchronous interrupts are generated by the CPU control unit while executing instructions; they are called synchronous because the CPU issues the interrupt only after an instruction has finished executing, not in the middle of it. A system call is one example.
- Asynchronous interrupts are generated by other hardware devices at arbitrary times relative to the CPU clock, which means they can occur between instructions. A keyboard interrupt is one example.
According to Intel's documentation, synchronous interrupts are called exceptions, and asynchronous interrupts are called interrupts.
Interrupts can be divided into maskable interrupts and non-maskable interrupts. Exceptions can be divided into three categories: faults, traps, and aborts.
Broadly speaking, then, interrupts fall into four categories: interrupts, faults, traps, and aborts. The table below summarizes their similarities and differences.
Table: Interrupt categories and their behavior
category | cause | synchronous/asynchronous | return behavior |
---|---|---|---|
interrupt | signal from an I/O device | asynchronous | always returns to the next instruction |
trap | intentional exception | synchronous | always returns to the next instruction |
fault | potentially recoverable error | synchronous | returns to the current instruction |
abort | unrecoverable error | synchronous | does not return |
Each interrupt on the x86 architecture is assigned a unique number, called a vector (an 8-bit unsigned integer). Vectors for non-maskable interrupts and exceptions are fixed, while vectors for maskable interrupts can be changed by programming the interrupt controller.
Interrupt Handling - Top Half (Hard Interrupts)
Since the APIC interrupt controller is somewhat complicated, this article mainly uses the 8259A interrupt controller to introduce Linux's interrupt handling process.
Structures related to interrupt handling
As mentioned before, the 8259A interrupt controller consists of two 8259A-style chips connected together in cascade. Each chip can handle up to 8 different IRQs (interrupt requests), so the number of available IRQ lines is 15. As shown below:
In the kernel, each IRQ line is described by an irq_desc_t structure, which is defined as follows:
typedef struct {
    unsigned int status;        /* IRQ status */
    hw_irq_controller *handler; /* hardware-level handling functions */
    struct irqaction *action;   /* IRQ action list */
    unsigned int depth;         /* nested irq disables */
    spinlock_t lock;            /* protects this descriptor */
} irq_desc_t;
The fields of the irq_desc_t structure are as follows:
- status: the status of the IRQ line.
- handler: of type hw_interrupt_type, representing the hardware-specific handling functions for this IRQ line. For example, when the 8259A interrupt controller receives an interrupt signal, an acknowledgment must be sent back before further interrupt signals can be received; the function that sends this acknowledgment is ack in hw_interrupt_type.
- action: of type irqaction, the entry point for handling the interrupt signal. Since one IRQ line can be shared by multiple devices, action is a linked list in which each node represents one device's interrupt handler entry.
- depth: prevents the IRQ line from being enabled or disabled multiple times.
- lock: a spinlock that prevents multiple CPU cores from operating on the IRQ simultaneously.
The hw_interrupt_type structure is hardware-specific and will not be described here. Let's look at the irqaction structure:
struct irqaction {
    void (*handler)(int, void *, struct pt_regs *);
    unsigned long flags;
    unsigned long mask;
    const char *name;
    void *dev_id;
    struct irqaction *next;
};
The fields of the irqaction structure are as follows:
- handler: the entry function for interrupt handling. Its first parameter is the IRQ number, the second is the ID of the corresponding device, and the third points to the register values saved by the kernel when the interrupt occurred.
- flags: flag bits describing the behavior of this irqaction, for example whether it can share the IRQ line with other devices.
- name: the name of this interrupt handler.
- dev_id: the device ID.
- next: each device's interrupt handler entry corresponds to one irqaction structure; since multiple devices can share one IRQ line, the next field links the handler entries of the different devices together.
The relationships among the irq_desc_t structures are shown in the figure below:
Registering an interrupt handler entry
In the kernel, an interrupt handler entry is registered with the setup_irq() function, whose code is as follows:
int setup_irq(unsigned int irq, struct irqaction *new)
{
    int shared = 0;
    unsigned long flags;
    struct irqaction *old, **p;
    irq_desc_t *desc = irq_desc + irq;
    ...
    spin_lock_irqsave(&desc->lock, flags);
    p = &desc->action;
    if ((old = *p) != NULL) {
        if (!(old->flags & new->flags & SA_SHIRQ)) {
            spin_unlock_irqrestore(&desc->lock, flags);
            return -EBUSY;
        }
        do {
            p = &old->next;
            old = *p;
        } while (old);
        shared = 1;
    }

    *p = new;

    if (!shared) {
        desc->depth = 0;
        desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING);
        desc->handler->startup(irq);
    }
    spin_unlock_irqrestore(&desc->lock, flags);

    register_irq_proc(irq); // register with the proc filesystem
    return 0;
}
The setup_irq() function is fairly simple: it uses the IRQ number to find the corresponding irq_desc_t structure and links the new irqaction onto the action list of that structure. Note that if the IRQ line is already in use and either party does not support sharing the line (that is, the SA_SHIRQ flag is not set in the flags field), -EBUSY is returned.
Let's look at how the timer interrupt handler entry is registered:
static struct irqaction irq0 = { timer_interrupt, SA_INTERRUPT, 0, "timer", NULL, NULL };

void __init time_init(void)
{
    ...
    setup_irq(0, &irq0);
}
As you can see, the timer interrupt uses IRQ number 0, its handler is timer_interrupt(), and it does not support sharing its IRQ line (the SA_SHIRQ flag is not set in the flags field).
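A driver normally does not call setup_irq() directly: the usual entry point in 2.4-era kernels is request_irq(), which allocates and fills in an irqaction and then registers it via setup_irq(). Below is a minimal sketch of that pattern; my_device, my_interrupt, device_raised_irq() and "mydev" are illustrative names, not kernel APIs:
/* Sketch: how a typical 2.4-era driver registers a shared interrupt
 * handler; my_device, my_interrupt and "mydev" are illustrative names. */
static void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct my_device *dev = (struct my_device *) dev_id;

    if (!device_raised_irq(dev)) /* hypothetical check: line is shared */
        return;
    /* acknowledge the device and do only minimal top-half work here */
}

static int my_open(struct my_device *dev)
{
    /* SA_SHIRQ allows other devices to share the line; dev is passed
     * back to the handler (and to free_irq()) as dev_id. */
    int err = request_irq(dev->irq, my_interrupt, SA_SHIRQ, "mydev", dev);
    if (err)
        return err;
    return 0;
}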
Handling an interrupt request
When an interrupt occurs, the interrupt controller sends a signal to the CPU. On receiving the signal, the CPU interrupts its current execution and switches to interrupt handling. The interrupt handling path first saves the register values on the stack and then calls the do_IRQ() function for further processing. The code of do_IRQ() is as follows:
asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
    int irq = regs.orig_eax & 0xff; /* get the IRQ number */
    int cpu = smp_processor_id();
    irq_desc_t *desc = irq_desc + irq;
    struct irqaction *action;
    unsigned int status;

    kstat.irqs[cpu][irq]++;
    spin_lock(&desc->lock);
    desc->handler->ack(irq);

    status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING; /* we _want_ to handle it */

    action = NULL;
    if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) { // this IRQ is not already being handled
        action = desc->action;    // get the action list
        status &= ~IRQ_PENDING;   // clear IRQ_PENDING, which records whether the IRQ fired again during handling
        status |= IRQ_INPROGRESS; // set IRQ_INPROGRESS to mark the IRQ as being handled
    }
    desc->status = status;

    if (!action) // the previous occurrence has not finished yet, so just leave
        goto out;

    for (;;) {
        spin_unlock(&desc->lock);
        handle_IRQ_event(irq, &regs, action); // handle the IRQ request
        spin_lock(&desc->lock);
        if (!(desc->status & IRQ_PENDING))    // no new occurrence while handling: done
            break;
        desc->status &= ~IRQ_PENDING;         // it fired again: clear the flag and loop
    }
    desc->status &= ~IRQ_INPROGRESS;

out:
    desc->handler->end(irq);
    spin_unlock(&desc->lock);

    if (softirq_active(cpu) & softirq_mask(cpu))
        do_softirq(); // handle the bottom half of the interrupt

    return 1;
}
The do_IRQ() function first uses the IRQ number to find the corresponding irq_desc_t structure. Note that the same interrupt may occur again while it is being handled, so do_IRQ() checks whether this IRQ is already being processed (whether the IRQ_INPROGRESS flag is set in the status field of the irq_desc_t structure). If it is not, do_IRQ() fetches the action list and calls the handle_IRQ_event() function to run the interrupt handlers on the list.

If the same interrupt occurs again during handling (in which case the IRQ_PENDING flag in the status field gets set), the loop handles that occurrence as well. After the interrupt has been handled, do_softirq() is called to process the bottom half of the interrupt (described below).
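To see why do_IRQ() loops, here is a toy user-space model of the IRQ_PENDING / IRQ_INPROGRESS protocol (only the flag names are real; everything else is made up). A second occurrence that arrives while the handlers run merely sets IRQ_PENDING, and the CPU already inside do_IRQ() notices the flag and loops instead of handling the interrupt reentrantly:
/* Toy user-space model of do_IRQ()'s IRQ_PENDING / IRQ_INPROGRESS
 * protocol; flag names match the kernel's, everything else is made up. */
#include <stdio.h>

#define IRQ_INPROGRESS 0x1
#define IRQ_PENDING    0x2

static unsigned int status;
static int refires = 1; /* pretend the line fires once more mid-handling */

static void handle_event(void)
{
    printf("running handlers\n");
    if (refires) {
        status |= IRQ_PENDING; /* a re-fire only marks the flag */
        refires = 0;
    }
}

int main(void)
{
    if (!(status & IRQ_INPROGRESS)) { /* nobody is handling this line yet */
        status |= IRQ_INPROGRESS;
        for (;;) {
            handle_event();
            if (!(status & IRQ_PENDING))
                break;              /* no re-fire: we are done       */
            status &= ~IRQ_PENDING; /* re-fire: clear and loop again */
        }
        status &= ~IRQ_INPROGRESS;
    }
    return 0;
}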
Next, let's look at the implementation of the handle_IRQ_event() function:
int handle_IRQ_event(unsigned int irq, struct pt_regs *regs, struct irqaction *action)
{
    int status;
    int cpu = smp_processor_id();

    irq_enter(cpu, irq);
    status = 1; /* Force the "do bottom halves" bit */

    if (!(action->flags & SA_INTERRUPT)) // if the handlers may run with interrupts enabled, enable interrupts
        __sti();

    do {
        status |= action->flags;
        action->handler(irq, action->dev_id, regs);
        action = action->next;
    } while (action);

    if (status & SA_SAMPLE_RANDOM)
        add_interrupt_randomness(irq);

    __cli();
    irq_exit(cpu, irq);

    return status;
}
The handle_IRQ_event() function is very simple: it traverses the action list and executes each handler on it; for the timer interrupt, for example, this calls the timer_interrupt() function. Note that if the handlers may run with interrupts enabled, interrupts are turned back on here (the CPU disables interrupts when it accepts an interrupt signal).
Interrupt Handling - Bottom Half (Softirqs)
Since interrupt handling generally runs with interrupts disabled, it must not take too long, or later interrupts cannot be handled in time. For this reason, Linux splits interrupt handling into two parts, the top half and the bottom half. The top half was covered above; next we look at how the bottom half is executed.
In general, the top half performs only the most basic operations (such as copying data from the network card into a buffer) and then marks the bottom half that needs to run; once marked, the bottom half is processed by a call to the do_softirq() function.
The softirq mechanism
The bottom half of interrupt handling is implemented by the softirq (software interrupt) mechanism. In the Linux kernel there is an array named softirq_vec, shown below:
static struct softirq_action softirq_vec[32];
Its element type is the softirq_action structure, defined as follows:
struct softirq_action
{
    void (*action)(struct softirq_action *);
    void *data;
};
The softirq_vec array is the core of the softirq mechanism; each of its elements represents one kind of softirq. However, Linux defines only four softirqs, as follows:
enum
{
    HI_SOFTIRQ=0,
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    TASKLET_SOFTIRQ
};
HI_SOFTIRQ is for high-priority tasklets and TASKLET_SOFTIRQ for normal tasklets; a tasklet is a kind of task queue built on top of the softirq mechanism (introduced below). NET_TX_SOFTIRQ and NET_RX_SOFTIRQ are softirqs dedicated to the network subsystem (not covered here).
Registering a softirq handler
A softirq handler is registered with the open_softirq() function, whose code is as follows:
void open_softirq(int nr, void (*action)(struct softirq_action *), void *data)
{
    unsigned long flags;
    int i;

    spin_lock_irqsave(&softirq_mask_lock, flags);
    softirq_vec[nr].data = data;
    softirq_vec[nr].action = action;

    for (i = 0; i < NR_CPUS; i++)
        softirq_mask(i) |= (1 << nr);
    spin_unlock_irqrestore(&softirq_mask_lock, flags);
}
The main job of open_softirq() is to add a softirq handler to the softirq_vec array. During system initialization, Linux registers handlers for two softirqs, TASKLET_SOFTIRQ and HI_SOFTIRQ:
void __init softirq_init()
{
    ...
    open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
    open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
}
Processing softirqs
Softirqs are processed by the do_softirq() function, whose code is as follows:
asmlinkage void do_softirq()
{
    int cpu = smp_processor_id();
    __u32 active, mask;

    if (in_interrupt())
        return;

    local_bh_disable();

    local_irq_disable();
    mask = softirq_mask(cpu);
    active = softirq_active(cpu) & mask;

    if (active) {
        struct softirq_action *h;

restart:
        softirq_active(cpu) &= ~active;

        local_irq_enable();

        h = softirq_vec;
        mask &= ~active;

        do {
            if (active & 1)
                h->action(h);
            h++;
            active >>= 1;
        } while (active);

        local_irq_disable();

        active = softirq_active(cpu);
        if ((active &= mask) != 0)
            goto retry;
    }

    local_bh_enable();
    return;

retry:
    goto restart;
}
As mentioned earlier, the softirq_vec array has 32 elements, each corresponding to one type of softirq. So how does Linux know which softirqs need to be executed? In Linux, each CPU has a variable of type irq_cpustat_t, which is defined as follows:
typedef struct {
    unsigned int __softirq_active;
    unsigned int __softirq_mask;
    ...
} irq_cpustat_t;
The __softirq_active field indicates which softirqs have been triggered (an unsigned int has 32 bits, and each bit represents one softirq), while the __softirq_mask field indicates which softirqs are masked. Linux uses the __softirq_active field to know which softirqs need to be executed: triggering a softirq simply means setting the corresponding bit to 1.
So do_softirq() first obtains the set of softirqs enabled on the current CPU via softirq_mask(cpu), then computes softirq_active(cpu) & mask to find which triggered softirqs may actually run. It clears those bits from __softirq_active and then walks the bits, calling the action function of each softirq whose bit was set.
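The bit walk at the heart of do_softirq() is easy to model in user space. The following self-contained demonstration (made-up handlers and mask values, not kernel code) shows how a 32-bit active mask selects entries from a handler table, using the same loop shape as do_softirq():
/* User-space demonstration of do_softirq()'s bit walk: for each set bit
 * in `active`, invoke the corresponding entry of a handler table. */
#include <stdio.h>

static void hi_action(int nr)      { printf("softirq %d: HI\n", nr); }
static void net_tx_action(int nr)  { printf("softirq %d: NET_TX\n", nr); }
static void net_rx_action(int nr)  { printf("softirq %d: NET_RX\n", nr); }
static void tasklet_action(int nr) { printf("softirq %d: TASKLET\n", nr); }

static void (*vec[32])(int) = { hi_action, net_tx_action,
                                net_rx_action, tasklet_action };

int main(void)
{
    unsigned int active = (1 << 0) | (1 << 3); /* HI and TASKLET pending */
    int nr = 0;

    do {                     /* same shape as the loop in do_softirq() */
        if (active & 1)
            vec[nr](nr);
        nr++;
        active >>= 1;
    } while (active);
    return 0;
}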
The tasklet mechanism
As mentioned earlier, the tasklet mechanism is built on the softirq mechanism: a tasklet is essentially a task on a queue that is executed via softirq. There are two kinds of tasklets in the Linux kernel, high-priority tasklets and normal tasklets. Their implementations are basically the same; the only difference is execution priority, with high-priority tasklets executed before normal ones.
Tasklets are essentially kept in queues; each queue is headed by a tasklet_head structure, and each CPU has its own queue. Let's look at the definitions of tasklet_head and tasklet_struct:
struct tasklet_head
{
    struct tasklet_struct *list;
};

struct tasklet_struct
{
    struct tasklet_struct *next;
    unsigned long state;
    atomic_t count;
    void (*func)(unsigned long);
    unsigned long data;
};
From these definitions we can see that a tasklet_head structure is the head of a queue of tasklet_struct structures, and that the func field of tasklet_struct points to the function that performs the actual task. Linux defines two kinds of tasklet queues, tasklet_vec and tasklet_hi_vec, as follows:
struct tasklet_head tasklet_vec[NR_CPUS];
struct tasklet_head tasklet_hi_vec[NR_CPUS];
As you can see, tasklet_vec and tasklet_hi_vec are both arrays, and the number of elements in each array equals the number of CPU cores; that is, each CPU core has a high-priority tasklet queue and a normal tasklet queue.
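A driver does not normally fill in a tasklet_struct by hand. In 2.4-era kernels, the DECLARE_TASKLET() macro from <linux/interrupt.h> defines and initializes one statically. The sketch below is illustrative: my_tasklet and my_tasklet_func are made-up names, and the 0 is an arbitrary data cookie:
#include <linux/interrupt.h>

/* Bottom-half work; `data` is the value given at declaration time. */
static void my_tasklet_func(unsigned long data)
{
    /* runs later via TASKLET_SOFTIRQ, so it may do the slower work
       that the interrupt handler itself should not do */
}

/* Defines a tasklet_struct named my_tasklet with func and data filled in. */
DECLARE_TASKLET(my_tasklet, my_tasklet_func, 0);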
Scheduling tasklets
If we have a tasklet that needs to run, a high-priority tasklet can be scheduled with the tasklet_hi_schedule() function and a normal tasklet with the tasklet_schedule() function. The two functions are basically the same, so we analyze only one of them:
static inline void tasklet_hi_schedule(struct tasklet_struct *t)
{
    if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
        int cpu = smp_processor_id();
        unsigned long flags;

        local_irq_save(flags);
        t->next = tasklet_hi_vec[cpu].list;
        tasklet_hi_vec[cpu].list = t;
        __cpu_raise_softirq(cpu, HI_SOFTIRQ);
        local_irq_restore(flags);
    }
}
The function's parameter is a pointer to the tasklet_struct structure that needs to run. tasklet_hi_schedule() first checks whether the tasklet has already been added to a queue; if not, it adds the tasklet to the tasklet_hi_vec queue and then calls __cpu_raise_softirq(cpu, HI_SOFTIRQ) to tell the softirq mechanism that a softirq of type HI_SOFTIRQ needs to be executed. Let's look at the implementation of __cpu_raise_softirq():
static inline void __cpu_raise_softirq(int cpu, int nr)
{
    softirq_active(cpu) |= (1 << nr);
}
As you can see, __cpu_raise_softirq() sets bit nr of the __softirq_active field in the irq_cpustat_t structure to 1. For tasklet_hi_schedule(), that means the HI_SOFTIRQ bit (bit 0) is set to 1.
As introduced before, Linux registers two softirqs during initialization, TASKLET_SOFTIRQ and HI_SOFTIRQ:
void __init softirq_init()
{
    ...
    open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
    open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
}
So when the HI_SOFTIRQ bit (bit 0) of the __softirq_active field in the irq_cpustat_t structure is set to 1, the softirq mechanism will execute the tasklet_hi_action() function. Let's look at its implementation:
static void tasklet_hi_action(struct softirq_action *a)
{
    int cpu = smp_processor_id();
    struct tasklet_struct *list;

    local_irq_disable();
    list = tasklet_hi_vec[cpu].list;
    tasklet_hi_vec[cpu].list = NULL;
    local_irq_enable();

    while (list != NULL) {
        struct tasklet_struct *t = list;

        list = list->next;

        if (tasklet_trylock(t)) {
            if (atomic_read(&t->count) == 0) {
                clear_bit(TASKLET_STATE_SCHED, &t->state);
                t->func(t->data); // call the tasklet handler
                tasklet_unlock(t);
                continue;
            }
            tasklet_unlock(t);
        }
        ...
    }
}
The tasklet_hi_action() function is very simple: it traverses the tasklet_hi_vec queue and executes each tasklet's handler function.
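Putting the two halves together: a common pattern is for the top half (the interrupt handler) to do only the urgent work and then schedule a tasklet for the rest; tasklet_schedule() is the normal-priority counterpart of the tasklet_hi_schedule() shown above. A minimal sketch, reusing the illustrative my_interrupt and my_tasklet names from earlier:
/* Top half: runs with the IRQ line blocked, so keep it short. */
static void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    /* urgent work only: read/acknowledge the device and stash the data */
    tasklet_schedule(&my_tasklet); /* mark the bottom half for execution */
}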