Completely Understand Linux Interrupt Handling in One Article
What is an interrupt?
An interrupt is a mechanism for notifying the CPU after an external device completes some piece of work (for example, after the hard disk finishes a read or write operation, it uses an interrupt to tell the CPU that the operation is done). Early computers without an interrupt mechanism had to poll the status of external devices. Since polling is speculative (the device is not necessarily ready), many useless queries are made and efficiency is very low. With interrupts, the external device actively notifies the CPU, so the CPU does not need to poll, and efficiency improves greatly.
From a physical point of view, an interrupt is an electrical signal generated by a hardware device and sent to an input pin of the interrupt controller (such as the 8259A); the interrupt controller then sends a corresponding signal to the processor. Once the processor detects this signal, it interrupts its current work and switches to handling the interrupt. The processor then notifies the OS that an interrupt has occurred, so the OS can handle it appropriately. Different devices correspond to different interrupts, and each interrupt is identified by a unique number; these numbers are usually called interrupt request (IRQ) lines.
Interrupt controllers
The CPU of an x86 computer provides only two external pins for interrupts: NMI and INTR. NMI is the non-maskable interrupt, usually used for power failure and physical memory parity errors; INTR is the maskable interrupt, which can be masked by setting the interrupt mask bit and is mainly used to accept interrupt signals from external hardware. These signals are passed to the CPU by the interrupt controller.
There are two common interrupt controllers:
Programmable Interrupt Controller 8259A
The traditional PIC (Programmable Interrupt Controller) consists of two 8259A-style chips connected together in a "cascade". Each chip can handle up to 8 different IRQs. Because the INT output line of the slave PIC is connected to the IRQ2 pin of the master PIC, the number of available IRQ lines is 15, as shown in the figure below.
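The cascade shows up directly in how the two chips are programmed: during initialization the master is told that a slave hangs off its IRQ2 pin, and the slave is told its cascade identity. Below is a minimal bare-metal sketch of the classic 8259A initialization sequence (ICW1 to ICW4); outb() is assumed to be an I/O port write helper taking the value first and the port second, and the vector bases 0x20/0x28 are conventional values, not something this article's kernel code sets:
/* Minimal sketch of classic 8259A master/slave initialization.
 * outb() is an assumed I/O port write helper (value, port). */
#define PIC1_CMD  0x20   /* master 8259A command port */
#define PIC1_DATA 0x21   /* master 8259A data port    */
#define PIC2_CMD  0xA0   /* slave 8259A command port  */
#define PIC2_DATA 0xA1   /* slave 8259A data port     */

static void pic_init(void)
{
    outb(0x11, PIC1_CMD);  /* ICW1: begin init, expect ICW4 */
    outb(0x11, PIC2_CMD);
    outb(0x20, PIC1_DATA); /* ICW2: master vectors start at 0x20 (IRQ0-7) */
    outb(0x28, PIC2_DATA); /* ICW2: slave vectors start at 0x28 (IRQ8-15) */
    outb(0x04, PIC1_DATA); /* ICW3: a slave is cascaded on master IRQ2    */
    outb(0x02, PIC2_DATA); /* ICW3: the slave's cascade identity is 2     */
    outb(0x01, PIC1_DATA); /* ICW4: 8086/88 mode */
    outb(0x01, PIC2_DATA);
}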
Advanced Programmable Interrupt Controller (APIC)
The 8259A is only suitable for single-CPU systems. To fully exploit the parallelism of the SMP architecture, it is essential to be able to deliver interrupts to every CPU in the system. For this reason, Intel introduced a new component, the I/O Advanced Programmable Interrupt Controller (I/O APIC), to replace the old 8259A. It consists of two major parts. One is the "local APIC", which is mainly responsible for delivering interrupt signals to a specific processor; a machine with three processors, for example, has three local APICs. The other is the I/O APIC itself, which collects interrupt signals from I/O devices and sends a signal to the relevant local APIC when a device needs to raise an interrupt. A system can have up to 8 I/O APICs.
Each local APIC has 32-bit registers, an internal clock, a local timer, and two additional IRQ lines, LINT0 and LINT1, reserved for local interrupts. All local APICs are connected to the I/O APIC, forming a multi-APIC system, as shown in the figure below.
Most current single-processor systems include an I/O APIC chip, which can be configured in two ways:
- As a standard 8259A: the local APIC is disabled, the external I/O APIC is connected to the CPU, and the two pins LINT0 and LINT1 are configured as the INTR and NMI pins respectively.
- As a standard external I/O APIC: the local APIC is activated, and all external interrupts are received through the I/O APIC.
To identify whether a system is using an I/O APIC, enter the following command at the command line:
# cat /proc/interrupts
CPU0
0: 90504 IO-APIC-edge timer
1: 131 IO-APIC-edge i8042
8: 4 IO-APIC-edge rtc
9: 0 IO-APIC-level acpi
12: 111 IO-APIC-edge i8042
14: 1862 IO-APIC-edge ide0
15: 28 IO-APIC-edge ide1
177: 9 IO-APIC-level eth0
185: 0 IO-APIC-level via82cxxx
...
If IO-APIC is listed in the output, your system is using the APIC. If you see XT-PIC instead, your system is using the 8259A chip.
Interrupt classification
Interrupts can be divided into synchronous interrupts and asynchronous interrupts:
- Synchronous interrupts are generated by the CPU control unit while executing instructions; they are called synchronous because the CPU issues the interrupt only after an instruction has finished executing, not in the middle of it. A system call is one example.
- Asynchronous interrupts are generated by other hardware devices at arbitrary times relative to the CPU clock, which means they can occur between instructions. A keyboard interrupt is one example.
According to Intel's documentation, synchronous interrupts are called exceptions, and asynchronous interrupts are called interrupts.
Interrupts can be divided into maskable interrupts and non-maskable interrupts. Exceptions can be divided into three categories: faults, traps, and aborts.
Broadly speaking, then, interrupts fall into four categories: interrupts, faults, traps, and aborts. The table below summarizes their similarities and differences.
Table: Interrupt categories and their behavior
category | cause | synchronous/asynchronous | return behavior |
---|---|---|---|
interrupt | signal from an I/O device | asynchronous | always returns to the next instruction |
trap | intentional exception | synchronous | always returns to the next instruction |
fault | potentially recoverable error | synchronous | returns to the current instruction |
abort | unrecoverable error | synchronous | does not return |
Each interrupt on the x86 architecture is assigned a unique number, called a vector (an 8-bit unsigned integer). Vectors for non-maskable interrupts and exceptions are fixed, while vectors for maskable interrupts can be changed by programming the interrupt controller.
Interrupt Handling - Top Half (Hard Interrupts)
Since the APIC interrupt controller is somewhat complicated, this article mainly uses the 8259A interrupt controller to introduce Linux's interrupt handling process.
Structures related to interrupt handling
As mentioned before, the 8259A interrupt controller consists of two 8259A-style chips connected together in cascade. Each chip can handle up to 8 different IRQs (interrupt requests), so the number of available IRQ lines is 15. As shown below:
In the kernel, each IRQ line is described by an irq_desc_t structure, which is defined as follows:
typedef struct {
    unsigned int status;        /* IRQ status */
    hw_irq_controller *handler; /* hardware-level handling functions */
    struct irqaction *action;   /* IRQ action list */
    unsigned int depth;         /* nested irq disables */
    spinlock_t lock;            /* protects this descriptor */
} irq_desc_t;
The fields of the irq_desc_t structure are as follows:
- status: the status of the IRQ line.
- handler: of type hw_interrupt_type, representing the hardware-specific handling functions for this IRQ line. For example, when the 8259A interrupt controller receives an interrupt signal, an acknowledgment must be sent back before further interrupt signals can be received; the function that sends this acknowledgment is ack in hw_interrupt_type.
- action: of type irqaction, the entry point for handling the interrupt signal. Since one IRQ line can be shared by multiple devices, action is a linked list in which each node represents one device's interrupt handler entry.
- depth: prevents the IRQ line from being enabled or disabled multiple times.
- lock: a spinlock that prevents multiple CPU cores from operating on the IRQ simultaneously.
The hw_interrupt_type structure is hardware-specific and will not be described here. Let's look at the irqaction structure:
struct irqaction {
    void (*handler)(int, void *, struct pt_regs *);
    unsigned long flags;
    unsigned long mask;
    const char *name;
    void *dev_id;
    struct irqaction *next;
};
The fields of the irqaction structure are as follows:
- handler: the entry function for interrupt handling. Its first parameter is the IRQ number, the second is the ID of the corresponding device, and the third points to the register values saved by the kernel when the interrupt occurred.
- flags: flag bits describing the behavior of this irqaction, for example whether it can share the IRQ line with other devices.
- name: the name of this interrupt handler.
- dev_id: the device ID.
- next: each device's interrupt handler entry corresponds to one irqaction structure; since multiple devices can share one IRQ line, the next field links the handler entries of the different devices together.
The relationships among the irq_desc_t structures are shown in the figure below:
Registering an interrupt handler entry
In the kernel, an interrupt handler entry is registered with the setup_irq() function, whose code is as follows:
int setup_irq(unsigned int irq, struct irqaction *new)
{
    int shared = 0;
    unsigned long flags;
    struct irqaction *old, **p;
    irq_desc_t *desc = irq_desc + irq;
    ...
    spin_lock_irqsave(&desc->lock, flags);
    p = &desc->action;
    if ((old = *p) != NULL) {
        if (!(old->flags & new->flags & SA_SHIRQ)) {
            spin_unlock_irqrestore(&desc->lock, flags);
            return -EBUSY;
        }
        do {
            p = &old->next;
            old = *p;
        } while (old);
        shared = 1;
    }

    *p = new;

    if (!shared) {
        desc->depth = 0;
        desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING);
        desc->handler->startup(irq);
    }
    spin_unlock_irqrestore(&desc->lock, flags);

    register_irq_proc(irq); // register with the proc filesystem
    return 0;
}
The setup_irq() function is fairly simple: it uses the IRQ number to find the corresponding irq_desc_t structure and links the new irqaction onto the action list of that structure. Note that if the IRQ line is already in use and either party does not support sharing the line (that is, the SA_SHIRQ flag is not set in the flags field), -EBUSY is returned.
Let's look at how the timer interrupt handler entry is registered:
static struct irqaction irq0 = { timer_interrupt, SA_INTERRUPT, 0, "timer", NULL, NULL };

void __init time_init(void)
{
    ...
    setup_irq(0, &irq0);
}
As you can see, the timer interrupt uses IRQ number 0, its handler is timer_interrupt(), and it does not support sharing its IRQ line (the SA_SHIRQ flag is not set in the flags field).
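A driver normally does not call setup_irq() directly: the usual entry point in 2.4-era kernels is request_irq(), which allocates and fills in an irqaction and then registers it via setup_irq(). Below is a minimal sketch of that pattern; my_device, my_interrupt, device_raised_irq() and "mydev" are illustrative names, not kernel APIs:
/* Sketch: how a typical 2.4-era driver registers a shared interrupt
 * handler; my_device, my_interrupt and "mydev" are illustrative names. */
static void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct my_device *dev = (struct my_device *) dev_id;

    if (!device_raised_irq(dev)) /* hypothetical check: line is shared */
        return;
    /* acknowledge the device and do only minimal top-half work here */
}

static int my_open(struct my_device *dev)
{
    /* SA_SHIRQ allows other devices to share the line; dev is passed
     * back to the handler (and to free_irq()) as dev_id. */
    int err = request_irq(dev->irq, my_interrupt, SA_SHIRQ, "mydev", dev);
    if (err)
        return err;
    return 0;
}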
Handling an interrupt request
When an interrupt occurs, the interrupt controller sends a signal to the CPU. On receiving the signal, the CPU interrupts its current execution and switches to interrupt handling. The interrupt handling path first saves the register values on the stack and then calls the do_IRQ() function for further processing. The code of do_IRQ() is as follows:
asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
    int irq = regs.orig_eax & 0xff; /* get the IRQ number */
    int cpu = smp_processor_id();
    irq_desc_t *desc = irq_desc + irq;
    struct irqaction *action;
    unsigned int status;

    kstat.irqs[cpu][irq]++;
    spin_lock(&desc->lock);
    desc->handler->ack(irq);

    status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING; /* we _want_ to handle it */

    action = NULL;
    if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) { // this IRQ is not already being handled
        action = desc->action;    // get the action list
        status &= ~IRQ_PENDING;   // clear IRQ_PENDING, which records whether the IRQ fired again during handling
        status |= IRQ_INPROGRESS; // set IRQ_INPROGRESS to mark the IRQ as being handled
    }
    desc->status = status;

    if (!action) // the previous occurrence has not finished yet, so just leave
        goto out;

    for (;;) {
        spin_unlock(&desc->lock);
        handle_IRQ_event(irq, &regs, action); // handle the IRQ request
        spin_lock(&desc->lock);
        if (!(desc->status & IRQ_PENDING))    // no new occurrence while handling: done
            break;
        desc->status &= ~IRQ_PENDING;         // it fired again: clear the flag and loop
    }
    desc->status &= ~IRQ_INPROGRESS;

out:
    desc->handler->end(irq);
    spin_unlock(&desc->lock);

    if (softirq_active(cpu) & softirq_mask(cpu))
        do_softirq(); // handle the bottom half of the interrupt

    return 1;
}
The do_IRQ() function first uses the IRQ number to find the corresponding irq_desc_t structure. Note that the same interrupt may occur again while it is being handled, so do_IRQ() checks whether this IRQ is already being processed (whether the IRQ_INPROGRESS flag is set in the status field of the irq_desc_t structure). If it is not, do_IRQ() fetches the action list and calls the handle_IRQ_event() function to run the interrupt handlers on the list.

If the same interrupt occurs again during handling (in which case the IRQ_PENDING flag in the status field gets set), the loop handles that occurrence as well. After the interrupt has been handled, do_softirq() is called to process the bottom half of the interrupt (described below).
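To see why do_IRQ() loops, here is a toy user-space model of the IRQ_PENDING / IRQ_INPROGRESS protocol (only the flag names are real; everything else is made up). A second occurrence that arrives while the handlers run merely sets IRQ_PENDING, and the CPU already inside do_IRQ() notices the flag and loops instead of handling the interrupt reentrantly:
/* Toy user-space model of do_IRQ()'s IRQ_PENDING / IRQ_INPROGRESS
 * protocol; flag names match the kernel's, everything else is made up. */
#include <stdio.h>

#define IRQ_INPROGRESS 0x1
#define IRQ_PENDING    0x2

static unsigned int status;
static int refires = 1; /* pretend the line fires once more mid-handling */

static void handle_event(void)
{
    printf("running handlers\n");
    if (refires) {
        status |= IRQ_PENDING; /* a re-fire only marks the flag */
        refires = 0;
    }
}

int main(void)
{
    if (!(status & IRQ_INPROGRESS)) { /* nobody is handling this line yet */
        status |= IRQ_INPROGRESS;
        for (;;) {
            handle_event();
            if (!(status & IRQ_PENDING))
                break;              /* no re-fire: we are done       */
            status &= ~IRQ_PENDING; /* re-fire: clear and loop again */
        }
        status &= ~IRQ_INPROGRESS;
    }
    return 0;
}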
Next, let's look at the implementation of the handle_IRQ_event() function:
int handle_IRQ_event(unsigned int irq, struct pt_regs *regs, struct irqaction *action)
{
    int status;
    int cpu = smp_processor_id();

    irq_enter(cpu, irq);
    status = 1; /* Force the "do bottom halves" bit */

    if (!(action->flags & SA_INTERRUPT)) // if the handlers may run with interrupts enabled, enable interrupts
        __sti();

    do {
        status |= action->flags;
        action->handler(irq, action->dev_id, regs);
        action = action->next;
    } while (action);

    if (status & SA_SAMPLE_RANDOM)
        add_interrupt_randomness(irq);

    __cli();
    irq_exit(cpu, irq);

    return status;
}
The handle_IRQ_event() function is very simple: it traverses the action list and executes each handler on it; for the timer interrupt, for example, this calls the timer_interrupt() function. Note that if the handlers may run with interrupts enabled, interrupts are turned back on here (the CPU disables interrupts when it accepts an interrupt signal).
Interrupt Handling - Bottom Half (Softirqs)
Since interrupt handling generally runs with interrupts disabled, it must not take too long, or later interrupts cannot be handled in time. For this reason, Linux splits interrupt handling into two parts, the top half and the bottom half. The top half was covered above; next we look at how the bottom half is executed.
In general, the top half performs only the most basic operations (such as copying data from the network card into a buffer) and then marks the bottom half that needs to run; once marked, the bottom half is processed by a call to the do_softirq() function.
The softirq mechanism
The bottom half of interrupt handling is implemented by the softirq (software interrupt) mechanism. In the Linux kernel there is an array named softirq_vec, shown below:
static struct softirq_action softirq_vec[32];
Its element type is the softirq_action structure, defined as follows:
struct softirq_action
{
    void (*action)(struct softirq_action *);
    void *data;
};
The softirq_vec array is the core of the softirq mechanism; each of its elements represents one kind of softirq. However, Linux defines only four softirqs, as follows:
enum
{
    HI_SOFTIRQ=0,
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    TASKLET_SOFTIRQ
};
HI_SOFTIRQ is for high-priority tasklets and TASKLET_SOFTIRQ for normal tasklets; a tasklet is a kind of task queue built on top of the softirq mechanism (introduced below). NET_TX_SOFTIRQ and NET_RX_SOFTIRQ are softirqs dedicated to the network subsystem (not covered here).
Registering a softirq handler
A softirq handler is registered with the open_softirq() function, whose code is as follows:
void open_softirq(int nr, void (*action)(struct softirq_action *), void *data)
{
    unsigned long flags;
    int i;

    spin_lock_irqsave(&softirq_mask_lock, flags);
    softirq_vec[nr].data = data;
    softirq_vec[nr].action = action;

    for (i = 0; i < NR_CPUS; i++)
        softirq_mask(i) |= (1 << nr);
    spin_unlock_irqrestore(&softirq_mask_lock, flags);
}
The main job of open_softirq() is to add a softirq handler to the softirq_vec array. During system initialization, Linux registers handlers for two softirqs, TASKLET_SOFTIRQ and HI_SOFTIRQ:
void __init softirq_init()
{
    ...
    open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
    open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
}
Processing softirqs
Softirqs are processed by the do_softirq() function, whose code is as follows:
asmlinkage void do_softirq()
{
    int cpu = smp_processor_id();
    __u32 active, mask;

    if (in_interrupt())
        return;

    local_bh_disable();

    local_irq_disable();
    mask = softirq_mask(cpu);
    active = softirq_active(cpu) & mask;

    if (active) {
        struct softirq_action *h;

restart:
        softirq_active(cpu) &= ~active;

        local_irq_enable();

        h = softirq_vec;
        mask &= ~active;

        do {
            if (active & 1)
                h->action(h);
            h++;
            active >>= 1;
        } while (active);

        local_irq_disable();

        active = softirq_active(cpu);
        if ((active &= mask) != 0)
            goto retry;
    }

    local_bh_enable();
    return;

retry:
    goto restart;
}
As mentioned earlier, the softirq_vec array has 32 elements, each corresponding to one type of softirq. So how does Linux know which softirqs need to be executed? In Linux, each CPU has a variable of type irq_cpustat_t, which is defined as follows:
typedef struct {
    unsigned int __softirq_active;
    unsigned int __softirq_mask;
    ...
} irq_cpustat_t;
The __softirq_active field indicates which softirqs have been triggered (an unsigned int has 32 bits, and each bit represents one softirq), while the __softirq_mask field indicates which softirqs are masked. Linux uses the __softirq_active field to know which softirqs need to be executed: triggering a softirq simply means setting the corresponding bit to 1.
So do_softirq() first obtains the set of softirqs enabled on the current CPU via softirq_mask(cpu), then computes softirq_active(cpu) & mask to find which triggered softirqs may actually run. It clears those bits from __softirq_active and then walks the bits, calling the action function of each softirq whose bit was set.
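The bit walk at the heart of do_softirq() is easy to model in user space. The following self-contained demonstration (made-up handlers and mask values, not kernel code) shows how a 32-bit active mask selects entries from a handler table, using the same loop shape as do_softirq():
/* User-space demonstration of do_softirq()'s bit walk: for each set bit
 * in `active`, invoke the corresponding entry of a handler table. */
#include <stdio.h>

static void hi_action(int nr)      { printf("softirq %d: HI\n", nr); }
static void net_tx_action(int nr)  { printf("softirq %d: NET_TX\n", nr); }
static void net_rx_action(int nr)  { printf("softirq %d: NET_RX\n", nr); }
static void tasklet_action(int nr) { printf("softirq %d: TASKLET\n", nr); }

static void (*vec[32])(int) = { hi_action, net_tx_action,
                                net_rx_action, tasklet_action };

int main(void)
{
    unsigned int active = (1 << 0) | (1 << 3); /* HI and TASKLET pending */
    int nr = 0;

    do {                     /* same shape as the loop in do_softirq() */
        if (active & 1)
            vec[nr](nr);
        nr++;
        active >>= 1;
    } while (active);
    return 0;
}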
The tasklet mechanism
As mentioned earlier, the tasklet mechanism is built on the softirq mechanism: a tasklet is essentially a task on a queue that is executed via softirq. There are two kinds of tasklets in the Linux kernel, high-priority tasklets and normal tasklets. Their implementations are basically the same; the only difference is execution priority, with high-priority tasklets executed before normal ones.
Tasklets are essentially kept in queues; each queue is headed by a tasklet_head structure, and each CPU has its own queue. Let's look at the definitions of tasklet_head and tasklet_struct:
struct tasklet_head
{
    struct tasklet_struct *list;
};

struct tasklet_struct
{
    struct tasklet_struct *next;
    unsigned long state;
    atomic_t count;
    void (*func)(unsigned long);
    unsigned long data;
};
From these definitions we can see that a tasklet_head structure is the head of a queue of tasklet_struct structures, and that the func field of tasklet_struct points to the function that performs the actual task. Linux defines two kinds of tasklet queues, tasklet_vec and tasklet_hi_vec, as follows:
struct tasklet_head tasklet_vec[NR_CPUS];
struct tasklet_head tasklet_hi_vec[NR_CPUS];
As you can see, tasklet_vec and tasklet_hi_vec are both arrays, and the number of elements in each array equals the number of CPU cores; that is, each CPU core has a high-priority tasklet queue and a normal tasklet queue.
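A driver does not normally fill in a tasklet_struct by hand. In 2.4-era kernels, the DECLARE_TASKLET() macro from <linux/interrupt.h> defines and initializes one statically. The sketch below is illustrative: my_tasklet and my_tasklet_func are made-up names, and the 0 is an arbitrary data cookie:
#include <linux/interrupt.h>

/* Bottom-half work; `data` is the value given at declaration time. */
static void my_tasklet_func(unsigned long data)
{
    /* runs later via TASKLET_SOFTIRQ, so it may do the slower work
       that the interrupt handler itself should not do */
}

/* Defines a tasklet_struct named my_tasklet with func and data filled in. */
DECLARE_TASKLET(my_tasklet, my_tasklet_func, 0);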
Scheduling tasklets
If we have a tasklet that needs to run, a high-priority tasklet can be scheduled with the tasklet_hi_schedule() function and a normal tasklet with the tasklet_schedule() function. The two functions are basically the same, so we analyze only one of them:
static inline void tasklet_hi_schedule(struct tasklet_struct *t)
{
    if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
        int cpu = smp_processor_id();
        unsigned long flags;

        local_irq_save(flags);
        t->next = tasklet_hi_vec[cpu].list;
        tasklet_hi_vec[cpu].list = t;
        __cpu_raise_softirq(cpu, HI_SOFTIRQ);
        local_irq_restore(flags);
    }
}
The function's parameter is a pointer to the tasklet_struct structure that needs to run. tasklet_hi_schedule() first checks whether the tasklet has already been added to a queue; if not, it adds the tasklet to the tasklet_hi_vec queue and then calls __cpu_raise_softirq(cpu, HI_SOFTIRQ) to tell the softirq mechanism that a softirq of type HI_SOFTIRQ needs to be executed. Let's look at the implementation of __cpu_raise_softirq():
static inline void __cpu_raise_softirq(int cpu, int nr)
{
    softirq_active(cpu) |= (1 << nr);
}
As you can see, __cpu_raise_softirq() sets bit nr of the __softirq_active field in the irq_cpustat_t structure to 1. For tasklet_hi_schedule(), that means the HI_SOFTIRQ bit (bit 0) is set to 1.
As introduced before, Linux registers two softirqs during initialization, TASKLET_SOFTIRQ and HI_SOFTIRQ:
void __init softirq_init()
{
    ...
    open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
    open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
}
So when the HI_SOFTIRQ bit (bit 0) of the __softirq_active field in the irq_cpustat_t structure is set to 1, the softirq mechanism will execute the tasklet_hi_action() function. Let's look at its implementation:
static void tasklet_hi_action(struct softirq_action *a)
{
    int cpu = smp_processor_id();
    struct tasklet_struct *list;

    local_irq_disable();
    list = tasklet_hi_vec[cpu].list;
    tasklet_hi_vec[cpu].list = NULL;
    local_irq_enable();

    while (list != NULL) {
        struct tasklet_struct *t = list;

        list = list->next;

        if (tasklet_trylock(t)) {
            if (atomic_read(&t->count) == 0) {
                clear_bit(TASKLET_STATE_SCHED, &t->state);
                t->func(t->data); // call the tasklet handler
                tasklet_unlock(t);
                continue;
            }
            tasklet_unlock(t);
        }
        ...
    }
}
The tasklet_hi_action() function is very simple: it traverses the tasklet_hi_vec queue and executes each tasklet's handler function.
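Putting the two halves together: a common pattern is for the top half (the interrupt handler) to do only the urgent work and then schedule a tasklet for the rest; tasklet_schedule() is the normal-priority counterpart of the tasklet_hi_schedule() shown above. A minimal sketch, reusing the illustrative my_interrupt and my_tasklet names from earlier:
/* Top half: runs with the IRQ line blocked, so keep it short. */
static void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    /* urgent work only: read/acknowledge the device and stash the data */
    tasklet_schedule(&my_tasklet); /* mark the bottom half for execution */
}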