Synchronization operations on the ARM architecture

Publisher: Qinghua2022. Latest update time: 2016-06-23. Source: eefocus. Keywords: ARM.
When several processors or threads access a shared resource, access to the critical section must be synchronized so that only one of them is inside the critical section at any time.

When the shared resource is a single memory location, atomic operations are usually the most efficient way to synchronize access to it.

As applications have grown more complex and SMP has become widespread, processors have come to provide hardware synchronization primitives that support atomic updates of memory locations.

CISC processors such as IA32 can provide individual instructions that each perform a complete atomic operation, with the processor guaranteeing the atomicity of the whole read-modify-write sequence.

RISC processors are different: every operation other than Load and Store works only on registers.

The key design question is therefore how to make the sequence of loading a value from memory into a register, modifying it in the register, and writing it back to memory appear atomic.

Starting with the ARMv6 architecture, ARM processors provide exclusive-access synchronization primitives, consisting of two instructions:

LDREX
STREX

 

The LDREX and STREX instructions split an atomic update of a memory location into two steps.

Working together with the processor's built-in exclusive monitors, which track exclusive accesses, they allow memory to be updated atomically.

LDREX

LDREX is similar to the LDR instruction: it loads data from memory into a register.

Unlike LDR, it also initializes the exclusive monitor to record an exclusive access to that address.

LDREX R1, [R0]

The data at the memory address held in R0 is loaded into R1, and the exclusive monitor is updated.

STREX

The format of this instruction is:

STREX Rd, Rm, [Rn]

 

STREX decides whether to write the register value back to memory based on the state of the exclusive monitor.

If the exclusive monitor permits the write, STREX writes the value of register Rm to the memory address held in Rn and sets Rd to 0 to indicate success.

If the exclusive monitor does not permit the write, STREX abandons the write and sets Rd to 1 to indicate failure.

The application can determine whether the write-back succeeded from the value in Rd.

In this article, we first take the ARM implementation of atomic addition in the Linux kernel as an example of how to use these two instructions.

After that, we introduce some built-in functions provided by GCC. On ARM, these synchronization functions use the same two instructions to do their work.

atomic_add function in Linux Kernel

The following is the definition of the atomic_add function used in the Linux kernel, which atomically adds i to the atomic_t pointed to by v.

 1 static inline void atomic_add(int i, atomic_t *v)
 2 {
 3         unsigned long tmp;
 4         int result;
 5
 6         __asm__ __volatile__("@ atomic_add\n"
 7 "1:      ldrex   %0, [%3]\n"
 8 "        add     %0, %0, %4\n"
 9 "        strex   %1, %0, [%3]\n"
10 "        teq     %1, #0\n"
11 "        bne     1b"
12         : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
13         : "r" (&v->counter), "Ir" (i)
14         : "cc");
15 }

 

In line 7, LDREX loads the value of v->counter into a register and initializes the exclusive monitor.

In line 8, i is added to the value in this register.

In lines 9, 10, and 11, STREX tries to store the modified value back to the original address.

If STREX writes 0 to the %1 register, the atomic update succeeded and the function returns.

If the value in the %1 register is not 0, the exclusive monitor has rejected the store to the memory address.

Execution then jumps back to line 7 and the process repeats until the modified value is successfully written to memory.

The loop may run several times, but once it exits it is guaranteed that no other observer wrote to the memory address between the last LDREX and the successful STREX, so the final read-modify-write took effect atomically.
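For comparison, the same load, modify, conditional-store, retry shape can be sketched in portable C with a GCC compare-and-swap builtin. This is only an analogy and not the kernel's code: sketch_atomic_add and its plain int counter are hypothetical names chosen for this illustration.

```c
/* Hypothetical user-space analogue of the kernel's atomic_add():
 * read the counter, compute the new value, and publish it only if
 * the counter still holds the value we read; otherwise retry. */
void sketch_atomic_add(int i, volatile int *counter)
{
    int old, new_val;

    do {
        old = *counter;      /* plays the role of LDREX */
        new_val = old + i;   /* modify in a register */
        /* The builtin plays the role of STREX + TEQ + BNE: it fails,
         * and we loop, if *counter no longer equals old. */
    } while (!__sync_bool_compare_and_swap(counter, old, new_val));
}
```

One difference is worth keeping in mind: STREX fails after any intervening write to the location, whereas compare-and-swap only checks that the value is unchanged.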

The same pattern appears in this LDREX/STREX-based version of atomic_set:

static inline void atomic_set(atomic_t *v, int i)
{
    unsigned long tmp;

    __asm__ __volatile__("@ atomic_set\n"
"1:  ldrex   %0, [%1]\n"
"    strex   %0, %2, [%1]\n"
"    teq     %0, #0\n"
"    bne     1b"
    : "=&r" (tmp)
    : "r" (&v->counter), "r" (i)
    : "cc");
}

The inputs are v (the atomic variable) and i (the value to set); the compiler allocates a register for each. tmp first receives the value loaded by LDREX, which is discarded, and then the STREX status, which indicates whether the store succeeded.

GCC built-in atomic operation functions

After reading the GCC inline assembly above, do you feel a little dizzy?

In user space, GCC provides a family of built-in functions that give us atomic operations without hand-written assembly.

These functions all start with __sync and fall into the following categories:

type __sync_fetch_and_add (type *ptr, type value, ...)
type __sync_fetch_and_sub (type *ptr, type value, ...)
type __sync_fetch_and_or (type *ptr, type value, ...)
type __sync_fetch_and_and (type *ptr, type value, ...)
type __sync_fetch_and_xor (type *ptr, type value, ...)
type __sync_fetch_and_nand (type *ptr, type value, ...)

 

These functions perform the named operation on the memory location pointed to by ptr and return the value it held before the operation.

type __sync_add_and_fetch (type *ptr, type value, ...)
type __sync_sub_and_fetch (type *ptr, type value, ...)
type __sync_or_and_fetch (type *ptr, type value, ...)
type __sync_and_and_fetch (type *ptr, type value, ...)
type __sync_xor_and_fetch (type *ptr, type value, ...)
type __sync_nand_and_fetch (type *ptr, type value, ...)

These functions perform the named operation on the memory location pointed to by ptr and return the value it holds after the operation.
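The difference between the two families is easiest to see side by side. The following is a minimal sketch; next_ticket and release_ref are hypothetical helper names invented for this illustration, while the two __sync builtins are the ones listed above.

```c
/* Hypothetical helper: hands out ticket numbers 0, 1, 2, ...
 * __sync_fetch_and_add returns the value held BEFORE the addition. */
unsigned next_ticket(unsigned *counter)
{
    return __sync_fetch_and_add(counter, 1u);
}

/* Hypothetical helper: drops one reference and reports how many remain.
 * __sync_sub_and_fetch returns the value held AFTER the subtraction. */
int release_ref(int *refcount)
{
    return __sync_sub_and_fetch(refcount, 1);
}
```

A caller that needs to know whether it dropped the last reference can test release_ref(&count) == 0 directly, which is why the "after" form is convenient for reference counts.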

bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)

These two functions perform an atomic compare-and-swap on a variable.

That is, if the value stored at the memory address pointed to by ptr equals oldval, it is replaced with newval.

The bool version returns the result of the comparison: true if the values matched and newval was written, false otherwise.

The version that returns type returns the value stored at the address pointed to by ptr before the operation.
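As a rough illustration of how the two return conventions are used, here is a sketch with two hypothetical helpers (try_claim and inc_not_zero are names made up for this example, not GCC or kernel functions).

```c
#include <stdbool.h>

/* Hypothetical helper: claim a flag exactly once.
 * Returns true only for the caller that changed it from 0 to 1. */
bool try_claim(int *flag)
{
    return __sync_bool_compare_and_swap(flag, 0, 1);
}

/* Hypothetical helper: increment *v unless it is zero.
 * Returns true if the increment happened. */
bool inc_not_zero(int *v)
{
    int seen = *v;

    while (seen != 0) {
        /* prev is what *v held just before our attempt */
        int prev = __sync_val_compare_and_swap(v, seen, seen + 1);
        if (prev == seen)
            return true;   /* our swap went through */
        seen = prev;       /* someone else changed *v; retry with the new value */
    }
    return false;
}
```

The bool form suits one-shot decisions, while the value form saves a reload in retry loops because a failed attempt already reports the current contents.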

LDREX and STREX

Load and Store Register Exclusive.

Syntax

LDREX{cond} Rt, [Rn {, #offset}]
STREX{cond} Rd, Rt, [Rn {, #offset}]
LDREXB{cond} Rt, [Rn]
STREXB{cond} Rd, Rt, [Rn]
LDREXH{cond} Rt, [Rn]
STREXH{cond} Rd, Rt, [Rn]
LDREXD{cond} Rt, Rt2, [Rn]
STREXD{cond} Rd, Rt, Rt2, [Rn]

 

where:

cond
is an optional condition code (see Conditional Execution).

Rd
is the destination register for the returned status.

Rt
is the register to load or store.

Rt2
is the second register for doubleword loads or stores.

Rn
is the register on which the memory address is based.

offset
is an optional offset applied to the value in Rn. offset is permitted only in Thumb-2 instructions. If offset is omitted, an offset of 0 is assumed.

LDREX

LDREX loads data from memory.

  • If the physical address has the Shared TLB attribute, LDREX tags the physical address as exclusive access for the current processor, and clears any exclusive access tag this processor holds for any other physical address.

  • Otherwise, it tags the fact that the executing processor has an outstanding tagged physical address.

STREX

STREX performs a conditional store to memory. The conditions are as follows:

  • If the physical address does not have the Shared TLB attribute, and the executing processor has an outstanding tagged physical address, the store takes place, the tag is cleared, and the value 0 is returned in Rd.

  • If the physical address does not have the Shared TLB attribute, and the executing processor does not have an outstanding tagged physical address, the store does not take place, and the value 1 is returned in Rd.

  • If the physical address has the Shared TLB attribute, and the physical address is tagged as exclusive access for the executing processor, the store takes place, the tag is cleared, and the value 0 is returned in Rd.

  • If the physical address has the Shared TLB attribute, and the physical address is not tagged as exclusive access for the executing processor, the store does not take place, and the value 1 is returned in Rd.
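The rules above can be mimicked with a toy software model of a single processor's monitor. This is only a sketch for building intuition: the monitor is hardware, the model ignores the Shared attribute and other processors, and every name in it (toy_monitor, toy_ldrex, toy_clear, toy_strex) is hypothetical.

```c
#include <stdbool.h>

/* Toy software model of one processor's exclusive monitor.
 * All names here are hypothetical; real monitors live in hardware. */
struct toy_monitor {
    bool tagged;   /* true between a load-exclusive and the next store-exclusive */
};

/* Models LDREX: load the value and set the tag. */
int toy_ldrex(struct toy_monitor *m, const int *addr)
{
    m->tagged = true;
    return *addr;
}

/* Models any event that clears the tag before the store is attempted. */
void toy_clear(struct toy_monitor *m)
{
    m->tagged = false;
}

/* Models STREX: stores and returns 0 if the tag is set,
 * returns 1 and leaves memory untouched otherwise.
 * The tag is clear afterwards in both cases. */
int toy_strex(struct toy_monitor *m, int *addr, int value)
{
    if (!m->tagged)
        return 1;
    *addr = value;
    m->tagged = false;
    return 0;
}
```

In this model a toy_strex without a preceding toy_ldrex fails, a matched pair succeeds once, and a pair separated by toy_clear fails, which is exactly the situation the retry loops in the kernel code are written to handle.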

Restrictions

r15 cannot be used for any of Rd, Rt, Rt2, or Rn.

For STREX, Rd must not be the same register as Rt, Rt2, or Rn.

For ARM instructions:

  • Rt must be an even-numbered register and cannot be r14

  • Rt2 must be R(t+1)

  • Offset is not allowed.

For Thumb instructions:

  • r13 cannot be used for any of Rd, Rt, or Rt2

  • For LDREXD, Rt and Rt2 cannot be the same register

  • The value of offset can be any multiple of 4 in the range 0-1020.

Usage

LDREX and STREX can be used to implement interprocess communication in multiple-processor and shared-memory systems.

For performance reasons, keep the number of instructions between corresponding LDREX instructions and STREX instructions to a minimum.

Note

The address used in a STREX instruction must be the same as the address used by the most recently executed LDREX instruction. If a different address is used, the result of the STREX instruction is unpredictable.

Architecture

ARM LDREX and STREX are available in ARMv6 and above.

ARM LDREXB, LDREXH, LDREXD, STREXB, STREXD, and STREXH are available in ARMv6K and later.

All of these 32-bit Thumb instructions are available in ARMv6T2 and above, except for LDREXD and STREXD, which are not available in the ARMv7-M architecture.

There are no 16-bit versions of these instructions.

Example

    MOV r1, #0x1 ; load the 'lock taken' value
try
    LDREX r0, [LockAddr] ; load the lock value
    CMP r0, #0 ; is the lock free?
    STREXEQ r0, r1, [LockAddr] ; try and claim the lock
    CMPEQ r0, #0 ; did this succeed?
    BNE try ; no – try again
    .... ; yes – we have the lock
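A user-space counterpart of this lock can be sketched with two more GCC builtins, __sync_lock_test_and_set and __sync_lock_release. The type and function names below (sketch_lock_t, sketch_trylock, sketch_lock, sketch_unlock) are hypothetical. Note one difference from the assembly: the builtin always writes 1 and reports the previous value, whereas the assembly attempts the store only when the lock reads as free; the resulting lock behaviour is the same.

```c
/* Hypothetical spinlock type: 0 means free, 1 means taken. */
typedef volatile int sketch_lock_t;

/* Try once; returns 1 if we took the lock, 0 if it was already held. */
int sketch_trylock(sketch_lock_t *lock)
{
    /* Atomically writes 1 and returns the previous value. */
    return __sync_lock_test_and_set(lock, 1) == 0;
}

/* Spin until the lock is ours (the "BNE try" loop of the assembly). */
void sketch_lock(sketch_lock_t *lock)
{
    while (!sketch_trylock(lock))
        ;   /* busy-wait */
}

/* Release the lock by atomically writing 0. */
void sketch_unlock(sketch_lock_t *lock)
{
    __sync_lock_release(lock);
}
```

According to the GCC documentation, __sync_lock_test_and_set acts as an acquire barrier and __sync_lock_release as a release barrier, so code between sketch_lock and sketch_unlock is ordered with respect to the lock.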

http://lxr.free-electrons.com/source/arch/arm/include/asm/atomic.h?v=2.6.33

/*
* arch/arm/include/asm/atomic.h
*
* Copyright (C) 1996 Russell King.
* Copyright (C) 2002 Deep Blue Solutions Ltd.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#ifndef __ASM_ARM_ATOMIC_H
#define __ASM_ARM_ATOMIC_H

#include 
#include 
#include 

#define ATOMIC_INIT(i) { (i) }

#ifdef __KERNEL__

/*
* On ARM, ordinary assignment (str instruction) doesn't clear the local
* strex/ldrex monitor on some implementations. The reason we can use it for
* atomic_set() is the clrex or dummy strex done on every exception return.
*/
#define atomic_read(v) ((v)->counter)
#define atomic_set(v,i) (((v)->counter) = (i))

#if __LINUX_ARM_ARCH__ >= 6

/*
* ARMv6 UP and SMP safe atomic ops. We use load exclusive and store exclusive to ensure that these are atomic.  
* We may loop to ensure that the update happens.
*/
static inline void atomic_add(int i, atomic_t *v)
{
       unsigned long tmp;
       int result;

       __asm__ __volatile__("@ atomic_add\n"
"1:     ldrex   %0, [%2]\n"
"       add     %0, %0, %3\n"
"       strex   %1, %0, [%2]\n"
"       teq     %1, #0\n"
"       bne     1b"
       : "=&r" (result), "=&r" (tmp)
       : "r" (&v->counter), "Ir" (i)
       : "cc");
}

static inline int atomic_add_return(int i, atomic_t *v)
{
       unsigned long tmp;
       int result;

       smp_mb();

       __asm__ __volatile__("@ atomic_add_return\n"
"1:     ldrex   %0, [%2]\n"
"       add     %0, %0, %3\n"
"       strex   %1, %0, [%2]\n"
"       teq     %1, #0\n"
"       bne     1b"
       : "=&r" (result), "=&r" (tmp)
       : "r" (&v->counter), "Ir" (i)
       : "cc");

       smp_mb();

       return result;
}

static inline void atomic_sub(int i, atomic_t *v)
{
       unsigned long tmp;
       int result;

       __asm__ __volatile__("@ atomic_sub\n"
"1:     ldrex   %0, [%2]\n"
"       sub     %0, %0, %3\n"
"       strex   %1, %0, [%2]\n"
"       teq     %1, #0\n"
"       bne     1b"
       : "=&r" (result), "=&r" (tmp)
       : "r" (&v->counter), "Ir" (i)
       : "cc");
}

static inline int atomic_sub_return(int i, atomic_t *v)
{
       unsigned long tmp;
       int result;

       smp_mb();

       __asm__ __volatile__("@ atomic_sub_return\n"
"1:     ldrex   %0, [%2]\n"
"       sub     %0, %0, %3\n"
"       strex   %1, %0, [%2]\n"
"       teq     %1, #0\n"
"       bne     1b"
       : "=&r" (result), "=&r" (tmp)
       : "r" (&v->counter), "Ir" (i)
       : "cc");

       smp_mb();

       return result;
}

static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
{
       unsigned long oldval, res;

       smp_mb();

       do {
               __asm__ __volatile__("@ atomic_cmpxchg\n"
               "ldrex %1, [%2]\n"
               "mov %0, #0\n"
               "teq %1, %3\n"
               "strexeq %0, %4, [%2]\n"
                   : "=&r" (res), "=&r" (oldval)
                   : "r" (&ptr->counter), "Ir" (old), "r" (new)
                   : "cc");
       } while (res);

       smp_mb();

       return oldval;
}

static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
{
       unsigned long tmp, tmp2;

       __asm__ __volatile__("@ atomic_clear_mask\n"
"1:     ldrex   %0, [%2]\n"
"       bic     %0, %0, %3\n"
"       strex   %1, %0, [%2]\n"
"       teq     %1, #0\n"
"       bne     1b"
       : "=&r" (tmp), "=&r" (tmp2)
       : "r" (addr), "Ir" (mask)
       : "cc");
}

#else /* ARM_ARCH_6 */

#ifdef CONFIG_SMP
#error SMP not supported on pre-ARMv6 CPUs
#endif

static inline int atomic_add_return(int i, atomic_t *v)
{
       unsigned long flags;
       int val;

       raw_local_irq_save(flags);
       val = v->counter;
       v->counter = val += i;
       raw_local_irq_restore(flags);

       return val;
}
#define atomic_add(i, v) (void) atomic_add_return(i, v)

static inline int atomic_sub_return(int i, atomic_t *v)
{
       unsigned long flags;
       int val;

       raw_local_irq_save(flags);
       val = v->counter;
       v->counter = val -= i;
       raw_local_irq_restore(flags);

       return val;
}
#define atomic_sub(i, v) (void) atomic_sub_return(i, v)

static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
{
       int ret;
       unsigned long flags;

       raw_local_irq_save(flags);
       ret = v->counter;
       if (likely(ret == old))
               v->counter = new;
       raw_local_irq_restore(flags);

       return ret;
}

static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
{
       unsigned long flags;

       raw_local_irq_save(flags);
       *addr &= ~mask;
       raw_local_irq_restore(flags);
}

#endif /* __LINUX_ARM_ARCH__ */

#define atomic_xchg(v, new) (xchg(&((v)->counter), new))

static inline int atomic_add_unless(atomic_t *v, int a, int u)
{
       int c, old;

       c = atomic_read(v);
       while (c != u && (old = atomic_cmpxchg((v), c, c + a)) != c)
               c = old;
       return c != u;
}
#define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)

#define atomic_inc(v) atomic_add(1, v)
#define atomic_dec(v) atomic_sub(1, v)

#define atomic_inc_and_test(v) (atomic_add_return(1, v) == 0)
#define atomic_dec_and_test(v) (atomic_sub_return(1, v) == 0)
#define atomic_inc_return(v) (atomic_add_return(1, v))
#define atomic_dec_return(v) (atomic_sub_return(1, v))
#define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)

#define atomic_add_negative(i,v) (atomic_add_return(i, v) < 0)

#define smp_mb__before_atomic_dec() smp_mb()
#define smp_mb__after_atomic_dec() smp_mb()
#define smp_mb__before_atomic_inc() smp_mb()
#define smp_mb__after_atomic_inc() smp_mb()

#include 
#endif
#endif
 

 

