Embedded system design examples to solve software and hardware interface problems-EEWORLD


In embedded system design, hardware/software interface issues often trouble software development engineers. Correctly understanding the constraints that the processor and the high-level language development environment place on the interface can accelerate the entire system design, and it helps improve the quality, performance, and reliability of the system while shortening the development cycle and reducing costs. This article starts with a comparison of two design examples, then introduces the design principles of embedded systems and various considerations about registers and their fields.

Embedded system design is usually divided into two parts: hardware design and software development. These two tasks are usually handled by different design teams, and there is little overlap between them. Because the software team rarely takes part in the preceding hardware design, problems often arise during development, especially when the hardware interfaces poorly with the software development environment. This extends system development time, increases development costs, and ultimately delays product launch.

The ideal solution is for the software team to participate in hardware design, but this is often impractical in terms of time, funding, and personnel. A workaround is to create a set of hardware interface specifications to speed up the software development process. Understanding the optimal hardware interface design from the perspective of software developers can effectively prevent unnecessary hardware problems from occurring during software development, and this approach has little impact on the hardware design process.

General Model of Embedded System Architecture

From a system perspective, an embedded system is a collection of many interfaces between various system elements. The main resource listed here is the system processor. Processor interfaces can be divided into two categories, identified as local buses and hardware buses. It is worth noting that the bus in this article is defined separately according to the access type when the processor uses resources, and has no corresponding relationship with the specific hardware connection.

The local bus is an interface bus between resources and the processor that allows unrestricted continuous access. Unrestricted access means that the processor can access all elements of a resource using its internal data types (such as bytes, words, and double words); continuous access means that the resource address space occupied by all resource elements is continuous without any gaps in between. RAM and EPROM are common examples of interfaces with local buses.

The connection between a hardware bus and a resource usually has certain restrictions, such as size, location, addressing, address space, or relocation. I/O ports that only accept word writes, or peripheral chips on the PCI bus that must be mapped before use, are examples of hardware bus interfaces. A hardware bus connection restricts how the software design engineer can access resources, which can lead to complex code and coding errors during software design, development, and integration.

Correct hardware bus interface design can speed up the software design process and usually speed up hardware verification. This article focuses on the design and implementation of the hardware bus connected to programmable logic resources.

Example system definition

Two different hardware implementations are considered here. The system is a processor-controlled three-axis servo system, and the system design in this section is limited to the design of position feedback control, so it helps us focus on the implementation of the hardware interface.

Both implementations of the system provide an interface between the processor and the user ASIC (or FPGA) to supply drive and feedback information for the three-axis servo. The ASIC in each system must use a 32-bit data bus to connect the processor to three sets of drive/feedback resources. Each resource contains a signed 10-bit drive register, a signed 8-bit position register, and a 3-bit error status register. Any set error bit indicates an error state, which generates a shutdown message for the axis drive.

Figures 1 and 2 show possible implementations of the register interface, labeled system implementation A and system implementation B respectively. For convenience, the following text refers to these two implementations as System A and System B.

When implemented in VHDL (or other high-level hardware design methods), the design complexity of these two hardware interfaces is almost equal. System A appears to be slightly more efficient because its register address decoding is relatively simple and the amount of hardware used is less than that of System B. In order to reduce the number of logic units in the programmable device that interfaces with the processor, most hardware design engineers will choose the implementation of System A.

The pseudocode shown in Table 1 is an axis driver program that can be used in both Systems A and B. The pseudocode is designed for a system implementation based on an advanced processor running a real-time operating system, in which three independent copies (task instances) of a general axis control program implement axis control. The lines of code marked with asterisks are required only when using the interface defined in System A.

It is clear that even at the code prototype stage, System B requires much less code than System A. The hardware design in System B is slightly more complex, but it reduces the burden of software development. The two example systems and the pseudocode will be reviewed later.

When reading this article, a hardware designer might ask, “Why is the first design less efficient than the second?” The parameters controlling the axis operation are the same for both implementations, and the first method requires significantly less programmable hardware than the second. To answer this question properly, the designer must view the design from a system perspective, rather than from the “logic gate” perspective that hardware designers are accustomed to. The following sections describe some of the concepts that hardware designers encounter when developing system hardware interfaces, and then examine the results of applying these concepts to the example system designs.

In order to meet the project requirements, a compromise needs to be made between hardware and software implementation when optimizing the entire system structure. In reality, no project can meet all the ideal software interface requirements mentioned here. Understanding the ideal state can help hardware design engineers identify and eliminate some obstacles that affect software design.

Design principles

1. Use standard bus access

The general principle of effective embedded hardware interface design is that the hardware design should ensure that access to hardware resources is as transparent as possible to the software design engineer. The processor can achieve transparent access using all standard read and write instructions without considering the previous access content or timing.

Mechanisms such as page registers and write data encoded on address lines can seriously affect code development, and they often require drivers that convert between standard accesses and the special accesses the hardware needs.

It is usually inevitable to use some special buses, but the choice of using special access space needs to be carefully considered, because this situation will bring certain difficulties to the system software design. System A uses write-only registers, so the system software is required to provide "shadow" memory to save the data written to the resource. System B does not have this restriction because it allows all registers to be read and written.

2. Developing a processor-based resource interface

Hardware design engineers are accustomed to analyzing resource interfaces and their connections to the system bus from the bottom up. It is better to start from the processor and analyze how it accesses each resource in the system.

The interface between the processor and resources is often the most important interface, and its efficiency should be the top priority in the hardware design process. Unified planning of resource access throughout the system is important for correctly understanding the access restrictions caused by hardware design choices.

State-of-the-art systems contain memory controllers and remappable buses that change the type of access between the processor and the resource interface. Generally speaking, a poor hardware interface design does not show up until the software team tries to use the actual resource, so these access paths need to be considered while the hardware interface is being designed.

3. Creation and maintenance of system memory map

A memory map of all resources is very important for a good system design. As mentioned earlier, the memory map should be designed with specific processor requirements in mind, rather than simply stating which address lines a resource decodes. If register-configurable resources are used, such as devices on the PCI bus, the hardware designer should include all configuration registers associated with the resource in the memory map and provide initialization values for them, to create the static map required for hardware verification.

Hardware designers must also carefully consider the advantages of dynamic reconfiguration. Systems that do not add (or remove) resources on the reconfigurable bus can be reduced to a static mapping by forcing the configuration registers to revert to the same value after a system reset. This "static" system map provides a stable, unified structure for hardware integration and software development while also avoiding the use of error-prone pointer operations in system code.
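As a minimal sketch of what such a "static" map can look like to software, the fragment below expresses resource locations as compile-time constants, so system code computes a register address with plain arithmetic instead of discovering it at run time. Every name and address here (`MAP_AXIS_BASE`, `MAP_AXIS_STRIDE`, `map_axis_address`) is a hypothetical value chosen for illustration, not part of the example systems.

```c
#include <stdint.h>

/* Hypothetical static memory map. All values are illustrative
   assumptions, not taken from System A or System B. */
#define MAP_AXIS_BASE   0x40000000u  /* first axis register group */
#define MAP_AXIS_STRIDE 0x10u        /* bytes per axis register group */

/* Address of the register group for a given axis (0, 1, or 2).
   Because the map never changes after reset, this is pure arithmetic. */
static inline uint32_t map_axis_address(unsigned axis)
{
    return MAP_AXIS_BASE + axis * MAP_AXIS_STRIDE;
}
```

With a map like this, the same constants can be shared by the hardware verification scripts and the system code, which is one way to keep the two teams working from the same definition.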

Finally, as the system matures, the memory map must be refined and improved as hardware and software development progresses.

4. Unified access mode

Due to the increasing complexity of current embedded systems, they are usually designed by multiple people working together. The design of each hardware component must be consistent with the whole so that a unified resource access pattern can be developed. If the access to different functional modules is inconsistent, potential access restriction errors will occur during software development, which may require the design of a dedicated software driver for each subsystem. Inconsistent access to different logic blocks will also make hardware integration and verification difficult.

For example, a designer editing four hexadecimal digits on a debugger does not guarantee that the processor will use a 16-bit read/write cycle. Therefore, setting multiple types of restricted access using debug tools in software development and hardware integration is also difficult. In this regard, it is very useful to evaluate the ability of the simulator to handle multiple restricted access address spaces, especially in processor architectures where "out of restriction" accesses can trigger bus faults.

Register design

Now that the focus of hardware design engineers has shifted from logic gates and buses to system design, let's take a look at the most commonly used register design in any processor system. The register interface allows high-speed access to resources, and the efficiency of its access has a great impact on the performance of the system.

Designers should choose the hardware register size carefully so that the processor can access the hardware as efficiently as possible. Generally speaking, registers should match the processor's native integer size. Registers should be decoded as contiguous groups (no address gaps) to speed up pointer or array-index access to them. Any writable register should also be readable in the same format, so that software does not have to cache register values in local memory.
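A minimal C sketch of these guidelines follows. It assumes a hypothetical layout in which each axis owns a contiguous group of native 32-bit read/write registers; the structure `axis_regs_t`, the padding register, and the RAM array that stands in for the device are illustrative assumptions, not the register map shown in the figures.

```c
#include <stdint.h>

#define NUM_AXES 3

/* Hypothetical per-axis register group: native 32-bit registers,
   contiguous, all readable and writable. */
typedef struct {
    volatile int32_t  drive;     /* signed drive value */
    volatile int32_t  position;  /* signed position feedback */
    volatile uint32_t status;    /* error status bits */
    volatile uint32_t reserved;  /* assumed padding: 16 bytes per axis */
} axis_regs_t;

/* On real hardware this pointer would hold a fixed address from the
   memory map; here an array in RAM simulates the device. */
static axis_regs_t simulated_device[NUM_AXES];
static axis_regs_t *const axis_regs = simulated_device;

/* One generic routine serves every axis through an array index. */
void axis_set_drive(unsigned axis, int32_t drive)
{
    axis_regs[axis].drive = drive;
}

int32_t axis_get_position(unsigned axis)
{
    return axis_regs[axis].position;
}
```

Because the groups are contiguous and identically laid out, the axis number is the only thing that distinguishes one task instance from another.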

Registers that control a subsystem should be grouped together in the same structure so that software can access them using a common driver. This is especially important when multiple subsystems of the same type are required in the design.

To avoid conflicts between software tasks coded as independent processes, independent subsystems should not share writable registers. Otherwise these "independent" software processes will compete for access to the shared registers unless uninterruptible read/write drivers are used in the system code. Depending on the operating system, sharing registers among multiple processes may even incur the additional overhead of function calls. Accessing a shared register while another process is also using it is a common software design mistake that can cause intermittent system failures and slow down the integration and testing of system software.

System A violates many of the principles mentioned above, such as using write-only registers, sharing control and status registers, and not providing a common register map for each axis. System A must use a dedicated driver to buffer write output data, shift and mask axis drive and position information, and prevent the contents of axis drive registers from being affected by code written for each axis task. System B overcomes these problems by separating and reorganizing the registers associated with each axis.

Hardware designers should carefully consider the reset state of the system. Hardware designs often rely on a boot program to take control of the system after it starts up and to initialize it to a safe state. However, the hardware itself should enter a defined safe state when the system is reset and should remain in that state until the system software has initialized. The code should also be able to reset the hardware under software control, to aid debugging, self-testing, and initial code development.

System A does not control the reset contents of the drive registers, so code must intervene to set the drive registers for all three axes to zero. This creates a serious system design problem, because the processor is usually held in reset until the FPGA or ASIC is powered up and configured, and it then needs additional time to boot before it can write the registers. If the developer uses an emulator, System A has another problem during integration: the processor controlled by the emulator may require a long initialization time after power-up before it can work properly. In both cases the axes of System A are in random drive states until the software takes control.

System B sets all axis drive registers to zero after power-up, and its control of axis drive settings does not depend on the startup time. Because System B has no hidden state machine, it is not necessary to consider adding additional software reset registers in this design.

Register field design

Most resource interfaces contain data items that do not fit neatly into a register. In this case, the hardware designer must divide a register into several fields. A proper field structure is very important for system performance, and its effects are similar to those of register interface design. The rules for effective field interface design are similar to those for register design, but designers also need to pay special attention to the order and placement of fields and to the handling of unused bits in registers.

1. Register fields

A field is defined as a subset of bits in a register that is used to report or control a functional element of a resource. The most common field types used in hardware design are: 1. Boolean field: true or false, usually one bit; 2. Multi-bit status and control fields: several bits, each of which reports or controls a related internal function; 3. Enumerated status and control fields: a group of bits whose combined value selects one of several distinct hardware states; 4. Numeric field: several bits combined to represent a quantitative value.

From the perspective of software users, the most efficient field structure is a single field per register. This ideal software structure may lead to an inefficient hardware implementation, so a good system design requires a compromise between software and hardware, and in practice multiple fields are often placed in each register.

The following discussion assumes that a register contains multiple fields. However, hardware designers should still consider using a single-field register when efficient access to a particular parameter of a resource seriously affects system software performance.

2. Field structure

The structural concepts mentioned above for registers also apply to the fields within registers. A register should only contain fields that belong to the same functional element in the design, and all writable fields in the register should be readable.

Registers that contain fields belonging to multiple functional elements also require special driver support so that multiple processes can safely access each field. Fields configured as "write-only" functions require shadow memory to save the previous state value in the register field. The simple "mask/write" operation originally envisioned by the hardware design engineer has now become a complicated multi-step function call, which must first disable interrupts and task switching, then read local memory, mask input and output values, then perform hardware register writes, and finally enable interrupts and multi-task switching. This can be effectively avoided if all fields in the register can be effectively arranged and all fields can be accessed by a software task.
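The multi-step sequence described above can be sketched as follows. This is only an illustration under stated assumptions: `hw_shared_reg` simulates a write-only register shared by several tasks, and `lock_interrupts` and `unlock_interrupts` are hypothetical placeholders for whatever interrupt and task-switch locking primitives the operating system actually provides.

```c
#include <stdint.h>

/* Hypothetical stand-ins: a shared write-only register, its shadow copy
   in RAM, and placeholder lock primitives that only count nesting. */
static volatile uint32_t hw_shared_reg;
static uint32_t shared_shadow;
static int lock_depth;

static void lock_interrupts(void)   { lock_depth++; }
static void unlock_interrupts(void) { lock_depth--; }

/* Write one field of a shared, write-only register.
   width_mask is the field mask before shifting, e.g. 0x3FF for 10 bits. */
void shared_field_write(unsigned shift, uint32_t width_mask, uint32_t value)
{
    uint32_t field_mask = width_mask << shift;
    uint32_t reg;

    lock_interrupts();                      /* 1. block interrupts and task switches */
    reg = shared_shadow;                    /* 2. read the local (shadow) copy */
    reg = (reg & ~field_mask)               /* 3. clear the old field value */
        | ((value << shift) & field_mask);  /*    and merge in the new one */
    shared_shadow = reg;                    /*    keep the shadow current */
    hw_shared_reg = reg;                    /* 4. write the hardware register */
    unlock_interrupts();                    /* 5. re-enable interrupts and switching */
}
```

With a readable register owned by a single task, the same update reduces to an ordinary assignment or a single read-modify-write, with no shadow variable and no locking.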

Since system A combines multiple fields belonging to unrelated functions into one register, it requires a special driver. System B follows the principle of "fields within a single register are organized by task" and places each field in its own dedicated register, thus efficiently accessing each axis parameter in the resource.

3. Hexadecimal digit alignment

Hardware designers should also understand the alignment constraints of the processor and the software development environment. Placing a field at the wrong address, or across a word boundary, forces software designers to access the field in pieces, which increases access complexity and slows access down. For debugging, it is very useful to pad fields with zero bits so that the lowest bit of each field is aligned to the boundary of a hexadecimal digit (4 bits): when register contents are displayed on a logic analyzer, debugger, or emulator, hexadecimal digit alignment makes it easy to pick out field values by eye. The register fields of System A are not aligned, so it is difficult to extract field values from the raw hexadecimal data. Because the control fields are not aligned, it is also difficult to enter masked test values when debugging. All fields of System B are aligned to even hexadecimal digits, so the state of each field can easily be determined from a register read, and a field can easily be set to a specified value.
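The effect of hexadecimal digit alignment is easy to see with a small experiment. The sketch below packs a 10-bit drive value and a 3-bit status value in two hypothetical ways; the bit positions are assumptions chosen for illustration and are not the layouts of System A or System B.

```c
#include <stdint.h>

/* Hypothetical packing 1: fields placed back to back
   (status in bits 2..0, drive in bits 12..3). */
uint32_t pack_unaligned(uint32_t drive10, uint32_t status3)
{
    return ((drive10 & 0x3FFu) << 3) | (status3 & 0x7u);
}

/* Hypothetical packing 2: each field padded with zero bits so that it
   starts on a 4-bit boundary (status in bits 3..0, drive in bits 15..4). */
uint32_t pack_hex_aligned(uint32_t drive10, uint32_t status3)
{
    return ((drive10 & 0x3FFu) << 4) | (status3 & 0x7u);
}
```

For a drive value of 0x155 and a status of 0x5, the back-to-back packing reads as 0x0AAD on a debugger, and neither field is recognizable. The aligned packing reads as 0x1555, where the upper three digits are the drive value and the last digit is the status.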

4. Field placement and order

The placement of fields within registers can also have a significant impact on the efficiency of software implementations. Boolean and multi-bit fields are generally position-independent, but enumerated and numeric fields are usually accessed most efficiently when they occupy the least significant bits (LSBs) of a register (how the LSB is numbered depends on the processor type; bit 0 is not necessarily the LSB). Placing a field in the LSBs of a register eliminates the shift operations that otherwise accompany masking, and it also makes the field value easier to identify when the register is inspected visually with test equipment or a debugger.

The field values for axis 2 and axis 3 in System A must be masked and shifted by software before use. System B, on the other hand, places all numeric fields in the LSBs of their registers, allowing more efficient access. System B is also easier to integrate, because the raw hexadecimal data of the resource registers can be read directly as the correct field values.

5. Unused data bits

Unused bits in registers also affect the efficiency of software implementations. All unused bits should read back as zero and should be writable without special handling, to avoid unnecessary masking and clearing operations. The only exception to this rule is a register that contains a two's complement numeric field whose remaining most significant bits (MSBs) are unused. In this case, it is useful to have the hardware sign-extend the MSB of the field into the unused bits. Numeric fields extended in this way can be used directly by the processor as signed values, with no software sign extension. When the speed of access to a particular numeric field seriously affects overall system performance, it is worth combining this technique with the "one field per register" approach. Since no masking or sign extension is required, such fields can be accessed directly with native data accesses.

When a field value is extracted from a register in System A, the software must sign-extend each numeric field value, while System B allows direct access to the field value through a native integer access to the register.
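The difference in software effort can be sketched in C. The first function performs the shift, mask, and sign extension that a packed layout pushes onto software; the second shows that a register whose field sits in the LSBs and is sign-extended by hardware is already a valid native integer. The function names and the 10-bit width are illustrative assumptions, and the final cast relies on the two's complement conversion behavior of common compilers.

```c
#include <stdint.h>

/* Extract a signed 10-bit field that starts at bit position `shift`
   of a packed register value. */
int32_t extract_signed10(uint32_t reg, unsigned shift)
{
    uint32_t raw = (reg >> shift) & 0x3FFu;  /* shift down, then mask */
    if (raw & 0x200u) {                      /* is the field's sign bit set? */
        raw |= ~0x3FFu;                      /* software sign extension */
    }
    return (int32_t)raw;                     /* two's complement conversion assumed */
}

/* Field in the LSBs, sign-extended into the unused MSBs by hardware:
   the register value can be used as a signed integer as it is. */
int32_t read_sign_extended(uint32_t reg)
{
    return (int32_t)reg;
}
```

For example, a field holding -512 in a packed register needs the full extraction path, whereas the hardware-extended register value 0xFFFFFE00 is -512 as read.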

6. Field type selection

The correct choice of field type can also greatly improve software implementation efficiency. Boolean fields are most effective when turning individual resource functions on or off. Note that single-bit fields are easy to encode only when the register is readable and writable. If hardware registers have restricted access to the field, a dedicated buffer (and possibly a dedicated driver) is required to store the current contents. Restricted access also limits the use of some programming constructs, such as bit fields, which affects the readability of system code and does not help reduce programming errors.
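When the register is readable and writable, a Boolean field needs nothing more than explicit masks, which also sidesteps C bit fields. The sketch below assumes a hypothetical control register `hw_ctrl_reg`, simulated by a variable, and a hypothetical enable bit `AXIS_ENABLE_BIT`.

```c
#include <stdbool.h>
#include <stdint.h>

#define AXIS_ENABLE_BIT (1u << 0)      /* hypothetical Boolean field */

static volatile uint32_t hw_ctrl_reg;  /* simulated read/write register */

/* Each operation is a single read-modify-write or read of the register;
   no buffer or shadow copy is needed because the register reads back. */
void ctrl_set(uint32_t bit)   { hw_ctrl_reg |= bit; }
void ctrl_clear(uint32_t bit) { hw_ctrl_reg &= ~bit; }
bool ctrl_test(uint32_t bit)  { return (hw_ctrl_reg & bit) != 0; }
```

If the register were write-only, each of these one-line helpers would need a RAM copy of the last written value, which is the dedicated buffer the text refers to.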

Numeric fields are useful when the data expressing the state of a resource needs to occupy a certain range of values. When a field can hold both positive and negative values, signed expressions usually require more software work. Also, avoid encoding other data in a numeric field (such as using the field sign to represent an unrelated resource state).

From the perspective of hardware implementation, multi-bit fields are more efficient, but they increase the complexity of the system code. Enumerated fields usually reflect more accurately which related functions of a resource are actually available, and they can effectively prevent conflicting functions from being selected together (such as switching memory blocks to local buses). Enumerated fields should also provide a value that unconditionally "parks" the resource between switchovers, so that system software can always switch in a "break before make" fashion.
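As an illustration of an enumerated control field, the sketch below assumes a hypothetical 2-bit field that selects which single source is connected to a shared memory block. The names (`mem_select_t`, `hw_select_reg`) and encodings are assumptions made for this example only.

```c
#include <stdint.h>

/* Hypothetical 2-bit enumerated field: exactly one selection can be
   encoded at a time, so conflicting selections cannot be written. */
typedef enum {
    MEM_SEL_PARKED = 0,  /* no source connected */
    MEM_SEL_LOCAL  = 1,  /* local bus owns the block */
    MEM_SEL_REMOTE = 2   /* remote source owns the block */
} mem_select_t;

static volatile uint32_t hw_select_reg;  /* simulated; field in bits 1..0 */

/* Switch sources by passing through the parked state first. */
void mem_select(mem_select_t next)
{
    hw_select_reg = MEM_SEL_PARKED;   /* break the old connection ... */
    hw_select_reg = (uint32_t)next;   /* ... before making the new one */
}
```

With two independent enable bits instead of one enumerated field, nothing would stop software from setting both bits at once, which is the kind of conflict an enumerated encoding rules out.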

The "write-only" access to the axis drive fields in System A makes software access to the target field very inefficient, and RAM must be used to hold the previous contents of the other axes, which must not be modified by the write. In System B, each register contains only one field and allows both reads and writes, so this problem does not arise.

Performance evaluation of the example system

To evaluate the performance of the final system software, the pseudocode in Listing 1 was converted into C code for both Systems A and B. The hardware interface of each system was then simulated using a structure in internal memory. Bit fields were avoided in the code because standard C bit-field implementations cannot be relied on to work correctly in an address space with restricted access. The system code was run on a PowerPC; the compiler tool was Green Hills MultiC, the target operating system was VxWorks, and the compiler was set to a medium optimization level (to aid debugging and to allow design engineers to associate each assembly instruction with a line of C code).

Table 1 lists each line of the pseudocode and gives the number of assembly instructions and function calls used by each system implementation. The execution speed of the two implementations was also measured. The axis update subroutine for System B runs 5.3 times faster than that for System A, mainly because the task blocking and unblocking function calls are removed. Note that the speedup may be less pronounced in a real system, because actual hardware access time has the greatest impact on total execution time.

In the experiment, we also raised the optimization level of the compiler for both implementations. Raising the optimization level had no effect for System B; for System A it removed only a small amount of code, and the speed was slightly reduced. This result shows that, for access to the axis fields, the hardware interface of System B is very close to the efficiency of native access.

Additionally, to evaluate the hardware used for both implementations, the hardware interface was coded in VHDL, then synthesized using Xilinx's Webpack software and mapped to a Xilinx Virtex FPGA. The result of using the Virtex family of chips is that system A consumes 56 functional slices and system B consumes 85 functional slices. The V300E-PQ240 device has a total of 3072 slices, so system A consumes 1.8% of the available resources and system B consumes 2.8%. For 9500 series devices with more limited internal resources, such as the XC95288XL-PQ208, system A will consume 18% of the available resources of the device and system B will consume 30%.

A closer look at the two designs revealed that the primary cause of the extra resources used by System B was the combined axis addressing scheme. To verify this, the register map was reorganized so that each axis was treated as an independent resource, with the individual axis maps aligned on address bit boundaries. This alternative implementation retained all of the software interface benefits of System B while reducing overall hardware usage, bringing chip utilization down to 2.3% for Virtex family devices and 22% for the 9500 family.

Hardware design will greatly affect the complexity and quality of system software implementation. A good hardware design requires designers to make decisions based on the complexity of hardware implementation and the final software design environment. Correctly understanding the impact of hardware interface design on the software development process can greatly improve system quality, performance and reliability, while reducing the cycle and cost of system development.
