Design method for real-time visibility into complex device interiors

Publisher: EnchantedDreams | Latest update: 2011-08-07

If a system contains several processors and peripherals, understanding the real-time dynamics of all of these chips is essential to developing cost-effective, reliable products, especially given today's short product development cycles. Real-time embedded systems are increasingly implemented on multi-core ASICs or systems-on-chip (SoCs) to take advantage of the low power, low cost, and higher integration these devices offer.

Many of the standard design tools at developers' disposal rely on an understanding of the inner workings of older-generation products and are no longer adequate for these powerful new multi-function designs. Bottlenecks, delays, and contention for shared resources such as buses and memory are fatal to real-time data transfer. To achieve the best performance, developers need to understand the details of the chip's internal operation more than ever before.

However, monitoring transactions between system components is no longer as simple as connecting a logic analyzer or bus analyzer, because many of the signals of interest are buried deep inside the chip. Visibility into an SoC requires a mix of hardware and software to collect data from within the device itself, supported by characterization and correlation tools that help developers analyze the collected data.

Back to reality

Traditionally, when logic analyzers were unavailable or too difficult to set up, developers used software test instrumentation to gain visibility into their designs. They would add debug code to the target to collect, process, and upload debug data. For example, turning a timer on and off on entry to and exit from a function was, and still is, a convenient way to profile a function in software.

Although this only requires adding a few C printf calls to the test code to format the collected data and send it to a standard I/O device, that code has a significant impact on code size, memory utilization, buffer performance, timing, and contention for system resources. These shortcomings make printf suitable only for testing non-real-time control code; for real-time or deterministic code, less intrusive techniques are needed.
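As a rough illustration, here is a minimal C sketch of the timer-based approach: a free-running cycle counter is read on entry to and exit from a function and the difference accumulated, so no printf or string formatting runs in the real-time path. The counter address and the function name are hypothetical placeholders for whatever the target actually provides.

```c
#include <stdint.h>

/* Hypothetical memory-mapped free-running cycle counter. */
#define CYCLE_COUNTER (*(volatile uint32_t *)0x40001000u)

static uint32_t filter_cycles;   /* cycles accumulated inside filter_block() */
static uint32_t filter_calls;    /* number of calls profiled */

void filter_block(void)          /* illustrative real-time function */
{
    uint32_t start = CYCLE_COUNTER;

    /* ... the real-time work being profiled ... */

    filter_cycles += CYCLE_COUNTER - start;  /* a few cycles of overhead, no printf */
    filter_calls++;
}
```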

Reducing intrusion

There are many ways to increase visibility while reducing intrusion. Conceptually, monitoring a system involves data acquisition, data buffering, uploading data from the target, post-processing, and display. Careful scheduling of when and where these activities occur can minimize their impact on system performance. Reducing the memory footprint of the test code and acquisition structures allows more data to be collected, increasing the accuracy or breadth of measurements of the system's real-time behavior.

It is common to need several times the size of a data point to record the associated information needed to interpret it accurately. For example, in addition to the data value itself, you may need to label the variable name associated with the data, attach a timestamp of when it was acquired, and note which function was executing at that time. There are several ways to obtain and organize this associated information without relying on printf and its string-formatting functions. It is common to build patterns into the data, and if the data is acquired in a particular way, some of this associated information can be inferred without storing it in the buffer. Some ways to increase visibility while limiting intrusion include:

1. Recording Format

If you are collecting only one variable into a buffer, there is no need to label what is being collected. If you need to collect multiple values, you can define a record format in which each value occupies a known position, again avoiding the need to label each item.
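A minimal sketch of such a record format in C, with an illustrative set of fields; because each field sits at a fixed offset, nothing in the buffer needs a textual label:

```c
#include <stdint.h>

/* Illustrative fixed record layout. */
typedef struct {
    uint32_t timestamp;    /* when the sample was taken */
    uint16_t function_id;  /* enumerated ID of the executing function */
    uint16_t variable_id;  /* enumerated ID of the variable sampled */
    uint32_t value;        /* the sampled value itself */
} log_record_t;            /* 12 bytes per sample instead of a formatted string */
```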

2. Multiple Buffers


By grouping similar data points together, circular buffer management can be simplified, reducing the latency of collecting each data point. Likewise, if data collection is segregated by priority, the circular buffer can be allowed to overflow for non-critical information when the system is at 100% utilization, rather than disrupting the timing of the real-time system with a non-real-time upload. In either case, a mechanism is needed to mark the overflow, and if the buffered data has known properties, such as timestamps that can be reconstructed, it is possible to track how much data was lost.
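A minimal sketch of a circular buffer along these lines, assuming 32-bit records for simplicity; on overflow it drops the record and counts the loss rather than stalling the real-time producer:

```c
#include <stdint.h>

#define LOG_DEPTH 256u               /* power of two keeps the index math cheap */

typedef struct {
    uint32_t slot[LOG_DEPTH];        /* one class of data per buffer */
    uint32_t head;                   /* next slot the producer writes */
    uint32_t tail;                   /* next slot the uploader reads */
    uint32_t dropped;                /* records lost to overflow */
} log_buffer_t;

/* Called from the real-time path: never blocks, marks overflow instead. */
static int log_put(log_buffer_t *b, uint32_t value)
{
    uint32_t next = (b->head + 1u) & (LOG_DEPTH - 1u);

    if (next == b->tail) {           /* buffer full */
        b->dropped++;                /* so the lost data can be accounted for later */
        return -1;
    }
    b->slot[b->head] = value;
    b->head = next;
    return 0;
}
```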

3. Sampling Data


Configuring hardware counters and letting them run has minimal impact on the system; reading a counter and uploading its value is, however, intrusive. The more frequently you record a counter, the more accurate the record, but the more intrusive the collection and upload. Keep the recording frequency low until you are sure more accurate information is really needed. For example, a periodic profiler that records which function is currently executing can give an accurate picture of the percentage of time spent in each piece of code. Such a profiler collects only a fraction of the information that recording every function call would, so the intrusion is low. Data points can also be sampled from a low-priority task, although this may skew the results somewhat.
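One possible shape of such a profiler, as a minimal C sketch (names and sizes illustrative): each instrumented function stores its identifier into a single word on entry, and a low-rate timer tick, or a low-priority task, samples that word into a histogram.

```c
#include <stdint.h>

#define NUM_FUNCS 32u                      /* illustrative number of profiled functions */

static volatile uint16_t current_func;     /* one store on entry to each function */
static uint32_t func_samples[NUM_FUNCS];   /* samples per function */

#define PROFILE_ENTER(id) (current_func = (uint16_t)(id))

/* Hooked to a periodic timer interrupt or run from a low-priority task;
 * the sampling rate, not every call, determines the intrusion. */
void profiler_tick(void)
{
    func_samples[current_func]++;
}
```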

4. Deterministic Data

If the data is sampled at a fixed frequency, there is no need to include a timestamp. Alternatively, if the data must pass through a known set of consecutive blocks, only the data value and the timestamp need to be recorded, because the actual block can be determined from the order of the timestamps. If several values are acquired, it may be more efficient to ensure that the data flows through the blocks in a fixed order, in which case only the function and timestamp need to be recorded, together with the record format described above.
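A minimal sketch of the fixed-rate case, with an illustrative sample period: only values are stored, and the timestamp of sample n is reconstructed offline as n times the sample period.

```c
#include <stdint.h>

#define SAMPLE_PERIOD_US 100u         /* illustrative fixed acquisition rate: 10 kHz */
#define SAMPLE_DEPTH     1024u

static uint32_t samples[SAMPLE_DEPTH];
static uint32_t sample_count;

/* Called once per sample period: no timestamp is stored, because the time of
 * sample n is simply n * SAMPLE_PERIOD_US relative to the start of capture. */
void log_fixed_rate(uint32_t value)
{
    samples[sample_count++ & (SAMPLE_DEPTH - 1u)] = value;
}
```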

5. Dynamic/Intelligent Recording

It is common to collect data only when it is needed, in other words when there is some information of interest, thereby reducing the impact of data collection. With a handful of debug flags, the scope of collection can be narrowed: data is collected only when a specific flag is set, saving buffer space. Setting or checking a flag takes only one or two processor cycles, so this is a very useful technique, even for hardware-based counters.
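A sketch of flag-gated collection, with illustrative flag names: when a subsystem's flag is clear, the trace point costs only a load, a test, and a branch.

```c
#include <stdint.h>

enum {
    DBG_DMA   = 1u << 0,               /* illustrative per-subsystem flags */
    DBG_SCHED = 1u << 1
};

static volatile uint32_t debug_flags;  /* set from a debugger, console, or host command */

static uint32_t trace_buf[256];
static uint32_t trace_idx;

/* Trace point: cheap when its flag is disabled. */
static inline void trace_if(uint32_t flag, uint32_t value)
{
    if (debug_flags & flag)
        trace_buf[trace_idx++ & 255u] = value;
}
```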

6. Block Records

In some cases, it may be possible to pause the target without affecting execution (for example, when no real-time computation is in progress). In that case, the buffer upload overhead can be avoided entirely by triggering a pause when it is safe to do so and uploading the buffer while the system is paused.

7. Upload in chunks

If a relatively idle task is available, it can be used to upload the buffer when the system is not fully utilized. Although this does not reduce the upload overhead, it shifts the impact of the upload to periods when it has less effect on system performance.
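A sketch of a chunked upload from the idle loop, assuming a hypothetical non-blocking uart_send_byte() as the upload channel; the work done per idle pass is bounded so the upload never monopolizes the processor.

```c
#include <stdint.h>

#define TRACE_DEPTH 4096u

static uint8_t           trace_bytes[TRACE_DEPTH];  /* filled by the logging code */
static volatile uint32_t trace_head;                /* producer index */
static volatile uint32_t trace_tail;                /* uploader index */

/* Placeholder for the real output routine (hypothetical, assumed non-blocking). */
static void uart_send_byte(uint8_t b) { (void)b; }

/* Called from the idle loop or the lowest-priority task: drains a bounded
 * chunk of the buffer each time it runs. */
void idle_upload_hook(void)
{
    uint32_t budget = 32u;

    while (budget-- && trace_tail != trace_head) {
        uart_send_byte(trace_bytes[trace_tail]);
        trace_tail = (trace_tail + 1u) & (TRACE_DEPTH - 1u);
    }
}
```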

8. RTOS Monitoring


For more complex monitoring, support can often be found in the real-time operating system (RTOS). Many operating systems have built-in mechanisms and libraries supporting on-chip monitoring hardware that are easy to configure and provide the basic infrastructure needed to manage circular buffers and stream data out, as well as hook functions for self-monitoring. By abstracting the logging and offload process, you can quickly reconfigure what to monitor, how and how often to monitor it, where to get the data, and how to offload it. Before building your own infrastructure to test your code, check what the RTOS already provides.


9. Avoid accessing memory and other system resources

Software should be used only to supplement hardware mechanisms when they are insufficient, for example when very large buffers are needed or when limited on-target processing reduces overall noise. Ideally, the most accurate results are obtained by monitoring the system bus or memory without using the system bus or memory. If the amount of data collected is kept to a minimum, buffering in memory can be avoided altogether and the data simply sent directly over JTAG or a dedicated bus.

10. Continuous acquisition


If you need to collect a lot of data while monitoring a process, consider spreading the acquisition over successive runs. Note that because different information is collected in each run, the results cannot be correlated exactly, since the timestamps include different test-induced delays. This method is best suited to deciding where to start looking for a problem, because it reduces both the amount and the intrusiveness of monitoring in any single run.

11. Modules

If the data is sent over JTAG or a bus, a processing module can be placed between the target and the host to handle timestamp generation and limited data processing. By offloading the timestamp work to this module, bandwidth on the test bus is freed up to send more information. Modules are also a very effective way to achieve completely non-intrusive monitoring: for example, a module can snoop the system bus, monitor a specific memory address range over the test bus, or use direct memory access (DMA) to trigger near-real-time acquisition of a block of data.

Hardware-assisted monitoring

In some cases, test code may be too intrusive, not accurate enough, or simply unable to capture the information needed to understand the dynamics of data flow through a complex SoC. To meet these needs, more and more SoC architectures include features that assist in monitoring the hardware operation of the device:

1. Event Counter


Many subtle details are not easily discovered when monitoring events in software. For example, recording the number of times a particular CPU core has stalled while waiting to access a shared resource (such as external memory) is not possible in software alone. A hardware design that includes a few well-placed counters can provide deep insight into the dynamic behavior of the system at very low additional cost. The counts can be read out through the debugger's JTAG interface, or read periodically, for example by a background task in software, and written to a buffer for later query.
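A sketch of the software side of that second option, assuming a memory-mapped stall counter at a hypothetical address: a background task periodically snapshots the counter into a small log for later readout.

```c
#include <stdint.h>

/* Hypothetical memory-mapped counter: cycles this core stalled waiting for
 * external memory. */
#define STALL_COUNTER (*(volatile uint32_t *)0x4000A010u)

static uint32_t stall_log[128];
static uint32_t stall_idx;

/* Run periodically from a background task; the log is uploaded or read out
 * over JTAG later. */
void counter_poll_task(void)
{
    stall_log[stall_idx++ & 127u] = STALL_COUNTER;
}
```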

2. High watermark counter

Often, developers need to understand the extreme conditions under which a device operates, such as the maximum time to service an interrupt or the minimum and maximum jitter in incoming data. High-watermark counters provide hardware that can be configured to monitor specific bus events and latch the maximum (high-watermark) or minimum (low-watermark) timing parameters. Without excessive overhead, they provide invaluable statistics that would otherwise have to be implemented in the target software or collected and sent off-chip for post-processing.
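For targets without such hardware, the same idea can be approximated in software. A minimal sketch that latches the minimum and maximum observed interrupt service time, given a measured elapsed-cycle value:

```c
#include <stdint.h>

static uint32_t isr_cycles_max;              /* high watermark */
static uint32_t isr_cycles_min = UINT32_MAX; /* low watermark */

/* Called with the measured service time of each interrupt. */
void record_isr_time(uint32_t elapsed_cycles)
{
    if (elapsed_cycles > isr_cycles_max) isr_cycles_max = elapsed_cycles;
    if (elapsed_cycles < isr_cycles_min) isr_cycles_min = elapsed_cycles;
}
```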

3. Tracing

A costly but very useful hardware-assisted monitoring method is tracing, in which bus transactions are recorded in a dedicated on-chip memory so that the last N bus transactions leading up to an event can be captured.

Uploading captured data

Typically, you will upload the data to a development system (such as a PC) or to a monitoring module for further analysis. Once you have determined what debug information to collect, and how to collect it with minimal intrusion, you must decide how to send the data off the chip, ideally while the application is still running.

The tradeoff is between buffer depth and upload frequency. The smaller the debug data buffer, the more frequently data must be uploaded, and frequent uploads impose a continuous drain on system performance. A large memory pool for buffering debug data reduces how often collection disturbs the system, but it consumes more target memory, and uploading a large buffer while the device is running has a more pronounced momentary impact on performance.
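As a rough, purely illustrative sizing: logging 12-byte records at 10,000 records per second produces about 120 KB of data per second, so a 64 KB buffer fills in roughly half a second; the upload path must therefore sustain about 120 KB/s on average, or the buffer must be allowed to overflow and the loss accounted for.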

When more data is acquired than can be sent off-chip in real time, gaps will inevitably appear in the uploaded data. In these cases, enough context information must be inserted periodically to ensure that the data can still be decoded successfully once it finally reaches the host. Packetizing the data or introducing periodic "sync points" are two ways of providing this additional information in the data stream. This can be done as part of the upload process, so the redundant information does not have to be stored on-chip.

If multiple CPU cores are working together in an SoC, it is often necessary to upload the acquired information from each core in parallel in order to present a complete picture of the system. If multiple upload paths are not available, either the data from the cores must be combined into one buffer before uploading, or it must be multiplexed in some way onto a shared upload path. Again, the dynamic nature of the system and the relative importance of the data need to be considered when deciding how best to handle this. If one core produces a lot of relatively unimportant data while another occasionally produces important information, a mechanism is needed to ensure that the important information takes precedence.

Visualization and analysis


Converting the raw information generated by an SoC into an easily digestible format presents its own challenges. The types of data that can be collected, the specific hardware mechanisms required to collect them, and the variety of applications all raise unique issues, and a flexible, modular framework is often the best way to address them. A modular framework makes it easy to correlate data from different streams, analyze the correlated data for specific types of information, and display the results in an easily digestible form. Here are some examples of the functionality such a framework should provide:

1. Associate data points

When troubleshooting system-level problems such as bottlenecks, contention, or load balancing in a multiprocessor SoC, it may be necessary to collect data from several processors and accelerators. Reconstructing the system behavior then requires correlating multiple logs onto one timeline. On some systems, all cores can timestamp against a common clock. If a common clock is not available, other mechanisms can be used to periodically synchronize the notion of time across cores; one approach is to use an interrupt to pass a synchronized timestamp through shared memory.
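One possible shape of that approach, as a minimal C sketch: the master core publishes its timestamp through a shared-memory structure (address hypothetical) from a periodic interrupt, and the other cores pair it with their local time so the host can compute per-core offsets when correlating the logs.

```c
#include <stdint.h>

/* Hypothetical shared-memory sync structure visible to all cores. */
typedef struct {
    volatile uint32_t sequence;     /* odd while an update is in flight */
    volatile uint32_t master_time;  /* master core's timestamp at the sync point */
} sync_point_t;

#define SYNC ((sync_point_t *)0x20030000u)   /* hypothetical shared address */

/* Master core: called from a periodic interrupt. */
void publish_sync(uint32_t now)
{
    SYNC->sequence++;               /* mark update in progress */
    SYNC->master_time = now;
    SYNC->sequence++;               /* pair is consistent again */
}

/* Other cores: read a consistent master timestamp and log it next to the
 * local time, so per-core clock offsets can be computed on the host. */
int read_sync(uint32_t *master_time)
{
    uint32_t s1 = SYNC->sequence;
    uint32_t t  = SYNC->master_time;
    uint32_t s2 = SYNC->sequence;

    if (s1 != s2 || (s1 & 1u))      /* update raced with our read: try again later */
        return -1;
    *master_time = t;
    return 0;
}
```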

2. Analytical infrastructure

A modular framework can package common analysis activities as modules that can be reused to build many different analysis and visualization tools. For example, a generic, customizable data converter and table view can easily be used to create a message-log browser, or to shape data sent on to other analysis modules; a module that analyzes high-water marks over a period of time can provide the basis for application-specific dashboards, bandwidth-utilization monitors, and so on.

3. Scalability


Although it is possible to evaluate the large amounts of collected data with generic components, it is better to also be able to build custom components that extend the tool environment.

4. Configurability

Visualization tools are critical to extracting meaningful information from large buffer uploads, and developers need to be able to configure the tools to highlight specific differences and data spikes in order to discover both common and anomalous behavior. To offload processing from the target, the tools should provide a programmable infrastructure that allows intelligence to be built into the tool, reducing the amount of data that needs to be collected. Sufficient control should also be provided to determine what data is collected at any given time.

Facing reality

The challenge of gaining visibility into the internals of a real-time SoC system is certainly not trivial. Collecting enough information to produce meaningful results, without distorting the behavior being measured, requires a system-level approach. By using a library of software instrumentation techniques, taking advantage of hardware-assisted monitoring, and managing how data is sent off the chip, developers can increase the accuracy, breadth, depth, and granularity of the collected data and obtain more reliable information. Flexible new tool suites and software development strategies will help developers meet the challenges of testing and debugging complex SoC architectures for real-time applications with accuracy and confidence.
