[Digital Power Series Articles] Digital Power Monitoring and Telemetry-EEWORLD


What role do digital power devices play between startup and shutdown? Two core functions are supervision and telemetry. Supervision is a fast-acting safety function that prevents damage to the device and/or load. Telemetry is an ongoing quality management function. A recent advertisement in Bodo's Power Systems magazine lists the benefits of these two functions for digital power:

● Optimization

● Predictive maintenance

● Fault detection

Here we look at a typical POL internal architecture and study its impact on power system design.

POL Internal Architecture

Figure 1 shows a simplified POL with three main functional blocks.

● Supervisor

● Monitor

● Digital processing unit

The digital processing unit is the "brains" of the device (the core power conversion stage is not shown). It handles the PMBus and CONTROL inputs and asserts POWER_GOOD and FAULT/.

Most devices contain more than one of these blocks, but for simplicity the figure shows a single set of inputs and outputs.

Figure 1: POL structure

Supervisor

The supervisor is a fast-acting single comparator or window comparator. Typically, its output bypasses the state machine in the digital processing unit, so it can directly stop power switching and assert FAULT/. The digital processing unit is then updated so that the PMBus host can query the device's fault registers.

The purpose of the supervisor is to protect the load and the device, so it trades accuracy for speed. The HI/LO thresholds are usually stored in non-volatile memory (NVM) or programmed over PMBus through commands such as VOUT_UV_FAULT_LIMIT. The fault response behavior is also stored in NVM and includes settings such as "retry" and "delay between retries".

Monitor

The monitor is a high-accuracy measurement path built around an ADC. The digital processing unit is usually implemented as a state machine or software loop that polls the ADC output data and makes it available over PMBus. The monitor data can also feed a servo loop in the digital processing unit to improve output accuracy.
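The servo idea can be illustrated with a small integrating loop. This is only a sketch under stated assumptions: the function names, the loop gain of 0.5, and the "plant" model (a converter whose output reads 2% low) are hypothetical and are not taken from any particular device.

```python
def servo_step(trim, target_v, measured_v, gain=0.5):
    """Return an updated trim that nudges the output toward target_v.

    Integrating correction: each call adds a fraction of the measured error.
    The gain value is an illustrative assumption.
    """
    return trim + gain * (target_v - measured_v)


def simulate(target_v=1.000, plant_gain=0.98, steps=20):
    """Run the servo against a hypothetical converter whose output is
    plant_gain * (target_v + trim), i.e. 2% low when no trim is applied."""
    trim = 0.0
    for _ in range(steps):
        measured_v = plant_gain * (target_v + trim)  # stands in for an ADC reading
        trim = servo_step(trim, target_v, measured_v)
    return plant_gain * (target_v + trim)


if __name__ == "__main__":
    print(f"without servo: {simulate(steps=0):.4f} V")
    print(f"with servo:    {simulate():.4f} V")
```

The accuracy of the result depends on the ADC reading, not on the setpoint circuitry, which is why a high-accuracy monitor can improve the output accuracy.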

Fault

Faults can be initiated by either a supervisor or a monitor. For a supervisor, the DAC provides a reference to the comparator, and the comparator output is fed directly to the FAULT/ pin. For a monitor, the digital processing unit applies a digital comparator or a software conditional to the ADC data and then asserts the FAULT/ pin.
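The difference between the two fault paths can be modeled in a few lines. In this rough sketch the supervisor is treated as a comparator that sees every time step, while the monitor only sees the steps at which the ADC samples. The function name, the `adc_period` parameter, and the waveforms are hypothetical illustrations, not device behavior.

```python
def first_fault_times(waveform, lo_limit, hi_limit, adc_period):
    """Return (supervisor_index, monitor_index): the step at which each
    path first sees the value outside the window, or None if it never does.

    waveform   -- output voltage at consecutive time steps
    adc_period -- the monitor only sees every adc_period-th step (assumption)
    """
    def out_of_window(v):
        return v < lo_limit or v > hi_limit

    supervisor = next(
        (i for i, v in enumerate(waveform) if out_of_window(v)), None
    )
    monitor = next(
        (i for i, v in enumerate(waveform)
         if i % adc_period == 0 and out_of_window(v)),
        None,
    )
    return supervisor, monitor


if __name__ == "__main__":
    # 1.0 V rail with a 0.9 V to 1.1 V window, shorted to ground at step 3.
    shorted = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(first_fault_times(shorted, 0.9, 1.1, adc_period=4))

    # A glitch that falls between two ADC samples is missed by the monitor.
    glitch = [1.0, 1.0, 0.0, 1.0, 1.0]
    print(first_fault_times(glitch, 0.9, 1.1, adc_period=4))
```

In this model the comparator path catches both events as soon as they happen, while the sampled path reacts later to the sustained short and never sees the brief glitch.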

Trade-offs

The tradeoffs that POL designers make are fairly straightforward. Safety dictates which inputs and outputs have supervisors. The tradeoffs for monitors involve accuracy (because the ADC takes up real estate and consumes power), as well as the number of channels and multiplexers.

As a system designer, you must consider what your system will use the data for and how accurate it must be. Typical uses include:

● System development and operation status verification and debugging

● Efficiency monitoring

● Energy consumption monitoring

● Failure prediction

● Optimization (local and global)

● Improved accuracy (servo)

Examples

Every power architecture is different and there is no universal set of tradeoffs, so I'm going to give you a few examples of using supervisors and monitors to spark your imagination. And, when you understand the possibilities, you may find a competitive advantage.

Supervisor fault example

This example is taken from the LTC2974, a supervisory/monitoring device that manages four POLs. The output voltage of one of the POLs it manages has a supervisor based on a window comparator.

Figure 2: Faults generated by the supervisor

Trace 4 is the FAULT/ pin of the device, and trace 3 is the ALERT/ pin. I shorted the output to ground. On this device, the delay between the short and FAULT/ going low is about 12μs. A very short time later, the device also pulls ALERT/ low. These responses are very fast because the supervisor bypasses the state machines that service the slower ADC and generates the fault directly. It also stops the power conversion.

On the PMBus, the host completes an Alert Response Address (ARA) transaction. The host places address 0x0C on the bus, and the device that faulted responds with 0x64. The host shifts this right by one bit to obtain the 7-bit device address 0x32. The host then reads the fault status by placing address 0x32 on the bus followed by command byte 0x79. After a repeated start with address 0x32, the device returns two data bytes that form the status word 0x8041.
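The address shift and the status word can be decoded with a few lines of host-side code. This is a minimal sketch: the bit names follow the standard PMBus STATUS_WORD (command 0x79) definition, the helper function names are hypothetical, and a specific device may not implement every bit, so its datasheet remains the authority.

```python
# Bit names per the standard PMBus STATUS_WORD definition (command 0x79).
STATUS_WORD_BITS = {
    15: "VOUT",
    14: "IOUT/POUT",
    13: "INPUT",
    12: "MFR_SPECIFIC",
    11: "POWER_GOOD#",
    10: "FANS",
    9: "OTHER",
    8: "UNKNOWN",
    7: "BUSY",
    6: "OFF",
    5: "VOUT_OV_FAULT",
    4: "IOUT_OC_FAULT",
    3: "VIN_UV_FAULT",
    2: "TEMPERATURE",
    1: "CML",
    0: "NONE_OF_THE_ABOVE",
}


def ara_to_7bit_address(ara_byte):
    """The ARA response carries the 7-bit device address in its upper 7 bits."""
    return ara_byte >> 1


def bytes_to_word(low_byte, high_byte):
    """PMBus word reads return the low data byte first."""
    return (high_byte << 8) | low_byte


def decode_status_word(word):
    """Return the names of all bits set in a 16-bit status word, MSB first."""
    return [
        STATUS_WORD_BITS[bit]
        for bit in range(15, -1, -1)
        if word & (1 << bit)
    ]


if __name__ == "__main__":
    print(hex(ara_to_7bit_address(0x64)))           # 0x32
    print(decode_status_word(bytes_to_word(0x41, 0x80)))
    # ['VOUT', 'OFF', 'NONE_OF_THE_ABOVE']
```

The VOUT bit only says that an output voltage fault or warning is present. On standard PMBus devices the specific cause is reported in the STATUS_VOUT register.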

Figure 3: Fault word bits

Checking these bits against the device datasheet shows that an undervoltage fault has occurred.

Alternatively, this can be observed using an external tool that displays the registers and status of the device.

Figure 4: Fault status

Remember the usage models I proposed in another article?

A fault raised by the supervisor supports both models. It can be handled by a PMBus host or by an external tool.

(Note: We will see implementations of this design in future articles, but basically the ALERT/ pin is connected to a microcontroller interrupt.)

Temperature Monitoring Example

Many devices have the ability to monitor both the internal die temperature and the external temperature using a diode. In this case (LTC3880), I have a board manager that monitors the power rails via PMBus and has an LCD touch screen display.

Figure 5: Temperature monitoring

The telemetry graph shows the internal die temperature. As I placed my finger on the device and it cooled, a dip appeared in the graph. The minimum and maximum values on the graph were 30°C and 40°C, respectively. As you can see from the graph, the measurement results are pretty good.

Devices use this temperature monitoring circuitry to protect themselves, but it can also be used to detect more subtle problems. If you add I²C temperature monitoring devices and place the sensors around the PCB, then between those sensors and all the PMBus devices you can get a good idea of how the board is performing.

This can be used to balance temperatures (by controlling the load), characterize the system under different load conditions, or simply send a warning message to the system operator so they can swap out the offending board and send it in for repair.

The same approach can be taken with efficiency. By measuring the input and output voltages and currents, you can calculate the power supply efficiency on the fly and use that information to optimize the system by shifting the workload to get the converter closer to its best efficiency. You can also watch for unusual patterns that may help you detect problems before they occur. Board managers typically have a communication interface that can give you these notifications.
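As a rough illustration of the on-the-fly calculation, efficiency is simply output power divided by input power. The function names, the baseline comparison, and the 2% tolerance below are assumptions made for the example. A real system would read the four values from telemetry and choose its own thresholds.

```python
def efficiency(v_in, i_in, v_out, i_out):
    """Return P_out / P_in as a fraction, or None if input power is not positive."""
    p_in = v_in * i_in
    if p_in <= 0:
        return None
    return (v_out * i_out) / p_in


def efficiency_dropped(baseline, latest, tolerance=0.02):
    """Flag a reading that falls more than `tolerance` below the baseline.

    Both arguments are fractions (0.90 means 90%). The tolerance is a
    hypothetical threshold chosen for illustration.
    """
    return latest is not None and (baseline - latest) > tolerance


if __name__ == "__main__":
    eta = efficiency(v_in=12.0, i_in=1.0, v_out=1.0, i_out=10.8)
    print(f"efficiency: {eta:.1%}")
    print("warn operator:", efficiency_dropped(baseline=0.90, latest=0.85))
```

A host could run a check like this on every telemetry pass and send a notification only when the drop persists.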

Autonomous and managed systems

I would like to put these performance characteristics into the context of the usage models.

In my previous article I proposed two usage models:

1. Configure and Deploy

2. Monitor and Act

Another way to describe this is as an “autonomous versus managed” system. In an autonomous system, the power converter is powered up and operates completely independent of the PMBus, much like Model 1. A managed system actively uses the PMBus, much like Model 2.

These models have different performance characteristics. The performance of PMBus itself is limited by the 400kHz (typical) bus clock. Supervisory performance is independent of PMBus, whether it is implemented as an analog comparator with direct logic or as slower logic in the digital processing unit.

In a managed system (e.g., Monitor and Act), performance is the same as in an autonomous system until the PMBus becomes part of a decision loop managed by the host. When the host must read telemetry data and then change some function or parameter on the device, performance is usually limited by the PMBus.

Also, the performance of a managed system is very different because the host has to manage multiple power rails (how many depends on the system architecture). Say it takes 200μs to read a value and change a value in response (400kHz bus). With 10 power rails in the host's control loop, one pass takes 2ms. Now add a few I²C chips to monitor temperature, and add other host functions that are unrelated to PMBus, and you end up with a system that responds more slowly than the digital processing unit. And if you run the bus at 100kHz because of some slower I²C devices, the response will be even slower.
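The 200μs and 2ms figures can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes one read-word and one write-word transaction per rail, 9 clock periods per byte (8 data bits plus the acknowledge bit), and a small allowance for start and stop conditions. It ignores clock stretching and host firmware overhead, and the function names are hypothetical.

```python
def transaction_time_us(byte_count, bus_hz, overhead_bits=2):
    """Approximate duration of one I2C/PMBus transaction in microseconds.

    Each byte costs 9 clock periods (8 data bits plus ACK); overhead_bits is
    a rough allowance for start/stop conditions (assumption).
    """
    bits = byte_count * 9 + overhead_bits
    return bits / bus_hz * 1e6


def loop_time_ms(rails, bus_hz):
    """Estimate one pass of a host loop that reads and then writes a word per rail."""
    # Read word: address, command, address again after repeated start, 2 data bytes.
    read_word = transaction_time_us(5, bus_hz)
    # Write word: address, command, 2 data bytes.
    write_word = transaction_time_us(4, bus_hz)
    return rails * (read_word + write_word) / 1000.0


if __name__ == "__main__":
    for bus_hz in (400_000, 100_000):
        print(f"{bus_hz // 1000} kHz: {loop_time_ms(10, bus_hz):.3f} ms for 10 rails")
```

Under these assumptions, 10 rails take roughly 2.1ms per pass at 400kHz and about four times longer at 100kHz, before any other host work is counted.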

Because of this, we adopted a hybrid usage model, where critical functions are all handled by the digital processing unit (and fast supervisor) and do not rely on PMBus, while higher-level functions (such as energy consumption and fault prediction) are handled by a PMBus master.

By the same token, when higher-level features are not needed, the device can operate completely autonomously, and PMBus is an enabler for configuration tools. In particular, PMBus tools are very useful for board development and health verification. Such tools display the status of all power rails in the system in a dashboard format: telemetry, faults, and setpoints.

Review

Most digital power devices have supervisors and monitors. I have characterized supervisors as fast-acting safety devices and monitors as sampling devices used for telemetry. While this is a convenient classification, the terminology should be used with caution, especially with regard to supervisors and faults. Sometimes the term "supervisor" is used for a fault generation method that uses data from a monitor, in which case the delay is usually greater than that of a comparator.

There is nothing wrong with this. If a device already needs a monitor, and if the fault does not have to occur super fast, why pay for a comparator or logic device that is not needed? Just read the data sheet and look carefully at its block diagram to get a good idea of the device's operation and characteristics. Chip designers are very good at making trade-offs, but only you can determine whether the compromises are appropriate for your application. Generally speaking, however, you will find that if it is a safety issue, you will need a comparator, and if it is an accuracy issue, you will use a high-accuracy ADC.

While the importance of supervisors is fairly obvious, monitors sometimes go unappreciated until you want to use them. It’s easy to focus on setting the high and low limits of the power rails, studying transient response, and all the other analog operating characteristics, without thinking about the system level. But when you have PMBus with all its commands, non-volatile memory to store setpoints, and software tools for configuration, think about what you can do with the real-time data. With just a little extra effort, you might find a competitive advantage. And you don’t have to implement all the firmware up front. The good thing about firmware is that it can be upgraded without changing the hardware.

If you can predict failures or optimize efficiency, you can often recoup your investment in firmware (up to 100 times the development cost). All it takes is adding or adapting an existing FPGA or microcontroller to your design, some knowledge of the application area, and a little imagination.

In future articles I will go into more detail on PMBus integration. By now you probably think that is going to be a lot of work. However, it is not that difficult at all.
