This is probably the first article to date that explains how to debug Android lag issues.

Last updated: 2020-05-17

Preface

FPS is short for frames per second, commonly called the frame rate. In games, FPS is a basic performance metric. It matters a great deal to game developers and phone vendors, and measuring a game's FPS has become a routine part of daily testing. Many tools on the market can measure FPS, but how do they do it, and what is the underlying principle? This article digs into the source code to answer these questions in detail (the source analysis is based on Android Q).

Calculation method

At present, most frame-rate tools, and the various fps scripts that can be found on the Internet, rely on one of two information sources. One is dumpsys SurfaceFlinger --latency Layer-name (note that the argument is the Layer name, not the package name and not the Activity name; the reason is explained below). The other is dumpsys gfxinfo. The principles behind the two are basically the same, and this article focuses on the first, which is also what most frame-rate tools on the market use. Some tools wrap the calculation logic in a .so library to make decompilation harder, but it has been verified that these tools also end up computing fps by calling the command above.

But why can this command be used to calculate fps? Let's look at its output, taking Honor of Kings as an example (games like Honor of Kings render through a SurfaceView, so their layer names all start with "SurfaceView - "):

> adb shell dumpsys SurfaceFlinger --latency "SurfaceView - com.tencent.tmgp.sgame/com.tencent.tmgp.sgame.SGameActivity#0"
16666666
59069638658663  59069678041684  59069654158298
59069653090955  59069695022100  59069670894236
59069671034444  59069711403455  59069687949861
59069688421840  59069728057361  59069704415121
59069705420850  59069744773350  59069720767830
59069719818975  59069761378975  59069737416007
59069736702673  59069778060955  59069754568663
59069753361528  59069794716007  59069770761632
59069768766371  59069811380486  59069787649600
......

What do these numbers mean? The number on the first line is the current VSYNC interval in nanoseconds; on a 60Hz screen it is 16666666, i.e. about 16.67ms. Below it come 127 lines (why 127 is explained below), each holding three numbers. Every number is a timestamp in nanoseconds, and the specific meaning of each is explained later. The fps calculation uses the second timestamp on each line; the reason is also explained later.
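The layout just described (one VSYNC period line followed by rows of three timestamps) is easy to parse. Below is a minimal sketch of such a parser. LatencyDump, FrameRecord, and parseLatencyDump() are hypothetical names chosen for illustration, not part of any Android tool, and the field names are borrowed from the FrameTracker code discussed later in this article.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One row of `dumpsys SurfaceFlinger --latency <layer>` output.
// All three values are timestamps in nanoseconds.
struct FrameRecord {
    int64_t desiredPresentTime;
    int64_t actualPresentTime;   // second column, the one used for fps
    int64_t frameReadyTime;
};

struct LatencyDump {
    int64_t vsyncPeriodNs = 0;         // first line of the output
    std::vector<FrameRecord> records;  // the rows that follow
};

// Parses the raw text printed by the command. Rows that do not start
// with three integers (for example the trailing blank line) are skipped.
LatencyDump parseLatencyDump(const std::string& text) {
    LatencyDump dump;
    std::istringstream lines(text);
    std::string line;
    bool headerSeen = false;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        if (!headerSeen) {
            // The first parsable line is the VSYNC period.
            if (fields >> dump.vsyncPeriodNs) headerSeen = true;
            continue;
        }
        FrameRecord r{};
        if (fields >> r.desiredPresentTime >> r.actualPresentTime >> r.frameReadyTime) {
            dump.records.push_back(r);
        }
    }
    return dump;
}
```

If records comes back empty while vsyncPeriodNs is set, the command printed only the first line, which usually means the Layer name did not match.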

Fence Analysis

The principle analysis below involves fences. Fences are a large topic, so only a brief description is given here; if there is interest, I will write a separate article that explains them in detail.

What is a fence

First, let's explain how the content drawn by the App is displayed on the screen:

The content that the App wants to display has to be drawn into a Buffer, and the App requests this Buffer from the BufferQueue through dequeueBuffer(). After obtaining the Buffer, the App fills it with content and then returns it through queueBuffer(), handing it over to SurfaceFlinger for composition and display. When SurfaceFlinger starts compositing, it calls acquireBuffer() to take a Buffer from the BufferQueue, and once composition is done it returns the Buffer to the BufferQueue through releaseBuffer(), as shown below:

[Figure: BufferQueue]
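The four calls above form a cycle that every buffer goes through. Here is a minimal sketch of that cycle as a state machine, assuming a single buffer slot and ignoring fences entirely. ToyBufferSlot and SlotState are hypothetical names for illustration; this is not the real BufferQueue implementation.

```cpp
// States a buffer slot passes through in this simplified model.
enum class SlotState { Free, Dequeued, Queued, Acquired };

class ToyBufferSlot {
public:
    SlotState state() const { return mState; }

    // App side: obtain the buffer, then hand it back after drawing.
    bool dequeueBuffer() { return transition(SlotState::Free, SlotState::Dequeued); }
    bool queueBuffer()   { return transition(SlotState::Dequeued, SlotState::Queued); }

    // SurfaceFlinger side: take the buffer for composition, then return it.
    bool acquireBuffer() { return transition(SlotState::Queued, SlotState::Acquired); }
    bool releaseBuffer() { return transition(SlotState::Acquired, SlotState::Free); }

private:
    // A call only succeeds when the slot is in the expected state.
    bool transition(SlotState from, SlotState to) {
        if (mState != from) return false;
        mState = to;
        return true;
    }

    SlotState mState = SlotState::Free;
};
```

In this model, calling acquireBuffer() on a slot that has not been queued simply fails. It says nothing about whether the GPU has finished drawing, which is the gap that fences fill.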

There is a problem in this process, though. When the App calls queueBuffer() to return the Buffer after drawing, the drawing is only complete on the CPU side; the GPU side has not actually finished. If the Buffer is used for composition or display at this point, things go wrong (the Buffer may not be fully drawn yet).

In fact, the CPU and GPU work asynchronously. When our code executes a series of OpenGL calls, each function appears to have returned, but the command has only been placed in a local command buffer. The CPU does not know when a GL command actually finishes executing unless it calls glFinish() to wait for all commands to complete. That causes serious performance problems, because it completely gives up the parallelism between CPU and GPU: the CPU sits waiting until the GPU is done. What is needed is a mechanism under which everyone does their own work while nobody needs to read or write the buffer, and which, when the buffer does need to be read or written, reports the buffer's status on the GPU and lets the caller wait only when necessary. With such a mechanism the problems above go away.

Fence is exactly such a synchronization mechanism. As its literal meaning suggests, a fence is used to block things. So what does a fence block? The buffer mentioned above. Throughout drawing, composition, and display, the buffer is passed among the CPU, GPU, and HWC. Before one party uses the buffer, it has to check whether the previous user has handed over the "right to use" it, and this "right to use" is the fence. When the fence is released (i.e., signaled), the previous user of the buffer has handed over the right to use it, and the buffer can be operated on safely.

fence in code

In the Android source code, the implementation of fence is divided into four parts:

• fence driver: the core implementation of synchronization.
• libsync: located in system/core/libsync; its main job is to wrap the driver interface.
• Fence class: located in frameworks/native/libs/ui/Fence.cpp; a C++ wrapper around libsync that makes it easy for the framework to call.
• FenceTime class: a utility class that further wraps Fence and provides two main interfaces, isValid() and getSignalTime(). It is meant for scenarios where the release time of a fence has to be queried multiple times (the release time is queried by calling Fence::getSignalTime()). With Fence wrapped this way, the first call to FenceTime::getSignalTime() caches the release time if the fence has already been released, and later calls to FenceTime::getSignalTime() return the cached value directly, which avoids unnecessary calls to Fence::getSignalTime() (the release time of a fence never changes).
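The caching idea behind FenceTime can be sketched in a few lines. This is a toy model under stated assumptions, not the framework code: ToyFence and ToyFenceTime are made-up names, and the "not signaled yet" sentinel is taken to be INT64_MAX because the FrameTracker code quoted later in this article compares against that value.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Assumed sentinel meaning "the fence has not signaled yet".
constexpr int64_t kSignalTimePending = INT64_MAX;

// Hypothetical stand-in for a fence. It counts how often it is queried
// so the effect of caching is visible.
struct ToyFence {
    int64_t signalTimeNs = kSignalTimePending;
    int queryCount = 0;

    int64_t getSignalTime() {
        ++queryCount;  // pretend every query is a costly call into the driver
        return signalTimeNs;
    }
};

// Wrapper that remembers the signal time once the fence has signaled.
class ToyFenceTime {
public:
    explicit ToyFenceTime(std::shared_ptr<ToyFence> fence) : mFence(std::move(fence)) {
        assert(mFence != nullptr);
    }

    int64_t getSignalTime() {
        if (mCached != kSignalTimePending) return mCached;  // cache hit
        const int64_t t = mFence->getSignalTime();
        if (t != kSignalTimePending) {
            mCached = t;     // a signal time never changes, so it is safe to keep
            mFence.reset();  // design choice of this sketch: drop the fence afterwards
        }
        return t;
    }

private:
    std::shared_ptr<ToyFence> mFence;
    int64_t mCached = kSignalTimePending;
};
```

While the fence is still pending, every call has to reach the underlying fence; only after the first signaled result do repeated queries become free.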

fence in Android

In Android, there are three types of fences: acquire fence, release fence, and present fence. The acquire fence and release fence belong to a Layer, while the present fence belongs to a frame (i.e. to the Layers as a whole):

• acquire fence: as mentioned earlier, when the App returns the Buffer to the BufferQueue through queueBuffer(), the GPU side of the Buffer is not finished yet, so a fence is attached; this is the acquire fence. When SurfaceFlinger/HWC wants to read the Buffer for composition, it has to wait until the acquire fence is released.
• release fence: when the App requests a Buffer from the BufferQueue through dequeueBuffer() and wants to draw into it, it has to make sure that the HWC no longer needs the Buffer, i.e. it has to wait for the release fence to signal before writing to the Buffer.
• present fence: called the retire fence in HWC1 and renamed the present fence in HWC2. The present fence signals when the current frame has been successfully displayed on the screen.

Principle Analysis

Simple version

Now let's look at how dumpsys SurfaceFlinger --latency Layer-name calculates a Layer's fps. I won't go into the details of the dumpsys call path, which eventually reaches SurfaceFlinger::doDump():

status_t SurfaceFlinger::doDump(int fd, const DumpArgs& args,
                                bool asProto) NO_THREAD_SAFETY_ANALYSIS {
    ...
    static const std::unordered_map<std::string, Dumper> dumpers = {
        ......
        {"--latency"s, argsDumper(&SurfaceFlinger::dumpStatsLocked)},
        ......
    };

From this we can see that whatever --xxx argument we append to dumpsys SurfaceFlinger ends up being parsed here. In our case the argument is --latency, so let's look at SurfaceFlinger::dumpStatsLocked():

void SurfaceFlinger::dumpStatsLocked(const DumpArgs& args, std::string& result) const {
    StringAppendF(&result, "%" PRId64 "\n", getVsyncPeriod());

    if (args.size() > 1) {
        const auto name = String8(args[1]);
        mCurrentState.traverseInZOrder([&](Layer* layer) {
            if (name == layer->getName()) {
                layer->dumpFrameStats(result);
            }
        });
    } else {
        mAnimFrameTracker.dumpStats(result);
    }
}

As you can see, it first prints the current VSYNC interval, then traverses the current layers and compares each layer's name with the argument passed in. When a name matches, it dumps that layer's frame statistics; if no name matches, the command ends without printing anything else. This is why many people run into the following problem:

❔Why does this command only print a number?

✔ In this case, check whether the Layer name you passed in is correct.

Next, layer->dumpFrameStats() calls FrameTracker::dumpStats():

void FrameTracker::dumpStats(std::string& result) const {
    Mutex::Autolock lock(mMutex);
    processFencesLocked();

    const size_t o = mOffset;
    for (size_t i = 1; i < NUM_FRAME_RECORDS; i++) {
        const size_t index = (o+i) % NUM_FRAME_RECORDS;
        base::StringAppendF(&result, "%" PRId64 "\t%" PRId64 "\t%" PRId64 "\n",
                            mFrameRecords[index].desiredPresentTime,
                            mFrameRecords[index].actualPresentTime,
                            mFrameRecords[index].frameReadyTime);
    }
    result.append("\n");
}

NUM_FRAME_RECORDS is defined as 128 and the loop starts at i = 1, so the output contains 127 records. Each record has three numbers, desiredPresentTime, actualPresentTime, and frameReadyTime, with the following meanings:

• desiredPresentTime: timestamp of the next HW-VSYNC
• actualPresentTime: timestamp at which the retire fence (the present fence in HWC2) signals
• frameReadyTime: timestamp at which the acquire fence signals
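The 127-row count follows directly from the loop bounds in dumpStats(). The sketch below reproduces only the index arithmetic of that loop; dumpOrder() is a hypothetical helper and kNumFrameRecords stands in for NUM_FRAME_RECORDS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNumFrameRecords = 128;  // NUM_FRAME_RECORDS in the article

// Returns the ring-buffer slots in the order the dump loop visits them,
// given the current write position (mOffset in FrameTracker).
std::vector<std::size_t> dumpOrder(std::size_t offset) {
    assert(offset < kNumFrameRecords);
    std::vector<std::size_t> order;
    // i runs from 1 to 127, so the slot at `offset` itself is never visited.
    for (std::size_t i = 1; i < kNumFrameRecords; i++) {
        order.push_back((offset + i) % kNumFrameRecords);
    }
    return order;
}
```

For an offset of 5 the walk starts at slot 6, wraps around at 127, and ends at slot 4. Slot 5, the one at mOffset, is left out, which is why 128 slots produce 127 rows.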

Combined with the earlier description of the present fence, the principle by which dumpsys SurfaceFlinger --latency is used to calculate fps becomes clear:

dumpsys SurfaceFlinger --latency gives us the present fence signal times of the latest 127 frames. As described earlier, when the present fence of a frame signals, that frame has been displayed on the screen. So by counting how many present fences signal within one second, we know how many frames were refreshed on the screen in that second, and from that we get the fps.
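One way to turn this principle into a number is to divide the count of frame intervals by the time they span. The sketch below does that for a list of second-column timestamps. It is an illustrative calculation, not the code of any particular tool: estimateFps() is a hypothetical name, and treating 0 and INT64_MAX as "no valid timestamp" is an assumption based on the FrameRecord constructor and the pending check shown later in this article.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Estimates fps from actualPresentTime values (nanoseconds, oldest first).
// Entries equal to 0 (slot never filled) or INT64_MAX (present fence not
// signaled yet) are ignored. Returns 0 if no estimate can be made.
double estimateFps(const std::vector<int64_t>& actualPresentTimesNs) {
    std::vector<int64_t> valid;
    for (int64_t t : actualPresentTimesNs) {
        if (t > 0 && t != INT64_MAX) valid.push_back(t);
    }
    if (valid.size() < 2) return 0.0;

    const int64_t spanNs = valid.back() - valid.front();
    if (spanNs <= 0) return 0.0;  // unordered or identical timestamps

    // N timestamps delimit N-1 frame intervals.
    const std::size_t intervals = valid.size() - 1;
    return static_cast<double>(intervals) * 1e9 / static_cast<double>(spanNs);
}
```

Fed with the nine second-column values from the sample output near the top of this article, the function returns roughly 60, which matches the 60Hz screen. Averaging over the whole window is a simplification; a tool that reports fps per second would count only the timestamps that fall inside each one-second window.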

Complex version

We now know the principle behind the fps calculation, but you probably still have a few questions:

• Where does actualPresentTime come from?
• Suppose the layer whose fps is being measured is not updated, but other layers are. The present fence still signals normally in that case, so will the calculated fps be inaccurate?

To answer these questions, we have to keep reading the code.

As mentioned above, the fps calculation uses the second value, so the rest of this article focuses on actualPresentTime. Where is actualPresentTime assigned? The assignment happens in processFencesLocked(), a helper function called by FrameTracker::dumpStats():

void FrameTracker::processFencesLocked() const {
    FrameRecord* records = const_cast<FrameRecord*>(mFrameRecords);
    int& numFences = const_cast<int&>(mNumFences);

    for (int i = 1; i < NUM_FRAME_RECORDS && numFences > 0; i++) {
        size_t idx = (mOffset+NUM_FRAME_RECORDS-i) % NUM_FRAME_RECORDS;
        ...
        const std::shared_ptr<FenceTime>& pfence =
                records[idx].actualPresentFence;
        if (pfence != nullptr) {
            // actualPresentTime is assigned here
            records[idx].actualPresentTime = pfence->getSignalTime();
            if (records[idx].actualPresentTime < INT64_MAX) {
                records[idx].actualPresentFence = nullptr;
                numFences--;
                updated = true;
            }
        }
        ......

The complete definition of FrameRecord is as follows:

struct FrameRecord {
    FrameRecord() :
        desiredPresentTime(0),
        frameReadyTime(0),
        actualPresentTime(0) {}
    nsecs_t desiredPresentTime;
    nsecs_t frameReadyTime;
    nsecs_t actualPresentTime;
    std::shared_ptr<FenceTime> frameReadyFence;
    std::shared_ptr<FenceTime> actualPresentFence;
};

As the code above shows, actualPresentTime is obtained by calling getSignalTime() on actualPresentFence, and actualPresentFence is set by setActualPresentFence():

void FrameTracker::setActualPresentFence(
        std::shared_ptr<FenceTime>&& readyFence) {
    Mutex::Autolock lock(mMutex);
    mFrameRecords[mOffset].actualPresentFence = std::move(readyFence);
    mNumFences++;
}

setActualPresentFence() is ultimately reached through the following call chain:

SurfaceFlinger::postComposition()
 \_ BufferLayer::onPostComposition()

Here we focus on SurfaceFlinger::postComposition() :

void SurfaceFlinger::postComposition()
{
    ......
    mDrawingState.traverseInZOrder([&](Layer* layer) {
        bool frameLatched =
                layer->onPostComposition(displayDevice->getId(), glCompositionDoneFenceTime,
                                         presentFenceTime, compositorTiming);
    ......

Recall our previous question:

❔Suppose the layer for counting fps is not updated, but other layers are updated. In this case, the present fence will also signal normally. Then, will the calculated fps be inaccurate?

The answer is no. SurfaceFlinger has two global variables that record the state of the Layers in the current system:

• mDrawingState: represents the state of the last "drawing".
• mCurrentState: represents the current state.

Therefore, if a Layer is not updated, it is not recorded in mDrawingState, so the present fence is not recorded as an actualPresentTime in that Layer's FrameTracker.
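The consequence of this per-layer bookkeeping can be shown with a toy model. The sketch below only mirrors the behaviour described above (a layer records a present time only for frames in which it was updated). ToyCompositor and its methods are hypothetical and do not correspond to SurfaceFlinger classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model: every layer keeps its own list of present fence signal times.
class ToyCompositor {
public:
    // Simulates one displayed frame. `updatedLayers` lists the layers that
    // submitted new content for this frame; only they record the time.
    void presentFrame(int64_t presentTimeNs, const std::vector<std::string>& updatedLayers) {
        for (const std::string& name : updatedLayers) {
            mPresentTimes[name].push_back(presentTimeNs);
        }
    }

    // Number of present times recorded for the given layer.
    std::size_t recordedFrames(const std::string& layer) const {
        auto it = mPresentTimes.find(layer);
        return it == mPresentTimes.end() ? 0 : it->second.size();
    }

private:
    std::map<std::string, std::vector<int64_t>> mPresentTimes;
};
```

If one layer updates on all 60 frames of a second while another updates only on every second frame, the first ends up with 60 recorded present times and the second with 30, even though the screen itself was refreshed 60 times. That is why the fps derived from a layer's own records reflects that layer and not the display as a whole.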

Let's go back to SurfaceFlinger::postComposition(), the final stage of SurfaceFlinger composition. presentFenceTime is what arrives as the readyFence parameter seen earlier, and it is assigned here:

mPreviousPresentFences[0] = mActiveVsyncSource
        ? getHwComposer().getPresentFence(*mActiveVsyncSource->getId())
        : Fence::NO_FENCE;
auto presentFenceTime = std::make_shared<FenceTime>(mPreviousPresentFences[0]);

The getPresentFence() function in turn hands this over to HWC:

sp<Fence> HWComposer::getPresentFence(DisplayId displayId) const {
    RETURN_IF_INVALID_DISPLAY(displayId, Fence::NO_FENCE);
    return mDisplayData.at(displayId).lastPresentFence;
}

At this point we have finally found where the present fence comes from, but it is still hidden behind a veil: we need to see where lastPresentFence is assigned. The location differs depending on the composition type:

DEVICE Composition

With DEVICE composition, lastPresentFence is assigned in HWComposer::prepare():

status_t HWComposer::prepare(DisplayId displayId, const compositionengine::Output& output) {
    ......
    if (!displayData.hasClientComposition) {
        sp<Fence> outPresentFence;
        uint32_t state = UINT32_MAX;
        error = hwcDisplay->presentOrValidate(&numTypes, &numRequests, &outPresentFence , &state);
        if (error != HWC2::Error::HasChanges) {
            RETURN_IF_HWC_ERROR_FOR("presentOrValidate", error, displayId, UNKNOWN_ERROR);
        }
        if (state == 1) { //Present Succeeded.
            ......
            displayData.lastPresentFence = outPresentFence;

Anyone who reads systrace regularly will recognize this function: it is the prepare() slice of SurfaceFlinger in systrace:

[Figure: prepare() in Systrace]

This function is very important. It goes through a series of calls:

HWComposer::prepare()
 \_ Display::presentOrValidate()
      \_ Composer::presentOrValidateDisplay()
           \_ CommandWriter::presentOrvalidateDisplay()

Finally, the HWC server is notified through HwBinder to start DEVICE composition. After receiving the client's request, the server returns a present fence to the client (always remember that a fence is used for synchronization across environments; here it synchronizes SurfaceFlinger and HWC). When the next HW-VSYNC arrives, the composed content is displayed on the screen and the present fence signals, indicating that this frame has been displayed.

GPU Composition

With GPU composition, lastPresentFence is assigned in presentAndGetReleaseFences():

status_t HWComposer::presentAndGetReleaseFences(DisplayId displayId) {
    ......
    displayData.lastPresentFence = Fence::NO_FENCE;
    auto error = hwcDisplay->present(&displayData.lastPresentFence);

The rest of the flow is similar to DEVICE composition. Display::present() also goes through a series of calls, notifies the HWC server through HwBinder, and calls presentDisplay() to show the composed content on the screen.

Summary

After all of this, the principle of calculating an App's fps can be summed up in one sentence:

Count how many frames the App refreshes to the screen in one second. In the Android world, the sign that a frame has been displayed on the screen is that its present fence signals. Calculating the App's fps therefore comes down to counting how many valid present fence signals the App's layer has in one second (a valid present fence here means one for a VSYNC in which the layer was actually updated).
