This is probably the first article to date that explains how to debug Android lag issues.
Preface
FPS is short for frames per second, commonly called the "frame rate". In games, FPS is a basic indicator of performance. It matters a great deal to game developers and phone vendors, and measuring a game's FPS has become a routine part of daily testing. Many tools on the market can calculate FPS, so how do they do it, and on what principle? This article digs into the source code to answer these questions in detail (the source code analysis is based on Android Q).
Calculation method
At present, most frame rate statistics software, and the various fps scripts that can be found on the Internet, rely on one of two sources of information. One is dumpsys SurfaceFlinger --latency Layer-name (note that the argument is the Layer name, not the package name or the Activity name; the reason is explained below). The other is dumpsys gfxinfo. The principles behind the two are basically the same, and this article focuses on the first one. Most frame rate statistics software on the market also uses the first one. Some tools wrap the calculation logic in a .so library to make decompilation harder, but it has been verified that they still end up calculating fps by calling the command above.
But why can this command be used to calculate fps? Let's look at its output, taking Honor of Kings as an example (games like Honor of Kings render into a SurfaceView, so their layer names all start with "SurfaceView - "):
> adb shell dumpsys SurfaceFlinger --latency "SurfaceView - com.tencent.tmgp.sgame/com.tencent.tmgp.sgame.SGameActivity#0"
16666666
59069638658663 59069678041684 59069654158298
59069653090955 59069695022100 59069670894236
59069671034444 59069711403455 59069687949861
59069688421840 59069728057361 59069704415121
59069705420850 59069744773350 59069720767830
59069719818975 59069761378975 59069737416007
59069736702673 59069778060955 59069754568663
59069753361528 59069794716007 59069770761632
59069768766371 59069811380486 59069787649600
......
What do these numbers mean? The number on the first line is the current VSYNC interval in nanoseconds; on a 60Hz screen it is 16666666 ns, about 16.67ms. Below it come 127 lines (why 127 is explained below). Each line has three numbers, and each number is a timestamp in nanoseconds whose meaning is explained later. The fps calculation uses the second timestamp; the reason is also explained later.
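To make the layout of this output concrete, here is a minimal C++ sketch that parses the text into a VSYNC period and rows of three timestamps. The names FrameRow, LatencyDump, and parseLatencyDump are made up for illustration and are not part of Android; how the text is captured from adb is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One row of the --latency output. All values are in nanoseconds.
struct FrameRow {
    int64_t first;   // first column
    int64_t second;  // second column, the one used for fps
    int64_t third;   // third column
};

// Hypothetical container for the parsed output.
struct LatencyDump {
    int64_t vsyncPeriodNs = 0;   // the single number on the first line
    std::vector<FrameRow> rows;  // the rows that follow

    // Convenience accessor for the second column of row i.
    int64_t presentTimeNs(std::size_t i) const {
        assert(i < rows.size());
        return rows[i].second;
    }
};

// Parses the raw text of the command output. Lines that do not start with
// the expected numbers (a shell prompt, an empty line, "......") are skipped.
LatencyDump parseLatencyDump(const std::string& text) {
    LatencyDump dump;
    std::istringstream in(text);
    std::string line;
    bool headerSeen = false;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int64_t a = 0, b = 0, c = 0;
        if (!headerSeen) {
            // The first numeric line is the VSYNC period.
            if (fields >> a) {
                dump.vsyncPeriodNs = a;
                headerSeen = true;
            }
            continue;
        }
        if (fields >> a >> b >> c) {
            dump.rows.push_back({a, b, c});
        }
    }
    return dump;
}
```

If rows comes back empty while vsyncPeriodNs is set, the command printed only its first line.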
Fence Analysis
The principle analysis below involves fences. Fences are a large topic, so only a brief description is given here; if there is interest, I will write a detailed article on fences later.
What is a fence
First, let's explain how the content drawn by the App ends up on the screen:
The content the App wants to display has to be drawn into a Buffer, and the App obtains this Buffer from the BufferQueue through dequeueBuffer(). Once it has the Buffer, the App fills in the content and then returns the Buffer through queueBuffer() so that SurfaceFlinger can compose and display it. When SurfaceFlinger starts composition, it calls acquireBuffer() to take a Buffer from the BufferQueue, and after composition it returns the Buffer to the BufferQueue through releaseBuffer(), as shown below:
There is actually a problem in the above process. When the App returns the Buffer through queueBuffer() after drawing, the drawing is finished only on the CPU side; the GPU side has not necessarily finished. If the Buffer were used for composition or display at this point, there would be problems (the Buffer may not be completely drawn).
The root cause is that the CPU and GPU work asynchronously. When we issue a series of OpenGL calls in code, the functions appear to return, but the commands have only been placed in a local command buffer. The CPU does not know when the GL commands are actually executed unless it calls glFinish() to wait for them to complete. That brings serious performance problems, because the parallelism between the CPU and GPU is lost entirely: the CPU sits idle until the GPU finishes. What is needed is a mechanism under which everyone does their own work while the buffer does not need to be read or written, and, when the buffer does need to be read or written, the state of the buffer on the GPU can be queried and waited on if necessary. With such a mechanism the problems above disappear.
Fence is such a synchronization mechanism. As the name suggests, a fence is used to block things. So what does the fence block? The buffer mentioned above. During the entire process of drawing, composition, and display, the buffer is passed between the CPU, GPU, and HWC. Before one party uses the buffer, it needs to check whether the previous user has handed over the "right to use" the buffer. The "right to use" here is the fence. When the fence is released (that is, signaled), the previous user of the buffer has handed over the right to use it, and it is safe to operate on the buffer.
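The idea can be illustrated with a toy model. The sketch below is only an analogy: ToyFence, ToyBuffer, and tryCompose are hypothetical names, and a real Android fence is a kernel sync object rather than an atomic variable. It uses INT64_MAX to mean "not signaled yet", the same value the FrameTracker code shown later compares against.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Sentinel meaning "the fence has not signaled yet".
constexpr int64_t kSignalTimePending = INT64_MAX;

// Toy stand-in for a fence: it starts pending and can be signaled once.
class ToyFence {
public:
    // Called by the previous user of the buffer when it is done with it.
    void signal(int64_t nowNs) {
        assert(nowNs != kSignalTimePending);
        int64_t expected = kSignalTimePending;
        // Only the first signal is recorded; a fence never returns to pending.
        mSignalTimeNs.compare_exchange_strong(expected, nowNs);
    }

    bool isSignaled() const { return mSignalTimeNs.load() != kSignalTimePending; }

    // Returns the signal time, or kSignalTimePending if not signaled yet.
    int64_t getSignalTime() const { return mSignalTimeNs.load(); }

private:
    std::atomic<int64_t> mSignalTimeNs{kSignalTimePending};
};

// A buffer travels together with the fence that guards it.
struct ToyBuffer {
    ToyFence fence;
    int pixels = 0;
};

// The next user touches the buffer only after the fence has signaled.
// Returns false (and leaves *out untouched) while the fence is pending.
bool tryCompose(const ToyBuffer& buffer, int* out) {
    if (!buffer.fence.isSignaled()) {
        return false;
    }
    *out = buffer.pixels;
    return true;
}
```

The point of the model is that the producer and consumer never block each other until the buffer is actually needed, which is what glFinish() cannot offer.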
fence in code
In the Android source code, the implementation of fence is divided into four parts:
• fence driver: the core implementation of synchronization.
• libsync: located in system/core/libsync; its main function is to wrap the driver interface.
• Fence class: located in frameworks/native/libs/ui/Fence.cpp; it is a C++ wrapper around libsync that makes it convenient for the framework to call.
• FenceTime class: a utility class that further wraps Fence and provides two main interfaces, isValid() and getSignalTime(). It is mainly used in scenarios where the release time of a fence needs to be queried multiple times (the release time is queried by calling Fence::getSignalTime()). Because it wraps Fence, the first time FenceTime::getSignalTime() is called after the fence has been released, the release time is cached; the next call to FenceTime::getSignalTime() returns the cached value directly, which avoids unnecessary calls to Fence::getSignalTime() (the release time of a fence never changes).
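The caching behavior described for FenceTime can be sketched as follows. This is not the AOSP implementation: CachedFenceTime is a hypothetical class, and the callback passed to it merely plays the role of Fence::getSignalTime().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Sentinel meaning "the fence has not signaled yet".
constexpr int64_t kSignalTimePending = INT64_MAX;

// Hypothetical stand-in for FenceTime: wraps a possibly expensive query and
// remembers the answer once the fence has signaled.
class CachedFenceTime {
public:
    // `query` plays the role of Fence::getSignalTime().
    explicit CachedFenceTime(std::function<int64_t()> query)
          : mQuery(std::move(query)) {
        assert(mQuery != nullptr);
    }

    int64_t getSignalTime() {
        if (mCachedNs != kSignalTimePending) {
            return mCachedNs;  // already signaled: no further query needed
        }
        const int64_t t = mQuery();
        if (t != kSignalTimePending) {
            mCachedNs = t;  // the signal time never changes, so keep it
        }
        return t;
    }

private:
    std::function<int64_t()> mQuery;
    int64_t mCachedNs = kSignalTimePending;
};
```

While the fence is pending every call goes through to the underlying query; after the first signaled result the query is never invoked again.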
fence in Android
In Android, there are three types of fences: acquire fence, release fence and present fence. The acquire fence and release fence belong to a Layer, while the present fence belongs to a frame (that is, to all the Layers of that frame):
• acquire fence: As mentioned earlier, when the App returns the Buffer to the BufferQueue through queueBuffer(), the GPU side of the Buffer is not yet finished. A fence is attached at this point, and this is the acquire fence. When SurfaceFlinger/HWC wants to read the Buffer for composition, it has to wait until the acquire fence is released.
• release fence: When the App requests a Buffer from the BufferQueue through dequeueBuffer() and wants to draw into it, it has to make sure that HWC no longer needs the Buffer; that is, it must wait for the release fence to signal before writing to the Buffer.
• present fence: The present fence was called the retire fence in HWC1 and was renamed the present fence in HWC2. When the current frame is successfully displayed on the screen, the present fence signals.
Principle Analysis
Simple version
Now let's look at how dumpsys SurfaceFlinger --latency Layer-name makes it possible to calculate a Layer's fps. I won't go into the dumpsys calling process; it eventually reaches SurfaceFlinger::doDump():
status_t SurfaceFlinger::doDump(int fd, const DumpArgs& args,
                                bool asProto) NO_THREAD_SAFETY_ANALYSIS {
    ...
    static const std::unordered_map<std::string, Dumper> dumpers = {
            ......
            {"--latency"s, argsDumper(&SurfaceFlinger::dumpStatsLocked)},
            ......
    };
From here we can see that any --xxx parameter added after dumpsys SurfaceFlinger is eventually parsed here. In our case the parameter is --latency, so let's look at SurfaceFlinger::dumpStatsLocked:
void SurfaceFlinger::dumpStatsLocked(const DumpArgs& args, std::string& result) const {
    StringAppendF(&result, "%" PRId64 "\n", getVsyncPeriod());
    if (args.size() > 1) {
        const auto name = String8(args[1]);
        mCurrentState.traverseInZOrder([&](Layer* layer) {
            if (name == layer->getName()) {
                layer->dumpFrameStats(result);
            }
        });
    } else {
        mAnimFrameTracker.dumpStats(result);
    }
}
As you can see, it first prints the current VSYNC interval, then traverses the current Layers and compares each Layer's name with the parameter passed in. If a name matches, that Layer's information is dumped; if none matches, the command ends without printing anything more. This is why many people run into the following problem:
❔Why does this command only print a number?
✔ In fact, at this time you should check whether your Layer parameters are correct.
Next, layer->dumpFrameStats() calls FrameTracker::dumpStats():
void FrameTracker::dumpStats(std::string& result) const {
    Mutex::Autolock lock(mMutex);
    processFencesLocked();
    const size_t o = mOffset;
    for (size_t i = 1; i < NUM_FRAME_RECORDS; i++) {
        const size_t index = (o+i) % NUM_FRAME_RECORDS;
        base::StringAppendF(&result, "%" PRId64 "\t%" PRId64 "\t%" PRId64 "\n",
                            mFrameRecords[index].desiredPresentTime,
                            mFrameRecords[index].actualPresentTime,
                            mFrameRecords[index].frameReadyTime);
    }
    result.append("\n");
}
NUM_FRAME_RECORDS is defined as 128 and the loop starts at i = 1, so the output has 127 rows. Each row has three numbers: desiredPresentTime, actualPresentTime, and frameReadyTime. Their meanings are:
• desiredPresentTime: timestamp of the next HW-VSYNC
• actualPresentTime: timestamp of the present fence signal (the retire fence in HWC1)
• frameReadyTime: timestamp of the acquire fence signal
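The reason for 127 rows rather than 128 is easy to check by replaying the index arithmetic of the loop in dumpStats(). The sketch below does only that; dumpOrder is a hypothetical helper name, and kNumFrameRecords mirrors the value of NUM_FRAME_RECORDS quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same value as NUM_FRAME_RECORDS in the text above.
constexpr std::size_t kNumFrameRecords = 128;

// Returns the ring-buffer slots in the order the dump loop visits them for a
// given write offset. The loop runs i = 1 .. 127, so the slot at `offset`
// itself is never visited.
std::vector<std::size_t> dumpOrder(std::size_t offset) {
    assert(offset < kNumFrameRecords);
    std::vector<std::size_t> order;
    for (std::size_t i = 1; i < kNumFrameRecords; i++) {
        order.push_back((offset + i) % kNumFrameRecords);
    }
    return order;
}
```

For an offset of 5, the visit order starts at slot 6, wraps around, and ends at slot 4, which gives 127 slots in total.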
Combined with the previous description of the present fence, the principle by which dumpsys SurfaceFlinger --latency lets us calculate fps becomes clear:
dumpsys SurfaceFlinger --latency gives us the signal times of the present fences of the latest 127 frames. When the present fence of a frame signals, that frame has been displayed on the screen. We can therefore count how many present fences signaled within one second, infer how many frames were refreshed on the screen in that second, and thus calculate the fps.
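As a rough sketch of that calculation, assume the second column has already been extracted into a vector ordered oldest first. The function name estimateFps and the one-second default window are illustrative choices, not taken from any particular tool. Entries equal to 0 are treated as unused slots and entries equal to INT64_MAX as fences that have not signaled yet; both assumptions follow the FrameRecord and processFencesLocked() code shown later in this article. Instead of a raw count, the sketch divides the number of frame intervals by the time they span, which avoids an off-by-one at the window edge.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Estimates fps from present fence signal times (nanoseconds, oldest first).
// Only frames inside the last `windowNs` before the newest frame are used.
double estimateFps(const std::vector<int64_t>& presentTimesNs,
                   int64_t windowNs = 1000000000LL) {
    assert(windowNs > 0);

    // Drop unused slots (0) and pending fences (INT64_MAX).
    std::vector<int64_t> valid;
    for (int64_t t : presentTimesNs) {
        if (t > 0 && t != INT64_MAX) {
            valid.push_back(t);
        }
    }
    if (valid.size() < 2) {
        return 0.0;
    }

    // Find the first frame that falls inside the window.
    const int64_t latest = valid.back();
    std::size_t first = 0;
    while (valid[first] < latest - windowNs) {
        first++;
    }

    const std::size_t frames = valid.size() - first;
    if (frames < 2) {
        return 0.0;
    }
    const int64_t spanNs = latest - valid[first];
    if (spanNs <= 0) {
        return 0.0;
    }
    // (frames - 1) intervals were presented over spanNs nanoseconds.
    return static_cast<double>(frames - 1) * 1e9 / static_cast<double>(spanNs);
}
```

Ten frames spaced 20 ms apart, for example, contain nine intervals over 180 ms, so the estimate is 50 fps.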
Complex version
We now know the principle of fps calculation, but you probably still have a lot of questions:
• Where does actualPresentTime come from?
• Suppose the layer whose fps is being counted is not updated, but other layers are. The present fence still signals normally in this case, so will the calculated fps be inaccurate?
To answer these questions, we have to keep reading.
As mentioned above, the second value is used when calculating fps, so the rest of the article focuses on actualPresentTime. Where is actualPresentTime assigned? The assignment happens in processFencesLocked(), a helper called by FrameTracker::dumpStats():
void FrameTracker::processFencesLocked() const {
    FrameRecord* records = const_cast<FrameRecord*>(mFrameRecords);
    int& numFences = const_cast<int&>(mNumFences);
    for (int i = 1; i < NUM_FRAME_RECORDS && numFences > 0; i++) {
        size_t idx = (mOffset+NUM_FRAME_RECORDS-i) % NUM_FRAME_RECORDS;
        ...
        const std::shared_ptr<FenceTime>& pfence =
                records[idx].actualPresentFence;
        if (pfence != nullptr) {
            // actualPresentTime is assigned here
            records[idx].actualPresentTime = pfence->getSignalTime();
            if (records[idx].actualPresentTime < INT64_MAX) {
                records[idx].actualPresentFence = nullptr;
                numFences--;
                updated = true;
            }
        }
        ......
The complete definition of FrameRecord is as follows:
struct FrameRecord {
    FrameRecord() :
        desiredPresentTime(0),
        frameReadyTime(0),
        actualPresentTime(0) {}
    nsecs_t desiredPresentTime;
    nsecs_t frameReadyTime;
    nsecs_t actualPresentTime;
    std::shared_ptr<FenceTime> frameReadyFence;
    std::shared_ptr<FenceTime> actualPresentFence;
};
As the code above shows, actualPresentTime is obtained by calling getSignalTime() on actualPresentFence, and actualPresentFence is assigned by setActualPresentFence():
void FrameTracker::setActualPresentFence(
        std::shared_ptr<FenceTime>&& readyFence) {
    Mutex::Autolock lock(mMutex);
    mFrameRecords[mOffset].actualPresentFence = std::move(readyFence);
    mNumFences++;
}
setActualPresentFence() is ultimately called through the following call chain:
SurfaceFlinger::postComposition()
\_ BufferLayer::onPostComposition()
Here we focus on SurfaceFlinger::postComposition():
void SurfaceFlinger::postComposition()
{
    ......
    mDrawingState.traverseInZOrder([&](Layer* layer) {
        bool frameLatched =
                layer->onPostComposition(displayDevice->getId(), glCompositionDoneFenceTime,
                                         presentFenceTime, compositorTiming);
        ......
Recall our previous question:
❔Suppose the layer for counting fps is not updated, but other layers are updated. In this case, the present fence will also signal normally. Then, will the calculated fps be inaccurate?
The answer lies in mDrawingState. SurfaceFlinger has two global variables that record the state of the Layers in the current system:
• mDrawingState: represents the state of the last "drawing".
• mCurrentState: represents the current state.
Therefore, if the Layer in question is not updated, it is not recorded in mDrawingState, so the present fence is not recorded in the actualPresentTime of that Layer's FrameTracker.
Let's go back to SurfaceFlinger::postComposition(), the final stage of SurfaceFlinger composition. presentFenceTime is the readyFence parameter seen earlier, and it is assigned here:
mPreviousPresentFences[0] = mActiveVsyncSource
        ? getHwComposer().getPresentFence(*mActiveVsyncSource->getId())
        : Fence::NO_FENCE;
auto presentFenceTime = std::make_shared<FenceTime>(mPreviousPresentFences[0]);
And getPresentFence() hands the request over to HWC:
sp<Fence> HWComposer::getPresentFence(DisplayId displayId) const {
    RETURN_IF_INVALID_DISPLAY(displayId, Fence::NO_FENCE);
    return mDisplayData.at(displayId).lastPresentFence;
}
We have finally tracked down the present fence itself, but it is still behind a veil: we need to see where lastPresentFence is assigned. The location differs depending on the composition type:
DEVICE Composition
For DEVICE composition, lastPresentFence is assigned in HWComposer::prepare():
status_t HWComposer::prepare(DisplayId displayId, const compositionengine::Output& output) {
    ......
    if (!displayData.hasClientComposition) {
        sp<Fence> outPresentFence;
        uint32_t state = UINT32_MAX;
        error = hwcDisplay->presentOrValidate(&numTypes, &numRequests, &outPresentFence, &state);
        if (error != HWC2::Error::HasChanges) {
            RETURN_IF_HWC_ERROR_FOR("presentOrValidate", error, displayId, UNKNOWN_ERROR);
        }
        if (state == 1) { // Present Succeeded.
            ......
            displayData.lastPresentFence = outPresentFence;
Anyone who looks at systrace often will be familiar with this function; it is the prepare() slice of SurfaceFlinger in systrace:
This function is very important. It goes through a series of calls:
HWComposer::prepare()
\_ Display::presentOrValidate()
\_ Composer::presentOrValidateDisplay()
\_ CommandWriter::presentOrvalidateDisplay()
Finally, the HWC server is notified through HwBinder to start DEVICE composition. After receiving the request from the client, the server returns a present fence to the client (always remember that a fence is used for synchronization across contexts; here it synchronizes SurfaceFlinger and HWC). Then, when the next HW-VSYNC arrives, the composed content is displayed on the screen and the present fence signals, marking that this frame has been displayed on the screen.
GPU Composition
For GPU composition, lastPresentFence is assigned in presentAndGetReleaseFences():
status_t HWComposer::presentAndGetReleaseFences(DisplayId displayId) {
    ......
    displayData.lastPresentFence = Fence::NO_FENCE;
    auto error = hwcDisplay->present(&displayData.lastPresentFence);
The subsequent process is similar to DEVICE composition. Display::present() also goes through a series of calls, notifies the HWC server through HwBinder, and calls presentDisplay() to display the composed content on the screen.
Summary
After all of this, the principle of calculating the fps of an App can be summarized briefly:
Count how many frames the App refreshes to the screen in one second. In the Android world, the sign that a frame has been displayed on the screen is the present fence signal. Calculating an App's fps therefore comes down to counting how many valid present fence signals the App's layer has in one second (a valid present fence here means that the layer has updated content presented in that VSYNC).