03
Distributed Encoding
With the optimizations above, the single-machine live streaming system can sustain stable real-time 8K 50 FPS encoding in production. The challenge we may face next, however, is reaching an encoding speed of 120 FPS. To address it, we proposed adding distributed transcoding to the live streaming system. Distributed transcoding for video on demand should be familiar: once a file arrives, it is cut into very small segments, and the segments are sent to multiple machines for processing. Each machine handles only a very short segment, and the transcoded segments are finally stitched back into a complete file.
Distributed transcoding for live streaming borrows the on-demand approach. Normally, live transcoding completes every operation (decoding, watermarking, filtering, encoding, and so on) on a single machine, but we modified the live pipeline. The live process no longer performs the actual encoding. Instead, it pulls the live stream, cuts it into small segments in real time along GOP boundaries, sends the segments to the existing on-demand compute nodes, and has them transcoded quickly as files. Once transcoding finishes, the live system collects the segments and splices them back into a real-time live stream for distribution. The overall idea mirrors on-demand distributed transcoding: moving from one machine that performs every operation to n machines that transcode together in real time greatly improves encoding speed and efficiency.
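The split, dispatch, and splice flow described above can be sketched roughly as follows. This is a minimal illustration, not the production pipeline: the `Frame` type, the `transcode_gop` stand-in for a remote on-demand node, and the use of a local thread pool in place of real compute nodes are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Frame:
    index: int         # position of the frame in the incoming stream
    is_keyframe: bool  # True for the frame that opens a GOP


def split_into_gops(frames: Iterable[Frame]) -> Iterator[List[Frame]]:
    """Yield consecutive GOPs; a new GOP starts at every keyframe."""
    gop: List[Frame] = []
    for frame in frames:
        if frame.is_keyframe and gop:
            yield gop
            gop = []
        gop.append(frame)
    if gop:
        yield gop


def transcode_gop(gop: List[Frame]) -> List[int]:
    """Hypothetical stand-in for a remote node that transcodes one segment."""
    return [frame.index for frame in gop]


def distributed_transcode(frames: Iterable[Frame], workers: int = 4) -> List[int]:
    """Transcode GOPs in parallel and splice the outputs in stream order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in submission order, so the spliced
        # output keeps GOP order even if workers finish out of order.
        results = pool.map(transcode_gop, split_into_gops(frames))
        return [index for gop_out in results for index in gop_out]
```

Because each GOP begins with a keyframe and can be decoded on its own, a worker does not need frames from neighbouring segments, which is what makes this kind of split practical.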
The other part is how to support real-time super-resolution. Real-time 4K super-resolution can be achieved through video enhancement, AI enhancement algorithms, and similar operations, but real-time super-resolution to 8K is currently difficult to support. Against this background, we use distributed enhancement to support 4K-to-8K super-resolution during live streaming. After a video frame is decoded, it is compressed and sent frame by frame to the downstream enhancement compute nodes. Each compute node performs super-resolution on only a single frame, and the large pool of GPU resources across the compute nodes makes 4K-to-8K super-resolution achievable in live streaming.
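When frames are enhanced independently on different nodes, they can come back in a different order than they were sent. The article does not say how the results are reassembled; one common option, shown here purely as an assumption, is a small reorder buffer keyed by frame sequence number. The `ReorderBuffer` class and its method names are hypothetical.

```python
import heapq
from typing import List, Tuple


class ReorderBuffer:
    """Releases enhanced frames strictly in sequence order."""

    def __init__(self, first_seq: int = 0) -> None:
        self._next = first_seq                    # next sequence number to release
        self._heap: List[Tuple[int, bytes]] = []  # min-heap ordered by sequence number

    def push(self, seq: int, payload: bytes) -> List[Tuple[int, bytes]]:
        """Store one enhanced frame and return every frame that is now in order."""
        heapq.heappush(self._heap, (seq, payload))
        ready: List[Tuple[int, bytes]] = []
        while self._heap and self._heap[0][0] == self._next:
            ready.append(heapq.heappop(self._heap))
            self._next += 1
        return ready
```

A real system would also need a timeout or skip policy for a frame that never returns, which this sketch leaves out.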
04
Network Optimization
As a cloud vendor, we encounter many private cloud deployments, where the network and device environment is more complex.
For example, our monitoring showed that the entire live transcoding system experienced a TCP slowdown every 1-2 hours. The likely cause was that, after our transcoding service received a streaming data packet, the ACK message took about 3 s to travel from the virtual network card to the physical network card, a step that is normally instantaneous.
We first suspected system load, but the problem appeared on some machines even when CPU, memory, and bandwidth utilization were all healthy. The transmission path is TCP → IP layer → bond qdisc → ethX qdisc → ethX pcap. By capturing packets on both the virtual and the physical network cards, we found that the slowdown came from the bond network card delaying the ACK message by 2-3 seconds. In-depth analysis with the netrace tool showed that, between dequeuing packets from the qdisc and handing them to the driver, the driver status indicated that the message could not be sent. We finally confirmed that the network card driver on the equipment in the proprietary environment malfunctioned during large-volume transmission.
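The capture comparison can be illustrated with a small sketch. It assumes the two captures have already been parsed into `(timestamp, key)` records, where the key identifies a packet (for example its sequence and ACK numbers); the record layout, the function name, and the threshold are hypothetical and are not taken from the original investigation.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Record = Tuple[float, Hashable]  # (capture timestamp in seconds, packet key)


def find_delayed_packets(
    upstream: Iterable[Record],
    downstream: Iterable[Record],
    threshold_s: float = 1.0,
) -> List[Tuple[Hashable, float]]:
    """Return (key, delay) for packets that took longer than threshold_s
    to travel from the upstream capture point to the downstream one."""
    first_seen: Dict[Hashable, float] = {}
    for ts, key in upstream:
        first_seen.setdefault(key, ts)  # keep the earliest sighting of each key
    delayed: List[Tuple[Hashable, float]] = []
    for ts, key in downstream:
        if key in first_seen:
            delay = ts - first_seen[key]
            if delay > threshold_s:
                delayed.append((key, delay))
    return delayed
```

Run with the bond interface capture as `upstream` and the ethX capture as `downstream`, a check like this would flag the ACKs that were held for seconds between the two capture points.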
05
Distribution Optimization
Another problem was the customer's limited network environment: the internal network ran on only a gigabit switch.
Under this constraint, a more accurate load balancing algorithm is needed, along with higher-performance APIs and system functions for sending UDP multicast packets. Here are a few tips:
1. Control the rate at which UDP packets are sent. The encoding bitrate is never completely stable and fluctuates considerably, so the key is to keep the UDP sending rate below the network bandwidth limit.
2. Make full use of both network cards connected to the switch. Configure a bond virtual network card and use it for all traffic to expand beyond the original gigabit bandwidth of a single machine.
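The first tip can be sketched with a token bucket, a common way to smooth bursty output below a fixed rate. The article does not name the algorithm actually used, so treat this as one possible approach; the class, the `send_paced` helper, and the parameter values are assumptions.

```python
import time
from typing import Callable, Iterable, Tuple


class TokenBucket:
    """Paces sends so the long-run rate stays at or below rate_bps."""

    def __init__(self, rate_bps: float, burst_bytes: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_bps / 8.0          # refill rate in bytes per second
        self.capacity = float(burst_bytes)  # largest burst sent without waiting
        self.tokens = float(burst_bytes)
        self.clock = clock
        self.last = clock()

    def delay_for(self, size: int) -> float:
        """Reserve `size` bytes and return how long to wait before sending."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= size                 # may go negative: that is the debt to wait out
        return 0.0 if self.tokens >= 0 else -self.tokens / self.rate


def send_paced(sock, addr: Tuple[str, int], packets: Iterable[bytes],
               bucket: TokenBucket) -> None:
    """Send UDP packets through a socket-like object, sleeping as the bucket requires."""
    for packet in packets:
        wait = bucket.delay_for(len(packet))
        if wait > 0:
            time.sleep(wait)
        sock.sendto(packet, addr)
```

Setting `rate_bps` somewhat below the link capacity leaves headroom for other traffic, and `burst_bytes` bounds how much a bitrate spike can push onto the wire at once.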
After this series of optimizations, the real-time 8K transcoding system was also deployed in the customer's proprietary environment. The system deployed internally at CCTV.com supports real-time 8K encoding.
06
Summary and Outlook
Finally, the summary and outlook, starting with the transcoding service:
1. A transcoding service must first optimize the encoder. Encoder optimization falls into two major directions: improving overall encoding parallelism and CPU resource utilization, and reducing the CPU computing power required.
2. Different optimization schemes are applied for different codecs. For example, for AVS3, the conversion from NV12 to YUV is moved into the encoder kernel layer; for H265, multi-TILE parallel encoding is used for acceleration.
3. Resolve the memory bandwidth bottleneck. Manage each operation to cut memory copies and reduce memory bandwidth usage.
4. Improve the stability of the transcoding link. This involves access to remote versus local memory: plan which CPU each operation runs on, reduce cross-NUMA operations, and improve overall memory access efficiency.
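As a rough illustration of planning which CPUs an operation runs on, the sketch below pins the calling process to the CPUs of one NUMA node on Linux. It is an assumption about how such pinning could be done, not the authors' tooling. The helper names are hypothetical; the sysfs `cpulist` file and `os.sched_setaffinity` are Linux-specific.

```python
import os
from typing import Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_numa_node(node: int) -> Set[int]:
    """Restrict the calling process to the CPUs of one NUMA node (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

CPU affinity alone does not bind memory, but with the kernel's default local allocation policy, memory first touched by the pinned threads is generally placed on the same node, which reduces cross-NUMA access.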
Transcoding cluster:
1. Through multi-machine parallel transcoding, distributed transcoding supports up to 8K 120 FPS transcoding, 4K-to-8K super-resolution, and more.
2. In customer scenarios, pay closer attention to possible TCP slowdowns, packet loss, and similar issues. In addition, smooth the UDP packet sending algorithm and optimize the load balancing algorithm for distribution in the customer's restricted network environment.