League of Legends S11 live broadcast delayed by 30 seconds, but this time netizens' reactions are a little different
Yang Jing, Xiaoxiao | From Aofei Temple
Quantum Bit Report | Official Account QbitAI
A live broadcast delayed by dozens of seconds, and yet netizens are still saying "it smells so good" (the meme for grudging approval)?!
You heard that right: it happened at a global event, League of Legends S11, with a delay as high as 30 seconds.
Keep in mind that S11 draws tens of millions of viewers; last year's finals peaked at 45.95 million concurrent viewers.
For a top event like this, low latency and solid audio and video quality are table stakes for every major platform, and even a little extra delay would normally be intolerable.
Take the Dota 2 live broadcast a few days ago: it was delayed by 15 minutes, and netizens were furious...
This time, during the S11 live broadcast, an official channel ran as much as tens of seconds behind.
This seems to be a major live broadcast accident, right?
But what I never expected was that the bullet comments were full of "so comfortable" and "this is really good".
What's going on?
We followed the clues to the channel and found that it is a barrier-free live room launched by Bilibili (B station) specifically for hearing-impaired viewers:
Unlike a regular live room, this room has AI real-time subtitles, and jargon from the commentators, such as the team name "T1" or terms like "poke", is basically displayed correctly.
There are also post-match interviews with sign language interpretation, and the stream runs dozens of seconds behind the regular broadcast.
△ It is already being used by hearing-impaired viewers
In fact, the AI real-time speech recognition behind live subtitles is already widely used: YouTube live captions, video captions on Google's mobile devices, and Microsoft's PowerPoint presentation subtitles, among others.
However, few platforms have gone further and set up a dedicated barrier-free room for live broadcasts like this one.
What technically separates a truly barrier-free live room from ordinary real-time subtitles?
We took a deeper look and found that it is harder than it seems.
What’s so special about accessible speech recognition?
Before getting into what makes accessibility special, you need to know how real-time subtitles are produced in a live broadcast.
In terms of the pipeline, real-time subtitling sits between the encoding of the live video on the broadcast side and its decoding on the viewer side.
The audio is tapped off and run through fast speech recognition while the video is being encoded and transmitted, and the resulting text is output together with the video. The overall flow looks roughly like this:
△ Simplified flow
As you can see, the video itself still goes through transmission steps such as encoding and decoding, and the real-time subtitles are produced in between.
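To make the flow concrete, here is a minimal, self-contained Python sketch of that pipeline. Every component is a simulated stand-in (fake frames, a fake recognizer), not any platform's real API; the point is only where the audio tap and the caption muxing sit relative to the video path.

```python
# Simulated live-caption pipeline: a source produces frames plus a tapped
# audio copy, a fake streaming ASR turns audio chunks into partial captions,
# and the muxer attaches the newest caption to each outgoing frame.
import queue
import threading
import time

def fake_source(video_q, audio_q, n_frames=10):
    """Stand-in demuxer: each 'frame' also yields a tapped audio chunk."""
    for i in range(n_frames):
        video_q.put(f"frame-{i}")
        audio_q.put(f"audio-chunk-{i}")
        time.sleep(0.05)          # pretend this arrives in real time
    audio_q.put(None)             # end-of-stream markers
    video_q.put(None)

def fake_streaming_asr(audio_q, caption_q):
    """Stand-in recognizer: emits one partial caption per audio chunk."""
    for chunk in iter(audio_q.get, None):
        caption_q.put(f"[caption for {chunk}]")

def mux(video_q, caption_q):
    """Attach the most recent caption to each frame before 'sending' it."""
    latest = ""
    for frame in iter(video_q.get, None):
        while not caption_q.empty():   # drain to the newest caption
            latest = caption_q.get()
        print(f"{frame} -> {latest}")

video_q, audio_q, caption_q = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=fake_source, args=(video_q, audio_q)),
    threading.Thread(target=fake_streaming_asr, args=(audio_q, caption_q)),
]
for t in threads:
    t.start()
mux(video_q, caption_q)
for t in threads:
    t.join()
```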
In terms of the technology itself, real-time subtitles rely on speech recognition, which comes in two forms: manual transcription and automatic speech recognition (ASR).
Previously, because ASR accuracy was low (especially for Chinese) and manual transcription lagged by several minutes, few large-scale live broadcasts used real-time AI subtitles.
As AI has matured in recent years, ASR has been used more and more to subtitle video. It comes in two flavors: streaming ASR and non-streaming ASR.
Non-streaming means feeding in a complete segment of speech and getting the text out at the end; streaming means emitting speech-to-text results continuously, like an assembly line.
Today's streaming ASR can output text extremely fast (millisecond-level, which looks real-time to the naked eye) and can reach good accuracy after training, but it still leaves plenty of room for optimization.
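The difference is easiest to see in the shape of the interface. Below is a toy sketch in Python, with a lookup table standing in for the acoustic model (purely illustrative, no real recognition happens): the non-streaming function returns once, after the whole utterance, while the streaming generator yields a fresh partial hypothesis after every chunk.

```python
# Toy contrast of the two ASR interfaces; the "acoustic model" is just a
# lookup table for demonstration.
FAKE_ACOUSTICS = {1: "first", 2: "blood", 3: "has", 4: "been", 5: "drawn"}

def non_streaming_asr(audio: list[int]) -> str:
    """Whole utterance in, final transcript out, only after speech ends."""
    return " ".join(FAKE_ACOUSTICS[x] for x in audio)

def streaming_asr(audio_chunks):
    """Chunks in, partial hypotheses out, while the speech is still going."""
    words = []
    for chunk in audio_chunks:
        words.append(FAKE_ACOUSTICS[chunk])
        yield " ".join(words)   # a fresh partial result after every chunk

print(non_streaming_asr([1, 2, 3, 4, 5]))   # one final result
for partial in streaming_asr([1, 2, 3, 4, 5]):
    print("partial:", partial)              # grows word by word
```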
When choosing a recognition method for a given broadcast, the main trade-off is between accuracy and recognition speed. News broadcasts, for example, lean toward accuracy, while entertainment and esports broadcasts lean toward speed.
BUT accessibility adds new challenges for event broadcasts:
Because hearing-impaired viewers cannot fall back on the audio to resolve what they see, typos cost them extra reaction time, so the subtitles must be more accurate. The transcribed text also needs a certain visual fluency. And finally, the delay of an event broadcast still cannot be too high.
On the one hand, constrained by pauses in speech and the length of audio segments, today's streaming ASR delivers low latency and passable accuracy, but visual reading fluency often suffers: you know every word, yet strung together they make no sense:
△ I know every word, but the sentence makes no sense
On the other hand, a streaming ASR model needs audio input of a certain length before it can process and emit text, and it depends heavily on a stable speaking rate and fluent delivery.
Commentators, however, speak extremely fast during team fights or right before a decisive play (the host Hua Shao, for example, can rattle off 215 characters in 18 seconds at his fastest), or break off mid-sentence to think, both of which seriously degrade streaming ASR's performance.
If the raw streaming ASR results were output directly without any processing, the subtitles would go blank, stall repeatedly, or dump large chunks of text at once.
To make the subtitle stream more stable (outputting whole sentences and paragraphs) and more accurate, Bilibili paired iFlytek's streaming ASR (millisecond-level latency) with a deliberate delay across the entire barrier-free room to preserve reading fluency. The main steps were as follows:
First, Bilibili curated more than 500 proprietary terms for the League of Legends event, covering team names, player names, regions, game champions, competitive terminology, commentary jargon, famous Worlds quotes, and more, and loaded them onto the iFlytek Tingting server to bias the recognition results (a minimal sketch of this and the next step follows this list);
Second, to cope with unstable speaking rates, the text-processing stage automatically wraps the streaming ASR output into lines that match reading habits, making it easier for users to take in visually;
Third, for the overall reading experience, Bilibili built an auxiliary tool that streamlines manual review, further improving subtitle accuracy for hearing-impaired viewers.
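A minimal sketch of the first two steps, under loudly stated assumptions: the hotword table below is invented for illustration (the real list of 500+ terms lives on iFlytek's server, where it biases the decoder directly rather than patching strings afterwards), and plain width-based wrapping stands in for the readability-aware line breaking described above.

```python
import textwrap

# Illustrative entries only: invented mis-recognition -> curated event term.
HOTWORDS = {
    "tea one": "T1",
    "barren nashor": "Baron Nashor",
}

def apply_hotwords(text: str) -> str:
    """Swap common mis-recognitions for the curated event terminology."""
    for wrong, right in HOTWORDS.items():
        text = text.replace(wrong, right)
    return text

def wrap_for_reading(text: str, width: int = 20) -> list[str]:
    """Break recognized text into short lines that are easy to scan.
    A real system would also respect phrase boundaries, not just width."""
    return textwrap.wrap(text, width=width)

raw = "tea one group up and start the fight at barren nashor"
for line in wrap_for_reading(apply_hotwords(raw)):
    print(line)
```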
That also explains the slight delay in Bilibili's barrier-free room: the delay buys a better subtitle reading experience.
More than just real-time speech recognition technology
In fact, this barrier-free esports live room offers more than speech technology.
For example, sign language interpretation.
Bilibili invited Han Qingquan, a well-known sign language interpreter, to lead a professional team providing sign language support: real-time interpretation of match results and post-game interviews.
In addition, on every match day the live room also runs a viewing assistant, that is, sign language tutorials. Popular terms netizens are curious about, such as super god, first blood, last hit, mid lane, support, and economy, have been covered one by one.
Many assume the sign language features exist simply because speech-to-subtitle conversion makes mistakes, and sign language helps patch understanding.
In fact, there are deeper reasons.
As Han Qingquan explained, for people who know sign language, if the only two options are text and sign language, they will choose sign language without hesitation: signing gives them a strong sense of immersion, and lets hearing-impaired viewers genuinely feel how many people are paying attention to them.
As for the question everyone asks, "why not broadcast the entire event in sign language?": sign language has dialects too, and for a global event like League of Legends, signing the whole broadcast would first require establishing a whole new set of terms.
Although the existing real-time voice subtitles cannot be 100% accurate, they can already meet the understanding needs of most hearing-impaired people.
The second national sample survey of people with disabilities, in 2006, put the number of people with hearing impairments in China at 27.8 million; by the Beijing Hearing Association's 2017 estimate, that number had reached 72 million, and it continues to grow.
Now, to look after the viewing experience of the event fans among them, Bilibili has opened a dedicated barrier-free live room, and plenty of netizens have given it a thumbs up:
"Now that's thinking big."
"This barrier-free experience really blew me away! People with disabilities may be a minority, but they are just as entitled to enjoy all of this."
Technology itself should be accessible
Looking at the development history of the entire gaming industry, Bilibili's concern and care for people with disabilities is not unique.
The best-known example is the Xbox Adaptive Controller, launched by Microsoft in September 2018.
The roughly 30cm-long controller features two large programmable buttons and 19 ports for connecting a range of joysticks, buttons, and switches.
Even though some players balked at the price, US$99 (about RMB 700) and some US$40 more than an ordinary controller, it caused quite a stir and earned plenty of industry praise at the time.
△ Well-known gaming UP (uploader) on Bilibili, @-鸦-karas
That year, the controller was named one of Time magazine's 50 best inventions and won the Innovation award at the Italian Video Game Awards.
Hardware breakthroughs are eye-catching, but software support is equally important.
The other two of the big three console makers, Sony and Nintendo, have also put real effort into software and hardware improvements in recent years.
When Sony designed the PS4, it built in many hardware optimizations and assistive features for players with disabilities.
For example, controller buttons can be remapped, and features such as text-to-speech (TTS) and text magnification serve players with motor impairments, visual impairments, and more.
In games that lean on QTEs (quick time events), players can remap the controller so that a long press stands in for repeated taps, achieving the effect of continuous key presses.
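As a toy illustration of what such a remap does conceptually (invented timing values, not any console's actual accessibility API):

```python
# One held button expands into the burst of rapid synthetic taps a QTE
# expects; 10 taps per second is an invented default, not a real setting.
def expand_long_press(hold_seconds: float, taps_per_second: float = 10.0) -> list[str]:
    """Turn a single long press into the equivalent stream of taps."""
    return ["tap"] * int(hold_seconds * taps_per_second)

print(len(expand_long_press(1.5)))  # a 1.5s hold stands in for 15 rapid taps
```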
Nintendo's product line includes corresponding assistive features as well, such as haptic and audio feedback, grayscale display, motion controls, and color inversion, which broaden the range of games available to players with disabilities.
△ This is how a color-blind player sees Mario (right)
Last year, The Game Awards (TGA), the "Oscars of the gaming industry", also established an Innovation in Accessibility award to encourage game developers to serve disabled players.
In addition to updating the accessibility features in products, more technology companies are now beginning to pay attention to the research and development of accessibility-related technologies.
Eye tracking, for example.
Tobii's eye trackers let players control digital interfaces just by moving their eyes, and a growing number of products already support the technology. Another example is Tribe Game's action title "Super Point", which players can control entirely through eye tracking:
Many companies are also betting on brain-computer interface technology.
Accessible gaming is one of the core application scenarios for brain-computer interfaces, letting people perform mechanical control, text input, and other operations through thought alone.
Plenty of technology companies have invested in the research, including Valve: Gabe Newell has said Valve is working with the open-source brain-computer interface platform OpenBCI on an open-source BCI software project.
Clearly, more technology companies and platforms are paying attention to making cultural and entertainment services "barrier-free" for people with disabilities.
And this group should not be ignored.
In the past, most of us only saw, on TV news, companies and organizations caring about their material livelihood and basic needs; but if you think about it, the spiritual needs of hearing-impaired friends are just as important a part of their lives.
Fortunately, such needs are receiving more and more attention.
Beyond the most direct effect, benefiting people with disabilities, the move toward "barrier-free" technology carries additional value of its own.
For these shut-out groups, technologists are becoming the "literacy monks" of the intelligent era.
Today, digital and intelligent services bring convenience to most people, yet there is always a group of "voiceless" people and "outsiders" shut out from technology.
They may be people with disabilities of varying degrees, elderly people with limited mobility, or minority groups who for particular reasons cannot enjoy the benefits of technology.
But who will take up the role of "literacy monk" and bring the benefits of technology to more groups?
As suggested above, it is the very people who changed all of this in the first place, and the so-called "barrier-free" scenario is their training ground.
Train how, and in which direction?
That is inseparable from the literacy monk's core secret: the company's "people-oriented" values.
In a sense, those values are the key link in ultimately getting there.
Even if the resulting product is small in scope and its technology not especially cutting-edge, as long as it is put to real use, the value it brings is the more lasting kind.
This time Bilibili is serving hearing-impaired viewers; next time it may be visually impaired users, and after that elderly users... Come to think of it, isn't a platform ultimately made up of exactly these many niche groups of users?
After all, technology itself should be accessible.
If one day there are no more "voiceless" people or "outsiders" left, the ultimate meaning of technological accessibility will have truly been realized.
-over-
This article is original content from Quantum Bit (QbitAI), a signed account of the NetEase News / NetEase Hao featured content incentive plan. Reproduction without the account's authorization is prohibited.