July 17 news: the non-profit news studio Proof News published a blog post yesterday (July 16) stating that major technology companies, including Apple, Nvidia, Salesforce and Anthropic, used material from YouTube videos to train their AI models.
The report said that these technology companies used a dataset called YouTube Subtitles, which is 5.7GB (489 million words) in size, to train their AI models.
The dataset was created by EleutherAI and first released in 2020. It contains the subtitles of 173,536 YouTube videos from more than 48,000 channels, including subtitles from more than 12,000 videos that have since been deleted from the platform.
The YouTube Subtitles dataset draws largely on popular YouTube channels. IT Home lists some examples below:
- MrBeast (289 million subscribers, 2 videos used for training)
- Marques Brownlee (19 million subscribers, 7 videos)
- Jacksepticeye (nearly 31 million subscribers, 377 videos)
- PewDiePie (111 million subscribers, 337 videos)
The YouTube Subtitles dataset is part of a collection of datasets called "The Pile," which includes several other training datasets. Most of the datasets in The Pile are open to anyone with enough storage space and computing power.