Industry | Scale AI's rapid rise: Alexandr Wang and his data factory
In Showplace Plaza in San Francisco, a commercial building that once belonged to Airbnb recently changed hands.
While most technology companies are downsizing, Scale AI, an artificial intelligence data labeling company co-founded by a Chinese American born after 1995, has leased approximately 180,000 square feet of office space in downtown San Francisco.
In recent years, Scale AI has shown strong momentum. By 2021, its valuation had climbed to approximately US$7.3 billion.
After it completed a new US$1 billion financing round in May 2024, its valuation rose to US$13.8 billion.
This Series F round was led by Accel, a well-known Silicon Valley investment firm. In addition to existing investors such as YC and NVIDIA, it attracted new investors including Amazon, Meta, AMD, Qualcomm, Cisco, and Intel, for a total of 22 participating institutions.
Scale AI's annual revenue has recently reached nearly US$1 billion, a four-fold increase over the same period last year.
By comparison, OpenAI's annual revenue was US$1.6 billion at the end of last year and is expected to exceed US$3.5 billion this year.
The AI market is huge, but Scale AI only takes a small part
In the vast landscape of AI, data, algorithms and computing power are regarded as the three cornerstones.
If NVIDIA is the "shovel seller" of computing power, then companies such as Scale AI that focus on data labeling are the "shovel sellers" of data.
The research and development of large models depends heavily on the computing power provided by NVIDIA, and the continued progress of AI models depends just as much on high-quality, carefully labeled data.
There was a time when the metaphor "data is the new oil" was quite popular. Alexandr Wang, however, sees it differently.
He believes that as a scarce resource, the value of oil is obvious; data is more abundant and diverse, and not all data are equivalent.
What is truly valuable is high-quality, differentiated data that has been carefully thought out and pieced together.
This idea has become the core principle behind Scale AI's development.
Alexandr Wang has stated: "In the gold rush of generative AI, Scale AI plays the role of pickaxe and shovel."
While many companies compete to mine the AI gold rush directly, Scale AI has taken a different path and secured an important position in this fierce competition with its professional data labeling services.
With the boom in generative AI, the three cornerstones of large models (data, algorithms, and computing power) have entered a new stage of development.
With the continuous evolution of Transformer-based algorithms and the substantial increase in computing power, data has become a key factor limiting the further development of large models.
Currently, large models have almost exhausted all the easily accessible data resources on the Internet. Without the continuous supply of high-quality data, large models may fall into the dilemma of performance stagnation.
Therefore, in the new era of AI, data assets are seen as valuable gold mines that need to be mined, and tool providers (i.e., shovel sellers) who work around data will have unprecedented development opportunities.
While the giants fixate on large-model training, Alexandr Wang drops out to start a business
Alexandr Wang was born on January 19, 1997, and his hometown is Los Alamos, New Mexico.
Both of his parents are Chinese immigrants, and both work as physicists at Los Alamos National Laboratory.
Wang has shown outstanding programming talent since high school, and was successfully admitted to the Massachusetts Institute of Technology at the age of 18, specializing in machine learning.
However, despite the prestige and prospects that MIT offered, he made a difficult decision: to drop out and start a business.
In 2016, Wang and Lucy Guo co-founded Scale AI, aiming to tackle a key challenge in the field of artificial intelligence: data labeling.
Wang is well aware of the importance of data to the success of AI models. He firmly believes that as the scale of the models continues to expand, the demand for data will also grow exponentially.
Therefore, his original intention in founding Scale AI was to fundamentally solve the data problems in the field of artificial intelligence.
At that time, Scale AI's vision seemed to run against the mainstream trend of the industry. While everyone else was committed to replacing human labor with artificial intelligence, Scale AI did the opposite and used a large amount of human labor to complete tasks that artificial intelligence still struggled with.
Although the data labeling business appears to have a low barrier to entry, during the "AI quiet period" around 2016 the market was almost vacant, with only a few large companies such as Google and Amazon running their own data labeling departments.
This is where Scale AI's advantage came in. The massive amounts of raw data collected by artificial intelligence companies had to be annotated with labels before being fed into a model.
Most companies could only complete this arduous and tedious work manually, in house.
Scale AI offered these companies a new solution.
Notably, Alexandr Wang also seized the opportunity presented by the rise of autonomous driving.
He led a team that accurately labeled the three-dimensional images generated by the radars and sensors that self-driving cars rely on.
Such high-quality labeled data helps improve the performance of autonomous driving systems and contributes to the development of the technology.
From outsourcing to large-scale data annotation
In its early stages of development, Scale AI's core business focused on providing data labeling outsourcing services to companies in the autonomous driving industry.
In 2018, Scale AI clearly stated the company's strategic goal: to "build a reliable, cost-effective, and scalable infrastructure to simplify and accelerate the process of building impressive applications".
This transformation indicates that Scale AI is no longer satisfied with its role as a traditional data labeling service provider, but is committed to developing into an application development platform with data labeling as its core competitiveness.
However, since 2022, driven by the Scaling Law theory, the parameter scale of large models has expanded rapidly and the demand for training data has increased dramatically.
Against this backdrop, Scale AI has actively adjusted its strategic direction, established partnerships with leading companies such as OpenAI, and gradually transformed itself into a professional service provider focused on providing large-scale data annotation.
In this process, Scale AI has built up powerful data labeling and governance capabilities, becoming a bridge connecting third-party large models and customer application scenarios.
Although it does not directly provide large model products, it is good at using customers' private data to adapt and optimize mainstream large models so that they can be applied accurately in specific scenarios.
In addition, Scale AI is actively expanding into the government (G-end) market, and its business is rapidly penetrating government departments.
In particular, the successful cooperation with government agencies such as the U.S. Department of Defense has not only brought considerable economic benefits to the company (such as a single contract of US$250 million in 2022), but also verified the company's application value in the national security and military fields.
Keenly identifying market opportunities and grasping several important turning points in AI
① In its early days, Scale AI recognized the autonomous driving field's need for large-scale, rigorous data labeling.
The advancement of autonomous driving technology is highly dependent on massive amounts of high-precision labeled data, including images of road scenes, pedestrians, and various objects. Automakers are in urgent need of tens of thousands of hours of video data for labeling in order to train and verify their algorithms.
Looking at the entire autonomous driving industry, more than 90% of data labeling work was still dominated by manual operations at that time.
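With its efficient data labeling platform and its innovative model-assisted labeling and data preprocessing techniques, Scale AI effectively accelerated the data processing pipeline and significantly reduced the cost and time of labeling.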
This attracted Waymo, Cruise and other high-profile companies at the time to become its customers, and consolidated its market position in the field of autonomous driving data labeling.
② After achieving initial success in the field of autonomous driving, Scale AI further expanded its business scope to the AIaaS (AI as a service) market.
Starting from a single data labeling service, the company has gradually built a full-chain solution covering data labeling and management, model training and evaluation, and AI application development and deployment.
③ Faced with the problem of data scarcity in some industries, Scale AI is also actively expanding downstream and entering the field of synthetic data generation.
By leveraging existing data resources to create entirely new data sets, the company has effectively assisted the model training process.
In the following years, Scale AI achieved rapid growth in the data field, and its customer base expanded to multiple fields including medical care, national defense, e-commerce, and government services.
Just over two years after the company was founded, its revenue approached US$50 million.
④ Scale AI also accurately captured the opportunity of the rise of generative AI.
As early as the GPT-2 era, the company worked with OpenAI on the first collaborative experiment in reinforcement learning that incorporated human feedback, and subsequently extended these techniques to InstructGPT and related work.
Generative AI models need massive amounts of training data to improve the accuracy and diversity of the content they generate, so the rapid rise of large language models has greatly increased the industry's demand for high-quality labeled data.
Scale AI provides solid data support for the development of generative AI by integrating services such as data annotation and data synthesis.
In addition, the company offers enterprises services for quickly generating customized APIs, reducing the complexity and cost of training their own models.
For generative AI, Scale AI has launched a full set of platform services, including the developer tool platform Scale Spellbook, the synthetic data product Scale Synthetic, and the enterprise-level GenAI platform.
These services aim to ensure that enterprises have sufficient data support in any scenario for model training and optimization.
Ending:
Although the booming AI industry has driven significant growth in Scale AI's sales, it has also intensified competition within the industry.
Against this backdrop, Alexandr Wang has expressed concern about the disadvantages the company may face in attracting and retaining key talent.
It is worth noting that Scale AI's score on workplace evaluation platforms such as Glassdoor (3.5 points) is lower than that of industry peers such as OpenAI (4.3 points) and Figma (4.4 points), which undoubtedly poses a challenge to the company's brand image.
Some references: Chuangyebang: "Working as a miscellaneous worker for an AI company, a Chinese born after 1995 has raised its valuation to $13.8 billion", AI Technology Review: "How Alexander Wang built a data labeling kingdom with 240,000 digital nomads", Jiuhe Venture Capital: "The Revelation of Scale AI", Intelligent Hyperparameters: "Silicon Valley investors talk to the founder of Scale AI: Model competition has entered the third stage, and pure model leasing is not a good business", CITIC Securities Research: "Scale AI: From data labeling to the implementation of AI applications", New Wisdom: "27-year-old Chinese genius boy receives financing again, will data labeling be the next outlet?", AAIA Asia-Pacific Artificial Intelligence Association AIGC: "It took 8 years to push Scale AI's valuation from 0 to 13.8 billion, what will happen in the future?"