The economics behind ChatGPT

Published 2023-02-16 · Source: OneFlow

Can ChatGPT replace traditional search engines like Google and Baidu? Why can't China produce its own ChatGPT in short order? Most current discussion of these questions is limited to the technical feasibility of large language models (LLMs), while the economic costs of achieving these goals are ignored or estimated only very roughly, leading to judgments about the development and application of LLMs that are detached from reality.

Starting from an economics perspective, the author of this article derives in detail the cost of ChatGPT-style search and of training GPT-3, and develops a general framework for projecting LLM cost trajectories, offering a valuable vantage point for exploring the LLM cost structure and its future development.

Key points at a glance:

LLM-driven search is already economically viable: a rough estimate puts the cost of high-performance LLM-driven search at about 15% of currently estimated advertising revenue per query, based on the existing search cost structure.

But economically feasible does not mean economically justified: the unit economics of LLM-driven search are profitable, but for an existing search engine with over $100 billion in search revenue, adding this functionality could mean over $10 billion in additional costs.

Other emerging LLM-driven businesses are highly profitable: for example, Jasper uses LLMs to generate marketing copy and likely earns gross margins similar to those of a SaaS service (over 75%).

Training an LLM (even from scratch) isn’t expensive for large companies: today it only costs about $1.4 million to train GPT-3 in a public cloud, and even a state-of-the-art model like PaLM only costs about $11.2 million.

The cost of LLMs is likely to drop significantly: in the two and a half years since GPT-3 was released, the cost of training and inference for models with comparable performance to GPT-3 has dropped by about 80%.

Data is the new bottleneck for LLM performance: increasing the number of model parameters has increasingly smaller marginal benefits compared to increasing the size of high-quality training datasets.

1 Motivation

The impressive performance of LLMs has triggered widespread speculation, mainly about the emerging business models LLMs may enable and their impact on existing business models.

Search is an interesting opportunity: Google alone generated over $100 billion in search-related advertising revenue in 2021 [1]. The virality of ChatGPT (a chatbot that uses an LLM to generate high-quality answers to search-like queries) has prompted much speculation about the potential impact on search, including the economic viability of LLMs today:

Someone claiming to be a Google employee said on Hacker News that implementing LLM-powered search would require its cost to fall by a factor of 10 [2].

Meanwhile, Microsoft is expected to launch an LLM version of Bing in March [3], and search startups such as You.com have already embedded the technology into their products [4].

Recently, The New York Times reported that Google will launch a search engine with chatbot functionality this year[5].

The broader question is: How economically feasible is it to incorporate LLM into current and new products? In this article, we examine the cost structure of LLM today and analyze its likely future development.

2 Review of LLM

Although the subsequent sections are more technical, this article assumes no prior familiarity, so even readers new to the area can follow along with confidence. To illustrate what makes LLMs distinctive, a brief review is given below.

A language model predicts the likely output tokens given a context:

Figure: Input context and output of an autoregressive language model (in practice, tokens are often subwords: e.g., "happy" might be broken into two tokens such as "hap" and "-py")

To generate text, the language model repeatedly samples new tokens from this output distribution. For example, in a service like ChatGPT, the model starts from an initial prompt that includes the user's query as context and generates tokens to build the response. After each new token is generated, it is appended to the context window to prompt the next iteration.
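As a concrete illustration, here is a minimal Python sketch of that sampling loop. The `model` callable (returning a token-to-probability mapping) is a hypothetical stand-in, not any specific API:

```python
import random

def sample_next_token(model, context):
    """Ask the (hypothetical) model for a probability distribution over
    the vocabulary, then sample one token from it."""
    probs = model(context)  # e.g. {"hap": 0.6, "-py": 0.2, ...}
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

def generate(model, prompt_tokens, max_new_tokens=100, stop_token="<eos>"):
    """Repeatedly sample a token and append it to the context, as
    described above, until a stop token or the length limit."""
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        token = sample_next_token(model, context)
        if token == stop_token:
            break
        context.append(token)  # the new token becomes part of the next context
    return context[len(prompt_tokens):]
```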

Language models have been around for decades. Today's LLM performance is driven by efficient deep neural networks with billions of parameters. Parameters are the matrix weights used in training and prediction, and the number of floating-point operations (FLOPs) required is usually proportional to the parameter count. These operations are computed on processors optimized for matrix operations, such as TPUs and other special-purpose accelerators.
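To make the proportionality concrete, a common back-of-envelope rule from the scaling-law literature (an approximation, not a figure from this article) is roughly 2 FLOPs per parameter per generated token for a forward pass:

```python
def inference_flops(n_params: float, n_tokens: int) -> float:
    """Rough forward-pass cost: ~2 FLOPs per parameter per token."""
    return 2 * n_params * n_tokens

# A GPT-3-sized model (175B parameters) generating a 400-word
# (~533-token) answer costs on the order of 1.9e14 FLOPs:
print(f"{inference_flops(175e9, 533):.2e}")
```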

As the number of LLM parameters grows exponentially, these operations require ever more computational resources, which is the underlying driver of rising LLM costs.

3 Cost of LLM-driven search

In this section, we estimate the cost of running an LLM-driven search engine. How such a search engine should be implemented is still an active area of research; we consider two main approaches to bound the cost of providing such a service:

ChatGPT Equivalent: an LLM trained on a large dataset that stores the knowledge learned during training in its model parameters. During inference (using the model to generate output), the LLM has no access to external knowledge [6].

This approach has two major disadvantages:

It readily "hallucinates" facts.

Model knowledge is stale: it only includes information available before the final training date.

2-Stage Search Summarizer: An architecturally similar LLM that can access traditional search engines such as Google or Bing at inference time. In the first stage of this approach, we run the query through the search engine to retrieve the top K results. In the second stage, each result is run through the LLM to generate K responses, and the model returns the highest-scoring response to the user [7].

Compared with ChatGPT Equivalent, the advantages of this method are:

Ability to cite sources from retrieved search results.

Can access up-to-date information.

However, for an LLM with the same number of parameters, this approach carries a higher computational cost. It also adds to the search engine's existing costs, since the LLM runs on top of the results of the existing search engine.
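A minimal sketch of the 2-Stage Search Summarizer flow, with `search`, `llm_generate`, and `llm_score` as hypothetical stand-ins for a traditional search backend and an LLM (not a reference implementation):

```python
def two_stage_search_summarizer(query, search, llm_generate, llm_score, k=10):
    """Stage 1: retrieve the top-K results from a traditional search
    engine. Stage 2: generate one LLM response per result and return
    the highest-scoring one."""
    results = search(query)[:k]  # stage 1: traditional search
    candidates = []
    for result in results:
        # The prompt includes both the query and a retrieved result,
        # which is why it is much longer than in the ChatGPT Equivalent.
        prompt = f"Query: {query}\nSource: {result}\nAnswer, citing the source:"
        response = llm_generate(prompt)          # stage 2: one response per result
        candidates.append((llm_score(query, response), response))
    return max(candidates, key=lambda c: c[0])[1]  # best-scoring response
```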

First-order approximation: base model APIs

The most direct way to estimate the cost is to reference the prices of existing base model APIs on the market. The pricing of these services includes a premium over cost, which is the supplier's source of profit. A representative service is OpenAI, which offers LLM-based text generation.

OpenAI’s Davinci API is powered by a 175 billion parameter version of GPT-3, the same number of parameters as the GPT-3.5 model that powers ChatGPT [8]. The price of inference with this model is currently about $0.02 per 750 words ($0.02 per 1,000 tokens, where 1,000 tokens is approximately equal to 750 words); the total number of words used to calculate pricing includes both inputs and outputs [9].

Figure: Base model API pricing by model capability (OpenAI)

We make some simple assumptions here to estimate the fees that would be paid to OpenAI for its search service:

In the ChatGPT Equivalent implementation, we assume the service generates an average 400-word response to a 50-word prompt. To produce higher-quality results, we also assume the model samples 5 responses per query and selects the best one. Therefore:

Words per query = 50 + 5 × 400 = 2,050, giving an API price of about 2,050 / 750 × $0.02 ≈ $0.055 per query.

In the 2-Stage Search Summarizer implementation, the response generation process is similar. However:

The prompt is significantly longer, since it includes relevant portions of both the query and the search results.

A separate LLM response is generated for each of the K search results.

Assuming K = 10 and an average of 1,000 words per relevant search-result section:

Words per query = 50 + 10 × (1,000 + 400) = 14,050, giving an API price of about 14,050 / 750 × $0.02 ≈ $0.375 per query.

Assuming an optimized cache hit rate of 30% (a lower bound based on Google's historical search cache hit rate [10]) and a 75% gross margin on OpenAI's cloud service (consistent with typical SaaS services), our first-order estimate implies:

ChatGPT Equivalent: $0.055 × (1 − 0.30) × (1 − 0.75) ≈ $0.010 per query
2-Stage Search Summarizer: $0.375 × (1 − 0.30) × (1 − 0.75) ≈ $0.066 per query

At $0.010 per query, the order-of-magnitude estimate for the ChatGPT Equivalent service is consistent with public comments:

Figure: OpenAI CEO Sam Altman on the cost per chat of ChatGPT (Twitter)

Given the shortcomings of ChatGPT Equivalent noted above (hallucinated facts, stale model knowledge), an LLM-driven search engine in actual operation is more likely to deploy the 2-Stage Search Summarizer variant.

In 2012, Google's head of search said the engine processed 100 billion searches per month [11]. World Bank data show that global Internet penetration rose from 34% in 2012 to 60% in 2020 [12]. Assuming search volume grew proportionally, annual search volume is now roughly 2.1 trillion queries against search-related revenue of approximately $100 billion [13], for an average revenue of about $0.048 per search.

In other words, the 2-Stage Search Summarizer costs $0.066 per query, about 1.4 times the revenue of $0.048 per query.
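The first-order model above can be reproduced in a few lines of Python. The word counts, cache hit rate, and margin are the assumptions stated earlier; the only external input is OpenAI's published price of $0.02 per 1,000 tokens (roughly 750 words):

```python
PRICE_PER_WORD = 0.02 / 750   # Davinci API: $0.02 per 1,000 tokens ≈ 750 words
CACHE_HIT_RATE = 0.30         # assumed share of queries answered from cache
COST_SHARE = 1 - 0.75         # strip out OpenAI's assumed 75% gross margin

def cost_per_query(total_words: float) -> float:
    """Estimated underlying compute cost per query."""
    return total_words * PRICE_PER_WORD * (1 - CACHE_HIT_RATE) * COST_SHARE

chatgpt_equivalent = cost_per_query(50 + 5 * 400)              # ≈ $0.010
two_stage_summarizer = cost_per_query(50 + 10 * (1000 + 400))  # ≈ $0.066

# Revenue side: 100B searches/month in 2012, scaled by Internet
# penetration growth (34% -> 60%), against ~$100B annual revenue.
searches_per_year = 100e9 * 12 * (60 / 34)      # ≈ 2.1 trillion
revenue_per_query = 100e9 / searches_per_year   # ≈ $0.048

print(f"{chatgpt_equivalent:.3f} {two_stage_summarizer:.3f} {revenue_per_query:.3f}")
# -> 0.010 0.066 0.047
```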

The estimated cost can be reduced to about 1/4 of the original through the following optimizations:
1. Quantization (using lower-precision data types)
2. Knowledge distillation (training a smaller model to mimic a larger one)
3. Training a smaller "compute-optimal" model with the same performance (discussed in more detail later)

Assuming cloud computing carries a gross margin of about 50%, running one's own (in-house) infrastructure rather than relying on a cloud provider would halve costs again.

Combining the above improvements for a total reduction to 1/8 of the original cost, incorporating a high-performance LLM into search would account for about 15% of current query revenue (excluding existing infrastructure costs). (That is, $0.066 per query × 1/4 × 1/2 ≈ $0.008, or roughly 15% of the $0.048 revenue per query.)
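As a quick check on that arithmetic (both reduction factors are themselves estimates, so this is indicative only):

```python
optimized_cost = 0.066 * (1 / 4) * (1 / 2)   # model optimizations x in-house infra
share_of_revenue = optimized_cost / 0.048    # vs. $0.048 revenue per query
print(f"${optimized_cost:.4f} per query, {share_of_revenue:.0%} of revenue")
# -> $0.0083 per query, 17% of revenue (of the same order as the ~15% cited above)
```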
