AI Basics for Beginners
Last updated: 2024-08-01
█What exactly is AI?
AI is short for Artificial Intelligence.
Many people only half-understand the word "artificial" and assume it is an adjective related to art. In fact, artificial means "man-made", the opposite of natural.
Intelligence is harder to mistake: it means exactly that, "intelligence". (The chipmaker Intel's name is often said to come from the first five letters of this word, though officially it is short for "Integrated Electronics".)
Taken together, AI means "artificial, man-made intelligence": intelligence created through artificial means.
There are many definitions of AI in the industry. A more academic one goes as follows:
AI is a comprehensive science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligent behavior.
This definition is a headache to read. In fact, we can break it down.
First of all, the essential attribute of AI is that it is a science and a technical field.
It draws on knowledge from many disciplines, such as computer science, mathematics, statistics, philosophy, and psychology, but overall it is classified under computer science.
Secondly, the purpose of AI research is to make a "system" intelligent.
This "system" can be a software program, a computer, or even a robot.
Third, what level is considered true intelligence?
This is the crux of the matter. For now, the bar seems to be this: a system that can perceive, understand, think, judge, and make decisions like a human has realized artificial intelligence.
With the help of physical carriers such as robots and robotic arms, AI can also achieve mobility.
Combining the above three points, it is easier to understand the definition of AI.
█What is the difference between AI and an ordinary computer?
AI is still built on computing fundamentals, using semiconductor chip technology (hence it is often called "silicon-based") along with computer systems and platforms.
So, what is the difference between it and traditional computer programs?
A traditional computer program is a collection of rules. Programmers tell the computer the rules through code, and the computer judges and processes the input data based on the rules.
For example, the classic "if...else..." statement:
"If you are over 65, retire. Otherwise, continue to work."
The computer program will then judge and process all input age data based on this rule.
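As a minimal Python sketch of such a rule-driven program (the function name and the 65 threshold are just the example above):

```python
def employment_status(age: int) -> str:
    # The rule is hand-written by a programmer, not learned from data.
    if age > 65:
        return "retire"
    else:
        return "continue to work"

print(employment_status(70))  # -> retire
print(employment_status(40))  # -> continue to work
```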
However, in real life, many factors (such as images and sounds) are extremely complex and diverse, and it is difficult for us to give fixed rules for computers to make high-accuracy judgments and processing.
For example, determining whether an image shows a dog.
There are many breeds of dogs, each with different colors, body shapes, and facial features. Dogs also have different expressions and postures at different times. Dogs are also in different backgrounds.
Therefore, the dog images a computer captures through its camera are endlessly varied, and it is hard to help the computer judge them with a limited set of rules.
If we want computers to achieve human-like intelligence, we cannot use simple rule-driven methods. Instead, we should teach them like we teach children, continuously inputting data and answers, allowing them to summarize features on their own and form their own judgment rules.
In other words, in classical programming, people input rules (i.e. programs) and data, and the system outputs answers.
The AI calculation process is divided into two steps:
In the first step, the input is data and the expected answer, and the system outputs rules.
The second step is to apply the output rules to new data and then output the answer.
The first step is what we call "training". The second step is the real "work".
This is a typical difference between traditional computer programs and today's mainstream AI. (Note that I say "current mainstream AI"; some "historical AI" and "non-mainstream AI" work differently and cannot be lumped together.)
█What are the categories of AI?
As mentioned earlier, artificial intelligence is a very large scientific field.
Since its official birth in the 1950s, many scientists have conducted extensive research on artificial intelligence and produced many remarkable results.
These studies are divided into many schools according to their different directions of thinking. The more representative ones are symbolism, connectionism, and behaviorism.
There is no right or wrong in these schools, and there is some cross-integration between them.
In the early days (roughly the 1960s to the 1990s), symbolism (represented by expert systems and knowledge graphs) was the mainstream. Later, starting in the 1980s, connectionism (represented by neural networks) rose and has remained mainstream to this day.
In the future, perhaps new technologies will emerge and new schools of thought will be formed.
In addition to direction and route, we can also classify AI based on aspects such as intelligence level and application areas.
According to the level of intelligence, AI can be divided into: Weak AI, Strong AI, and Super AI.
Weak AI is specialized in a single task or a group of related tasks and lacks general intelligence capabilities. We are currently at this stage.
Strong AI is more powerful, with a certain level of general intelligence, able to understand, learn and apply to a variety of tasks. This is still in the theoretical and research stage and has not yet been implemented.
Super AI is, of course, the strongest: it would surpass human intelligence in almost every respect, including creativity and social skills. Super AI is the hypothesized ultimate form, assuming it can ever be achieved.
We will talk about the classification of AI by application field later.
█What is machine learning?
In fact, when we described how AI summarizes rules from data earlier, we were already talking about machine learning.
The core idea of machine learning is to build a model that can learn from data and use this model to make predictions or decisions.
Machine learning is not a specific model or algorithm. It includes many types, such as:
Supervised learning: the algorithm learns from a labeled dataset, i.e., each training example comes with a known outcome.
Unsupervised learning: the algorithm learns from a dataset that has no labels.
Semi-supervised learning: combines a small amount of labeled data with a large amount of unlabeled data for training.
Reinforcement learning: learns through trial and error which behaviors lead to rewards and which lead to penalties.
█What is deep learning?
Deep learning, to be precise, is learning with deep neural networks.
Deep learning is an important branch of machine learning. There is a "neural network" route under machine learning, and deep learning is an enhanced version of "neural network" learning.
Neural networks are the representative of connectionism. As the name suggests, this approach is to imitate the working principle of the human brain, establish a connection model between neurons, and thus realize artificial neural computing.
The "depth" in deep learning refers to the number of "hidden layers" in the neural network.
Classic machine learning algorithms use neural networks with an input layer, one or two "hidden layers," and an output layer.
Deep learning algorithms use many more "hidden layers" (up to hundreds). This makes them more powerful, allowing neural networks to take on more difficult tasks.
In short, the relationship is nested: deep learning is a branch of neural-network methods, which in turn are a branch of machine learning.
█What are convolutional neural networks and recurrent neural networks?
Since the rise of neural networks in the 1980s, many models and algorithms have been formed. Different models and algorithms have their own characteristics and functions.
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are relatively well-known neural network models, both born around the 1990s.
Their specific working principles are fairly involved. For now, just remember:
A convolutional neural network (CNN) is a type of neural network for processing data with a grid-like structure, such as images and videos. It is therefore often used in computer vision, for tasks such as image recognition and image classification.
A recurrent neural network (RNN) is a type of neural network for processing sequence data, as in language models and time-series prediction. It is therefore often used in natural language processing and speech recognition.
█What is a Transformer?
The Transformer is also a neural network model. It is younger than convolutional and recurrent neural networks (it was proposed by a Google research team in 2017) and more powerful.
As a non-professional, you don't need to study how it works; you just need to know:
1. It is a deep learning model;
2. It uses a mechanism called self-attention (see the sketch after this list);
3. It effectively overcomes the bottleneck (limitation) problems of convolutional and recurrent neural networks;
4. It is very well suited to natural language processing (NLP) tasks. Compared with recurrent neural networks, its computations can be highly parallelized, which simplifies the model architecture and greatly improves training efficiency;
5. It has also been extended to other fields such as computer vision and speech recognition;
6. The large models we often mention now are almost all based on Transformers.
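For the curious, point 2 can be sketched in a few lines. This is a bare-bones version of scaled dot-product self-attention; real Transformers add learned projections, multiple heads, and masking on top:

```python
import math
import torch

def self_attention(x: torch.Tensor) -> torch.Tensor:
    # x: (sequence_length, dim). Queries, keys, and values all come
    # from the same input -- hence "self"-attention.
    scores = x @ x.T / math.sqrt(x.shape[-1])  # every token scored against every other
    weights = torch.softmax(scores, dim=-1)    # normalized attention weights
    return weights @ x                         # weighted mix over all positions, computed in parallel

tokens = torch.randn(5, 64)                    # 5 tokens, 64-dim embeddings
print(self_attention(tokens).shape)            # -> torch.Size([5, 64])
```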
There are many more kinds of neural networks besides these.
█What is a large model?
AI has become popular in the past two years because of the rise of large models. So, what exactly are large models?
A large model is a machine learning model with a large parameter scale and a complex computational structure.
Parameters are variables that are learned and adjusted during training. They define the model's behavior and performance, and they drive its implementation cost and computing-resource requirements. In simple terms, parameters are the parts of the model used to make predictions or decisions.
Large models usually have millions to billions of parameters. Correspondingly, small models have fewer parameters. Small models are also sufficient for some niche fields or scenarios.
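To make "parameters" concrete, here is a tiny PyTorch sketch (again, my choice of library) counting the learnable weights of a single layer:

```python
import torch.nn as nn

layer = nn.Linear(1000, 1000)  # one fully connected layer
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # -> 1001000: 1000*1000 weights plus 1000 biases
```

A large language model is, in essence, a great many such layers stacked together, which is how the count climbs into the billions.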
Large models need to rely on large amounts of data for training, which consumes a lot of computing resources.
There are many types of large models.
The ones we usually refer to are large language models (trained on text data).
But there are also large vision models (trained on image data) and multimodal large models (trained on both text and images).
The core structure of most large models is the Transformer and its variants.
According to application field, large models can be divided into general-purpose large models and industry large models.
A general-purpose model's training dataset is broader, covering a more comprehensive range of fields. As the name suggests, an industry model's training data comes from a specific industry, and the model is applied to specialized fields (such as finance, medicine, law, and industry).
█What is the essence of GPT?
GPT-1, GPT-2 ... GPT-4o are all large language models launched by the American company OpenAI, and all are based on the Transformer architecture.
The full name of GPT is Generative Pre-trained Transformer.
Generative means that the model can generate continuous and logical text content, such as completing conversations, creating stories, writing code, or writing poems and songs.
A quick aside: the AIGC that is often mentioned now stands for AI-Generated Content, which can be text, images, audio, video, and so on.
The GPT series is text-oriented; Google has also launched a competing model, BERT.
Text can generate images, too. Representative text-to-image models include DALL·E (also from OpenAI), Midjourney (well known), and Stable Diffusion (open source).
Text-to-audio (music) models include Suno, Stable Audio Open (open-sourced by Stability AI), and Audiobox (Meta).
Videos can likewise be generated from text, as with Sora (OpenAI), Stable Video Diffusion (open-sourced by Stability AI), and Soya (open source). Images can also generate videos, as with Tencent's Follow-Your-Click.
AIGC is a definition at the "application dimension", not a specific technology or model. Its emergence has expanded AI's capabilities, broken AI's earlier functional confinement to recognition tasks, and broadened its application scenarios.
Okay, let's continue with the second letter of GPT: P for Pre-trained.
Pre-trained means the model is first trained on a large-scale unlabeled text corpus to learn the statistical regularities and latent structures of language.
Through pre-training, the model has a certain degree of versatility. The larger the training data (such as web page text, news, etc.), the stronger the model's capabilities.
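As a hedged illustration, the open GPT-2 model (a small, early member of the GPT family) can be run locally through the Hugging Face transformers library; this is not OpenAI's API, just a way to watch a pre-trained generative model continue a prompt:

```python
from transformers import pipeline

# Downloads the pre-trained GPT-2 weights on first run.
generator = pipeline("text-generation", model="gpt2")
result = generator("Artificial intelligence is", max_new_tokens=20)
print(result[0]["generated_text"])
```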
The surge of attention to AI mainly stems from the popularity of ChatGPT in early 2023.
The "chat" in ChatGPT means chatting.
ChatGPT is an AI conversation service developed by OpenAI based on the GPT model (it can also be understood as GPT-3.5).
Through this service, people can experience the power of the GPT model firsthand, which is conducive to the publicity and promotion of the technology.
It turns out that OpenAI's strategy was successful. ChatGPT fully attracted public attention and successfully promoted the development boom in the field of AI.
█What can AI do?
The role of AI is extremely extensive.
In general, compared with traditional computer systems, AI provides expanded capabilities, including image recognition, speech recognition, natural language processing, embodied intelligence, and more.
Image recognition, sometimes also classified as computer vision (CV), enables computers to understand and process images and videos. Common examples include cameras, industrial quality inspection, and face recognition.
Speech recognition means understanding and processing audio to extract the information it carries. Common applications include mobile phone voice assistants, telephone call centers, and voice-controlled smart homes, mostly in interactive scenarios.
Natural language processing, as mentioned earlier, is about enabling computers to understand and process natural language and know what we are saying. This is very popular and is mostly used in creative work, such as writing press releases, writing written materials, video production, game development, music creation, etc.
Embodied intelligence means placing artificial intelligence in a physical form (a "body") so that it gains and demonstrates intelligence through interaction with the environment.
Robots equipped with AI belong to embodied intelligence.
The "Mobile ALOHA" launched by Stanford University at the beginning of the year
is a typical
household embodied robot. It can cook, make coffee and even play with cats, and it has become popular on the Internet.
It is worth mentioning that not all robots are humanoid robots, and not all robots use AI.
AI is particularly good at processing massive amounts of data. On the one hand, it learns and trains on massive data; on the other hand, it applies what it has learned to new massive data to complete tasks that cannot be done manually. In other words, it finds the latent patterns hidden in massive data.
At present, the application of AI in various vertical industries in society is mainly based on the extension of the above capabilities.
Let’s take some common examples.
In the medical field, AI can already be used to analyze X-rays, CT scans, MRI images, and so on, helping to identify abnormal areas and even make diagnostic judgments.
AI can also be used to identify cell mutations in tissue sections and assist pathologists in cancer screening and diagnosis of other diseases.
AI can also analyze the patient's genomic data to determine the most suitable treatment plan. AI can also help predict the trend of the disease based on the patient's medical history and physiological indicators.
In drug research and development, AI can help simulate the interactions of chemical components and shorten the new drug development cycle.
When a serious public health incident occurs, AI can analyze epidemic data and predict trends in disease spread.
In the financial field, AI can monitor market trends in real time, identify potential market risks, and formulate corresponding risk-hedging strategies.
AI can also assess credit risk by analyzing borrowers' credit records, income, consumption behavior, and other multi-dimensional data. And of course, AI can recommend the most suitable investment portfolio based on an investor's personal financial situation, risk preferences, and return goals.
There are countless similar examples. In almost all fields, including industrial manufacturing, education, culture and tourism, commercial retail, agriculture, forestry, animal husbandry and fishery, public safety, and government governance, AI has already had actual landing scenarios and cases.
AI is changing society and changing the work and life of each and every one of us.
█How should we view AI?
The commercial and social value of AI is unquestionable, and its rise is unstoppable.
From the perspective of enterprises, AI can automate repetitive and tedious tasks, improving production efficiency and quality while reducing production and labor costs.
This advantage is crucial for the manufacturing and service industries, directly affecting the competitiveness and even survival of enterprises.
From the government's perspective, AI can not only improve governance efficiency, but also bring new business models, products and services, and stimulate the economy.
Powerful AI is also a form of national competitiveness. In the science and technology game and national defense, if AI technology is not as good as others, it may bring serious consequences.
From a personal perspective, AI can help us complete some tasks and improve our quality of life.
From the perspective of all humanity, AI can also play an important role in disease treatment, disaster prediction, climate forecasting, and poverty eradication.
But everything has two sides. As a tool, AI has both advantages and disadvantages.
The most realistic disadvantage is that it may threaten a large number of human jobs and lead to mass unemployment.
According to McKinsey's research, between 2030 and 2060, about 50% of occupations may be gradually replaced by AI, especially for knowledge workers.
In addition, AI can be used to wage wars, commit fraud (imitating voices or swapping faces to deceive), and infringe on citizens' rights (excessive collection of personal information, invasion of privacy).
If only a few companies have advanced AI technology, it may exacerbate social inequality. AI algorithmic bias may also lead to unfairness.
As AI becomes more and more powerful, people will become dependent on it and lose their ability to think independently and solve problems. AI's powerful creativity may cause humans to lose their motivation and confidence to create.
There are also a series of issues surrounding the development of AI, such as security (data leakage, system crash) and ethics.
We currently have no reliable solutions to all these problems. Therefore, we can only explore, think and solve them bit by bit in the process of developing AI. We must be vigilant and cautious about AI.
As ordinary people, the most realistic approach is to understand and learn it first. First, learn to use common AI tools and platforms to help improve work efficiency and improve the quality of life.
There is a saying: "In the future, it is not AI that will eliminate you, but the people who master AI." Instead of being anxious, it is better to face AI bravely, embrace it actively, and seize the initiative as early as possible.
Well, that's all for today's article. For an ordinary person, knowing these AI common sense is the first step to embrace AI. At least when you talk to others about AI, you won't be confused.
Thank you for your patience in reading, see you next time!