How are programming languages implemented?

Last updated: 2024-11-07


Hello everyone, I’m Liang Xu. As programmers, we are often asked how this or that is implemented, but do you know how the programming languages we use every day are themselves implemented?

That is what we will talk about today.

Clever humans discovered that simple switches can be combined to express complex Boolean logic, and they built the CPU on that foundation. As a result, the CPU understands nothing but switches being off or on, which we write as the numbers 0 and 1.
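To make "combining simple switches" a little more concrete, here is a minimal sketch in Python. It assumes we model one gate, NAND, as a function on bits, and then builds everything else from it, up to an adder. The function names and the 8-bit width are choices made for this illustration only, not how any particular CPU is wired.

```python
# A sketch only: each gate is modeled as a function on bits (0 or 1).
def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1

# Every other gate below is built purely out of NAND.
def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def xor(a: int, b: int) -> int:
    return and_(or_(a, b), nand(a, b))

def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits; return (sum_bit, carry_bit)."""
    return xor(a, b), and_(a, b)

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add two bits plus an incoming carry; return (sum_bit, carry_out)."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, or_(c1, c2)

def add_bits(x: int, y: int, width: int = 8) -> int:
    """Ripple-carry addition of two non-negative integers, one bit at a time.

    Only the lowest `width` bits of the result are kept, like a fixed-size register.
    """
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

print(add_bits(5, 3))
```

Nothing in `add_bits` uses Python's `+` on the operands themselves; the sum comes entirely out of chained gates, which is the point of the paragraph above.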

Genesis: The Clever Fool

The CPU is quite primitive. Like a single-celled organism, it can only move data from one place to another and perform simple operations on it, such as addition; it has no fancy moves. These operations look simple, even stupid, but the CPU has one unbeatable advantage that can be summed up in a single word: fast. Humans cannot come close. With the arrival of the CPU, humans gained a second brain.

This is how a primitive species began to dominate another species called programmers.

Whoever does the work calls the shots

Generally speaking, if two different species want to communicate, say humans and birds, there are two ways: either the birds speak human language so that humans can understand, or humans speak bird language so that birds can understand. Which one happens depends on who holds the power.

At first, the CPU won. Programmers spoke bird language and carefully catered to the CPU so that it would do its work. Here is how programmers spoke bird language in the beginning:

Programmers wrote instructions directly in 0s and 1s, following the CPU's instruction encoding. You read that right: those raw 0s and 1s were the code. It was that primitive. The code was punched onto paper tape and fed into the computer, and the CPU started working. In those days a program really could be seen and touched, although it was a bit of a waste of paper.

Back then, programmers had to write code from the CPU's point of view, and it looked like this:

1101101010011010100100110010100111001000110111101011101101010010

At first glance, do you know what this means? You don’t, and you think, “What the hell is this?” But the CPU knows, and thinks, “This is simply the most beautiful language in the world.”

A great mission from heaven

One day, programmers finally got fed up with speaking bird language. They are primates, after all, and chirping away like birds is embarrassing. So you are entrusted with an important task: let programmers speak human language.

Instead of brute-forcing it, you study the CPU carefully and find that its instruction set contains only a few kinds of instructions, such as addition instructions and jump instructions. So you build a simple mapping from each machine instruction to a word that humans can read, and a string of 0s and 1s like the one above becomes something like:

sub $8, %rsp
mov $.LC0, %edi
call puts
mov $0, %eax

In this way, programmers no longer have to memorize 1011... by rote; they only need to remember words that humans can recognize, such as ADD, SUB, MUL, and DIV.
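The mapping idea can be sketched in a few lines of Python. Everything specific here is made up for illustration: the 4-bit opcodes, the 4-bit operands, and the `assemble` and `disassemble` helpers are hypothetical and do not correspond to any real CPU's encoding.

```python
# Hypothetical 4-bit opcodes; real instruction sets are encoded very differently.
OPCODES = {"ADD": 0b0001, "SUB": 0b0010, "MUL": 0b0011, "DIV": 0b0100}

def assemble(line: str) -> str:
    """Translate a line such as 'ADD 3 5' into a string of bits.

    Assumes every operand is a small integer in the range 0..15 (4 bits each).
    """
    mnemonic, *operands = line.split()
    bits = format(OPCODES[mnemonic], "04b")
    for operand in operands:
        bits += format(int(operand), "04b")
    return bits

def disassemble(bits: str) -> str:
    """Reverse mapping: turn the bit string back into readable words."""
    names = {code: name for name, code in OPCODES.items()}
    opcode = int(bits[:4], 2)
    operands = [str(int(bits[i:i + 4], 2)) for i in range(4, len(bits), 4)]
    return " ".join([names[opcode], *operands])

print(assemble("ADD 3 5"))          # the bird-language form
print(disassemble("000100110101"))  # back to the human-readable form
```

The table lookup is all there is to it: the programmer writes the word, and a small program produces the bits.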

Assembly language was born in this way, and for the first time, something that humans could understand appeared in programming languages.

Programmers finally no longer have to "chirp..."; they have upgraded to "ababa aba...". Humans can recognize the words in "ababa aba", but in form it is still a long way from human language.

Details vs. Abstraction

Although assembly language uses words that humans can read, it is still a low-level language, just like machine language.

A low-level language is one in which you have to take care of every detail yourself.

Which details? As we said, the CPU is a very primitive thing. It only knows how to move data from one place to another, perform a simple operation on it, and move the result somewhere else.

So if you program in a low-level language, you have to solve complex problems such as sorting by stringing together many simple instructions of the form "move data from one place to another, do a simple operation, then move the result somewhere else".

If this does not quite hit home yet, here is an analogy. Suppose all you want to say is "go get me a glass of water":

If you use a low-level language like assembly, you have to implement it like this:

I think you've got it.
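The same contrast can be sketched in code. Below is a toy machine in Python that only knows how to load a value from memory into a register, add one register to another, and store a register back to memory. The instruction names (`LOAD`, `ADD`, `STORE`), the registers `r0` and `r1`, and the `run` function are all invented for this example. Adding three numbers takes six instructions, while the abstract version is a single call to `sum`.

```python
# Toy machine: instruction names and registers are hypothetical, for illustration only.
def run(program, memory):
    """Execute a list of (op, a, b) instructions and return the resulting memory."""
    memory = list(memory)          # work on a copy so the caller's list is untouched
    regs = {"r0": 0, "r1": 0}
    for op, a, b in program:
        if op == "LOAD":           # LOAD reg, addr: memory -> register
            regs[a] = memory[b]
        elif op == "ADD":          # ADD dst, src: dst = dst + src
            regs[a] = regs[a] + regs[b]
        elif op == "STORE":        # STORE reg, addr: register -> memory
            memory[b] = regs[a]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory

# Low-level style: spell out every move and every addition.
program = [
    ("LOAD", "r0", 0),
    ("LOAD", "r1", 1),
    ("ADD", "r0", "r1"),
    ("LOAD", "r1", 2),
    ("ADD", "r0", "r1"),
    ("STORE", "r0", 3),
]

memory = [10, 20, 30, 0]
print(run(program, memory)[3])  # result written to address 3

# High-level style: say what you want and let someone else handle the details.
print(sum(memory[:3]))
```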

Bridging the gap

The CPU really is too simple, so simple that it cannot understand anything even slightly abstract, such as "bring me a glass of water". Humans, however, naturally think in abstractions. Is there any way to bridge the gap between human and machine?

In other words, is there a way to automatically convert abstract human expressions into concrete steps that the CPU can understand? That would obviously be a huge boost to programmer productivity. And now this problem is yours to solve.

Patterns, all patterns

After much thought, you still don’t know how to automatically convert human abstractions into concrete implementations that the CPU can understand. Just when you are about to give up, you take another look at the details that the CPU can understand:

In a flash of inspiration, you discover a lot of patterns.

Most of the time the instructions executed by the CPU are straightforward, like this:

These are all instructions that tell the CPU to carry out one specific action. These straightforward instructions need a name; let's call them statements.

You also find another pattern: deciding which instructions to execute based on some state. To humans, this pattern reads as "if... then..., else...":

if ***
  blablabla
else ***
  blablabla

In some cases, you need to repeat some instructions over and over again, and this pattern looks like going around in circles:

while ***
  blablabla

Finally, there are a lot of instructions that look similar, like this:

These instructions repeat, differing only in a few details. Extract those differences, package the remaining instructions together, and refer to the whole package by a single name. This also needs a name; let's call it a function:

func abc:
  blablabla

Now you have discovered all the patterns:

// conditional branch
if ***
  blablabla
else ***
  blablabla

// loop
while ***
  blablabla

// function
func abc:
  blablabla

These are a qualitative leap compared to assembly language because they are very close to human language.

You then find yourself facing two problems:

What is the blablabla here?
How do you convert these human-readable strings into machine instructions that the CPU can understand?

Inception

Recall that most code consists of straightforward statements. So is the blablabla here just a bunch of statements?

Obviously not. Blablabla can be a plain statement, but it can also be an if-else conditional branch, a while loop, or a function call. That seems reasonable.

While this makes sense, you soon discover another serious problem:

blablabla can contain if else statements, and if else statements can contain blablabla, which can contain if else statements, which can contain blablabla, which can again contain if else statements...

Just like Inception, there is a dream within a dream within a dream... one layer inside another, on and on without end...

At this point you clearly feel that you don’t have enough brain cells. This is too complicated. Despair begins to consume you. God and heaven, someone come and save me!

Just then, your high school math teacher walks over, pats you on the shoulder, and hands you a high school math textbook. You get angry and ask, "Why are you giving me this useless thing? The problem I'm thinking about is far too profound to be solved by a lousy high school math textbook." You grab it and throw it on the ground.

A gust of wind blows, and the textbook falls open to a page with a sequence defined like this:

f(x) = f(x-1) + f(x-2)

What does this recursive formula express? The value of f(x) depends on f(x-1) and f(x-2), and f(x-1) in turn depends on f(x-2) and f(x-3), which in turn depend on...

One layer is nested within another, a dream within a dream: an if can be nested within a statement, and a statement can be nested within an if...
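Written as code, the formula looks almost the same as it does on the page. Note that the textbook page only gives the recurrence, so the base cases here, f(0) = 0 and f(1) = 1, are an assumption added so that the recursion has somewhere to stop.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # remember earlier results so each f(x) is computed once
def f(x: int) -> int:
    """f(x) = f(x-1) + f(x-2), assuming x >= 0, f(0) = 0 and f(1) = 1."""
    if x < 2:
        return x                  # assumed base cases: f(0) = 0, f(1) = 1
    return f(x - 1) + f(x - 2)    # the function is defined in terms of itself

print(f(10))
```

The definition refers to itself, yet it still produces a definite answer because every chain of calls eventually reaches a base case.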

Wait a minute, isn’t this recursion? The seemingly endless nesting above can also be expressed recursively!

Your math teacher laughs out loud, says "too young, too simple", and walks away. Something that seemed so high-tech can be handled with high school math. You are stunned, and more than a little ashamed.

With the help of recursion, your intelligence takes the high ground again.

Recursion: The Essence of Code

Isn't it just nesting, one layer within another? Recursion was born to express exactly this kind of thing (note: the rules here are incomplete; a real programming language is not this simple):

if : if bool statement else statement
for: while bool statement
statement: if | for | statement

It turns out that Inception-style nesting, layer upon layer, can be expressed in a few concise rules. You give these rules a fancy name: grammar.

Mathematics can make everything so elegant.

All the code in the world, no matter how complex, ultimately comes down to grammar. The reason is simple: all code is written according to a grammar.

At this point, you have invented a real programming language that can be understood by humans.

The first problem mentioned above has been solved, but language alone is not enough.

Making computers understand recursion

Now there is still one problem left: how can we finally convert this language into machine instructions that the CPU can understand?

Humans can write code according to the grammar, but that code is really just a string of characters. How can we make a computer recognize a string of characters written according to a recursive grammar?

The fate of mankind is at stake, and you feel the weight of the responsibility. But this last step seems fraught with difficulty, and you can't help sighing at the sky: computers are just too hard.

Just then, your junior high school teacher walks over, pats you on the shoulder, and hands you a junior high school botany textbook. You get angry and ask, "Why are you giving me this useless thing? The problem I'm thinking about is far too profound to be solved by a lousy junior high school textbook." You grab it and throw it on the ground.

Another ill wind blows by, and the book falls open to the chapter on trees. You stare at the page in a daze:

Under the trunk are branches, under the branches are leaves, under the branches there can also be branches, and under those branches there can be more branches... eat grapes without spitting out the grape skins, don't eat grapes but spit out the grape skins... eh? That's not right. Back to where we were: the trunk grows branches, and branches can grow more branches, layer upon layer, a dream within a dream, descendants without end... high school math teacher... wait a minute, this is recursion too!!! We can use a tree to represent code written according to a recursive grammar!

Your junior high school teacher laughs out loud: too young, too naive. Something that seemed so high-tech has been solved with nothing more than junior high school knowledge.

Excellent translator

When a computer processes a programming language, it can organize the code in the form of a tree according to the recursive definition. Since this tree is generated according to the grammar, let's call it a syntax tree.
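Here is a rough sketch of how a program could build such a tree, assuming a toy language where every token is a single word separated by spaces. The grammar is the simplified one from this article, with the assumption that anything other than `if` or `while` counts as a plain statement. The tuple shapes and the function names `parse` and `parse_statement` are invented for this example.

```python
# Toy grammar (an assumption for this sketch):
#   statement : "if" cond statement "else" statement
#             | "while" cond statement
#             | word                      (a plain statement)
def parse_statement(tokens):
    """Parse one statement; return (tree, remaining_tokens)."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head == "if":
        if not rest:
            raise SyntaxError("expected a condition after 'if'")
        cond, rest = rest[0], rest[1:]
        then_branch, rest = parse_statement(rest)      # recursion: a statement inside an if
        if not rest or rest[0] != "else":
            raise SyntaxError("expected 'else'")
        else_branch, rest = parse_statement(rest[1:])
        return ("if", cond, then_branch, else_branch), rest
    if head == "while":
        if not rest:
            raise SyntaxError("expected a condition after 'while'")
        cond, rest = rest[0], rest[1:]
        body, rest = parse_statement(rest)             # recursion: a statement inside a loop
        return ("while", cond, body), rest
    return ("stmt", head), rest                        # leaf node: a plain statement

def parse(source: str):
    """Turn a whole source string into a syntax tree made of nested tuples."""
    tree, rest = parse_statement(source.split())
    if rest:
        raise SyntaxError(f"unexpected token: {rest[0]}")
    return tree

print(parse("if a while b x else y"))
```

Notice that `parse_statement` calls itself wherever the grammar mentions a statement. The recursive grammar and the recursive function have the same shape, and the nested tuples that come out are the tree.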

Now the code is represented as a tree. Look closely and you will find that the leaf nodes are very simple and can easily be translated into the corresponding machine instructions. Once the leaves are translated, their parent node can build its own translation from those results, and its parent can do the same in turn. Passing results upward layer by layer, the entire tree is eventually translated into concrete machine instructions.

The program that does this job also needs a name. Following the "principle of incomprehensibility", you give this translator-like program a rather unfriendly-sounding name: compiler.
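The bottom-up translation can be sketched like this. The tree is assumed to be made of nested tuples: `("stmt", name)` for a leaf, `("if", cond, then_branch, else_branch)` and `("while", cond, body)` for inner nodes. The output instructions (`EXEC`, `JUMP`, `JUMP_IF_FALSE`, `LABEL`) are invented for this example and are not a real instruction set.

```python
import itertools

def compile_tree(tree, labels=None):
    """Translate a syntax tree into a flat list of toy instructions."""
    if labels is None:
        labels = itertools.count()        # shared counter for fresh label numbers
    kind = tree[0]
    if kind == "stmt":
        # Leaf node: translates directly into a single instruction.
        return [f"EXEC {tree[1]}"]
    if kind == "if":
        _, cond, then_branch, else_branch = tree
        # Translate the children first, then stitch their results together.
        then_code = compile_tree(then_branch, labels)
        else_code = compile_tree(else_branch, labels)
        else_label = f"L{next(labels)}"
        end_label = f"L{next(labels)}"
        return ([f"JUMP_IF_FALSE {cond} {else_label}"]
                + then_code
                + [f"JUMP {end_label}", f"LABEL {else_label}"]
                + else_code
                + [f"LABEL {end_label}"])
    if kind == "while":
        _, cond, body = tree
        body_code = compile_tree(body, labels)
        start_label = f"L{next(labels)}"
        end_label = f"L{next(labels)}"
        return ([f"LABEL {start_label}", f"JUMP_IF_FALSE {cond} {end_label}"]
                + body_code
                + [f"JUMP {start_label}", f"LABEL {end_label}"])
    raise ValueError(f"unknown node kind: {kind}")

tree = ("if", "a", ("while", "b", ("stmt", "x")), ("stmt", "y"))
for instruction in compile_tree(tree):
    print(instruction)
```

Each parent never looks inside its children; it only takes the instruction lists they return and wraps jumps and labels around them. That is the "pass the result upward layer by layer" idea in miniature.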

Do you still think that data structures such as binary trees are useless?

At this point, you have completed an amazing invention. Programmers can write code using things that humans can understand, and a program called a compiler that you wrote is responsible for translating it into machine instructions that the CPU can understand.

Later generations built C/C++, and later Java and Python based on your ideas, and these languages are still used by a group of people today.
