The first AI programmer caught faking its demo: Devin "shocks" Silicon Valley again! A detailed write-up of the debunking video is attached.
Baijiao House comes from Aofei Temple
Qubit | Official account QbitAI
Was the demo video of the first AI programmer substantially faked???
Devin, which stunned Silicon Valley not long ago, has stunned it again, but this time because it was exposed as a fake.
Here's the thing: Internet of Bugs, a programmer and YouTube creator (hereinafter referred to as Bald Brother), analyzed Devin's video frame by frame and laid out evidence point by point to show that Devin is not as magical as the demonstration suggests.
It even pulled the stunt of "writing the bug itself and then fixing it on the spot".
Other "evidence of guilt" includes, but is not limited to:
- It is claimed to be able to solve any Upwork task, but the problem solved in the demonstration is not the one the prompt asked for, so the effort was wasted;
- It appears to be fixing bugs, but the bugs it fixes are ones a human programmer would never introduce;
- It fails to notice that two simple steps would solve the problem and instead adds needless bells and whistles, complicating the task;
- The quality of its code changes is hard to put into words.
In addition, it took Bald Brother a little over half an hour to complete the Upwork task from Devin's demonstration video, while Devin may have needed more than 6 hours.
Well, well... this is one seriously juicy piece of gossip!
Bear in mind that the team at Cognition AI, the company behind Devin, holds 10 IOI gold medals, and that the company announced a successful US$21 million funding round in the same month it launched Devin.
Twitter and Hacker News (YC) are already abuzz, making this a hot topic.
One commenter wrote: I really hate faked demos that make it look as if unexpected technological progress is easy to achieve.
Others said they felt burned and would never again trust the startups that keep popping up.
Hmm... I'll pin all my hopes on companies and labs like OpenAI, Anthropic, DeepMind, and FAIR.
For complete details, read on below.
Verified frame by frame by a practitioner with 35 years of experience
Bald Brother, who stepped up to set the record straight this time, has worked in the software industry for 35 years. He first stated his position: I am not against new technology, but I am against excessive hype.
He himself often uses GitHub Copilot, ChatGPT, Llama 2, and Stable Diffusion.
In fact, when Devin was first launched, he already objected to the label "the world's first AI software engineer".
This time, he focuses mainly on some more specific claims.
For example, Devin was previously claimed to be able to make money by handling Upwork tasks. But Devin did not do this in the actual demo.
Don't believe it? No problem: Bald Brother brings frame-by-frame evidence.
His findings are summarized as follows:
- The task Devin handles was not random but carefully selected;
- What it delivered diverges sharply from the customer's actual needs;
- During the actual run, it created bugs several times and then fixed them;
- Many of its operations were pointless, resembling techniques used in C decades ago.
First, at 2.936 seconds into the demo video, a message in the upper left corner of the screen shows that they had searched for this listing. So this was not a so-called "randomly" chosen task.
Now look at what the customer actually asked for. The real requirement is "I want to use this library to do inference. You need to provide detailed instructions. I don't want to discuss the estimated time required to complete this work."
But the request given to Devin was: I want to run inference with this model in this library. Please figure it out yourself.
The Devin-generated report that appears at the end of the video does not mention what the customer actually needs.
So, what should the final deliverables of this job include?
But what did Devin actually do?
Devin's first real attempt was to modify a file called requirements.txt, which specifies the library versions that the code depends on. The video mentions it's updating code, but it's actually more like modifying a configuration file.
According to the request, Devin then needs to get inference running on its own, using only sample data. But the actual project is much more complicated than that.
As a result, Devin quickly ran into its first command-line errors: failed to open the image, file not found, no such file or directory, and so on. But these errors did not appear when Bald Brother reproduced the task. On investigation, the file in question did not exist in the code repository at all.
This amounts to Devin creating a bug itself and then fixing it. In the steps that followed, Devin went through this kind of "self-inflicted, self-repaired" bug many times.
I can't say it's very useful, I can only say it's completely unnecessary.
Next, take a look at the README file in the repository. As the video shows, the README clearly explains what the file does and how to use it. There is even a small button on the right side of the page: click it to copy the entire command, paste it into a terminal, and press Enter to run it.
But Devin did not pick up on this at all and wrote its own code instead. The code it wrote to read data from the buffer is very poor.
So Bald Brother asked a pointed question:
Isn't this the approach people used in C and similar languages decades ago???
This approach is obviously outdated. Who in their right mind would write code like this in Python? Such code is hard to debug, its logic is convoluted and hard to follow, and it is prone to subtle errors.
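To make the contrast concrete, here is a minimal sketch of the two styles. It is a hypothetical illustration and not Devin's actual code, which the article does not reproduce; the function names `lines_c_style` and `lines_idiomatic` are invented for this example.

```python
def lines_c_style(buf):
    """Split a byte buffer into lines by scanning it index by index,
    the way one might in decades-old C code (hypothetical example)."""
    lines = []
    start = 0
    i = 0
    while i < len(buf):
        if buf[i] == 0x0A:  # 0x0A is the newline byte
            lines.append(buf[start:i])
            start = i + 1
        i += 1
    if start < len(buf):  # trailing data without a final newline
        lines.append(buf[start:])
    return lines


def lines_idiomatic(buf):
    """Let the built-in bytes method do the same job."""
    return buf.splitlines()


sample = b"alpha\nbeta\ngamma"
print(lines_c_style(sample))
print(lines_idiomatic(sample))
```

Both functions give the same result on this sample, but the manual version has more state to track (two indices plus an end-of-buffer special case), which is where subtle errors tend to creep in.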
Additionally, there was a real bug in the codebase that Devin neither found nor fixed.
Bald Brother then searched on Google and modified the code according to a relevant comment on GitHub. It took only 1 minute and 7 seconds to solve the problem.
In the end, it took Bald Brother a total of 35 minutes and 55 seconds to reproduce Devin's work. How long did it actually take Devin?
If you look closely at the video demo, you will find a gap of about 6 hours and 20 minutes between the start and end of Devin's work.
The first part of the video shows a timestamp of 3:25 pm on March 9, but the second half shows a timestamp of 9:41 pm that day.
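The gap can be checked with a quick calculation. The sketch below takes the two timestamps exactly as quoted; the year 2024 is an assumption (the article gives only the date March 9), and `elapsed` is a hypothetical helper. As quoted, the timestamps differ by 6 hours 16 minutes, close to the roughly six-hour-twenty gap cited above.

```python
from datetime import datetime


def elapsed(start, end):
    """Return the gap between two datetimes as (hours, minutes)."""
    total_seconds = int((end - start).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    return hours, remainder // 60


# Timestamps as quoted in the article; the year is assumed.
first_shot = datetime(2024, 3, 9, 15, 25)   # 3:25 pm
second_shot = datetime(2024, 3, 9, 21, 41)  # 9:41 pm

hours, minutes = elapsed(first_shot, second_shot)
print(f"{hours} h {minutes} min")
```

Either way, the elapsed time is on the order of ten times the 35 minutes and 55 seconds the human reproduction took.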
If you look closely frame by frame, you will find some strange and meaningless operations.
For example, the command head -n 5 results.json | tail -n 5 takes the first five lines of this JSON file and then takes the last five lines of those five.
The simpler form is "head -5 results.json". The -n is unnecessary, and so is the tail stage, since the last five of five lines are the same five lines.
Finally, Bald Brother commented that a lot of AI-generated content is very dumb and makes things more complicated.
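Why the second stage of that pipeline does nothing can be shown with a small simulation. This is a sketch only: `head` and `tail` below are hypothetical Python stand-ins for the shell utilities, and the sample lines are made up.

```python
def head(lines, n):
    """Return the first n lines, like the shell's head."""
    return lines[:n]


def tail(lines, n):
    """Return the last n lines, like the shell's tail."""
    return lines[-n:] if n > 0 else []


# Made-up stand-in for the contents of results.json.
file_lines = [f"row {i}" for i in range(1, 11)]

first_five = head(file_lines, 5)
piped = tail(first_five, 5)  # equivalent of: head ... | tail ...

print(first_five == piped)
```

Taking the last five of a five-line input returns the input unchanged, so the pipeline is equivalent to the head stage alone.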
When you look at its task list, you think: wow, Devin has done a lot. But that may not actually be the case.
Netizens: At least it has mastered the skill of looking busy
With Devin's demo exposed, many netizens scoffed at the current hype around AI products.
I really hate how normalized demo fraud is now
Someone even listed three examples of hype: Devin, Rabbit, and Humane.
Some netizens also joked: Devin has at least mastered the technique of looking busy.
Hmm? It clearly understands the life of an office worker.
However, there are also some supportive netizens, such as Wharton School professor Ethan Mollick.
He claimed he had early access and found it really fun to experience it.
He believes it is too early to dismiss agents as "hype" and that their capabilities will become very powerful in the next few months.
Known as “the world’s first fully autonomous AI software engineer”
Interestingly, the demo fraud incident broke out only one month after Cognition AI launched Devin.
Let’s review it together.
A month ago, on March 13, Cognition AI introduced Devin on Twitter and called it "the world's first AI software engineer."
It handles an entire development project end-to-end with just one command.
According to its creators, a lot of work went into Devin's long-term reasoning and planning, so it can plan and execute complex software engineering tasks that require thousands of decisions to complete.
Specifically, there are 6 major capabilities:
- Build and deploy applications end to end, handling not only the code but also the entire related workflow;
- Find and fix bugs independently;
- Train and fine-tune its own AI models;
- Fix issues in open source libraries;
- Contribute to mature production repositories;
- Learn quickly, filling gaps in its knowledge and abilities in real time.
Devin’s complete technical report shows that in the SWE-bench benchmark test, Devin can solve 13.86% of the problems without human assistance.
This figure may not seem high, but it exceeds the results of all previous large AI models.
GPT-4, currently one of the best, achieved only 1.74% in the same test, and needs a human to tell it which files to work on.
At that time, the Devin team seemed unworried.
Although Devin is not open for public testing, some early-access slots have gradually been handed out.
A search online turns up the following real-world feedback from people who have tried it:
After trying it, Wharton professor Ethan Mollick, who is passionate about AI, believes that its novel real-time interaction is what deserves the most attention.
He asked Devin to develop a website that explains "dilution in startup financing," and later noted that AI is not yet capable of doing this autonomously and error-free without any help.
But some people said outright that they were genuinely shocked by the experience.
Bubna, one of the first early-access users shown in the screenshot, happens to be the CTO of Modal Labs, an AI infrastructure startup.
Later, he and Devin even made news together. Devin used its boss's account to join the Modal Labs work chat. After some back-and-forth with Bubna, it adjusted its code plan based on the responses and solved a technical problem.
△ Behind the spokesperson in the picture is actually Devin
Of course, beyond the seemingly impressive technology, Devin also carries a halo: the company behind it, Cognition. Although it is a small startup, its recruiting page states plainly:
Our team holds 10 IOI gold medals~
The technical demonstration and the team's background were eye-catching and directly helped word of Devin spread.
It is precisely because of the attention on Devin that the field of code generation has advanced rapidly in recent weeks.
For example, MetaGPT, a GitHub project with 30,000 stars, has launched a new "open source version of Devin" called Data Interpreter;
Alibaba Qwen member Binyuan Hui and others started the OpenDevin project, which has attracted 21.5k stars on GitHub in the past month;
Princeton moved faster and used GPT-4 to create the open source SWE-agent, which can be used out of the box to fix real bugs in GitHub repositories.
On 25% of the SWE-bench test set, it achieved accuracy similar to Devin's, solving 12.29% of the problems.
Various major manufacturers have also begun to recruit their own AI programmers...
One More Thing
And now this has happened. How to put it...
Looking on the bright side, it is something of a reprieve. All programmers can breathe a sigh of relief: for the time being, AI cannot take over my job end to end.
Looking at the worst case, it is rather alarming that such a high-profile star project turns out to be a demo that only works in videos.
Is the world really just one giant makeshift troupe???
Reference links:
[1]https://twitter.com/oran_ge/status/1778968102610546762?s=46&t=S65Q3TssMnzcxLETGqaDFQ
[2]https://twitter.com/0interestrates/status/1779268441226256500
[3]https://news.ycombinator.com/item?id=40008109
[4]https://www.youtube.com/watch?v=tNmgmwEtoWE